Regex, AppleScript & Automator

zlu · July 25, 2016, 9:00pm

Hi guys,

I have a text record that looks like this:

Do you guys think I can use regex, or a similar method, through AppleScript and/or Automator to split the one text record containing this text into several text files, using the numbers as cues to split the text and create each new file?

If yes, has it been done before? Are you aware of any such automation?

I’ve been looking around without success, and I gotta tell you: I’ve got a 1300-page single document that should be split into 50,000 text files… and I’m really afraid of doing it by hand.

BLUEFROG · July 25, 2016, 9:08pm

Is the document conforming (consistent) enough to be run through a script?

korm · July 25, 2016, 9:21pm

Even if a script could handle that, how would you ever determine that the 50,000 files were all correctly formed?

zlu · July 26, 2016, 5:12am

Thanks guys.

The doc is 100% consistent, no problem with that.
It consists of chunks of text separated by numbers.

Good question, there will be some checking needed, but again, the text file is super consistent.
Actually, when I said a 1300-page document, that was just to give you a rough idea of the size when printed. The file I have is just a single text file. But if needed, I can also get the same text already divided into several separate text files. I just thought that one text file would be better, you know, a one-script thing.
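
On the checking question, one rough approach is a round-trip test: join the pieces back together, compare them against the original, and confirm that each piece opens with exactly one number cue. Below is a minimal Python sketch of that idea. It assumes (hypothetically, since the sample record isn't shown in this thread) that every chunk begins with a line containing only digits. `check_pieces` and the `CUE` pattern are illustrative names, not part of any existing tool.

```python
import re

# Assumption: a cue is a line made up only of digits, e.g. "42".
CUE = re.compile(r"^\d+$")


def check_pieces(original, pieces):
    """Return a list of problems found; an empty list means the split looks sane."""
    problems = []
    # Round trip: the pieces, joined in order, must reproduce the original text.
    if "".join(pieces) != original:
        problems.append("pieces do not add up to the original text")
    for index, piece in enumerate(pieces):
        lines = piece.splitlines()
        # Every piece should open with its number cue.
        if not lines or not CUE.match(lines[0].strip()):
            problems.append(f"piece {index} does not start with a number cue")
        # No second cue should be hiding inside a piece.
        elif any(CUE.match(line.strip()) for line in lines[1:]):
            problems.append(f"piece {index} contains more than one cue")
    return problems
```

An empty result doesn't prove the content is right, but it would catch lost text, merged chunks, and chunks cut in the wrong place, which is probably most of what could go wrong across 50,000 files.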

Frederiko · July 26, 2016, 1:39pm

Awk is what I would use, but csplit would also seem to work. You will have to work out the correct regex in either case.

Frederiko
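
For anyone who wants to see the shape of the split-on-a-pattern idea before learning awk or csplit, here is a rough Python sketch. It is not awk and not anything posted by the people above, just the same technique in another language. It assumes each record starts with a line holding only a number, which is a guess because the sample text isn't shown here. The names `split_on_cues`, `write_chunks`, and the `chunk_00001.txt` naming scheme are all made up for illustration.

```python
import re
from pathlib import Path

# Assumption: each record starts with a line holding only a number.
CUE = re.compile(r"^\d+\s*$")


def split_on_cues(text):
    """Split text into chunks, starting a new chunk at every cue line."""
    chunks = []
    for line in text.splitlines(keepends=True):
        # Start a new chunk at a cue line (or at the very first line).
        if CUE.match(line) or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks]


def write_chunks(chunks, out_dir):
    """Write each chunk to its own numbered file, e.g. chunk_00001.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for number, chunk in enumerate(chunks, start=1):
        (out / f"chunk_{number:05d}.txt").write_text(chunk, encoding="utf-8")
```

Usage would be something like reading the file with `Path("record.txt").read_text(encoding="utf-8")` (a hypothetical filename), passing the text to `split_on_cues`, and handing the result to `write_chunks` along with an output folder. If the real cues look different, say a number followed by a period, only the `CUE` pattern should need to change.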

zlu · July 28, 2016, 11:38am

Well, I’d never heard of awk or csplit. They look promising for my needs. I’ll delve into them, and I want to thank you for your input!

pvonk · July 29, 2016, 2:09pm

Wow, AWK! I used to introduce that language in my upper-level computer languages course 25 years ago. These days, I rarely hear anyone mention AWK.

BLUEFROG · July 29, 2016, 2:34pm

I keep ignoring AWK (on account of my undying love for sed), but in the things I’ve done with AWK, it is very cool indeed!

pvonk · July 29, 2016, 2:44pm

Unfortunately I tossed out my only reference book on AWK last year when I retired.

BLUEFROG · July 29, 2016, 2:59pm

Fortunately, AWK is still well loved so there’s a ton of tutorials and info on it still.

s3mpai · August 12, 2016, 4:54pm

A lot of guides and documentation will assume you’re using GNU awk(1). If you get stuck and can’t figure out why something that should work doesn’t, and the errors mention functions that don’t match what the guide describes, you should install GNU awk and use it explicitly in your scripts.

The easiest way to install it is brew install gawk via Homebrew.

We should consider a new thread purely devoted to newfangled text processing tools that slice and dice data. Some of my favorites lately are ack and csvfix (neilb.bitbucket.org/csvfix/), and I don’t know how I ever managed to work with CSV files before csvfix because it’s simply life-changing. For example, you can treat CSV files as if they were SQL and generate statements accordingly. And if that hasn’t already blown your mind, performing operations on block selections from CSV data and performing basic validation of the contents of a file will probably do it.

BLUEFROG · August 13, 2016, 3:36am

Bleh! Haha! Thanks for the input. Just my own two cents - I always limit my dependencies and don’t usually suggest things that have to be installed. Just my personal preference. s’all good.

Interesting links. Thanks!

s3mpai · August 13, 2016, 6:46pm

As long as you’re fine with the limitations of BSD awk you wouldn’t need to install GNU awk.