Regex, AppleScript & Automator

Hi guys,

I have a text record that looks like this:

Do you guys think I could use regex, or a similar method, through AppleScript and/or Automator to split this one text record into several text files, using the numbers as cues for where to split and for creating each new file?
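For illustration, here is a minimal sketch in Python (not AppleScript) of splitting on the numbers with a regex. The record's actual layout isn't shown above, so it assumes, hypothetically, that each chunk is introduced by a line containing nothing but its number; the function names and the <number>.txt naming scheme are also placeholders.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed marker (hypothetical): a line holding nothing but a number, e.g. "42".
# (?m) lets ^ and $ match at every line boundary; [ \t]* tolerates stray spaces.
MARKER = re.compile(r"(?m)^[ \t]*(\d+)[ \t]*$")


def split_record(text: str) -> List[Tuple[str, str]]:
    """Return (number, chunk) pairs, one per numbered chunk."""
    # With one capturing group, re.split returns [before, num, body, num, body, ...].
    parts = MARKER.split(text)
    # parts[0] is whatever precedes the first number; it is discarded here.
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def write_chunks(text: str, out_dir: str) -> int:
    """Write each chunk to <out_dir>/<number>.txt; return how many files were written."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    pairs = split_record(text)
    for number, body in pairs:
        (target / f"{number}.txt").write_text(body + "\n", encoding="utf-8")
    return len(pairs)
```

Because re.split keeps whatever the capturing group matched, each chunk arrives paired with its number, which is then used to name its file. Something like write_chunks(Path("record.txt").read_text(encoding="utf-8"), "chunks") would produce one file per chunk.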

If yes, has it been done before? Are you aware of any such automation?

I’ve been looking around without success, and I gotta tell you: I’ve got a single 1,300-page document that needs to be split into 50,000 text files… and I’m really afraid of doing it by hand :confused: :confused: :confused: :confused:

Is the document consistent enough to be run through a script?

Even if a script could handle that, how would you ever determine that the 50,000 files were all correctly formed?
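One way to approach the verification question, sketched in Python under stated assumptions: each chunk starts with a line holding only its number, each chunk was written to a file named <number>.txt, and any text before the first number was discarded. The check confirms that every number has exactly one file, that no file is empty, and that the words of all files, read back in source order, match the words of the original with the numbers removed. The function name and file layout are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Assumed marker (hypothetical): a line holding nothing but a number.
MARKER = re.compile(r"(?m)^[ \t]*(\d+)[ \t]*$")


def verify_split(source_text: str, out_dir: str) -> List[str]:
    """Compare the original text with the split files; return a list of problems."""
    problems: List[str] = []
    numbers = MARKER.findall(source_text)
    out = Path(out_dir)

    # 1. Each number should appear once in the source and have exactly one file.
    expected = set(numbers)
    found = {p.stem for p in out.glob("*.txt")}
    if len(expected) != len(numbers):
        problems.append("duplicate numbers in the source")
    problems += [f"missing file: {n}.txt" for n in sorted(expected - found)]
    problems += [f"unexpected file: {n}.txt" for n in sorted(found - expected)]
    if problems:
        return problems

    # 2. No chunk should be empty, and 3. reading the files back in source
    # order should reproduce the source's words (numbers removed).
    rebuilt: List[str] = []
    for n in numbers:
        body = (out / f"{n}.txt").read_text(encoding="utf-8")
        if not body.strip():
            problems.append(f"empty chunk: {n}.txt")
        rebuilt.extend(body.split())
    first = MARKER.search(source_text)
    # Text before the first number is assumed to have been discarded.
    original = MARKER.sub(" ", source_text[first.start():]).split() if first else []
    if rebuilt != original:
        problems.append("file contents do not match the source text")
    return problems
```

Comparing word sequences rather than raw bytes tolerates whitespace trimmed during the split, while still catching dropped, duplicated, or reordered text.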

Thanks guys.

The doc is 100% consistent, no problem with that.
It consists of chunks of text separated by numbers.

Good question. There will be some checking needed, but again, the text file is super consistent.
Actually, when I said a 1,300-page document, that was just to give you a rough idea of its size when printed. What I have is a single text file. If needed, I can also get the same text already divided into several separate text files, but I thought one text file would be better, you know, a one-script thing.

Awk is what I would use, but csplit should also work. Either way, you will have to work out the correct regex.
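Whichever tool you pick, the csplit/awk pattern is the same: read the file line by line, and whenever a line matches the number regex, close the current output file and start a new one. Below is a minimal Python sketch of that approach; it assumes, hypothetically, that a marker line contains nothing but a number, and the function name is a placeholder. Closing each file before opening the next matters at this scale, since holding 50,000 files open at once would far exceed typical per-process open-file limits (some awk implementations impose their own, lower cap).

```python
import re
from pathlib import Path
from typing import Iterable, Optional, TextIO

# Assumed marker (hypothetical): a line whose only content is a number.
NUMBER = re.compile(r"\d+")


def stream_split(lines: Iterable[str], out_dir: str) -> int:
    """Start a new <number>.txt at every marker line; return the number of files created."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    current: Optional[TextIO] = None
    count = 0
    try:
        for line in lines:
            match = NUMBER.fullmatch(line.strip())
            if match:
                # Close the previous chunk first, so only one output file is open at a time.
                if current is not None:
                    current.close()
                current = open(target / f"{match.group()}.txt", "w", encoding="utf-8")
                count += 1
            elif current is not None:
                current.write(line)  # body lines are copied unchanged
            # Lines before the first marker fall through and are skipped.
    finally:
        if current is not None:
            current.close()
    return count
```

Passing an open file object (for example, the result of open("record.txt", encoding="utf-8")) works because iterating a file yields one line at a time, so memory use stays flat regardless of input size.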

Frederiko

Well, I’d never heard of awk or csplit. They look promising for my needs; I’ll delve into them. Thanks for your input!

Wow, AWK! I used to introduce that language in my upper-level computer languages course 25 years ago. These days, I rarely hear anyone mention AWK.

I keep ignoring AWK (on account of my undying love for sed), but in the things I’ve done with AWK, it is very cool indeed! :smiley:

Unfortunately I tossed out my only reference book on AWK last year when I retired.

Fortunately, AWK is still well loved, so there are still tons of tutorials and info on it. :smiley:

A lot of guides and documentation assume you’re using GNU awk. If you get stuck because something that should work doesn’t, and awk is throwing errors about functions that a guide says exist, install GNU awk and call it explicitly (as gawk) in your scripts.

The easiest way to install it is via Homebrew: brew install gawk

Someone should start a new thread devoted purely to newfangled text-processing tools that slice and dice data. Some of my favorites lately are ack and csvfix (neilb.bitbucket.org/csvfix/), and I don’t know how I ever managed to work with CSV files before csvfix; it’s simply life-changing. For example, you can treat CSV files as if they were SQL tables and generate statements accordingly. And if that hasn’t already blown your mind, performing operations on block selections from CSV data and doing basic validation of a file’s contents probably will.
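To illustrate the "CSV as SQL" idea (this is not csvfix itself, just a hypothetical Python sketch using the standard csv module): each data row becomes an INSERT statement, with the header row supplying the column names, and a basic validation step rejects rows whose field count doesn’t match the header. The table name and function names are placeholders.

```python
import csv
import io
from typing import List


def quote_sql(value: str) -> str:
    """Render a value as an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def csv_to_inserts(csv_text: str, table: str) -> List[str]:
    """Turn each CSV data row into an INSERT statement; the header row names the columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Column names are used as-is, so they are assumed to be plain SQL identifiers.
    columns = ", ".join(header)
    statements = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Basic validation: every row must have as many fields as the header.
        if len(row) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
            )
        values = ", ".join(quote_sql(v) for v in row)
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements
```

Building SQL text this way is fine for generating a script to review; when loading data directly, a database driver’s parameterized queries are the safer choice.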

Bleh! :mrgreen: Haha! Thanks for the input. Just my own two cents: I always limit my dependencies and don’t usually suggest things that have to be installed. Just my personal preference. S’all good. :smiley:

Interesting links. Thanks!

As long as you’re fine with the limitations of BSD awk, you won’t need to install GNU awk.