Get a tab-delimited text file into DT as individual files

moses · December 26, 2004, 7:30am

Two other users asked, in two different places, how to get a single large tab-delimited text file into DT as individual files. As far as I know DT does not support this directly, but I figured there must be a workaround. It got me thinking, and I just worked out a method and ran a successful test. So, here’s how to do it:

Before you begin:

You will need MS Word, or a similar word processing app.
You will need to download “PDFpen” from smileonmymac.com. (They have a free trial period, so if you are only going to do this once you could get it just for this.)

Here’s the trick: I realized that if you could get the tab-delimited file split BEFORE bringing it into DT, then it would be possible. Here’s how I just did it:

Take the tab-delimited text file (mine had over 3,500 records from a FileMaker Pro database) and open it in Word. Then, do a Find and Replace action: Find = Paragraph Mark (^p), Replace = Manual Page Break (^m). Run “Replace All.” This turns each record in the file into a separate page.
Print the file, but when the print dialog comes up, select “Save as PDF” and save the file as a PDF document.
Take the PDF document and open it in PDFpen. PDFpen has a Script menu; in that menu, select “Split PDF”. It will ask you where you want the split files saved, and then split each page of your PDF file into an individual PDF doc. (Be warned, this script will take a little while…) Once it completes, if everything has gone correctly up to this point, you will end up with a folder containing one PDF file for each record of your original text file.
Open DT. Open the Preferences. Select the “PDF & PS” tab. Next to “Index and Convert” select “Use built-in pdftotext” AND check “Convert to Plain Text”.
Now, from the Finder, just drag and drop the folder which contains all the PDF files into DT. It will import them all as individual plain text files.

*The only downside to this method is that each file will be named “Page 001…” and so on. You could of course run the folder of PDFs through a batch renamer program first and give them more meaningful names (I suggest the freeware “R-Name” www2.mitsuya.nuem.nagoya-u.ac.jp … index.html), but I haven’t figured out how to automatically get a name from the actual record info yet. At least your data will be in DT as individual files, and you could slowly rename them as you use them (I find selecting text and using the contextual menu option “Set Title As” very handy for naming individual docs).

Hope this helps. Before you try this with a file containing thousands of records, I suggest you do a test with a smaller file to make sure it behaves the way you want. Perhaps someone will have a better idea as to how to do this. Again, I think the trick is to get the records split into individual files before you bring them into DT. Of course, hopefully DT will eventually include importing tab-delimited files as a feature.
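
For anyone comfortable running a script, the splitting step itself can be sketched in a few lines of Python. This is only a minimal sketch, not something DT provides. It assumes one record per line and UTF-8 text, and the function name `split_records` and the `record_0001.txt` naming pattern are made up for illustration.

```python
from pathlib import Path


def split_records(source: Path, out_dir: Path, encoding: str = "utf-8") -> int:
    """Write each non-blank line of `source` to its own text file in `out_dir`.

    Returns the number of files written. The encoding is an assumption;
    an older export may need something else (for example "mac_roman").
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    # Text mode with default newline handling accepts CR, LF and CRLF endings.
    with source.open("r", encoding=encoding) as handle:
        for line in handle:
            record = line.rstrip("\r\n")
            if not record.strip():
                continue  # skip blank lines between records
            count += 1
            # Put each tab-separated field on its own line for readability.
            body = record.replace("\t", "\n") + "\n"
            target = out_dir / f"record_{count:04d}.txt"
            target.write_text(body, encoding=encoding)
    return count
```

Calling `split_records(Path("export.tab"), Path("records"))` (both paths are placeholders) would fill the `records` folder with one plain text file per record, and that folder could then be dragged into DT from the Finder.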

eiron · January 10, 2005, 8:55am

Great idea! I hadn’t thought of using Word for transferring my FileMaker exports; I’ve been using HTML tables instead.

However, I have used a similar workaround for big text files that I wanted to break up. It isn’t much easier, but it only requires MS Word and avoids the renaming problem.

Open the tab-delimited text file in Word.
Replace paragraph marks (^p) with paragraph marks followed by some other unique symbol (I use ^p•). This marks each record.
Replace linefeeds (^l) with paragraph marks (^p).
Replace your unique symbol (•) with an empty field set to style Heading 1. (This makes the first line/field of each record into an outlining header, and deletes the •.)
Use the Master Document view (the Master Document palette should appear). Select all. Click Create Subdocument on the palette. (Each Heading 1 line should begin a new subdocument.)
Save to a new empty folder, which will be populated with one file for every record. Each file will be named for the text of the opening field of its record. (If you have symbols that MS considers illegal for filenames, replace them first by searching within style Heading 1 and removing them.)
Import or link into DEVONthink.
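
The naming idea in the last steps (name each file after the first field, and strip characters that are illegal in filenames) can also be sketched in Python for anyone without Word. Treat this as a rough sketch under assumptions: the helper names `safe_name`, `unique_path`, and `write_named_record` are hypothetical, the set of characters treated as unsafe is a conservative guess rather than Word's actual rule, and each record is assumed to be a single tab-separated line.

```python
import re
from pathlib import Path

# Characters assumed to be unsafe in filenames on common systems.
UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_name(text: str, fallback: str = "untitled", max_len: int = 80) -> str:
    """Turn the first field of a record into a usable file name stem."""
    cleaned = UNSAFE.sub("_", text).strip(" .")
    return cleaned[:max_len] or fallback


def unique_path(folder: Path, stem: str, suffix: str = ".txt") -> Path:
    """Append ' 2', ' 3', ... when two records share the same first field."""
    candidate = folder / f"{stem}{suffix}"
    counter = 2
    while candidate.exists():
        candidate = folder / f"{stem} {counter}{suffix}"
        counter += 1
    return candidate


def write_named_record(record: str, folder: Path) -> Path:
    """Save one tab-separated record as a text file named after its first field."""
    fields = record.rstrip("\r\n").split("\t")
    folder.mkdir(parents=True, exist_ok=True)
    target = unique_path(folder, safe_name(fields[0]))
    # One field per line, so the title field is also the first line of the file.
    target.write_text("\n".join(fields) + "\n", encoding="utf-8")
    return target
```

Looping over the lines of an export and passing each one to `write_named_record` would give a folder of files titled by their opening field, much like the subdocument folder Word produces.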

It’s not a pretty fix. But it works.

Hope this helps.

Bill_DeVille · January 11, 2005, 1:15am

Moses:

Thanks. That Word Master Document/Save Subdocuments trick really works!

For anyone who has MS Word, this approach is a good solution to “parsing” tab text files into individual records for import into DEVONthink.

I’m sure there must be other solutions. Perhaps scripts with BBEdit or TexEdit+? Anyone got suggestions?

eboehnisch · January 11, 2005, 6:43am

DEVONthink Pro is/will be able to import tab-delimited text into a table (which is basically a group of individual records). Unfortunately, this feature is not complete in the current beta of DEVONthink Pro. You can import files and view them, but the table/forms editor is not ready yet.

Besides that, I’d suggest that an AppleScript droplet that splits one large file into many smaller ones would be the best solution for those without DEVONthink Pro.

Best,

Eric.