Splitting a PDF

frmoses · September 4, 2005, 11:34pm

For converting PDFs, another app you might want to try is Trapeze ($30 US):
mesadynamics.com/trapeze.htm
It has various conversion options and might be useful for what’s being discussed here.

rickl · September 5, 2005, 1:22am

Andre and Frmoses,

Thanks for keeping this thread going. I just tried out Trapeze. I’m impressed both by the range of software available and by the fact that DT people, few though we seem to be, seem to know them all. The export to text gave me a document with recognizable paragraphs and just a few weird breaks here and there. But pasting to CP Notebook as an outline again gave me one cell per line. Fortunately, a “Remove line breaks” command in TextWrangler (no doubt a “Remove line endings” command from the Format services would serve as well) did the trick, and now I have a proper outline in Notebook. Now I’m about to go and pay for Trapeze and trust it works as well on the hundreds of 2-300-page PDFs I need it for.
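For anyone who would rather script the "Remove line breaks" step than run it by hand, here is a minimal Python sketch. It is not what TextWrangler does internally. It assumes the text export separates paragraphs with a blank line, which may not hold for every PDF, and the function name `unwrap_paragraphs` is made up for illustration.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalise Windows and classic Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines (optionally containing stray whitespace).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace, line breaks included, to one space.
    unwrapped = [" ".join(p.split()) for p in paragraphs]
    # Put a single blank line back between paragraphs.
    return "\n\n".join(unwrapped)
```

Pasting the result into Notebook should then give one cell per paragraph rather than one per line, provided the blank-line assumption holds for the export in question.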

Each PDF is a complete journal issue, and the articles therein have little relation to each other. And there are far too many articles for me to read even all the ones that seem interesting. So I’m planning to try the Ariew Method in something of a reversal of a normal way of working, letting Notebook help me extract interesting paragraphs and then building up to reading complete papers that seem important.

Ariew · September 5, 2005, 1:55am

I’m very glad that the Notebook method turned out to be useful!

Rickl, I hope you continue to report on the limits and successes you encounter.

Cheers,

Andre

frmoses · September 5, 2005, 2:46am

rickl,
Do keep us updated on how this works for you. Though I recommended it, I have not actually bought Trapeze myself, as my needs for it are limited at this time. I have only used it for some very short PDF conversions, which the free mode allows. But I do have some possible uses for what you are trying to do, so I, like others I am sure, am interested in what you discover actually works, or doesn’t.

Oh, BTW, here’s something you might want to try that I have done before in order to get a big document into smaller “sweet spot” documents of one paragraph each into DT [you need to have MS Word (or another good word processor)]:

open your RTF in Word, and use the Find/Replace feature to replace all paragraph marks with page breaks
then “Print” but choose the “Save to PDF” option in the print dialogue
then, take your “big” PDF full of lots of pages, and use one of the PDF converter apps to split all the pages apart [PDFpen works great for this; it has an included script that does this, and it can be done in the trial version]
this gets all your PDF pages into individual documents
you might want to then use a batch renamer to name them all something meaningful (I like the free app R-Name: www2.mitsuya.nuem.nagoya-u.ac.jp … index.html )
set your DT prefs to convert PDFs into text and then drop the whole bunch into DT; the result will be a new DT doc for every paragraph from your original
this is all more laborious to describe than to do; I used this method to get files out of a FileMaker Pro database (via a text export) into individual DT docs, and it worked admirably
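If the source is already plain text, the same end result (one small document per paragraph, each with a meaningful name) can be sketched in a few lines of Python. To be clear, this swaps in a different technique: it skips the Word, Save-to-PDF, and PDFpen detour and works on the text export directly. It assumes paragraphs are separated by blank lines, and the function name, the `para` prefix, and the naming scheme are all made up for illustration.

```python
import re
from pathlib import Path


def split_into_paragraph_files(text, out_dir, prefix="para"):
    """Write each paragraph of `text` to its own .txt file in `out_dir`.

    Assumption: paragraphs are separated by at least one blank line.
    Returns the list of paths written, in document order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Split on blank lines and drop empty chunks.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    written = []
    for i, para in enumerate(paragraphs, start=1):
        # Build a readable name from the first few words, standing in
        # for the separate batch-rename step.
        words = re.findall(r"[A-Za-z0-9]+", para)[:4]
        slug = "-".join(w.lower() for w in words) or "untitled"
        path = out / f"{prefix}-{i:04d}-{slug}.txt"
        path.write_text(para + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Dropping the resulting folder into DT should then give one document per paragraph, much as the PDF route does, though I have only sketched this and not run it against a real journal export.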

rickl · September 7, 2005, 8:34am

Thanks for this suggestion. I didn’t know about batch renamers, and the trick of changing paragraph breaks to page breaks is a nice one. I can see how the process could be quite simple and quick if everything went to plan. The weak link in the chain seems to be Trapeze: although it does way better than Acrobat 5.0, it quite often equates single lines or whole pages with paragraphs. So I’m having to do a lot of hand-tweaking to avoid ending up with thousands and thousands of PDFs, which, based on past experience, are liable to slow DT down (though if I had DT reconvert them to RTF, that might solve the problem).

ryannagy · June 16, 2012, 10:50pm

Three cheers for

which works like a charm to split pdfs. Amazing.