PDF to RTF Issue

ChemBob · February 14, 2005, 1:53am

I converted a PDF to RTF format after a search in DT that turned up the document. I wanted to drag or copy parts of it to another program. DT called TextLightning, which did the conversion. Unfortunately, after the conversion, upon opening the RTF in DT in its own window, the text extended out to 18 inches from margin to margin in parts of the document. And nothing on the ruler would respond to allow me to change the margins, or anything else. It is as though the ruler is dead. Also, some of the lines broke in the middle, making them extra short. Adding insult to injury, hitting the button “Open with TextEdit” opened the original PDF in Preview rather than the new text (RTF) in TextEdit.

This is frustrating because I’m trying very hard and spending lots of money in an attempt to increase my workflow efficiency, and behavior like this is interrupting and distracting, and drains away any potential time savings from using these tools. Has anyone else seen this behavior? Do you folks at DT have a solution to this problem, please?

Thanks,
ChemBob

Bill_DeVille · February 14, 2005, 4:31am

ChemBob:

I can’t replicate your problem of the RTF capture opening under Preview instead of TextEdit.

Otherwise, I feel your pain.

Generally, I just Index import PDFs. That way, the text content is available in DT for searches, etc.

Things may get more complicated if I want to extract a clipping of text from the PDF file. In that case, I’ll do a second, temporary import (temporary, meaning I’ll throw this import away after copying a text selection). For that copy, I prefer TextLightning because it retains character styles (usually), and is much easier to read, especially for multi-column originals. (For those who don’t have TextLightning, that may not justify its purchase. It’s much slower than pdftotext, and may choke on some PDFs.)

Cocoa text under Panther can’t render tables (I’m hoping Apple will improve that under Tiger). The only way I can get a table rendered in RTF is by copying it as an image and pasting it in (but the text isn’t searchable).

By the way, I’ve got the full version of Acrobat. It can do RTF export, but it usually does a lousy job, too, if the original document has a complex layout.

If I’m working with a long PDF document and I want to take multiple text selections from it, sometimes the easiest route is to do an OCR translation to Microsoft Word RTF, choosing a frames conversion. (Sometimes it’s just easier to retype what I want from a PDF file, like I used to do in typewriter days!)

DEVONtechnologies isn’t responsible for the difficulties of clipping text from PDFs. I hope Apple tackles these problems with Tiger.

moses · February 15, 2005, 12:23am

ChemBob, I think you should try the PDF to text conversion program called “Trapeze” – I have tested it and liked it better than TextLightning. It has options as to how the text gets converted; for me it made a much more usable text document. I don’t do this much, so don’t have a lot of experience with it – but I did try the two side by side and preferred Trapeze. Of course, you will have to do the conversion outside of DT, then bring the converted doc into DT. Trapeze is at:
mesadynamics.com/

ChemBob · February 15, 2005, 12:41am

Thanks, I’ll give it a look!

ChemBob

ChemBob · February 15, 2005, 5:05pm

OK, I tried Trapeze and it worked well on the first 3 pages in demo mode but I don’t really want to cough up another 30 bucks for yet another shareware when I’ve got so much invested in unused/improperly functioning tools already. I have Adobe Acrobat Pro and that totally screwed up the PDF to RTF conversion, converting a 2.8 MB PDF to a 16 MB RTF! So I tried using TextLightning externally to DT and created an RTF. The line breaks were, as within DT, screwed up. But I opened the document in Word and was able to get it straightened out by using Word’s controls, then saved it as an RTF.
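For anyone hitting the same mid-line breaks, the kind of cleanup done in Word above can be roughly approximated on plain text with a short script. This is only a sketch under assumptions: it works on plain text that has already been extracted (not on RTF markup), it treats a blank line as a paragraph boundary, and the function name `unwrap_lines` is made up for illustration. It is not what TextLightning, Word, or DT do internally.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join lines that were broken mid-paragraph.

    Assumption: paragraphs are separated by at least one blank line,
    and every other line break is an unwanted hard wrap.
    """
    # Split the text into paragraphs on blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Replace each remaining line break (plus surrounding spaces)
        # with a single space so the paragraph reflows as one line.
        joined.append(re.sub(r"\s*\n\s*", " ", para).strip())
    # Put one blank line back between paragraphs.
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "The text was\nbroken in the\nmiddle.\n\nNext paragraph\nhere."
    print(unwrap_lines(sample))
```

The heuristic will wrongly merge things that are meant to stay on separate lines, such as headings, lists, or table rows, if they are not set off by blank lines, so the result still needs a read-through.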

Here’s the dicey part. So I imported the new RTF into DT, putting it into the same group as the PDF of the report. Did exactly the same search that found the original PDF. It found the PDF but NOT the RTF! I quit DT and reopened. Same result. I verified and repaired, backed up and optimized, and got exactly the same result. The RTF is not found in the search that finds the PDF, even though all the text is there. It is, in fact, not in the list of found files anywhere, even though many other less relevant documents are. What gives? How does DT do these searches anyway? How much confidence can we have that we are finding all the documents that are relevant to a search when it is not even spotting this RTF at all in the search?

Does anyone have a clue about this behavior? Help is appreciated. It seems like I’m spending all my productive time trying to figure out why my productivity tools aren’t behaving as expected.

ChemBob