Preferences for importing PDF & PS

rolfschmolling · April 28, 2006, 9:24pm

Hi folks,
According to the manual (and the online help and numerous postings I’ve looked at), I should be able to set the preferences as follows:

a) use PDFKit (Tiger)
and
b) convert to rich text

Option b) doesn’t show up at all in the preference pane of DEVONthink Pro 1.1.1.

Huh!? It is important because it determines how things are imported/sent to the central files folder, etc.

Any suggestions?

Rolf

Bill_DeVille · April 28, 2006, 11:37pm

Rolf:

User options for capturing data are simplified in DT Pro 1.1.1, both to remove potential sources of confusion and in preparation for the revised database structure of the future version 2.0 release.

To copy files into the DT Pro database package file, use File > Import > Files & Folders or the equivalent, drag & drop (to the Dock icon, to the floating Groups panel, or to the Names column of a DT Pro view window).

To establish link documents in DT Pro that link to external files, use File > Index or the equivalent, Command-Option-drag & drop (to the Dock icon, to the floating Groups panel, or to the Names column of a DT Pro view window). The previous File > Link option has been eliminated.

Similarities:

In either mode (Import/copy or Index/link) DT Pro will capture the text of readable file types into the database for searching and analysis.

In either mode (Import/copy or Index/link) DT Pro will ignore unrecognized file types unless the user has checked the option available in DEVONthink Pro > Preferences > Import Tab/File types for “Unknown file types”. If that option is checked, the Import mode will copy unknown file types into the database Files folder and create a blank “link” document, and the Index mode will create a blank document linked to the external file. In either mode, no text will be captured, but the contents of the Info panel (Name, Path, etc.) will be searchable.

Differences:

In the Import mode, plain or rich text that has been captured is directly editable in the document’s text pane.

In the Index mode, plain or rich text that has been captured is not editable in the document’s text pane.

Exception for Word .doc files:

No matter which mode is selected, Word .doc files always remain externally linked. Do not delete Word files if future use is contemplated, as the database contains only RTF, not the original file.

About editing, saving and synchronization to the database:

In the Index mode, Actions > Launch Path will open the external file in its parent application, where edit changes will be made and saved. When the document is next opened in DT Pro’s database, those edit changes will be displayed and will, of course, be available for searching and analysis. So one-way synchronization from the edited linked file to the database content is automatic. This is also true if Actions > Open With is selected and a different application capable of editing and saving the externally linked file is chosen. A ‘lightning bolt’ symbol is appended to the names of documents captured using Index.

The user is cautioned that such synchronization does not necessarily occur for items captured in the Import mode. See the notes below that begin with the asterisk symbol:

* Editing the RTF for a Word document does not change the original Word file. Using Actions > Launch Path or Open With and selecting MS Word will open the externally linked .doc file in Word, but changes made and saved will not change the database content unless the external file is reimported. “Open With” doesn’t necessarily mean “Edit With”; the “Save As” option should be used to save any edit changes. I recommend using “Launch Path”. Currently, it may be preferable to use the Index mode to capture .doc files, as there are no ambiguities about edit/save results for files captured using Index.
* Synchronization of edited and saved files that have been copied into the database Files folder does work when Launch Path or Open With is used. That includes images, PDFs, QuickTime media and (if the Preferences option is checked) “unknown” file types. Warning: previously imported PDFs or other file types (images, QT media) that were copied to the “body” of the database instead of to the Files folder cannot be reliably edited and synchronized in this way. Likewise, PDFs imported using the previous option to capture plain or rich text cannot be synchronized with edit changes made to the external PDF file.

When DT Pro version 2.0 is released:

All files captured using the Import mode will be copied into the database Files folder structure, including those file types (text-based files including rich and plain text, HTML etc.) that are currently stored in the “body” of the database. Word .doc files will also be copied into the Files folder structure. The existing differences and ambiguities that depend on file type when files are edited and saved will “go away”, and editing and saving a file will synchronize the edited/saved file to the database content.

All files captured using the Index mode will continue to behave in the current way.

A fully Imported database will be more portable than an Indexed database, as it can be easily copied to another computer or to a DVD. The memory requirements of databases captured in either mode will be equivalent (currently, an Imported database can have much larger memory requirements, depending on the file types imported and the previous Preferences choices). There will be no essential differences concerning searching and information analysis using either ‘type’ of database.

rolfschmolling · April 29, 2006, 7:18am

Wow Bill,

that is a very comprehensive list of information. I’d suggest putting it into the manual, because the manual is still somewhat short on such specifics…

Besides, it seems to remain a hot topic in the forum (and it prompted me to ask, because I didn’t get it).

Thanks a million.

Greetings
Rolf

Bill_DeVille · April 29, 2006, 8:08am

Rolf:

Just edited a mistake in the previous post. The equivalent to Index is Command-Option-drag & drop.

I’m working up a “Starting with DT Pro 1.1.1” introduction and hope to have it finished in a couple of weeks.

mdl · May 13, 2006, 5:15am

I’m not sure the original question was really answered here. As I understand it, the manual suggests that DEVONthink can convert PDFs to rich text as it imports them. According to the manual, “Check ‘Convert to Plain/Rich Text’ if you don’t want to import the PDF itself, but convert it to editable rich text.” (This is the option that is not actually available in the PDF/PS preferences.)

In other words, what one would get on importing a PDF is an editable, self-contained rich text document in the database (not just an index file linking to some outside PDF). One can of course get this by using the Convert command in the Data menu, but it would be much more convenient if PDFs could be automatically converted to Rich Text upon importing, as the manual suggests. For instance, let’s say I’ve just downloaded 50 sample course syllabi in PDF format and I want to import these into the database as Rich Text documents (for smaller size, editability, etc.). If the automatic conversion feature were present, then these would simply appear in the database in Rich Text Format on import. As it stands, now I have to import the PDFs, convert them to RTF, sort the folder by file type, and then delete the PDFs (4 steps instead of 1).

Thus, the absence of the automatic “Convert to Rich Text” option in the PDF preferences panel is not just a “simplification” of import options, but the elimination of a distinct feature.

Please correct me if I’m wrong, but conversion on import doesn’t seem to be included in the possibilities sketched out by Bill. Why is this feature now absent? Am I missing something?

Bill_DeVille · May 13, 2006, 9:02am

mdl:

Yes, the choices in the previous Preferences > PDF & Postscript panel have been simplified and the option to import only plain or rich text from PDFs is gone.

The combination of possible settings was ambiguous and often caused user confusion. For example, if the user chose to copy the PDF to the database Files folder but also checked the option to import only rich text, the PDF was not copied to the database Files folder. Some users who thought they had copied the PDF to the database Files folder then deleted the original PDF.

Still another option, to copy the PDF to the ‘body’ of the database, had become obsolete and tended to increase memory requirements.

NOW:

Import copies the PDF to the database Files folder and displays a PDF+text view of the document. The original PDF may be deleted without affecting the database.

Index leaves the PDF file external to the database and displays a PDF+text view of the document. If the PDF is deleted the document is no longer readable in the database.

IF YOU WANT TO CONVERT TO EDITABLE TEXT:

For an Imported PDF, select Data > Convert > Rich text. A new RTF document will be created, with the same creation and modification dates. This text document can be safely deleted without deleting the PDF file from the database Files folder.

For an Indexed PDF, select Data > Convert > Rich text. The PDF+text view will be replaced by rich text in the database and will no longer be available.

Personally, I would probably use another option, such as launching the PDF under Preview. Then select all or any desired text and copy it to the clipboard. The text can be pasted into a DT Pro document or into a TextEdit document.

For many PDFs, likely including the kind you described, there’s not much difference in size between the PDF+text and RTF versions of the displayed document. If the PDFs are captured using Index, there’s not much difference in database size between PDF+text and RTF display for your example files.

But the PDF+text documents are much easier to read.

cgrunenberg · May 13, 2006, 9:10am

The documentation is not up to date, but you could use a simple droplet to achieve the same result: save the following script as an application, then drop PDF documents onto it.


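-- Droplet: save as an application, then drop PDF files onto its icon.
on open these_items
	repeat with this_item in these_items
		-- Any file that fails to import is silently skipped.
		try
			-- The import command is given a POSIX path string.
			set this_path to POSIX path of this_item
			tell application "DEVONthink Pro"
				set theRecord to import this_path without unstyled
				try
					-- Only records of type picture (the type this script expects for imported PDFs) are converted.
					if type of theRecord is picture then
						-- Create a rich text copy and give it the original record's name...
						set theConversion to convert record theRecord to rich
						set name of theConversion to (name of theRecord) as string
						-- ...then delete the imported PDF record, leaving only the rich text document.
						delete record theRecord
					end if
				end try
			end tell
		end try
	end repeat
end open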