Exporting limited field data for selected items

mwra · August 28, 2012, 3:22pm

Export from DT seems less catered for than import. I’d like to export a tab-delimited list of just some fields for the currently selected item(s) in the database. Some messing around in AppleScript got me the following, which puts the data on the clipboard rather than in a file, but that is fine for my purposes:

```applescript
set dataString to "Name\tURL\n"

tell application "DEVONthink Pro"
	set itemList to selection of front window
	tell front window
		repeat with anItem in itemList
			set itemName to ""
			set itemLink to ""
			set itemName to name of anItem
			set itemLink to reference URL of anItem
			set dataString to dataString & itemName & "\t" & itemLink & "\n"

		end repeat
	end tell

end tell
tell application "Finder"
	set the clipboard to dataString as Unicode text
end tell
```

FWIW, the data is used to create notes automatically in Tinderbox, each with a back-link to the corresponding DT database item.

Am I overlooking a built-in method for this? Are there export templates, or do I have to roll my own data exports? Thanks…

korm · August 28, 2012, 4:30pm

Hi Mark, you got it right. DEVONthink isn’t much of a data-export vehicle, so a custom script such as the one you posted is the way to go – especially if you want to import to Tinderbox, etc.

I took the liberty of adjusting the script syntax a bit.

```applescript
-- revision of
-- https://discourse.devontechnologies.com/t/exporting-limited-field-data-for-selected-items/14511/1

set pTab to tab -- "tab" has a specific meaning to DEVONthink; it is not the \t character
set dataString to "Name" & pTab & "URL" & return

tell application id "com.devon-technologies.thinkpro2"
	set itemList to selection
	repeat with anItem in itemList
		set dataString to dataString & (name of anItem) & pTab & (reference URL of anItem) & return
	end repeat
	set the clipboard to dataString as Unicode text
end tell
```

mwra · August 28, 2012, 5:01pm

Thanks! Absolutely no problem with anyone polishing my limited AppleScript smarts.

Is there an OPML export script at all, so that I can have more direct control over what data gets written to which values? For instance, to pull across the DT inbound link along with the title and text of a doc, I may need to insert the URL at the beginning or end of the text. To do that, I must control the value written to the item’s ‘_note’ OPML attribute.

korm · August 28, 2012, 5:08pm

There is the vanilla File > Export > As Outline Processor Markup Language, which does pretty much what your script does: name, reference URL, and text.

The forums have lots of threads on OPML export (including for Tinderbox), among them an extensive discussion, with scripts, that Charles Turner and I had a few years ago. (Everything I know about escaping OPML, I learned from Charles.)

mwra · August 29, 2012, 9:40am

The built-in OPML export doesn’t work for me because:

1. Large sections of text are missing or replaced by ‘␣’ characters**.
2. I don’t get the item link as the URL. I need “x-devonthink-item://F2CA8FC3-FD65-43BE-85F7-3572CE530893”, not “http://www.example.com/folder/somedoc.pdf”. I accept that this is not the logical default per the OPML spec, but it’s what I need.

** In fairness to the app, copying the same source PDF’s text via Preview gives the same problem, so this isn’t necessarily a DT problem. Seemingly random sections of text get copied as a ‘␣’ character. Is this a common problem with PDFs? I’d never seen it before today, but doubtless it depends on how the PDF was authored. I guess it’s Murphy’s Law that my test subjects mostly show this glitch in their text.

Though I’m not suggesting the default URL data is wrong, it’s not what I need for my purposes. Ergo, I think I do need to be able to write custom OPML.
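A custom export of that kind might look roughly like the sketch below. It assumes DEVONthink Pro 2's scripting dictionary as used in korm's script (`selection`, `name`, `reference URL`) plus the record's `plain text` property, and it puts the x-devonthink-item:// link at the start of each outline's `_note`. The output path (`DT-export.opml` on the Desktop) and the handler names `xmlEscape` and `replaceText` are hypothetical, and how Tinderbox maps `_note` on import is taken from this thread rather than verified here.

```applescript
-- Sketch only: write the selected DEVONthink items to an OPML file,
-- placing each item's x-devonthink-item:// link at the start of its _note.
-- Assumes DEVONthink Pro 2 (bundle id com.devon-technologies.thinkpro2).
-- The output path is a hypothetical example.
set outPath to (POSIX path of (path to desktop folder)) & "DT-export.opml"

set opml to "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" & linefeed & ¬
	"<opml version=\"2.0\">" & linefeed & ¬
	"<head><title>DEVONthink export</title></head>" & linefeed & "<body>" & linefeed

tell application id "com.devon-technologies.thinkpro2"
	set itemList to selection
	repeat with anItem in itemList
		set itemName to name of anItem
		set itemLink to reference URL of anItem -- item link, not the web URL
		set itemText to plain text of anItem
		if itemText is missing value then set itemText to ""
		-- Link first, then a blank line, then the document text.
		set noteText to itemLink & linefeed & linefeed & itemText
		set opml to opml & "<outline text=\"" & my xmlEscape(itemName) & ¬
			"\" _note=\"" & my xmlEscape(noteText) & "\"/>" & linefeed
	end repeat
end tell

set opml to opml & "</body>" & linefeed & "</opml>" & linefeed

-- Write as UTF-8, replacing any previous file contents.
set fileRef to open for access (POSIX file outPath) with write permission
try
	set eof fileRef to 0
	write opml to fileRef as «class utf8»
	close access fileRef
on error errMsg number errNum
	close access fileRef
	error errMsg number errNum
end try

-- Escape characters that may not appear verbatim in an XML attribute value.
-- Line breaks become character references so they survive attribute parsing.
on xmlEscape(theText)
	set theText to my replaceText(theText, "&", "&amp;") -- must run first
	set theText to my replaceText(theText, "<", "&lt;")
	set theText to my replaceText(theText, ">", "&gt;")
	set theText to my replaceText(theText, "\"", "&quot;")
	set theText to my replaceText(theText, return, "&#13;")
	set theText to my replaceText(theText, linefeed, "&#10;")
	return theText
end xmlEscape

-- Replace every occurrence of searchString using text item delimiters.
on replaceText(theText, searchString, replacementString)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchString
	set theItems to text items of theText
	set AppleScript's text item delimiters to replacementString
	set theText to theItems as text
	set AppleScript's text item delimiters to savedDelims
	return theText
end replaceText
```

Escaping `&` before the other characters matters: doing it last would turn the `&` in `&lt;` into `&amp;lt;`.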

These things are never as simple as one might assume. (The less one knows, the simpler they appear!)

Side note: otherwise the DT -> OPML -> Tinderbox works beautifully.

korm · August 29, 2012, 9:59am

The export looks to the text layer of the PDF (if there is one), so if there’s an issue with the text layer, you’ll get artifacts like this. For example, when a scanned document is OCR’d there are frequently issues with the text layer. It’s easy to check in DEVONthink: select any PDF whose kind is “PDF+Text” and then choose Data > Convert > to Rich Text. What you see in the resulting RTF is the best you can get from the OPML export or anything else that copies text from a PDF.
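With many PDFs to check, converting each to RTF gets tedious. A rough alternative is sketched below, under assumptions: it reads each selected record's `plain text` in DEVONthink Pro 2 and counts occurrences of U+2423 (‘␣’), the substitute character reported in this thread. Other damaged PDFs may produce different garbage characters, so a zero count does not prove a text layer is clean. `countOccurrences` is a hypothetical helper, not part of DEVONthink's dictionary.

```applescript
-- Sketch only: flag selected items whose extracted text contains U+2423.
-- Assumes DEVONthink Pro 2 and that garbled runs surface as U+2423
-- in the record's plain text.
set suspectChar to character id 9251 -- U+2423 OPEN BOX
set reportText to ""

tell application id "com.devon-technologies.thinkpro2"
	set itemList to selection
	repeat with anItem in itemList
		set itemText to plain text of anItem
		if itemText is missing value then set itemText to ""
		set hitCount to my countOccurrences(itemText, suspectChar)
		if hitCount > 0 then
			set reportText to reportText & (name of anItem) & ": " & hitCount & return
		end if
	end repeat
end tell

if reportText is "" then
	display dialog "No U+2423 characters found." buttons {"OK"} default button 1
else
	display dialog "Possibly unreadable text layer:" & return & reportText buttons {"OK"} default button 1
end if

-- Count non-overlapping occurrences of searchString in theText.
on countOccurrences(theText, searchString)
	if theText is "" then return 0
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchString
	set n to (count text items of theText) - 1
	set AppleScript's text item delimiters to savedDelims
	return n
end countOccurrences
```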

Check the script here for another approach.

mwra · August 29, 2012, 10:24am

Thanks. The RTF text looks like this (the same as the OPML export, but styled):

Notice the sections of ruler-like characters? In the source doc that is readable text, in the same font as the preceding and following text. The first instance of garbage is a complete line within a body-text paragraph. So I assume this is broken at source; bad encoding? The PDF is the OmniOutliner manual, which reports being made by Adobe InDesign CS2 (v4.0.4) on a Mac.

The other test doc with the issue was made with (a German version of) Adobe PDFMaker 9.1 for Word.

My hunch is that this text can’t easily be machine-read, whether by DT, Preview, or anything else.

korm · August 29, 2012, 1:39pm

That’s interesting. I downloaded the same PDF and converted it over here in DEVONthink Pro Office 2.4.1 running on 10.8.1, and see no decoding errors.

mwra · August 29, 2012, 2:05pm

I’m using DTPro v2.4.1 on OS X 10.6.8 but the PDF is from Dec 2011. I wonder…

I downloaded the same doc afresh from Omni and all is OK. So the error was a badly created PDF, which has clearly since been fixed.

It’s probably worth noting in the support KB somewhere that if you see lots of ‘␣’ characters in exported text, that is data DT can’t read, most likely because of errors in the original PDF, and the best bet is to try to get a fresh copy of the PDF if possible.

BTW, if you want to see the old bad copy, in case it helps confirm the problem (if only as an example of text that can’t be retrieved), just email me. FWIW, I ‘printed’ part of the ‘bad’ PDF out as a new PDF. The text looked OK on screen but was, as I’d feared, still corrupted when copied.

I hope that helps anyone else tripping over the mystery ‘␣’ output!