Export exploded notes (or any selection) to OPML

korm · April 20, 2010, 12:27pm

[Withdrawn. Don’t have the inclination to continue rewriting it.]

AsafKeller · April 20, 2010, 2:19pm

Research.opml.zip (3.3 KB)

Thanks much for posting this (as well as the previous RTF parsing script).
I cannot get this OPML script to work. TB complains that it creates XML that is not well formed, and TextWrangler says the same but does not find any gremlins. I attach a sample file. Thanks for any help!
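
When an editor reports "not well formed" without pointing at a cause, a strict XML parser can at least give a line and column. The following is a minimal sketch in Python 3 using only the standard library. The function name check_well_formed and the two sample strings are made up for illustration and are not taken from the attached file.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


good = '<opml version="1.0"><body><outline text="a" _note="b"/></body></opml>'
# Hypothetical broken sample: no whitespace between two attributes.
bad = '<opml version="1.0"><body><outline text="a"_note="b"/></body></opml>'

print(check_well_formed(good))
print(check_well_formed(bad))
```

The reported position is only where the parser gave up, so the real cause may sit a little earlier in the file.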

cturner · April 20, 2010, 3:22pm

Hi korm-

Great that you posted this stuff. I haven’t tried the script, and won’t be able to for a while, but I wonder if your “conversion” to Unicode is the source of your gremlins?

I’m not sure that Tbox is 100% happy with Unicode, and Applescript deals with UTF-16, which I know Tbox doesn’t support (eg no Chinese character support). You may be getting Unicode two-byte line endings that cause trouble with Tbox. Again, haven’t tried this, so take my suggestions with grains of salt.

Also, should you ever cozy up to Python and py-appscript, here’s a script that will take a selection of DTPO groups and export the hierarchy in OPML format:

# Build an OPML file suitable for import into Tinderbox
# Charles Turner
# 2010-04-12

from appscript import *
import re


def escape(m):
  if (m.group(0) == '"'):
    return '&quot;'
  elif (m.group(0) == '\t'):
    return '&#9;'
  elif (m.group(0) == '\n'):
    return '&#10;'
  elif (m.group(0) == '>'):
    return '&gt;'
  elif (m.group(0) == '<'):
    return '&lt;'
  elif (m.group(0) == '&'):
    return '&amp;'

def opmlify(txt):
  return re.sub('(")|(\t)|(\n)|(>)|(<)|(&)', escape, txt)

def makeset(lst):
  return '; '.join(lst)

def authordate(name):
  # Split a name of the form <digits>-<word>. into its (date, author) parts
  r = re.search(r'^(\d+)-(\w+)\.', name)
  if r is not None:
    return r.group(1), r.group(2)
  else:
    return "", ""

def children(record):
  props = record.properties.get()
  nm = opmlify(props[k.name])   # If name needs opmlify
  print '<outline text="%s" dtname="%s"' % (nm, nm)
  ad = authordate(props[k.name])   # sure should need it here
  print 'idate="%s" author="%s"' % (ad[0], ad[1])
  print 'dthinkurl="x-devonthink-item://%s"' % props[k.uuid]
  print 'dthinktags="%s"' % opmlify(makeset(props[k.tags]))
  note_t = props[k.type]
  if ((note_t == k.text) or (note_t == k.rtf) or (note_t == k.rtfd)):
    print '_note="%s"' % opmlify(props[k.plain_text]),
  else:
    print '_note=""',
  kids = dtpo.get(record.children)
  if (kids != []):
    print '>'
    for j in kids:
      children(j)
    print '</outline>'
  else:
    print '/>'

dtpo = app('DEVONthink Pro.app')
selection = dtpo.selection.get()

print '<?xml version="1.0" encoding="UTF-8"?>'
print '<opml version="1.0">'
print '<head><title>Scratch</title></head>'   # Tbox doesn't use this info
print '<body>'

for i in selection:
  children(i)

print '</body>'
print '</opml>'

Although it prints to a terminal window instead of writing a file, I’ve had no trouble with Tbox processing the format. Perhaps the logic is of use even if you don’t decide to jump.
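
For anyone reading this on Python 3, the escaping step of the script above can be sketched on its own, without DEVONthink or appscript. This is an illustrative port and not the code as posted: the replacement table mirrors the six cases in escape(), while the names _ENTITIES and _SPECIAL are introduced here.

```python
import re

# Same six replacements as the escape() function in the script above.
_ENTITIES = {
    '"': '&quot;',
    '\t': '&#9;',
    '\n': '&#10;',
    '>': '&gt;',
    '<': '&lt;',
    '&': '&amp;',
}
_SPECIAL = re.compile('["\t\n><&]')


def opmlify(txt):
    """Escape txt for use inside a double-quoted OPML attribute value."""
    # One pass over the input: each special character is replaced exactly
    # once, so the '&' in an emitted entity is never escaped a second time.
    return _SPECIAL.sub(lambda m: _ENTITIES[m.group(0)], txt)


print(opmlify('Q&A <draft>\n"second" line'))
# Q&amp;A &lt;draft&gt;&#10;&quot;second&quot; line
```

Doing the substitution in a single regex pass is what keeps the ampersand case safe. A chain of str.replace calls would have to handle '&' first to avoid double escaping.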

Best wishes,

Charles

korm · April 20, 2010, 4:30pm

I’ve withdrawn the script due to these and other bug reports. If I have a chance to rework it, I’ll repost later.

@Asaf - your text has numerous non-ASCII characters embedded in it, which BBEdit did not zap. I round-tripped the original OPML through OmniOutliner and saved as a new OPML file, which Tinderbox accepted. Try that if you can. OmniOutliner is pretty tolerant of error.

@Charles - the Unicode thing was one attempt to get rid of the Tinderbox-doesn’t-accept-this-file syndrome. It worked for some documents, but obviously, from your and Asaf’s findings, this is not a universal solution. That is one of the reasons I’ve withdrawn the script while I search for a 100% reliable way to get text from DTPO to Tinderbox.

cturner · April 20, 2010, 8:52pm

Hi korm-

If you withdraw your script we can’t be of much help!

I did look at AsafKeller’s sample output, and it looks like your “opmlify” function from Christian isn’t doing its job: there are linefeeds (not &#10;) in the text. Is it being called correctly?

This would explain why Tbox chokes, but BBEdit doesn’t see any gremlins.

I’ll opmlify Asaf’s file and see if it works. Will report back later…

HTH, Charles

korm · April 20, 2010, 9:27pm

Hi Charles, et al.

I’ve completely redone the script (see my OP at the top of this thread) and eliminated all of Christian’s code (sorry Christian, it’s just not working). The data cleansing relies wholly on unix tr. Even with this, the script moves along at a good pace. I’m not getting gremlins any more. (I hope).

@Asaf, if you could try with the revised script I’d appreciate it.

cturner · April 20, 2010, 9:36pm

Korm-

I fiddled with Asaf’s test file and there are two problems:

1. For some reason, the text isn’t getting opmlified. Check the call to Christian’s function, as I’ll assume he gave you code that worked.
2. There is no space between the closing quote of the Name attribute and the next attribute designator, which is “Text=”. The Tbox parser is choking on it, even though OO3 accepts it.
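
Both problems can be avoided at the point where the tag is assembled. Here is a rough Python 3 sketch, not the AppleScript under discussion: the helper outline_tag and the attribute names in the example are made up for illustration. It escapes each value with the standard library's xml.sax.saxutils.escape and joins the attributes with a space.

```python
from xml.sax.saxutils import escape

# Extra replacements on top of the &, <, > handling built into escape().
_ATTR_EXTRAS = {'"': '&quot;', '\n': '&#10;', '\t': '&#9;'}


def outline_tag(attrs):
    """Build a self-closing <outline/> tag from (name, value) pairs."""
    parts = ['%s="%s"' % (name, escape(value, _ATTR_EXTRAS))
             for name, value in attrs]
    # Joining with a single space guarantees whitespace between attributes.
    return '<outline ' + ' '.join(parts) + '/>'


print(outline_tag([('text', 'Note "A"'), ('_note', 'line one\nline two')]))
# <outline text="Note &quot;A&quot;" _note="line one&#10;line two"/>
```

Building the attribute list first and joining it once makes the missing-space mistake hard to reintroduce when attributes are added later.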

After that, I bet it’ll work!

Also, I’d see if your script works without shelling out to “tr,” or coercing the text to Unicode. My intuition is you’ll be fine without all that stuff, which only slows down the script.

HTH, Charles

cturner · April 20, 2010, 9:46pm

Korm-

Your new version works, EXCEPT the text isn’t escaped for OPML, so no joy…

Put Christian’s function back in and I’m sure you’ll get good results.

C

korm · April 20, 2010, 9:59pm

OK

cturner · April 20, 2010, 10:57pm

You can do it!

Anyway, send me the old script…

cturner · April 22, 2010, 10:08pm

Okay- You can find what I think is a working script here:

http://vze26m98.net/tbx/opml/explode_opml_v0.01.zip

Korm took down his statement of requirements so I had to make this up as I went along. This Applescript takes a selection of records (not groups) of pretty much any kind of file that DT will store, as long as it has a “plain text” representation in the DT database. This would include: text, rtf, pdf, html, sheets…

To the extent that the “plain text” of your selection(s) has paragraphs (ie, delimitation by newline), the script will “explode” (Tbox-talk) the paragraphs into separate OPML entities, and create attribute data taken from the original record.

The selection of records is formed as a sequence of siblings, and the paragraphs are a sequence of children to each parent record. Unlike “explosion” in Tbox, the DT attribute data is replicated into each (child) paragraph.
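
The resulting shape can be sketched in Python 3 with the standard library's ElementTree. This illustrates the structure only and is not a translation of the AppleScript: the record field names ('name', 'url', 'text') are hypothetical, the attribute names dtname and dthinkurl are borrowed from the Python script earlier in this thread, and skipping blank paragraphs is an assumption.

```python
import xml.etree.ElementTree as ET


def explode(records):
    """Build an OPML tree: one parent per record, one child per paragraph.

    records is a list of dicts with 'name', 'url', and 'text' keys
    (hypothetical field names standing in for the DEVONthink record data).
    """
    opml = ET.Element('opml', {'version': '1.0'})
    head = ET.SubElement(opml, 'head')
    ET.SubElement(head, 'title').text = 'Exploded notes'
    body = ET.SubElement(opml, 'body')
    for rec in records:
        shared = {'dtname': rec['name'], 'dthinkurl': rec['url']}
        parent = ET.SubElement(body, 'outline', {'text': rec['name'], **shared})
        # Assumption: blank lines between paragraphs are skipped.
        paragraphs = [p for p in rec['text'].split('\n') if p.strip()]
        for para in paragraphs:
            # The record-level attributes are copied onto every child.
            ET.SubElement(parent, 'outline', {'text': para, **shared})
    return opml


records = [{
    'name': 'Sample',
    'url': 'x-devonthink-item://HYPOTHETICAL-UUID',
    'text': 'First paragraph.\n\nSecond paragraph.',
}]
tree = explode(records)
print(ET.tostring(tree, encoding='unicode'))
```

ElementTree escapes attribute values when serializing, so no separate escaping pass is needed in this sketch.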

Although I don’t like to write in Applescript, I will support this piece of work of mine. If it’s giving you trouble (different from your having trouble with it), I’d love it if you would send your original DT documents to me so I can reproduce the problem. My email is in the source file in the download.

The above URL will always have the latest version, and I’ll continue to use this thread for support discussion/announcements.

Enjoy! But backup your data; screw-ups in the script’s use are your problem.

Thanks to Korm for starting all this and supplying a decent chunk of base code. Thanks to Christian Grunenberg for some code that disappeared, and also being a great programmer!

(Now, back to musicology…)

Best wishes, Charles

cturner · April 23, 2010, 9:36pm

Hi all-

Asaf Keller and Paul Walters surfaced an interesting issue with the first revision of the script. Although Applescript deals with text as UTF-16 internally, its default file write command translates to MacRoman.

So the files created by the script had an XML header declaring a UTF-8 encoding, but the text itself was written as MacRoman. This really screwed up the parsing of 8-bit characters, which were viewed (correctly) as malformed by Tinderbox, OmniOutliner and BBEdit, among others.

(So Korm, if you’re reading this, the above was the source of almost all your trouble. After all the care you took internally, the file write command introduced “gremlins” at the very last moment. Applescript is a really dumb language!)
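
The mismatch is easy to reproduce outside Applescript. The Python 3 sketch below is an illustration only and not the fix used in the revised script: it encodes a sample string (invented here) as MacRoman, shows that the bytes are not valid UTF-8, and then writes the same text with an explicit UTF-8 encoding so that the bytes agree with the XML declaration.

```python
import os
import tempfile

# Invented sample text containing non-ASCII characters.
text = '<outline text="Caf\u00e9 r\u00e9sum\u00e9"/>'

# What the first revision effectively did: MacRoman bytes under a UTF-8 label.
mislabeled = text.encode('mac_roman')
try:
    mislabeled.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print('MacRoman bytes readable as UTF-8:', decoded_ok)

# Writing with an explicit encoding keeps the bytes and the declaration in step.
path = os.path.join(tempfile.mkdtemp(), 'sample.opml')
with open(path, 'w', encoding='utf-8') as handle:
    handle.write('<?xml version="1.0" encoding="UTF-8"?>\n' + text + '\n')
with open(path, 'rb') as handle:
    round_trip = handle.read().decode('utf-8')
print(text in round_trip)
```

Plain ASCII text encodes to the same bytes in both encodings, which fits korm's earlier observation that the script worked for some documents and not others.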

You can find a revision that corrects these deficiencies here:

http://vze26m98.net/tbx/opml/explode_opml_v0.02.zip

Best wishes, Charles

NZT-48 · September 26, 2017, 2:37am

hi,

just wondering if:

-this script is still functional, and can work with Tinderbox 6, and;

-if it can be used to export a section of highlighted text from an RTF file (stored in DEVONthink), and export it to Tinderbox so that it also links to the DTP document & section of the highlighted text.

if so, which iteration of this script should i use?

thanks.

korm · September 26, 2017, 10:03am

The last version Charles posted in 2010 is functional and can work with any version of Tinderbox since 2010 through today. But it needs to be modernized to use attribute names that exist in Tinderbox instead of idiosyncratic attribute names that are unique to the script.

The script was not written to do that.

(a) it does not parse text looking for highlights;
(b) DEVONthink does not support links (in x-devonthink-item form) to specific text, paragraphs, or other internal portions of a document, so no script can meet that requirement without some custom gymnastics to make it work;
(c) it does not support RTF – i.e., the text exported is plain and styles are lost.

Drag and drop between DEVONthink and Tinderbox 7 is the preferred method, now, of getting data from DEVONthink to Tinderbox.

You can of course test any of the scripts here yourself and see what they do. Be aware that this particular script was written to “explode notes” – meaning it takes each selected text file in DEVONthink and creates a parent entry for that file in an OPML file, then for that file it creates a separate child OPML entry for each paragraph of that file. If anyone is looking for a script that merely exports the entire text of a file as a single OPML entry, this is not what you are looking for.

(BTW, you can attempt to use the standard Export > as OPML command in DEVONthink, but in my experience it makes ill-formed OPML and the export is not usually useful without some post-processing and cleanup in an external XML editor before importing the file into Tinderbox.)

NZT-48 · September 27, 2017, 4:32am

got it. thank you.

i realize i’m overlapping with a post i put up in another thread, and i’m sorry about that.

i’ll look into drag & drop for Tinderbox 7. i was hoping to avoid the exorbitant upgrade cost, but might need to bite that bullet.

the question i raised in the other post - which doesn’t have to be answered here - was simply: is there another, suggested solution to what i’m seeking to do that you (or other users) could recommend?

just reposting that in case other interested users here might have the same question and aren’t reading that thread.

thanks.