Converting indexed files into imported

macula · December 4, 2011, 8:59pm

This is the reverse procedure of what has been discussed in the forum already: I am trying to move external files (indexed) into the database, as I don’t need direct access to these files via the Finder, and I want to keep my folder structure tidy. The problem is that in my first days as a DT user, before familiarizing myself with the software and its wonderful “x-devonthink://” URIs, I used an awful lot of “hard” links to indexed files. These hard links have the form: “file://…”

My question: Is there an automated way to import those indexed files into my database and convert any hard links to these files into x-devonthink links?

Many thanks!

korm · December 4, 2011, 9:17pm

To convert indexed documents to imported ones, select the document(s), control-click, and choose Move Into Database from the contextual menu. If you make a Smart Group with the predicate “Instance is Indexed”, you’ll have a list of all your indexed files. Select the whole list, control-click, and choose Move Into Database. (CAVEAT: exit DEVONthink and make a backup copy of the database before doing this, just to be safe.)

Could you explain a bit where the file:// links are? That part of your request might be addressed by a script, but it would be helpful to know a bit more.

macula · December 5, 2011, 5:05am

Thanks, korm, for pointing out this feature. As for “file://” links, they are really a bad idea that I would rather not talk about, but if you insist:

You can link to any file in the filesystem using a URI of the form:

file:// followed by the absolute path of the file

For example:

file:///Users/MyUserName/Desktop/hello.txt

Notice the triple slash in the above link—two slashes for the URI prefix (file://) and one for the file path.
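The triple slash can be made concrete with a minimal Python sketch that builds a file:// URI from an absolute path and converts it back. The function names are made up for illustration, and the sketch assumes local POSIX-style paths with no host part in the URI.

```python
from urllib.parse import quote, unquote, urlparse


def path_to_file_uri(path: str) -> str:
    """Build a file:// URI from an absolute POSIX path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path: " + path)
    # "file://" is the prefix; the path supplies the third slash.
    # quote() percent-encodes spaces and other reserved characters.
    return "file://" + quote(path)


def file_uri_to_path(uri: str) -> str:
    """Recover the filesystem path from a local file:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError("not a file URI: " + uri)
    return unquote(parsed.path)


print(path_to_file_uri("/Users/MyUserName/Desktop/hello.txt"))
print(file_uri_to_path("file:///Users/MyUserName/My%20Notes/hello.txt"))
```

A path containing spaces comes out percent-encoded (for example `%20`), which is worth keeping in mind when matching links against filenames.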

This allows you to link to a file that is neither imported nor indexed into DEVONthink. But as I said, in retrospect this can easily become messy and has no real advantages that I can think of (except, perhaps, when the file is in a format that DT cannot read at all?).

So to get back to my original question, it would be useful to have a script that would:
— Scan the entire database for documents containing “file://” links
— Import or index the files pointed at by those links
— Convert the “file://” links to “x-devonthink-item://” links

korm · December 5, 2011, 11:10am

I know what file:// links are – I was looking for info on where you used such links. From your description, I’ll assume the links are contained in RTF files (or files that could be converted to RTF), which makes the following answer a bit easier to implement, but not much.

First, you might get better answers than mine if you repost this request in the Scripting forum, because solving the problem requires custom scripting. Off the top of my head, the logic of the code could be:

select items in database
-with each item:
--if item is RTF
--select text of the item
---if delimiters of the text are RTF hypertext delimiters (see example script)
----if this link is type “file://”
-----if this file exists
------tell DEVONthink to import the file [major problems: see note below]
------get the reference URL (x-devonthink-item://) of the imported file
-----replace the “file://” URI with “x-devonthink-item://” URI
----continue replacing links in this file
---save the file
--continue with next RTF item
-finish examining items
finish script
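The link-replacement step of that outline can be sketched outside DEVONthink in Python. This is only a rough illustration, not a working DEVONthink script: it assumes the links are stored in the RTF source as standard `HYPERLINK "..."` fields, and it leaves the import itself to a caller-supplied `lookup` function (a hypothetical name) that returns the item link for a file:// URI, or `None` if the file was not imported. The item link in the example is a made-up placeholder.

```python
import re
from typing import Callable, Optional

# Matches the target of an RTF hyperlink field, e.g.
# {\field{\*\fldinst{HYPERLINK "file:///..."}}{\fldrslt text}}
HYPERLINK_RE = re.compile(r'(HYPERLINK\s+")(file://[^"]+)(")')


def rewrite_file_links(rtf_source: str,
                       lookup: Callable[[str], Optional[str]]) -> str:
    """Replace file:// hyperlink targets with whatever lookup() returns.

    Links for which lookup() returns None are left untouched.
    """
    def substitute(match: "re.Match[str]") -> str:
        new_uri = lookup(match.group(2))
        if new_uri is None:
            return match.group(0)
        return match.group(1) + new_uri + match.group(3)

    return HYPERLINK_RE.sub(substitute, rtf_source)


sample = (
    r'{\field{\*\fldinst{HYPERLINK "file:///Users/MyUserName/Desktop/hello.txt"}}'
    r'{\fldrslt hello}}'
)
# Placeholder mapping; real item links would come from DEVONthink after import.
links = {
    "file:///Users/MyUserName/Desktop/hello.txt":
        "x-devonthink-item://PLACEHOLDER-UUID",
}
print(rewrite_file_links(sample, links.get))
```

Passing a plain dictionary's `get` method as `lookup` keeps the text rewriting separate from the import step, which would still have to be done by DEVONthink itself.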

Major problems (list is not exhaustive):

1. the process of correlating the location of the item in the filesystem that “file://” points to, and the location of that item in the database
2. what to do when the “file://” URI occurs more than once in the database - could test everything in the database to see if the item was already imported, or import a new copy (and fix the duplicates later)
3. what to do with links not in RTF files (e.g., in .doc, .docx, .pages, etc.) - even if inspecting links in these filetypes is scriptable, you’ll need a custom script for each file type
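For the case where the same “file://” URI occurs more than once, one possible approach is to gather every distinct link target first, so that each file is imported only once and every document that points to it is known in advance. The Python sketch below assumes the RTF sources have already been read into a dictionary keyed by document name; the function name and the sample documents are invented for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Target of an RTF hyperlink field that points at a local file.
FILE_LINK_RE = re.compile(r'HYPERLINK\s+"(file://[^"]+)"')


def collect_targets(documents: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct file:// URI to the documents that link to it."""
    targets = defaultdict(list)
    for name, rtf_source in documents.items():
        # set() collapses repeated links inside a single document.
        for uri in sorted(set(FILE_LINK_RE.findall(rtf_source))):
            targets[uri].append(name)
    return dict(targets)


docs = {
    "notes.rtf": r'{\field{\*\fldinst{HYPERLINK "file:///Users/MyUserName/Desktop/hello.txt"}}{\fldrslt a}}',
    "todo.rtf": (
        r'{\field{\*\fldinst{HYPERLINK "file:///Users/MyUserName/Desktop/hello.txt"}}{\fldrslt b}}'
        r'{\field{\*\fldinst{HYPERLINK "file:///Users/MyUserName/Desktop/other.txt"}}{\fldrslt c}}'
    ),
}
for uri, names in collect_targets(docs).items():
    print(uri, "<-", names)
```

Each key of the result is then a single import, and the list of names tells the script which documents need their links rewritten afterwards.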

… I think this is where I get off this ride. Certainly finding a solution is feasible, but the time involved is …

macula · December 5, 2011, 11:46am

Your assumption is correct: I have used the file:// links in rtf files.

This addresses problem #3 in your list (no other formats to deal with).

Problem #1 does not apply in my case as I wouldn’t mind placing all newly imported files in the same “repository” group within my DT database.

Problem #2 could be solved by duplicating the file before importing it, in a way that leaves its name unchanged in the end: append a “_temp” suffix to the file to be imported, keep the duplicate under the original filename, and remove the suffix from the imported file afterwards.

But indeed, I am not sure that the number of file:// links in my database is large enough to justify such a scripting effort (much as it could be a moderate effort in the hands of a very expert coder).

Thanks!