Moving from Mariner Paperless to DevonThink

BLUEFROG · December 4, 2023, 2:46pm

Thanks for sharing, though that’s far beyond the expertise of many people to construct on their own.
This doesn’t provide any information about the actual documents though.

eric_g_97477 · December 4, 2023, 3:03pm

@BLUEFROG I am not sure what you are looking for.

eric_g_97477 · December 4, 2023, 3:53pm

@BLUEFROG @chrillek

Do you know if it is possible to change the “addition date” of a record via AppleScript?

For a transfer, I need to maintain this information from Paperless. I like the fact that the UI does not permit this data to be changed. It looks like my only option is to use the date field. However, it is occasionally necessary for me to make a change that is going to count as a modification, but I need to maintain the Addition Date information so various Smart Groups would not get screwed up.

I could create a custom metadata field for this purpose, but it will get annoying when I return to my normal workflow of scanning in a document and then being required to manually enter the information. Although, perhaps I did run across mention feature where on add data could be automatically set so I would not have to do this manually.

BLUEFROG · December 4, 2023, 3:56pm

No, it’s read-only. It’s specifically when a document is added to a database. That’s not variable.

eric_g_97477 · December 4, 2023, 3:59pm

Understood. I just need a way to keep track of my old documents and when they were added to the paperless database…which was not variable.

As I mentioned, it looks like I can create a custom metadata field called “Real Addition Date” or something. When I scan in a document, is there a feature where I could have the current date automatically copied into the field?

I suppose I could figure out how that date is being stored in your database and change it, but not sure I want to go to that extreme…still may look into it.

SlickSlack · December 4, 2023, 4:03pm

Or just make a custom metadata field called Paperless_added and store the date there.

eric_g_97477 · December 4, 2023, 4:05pm

@SlickSlack Agreed. I can do that as I mentioned above. Although, I was calling it “Real Addition Date.” The problem is that I will need the field to contain accurate information for new documents coming in as well to keep my sanity when setting up Smart Groups. Do you know if there a feature where I could have the current date automatically copied into the field?

BLUEFROG · December 4, 2023, 4:12pm

Of course. See Windows > Sidebar: Navigate > Smart Rules and its links.

chrillek · December 4, 2023, 4:16pm

DT has a field creation date accessible via scripts. And you might want to search the forums for additional hints, as your questions have already been discussed here before. Especially the sub-forum on automation might be relevant for you.

SlickSlack · December 4, 2023, 4:19pm

Yes you can employ a:

Smart Rule or
Batch Process after you import a bunch of files.

Or have the Smart Group check both date fields.

My real advice would be this though:
Start processing your new media into DevonThink, get a feel for how it works, read the manual, set up your custom metadata, read the manual, and then, like I did, set it up custom metadata again but with experience guiding you. Keep Paperless running with all the old media there and when you’ve learned the ins and outs of DevonThink, employ a strategy to get from Paperless to DT.
Don’t waste time trying to exactly copy the few very cool things about Paperless that become less cool once you realize the power of DT and it’s Classify and Smart Rules and Replicate/Duplicate functions.
I went down that path a bit but it wasn’t worth it. I’ve slowly copied over the stuff I wanted from Paperless as I needed it.
Your needs might be more pressing though so - grain of salt etc.

eric_g_97477 · December 5, 2023, 12:48am

I have successfully transferred my data from Paperless to DevonThink. I will explain below. The procedure I used will only be useful to those familiar with both Python and AppleScript. Additionally, there was no attempt to make this general or generic to handle the myriad of ways Paperless is used. My usage of Paperless was uncomplicated. It didn’t take much effort to translate that to DevonThink.

The only part of this process that I may question was creating a custom metadata field called “Title” so I could associate a custom title do my document. I did not want to change the name of the document itself and did not see an alternative field.

I started with the following python script:

import sqlite3
import os
import json

PATH_TO_DB = "/Users/ericg/Desktop/untitled folder 2/DocumentWallet.documentwalletsql"
BASE_PATH_TO_DOCUMENTS = "/Users/ericg/Desktop/untitled folder 2"

QUERY = '''
WITH DOC_TAGS AS (
		SELECT tags.Z_14RECEIPTS1 as ID, ZTAG.ZNAME as NAME  
		FROM Z_14TAGS tags
			JOIN ZTAG ON Z_PK = tags.Z_18TAGS
	),
	GROUPED_DOC_TAGS AS (
		SELECT DOC_TAGS.ID, GROUP_CONCAT( DOC_TAGS.NAME, '<<[]>>'  ) as TAGS 
			FROM DOC_TAGS
			GROUP BY ID
	)
SELECT 	ZRECEIPT.Z_PK as ID, 
		DATETIME(ZRECEIPT.ZIMPORTDATE + 978307200, 'unixepoch') as IMPORTED,
		ZRECEIPT.ZMERCHANT as TITLE, 
		ZCATEGORY.ZNAME as CATEGORY, 
		ZSUBCATEGORY.ZNAME as SUBCATEGORY, 
		GROUPED_DOC_TAGS.TAGS, 
		ZRECEIPT.ZNOTES as NOTES,
		ZRECEIPT.ZPATH as PATH
FROM ZRECEIPT
	LEFT JOIN GROUPED_DOC_TAGS ON GROUPED_DOC_TAGS.ID = ZRECEIPT.Z_PK
	LEFT JOIN ZCATEGORY ON ZRECEIPT.ZCATEGORY = ZCATEGORY.Z_PK
	LEFT JOIN ZSUBCATEGORY ON ZRECEIPT.ZSUBCATEGORY = ZSUBCATEGORY.Z_PK
'''

def outputScriptLine(result):
    
    filePath = f'"{os.path.join( BASE_PATH_TO_DOCUMENTS, result[7] )}"'
    
    notes = f'"{result[6].strip()}"' if result[6] is not None else '""'
    
    importDate = f'"{result[1].strip()}"'
    
    tags = result[5].split( '<<[]>>' ) if result[5] is not None else []
    tags = [ result[3], result[4], *tags ]
    tags = filter( lambda x: x is not None, tags )
    tags = [ f'"{x.strip()}"' for x in tags]    
    docTags = f"{{{','.join( tags )}}}" if len(tags) > 0 else '{}'
    
    title = f'"{result[2].strip()}"'

    print( f"my addDocument({filePath}, {notes}, {importDate}, {docTags}, {title})" )
    print( "delay 3" )
    

con = sqlite3.connect( PATH_TO_DB )
cur = con.cursor()
res = cur.execute(QUERY)
results = res.fetchall()

for result in results:
    outputScriptLine( result)

Its job is to generate strings of text that can be pasted into an AppleScript script. Those strings point to the file on disk and provide the metadata I needed to preserve. I put the delay 3 in there between import calls because I was having some trouble getting it to work otherwise. The delay seemed to allow the system time to work a bit before the next import was generated. The delay of three seconds is arbitrary otherwise.

The base AppleScript is:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

on addDocument(filePath, notes, importDate, docTags, title)
	
	tell application id "DNtp"
		
		set dp to import filePath to current group
		
		set comment of dp to notes
		set creation date of dp to importDate
		set tags of dp to docTags
		
		add custom meta data title for "Title" to dp
		
	end tell
	
end addDocument

-- 
-- insert the lines from the python script here
--

Just copy the lines output from the python script into the AppleScript and run. Note, there may be a size limit for AppleScripts, so you may need to divide the lines to page into multiple chunks and run them separately.

If anyone has any questions, let me know. Otherwise, I will be testing out my transfer for the remaining trial period I have for DevonThink.

I do wish I could hide the Name column, but that does not appear possible.

Enjoy.

BLUEFROG · December 5, 2023, 3:09am

Thanks for sharing your approach.

chrillek · December 5, 2023, 9:16am

Wouldn’t it be easier to have the AppleScript code run the SQLite query via do shell script and call addDocument for each result? Personally, I’d use JavaScript for that because it has better string methods than AppleScript. Alternatively, PyDT (posted here some time ago) might be useful as it wraps JXA inside Python, so you could use a single Python script, I guess.

Aside: If you’re using named rows in your SELECT, wouldn’t it improve readability to use these names also in outputScriptLine? AFAICT, the SQLite-Python binding seems to support that.

eric_g_97477 · December 5, 2023, 1:08pm

I am sure there are or will be many who would appreciate it if you were to take what I did and make it better.

chrillek · December 5, 2023, 1:31pm

~~I don’t have Paperless nor the SQLite data to work with. So, not a problem I want to solve. But I can nevertheless propose amendments, can’t I?~~
Ok, I took @eric_g_97477’s bait. Here’s my version in JavaScript.

(() => {
const app = Application("DEVONthink 3");
const targetGroup = app.databases['Test'].incomingGroup();
function addDocument(path, notes, date, tags, title) {
  const newRecord = app.import(path, {to: targetGroup});
  newRecord.tags = newRecord.tags().concat(tags);
  newRecord.creationDate = date;
  newRecord.name = title; /* that's _different_ from the path in DT! */
}
  
  
const queryString = `WITH DOC_TAGS AS (
		SELECT tags.Z_14RECEIPTS1 as ID, ZTAG.ZNAME as NAME  
		FROM Z_14TAGS tags
			JOIN ZTAG ON Z_PK = tags.Z_18TAGS
	),
	GROUPED_DOC_TAGS AS (
		SELECT DOC_TAGS.ID, GROUP_CONCAT( DOC_TAGS.NAME, '<<[]>>'  ) as TAGS 
			FROM DOC_TAGS
			GROUP BY ID
	)
SELECT 	ZRECEIPT.Z_PK as ID, 
		DATETIME(ZRECEIPT.ZIMPORTDATE + 978307200, 'unixepoch') as IMPORTED,
		ZRECEIPT.ZMERCHANT as TITLE, 
		ZCATEGORY.ZNAME as CATEGORY, 
		ZSUBCATEGORY.ZNAME as SUBCATEGORY, 
		GROUPED_DOC_TAGS.TAGS, 
		ZRECEIPT.ZNOTES as NOTES,
		ZRECEIPT.ZPATH as PATH
FROM ZRECEIPT
	LEFT JOIN GROUPED_DOC_TAGS ON GROUPED_DOC_TAGS.ID = ZRECEIPT.Z_PK
	LEFT JOIN ZCATEGORY ON ZRECEIPT.ZCATEGORY = ZCATEGORY.Z_PK
	LEFT JOIN ZSUBCATEGORY ON ZRECEIPT.ZSUBCATEGORY = ZSUBCATEGORY.Z_PK
`;
const basePath = '/Users/ck/Develop/test/Paperless Library.paperless'
const DBPath = `${basePath}/DocumentWallet.documentwalletsql`;
const curApp = Application.currentApplication();
curApp.includeStandardAdditions = true;
const rawDBResult = curApp.doShellScript(`sqlite3 -tabs "${DBPath}" "${queryString}"`, {alteringLineEndings: false});
const resultLines = rawDBResult.split('\n');
resultLines.forEach(r => {
  const resultColumns = r.split('\t');
  if (resultColumns.length === 8) { /* basic sanity check */
    const plDate = new Date(resultColumns[1]);
    const plTitle = resultColumns[2];
    const plCategory = resultColumns[3];
    const plSubCategory = resultColumns[4];
    const plTags = resultColumns[5].split('<<[]>>');
    plTags.push(plCategory, plSubCategory);
    const plNotes = resultColumns[6];
    const docPath = resultColumns[7];
    addDocument(`${basePath}/${docPath}`, plNotes, plDate, plTags, plTitle);
  } 
  })
})()

I tried that with two documents in Paperless, and it seemed to work with them. I didn’t bother to trim the strings coming from SQLite, but that’s trivial to add. Differently from the Python/AS variant, this script uses the Title of the Paperless document to set its name in DT – that seems a reasonable approach as DT separates name and path in two properties.

The script relies on the result from SQLite being separated by tabs, assuming that neither tags nor notes, titles, categories nor subcategories in Paperless contain tabs. If that assumption is wrong, one has to find another way to separate the result columns (in the worst case, printing a single value per line and separating result rows by an empty line).

Also, I used a fixed group to import the documents to (set at the top of the script) instead of the current group. But that’s just a minor detail.

If someone is wondering why the code doesn’t consider empty tag fields as does the Python version: I didn’t see the need for that as the methods used work just fine with empty tags.

nomanisan · December 17, 2023, 7:09am

@chrillek and @eric_g_97477 Thank you, your respective migration scripts look wonderful!

For those of us who are new to scripting, would you be willing to tell us step by step what we need to do – for example, amend this piece of code to reference our local paths, put the JavaScript in Location A, then get SQLite up and running (if necessary, and with any relevant specifics), run Code Block B in Script Editor, put any file listing in Location C, and finally run Code Block D in Python?

Personally, I am familiar with Python but I haven’t used AppleScript or JavaScript in 10+ years, and while I have used MySQL, I have barely (but recently) used SQLite.

Any volunteer help for newbies would be SO appreciated <3

chrillek · December 17, 2023, 8:29am

Mine is a stand-alone script. Copy/paste in script editor. Modify the path to the paperless database and the name/location of the target group. Set the language selector in script editor to JavaScript. Execute script.
SQLite is never „started“ like MySQL, it’s an embedded database.

nomanisan · December 17, 2023, 10:43am

Thank you!

The script worked great.

On my end, there were errors when indexed files were missing from the Paperless library. Creating a new Paperless library and importing the old one into the new one solved this issue.

chrillek · December 17, 2023, 11:42am

If you’re interested in learning more about scripting with/in DT, I suggest the “Automation” chapter in the integrated help. And then there’s

</publicity>

Tattle2th · December 25, 2023, 1:12am

I appreciate all of the above discussion, but am totally out of my element.

Over time , I’ve loaded more than 9000 receipts into Paperless. — created a few additional categories and am not literate with the approaches you’ve outlined above.

Is there someone who can ‘hold my hand’ and guide me to a new program- that will accept all of my old data, and work with my Fujitsu ScanSnap scanner?
Thanks!
Elliott