Moving from Mariner Paperless to DevonThink

eric_g_97477 · December 2, 2023, 7:19pm

Apparently, MarinerSoftware has gone out of business, so I need to find an alternative.

I was wondering if DevonThink provided any features to make importing information from Paperless easy?

rmschne · December 2, 2023, 7:24pm

What options do you have in Paperless to export your data and files? What format? Are the files unchanged on your computer, without using Paperless, so that you can maybe simply drag and drop? What have you tried that hasn’t worked as expected?

eric_g_97477 · December 2, 2023, 7:38pm

There aren’t any useful export options. There is one menu option to export to a CSV, but that is just a document containing some metadata about what is in the library.

The Paperless document is a package, the format looks fairly straightforward, and all of the documents are contained inside. There would likely be work to figure out how to associate the metadata with the documents. The metadata appears to be in a SQL document inside of the package.

I was wondering if the Devon software engineers have done any work to determine how to import the paperless stuff directly. It does not look like Paperless will do much to help out.

chrillek · December 2, 2023, 7:41pm

Do you mean a sqlite file?

eric_g_97477 · December 2, 2023, 7:49pm

Yes. I can use a SQLite app like DB Browser for SQLite to open and browse the data, etc.

BLUEFROG · December 2, 2023, 7:54pm

Welcome @eric_g_97477

Sorry, but no, at this time there is no direct import from Paperless. The only options are whatever export functions are available to you in Paperless.

SlickSlack · December 2, 2023, 8:17pm

When I switched, I did export my in-progress Paperless docs at the time and tried to get some of the metadata out. If I recall correctly, Paperless stored its metadata in each PDF’s keywords (I think that’s what they’re called), and I extracted that with scripting to get it into Custom Metadata, with a TON of help from people here on this forum.

I still have about 4 years of Paperless libraries, that I never bothered with at the time, for old financial information, that would be great to turn into DevonThink databases.
Not crucial as they’re on the way to becoming older than the required record retention limit. But it would be nice to have the personal information hidden in there available to the great DevonThink search.
I’ll follow this thread in case anyone knows how to get stuff converted.

Edit:
link to thread where I tried to work it out

rmschne · December 2, 2023, 8:24pm

Just wondering if the metadata for the old info is that important. Sounds like just getting the source documents out would be a success.

eric_g_97477 · December 2, 2023, 8:29pm

What kind of import features does DevonThink offer?

Can I provide a file to import along with metadata (tags, etc.)? Information I know I need to maintain:

My custom title for the document
Date the document was scanned or added
Tags I assigned to the document

chrillek · December 2, 2023, 8:32pm

Then one can write a script to extract the data from the database.

BLUEFROG · December 2, 2023, 9:30pm

Hold the Option key and choose Help > Report bug to start a support ticket. Please do a sample export to CSV and attach it.

PS: Bear in mind, it is the weekend, so it may not be looked at until Monday.

SlickSlack · December 2, 2023, 10:02pm

I have been putting that off, assuming at some point I would just bring in the PDFs and that would be that. Hearing that Paperless is gone for good makes it more likely.
I will keep an eye on this thread because there’s still that part of me that doesn’t like to lose (meta)data and the work that went into acquiring it.

rmschne · December 3, 2023, 6:37am

Perhaps someone will write the code for you?!

eric_g_97477 · December 3, 2023, 1:05pm

With 30 years of software engineering experience, I am capable of figuring out how Paperless is storing the data and getting it into a format that DevonThink can import while preserving the metadata. I just need to know what that format is, assuming it is possible.

chrillek · December 3, 2023, 1:18pm

What what is? Please elaborate.

rmschne · December 3, 2023, 1:26pm

Well, let us know how you make out.

eric_g_97477 · December 3, 2023, 1:46pm

What what is?

How to supply a file with the metadata I need to preserve when transferring paperless documents.

SlickSlack · December 3, 2023, 2:05pm

I don’t think there’s any special file format in DevonThink, it’s more that the import process allows DevonThink to add the new files to its own database with associated metadata and file system.

For me it would be:
PDF with the fields from Paperless (various amounts or splits, VATs and sales tax, currency, tax category, etc.) mapped to
DevonThink Custom Metadata fields, probably with similar field headings.

Allowing for the fact that I know very little of software engineering, I would infer from my limited experience that anything you write would read from either the Paperless SQLite source directly or an exported CSV, via scripting within DevonThink (JavaScript JXA or AppleScript), and write to new files in DevonThink.

chrillek · December 3, 2023, 2:18pm

Basically, you’d

grab the PDF in Paperless
read the corresponding metadata in the SQLite database
import the PDF into DT, thus creating a new record there
set this record’s (custom) metadata to the metadata you extracted from the SQLite database

You can use JavaScript or AppleScript for that. Not a biggy.
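
The four steps above can be sketched in Python as a rough outline of the control flow. This is only an illustration under assumptions: the table and column names (`ZRECEIPT`, `Z_PK`, `ZMERCHANT`, `ZPATH`) are taken from the query posted later in this thread, `ZPATH` is assumed to locate the PDF, and `import_pdf` and `set_metadata` are hypothetical callbacks standing in for whatever JavaScript or AppleScript calls actually talk to DT. They are not real DEVONthink API names.

```python
import sqlite3
from typing import Callable, Iterator


def read_metadata(conn: sqlite3.Connection) -> Iterator[dict]:
    """Step 2: yield one metadata dict per row of the Paperless database."""
    rows = conn.execute(
        "SELECT Z_PK, ZMERCHANT, ZPATH FROM ZRECEIPT ORDER BY Z_PK"
    )
    for pk, title, path in rows:
        yield {"id": pk, "title": title, "path": path}


def migrate(
    conn: sqlite3.Connection,
    import_pdf: Callable[[str], object],          # hypothetical: imports a PDF, returns a record
    set_metadata: Callable[[object, dict], None],  # hypothetical: writes metadata onto a record
) -> int:
    """Run steps 1 to 4 for every document and return how many were processed."""
    count = 0
    for meta in read_metadata(conn):
        record = import_pdf(meta["path"])  # steps 1 and 3: grab the PDF, create the record
        set_metadata(record, meta)         # step 4: attach the extracted metadata
        count += 1
    return count
```

Keeping the DT side behind two callbacks means the database-reading half can be tried out on a copy of the SQLite file before anything is imported.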

eric_g_97477 · December 4, 2023, 2:36pm

If it helps anyone, as I evaluate how well a transfer from Paperless to DevonThink will go, here is a SQL query for Paperless that grabs all of the information I need.

WITH DOC_TAGS AS (
		SELECT tags.Z_14RECEIPTS1 as ID, ZTAG.ZNAME as NAME  
		FROM Z_14TAGS tags
			JOIN ZTAG ON Z_PK = tags.Z_18TAGS
	),
	GROUPED_DOC_TAGS AS (
		SELECT DOC_TAGS.ID, GROUP_CONCAT( DOC_TAGS.NAME, '<<[]>>'  ) as TAGS 
			FROM DOC_TAGS
			GROUP BY ID
	)
SELECT 	ZRECEIPT.Z_PK as ID, 
		DATETIME(ZRECEIPT.ZIMPORTDATE + 978307200, 'unixepoch') as IMPORTED,
		ZRECEIPT.ZMERCHANT as TITLE, 
		ZCATEGORY.ZNAME as CATEGORY, 
		ZSUBCATEGORY.ZNAME as SUBCATEGORY, 
		GROUPED_DOC_TAGS.TAGS, 
		ZRECEIPT.ZNOTES as NOTES,
		ZRECEIPT.ZPATH as PATH
FROM ZRECEIPT
	LEFT JOIN GROUPED_DOC_TAGS ON GROUPED_DOC_TAGS.ID = ZRECEIPT.Z_PK
	LEFT JOIN ZCATEGORY ON ZRECEIPT.ZCATEGORY = ZCATEGORY.Z_PK
	LEFT JOIN ZSUBCATEGORY ON ZRECEIPT.ZSUBCATEGORY = ZSUBCATEGORY.Z_PK

The reason for the 978307200 is that Paperless uses Core Data (CD). CD timestamps count seconds from Jan 1, 2001, whereas the Unix epoch starts on Jan 1, 1970. Adding 978307200, the number of seconds between the two epochs, converts a CD timestamp to a Unix epoch timestamp.
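
For anyone who wants to check that constant, here is a small Python sketch that derives it from the two epochs instead of hard-coding it. The function names are made up for illustration, and both epochs are taken as midnight UTC.

```python
from datetime import datetime, timedelta, timezone

# Both epochs expressed in UTC.
CORE_DATA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds between the two epochs; this is the constant used in the query.
OFFSET_SECONDS = int((CORE_DATA_EPOCH - UNIX_EPOCH).total_seconds())


def core_data_to_unix(cd_timestamp: float) -> float:
    """Convert a Core Data timestamp to a Unix epoch timestamp."""
    return cd_timestamp + OFFSET_SECONDS


def core_data_to_datetime(cd_timestamp: float) -> datetime:
    """Convert a Core Data timestamp to a timezone-aware UTC datetime."""
    return CORE_DATA_EPOCH + timedelta(seconds=cd_timestamp)
```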

The one thing I am finding odd is that Paperless is showing the timestamp of a document as 3/12/2019. The CD timestamp in the database is 574130186.540019 which is 3/13/2019. Why it is off by one day, I do not know. The vast majority of the dates shown by Paperless are correct. I am guessing this is a Paperless bug.
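
One possible explanation, not confirmed anywhere in this thread, is a time zone difference: SQLite's `'unixepoch'` modifier yields UTC, and this particular timestamp falls shortly after midnight UTC, which is still the previous calendar day in any zone west of UTC. The Python sketch below runs the same conversion through SQLite twice, once in UTC and once shifted by a hypothetical fixed offset of UTC-5. The offset is an assumption for illustration only, and `render_dates` is a made-up helper name.

```python
import sqlite3

CD_TIMESTAMP = 574130186.540019  # the Core Data value quoted above
ASSUMED_OFFSET = "-5 hours"      # hypothetical local offset, not taken from the thread


def render_dates(cd_timestamp: float, offset_modifier: str) -> tuple:
    """Return (UTC calendar date, calendar date after applying the offset)."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(
            "SELECT DATE(? + 978307200, 'unixepoch'), "
            "       DATE(? + 978307200, 'unixepoch', ?)",
            (cd_timestamp, cd_timestamp, offset_modifier),
        ).fetchone()
    finally:
        conn.close()


utc_day, shifted_day = render_dates(CD_TIMESTAMP, ASSUMED_OFFSET)
print(utc_day, shifted_day)  # 2019-03-13 2019-03-12
```

SQLite also accepts a `'localtime'` modifier after `'unixepoch'`, which applies the time zone of the machine running the query. Whether Paperless displays dates in local time is an assumption here, so it is worth comparing a few documents before relying on it.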

[UPDATE: initial query wasn’t pulling the local path to the actual document…this one does]
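
Since the query packs all of a document's tags into one string joined by `<<[]>>`, whatever consumes the result has to split that string back into individual tags. Below is a minimal Python sketch of that round trip. It reuses the table and column names from the query above, while the helper names are made up, and it sorts the tags because SQLite does not guarantee the order in which `GROUP_CONCAT` joins them.

```python
import sqlite3

TAG_DELIMITER = "<<[]>>"  # same delimiter as in the query above


def split_tags(joined):
    """Split a GROUP_CONCAT result into tags; NULL or empty means no tags."""
    if not joined:
        return []
    return joined.split(TAG_DELIMITER)


def tags_by_document(conn: sqlite3.Connection) -> dict:
    """Map each document ID to a sorted list of its tag names."""
    rows = conn.execute(
        """
        SELECT link.Z_14RECEIPTS1, GROUP_CONCAT(ZTAG.ZNAME, ?)
        FROM Z_14TAGS AS link
        JOIN ZTAG ON ZTAG.Z_PK = link.Z_18TAGS
        GROUP BY link.Z_14RECEIPTS1
        """,
        (TAG_DELIMITER,),
    )
    return {doc_id: sorted(split_tags(joined)) for doc_id, joined in rows}
```

The unusual delimiter only works as long as no tag name contains `<<[]>>` itself, which seems a safe bet but is worth a quick check against the `ZTAG` table.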