Help/Clarification before switching to DT pro

Hi,

I recently discovered DEVONthink and I’m really impressed with it. In fact, it could be the missing link I was looking for to organize my digital life (except for music and photos, where iTunes and Darktable work great for me).

Before purchasing DT Pro, could someone clarify the following points for me?
I have read many posts in the forums, but these points are still unclear to me.

File integrity:
I have read in some forum posts that DT performs checksum verification on imported files. Is that correct?
I understand that this does not apply to indexed files. But say I import two PDF files, A.pdf and B.pdf: will their integrity (against bit rot) be verified when I click the Verify command?
I want to make sure of it before I re-organize my content into several DT databases.

Export database structure:
I see that it is possible to export the database content while keeping the group hierarchy as folders, using the Export > Files and Folders command.
This is great.
Can it be automated?
My idea would be to periodically, after ensuring that the files are valid (no bit rot), export each database back to the file system on an external drive, keeping the folder/group structure. That way, if I ever lose access to DT or a Mac, I could still recover my files with minimal effort using a Linux PC, for example.
This is complementary to a proper backup solution of the database itself.

Recommended database size:
I plan to split my digital content into several databases by subject. Some of the content consists of videos or 100+ page PDFs with images. In that context, what is the maximum recommended database size? I think one of the databases might be 10-20 GB (due to some videos).

Thanks in advance

that DT performs checksum verification on imported files.

Development can respond as well but DEVONthink doesn’t checksum files you put into a database.

Can it be automated?

Yes, this could be automated, likely via a Reminder script.
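If you ever want to drive it from outside DEVONthink instead (launchd, cron, etc.), something along these lines should also work. This is only a minimal sketch: the database name "Receipts" and the export path are hypothetical, and it relies on DEVONthink’s AppleScript export command (check the scripting dictionary of your version for the exact syntax), called here from Python via osascript.

```python
#!/usr/bin/env python3
"""Periodically export a DEVONthink database back to plain files and folders.

A minimal sketch, assuming:
  - DEVONthink Pro is installed (AppleScript id "DNtp")
  - a database named "Receipts" (hypothetical)
  - an export destination on an external drive (hypothetical path)
"""
import subprocess
from datetime import date

DATABASE = "Receipts"                                       # hypothetical database name
DESTINATION = f"/Volumes/Backup/DT-Export/{date.today()}"   # hypothetical path

# AppleScript: export the database's root group (and with it the whole
# group hierarchy) to the destination folder as files and folders.
APPLESCRIPT = f'''
tell application id "DNtp"
    set theDatabase to database "{DATABASE}"
    export record (root of theDatabase) to "{DESTINATION}"
end tell
'''

def main() -> None:
    result = subprocess.run(["osascript", "-e", APPLESCRIPT],
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise SystemExit(f"Export failed: {result.stderr.strip()}")
    print(f"Exported {DATABASE} to {DESTINATION}")

if __name__ == "__main__":
    main()
```

Of course, a Reminder or smart rule script inside DEVONthink itself is simpler if the machine is running anyway.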


Database size has often been discussed here and in the documentation. See this post…

Thanks @BLUEFROG for your answer.

If no checksum is performed on the imported content, what does the Verify command do? Does it verify only the database’s own files (anything not imported by the user, such as the index or tags)?

Also, how can I ensure that content does not get corrupted in the database? (Again, I’m not speaking about indexed files but imported ones.)

Verification checks the integrity of the database, especially regarding missing or orphaned files.

Corruption of content in the database would be a highly abnormal thing, not a primary concern. If it was, you should have the same concern about the macOS filesystem, in general.

I agree.
However, if the content of the database is not checked for corruption, then the backup of the database cannot be entirely trusted. It might carry corrupted files if corruption occurs; granted, this is very rare, but the probability is not nil, hence the need to check for corruption of files enclosed in the database (outside the database, it is the user’s responsibility).

I usually solve the corruption case with frequent backups and time stamps. Backblaze also has a retrieval service (free, going back up to one month). But yes, corruption happens, and usually it’s the job of the file system to handle that - few apps add this on top of the file system services.

Yes, I guess frequent backups with time stamps might do the trick. Thanks.

I’m currently weighing the pros and cons of importing vs. indexing the files in DT Pro. I could import files that will be contained in a small database (e.g. bank notes and receipts) and do regular backups with time stamps. For the most critical files (and the bigger database), I would just index the files and perform the checksum in the Finder before doing a backup… just some thoughts, as I’m still weighing the pros and cons of going indexed vs. imported.
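To make the "checksum in the Finder before backup" idea concrete, here is roughly what I have in mind. It is only a minimal sketch with placeholder paths (nothing DEVONthink-specific, standard library only): it writes a SHA-256 manifest outside the indexed folder on the first run and, on later runs, reports files whose content has changed or gone missing.

```python
#!/usr/bin/env python3
"""Write/verify a SHA-256 manifest for an indexed folder before backing it up.

A minimal sketch with hypothetical paths; the manifest lives outside the
folder, so nothing inside an indexed location (or a DT database) is touched.
"""
import hashlib
import json
from pathlib import Path

FOLDER = Path.home() / "Documents" / "Indexed"        # hypothetical indexed folder
MANIFEST = Path.home() / "Backups" / "manifest.json"  # stored outside that folder

def sha256(path: Path) -> str:
    """Stream the file through SHA-256 to avoid loading it into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scan(folder: Path) -> dict:
    """Map each file's relative path to its current checksum."""
    return {str(p.relative_to(folder)): sha256(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}

def main() -> None:
    current = scan(FOLDER)
    if MANIFEST.exists():
        previous = json.loads(MANIFEST.read_text())
        changed = [n for n, d in current.items() if n in previous and previous[n] != d]
        missing = [n for n in previous if n not in current]
        print(f"Changed: {len(changed)}, missing: {len(missing)}")
        for name in changed + missing:
            print(" ", name)
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    MANIFEST.write_text(json.dumps(current, indent=2))

if __name__ == "__main__":
    main()
```

A changed checksum here can of course also mean a legitimate edit; dedicated bit-rot checkers additionally look at modification times to separate edits from silent corruption.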

Are you aware that a DEVONthink .dtBase2 file is actually not a file but a package? A structure of folders and sub-folders that contains the actual files you moved into the database (sorted by kind, not in the way you sorted them in the database) plus the database-specific data? Corruption of those files would actually mean corruption on the OS file system level.

You can look into the package by right-clicking on it, as you probably know. CAVEAT: Just look, never change anything in the structure, and don’t add or delete anything, because that would be a sure way to damage the database.
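If you are curious, a read-only peek from a script is harmless too. A minimal sketch (the database path is hypothetical, and it assumes the imported files sit under a Files.noindex folder inside the package; adjust if your version differs); it only reads and counts, never writes:

```python
#!/usr/bin/env python3
"""Read-only peek into a DEVONthink database package: count files by extension.

A minimal sketch with a hypothetical database path. It assumes the imported
files live under Files.noindex inside the package and never writes anything.
"""
from collections import Counter
from pathlib import Path

DATABASE = Path.home() / "Databases" / "Example.dtBase2"  # hypothetical path
FILES = DATABASE / "Files.noindex"                        # imported files, sorted by kind

counts = Counter(p.suffix.lower() or "(no extension)"
                 for p in FILES.rglob("*") if p.is_file())

for ext, n in counts.most_common():
    print(f"{ext:>16}  {n}")
```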

Hi, thanks. Yes, I was aware of that.

I have been looking for a solution and found a Python package that computes checksums and stores them in its own database (so it won’t touch the inside of a DT database). I need to check whether this actually works. I will try tomorrow and post my results here.

The Python package:
https://github.com/ambv/bitrot