DT files in 20 years

True, true.

Oh, that reminds me to ask: whenever I attach a file that's stored in DEVONthink (e.g., to share one of my database PDFs in a web app), I drag it from DEVONthink into the other app's attachment dialog, and the dialog then reveals the file's location inside the database package. Is there a better way?

You could drag and drop to the Finder but I wouldn't call that better, since drag and drop from DEVONthink is simple and efficient. And since it's only exposing the internal directory in an Open dialog, there's no real danger.

Bluefrog:

I will strongly agree with your advice not to mess with the internals of the database. I mistakenly deleted a database entry and it took me a very long time to figure out how to replace it. I’ll never touch an entry in the database again!

Thank you! I’ve had a busy day and I’m just now getting around to replying.

Thank you. I will go have a look at what the Library of Congress is doing.

Interesting that we are all assuming docx, PDF, etc. will be perfectly readable. Not long ago I had to get a Word 5.1 document to open, and that was an interesting exercise (at least docx has some plain-text XML hidden inside the file). If files are opened and re-saved at intervals (and thus get updated), this seems reasonable, but if they're just left there, it isn't guaranteed…

LibreOffice can do that - for free.

Not when the ancient Greek font no longer exists except on a few random hard drives of old guys like me, I suspect.

Font converters for that exist.

Won’t be instant like other documents but it’s certainly doable.

It was a general point, illustrated with a single example. Other examples are available.

I think the point is, things might not be “perfectly” readable, or retain their complete original visual properties (font, layout) - but a lot will still be retrievable.

Ok

An even simpler solution would be to buy a 1990s vintage computer - easily done on eBay. Your old version of Word will still run on it.

The XML is cleartext. An XML parser can read it and do with it whatever you want. Your point was valid for .doc, but is moot now.

And the font can be replaced, too.
Of course, standards evolve. But if the user base is large, someone will be interested to find a solution.
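The "cleartext XML" point is easy to demonstrate: a .docx file is just a ZIP archive whose body text lives in word/document.xml, and the standard library is enough to get it back. A minimal sketch - the XML below is a bare-bones stand-in for real WordprocessingML, not a full spec-compliant document:

```python
# A .docx is a ZIP archive; the main text lives in word/document.xml.
# Build a toy .docx in memory, then recover its text without Word.
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used by real .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Build a minimal stand-in document in memory.
doc_xml = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:r><w:t>Hello, future reader.</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", doc_xml)

# Recovery: unzip, parse the XML, collect the <w:t> text runs.
with zipfile.ZipFile(buf) as zf:
    root = ET.fromstring(zf.read("word/document.xml"))
text = "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))
print(text)  # Hello, future reader.
```

The same unzip-and-parse approach works on a real .docx; you lose layout and styling, but the words themselves survive as plain text.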

You’re probably not finding much because you’re treating the *.dtBase2 file as a proprietary black box, when in fact it’s more like a shoebox; DEVONthink stores your documents inside it in their original formats.

As someone already pointed out in this thread, whether your files remain readable in the long run depends on the availability of the software used to create your documents.

I’m seeing three key factors:

  1. Access to a Mac
     - Outside your control.
  2. Availability of the original applications
     - e.g., Acrobat, Word, Excel
     - Outside your control.
  3. Your ability to preserve the data itself
     - This is within your control.
     - Follow the 3-2-1 backup rule and use RAID 1 for basic two-disk setups, or RAID 6 for more complex storage arrays.

3-2-1 rule: three copies of your data, on two different media, with one copy off-site.
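Keeping copies is only half the job; you also need to notice when one of them silently rots. A minimal sketch of that verification step in Python, assuming SHA-256 checksums and purely illustrative file names:

```python
# Hedged sketch: flag backup copies whose checksum no longer matches
# the primary file. Names like "nas_copy" are illustrative only.
import hashlib
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copies(primary: Path, copies: list[Path]) -> list[Path]:
    """Return the copies that no longer match the primary."""
    want = sha256(primary)
    return [c for c in copies if sha256(c) != want]

# Demo with throwaway files standing in for local, NAS, and off-site copies.
tmp = Path(tempfile.mkdtemp())
primary = tmp / "notes.pdf"
primary.write_bytes(b"important data")
good = tmp / "nas_copy.pdf"
good.write_bytes(b"important data")
rotted = tmp / "offsite_copy.pdf"
rotted.write_bytes(b"important dat\x00")  # one flipped byte: simulated bitrot
print(verify_copies(primary, [good, rotted]))  # only the rotted copy is flagged
```

Run something like this on a schedule and a corrupt copy gets caught while the other two copies can still repair it.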

Of course, if you’re aiming for true archival longevity, only parchment has a proven track record of surviving a few thousand years.

Might be time to start copying your most important files by hand… :wink:

Zoran:

Excellent! This is exactly the answer I was looking for. I greatly appreciate your contribution!

The assumption of long-term readability is deceptively optimistic, especially with proprietary file formats. DOCX and PDF may feel stable now - plenty of people even call them worldwide industry standards - but they're still bound to the whims of software ecosystems, encoding standards, and corporate priorities. Your Word 5.1 experience is a perfect example: once ubiquitous, now borderline archaeological.

The embedded XML in DOCX is a saving grace for forensic recovery, but only if you know how to dig. PDFs are worse in that regard: often binary, versioned, and dependent on rendering engines that evolve or disappear. Periodic re-saving helps, but it’s a form of active curation, not passive preservation.

This is why archivists lean toward open, text-based formats and why institutions invest in migration pipelines and format registries. Leaving files untouched for decades is like burying them in amber: they might survive, but you’ll need specialized tools to extract them.

I suppose you could read my post as a sales pitch for open standards and Open Source in general.

On a side note, here's a reminder that nothing is eternal, even if it has survived millennia: Cuneiform (Wikipedia) and "The extinction of cuneiform writing" (CSMC, University of Hamburg).

It seems to me that the challenging/unknown part is survival of the data itself: will a thumb drive or SSD retain data for decades? Will Dropbox/GDrive/iCloud retain data, or even exist, decades from now?

That today's computing equipment will still be available in the future as vintage used hardware is nearly certain.

Docx and PDF are published standards. As long as we can read the (binary) data, we can write software to decipher it - provided we're also still able to read and understand the standards, that is.

I have a rather different attitude to this problem. I’m assuming that in twenty years: a) I will certainly be dead; b) nobody else will have the remotest interest in the files I have collected.

Some months ago a man who lived opposite me died. His family hired a skip and threw everything from his house into it. It made me reflect on how differently we value things.

I have done a lot of historical research in archives, so I appreciate the value of preserving documents. But when it comes to my own material, I don’t expect it to be useful to others. But that is just my personal judgement about my own situation.

That’s a valid point. I’ve admittedly taken it for granted, since periodic format migration is baked into my backup routine.

Following the 3-2-1 rule (or even a partial version), any storage upgrade typically includes a refresh of the underlying format support, if only because newer media demands it. But you’re absolutely right: if files are left untouched for years, readability isn’t guaranteed. Iomega Zip disks come to mind…

I’ll revise my post to reflect that.

Digital preservation always reminds me that “technological progress” doesn’t necessarily mean durability. It depends on your vantage point. We chase performance - faster, smaller, cheaper - but long-term storage gets sidelined. Hard drives have mechanical failure points, SSDs and NVMe degrade silently, DVDs and tapes are clunky, and that’s before we even get to bitrot or data corruption.

And then there’s the psychological toll: the low-grade anxiety that your data might vanish, and the constant vigilance to prevent it.

The contrast is striking. I have a box of handwritten documents from my great-grandfather, dating back to the 1840s, still perfectly legible. Will the digital files I create today survive the same span? I'm not so sure.