Long-term preservation of digital data

I do not know whether this is the most appropriate forum for this, but I would like to start a discussion to gather members' views and suggestions. I am primarily concerned with research-based material rather than day-to-day material such as receipts.

I have lately become increasingly concerned about the long-term preservation of the digital data I have accrued, and especially about what is going to happen to it when I eventually “leave this mortal coil.” I still believe that paper is the only medium with a chance of lasting for centuries. Although I do have a respectable archive of paper-based research material, my DEVONthink Pro Office database holds scans of that material and, more importantly, data that I have obtained from the internet and scans of material held in other archives that have been sent to me.

What, then, should we do with this material? (There is an interesting article that may stimulate people’s thinking: blogs.loc.gov/digitalpreservatio … -archives/) There are major problems, as a correspondent of mine connected with the Society of Genealogists puts it: “archiving our data is clearly a sensible thing to do, but we need to find some universally accepted way in which it is stored. Just to transfer all of our data onto a DVD will be of no use in future because the formats will become redundant. The operating system has also to be considered. Apple Macs or PCs. Which versions of software are being used, and so on.” (Peter Amsden).

So, what do people think about this?

We would just be guessing, so why even discuss this? BTW, even DVDs go bad in time (and a short time at that). I believe the best we can do is migrate data whenever new technology reaches its tipping point. I’ve gone from LPs to several forms of tape to CDs to current digital media when maintaining my music collections. I doubt we’ve reached a stable point yet, but who can say what the next medium will be?

Thank you for your contribution; I shall try to answer your question. Why discuss it? That depends on whether you think the research (and don’t forget it is research I am primarily concerned about, as opposed perhaps to music collections) is important, and only you can decide that! Personally, I believe the research I am doing, which is predominantly historical, is important enough to be preserved so that future researchers can access it and benefit from it.

I didn’t mean that your research isn’t worth discussing or considering. I meant: how can we predict what technology is around the bend, and so discuss ways to save our data for the long term? I don’t see how we can plan a long-term solution for saving or preserving our important data.

My feeling is that we have to take this “a day at a time”.

I think it’s an interesting subject because it points out the frailty of digital media. We have papyri that are literally thousands of years old and CDs that won’t last 30 years.

I can’t claim to be a prophet about any of this, but there are plenty of myths about the digital age that I would like to see dispelled. 8^)

I suggest that you discuss this topic with the current representatives of potential future users of your research and ask them which digital formats and physical media they are looking at as their preferred archive strategies. If those users do not have any preferences, the next best thing is to work with them to establish preferences.

If there is no clearly defined audience for your research materials, a practical answer must be based on reasonable assumptions about which techniques and technologies are expected to survive into the foreseeable future.

One way of evaluating these technologies (in the absence of specific requirements) is to examine the standards that are being used and developed today. For example, AIIM, the Association for Information and Image Management (aiim.org), promotes PDF/A as a long-term digital format. (See aiim.org/Research-and-Public … _Standards. Unfortunately, the publications that detail these recommendations are available only to AIIM members.)

Choice of physical medium is tricky, because every physical medium is likely to degrade over time.

Probably the best hedge is to archive in multiple non-proprietary formats that can be read on multiple platforms (PDF, plain text, images, etc.) and on different media (magnetic, optical, paper, as needed), using the best quality available (or affordable), such as acid-free paper and optical media that use gold instead of volatile dyes. I would avoid relying on cloud-based services, which might or might not stay in business for the long haul, and on complicated hardware such as NAS devices, the thinking being that simpler is better (though either could of course be included as an alternate medium). In addition, multiple copies of the archived materials should be made and stored in multiple locations that are protected from calamity.
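For the “multiple copies in multiple locations” part, here is a minimal Python sketch, assuming each location is a drive mounted as an ordinary folder (the folder names `drive_a`/`drive_b` and the function names are hypothetical, for illustration only). It copies every file to each destination and then re-hashes each copy with SHA-256 to confirm it matches the original.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[Path, list[str]]:
    """Copy every file under `source` into each destination folder, then re-hash.

    Returns {destination: [relative paths whose copy does not match]};
    an empty list means that copy verified cleanly against the original.
    """
    # Hash each original once, so every destination is checked against the same value.
    originals = {
        p.relative_to(source): sha256_of(p)
        for p in sorted(source.rglob("*"))
        if p.is_file()
    }
    report: dict[Path, list[str]] = {}
    for dest in destinations:
        mismatches: list[str] = []
        for rel, expected in originals.items():
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source / rel, target)  # copy2 also keeps timestamps
            if sha256_of(target) != expected:
                mismatches.append(rel.as_posix())
        report[dest] = mismatches
    return report


if __name__ == "__main__":
    # Demo: temporary folders stand in for separate physical drives.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        source = base / "research"
        (source / "scans").mkdir(parents=True)
        (source / "notes.txt").write_text("Parish register, 1841\n", encoding="utf-8")
        (source / "scans" / "page001.tif").write_bytes(b"\x00\x01\x02")
        print(replicate(source, [base / "drive_a", base / "drive_b"]))
```

Re-reading a file immediately after writing it may be served from the operating system's cache rather than from the disk itself, so this is a first check rather than a guarantee; re-checking each drive later is the stronger test.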

Before archiving, it is worth validating that the files are legible and not corrupted, and the archived copies should be verified as well. The archive would also need to be updated over time as it changes (as files are edited, added, deleted, and reorganized), and perhaps even recreated if recommendations about formats and media change significantly. Some sort of index showing what information is where might also be useful. File names should be cross-platform compatible as well.
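The validation and index ideas can be combined in a checksum manifest. The Python sketch below (the function names and the `MANIFEST.sha256` file name are hypothetical choices for illustration) writes one SHA-256 line per file, flags names that Windows would reject, and later re-hashes everything to report missing or changed files. The two-space layout is the one GNU `sha256sum` prints, so `sha256sum -c` run from inside the archive folder should also be able to read the same file.

```python
from __future__ import annotations

import hashlib
import re
import tempfile
from pathlib import Path

# Characters Windows does not allow in file names, plus control characters.
FORBIDDEN_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, with or without an extension (e.g. "CON.txt").
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_problems(name: str) -> list[str]:
    """List reasons a single file or folder name may not travel across platforms."""
    problems = []
    if FORBIDDEN_CHARS.search(name):
        problems.append("forbidden character")
    if name.split(".")[0].upper() in RESERVED_NAMES:
        problems.append("reserved Windows name")
    if name != name.rstrip(" ."):
        problems.append("trailing space or dot")
    return problems


def build_manifest(root: Path, manifest: Path) -> list[str]:
    """Write one 'sha256  relative/path' line per file; return name warnings."""
    lines, warnings = [], []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        for problem in name_problems(path.name):
            warnings.append(f"{rel}: {problem}")
        # Skip the manifest itself if it lives inside the archive folder.
        if path.is_file() and path.resolve() != manifest.resolve():
            lines.append(f"{sha256_of(path)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return warnings


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Re-hash every listed file and report anything missing or changed."""
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        path = root / rel
        if not path.is_file():
            failures.append(f"missing: {rel}")
        elif sha256_of(path) != expected:
            failures.append(f"changed: {rel}")
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive"
        archive.mkdir()
        (archive / "notes.txt").write_text("Baptism, 3 May 1841\n", encoding="utf-8")
        manifest = Path(tmp) / "MANIFEST.sha256"
        print(build_manifest(archive, manifest))   # name warnings, if any
        print(verify_manifest(archive, manifest))  # empty while nothing has changed
```

This only checks the common Windows restrictions; it does not, for example, catch two names that differ only in upper/lower case, which would collide on a case-insensitive file system.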

All this could be quite an undertaking, one that might make backups and syncing look trivial; practicality and cost would of course influence how much of this work gets done.

Finally, if your research is significant enough to outlast the digital formats and media that you select, I think you can reasonably assume that future users will take action to preserve it by migrating it to contemporary formats and media that are appropriate for longer-term preservation. This point is often overlooked; instead, we tend to assume that the archiving techniques and technologies we use today are the final word. In the final analysis, we can only do what we believe is reasonable, and leave it to our descendants to continue (or not) to manage the archives we leave behind.

Some food for thought:

On a side note: I recall reading some years back about the dilemma faced by the designers of the proposed U.S. Yucca Mountain nuclear waste facility: how to preserve important information about waste that would be stored for thousands of years. IIRC, the approach they identified involved etching content in native form (i.e., text and graphics) into dense metal or ceramic plates, perhaps microscopically to pack in enough information and reduce the physical bulk required (in which case the archive would have included microscopy equipment for reading the information, along with pictographic instructions). Of course, this is an extreme case, but it made for an entertaining read. I’ll post back if I can find it.

I’ve thought about this issue mainly in the context of estate planning, and of not making things more difficult for my successors when they wind up my business or estate matters. It’s made me increasingly wary of clouds and backup services. If I had some valuables, I wouldn’t ask Amazon or some SugarSync guy I’ve never met (and never will, or want to) to look after them for me. The same goes for my data. So that leads to a commitment to controlling my own physical archives, and thus to Andrew-Bede’s question. I don’t know anything about the future (though it usually resembles the past, in which things decay), but I do know that I can buy a shiny new physical drive every year or 18 months and move my data backups over to it (or, better, to two or three drives, for redundancy and protection against contingencies), and keep up a regular cycle of moving from old devices to new ones. If I keep this up (which is not hard to do, just hard to remember to do), then when the trumpet calls and my successor steps in, (notionally) nothing will be lost.
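Since the hard part is remembering, even a tiny script can help. Here is a minimal Python sketch, assuming you note the date of each migration somewhere and use the 18-month interval mentioned above; the function names and the sample date are hypothetical. It reports when the next move to fresh drives is due.

```python
from __future__ import annotations

import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Move a date forward by whole months, clamping to the last day of the month."""
    index = start.month - 1 + months
    year, month = start.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def migration_due(last_migrated: date, today: date, interval_months: int = 18) -> tuple[bool, date]:
    """Return (is_due, due_date) for the next move of the backups to new drives."""
    due = add_months(last_migrated, interval_months)
    return today >= due, due


if __name__ == "__main__":
    # Hypothetical date of the last migration, e.g. noted on the drive's label.
    is_due, due = migration_due(date(2023, 1, 15), date.today())
    print(f"Next migration due {due.isoformat()}: {'now' if is_due else 'not yet'}")
```

It could be run from a login item or a scheduled job, though a recurring calendar alert would do much the same thing.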

My reliance on software for event alerts has certainly increased with age, when I remember to create them, that is. :)

I second that! But, and this is pertinent to the discussion, I have only just got my computer back after a problem that turned out to be nothing more than a duff charger. The point is that, because I had not backed up everything for 8 weeks or so, I thought I might lose data to a dead laptop. You can imagine my relief when it became obvious that the charger was the problem… and how quickly I backed up once everything was up and running! We never know when things may go wrong, and this highlighted the problem for me: not only did I not have my research data, I also did not have my diary commitments, as they too were all on the laptop. Thank goodness for my wife’s diary, which had the immediately important ones. So perhaps in some cases paper is better than electronic.

Amen! I prefer a book in my hand to an iPad or computer screen. (Sorry, a wee bit off-topic.)

It’s funny (in a sad way) how we remember backups, or worse, only start thinking about them, right when we need them most. That’s part of the reason we advocate Time Machine. It may not be perfect in everyone’s eyes, but it is an easy and free solution. You just need to remember to plug in a drive and let Time Machine do its thing. I figure if you can remember to put your pants on before leaving the house, then this should be cake! 8)

One of my favorites: “Every time I get it all together, I forget where I put it.”

No longer the case when armed with DT!