Uploading Images from the archives into DT

kburlingham · November 15, 2023, 12:04am

Hi all,

I’m going to attempt to upload hundreds (thousands?) of image files from graduate school research into DT, but I want to, hopefully, do it correctly the first time. I’m looking for tips from practiced users. I’ve been using DT for about a year but only for new content.

I would like all the files converted to PDF so that they are searchable (or somewhat searchable). I’d also like them to be automatically tagged so that I know the archive, collection, box, etc.

Is there anything else that I’m not thinking about? I’m hoping I don’t have to go back into the image files after they’re uploaded and individually do something to them that could have been done in a batch.

Thanks in advance!

rpallred · November 15, 2023, 2:04am

Images of what?

What are you hoping to be able to do with the images after they are in DT? I see that you want to be able to search by “archive, collection, box”–but search to do what?

BLUEFROG · November 15, 2023, 2:27pm

I’m going to attempt to upload hundreds (thousands?) of image files from graduate school research into DT…

Note: You do not upload image files into DEVONthink. This is not semantics or being pedantic. Uploading is a very specific action involving transmitting data to a remote resource, like a networked disk or online server. You add documents to DEVONthink via importing them. Just so you know.

I would like all the files converted to PDF so that they are searchable (or somewhat searchable),

OCR does not happen automatically when importing documents unless you employ a smart rule. However, you can select and do OCR on batches of images, as needed. And no, I wouldn’t batch 1000 images at once.

I’d also like them to be automatically tagged so that I know archive, collection, box, etc.

Tagged on the basis of what criteria?

kburlingham · November 16, 2023, 4:22pm

Thank you for the clarification re: uploading vs. importing. Correct, I am importing.

As for the tagging question, maybe some background would help:

I’m an historian. These images are photos I have taken at multiple archives. The images are (somewhat) organized in folders or otherwise marked in the image itself with the archive/collection/box info I need for bibliographic/citation reasons. I need to preserve that info, but I’d love to attach it to the image itself so that when I do a search in DT, I don’t have to go back into the file system to figure out where the document is from. I also think it will help ensure I don’t lose track of what came from where.

Perhaps tagging isn’t the best idea for organizing, and maybe there’s another way? I’ve used the DT Historians Annotation script but it needs to be created for each individual document and I really don’t want to do that…I would never get to working with them! So looking for something that can be done on import.

So does this sound correct:

Smart rule: change JPEG to PDF and OCR PDF
Smart rule: that I change with each batch depending on archive/location with an annotation? Can I do that?

Apologies if these are all very basic questions. I’m a novice. But the ability to OCR these images is a game changer for me. Thank you DT!

kburlingham · November 16, 2023, 4:24pm

Hi! I wrote a longer answer below but these are images of documents that I’ll use for analysis and writing…in other words, primary sources. I need to know the location information for bibliographic reasons.

BLUEFROG · November 16, 2023, 4:48pm

No worries and we were all beginners at something once!

Smart rule: change JPEG to PDF and OCR PDF

Not quite.
OCR produces a searchable document, typically a PDF, so it would be one less step.

Smart rule: that I change with each batch depending on archive/location with an annotation? Can I do that?

By location, are you referring to a geographic location? And yes, you could have a smart rule that you change some action as you’re processing documents you’re importing. You could change tags or file things in a different location.

Here’s a simple example and the result in the database…

So any image dropped into the Inbox of this database would get OCR’d, tagged, and filed into the group in the root of the database. You could obviously change the tag and the filing location, as desired.

kburlingham · November 16, 2023, 5:48pm

Thank you for this!

OCR produces a searchable document, typically a PDF, so it would be one less step.

So OCR is the conversion. Awesome.

By location, are you referring to a geographic location? And yes, you could have a smart rule that you change some action as you’re processing documents you’re importing. You could change tags or file things in a different location.

Regarding “location”: by that I mean which archive (e.g., National Archives) and where in the archive (e.g., Sam Smith Papers, Box 201). But it looks like I can just use that same tag function you showed above and add more of them for collection and box.

Amazing. Now I’m going to try it. These images have been waiting for this technology…or so I will tell myself since they’ve been sitting on an external hard drive for 15 years.

BLUEFROG · November 16, 2023, 5:57pm

You’re welcome

PS: If you’re working through a box, you could add tags for the location and the box, e.g., Sam Smith Papers and Box 201.

PPS: If you need some more specific assistance, you can hold the Option key in DEVONthink and choose Help > Report bug to start a support ticket.
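
If the photos already sit in folders named for archive, collection, and box, the tag values could be read off the folder path before import instead of being typed per batch. Here is a minimal Python sketch, run outside DEVONthink, that assumes a hypothetical layout of root/archive/collection/box/file. The function name `tags_from_path` and the example path are made up for illustration and are not part of DEVONthink.

```python
from pathlib import PurePosixPath


def tags_from_path(path, root):
    """Return [archive, collection, box] from the three folder levels below root.

    Assumes the hypothetical layout root/archive/collection/box/file.
    """
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) < 4:
        raise ValueError(f"expected archive/collection/box/file below {root}: {path}")
    archive, collection, box = parts[:3]
    return [archive, collection, box]


# Hypothetical example path on an external drive
example = "/Volumes/Research/National Archives/Sam Smith Papers/Box 201/IMG_0001.jpg"
print(tags_from_path(example, "/Volumes/Research"))
```

The resulting list could then be used to decide which tags to set in the smart rule for each batch. Files that are not nested three folders deep raise an error, which flags them for manual handling.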

SlickSlack · November 16, 2023, 9:12pm

Don’t want to muddy the waters here with another suggestion, but I would at least investigate using Custom Metadata for some of those fields instead of tags (or both; what’s a few bits here and there </sarcasm off>).
I am sure both are scriptable, but I find that a pulldown or controlled and limited list for, say, the Archive category would keep things tidy, and the hotkey Ctrl-2 pulls up all your Custom Metadata where it’s easily viewable, more so than a stack of tags.
But that’s my opinion based on how I like the data presented. As they say, your mileage may vary.

kburlingham · November 16, 2023, 9:40pm

I’m intrigued. Can you explain how I might do that? Would the pull down have to be chosen for each JPEG or could it be by batch?

Thanks!

SlickSlack · November 16, 2023, 10:49pm

With the proviso that you should TEST TEST TEST all the metadata workflows (tags and Custom) before you commit to processing the whole shebang, yes, it can be done by batch.
You can use the “Set” variety of Custom Metadata, which the manual describes on page 239:

Set: Similar to the Single-line text, this shows a dropdown with values pre-defined for the field in the Data preferences. However, new values can’t be entered outside the preferences

You can go into prefs to add new Archive entries (for example) to the list, then ingest a bunch of JPEGs, OCR them to searchable PDF, select all, and choose the Archive they came from in the Custom Metadata panel (Ctrl-2).

You could also add custom metadata for the location (Sam Smith) and box number (001 or 1 or ?).

You want to be careful to consider whether the hierarchy levels are always named and categorized the same way from archive to archive. Do they all use the same hierarchy (Archive-Subject-Box#), or are there other ways of IDing things (e.g., a box catalog number like AGFDJ3485897), with QR codes or something like that?
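
One rough way to check this before importing is to count how many folder levels sit above each file, grouped by archive. The Python sketch below assumes the top folder under the root is the archive. The function `depths_by_archive`, the root path, and the "County Historical Society" folder are hypothetical names used only for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def depths_by_archive(paths, root):
    """Map each top-level (archive) folder to the folder depths its files sit at."""
    depths = defaultdict(set)
    for p in paths:
        parts = PurePosixPath(p).relative_to(root).parts
        # Files lying directly in the root have no archive folder at all
        archive = parts[0] if len(parts) > 1 else "(no archive folder)"
        depths[archive].add(len(parts) - 1)  # folders between root and file
    return {name: sorted(levels) for name, levels in depths.items()}


# Hypothetical file list
paths = [
    "/Volumes/Research/National Archives/Sam Smith Papers/Box 201/IMG_0001.jpg",
    "/Volumes/Research/National Archives/Sam Smith Papers/Box 202/IMG_0002.jpg",
    "/Volumes/Research/County Historical Society/AGFDJ3485897/IMG_0003.jpg",
]
print(depths_by_archive(paths, "/Volumes/Research"))
```

An archive that reports a different depth from the others, or more than one depth, probably needs its own tagging or metadata scheme rather than the shared Archive-Subject-Box# one.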

In the archiving of large film, video, and data sets we use LTO tapes that are all barcoded with unique serial numbers from the manufacturer that you can either use in small libraries or replace with your own library system numbers if you have that wherewithal.

Did I mention testing your workflow on a subset of data? Maybe do a box from each archive, go right to the end of the process with those boxes. Search, muck about, move things around, try to break things. Then you can more confidently begin to spend the time ingesting the whole set with a little bit of peace of mind instead of maniacally crossed fingers and misplaced hope. So yes, TEST things first.
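
To pull together that kind of pilot subset, something like the following Python sketch could pick one box folder per archive from a list of file paths. It again assumes a hypothetical root/archive/collection/box/file layout. `pilot_boxes` and the "State Library" and "Jones Family Papers" names are invented for the example.

```python
from pathlib import PurePosixPath


def pilot_boxes(paths, root):
    """Pick the first box folder (in sorted order) for each archive as a pilot batch."""
    chosen = {}
    for p in sorted(paths):
        parts = PurePosixPath(p).relative_to(root).parts
        if len(parts) < 4:
            continue  # not nested as archive/collection/box/file; skip it
        archive = parts[0]
        # Keep only the first box seen for this archive
        chosen.setdefault(archive, "/".join(parts[:3]))
    return chosen


# Hypothetical file list
paths = [
    "/Volumes/Research/National Archives/Sam Smith Papers/Box 202/IMG_0002.jpg",
    "/Volumes/Research/National Archives/Sam Smith Papers/Box 201/IMG_0001.jpg",
    "/Volumes/Research/State Library/Jones Family Papers/Box 3/IMG_0003.jpg",
]
for archive, box in pilot_boxes(paths, "/Volumes/Research").items():
    print(archive, "->", box)
```

Importing only those folders first keeps the trial small while still touching every archive's naming scheme.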

kburlingham · November 21, 2023, 8:52pm

Thank you for this! I don’t totally understand it all but I’ll try and follow the steps and see where I land.

And yes, TEST!