Reducing database size - the "rebuild" folder

Hello,

As my database had grown to a whopping size (c. 200 GB), I decided to look into ways of reducing it.
Aside from running Verify & Repair and Backup & Optimize, I have been batch reprocessing my OCR’d PDFs with great success (down to 100 GB now!). I do this by moving through each of the numbered/lettered folders inside the database package in Finder, then running the PDF optimization process in Acrobat Pro.

There’s one big (66 GB) item inside the database package in Finder, labelled “rebuild”. I was wondering whether this is normally present, what it contains, and whether I need it. Any advice would be appreciated.

One piece of advice: don’t go into the database package and change files around. That’s a sure way to destroy your database. That the database is still working is just luck. If you need to reduce a file’s size, open the file in Acrobat from within DEVONthink, reduce it, then save it. It will be saved back into the database package. Let DEVONthink manage the file opening and closing.

I hope you had a backup of the database before surgery started :open_mouth:

Jim or Bill can step in here on the “Rebuild” question.

Thanks for the advice. I tested this first on a batch of files that I had backed up, without any problems, and I’ve got a full backup of everything on Time Machine. I thought it wouldn’t be a problem, as the files aren’t being moved or renamed. It’s a hell of a lot faster, as I’m not aware of any way of doing this in batches from within the DEVONthink application. We are talking about 25,000 PDFs, so doing it one by one from inside DTPO isn’t happening. I’ll let everyone know if I do mess things up, and let it serve as a warning.

DT allows you to reveal an item in the Finder, and it appears to be designed to cope when a file is edited and saved in the same place. It certainly won’t cope with renamed or newly added files.
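
If you want to double-check that a batch run really left every file in place under the same name, a rough sketch like the one below could help. It is plain Python and not part of DEVONthink. The function names and the example path are made up for illustration. It only reads file names and never modifies anything.

```python
import os


def snapshot_paths(root):
    """Return the set of file paths under root, relative to root."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full, root))
    return paths


def compare_snapshots(before, after):
    """Return (missing, added): paths that vanished and paths that appeared."""
    return sorted(before - after), sorted(after - before)


# Example usage (the path is hypothetical, adjust it to your own setup):
#   before = snapshot_paths("/path/to/MyDatabase.dtBase2")
#   ... run the batch optimization ...
#   missing, added = compare_snapshots(
#       before, snapshot_paths("/path/to/MyDatabase.dtBase2"))
#   Both lists should be empty if nothing was renamed, removed, or added.
```

A renamed file shows up once in each list, under its old name in the first and its new name in the second. This only compares names, so it says nothing about whether the file contents are still intact.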

I think you should be safe to batch-process the files. A reindex afterwards might be a good idea in case the OCR text changes.

You could write an AppleScript that loops through the items in DT, opens each one in Acrobat, and remote-controls Acrobat to optimize it, but I don’t think that’s necessary.

Thanks. I’ve been through them all now, and halved the size of my database. I’m still puzzled by the item labelled “rebuild” in Finder, though. It hasn’t been modified for over a year and takes up a lot of GB. Does anyone know why it is generated?
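
For anyone wanting to see which top-level items inside a package folder take up the space, here is a minimal, read-only Python sketch. The function names are hypothetical and nothing here is specific to DEVONthink. It simply sums file sizes under each top-level entry of whatever folder it is given and lists the largest first.

```python
import os


def entry_size(path):
    """Total bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def top_level_sizes(root):
    """Return (size_in_bytes, name) pairs for root's entries, largest first."""
    sizes = [(entry_size(os.path.join(root, name)), name)
             for name in os.listdir(root)]
    return sorted(sizes, reverse=True)


# Example usage (hypothetical path):
#   for size, name in top_level_sizes("/path/to/MyDatabase.dtBase2"):
#       print(f"{size / 1e9:8.2f} GB  {name}")
```

This only reports sizes. It does not say what an item is for or whether it is safe to delete.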