Advice streamlining after downgrade to lower specs

dennisrhidalgo · January 8, 2018, 1:28am

Hi everybody,
A series of unfortunate events has forced me to switch from the greatest MacBook ever to one with lower specifications. The reason I stayed with a Mac was simply so I could continue using DEVONthink Pro.

I come to you for help creating a computer environment in which I can still use DT without it crashing every 5 minutes.

These are the new specs:
2.6 GHz 8GB 251GB SSD
https://support.apple.com/kb/SP703?locale=en_US&viewlocale=en_US

1- My DT databases occupy about 500 GB, so the first thing I am doing differently is accessing them from an external drive.

2- I will only install and run apps that are essential for research.

How much of the hard drive should I always keep free?
What else do you recommend?

Thanks in advance.

BLUEFROG · January 8, 2018, 2:44am

The recommended minimum has often been cited as 10% of the drive’s capacity, so 50GB for a 500GB hard drive. I think that’s not a bad standard to adhere to.
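As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function names and the default path are my own, not anything from DEVONthink; the 10% figure is just the guideline above, not a hard requirement.

```python
import shutil

def required_free_gb(capacity_gb, min_free_fraction=0.10):
    """Free space (in GB) suggested by the rule of thumb for a given capacity."""
    return capacity_gb * min_free_fraction

def free_space_report(path="/", min_free_fraction=0.10):
    """Return (free_bytes, required_bytes, ok) for the volume holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    required = usage.total * min_free_fraction
    return usage.free, required, usage.free >= required

if __name__ == "__main__":
    free, required, ok = free_space_report("/")
    print(f"free: {free / 1e9:.1f} GB, suggested minimum: {required / 1e9:.1f} GB")
    print("OK" if ok else "Below the suggested minimum")
```

By the same arithmetic, a 251GB drive would call for roughly 25GB free, so keeping more than that free is on the safe side.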

I’m not sure why you’re mentioning “crashing every five minutes” but that certainly wouldn’t be expected behavior from any machine.

Also, if your databases are 500GB total…

How is that split up - 5 100GB databases, 10 50GB databases, …?
What kind of content is filling them - mostly text-based files (PDFs, text, etc.), mostly media files (video, audio), or a mix?
Do you have these open all the time, and do you NEED to (and seriously ask yourself that)?

dennisrhidalgo · January 8, 2018, 3:40am

@BLUEFROG Thanks for replying.

I am dealing with a smaller hard drive now (251GB), but I will still try to keep 50GB free at all times.

Even with a powerful computer (latest MacBook with 16 GB ram, etc.), I had to close a few programs to open and have DT work smoothly. But perhaps I overextended.

I used to have two browsers running with 20 or more tabs each, plus Evernote, Zotero, FineReader OCR, an email app, MS Word, and other smaller apps, all running at the same time. To have DT open and work properly, I had to close one browser, reduce the open tabs in the remaining one to 4, and close Zotero, Evernote, and FineReader OCR.

DT always opened with the essential 9 databases because closing them and reopening them to search was a pain. Half of them are between 35 and 65GB. Only one is slightly above 100GB, and I hardly use it these days (otherwise, I would split it). I learned in one thread here (with you) that the ideal was below 45GB or something like that. Is that still current?

99% of the content is text in the form of PDF, and a few HTML or Word documents.

If I have no other option, I will have to reduce the number of open databases, and I may find the right balance by trying different numbers at various times.

What do you think? I wonder what others have done when working with databases of this size on a machine with similar specs?

Thanks again

BLUEFROG · January 8, 2018, 6:09am

I’d say the 100GB database you hardly use is a candidate for closing. There is no point in using resources for databases that are rarely used, especially if you are running into issues with performance.

I’m not sure what thread you are referring to but the size in gigabytes isn’t the critical number. If you check out File > Database Properties > … for a given database, the number of words / unique words are more critical. On a modern machine with 8GB RAM, a comfortable limit is 40,000,000 words and 4,000,000 unique words in a database. (Note: This does not scale in a linear way, so a machine with 16GB wouldn’t necessarily have a comfortable limit of 80,000,000 words / 8,000,000 unique words.) This is not a hard and fast rule, but a nice limit balancing information availability with performance.

So text content in a database is far more important.
If you have a database of images, it will have very few words but be large in gigabytes.
If you have a database of emails, it will have many words, but may be smaller in gigabytes.
The second one may perform more poorly as the number of words increases beyond the comfortable limit.
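To make the comparison concrete, here is a small Python sketch that checks a database's word counts against the comfortable limits mentioned above for an 8GB machine. The function name and the sample figures in the usage note are made up for illustration; the real numbers would have to be read manually from File > Database Properties.

```python
# Comfortable limits quoted above for a machine with 8GB RAM.
# These are guidelines, not hard limits.
COMFORTABLE_TOTAL_WORDS = 40_000_000
COMFORTABLE_UNIQUE_WORDS = 4_000_000

def check_database(total_words, unique_words):
    """Return a list of the limits this database exceeds (empty if none)."""
    exceeded = []
    if total_words > COMFORTABLE_TOTAL_WORDS:
        exceeded.append("total words")
    if unique_words > COMFORTABLE_UNIQUE_WORDS:
        exceeded.append("unique words")
    return exceeded
```

For example, a hypothetical database with 60,000,000 words and 1,000,000 unique words would be flagged for total words only, regardless of how many gigabytes it takes up on disk.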

dennisrhidalgo · January 8, 2018, 3:28pm

This is very useful information. I checked and adjusted the databases to come nearer to your suggestions.

But there is significant fluctuation in a few databases. I could create databases resembling your ideal numbers more easily with documents that contain images/PDFs with uneven and unclear OCR (historical documents). For example:

3,500,000 unique words and 37,000,000 total words.

But for databases with documents in modern typesets (clear OCR), the numbers were rather off:

1,200,000 unique words, 110,000,000 total words (35 GB).

I found this last example a bit puzzling. This particular database gives me no performance issues that I can tell. Would splitting it further make it work even better?

I suspect the difference occurs because the documents with problematic OCR (documents that cannot convert into clearer typesets) show lots more false words.
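A toy Python sketch of that suspicion, with made-up sample text: the same sentence repeated three times cleanly, versus the same three sentences with simulated OCR misreadings. The total word count is identical, but every misread word counts as a new "unique" word. This only illustrates the counting effect; how DT actually tokenizes and indexes text is not something I know.

```python
def word_stats(text):
    """Return (total_words, unique_words) using simple whitespace splitting."""
    words = text.lower().split()
    return len(words), len(set(words))

# Hypothetical sample: one sentence repeated three times with clean OCR.
clean = "the governor wrote to the council about the harbour " * 3

# The same three sentences, with simulated OCR misreadings in two of them.
noisy = (
    "the governor wrote to the council about the harbour "
    "tbe govemor wrotc to thc councll ahout the harhour "
    "the gouernor wrote io the conncil about tke harbour "
)

print(word_stats(clean))  # (27, 7)
print(word_stats(noisy))  # (27, 18)
```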