How to debug slow PDF load times?

I have a couple of PDFs that take really long to open. Clicking on them in DT on my Mac freezes DT for a legit 5-8 seconds until the document opens. Once a PDF has loaded once, it’s fast for a while until I restart my machine again.

I couldn’t figure out which kinds of PDFs cause this and which don’t. Sometimes specific OCR scans seem to take a while to load, sometimes exports from specific apps, but I couldn’t find a clear pattern.

What can I do to optimize this or figure out what’s wrong? Can I process the PDFs in some way that makes them easier for DEVONthink to handle?

OCR settings are:

I keep the DPI high to allow reprints, but also to be able to re-process a PDF when OCR technology gets better.

I’m not sure why your PDFs load slowly, but MRC compression takes a lot of time to decode, more so if it uses the extra font antialiasing. Opening an optimised MRC PDF shows significant time differences between an Intel Mac and an M1 Mac, as the latter seems to use part of the silicon to decode it.

It seems DT OCR with “Compress PDF” does not use MRC, or if it does, it does not use the latest version (ABBYY does not integrate its latest versions into its SDK to avoid competing with its own product, which BTW is a piece of crap under macOS, no matter whether it is Intel or M1; the good version is the Windows one).

That said, in my experience, DT’s OCR technology is on par with other macOS OCR programs.

Take into consideration that the first PDF load (like any other first run after a restart) needs to load all the underlying frameworks and verify against Apple’s notarisation servers (which is not always fast), more so if your disk is mechanical.

I am on an M1, actually. I do have a standalone FineReader license that I remote-controlled with AppleScript directly from DEVONthink in the past, but I figured DT3 is now good enough.

I’ll see if I can dust that off and give MRC a go! (I faintly remember that I wasn’t a fan of how MRC ends up rendering the text on top of PDFs :/)

Most PDFs don’t have this load time though. It’s only a handful that take a loooong time the first time I open them, so I don’t think it’s the framework loading.

The obvious idea would be that they are biiiig too?

Should have included that in the first post, but no, they are all between 70 KB and 250 KB in size. Nothing enormous.

Can you share a copy of a PDF that is slow to load?

Whether MRC is used or not is determined by ABBYY on export of the PDF.

Alan, do you know if DT’s embedded ABBYY version uses MRC or not?

I don’t know for sure; however, the SDK defaults to letting ABBYY decide whether MRC is used. For B&W docs it is not used, and for colour it will be determined by a number of different factors, although it is not detailed what these factors are.
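
Since it is unclear from the outside whether a given export used MRC, one rough way to check a specific file is to look at which image filters it declares. This is only a sketch of a heuristic, not how DEVONthink or ABBYY decide anything: it assumes that an MRC-style PDF typically combines a bitonal mask layer (JBIG2 or CCITT) with a colour layer (JPEG or JPEG 2000), and it simply counts the standard PDF filter names in the raw bytes. The function names and the file path are made up for illustration, and a match is a hint rather than proof.

```python
from pathlib import Path

# Standard PDF stream filter names for image data.
FILTERS = (b"/JBIG2Decode", b"/CCITTFaxDecode", b"/DCTDecode", b"/JPXDecode")


def count_filters(data: bytes) -> dict:
    """Count how often each image filter name occurs in the raw PDF bytes."""
    return {name.decode(): data.count(name) for name in FILTERS}


def looks_layered(counts: dict) -> bool:
    """Heuristic (an assumption, not a spec): a bitonal mask filter plus a
    colour image filter in the same file suggests an MRC-style layout."""
    mask = counts["/JBIG2Decode"] + counts["/CCITTFaxDecode"]
    colour = counts["/DCTDecode"] + counts["/JPXDecode"]
    return mask > 0 and colour > 0


# Example use (hypothetical path):
#   counts = count_filters(Path("slow.pdf").read_bytes())
#   print(counts, looks_layered(counts))
```

Comparing the counts for a slow PDF and a fast one of similar size would at least show whether the slow ones share a particular image encoding.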

I think maybe something else is going on here. I just had a 20-30s load time for a simple PDF that I downloaded from the internet on my M1 Mac (after DEVONthink had been running for a longer time, with the Mac coming back from sleep/display off after around 14 hours).

pass-personalausweis-antrag-erwachsene-data.pdf (110.6 KB)

I got myself a new MacBook with an M1 Max chip, installed DEVONthink, and everything is snappy, including that PDF I just sent. I’m going to try to reinstall DEVONthink completely and resync from scratch.

Maybe concurrent activities of other tasks (e.g. synchronizing) or other (system) apps caused the delay? Another possibility is that virtual memory was necessary and slowed things down. How much RAM does this Mac have?

This is an older thread, but I ran into the same issues again. I reinstalled my M1 Mac mini and DEVONthink was super responsive and fast (with the same databases I always use). Then over time, I noticed that it slowed down, and now clicking on a PDF for the first time again takes a few seconds until the preview appears.

It looks like this is something that happens slowly over time for whatever reason. I tried the usual things like clearing the cache, optimizing the DBs, etc.

I know how hard it is to debug this and I’m not sure what I could provide either, but I just wanted to write an update on how things are going, in case someone has an idea.

How large are these documents (number of bytes/pages)? Any concurrent activities according to the Activity window?

What is in the File > Database Properties for the database you’re having an issue with?

The files are all within the 500-800 KB range, but it’s happening on most file types independent of size. I have a PDF that’s 58 KB that’s also now taking 1-2 seconds to appear, sometimes longer when the machine has been idling for a while.

Is there anything happening under the hood that would explain this happening over time for me? Something that isn’t there when the machine is freshly installed?

It’s probably helpful to mention:

  • the exact version numbers of macOS and DT that you’re using;
  • whether it also happens when you try to open PDF files that are not in the global inbox (why do you apparently have 108 PDF files in the global inbox? :smiley: ); and
  • what the Database Properties of the relevant database are (in other words, how much you have in the relevant database). (I’m assuming it’s not happening only in the global inbox.)

Also, are you saying that this is something that has just started happening and didn’t happen previously?

Stephen

How much time does Preview.app need? And what’s the page count of the documents? Thanks!

The DEVONthink version on all of my machines is 3.8; I usually upgrade to a new version right away.

It was counting files in the trash as well, and I haven’t cleared them out in a while :sweat_smile:

My most used database is this one:

Most of my PDFs have 1 page, max 2. They’re usually PDF scans of physical documents, then OCR’d with the built-in tool.
Preview opens everything pretty much instantly, even documents that were lagging/hanging in the DEVONthink preview. (I didn’t mention this before, but I have the preview open all the time in split view; it’s my primary way to look at documents, rather than CMD-O to open the doc in the DT viewer.)

It’s very strange because right now it only happens on my M1 Mac mini, while my M1 Max MacBook Pro and Intel MacBook Pro are snappy and fast. (Yes, I was surprised too that my Intel MacBook currently feels snappier at opening stuff in DEVONthink than my M1 Mac mini.) I do remember, though, that my Intel machine had similar issues before I reinstalled it.

Yes correct. It’s a bit like performance gradually got worse over the course of weeks/months despite the database being roughly the same.

Maybe one thing worth mentioning is that my Mac mini is basically a mini home server that’s on almost all the time, with usually 1 quick manual restart every 1-2 days. I have DEVONthink open all the time so that the server can sync with my other devices through Bonjour, but also to keep iCloud, etc. up to date. Before I had this Mac mini, my Intel MacBook was on most of the time, hooked up to a display.

I want to believe that this has something to do with me leaving it on all the time, but I have no data to back this up yet.

It could also be something else on this server causing this issue, but it really only happens in DEVONthink; everything else is the same as always.

Is there any background activity (see Window > Activity) on your M1? And do both computers use the same preferences (see Preferences > Files > Multimedia) and a similar window layout (e.g. same main view, preview and inspector)?

Sorry for the late reply from my side. There are no background activities on either machine, and yeah I use the same window layout everywhere

I tried to click around more to see if I could trigger it manually in a reproducible way, and I think that the See Also & Classify window is causing a lot of the hanging. (I’m guessing the results aren’t loaded async, but that the UI blocks until all results are returned.)
Hiding the pane makes things a good bit faster, though not instant yet. I was just used to always having it open.

I closed a few bigger databases, including one that has lots of smaller files from things like mailbox imports. That makes Classify faster, and as a result loading times are shorter too when the side panel is open.

It’s still odd that things get slower over time when the database size is the same on reinstall and later, but if those results are cached it would explain why clicking on the same files again doesn’t cause a freeze / beachball.

Can I do anything to help DEVONthink with classification? I already excluded folders like mailbox dumps from classification.
(By the way, a built-in recursive ‘exclude from classification’ would be very nice :smile:)

Also, I noticed that when I leave DEVONthink open for a while (days), the memory usage climbs to ~2 GB until I restart it, up from around 300 MB after it’s freshly started. When I use DEVONthink during those times, everything feels veeeeery laggy and slow; clicks on different things take at least a few seconds to load.
Memory usage increases by roughly 100-200 MB a day, but I haven’t actually measured this yet.
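
To turn that estimate into an actual measurement, a small logger could sample the app’s resident memory at a fixed interval. The following is a minimal sketch, assuming the standard `ps -o rss= -p PID` invocation (which reports resident set size in KB) and a PID looked up by hand, for example in Activity Monitor. The function names, the one-hour interval, and the log file name are illustrative choices, not anything DEVONthink provides.

```python
import subprocess
import time


def rss_kb(pid: int) -> int:
    """Return the resident set size of a process in KB, as reported by ps."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def growth_mb_per_day(samples):
    """Estimate growth from the first and last (unix_seconds, rss_kb) samples."""
    (t0, kb0), (t1, kb1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need two samples taken at different times")
    days = (t1 - t0) / 86400
    return (kb1 - kb0) / 1024 / days


def log_rss(pid: int, interval_s: int = 3600, path: str = "rss_log.csv"):
    """Append one 'timestamp,rss_kb' row per interval until interrupted."""
    while True:
        with open(path, "a") as f:
            f.write(f"{int(time.time())},{rss_kb(pid)}\n")
        time.sleep(interval_s)
```

Running `log_rss(pid)` in a terminal for a few days and feeding the first and last rows to `growth_mb_per_day` would show whether the 100-200 MB per day figure holds, and whether the growth is steady or comes in jumps.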