PDF not finding complete words

PeterHawkey · February 21, 2026, 5:43pm

Hello,

I’m a brand-new user of DT4, working out the whys and wherefores of the app (I’m very impressed so far). I have imported some PDFs of music scores by various composers, and when I search for Bach, I get nothing. Searching “Ba” finds the Bach piece and highlights the word Bach, but as soon as I add the letter c to the search (“Bac”), the search returns only the lyrics to a song with the word “back” in it. I’m a bit baffled as to what I’m doing wrong.

Any suggestions?

P

cgrunenberg · February 21, 2026, 6:15pm

Convert the document to plain text: does the converted document contain “Bach”?

BLUEFROG · February 21, 2026, 7:06pm

While this is a somewhat older blog post, the essence still applies to version 4…

PeterHawkey · February 22, 2026, 8:13am

Hello, and thank you both for your replies. I understand the issue a little better now; however, I have a couple of concerns.

I tried to attach some PDFs to show the problem, but I get an error saying that new users cannot upload documents, so you’ll have to take my word for the following:

The original PDF is 49 kB, and a search misses the full word “Bach”. It only recognises “Ba”.

I converted the PDF to a text file (RTF) and the word “Bach” is clearly there.

I converted it to a readable PDF, resulting in a PDF+Text file in which a search finds the word “Bach”. However, the file has grown from 49 kB to 221.8 kB, which is quite an increase for an added text layer. Perhaps more importantly, the quality of the PDF has dropped below an acceptable level for printing.

I run the library for an amateur orchestra and have several thousand PDF files. I’m uncertain about the practicalities of converting all of them to readable PDFs, bearing in mind the quality issues and the more than fourfold increase in file size. I can get round this to a point by using tags extensively; it’s just that I was trying to avoid having many hundreds of tags floating around.

I would appreciate your thoughts and advice.

Peter

vinschger · February 22, 2026, 10:37am

When you open the PDF in the macOS Preview app and search for “Bach,” is the term found?

Also, in DTP, is the file type of the PDF displayed as “PDF document” or “PDF + text”?

chrillek · February 22, 2026, 11:11am

If they have no text layer, DT can’t create a full-text index, and you can’t find words reliably. So it’s either:

OCR everything
Or use tags
Or don’t find words reliably
An alternative might be “poor man’s” OCR, aka Live Text, aka Apple Vision, followed by automatic tagging. Another is converting to RTF and using that text to tag automatically. Both approaches would require scripting.
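To illustrate the “tag automatically” half of that idea: once some plain text has been extracted from a score (e.g. via the RTF conversion or Live Text), a script can check it against a list of names you care about and suggest matching tags. This is only a minimal sketch under stated assumptions: the `suggest_tags` function and the `COMPOSERS` list are hypothetical, it is not a DEVONthink API, and actually applying the tags in DEVONthink would need a separate AppleScript/JXA step that is not shown here.

```python
import re

# Hypothetical vocabulary of names to turn into tags.
# In practice this could also hold arrangers, instruments, ensemble sizes, etc.
COMPOSERS = ["Bach", "Handel", "Mozart", "Holst"]


def suggest_tags(text: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary entries that occur as whole words in text.

    Matching is case-insensitive, so "BACH" on a title page counts,
    while the word boundaries stop "Bach" from matching "Bachelor".
    """
    found = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


if __name__ == "__main__":
    # Hypothetical text extracted from the first page of a part.
    sample = "Sleepers Awake\nJ. S. BACH\narr. for Saxophone Orchestra"
    print(suggest_tags(sample, COMPOSERS))
```

Whole-word, case-insensitive matching is a deliberate choice here: title pages often set composer names in capitals, and a plain substring search would wrongly tag anything containing “Bach” as part of a longer word.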

PeterHawkey · February 22, 2026, 1:16pm

Thanks for the suggestion. The original is a PDF-only file. Curiously, if I open it in Preview outwith DTP and search for “Bach”, I find Bach. Try it in DTP and it only finds “Ba”. That’s odd.

PeterHawkey · February 22, 2026, 1:19pm

I would have no problem OCR-ing the entire library but for the quadrupling of the file size and the serious degradation of the quality.

I suspect tags are the way to go. It’s not the end of the world and not a deal breaker for using DTP; I was just keen to avoid having hundreds of tags floating around.

saltlane · February 22, 2026, 1:33pm

An alternative to tags would be to add custom metadata or use the Finder comments field, though tags would be more flexible.

MsLogica · February 22, 2026, 1:40pm

It would probably be a bit weird to OCR a music score (since it’s mostly non-language notation) so that’s likely why the PDFs aren’t OCR’d.

Before you start tagging everything, may I ask how the files are named and how you’re planning to file them?

If your scores are titled by composer, for example, you don’t need to tag them. Or if you’re planning to file them by composer, you also don’t need to tag them (all the Bach pieces would be in the Bach folder, for example).

I’m asking because I don’t know what sort of scores you’re looking at, but you might need tags for each orchestra section or chair, so it would be best to think about your filing structure before you start tagging things.

PeterHawkey · February 22, 2026, 3:40pm

Yes, I can see why it seems odd on the surface; however, there is quite a level of complexity that I am hoping to ease using DTP. The piece I’ve described in this thread is a good case in point. It’s Sleepers Awake by Bach, arranged for saxophone orchestra.

The piece exists in several different formats depending on the number of players (there are full orchestral parts, an octet version and a quintet version), as well as arrangements by different arrangers. So it’s not as simple as putting Bach in the Bach folder, alas.

Add to this that there are scores for each part, as well as files for the music engraving software and notes to keep track of performances for PRS payments. That’s just for this piece, and we have many pieces…

P

BLUEFROG · February 22, 2026, 5:46pm

I agree with @MsLogica and as you’re referring to a broader use of DEVONthink, I recommend you grab a notepad and pen – and a drink of your choice – and draft out:

An organizational structure, how you segregate the documents.
Process workflows, whether it’s for administrative things, getting ready for a performance, etc.

It doesn’t have to be super detailed but just enough to give yourself a useful starting point. Mapping these things lets you get a good view of what you want to accomplish and it doesn’t commit you to any specific method in the app. It also keeps you focused. While it can be done in DEVONthink, there’s a lot to distract you.

PeterHawkey · February 22, 2026, 7:10pm

Thank you everyone for your helpful suggestions. I’m going to have to think of this a little more carefully.

I currently hold the pieces in folders, either in my iCloud or on an external drive, named after the title of the piece (in the case above, the folder is Sleepers Awake). They are then divided into audio files, parts, and engraving files. This gets me to what I want fairly quickly, provided that I have been assiduous with my filing, which is not always a given. But it lacks the nuance that I’m hoping DTP will give me, such as the ability to search for composers or arrangers, whittle out which works include specific instruments, or see when they were last played. I wondered about having a central repository of just the part PDFs, in which I could pile everything and rely on the PDF search to dig out what I want. I see that I am going to have to think more carefully about how I will use DTP.

It looks like tags are my friend.

Thanks again for the help.

Now, where’s that pen, paper and bottle of wine?…

BLUEFROG · February 22, 2026, 7:12pm

This is good information, as is your questioning its efficiency. Given the abilities built into DEVONthink – including automation (and access to external AI, if that’s of interest) – I have no doubt you can improve upon it.

Enjoy the wine and thinking!