Searching top of PDF

Hi, I’d like to use DEVONthink to search published scientific articles by author. DEVONthink does a great job of pulling up every PDF in my database that contains a particular author's name; the problem is that the results include both papers written by that author and papers that cite them. Is there a way to restrict a search to the top of the page (for example, to the first few lines, or to within a certain number of pixels of the top edge) so that I only find articles written by the author?

Don’t know if this would help, but you could OCR the first page (using SnapScan, or Acrobat if the file is already a PDF) so that you can search its contents, e.g. summary, authors, title, etc.
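
Once the first page has a text layer, a crude stand-in for "search only the top of the page" is to look at just the first few lines of its text. Below is a minimal Python sketch of that idea, run outside DEVONthink. It assumes the first page's text has already been extracted to a string; the function name `author_in_header` and the five-line default are hypothetical choices. Text extraction does not always preserve the visual line order, so treat this as a rough filter, not a position-based search.

```python
def author_in_header(page_text: str, author: str, max_lines: int = 5) -> bool:
    """Return True if `author` occurs in the first `max_lines` non-empty lines.

    This approximates "only the top of the page" by line order in the
    extracted text; it does not measure physical position (pixels) on the page.
    """
    # Drop blank lines so that layout spacing does not use up the line budget.
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    header = " ".join(lines[:max_lines])
    # Case-insensitive substring match, restricted to the header region.
    return author.casefold() in header.casefold()


if __name__ == "__main__":
    # Hypothetical sample text: one authored paper, one paper citing the author.
    authored = "Foraging behaviour in ants\nA. Beesley, K. Lee\nDepartment of Biological Science"
    citing = "A different title\nJ. Smith\nIntroduction\nBackground\nMethods\nAs shown by Beesley (2010)"
    print(author_in_header(authored, "beesley"))  # True
    print(author_in_header(citing, "beesley"))    # False
```

The citing example only matches once the window is widened to six lines, which shows how the choice of `max_lines` trades missed author lists against false hits from the body text.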

DEVONthink has some very powerful search operators. Though it doesn’t have the ability to restrict a search to a portion of a page, some combination of the existing operators might get you to the same result. Look at the detailed listing of operators in Help in the “search operators” topic, and at the search tutorial in Help > Support Assistant.

For example, perhaps your articles consistently use certain words near the author's name. A search for

Beesley NEAR "Department of Biological Science"

might be a structure that would work for you, using the NEAR operator.
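
To make the NEAR idea concrete, here is a rough Python approximation of a proximity test. It is not DEVONthink's implementation: the helper names `near` and `_positions` and the 10-word window are assumptions, and DEVONthink's own rules for counting distance and word order may differ (the "search operators" Help topic is the authority there).

```python
import re


def _positions(words: list[str], phrase: str) -> list[int]:
    """Start indices where `phrase` occurs as consecutive words in `words`."""
    target = re.findall(r"\w+", phrase.lower())
    if not target:
        return []
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]


def near(text: str, a: str, b: str, window: int = 10) -> bool:
    """True if phrases `a` and `b` start within `window` words of each other."""
    # Tokenize once, lowercased, so the match is case-insensitive.
    words = re.findall(r"\w+", text.lower())
    pos_a = _positions(words, a)
    pos_b = _positions(words, b)
    # Any pair of occurrences close enough counts as a hit, in either order.
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


if __name__ == "__main__":
    header = "A. Beesley, Department of Biological Science, University of X"
    print(near(header, "Beesley", "Department of Biological Science"))  # True
```

Because affiliations change, a hit on `near` is only as reliable as the phrase chosen for `b`; a check like this complements, rather than replaces, an author metadata field.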

There are a couple of other options that would work especially well, but would also require additional work on your part to add data to the PDFs. You could add the author’s name(s) to the ‘Author’ field of the PDF (Tools menu > Show Properties…) and then search on that metadata field. You could also add the author’s name(s) to the Comments field and search on comments. This option would let you add the author info to several selected documents at once, whereas editing the Properties info has to be done one document at a time.

Thank you for all of your suggestions. @Korm, your idea is very clever and works well as long as the author does not change affiliation (which often happens in the transition from grad student to postdoc to PI). I did think of putting the authors’ names in the comments/metadata of each file; however, this seems very time-consuming, especially for more than 1,000 files. My workflow is that I use Sente to download the articles with their citation data, and then I index the Sente database into DEVONthink. This last step is necessary because, unlike DEVONthink, Sente does not have a powerful search function. I’m wondering whether Sente or some other citation manager can insert author information into the file metadata automatically or via a script. Does anyone have any info on this? It would be very powerful to be able to retrieve the articles by a given author and then narrow down the results with DEVONthink’s operators (e.g. NEAR) for combinations of keywords.

Seconded. I also use Sente to manage my PDFs and DEVONthink to index them. I would be interested in any solution for scripting this automatically.

I would like to resurrect this suggestion. Restricting a search to the first N pages (or the last M pages) of a document could help find PDFs by author (or by cited authors).
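
Until something like this exists in DEVONthink, the idea can be sketched outside it in Python. This assumes each PDF's text has already been extracted into a list with one string per page; `search_pages` and its `first`/`last` parameters are hypothetical names. Searching the first pages targets the author list, while searching the last pages targets the reference list.

```python
def search_pages(pages: list[str], term: str, first: int = 0, last: int = 0) -> list[int]:
    """Return 1-based numbers of pages containing `term`.

    Only the first `first` pages and/or the last `last` pages are searched;
    with both set to 0, nothing is searched.
    """
    total = len(pages)
    # Indices of the leading pages (clamped to the document length).
    allowed = set(range(min(first, total)))
    # Indices of the trailing pages (clamped so we never go below page 0).
    allowed |= set(range(max(total - last, 0), total))
    needle = term.casefold()
    return [i + 1 for i in sorted(allowed) if needle in pages[i].casefold()]


if __name__ == "__main__":
    # Hypothetical four-page article: author on page 1, citation on page 4.
    pages = ["Title / A. Beesley", "Methods", "Results", "References: Beesley A. (2010)"]
    print(search_pages(pages, "beesley", first=1))  # [1]
    print(search_pages(pages, "beesley", last=1))   # [4]
```

Keeping the two windows as separate parameters lets the same helper answer "papers by this author" (`first=1`) and "papers citing this author" (`last=M`) without changing the matching logic.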

Meanwhile, I will try @korm’s suggestion above of using the NEAR search operator. That’s a clever trick.
