Finding PDFs with fewer than X words per page

Hello, I have PDFs that are Bates numbered and labeled CONFIDENTIAL ATTORNEYS EYES ONLY in the footer. The Bates numbering and other footer text total 8 words. The substance of many of the PDFs is not OCRed, but I cannot find these PDFs by searching for PDFs with a zero word count, because the footer text added by the producing party (e.g., Bates numbers and Confidential designations) gives them words. The documents range from 1 page to 80 pages. Is there a way to set up a smart folder to find documents with fewer than X words per page?

Thanks much!

When you add a column for Character Count, what’s the value for such a PDF?

It depends on the number of pages. Some have 30 characters; some have a few thousand.

Thanks!

Sorry, but no, you can’t search based on the pages in a document. DEVONthink looks at the raw text of the entire document as one unit.
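Since only whole-document counts are available, one rough workaround is to compute an average outside the search: subtract the footer words from the document's total word count and divide by the page count. The sketch below is only an illustration of that arithmetic. It assumes the footer appears on every page and is always 8 words (the figure given in the question), and the function names and the threshold of 5 are made up for the example.

```python
def body_words_per_page(total_words: int, pages: int, footer_words: int = 8) -> float:
    """Average words per page after subtracting an assumed fixed-size footer.

    Assumption: the same footer (footer_words long) is on every page.
    """
    if pages <= 0:
        raise ValueError("pages must be positive")
    # Clamp at zero in case the footer estimate is slightly too high.
    body_words = max(total_words - footer_words * pages, 0)
    return body_words / pages


def looks_unocred(total_words: int, pages: int,
                  threshold: float = 5.0, footer_words: int = 8) -> bool:
    """True if the document has fewer than `threshold` body words per page."""
    return body_words_per_page(total_words, pages, footer_words) < threshold
```

For example, an 80-page PDF with 640 words in total would be flagged, because all 640 words are accounted for by the footers. This is an average, so a document with a few OCRed pages and many image-only pages could still slip through.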

Can you give me some text that is definitely not in the text of those records? It might be possible to do what you want.

Two types of text in footer:

Either:
CONFIDENTIAL - ATTORNEYS’ EYES ONLY ABC DEF 00000X

or

CONFIDENTIAL ABC DEF 00000X

ABC DEF is the Bates prefix, which I’ve changed to anonymize the matter I’m working on.
00000X is a number

Thanks!

What I have in mind would take some time to test (if it’s possible at all).

Could you try this now:

  • Create a Smart Group with kind:PDF
  • Sort the records by Kind

Then scroll through the list and check whether they all have the same Kind.

  • Is this text all on one line?
  • How many digits does the number have, and is it zero-padded, i.e., 000001 rather than just 1?

I created a smart group where Kind is PDF/PS.
I sorted by Kind, and all of the documents are of kind PDF+Text.

Thanks.

All on one line; six digits; zero-padded.

Thanks.
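With the footer described this precisely (one line, one of two wordings, six zero-padded digits), it can be matched with a regular expression and removed before counting words. The following is a minimal sketch, not a DEVONthink feature: it assumes the text of each page has already been extracted into a list of strings by some other tool, it uses the anonymized prefix "ABC DEF" from this thread as a placeholder, and the function names are hypothetical.

```python
import re

# Assumed footer shape, taken from the two anonymized examples in this thread:
# "CONFIDENTIAL", optionally "- ATTORNEYS' EYES ONLY", then the placeholder
# Bates prefix "ABC DEF" and a six-digit, zero-padded number.
FOOTER = re.compile(
    r"CONFIDENTIAL(?:\s*-\s*ATTORNEYS[’']?\s+EYES\s+ONLY)?\s+ABC\s+DEF\s+\d{6}"
)


def body_word_count(page_text: str) -> int:
    """Count the words left on a page once the footer has been removed."""
    return len(FOOTER.sub(" ", page_text).split())


def sparse_pages(pages: list[str], max_words: int) -> list[int]:
    """Return 1-based page numbers with fewer than max_words body words."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if body_word_count(text) < max_words
    ]
```

A page whose only text is the footer ends up with a body word count of zero, so it would be reported as sparse for any positive `max_words`. The real Bates prefix would need to replace the placeholder in the pattern.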