Does DEVONthink support searching very large text files (over 1 million lines)?

Hi everyone,

I’m currently searching for a tool that can index and search very large text files, often with over 1 million lines in a single document. I recently downloaded DEVONthink and tried indexing several files, some of which are around 20 MB in size and contain more than 1 million lines.

While I successfully set up the index, it didn’t seem like DEVONthink searched through the entirety of the files, especially when dealing with such large texts. I’m unsure if this is a limitation of DEVONthink or if I have configured something incorrectly.

Has anyone encountered similar issues, or does anyone know whether DEVONthink has specific optimizations for handling and searching long text files like these?

Thanks in advance for any insights!

Importing/indexing skips plain text files larger than 256 MB. In addition, the search index of each document is limited to a maximum of 16 MB of pure text.
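For anyone who wants to check their own files against the two limits quoted above, here is a minimal Python sketch. It is not part of DEVONthink. The names `classify` and `parts_needed` are made up for illustration. It assumes that "MB" means 1024 × 1024 bytes (the post does not say which) and that, for a plain text file, the size on disk is a fair stand-in for its "pure text" size.

```python
import math
import os

# Assumption: binary megabytes; the post above does not specify.
MB = 1024 * 1024
IMPORT_LIMIT = 256 * MB  # plain text files larger than this are skipped
INDEX_LIMIT = 16 * MB    # per-document cap on searchable text


def classify(size_bytes: int) -> str:
    """Describe how a plain text file of this size fares under the two limits."""
    if size_bytes > IMPORT_LIMIT:
        return "skipped"
    if size_bytes > INDEX_LIMIT:
        return "partially indexed"
    return "fully indexed"


def parts_needed(size_bytes: int) -> int:
    """Number of pieces a file would need so each stays within INDEX_LIMIT."""
    return max(1, math.ceil(size_bytes / INDEX_LIMIT))


def classify_file(path: str) -> str:
    """Same as classify(), using the file's size on disk."""
    return classify(os.path.getsize(path))
```

Under these assumptions, a 20 MB text file would land in the "partially indexed" bucket and would need two parts to be searchable in full, which may be why the search did not seem to cover the whole file.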


To have a comparison, the DT manual (PDF) is 65 MB. Can you estimate how many “MB of pure text” that is?

Depends on font size, margins and other formatting details.

A rough estimate is 250 to 300 words, or about 1250 to 1500 characters, per A4-sized page without images or graphics.
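To put the per-page estimate next to the 16 MB limit, here is a small back-of-the-envelope calculation in Python. It assumes one byte per character (plain ASCII-like text) and 1 MB = 1024 × 1024 bytes. Neither assumption comes from DEVONthink itself, and `pages_for` is just an illustrative helper.

```python
# Per-page estimate from the post above: 1250 to 1500 characters.
CHARS_PER_PAGE_LOW = 1250
CHARS_PER_PAGE_HIGH = 1500

# Assumption: 1 MB = 1024 * 1024 bytes and 1 byte per character.
INDEX_LIMIT_BYTES = 16 * 1024 * 1024


def pages_for(text_bytes: int, chars_per_page: int) -> int:
    """Whole pages of text that fit into text_bytes at the given density."""
    return text_bytes // chars_per_page


fewest_pages = pages_for(INDEX_LIMIT_BYTES, CHARS_PER_PAGE_HIGH)  # 11184
most_pages = pages_for(INDEX_LIMIT_BYTES, CHARS_PER_PAGE_LOW)     # 13421

print(f"16 MB of text is roughly {fewest_pages} to {most_pages} A4 pages")
```

So, with these assumptions, 16 MB of pure text works out to somewhere around 11,000 to 13,000 pages of plain prose.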

File sizes are usually not a useful indicator. A PDF might contain lots of images and no text at all. Converting the PDF to plain text is the easiest way to figure out the numbers (in the case of the manual, less than 700k).

So the total text of the manual is under 700k, and DT can search 16 MB of text per document?


Then you can add many more features to DT and document them with text :smiley: Thanks for the explanation!

Welcome @submitter
May I ask what kind of files / data you’re trying to process in these files?

A million lines of plain text is a LOT of text. Several multiples more than even such tomes as the Bible or the Tolkien Legendarium.

Another question is why you need all those millions of lines in one file…

I’m curious what these texts actually are.