Phrase searches take forever. Why?

I’ve mentioned this many times before, but nothing has been done about it. Phrase searches in DT and DT Pro are all but useless. A two-word search term in my 31-million-word database, for which I specify “all words” or “any word”, usually returns a set of results in less than one second. But when I use the identical search term as a “phrase”, it takes TWO MINUTES!!!

I’ve always found phrase searches impossible to use in DT. There really is no excuse for this pitiful performance in an era when any multiword search term returns thousands of results from across the web in Google, and appears as fast as I can type it in HogBay Notebook, yet DT/DT Pro takes forever to search a database located right here on this computer.

Am I really the only person who finds phrase searches impossible to use in this otherwise magnificent piece of software? Surely a solution to this lamentable performance is well overdue.

Rollo

rollo:

My main database has slightly over 16 million words, so is about half the size of yours.

I just ran these phrase searches:

chlorinated hydrocarbons - 6 items found in 0.549 seconds

aldous huxley - 8 items found in 0.246 seconds

Aldous Huxley (Exact) - 8 items found in 0.007 seconds

global warming - 668 items found in 2.914 seconds

without accounting for - 2 items found in 5.081 seconds

boolean search operators - 1 item found in 0.164 seconds

boolean operators - 16 items found in 0.089 seconds

the admissibility of expert evidence in court - 1 item found in 1.725 seconds

Computer: iMac G5, 2 GHz, 2 GB RAM. Memory conditions: one 64 MB VM swap file, 612 MB RAM used, 1416 MB RAM free.

Note: The two searches for Aldous Huxley were not done consecutively. An Exact search is faster, as shown here. All the other searches were done with Ignore case checked. I just typed in some search strings that I expected to be in my database. Here are a couple more Exact phrase searches as a check:

Roger J. Williams - 8 items found in 0.112 seconds

Lynton K. Caldwell - 11 items found in 0.042 seconds

(I’ve had the good fortune to have worked with both the above gentlemen.)

Hi Bill

Phrase searches have never been fast for me, and I still have no idea why they work well for you but not for me. I have no complaints about the speed of “all words” and “any word” searches; they’re just fine.

Can anyone else shed some light on this mystery?

Rollo

Just tried another two-word search term. 447 seconds!!!

The same search for “all words” … less than half a second. This disparity is beyond my comprehension.

Rollo

Hi Rollo,
I can confirm your observation. I have a database of 20 million words consisting only of PDF documents. A two-word ‘Phrase’ search took between 119 and 192 seconds (I tried four times with the same phrase, restarting the program in between). The same search with ‘All words’ took 5 seconds.
So I always do an ‘All words’ search first, copy the hits into a separate folder, and then do a ‘Phrase’ search in that folder.
I have no idea how many other people have this problem or whether the developers can solve it. But the program could use the same strategy I use: search for ‘All words’, then do a ‘Phrase’ search among the hits.
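The workaround above is a two-stage search: a cheap "all words" filter, followed by a phrase check that runs only on the documents that survive it. Below is a minimal Python sketch of that idea, assuming a hypothetical positional inverted index (word → document → word positions). The names `build_index`, `all_words_search` and `phrase_search` are illustrative only; this is not how DEVONthink implements searching internally.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-word characters (assumes case-insensitive matching)
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Hypothetical positional index: word -> {doc_id: [positions of the word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def all_words_search(index, query):
    """Stage 1: documents containing every query word, in any order."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for w in words[1:]:
        result &= set(index.get(w, {}))
    return result


def phrase_search(index, query):
    """Stage 2: among the stage-1 hits, keep documents where the words are adjacent."""
    words = tokenize(query)
    hits = set()
    for doc_id in all_words_search(index, query):
        # Positions of each query word inside this candidate document
        positions = [set(index[w][doc_id]) for w in words]
        # Match if some start position p has query word i at position p + i
        if any(all(p + i in positions[i] for i in range(1, len(words)))
               for p in positions[0]):
            hits.add(doc_id)
    return hits


docs = {
    "a": "Global warming is real",
    "b": "warming of the global climate",
    "c": "nothing here",
}
idx = build_index(docs)
print(sorted(all_words_search(idx, "global warming")))  # ['a', 'b']
print(sorted(phrase_search(idx, "global warming")))     # ['a']
```

The position comparison only runs on documents that already contain every word, so in this sketch its cost grows with the number of "all words" hits rather than with the size of the whole database.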
The search was done on an 800 MHz G4 iBook with 384 MB RAM (6 MB free). DEVONthink was using 120 MB of physical memory and 522 MB of virtual memory.
Maybe memory is the problem. I will run the same search at home on my 2x1.8 GHz G5 with 1 GB of RAM and post an update tomorrow.

Kind regards,

Schmalex

FYI:

Database with 6.8 million words (229 groups). PowerBook G4, 1.5 GHz, 1.25 GB RAM.

A two-word search for words I was almost certain did not occur in the same document (under arm) was almost instantaneous.

A second two-word search, for words I was certain did occur in the same document (behavior analysis), took less than 2 seconds.

As promised in my previous reply, here are the numbers for my home system. On a 2x1.8 GHz G5 with 1 GB of RAM, the same two-word phrase search on the same database took 21 seconds. DEVONthink (1.9.4) was using 414 MB of physical RAM and 646 MB of virtual RAM, and 11 MB of RAM were free. Still far from perfect.

Schmalex

1.5 GHz PowerBook G4, 1 GB RAM, database of 14.5 million words: a two-word phrase search returned 272 items in 1.74 seconds. This was using the search window under the Tools menu.

ChemBob

DT actually does this already.

This definitely looks like a memory issue (plenty of free RAM greatly improves the volume caching of OS X). In your case, probably neither the volume cache nor DT has room to operate, and DT and virtual memory accessing the disk at the same time can reduce performance by up to 100 times.

However, v1.1 will improve the speed (by 50%) and reduce the memory usage of phrase searching. The latter might be more important if you’re running under low-memory conditions.

For example, over here (2 GB, using the latest build) I couldn’t get any search to take longer than 0.6 s, even in the worst-case scenario (a case-insensitive search for the two most common words in both contents and all other data). All other test scenarios took 0.1-0.2 s.

So if v1.1 still doesn’t perform as expected, there are really only three solutions:

  • add more memory or reduce the number of unused applications running concurrently
  • search only in selected groups and not in the whole database
  • use faster search settings (e.g. search only contents instead of all, or use Exact instead of case-insensitive searching)

Well, I don’t know how many millions of words I have, but my database is about 80 GB (yes, the DEVONagent database). I tried some searches after reading about the problems here, and I can’t understand what’s wrong. DT gives me results in one or two seconds or less when searching for phrases like “gestern war ich einkaufen” (“yesterday I went shopping”) or “als ich das letzte mal mit im gesprochen habe” (“when I last spoke with him”; 115 hits). So it must really be a memory problem, or wrong search settings. By the way, I tried Spotlight, and it found the stuff faster…