Need some advice (long post)

I’ve been trying out DEVONthink (Dt) for a while now, and while I’m continually impressed by its abilities, I’m also quickly getting the impression that my initial needs (or rather “wants”) are beyond Dt’s capabilities. I was hoping that if I outline my wants, someone could help me out with how Dt could be used to achieve them. I’ve now spent a lot of time trying to determine an optimal usage scenario for Dt, and either I’m not understanding it 100%, or my needs are beyond the scope of the personal edition of Dt (but the Pro info on the website hasn’t answered these questions for me).

I’m a pack rat when it comes to data and information, and I have been into PCs for a while now, so I’ve amassed a rather sizable collection of files to deal with. I’m also a developer, and want to be able to quickly glean which file would be best for me to peruse when I need to find a solution to a particular problem. This goes for my other interests as well, whether it’s indexing fictional texts, wallpaper images, or my personal and financial documents.

As such, I have about 6.5 GB of ebooks I want to index in Dt. The ebooks consist primarily of PDFs, but also contain a lot of “indexed” HTML (meaning that it’s a local copy of a website, or a series of HTML-based manuals, similar to the O’Reilly developer series). These ebooks are arranged on the hard drive in a manner similar to the following example, and are often mixed in with web clippings and articles I’ve archived from online sources:

/Manuals and Instructional/Software/Development/PHP/PHP5

I was able to just drag a large series of these over, and into Dt, but somewhere along the way, they stopped copying accurately, meaning that if I dragged a group of files over from my hard drive, to link to Dt, not all of the files would end up linked.

At this point, it appears that I have likely hit the 10000 limit of the personal edition of Dt, but I can’t find anything that’s telling me I’ve hit this limit. Dt simply “beeps” at me (sometimes… Other times it just takes the focus, and does nothing) when I try adding more to the system. I can see that I have “x” number of images in the database, along with a myriad of other information, but I’ve no idea about the PDFs. Perhaps this is a limitation of the trial version, but again, I can’t find anything specifically telling me what the problem is.

Another problem I’m running into is with the speed of Dt. My database of development-related ebooks is now over 41000 objects (a mixture of PDFs, web pages/sites, images, and related text notes and such), and Dt really seems to slow down at times. I don’t see the CPU utilization go over 50% on one of my 2 processors, so I don’t see it as a bottleneck with the system… Can I expect similar slowing as my collection continues to grow? I can deal with some slowness… It’s very impressive that Dt is able to do what it does already, and I expect that larger amounts of files will slow it down somewhat, but the slowdown seems to be really bad at times.

This leads me to thinking about other ways to break down my documents… Ideally, I’d like to keep all my documents with Dt, or a similar program (Dt at this point), from personal documents, to ebooks and reference material (PDFs, saved web pages/sites, etc), to resource material (images, videos & audio files, etc), all arranged under some rather broad root folders (again, “Development”, “Fiction”, “Reference”, “Personal”, etc), which would allow me to quickly narrow my search down by topic.

Anyway… My hope is that Pro will let me do this, as currently the file limitations of the personal edition are simply lower than the number of files I intended on indexing. What will be the hard limits of the Pro version? Are there any file size, quantity, or other limitations?

I’ve considered breaking my subjects up into separate databases. But I really have a problem when it comes to Development, for example. Ideally, I’d like to have my code snippets, notes, reference material, development-related ebooks, images, and so on, all housed in one database, along with information for specific projects, housed in the same database, but clearly segregated from the reference material, resources, etc.

This is certainly possible now, but considering the reference material, what if I’d like to reference the same ebook/reference material in a separate ebook-only database? There doesn’t appear to be a way for me to share document information between databases, and it seems a big waste of space to duplicate this data.

Again, I’d prefer to do this all within one Dt database, rather than split them up, but are Dt’s speed and limitations going to allow me to do this? Another problem with this scenario is that there’s no quick and easy way to switch databases. I can redirect the focus of Dt, and then close and re-open the program, but this is rather time consuming, especially when considering how slow my start time is on my large database. I’ve also thought about trying to achieve this by pointing Dt’s focus to a linked folder, rather than the folder itself. I can then overwrite this link with different links, and then relaunch Dt, but again, this seems slow and problematic.

I do like Dt a lot, but am leery of spending the time that it will take to correctly arrange, import, and sort my wealth of documents on a product which isn’t designed for someone with my particular needs.

One more item I was wondering about: I am considering the purchase of a PowerBook, and would like to share “some” data between a PowerBook-based database and a home-based Dt install, while keeping additional items, both on the PowerBook and on the home PC, separate from one another. Basically, I’d like to be able to dock my PowerBook to my home network, and have these shared sections sync up to each other, while keeping their location-specific sections untouched and intact. I don’t see an obvious way to do something like this, nor any indication that the Pro version will. Is this a really odd request, or am I missing something obvious?

Again, I really do like Dt, and think that there must be some way to tailor it around my needs. Sorry for the long post, but I wanted to be as clear as possible. :smiley:

Hello,

I have used DT personal edition for a while but I am certainly not an expert. I see you have not had a reply to your very comprehensive topic. While I use DT, I also use Hog Bay Notebook, which is an excellent application. If you have yet to decide whether to buy DT Pro, which may cope with your clearly huge archive of data, then go to the developer of Hog Bay, Jesse. He has an excellent rapport with his ‘Customers’. Just yesterday I suggested a considerable feature request and he implemented it the same day. Ask him if Hog Bay Notebook will cope with your data. I take it you have also by now asked the nice people here at DT, who also provide very good feedback.

hogbaysoftware.com/products/ … tebook.php

hardcat

WhyNot:

I see that you’ve not gotten much response to your 3 December 2004 posting.

While I can’t address all of your points, here’s a stab at several of them.

[1] DT has quit importing all of the material dragged in. Sometimes it accepts new material, sometimes it beeps, and sometimes it doesn’t seem to respond.

I can’t be certain that you’ve hit the 10,000 limit on PDFs and images, but that’s a possibility. Downloaded Web sites can contain gazillions of images. If that’s the problem, DT Pro may be the answer. (Beta testers are running DT Pro 1.9 beta 8 now. Christian’s list of unfinished features is pretty small, so a public beta is probably on the near horizon.)

It’s also possible that DT is hanging up on corrupted files, or on files that it doesn’t know how to deal with (e.g., AppleWorks, WordPerfect) that may be in your folder content. If that’s the problem, try importing a few files at a time from a folder and/or inspect the folder contents for possible problematic file types.
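If you want to check both possibilities before dragging anything else in, here is a minimal Python sketch you could run on the source folder (outside DT). It counts PDFs and images against the 10,000 figure and lists files that might trip up an import. The extension lists, the zero-byte check as a hint of corruption, and the function name `survey_folder` are my own assumptions for illustration; none of this is something DT itself defines.

```python
import os
from collections import Counter

# Assumed extension lists for illustration only; DT's own rules may differ.
PDF_AND_IMAGE_EXTS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp"}
SUSPECT_EXTS = {".cwk", ".wpd"}  # e.g. AppleWorks, WordPerfect


def survey_folder(root, limit=10000):
    """Count files by extension and collect files that may cause import trouble."""
    counts = Counter()
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
            try:
                empty = os.path.getsize(path) == 0
            except OSError:
                # Unreadable entries (e.g. broken links) are worth a look too.
                empty = True
            # A zero-byte file is only a cheap hint that something is wrong.
            if ext in SUSPECT_EXTS or empty:
                suspects.append(path)
    pdf_image_total = sum(n for ext, n in counts.items() if ext in PDF_AND_IMAGE_EXTS)
    return {
        "counts": counts,
        "pdf_image_total": pdf_image_total,
        "over_limit": pdf_image_total > limit,
        "suspects": sorted(suspects),
    }
```

Calling `survey_folder` on the top-level ebook folder and looking at `pdf_image_total` and `suspects` should at least tell you whether the count or the file types are the more likely culprit.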

[2] DT occasionally slows down. I’ve got a pretty large database, and I do sometimes see this behavior. Phrase searches, classification and “See Also” operations by DT are memory intensive, which triggers buildup of VM files. In other words, processes get swapped from RAM to VM, and changing processes requires a lot of disk access.

When things slow down enough to bother me, I run the Verify & Repair tool. Then I start Backup & Optimize and take a break for a few minutes – Backup & Optimize takes enough time for me to go get a cup of coffee. Presto – DT is up to speed again.

[3] Should everything be in one big database? Perhaps not. That’s where DT Pro offers possibilities.

Right now I’ve got 6 DT Pro databases. My main working database has evolved over more than two years and contains a number of topics.

One of my other databases is a current experiment in “pruning” the main database to separate out one big topic that has little relationship to the others. I simply did a File > Export > Files & Folders export of that group, then imported it into a new DT Pro database. (But I haven’t yet deleted that topic from my main database.)

A third DT Pro database has been put together from searches of my main database on selected topics. Here’s how I did that: I was researching some environmental policy issues. For each search, I created a new group, then selected the relevant search hits and replicated them into the group folder. Then I did a File > Export > Files & Folders export of the new groups (afterwards, I deleted them). Finally, I created a new DT Pro database and imported this material into the new database. (Replicating search hits can also be useful in reorganizing classifications in my main database.)

The other three DT Pro databases are mailing list archives on special topics such as the Panorama database, the Newton, and hybrid cars.

Your description of your database organization implies that you might very well split your DT PE database when you move to DT Pro.

[4] The current DT Pro beta doesn’t allow concurrent operation of multiple databases. To change databases, you must close the open db before loading a different one. But DEVONtechnologies indicates that concurrent multiple DT Pro databases will come in the version 2.x series. (Obviously, the larger the multiple open dbs, the more intensive demands on RAM and disk activity.)

[5] Can data be moved among multiple DT Pro databases? Yes, using the Export/Import procedures as noted in [3] above. Presumably, when multiple databases can be open at the same time, copy/paste (perhaps even drag & drop) will be possible.

[6] I’m running DT Pro beta on a four-year old TiBook 500 MHz with 1 GB RAM. I remain in awe over how fast most operations work on this computer. But as DT continues to get smarter (and more CPU intensive), one of these days I’ll move to a G5.

Bill …

I’ve noticed that DT struggles with large text files. I exported my old outgoing mail files in mbox format and then converted them to a .txt. The file was 50 MB … :wink: . DT spent more than 5 minutes simply bringing the file in. (1.5 G4 512) Not faulting DT here … it IS a big task.

Have you found a limit to convenient file size??

THX

frankns:

[1] I see a slight delay in opening files of 100 KB, and the delay increases with file size. At 400 KB a file will take a few seconds to open. (I’m running a 500 MHz TiBook.)

I’ve imported hundreds of big PDF files. Two or three produced text file sizes of 1 MB or more, using pdftotext conversion. Opening those files was slow. I cheated by selecting the text and choosing Services > DEVONthink > Summarize to reduce file size. Nowadays, I usually use Index import for large PDF files. That results in smaller text captures. (But there’s a downside if you use Phrase searches, which can take a long time.)
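For anyone who wants to spot the big files before importing, here is a rough Python sketch that lists text-like files above a size threshold, largest first. The 400 KB default just echoes the figure above where opening starts to take a few seconds on my machine; the extension tuple and the function name `large_files` are assumptions for illustration.

```python
import os


def large_files(root, threshold_kb=400, exts=(".txt", ".rtf", ".html")):
    """Return (size_kb, path) pairs for files larger than threshold_kb, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only look at the (assumed) text-like extensions.
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            try:
                size_kb = os.path.getsize(path) / 1024
            except OSError:
                continue  # skip files that cannot be read
            if size_kb > threshold_kb:
                hits.append((size_kb, path))
    return sorted(hits, reverse=True)
```

Anything it turns up is a candidate for summarizing, splitting, or Index import rather than a straight import.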

[2] I use Entourage, and use a script to capture selected messages into DT. These are small plain text files (no attachments are captured).

I’ve got DT Pro databases for my mail archives, and they hold several hundred MB. But because the individual files are small, performance is blazing fast.

There’s a similar script for capturing selected messages from Mail. The script churns along rather slowly, but it works. Again, capturing messages as individual files results in fast opening within DT.

DT was rather slow on a TiBook which had only 256 MB RAM. I now have an iMac G5 with 1 GB RAM, and an iBook G4 with 1.25 GB RAM. The difference in processor does not seem to count for much; by contrast, the memory leap has sped up DT considerably. On both machines it runs nicely. It still slows down occasionally, I guess because I have a few files which it cannot deal with very well. I am currently testing what effect deleting some files on my iBook has (whilst preserving them on the iMac).

Nonetheless, I have also split my database into, currently, 3 DT Pro ones.

BTW, for email searching I use PowerMail, which is incredibly fast in searching and has all kinds of nice search options in addition. Irreplaceable.

I use GyazMail and it will export a standard UNIX mbox … but this cats the separate messages into one file. Given that when you import the mbox, these messages are split apart again … I’m wondering if there is a utility that would do this on the standalone file.

F
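In case a script is an option: Python’s standard `mailbox` module can read an mbox file, so splitting one into individual messages takes only a few lines. This is a minimal sketch, assuming the export really is a standard mbox; the function name `split_mbox` and the `message-00001.txt` naming scheme are made up for illustration, and attachments are left embedded in each message as they were.

```python
import mailbox
import os


def split_mbox(mbox_path, out_dir):
    """Write each message in an mbox file to its own numbered .txt file."""
    os.makedirs(out_dir, exist_ok=True)
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    written = []
    try:
        for i, key in enumerate(box.iterkeys(), start=1):
            out_path = os.path.join(out_dir, f"message-{i:05d}.txt")
            with open(out_path, "wb") as out:
                # get_bytes returns the raw message without the "From " separator line.
                out.write(box.get_bytes(key))
            written.append(out_path)
    finally:
        box.close()
    return written
```

The resulting folder of small files can then be imported in one go, which should open much faster than a single 50 MB text file.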

I’ve found and begun experimenting with Emailchemy. It seems to be capable of going into an existing mbox and converting it to standard .txt files.

See: weirdkid.com/products/emailchemy/index.html