I work in the IT industry and make use of RFCs all the time. I have imported all 4000+ RFC docs in plain text format. This worked well for about a day, but ever since then I get a beachball for at least 2 minutes, upwards of 20 minutes, whenever I go to classify a new doc.
I don’t think my database is really that large apart from the RFC import. Any idea what I could do to improve the speed without removing the RFC docs?
I have a fairly large number of Text, HTML, and PDF documents in my DT database. I’ve found that after adding a hundred or so documents, it helps to run the ‘Backup and Optimize’ command.
Also, try to break the RFCs up into subgroups (ideally by subject matter via AutoGroup, though grouping every 10 RFC #s together would work). When I started building my database, I had very rough groups (e.g. ‘programming’, ‘mathematics’), and found as these groups got larger (several hundred items), moving documents into them started to take a while. I bit the bullet and organized the groups into a tree hierarchy of 3 or 4 layers, and performance improved noticeably.
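For anyone facing the same bulk import, here is a rough sketch of the “every 10 RFC #s” idea: bucket the text files into subfolders by number range *before* importing, so each group stays small. This is my own throwaway script, not a DT feature, and it assumes the files are named like `rfc1234.txt`:

```python
import os
import re
import shutil

def bucket_rfcs(src_dir: str, dst_dir: str, bucket_size: int = 10) -> dict:
    """Move files like 'rfc791.txt' into subfolders such as 'rfc790-799'.

    Returns a mapping of bucket name -> list of filenames moved into it.
    """
    moved = {}
    for name in os.listdir(src_dir):
        m = re.match(r"rfc(\d+)\.txt$", name, re.IGNORECASE)
        if not m:
            continue  # skip anything that isn't an RFC text file
        num = int(m.group(1))
        lo = (num // bucket_size) * bucket_size  # lower bound of the range
        bucket = f"rfc{lo}-{lo + bucket_size - 1}"
        os.makedirs(os.path.join(dst_dir, bucket), exist_ok=True)
        shutil.move(os.path.join(src_dir, name),
                    os.path.join(dst_dir, bucket, name))
        moved.setdefault(bucket, []).append(name)
    return moved
```

Point the import at the resulting folder tree and DT will recreate the subgroups, so no single group ends up holding thousands of items.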
I should point out that the machine I am using for most of my DT work is pretty powerful (quad G5, 8 GB RAM, with the database stored on an internal SATA non-boot drive with > 100 GB free). When I see the pinwheel of doom, it is cause for concern.
It would certainly be worth investigating what database structures make DT perform better. I know Bill recommends multiple small project-specific databases, but I’d say 3/4 of my database is reference material that is applicable across most of my projects. It would be nice to know whether database breadth (many groups at the same level) vs depth (many nested groups), average number of records in a group, type of record (e.g. PDF vs RTF vs HTML vs sheets), or size of record (e.g. a 4MB PDF vs 10 400K PDFs) has an effect on performance.
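A back-of-the-envelope argument for why depth seemed to help me: classifying against a flat set of N groups means scoring every group, while a tree with branching factor b only needs roughly b comparisons per level over about log_b(N) levels. This is purely my own illustration of the scaling, not anything to do with how DT’s AI actually works internally:

```python
import math

def flat_comparisons(n_groups: int) -> int:
    """Flat layout: the classifier must score every group."""
    return n_groups

def tree_comparisons(n_groups: int, branching: int) -> int:
    """Tree layout: score ~branching candidates per level,
    descending about log_b(n_groups) levels."""
    depth = math.ceil(math.log(n_groups, branching))
    return branching * depth

# 400 groups flat vs. a tree with ~8 children per node:
# flat_comparisons(400) -> 400, tree_comparisons(400, 8) -> 24
```

Even if DT’s real behavior is more complicated, the gap between “hundreds of candidates” and “a couple dozen” matches the speedup I saw after reorganizing.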
Hi, Jeremy: You didn’t say how much RAM you have. AI operations such as Classify are memory-intensive. Think about what’s going on as DT Pro compares the text of a document to the text patterns in your various groups.
What’s happening when you see the beachball is that your free RAM is gone and data swapping is taking place to and from Virtual Memory swap files on disk.
As suggested above, running Backup & Optimize following addition of a lot of new content helps “compact” the data.
You will also find that after running Classify you can speed things up again by quitting and relaunching DT Pro. And of course you can clean up all those VM swap files by restarting the computer. (Or by running a cache cleaning utility as well, such as OnyX or Cocktail.)
As Christian has noted elsewhere, DT Pro version 2.0’s database structure will be more efficient in use of memory.
Oops… I just remembered you answering this before… there’s an option to schedule Backup & Optimize in the Preferences, but I remember you saying it’s currently disabled. I assume this is still the case?
Check the preferences again. You can set a preference for the frequency of Backup & Optimize to the internal Backup files inside the database package – hourly, daily, weekly, monthly or never.
What’s grayed out is the preference option to do an external backup in this way. I prefer using the Scripts > Export > Backup Archive option for that, anyway.
Personally I don’t depend on a set schedule to decide when to run Verify & Repair followed by Backup & Optimize. I will run those routines whenever I’ve been making substantial changes to the database, so that I’m assured of having a recent backup.