I am looking for advice on how to organise a large database (700,000+ HTML files). Each file contains the summary of a report and some descriptors (keywords). Sometimes the subject category is also mentioned in the file.
The first driving idea is not to import the files but to index them, since importing that many files makes DTPO run very slowly; around 100,000 imported files seems to be the practical upper limit here. Indexing looks like a workable solution. The plan so far is to put all the indexed files into a single group (“source database”).
The next idea is to create groups (manually) with titles corresponding to the subject categories, run a search on the “source database” for each subject category, and then replicate the matching files into the corresponding group. Replicants are chosen here because a file from the source database may relate to several subject categories and can therefore appear in several groups at once.
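One way I could imagine reducing the manual work is to pre-compute the category-to-files mapping outside DTPO before building the groups. This is only a rough sketch, not anything DTPO-specific: it assumes the subject categories appear in a `<meta name="keywords">` tag of each HTML file, and the category list passed to `categorize` is a hypothetical example.

```python
import os
from collections import defaultdict
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collects the content of <meta name="keywords"> tags."""
    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "keywords" and a.get("content"):
            self.keywords.extend(k.strip().lower() for k in a["content"].split(","))

def categorize(root, categories):
    """Map each subject category to the HTML files whose keywords mention it."""
    groups = defaultdict(list)
    cats = {c.lower() for c in categories}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            parser = MetaKeywordParser()
            with open(path, encoding="utf-8", errors="ignore") as fh:
                parser.feed(fh.read())
            for kw in parser.keywords:
                if kw in cats:
                    # a file matching several categories lands in several groups,
                    # mirroring the replicant idea above
                    groups[kw].append(path)
    return dict(groups)
```

The resulting mapping could then guide the grouping (or be fed to a script that creates the replicants), so the per-category searches would not have to be run by hand.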
The main problem with this sorting scheme is that it still involves a lot of manual intervention. Is there a better way of doing this? Thanks in advance for any advice.