Where are all the Bytes?

One question concerning the size of files stored in DT databases. (I’m a DTPO user.)

I found that PDFs in a DT database, for example, appear very small, say 60 KB or so. When I export one from the database, e.g. to the desktop, the resulting file is much bigger: 200 to 500 KB, and sometimes several MB.

How is this possible?

Thanks in advance.


You have probably been looking at aliases to the PDFs, not the actual PDFs.

I have looked at the column which shows the size of the PDF in the database.

What sense would it make to show the size of aliases?

Those sizes reflect the actual text content stored in the database. If you go to the Info panel you’ll see the file size there.

Ah, okay, thank you. But what’s the use of this feature (showing the size of the text content stored in the database)?

That indicates the memory needed as a result of adding that file to your database.

There are currently two sizes:

  1. Size of contents (including metadata) inside the database
  2. External file size

But these will be unified in a future release anyway, as the distinction has been more or less obsolete since DT Pro 1.1. That version simplified and unified the various import/index/link/copy/don’t copy options and commands.

Thanks again for your fast reply.

One last question just to clarify:

Does that mean that I have two “versions” of my PDF: one somewhere (where?), and the contents in the database?

I just want to know this to understand the architecture of DT.

Not exactly, no. There are not two ‘versions’ of the PDF file, only one; that PDF file is either external to the database (Index-captured) or copied into the database package file (Import-captured).

But DT has read the text content of the PDF, integrated it into the database Concordance, and holds metadata about that file (the stuff in the file’s Info panel). Think of it as the database’s “overhead” that now makes it possible to search the text, analyze the text content and perform AI functions such as Classify and See Also.
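To make the two sizes concrete, here is a rough Python sketch of the idea. It is only an illustration: the `Record` class, its fields, and the way sizes are counted are all hypothetical and do not describe DT's actual storage format. It just shows how one file can have a large size on disk while the text and metadata held for it in the database are much smaller.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical model of one database entry (not DT's real format)."""
    name: str
    file_bytes: bytes      # the single PDF file, indexed or imported
    extracted_text: str    # text content read from the PDF
    metadata: dict         # e.g. what the Info panel shows

    def file_size(self) -> int:
        # Size of the PDF file itself.
        return len(self.file_bytes)

    def content_size(self) -> int:
        # Size of the text plus metadata kept for searching and classifying.
        text = len(self.extracted_text.encode("utf-8"))
        meta = sum(
            len(str(key).encode("utf-8")) + len(str(value).encode("utf-8"))
            for key, value in self.metadata.items()
        )
        return text + meta


# A made-up PDF: about 300 KB on disk, but only about 50 KB of actual text.
record = Record(
    name="report.pdf",
    file_bytes=b"%PDF" + bytes(300_000),
    extracted_text="word " * 10_000,
    metadata={"title": "Report"},
)

print(record.file_size())     # 300004
print(record.content_size())  # 50011
```

In this picture there is still only one PDF; the smaller number is the overhead the database keeps about it, which would correspond to the size column, while the larger number corresponds to the file size in the Info panel.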