I’d expect that exporting a PDF document from DT to the file system would be a simple copy operation. Apparently, this is not the case – the PDF gets modified on export. See the following output of cksum (the first number being the checksum, the second the file size). The test doc was a PDF with a text layer, and I did nothing with it in DT itself (no tags, no metadata, just importing).
665077346 1967675 File.pdf
665077346 1967675 File-Drop.pdf
2752799228 2192371 File-Export-Files&Folders.pdf
3186470518 2192371 File-Export-Document.pdf
2159412001 2169950 File-Export-PDF.pdf
In order of appearance:
- Original file in the file system
- DT document dragged & dropped into the file system
- DT document exported with the “Files & Folders” entry in the Export sub-menu
- Same document exported with the “Document” entry in the Export sub-menu
- Same document exported with the “PDF document” entry in the Export sub-menu
- Not shown here, but the cksum output of the document in DT’s folder structure is identical to that of the original, which proves that the changes happen on export, not on import.
So, depending on the method to copy a DT document to the file system, we get THREE different sizes and FOUR different PDFs.
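For anyone who wants to repeat the comparison without cksum, here is a minimal Python sketch that groups files by content hash and size. The function names are made up for illustration, and SHA-256 is simply swapped in for the CRC that cksum computes. Like cksum, it reads only the file's data, so extended attributes and Finder comments play no part in the result.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def fingerprint(path):
    """Return (SHA-256 hex digest, size in bytes) of the file's data."""
    data = Path(path).read_bytes()
    return hashlib.sha256(data).hexdigest(), len(data)


def group_identical(paths):
    """Group paths that share the same digest and size."""
    groups = defaultdict(list)
    for p in paths:
        groups[fingerprint(p)].append(str(p))
    return list(groups.values())
```

Calling group_identical() with the five file names from the listing above should put File.pdf and File-Drop.pdf into one group and each of the three exports into a group of its own, assuming the files are as described.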
Question: Is this behavior intentional? If so, what is the rationale behind an export function that modifies the file it exports without being asked to do so? And what are the differences between these PDFs? Why are they getting bigger on export?
In my mind, exporting should behave exactly as drag&drop if no document conversion occurs. I checked the documentation, but couldn’t find anything at all on the topic of “Export” (while importing is explained in depth).
The export adds, for example, extended file attributes (such as the tags) and/or Finder comments; I’m not sure whether cksum takes this information into account.
Cf this thread:
There might be more going on than adding extended attributes on export: the file size increases by a lot. Also, I didn’t add any metadata in DT; I only imported the PDF into the inbox of a test database and exported it again. So there were no comments or tags to be added.
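To put a number on "a lot", here is a small Python sketch of the arithmetic. The sizes are copied from the cksum listing at the top of the thread; the helper name is just for illustration.

```python
ORIGINAL_SIZE = 1967675  # File.pdf, from the cksum listing

EXPORT_SIZES = {
    "File-Export-Files&Folders.pdf": 2192371,
    "File-Export-Document.pdf": 2192371,
    "File-Export-PDF.pdf": 2169950,
}


def growth_percent(original, exported):
    """Relative size increase of the exported file, in percent."""
    return (exported - original) / original * 100


for name, size in EXPORT_SIZES.items():
    extra = size - ORIGINAL_SIZE
    print(f"{name}: +{extra} bytes ({growth_percent(ORIGINAL_SIZE, size):.1f} %)")
```

That works out to an increase of well over 200,000 bytes for each export.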
Both the import/export (via File > Import/Export > Files & Folders…) and drag & drop definitely just copy/move files. Or did you use File > Export > As PDF-Document…?
I explained what I did in detail where I gave the cksum values.
I might be a bit dense here. But the original document was a PDF. What’s the difference between it and its view? Or why would exporting a PDF as PDF modify it?
The only difference is that the view might have unsaved changes. In the end a fresh document is written, similar to the export features in, for example, Pages or Preview.
Of course this could be changed for documents without unsaved changes, but a future version might include additional options (that’s already the case for images), and then the results would be different again.
As reported in my post linked by @chrillek, this happens to “naive” PDFs too, without any change or edit of any kind.
As I said: I didn’t do anything to the PDF. I only imported it: no annotations, no tags, no metadata. And the PDF in the database package is exactly identical to the one I imported (according to cksum, that is), even after exporting.
The whole issue arose because @vixxovs experienced a problem with an exported PDF that the original did not exhibit. And while I’m not convinced that exporting an unmodified document requires re-creating it (instead of copying it), I certainly do not understand why “Export/Files and Folders”, “Export/Document” and “Export/as PDF” create three different documents (two identically sized ones with different checksums and one slightly smaller) – all three being roughly 10 to 11 percent bigger than the original. If it is absolutely necessary to recreate the PDF on export, I’d expect it to be recreated in the same way in all three cases.
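One way to narrow down where the exports differ is to compare them byte by byte. Below is a minimal Python sketch (function names are hypothetical) that reports the offset of the first difference between two files and, for two files of equal size such as the “Files & Folders” and “Document” exports, counts how many byte positions differ. If only a handful of bytes differ, that might point to something like an embedded date or identifier, but that is a guess and not something established in this thread.

```python
from pathlib import Path


def first_difference(path_a, path_b, chunk_size=65536):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the common part: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:  # both files reached EOF without a mismatch
                return None
            offset += len(a)


def count_differing_bytes(path_a, path_b):
    """Count positions at which two equally sized files differ."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    if len(a) != len(b):
        raise ValueError("files differ in size")
    return sum(x != y for x, y in zip(a, b))
```

Since PDF streams are often compressed, a large number of differing bytes would not necessarily mean the visible content changed; it would only show that the files were written differently.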
Not to mention that exporting an MD file (probably) does not change it at all.
Not in my sample case: File > Import/Export > Files & Folders… created a file different from the original (in size and cksum). Cf. the 2nd and 3rd lines in “Exporting a PDF changes the file, whereas drag&drop doesn't”.