Hi,
I seem to remember a thread dealing with the task `record.data = <PDFDocument>` (in JavaScript parlance) or `set record's data to <PDFDocument>` (in AppleScript?). Unfortunately, I can’t find that thread anymore. If some kind soul could point me to it (@pete31, perhaps?), I’d be grateful.
In the meantime, I tried this:

```javascript
const pdfData = $.PDFDocument.alloc.initWithData(decodedData);
const record = app.createRecordWith({name: filename, type: 'pdf'});
record.data = pdfData;
```

(The whole script is a bit too long, and I’m only interested in this part here.) That code should (I think) create a new PDF record with the `PDFDocument` created in the first line as its content. However, the record doesn’t show anything, and its size is 0.
Now, if I write the `decodedData` to a file and import that into DT, everything is fine (i.e. I get a PDF record, and it has the right content). Also of interest: the number of pages in `pdfData` is 1, as it should be. So it seems that the `PDFDocument` produced by `initWithData` is valid. Shouldn’t it then be possible to assign it to the `data` property of the record?
Thanks. But I’m still stuck. Here’s what I’m trying to do:
- take an EML file stored in DT
- for all the attachments in it (well, PDF, images and HTML)
- create a new record in DT containing just the attachment
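The first steps of that list, i.e. finding the MIME boundaries and splitting the message into its parts, can be sketched in plain JavaScript without the ObjC bridge, so it should run in JXA as well as in Node. `splitMimeParts` and `escapeRegExp` are hypothetical helpers, not DEVONthink or JXA API. The sketch assumes boundaries are declared as `boundary="..."` (quotes optional); it escapes regex metacharacters in them and uses a non-capturing group, because `String.prototype.split` would otherwise copy captured text into its result.

```javascript
// Hypothetical helper: escape characters that have a meaning in regexes
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: split a raw EML string into its MIME parts
function splitMimeParts(content) {
  // Normalise CRLF line endings, which are common in .eml files
  const text = content.replace(/\r\n/g, '\n');
  // Collect every boundary declared via boundary="..." (quotes optional)
  const boundaries = [...text.matchAll(/boundary="?([^";\n]+)"?/gi)].map(m => m[1]);
  if (boundaries.length === 0) return [];
  // Match "--boundary" and the closing "--boundary--" as whole lines;
  // the group is non-capturing so split() doesn't return the boundary strings
  const alternation = boundaries.map(escapeRegExp).join('|');
  const delimiter = new RegExp(`^--(?:${alternation})(?:--)?[ \\t]*$`, 'm');
  return text.split(delimiter)
    .map(part => part.replace(/^\n+|\n+$/g, '')) // trim surrounding newlines
    .filter(part => part.length > 0);            // drop empty preamble/epilogue
}
```

Each returned part still contains its headers and its body, separated by the first empty line.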
Everything works OK if I write the attachment data to a file and then import that. But the direct way, i.e. creating a new record and assigning to its `data` property, gives only empty records. The binary attachment data (i.e. after decoding from Base64) is stored in `decodedData`. And either

- `record.data = decodedData`, or
- `record.data = $.PDFDocument.alloc.initWithData(decodedData)`

results in an empty record.
What does work, though, is copying the `data` property from one record to another one (as described in the quoted thread). It seems that `data()` gives a string representation (something like “****($…”)). If that is what the `data` property expects, too, one can obviously use neither a `PDFDocument` nor `decodedData`.
It doesn’t; these string representations used by JXA are now actually supported. The complete source might be useful, as `record.data = decodedData` should work.
That’s a bit long… Anyway, you’ll have to replace the UUID with the one of an EML record containing at least one PDF attachment.
```javascript
ObjC.import('Foundation');
ObjC.import('PDFKit');

/* Associate Content-Type with a DT record type. This is currently
   only used to weed out unsupported types */
const typeFromMIME = {
  'application/pdf': 'PDF Document',
  'image/jpeg': 'image',
  'image/jpg' : 'image',
  'image/png' : 'image',
  'image/tiff': 'tiff',
  'text/html' : 'html'
};

(() => {
  const app = Application("DEVONthink 3");
  app.includeStandardAdditions = true;
  /* For testing: get the filesystem path of a fixed DT record */
  const path = app.getRecordWithUuid("%3C4B9A77CE-DC90-4917-822D-377BE19325A0@bru6.de%3E").path();
  const error = $();
  /* Read the content of the record into an NSString object, return a JavaScript string */
  const content = $.NSString.stringWithContentsOfFileEncodingError($(path), $.NSUTF8StringEncoding, error).js;
  /* Find all boundaries declared in the message */
  const boundaries = [...content.matchAll(/boundary="?(.*?)"?;?\n/g)];
  if (boundaries.length < 1) {
    console.log(`No boundary found in EML`);
    return;
  }
  /* Build a regular expression matching all boundary lines. Escape regex
     metacharacters (e.g. "+" or ".") and don't capture, so that split()
     doesn't add the boundary strings to its result */
  const allBoundaries = boundaries
    .map(b => b[1].replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('|');
  const boundaryRE = new RegExp(`^--(?:${allBoundaries})\n`, 'm');
  /* Split the content at the boundaries */
  const parts = content.split(boundaryRE);
  /* parts now contains the whole message, i.e. body & attachments. Loop over them */
  parts.forEach(p => {
    /* Split the current part at the first empty line (two consecutive newlines) */
    const subparts = p.split(`\n\n`);
    /* Split the first subpart into lines, store them in header */
    const header = subparts[0].split(`\n`);
    /* Save the main part of the current part in body */
    const body = subparts[1];
    /* Handle attachments: the first header line must contain Content-Disposition */
    if (/Content-Disposition: (inline|attachment);/.test(header[0])) {
      /* Get the header lines with the raw filename and MIME type */
      const filenameRaw = header.filter(h => /filename=/.test(h))[0];
      const mimeTypeRaw = header.filter(h => /Content-Type:/.test(h))[0];
      /* Extract filename and MIME type; the latter ends at ";" or at the end of the line */
      const filename = filenameRaw.match(/filename="?([^"]*)"?/)[1];
      const mimeType = mimeTypeRaw.match(/:\s*([^;\s]+)/)[1];
      /* Get DT's record type corresponding to the current MIME type */
      const DTtype = typeFromMIME[mimeType];
      if (DTtype !== 'PDF Document') {
        /* For now, ignore everything except PDF attachments */
        console.log(`MIME type ${mimeType} not supported`);
        return;
      }
      /* Decode the body of the attachment into an NSData object.
         Remove the last boundary first, otherwise the decoding will fail */
      const decodedData = $.NSData.alloc.initWithBase64EncodedStringOptions($(body.replace(/^--.*--$/m, "")), $.NSDataBase64DecodingIgnoreUnknownCharacters);
      const PDFDoc = $.PDFDocument.alloc.initWithData(decodedData);
      const record = app.createRecordWith({name: filename, type: DTtype});
      record.data = decodedData; // gives an empty PDF
      record.data = PDFDoc;      // doesn't work either
    }
  });
})();
```
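A more tolerant way to get the filename and MIME type from a single part might look like the following plain-JavaScript sketch. `parseAttachmentHeader` is a hypothetical helper; it assumes `\n` line endings (convert `\r\n` first), unfolds continued header lines, does not require `Content-Disposition` to be the first header line, and accepts a `Content-Type` without a `;`.

```javascript
// Hypothetical helper: extract disposition, MIME type and filename from one
// MIME part, or return null if the part is not an inline/attachment part
function parseAttachmentHeader(part) {
  // Headers end at the first empty line
  const headerBlock = part.split('\n\n')[0];
  // Unfold continuation lines (a newline followed by spaces or tabs)
  const headerText = headerBlock.replace(/\n[ \t]+/g, ' ');
  const disposition = headerText.match(/^Content-Disposition:\s*(inline|attachment)\b(.*)$/im);
  if (!disposition) return null; // not an attachment
  const contentType = headerText.match(/^Content-Type:\s*([^;\s]+)(.*)$/im);
  // filename= usually sits in Content-Disposition, name= in Content-Type
  const nameMatch = disposition[2].match(/filename="?([^";]+)"?/i) ||
                    (contentType && contentType[2].match(/name="?([^";]+)"?/i));
  return {
    disposition: disposition[1].toLowerCase(),
    mimeType: contentType ? contentType[1].toLowerCase() : 'application/octet-stream',
    filename: nameMatch ? nameMatch[1] : null,
  };
}
```

Inside the `forEach`, something like `const info = parseAttachmentHeader(p); if (!info) return;` could then take the place of the `header`/`filenameRaw`/`mimeTypeRaw` lines.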
Does the decoding definitely work? I just saw this in the Console:
2022-07-26 16:15:41.501 DEVONthink 3[96489:3759813] setData (DTRecord <2CF1BC43-2FBD-4BDA-8157-AC820050A093> (/Test.pdf/)): Invalid image.
Just had a closer look: the data received by DEVONthink contains only 60 bytes over here, whereas it should be 5 MB. It’s somewhat similar to the stuff here…
http://mail.machomeautomation.com/pipermail/xtensionlist/2016-June/008168.html
…containing `dle2`, `reco` or `usrflist` too, but useless for DEVONthink.
As I said above: writing `decodedData` to a file results in the correct PDF. And `PDFDocument.pageCount` returns 1, which is also correct. Both seem to indicate (to me, that is) that `decodedData` is in fact a working binary PDF.
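Independent of PDFKit, a quick check is whether the Base64 text starts like a PDF: every PDF begins with `%PDF-` followed by a version number, which Base64-encodes to `JVBERi0`. A minimal plain-JavaScript sketch; `looksLikeBase64Pdf` is a hypothetical name, and since it only inspects the first few bytes, it says nothing about whether the rest of the file is intact.

```javascript
// Hypothetical helper: does this Base64 text decode to something that
// starts with "%PDF-<digit>"? Whitespace and line breaks are ignored.
function looksLikeBase64Pdf(base64) {
  const clean = base64.replace(/\s+/g, '');
  // "%PDF-" followed by any byte below 0x40 (e.g. a digit) encodes as "JVBERi0"
  return clean.startsWith('JVBERi0');
}
```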
I’m not sure what the error message means and where the file name in it comes from. I’ve tested it here with a single EML containing two PDF attachments. It’s conceivable that the splitting etc. doesn’t work correctly for all EMLs, though. If `body` is too small (i.e. less than 5 MB), that’s an indication that something before didn’t work.
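Because `body` is still Base64, its length is not the PDF’s size: every 4 Base64 characters carry 3 bytes, so a 5 MB PDF turns into roughly 6.7 MB of Base64 text (plus line breaks). The arithmetic as a hypothetical plain-JavaScript helper, assuming the closing boundary has already been removed from `body`:

```javascript
// Hypothetical helper: number of bytes a Base64 string decodes to.
// 4 characters encode 3 bytes; each trailing "=" stands for one missing byte.
function decodedBase64Length(base64) {
  // Ignore line breaks and anything else outside the Base64 alphabet
  const clean = base64.replace(/[^A-Za-z0-9+/=]/g, '');
  const padding = clean.endsWith('==') ? 2 : clean.endsWith('=') ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}
```

Comparing this number with the length of `decodedData` (presumably available as `decodedData.length` through the ObjC bridge) would show whether the decoding step itself loses data.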
I’ll just send you my EML file for testing, ok?
Sure. Although I assume it’s more likely another weird JXA conversion issue, and that’s why writing the file works but setting the data doesn’t.
Same results with your email: a valid NSData object, but containing just 60 bytes. But I’m not sure if it’s really JXA-related or just because the AppleScript suite is not really intended for Objective-C objects created via AppleScript or JXA.
And I’m just getting a zero-byte PDF from this script.
That’s exactly what I’m seeing here, too. But (see above): when I write the data to a file (with the Foundation framework stuff, that is), I get a working PDF.
That might well be the case. So let’s just ignore it – at least there’s a workaround by writing the stuff to a file and importing that.
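For reference, the workaround could be wrapped in a function along these lines. This is a sketch for JXA only (it needs the ObjC bridge and a running DEVONthink), and `importDataViaTempFile` is a hypothetical name. The DEVONthink calls are assumptions: that the `import` command is available in JXA as `app.import(path, {to: group})` and that the imported record’s `name` can be set afterwards. The temporary file is deleted after the import, on the assumption that DEVONthink copies it into the database.

```javascript
// Sketch of the workaround: write the decoded NSData to a temporary file
// and let DEVONthink import it. JXA only; DEVONthink calls are assumed.
function importDataViaTempFile(app, data, filename, group) {
  ObjC.import('Foundation');
  // Unique temporary path: <temp dir>/<UUID>-<filename>
  const dir = $.NSTemporaryDirectory();
  const name = `${$.NSUUID.UUID.UUIDString.js}-${filename}`;
  const path = dir.stringByAppendingPathComponent($(name)).js;
  if (!data.writeToFileAtomically($(path), true)) {
    throw new Error(`Could not write ${path}`);
  }
  try {
    // Assumed JXA form of DEVONthink's "import" command
    const record = app.import(path, { to: group });
    record.name = filename; // drop the UUID prefix from the record name
    return record;
  } finally {
    // Remove the temporary file whether or not the import succeeded
    $.NSFileManager.defaultManager.removeItemAtPathError($(path), null);
  }
}
```

In the script above, the `createRecordWith`/`record.data` lines would then become something like `importDataViaTempFile(app, decodedData, filename, app.incomingGroup());`, again assuming `incomingGroup()` is the JXA name of DEVONthink’s incoming group.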