How do you sort Tags by number of Items?

Are you implying that DT may not like dealing with 2,000+ tags for all my authors?

That number of tags is generally inadvisable, especially if you’re indexing instead of importing.

Good job!

What does that mean? Did it only change a part of the data, did it stop working altogether, did errors pop up, did anything else (not) happen?

View them where? If you mean the “Properties” tab of the inspector, then that can perhaps be expected. I think that DT fills these fields only when importing a PDF, not when changing it.

I’d simplify how copyright is set:

  const copyright = conf.split(" ",2).join(" ");

Your code splits conf twice, only to reassemble the parts on the next line. You only need to split once and then join the parts again.
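Here is a plain-JavaScript sketch comparing the two versions. It assumes conf is a space-separated string and that its first two words are the ones to keep. The thread does not show the exact stored value, so the sample string and both function names are hypothetical.

```javascript
// Original approach: split the string twice, then re-join the parts by hand.
function copyrightTwoSplits(conf) {
  const conf_year = conf.split(" ")[1];
  const conf_month = conf.split(" ")[0];
  return conf_month + " " + conf_year;
}

// Simplified approach: split once, keeping at most two parts, then join them.
function copyrightOneSplit(conf) {
  return conf.split(" ", 2).join(" ");
}

// Hypothetical sample value; both functions keep the first two words.
const sample = "April 1971 General Conference";
console.log(copyrightTwoSplits(sample)); // April 1971
console.log(copyrightOneSplit(sample));  // April 1971
```

Both versions return the same result whenever conf has at least two words. With a single word, the two-split version yields “April undefined”, while the one-split version yields just “April”.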

To answer your first two questions: the script ran without any errors, and I saw that the PDF properties in the inspector got updated. But when I added the PDF Author or Title property to the DT view as a column, many of them (roughly half, I think) were just blank. I opened a ticket with DT and was told that running my script against all 5,000 PDFs at once was not advised, although I didn’t quite understand why. It seems to work for some PDFs, but not all. After I rebuilt the database in DT, the new fields did show up in the view. They said it would be better to divide my PDFs into groups and run the script on each group. Indeed, while the script is running (via Script Editor), I notice that the DT Activity Log (in the bottom left of the DT window) shows updates to my documents some time after Script Editor reported working on those documents.
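Working in groups, as DT support suggested, could be scripted by splitting the selection before processing it. This is a minimal sketch. The helper name chunk and the group size of 500 are hypothetical, and the size is an arbitrary choice rather than a recommendation from DT. In JXA, one would presumably call performsmartrule once per group instead of once for the whole selection.

```javascript
// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Hypothetical usage: 5,000 placeholder items split into groups of 500.
const ids = Array.from({ length: 5000 }, (_, i) => i);
const batches = chunk(ids, 500);
console.log(batches.length); // 10
```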

Is this all evidence of some internal macOS/DT/JXA queueing race condition? Do you have any insight on this? Should I add some arbitrary sleeps to the JXA code, or look for some other signal from DT that it is still busy processing previous documents and that my script should back off?

Thanks for the advice on my JS code for copyright. I had that there because I thought I could also update the PDF Copyright property, but apparently that field cannot be edited, even with this JXA method. But I can still use your syntax for filling in the Subject property.

PS: In case you are curious why I am bothering to update the PDF properties: most of these articles came from a source that doesn’t publish its own PDF files. So I have a separate Python downloader script that accesses the web HTML version of the articles and uses the Playwright PDF printer function (link) to save each article as a PDF. I didn’t see an obvious way with Playwright to inject the Title/Author/Subject metadata. Now that I think about it, I probably could have used another Python library to inject the metadata before importing into DT. But I would rather not re-create the 5,000 PDF files…

I don’t think so. Some background: DT is not really involved in the processing at all. The script changes the PDF and then writes it back to disk, without DT doing anything. Only the “write to disk” part should cause DT to update its indices. I suppose (!) that DT relies on FSEvents, which macOS sends whenever a directory or a file changes. As you are changing many files in a short time, DT might have trouble keeping up with all the events, or the OS might not send an event for every change.
You could add an app.synchronize(…) call after your script has processed all files. But that’s perhaps not so relevant anymore, since the data is up to date now.

Thanks for the background explanation, which is priceless for some of us newcomers to DT and automation processing.

Can you elaborate on the app.synchronize idea? How does it work? Does it proactively trigger these FSEvents? Also, you suggest doing it at the end of my script, after processing thousands of PDFs. What about doing it periodically, say every 100 files, while the script is running?
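The “every 100 files” idea could be sketched as follows. Records are processed one by one, and DEVONthink is asked to re-check them once a group is complete. The call form app.synchronize({record: r}) is the one cgrunenberg gives later in the thread. The function name, the processRecord callback (standing in for the PDF-editing code), and the mock app object are hypothetical, so the sketch runs outside DEVONthink.

```javascript
// Process records one by one and synchronize them in groups of `every`.
// `app` must expose synchronize({ record }) like DEVONthink's JXA dictionary;
// `processRecord` stands in for the PDF-editing code (hypothetical).
function processWithPeriodicSync(app, records, processRecord, every = 100) {
  let pending = [];
  let syncCalls = 0;
  const flush = () => {
    pending.forEach(r => {
      app.synchronize({ record: r }); // name the parameter explicitly
      syncCalls++;
    });
    pending = [];
  };
  records.forEach(r => {
    processRecord(r);
    pending.push(r);
    if (pending.length >= every) flush();
  });
  flush(); // sync whatever is left after the last full group
  return syncCalls;
}

// Usage with a mock app object that only records the calls.
const mockApp = {
  calls: [],
  synchronize(params) { this.calls.push(params.record); return true; },
};
const count = processWithPeriodicSync(mockApp, ["a", "b", "c"], r => {}, 2);
console.log(count, mockApp.calls);
```

Each record still gets its own synchronize call, so grouping only changes when the calls happen. Later in the thread, calling it once per record right after writing turned out to be enough.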

I have another document source I may need to import in a similar fashion to DT. But even just for documenting my current solution and for others who may want to borrow my code above, it would be helpful I think to make my script more robust.

Thanks!

See the description of the command in DEVONthink’s script suite, e.g. by dropping DEVONthink 3.app onto Script Editor.app in the Dock or Finder.

I just realized I’ve been overcounting the number of tags in my case: there are only ~600. The tag view said several thousand, but that number is not the count of unique tags. It is the total number of tagged documents.
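Here is a plain-JavaScript sketch of the difference between the two numbers. As a side effect, it also ranks tags by item count, as asked at the top of the thread. It works on hypothetical sample data and does not query DEVONthink. It assumes the larger figure in the tag view is the sum of per-tag item counts.

```javascript
// Count how many documents carry each tag, then sort tags by that count.
// Input: one array of tag names per document (hypothetical sample data).
function tagStats(tagsPerDocument) {
  const counts = new Map();
  let assignments = 0;
  for (const tags of tagsPerDocument) {
    for (const tag of new Set(tags)) { // ignore duplicates within one document
      counts.set(tag, (counts.get(tag) || 0) + 1);
      assignments++;
    }
  }
  // Most-used tags first; ties broken alphabetically.
  const byCount = [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
  return { uniqueTags: counts.size, assignments, byCount };
}

const stats = tagStats([["Smith", "Jones"], ["Smith"], ["Brown", "Smith"]]);
console.log(stats.uniqueTags);          // 3
console.log(stats.assignments);         // 5
console.log(stats.byCount[0].join(" ")); // Smith 3
```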

Unfortunately, it doesn’t say much:

synchronize method : Synchronizes records with the filesystem or databases with their sync locations. Only one of both operations is supported.
synchronize
[database: Database] : The database to synchronize via its sync locations.
[record: Record] : The (external) record to update. New items are added, updated ones indexed and obsolete ones removed. NOTE: This is rarely necessary as databases are usually automatically updated by filesystem events.
→ boolean
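The entry lists two optional named parameters and states that only one of the two operations is supported per call. In JXA, named parameters are passed as an object. The following sketch shows a hypothetical helper, synchronizeParams, that builds that object and rejects calls naming both parameters or neither. The helper is an illustration, not part of DEVONthink’s API. Plain strings stand in for a record and a database.

```javascript
// Build the parameter object for DEVONthink's `synchronize` command.
// Per the dictionary entry, a call uses either `record` or `database`, not both.
function synchronizeParams({ record, database } = {}) {
  const hasRecord = record !== undefined;
  const hasDatabase = database !== undefined;
  if (hasRecord === hasDatabase) {
    throw new Error("Pass exactly one of `record` or `database`");
  }
  return hasRecord ? { record } : { database };
}

// In a JXA script this would presumably be used as
//   app.synchronize(synchronizeParams({ record: r }));
console.log(JSON.stringify(synchronizeParams({ record: "r1" })));   // {"record":"r1"}
console.log(JSON.stringify(synchronizeParams({ database: "db1" }))); // {"database":"db1"}
```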

However, I successfully added it to my script by putting this line at the end of the records.forEach loop:

		app.synchronize(r);

I am not sure this did anything to improve the behavior, and the description above implies as much when it says this is rarely necessary. Many documents were still “missed” in the DT view after running my script. But I did get better results by progressively introducing a longer and longer delay in the loop:

delay(0);
app.synchronize(r);

This is how I tested and characterized the results:

  1. Add the “Date Modified” field to the DT view and click that column header to sort the view by most recently modified first
  2. Select one “screenful” of PDF records, which is just 30 documents
  3. Run the JS script
  4. Watch the DT view to see the PDF documents get moved to the top of the view as the script runs
Results (each run on 30 documents):

  Delay (s)   Modified   Missed
  0           23         7
  0.25        28         2
  0.5         29         1
  1           29         1
  1.25        28         2
  1.5         28         2
  1.75        30         0
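The manual count in step 4 could also be tallied in code. This sketch assumes that each record’s modification date is available as a JavaScript Date. In JXA this would presumably come from r.modificationDate(), which the thread does not use. It also assumes that a record counts as “modified” if that date is at or after the moment the script started. The function name and sample dates are hypothetical.

```javascript
// Classify records as modified or missed by comparing each record's
// modification date with the time the script started.
function countModified(modificationDates, scriptStart) {
  let modified = 0;
  for (const d of modificationDates) {
    if (d >= scriptStart) modified++; // Dates compare by their timestamps
  }
  return { modified, missed: modificationDates.length - modified };
}

// Hypothetical data: one of three records was not updated after the start.
const start = new Date("2024-08-01T10:00:00");
const dates = [
  new Date("2024-08-01T10:00:05"),
  new Date("2024-08-01T09:15:00"),
  new Date("2024-08-01T10:01:00"),
];
console.log(JSON.stringify(countModified(dates, start))); // {"modified":2,"missed":1}
```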

So I conclude that adding a long delay helps DT keep up with these PDF property changes. Admittedly, the changes are being made to files imported into the DT database, which I gather is risky and likely not supported.

Can anyone comment, though, on why introducing a delay makes it work better? Does DT subscribe to FSEvents from macOS even for imported files, and might it miss some updates if they happen too fast?

For reference, here is my full script:

/* DEVONthink JavaScript to update General Conference PDF file properties
   PDF Author = Custom metadata Speaker (ID 'mdspeaker')
   PDF Title = Custom metadata Title (ID 'mdtitle')
   PDF Subject = Custom metadata Conference (ID 'mdconference') formatted as "<Year> <Month> General Conference" E.g. "1971 April General Conference"
   PDF Copyright = <month> <year>
   Nathan Ellsworth - August 2024
   
   Extensive help taken from chrillek and cgrunenberg from DEVONthink forums:
   
   https://discourse.devontechnologies.com/t/javascript-get-custom-metadata-to-rename-file/68445/24
   https://discourse.devontechnologies.com/t/tag-name-into-custom-metadata/70050/3
   https://discourse.devontechnologies.com/t/custom-metadata-import/77900/7
   https://discourse.devontechnologies.com/t/custom-metadata-import/77900
*/
ObjC.import('PDFKit'); 

function performsmartrule(records) {
	const app = Application("DEVONthink 3");
	app.includeStandardAdditions = true;

	records.forEach(r => {
		const m = r.customMetaData();
		if (m) {
			const conf = m['mdconference'];
			const spk = m['mdspeaker'];
			const title = m['mdtitle'];
			const conf_year = conf.split(" ")[1];
			const conf_month = conf.split(" ")[0];
			const copyright = conf_month + " " + conf_year;
			const subject = copyright + " General Conference";
			console.log(title);
			/* app.displayDialog(conf); */
			/* const conf = app.getCustomMetadata({for:"mdconference", from:r, defaultValue:""}); */
			/* convert record's path to NSURL object */
			const docURL = $.NSURL.fileURLWithPath($(r.path()));
			/* app.displayDialog(r.path()); */
			/* load the PDF document from this URL */
			const PDFDoc = $.PDFDocument.alloc.initWithURL(docURL);
			/* get the current PDF attributes as a MUTABLE dictionary.
			   other dictionaries can't be modified! */
			const PDFAttributes = $.NSMutableDictionary.dictionaryWithDictionary(PDFDoc.documentAttributes);
			/* Set the PDF properties */
			if (subject) { PDFAttributes.setObjectForKey(subject, $("Subject")); }
			if (spk) { PDFAttributes.setObjectForKey(spk, $("Author")); }
			if (title) { PDFAttributes.setObjectForKey(title, $("Title")); }
			/* PDFAttributes.setObjectForKey(copyright, $("Copyright")); */
			/* Update the PDF attributes */
			PDFDoc.documentAttributes = $(PDFAttributes);
			/* Write the PDF doc back to the URL */
			const result = PDFDoc.writeToURL(docURL);
			console.log(result);
		}
		delay(1.75);
		app.synchronize(r);
	});
}

(() => {
if (currentAppID() === "DNtp") return;
const app = Application("DEVONthink 3");
performsmartrule(app.selectedRecords());
})()

function currentAppID() {
  const p = Application.currentApplication().properties();
  return Application(p.name).id();
}

I believe that’s the case, given that you can open files in other apps from DT. How would it adjust the modified date in these cases?
But @cgrunenberg should know all about that.

That’s right, DEVONthink listens to filesystem events for both imported and indexed items. But imported/indexed items can also be updated manually via File > Update (Indexed) Items. That’s what the synchronize record ... command does.

Why does synchronize still seem to need a long delay? It’s not a big deal for 30 documents, but for 500 documents a 1.75-second delay per document adds up to nearly 15 minutes.
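Here is the arithmetic behind “nearly 15 minutes”, applied to both the 500 documents mentioned here and the 5,000 PDFs mentioned earlier. The function name is hypothetical, and it counts only the added delay, not the time spent processing each PDF.

```javascript
// Total time (in minutes) added by a fixed per-document delay.
function addedDelayMinutes(documentCount, delaySeconds) {
  return (documentCount * delaySeconds) / 60;
}

console.log(addedDelayMinutes(500, 1.75).toFixed(1));  // 14.6
console.log(addedDelayMinutes(5000, 1.75).toFixed(1)); // 145.8
```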

The above syntax is not correct, this should be:

app.synchronize({record: r});

Amazing! My script works perfectly now, with no delay needed. Script Editor didn’t complain about my original syntax, so thanks for pointing this out, @cgrunenberg. Looking back at the description of the synchronize command, I guess it must be told explicitly whether to sync a database or a record, hence the need for {record: r}?

Correct.
