Extract Dates with Regular Expressions to Rename PDF Files

I expect to keep using DEVONthink 2 for another two to three years, until I need to buy a new Mac anyway, one that can run Big Sur and later (my current Mac can’t run macOS 11). I don’t see any critical improvements in DEVONthink 3 that make up for the loss of the old sorter, and I increasingly feel that DEVONthink 3 has outgrown my needs. I’ll keep watching this space and testing alternatives in the meantime, but I’m holding off on upgrading to DEVONthink 3 for now.

For anyone who shares the pain of renaming masses of documents with unpredictable internal structure, I’ve written a script that does a lot of the heavy lifting. It’s Unicode-compatible and recognises German, US and ISO dates, including their long forms. It’s NOT automatic, but it does have a few tricks up its sleeve.
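To give a rough idea of what recognising German, US and ISO dates (including long forms) involves, here is a minimal Python sketch. It is not the author’s code, which uses Perl and Date::Manip; the month table, the five patterns and the function names `to_date` and `extract_iso_dates` are hypothetical and only cover the formats shown in the comments. Matches are ordered by their position in the text, overlapping matches are skipped and impossible dates such as 31.02.2021 are dropped.

```python
import re
from datetime import date

# Hypothetical month-name table (German and English); extend as needed
MONTHS = {
    "januar": 1, "january": 1, "februar": 2, "february": 2,
    "märz": 3, "march": 3, "april": 4, "mai": 5, "may": 5,
    "juni": 6, "june": 6, "juli": 7, "july": 7, "august": 8,
    "september": 9, "oktober": 10, "october": 10,
    "november": 11, "dezember": 12, "december": 12,
}
MONTH = "|".join(sorted(MONTHS, key=len, reverse=True))

# Each pattern comes with the meaning of its capture groups:
# y = year, m = month number, mon = month name, d = day
PATTERNS = [
    # ISO: 2021-03-15
    (re.compile(r"(?<!\d)(\d{4})-(\d{1,2})-(\d{1,2})(?!\d)"), ("y", "m", "d")),
    # German numeric: 15.03.2021
    (re.compile(r"(?<!\d)(\d{1,2})\.(\d{1,2})\.(\d{4})(?!\d)"), ("d", "m", "y")),
    # US numeric: 03/15/2021
    (re.compile(r"(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)"), ("m", "d", "y")),
    # German long form: 15. März 2021
    (re.compile(rf"(?<!\d)(\d{{1,2}})\.?\s+({MONTH})\s+(\d{{4}})(?!\d)", re.IGNORECASE),
     ("d", "mon", "y")),
    # US long form: March 15, 2021
    (re.compile(rf"\b({MONTH})\s+(\d{{1,2}}),?\s+(\d{{4}})(?!\d)", re.IGNORECASE),
     ("mon", "d", "y")),
]


def to_date(groups, order):
    """Build a date from the captured groups; raises ValueError if invalid."""
    v = dict(zip(order, groups))
    month = MONTHS[v["mon"].casefold()] if "mon" in v else int(v["m"])
    return date(int(v["y"]), month, int(v["d"]))


def extract_iso_dates(text):
    """All recognised dates as ISO strings, in order of appearance, deduplicated."""
    hits = []
    for pattern, order in PATTERNS:
        for m in pattern.finditer(text):
            try:
                hits.append((m.start(), m.end(), to_date(m.groups(), order)))
            except ValueError:
                continue  # e.g. 31.02.2021 is not a calendar date
    result, last_end = [], -1
    for start, end, d in sorted(hits):
        if start < last_end:
            continue  # overlaps a match that was already accepted
        last_end = end
        if d.isoformat() not in result:
            result.append(d.isoformat())
    return result
```

For example, `extract_iso_dates("Rechnung vom 15.03.2021, fällig am 1. April 2021")` returns `["2021-03-15", "2021-04-01"]`. Two-digit years, weekday names and date ranges are deliberately out of scope; that is where a library like Date::Manip earns its keep.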

I recorded a demo, but I couldn’t upload an animated GIF here, so you’ll find the screen recording here.

I’m happy to share the source code. Deployment is a bit tricky, though, and I could use some advice on best practices myself: it’s an AppleScript plus a Perl script that uses two external Perl packages. I’m not a coder, so any Perl buff will probably collapse in laughter, but the thing saves me a ton of time.

I wouldn’t mind having a look at it, since I used to be something of a Perl guy. However, given that Apple has announced that future versions of macOS won’t include Perl (and other scripting languages) by default, it might be a better idea to rewrite the whole thing in JavaScript. Its regular expression features are similar to Perl’s, so that should be feasible.

From your screen recording, I couldn’t really see what the script is doing, though.

It would make a lot of sense to rewrite it in JS. I’m actually more of a JS guy, but Perl is so much more efficient for this kind of task. The script uses dozens of regex operations, plus the Date::Manip library to handle the plethora of possible date formats. Unicode handling also proved pretty challenging.
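The post doesn’t say which Unicode problems came up, but one common trap (an assumption here, not something the author confirms) is that the same visible character can be stored two ways: precomposed (`ü` as U+00FC) or decomposed (`u` followed by the combining diaeresis U+0308). A regex for `ü` matches only one of them, so normalising text before matching helps. A minimal Python sketch using the standard `unicodedata` module; the helper name `nfc` is hypothetical:

```python
import re
import unicodedata

precomposed = "M\u00fcller"   # 'ü' as a single code point
decomposed = "Mu\u0308ller"   # 'u' + COMBINING DIAERESIS


def nfc(text: str) -> str:
    """Normalise to NFC so equivalent spellings share one code-point sequence."""
    return unicodedata.normalize("NFC", text)


print(precomposed == decomposed)                          # False: different code points
print(re.search("\u00fc", decomposed) is None)            # True: the regex misses it
print(nfc(decomposed) == precomposed)                     # True after normalisation
print(re.search("\u00fc", nfc(decomposed)) is not None)   # True
```

Both strings render as "Müller", which is why this kind of mismatch is easy to miss when a pattern silently fails.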

Anyway, the script does the following:

  • Use the file name and current text selection within the document
  • Extract all dates, remove duplicates, convert them to ISO format and put the first date found at the start of the file name. If a date is selected, use that one instead. If no date is present at all, use the creation date.
  • Remove unsuitable characters from the file name, transform special characters to UTF-8.
  • Clean up by removing unnecessary or doubled characters.
  • Append the selected document text to the end of the name, separated by a hyphen.
  • Apply TitleCase.
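The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the author’s AppleScript/Perl code: the function names, the set of "unsuitable" characters, treating underscores as spaces, dropping the remaining dates from the name and passing the name without its extension are all assumptions, and date recognition is cut down to ISO and German numeric dates to keep it short.

```python
import re
import unicodedata
from datetime import date

ISO = re.compile(r"(?<!\d)(\d{4})-(\d{1,2})-(\d{1,2})(?!\d)")
GERMAN = re.compile(r"(?<!\d)(\d{1,2})\.(\d{1,2})\.(\d{4})(?!\d)")
UNSUITABLE = re.compile(r'[/\\:*?"<>|]')  # assumed set of characters to drop
DOUBLES = re.compile(r"([-.,])\1+")        # "--" -> "-", ".." -> "."


def find_dates(text):
    """ISO strings for all ISO and German numeric dates, in order, without duplicates."""
    hits = []
    for m in ISO.finditer(text):
        y, mo, d = m.groups()
        hits.append((m.start(), int(y), int(mo), int(d)))
    for m in GERMAN.finditer(text):
        d, mo, y = m.groups()
        hits.append((m.start(), int(y), int(mo), int(d)))
    result = []
    for _, y, mo, d in sorted(hits):
        try:
            iso = date(y, mo, d).isoformat()
        except ValueError:
            continue  # e.g. 31.02.2021 is not a calendar date
        if iso not in result:
            result.append(iso)
    return result


def strip_dates(text):
    """Remove every recognised date string from the text."""
    return GERMAN.sub(" ", ISO.sub(" ", text))


def clean(text):
    text = UNSUITABLE.sub(" ", text)    # characters unsafe in file names
    text = re.sub(r"[_\s]+", " ", text)  # underscores and whitespace runs -> one space
    text = DOUBLES.sub(r"\1", text)      # collapse doubled separators
    return text.strip(" -.,")


def title_case(text):
    # Upper-case only the first letter of each word, so "GmbH" stays intact
    return " ".join(w[:1].upper() + w[1:] for w in text.split())


def build_name(name, selection, created):
    """Hypothetical helper: name is the record name without its extension."""
    name = unicodedata.normalize("NFC", name)
    selection = unicodedata.normalize("NFC", selection)
    # A date in the selection wins, then the first date in the name,
    # then the creation date
    dates = find_dates(selection) or find_dates(name)
    lead = dates[0] if dates else created.isoformat()
    result = " ".join(p for p in (lead, clean(strip_dates(name))) if p)
    extra = clean(strip_dates(selection))
    if extra:
        result += " - " + extra  # append the selected text after a hyphen
    return title_case(result)
```

For example, `build_name("rechnung_15.03.2021__telekom", "mobilfunk märz", date(2021, 4, 1))` yields `"2021-03-15 Rechnung Telekom - Mobilfunk März"`. The title-case helper only touches the first letter of each word because Python’s built-in `str.title()` would turn "GmbH" into "Gmbh".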

All of these operations reflect my personal needs and taste (as a lawyer and entrepreneur), but they’re very easy to adapt. For years I tried to automate document recognition with regexes and essentially failed because of ever-changing layouts, the sheer variety of documents and the many date formats in use. So with the arrival of DNtp 3 and its blazing-fast search and document rendering, I changed my strategy and went semi-automatic. I’ve never been this fast, and there’s no more rewriting of dozens of regexes. :wink:

Find me on Twitter @eburgwedel or LinkedIn to exchange contact information.