Converting Links in Mails to PDF. Help needed.

Who can help?
Summary:
I am looking for a script that converts all the “http(s)://…” links in my mails to PDFs. These PDFs should be saved in DT.

The Problem:
The “Financial Times Deutschland” (Germany) will stop its services as of Dec 7, 2012. Over the past 10 years or so I have sent and received around 30,000 mails that contain links to its website, in particular articles that I mailed to friends or wanted to keep for whatever reason. Now that they have decided to close down, I am panicking a bit, as I do not want to lose everything I have collected or sent to others over those years.
I am well aware of the scripting features in DT. The script Download->convert to PDF is great (after being modified to include the page count), but I simply cannot extract each link from all of my mails by hand. On top of that, the links sit somewhere inside the mails as references.
So, the idea is to have a simple script that goes through each mail, detects a “www” or an http(s):// link, converts the link to PDF, and puts it e.g. in the inbox or any other folder.
Converting the link is not the problem (copy & paste from the script mentioned above), but extracting the links from within the mails is a challenge (at least for me).
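To make the "detect a www or http(s):// link" step concrete, here is a minimal sketch in Python. It is not a DEVONthink script and does not use any DT API; it only shows how links could be pulled out of a piece of mail text with a regular expression. The names `LINK_RE` and `extract_links` are made up for this illustration, and the pattern is deliberately simple, so unusual URLs may be cut short.

```python
import re

# Matches http://, https:// or bare www. links, stopping at whitespace,
# angle brackets, quotes, or a closing bracket/parenthesis.
LINK_RE = re.compile(r"""(?:https?://|www\.)[^\s<>"')\]]+""", re.IGNORECASE)


def extract_links(text):
    """Return the unique links found in text, in order of first appearance."""
    seen = set()
    links = []
    for match in LINK_RE.finditer(text):
        # Drop punctuation that belongs to the sentence, not the link.
        url = match.group(0).rstrip(".,;:!?")
        # Give bare "www." links a scheme so a downloader can fetch them.
        if url.lower().startswith("www."):
            url = "http://" + url
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Calling `extract_links(mail_text)` on the text of one mail gives a list of URLs with duplicates removed. Handing that list over to the existing download script is a separate step that is not shown here.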
Thanks for your help!
Chris

Check out the scripts in the “Downloads” folder of the DEVONthink Scripts menu. One or more of these might work for you. However, you would probably need to import your email as RTF first. I’d suggest experimenting with the scripts and then asking specific questions here if needed.

Other applications - SiteSucker, etc. – might be better suited for this specialized problem.

Importing 30,000 emails into DEVONthink, possibly converting them to RTF, and downloading each linked article as a PDF will take a very long time and may fail. Be prepared to spend all the available time between now and 7 December troubleshooting. :open_mouth:

haha… thanks!
well, the mails are already in an “archive” DT database… so, fetching them is already done… but you are right about the crashing and fixing.

the main question remains: how do I extract links, i.e. any “wording” WITHIN a mail (“DT kind”=eml) that starts with either “www” or “http”, and pass them on to a script like the scripts->download->“convert urls to pdf dox”.
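For the .eml part of the question, one possible approach outside DT is to let Python's standard `email` package decode each message before searching it. The sketch below assumes the mails are available as raw .eml bytes (for example exported files); the function names `eml_body_text` and `links_in_eml` are hypothetical, and only http(s) links are matched here. Decoding first matters because quoted-printable mails can wrap long URLs across lines in the raw source.

```python
import re
from email import message_from_bytes, policy

# Simple http(s) pattern; stops at whitespace, brackets and quotes.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def eml_body_text(raw_bytes):
    """Decode all text parts (plain and HTML) of a raw .eml message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    chunks = []
    for part in msg.walk():
        # Skip multipart containers, binary parts and attached files.
        if part.get_content_maintype() == "text" and not part.is_attachment():
            chunks.append(part.get_content())
    return "\n".join(chunks)


def links_in_eml(raw_bytes):
    """Return the sorted, de-duplicated http(s) links of one message."""
    return sorted(set(URL_RE.findall(eml_body_text(raw_bytes))))
```

With the files on disk, something like `links_in_eml(open(path, "rb").read())` would give the links of one mail. How the result is then passed to the DT download script depends on that script and is left open.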

speed is a secondary problem. DTpro runs on a 480GB SSD with a 100Mbit/s Internet connection. That’s usually pretty fast.

I’m going to backtrack on my suggestion. I’ve been experimenting with the scripts I suggested – I don’t believe they will work after all. Perhaps an Automator action, or scripting something directly in Mail? Sorry for the bum lead. :frowning: