I have another feature request: I capture web pages from Safari using the Clip to DEVONthink extension. The source I most commonly use only captures correctly if I capture as HTML; capturing as PDF fails (i.e. it doesn’t capture what I see on screen), probably because of log-in requirements.
In DT I then recapture the site by clicking on the cogwheel and selecting Capture PDF (One Page); that way the site is correctly captured as PDF.
It would be nice to be able to do that using a smart rule (i.e. inbox, kind HTML, capture as PDF one page, delete). Conversion to PDF is possible via a smart rule, but it fails on these files in the same way as initially capturing as PDF does, and (re-)capturing is not available in smart rules at all.
Do you use the clutter-free option? Otherwise you could capture these pages as bookmarks and then convert them to PDF (via smart rules or Data > Convert).
I don’t use the clutter-free option, but using “Convert” produces an unusable PDF. I have just tried the bookmark method, and that too leads to a PDF which simply displays the log-in page of the site I captured from. The only reliable method I have found is to capture as HTML and then recapture as PDF in DT itself.
Unfortunately, a smart rule wouldn’t solve this issue: like AppleScript or Data > Convert, it would use a background task to download & render the page (to avoid crashes due to WebKit bugs). The next release will include some improvements related to protected pages, but it’s hard to tell whether this will improve things in your case too.
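For reference, the smart rule described in the request could be scripted roughly as below. This is only a sketch under assumptions: it assumes each captured HTML record still carries its source URL, and that `create PDF document from` with `pagination false` gives a one-page PDF. As explained above, the page is downloaded and rendered in a background task, so a log-in protected site may still come out as its log-in page. For that reason the sketch keeps the original HTML record instead of deleting it.

```applescript
on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			try
				set theURL to URL of theRecord
				-- Skip records that have no source URL to recapture from
				if theURL is not "" then
					-- Assumption: pagination false produces a single-page PDF.
					-- The page is fetched in the background, so protected pages may fail.
					create PDF document from theURL in (parent 1 of theRecord) name (name of theRecord) pagination false
				end if
			end try
		end repeat
	end tell
end performSmartRule
```

If the resulting PDFs turn out to be usable for a given site, deleting the HTML originals could be added as a separate smart rule action once the output has been checked.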
Am I correct that if I save a bookmark, a later search using DT3 or DEVONsphere can only search on the title of the bookmark, whereas if I convert to PDF, a search can cover the entire converted text?
A scheduled smart rule using the conditions Kind is Bookmark and Item does not contain comment could actually index bookmarks using a script like this one:
on performSmartRule(theRecords)
	tell application id "DNtp"
		if (count of theRecords) > 0 then
			show progress indicator "Indexing Bookmarks" steps (count of theRecords)
			repeat with theRecord in theRecords
				-- Errors (e.g. a failed download) only skip the current record
				try
					set theURL to URL of theRecord
					step progress indicator theURL
					if type of theRecord is bookmark then
						-- Download the page and extract its plain text
						set theHTML to download markup from theURL
						set theText to get text of theHTML
						-- Store the text in the comment so that searches can find it
						set comment of theRecord to theText
					end if
				end try
			end repeat
			hide progress indicator
		end if
	end tell
end performSmartRule
I’ve just modified Cristian’s script to handle the case where you have already added personal comments to the bookmark before indexing (or plan to do so later). The script downloads and adds the text if it wasn’t added earlier, and separates your comment from this text.
on performSmartRule(theRecords)
	tell application id "DNtp"
		if (count of theRecords) > 0 then
			-- Marker that separates personal comments from the downloaded text
			set theHead to "================================" & linefeed & "Here goes the bookmark text for indexing purposes" & linefeed & "================================"
			show progress indicator "Indexing Bookmarks" steps (count of theRecords)
			repeat with theRecord in theRecords
				try
					set theURL to URL of theRecord
					step progress indicator theURL
					if type of theRecord is bookmark then
						set theComment to comment of theRecord
						-- Skip bookmarks whose comment already contains the marker
						if theComment does not contain theHead then
							set theHTML to download markup from theURL
							set theText to get text of theHTML
							-- Keep an existing personal comment above the marker;
							-- avoid leading blank lines when there is none
							if theComment is not "" then set theComment to theComment & linefeed & linefeed
							set comment of theRecord to theComment & theHead & linefeed & linefeed & theText
						end if
					end if
				end try
			end repeat
			hide progress indicator
		end if
	end tell
end performSmartRule
Use a scheduled smart rule with the conditions Kind is Bookmark only.