I know I can import websites with their own file structure, but that’s not what I want, as the result doesn’t have good usability (lots of folders and files).
I want to download a page and its subpages (two levels deep) as single PDF files. I can do this manually by surfing inside DT3 and downloading each page via the wheel menu.
Is there already a way to automate this out of the box or do I have to write an AppleScript?
Yes. The CMS seems to be problematic, as SiteSucker has the same problems. So I’ll try it with AppleScript.
But how do I save the PDF object of a window (it shows a browser)? This does not work:
tell application id "DNtp"
	tell viewer window 1
		set theTitle to name
		set contentAsPDF to PDF
		set theFile to (path to downloads folder) & theTitle & ".pdf"
		save contentAsPDF in theFile
	end tell
end tell
When I use create PDF document from URL, I’m not logged in (the download runs in its own session rather than the browser window’s).
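For reference, this is roughly what I tried (the URL is a placeholder):

```applescript
tell application id "DNtp"
	-- Downloads in a separate session: the cookies/login state of the
	-- browser window are not used, so protected pages come back empty.
	create PDF document from "https://example.com/page" in current group
end tell
```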
As this script could only be used while browsing yourself, what is or would be the advantage compared to using Data > Capture > PDF (also available via the navigation bar)?
tell application id "DNtp"
	tell viewer window 1
		set theTitle to name
		set contentAsPDF to PDF
		set theURL to URL
		set theFile to open for access ((path to downloads folder as string) & theTitle & ".pdf") with write permission
		try
			-- Truncate in case the file already exists.
			set eof theFile to 0
			write contentAsPDF to theFile
			close access theFile
		on error
			-- Close only here on failure, to avoid a double close.
			close access theFile
		end try
	end tell
end tell
Is there a way to directly write into the database (current group) or do I have to import the file and delete it from the source folder?
tell application id "DNtp"
	set theGroup to current group
	tell think window 1
		set theTitle to name
		set contentAsPDF to PDF
		set theURL to URL
	end tell
	-- Create the record at the application level, not inside the window tell.
	set theRecord to create record with {name:theTitle, type:PDF document, URL:theURL} in theGroup
	set data of theRecord to contentAsPDF
end tell
Well, I’ve found the reason why the web downloader fails. The HTML source does not contain links, as the content seems to be generated by JavaScript after the URL is loaded. This seems to be a general problem in DT, as in AppleScript the source property also does not contain the current DOM (so I can’t parse it for links). That’s interesting, as the PDF does contain the full content generated by JavaScript.
If I understand you correctly, the PDF exported to DT from the browser is ok while the website downloaded with AppleScript in DT and then converted to PDF is missing the dynamically generated elements?
In that case, wget or curl wouldn’t help either. You need a browser to execute the JS code. AppleScript is so old it has no idea of JavaScript or the DOM. You might have more luck with AppleScriptObjC, perhaps using a WebView.
Or writing a bookmarklet in JavaScript? Although I don’t quite see how all that would work.
Yes, you understood it correctly. I now temporarily save the web archive (which is also complete), get its source, and delete the record. Then I can extract all the links.
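In case it helps others, the round trip looks roughly like this (a sketch; it assumes the frontmost window has finished loading the page, and that the record’s source reflects the rendered DOM, as described above):

```applescript
tell application id "DNtp"
	tell think window 1
		set theTitle to name
		-- Raw WebArchive data of the rendered page, including
		-- JavaScript-generated content.
		set theArchive to web archive
	end tell
	-- Temporary record; current group is just a convenient place for it.
	set tempRecord to create record with {name:theTitle, type:webarchive} in current group
	set data of tempRecord to theArchive
	-- Unlike the window's source property, this contains the rendered
	-- DOM, so it can be parsed for links.
	set theSource to source of tempRecord
	delete record tempRecord
end tell
```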
As I load all pages into the view before saving as PDF, I can also delete unneeded elements with JavaScript. That’s great, and the results are better than expected.
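The element removal can be done with DT’s do JavaScript command before grabbing the PDF property (a sketch; the selectors are placeholders for whatever the site generates):

```applescript
tell application id "DNtp"
	-- The selectors below are placeholders; adjust them to the site.
	do JavaScript "document.querySelectorAll('.sidebar, nav, footer').forEach(function (el) { el.remove(); });" in think window 1
	tell think window 1 to set contentAsPDF to PDF
end tell
```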