Downloading a PDF snapshot of a Google Doc

benoit.pointet · May 16, 2020, 8:22pm

I have bookmarks to Google Docs that I would like to snapshot to PDF for offline / mobile usage.

I could write an AppleScript, but I keep banging my head against authentication, of course.

I see many possible paths:

either I use the Google Drive API with an API key passed as a GET param;
or I can find a way to use the fact that I am logged in to Google Drive in the embedded browser to reuse those credentials;
or somehow magically Devonthink would ask me to login to Google …
Any hints?

Also, which AppleScript commands use the embedded browser and which use the download manager?
How are the two connected, i.e. do they share cookies?

b.

cgrunenberg · May 18, 2020, 11:01am

The cookies are shared. The download manager supports only one script command, add download.

But the internal browser is only scriptable via JavaScript.

benoit.pointet · May 18, 2020, 3:50pm

So add download did the trick, and the shared cookies help a lot.

Is there any way to define a download destination?

BLUEFROG · May 18, 2020, 3:56pm

Choosing a database is set in the Options in the Action menu of the Download Manager.
This is noted in the Help > Documentation > Windows > Download Manager.

benoit.pointet · May 18, 2020, 7:30pm

Thx @BLUEFROG.

However, the Download Manager does not seem to be able to download to a specific location (i.e. into the same group as the original bookmark).

So I ended up using “download URL” instead, which worked like a charm once I figured out how to get its data into a new record:

set theData to download URL theDownloadURL
set theRecord to create record with {name:theName, type:PDF document, MIME type:"application/PDF"} in theGroup
set data of theRecord to theData

Will share the resulting script in this forum once I’m done with it.

suavito · May 20, 2020, 12:27am

I dare to burst in as I’m working on something similar, not for Google Docs but also URL to PDF.

What I’ve got so far is working quite well. I create a PDF from a URL in the destination I want and I convert hashtags to tags. My source is a plain text/Markdown file with a URL at the beginning and hashtags at the end.

What’s causing a problem is the stuff in between: I might have commented on the web page and I’d like to transfer that comment to the PDF. The dictionary says annotations don’t work with PDFs, but it doesn’t say the same about comments. I still can’t get access to them, though. What am I doing wrong? Or are there alternatives?

BLUEFROG · May 20, 2020, 12:41am

suavito:

I might have commented the web page and I’d like to transfer that comment to the PDF.

Commented it where?

suavito · May 20, 2020, 6:36am

In a Markdown file constructed like this:
Paragraph 1: url to create PDF from
Paragraph 2 to x (optional): comment
Last Paragraph (optional): hashtags
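
A minimal sketch of how a note with this layout could be split into its three parts, using only plain AppleScript. The handler name parseProxyNote and the record labels are hypothetical, and treating the last paragraph as hashtags only when it starts with "#" is a heuristic assumption, not something stated in this thread.

```applescript
-- Hypothetical helper: split a proxy note into its URL, comment and hashtags.
-- Assumes the note is non-empty and laid out as described above.
on parseProxyNote(theText)
	set savedDelimiters to AppleScript's text item delimiters
	set theLines to paragraphs of theText
	set theURL to item 1 of theLines
	set theComment to ""
	set theTags to {}
	-- ignore trailing empty paragraphs
	set lastIndex to count of theLines
	repeat while lastIndex > 1 and (item lastIndex of theLines) is ""
		set lastIndex to lastIndex - 1
	end repeat
	-- the last paragraph counts as hashtags only if it starts with "#"
	if lastIndex > 1 and (item lastIndex of theLines) starts with "#" then
		set AppleScript's text item delimiters to space
		repeat with aTag in text items of (item lastIndex of theLines)
			set tagText to contents of aTag
			if tagText starts with "#" and (length of tagText) > 1 then
				-- drop the leading "#"
				set end of theTags to text 2 thru -1 of tagText
			end if
		end repeat
		set lastIndex to lastIndex - 1
	end if
	-- whatever is left between the URL and the hashtags is the comment
	if lastIndex >= 2 then
		set AppleScript's text item delimiters to linefeed
		set theComment to (items 2 thru lastIndex of theLines) as text
	end if
	set AppleScript's text item delimiters to savedDelimiters
	return {noteURL:theURL, noteComment:theComment, noteTags:theTags}
end parseProxyNote
```

The returned noteTags list could then be passed on as theTags and noteComment as theComment. A comment whose last paragraph is a Markdown heading would be misread as hashtags by this heuristic.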

I might change the order of comment and hashtags, I might allow the hashtag paragraph to be freely movable, but the question remains where to put the comment in a newly created PDF.

set theNewDocument to create PDF document from theURL in theDestination name theName with readability and pagination
set tags of theNewDocument to theTags

works fine,

set comment of theNewDocument to theComment

does not.

benoit.pointet · May 20, 2020, 7:02am

Thx @suavito for sharing your workflow.

If I understand correctly, you create a proxy Markdown file that defines how to snapshot the remote resource, right?

I am taking another course of action: using a bookmark as the proxy, adding tags and metadata to it, replicating it to the snapshot.

benoit.pointet · May 25, 2020, 8:32pm

Here’s the current working copy of this script which snapshots any bookmark of a google doc as a PDF doc. I use it for offline usage and search results.

You may use it in a Smart Rule or on selected items.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use script "RegexAndStuffLib" version "1.0.6"
use scripting additions

-- when calling on selection / single file
on run
	tell application id "DNtp"
		snapshotGDocs((selection of think window 1) as list) of me
	end tell
end run

-- when called by smart rule
on performSmartRule(theRecords)
	snapshotGDocs(theRecords as list) of me
end performSmartRule

-- generic wrapper to handle multiple snapshots
on snapshotGDocs(theRecords)
	tell application id "DNtp"
		show progress indicator "Snapshotting Google Docs ..." steps (length of theRecords) + 1
		repeat with theRecord in theRecords
			step progress indicator (name of theRecord) as text
			snapshotGDoc(theRecord) of me
		end repeat
		hide progress indicator
	end tell
end snapshotGDocs

-- record snapshotting
on snapshotGDoc(theBookmark)
	tell application id "DNtp"
		-- prepare some vars
		set bookmarkURL to URL of theBookmark
		set gDocID to regex search once bookmarkURL search pattern "[^/]{32,52}"
		set exportURL to "https://docs.google.com/document/u/0/export?format=pdf&id=" & gDocID
		set exportName to name of theBookmark & " (PDF Snapshot)"
		set exportGroup to first parent of theBookmark
		set referenceURL to get reference URL of theBookmark
		-- download new snapshot
		set exportData to download URL exportURL
		-- cleanup old snapshots
		set oldSnapshots to search "kind:pdf name:~snapshot url==" & referenceURL
		repeat with oldSnapshot in (oldSnapshots as list)
			move record oldSnapshot to trash group of current database
		end repeat
		-- save new snapshot
		set theExport to create record with {name:exportName, type:PDF document, MIME type:"application/PDF", URL:referenceURL} in exportGroup
		set data of theExport to exportData
		-- reproduce tags and replicates of bookmark
		set tags of theExport to tags of theBookmark
		repeat with parentGroup in parents of theBookmark
			if id of parentGroup = id of exportGroup then
				-- do not replicate to first parent
			else
				set theRep to replicate record theExport to parentGroup
			end if
		end repeat
	end tell
end snapshotGDoc

jooz · May 27, 2020, 6:03pm

Is download manager a requirement for this script to work?
(my DT version does not have it unfortunately).

benoit.pointet · May 27, 2020, 6:28pm

I am using DTP3, so I don’t know about the simpler versions. Try it out and keep us informed.
My bet is that it could work, since the capture mechanism also needs to download URLs.

cgrunenberg · May 28, 2020, 6:08am

The download manager is only required by the add download command which is not used by this script.

jooz · May 29, 2020, 3:43pm

thanks @cgrunenberg
I am getting this error while trying to use the script shared by @benoit.pointet

What I do is: 1) I select a doc in Google Drive; 2) I fire up the script from the script menu in DT.

benoit.pointet · May 29, 2020, 5:51pm

@jooz how do you « select a doc in gdrive »? In what browser?

I might not have mentioned it clearly: this script downloads a PDF snapshot of a Google Doc you have bookmarked in DEVONthink.

To use the script, first select, in DEVONthink’s list view, one or more bookmarks that point to a Google Doc.

HTH.

jooz · May 29, 2020, 7:16pm

Thanks for additional details.

I saved a gdoc as a bookmark into DT > clicked it inside DT > had to enter my Google password again inside DT’s browser > executed the script.

Still the same effect.

benoit.pointet · May 30, 2020, 9:17am

Damn … I tried to reproduce your issue but couldn’t so far …

Do you run the script from the “main window” (not a “document window”) once the bookmark (aka web location) is selected in the list view?

Like in such a setup:

jooz · May 30, 2020, 7:09pm

Yes that is what I am exactly trying to do:

Click webloc of the file -> select the script -> Boom this error

But I think I now understand what the problem is: if I run the apple script directly:

This is where I put this script:

Could you please tell me where you have this lib located on the Mac? I assume my location is wrong.

benoit.pointet · May 30, 2020, 9:04pm

Put the “RegexAndStuffLib” script directly in /Users/Username/Library/Script Libraries/, with no sub-folder.
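
If installing the library is a hurdle, the document ID could probably also be extracted with plain AppleScript text item delimiters. This is only a sketch: the handler name gDocIDFromURL is hypothetical, and it assumes the bookmark URL has the usual https://docs.google.com/document/d/<ID>/edit form, which is stricter than the regex used in the script.

```applescript
-- Hypothetical helper: extract the document ID from a Google Docs URL
-- of the assumed form https://docs.google.com/document/d/<ID>/edit
on gDocIDFromURL(theURL)
	set savedDelimiters to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to "/d/"
		set theParts to text items of theURL
		if (count of theParts) < 2 then error "No /d/ segment in URL: " & theURL
		-- everything after "/d/", up to the next slash, is taken as the ID
		set theTail to item 2 of theParts
		set AppleScript's text item delimiters to "/"
		set theID to first text item of theTail
		set AppleScript's text item delimiters to savedDelimiters
		return theID
	on error errMsg
		-- always restore the delimiters before passing the error on
		set AppleScript's text item delimiters to savedDelimiters
		error errMsg
	end try
end gDocIDFromURL
```

Inside the script's tell block it would be called as gDocIDFromURL(bookmarkURL) of me, in place of the regex search once line; the use script "RegexAndStuffLib" line could then be dropped. A URL where a query string follows the ID directly, without a slash, is not handled by this sketch.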

syntagm · October 14, 2020, 5:30am

This workflow works really really well! Thanks for that!
How hard would it be to extend it to Google Slides? The export URL doesn’t seem to work for Slides.
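
One possible direction, as an untested sketch: choose the export URL based on the kind of bookmark. The handler name exportURLFor is hypothetical, and the Slides and Sheets URL patterns below are assumptions based on commonly seen export links, not something confirmed in this thread.

```applescript
-- Hypothetical helper: build a PDF export URL from the bookmark URL and the ID.
-- The Slides and Sheets patterns are assumptions, not confirmed in this thread.
on exportURLFor(bookmarkURL, gDocID)
	if bookmarkURL contains "/presentation/" then
		return "https://docs.google.com/presentation/d/" & gDocID & "/export/pdf"
	else if bookmarkURL contains "/spreadsheets/" then
		return "https://docs.google.com/spreadsheets/d/" & gDocID & "/export?format=pdf"
	else
		-- same URL the original script uses for documents
		return "https://docs.google.com/document/u/0/export?format=pdf&id=" & gDocID
	end if
end exportURLFor
```

In snapshotGDoc, the set exportURL to … line would then become set exportURL to exportURLFor(bookmarkURL, gDocID) of me. Keeping the document branch unchanged means existing Google Docs bookmarks should behave as before.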