Importing Google Reader Starred Items

I’ve just used the Google Takeout service (google.com/takeout/) to export all of my data from Google Reader, including all of the following:

You’ll notice that the starred.json file, which contains every single post I’ve ever starred in Google Reader, is massive… 104MB.

I’d love to find a way to import this data into Devonthink for archival purposes that would allow me to individually browse through all of my starred items.

I had previously added my starred item feed to Devonthink, but this method only brings in the last 1000 or so starred items (and I can guarantee that this 104MB file I’m trying to import has tens of thousands!).


Any suggestions would be greatly appreciated… Thanks!

No easy method in DEVONthink to handle json files. If you want to individually browse the contents, then that suggests individual snippet files or bookmarks. There are many clever tools and techniques available on the net to read starred.json, convert the contents to bookmarks, or import the contents to Evernote, etc. Search the internet for “starred.json”. If you used one of the suggested methods to create bookmarks from each of your stars, then it’s easy to index or import those bookmarks in DEVONthink for your browsing pleasure.
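As a rough illustration of the "convert the contents to bookmarks" route, here is a minimal Python sketch. It assumes the parsed file has a top-level "items" list and that each item carries a "title" and an "alternate" list whose first entry has an "href"; those key names are assumptions about the Takeout format, so check them against your own file. The function names are made up for this example, and whether DEVONthink imports the resulting bookmarks file directly is not something confirmed here.

```python
import html


def extract_bookmarks(data):
    """Return (title, url) pairs from an already-parsed starred.json structure."""
    bookmarks = []
    for item in data.get("items", []):        # assumed top-level key
        links = item.get("alternate") or []   # assumed list of {"href": ...}
        if not links or "href" not in links[0]:
            continue                          # skip items that have no link
        url = links[0]["href"]
        title = item.get("title") or url      # untitled items fall back to the URL
        bookmarks.append((title, url))
    return bookmarks


def to_bookmark_html(bookmarks):
    """Render the pairs as a Netscape-style bookmarks file (the kind browsers export)."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for title, url in bookmarks:
        lines.append('<DT><A HREF="%s">%s</A>'
                     % (html.escape(url, quote=True), html.escape(title)))
    lines.append("</DL><p>")
    return "\n".join(lines)
```

To use it, load the file with json.load (opened with encoding="utf-8"), pass the result to extract_bookmarks, and write the output of to_bookmark_html to a .html file.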

Thanks for your feedback… this did the trick: https://github.com/kerchen/export_gr2evernote
(using the export2HTMLFiles.py script)

Thanks for this info, guys! Looks like it’ll only be possible to import subscriptions after Newsify (my primary iOS feed reader app) supports migration from Google Reader to Feedly Normandy.

I wish Takeout had the option to export unread items, but I might be able to work around that by marking them as starred. Still looking for the easiest way to do it; can’t find “batch starring” in the web UI. Also looked at making public feeds of tags/folders, but the feed readers I’ve tested (including DtPO’s) only grab the last 20 items [edit: increased by appending ?n=# to the shared URL, which seems to max out at # = 1000]. Any other suggestions to obtain all my unread items are welcome.
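For anyone scripting the ?n=# trick across several shared feeds, a small Python sketch follows. The helper name with_item_count is hypothetical, and the only thing it knows about the feed is the "n" query parameter described above; the 1000 default simply mirrors the apparent maximum mentioned in the edit.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_item_count(feed_url, n=1000):
    """Return feed_url with its 'n' query parameter set to n (replacing any old value)."""
    parts = urlsplit(feed_url)
    # Keep every existing parameter except a previous 'n'.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "n"]
    query.append(("n", str(n)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```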

If you look at the Google Reader Starred Items feed in the screenshot I included above, none of the HTML files that it pulls into Devonthink from that feed include the images/graphics from the original posts. Likewise, the python script that I hunted down did download more than 17,000 starred items, but none of the original images from those articles/posts got downloaded either.

Is there a way to somehow have DEVONthink go out and download the referenced images in the Google Reader Starred Item feed, as well as for the downloaded starred items from the python script? I realize that this might result in gigabytes of data being downloaded, but nonetheless, I would like to at least try this on a sub-selection of items.

Any suggestions would be greatly appreciated… Thanks!

Maybe this:

Re: Offline RSS

Should work if your HTML documents have URL metadata; all of mine for RSS feed items do. A small test with the Convert URLs to web documents script successfully created Web Archive documents. Caveat: the Added/Created/Modified dates of new documents will be current rather than inherited from the originals. Probably not too hard to modify the script to adjust Created and/or Modified dates but my AppleScript-fu is weak.

HTML files do not contain images, they contain links to images. If the image is not at the other end of that link (…anymore or was never there…) then the “missing image” icon is sometimes displayed. Which is the case in the example in the screen shot – try pasting the link into Safari or Chrome and see if the image appears.
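To check which image links a saved HTML file actually refers to, something like the following Python sketch could help. It only lists the src values of img tags using the standard library; it does not download anything, and the names ImageLinkParser and image_links are made up for this example. The optional base_url is there on the assumption that some posts use relative image paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the post's own URL, if given.
                self.image_urls.append(urljoin(self.base_url, src))


def image_links(html_text, base_url=""):
    """Return the image URLs referenced by html_text, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html_text)
    return parser.image_urls
```

Pasting a few of the returned URLs into a browser is then a quick way to see whether the images still exist at the other end.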

Here is a script that will read a “starred.json” file and create bookmarks in DEVONthink for every item in that file. It will prompt for the group in which you wish to place the bookmarks. The script requires “JSON Helper” to be installed and running (available from the App Store) – a highly recommended app for anyone interested in modern scripting.

If you want to use the “categories” tag in the JSON file to create tags for each bookmark, that addition is moderately difficult, because some “categories” are control data rather than real categories (i.e., tags). If you want to create a web archive instead of a bookmark, change “bookmark” in the “create record” statement. For performance, the script first creates a list of all the bookmark candidates, and then uses that list to cause DEVONthink to create the bookmarks.
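To give an idea of what separating control data from real tags might involve, here is a hedged Python sketch. It assumes, without confirmation from the file itself, that control entries look like "user/<id>/state/com.google/starred", that your own Reader labels look like "user/<id>/label/<name>", and that anything else is a category supplied by the feed. The function name tags_from_categories is made up for this example.

```python
def tags_from_categories(categories):
    """Return usable tag names from one item's "categories" list."""
    tags = []
    for cat in categories:
        if "/state/com.google/" in cat:
            continue                           # assumed control data (starred, read, ...)
        if cat.startswith("user/") and "/label/" in cat:
            cat = cat.split("/label/", 1)[1]   # keep only the label name
        if cat and cat not in tags:            # drop empties and duplicates
            tags.append(cat)
    return tags
```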

Please do not use the script if you do not know what any of that means – the script has no error checking.

(*
this script will create bookmarks from a "starred.json" file produced by Google Takeout

the script requires "JSON Helper" available from the App Store

do not use this script if you do not know what any of that means

*)

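-- ask for the Takeout export; read it as UTF-8 so non-ASCII titles are not garbled
set theFile to (choose file with prompt "Select a 'starred.json' file to read:")
open for access theFile
set fileContents to (read theFile as «class utf8»)
close access theFile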

tell application "JSON Helper"
	set theJSON to read JSON from fileContents
	set theItems to items of theJSON
	set theSources to item 5 of theItems
	set theBookmarks to {}
	
	repeat with thisSource in theSources
		set end of theBookmarks to {pTitle:the title of thisSource, pURL:href of item 1 of alternate of thisSource}
	end repeat
	
end tell

tell application id "DNtp"
	set theGroup to display group selector
	repeat with thisBookmark in theBookmarks
		create record with {name:pTitle of thisBookmark, URL:pURL of thisBookmark, type:bookmark} in theGroup
	end repeat
end tell

Neat. And those could be converted to HTML, Web Archive, and/or PDF documents. I may give this a try later.

Still looking for a way to fetch all my unread articles (just their URLs would be enough) from GR. Adding feeds of public folders/tags to DtPO grabbed both read and unread articles, which is better than nothing. Plus there’s the 1000 article limit, which I hit in a few cases where I’d really like to retain more.

Found a JavaScript to unstar multiple GR articles but couldn’t get it (or its inverse) to work. That may be worth a closer look since starring unread items would make them available to Takeout, leaving unwanted read ones behind.

I just tried the script you provided to read a “starred.json” file and create bookmarks in DEVONthink for every item in that file, but it didn’t seem to work.

I initiate the script and it does ask for the “starred.json” file, and once I point to the file the rainbow pinwheel begins spinning, indicating that the script is busy doing something, but after about 2 minutes it abruptly stops spinning and nothing has happened. No files, bookmarks, etc. end up in DEVONthink whatsoever.

I literally have approximately 17000-18000 starred items in my 104MB “starred.json” file… is it possible that the script times out? Is there some way to append something to the script to ensure that it doesn’t time out but rather just keeps on working?

Any suggestions would be greatly appreciated… Thanks!