Print view and NYTimes.com auto-redirects

Hi folks,

I’m looking for some advice. My workflow involves clipping articles to Instapaper and using my personal Instapaper RSS feed to import those articles into DEVONthink (DT). Then I use a script to capture them as paginated PDFs. The articles I save to Instapaper are usually in print view, to remove all the clutter.

There’s only one problem. Several months ago, NYTimes.com changed its settings so that anybody who tries to link to an article’s “print view” is automatically redirected to the non-print view. (Basically, if your link has the ?pagewanted=print string, the site removes it.) The non-print view is not very good for capturing PDFs, so the script takes the print-view links from my RSS feed and ends up saving the non-print view.

Has anybody else run into this problem? I’m not really sure how to fix it, and I’d be curious if anybody had an idea.

Over here, the “print” button works as expected for NYTimes articles in DEVONthink.

… but I’m a subscriber and logged into the NYTimes site, so the right cookies are set on my machine to let me past the paywall in all my browsers. Maybe this is the limitation you’re up against?

Actually, this has nothing to do with being logged in. I always get to view the articles, because I do subscribe, and I guess my login cookies are persistent.

The problem is, I never get the print view, just the regular article view. The difference between the print and non-print views can be a few hundred KB per PDF, and that starts to add up. It’s mostly unnecessary content.

If you click this link, you won’t get the print view, even though the print-view string is at the end, because the site automatically removes it: nytimes.com/2011/12/28/worl … nted=print

I don’t know what you mean by “the site automatically removes the string”. I clicked the link in DTPO, then clicked “Print”. nytimes.com asked me to log in, so I logged in. As shown in the image, the view changed to print, and the full URL is displayed. Over here, nytimes.com has always worked that way in DTPO.

Are you running any adblockers? Anti-spyware plugins?

Dear korm,

What I’m describing is the fact that you can’t link to an article’s print view. It sounds to me like you had to take an extra step to print. The link I gave you was to the print view version of the article. If you click on any links to an NYTimes article with a ?pagewanted=print string at the end of the address, you automatically get redirected to the non-print view.

The problem with that is, if you’re using AppleScript to turn links into PDFs, then all your print-view links will be turned into non-print views. Those versions have photos and ads, and they create larger files that aren’t as convenient to read.

I was hoping somebody might have a suggestion for tweaking an AppleScript or changing my workflow. I have hundreds of bookmarks and I want to download copies for my files. The redirect makes that difficult.

Best and thanks.

Presumably, NYTimes.com is checking the referrer of the request.

DEVONthink’s AppleScript dictionary does contain download {markup from|URL|webarchive from} commands that allow you to specify referrers. This should work.

Unfortunately, I can’t test it because I’m not a subscriber.

Dear ndouglas,

Can you elaborate a bit on that? I’m really not well versed in AppleScript and can only tweak things. If the referrer is, say, nytimes.com/2011/12/28/worl … sters.html, can I tell AppleScript to use it with download {markup from|URL|webarchive from}?

I’m using Christian Grunenberg’s script (I might have tweaked it; I can’t remember), which I’ve posted below in case you need to take a look at it.


-- Convert URLs to PDF documents
-- Created by Christian Grunenberg on Mon Mar 23 2009.
-- Copyright (c) 2009. All rights reserved.

tell application id "com.devon-technologies.thinkpro2"
	set theSelection to the selection
	if theSelection is not {} then
		try
			show progress indicator "Converting..." steps (count of theSelection)
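			repeat with theRecord in theSelection
				set theName to name of theRecord
				set theURL to URL of theRecord
				set theTag to tags of theRecord -- read but not used below
				step progress indicator theName
				-- Only web URLs can be rendered; other records are skipped
				if theURL begins with "http:" or theURL begins with "https:" then
					-- Renders the page, as the server returns it, into a paginated PDF
					create PDF document from theURL name theName with pagination
				end if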
			end repeat
			hide progress indicator
		on error error_message number error_number
			hide progress indicator
			if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
		end try
	end if
end tell

Thanks for any guidance you can provide…

In my testing (which was limited and annoying since I don’t have a subscription), Christian’s script worked if you changed


create PDF document from theURL name theName with pagination

to


create PDF document from theURL referrer theURL name theName with pagination

In other words, NYTimes doesn’t care who the referrer is so long as it’s on their site. Probably every website that does referrer checking works the same way, since the alternative (checking referrers against a calculated or stored list of all the pages on the server that link to the specified page) is a big pain in the neck and a big blow to efficiency with very little benefit.
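Since the referrer only has to be on the same site, one way to keep the referrer trick from being applied to other sites’ bookmarks is to check the host first. This is a minimal sketch, assuming plain http(s) URLs; the handler names hostOf and isNYTimesURL are hypothetical, not part of DEVONthink’s dictionary.

```applescript
-- Hypothetical helper: return the host of an http(s) URL, or "" if there is none
on hostOf(theURL)
	set oldDelims to AppleScript's text item delimiters
	set theHost to ""
	try
		-- Drop the scheme: "https://www.nytimes.com/2011/..." -> "www.nytimes.com/2011/..."
		set AppleScript's text item delimiters to "://"
		if (count of text items of theURL) > 1 then
			set theRest to text item 2 of theURL
			-- Keep everything before the first "/" (the host)
			set AppleScript's text item delimiters to "/"
			set theHost to text item 1 of theRest
			-- Drop a query string attached directly to the host
			set AppleScript's text item delimiters to "?"
			set theHost to text item 1 of theHost
		end if
	end try
	-- Always restore the delimiters, even after an error
	set AppleScript's text item delimiters to oldDelims
	return theHost
end hostOf

-- Hypothetical check: is this URL on nytimes.com or one of its subdomains?
on isNYTimesURL(theURL)
	set theHost to hostOf(theURL)
	-- The leading "." keeps look-alike hosts such as "notnytimes.com" from matching
	return (theHost is "nytimes.com") or (theHost ends with ".nytimes.com")
end isNYTimesURL
```

Inside a `tell application` block, the handler has to be called as `my isNYTimesURL(theURL)`, for example to wrap the `create PDF document ... referrer theURL ...` line so it only runs for nytimes.com records.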

So let me know if that works for you.

Here’s a working version of the script, based on the original from Christian, @adro’s suggestions, and @ndouglas’s suggestions.

Try this script anywhere you want, but it will probably only work at nytimes.com.

-- Print bookmarks or other documents that have URLs from nytimes.com

-- Based on: Convert URLs to PDF documents
-- Created by Christian Grunenberg on Mon Mar 23 2009.
-- Copyright (c) 2009. All rights reserved.

-- Modified by Korm based on suggestions from adro and ndouglas
-- This is specific to NYTimes.com 
-- and probably won't work anywhere else

-- You must be logged into nytimes.com for this to work reliably

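tell application id "com.devon-technologies.thinkpro2"
	try
		-- Ask where the new PDFs should go
		set theGroup to display group selector
		set theSelection to (the selection as list)
		show progress indicator "Converting..." steps (count of theSelection)
		repeat with theRecord in theSelection
			set theName to name of theRecord
			set theURL to URL of theRecord
			
			-- Ask for the print view, unless the bookmark already points at it
			if theURL contains "pagewanted=print" then
				set thePrintURL to theURL
			else
				set thePrintURL to theURL & "?pagewanted=print"
			end if
			
			step progress indicator theName
			if theURL begins with "http:" or theURL begins with "https:" then
				-- In the testing in this thread, a nytimes.com referrer avoided the redirect to the non-print view
				create PDF document from thePrintURL referrer theURL name theName in theGroup with pagination
			end if
		end repeat
		hide progress indicator
		
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
	
end tell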

(Feb 1, 2012: fixed the code as @adro suggested, below.)

This solution worked perfectly! Thanks!

Dear korm,

This works nicely as well, although for me it needed a small tweak (probably because NYTimes.com has changed things again): &pagewanted=print is now ?pagewanted=print.

Thanks for the help!
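The “&” versus “?” tweak comes down to whether the URL already has a query string: the first query parameter follows “?”, and any later ones are joined with “&”. A sketch that picks the separator automatically, and leaves links that are already print-view alone, could look like this. The handler name addPrintParameter is hypothetical, it assumes URLs without a “#fragment”, and whether NYTimes.com still honours pagewanted=print alongside other parameters is untested here.

```applescript
-- Hypothetical helper: append pagewanted=print with the right separator.
-- Assumes the URL has no "#fragment" part.
on addPrintParameter(theURL)
	-- Already a print-view link: don't add the parameter a second time
	if theURL contains "pagewanted=print" then return theURL
	-- The first query parameter follows "?"; later ones are joined with "&"
	if theURL contains "?" then
		return theURL & "&pagewanted=print"
	else
		return theURL & "?pagewanted=print"
	end if
end addPrintParameter
```

In the script above, it would replace the line that builds thePrintURL, called as `set thePrintURL to my addPrintParameter(theURL)` since it runs inside the `tell` block.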