Downloading pdf directly into the database

Hi all,

I just started working with DEVONthink Pro. I use it mainly to archive my references, so I’d like to download PDF files directly from a journal’s web page into the database, but so far I have failed to do so. I suspect the reason is that in many journals the links to the PDFs are JavaScript links that open the PDF in a pop-up window (e.g. javascript:newWindow('/doi/pdf/10.1111/j.1461-0248.2004.00639.x')).

How can I open such links in DTP and then save the PDFs I want to keep directly in the database?

Cheers, Michi

DT Pro does not yet support popup windows and therefore a workaround is necessary like…

  • open the page in DEVONagent (e.g. via the contextual menu), click on the PDF link and then use Data > Add to DEVONthink > PDF

  • open the page in your default browser (also possible via the contextual menu), click on the PDF link, save the PDF and import the saved file to DEVONthink

  • a script

Thanks for the quick answer. I did indeed find a downloadable script for the problem: devon-technologies.com/files … Nthink.zip

It seems that I have to modify the script slightly in a script editor in order to customize it for my database. I’m a biologist and have no clue about script editing. My database is called “references” and the folder the PDFs are supposed to end up in is called PDF. How do I need to modify the script?

-- This script saves the pdf currently selected in a
-- browser window into the current folder 
-- By Erico in the DEVONtechnologies User Forum, 2006

tell application "DEVONthink Pro"
	
	set theURL to URL of window 1
	set cuPos to selection of window 1
	
	if cuPos is not {} then
		if kind of item 1 of cuPos is "Group" then
			set import_location to first item of cuPos
		else if parent of item 1 of cuPos is not {} then
			set import_location to first item of parent of item 1 of cuPos
		else
			set import_location to root of current database
		end if
	else --current position is root 
		set import_location to root of current database
	end if
	
	-- Uncomment the line below to put new pdfs in a special folder at the root instead of the current folder 
	-- Set import_location to (get record at "/file elsewhere" in current database) 
	
	set z to download URL theURL
	
	set html_record to create record with {name:"rename-me", type:picture, URL:theURL} in import_location
	set data of html_record to z
	
end tell

Thanks for any hints,

Michael

It would probably be best to just use DEVONagent to access the page, and then use its Data > Add to DEVONthink > PDF menu choice. But since I too often start browsing sessions in DEVONthink Pro, here’s a modified version of my original script, with your changes applied. It assumes that the database you want the PDF stored in is open…

erico



-- This script saves the pdf currently selected in a 
-- browser window into the folder named "PDF" 
-- By Erico in the DEVONtechnologies User Forum, 2006 

tell application "DEVONthink Pro"
	using terms from application "DEVONthink Pro"
		set this_url to URL of window 1
		set this_name to ""
		set this_name to name of window 1
		if this_name is "" then set this_name to "pdf-rename-me.pdf"
		if (this_url ends with ".pdf") or (this_url ends with ".PDF") or this_url ends with ".gif" or this_url ends with ".jpg" or this_url ends with ".png" or this_url contains "/pdf" then
			
			
			if not (exists record at "/PDF") then
				set import_location to create location "/PDF"
			end if
			
			set import_location to (get record at "/PDF" in current database)
			with timeout of 600 seconds
				set z to download URL this_url
			end timeout
			set html_record to create record with {name:this_name, type:picture, URL:this_url} in import_location
			set data of html_record to z
		else
			
			display dialog "Error: this script expects a PDF or image to be loaded in the frontmost DEVONthink window. Please try again."
		end if
	end using terms from
end tell


eric oberle
department of history
stanford university
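
A side note on the long "if" test in the script above: AppleScript text comparisons ignore case unless they are wrapped in a "considering case" block, so the separate ".PDF" check should be redundant. The same test could be factored into a small handler, roughly like this. This is only a sketch; the handler name isDownloadableURL is made up for illustration and is not part of DEVONthink's scripting dictionary.

```applescript
-- Hypothetical helper: true if the URL looks like a PDF or an image.
-- Performs the same checks as the script above, just factored out.
on isDownloadableURL(theURL)
	-- Text comparisons ignore case by default, so ".pdf" also matches ".PDF".
	repeat with ext in {".pdf", ".gif", ".jpg", ".png"}
		if theURL ends with (contents of ext) then return true
	end repeat
	-- Catches journal links such as /doi/pdf/... that have no file extension.
	return theURL contains "/pdf"
end isDownloadableURL
```

Inside the "tell" block the handler would have to be called as "my isDownloadableURL(this_url)", because otherwise the call is sent to the application instead of the script.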

christian,

Maybe this is a good place to mention that it would be great if pulling up those pop-up windows were someday scriptable. Is it conceivable that JavaScript-enabled pages will someday be scriptable in DEVONthink? I suppose it’s something for 2.0 if so… but I just thought I’d mention that I’d really be in favor of it.

What I keep wanting to do is be able to issue a command in a script that would do something like:


---- beware: this is imaginary code!
......
---- code to get url would go here
----
set the_record to create record with {name:"temp", source:the_source, URL:the_url} in import_location

set new_window to create new window of record the_record with {visible:false}
repeat while loading of new_window is true
	delay 1
end repeat
do javascript (javascript_command_extracted_from_window) in window new_window
repeat while loading of new_window is true
	delay 1
end repeat
set x to source of new_window
close window new_window
delete record the_record

(Obviously this code will not work right now, because the visible property is not writable on creation and there is no “in window” parameter for JavaScript…)

I guess I haven’t tried it, but can DEVONagent apply arbitrary JavaScript to an open window, or is that not working yet there either…? Hmmm… maybe it would work there or in Safari, and I could temporarily work around this problem in DTP for the moment… oh, I bet it is buggy in Safari. I suspect that if anyone is going to make this work it will be you, Christian.

It probably isn’t easy to embed the whole JavaScript engine into a WebKit program, and that’s why no one has made this scriptable yet, but it is sort of the missing link for a lot of web pages these days. I just thought I’d mention that this would be greatly appreciated! :smiley:

The other random thing apropos this thread is that it would be really nice if in the scripting language there would be some way to know if the object in the currently viewed window is a pdf other than looking at the “extension” at the end of the url is “.pdf”

Is there any chance that it is possible that “think windows” have a property called “type” that would answer “PDF” or “html” or the like? or is this simply not known? Gee, no, I’m not greedy: invisible windows, window properties, and javascript support…all in due time I guess
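
Until something like that exists, one possible workaround is to ask the server instead of the window: a HEAD request sent with curl (via AppleScript's standard "do shell script" command) reports the Content-Type of a URL. A rough sketch, assuming curl is installed, the server answers HEAD requests, and the page needs no login, since curl does not share cookies with DEVONthink. The handler name looksLikePDF is made up for illustration.

```applescript
-- Hypothetical helper: true if the server reports the URL as a PDF.
on looksLikePDF(theURL)
	try
		-- -s silent, -I headers only, -L follow redirects
		set headerText to do shell script "curl -sIL --max-time 30 " & quoted form of theURL
	on error
		-- Unreachable host, timeout, etc.: treat as "not a PDF".
		return false
	end try
	-- Text comparisons ignore case by default, so "Content-Type" matches too.
	return headerText contains "content-type: application/pdf"
end looksLikePDF
```

This only looks at what the server claims, so a mislabelled file would still slip through, and pages behind a proxy login will not be detected.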

one way or another, it all does keep getting better…

cheers,

erico

Just in case anyone is reading this thread of me talking to myself, I am happy to report that DEVONagent will indeed execute arbitrary JavaScript. Here is an example that works with the CNN.com website. If I have a link to a CNN news story open in DEVONthink Pro, this script will send that link to DEVONagent, tell DEVONagent to run the JavaScript that the cnn.com website uses for “print story” (i.e. without ads and junk), and then grab the source and create a new record in DEVONthink Pro. Neat! I can’t wait until this works in DEVONthink Pro all by itself, but it’s pretty simple.

Obviously, this script can be customized for other websites by changing the first line to point to your desired target. The disadvantage of using two programs instead of one is that if you are using a database behind a login proxy or some such thing (jstor.org, I’m thinking of you), you will need to log in with both DEVONthink and DEVONagent. Too bad they can’t share cookies!

At any rate, I did check it in Safari as well… it didn’t work for me. So score another point for DEVONthink/DEVONagent!

best,

erico



set desired_target_javascript to "return(PT());"

tell application "DEVONthink Pro"
	if not (exists record at "/PDF") then
		set import_location to create location "/PDF"
	end if
	set import_location to (get record at "/PDF" in current database)
	
	
	set the_url to get URL of window 1
end tell
tell application "DEVONagent"
	open URL the_url
	set the_window to window 1
	repeat while loading of window 1
		delay 1
	end repeat
	do JavaScript (desired_target_javascript) in the_window
	repeat while loading of window 1
		delay 1
	end repeat
	set the_source to source of window 1
	set the_name to name of window 1
end tell
tell application "DEVONthink Pro"
	set html_record to create record with {name:the_name, type:html, URL:the_url, source:the_source} in import_location
end tell


Dear Erico,

thank you for your help! I applied the script you posted and a new file does indeed appear in the right folder of the database. However, this file is empty and not the PDF I expected…

I’m sorry to be such a helpless bum…

Cheers, M

Michael,

I’m not sure what is going wrong. Do you have the PDF actually loaded in DEVONthink’s frontmost window, or just the page linking to it?

The script assumes that you have the actual PDF loaded, and thus that you are running Mac OS X 10.4, or that you have the Schubert|it PDF Browser Plugin installed, which will enable you to view a PDF in DEVONthink.

I just modified the script above so that it verifies that you have a PDF open, so you might want to paste it again from above…

But if you are trying to handle the pop-up windows, you will need to use DEVONagent, because, as Christian mentioned above, DT Pro doesn’t work with pop-up windows yet. Better to use DEVONagent’s Data > Add to DEVONthink > PDF menu choice.

try that and tell me what works…

erico

Just a comment about how the PDF is stored in the database (for the script that I’ve used): it is copied into the ‘body’ of the database instead of into the database Files folder. That’s why the Path field in the corresponding document Info panel is blank.

This is probably OT in this particular thread, but the subject comes up in many different threads.

Bill, could you explain the difference between copying something into the ‘body’ of the db and copying into the db Files folder?

Thanks so much!

Martin, when DEVONthink started out back in 2002 all files were stored in a single ‘monolithic’ database (the 10 numbered files inside the database folder or package), not as individual files.

Later on, the option was added to store certain files (PDF, images, QuickTime media), which are often quite large, in the Files folder inside the database folder (DT) or package file (DT Pro).

Storing such big files in the ‘body’ of the database increases memory requirements as they all must be loaded into memory. Storing them in the Files folder means that only the extracted text and metadata are held in the ‘body’, while the file itself doesn’t have to be loaded into memory.

When DT Pro 2.0 is released, all files will be stored in the database Files folder. This will result in reduced memory requirements to load the database.

Hope that helps.

That helps loads, Bill. Thanks a lot!

Just a minor tip - you can replace this code…


   if not (exists record at "/PDF") then 
      set import_location to create location "/PDF" 
   end if 
   set import_location to (get record at "/PDF" in current database) 

…with this code to simplify the scripts:


   set import_location to create location "/PDF"

One of the nice things about DEVONagent is the scanner; very useful for showing all of the PDFs or other media on a web page.

However, I haven’t found a way to use the scanner and then directly download into my DEVONthink database. I can obviously download onto my desktop or what have you, but I would have thought there’d be a faster way of using the scanner to download straight into DT.

Or am I missing something here?

There are currently two solutions:

  1. Open the “Objects” drawer, select the interesting objects (or all of them) and choose “Download” in the contextual/action menu. Then import the downloaded files to DEVONthink Pro and apply the script to copy the comment to the URL (as DEVONagent adds the URL of downloaded items to the Finder comment)

  2. First, stop the download queue of DEVONagent’s download manager. Then open the “Objects” drawer, select the interesting objects (or all of them) and choose “Download” in the contextual/action menu. As soon as you’ve added all interesting stuff of all pages/results to the download manager, select all items in the download manager and either drag or copy them to DEVONthink Pro’s download manager.

However, this could and should be simplified in a future release.
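
The “copy the comment to the URL” step in solution 1 might look roughly like this. It is only a sketch, assuming the imported files are selected in DEVONthink Pro and that their comment holds the source URL. It relies on the application's "selection" property and on records having "comment" and "URL" properties, which should be checked against the scripting dictionary of your version.

```applescript
-- Sketch: copy the comment of each selected record into its URL field.
tell application "DEVONthink Pro"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set theComment to comment of theRecord
		-- Only touch records whose comment looks like a web address.
		if theComment starts with "http" then
			set URL of theRecord to theComment
		end if
	end repeat
end tell
```

Records whose comment is empty or holds ordinary notes are left untouched.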

Another possibility - open the “Objects” drawer and run this script:


-- Add Scanner Objects to DEVONthink Pro's downloads.
-- Created by Christian Grunenberg on Wed Oct 04 2006.
-- Copyright (c) 2006. All rights reserved.

tell application "DEVONagent"
	activate
	try
		if not (exists browser 1) then error "No browser windows are open."
		
		set theBrowser to browser 1
		set theObjects to scanner objects of theBrowser
		if theObjects is not missing value then
			set theReferrer to (URL of theBrowser) as string
			repeat with theObject in theObjects
				tell application "DEVONthink Pro" to add download |URL| of theObject referrer theReferrer
			end repeat
		end if
	on error error_message number error_number
		if the error_number is not -128 then
			try
				display alert "DEVONagent" message error_message as warning
			on error number error_number
				if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1
			end try
		end if
	end try
end tell

Note: Due to a bug in DEVONagent’s scripting, this might not work as expected, but v2.0.3 (coming this week) will fix this.

Christian, this is wonderful. However, I ran into one problem: it downloads EVERYTHING on the page, not just what is in the objects drawer, or those objects which are selected. So, for example, if I have only “documents” selected, I’d expect the documents to go to the DT download window; instead, everything gets sent.

Any way to modify this? Thanks.

That’s the bug which will be fixed by v2.0.3.

I’m new to DT and own DT3 Pro, but I’m confused about how to download a PDF from Chrome directly to DT. For example, if I’m at my credit card site and I see the statement PDF in the browser, how can I get this PDF into DT? When I use the clipper, it saves a PDF of the page URL, which is unauthenticated in DT, unlike the actual PDF file being shown in Chrome. Seeing as this thread is 13 years old, I’m trying to see how I can download a PDF directly to DT from Chrome in 2019. If anyone can help, I’d really appreciate it.

You could either save the PDF file to the inbox folder or print it to DEVONthink 3. Please note that you might have to install the inbox folder and the PDF services via DEVONthink 3 > Install Add-Ons…
