Get the URLs of multiple pictures from the source code

Hi again!

I solved my earlier problem with capturing HTML pages in DTP, so now I have my next task to solve :slight_smile:

Let’s say that I have a picture named agkjkgjsdkgds.jpg, which I downloaded as an embedded image from an HTML page, test.html, whose URL is thisisatest.com/intro/test.html

What I want is an AppleScript that searches the source code of multiple HTML files and, if it finds the picture named agkjkgjsdkgds.jpg, takes the URL of the HTML file the picture was downloaded from.

So the goal is to have, for example, 1000 HTML files with any names, search through all 1000 embedded pictures, and copy the URL for each embedded picture it finds in the source code :smiley:

I would really appreciate help with writing the code for this :smiley:

Thank you so much in advance!

Just wondering, but why don’t you set the URL right after the download? Or do you only download the HTML page, not the embedded image?

I download only the HTML pages, and then I download the embedded pictures inside the HTML pages.

But the thing is that the picture’s link goes to thisisatest.com/fm/agkjkgjsdkgds.jpg, and I don’t want that URL. I want the URL of the page it originally came from. I hope you understand what I mean, Christian? :smiley:

Setting the URL of the captured HTML page to the original page’s URL should be sufficient.

The HTML page uses complicated JavaScript, but my solution would be to find the picture’s name in the source of the HTML page. I know exactly what to do, I just need some code to get it to work :smiley:

Maybe the “source” property and the “get embedded images” command might help, see AppleScript suite.
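To illustrate that suggestion, here is a minimal sketch that combines the "source" property with the "get embedded images" command and logs only the image URLs containing a given file name. The property imageName is a placeholder taken from the question, and the sketch assumes that each selected record is a captured HTML page whose URL is the original page address. It is a starting point, not a finished solution.

```applescript
-- Sketch only: imageName is a placeholder from the question above.
property imageName : "agkjkgjsdkgds.jpg"

tell application id "DNtp"
	repeat with thisRecord in (selection as list)
		-- The HTML markup of the captured page
		set theSource to source of thisRecord
		-- Resolve relative image paths against the page's own URL (assumed to be the original address)
		set theImages to get embedded images of theSource base URL (URL of thisRecord)
		repeat with thisImage in theImages
			-- Log only the image URLs that mention the file name
			if (thisImage as string) contains imageName then log (thisImage as string)
		end repeat
	end repeat
end tell
```

Run it from Script Editor with the HTML records selected; the matches appear in the log pane.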

I see, good point. But I have already downloaded the pictures, I just need the reference URL for the pictures.

I’m not an AppleScripter. I have tried to teach myself a bit about scripting with DTP and will give it a try to see if I can solve this.

But I think I could use some help to get this to work perfectly :slight_smile:

Where can I find the AppleScript suite?
Is it a book or a reference from DTP?

Just drag and drop DEVONthink 3 onto Script Editor.app. After launching, the editor will display DEVONthink’s complete AppleScript dictionary.

Aha, nice one. I had no idea you could do this to see all of an app’s commands. Very nice :smiley:

That’s the beauty of AppleScript :slight_smile:

Hi again Christian!

I have been trying all day to write an AppleScript for this, but my code just fails. So I really need your help to correct it.

The code I have written is the following:

tell application id "DNtp"
set theSelection to the selection
if theSelection is not {} then
try
show progress indicator "Converting to url..." steps ( count of theSelection)
repeat with theRecord in theSelection
set theName to name of theRecord
--set theURL to URL of theRecord
set thehtmlsource to get source of theRecord as string
set theimagesource to URL of theRecord as string
set imageurl to get URL of theRecord as string
step progress indicator theSelection
if thehtmlsource contains theimagesource then
set theimagesource to (imageurl of theRecord where theimagesource of theRecord contains ".jpg") as string
else
display alert "Devonthink Pro" message "False:" & " The selection doesn't contain any html document with embedded images with the same name"
end if
end repeat
hide progress indicator
on error error_message number error_number
hide progress indicator
if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
end try
end if
end tell

I would be so grateful if you could help me solve this :smiley:

Any attempt at learning is a good attempt!

One thing to note: This just gets the URL of the file itself.

set theimagesource to URL of theRecord as string
set imageurl to get URL of theRecord as string

Yeah, that’s totally correct. I see that now :stuck_out_tongue_winking_eye:

I forgot to say before that I could also store the URL as a Spotlight comment or keyword, if it’s difficult to change the URL of an item :slight_smile:
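As a rough sketch of the comment idea: the script below assumes the captured HTML records and the already downloaded picture records are selected together, that each HTML record's URL is the original page address, and that a picture's name appears literally in the page source (which may not hold if the name in DEVONthink lacks the file extension or the site rewrites file names). It relies on the record properties "type", "source", "URL" and "comment" as shown in the DEVONthink 3 dictionary; other versions may name them differently.

```applescript
-- Sketch under the assumptions stated above; select the HTML pages AND the pictures.
tell application id "DNtp"
	set theRecords to selection as list
	repeat with thisImage in theRecords
		if (type of thisImage) is picture then
			set imageName to name of thisImage
			repeat with thisPage in theRecords
				if (type of thisPage) is html then
					-- Plain text search for the picture's name in the page markup
					if (source of thisPage) contains imageName then
						-- Store the page address in the picture's comment instead of changing its URL
						set comment of thisImage to URL of thisPage
						exit repeat
					end if
				end if
			end repeat
		end if
	end repeat
end tell
```

With 1000 pages this nested loop rereads each page's source for every picture, so it may be slow; caching the sources in a list first would be one way to speed it up.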

Scraping web content isn’t a simple matter, so coding this will not be a 100% solution.
This is a simple example…

property myURL : "https://www.automobilesreview.com/pictures/lamborghini/veneno/"

tell application id "DNtp"
	repeat with thisRecord in (selection as list)
		set docSource to source of thisRecord -- This is the HTML markup of the web page.
		get links of docSource type "JPG" base URL myURL -- This is working with the markup from the page.
	end repeat
end tell
--> {"https://www.automobilesreview.com/gallery/lamborghini-veneno/lamborghini-veneno-01.jpg", "https://www.automobilesreview.com/gallery/rinspeed-tatooocom/rinspeed-tatooocom-10.jpg"…}
-- Output truncated for clarity

If you could export one of your HTML pages and post it here plus its URL it would be a lot easier to get the script going.

Sure. I’m a big fan of DeviantArt, and the exported HTML page is:

urlsite.zip (285.1 KB)

What I’m trying to do is download the embedded image and then store the referrer URL (https://www.deviantart.com/euderion/art/Space-Rose-Stock-798280548) as a Spotlight comment or keyword for the image, instead of:
https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com/f/14742e20-ac3e-4580-b025-163d5b8c6575/dd79x7o-e69b51f9-5df9-4f13-bcae-397053ff3725.jpg/v1/fill/w_1264,h_632,q_70,strp/space_rose__stock__by_euderion_dd79x7o-pre.jpg?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsIm9iaiI6W1t7ImhlaWdodCI6Ijw9ODAwIiwicGF0aCI6IlwvZlwvMTQ3NDJlMjAtYWMzZS00NTgwLWIwMjUtMTYzZDViOGM2NTc1XC9kZDc5eDdvLWU2OWI1MWY5LTVkZjktNGYxMy1iY2FlLTM5NzA1M2ZmMzcyNS5qcGciLCJ3aWR0aCI6Ijw9MTYwMCJ9XV0sImF1ZCI6WyJ1cm46c2VydmljZTppbWFnZS5vcGVyYXRpb25zIl19.lgp49EsNZN-Wkhhp17ibDr54aRNFvILGxKoqZZ9NaLs

Thanks for all the help :smiley:

Well, this script can download all embedded images of the page (requires a valid URL). However, there are hundreds of embedded images in the source and this simple approach probably won’t work. Efficiency doesn’t seem to be a priority of this website - the size of the page is 1.7 MB :slight_smile:

tell application id "DNtp"
	repeat with thisRecord in (selection as list)
		set theURL to URL of thisRecord
		set theName to name of thisRecord
		set theSource to source of thisRecord
		set theEmbeddedImages to get embedded images of theSource base URL theURL
		
		repeat with thisEmbeddedImage in theEmbeddedImages
			if thisEmbeddedImage begins with "https://images-wixmp" then
				set thisImage to create record with {name:theName, type:picture, URL:theURL} in (parent 1 of thisRecord)
				set theData to download URL thisEmbeddedImage referrer theURL
				set data of thisImage to theData
			end if
		end repeat
	end repeat
end tell

Oh this is a great thread. I’ve been meaning to make a script to scrape markdown files, download online images from them, store them in an ‘=images’ folder, and re-write them with the x-devonthink: url. This is a great start for this, thank you!

Thank you so very much, Christian!

This script was exactly what I was looking for, and you have definitely saved my day with this :smiley: