Search through multiple source files for a keyword

Hi!

I’m trying to write a script that searches through multiple .html files and tells me whether it finds any of the words I’m looking for in the source code, or lets me download the file it finds.
I really need help getting this right.

This is the script I have worked on so far (I’m very new to this):

property search_string : ""

tell application id "com.devon-technologies.thinkpro2"
	activate
	set theSelection to the selection
	if theSelection is not {} then
		set downloadstate to false
		set downloadStatus to false
		set downloadcheck to false
		try
			repeat
				set search_string to display name editor "Type any keywords you want to search after"
				if search_string is not "" then exit repeat
			end repeat
			
			show progress indicator "Searching links to download......" steps (count of theSelection) with cancel button
			
			repeat with theRecord in theSelection
				set this_URL to the URL of theRecord
				set this_source to the source of theRecord
				set these_links to get text of this_source --base URL this_URL
				step progress indicator theSelection
				
				show progress indicator "Searching links to download..." steps (count of theRecord) with cancel button
				repeat with this_link in these_links
					if the this_link contains "<a href=" or the this_link contains "<img src=" or the this_link contains "title=" or the this_link ends with search_string or the this_link contains "<img width=" or the this_link contains "src=" then
						set downloadStatus to false
						set downloadstate to false
						set downloadcheck to true
					end if
					
					if downloadcheck is true then
						if the this_link contains "<a href=" or the this_link contains "<img src=" or the this_link contains "title=" or the this_link ends with search_string or the this_link contains "<img width=" or the this_link contains "src=" then
							display dialog "found!"
							--add download this_link referrer this_URL without automatic
						else
							if downloadcheck is false then
								step progress indicator (name of theRecord) as string
							end if
						end if
					end if
					
					
				end repeat
				
				if cancelled progress then
					hide progress indicator
					return
				end if
			end repeat
			hide progress indicator
			
			
			if downloadstate is false then
				display dialog "Couldn't find any files containing the keywords"
			end if
			
		on error error_message number error_number
			hide progress indicator
			if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
		end try
	end if
end tell

Ideally I would get an option to choose whether to download a file (if the script finds any source that can be downloaded) or just be told that it found the keyword and on which page, because this will be run across multiple HTML files. Can someone please help me with this?

If you want to write new, better code, you are very welcome to do that.
Thanks in advance, it’s really appreciated :slight_smile:

A simple solution could just use the “get links of” command and its parameters:


tell application id "com.devon-technologies.thinkpro2"
	try
		set theSelection to the selection
		if theSelection is {} then error "Please select some documents."
		repeat
			set search_string to display name editor "Type the string you want to search for"
			if search_string is not "" then exit repeat
		end repeat
		
		show progress indicator "Searching links to download......" steps (count of theSelection) with cancel button
		
		repeat with theRecord in theSelection
			set this_URL to the URL of theRecord
			set this_source to the source of theRecord
			
			-- All links with a matching URL
			set all_links to get links of this_source base URL this_URL
			repeat with this_link in all_links
				if this_link contains search_string then add download this_link referrer this_URL without automatic
			end repeat
			
			-- All links with a matching description
			set these_links to get links of this_source base URL this_URL containing search_string
			repeat with this_link in these_links
				add download this_link referrer this_URL without automatic
			end repeat
			
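			-- Advance the progress indicator once per processed record
			step progress indicator (name of theRecord) as string
			if cancelled progress then exit repeat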
		end repeat
		hide progress indicator
		
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

But parsing the complete HTML source code on your own and handling images too would require much more work.
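To give a rough idea of what parsing the source yourself involves, here is a minimal sketch using plain AppleScript text item delimiters. The handler name `sourceAttributeValues` is hypothetical and not part of DEVONthink. It is deliberately naive: it assumes double-quoted attributes, picks up every `src="..."` value (scripts and frames included, not only images), and does not resolve relative URLs.

```applescript
-- Hypothetical helper: collect the values of all src="..." attributes in an HTML string.
-- Naive on purpose: double-quoted attributes only, no relative-URL resolution.
on sourceAttributeValues(theSource)
	set oldDelimiters to AppleScript's text item delimiters
	set theValues to {}
	try
		-- Split the source at every occurrence of src="
		set AppleScript's text item delimiters to "src=\""
		set theChunks to text items of theSource
		-- Each chunk after the first begins with an attribute value that ends at the next quote
		set AppleScript's text item delimiters to "\""
		repeat with i from 2 to (count of theChunks)
			set thisChunk to item i of theChunks
			if thisChunk is not "" then set end of theValues to text item 1 of thisChunk
		end repeat
	end try
	-- Always restore the delimiters, even if something went wrong above
	set AppleScript's text item delimiters to oldDelimiters
	return theValues
end sourceAttributeValues
```

Inside the `tell application` block the handler would have to be called as `my sourceAttributeValues(this_source)`, and each returned value could then be compared against search_string in the same way as the links above.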

Thank you very much for the script, Christian :slight_smile:
But I wonder if you could help me extend this script further, so that I get a dialog saying that the string was found and in which file (if I have a multiple selection), with an option to download it too. That way I can choose whether I only want to know that the string was found or also want to download the link.

Thanks in advance for the help :slight_smile:

This could look like this:


tell application id "com.devon-technologies.thinkpro2"
	activate
	try
		set theSelection to the selection
		if theSelection is {} then error "Please select some documents."
		repeat
			set search_string to display name editor "Type the string you want to search for"
			if search_string is not "" then exit repeat
		end repeat
		
		show progress indicator "Searching links to download......" steps (count of theSelection) with cancel button
		
		repeat with theRecord in theSelection
			set this_URL to the URL of theRecord
			set this_source to the source of theRecord
			
			-- Get all links with a matching description
			set these_links to get links of this_source base URL this_URL containing search_string
			
			-- Add all links with a matching URL
			set all_links to get links of this_source base URL this_URL
			repeat with this_link in all_links
				if this_link contains search_string then set these_links to these_links & this_link
			end repeat
			
			repeat with this_link in these_links
				display alert ((name of theRecord) as string) message this_link as warning buttons {"Cancel", "Skip", "Download"} default button 3
				set this_button to the button returned of the result
				if this_button is equal to "Download" then
					add download this_link referrer this_URL without automatic
				else if this_button is equal to "Cancel" then
					error number -128
				end if
			end repeat
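			-- Advance the progress indicator once per processed record
			step progress indicator (name of theRecord) as string
			if cancelled progress then exit repeat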
		end repeat
		hide progress indicator
		
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

Yes, this is exactly what I was looking for. Thank you very much for the script, Christian :slight_smile:

Hi again Christian!

I wonder if you could expand this script so it’s possible to use regex syntax in the search string.

I also wonder if you could add one more button to the search results dialog that goes to the page found by the search.

Thanks

Neither AppleScript nor DEVONthink’s commands support regular expressions, therefore this would be a major revision.
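As a rough sketch of where such a revision could start, one possible workaround is to hand the matching to the command-line tool grep via the standard `do shell script` command. The handler name `matchesRegex` is hypothetical, and the sketch assumes the text is small enough to be passed on a shell command line; very large HTML sources may exceed that limit.

```applescript
-- Hypothetical helper: true if theText matches the extended regular expression thePattern.
-- Assumes /usr/bin/grep is available (it ships with macOS).
on matchesRegex(theText, thePattern)
	try
		-- grep -q prints nothing and only reports the result through its exit status
		do shell script "printf '%s' " & quoted form of theText & " | /usr/bin/grep -E -q -e " & quoted form of thePattern
		return true
	on error
		-- A non-zero exit status (no match or an invalid pattern) raises an AppleScript error
		return false
	end try
end matchesRegex
```

Inside a `tell application` block the handler would be called as `my matchesRegex(this_link, search_string)` in place of a `contains` test. Note that grep matches line by line, so a pattern cannot span line breaks in the source.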

Unfortunately only up to 3 buttons are supported per alert. However, this script selects the currently processed item before displaying the alert:


tell application id "com.devon-technologies.thinkpro2"
	activate
	try
		set theWindow to viewer window 1
		set theSelection to the selection of theWindow
		if theSelection is {} then error "Please select some documents."
		repeat
			set search_string to display name editor "Type the string you want to search for"
			if search_string is not "" then exit repeat
		end repeat
		
		show progress indicator "Searching links to download......" steps (count of theSelection) with cancel button
		
		repeat with theRecord in theSelection
			set this_URL to the URL of theRecord
			set this_source to the source of theRecord
			
			-- Get all links with a matching description
			set these_links to get links of this_source base URL this_URL containing search_string
			
			-- Add all links with a matching URL
			set all_links to get links of this_source base URL this_URL
			repeat with this_link in all_links
				if this_link contains search_string then set these_links to these_links & this_link
			end repeat
			
			repeat with this_link in these_links
				set the selection of theWindow to {theRecord}
				display alert ((name of theRecord) as string) message this_link as warning buttons {"Cancel", "Skip", "Download"} default button 3
				set this_button to the button returned of the result
				if this_button is equal to "Download" then
					add download this_link referrer this_URL without automatic
				else if this_button is equal to "Cancel" then
					error number -128
				end if
			end repeat
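			-- Advance the progress indicator once per processed record
			step progress indicator (name of theRecord) as string
			if cancelled progress then exit repeat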
		end repeat
		hide progress indicator
		
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

Thanks for the fast reply and the script, Christian.

One thing I can’t work out is whether the search string is limited to a certain number of characters.
I mean, when I try to search for a sentence like:

for example, which I know exists in one of the HTML files, the script doesn’t find it. It doesn’t find schema.org or schema either. So something is not right when I use the script.

That is why I asked for regex, but I wonder if this script could be fixed without regex so that I could search for a long sentence, for example. Is that possible with your script, Christian?

When I googled around I found this:
satimage.fr/software/downloa … age389.pkg, which is an AppleScript plugin that supports regex, but I don’t know how to use it. Maybe it could be used in some way to get this to work :wink:

Thanks

The current script is just matching the links (and adding found links to the download manager on demand).

Matching the HTML source is possible but that’s probably a completely different script. What do you want to do in the end, e.g. view the HTML page?

Ah, I see. Yes, it would be great if I could view the HTML page when the string is found in the source code, or have a choice, as the script offers now, to skip to the next result if I get more than one :smiley:

thanks

This script should work as desired:


tell application id "com.devon-technologies.thinkpro2"
	activate
	try
		set theWindow to viewer window 1
		set theSelection to the selection of theWindow
		if theSelection is {} then error "Please select some documents."
		
		repeat
			set search_string to display name editor "Type the string you want to search for"
			if search_string is not "" then exit repeat
		end repeat
		
		show progress indicator "Scanning HTML..." steps (count of theSelection) with cancel button
		
		repeat with theRecord in theSelection
			set this_URL to URL of theRecord
			set this_source to the source of theRecord
			if this_source contains search_string then
				display alert ((name of theRecord) as string) message this_URL as warning buttons {"Cancel", "Skip", "View"} default button 3
				set this_button to the button returned of the result
				if this_button is equal to "View" then
					set the selection of theWindow to {theRecord}
				else if this_button is equal to "Cancel" then
					error number -128
				end if
			end if
			step progress indicator (name of theRecord) as string
			if cancelled progress then exit repeat
		end repeat
		hide progress indicator
		
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

Thank you very much for the script, Christian. It works like a charm :smiley: :smiley: :smiley: