Hi,
I am trying to use an AppleScript I’ve cobbled together to deal with a mass of data. I have a bunch of blog posts that I’m trying to make sense of. They were all downloaded at once, so every creation date is the download date rather than the post date, and the tag and author data are embedded in the HTML, where I can’t easily act on them in DT. Basically, the script:
- looks at each item in the selection (I am selecting only HTML files that I’ve found with a smart group)
- renames the file with the Web Page Title (code for this section lifted from the DT Add-on script)
- then combs through the HTML and:
a) extracts the date embedded in the HTML, formats it as an AppleScript date, then changes the date of the DT Item
b) extracts the tags embedded in the HTML, formats them as an AppleScript list, then adds the tags to the DT item
c) extracts the author name embedded in the HTML, then adds it as a comment to the DT item.
d) changes the label so I know the file has been processed.
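Steps a) through c) all follow the same pattern: split the source at a start marker, then cut the remainder at an end marker. A minimal sketch of that pattern, assuming a hypothetical handler named textBetween (it is not part of the script in this post) and a made-up sample string:

```applescript
-- Hypothetical helper: returns the text between the first startMarker
-- and the endMarker that follows it. Errors if startMarker is missing.
on textBetween(theText, startMarker, endMarker)
	set savedDelimiters to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to {startMarker}
		set afterStart to text item 2 of theText
		set AppleScript's text item delimiters to {endMarker}
		set foundText to text item 1 of afterStart
	on error errMsg number errNum
		-- Put the delimiters back before passing the error along.
		set AppleScript's text item delimiters to savedDelimiters
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedDelimiters
	return foundText
end textBetween

-- Made-up sample input, shaped like the date markup described above.
set sampleSource to "<li class=\"time\"><a href=\"#\">January 5 2010</a></li>"
set postDate to textBetween(sampleSource, "<a href=\"#\">", "</a>")
```

Because the handler always restores the delimiters, a page with missing markup cannot leave them in a changed state for the next item.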
There are two interrelated problems:
Problem #1: The script works fine on one or a few items, but when I try to process all 50,000+ files it bogs the entire computer down. DT runs at 99.5% of CPU, and each file takes 45 seconds or more to process. At that rate, the full set would need at least 26 days of nonstop processing.
Problem #2: I am very inexperienced with AppleScript, so I’m sure a big part of the problem lies therein.
I’m guessing that the computer is bogging down simply on the list of 50,000+ DT items. AppleScript wizards: is there a way to restructure the script so that it gets a smaller chunk (say 20) of the DT items, processes them, then goes back to get another small chunk, until it’s done with all the items in the list?
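To make the chunking idea concrete, here is a rough sketch in plain AppleScript with no DEVONthink calls. The names batchSize, processInBatches and processOneItem are all hypothetical, and processOneItem is only a stand-in that counts items. Whether batching actually cures the slowdown is an open question; this only shows the loop structure.

```applescript
property batchSize : 20 -- hypothetical chunk size
property processedCount : 0
property batchCount : 0

-- Stand-in for the real per-item work (rename, date, tags, author).
on processOneItem(oneItem)
	set my processedCount to (my processedCount) + 1
end processOneItem

-- Walks the list in slices of at most batchSize items.
on processInBatches(allItems)
	set totalCount to count of allItems
	set batchStart to 1
	repeat while batchStart <= totalCount
		set batchEnd to batchStart + batchSize - 1
		if batchEnd > totalCount then set batchEnd to totalCount
		set thisBatch to items batchStart thru batchEnd of allItems
		repeat with oneItem in thisBatch
			my processOneItem(contents of oneItem)
		end repeat
		set my batchCount to (my batchCount) + 1
		set batchStart to batchEnd + 1
	end repeat
end processInBatches

-- Usage with 45 dummy items: three batches of 20, 20 and 5.
set sampleItems to {}
repeat with i from 1 to 45
	set end of sampleItems to i
end repeat
processInBatches(sampleItems)
```

In the real script the dummy list would be the selection, and the body of processOneItem would hold the existing per-item steps.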
Problem #2.5: What if there’s an HTML file in there somewhere that can’t be processed according to the rules? I’m not sure I’m handling the error correctly. I want the script to just skip it and move to the next one.
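The skip-and-continue behaviour I’m after would look something like the sketch below: a try block inside the loop, so an error ends only the current item. The sample strings and the skippedItems list are invented for illustration, and the "parsing" is reduced to a single check.

```applescript
-- Hypothetical record of what was skipped and why.
set skippedItems to {}
set handledItems to {}

repeat with oneText in {"good page", "", "another good page"}
	try
		set pageText to contents of oneText
		-- Stand-in for real parsing: treat an empty page as unparseable.
		if (length of pageText) is 0 then error "nothing to parse"
		set end of handledItems to pageText
	on error errMsg number errNum
		-- Restore the default delimiters in case parsing changed them,
		-- note the failure, and let the loop carry on with the next item.
		set AppleScript's text item delimiters to {""}
		set end of skippedItems to {errMsg, errNum}
	end try
end repeat
```

Keeping a list of skipped items makes it possible to review the failures afterwards instead of losing them in the log.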
Thanks so much.
Here’s the code:
property datetagbeginning : "<li class=\"time\"><a href=\"#\">"
property datetagend : "</a></li>
"
property tagsectionbeginning : "<h4>Tags</h4>"
property tagsectionend : "</div>"
property eachtagbegins : "\">"
property eachtagends : "</a>"
property authorsectionbeginning : "<li class=\"author\">"
property authorsectionend : "</a></li>"
property rightbeforeauthorname : "\">
"
tell application id "com.devon-technologies.thinkpro2"
try
set this_selection to the selection
set this_count to count of this_selection
if this_count > 0 then
show progress indicator "Renaming" steps this_count
repeat with this_item in this_selection
try
set this_type to the type of this_item
set this_source to missing value
step progress indicator (name of this_item) as string
if this_type is equal to html or this_type is equal to webarchive then
set this_source to source of this_item
set this_title to get title of this_source
set the name of this_item to this_title
set originalDelimiters to AppleScript's text item delimiters
set theContents to this_source -- reuse the source fetched above instead of asking DEVONthink for it again
set AppleScript's text item delimiters to {datetagbeginning}
--Split the source at datetagbeginning
--Text item 1 is just the text before the first occurrence, so take text item 2
set theItem to text item 2 of theContents
set AppleScript's text item delimiters to {datetagend}
set postDate to text item 1 of theItem
set AppleScript's text item delimiters to originalDelimiters
set theMonth to word 1 of postDate
set theDate to word 2 of postDate
set theYear to word 3 of postDate
set theHour to word 4 of postDate
set theMinute to texts 1 thru 2 of word 5 of postDate
set AMPM to texts 3 thru 4 of word 5 of postDate
set postDateTime to current date
--Start from day 1 so that changing the month cannot roll over when today is the 29th, 30th or 31st
set the day of postDateTime to 1
set the year of postDateTime to (theYear as integer)
set monthNames to {"January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"}
set monthConstants to {January, February, March, April, May, June, July, August, September, October, November, December}
repeat with i from 1 to 12
if theMonth is equal to item i of monthNames then
set the month of postDateTime to item i of monthConstants
exit repeat
end if
end repeat
set the day of postDateTime to (theDate as integer)
--12 AM is hour 0 and 12 PM is hour 12, so reduce the hour modulo 12 before adding the PM offset
set theHourInt to (theHour as integer) mod 12
if AMPM is equal to "PM" then set theHourInt to theHourInt + 12
set the hours of postDateTime to theHourInt
set the minutes of postDateTime to theMinute
set the seconds of postDateTime to 0
set creation date of this_item to postDateTime
set theContents to this_source -- reuse the source fetched earlier
set AppleScript's text item delimiters to {tagsectionbeginning}
--Split the source at tagsectionbeginning
--Text item 1 is just the text before the first occurrence, so take text item 2
set chunk1 to text item 2 of theContents
set AppleScript's text item delimiters to {tagsectionend}
set chunk2 to text item 1 of chunk1
set AppleScript's text item delimiters to {eachtagbegins}
set theItems to text items 2 thru -1 of chunk2 -- every piece after the first one begins with a tag name
set serialArray to tags of this_item
set AppleScript's text item delimiters to {eachtagends}
repeat with nextItem in theItems
set serialArray to serialArray & first text item of nextItem
end repeat
set tags of this_item to serialArray
set theContents to this_source -- reuse the source fetched earlier
set AppleScript's text item delimiters to {authorsectionbeginning}
--Split the source at authorsectionbeginning
--Text item 1 is just the text before the first occurrence, so take text item 2
set authorchunk1 to text item 2 of theContents
set AppleScript's text item delimiters to {authorsectionend}
set authorchunk2 to text item 1 of authorchunk1
set AppleScript's text item delimiters to {rightbeforeauthorname}
set authorname to text item 2 of authorchunk2
set AppleScript's text item delimiters to originalDelimiters -- restore before the next item
set comment of this_item to ("[Author: " & authorname & "]")
set label of this_item to 2
end if
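on error errMsg number errNum from obj to newClass
--Skip this item, but restore the default delimiters first so the next item starts clean
set AppleScript's text item delimiters to {""}
log {errMsg, errNum, obj, newClass} -- Display the error details in the log window.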
end try
end repeat
set AppleScript's text item delimiters to {""} -- restore the default delimiters
hide progress indicator
end if
on error error_message number error_number
hide progress indicator
if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
end try
end tell