Batch processing Firefox bookmarks and re-creating them in DT

My problem

After realizing that importing bookmarks from Firefox doesn’t preserve their creation and modification dates, I tried to come up with my own solution.

A solution

  • First I back up my Firefox bookmarks. This creates a JSON file.
  • Next I process this file with a Python script (see below for a first draft).

With the Python script I can generate whatever format is necessary to re-create the bookmarks in DEVONthink. The script acts as a “code generator” that emits AppleScript (or JXA) code.

In a first run I will create locations with lines like these (JavaScript for Automation):

app = Application('DEVONthink 3');
app.includeStandardAdditions = true;


loc = app.createLocation("/Firefox/PROJECTS");
// next: set the creation and modification date for this `record`

loc = app.createLocation("/Firefox/PROJECTS/Research Python Pandas");
// again: set the creation and modification date for this `record`

// ... followed by hundreds more lines like these;
// this file will be generated by my Python script
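
A minimal sketch of how the Python side could emit such lines. The function name `jxa_create_locations` and the `root` parameter are made up for illustration, and the sample paths only imitate the keys the script below collects in `FOLDERS_AND_BOOKMARKS`. It relies on `json.dumps` to produce a double-quoted, escaped string literal that JavaScript accepts as well.

```python
import json

def jxa_create_locations(folder_paths, root="/Firefox"):
    """Return JXA source that creates one DEVONthink location per folder path."""
    lines = [
        "app = Application('DEVONthink 3');",
        "app.includeStandardAdditions = true;",
        "",
    ]
    for path in folder_paths:
        # json.dumps escapes quotes, backslashes and non-ASCII characters
        lines.append(f"loc = app.createLocation({json.dumps(root + path)});")
    return "\n".join(lines)

# Hypothetical folder paths, each starting with "/" like the keys built by process_json()
print(jxa_create_locations(["/PROJECTS", "/PROJECTS/Research Python Pandas"]))
```

Setting the dates on each returned record is not covered here.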

In a second step I will be creating lines like this (taken from Jim’s post here):

tell application id "DNtp"
	set urlName to "DEVONtech"
	set urlURL to "https://www.devontechnologies.com"
	create record with {name:urlName, URL:urlURL, type:bookmark} in incoming group
end tell

Questions

  1. Can I create records at “locations”? My folders are given in a format like this: “Firefox/PROJECTS”
  2. Does it make sense to put all “create record” calls inside one “tell application id …” block, or should each “create record” get its own “tell application” block?
  3. Is AppleScript able to process 2000+ “create record” calls, or do I need to add a delay after every operation?

Python script for processing and reformatting Firefox bookmarks

import datetime
import json

# TODO: Read filename from command line
json_filename = "./bookmarks-2020-06-10.json"

def to_datetime(us):
    # Firefox stores dateAdded/lastModified as microseconds since the Unix epoch
    return datetime.datetime.fromtimestamp(us / 1e6).strftime('%Y-%m-%d %H:%M:%S')

# convert json to dict
# https://stackoverflow.com/a/41815530/5115219
with open(json_filename, 'r', encoding="utf-8") as json_file:
    input_dict = json.load(json_file)

# now walk the JSON tree with a recursive generator
# https://stackoverflow.com/a/39016088/5115219

FOLDERS_AND_BOOKMARKS = {}
FOLDERS_DATES = {}
def process_json(json_input, folder_path=""):
    global FOLDERS_AND_BOOKMARKS

    # process dicts
    if isinstance(json_input, dict):
        # we have a dict
        guid = json_input['guid']
        title = json_input['title']
        idx = json_input['index']
        date_added = to_datetime(json_input['dateAdded'])
        last_modified = to_datetime(json_input['lastModified'])

        # do we have a container or a bookmark?
        #
        # is there a "uri" in the dict?
        #    if not, we have a container
        if "uri" in json_input:
            uri = json_input['uri']
            # record the bookmark under its containing folder
            bookmark = [title, uri, date_added, last_modified]
            # setdefault avoids a KeyError if a bookmark sits directly under the root
            FOLDERS_AND_BOOKMARKS.setdefault(folder_path, []).append(bookmark)
            yield bookmark

        elif "children" in json_input.keys():
            # So we have a container (aka folder).
            #
            # CREATE A NEW FOLDER, UNLESS IT EXISTS
            # Check if we are processing root container:
            if title != "": # we are not at the root
                folder_path = f"{folder_path}/{title}"
                if folder_path not in FOLDERS_AND_BOOKMARKS:
                    FOLDERS_AND_BOOKMARKS[folder_path] = []
                    FOLDERS_DATES[folder_path] = {'date_added': date_added, 'last_modified': last_modified}

            # run process_json on list of children
            # json_input['children'] : list of dicts
            yield from process_json(json_input['children'], folder_path)

    elif isinstance(json_input, list):
        # List of dict is passed.
        # (Process children of container.)

        dict_list = json_input
        for d in dict_list:
            yield from process_json(d, folder_path)

# drain the generator and count the bookmarks it yields
counter = 0

for el in process_json(input_dict):
    counter += 1

FOLDERS = list(FOLDERS_AND_BOOKMARKS.keys())
META = list(FOLDERS_DATES.keys())

# TODO: Continue here
# Generate AppleScript code that should be consumed by DT.
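
One way the TODO could continue, as a rough sketch only: turn the 'YYYY-MM-DD HH:MM:SS' strings from `to_datetime()` into AppleScript that builds a date from numeric components, which avoids locale-dependent `date "..."` literals. The helper names (`applescript_date_lines`, `set_dates_snippet`) and the variable names `addedDate` and `modifiedDate` are invented here. That `creation date` and `modification date` can be set on a DEVONthink record is an assumption that should be checked against the scripting dictionary. The generated lines are meant to sit inside a `tell application id "DNtp"` block.

```python
import datetime

def applescript_date_lines(var, stamp):
    """Return AppleScript lines that build the date `stamp` in variable `var`."""
    dt = datetime.datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    return [
        f"set {var} to current date",
        f"set day of {var} to 1",  # avoids month overflow when today is e.g. the 31st
        f"set year of {var} to {dt.year}",
        f"set month of {var} to {dt.month}",
        f"set day of {var} to {dt.day}",
        f"set time of {var} to {seconds}",  # seconds since midnight
    ]

def set_dates_snippet(record_var, date_added, last_modified):
    """Return AppleScript that applies both dates to the record in `record_var`."""
    lines = applescript_date_lines("addedDate", date_added)
    lines += applescript_date_lines("modifiedDate", last_modified)
    # Assumption: both properties are writable in DEVONthink
    lines.append(f"set creation date of {record_var} to addedDate")
    lines.append(f"set modification date of {record_var} to modifiedDate")
    return "\n".join(lines)

print(set_dates_snippet("theRecord", "2020-06-10 12:34:56", "2020-06-11 08:00:00"))
```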

Just curious, but why are the dates that important to you? I can’t remember any requests for this during the last decade.

You first have to create the location; this returns a group, which can then be used to create records.
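
As a rough illustration of that order of operations, the Python generator could emit one `create location` per folder, keep the returned group in a variable, and create that folder's bookmarks in it. This is only a sketch: `as_string`, `render_script`, the `root` parameter and the `theGroup` variable are hypothetical names, and the input imitates the `FOLDERS_AND_BOOKMARKS` dict from the script above.

```python
def as_string(text):
    """Quote text as an AppleScript string literal (escapes backslashes and double quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def render_script(folders_and_bookmarks, root="/Firefox"):
    """Emit a single AppleScript tell block: one group per folder, then its bookmarks."""
    lines = ['tell application id "DNtp"']
    for folder_path, bookmarks in folders_and_bookmarks.items():
        # create location returns the group, which is reused for the records below
        lines.append(f"\tset theGroup to create location {as_string(root + folder_path)}")
        for title, uri, _date_added, _last_modified in bookmarks:
            lines.append(
                f"\tcreate record with {{name:{as_string(title)}, "
                f"URL:{as_string(uri)}, type:bookmark}} in theGroup"
            )
    lines.append("end tell")
    return "\n".join(lines)

# Sample data in the shape [title, uri, date_added, last_modified]
example = {
    "/PROJECTS": [
        ["DEVONtech", "https://www.devontechnologies.com",
         "2020-06-10 12:00:00", "2020-06-10 12:00:00"],
    ],
}
print(render_script(example))
```

The dates are ignored in this sketch; they would have to be set on each record in a separate step.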

This doesn’t make a difference for DEVONthink, but the first approach (a single “tell” block) is definitely more readable/maintainable.

Sure.


The creation date is valuable because it allows me to see the chronology of events and developments.

Thanks for the reply! But wouldn’t a different file format instead of a bookmark make more sense if the chronology is important? E.g. a bookmarked webpage might not exist anymore in the future, or might be a completely different one due to redirects, sold domains, etc.

Once I know how to process the bookmarks, I might save the pages in HTML format.