Interested in your scenarios / experiences around timeline related documents / knowledge (audio / video)

hey community,

I am wondering about DT’s current capabilities and affordances relating to the ever-growing amount of information tied to timeline-based documents, also known as video, film or audio. The question I bring forward is: how can DT be used for this? How does one work constructively with timeline-based documents in DT, and what scenarios, experiences, and takes are already out there in the user base?

This interest is of course practical: a lot of my documents contain information tied to timeline-based documents now, as I am using Vimeo, YouTube, audio ‘notes’, and podcasts (all of which now support timeline annotations, but also ‘chapterization’, both increasingly based on AI automation).

It´s interesting and great that DT already allows for 2 ways of working with timeline-based information, even if this is still rather sporadic.

  1. setting timed links in AV document references, and using them as timecode-linked references within DT
  2. importing chapters from YT into the sidepane (think ‘TOC’) – something that is amazing, if not yet in the documentation

[ see here (for hints on YT TOCs):]

– this rather fragmentary appearance of timeline-based reference seems understandable given there is no obvious / ‘in-use’ standard for timeline metadata and its exchange, now that MPEG-7 seems stalled… but I am also grateful for any hints as to such timeline-based formats of metadata, and what people here are using / accustomed to.

so what I´d like to ask the DT community is for

a) ways in which they work with timeline-documents in and around DT (directly, or indirectly)

b) insights from people working with timeline-based research (AV related research; annotation techniques etc.), and what they can share about systematizing this kind of work in digital research ecologies (including DT :wink: ). Here it would also be fabulous if people shared insights stemming from their work about the state / trajectory of technical formats / interoperability / standardizations of timeline-based ‘metadata’ (annotations, chapters, summaries etc.)

thanks already for sharing workflows, experiences, knowledges!


PS: see here for conceptual context / run-up to this post:


I will take a stab at this based on my own experiences and current perspective (which may evolve). Everything that is said about video below also pertains to audio.

DevonThink as the backend of the knowledge ecosystem

As a starting point I think we need to appreciate that DevonThink does a large amount of things incredibly well. Then it offers additional functionality that can address many more specific use cases adequately although there may be some third-party app that could address that use case even better. A simple example here would be PDFs, where most editing functionality is available in DT but e.g. bookmarks require a third-party application. Finally, there are things it just can’t do, like editing a MindManager map.

The good thing is that we can simultaneously

  1. leverage DevonThink for its many strengths in database management and automation
  2. rely on it for those more (file-)specific use cases it does support
  3. use third-party applications for certain specialized / niche use cases that it doesn’t cover (via opening externally or indexing)

Quick side note: Whether officially supported or not, I just want to point out that there are no issues in my experience with importing and/or indexing large media libraries in DevonThink. My personal main database is upwards of 2.5TB now and running smoothly on an M1 Macbook Pro with 32GB. Even on a first-generation M1 with 16GB (at that time with a 1.5TB database) the performance was absolutely adequate. Everything is shallow synced to iPad / iPhone via Bonjour without issues.

What DT can and can’t do with video-based contents

From my perspective, working with video in DT sits somewhere between the second and third category i.e. basic workflows are possible, which is great, but there are limits. For example, being able to create links to specific sections in a video and then add related notes is a great foundation. I typically use this when I need to jot down some quick ad-hoc notes for videos that are “standalone” and relatively short.

There are other cases in which the individual video is part of a collection where the whole is greater than the sum of its parts (e.g. a set of academic lectures that is part of an online course). Here, the question is how to really consolidate all this information in order to integrate it as actual understanding, ensure it’s possible to refer back to relevant sections quickly in the future and make it practically useful.

When it comes to this second use case, there are certain limits to what DevonThink can do. Some examples:

  1. There is no way to excerpt a part of a video, i.e. to create shorter “clips” from the full video
  2. There is no way to consolidate excerpts from a video in video-based form for later review
  3. There isn’t any transcription functionality for video contents, so they aren’t searchable

Basically, I use MarginNote to address 1 & 2 and to address 3. is an online service that transcribes videos accurately at a relatively low cost. I add transcribed text into the video item’s annotations in DevonThink so that it can be found in searches. Not much more to say on that, so let’s focus on MarginNote from here on.

Using MarginNote for advanced analysis and consolidation of video-based contents

If you really need to slice and dice and consolidate complex, content-dense videos I cannot recommend MarginNote enough. As a warning, it does have a very unique logic, an initially confusing/overwhelming user interface and a very steep learning curve. But enduring the early frustrations has been absolutely worth it for me, especially for its benefits working with video-based contents.

The principle in MarginNote is that you have mindmap notebooks, which are linked to as many documents and media files as you like.

I press play on a given video opened in the right-hand document view and when there is a particularly relevant section, I just mark that section with the handle bars. Now a new node in the mindmap is created instantly, which contains the excerpted video section. Clicking this node will play back the excerpted section. More video excerpts can be added to this node, as well as text-based comments. Video excerpts combined in a mindmap node can now be played back in sequence by selecting the node. Also, whenever a node in the mindmap is clicked, the main document view (imagine a split screen) automatically jumps back to the respective video section or paragraph in the source document or media file, so the context for the excerpt is always accessible.

There are all the typical benefits of structuring content snippets in mindmaps (including video-based excerpts without limitations), such as connecting nodes through visual cross-links and bringing them into a tree structure with nodes and subnodes.

Another extraordinarily useful feature is the ability to create so-called “reference nodes”. To lead into this, imagine you’ve created a complete map of a course lecture’s contents based on video excerpts, each under a main node for lecture 1, lecture 2, lecture 3 etc. Now comes the time to consolidate clusters of related excerpts from all the different lectures, which are dispersed across the mindmap. One great way is to select the respective nodes (containing the video excerpts), copy them all as references, scroll to a part of the infinite canvas that is free and paste them there. Now they can again be organized and connected as needed, but at the level of topic clusters (or whatever other grouping logic you wish to use that is different from the main mindmap). Changes made to the reference nodes are synced back to the originals automatically while the structure and organization of the original map remain untouched. The reference nodes also still link back to the original source from which they were excerpted, which remains instantly accessible.

While this is just a glimpse into how I use MarginNote to work with video, I hope this can inspire some further discussion.

Tips on the general setup and using MarginNote in combination with DevonThink

In general, I would recommend not using iCloud sync to synchronize videos managed in MarginNote across devices. You will encounter sync issues that are frustratingly difficult to diagnose / solve / reset. Instead, use the local sync (similar to Bonjour), which is far quicker and reliable for large media files. Note that iCloud sync does work reliably for regular documents in MarginNote, just not videos (with the usual caveats that apply to iCloud in general).

Secondly, MarginNote saves all documents and media files in a folder under User/Documents/MarginNote 3 that can be indexed by DevonThink. However, after indexing there are some unique behaviors to be aware of:

  • MN doesn’t allow renaming or deleting files within the folder via the macOS Finder, and since DevonThink integrates with the Finder, deleting and renaming is also not possible from within DevonThink. More specifically, changes get automatically reversed or there are discrepancies. So, as a general rule, MarginNote’s document management must be opened to make such changes. If this rule is followed, there are no issues.
  • Importantly, (custom) metadata and tags added to an indexed MarginNote item in DevonThink are retained as expected.

Final Thoughts

Let me end by emphasizing that in my view this is not a question of application X versus DevonThink. It’s a matter of leveraging the strengths of each application for those use cases where they excel the most, based on individual needs. This type of optionality is made possible in the first place by DT’s endless flexibility, as well as advanced indexing and automation capabilities.


In general, I don’t. I tend not to watch too much video content on the 'net as it takes way too long to get through it. Life is too short. Even if my consumption of video changes in the future, I can’t see wanting to use DEVONthink to keep anything other than my notes or papers I may write about that video (or referencing it).

The only time-line annotation that I do would be to use the existing date metadata fields (mod, added, etc.). And I have Hazel rules which pick out dates on documents, put that info into file names and tags, and then push the file into DEVONthink.


this is quite succinct, @AW2307 – thanks for that great serve!

I think it is a great initial post, as you really lay out the conditions of DT as (central) part of a ‘knowledge ecosystem’ (a term I quite like and use myself) quite convincingly, and also provide a really relevant framing for any work around timebased media documents with this! it puts things into perspective and sets the landscape quite well!

– just to add and insert my first thoughts:

DevonThink as the backend of the knowledge ecosystem

I am still thinking about whether I would also call DT the ‘backend of my knowledge ecosystem’. surely it does a lot, but it is rather my intelligent go-to keeper and sorter, set up to work well with my file system, blending stuff with found materials (collected from the web) and also allowing some original input (mainly MD notes for me). So, my first associations are ‘clever bucket’, ‘flat database-of-a-sorts’, ‘trickster filing cabinet’, and ‘hyper intelligent library’… I couldn’t yet say it’s my ‘universal backend’, as it does not really provide some things that I regard as central to my ‘knowledge ecosystem’, e.g. allowing for a good representation of my core knowledge structures: it misses schematic aspects (maps, graphs) and generally all kinds of ways to order elements in visual or creative ways or lay them out spatially (in the end I think knowledge lives in ‘cognitive spaces’). this might be nit-picky and just about the metaphorical meaning of ‘backend’; or maybe it speaks to different cognitive styles… and the great thing is that DT allows for myriad ways to make it a central ‘machine’ in one’s system… but I am sure we wouldn’t disagree here…

but to add to your thoughts about what DT actually provides – and also start talking more about my interest in getting AV-documents into the DT-equation / -ecosystem: what DT really provides to my knowledge ecosystem is:

  1. automatic relations to other relevant content / documents, which DT provides in several ways (proposing filing locations, showing linguistically / semantically similar documents, giving interactive concordances of document sets, etc.)

  2. intelligent ‘links’ between different filetypes and in some way across different modes of media/documents (e.g. setting it up the right way, one can bring in image metadata and DT would find similar texts, graphic files etc.; turning highlights into separate text files would be another example of this mode-transition; as would be the different ways in which tags, metadata, and inline text (e.g. hashtags) can be set to relate to each other, and further bring special mechanisms of ‘intelligent-relationing’ to my fingertips; this also goes for filtering, with the example of tags allowing one to drill down semantically within a set of very heterogeneous filetypes and media-modes …)

– this also includes allowing some kind of ‘translation’ (conversion) between these different modes / filetypes, e.g. strong conversion capabilities; currently it allows one to produce personal indexes of videos (via time-based reference) as text files, living alongside and relating to video files; it allows one to preview very heterogeneous sets and to do that in very different sets (via replicants)

  3. a very own kind of ‘hyper-index’: where Pony Notebooks had automatic indexes, DT has: a most powerful tag system (hierarchical; easily administrable; allowing for drill-down; across DBs; transparent to OS X tags; convertible to other metadata systems like IPTC, folder structures etc.); powerful concordance, mentions, and annotations that can reference back (this also goes for timestamps in videos!); TOCs (in the case of video that is the YT-timemarker TOC); on top, parts of this are intelligent to some degree (linguistic analysis) and intermodal (so it works to a large degree across filetypes and ‘modes’); a very fine-grained search language that can leverage (mostly) all of that; … and probably some other trick ponies, all of them amazing in themselves … and potentially of interest for working with AV media (esp. in a mix with other kinds of documents)

… so, this is my account of unique leverage points DT provides for my view on ‚knowledge ecosystem‘ and also what / why I think it´s so convincing for (potential) intelligent use / leverage / digest of video and audio (with specific ‚metadata‘)

What DT can and can’t do with video-based contents

First - to understand: what do you mean here?

– is this related to text-like excerpts or to video (the second of which I think asks a little too much of DT)?

… But then where I think your thoughts are most helpful is in turning your description into a kind of conceptual inventory / typology of what kinds of informational extensions / augmentations exist for video (audio) in principle. this also helps to sort out what can (and can’t) be done in DT, what might be of interest, but also what 3rd-party tools and, even more, what kinds of technical protocols are out there to really work on these, and to clarify what kinds of frictions exist, for DT and in principle.

Positively and very roughly speaking, I see in this typology:

  1. time markers (equivalent to notes; highlights; indexes)

  2. sub-clips (think chapters, and sections of text; like stuff that normally goes under a headline…)

  3. text-stream augmentation (subtitles; transcriptions; voice-over or note stream)

All these are hybrids of text-mode information and specific position(s) in the timeline of a media document.

What I see here is that DT allows for

  1. in a certain form (via timecoded references internal to the DT system; picking up markers in the specific cases of QuickTime, FinalCut annotated exports, but also YT); – big caveat with that: the markers appear in a TOC (great!), but are not searchable (bummer)

  2. nope (only as emulation via the marker system): not really as ‘subclips’, that is, selected parts of a video/audio that start and stop at a certain moment (sometimes, as with chapters, the subclip logic is really transferred into the marker logic, allowing one to jump to the starting point of a chapter (subclip))

  3. these text-stream augments can easily be brought into DT as text, once they exist (e.g. from SRT files); the problem here: I do not see a way in which they would automatically relate back to the time index of the actual AV files they belong to…
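
One way to keep a transcript tied to the timeline is to carry each cue's start time along as a link. The following is a minimal sketch, assuming the subtitles exist as an SRT file and that the target application accepts an item link with a time offset in seconds. The function name, the `link_template` value and the `EXAMPLE-UUID` are hypothetical placeholders, not a confirmed DT link format.

```python
import re

# Matches the start time of an SRT cue line, e.g. "00:01:05,500 --> 00:01:09,000"
CUE_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")


def srt_to_markdown(srt_text, link_template):
    """Turn SRT cues into Markdown lines that link back to each cue's start time.

    link_template is a hypothetical URL pattern containing "{seconds}".
    """
    lines = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = block.strip().splitlines()
        for i, row in enumerate(rows):
            match = CUE_TIME.match(row.strip())
            if match:
                h, m, s, _ms = (int(g) for g in match.groups())
                seconds = h * 3600 + m * 60 + s  # milliseconds are dropped
                text = " ".join(r.strip() for r in rows[i + 1:])
                stamp = f"{h:02d}:{m:02d}:{s:02d}"
                url = link_template.format(seconds=seconds)
                lines.append(f"- [{stamp}]({url}) {text}")
                break
    return "\n".join(lines)


sample = """1
00:00:03,000 --> 00:00:06,000
Welcome to the lecture.

2
00:01:05,500 --> 00:01:09,000
First topic:
timeline metadata.
"""

# Placeholder link pattern (an assumption for illustration only).
print(srt_to_markdown(sample, "x-devonthink-item://EXAMPLE-UUID?time={seconds}"))
```

The resulting text file would live next to the video as a searchable index; whether the links actually jump to the right position depends on the link format the application supports.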

So bringing this into the frame of thinking raised by AW2307 – thinking about the possible interactions between DT, its ‚file specific use-cases‘ (on-board editing capabilities), and other specialized / 3rd-party apps – the big question for me is: what kinds of exchange between DT and 3rd-party-tools are possible, what would be desirable? (– e.g. what about vimeo chapters? What about subclips made in FinalCut or other apps.) And what are the options on the ‚backend’, i.e. given the technical preconditions (formats, standards)?

As I also try to understand the actual technical preconditions in play, it might be noted that @cgrunenberg pointed out that what DT can do is related to what the Apple AVKit provides. But then I can’t really tell what else that would make possible or prohibit in leveraging timeline-related information (metadata) in DT…

  • Then there is the obvious fact that DT loads YT-timemarkers and presents them in the TOC. The technical background here is not clear to me; in this regard it would be interesting to know what technical format / protocol that is based on, and whether this taps into some standards that could also be leveraged for other media contexts / platforms / apps (e.g. Vimeo)
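
For context: on the YouTube side, chapters are derived from a plain list of timestamps in the video description, starting at 0:00. Whether DT reads exactly this source is not confirmed here. The sketch below only shows how such a list could be parsed into (seconds, title) pairs; the function name and the sample description are made up for illustration.

```python
import re

# A chapter line starts with a timestamp like "1:23" or "1:02:03", then a title.
CHAPTER_LINE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s+[-:]?\s*(.+?)\s*$")


def parse_chapters(description):
    """Return a list of (start_seconds, title) for timestamp lines in the text."""
    chapters = []
    for line in description.splitlines():
        match = CHAPTER_LINE.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((start, title))
    return chapters


description = """Lecture 3 recording.

0:00 Intro
12:30 Main argument
1:02:03 Q&A
"""

for start, title in parse_chapters(description):
    print(start, title)
```

A list like this is platform-neutral, so the same pairs could in principle be written out for other players or apps that accept chapter markers.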

Using 3rd-party apps in unison with DT for advanced analysis and consolidation of video-based contents

@AW2307: I wonder what concrete uses in DT can be made out of your comment:

What MarginNote is for @AW2307 is probably Kyno for me (especially since the once industry-leading CatDV has now been dissolved into a company SaaS). The good: Kyno produces flat DBs (via sidecar files) allowing for versatile video annotation, keywording, and also for (annotated) subclips. The downside: the only ways I see this can be exported / linked to the DT ‘ecosystem’ are a) via an Excel/Numbers listing of the subclips, b) exporting thumbnails of markers, c) exporting subclips as separate video files, d) exporting FinalCut XML.

Now a-c are not really keeping references to points on the timeline of the referenced video / audio, obviously. d) export of FinalCut XML would seem to hold the promise of somehow keeping / creating a link between text-based info (markers, subclips etc.) and the actual timeline of the video file. But I do not see how that would work in DT yet and how it might be connected / connectible to the Apple AVkit…

But going via Kyno, I couldn´t even produce a TOC like one can for FinalCut exports.

Then FinalCutProX itself is an interesting point. Related to Christian Grünenberg’s reference to AVKit, it is possible to introduce (or use) subclips that are referenced from within FCPX in DT. They appear just as the YT-markers do (but are also not searchable). Then, irritatingly, markers from within FCPX do not appear in DT. Hm.

The latter behavior cuts out going via another dedicated MAM-like app for video and annotation, which is KeyFlowPro. Thing is, it allows for markers (and ‘keywords’), but not for subclips or chapters. I do not own it, so I cannot test it. But it does not seem helpful for bringing timeline-related info into DT.

There are some more apps that work with annotation / markup of timelines (like ANVIL or ELAN), and of course I’d appreciate others sharing experiences in relation to use / integration with DT…

Also, the whole subscenario of podcasts and chapters (notes etc.) in here, I haven´t yet tested; but might be able to do that later. But of course comments / chip-ins from people who work with podcasts, esp. chapterized ones are welcome…

thanks, @rmschne for still taking the extra-time to comment, even though you seem to state you are not invested or interested in any scenario linked to timeline-linked information in DT. I can see that given the state of things, it might be the better option to meet DT-on its current grounds and ‘go paper (text documents)’. Still I´d like to raise and pursue the option of meeting video/audio/podcasts on theirs (their grounds)

… you are raising a very substantial problem here, that I also think about. and I guess people all give their answer what that means. maybe ‘personality’ is just about these different patterns of dealing w/ this existential challenge …

I see. and also have dipped a little into Hazel. But I do not yet understand, whether you are referring to specific AV-metadata (related to timeline-positions), or rather reiterate the option of using general file info (– simply ‘accidentally’ related to video/audio files – ) and bring that into DT…?

anyways, thanks for putting your voice in here!

I’m not sure what you are asking … what I do is deposit documents, say an invoice, into a folder that Hazel watches. Hazel is set up to look for patterns in the file, in particular the date of the invoice. Once found, Hazel will rename the file with the date suffixed to the basic file name and then move that file into the DEVONthink Global Inbox for import into DEVONthink. (Yes, I can use DEVONthink to look for and extract text patterns in files, but I have Hazel and find it easier to use. Less friction for me.) I documented what I do at DEVONthink and Hazel | Musings on Interesting Things a while back.
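
The rename step can be approximated outside Hazel as well. This is only a sketch in Python of the rule described above, not Hazel's own mechanism: the date pattern (ISO `YYYY-MM-DD`), the function name, and the example file name are assumptions, and the move into the DEVONthink inbox is left out.

```python
import re
from pathlib import PurePath

# Assumed date format for illustration: ISO dates such as 2023-04-17.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def dated_name(filename, document_text):
    """Return filename with the first date found in the text suffixed to its stem."""
    match = ISO_DATE.search(document_text)
    if match is None:
        return filename  # no date found: leave the name unchanged
    path = PurePath(filename)
    return f"{path.stem} {match.group(0)}{path.suffix}"


print(dated_name("invoice.pdf", "Invoice date: 2023-04-17\nTotal: 120.00"))
```

Note that this acts on text inside the document, not on markers or chapters on a media timeline.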

Basically, what this refers to is a sort of transclusion functionality for video excerpts, so that they can be reconfigured / reordered as needed without touching the original excerpt. To clarify, this is not a DT feature request, just a description of video-related use cases, for which I currently use third-party apps (MarginNote) and am happy to continue doing so :wink:

As mentioned, currently I don’t use DT to work with video beyond jotting down some notes with backlinks to sections of the video… but if at some point more related features were added, I might.

The most valuable video-related feature would probably be automatic transcription to enable searchability. Currently, any non text-based media lives in DT as a second-class citizen.


hey rmschne,

that is an interesting workflow around Hazel. so thanks.
and yes, I share your sentiment that DT can by now and with some scrutiny do some things that one would have done with Hazel otherwise in earlier days.

I suspect there is indeed some deeper misunderstanding involved. we can find out on which side and in what respect.
the whole purpose of this thread / scenario is not only related to ‘video / audio vs text-based forms’ in general (or in terms of philosophies of time, styles of knowledge etc. pp.), nor is it general-purpose metadata (like ‘file creation date’, ‘file author’ etc.) tied to files (as in ‘pdf invoice could equal a video file’), but the specific affordances of using timestamps related to time-based media formats: that is, markers on a timeline; chapters segmenting a timeline; and (textual) metadata specifically tied to that. The YouTube markers / chapters picked up by DT and turned into an ‘interactive’ TOC (even if not searchable by DT) would be an example of that, see the initially referenced thread. QuickTime chapters and, potentially, their descriptions would be another case, as long as picking them up somehow allows for jumping directly to the point in the timeline they refer to. Basically it’s all the stuff that is/was addressed by MPEG-7 (which somehow got stuck as a universal metadata scheme / standard for these matters).
For clarity’s sake I quote the characterization of MPEG-7 here from Wikipedia:

So, as much as you are not sure what I am asking (– and I might have not been clear enough in outlining the scenario –), I am not sure in what way your example about invoices contributes to the context presumed / sketched here.
So, in other words: your example / contribution would be interesting and relevant to the topic / scenario if Hazel could process files based on information referring directly to the timeline, or to metadata attached / related to that. e.g. if in a video-file there is a time- or chapter-marker with some information (information not part of the general ‘description’ / ‘comment’ etc. field of the overall file), and Hazel could find that and act on that.
– in that case it would also be interesting to learn what metadata scheme linked to timelines Hazel could process (the same question stands for DT, YouTube, even the Apple AVKit).

so – if you tell me Hazel can ‘pick up a comment / marker on minute 3 for video xxx’ (either it being there, its name or even further metadata attached to it) and act on that finding specifically (just the same as on string-patterns in a pdf / rtf / doc / csv / whatever) that would be interesting in the given context here.

and that is why I asked back about your comment / contribution.


yes, absolutely! forgot to second that before!
– this is the way a lot of the videoplatforms are going of course; and there are lots of services by now capitalizing on this need to get to AV information as text (and to leverage all that information that is otherwise lost to text-centric processing (eco-)systems).

additional note: in the context / perspective sketched here this would be especially interesting if the transcription somehow has /preserves the capability to directly jump back to the referenced point of the timeline of the video / audio file (as is possible currently with the YT TOC and with the DT-specific AV-/video-references)


I don’t know. Ask the folks at Hazel.

Odds are there is other software out there you can find that may help you. Look in the libraries available to Python.

though I have to say I do not fully understand your attitude to this.

Basically you can’t say why you thought / think this belongs here in this particular scenario. After all you posted it as ‘contribution’, and you inserted Hazel into the conversation. Now you are practically saying, you don’t know / want to say why you brought it up in this specific context. Plus you are also hinting you are even not really interested in the larger context of the scenario laid out (– using video more systematically in DT)

What I can ‘practically’ take from your comment is ‘go, look at Hazel; go, look at Python scripts’… ‘search (more) software’, ‘go search people that might be able/willing to help you’ which is … a most generic remark.

So this brings us back to questions of life time use, forum culture etc…

But ok. I heard you on the very general thing which is pointing to Hazel and your blog posts on this… so, thx for that, rmschne.