This works well for searching the <p> text content of these files (as well as for viewing and printing them with custom CSS, as it happens) but I also need to search through these files for particular HTML user attributes and their values (essentially attributes with a data- prefix in their names).
Does that sound like something that DEVONthink is equipped to do?
Thanks, and yes – CSS signalling of the presence of particular attributes is working well.
I suppose, on reflection, that the challenge of HTML user attributes in this context is that they are tied to particular HTML elements (particular outline rows, in this context),
whereas DEVONthink record indexing (if I have understood it correctly) provides pointers to whole documents, rather than to particular lines.
But I’ll think about a script. This may really be a job for XML tooling and XPath expressions, etc.
As I said: DT indexes the text of the document, not the HTML elements or their attributes. Those are just markup. Treating the values of the attributes as if they had any meaning is pointless, in my opinion. If you start with data- attributes, why stop there? Class names could have meaning, too. So might colors, perhaps.
Which leads to the question of whether one should perhaps also index CSS files …
If someone wants to convey meaning in HTML, they have abundant possibilities. data attributes are not meant for that purpose (if only because of accessibility issues). To quote the MDN text on data attributes:
Do not store content that should be visible and accessible in data attributes, because assistive technology may not access them. In addition, search crawlers may not index data attributes’ values.
Well, it does, in fact, index the href attributes.
It's just a question of design – which attributes one chooses to index, and which to ignore.
But, for the moment, at least – the way forward is clearly to handle custom user data in Bike files by XPath and script.
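For anyone wanting to try the script route, here is a minimal sketch in Python. It assumes a Bike file is well-formed XHTML in which each outline row is an <li> element (carrying an id and any data- attributes) wrapping a <p> with the row text; that layout is my assumption here, so check it against your own files. The function name find_data_attributes and the sample markup are made up for illustration. Python's built-in ElementTree supports only a limited subset of XPath, so instead of an XPath expression this walks the tree and filters attribute names by their data- prefix.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_data_attributes(xml_text, name=None, value=None):
    """Yield (row_id, attr_name, attr_value, row_text) for each data- attribute.

    Assumes rows are <li> elements whose text lives in a child <p>.
    Optionally filter by exact attribute name and/or exact value.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # document order, nested rows included
        if local_name(elem.tag) != "li":
            continue
        # The row's own text is in its direct <p> child, if it has one.
        p = next((c for c in elem if local_name(c.tag) == "p"), None)
        text = "".join(p.itertext()).strip() if p is not None else ""
        for attr, attr_value in elem.attrib.items():
            if not attr.startswith("data-"):
                continue
            if name is not None and attr != name:
                continue
            if value is not None and attr_value != value:
                continue
            yield (elem.get("id"), attr, attr_value, text)


# Hypothetical sample standing in for the contents of a .bike file.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><ul id="root">
<li id="a1" data-type="task" data-done="2024-01-05"><p>Buy <em>milk</em></p></li>
<li id="a2"><p>Plain row</p>
<ul><li id="a3" data-type="note"><p>Nested row</p></li></ul></li>
</ul></body></html>"""

if __name__ == "__main__":
    for row in find_data_attributes(SAMPLE, name="data-type"):
        print(row)
```

Because each hit carries the row id and row text, the results point at particular rows rather than at whole documents. To run it over a folder of outlines, read each file's text and pass it to the same function.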
If there were enough users indexing Bike files in DT, then, arguably, it might become worth reviewing the indexing of data- attributes (in addition to the existing indexing of href attributes) by DEVONthink too.
Yes – early days, and Bike row attributes are, for the moment, only accessible through the Bike scripting interface.
My understanding is that that is likely to change in tandem with the planned introduction of stylesheets, which will make the attributes more directly visible – without add-on scripts – in the application itself.
Until then, I'm just scouting around for a good app to use for indexing and searching them. DEVONthink seems a good fit for:
indexing of a cloud of outlines in a given folder, and
viewing and printing both <p> text and custom attributes with custom CSS.