Is it possible to search for HTML attribute names, and/or the values of such attributes, in DEVONthink?
I have a folder of HTML <ul> <li> outline files (generated by Jesse Grosjean’s Bike Outliner) which is indexed by DEVONthink.
This works well for searching the <p> text content of these files (as well as for viewing and printing them with custom CSS, as it happens) but I also need to search through these files for particular HTML user attributes and their values (essentially attributes with a data- prefix in their names).
Does that sound like something that DEVONthink is equipped to do?
Thanks, and yes – CSS signalling of the presence of particular attributes is working well.
I suppose, on reflection, that the challenge of HTML user attributes in this context is that they are tied to particular HTML elements (particular outline rows, in this context),
whereas DEVONthink record indexing (if I have understood it correctly) provides pointers to whole documents, rather than to particular lines.
But I’ll think about a script. This may really be a job for XML tooling and XPath expressions, etc.
Searching for CSS-generated strings is clearly out of range, but I can imagine extracting and indexing user data- attribute values during DEVONthink’s parse of the HTML.
(Perhaps in the same way that DEVONthink can extract and list the hyperlinks in an HTML document)
it might prove hard to define a sense in which the user data Eight is less there than the Seven,
but I do agree with you, of course, about text which is only in a CSS file.
As an aside, I don’t think we would say that in the structurally identical piece of HTML below, the href link attribute value was less “there” than the label text:
<a href="https://discourse.devontechnologies.com/t/searching-for-html-data-attributes/73609/9">Searching for HTML data attributes?</a>
(And, of course DEVONthink indexes both element content and attribute content in that case, hence the Links panel in the Document tab of the DEVONthink inspector)
Attribute values and element contents are clearly both there,
and both equally indexable at HTML parse time.
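To illustrate the point about parse time, here is a minimal sketch using Python's standard html.parser. It is not how DEVONthink works internally (I don't know that); it only shows that a single pass over the markup surfaces element text and data- attribute values side by side. The sample row is made up, and the attribute name data-note is an assumption for illustration, not taken from a real Bike file.

```python
from html.parser import HTMLParser


class TextAndDataAttrs(HTMLParser):
    """Collect element text and data-* attribute values in one parse."""

    def __init__(self):
        super().__init__()
        self.text = []        # non-empty text runs, in document order
        self.data_attrs = []  # (tag, attribute name, value) triples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag.
        for name, value in attrs:
            if name.startswith("data-"):
                self.data_attrs.append((tag, name, value))

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


# Hypothetical outline row; "data-note" is an invented attribute name.
sample = '<ul><li data-note="Eight"><p>Seven</p></li></ul>'

parser = TextAndDataAttrs()
parser.feed(sample)
parser.close()

print(parser.text)        # element content seen by the parser
print(parser.data_attrs)  # attribute content seen by the same parse
```

Both lists are filled by the same parse, so which of them ends up in a search index is a design decision rather than a technical limit.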
As I said: DT indexes the text of the document, not the HTML elements nor their attributes. Those are just markup. Treating the values of the attributes as if they had any meaning is pointless, in my opinion. If you start with data- attributes, why stop there? Class names could have meaning, too. As might have colors, perhaps.
Which leads to the question of whether one should perhaps also index CSS files …
If someone wants to convey meaning in HTML, they have abundant possibilities. data attributes are not meant for that purpose (if only because of accessibility issues). To quote the MDN text on data attributes:
Do not store content that should be visible and accessible in data attributes, because assistive technology may not access them. In addition, search crawlers may not index data attributes’ values.
Well, it does, in fact, index the href attributes.
It's just a question of design – which attributes one chooses to index, and which to ignore.
But, for the moment, at least – the way forward is clearly to handle custom user data in Bike files by XPath and script.
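A rough sketch of what such a script might look like, using the limited XPath subset in Python's xml.etree.ElementTree. Assumptions, all mine: the outline files are well-formed XML, they end in .bike, each row is an li whose own text sits in a direct p child, and the attribute name data-note is invented for the example. The function names are hypothetical too. Needs Python 3.8 or later for the {*} namespace wildcard.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def rows_with_attr(root, name, value=None):
    """Return (row id, row text) pairs for elements carrying the attribute."""
    # ElementTree implements only part of XPath; [@attr] and [@attr='v'] are supported.
    # Note: a value containing a single quote would break this simple query string.
    if value is None:
        query = f".//*[@{name}]"
    else:
        query = f".//*[@{name}='{value}']"
    hits = []
    for el in root.iterfind(query):
        p = el.find("{*}p")  # the row's own paragraph, in any namespace or none
        text = "".join(p.itertext()).strip() if p is not None else ""
        hits.append((el.get("id"), text))
    return hits


def search_folder(folder, name, value=None, pattern="*.bike"):
    """Search every matching file under folder; skip files that fail to parse."""
    results = {}
    for path in sorted(Path(folder).rglob(pattern)):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # not well-formed XML, so out of scope for this sketch
        hits = rows_with_attr(root, name, value)
        if hits:
            results[str(path)] = hits
    return results


# Made-up outline, not a real Bike file.
sample = (
    '<html><body><ul>'
    '<li id="r1" data-note="Eight"><p>Seven</p></li>'
    '<li id="r2"><p>Nine</p></li>'
    '</ul></body></html>'
)
root = ET.fromstring(sample)
print(rows_with_attr(root, "data-note"))
print(rows_with_attr(root, "data-note", "Eight"))
```

Unlike a whole-document index, this returns the individual rows that carry the attribute, which is the per-row granularity discussed earlier in the thread. For fuller XPath support a third-party library such as lxml would be the usual choice.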
If there were enough users indexing Bike files in DT, then, arguably, it might become worth reviewing the indexing of data- attributes (in addition to the existing indexing of href attributes) by DEVONthink too.
Yes – early days, and Bike row attributes are, for the moment, only accessible through the Bike scripting interface.
My understanding is that that is likely to change in tandem with the planned introduction of stylesheets, which will make the attributes more directly visible – without add-on scripts – in the application itself.
Until then, just scouting around for a good app to use for indexing and searching them. DEVONthink seems a good fit for:
indexing of a cloud of outlines in a given folder, and
viewing and printing both <p> text and custom attributes with custom CSS.