Smart Rule Logic

Consider this Smart Rule:

And consider this document as an HTML text file:

image

The rule does find these if I adjust the search terms in the rule with the same format among the “Any” section:
Sue
thanks
records

The rule does not find these if included as search terms:
NYC
$MedQuestLTD
MedScription

What is the pattern for what it will match and what it will not match on?

I am not seeing an issue with this.

I’m curious why are you matching on All if you’re looking for content matches?

I am matching on “All” because sometimes the text I am looking for may be in the Name and at other times in the body of the message so this covers everything. Does “Content” include the Name of the document?

Interestingly I see that my original configuration does work on some messages I sent after I did this post. Is it possible that there is a delay to index a new document and thus I was testing the rule earlier before the indexing was complete? The terms that previously worked were at the end of the alphabet - does the indexing perhaps go from Z to A and span all documents rather than complete an entire document A-Z at one time? And perhaps “Content” instead of “All” would be quicker to index?

No Content doesn’t include the name.

You could use a structure like this…

Is it possible that there is a delay to index a new document and thus I was testing the rule earlier before the indexing was complete?

It’s theoretically possible if there was a large influx of data.

1 Like

I believe I have identified the cause of the problem.

Can you try unzipping this file and then referencing the .html file in a Smart Rule with a Content search for

$MedQuestLTD

The search will fail

Then convert the file into .rtf or .pdf and the search will work

Then go back to the .html file and move the word $MedQuestLTD further down the file into the <html> section. Then too it will work.

So it appears that the search for Content in a smart rule does not include the headers in an .html file even when those headers are displayed and look/act like content in other respects. Is that the desired effect? Or would you consider changing your code to search the entire .html file for content?

At least for now I think my workaround solution will be to have the smart rule convert files like this to either .pdf or .rtf

News Alert Trump threatens to invoke never-used constitutional authority to adjourn Congress and push nominees through.html.zip (4.6 KB)

Actually this is not a valid HTML document, only the HTML body is indexed. How did you create this file?

I created it with Zapier by linking my Office 365 email to Dropbox and then indexing/importing a dropbox folder.

I have created “Categories” in Office 365 which I name starting with $. The category name is part of the html file which Zapier creates. Then I use smart rules to process the files based on the Categories.

The net result is quite cool - it lets me access web-based email anywhere from any device and transfer that email into Devonthink to be sorted and otherwise SmartRuled as desired.

Seems that this app simply inserts some really poorly formatted email headers before the HTML code instead of adding it to the HTML body like it should.

OK I will see if they can fix it.

For now it works fine converting it to .pdf or .rtf in the smart rule.

Come to think of it, if I want to keep it as .html I could probably have the smart rule convert it from those formats back to .html again and I suspect then the email headers would be inside the html body as desired.

Follow-up…

The .html file can be corrected in Zapier by adding a “Formatter” step to the Zap with its “Split text” function

That said, I think it’s simpler to just have the smart rule convert the .html to .pdf or .rtf. And then if desired, convert back to .html from there.