Smart Rule Logic

rkaplan · April 15, 2020, 6:09pm

Consider this Smart Rule:

And consider this document as an HTML text file:

The rule does find these if I adjust the search terms in the rule with the same format among the “Any” section:
Sue
thanks
records

The rule does not find these if included as search terms:
NYC
$MedQuestLTD
MedScription

What is the pattern for what it will match and what it will not match on?

BLUEFROG · April 15, 2020, 9:49pm

I am not seeing an issue with this.

I’m curious why are you matching on All if you’re looking for content matches?

rkaplan · April 15, 2020, 10:14pm

I am matching on “All” because sometimes the text I am looking for may be in the Name and at other times in the body of the message so this covers everything. Does “Content” include the Name of the document?

Interestingly I see that my original configuration does work on some messages I sent after I did this post. Is it possible that there is a delay to index a new document and thus I was testing the rule earlier before the indexing was complete? The terms that previously worked were at the end of the alphabet - does the indexing perhaps go from Z to A and span all documents rather than complete an entire document A-Z at one time? And perhaps “Content” instead of “All” would be quicker to index?

BLUEFROG · April 15, 2020, 10:23pm

No Content doesn’t include the name.

You could use a structure like this…

Is it possible that there is a delay to index a new document and thus I was testing the rule earlier before the indexing was complete?

It’s theoretically possible if there was a large influx of data.

rkaplan · April 16, 2020, 4:21am

I believe I have identified the cause of the problem.

Can you try unzipping this file and then referencing the .html file in a Smart Rule with a Content search for

$MedQuestLTD

The search will fail

Then convert the file into .rtf or .pdf and the search will work

Then go back to the .html file and move the word $MedQuestLTD further down the file into the <html> section. Then too it will work.

So it appears that the search for Content in a smart rule does not include the headers in an .html file even when those headers are displayed and look/act like content in other respects. Is that the desired effect? Or would you consider changing your code to search the entire .html file for content?

At least for now I think my workaround solution will be to have the smart rule convert files like this to either .pdf or .rtf

News Alert Trump threatens to invoke never-used constitutional authority to adjourn Congress and push nominees through.html.zip (4.6 KB)

cgrunenberg · April 16, 2020, 5:54am

Actually this is not a valid HTML document, only the HTML body is indexed. How did you create this file?

rkaplan · April 16, 2020, 10:33am

I created it with Zapier by linking my Office 365 email to Dropbox and then indexing/importing a dropbox folder.

I have created “Categories” in Office 365 which I name starting with $. The category name is part of the html file which Zapier creates. Then I use smart rules to process the files based on the Categories.

The net result is quite cool - it lets me access web-based email anywhere from any device and transfer that email into Devonthink to be sorted and otherwise SmartRuled as desired.

cgrunenberg · April 16, 2020, 11:21am

Seems that this app simply inserts some really poorly formatted email headers before the HTML code instead of adding it to the HTML body like it should.

rkaplan · April 16, 2020, 1:17pm

OK I will see if they can fix it.

For now it works fine converting it to .pdf or .rtf in the smart rule.

Come to think of it, if I want to keep it as .html I could probably have the smart rule convert it from those formats back to .html again and I suspect then the email headers would be inside the html body as desired.

rkaplan · April 17, 2020, 11:29am

Follow-up…

The .html file can be corrected in Zapier by adding a “Formatter” step to the Zap with its “Split text” function

That said, I think it’s simpler to just have the smart rule convert the .html to .pdf or .rtf. And then if desired, convert back to .html from there.