[DT3 Beta1] "Document date" placeholder in smart rule not always working

I’m trying to automatically rename German Telekom’s bills with a smart rule. Recognizing the PDF as a bill is fairly easy. However, the “document date” placeholder in a smart rule is unreliable. Sometimes it retrieves the full document date, sometimes not, and I have no idea why or when. Getting at the month and year parts of the document date never seems to work, though: even if the document date as a whole is recognized correctly (e.g. 2019-05-06), the month and year parts are always zero (or empty; in any case they are displayed as zeroes).
In addition, double-clicking a placeholder in the smart rule does not open the corresponding setting (like the date format for a “document date” placeholder) but converts the field into something like %sortabledocumentdate% in the rule, which then doesn’t work at all. It might be a good idea to either ignore double clicks on the placeholder, do something reasonable, or make sure that the %…% part is converted back into something that works.

Is the date in the text after converting the PDF to text via Data > Convert?

Yes:
IHRE DETAILLIERTE FESTNETZ-RECHNUNG FÜR MAI 2019 02.05.2019
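
For anyone who wants to check outside DEVONthink which date such a line contains, here is a minimal Python sketch. It is not DEVONthink's detection logic, which is not documented in this thread; the pattern `GERMAN_DATE` and the helper `first_german_date` are hypothetical names, and the sketch assumes dates are written as dd.mm.yyyy.

```python
import re
from datetime import date

# Matches German numeric dates such as 02.05.2019 (day.month.year).
GERMAN_DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")

def first_german_date(text):
    """Return the first valid dd.mm.yyyy date in text, or None if there is none."""
    for match in GERMAN_DATE.finditer(text):
        day, month, year = (int(part) for part in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # skip impossible dates such as 31.02.2019
    return None

# The line quoted above from the converted bill.
line = "IHRE DETAILLIERTE FESTNETZ-RECHNUNG FÜR MAI 2019 02.05.2019"
found = first_german_date(line)
print(found.isoformat())                  # full date: 2019-05-02
print(f"{found.year}-{found.month:02d}")  # year and month parts: 2019-05
```

Once the full date has been parsed, the year and month parts follow directly from it, so a rule that shows the whole date but zeroes for month and year is probably not failing at the recognition step.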

I tried the original process again with a new rule. This time, it worked like a charm. However, I’m quite certain that yesterday I had the problem with document date year/month I mentioned in the post. As I said in the subject line: “not always” :frowning:

In case of reproducible issues, a screenshot of the rule would be great.

OK. I’ve set up a new rule for mobile phone bills (they differ from the landline ones in that they are PDF only, not PDF+Text). I’ll include a screenshot of the rule. It is working partially:

  • OCR is done on the original file (however, the original is not moved to the trash, as per the OCR global settings)
  • The name of the file is changed, but the document date is not inserted.
    I suppose (!) that in this case the document date is not available to the rules engine?

    If need be, I’ll send you the original PDF via direct e-mail.

Smart rules don’t depend on preferences. But using the action “OCR > Anwenden” instead should make this work.

It makes this work “sort of”. In fact, there’s now a document date available when I use “OCR > Anwenden” (apply) instead of “OCR > searchable PDF”. However, the date is not what it should be: instead of the billing date, I get the date when the amount will be deducted from my account. To clarify, I include the relevant page (somewhat redacted for privacy reasons). As you can see, the document date is in the upper right corner, whereas the “deduction date” is in the lower left.
However, if I convert the OCR’ed document to pure text, the latter date appears before the former one.
Rechnung_2019_04_25104137000771 Kopie.pdf (469.1 KB)

Hi, I have the same problem. I want the file names of OCR’ed PDFs that are already in DT3 to be changed to the document date plus the group name. Sometimes the document date is either not recognized at all, or a second (wrong) date in the document is selected. From time to time, dates in the future are also produced. File names of documents where recognition failed then start with 0-00-00.

Unfortunately, it is not possible to undo a rule that did not run as expected with CMD-Z, so the old state cannot be restored.

The routine for recognizing the document date also seems to have problems when the date in the document is followed by the time at which it was printed.

What do you think about a preview function for rules that lets you check the expected result for possible errors in advance?

Greetings
Fred
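
To illustrate what such a preview could look like, here is a small Python sketch of a dry run. It is not a DEVONthink feature or API: `proposed_name` and `preview` are hypothetical helpers, the sample texts are made up, and the sketch assumes dd.mm.yyyy dates. It matches only the date portion, so a trailing print time is ignored, and it returns `None` instead of a zero-filled name when no valid date is found.

```python
import re
from datetime import datetime

# A dd.mm.yyyy date; anything after it (such as a print time) is not matched.
DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")

def proposed_name(text, group):
    """Return 'YYYY-MM-DD group', or None when no valid date is found."""
    match = DATE.search(text)
    if match is None:
        return None
    try:
        parsed = datetime.strptime(match.group(0), "%d.%m.%Y")
    except ValueError:
        return None  # looks like a date but is not one, e.g. 99.99.2019
    return f"{parsed:%Y-%m-%d} {group}"

def preview(documents, group):
    """Dry run: map each current name to its proposed name without renaming anything."""
    return {name: proposed_name(text, group) for name, text in documents.items()}

# Made-up sample texts, not taken from any real bill.
samples = {
    "scan1.pdf": "Rechnung vom 06.05.2019 14:37",
    "scan2.pdf": "Kein Datum im Text",
}
for old, new in preview(samples, "Telekom").items():
    print(old, "->", new if new else "(no date found, left unchanged)")
```

Because nothing is renamed until the mapping has been reviewed, a wrong or missing date shows up before it ends up in a file name.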

However, if I convert the OCR’ed document to pure text, the latter date appears before the former one.

Note: The underlying text in a PDF does not necessarily match what you’re looking at onscreen. PDFs aren’t built like page layout or word processing application documents.

The date I’m getting is 2019-05-09. This is also the first date that appears in the text after doing OCR (German language, but also tested with English) and then converting to plain text.
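
The effect of text order can be sketched in Python. This assumes, purely for illustration, that a detector takes the first date in the extracted text; whether DEVONthink works this way is not confirmed here. The sample text, the label "Rechnungsdatum", the billing date, and both helper functions are hypothetical.

```python
import re

DATE = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")

# Invented extraction order: the lower-left block is emitted before the
# upper-right one, even though the page shows the billing date further up.
extracted_text = (
    "Der Betrag wird am 09.05.2019 abgebucht.\n"
    "Rechnungsdatum 25.04.2019\n"
)

def first_date(text):
    """Return the first dd.mm.yyyy string in text order, or None."""
    match = DATE.search(text)
    return match.group(0) if match else None

def date_after_label(text, label):
    """Return the first date that follows the given label, or None."""
    position = text.find(label)
    if position == -1:
        return None
    return first_date(text[position + len(label):])

print(first_date(extracted_text))                          # 09.05.2019
print(date_after_label(extracted_text, "Rechnungsdatum"))  # 25.04.2019
```

A first-match approach depends entirely on the order in which the text layer stores its fragments, while anchoring on a label next to the wanted date does not.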

Hi,
I’m aware that a PDF is not a page layout. I tried again with the original version of the document, and I get 21-05-2019 as the document date (German formatting, obviously). I converted the original PDF to PDF+Text with German as the main language.
If you’re interested, I could send you the original document in a personal e-mail.

Hold the Option key and choose Help > Report bug to start a support ticket and attach any documents you feel would be helpful.