Hello!
I am not sure if this question/feature has been asked/proposed here previously; if so, I apologize for the repetition.
Firstly, I would like to say that the new suite of AI features in DT is incredible and helps me greatly with my workflows. However, the more I use these features, the more I see how limited they are.
Most of the papers that I work with in DT are research papers which have a great deal of mathematical notation, and sometimes many relevant images/plots. I read the “Getting Started” (and other) sections very carefully before moving from DT3 to DT4, and as far as I understand, if a PDF file has a text layer, only that layer is shared with Chat.
This is very problematic for me, because most of the mathematical notation (and the plots) is not included in the text layer but is instead treated as pictures. As a result, Chat doesn’t see many important parts of the paper, which makes many of its answers uninformative.
Also, sometimes the text layer is bad (after a bad OCR which scrambles the words), and in this case the model doesn’t understand anything at all.
Additionally, from the statement “For PDF documents without a text layer, a certain number of page thumbnails are sent, dependent on the AI model you’re using”, I understand that even for PDFs without a text layer, not all thumbnails are sent? (Perhaps the limit is set by the number of tokens the model allows?)
Would it be possible to add an option to ask Chat to send [all or part of] the thumbnails instead of the text layer? Maybe by using something like “Please base your answers on your visual recognition of the thumbnails instead of the text layer.”
Or maybe simply an option to send the whole PDF to Chat (I am not sure if this is possible to implement through the API).
Many thanks in advance.
Could you share such a document or a download link? Thanks in advance! In addition, which model/provider do you usually use?
The maximum number of images depends on the model used. In addition, Settings > AI > Chat > Usage also matters, since it is used to control the costs.
One alternative to the chat assistant might be batch processing:
Of course! I switch between OpenAI, Anthropic and Google, but currently I am mostly using Gemini for my queries, with the Lite version for actions and the Pro version for Chat.
As for a document, this is a somewhat niche example, but it is where I originally encountered the problem. I am including the specific file, but from my experience working with math PDFs, I suppose that something similar would happen with almost any PDF you choose.
An Introduction To Neural Ordinary Differential Equations.pdf (2.7 MB)
Try asking it about almost any equation that is longer than 1 or 2 lines. For example, the definition of h(x) on page 27 (as numbered in the document). Or try asking it to rewrite what it sees in definition 2.24 (extension) on page 21. There is a special symbol (like a long vertical line ‘|’) which it can’t see/understand.
In general, these problems are sprinkled all over the document (basically anywhere a special mathematical symbol is used).
Here are my results (using Gemini 2.5 Flash):
As you can see, it completely ignores the ‘|’ sign (which resulted in it giving me a wrong explanation).
Also:
It links to the correct page, but the definition is completely mangled.
When asked directly to look at the page (using a screenshot), it produces the expected results (just to prove that it is not a problem with the model):
Thanks!
Thanks for the great workaround! This would help me in many cases, but it would become more complicated if I wanted to have a continuous conversation about its results.
Thank you for the document! I was able to reproduce the issue. A future revision of the chat assistant might support more flexible document processing, similar to that of smart rules & batch processing (see the Chat - Query action in my screenshot above).
Great, thank you very much!