AI settings and transcription

I am so grateful for the AI connections in DT4!
Two questions:

  1. In Settings/AI/Chat, under Search, I have selected only Database. Nevertheless, both Mistral and Perplexity seem to search the web as well as my files when I ask them a chat question. Is that intended? Is it possible to tell the AI to focus only on the selected files?
  2. Transcription: at the moment, if I select a one-page PDF file and go to Data/Recognition/Transcribe Texts and Notes, nothing happens. If I use the inspector AI chat with Mistral and say ‘transcribe this page’ it says it can’t, but if I use Perplexity it does, and does a very good job. Do I need to do something else to get Data/Recognition/Transcribe Texts and Notes to work?

With apologies I can now answer 2) – I had it set to output transcription to Searchable text not to Annotation. So that is fixed and I apologise for taking people’s time. But I am still interested in an answer to 1).

I must admit, I’m struggling with the same issue as you identified in 1), although in my case it is only searching the web (which is not selected in the preference).

Just as a further follow-up, I think we may be asking too much of DT, or more specifically of how AI is integrated.

As stated in the manual (p.27): “Another critical thing to be aware of, AI is not going to “process and connect” years of your documents and information in your database.”

Asking an engine (in this case, Perplexity / Sonar) to interrogate a specific document works, but I also get web results even though that option is not checked.

Perplexity Sonar has web access, as its primary function is to search the web based on user input.

For Mistral, though, it’s surprising.

Perplexity is able to search the web on its own (more or less the only compelling reason to use these models, as they’re quite limited in other respects). In all other cases DEVONthink does the searching (meaning either a different model or a Wikipedia, PubMed or database search) and provides the results to the AI model.

DEVONthink is able to control the number of results and tokens used in these cases, depending on Settings > AI > Chat > Usage. A screenshot of Settings > AI > Chat would be useful, as Mistral should only search the web if it’s enabled in the settings.


These are the settings. It is correct that Mistral, unlike Perplexity, does not cite websites. But see this answer in response to a question about a PDF of US Congressional committee hearings from 1940:

In other words, it is telling me what happened after, and in response to, these 1940 hearings, not just summarising what is in this text. But perhaps that comes from its already ingested knowledge rather than from a new web search?

Yes, that’s the most likely reason (and it might frequently also result in hallucinated links).

You can try to specifically prompt it.

For example, prior to your question write something like:

“It is critical that you only use contents from the provided documents as context. You must not use general knowledge in your response.”
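If you are building the prompt yourself (for example in a script that calls a model's API directly), the same idea can be sketched as below. This is only an illustration: `build_grounded_prompt` and the `[Document N]` labels are hypothetical names made up for this example, not anything DEVONthink provides, and the instruction can only discourage a model from using general knowledge, not guarantee it.

```python
# Hypothetical helper: prepend a grounding instruction to a chat question.
# Nothing here is a DEVONthink API; the names are assumptions for illustration.

GROUNDING_INSTRUCTION = (
    "It is critical that you only use contents from the provided documents "
    "as context. You must not use general knowledge in your response."
)


def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Return one prompt: instruction first, then each document, then the question."""
    parts = [GROUNDING_INSTRUCTION, ""]
    for index, text in enumerate(documents, start=1):
        parts.append(f"[Document {index}]")  # label so the model can tell documents apart
        parts.append(text.strip())
        parts.append("")
    parts.append(f"Question: {question.strip()}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_grounded_prompt("What was decided?", ["Text of the 1940 hearing."]))
```

Putting the instruction before the documents and the question mirrors the advice above to write it prior to your question.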

Good thought, yes, but using a specific prompt under Perplexity/Sonar does not seem to exclude web results.

I’ve some time later this morning so may try a few other models.

With prompting you can tell an API model without web access not to use general knowledge. But you cannot use prompting to tell an API model with web access not to use its web access :wink:

This makes total sense, yes. I’ll need to move away from Perplexity it seems.

Which models (available in DT) do not have web access?

EDITED TO ADD: Trying Claude 3.7 and it seems to query inside a database without going out to the web. Incredible.

Only Perplexity’s models support web searching on their own (see the small magnifying glass icon in the pop-up menu). All other models do so only if the built-in web search is enabled in Settings > AI > Chat and (!) if the model supports tool calls (see the small gear icon in the pop-up menu).
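To make that rule easier to reason about, here is a rough sketch of it as a small function. It only restates the logic described in this thread; the `Model` class, its field names and `may_search_web` are hypothetical and say nothing about how DEVONthink is actually implemented.

```python
# Sketch of the rule described above; all names are assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    native_web_search: bool    # magnifying glass icon in the pop-up menu
    supports_tool_calls: bool  # gear icon in the pop-up menu


def may_search_web(model: Model, builtin_web_search_enabled: bool) -> bool:
    """Return True if a chat with this model could end up using web results."""
    if model.native_web_search:
        # The model searches on its own, regardless of the Search setting.
        return True
    # Otherwise the app does the searching, which needs both the setting
    # and a model that supports tool calls.
    return builtin_web_search_enabled and model.supports_tool_calls


if __name__ == "__main__":
    searching_model = Model("example-with-own-search", True, False)
    tool_model = Model("example-with-tool-calls", False, True)
    print(may_search_web(searching_model, builtin_web_search_enabled=False))
    print(may_search_web(tool_model, builtin_web_search_enabled=False))
```

Read this way, unchecking the web option only helps for models in the second branch, which matches what was reported earlier in the thread.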

Personally for a quick web search I prefer other, faster models (e.g. Gemini Flash or GPT-4o mini) and DEVONthink’s built-in search support. And for a really deep, comprehensive search and report I usually use Claude 3.7 Sonnet.