DT4b2 AI Crawl Before Walk: Protecting documents from AI

Sherrell · May 13, 2025, 12:09am

I’m trying to carefully navigate the task of familiarizing myself with DT4b2’s powerful new AI features without compromising my database privacy and without breaking the bank on AI calls. I’ve read the relevant sections of the new DT4b2 manual, checked out DT4b2 online help, and surveyed the many interesting and relevant postings here. (I can’t guarantee I haven’t missed something along the line, so please forgive me if my recon has been less than perfect…).

My Situation:
I have over 2000 PDF+OCR documents.

My DT4b2 AI Settings:
Chat>>>
Chat: ChatGPT
Model: GPT4.1(Nano)
Usage: Auto
Context Window: blank for now
Role: blank for now
OpenAI API Key: to be added at chat time
Assistant: no boxes checked
Search: Database (only)
Summaries: Text box checked

My Objective:
I wish to chat with a single document or group of documents with absolute certainty the AI engine does not have access to other documents in my database. There are many documents I NEVER want the AI engine to access.

My Understanding:
I understand that by default the AI engine only has access to the document(s) I have selected at the time I go to Tools>>>Chat with a document.

MY QUESTION:
Is there a way to add an extra layer of protection to prevent the AI engine from having access to documents I do not intend it to EVER access? Why do I ask? Human error. I know it’s inevitable that at some point I will fail to recognize I have documents open that I do not wish the AI engine to access.

For instance, could I tag the documents I wish to have available (or conversely NEVER wish to have available) for inclusion in the AI engine’s knowledge base in some manner?
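To make the tagging idea concrete, here is a minimal Python sketch of the kind of filter I have in mind. It is purely illustrative and is not a DEVONthink feature or API: the `Document` class, the `no-ai` tag name, and the `split_by_ai_permission` function are all hypothetical names I made up.

```python
from dataclasses import dataclass, field

# Hypothetical tag name; not an existing DEVONthink feature.
EXCLUDE_TAG = "no-ai"


@dataclass
class Document:
    """Stand-in for a database item: just a name and a set of tags."""
    name: str
    tags: set[str] = field(default_factory=set)


def split_by_ai_permission(selection):
    """Partition a selection into (allowed, blocked) using the exclusion tag."""
    allowed = [d for d in selection if EXCLUDE_TAG not in d.tags]
    blocked = [d for d in selection if EXCLUDE_TAG in d.tags]
    return allowed, blocked


selection = [
    Document("Meeting notes.pdf", {"work"}),
    Document("Contract draft.pdf", {"work", "no-ai"}),
]
allowed, blocked = split_by_ai_permission(selection)
print("Would be available to chat:", [d.name for d in allowed])
print("Would be held back:", [d.name for d in blocked])
```

The point is that the exclusion is decided by the tag, not by whatever happens to be selected, so an accidental selection would not matter.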

Suggestions/tutoring humbly solicited!
Tks.

cgrunenberg · May 13, 2025, 5:36am

The database search, if enabled in Settings > AI > Chat, is limited to the current selection in the item list (e.g. List view) or, if there’s none, to the sidebar selection.

Therefore disabling the database search is a good idea if you only intend to work with selected document(s). However, we might add a possibility to exclude items from the chat.

Sherrell · May 13, 2025, 12:19pm

Thank you, Sir!

I can think of a number of reasons a user might wish to have a “padlock” on a document to always exclude it from being accessed by the AI engine. For instance, one might have a business proprietary document, or a document with a really poor OCR, or a document you know contains errors at points relevant to your AI query…

Thanks again for considering this.

RobH · May 13, 2025, 4:25pm

I would give a +1 on being able to exclude items from AI access. Even if only at the group level, it would provide more flexibility.

Question on the AI settings: if I uncheck Web, Wikipedia, and PubMed, the AI will only have access to my documents when chatting or answering questions? Even if the model has web access (like Anthropic recently giving Claude web access)?

Sherrell · May 13, 2025, 4:45pm

Rob,

My interpretation of Christian’s response was that it is not necessary to have the Settings>>>AI>>>Chat>>>Search>>>Database box checked in order to chat with the documents highlighted in List View. My concern, of course, is that I might accidentally have something highlighted that I actually do not want the AI engine to access at all. So some convenient “Safety” mechanism would be a very welcome and useful option…
Cheers…

shiiko · May 13, 2025, 4:49pm

+1

While working on a separate post that I wrote, “DT4 AI cost tracking”, I found that a log is kept of the Chat interactions. It is found here:

~/Library/Application Support/DEVONthink/Chat.log

After reviewing the log I found a bunch of references to documents I would prefer not to have been accessed. My bad. I was testing. I thought that I was being careful. From now on I will monitor the Chat.log closely.

Nevertheless, your comment is on point: human error is the weak link. We need to be able to clearly place fences around the information that can be made available for AI processing.
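For anyone who wants to automate the log review, here is a rough Python sketch. It assumes only that Chat.log is a plain-text file in which document names appear verbatim; the log format is not described in this thread, so treat that as an unverified assumption. The names in `PROTECTED_NAMES` and the function `find_protected_mentions` are hypothetical.

```python
from pathlib import Path

# Log location as given above.
LOG_PATH = Path.home() / "Library/Application Support/DEVONthink/Chat.log"

# Hypothetical examples of document names that should never show up in the log.
PROTECTED_NAMES = ["Contract draft", "Tax return 2024"]


def find_protected_mentions(lines, protected_names):
    """Return (line_number, name, line) for every line mentioning a protected name.

    Matching is a case-insensitive substring search, which assumes names
    are written to the log verbatim.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for name in protected_names:
            if name.lower() in lowered:
                hits.append((number, name, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if LOG_PATH.exists():
        with LOG_PATH.open(encoding="utf-8", errors="replace") as log:
            for number, name, line in find_protected_mentions(log, PROTECTED_NAMES):
                print(f"line {number}: mentions {name!r}: {line}")
    else:
        print(f"No log found at {LOG_PATH}")
```

This only detects a slip after the fact; it does nothing to prevent a document from being sent.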

vinschger · May 13, 2025, 4:49pm

It might be helpful if the settings allow an extra request in the chat. This request could list all the documents that will be used and have the user confirm them before sending them to the LLM.
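As a sketch of what such a confirmation step could look like (in Python, purely illustrative, not how DEVONthink works), the function below lists the documents and proceeds only on an explicit "yes". The name `confirm_before_sending` and the `ask` parameter are hypothetical; `ask` defaults to the built-in `input` and can be swapped out for testing.

```python
def confirm_before_sending(document_names, ask=input):
    """List the documents and return True only if the user explicitly agrees."""
    print("The following documents would be sent to the AI engine:")
    for name in document_names:
        print(f"  - {name}")
    answer = ask("Send these documents? [y/N] ")
    # Anything other than an explicit yes counts as a refusal.
    return answer.strip().lower() in {"y", "yes"}


# Non-interactive demonstration: simulate a user who just presses Return.
approved = confirm_before_sending(["Meeting notes.pdf"], ask=lambda prompt: "")
print("Approved:", approved)
```

Defaulting to "no" is deliberate: an empty or mistyped answer should never release documents.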

cgrunenberg · May 13, 2025, 5:19pm

Your documents are local, therefore web search, no matter whether performed by DEVONthink or e.g. by Perplexity, doesn’t matter.

RobH · May 13, 2025, 6:25pm

Either I didn’t pose the question properly or I don’t understand your answer.

What I was trying to ask… if the AI is only given access to the database, are its answers only derived from the information found in the database? For example, if I query it on some topic, will it limit how and what it responds with to the information contained only within the database?

BLUEFROG · May 13, 2025, 6:27pm

It is not considering your whole database (which is discussed in the help). As Criss mentioned, it is considering the selection either in the item list or Navigate sidebar.

This is easy to test on your own.

RobH · May 13, 2025, 7:18pm

Thank you for the clarification.

BLUEFROG · May 13, 2025, 9:19pm

You’re welcome.

cgrunenberg · May 14, 2025, 5:50am

LLMs might use their own trained knowledge too; it’s not possible to strictly separate this.