DT4 - privacy when using AI

cgrunenberg · April 11, 2025, 6:16am

By default, generative AI (via the chat assistant, batch processing, smart rules, or scripts) uses only the selected documents. Optionally, the chat assistant can use a database search (see Settings > AI > Chat), but the search is also limited to the current selection in the item list or, if there is none, in the sidebar.

However, DEVONthink never sends your original documents:

Image files are scaled, recompressed, and sent without the original metadata. PDF documents without a text layer are handled likewise: thumbnails of the first n pages (depending on the model) are used.
For text-based documents, only the raw text or excerpts of it are used, again without metadata.
For audio/video files, the transcription (if available) and n video still images (if supported by the chat model) might be used.
Transcribing audio/video files extracts and recompresses the audio track and sends only this to Whisper (if selected in Settings > AI > Transcription).

Furthermore, DEVONthink anonymizes links (including email addresses), both to improve privacy and to reduce the likelihood that the response will include invalid links, as LLMs don’t handle things like UUIDs or session identifiers well. This also saves tokens.
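
To illustrate the general idea of link anonymization (not DEVONthink's actual implementation, which isn't described here), a minimal Python sketch might look like the following. The function names, the `[link-N]` placeholder format, and the simplified regular expression are all assumptions made for this example: each distinct link or email address is swapped for a short placeholder before the text is sent, and the mapping is kept locally so that placeholders in the response can be turned back into the real links.

```python
import re

# Simplified, hypothetical pattern: http(s) URLs and basic email addresses.
# Real-world link detection is more involved (trailing punctuation, other schemes).
LINK_RE = re.compile(r"https?://[^\s<>()\[\]]+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_links(text):
    """Replace each distinct link with a short placeholder.

    Returns the anonymized text and a placeholder -> original mapping
    that stays on the local machine.
    """
    mapping = {}  # placeholder -> original link
    seen = {}     # original link -> placeholder

    def substitute(match):
        original = match.group(0)
        if original not in seen:
            placeholder = f"[link-{len(seen) + 1}]"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return LINK_RE.sub(substitute, text), mapping


def restore_links(text, mapping):
    """Put the original links back into a response that uses placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

In this sketch a long URL with a session identifier shrinks to a few characters, which is where the token savings would come from, and the model only ever sees placeholders it can repeat verbatim.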

Finally, for commercial models that support tool calls (currently all except Perplexity and Gemini 2.0 Flash Thinking), data is sent only on demand and not in advance. In addition, commercial providers do not use API requests (such as those made by DEVONthink) for model training.