They can, e.g., if you do a database rebuild. That's part of why we tell people not to access and use file paths inside the database’s internals.
Just to muddy the waters on this - I’ve found Ollama kinda useful for asking simple questions - ‘what’s the word I’m thinking of but can’t pin down right now?’
I’d love to be able to ask it questions about documents in my DT database, particularly around PDFs and papers - ‘which documents mentioned this? what concept is that and which papers cover it?’
More ambitiously - I’d love to be able to point it at my in-depth tag hierarchy, which I know I am using in a way DEVONtech frowns upon, and have it magically pick the right tags for a document.
Though Ollama works, it feels like a messy solution as-is. Why are these systems all accessed through JSON and HTTP interfaces, for one? It often feels like they’re engineered by a generation that’s only ever known the web. Guess I’m turning into that curmudgeonly old guy, finally.
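To be fair, it doesn’t take much to talk to it over that JSON/HTTP interface. Here’s a minimal sketch in Python, assuming a stock local Ollama install on the default port; “llama3” is just a placeholder for whatever model you’ve actually pulled:

```python
import json
import urllib.request

# Minimal, dependency-free call to a locally running Ollama instance.
# Assumes the default endpoint; "llama3" is a placeholder for whatever
# `ollama list` shows on your machine.
payload = {
    "model": "llama3",
    "prompt": "What's the word for pleasant melancholy that I can't pin down?",
    "stream": False,  # one JSON object back instead of a stream of chunks
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    answer = json.load(resp)

print(answer["response"])  # the model's reply as plain text
```

Clunky as the JSON feels, the upside is that anything with an HTTP client can talk to the model, no vendor SDK required.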
> It often feels like they’re engineered by a generation that’s only ever known the web.
I’d guess that’s not too far off the mark.
> Guess I’m turning into that curmudgeonly old guy, finally.
Join the club
Elephas has a free trial. Give it a try. You might like it.
DEVONthink allows you to index an external folder that is not kept in its database structure. You can link that folder to Elephas for auto-indexing and then use it for the very thing you described wanting.
I like DEVONthink because of the structure I can give my documents, but I’m old. I don’t often remember stuff “exactly,” so that’s where Elephas comes in. I ask vague questions and then eventually I find what I was looking for. Or, if I do want some condensation of knowledge from disparate parts, it will bring things together into a cohesive whole, depending on the question.
I like keeping these two things separate. I don’t fully trust AI around my files. I mess them up enough all by myself, so I don’t want anyone else to help in that regard. That’s why I mentioned that DEVONthink should NOT integrate an AI system.
The current generation of GenAI is by and large designed for deployment on remote servers, so it makes sense to have web-based interfaces. The primary paying customers of GenAI are, for the time being, companies looking to integrate AI into their web apps: think of the plethora of Electron-based PKM/note-taking apps spawned in the last few years, for instance. These companies are then responsible for building a polished interface between the end user and the JSON and HTTP plumbing.
The industry does not care about making money (directly) from individual experimenters who deploy these systems locally. It does not bother to provide better UX for this small group, either.
The industry does not care about making money from individual experimenters using local LLMs?
Or there isn’t much money to be made in that use case?
See e.g. this Reddit discussion for some food for thought.
My speculation about why local LLMs are typically open-sourced, rather than sold as paid software:
- To cultivate a talent pool of potential developers, as well as community code contributions.
- To avoid legal complications when local systems beyond their control generate controversial content.
- Good PR for the companies. Meta perhaps really needs that after the very public fallout around Facebook.
- The market for local models is very small due to their heavy demands on computing power. Its potential is negligible compared to the cost of training and maintaining the models, so commercializing them would likely not be profitable.
It’s IMO a similar situation to Microsoft not caring about pirated Windows on private devices. (Literally every single person with a home desktop PC runs that where I live.) They could have cared, but it’s overall better for them if they simply do not bother.