Which database is used (robustness)? Windows import?

Hello, I’m a Windows user unhappy with the SQLite databases that serve as the back end of the usual information manager applications (Ultra Recall, RightNote, MyInfo, MyBase, etc.). SQLite does not seem to be sufficiently scalable for my data: about half a million items, mostly formatted text (RTF), with some thousands of them also containing reasonably sized pictures (in the 50,000 to 200,000 byte range, nothing in the MB range) embedded within the text items, i.e. the picture code sits inside the RTF code and inflates the respective item size.

In those applications’ SQLite databases, I have to divide my data into chunks of at most 10,000 items (for a bad app) or up to 50,000 items (for the best of them) per database file, since beyond those numbers serious speed and even stability issues arise.

Thus I have not one database but about 30, and since the applications mentioned above rely on SQLite’s built-in full-text search, which is of course database-specific, there is no global search scope: I have to search each database in which the targets might be located, separately.

I’m so fed up with having to handle 30 databases instead of just one that if I found a Mac application really robust enough, I would buy a Mac and that application, but only in that case.

Thus, you understand that I can’t trial Macs and can’t trial DEVONthink, so please forgive me for asking questions whose answers I could find out myself if I had those trial possibilities.

For DT (the regular version), what database do you use? FileMaker, probably? Then a million items (db records) should not be a problem? (I’ve got about half a million items to import currently, but I would obviously need the processing capacity, not just the storage capacity, which obviously isn’t the problem, to add many more.)

If your one-seat version isn’t robust enough, I might have to buy your server version. Does that version use another database? Which one? And could I use that version without problems on one regular Mac, or would using it imply further costs (e.g. another, more expensive “server” OS beyond the regular OS included in the regular Mac purchase price)?

Would the import from Windows be realistic? My current application allows OPML export, but it runs on the NTFS file system, obviously, and I don’t know about the Mac file system; OPML export/import may not be entirely reliable, though (the pics?).

On the other hand, whilst I don’t know the Mac file format, my current data is stored in those (Windows) SQLite files, mostly in ANSI encoding; only the content fields’ contents (i.e. the RTF text, with or without pics) are stored in a special table (one per db), in a single column, as binary “blobs”. Ideally that blob format might be readable by DT: I could, by script (I know about scripting and SQL), create the necessary tables for DT on the Mac side myself, read the Windows data (including the blob data) into them, and DT could then open the result as a DT database. But if DT then said “unknown file format” for the blobs, none of my “real” data would be accessible any more, all the “text” data being metadata after all, so the blobs might represent an unresolvable problem. (I don’t know whether “blob” is standardized, or specific to Mac vs. Windows, or even specific to my current application.)
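For instance, on the Windows side I could at least check whether those blobs are plain RTF with a few lines of Python; the database, table and column names below are just placeholders for whatever my application actually uses:

```python
import sqlite3

# Placeholder names: the real table/column names depend on my application's schema.
DB_PATH = "notes.db"
CONTENT_TABLE = "contents"
BLOB_COLUMN = "data"

con = sqlite3.connect(DB_PATH)
rows = con.execute(f"SELECT rowid, {BLOB_COLUMN} FROM {CONTENT_TABLE} LIMIT 20")

for rowid, blob in rows:
    # Plain RTF always starts with "{\rtf1"; anything else would mean the app
    # wraps or compresses its content in a private format.
    is_rtf = blob is not None and bytes(blob).startswith(b"{\\rtf")
    print(rowid, "looks like plain RTF" if is_rtf else "not plain RTF")

con.close()
```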

Thank you very much!

  • The underlying database is proprietary, not SQL, etc.

  • A comfortable limit on most machines is 250,000 items per database. Resource consumption increases with the number of items and databases so the more RAM, the better.

  • DEVONthink can search all open databases.

  • The Server edition has a built-in web server allowing you to broadcast a DEVONthink database on a network. There is no difference in format.

  • There is no direct import from SQLite. The best option in this case would be to investigate a Windows tool that would export the data as RTFD files which could be imported into DEVONthink.
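As a rough illustration of that route, and only if the RTF really is stored as plain blobs (the table and column names here are purely hypothetical), a small script along these lines could dump each blob to an .rtf file, which DEVONthink can also import:

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: adjust the table/column names to the real application.
DB_PATH = "notes.db"
OUT_DIR = Path("export_rtf")
OUT_DIR.mkdir(exist_ok=True)

con = sqlite3.connect(DB_PATH)
rows = con.execute("SELECT rowid, data FROM contents")  # placeholder names

for rowid, blob in rows:
    if not blob:
        continue
    # If the blob is plain RTF, writing it out verbatim yields a normal .rtf file.
    (OUT_DIR / f"item_{rowid}.rtf").write_bytes(blob)

con.close()
```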

Bluefrog, thank you very much for your very kind and detailed answer; I especially appreciate your honesty about DT’s scalability.

Thus, if I understand you correctly (just correct where I may be wrong):

I would have to create several DT databases, but whenever I open more than one of them concurrently, I can set the search scope to “all open databases” or the like, rather than just the current one.

The Server edition will not provide any additional functionality (or “power”) in my one-seat scenario.

In order to administer (your “comfortable limit” of) about 250,000 items in one database, I had better buy a 32 GB Mac instead of a 16 GB Mac…

You speak of “most machines”; I would then buy a 32 GB desktop machine (“today” or in some months, but short-term indeed), with the modern M2 processor. But indeed, I would prefer to be able to put ALL my data (realistically about 500,000 or 600,000 items, including the next 5 to 8 years or so) into just ONE db. I spoke of “half a million” above, but in fact my current data set (distributed over about 30 databases) comes to some 320,000 items, so with a “capacity” of “just” 600,000 items I could probably “live” for 8 more years or more!

Whilst doubling that memory again, i.e. from 32 to 64 GB, would NOT double the “comfortable item count limit”? So obviously DT loads lots of items into memory up front, but not all of them? (I admit I don’t currently know prices for doubling that memory; buying a 64 GB Mac instead of “just” a 32 GB one might be financially “insane”? But if your “most machines” means 16 GB of memory, then with 32 GB I could realistically “get” up to 600,000?)

I understand that, since DT uses a proprietary database, “access from the outside” to my data would not be possible? (What I have in mind is data retrieval in “combinations” that DT’s built-in search functionality might not provide; given that DT is said to be far superior to what I use today, that might not be a problem at the end of the day.)

It’s the import from Windows which seems to be almost (or even really?) impossible? I have read the explanations on the binary .rtfd file format here [link removed], but there does not currently seem to exist any tool whatsoever that could translate SQLite 3 data and/or its blob data (i.e. another binary, possibly but not necessarily standardized, format containing more or less pure RTF code, which would originally have been human-readable but in blob form no longer is), and I’m not able to write the necessary code for such a transposition into the Apple-proprietary .rtfd file format;

given that up to now I had thought OPML was a “standard” format for transferring “outliner” files in general, I had hoped that DT would be able to import such .opml files, if need be without the pics? (I would be able to write the necessary Windows script to “fetch” those pics, together with the corresponding record IDs, from within the RTF text in my app’s “contents”, and then there would probably have been a way to “import” those pics into the DT database, though that would again imply “external” access to the DT db?) But possibly OPML is just a “transfer” format for “outliner” data within Windows?
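Just to illustrate what such a “fetching” script might look like (a very rough sketch with an invented file name; real RTF picture groups can be nested, use \bin raw data or other picture types, so this is only a feasibility test, not a finished converter):

```python
import re
from pathlib import Path

# Hypothetical file name; in reality this would loop over all exported items.
rtf = Path("item_42.rtf").read_text(errors="ignore")

# Very naive: grab the hex payload of PNG/JPEG picture groups only.
pict = re.compile(r"\\pict[^{}]*?\\(pngblip|jpegblip)[^{}]*?\s((?:[0-9a-fA-F]|\s)+)\}")

for i, match in enumerate(pict.finditer(rtf)):
    kind, hexdata = match.group(1), re.sub(r"\s+", "", match.group(2))
    ext = "png" if kind == "pngblip" else "jpg"
    Path(f"item_42_pic{i}.{ext}").write_bytes(bytes.fromhex(hexdata))
```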

As implied above, I don’t know the Mac at all (I’ve used numerous Windows computers, both desktop and mobile, for several decades), but I have known file storage other than NTFS and SQL, e.g. XML, so it appears quite unfortunate from my point of view (i.e. in my situation) that DT uses its proprietary db, with .rtfd, i.e. storing RTF data (the same format that my current Windows application puts into “blobs”) rather than some more “open” format, since otherwise I could possibly write the necessary script to transfer (human-readable) OPML into (equally human-readable) XML/OPML for import into DT.

So, is there seemingly no way at all?

Thank you very much again, Bluefrog, I appreciate your presenting the facts as they are!

(But with no solution whatsoever, I would remain stuck with my current Windows app and its above-mentioned, very “limiting” limits; coping with about 30 databases instead of just one sometimes becomes what they call truly “atrocious”. So some “investment” (buying a “good” Mac, buying DT, and doing some Windows scripting) would be worth the effort after all…)

  • there is, as far as I know and as my research has informed me, currently no Windows application with a capacity similar to DT’s (I would be happy to also use DT’s AI functionality, but obviously I currently do without it, so that’s not my “core requirement” here; that is simply “(almost) everything in just ONE db, and global (!) search”), and

  • obviously, DT will not be ported to Windows (in the foreseeable future at least),

so I would have to import “my stuff” into DT on the Mac in order to find a solution to my problem. Do you, or some “specialized” colleague of yours (on weekdays), envision some path to a solution, perhaps via third-party Mac software: the Windows OPML being correctly read into that third-party Mac tool (i.e. preserving the tree / sub-trees and the content formatting, perhaps without correct rendering of the pics), and then that tool importing correctly into the DT db?

I might not be the very first former Windows user to look for a better “home” for their data? A Google search for “devonthink windows import” just brought up a Reddit conversation on Windows alternatives to DT (and obviously there are none [link removed]), a page [link removed] describing an Android emulator for PC in order to access DT data on a PC, and then this forum’s thread [link removed] on getting “files” from Evernote or Dropbox into DT: nothing about OPML or other “outliner” data import.

Then, for the Google search “devonthink import”, there is [link removed] and [link removed]…

Should/must I give up, or might there be realistic ways (as said, on working days; it’s Sunday today, so this is really a bad moment to ask)?

Thank you very much again for your interest in this matter!

I had to delete ALL my bona fide links… so please believe me I tried to find a solution…!

I do not know enough about your databases to say whether or not moving to DEVONthink will be fruitful for you. But my observation is that DEVONthink is nothing like a SQL database unless one “squints” when looking.

But instead of changing everything, have you considered switching to a more capable SQL database, e.g. MySQL or PostgreSQL? Both run on Windows and Mac and are free of license costs. You could probably keep everything in one database, and there are readily available tools to convert from SQLite.
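There are migration tools such as pgloader that read SQLite directly. As a rough illustration of the do-it-yourself route (the table and column names here are invented), copying a simple table into PostgreSQL is only a few lines of Python with the standard sqlite3 module and a PostgreSQL driver:

```python
import sqlite3
import psycopg2  # assumes a PostgreSQL driver is installed (pip install psycopg2-binary)

# Hypothetical names: an "items" table with id, parent_id and title columns.
src = sqlite3.connect("notes.db")
dst = psycopg2.connect("dbname=notes user=me")

with dst, dst.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS items (
            id        BIGINT PRIMARY KEY,
            parent_id BIGINT,
            title     TEXT
        )
    """)
    # Copy rows one by one; fine as a sketch, batching would be faster in practice.
    for row in src.execute("SELECT id, parent_id, title FROM items"):
        cur.execute("INSERT INTO items (id, parent_id, title) VALUES (%s, %s, %s)", row)

src.close()
dst.close()
```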


You’re right of course regarding the database. I have the impression, though, that the OP is not working with SQLite directly but through programs that I don’t know. In that case, switching to MySQL or PostgreSQL would only help if said programs supported them.

Also, we don’t know anything about the kind of data the OP is working with. It might be better dealt with in SQL than in DT.


Nothing in the posts is clear to me, and I think I have a handle on both DEVONthink and SQL databases. I actively use both.

And the largest database I’ve ever seen contained more than 5 million items and billions of words, exceeding the recommended limits about 20 times. What’s acceptable or not depends on the computer and also on the user’s patience :wink:

Mmmmm… I recognize myself in that. Not so many items in each DB, but my total count across 18 DBs is near 40,000 items, eighteen million unique words (f*cking OCR!!!) and 1.3 billion total words… on a 16 GB M1 with a 1 TB disk.

But what @192022 must take into consideration, and it is the most important thing, is that DT works differently from those programs. He must change his mindset. Forget database sizes, etc., if he decides to go with DT.

I have tested some of those cited Windows applications (and more) and found them very childish when it comes to heavy usage. Not to mention data loss, items not being found, and unreliable syncing. I really wonder whether any professionals use those things for serious work and research.

As said, @192022, you must change your mindset if you want to use DT. DT is different in the Steve Jobs sense of “think different” (I really think it is the last “think different” thing in macOS/iOS). Perhaps you can export some of your data, go to an Apple Store, install the DT demo version, and play with it.

The nearest applications to DT on Windows are File Center and DocFetcher. The first one is slow as hell, prone to crashes, slow, slower, and as it uses Windows’ integrated search technology, it searches but does not find. Did I say it is slow? The second one, well, it is a homebrew project: exceedingly difficult to use, ugly, half done and with some never-resolved issues, even in the paid version. And Java-dependent, with all of Java’s limitations in performance and memory.


:grin:


Indeed.
My definition of “comfortable” minimizes eye-rolling, teeth-gritting, and cursing :wink:

Agreed: the nature of the database and the types of queries performed make a big difference.

For some document search features on OCR’d data, DT3 exceeds the performance of the SQL databases I have tried.

In general I have found DT3 to be limited only if I try to set up thousands of tags. If my number of groups and tags is in the hundreds, I have found the database size numbers DevonTech quotes to be quite conservative; in many situations DT3 can handle much more. Generally DevonTech is understating the capability of its software.

Most notably, I have found that in the era of fast SSD storage, the speed/number of processors and the amount of RAM do not change DT3 performance much beyond 32 GB of RAM, with one exception. If all you plan to run is DT3, it will run well on almost any modern Mac. But if (as is very likely) you plan to multitask with multiple apps open, then more capable Mac hardware is desirable.

All of the above is true, of course. But I use SQL databases when I am working with relational data models and over the years my models have evolved to scores of tables (and millions of records) in the database. Typically interfaced with specialised apps using so-called “object-orientated” ways of doing things. DEVONthink is a “document manager” and a good one at that.


I would have to create several DT databases, but whenever I open more than one of them concurrently, I can set the search scope to “all open databases” or the like, rather than just the current one.

That is correct.

The Server edition will not provide any additional functionality (or “power”) in my one-seat scenario.

That is correct - the Server edition wouldn’t improve the situation.

In order to administer (your “comfortable limit” of) about 250,000 items in one database, I had better buy a 32 GB Mac instead of a 16 GB Mac…

We always suggest getting as much RAM and internal storage as you can reasonably afford.

As noted by @cgrunenberg, there are some known much larger databases in the wild and yes, a more powerful machine would be more performant with larger databases. The figures we cite are for the average machine and a pleasant experience for the average user.

I understand that, since DT uses a proprietary database, “access from the outside” to my data would not be possible?

Many people needing access to the same data use indexing, essentially linking to items outside the database while still acting as if the data is imported, e.g., searchability.
However, indexing isn’t the default mechanism for a reason. There is an intimate connection between DEVONthink and indexed data, so you must understand how DEVONthink works with these files. Indexing is discussed in Help > DEVONthink 3 Help > Documentation > In & Out > Importing & Indexing, or in the manual. Pay special attention to the Indexing and the file system section so you understand how actions in DEVONthink can affect files in the file system.


DEVONthink can import OPML files. However, they are definitely not all standard. You’d have to test with an OPML file of your own.
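As a quick sanity check before committing to anything, a small sketch like the one below (it assumes the common OPML convention of nested <outline> elements carrying a text attribute, which not every exporter follows) would dump the outline structure of one of your own exported files:

```python
import xml.etree.ElementTree as ET

# Walk an exported OPML file and print its outline hierarchy.
def walk(node, depth=0):
    for child in node.findall("outline"):
        print("  " * depth + child.get("text", "<untitled>"))
        walk(child, depth + 1)

tree = ET.parse("export.opml")   # path to your own exported file
body = tree.getroot().find("body")
if body is None:
    print("No <body> element: probably not a usable OPML file.")
else:
    walk(body)
```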


RTFD is just one supported format and a long-standing favorite on both platforms. You could see about exporting to Markdown, a much simpler plain text format.

You’re very welcome :slight_smile:

Thank you very much, Bluefrog from DT, and also the real users, from the user side if I may say.

I’ll have a thorough look into the linked material (handbook PDF, etc.).

What I am looking for, quite desperately, is (some users have identified this perfectly indeed) something like my current SQLite front end, just with a much more powerful database as the back end. The user who doubts that front ends like the one I currently use are used for “production work” is right: they all become unwieldy beyond a certain number of (formatted text) records. They are not meant for number crunching or for standardized SQL data; they just use SQLite as their back end. But, and that’s why they are used by me and many other users, they provide “outlining”: within their SQLite tables they also store the necessary data, most of the time in the form recordID, title, and links to other tables for various content (the content field, the blob here; OPML export is possible, which is then a sort of not very standardized but human-readable and thus scriptable XML format; then perhaps a “comment” field and perhaps some other fields, not many), plus the indexed field (or “column”, as it is called in SQL) “recordID of the immediate parent”, possibly with multiple entries IF the front end allows for cloning items, i.e. items (which may be parent items of whole sub-trees, too) appearing in more than one position of the global “tree” (so it is not really a tree then, but a sort of “graph” in tree-like form).

From the above, it appears that the “tree” building (i.e. what you then see as the “outline” in a special pane) is done in a rather primitive way, with extreme recursion to gather it all, again and again: since there is no information stored along the lines of “this item has the following child items”, all of this has to be rebuilt, again and again, at run time. (There are other ways of “tree building” in SQL, sketched below, but they are not implemented in these typically one-developer tools.) From a conceptual / developer point of view, it is assumed that the regular user of these tools will NOT grow their material, i.e. their item count, beyond certain “limits”. Those limits are not “hard”, but beyond a certain number of recursions needed to “find” the child items from the sole (if indexed) “parentID” information, it simply becomes totally unrealistic. So it is much more the “design” that limits the use of these tools than the fact that SQLite is not a “serious” SQL db, as the free and extremely powerful Postgres (including, just like SQLite, built-in text search) would be (MySQL is financially beyond reach, and not needed either, same as Microsoft).
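Just to illustrate what I mean by “other ways” (a sketch only; the table and column names are invented, and even SQLite itself can do this with a recursive CTE since version 3.8.3): fetching a whole sub-tree in one query, instead of one query per level, would look roughly like this:

```python
import sqlite3

con = sqlite3.connect("notes.db")  # hypothetical database and schema

# One round trip for a whole sub-tree, instead of one query per recursion level.
# "items(id, parentID, title)" is an invented adjacency-list table.
subtree = con.execute("""
    WITH RECURSIVE tree(id, title, depth) AS (
        SELECT id, title, 0 FROM items WHERE id = ?
        UNION ALL
        SELECT i.id, i.title, t.depth + 1
        FROM items AS i JOIN tree AS t ON i.parentID = t.id
    )
    SELECT id, title, depth FROM tree
""", (42,)).fetchall()

for item_id, title, depth in subtree:
    print("  " * depth + f"{title} (#{item_id})")

con.close()
```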

This being said, Postgres could of course store millions of items, and I would assume that somebody who uses it as the back end for their “outlining” front end would know that they had to implement the “tree building” in a more sophisticated way (as said, those ways are known and can thus be implemented by a highly qualified developer), so as not to slow down access to the documents with an inappropriate tree-building design. Better access implementation would come more or less “automatically” with the powerful db, whilst for “consumer stuff” like what I currently use, the developers assume it’s “good enough as it is”, access to documents in the upper four-digit or low five-digit range being “fast enough”; selling a Postgres db with the same “sub-standard” access would be ridiculous, since user expectations then are much higher.

When I speak about “access”, I always mean the tree building or sub-tree building, i.e. access to complete “compounds” of data, not so much single records or a handful of them. Also, moves of whole sub-trees in such a “consumer” tool may become slow or even not entirely reliable, depending on the tool. It’s not that they are “good for nothing”, though: some of them remain stable, without data loss, but with wait times of up to several minutes, due to the above-mentioned poor “design”.

The thing with trialing is: you would really need ALL your stuff imported before trialing, in order to get a realistic perception of how your data will then really be handled; with some data representing not even 5 per cent of your total data set, no such realistic appreciation is possible. In Europe you can send back mail-order hardware within 14 days, but that would become quite an incredible race against time; one would need to have prepared all the necessary import scripts (i.e. here: the script for adjusting the export OPML to the import OPML, so both formats would need to be known in advance) before delivery of the hardware.

So this seems to be a way indeed, XML adjustment being absolutely doable; this is WAY more realistic than fiddling with blobs between Windows and Mac.
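Just as an example of the kind of adjustment I mean (everything here is hypothetical: suppose the Windows export put the note body into a nonstandard “_note” attribute that the importer doesn’t understand), a few lines of Python would already rewrite the attributes while keeping the tree intact:

```python
import xml.etree.ElementTree as ET

# Hypothetical adjustment: move a nonstandard "_note" attribute into "text"
# (or drop it) on every <outline> element, keeping the tree structure intact.
tree = ET.parse("export.opml")

for outline in tree.getroot().iter("outline"):
    note = outline.attrib.pop("_note", None)
    if note and "text" not in outline.attrib:
        outline.set("text", note)

tree.write("import.opml", encoding="utf-8", xml_declaration=True)
```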

I currently do not know of any Windows office tool or the like, let’s say in the 1,000€ range instead of the 100€ range; all the aforementioned consumer tools cost up to 100€ at most, so they are incredibly cheap, but as explained above, you get what you pay for, and you then have to divide your stuff into multiple databases, which is really very “unprofessional”.

There might be some “groupware” on a rental basis, but I’m clearly not willing to pay probably 80€ per month, or more, for one seat; and they would probably want to force me to rent at least two or three seats anyway, since it’s groupware after all and they don’t address the “personal” market…

I’ll try to better understand DT’s architecture from the links you kindly gave me; I’m not really decided yet what to do in my situation. I have to say I would be much more confident about my Windows-to-Mac voyage if DT used some standard, powerful SQL db (Postgres being the only candidate here indeed; you couldn’t sell the same with MySQL or MS, both would cost much too much) where it could be said, “a 7-digit number of documents, no problem”. On the other hand, I know that SQL is considered far from “ideal” nowadays for such work situations to begin with, so, objectively, DT using its own proprietary, (as I understand it) RTF-based (not XML) document db is NOT to be considered a downside. It’s just that, psychologically, going PC-to-Mac and then not even preserving an SQL format (which I know and can handle, for queries, updates, etc.) means I might be in for some quite unpleasant surprises (“this is not possible, that isn’t either”) where, with SQL, it would have been possible, judging from my particular previous “experience”.

And I “got” the hint, from the above, that with 32 GB of memory I shouldn’t run much else concurrently, or I would need even more; all that at Apple prices, since they don’t allow you to buy your memory from other sellers, and even upgrading the memory later might be impossible. They are notorious for locking the user into any given hardware, bought “as is”, so buying a Mac with “just” 32 GB would be another real risk for me, since if afterwards I see I need more, I’m stuck again…

I’m not so fond of the Mac; I just hear brilliant, very tempting things about DT…

Thank you very much for your very kind help, in the meantime!


@192022 Would you be willing/able to share with us a bit more about the nature of your database and the ways you work with it? If it is not proprietary, some actual screenshots and an understanding of your workflow/goals and the topics/content you are working with may be very helpful.

What I have observed myself with DT3, and others have noted too, is that the ways I organize and search information in DT3 are different from how I approach other databases and different from what I anticipated when I first began using DT3. There may be features of DT3 which allow for document/information retrieval more efficiently than you first anticipate. Or the reverse might be true; some workflows are better suited to DT3 and others are better suited to a traditional relational database.


I’m sorry, but I really do not follow. Might be just me, but I do not understand what you’re doing nor what you’re looking for.

  • You’re using apps (which I do not know) for some purpose(s). What is/are the purpose(s)?
  • What are these “trees” you’re talking about? Which kind of data do they store? What is the role of XML in all that?
  • If (as I seemed to understand from your description) the apps you use rely on a relational database for their functionality, DT is definitely not the way to go for you. DT is very much a tool for managing unstructured data, similar to a NoSQL database.

If you rely on primary keys, relations like master-child and all that, you will not be happy with DT. I know that some people here think that it’s the hammer for every kind of nail-ish looking piece of metal, but I tend to disagree. If you have relational data (and even after your very long posts I have no idea whether you do or don’t) and rely on SQL to access it, DT is not for you.

I do not quite understand why the free (as in beer and in free software) PostgreSQL (not Postgres, btw) is within reach while the free (as in beer and in free software) MySQL is “beyond reach financially”. And then there’s still MariaDB, equally free on all accounts. Of course SQLite is not a serious SQL database (and it was never announced as such, BTW). But there are enough such products available for free. Even Oracle is free for private use and moderate data sets.

Having said that, SQL is not really ideal for building trees, though it is feasible.
