DTPO - file size and memory concerns...again.

Carts · December 5, 2006, 8:10pm

This is an old topic made new by the email and OCR features. I am, for the most part, a single-db type of user. Importing email and scanned documents would make for a rather large file, and would use up quite a bit of memory. As a point of reference, my email alone sits at about 27k messages, which creates a database of approximately 1 gig.

The question: there has been some talk of a move to an alternate database engine, one which would not load as much information into memory. Is this still the company’s thinking? Should I be concerned about maintaining several smaller databases? Some of the AI is lost with multiple files.

Carts

Bill_DeVille · December 5, 2006, 10:36pm

The revised database structure planned for a future revision will somewhat reduce the memory footprint.

But for those with large collections of documents I’ll probably continue to recommend consideration of topical databases, both to keep the databases speedy within any constraints of the RAM resources of a computer and to assist the AI features by keeping less topically relevant materials out of the databases.

In my case I’ve got a separate database into which I’ve archived my collections of email from Entourage and Mail. It holds almost as many messages as yours, many with attachments. I don’t often need to access the older messages, although when I do, I really appreciate the improved searching/filtering features in my database.

Some of the more recent messages in that collection, however, are relevant to current projects. So I identify those for export to the appropriate database.

The needs of other users might lead to different decisions. For me, the email archive is a secondary resource. For others, it might be a primary reference resource to be incorporated into their default database.

As to scanning/OCR with DTPO I’m approaching 2,000 pages scanned through my ScanSnap to DTPO databases.

Most of those scanned documents don’t fit the topical coverage of my main database. The majority fit into a special topical database covering administrative and medical policy and procedure documents for a health care facility. Most of the remainder fit into a database of my financial records.

I’ve probably got well over 100,000 documents in my various DT databases. If I were to consolidate those documents into a single database, the performance I’m used to for searches and AI functions would take a big hit: I don’t have a computer with enough physical RAM to avoid heavy use of Virtual Memory, which is disk-based and slows down many processes. My MacBook Pro has 2 GB RAM, and my Power Mac G5 has 5 GB RAM. I try to hold my individual topical databases to a size that’s comfortable and fast on my MacBook Pro, roughly 20 to 25 thousand documents and a total word count up to about 25 million words. (A Mac Pro fully populated with RAM could handle my document collection in a single database, but I would have trouble running it on my notebook computer, and I favor portable databases.)

Perhaps more importantly, I make a lot of use of searches and AI features when I’m researching material. I’m interested in the history of science, which fits into my main database. But I’ve also got a very large collection of materials about the Apple Newton. Documents by or about Isaac Newton are in my main database, and when I search for “Newton” they are what I want to find. In that case, I wouldn’t want to see thousands of hits for the Newton PDA. Nor do I want to confuse the ‘See Also’ operation by mixing those materials. I think it makes good sense to separate such materials topically into different databases.
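To make the “Newton” point concrete, here is a minimal sketch in Python. It is not how DT works internally; the database names, the document titles and the `search` helper are all made up for illustration. It only shows that the same query returns a more focused result set when the search is limited to one topical collection.

```python
# Toy illustration only: the database names and titles below are hypothetical.
DOCS = [
    ("history_of_science", "Isaac Newton and the Principia"),
    ("history_of_science", "Newton's correspondence with Leibniz"),
    ("history_of_science", "Darwin and the voyage of the Beagle"),
    ("apple_newton", "Newton MessagePad 2100 user guide"),
    ("apple_newton", "Newton OS handwriting recognition notes"),
    ("apple_newton", "Repairing a Newton MessagePad screen"),
]


def search(term, databases):
    """Return (database, title) pairs whose title contains term, ignoring case."""
    needle = term.lower()
    return [
        (db, title)
        for db, title in DOCS
        if db in databases and needle in title.lower()
    ]


# One merged database: every "Newton" document matches, PDA material included.
merged_hits = search("Newton", {"history_of_science", "apple_newton"})

# A topical database: only the history-of-science documents are considered.
topical_hits = search("Newton", {"history_of_science"})

print(len(merged_hits), "hits when everything is in one database")
print(len(topical_hits), "hits in the topical database")
```

In this toy data the merged search mixes the PDA documents in with the Isaac Newton ones, while the topical search returns only the latter. A real collection would show the same effect at a much larger scale.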

By carving out my collection of documents into topical databases that fit comfortably into the capabilities of my MacBook Pro I can switch between them fairly quickly. The occasional documents that fit in more than one database can easily be exported from their current database into one or more additional databases.

Even when, in a future release, memory requirements are somewhat lessened and it will be possible to have multiple concurrent databases open, there will still be ultimate resource restrictions, especially of physical RAM, on a user’s computer. At some point cross-database searches will be possible, so that a search result can be opened in a different database; but as a practical reality smaller databases will always open more quickly than larger databases.

I’m spoiled. Many of the searches on my databases can be completed in a few milliseconds. And when I’m running a series of See Also trails, or using Classify on a series of documents I want real-time interactivity. That wouldn’t be possible (and the results would have far less ‘focus’) on my MacBook Pro if I tried to run a single database compiled from all my existing databases.

Perhaps there’s a sense in which everything is related to everything else. But as a practical matter I have little trouble creating topical databases in which the relationships among the items in each one are infinitely richer than their relationships to the contents of my other databases.

Back in my days as a professional graduate student I picked up more than a hundred hours in philosophy and logic. Let’s say that I wanted to manage philosophical books and papers in a way that would assist me in analyzing them. Would it make a lot of sense, for example, to include in a single database everything dealing with Aristotle, Plato, Aquinas, Kant, Sartre, Hegel, James, Carnap, Ayer, Hume, Locke, Whitehead, Russell, Popper and so on? Not, IMHO, if one hopes to make much sense of the basic differences and approaches. Similar terminology, for example, doesn’t mean similar concepts. Nor, for that matter, do similar concepts mean similar terminology.

If I were a graduate student in philosophy these days I suspect that I would find DT Pro very useful. But I’m pretty sure I would have a number of different databases covering my studies and research, in order to make the most effective use of DT Pro.

Carts · December 5, 2006, 11:21pm

Thanks Bill, as always you are most complete. I take your point that the results from searches would be cleaner when databases are topical. I will think through my normal work processes and see if I can’t find a nice logical structure to break out a few separate databases.

I suppose I would have been more eager to do this if one could have more than one database open, or at least searchable, at any one time. It’s amazing how impatient I have become, especially when I can recall searching endlessly for paper files, or simply abandoning the notion of finding filed information, in the not so distant past.

Once again, you are the voice of reason.

Carts

BTW - your many answers on this forum have been extremely helpful to me. If you’re ever near Waterloo, Ontario, drop me a private message and I’ll buy you lunch. Many thanks.

ndouglas · December 7, 2006, 6:03pm

That’s when I reach for my revolver.

Bill_DeVille · December 7, 2006, 6:42pm

Hi, kalisphoenix. The 1960s were glory days for professional graduate students.

I could go to good universities for free. I had a tax-free federal stipend that was enough to live on, by itself. While registered in one discipline I could work full-time as a research associate with top-flight research professors in other disciplines, paid pretty well and getting publication credit. In the summers, I was on the visiting faculty in still another discipline. I could count on honoraria from the National Science Foundation for presentations at several meetings around the country, travel expenses paid. And travel expenses for serving on advisory committees at a couple of other universities, e.g. Case-Western Reserve and SUNY at Albany. Plus the occasional editing job.

I haven’t been able to afford a Porsche since then.

alexwein · December 7, 2006, 8:32pm

"Back in my days as a professional graduate student I picked up more than a hundred hours in philosophy and logic. Let’s say that I wanted to manage philosophical books and papers in a way that would assist me in analyzing them. Would it make a lot of sense, for example, to include in a single database everything dealing with Aristotle, Plato, Aquinas, Kant, Sartre, Hegel, James, Carnap, Ayer, Hume, Locke, Whitehead, Russell, Popper and so on? Not, IMHO, if one hopes to make much sense of the basic differences and approaches. Similar terminology, for example, doesn’t mean similar concepts. Nor, for that matter, do similar concepts mean similar terminology.

If I were a graduate student in philosophy these days I suspect that I would find DT Pro very useful. But I’m pretty sure I would have a number of different databases covering my studies and research, in order to make the most effective use of DT Pro."

Well, Bill, I AM a graduate student in philosophy, and I have everything housed in one database. This is, in fact, the reason why I have DT–to do just that. I have folders on most of the thinkers you cite above and many more, and I’m not really interested in their differences and approaches. I’m interested in one thing–having the information I need ready at hand when it’s needed. DT for me is a warehouse into which I dump everything that might possibly be relevant to my work and then some. The beauty of DT for me has been that it lets me do my research and move where I need to go, and allows me to get information into it at the touch of a hot key, without interruption. But to have to wonder which db I have open to be sure an item goes into the right one, etc., well, that would not work for me at all. To have to close and open a number of different dbs sounds quite tedious. And an incredible waste of time. I get that it isn’t so for you. But the whole purpose of having this program is to have a quick, easy place to dump things as I move around doing research or working on a project. And then to quickly find it and make associations with other materials that may be relevant or interesting.

For example, I’m writing a chapter on the concept of intention. I went to my one large db and did a search for just that word, ‘intention.’ Yes, I found some irrelevant stuff, but mostly I found things that were not only pertinent and helpful, but things I long had forgotten I had. And materials I never would have even thought of looking at, which actually had some pretty interesting avenues to follow. How wasteful to have to do this over and over again, in every little db I might have pertaining to such and such a thinker.

That said, I do use tagging with the comments field, and sorting by smart groups or manually by groups, to keep things organized in a quick cleanup after I get things into DT. I also use separate group windows to give me the sense of having different dbs, so I can see things separated out. But with everything actually together, I can do my searching and See Also and all the other things I’ve come to depend on.

So, that all said, now you have me worried about whether I’m compromising DT’s AI function! Because I do rely on it. So please, Bill, tell me if the way I’m working is not going to work with DT.

Many thanks,

Alexandria

Bill_DeVille · December 7, 2006, 11:21pm

Hi, Alexandria. Not to worry. There’s a lot of value in your approach.

I was thinking more about projects such as analyzing a school of philosophy – a group of writings having a common theme – or perhaps comparing two such schools.

For example a pragmatist might well use the terms “thesis”, “antithesis” and “synthesis”. But not at all in the context that a Hegelian would use those words. I once heard a pragmatist kidding about a Hegelian Christmas myth, in which Santa Claus would be replaced by the triumvirate of Santa Claus, AntiClaus and SynthiClaus. In riposte to one of the pragmatist’s positions, a Hegelian dismissed it as a “mere fact”. (One gets the impression that facts aren’t important to Hegelians.)

I took a course under Karl Popper. When Popper talked about the use of hypotheses in science, he wasn’t as much interested in whether a particular hypothesis was true or false, as in the question as to whether the hypothesis is “falsifiable”, i.e. testable. Lots of writers since have agreed with Popper, disagreed with him or proposed modifications of his approaches. But there’s a common ground, or context, to that set of writings. If I were working in that area, I would probably spin that set of documents off into its own database, as the AI functions would more accurately ‘see’ similarities in the use of terms such as “true” or “false”. (Those terms as used by Plato or Aquinas would have a different context.)

Then there was the excellent Polish logician who addressed the problem of describing in mathematical terms how one can describe goals/objectives/intentions and formulate decisions. He called it (in English) the logic of intensions (with an “s”) and decisions. Perhaps more directly relevant to the problems of artificial intelligence than to your analysis of “intention” – but perhaps not wholly unrelated.

alexwein · December 8, 2006, 12:36am

"I was thinking more about projects such as analyzing a school of philosophy – a group of writings having a common theme – or perhaps comparing two such schools.

For example a pragmatist might well use the terms “thesis”, “antithesis” and “synthesis”. But not at all in the context that a Hegelian would use those words. I once heard a pragmatist kidding about a Hegelian Christmas myth, in which Santa Claus would be replaced by the triumvirate of Santa Claus, AntiClaus and SynthiClaus. In riposte to one of the pragmatist’s positions, a Hegelian dismissed it as a “mere fact”. (One gets the impression that facts aren’t important to Hegelians.)"

Yes, you are quite right about the contexts and different senses of these terms, and there are times where it would be better to have the different ‘schools’ separated out this way. But in my own work, it is actually much more interesting to have it all together and have to parse it out on my own. There are times where these different senses are very interesting and things come to light in very unexpected ways through exploring these different senses. So having it all together can actually stimulate some unique ways of approaching issues. Using your example, in a way, it would be almost like getting the pragmatists and Hegelians into an interesting dialogue which may or may not be fruitful. But I would want to decide that–that in fact is part of the work for me.

Your Santa Claus example is funny. I wrote my senior thesis on Hegel (and Marx) and he’s actually a lot more interesting than the ‘Hegelians’ who followed.

In any case, I’m glad to hear that my approach to using DT is not ‘breaking’ the AI functions in some way. Maybe complicating them a bit, but, as I implied, I mostly like the complications! Makes for some interesting surprises.

Take care, and thanks so much for responding so thoroughly!

Alexandria

PS Any idea why suddenly I can’t use the ‘quote’ function? When I preview it I see the tags as text. HTML is off, perhaps that is why? I don’t know why that would happen all of a sudden. I also don’t see a way to turn it ‘on’ if that is the problem.

alexwein · December 8, 2006, 12:46am

No, it doesn’t have anything to do with the HTML option, it seems, since it’s off in other forums and I’m not having this issue. Weird.

Maria · December 8, 2006, 4:45am

OT: Alexandria, nice of you to say something kind about Hegel. We went to the same school, but he finished a bit earlier than I. These Hegel quotations at school festivals pushed me away for a long time. Maybe I should give him a second chance…

Maria

alexwein · December 8, 2006, 3:52pm

Ha, I should hope he finished before you! Yeah, I have a kind of love/hate relationship with Hegel. There’s a lot more subtlety and insight in his work than he often gets credit for, mostly because of the influence of the “Hegelians” who followed. But I’m also glad to be done with him, as I am with most German philosophers of that ilk. I work with postmodern thinkers and, well, whatever you want to call philosophers (and others) who are still alive!

Take care,

Alexandria

historydoll · December 8, 2006, 4:58pm

I’ve been following this discussion with some interest, since I am in the early stages of putting my db together (though in the late stages of my dissertation–wish I’d gotten DT sooner!). I’ve been putting together one big db, for the same reasons that Alex mentioned, i.e. the possibility of cross-connections that I might not have seen on my own. Bill, you seemed to be implying that the AI might be less accurate with a bigger database. Could you expand on that a little?

Thanks.

Bill_DeVille · December 8, 2006, 6:35pm

Hi, historydoll. It’s all there, above. There are reasons to integrate everything into a single database, if you have the computer resources to run that database quickly; and reasons to separate materials into topical databases, either because of computer resource limitations or because, for some materials, the AI features, which involve contextual analysis, could function better within a tighter context.

I’ve collected more than 100,000 documents into my various databases. As I like the ability to transport any of them to my MacBook Pro, all of them run pretty quickly on the notebook computer. That wouldn’t be true if I ran everything in a single database, as many of the things I do would slow down on the notebook with 2 GB RAM.

My main database, at about 21,000 documents and 34 million words, is actually more conceptually diverse than is Alexandria’s, so please don’t get the impression that I’m pushing for tight conceptual similarities in my databases. My main database, with a theme dealing with environmental science, technology, laws and regulations, policies and issues of many kinds, is extremely diverse. There’s chemistry and statistics. There’s genetics, genomics and proteomics. There’s toxicology and pharmacology. There’s risk assessment and sociology. There’s biodiversity, conservation ecology and sustainability literature, along with global climate, energy technology issues, past, present and future. I’m heavily into exploration of alternative approaches to environmental standard-setting for regulatory purposes and comparison of approaches in the U.S. and the European Union (I’m very critical of the precautionary principle used in the EU). I’ve done a lot of work with international environmental science exchanges with developing countries and with issues of appropriate technologies in developing countries, and with access to literature resources in developing countries for graduate education.

Like Alexandria, I treasure the ability of DT Pro’s AI features to help me explore this diverse collection of information and to help me discover unexpected relationships.

But I do have some collections that don’t belong in my main working database. I’ve mentioned those, such as a huge collection of information about the Apple Newton PDA. And I don’t keep my financial information in my main database; that’s not relevant to my use of that database. Now that DTPO has a Web server that lets one ‘broadcast’ the contents of a database, that’s another reason why I would want to exclude ‘private’ or irrelevant information from such a shared database.

Comment: At any moment I probably do have material in my main database that I will export and remove. That’s because my main database is the one that’s almost always the open one. As I browse the Net or scan paper, some of that material will go into holding groups (financial information, the Apple Newton, etc.) destined to be exported to another database and then deleted from the main one. (I exclude those holding groups from classification, by the way.)

If I were a graduate student in philosophy today I would, as does Alexandria, have a large database with ‘everything’ in it. But if I were doing a critical analysis of the literature dealing with the issues of Karl Popper’s Logic of Scientific Discovery I would copy that literature off into a new database for ‘tighter’ contextual analysis in that literature.

Then again, as Popper’s work is relevant to some ‘eternal’ issues in philosophy such as epistemology and metaphysics, I might well explore that as well. Back to the main database for that purpose.

If I were a graduate student today in history, I would probably take similar approaches. Have at it.

Over time there have been changing trends in the disciplines of history. Broad brush approaches such as Spengler and Toynbee have been displaced by more reductionist approaches. I first read Arnold Toynbee when I was a child. I think I’ll explore him again, as the broad picture of struggles of civilizations appears very pertinent nowadays.

historydoll · December 8, 2006, 7:04pm

Thank you, that’s very clear. I find your patience to be amazing, along with your ability to communicate complex concepts–thank you!!

alexwein · December 8, 2006, 10:43pm

"If I were a graduate student in philosophy today I would, as does Alexandria, have a large database with ‘everything’ in it. But if I were doing a critical analysis of the literature dealing with the issues of Karl Popper’s Logic of Scientific Discovery I would copy that literature off into a new database for ‘tighter’ contextual analysis in that literature. "

Well, then we are saying the same thing. Only I take things from DT to Scrivener for project development. But I store it all in DT.

And I have plenty of non-philosophical topics in my db as well. I have many hundreds of documents culled from news sites concerning everything from current events to yogic asana practice, different yogic schools, and all sorts of information relating to yoga practice and philosophy. There is also nutrition information, anatomy, zoology, quantum physics, entire eBooks ranging from Dante’s Inferno to Dostoevsky, language study information, and information on history, art, mythology and much more.

Anyway, the point is, while my db is not as huge as Bill’s, my own collection is not as conceptually homogeneous as Bill implied, which hasn’t interfered with my finding the things I need quickly or in making interesting associations. And I actually do have two other dbs, one for ‘business,’ that houses receipts and anything that pertains to family business and one for storing my archived email.

I too am glad to read Bill’s responses. What I’m seeing now is that it might make things a bit more complex and you may get more hits that don’t pertain exactly to what you are looking for. And your db can get pretty huge. But none of this has been a problem for me as yet.

And historydoll, congrats on being in the later stages of the dissertation. I’m revising at present, which I hope to complete by the end of this month, then it’s on to readers reading it and hopefully scheduling the defense! Scary. But so exciting to be almost done.

Alexandria

historydoll · December 8, 2006, 10:57pm

Thank you. I’m not sure I’m actually in the later stages, I’m just out of time! You, however, really are “almost done”–best of luck!! Let us know when we can call you Dr. Alex.

BTW, I see that you had the same “quote” problem as I did; it’s not the HTML option, it’s the “Disable BBCode” checkbox.

alexwein · December 11, 2006, 6:13pm

Will do. And thanks for explaining the quoting issue. It came and went and I have no real idea why. I didn’t change anything either way. Good luck, and hang in there!!!

Alexandria

ndouglas · December 12, 2006, 9:25pm

Bill_DeVille:

Hi, kalisphoenix. The 1960s were glory days for professional graduate students.

I could go to good universities for free. I had a tax-free federal stipend that was enough to live on, by itself. While registered in one discipline I could work full-time as a research associate with top-flight research professors in other disciplines, paid pretty well and getting publication credit. In the summers, I was on the visiting faculty in still another discipline. I could count on honoraria from the National Science Foundation for presentations at several meetings around the country, travel expenses paid. And travel expenses for serving on advisory committees at a couple of other universities, e.g. Case-Western Reserve and SUNY at Albany. Plus the occasional editing job.

I haven’t been able to afford a Porsche since then.

Sounds a lot better than my circumstances. I’m still working on my first Bachelor’s degree, with all of 293 credit hours. I probably should have mentioned that I’m 6 credits away from a BA in Philosophy (along with five or six other disciplines). The amusing thing is that I actually want and need a degree, and I’ve been trying to get one for several years now – it’s just a streak of bad luck that strains the mind (and the soul).

Dberreby · December 30, 2006, 3:37pm

I agree, and I do this too. The only aspect I find a little bothersome is when I have a DT database active and I come across something that belongs in another one, but I don’t really want to switch databases. I have “Move to XYZ” folders in several of these databases, but that gets a little clunky. It would be great to have a Copy Selection script that gave me a choice of databases before showing me the choice of possible folders.

All of this speaks to the fundamental issue, which Bill alludes to: Being selective, doing information triage, you risk losing some possible associations among different bits of data; but when you’re open to all possible connections, you lose focus. I think DT handles this tension better than anything else I’ve used.

David

Bob_Sprague · December 30, 2006, 6:01pm

Since we cannot have multiple databases open, is there a place for DevonNote in this issue? I was thinking of putting my random notes and household items in DN and my work and research in DTPO. I would then have two databases “open”. Someday DTPO will allow multiple data files open at the same time… until then… Transition from DN would be easy too.

Does anyone else do this? It may not solve the memory issues but it might address the “purity” of one’s work/research database. It might also be convenient.

[Also… I read in “The Balcony” that it was Bill’s birthday… I will also add my best wishes! Bill… hope you have a great year. Those of us who benefit from your input on the forum only know a small part of you but we appreciate and respect it deeply… thank you.]

-bob