Import, Index and overall scope of DT


#1

I am a nearly new user, about to get the Pro version. I am planning on Pro, since I don’t see a need for Office yet. I have a SnapScan scanner that I will be using much more in the future, but since DT can easily import from or index a folder (the scan-to folder), direct import on scan seems unnecessary at this point.

I have been looking at the demo and the help files for a while now. I have my own challenges, which I will get to in a moment, but my question is: how does it work for others to import vs. index files? It worries me to import ALL my files into a proprietary db. I am a writer and a photographer. Is DT a safe place to import my picture files? Or is DT just for text, web and basically… research?

Since it is a proprietary DB, if I did import all my files and deleted the originals, how would my files be available in the future? Sometimes apps fail, or some day in the future I may not upgrade DT, and then what? Can I still access the files that were imported? I have been a Mac user since the mid 80s, and sometimes apps stop being supported; it happens. Of course I may never need to do this, but I want to make sure that I can if I want to.

It seems like DT will work better if I import my files, is that right? I can edit them directly, etc. But I also use Scrivener and Word for my writing, and I use DTP/art/layout programs for design projects. I am still assuming that DT is a huge organizational tool for the Mac. Did I miss anything? Is this going to work like the Finder, as in, can I import my whole drive essentially? Or just the Documents folder, but not the Microsoft DB (I just did that and lost Outlook’s files)?

Other than having to make sure my indexed files remain in the same place, and that I may not want to take my whole drive to go, why is importing a smarter idea than indexing?

Still trying to get my head around this program.

About my own case: I have some real challenges with organization. I had a pretty severe head injury a while back and I have some other cognitive issues that make organizing information pretty darned difficult. I like the AI feature a lot. Suddenly things I had written years ago but had forgotten about appear. I like the auto classify feature; now it looks like I will get a little help organizing.

Now the embarrassing part: for years I have made copies of whole documents folders and bought new drives. I have so many duplicate and misfiled files… It’s a real mess. But now I am getting help and getting things done (finally, hooray). I don’t know if DT is a duplicate finder per se; I have been hacking away at a backed-up version of my documents folder with Gemini for now.

And so, for learning how to file, to categorize and to sort, DT seems like a pretty good option. I guess the only thing I really don’t understand yet is, import or index?

If anyone would be kind enough to comment on using DT as a boon to previously messy management skills and bad organizational habits, I would appreciate it. And I still need an answer to the question of importing everything into DT (including photos and files that still need other applications to open them), as opposed to simply indexing everything on my drive.

Much thanks

Dg


#2

The “index” vs “import” dialog is one of the most common topics in this forum. If you haven’t read through the discussions here, I suggest browsing for what Bill DeVille, Greg Jones, and others have written. Be prepared: there is no “best way” answer other than “it depends”. It depends on what you need to accomplish - and what you need to accomplish will change over time as you work with DEVONthink.

One factor: if you need to reorganize documents in groups (i.e., folders) frequently, then importing can be a good option so you don’t have to concern yourself with differences between the structure of your group hierarchies inside DEVONthink vs. what you have in your file system outside DEVONthink.

On the other hand, if you have other apps that need to access a folder hierarchy (not just individual files), then indexing can be helpful. For example, I index folders in OneDrive because I keep documents there that I access with the iPad versions of Word and Excel. So, I can edit on OneDrive and still have those documents viewable in DEVONthink on the desktop.

Documents imported into DEVONthink are never changed. See for yourself: browse to a .dtBase2 database file in Finder, control-click it and choose “Show Package Contents” – what Finder will reveal is that the database “file” is actually a folder containing many subfolders. Don’t modify the internals of this folder, but you can browse it and find your documents and see for yourself that nothing has changed with your data. However, the internal structure of the database is designed for efficiency and so it’s not going to make a lot of sense to you. In the future, you won’t need to visit the inside of a database in this manner.
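If you would rather not click through the package by hand, the same read-only look can be scripted. This is only a minimal sketch: it relies on nothing more than the fact described above (the database "file" is a folder), the database path is a hypothetical example, and the function name `summarize_package` is mine, not anything provided by DEVONthink. It only reads; it never modifies the package.

```python
from collections import Counter
from pathlib import Path


def summarize_package(package_path):
    """Count files by extension inside a database package (read-only)."""
    root = Path(package_path)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical location; substitute the path to your own database.
    package = Path.home() / "Databases" / "Research.dtBase2"
    if package.is_dir():
        for ext, n in summarize_package(package).most_common():
            print(f"{ext}\t{n}")
    else:
        print(f"No package found at {package}")
```

Seeing your own .pdf, .rtf, and image counts in the output is a quick way to reassure yourself that the documents are stored as ordinary files.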

You can always export your documents, and you can export them with a folder structure just like what you see when you work with DEVONthink.

Images in DEVONthink – personally, I think you’re better off with a photo management solution like Lightroom. DAM is not a DEVONthink strength (frankly, DEVONthink is not capable of DAM). But that’s a good thing, because a real strength of DEVONthink is its ability to work well with other applications you own. For example, DEVONthink has a pretty basic editor for text and rich text, but it is easy to open a document in an external editor, make your edits, then save it – all without extracting the file from the database.

Duplicate finder? Although DEVONthink is pretty good (not perfect) at flagging duplicate documents, you’ll be better off using Gemini to prune your file inventory before bringing the documents into DEVONthink (or indexing, if that’s what you prefer).
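To make "exact duplicate" concrete, here is a minimal sketch of the usual approach: group files by size, then compare content hashes within each group. This is not how Gemini or DEVONthink work internally (I don't know their methods); the function names are my own, and it only reports groups, it never deletes anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(folder):
    """Return groups of files under `folder` with identical content."""
    # Files of different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Only hash files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

A byte-for-byte check like this will miss "near duplicates" (the same text saved in two formats, say), which is where a similarity-based tool earns its keep.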


#3

Thanks. I “was” thinking that the way to use DT was to have it index my entire documents folder (like I said, organization is hard for me). I was thinking this was one way to look through everything I had with the AI feature and the good search feature. I currently make good use of HoudaSpot for searches and (file) Renamer for combining suspected duplicates so I can cull the herd, as it were.

I continue to work hard at clearing out the duplicates and setting up a working, long term file structure.

The most difficult thing that I have to do is to keep track of the research I have, work on my book, and work on other projects (music, photography and tracking other projects). I was hoping that DT could help me find similar or identical ideas as I look for material that supports a current talk I am going to give, an article, etc. I have a hard time remembering the latest version of a revision, for example, and it could help a lot to see where I have written in a similar fashion.

Now I am beginning to realize that a better use of DT for me might be to search out all files of a certain type using HoudaSpot and drag those into DT for indexing only. Find all files that are web, PDF, text for example.
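For anyone who wants to try the same idea without HoudaSpot, gathering files of certain types can be sketched in a few lines. The extension list is only my guess at what "web, PDF, text" might mean, and `collect_by_type` is a made-up name; the script just lists files and leaves them where they are.

```python
from pathlib import Path

# Assumed extensions for "web, PDF, text"; adjust to taste.
RESEARCH_TYPES = {".pdf", ".txt", ".rtf", ".md", ".html", ".webloc"}


def collect_by_type(folder, extensions=RESEARCH_TYPES):
    """Return a sorted list of files under `folder` with a wanted extension."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


if __name__ == "__main__":
    for path in collect_by_type(Path.home() / "Documents"):
        print(path)
```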

Am I missing something, or is this starting to make sense? Or I should say, am I on the right track?

Thanks

Dg


#4

Whether you index or import files, the AI, classify, search, etc., features work the same. DEVONthink knows as much about indexed files as it does about imported files.

You don’t have to commit to indexing or importing now–or ever. Focus on the work and the rest will become clearer.

There are numerous blogs and books about using DEVONthink for research.


#5

I actually just finished a massive reorganization of about 300 academic journal articles in PDF format using a combination of indexing and importing.

I was able to index a relatively poorly organized directory in Finder, then use “auto group” to do a rough sorting. Then I moved this set of roughly grouped indexed PDFs to a database of PDFs I had already imported. From here I merged the two with manual sorting (which had been greatly aided by DT’s AI which did a lot of the early legwork - dividing the indexed PDFs into bite-sized groups of roughly similar subject matter).

After all that, I actually dragged the whole set of newly organized groups out of DT into a directory in Dropbox, thus copying all of the files, organized into folders that corresponded to DT Groups, to Dropbox/Finder. I then removed the whole lot from DT and deleted the original directory of poorly organized PDFs.

I then directed DT to index this new directory, producing a nice DT database of academic articles that I can tag and move into groups as I see fit, while also maintaining a nicely structured directory of PDFs to facilitate access from my iPad to annotate in PDF Expert 5.

So it is a lot of in and out, but in the end I now have:
PDFs that are simultaneously

  • In Finder/Dropbox, nicely organized, accessible to PDF Expert 5 for iPad annotation
  • Indexed in DT, allowing me to leverage all the special organizational and searching functions of DT without having to give up the flexible mobile access.

Reading notes that are stored in the DT database (not imported per se, because they were created in the database).

The major benefit of this to me is that I can manipulate the organization of the indexed PDFs in DT without affecting the organization in Finder. This is important since the organization in Finder is geared more toward designating bite-sized chunks for offline access on my iPad than toward actually organizing and retrieving. All the real organizing and retrieving can take place in DT.

This also allows me to index PDFs in a Dropbox directory I share with several colleagues who do not use DT. I can have all the organizational benefits of DT, without my colleagues even noticing.

So indexing plays a very important role in my (albeit very nascent) DT workflow. That being said, I import a lot (lots of news articles about the subject matter I study, for example), as well as anything else I can when it makes sense to do so, largely for the sake of portability and stability (no broken indexes). Other times, however, indexing is very convenient and makes a great deal of sense.

So to echo others in this thread and the many other threads here about index/import, you should just do what makes sense for you, and don’t hesitate to blend the two together.


#6

This is a great use case, @scottlougheed, thank you for writing it up.

Good point – which shows that the import/index discussion is at its heart a question about “how do I organize my stuff”, to which there are never any binary, either/or answers. With thoughtful planning, Scott demonstrates a powerful aspect of DEVONthink: your data can be organized (and reorganized) in numerous configurations for different purposes. Import, creating documents internally, indexing, tagging, labeling, duplication, replication, aliases, smart groups, ordinary groups, groups-as-tags, plus the six major views in the View menu: all of these features work together. It is never necessary to choose one over the other.


#7

scott makes a good point, and i would also like to add that indexing is perfect for when you are just starting out with the app and unsure if you want to commit. the feature enables you to play around without affecting the organization of data in your regular workflow. it also lets you hang on to an app you love to work with while still taking advantage of devonthink.


#8

That’s very true, if one follows a process similar to the one that Scott describes. What is more common to see here is users new to indexing who move documents from indexed group to indexed group in DEVONthink and are then surprised to learn that the Finder structure is not updated automatically. Or, worse yet, users who move or rename folders in the Finder that are indexed in DEVONthink and later discover that the groups have gone missing in DEVONthink.

I have nearly 100% of my documents indexed into my databases, but I still caution users starting out with DEVONthink to index sparingly until the dynamics of doing so are fully understood.
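One cautious habit while learning is to record what an indexed folder looks like before tidying it in the Finder, then compare afterwards to see which paths moved or vanished. The sketch below is just an illustration of that idea in plain Python; `snapshot` and `compare` are hypothetical helper names, and nothing here talks to DEVONthink itself.

```python
from pathlib import Path


def snapshot(folder):
    """Return the set of relative paths (files and folders) under `folder`."""
    root = Path(folder)
    return {path.relative_to(root).as_posix() for path in root.rglob("*")}


def compare(before, after):
    """Return (paths that disappeared, paths that are new), each sorted."""
    return sorted(before - after), sorted(after - before)
```

Take a snapshot before reorganizing, take another after, and pass both to `compare`. Anything in the first list is a path that an index made earlier would no longer find in its old location.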


#9

I agree that new users ought to be cautious, but I suppose this goes for just about anything from handling duplicates to emptying the trash. Indexing actually seems more straightforward to me than other features like the duplicate detector, which, despite its name, isn’t always detecting exact duplicates.

Duplicates (mentioned by the OP) are explained on the blog.
blog.devontechnologies.com/2014/ … evonthink/

So is indexing (the OP’s main question).
blog.devontechnologies.com/2007/ … -indexing/

Will people read the manuals, blogs, and other information out there? Maybe not, because it can be difficult to find if you are unfamiliar with the app, so I’ve put together in one place some of the resources that I’ve found helpful.
christopher-mayo.com/?p=2237

Personally, I always assume there will be some stumbling around and missteps when I am starting out with an app. For this reason, I recommend lots of backups (always a good idea – the OP seems to have this taken care of) and thorough testing before trying an app out with mission critical stuff.


#10

Thanks for this, FG.


#11

You’re welcome! Not much there to it, really, but I know reading some of these things sooner rather than later would have helped me a lot in the early days of using the app.


#12

2nd thanks for collating those articles - found some there I hadn’t come across before!

I might be wrong (it could have been on one of those other blogs you linked to), but I think you made mention of using DEVONagent. Did you use it to find some of those DTPO links?

I’ve long been toying around with purchasing it - but keep on holding off, since I’m not sure how much value I would get out of it (since it presumably cannot search many of the academic-specific databases)…

Given the lengthy trial on offer - I guess I really should try and explore it properly - tried a while back, but gave up fairly quickly, struggled to find my way around it. Regardless - your links will see me give it another go!


#13

Although there are academic/scientific journal sources of information on the topics that I usually research (environmental science & technology, policy issues and law and regulation) I would be severely limited were I to rely on journal sources alone.

Over the years I’ve bookmarked a number of information sites including governmental agencies, NGOs, news sites, &c., in addition to journals, and I often run DEVONagent Pro searches that turn up useful documents to add to my reference collection.

Some of my long-term interests include junk science (misinterpretation of scientific findings as well as invalid scientific findings) and junk policy issues (irrelevant strawman arguments, junk science, hidden motives, &c.). Even in peer-reviewed environments I’ve collected over the years many examples of junk science and junk policy, many of which keep being cited even after criticism. That’s to be expected in the lively and often contentious areas in which I’m interested. Are there consequences? Unfortunately, yes, from the enactment of laws and regulations to the ways in which problems can be interpreted badly, hindering efforts at their solution. One of the approaches to environmental issues that I consider to be unscientific, sterile and unsatisfactory is the Precautionary Principle. Another is misunderstanding of the concept of sustainability, when the idea ignores dynamic recognition and response to problems as they become apparent (the Club of Rome study, Limits to Growth, failed to be predictive and useful because it used a static version of the Forrester model).

There are many areas of academic research for which the important reference sources and the publication of research findings are limited to academic journals. The consequence of such research is deemed to contribute to the discipline, as an end in itself. That’s the way much of human knowledge has increased. Once in a while a new idea or a new discovery has impacts beyond the discipline that can change a society. Never forget that. :)