Procedure to index Bear notes in DT: need a way to find duplicates with minimal changes in Filename but not content

rufus123 · October 19, 2020, 5:45am

I am working on a project to index Bear notes in DevonThink Pro 3. It is going well. I just have to iron out one last step.

I managed to find a way to work around the fact that Bear is a database:

On macOS, Bear’s notes are stored in a SQLite database. On macOS, the database is accessible with any third-party app or library that has SQLite support.
Bear’s macOS database is located here:
~/Library/Group Containers/9K33E3U3T4.net.shinyfrog.bear/Application Data/database.sqlite
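
As a minimal sketch of what "accessible with any library that has SQLite support" can look like, the Python snippet below opens a database file read-only and lists its table names. The helper name `list_tables` is made up for illustration, and nothing here assumes anything about Bear's actual schema. Opening read-only is only a precaution so the app's data cannot be modified by accident.

```python
import sqlite3
from pathlib import Path

# Location quoted above; adjust if your installation differs.
BEAR_DB = (
    Path.home()
    / "Library/Group Containers/9K33E3U3T4.net.shinyfrog.bear"
    / "Application Data/database.sqlite"
)


def list_tables(db_path):
    """Return the table names found in the SQLite file at db_path.

    Hypothetical helper for illustration; it makes no assumption about
    which tables Bear actually uses.
    """
    # mode=ro opens the file read-only, so nothing can be written to it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    if BEAR_DB.exists():
        print(list_tables(BEAR_DB))
```

Working on a copy of the database file rather than the live one is probably the safer choice while experimenting.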

I have a large number of titles in one folder and would like to search for duplicates.

I already know that in DevonThink Pro 3, if title and content are the same, the document is shown in blue. My problem is that the duplicates have small variations in the title while the content/body is the same, for complicated reasons related to the way I export the Bear notes from the database to Finder files.

For example, all of these Finder files (which I index in DevonThink Pro 3) are identical.

VPN new subscription choice of protocol and server 26 Aug 2020.html - this is the original file
VPN new subscription choice of protocol and server 26 Aug 2020 2.html - just a copy
VPN new subscription choice of protocol and server 26 Aug 2020 3.html - just a copy
VPN new subscription choice of protocol and server 26 Aug 2020 4.html - just a copy

Please note that a file like this
1- VPN new subscription choice of protocol and server 19 Oct 2020.html (different title - date is different) is not a duplicate
2- VPN new subscription choice of protocol and server 26 Aug 2020.html same title but different content is not a duplicate
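
The rule described above can be sketched outside DEVONthink in Python. This is only an illustration under two assumptions: the exported copies are byte-identical (DEVONthink's own duplicate detection may be more lenient), and the function name `find_numbered_copies` is invented here. Note that a plain "ends with space+integer" test is not enough, because the original title itself ends in " 2020". The sketch therefore flags a file only when removing the trailing number yields the name of another file with the same content.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# A stem that ends in " <integer>", e.g. "Title 26 Aug 2020 2".
COPY_SUFFIX = re.compile(r"^(?P<base>.+) (?P<n>\d+)$")


def file_digest(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_numbered_copies(folder, pattern="*.html"):
    """Return files that look like numbered copies of a sibling file.

    A file counts as a copy only if (a) its content is byte-identical to
    another file in the folder and (b) its name is that file's name plus
    a trailing " <integer>".
    """
    by_content = defaultdict(list)
    for path in Path(folder).glob(pattern):
        if path.is_file():
            by_content[file_digest(path)].append(path)

    copies = []
    for group in by_content.values():
        stems = {p.stem for p in group}
        for p in group:
            match = COPY_SUFFIX.match(p.stem)
            # Flag only if the name without the suffix exists in this group.
            if match and match.group("base") in stems:
                copies.append(p)
    return sorted(copies)
```

The function only reports candidates and deletes nothing, so the list can be reviewed before any file is removed.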

thanks in advance for your time and help

rmschne · October 19, 2020, 5:59am

As advised to you in the DEVONthink forum thread “New and totally confused by the editor. Used to Evernote and Bear”:

Bear does not store files. It stores notes in an SQLite database file. You need actual files in the filesystem to index.

For a new user of DEVONthink, you are over-complicating this by orders of magnitude.

rufus123 · October 19, 2020, 6:40am

thank you for your concern. I certainly do appreciate it.
Everything is going fine. My project is 99% completed and it looks very nice. I am very happy with the results. I am just at the last step which is to get rid of duplicates.
When my project is completed and I have the answer to my question above, I will post my solution, and I will write keyboard maestro macros to make it all automatic.
And if forum members find this post irritating, I will delete it, no problem.
From a DT perspective, it’s a bit perplexing for those who have large amounts of information in Bear to be told to just forget about indexing in DT.

cgrunenberg · October 19, 2020, 7:02am

Only the content matters (more or less strict depending on the preferences), the title is not relevant.

rufus123 · October 19, 2020, 7:24am

thank you for answering my post.

Below are 4 documents ( 2 docs which each have one duplicate) in my inbox. In both cases the content (body) is the same. The duplicates have space+integer at the end of the filename.

I thought that with DT duplicates appear in blue. Could you suggest a procedure to quickly identify them and delete them ? I want to keep the originals which do not have a space+integer at the end.

thank you !

cgrunenberg · October 19, 2020, 7:26am

This depends on Preferences > General > Appearance. Either duplicates are blue or an icon marks them. But as your screenshot is cropped, it’s hard to tell if there’s one.

chrillek · October 19, 2020, 7:31am

Although I agree with @rmschne’s assessment: what is the relation/connection between Bear’s usage of a relational database and you having duplicates of files?

Did you extract the notes from Bear? If yes - how? Was that the process that generated the duplicates?

Finally: what @BLUEFROG said elsewhere about indexing Evernote notes holds true for indexing Bear notes: if you continue to use Bear and add notes to it, you’d have to export them again into the filesystem and index them. Risking more duplicates.

rufus123 · October 19, 2020, 7:33am

OK. I changed the preferences, but then everything is blue including the original. I will have to find another solution using regex.
thank you for your help

cgrunenberg · October 19, 2020, 7:35am

There’s no such thing as an “original”: both items are duplicates, as item A is a duplicate of item B and vice versa. However, you can e.g. find them using a smart group with conditions like…

Item is Duplicate
Name ends with " 2" (without quotes actually)

rufus123 · October 19, 2020, 7:35am

Yes, I am new to DT but not to computing.
I will post a detailed solution on indexing Bear notes including getting rid of duplicates.
Your points are excellent and I had to address each one to find a solution.
thank you

chrillek · October 19, 2020, 7:37am

If you have two things that are considered equal, how would you denote one of them as “the original”? Your logic is flawed, and there’s a script in DT to “remove duplicates”, iirc. I doubt that a regex will help you in this context

rufus123 · October 19, 2020, 7:38am

Could I define a smart group as
1- item is duplicate and
2- name ends with space followed by integer ?

chrillek · October 19, 2020, 7:43am

Why are you dodging questions of those you want to help you?

Regardless: there is a predefined script to remove duplicates in DT.

(the first one in the data section). What’s wrong with that?

rmschne · October 19, 2020, 7:44am

Given your expertise in computing, why not access the SQLite database directly to do what you want? My hunch is that a well-constructed query may find the duplicates. But I have never explored the Bear database.
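
A query of the kind suggested here could look like the following sketch. It runs against an in-memory SQLite table named `notes` with `title` and `body` columns. Both names are hypothetical and are not Bear's real schema, which nobody in this thread has described. The idea is simply to group rows by body and keep the groups that occur more than once.

```python
import sqlite3


def duplicate_bodies(conn):
    """Return (body, copies, titles) for every body stored more than once.

    Assumes a hypothetical table notes(title, body); not Bear's real schema.
    """
    query = """
        SELECT body, COUNT(*) AS copies, GROUP_CONCAT(title, ' | ') AS titles
        FROM notes
        GROUP BY body
        HAVING COUNT(*) > 1
        ORDER BY body
    """
    return conn.execute(query).fetchall()


# Build a small in-memory example instead of touching any real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [
        ("VPN 26 Aug 2020", "same text"),
        ("VPN 26 Aug 2020 2", "same text"),
        ("VPN 19 Oct 2020", "different text"),
    ],
)

for body, copies, titles in duplicate_bodies(conn):
    print(copies, titles)
```

Against a real database the table and column names would first have to be looked up, for example by listing the tables in `sqlite_master`.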

I do note how your stated needs have morphed.

cgrunenberg · October 19, 2020, 7:58am

No.

rufus123 · October 19, 2020, 8:45am

@rmschne @chrillek @cgrunenberg

thank you all for your comments. I certainly do appreciate your help.

It’s best that I create a new post with my solution when it is ready.

thank you @chrillek for your suggestion. I will keep it in mind as a last resort. I am aiming to make the whole process automatic.

Plip · September 6, 2021, 1:26pm

Hey @rufus123
What is the current status of your idea?
Have you had any successes? Is there any way to search Bear’s notes via DT?
I have been eagerly awaiting “Panda” or Bear 2.

rufus123 · September 6, 2021, 2:39pm

hello @Plip
I stopped my project because I decided that I am just too chicken to use indexing (as opposed to importing): I am too worried that it is possible to accidentally delete, in Finder, files that are indexed in DevonThink. This reservation only applies to me because of my lower IQ (according to my spouse) and my tendency to bungle. I say this because I am sure that forum members will object to my reservation, citing that there are failsafes.

BLUEFROG · September 6, 2021, 2:53pm

I’d say you do what you’re comfortable with.
As you rightly say, there are other considerations when indexing, things that should be understood before committing to it.

In my experience, people often use indexing when they don’t need to, just because they’ve read it’s “what everyone is doing”. I think that’s a bad reason to index.

rufus123 · September 6, 2021, 2:55pm

thank you for your comment. I certainly agree with the above. I am happy that you were not offended. Have a nice day !