Copy and Paste from Kindle

Halver · December 6, 2012, 11:09pm

When you copy a snippet from a book in Kindle for Mac, Kindle automatically appends the author, title, and page number. Thus, when you paste it into a new .rtf in DT, all of this information is at the bottom of the document.
In most cases, I’d find this handy. But I’m afraid it is going to wreck the AI function in DT so that whenever I use See Also, I will just get other snippets from that source.
So: 2 questions.

1. Is this going to affect the AI as I suspect?
2. If so, how can I solve it?
I have thought of replacing the text, but cannot find any simple way of doing that for a group of .rtf documents. And since I have hundreds to thousands of snippets, it is no fun doing it manually one document at a time.
Any other thoughts?
Thanks in advance.

Declan · December 7, 2012, 4:28am

I think this is the same basic issue as one I described here. forum.devontechnologies.com/view … 9dc4d0e3ac

Declan

korm · December 7, 2012, 10:55am

@Halver

The text appended by Kindle is merely a bibliographic entry of the form


<author><publication date><title><Kindle location><publisher>

The content varies for each source document and location. Unless all of your “thousands of clippings” are from the same source – why would it skew the AI? About the only text in common is “Kindle”, which appears twice in every clipping.

If manually deleting the last paragraph isn’t possible as you make each clip, then a script could be written that looks at every document in a selection and deletes the last paragraph.

The following script will blindly delete the last paragraph from each document in a selection of rich text documents. There is NO error checking in this example, and this script WILL destroy your data. The script is provided only as an example – use it only on test documents.

(*
	This script will delete the last paragraph of a rich text
	document in DEVONthink.
	
	The script WILL destroy your data.
	
	Use of this script is for demonstration only.
*)

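tell application id "DNtp"
	set theSelection to selection
	repeat with thisItem in theSelection
		try
			-- Reset for every document so text never carries over from the previous one
			set theNewText to ""
			tell rich text of thisItem
				set theCount to the count of the paragraphs
				set loopCount to 1
				-- Copy every paragraph except the last one.
				-- Only the first attribute run of each paragraph is copied.
				repeat until loopCount is greater than or equal to theCount
					tell paragraph loopCount
						set theNewText to theNewText & attribute run 1
					end tell
					set loopCount to loopCount + 1
				end repeat
			end tell
			-- Write back inside the try block, so a failure above leaves the document untouched
			set rich text of thisItem to theNewText
			-- display dialog theNewText
		end try
	end repeat
end tell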
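
The script above removes the last paragraph unconditionally. A more cautious approach would first check that the last paragraph actually looks like a Kindle citation. The check is sketched below in Python rather than AppleScript, purely to illustrate the logic. The helper name strip_kindle_citation is hypothetical, the sample citation is made up, and the sketch assumes (as described above) that the citation is the final paragraph and contains the word "Kindle".

```python
def strip_kindle_citation(text: str, marker: str = "Kindle") -> str:
    """Return text without its final paragraph, but only if that
    paragraph contains the marker word. Otherwise return text unchanged."""
    paragraphs = text.rstrip("\n").split("\n")
    # A single-paragraph document has no separate citation to remove.
    if len(paragraphs) < 2:
        return text
    # Assumption: a Kindle citation always mentions the marker word.
    if marker not in paragraphs[-1]:
        return text
    body = paragraphs[:-1]
    # Drop blank lines that separated the clip from the citation.
    while body and not body[-1].strip():
        body.pop()
    return "\n".join(body)


# Made-up example in the <author><date><title><location><publisher> form.
clip = (
    "Some quoted passage.\n\n"
    "Doe, Jane (2012-01-01). A Title (Kindle Locations 10-12). "
    "Publisher. Kindle Edition."
)
cleaned = strip_kindle_citation(clip)
```

A document whose last paragraph does not mention "Kindle" is returned unchanged, which is the safeguard the blind version lacks.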

Halver · December 7, 2012, 1:27pm

@Declan
Thanks, it does sound like a similar problem to the one you posted:
forum.devontechnologies.com/view … 9dc4d0e3ac

@Korm
Thank you very much for your quick and helpful reply. I am not very facile with AppleScript, so I appreciate the code, and also the need for extreme care when using it.
As to your question:

The text appended by Kindle is merely a bibliographic entry of the form
<author><publication date><title><Kindle location><publisher>
The content varies for each source document and location. Unless all of your “thousands of clippings” are from the same source – why would it skew the AI? About the only text in common is “Kindle”, which appears twice in every clipping.

I would think this would mess with the AI in at least 2 ways.

1. The db includes snippets from Kindle as well as from other sources. So when I go to See Also from a document that is from Kindle (and thus includes the word Kindle at the bottom), I would expect it to preferentially find those snippets that are also from Kindle. This is not what I want. I would like it to ignore the fact that I got the snippet from Kindle vs. some other pdf, for example, and look for similar topics.
2. Similarly, I have 10 to 100 snippets from each book. Each of these documents (from a Kindle) would have not only the word Kindle in common at the bottom but also the other bibliographic information. Thus I would think AI would do a great job of finding all the other snippets from that source because it would find this unique combination of words in all of them. But that is not very helpful. I want it to draw connections between different sources, not within one source.

Am I missing something?
If not, I suppose it would be helpful if there were some hashtag like #AIIgnore that you could put around text you didn’t want AI to look at in a document, or something like that. But I know this is low priority and I will just delete the text.

DEVONthink has been a terrific program; I wish I’d had it years ago.
Thanks again.

korm · December 7, 2012, 5:04pm

The only way to know is to try the AI on files with or without the Kindle citations. Why not delete the citation from a few dozen and see what happens?
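
The concern can at least be illustrated with a toy word-overlap measure. How See Also actually scores documents is not described in this thread, so the sketch below is only a stand-in: it uses Jaccard similarity on word sets, and the two snippets and the citation string are made up for the example.

```python
import re


def words(text: str) -> set[str]:
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words in the two texts."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


# Made-up citation; real clips from one book would differ in location numbers.
citation = (
    "Doe, Jane (2012-01-01). A Title (Kindle Locations 10-12). "
    "Publisher. Kindle Edition."
)
# Two made-up snippets on unrelated topics with no words in common.
snippet_a = "Memory consolidation happens during sleep."
snippet_b = "Trade routes shaped medieval cities."

without = jaccard(snippet_a, snippet_b)
with_citation = jaccard(snippet_a + "\n" + citation, snippet_b + "\n" + citation)
print(without, with_citation)
```

In this toy measure the two unrelated snippets score zero on their own and above zero once the same citation is appended to both. Whether a comparable effect is noticeable in a real database is exactly what the experiment would show.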

Bill_DeVille · December 7, 2012, 10:26pm

I doubt that the citation information will tilt See Also beyond the level of usefulness.

But even were that so, the citation information is important source info, and I would hate to accumulate a library of reference material without source documentation. (I don’t use a citation manager app. I want everything self-documenting.)

Halver · December 8, 2012, 6:49pm

@korm
I’ll try removing the citations and see what happens.
@Bill_DeVille
Certainly I don’t want a bunch of snippets without source info. I am putting the source info in the Spotlight Comments field. I assume that won’t affect the AI, though maybe I’m wrong.
Thanks again all for your help.

Bill_DeVille · December 8, 2012, 7:35pm

That’s fine, as See Also only looks at document content, and not at metadata such as Spotlight Comments. But it is an extra step and may or may not prove to make a significant difference, depending on the overall content in your database.