Breaking up large documents

For my research, I import a lot of large documents - lengthy articles and books - into my database. I often want to tag small portions of the books as they relate to specific topics.

Obviously, if I just tag the entire book with every pertinent tag, it’s useless. So I break the book up into chapters - each chapter is a separate PDF. Then, if I search for a specific tag, it’ll at least show me the chapter instead of the entire book. But often, that’s inefficient, too. I might want to tag just a paragraph or sentence, but searching for that tag brings up the entire chapter.

Is there any way to tag a small portion of a document and have that tag bring up just that portion instead of the whole document? In other words, can I tag one sentence in a chapter and have that tag bring up just that sentence instead of the entire chapter?


You already asked a similar question some time ago in the thread “Create link to line?”.

Did you try the scripts that were posted there?

Hi Pete,

Actually the previous question was about bookmarks or links, so that I could find my way back to a certain place in a document. I’m looking now for a way to focus my tags so they can refer not just to an entire document, but to a portion. The workaround I’m using now is breaking chapters into smaller and smaller units, so that a tag points me to just the pertinent text, not everything else in the chapter. But this is pretty labor-intensive.


  • Use one of the scripts to create a bookmark that links to the portion
  • Tag the bookmark

There’s no other way, I think.

That’s pretty interesting. If I assume correctly, this technique will still open the entire chapter, but it will point me directly to the beginning of the section I want tagged.

The problem for me is that it doesn’t indicate where that section ends. My ideal solution would be one that creates “culls” … a tag leads me only to, say, the two paragraphs in this large chapter that are pertinent to that tag. I’ve been creating culls manually in previous projects, and it’s labor-intensive, but it gives me exactly what I’m looking for, and no more, which is great for a project with tons and tons of material.

But thanks for the quick response and the interesting suggestion.

You could try another option:

  • convert the PDF to RTF(D)
  • optionally use Script: Split RTF(D) at Font Sizes to break the RTF into small pieces
  • in the paragraph of interest: use contextual menu Copy Paragraph Link
  • create a bookmark with the copied item link
  • tag the bookmark

Why not use an Annotation file and insert quotes?

I fear I’m taking advantage of your generosity by going on, and I don’t want to appear ungrateful for your suggestions, but these don’t really do what I’m looking for. Breaking the chapters into smaller pieces would be fine if each cull applied only to one tag. But some tags would cover sections that would start in one piece and end in another, etc.

All I’m looking to do is to be able to highlight a section of text and then tag it. Invoking the tag would then bring up just those culls of text in various documents that apply specifically to that tag. (And, of course, DT’s metadata would remain attached to each cull, so I could always know the source.)

But it looks like I can’t do anything that clean and simple. Alas.

Again thank you for your time and ideas. I do appreciate it.


Maybe this workflow could do what you’re looking for:

  • create MD note with the culled text / optionally add your own thoughts
  • add link to the line in source document from which text was extracted (either in the MD content or as custom metadata)
  • tag the MD note

Now you can see under that tag all those text culls from different documents that were tagged, each in a separate MD note. And you can quickly jump back to the source as well.

From here, you could use transclusion to create summaries and collections of these insights if you wanted to.

Edit: Alternatively you could tag the document and then also add the name of the tag as typewriter text next to the paragraph(s) you want to refer back to. This way, you can find the document by searching for the tag, and find the specific paragraphs the tag refers to by searching for its title within the document. With this workaround, it would probably be useful to add some syntax before the tag to avoid search results becoming cluttered with results from the main text of the document. So, for example “Tag_{tag title}” or “X_{tagtitle}” would return only the typewriter notes you added next to “tagged” paragraphs.

Thanks for the suggestion. I’ll give it a try.

An easy and very effective way to do this is to create a Page Link to the specific page in question inside the PDF.

Then save the Page Link as a Bookmark

Give the Bookmark an appropriate descriptive name

And add a Tag to the bookmark

As my culls sometimes stretch across pages, into multiple pages, saving a page link as a bookmark doesn’t work as well for me. But thank you.


Hi R_Barre

I use a variation of Bluefrog’s and A2307’s suggestions with a script triggered from the menu bar to select text in the PDF which is pasted into a markdown note. It prompts for the title of the note - you could maybe adapt it to prompt for a tag and maybe the metadata you want.

I can’t really script (I cobbled it together from much more elegant scripts posted by cgrunenberg, pete31 and others and added bits here and there) so it probably looks horrible, but it works with no problems for me.

-- Create note with MD link to selected PDF with or without selected PDF text

use AppleScript version "2.4"
use framework "AppKit"
use scripting additions

property theDelimiter : return & return -- or e.g. linefeed & linefeed
property theSeparator : "---" & return

tell application id "DNtp"
	try
		if not (exists think window 1) then
			error "Please open a window"
		else
			set theWindow to think window 1
		end if
		set theRecord to content record of theWindow
		if theRecord ≠ missing value then
			set theType to (type of theRecord) as string
			if theType is in {"PDF document", "«constant ****pdf »"} then
				-- fall back to an empty quote if nothing is selected
				try
					set theSelectedText to (selected text of theWindow & "") as string
				on error
					display notification "No text selected"
					set theSelectedText to ""
				end try
				-- build a link back to the page currently shown in the PDF
				set thePage to current page of theWindow
				set theRefURL to reference URL of theRecord & "?page=" & thePage
				set theSubject to text returned of (display dialog "State proposition/ title of note." with title "Subject" default answer "")
				-- get source document info
				set theName to the name of theRecord
				set theDate to creation date of theRecord
				set theMonth to ((month of theDate) as integer) as string
				set theDay to (day of theDate) as string
				set theYear to (year of theDate) as string
				set mdLink to "[" & theName & "](" & theRefURL & ")"
				set matterName to the name of current database
				-- determines whether to include selected text or not
				if theSelectedText is not "" then
					set theContent to "# " & theSubject & theDelimiter & theSeparator & theDelimiter & "*Supporting quote*:" & theDelimiter & theSelectedText & theDelimiter & theSeparator
				else
					set theContent to "# " & theSubject & theDelimiter & theSeparator
				end if
				set theReference to theDelimiter & "*Source*: " & mdLink & theDelimiter & "Date: " & theDay & "." & theMonth & "." & theYear & theDelimiter & "Location: " & matterName & theDelimiter & theSeparator
				-- copy the note text to the clipboard via NSPasteboard (necessary in macOS Mojave)
				my setClipboardToPlainText(theContent & theReference)
				-- create the note
				create record with {name:(theSubject), type:markdown, content:(theContent & theReference)} in inbox
			else
				error "Please open a PDF record"
			end if
		else
			error "Please open a PDF record"
		end if
	on error error_message number error_number
		-- -128 means the user cancelled the dialog, so stay quiet
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
	end try
end tell

on setClipboardToPlainText(theText)
	set thePasteboard to current application's NSPasteboard's generalPasteboard()
	thePasteboard's clearContents()
	(thePasteboard's setString:theText forType:(current application's NSPasteboardTypeString))
end setClipboardToPlainText
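
As a rough illustration of the "prompt for a tag" adaptation mentioned above, here is a minimal standalone sketch. It is not part of the script above and has not been tested against any particular DEVONthink version. It assumes the freshly created note (or any other records) is selected in DEVONthink, asks for a single tag, and appends it to the existing tags. The dialog wording and the variable names are made up for the example.

```applescript
-- Sketch: add one tag to whatever is currently selected in DEVONthink.
-- Assumption: the note(s) to tag are selected in the frontmost window.
tell application id "DNtp"
	try
		set theRecords to selection
		if theRecords is {} then error "Please select the note to tag"
		set theTag to text returned of (display dialog "Tag to add:" with title "Tag" default answer "")
		-- an empty answer means "no tag"; leave the records untouched
		if theTag is not "" then
			repeat with theRecord in theRecords
				-- keep the existing tags and append the new one
				set tags of theRecord to (tags of theRecord) & {theTag}
			end repeat
		end if
	on error error_message number error_number
		-- -128 means the user cancelled the dialog
		if error_number is not -128 then display alert "DEVONthink" message error_message as warning
	end try
end tell
```

The same two steps (ask for the tag, then set `tags` on the record) could probably be folded into the note-creating script instead, by keeping the result of `create record` in a variable and tagging that.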

I know it’s not quite the answer you are looking for.

The other thing I’ve come round to is splitting large PDFs by chapters (“bookmarks” in Acrobat-speak) before I import and start marking them up in DT. It really helps search get you to the relevant parts quickly, and it helps with replicating key documents (but my use case may be different). One of the great things about DT is how it makes you think about your work practices.

Anyway, feel free to use or disregard if not useful. Best of luck with it. 🙂

You’re very kind to take the time to share this. Thank you.

And yes, I have also come around to splitting large PDFs into smaller, bite-sized pieces so that a tag refers to relevant culls and not a lot of other extraneous text all around it.

My workflow also involves OmniOutliner. Everything first goes into DT3, organized by source. When I go through each document to find the material I want to cull, I use a Keyboard Maestro macro to copy the relevant text, paste it into the OO outline, open an Inline Note attached to the item, and automatically append the title of the source from DT. I then have one large OO document with all the culled material, which I can easily organize into an order that makes sense to me; nesting some items under others, moving things up and down, etc.

It works pretty well … at least for me.

Again, thanks for your suggestions.


I can see your logic.

I work towards something similar using hierarchies of markdown notes linked to an index.

It’s always interesting to hear about other ways of doing things.


If your budget allows it you may want to look into MaxQDA or Atlas.ti or the like, as what you’re describing (paragraph-level coding, queries and selective exports etc.) is what QDA software is designed to do.


Looks very interesting. Not sure I want to move everything into a new app, but it is definitely interesting.