I’m compiling a database of books and articles about a historical event that took place over the course of one week. I have about thirty .pdfs about this event, the result of books and articles I’ve scanned.
Each of these .pdfs covers the events of this week. What I’d like to do now is tag sections of the .pdfs so that I can bring together the accounts of each day from all the sources. In other words, tag the section of each .pdf that relates to the first day as “Day One”, tag each section of each .pdf that tells the story of the second day as “Day Two”, and so on. That way I can have a smart group for each day that collates all the accounts of that day.
Tagging sections/chapters of a document has been discussed before, but at least for now tags can only be assigned to the entire document. If you must use tags, perhaps you can split the PDFs into separate documents for each day?
I use the following script for very much the same task. The extracts from the pdfs can be dated and tagged and one click takes me back to the original pdf.
This is such a frequent question that it might be worth repeating how DT references a particular page in a pdf and how you can use this to create a tag for a particular page.
When you right-click a page in a pdf and choose ‘Copy Page Link’, you get a URL in the following form:
Despite its strange appearance, it’s very similar to the kind of URL you get when linking to a website, except that it links to a page in a document in DT. You can use that link in any document that accepts links, such as TextEdit, Pages, Scrivener and, of course, Rich Text Format notes in DT.
Any document in DT has a URL field. You can see this field by opening the info panel for the document. The field also appears at the top of the document’s window; clicking it will take you to the document it links to.
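To make the “strange appearance” a little less strange, here is a small Python sketch that pulls such a link apart. It assumes the link looks like `x-devonthink-item://<UUID>?page=<n>`; that shape (the scheme name and the `page` parameter) is an assumption, so compare it with a link you have actually copied. It does not claim anything about whether DT counts pages from 0 or 1; it simply returns the number written in the link. `parse_page_link` is a hypothetical helper, not part of DT.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_page_link(link: str) -> Tuple[str, Optional[int]]:
    """Split a page link into (item UUID, page number or None).

    ASSUMPTION: links look like x-devonthink-item://<UUID>?page=<n>.
    This is an illustration, not a documented DEVONthink format.
    """
    parts = urlparse(link)
    if parts.scheme != "x-devonthink-item":
        raise ValueError(f"not a DEVONthink item link: {link!r}")
    # The item's UUID sits where a web URL would have its host name.
    uuid = parts.netloc
    # A plain item link (no page) just opens the document.
    pages = parse_qs(parts.query).get("page")
    return uuid, int(pages[0]) if pages else None


if __name__ == "__main__":
    print(parse_page_link("x-devonthink-item://ABC-123?page=4"))
```

The point is simply that the link is an address (which document) plus a parameter (which page), just like a web URL with a query string.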
An easy way to tag a particular page in a pdf is as follows:
1. Copy the page link to the particular page.
2. Make a new plain text or rtf document.
3. Paste the page link into the URL field in the info panel of the new document.
4. Add a tag to the document.
5. Add a title to the document (I suggest putting the page number in the title).
6. Add any notes to the document about the tagged section.
Voilà! Opening the list of tags will display all the pages with that tag. Click on a document and then click on its URL field to go to the page.
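The steps above amount to a tiny data model: each note is a title, a page link in its URL field, some tags and free-text notes, and a tag view (or smart group) is just a filter over those notes. A minimal Python sketch of that idea follows; `PageNote` and `group_by_tag` are hypothetical names for illustration only and do not touch DT itself.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PageNote:
    """Hypothetical stand-in for the small text/rtf note made in the steps above."""
    title: str                                    # e.g. "Adams, p. 30"
    url: str                                      # the copied page link
    tags: Set[str] = field(default_factory=set)   # e.g. {"Day One"}
    notes: str = ""                               # your comments on the section


def group_by_tag(notes: List[PageNote], tag: str) -> List[PageNote]:
    """Mimic a tag view or smart group: every note carrying `tag`, sorted by title."""
    return sorted((n for n in notes if tag in n.tags), key=lambda n: n.title)


if __name__ == "__main__":
    notes = [
        PageNote("Jones, p. 12", "x-devonthink-item://B?page=12", {"Day One"}),
        PageNote("Adams, p. 30", "x-devonthink-item://A?page=30", {"Day One"}),
        PageNote("Adams, p. 58", "x-devonthink-item://A?page=58", {"Day Two"}),
    ]
    for note in group_by_tag(notes, "Day One"):
        print(note.title, note.url)
```

The link URLs in the demo are made-up placeholders, not real item links.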
There are many scripts on the forum which automate this process and add all sorts of other extras such as the script I referenced in my previous post, but this is the basic workflow.
The purpose of the page link, when clicked, is to open the referenced document and scroll to and display a specific page, e.g., in the Three Panes view. That’s what I see here. Doesn’t that accomplish your purpose?
It doesn’t, as almost all of my links will be to multiple-page sections of larger pdfs. If the link sends me to the first page, nothing tells me how many pages the section runs to; I just get the whole document.
The only answer, I think, is to physically break the pdfs down to small sections. Alas.
Splitting a PDF is not a bad idea; however, you are missing something. If you right-click a PDF page thumbnail and choose Copy Page Link, this link can be pasted as per Frederiko’s suggestion (or by other methods) and should open the PDF to the correct page. Make sure you deselect the PDF in another window first, then click the link.
That’s what I was doing. And it opens up the entire document. Yes, it opens it to the page I selected, but the document is a book. What I want to do is to select, say, three pages, create a link, and have the link open just those three pages, not open the entire document and point me to the first of the three pages. I have so many documents and so many individual pieces that I will never know where each section ends, only where it begins.
The script I pointed you to does exactly that! It extracts just the pages you want to reference.
If the installation is too complicated (which I understand), I also have a version which uses PDFpen Pro by Smile Software and doesn’t require Java or Sejda. Send me a PM if you would like to try it, because I haven’t got around to posting it yet. (https://smilesoftware.com/pdfpenpro)
Thanks for that - yes, it’s way more complicated than I want to try at this time.
How about this: if I go into one of my large .pdf documents, and select multiple thumbnails of pages, and drag them into another folder, I get an image of only one page. Not the multiple pages. Is there any way I can copy/drag multiple pages from a pdf into another folder, thus creating a new document that consists only of the few selected pages?
You could select the desired thumbnails, right-click to copy the pages, then create a new PDF with the New > With Clipboard command (from the Data menu or by right-clicking).
Splitting the PDFs will accomplish your objective.
In choosing the number of pages to be included in each segment, you will have to view pages to make that determination, then copy the desired ones to the clipboard and create a new PDF document from the clipboard contents. You can then tag the new PDF document.
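If, while viewing, you note only the page where each day’s account begins in a given book, you can work out the page ranges to copy out before splitting. A minimal Python sketch, assuming each section runs until the page before the next marked start and the last one runs to the end of the book (pages counted from 1); `ranges_from_starts` is a hypothetical helper, not a DT feature.

```python
from typing import List, Tuple


def ranges_from_starts(starts: List[int], last_page: int) -> List[Tuple[int, int]]:
    """Turn section start pages into inclusive (first, last) page ranges.

    ASSUMPTION: each section ends on the page before the next start,
    and the final section ends on `last_page`. Pages are 1-based.
    """
    ordered = sorted(set(starts))  # tolerate out-of-order or repeated marks
    if not ordered:
        return []
    if ordered[0] < 1 or ordered[-1] > last_page:
        raise ValueError("start pages must lie between 1 and last_page")
    ends = [s - 1 for s in ordered[1:]] + [last_page]
    return list(zip(ordered, ends))


if __name__ == "__main__":
    # Day One starts on p. 12, Day Two on p. 45, Day Three on p. 80 of a 120-page book.
    for first, last in ranges_from_starts([12, 45, 80], 120):
        print(f"pages {first}-{last}")
```

If a book has material between two days’ accounts (an epilogue, notes), the assumption breaks and you would record end pages explicitly instead.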
I would have approached this differently. Instead of creating multiple new PDFs, I would create much more compact rich text documents that hold the Page Link for the first page describing an event, and type in the pertinent page range. I would tag each rich text document just as you would tag a corresponding PDF segment document.
My approach has several advantages. The most important advantage is that I can add text to my new document, such as a description, comment or summary of the “excerpt” that is searchable. You can’t do that in your PDF “splits” of the original PDF. My rich text notes will be smaller, requiring less storage space, yet will equally lead you to the pertinent page(s) of the referenced PDF. They will require less memory when opened, but potentially can provide more useful information that can be searched. It’s easier to add clickable links to other notes or references, if desired.
Perhaps I might find it advantageous to excerpt the relevant section of text from the referenced page(s) of the book-length PDF. To do that, I would choose Data > Convert > to plain text, which would produce a text document holding all the text of the PDF. By doing a phrase search of the text string that a segment starts with, I can immediately find that starting point, select to the desired end point, press Command-C (or perhaps Command-X) and then paste the clipboard into the rich text document. I’d have a more compact full excerpt than your PDF page copies, with other advantages noted above.
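The phrase-search step (find where a segment starts, select to where it ends, copy) can be sketched in Python as below. This is an illustration of the idea on text you have already converted, not a DT command; `excerpt` is a hypothetical name. It collapses runs of spaces and line breaks first, on the assumption that scanned or OCR’d text often splits sentences across lines, so the returned excerpt has its whitespace normalized.

```python
def _normalize(s: str) -> str:
    # Collapse spaces/newlines so a phrase split across lines still matches.
    return " ".join(s.split())


def excerpt(text: str, start_phrase: str, end_phrase: str) -> str:
    """Return the passage from the first `start_phrase` through the end of
    the first `end_phrase` that follows it (whitespace collapsed)."""
    flat = _normalize(text)
    start_phrase, end_phrase = _normalize(start_phrase), _normalize(end_phrase)
    start = flat.find(start_phrase)
    if start == -1:
        raise ValueError(f"start phrase not found: {start_phrase!r}")
    # Only look for the end phrase after the start phrase.
    end = flat.find(end_phrase, start + len(start_phrase))
    if end == -1:
        raise ValueError(f"end phrase not found after start: {end_phrase!r}")
    return flat[start:end + len(end_phrase)]


if __name__ == "__main__":
    book = "Intro.\nOn Monday the\nmarch began. It rained. By nightfall all was quiet. Tuesday came."
    print(excerpt(book, "On Monday the march", "all was quiet."))
```

The result is what you would then paste into the rich text note alongside the page link and your comments.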
Thanks - as always - for your very thoughtful and creative reply. Your approach does have the advantage of keeping the database smaller, no small consideration.
But here’s why the split-PDF approach works better for me.
Let’s say I’ve found the pertinent sections in 10 books that apply to one event, and I’ve made a Group for that event. With your approach, I’d have a text document with ten links in it. If I want to look through these ten documents, I have to keep going back to the text document, click on the link, see how many pages it represents, then read those pages. If I get engrossed in it, I’ll invariably forget how many pages it’s supposed to be, and will have to go back to the text document to refresh my memory. Then, when I’ve finished reading that one, I’ll have to go back to the text document, click on the next link, see how many pages that one represents, then read that, close it, and repeat for each link.
By splitting the PDFs, I simply put all ten new PDFs into a folder and read among them. It’s easier for me. I’ll just have to live with the growing size of the database.
If I have a .pdf in a DTPO folder, click in it and Select All, I can then paste it into a text document.
Even numbered pages appear as text, but odd numbered pages appear as images of the original pdf page.
If I select one page at a time, copy and paste, it all works fine, and I get a new document of just text (much smaller than the corresponding pdf). It would, of course, be a lot simpler if I could copy a whole document at a time, not have to do it page by page. Am I doing something wrong here?