I have scanned PDFs with two real A4 pages on one A3 PDF page, like this:
Can DEVONthink cut pages like this into two?
Or is there another tool out there that does this easily?
Many thanks for any pointers!
DEVONthink doesn't support this, but you could use Preview.app and Tools > Rectangular Selection. Select and copy the first page, then use File > New from Clipboard and save the page as a PDF; afterwards, process the second page the same way.
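For anyone who would rather script this than crop by hand: the split itself is just halving the page box. Below is a minimal sketch in Python (not something DEVONthink or Preview provides), assuming a landscape A3 scan with the two A4 pages side by side and no gutter between them. The names Box and split_spread are made up for illustration; the resulting rectangles would still have to be applied as crop boxes with a PDF library of your choice.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """A rectangle in PDF points: lower-left (x0, y0), upper-right (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def split_spread(page: Box) -> Tuple[Box, Box]:
    """Return the (left, right) halves of a side-by-side double-page spread."""
    mid = (page.x0 + page.x1) / 2
    left = Box(page.x0, page.y0, mid, page.y1)
    right = Box(mid, page.y0, page.x1, page.y1)
    return left, right


MM_TO_PT = 72 / 25.4  # PDF user space defaults to 72 points per inch

# Assumed input: one A3 landscape page (420 mm x 297 mm) with origin at (0, 0)
a3 = Box(0, 0, 420 * MM_TO_PT, 297 * MM_TO_PT)
left, right = split_spread(a3)
# Each half is 210 mm wide and 297 mm high, i.e. A4 portrait.
```

If the scans have a visible gutter or uneven margins, the midpoint would need an offset; the sketch does not try to detect that.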
I use PDF Scissor, a cheap one-trick app that does exactly and only this.
Works well indeed. Thanks for the tip.
FWIW: I was just sitting over a very similar challenge, though with a slightly different use case. I thought this might also be interesting / useful for people
a) with a similar requirement (auto-splitting of PDFs)
b) with a workflow in "the image department". but I also think this could well be used outside of the image-centered domain.
this script splits a PDF into single pages and uses the first text line found on each page (in a sanitized way) as the title for the resulting (page) PDFs.
this was done with AI. so, first it might just be a base to work with. then, I am sure the coding priests here might have some ways to make this more elegant.
the script:
-- PDF Split & Rename Script with Advanced Format Removal
tell application "Finder"
set pdfFile to (choose file with prompt "Select PDF to process" of type {"PDF"}) as alias
set outputFolder to (choose folder with prompt "Select output directory")
end tell
set pdfPath to POSIX path of pdfFile
set outDir to POSIX path of outputFolder
if character -1 of outDir is not "/" then set outDir to outDir & "/"
-- Define expanded PATH
set shellPath to "PATH=/usr/local/bin:/opt/homebrew/bin:/usr/bin:/bin:/sbin:/usr/sbin; "
-- Create detailed log file
set logFile to outDir & "debug_log.txt"
do shell script "echo \"PDF PROCESSING LOG - STARTED AT $(date)\" > " & quoted form of logFile
do shell script "echo " & quoted form of ("PDF File: " & pdfPath) & " >> " & quoted form of logFile
do shell script "echo " & quoted form of ("Output Dir: " & outDir) & " >> " & quoted form of logFile
do shell script "echo '-----------------------------------' >> " & quoted form of logFile
-- Split PDF into individual pages
try
do shell script "echo 'Splitting PDF...' >> " & quoted form of logFile
do shell script shellPath & "pdfseparate " & quoted form of pdfPath & " " & quoted form of (outDir & "page_%03d.pdf") & " 2>> " & quoted form of logFile
do shell script "echo 'Split operation completed.' >> " & quoted form of logFile
on error errMsg
do shell script "echo " & quoted form of ("ERROR DURING SPLIT: " & errMsg) & " >> " & quoted form of logFile
display dialog "Error splitting PDF. Check log for details." buttons {"OK"} default button 1
return
end try
-- List all the split files
set fileList to do shell script "ls -1 " & quoted form of outDir & "page_*.pdf"
set splitFileList to paragraphs of fileList
set processingErrors to 0
set processedCount to 0
repeat with filePath in splitFileList
try
set fullPath to filePath
if filePath does not start with "/" then
set fullPath to outDir & filePath
end if
set fileName to do shell script "basename " & quoted form of fullPath
-- Document processing
do shell script "echo '--------------------------------------' >> " & quoted form of logFile
do shell script "echo " & quoted form of ("Processing: " & fileName) & " >> " & quoted form of logFile
-- Extract text
do shell script "echo 'Extracting text...' >> " & quoted form of logFile
set extractCmd to shellPath & "pdftotext " & quoted form of fullPath & " - | grep -v '^\\s*$' | head -n 1"
set extractedText to do shell script extractCmd
do shell script "echo " & quoted form of ("Extracted raw text: " & extractedText) & " >> " & quoted form of logFile
-- Step-by-step format cleanup with detailed logging
-- Step 1: Strip format suffixes that are directly attached to words
set step1 to do shell script "echo " & quoted form of extractedText & " | sed -E 's/([A-Za-z0-9]+)(jpg|jpeg|tif|tiff|png|gif|bmp|pdf|psd|raw|heic|webp)([^A-Za-z0-9]|$)/\\1\\3/gi'"
do shell script "echo " & quoted form of ("After format suffix removal: " & step1) & " >> " & quoted form of logFile
-- Step 2: Remove file extensions with dots
set step2 to do shell script "echo " & quoted form of step1 & " | sed -E 's/\\.(jpg|jpeg|tif|tiff|png|gif|bmp|pdf|psd|raw|heic|webp)([^A-Za-z0-9]|$)/\\2/gi'"
do shell script "echo " & quoted form of ("After dot-extension removal: " & step2) & " >> " & quoted form of logFile
-- Step 3: Remove standalone format words
set step3 to do shell script "echo " & quoted form of step2 & " | sed -E 's/\\b(jpg|jpeg|tif|tiff|png|gif|bmp|pdf|psd|raw|heic|webp)\\b//gi'"
do shell script "echo " & quoted form of ("After standalone format removal: " & step3) & " >> " & quoted form of logFile
-- Step 4: Remove image dimensions
set step4 to do shell script "echo " & quoted form of step3 & " | sed -E 's/[0-9]+ ?[xX×] ?[0-9]+ ?(pixels?|px)?//gi'"
do shell script "echo " & quoted form of ("After dimension removal: " & step4) & " >> " & quoted form of logFile
-- Step 5: Clean up extra spaces and trim
set processedText to do shell script "echo " & quoted form of step4 & " | sed 's/ \\{2,\\}/ /g' | sed 's/^ *//' | sed 's/ *$//'"
do shell script "echo " & quoted form of ("After space cleanup: " & processedText) & " >> " & quoted form of logFile
-- Step 6: Prepare for filename
set processedText to do shell script "echo " & quoted form of processedText & " | cut -c 1-200 | tr ' ' '-'"
-- Sanitize text for filename
set cleanName to do shell script "echo " & quoted form of processedText & " | " & ¬
"tr -d '/:\\\"\\\\|?*<>.' | " & ¬
"tr -d '\\r\\n' | " & ¬
"sed 's/^-//g' | sed 's/-$//g' | sed 's/-\\{2,\\}/-/g'"
do shell script "echo " & quoted form of ("Sanitized text: " & cleanName) & " >> " & quoted form of logFile
-- Generate new filename
if cleanName is "" then
set newName to fileName -- Keep original if no text
do shell script "echo 'No text found, keeping original name' >> " & quoted form of logFile
else
set newName to cleanName & ".pdf"
-- Avoid filename collisions
set counter to 1
set baseName to text 1 thru -5 of newName
set checkCmd to "[ -e " & quoted form of (outDir & newName) & " ] && echo 'exists' || echo 'new'"
set fileExists to do shell script checkCmd
repeat while fileExists is "exists"
set newName to baseName & "_" & counter & ".pdf"
set checkCmd to "[ -e " & quoted form of (outDir & newName) & " ] && echo 'exists' || echo 'new'"
set fileExists to do shell script checkCmd
set counter to counter + 1
end repeat
-- Rename the file
do shell script "echo " & quoted form of ("Renaming to: " & newName) & " >> " & quoted form of logFile
do shell script "mv " & quoted form of fullPath & " " & quoted form of (outDir & newName)
end if
set processedCount to processedCount + 1
on error errMsg
do shell script "echo " & quoted form of ("ERROR PROCESSING FILE: " & errMsg) & " >> " & quoted form of logFile
set processingErrors to processingErrors + 1
end try
end repeat
-- Final status
do shell script "echo '-----------------------------------' >> " & quoted form of logFile
do shell script "echo \"Processing completed at $(date)\" >> " & quoted form of logFile
do shell script "echo 'Files processed: " & processedCount & "' >> " & quoted form of logFile
do shell script "echo 'Errors encountered: " & processingErrors & "' >> " & quoted form of logFile
if processingErrors > 0 then
display dialog "Completed with " & processingErrors & " errors. Check the log file for details." buttons {"OK"} default button 1
else
display dialog "Successfully processed " & processedCount & " pages!" buttons {"OK"} default button 1
end if
an advantage vis-a-vis the application based processing: the script can be used in contexts like building macros (e.g. via Keyboard Maestro), or even inside DT and its smart rules etc.
the price one has to pay: you need to have Poppler installed (via Homebrew), which is a PDF rendering library based on the xpdf-3.0 code base. see here
the way the AI/LLM describes its achievement/code artifact:
The Problem Solved
The script now correctly handles cases where image format names like "jpg" are directly attached to words (such as "Dry0719jpg") in the extracted text.
Key Elements of the Solution
- Step-by-Step Format Cleaning:
  - Format suffixes directly attached to words are now detected and removed
  - The pattern ([A-Za-z0-9]+)(jpg|jpeg|…)([^A-Za-z0-9]|$) specifically matches and removes these embedded formats
- Enhanced Debugging:
  - Each cleaning step is logged separately
  - Makes it easier to diagnose any future issues
- Comprehensive Format Handling:
  - Removes file extensions with dots
  - Removes standalone format words
  - Removes image dimensions
  - Cleans up extra spaces and special characters
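The cleaning steps listed above can also be read as one small function. This is a sketch in Python rather than AppleScript/sed, mirroring the script's steps in the same order; title_from_line is a hypothetical name, and the regular expressions are transcribed from the script, so they carry over its quirks as well (step 1 also trims ordinary words that happen to end in a format name, so "draw" becomes "d").

```python
import re

# Format names the script strips from the extracted first line
FORMATS = r"(?:jpg|jpeg|tif|tiff|png|gif|bmp|pdf|psd|raw|heic|webp)"


def title_from_line(line: str, max_len: int = 200) -> str:
    """Turn the first text line of a page into a filename stem."""
    # Step 1: format suffix glued to a word ("Dry0719jpg" -> "Dry0719")
    text = re.sub(rf"([A-Za-z0-9]+){FORMATS}(?![A-Za-z0-9])", r"\1", line, flags=re.I)
    # Step 2: dotted extensions (".tif")
    text = re.sub(rf"\.{FORMATS}(?![A-Za-z0-9])", "", text, flags=re.I)
    # Step 3: standalone format words
    text = re.sub(rf"\b{FORMATS}\b", "", text, flags=re.I)
    # Step 4: image dimensions such as "1024x768 px"
    text = re.sub(r"[0-9]+ ?[x×] ?[0-9]+ ?(?:pixels?|px)?", "", text, flags=re.I)
    # Step 5: collapse runs of spaces and trim
    text = re.sub(r" +", " ", text).strip()
    # Step 6: limit the length, then use dashes instead of spaces
    text = text[:max_len].replace(" ", "-")
    # Sanitize: drop characters that are awkward in filenames, tidy dashes
    text = re.sub(r'[/:"\\|?*<>.\r\n]', "", text)
    return re.sub(r"-{2,}", "-", text).strip("-")


print(title_from_line("Dry0719jpg 1024x768 px"))  # Dry0719
print(title_from_line("Sunset over lake.tif"))    # Sunset-over-lake
print(title_from_line("draw"))                    # d  (the step 1 quirk)
```

An empty result means the page had no usable text, in which case the script keeps the original page_NNN.pdf name.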
What You Can Do With This Script
You can now:
- Split any PDF into individual pages
- Have each page automatically named based on its content
- Process documents with image names and technical metadata in the text
- Get clean, readable filenames without format suffixes
There are especially two "extravagancies" built into this that one might want to get rid of:
otherwise, looking forward to any improved community version, variation or fork…
In case you need to do this in bulk, the stand-alone version of ABBYY FineReader will do this easily: you can load many pages of double-spread PDF, and it will process the entire file into single pages and OCR it at the same time. It's saved me a lot of time on the occasions when I use my phone to scan a book chapter by openings.
As usual, the code is just terrible. Four do shell script calls to write four log entries because the AI has never heard about newline. Reliance on an external program (pdfseparate). Using ls to get a list of files that the script itself generated in the first place. Goodness, what do we have computers for?
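To illustrate the point about log entries: several lines can be written with a single shell invocation by handing them all to printf, with each line quoted so that apostrophes or spaces in file names do not break the command. Below is a sketch in Python, where shlex.quote plays the role of AppleScript's quoted form of; log_command and write_log are hypothetical helpers, not part of the posted script.

```python
import shlex
import subprocess


def log_command(lines, log_path):
    """Build ONE shell command that appends every entry in `lines` to the log.

    printf reuses its format string for each argument, so a '%s' plus
    newline format emits one log line per entry.
    """
    quoted = " ".join(shlex.quote(line) for line in lines)
    return f"printf '%s\\n' {quoted} >> {shlex.quote(log_path)}"


def write_log(lines, log_path):
    """Run the command once instead of spawning one shell per log line."""
    subprocess.run(log_command(lines, log_path), shell=True, check=True)
```

In the AppleScript itself the same idea would presumably be one do shell script call per batch of log lines, with quoted form of applied to each line.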
Sorry, but what is the point in posting terrible code like that, here or anywhere on the net? Everyone can tell an AI to produce something they think is "code". It might even run. But why publish crap like that? No one can even learn anything useful from it (except how not to code).
If I were a "coding priest", I would not spend a second on making this "more elegant". Just write something good from scratch.
Unfortunately, it is not even clear what this code is supposed to do. Instead of having a bunch of log entries, some comments in the code would have been helpful (without marketing blurb like "advanced format removal").
What does "application-based processing" mean? Is pdfseparate not an application? Or xpdf?
Whatever this stuff is supposed to do can be done with fewer lines of code and without relying on any external program, just using AppleScript/JXA and PDFKit.
thanks for your comment, which kind of speaks for itself, really.
the code, as you say, works.
and it covers a relevant case.
happy to discuss and revise it w/ anyone interested in healthy community discussion (including about the use of AI for "patching" problems).
also, everything about it was transparent, which I kind of cannot say for the motivation behind such comments.
actually, it's a kind of energy I would label as at least as questionable as openly posting non-elegant code.
Others and I already posted several examples of how to split PDFs, e.g. Splitting a PDF at a recurring string/expression and Split PDFs - Custom Break - #14 by pete31 and Splitting PDFs | JavaScript for Automation (JXA). They are fairly short, clear and instructive. We can talk about them, e.g. how to modify them to do what you want.
But for me (!), there's no point in talking about crap code like the one produced by whatever AI. That's just not worth my time and energy.
first, you actually do not have to talk about it.
you obviously chose to do so. freely.
then, I am happy to now see this thread, re. the topic of the OP, enriched like that.
as you know, I am all for enriching threads.
… and if the impulse for something like that needs to be "crap code" (and potential discussions about cultures of coding in times of AI, prosumers etc), I am totally fine with that.
what that leaves open is the kind of "addressing" and "being spoken to" by someone at least resembling an "angry priest". that part, I actually do not accept.
I can live w/ people displaying such a kind of "culture" and "karma". but personally, I do not accept it at all, or even think it's tolerable within some other "cultural ethics" aside from the "ethics of code", namely the culture of forums.
others can make up their own minds on that part.
I do not wish to stir up any argument and appreciate everyone has their own view. The only comment I'd make is that in my time of using DEVONthink I've come across some amazingly well-written and helpful scripts on the forum. I found many of those when I was in my infant stage of AppleScript programming (I'm a little past that now!). I think in a case like that it's helpful that anyone picking up a script here also picks up, with it, some good teaching/learning. I would not have known, at that stage, what was "good" and what simply might have "done the trick".
So maybe there is some point in comments about "good" and "bad" scripts so some of us know what we're doing when we copy or adapt them.
This is not a personal criticism of your views but merely a comment to show that some of us still need constructive help!
Stephen
appreciated.
I had the same experience of being provided w/ good code, and even more good "teaching" alongside, esp. thinking of @pete31, but also of others, and always appreciated that. explicitly. from the heart as much as a "cultural way of interacting".
then, I also sympathize with what you are saying, as long as we are speaking about approaches to "how to learn to code" and the cultures / stances around that. of course, one can learn from others if he/she wants to (learn to) code.
but this is not the sole context of this.
first, and I feel very strange having to say something like that in an open forum of adults:
neither did my comment offend anyone, nor was it an objective offense against anything.
so, I do not know why I have to be talked to like that, especially not if this really is about "learning" and a culture of "teaching" (and I was teaching myself). no teacher would (or should) speak like that. (let alone forum netiquette.)
then, you are right when you say "in a case like that [when one is on a trajectory of learning to code] it's helpful that anyone picking up a script here also picks up, with it, some good teaching/learning". I would be the last to contradict.
but here, there are more and other contexts, even if @chrillek's personal perspective is that this is purely about "good coding" and "learning it". but nowhere did I subscribe to be a "learner of pure coding", or even any type of personal/professional coding.
for one, and I made it abundantly clear that this is AI code(!), we are living in times where AI is a reality. it has just been implemented into DT. so besides it being used for "summarizing" (and we could have a philosophical or a writer's discussion about whether that ever should be done by a machine), AI is and will be used for (personal and professional) coding.
so, if there is something new to learn, it is how to (collectively) deal with that reality from now on, or how to give people the right tools and mindset to handle the new dualities and affordances, including pitfalls. given that a lot of people don't want to be coders, or don't have the capacities for it, people will in large swathes use such "makeshift" code to solve their very practical, hands-on problems as they encounter them in the use of a potent app like DT.
so, we are not (only) talking about "learning to code". at least that is not my perspective.
then, this was as much about the practical solution (what I labeled a "digital artifact") as it was about bringing in the conceptual issue of using PDF content for renaming split PDFs on the basis of their content, which is a very practical, challenging, and in many contexts relevant requirement.
so, this could have been the start of a very fruitful discussion in that direction, as long as it is not forcefully framed as "coding ethos" and bringing people into "teacher" and "learner" positions.
also, the first thing a "teacher" has to learn is to not shout at their pupils (that is, if they hand themselves in for that).
while I am happy to discuss all the issues involved, and also to "learn" (where I want/can), I am very unwilling to make this a case where the onus is on me to "justify" posting such posts in this way.
and very generally, I refuse to accept this tonality being brought to any forum's "Discourse".
PS: it should also be clear that my initial post was not "a personal criticism" of @chrillek. nevertheless he chose to treat it like that, or to deduce the right to launch a quite offensive and transgressing tonality w/o any need whatsoever… so, I'd wish for a more balanced approach to any (welcome) attempts at mediation.
lerone,
This forum has contributors from many countries. Some people are, because of their background, much more direct than others. @chrillek is very direct, very no-nonsense, and if you can manage not to take his comments too much to heart, you will benefit from them, and from his super useful contributions all over this forum. He's a programmer, and I've worked with programmers for decades. If there's one thing they are known for, it is parsimony, code economy, and always shorter, more elegant, better ways of coding stuff. You can use AI to create code, but it is a ways from creating short, elegant, less error-prone code. I do write AppleScript, and I believe not too shabbily, but when I need something, the first thing I do is SEARCH. @Bluefrog, @cgrunenberg, @troejgaard: most of the time someone else has already created some stuff that I can study and build on.
Back in design school (yes, in the US of A) we'd stick our stuff on the wall, and the criticism was NASTY (profs calling your stuff shit, tearing it to pieces physically, not kidding). The idea was you learned that no one was criticizing YOU; they were criticizing the stuff that was on the wall.
We've been over this before. AI code is notoriously bad. And I haven't yet found someone here willing to fix that stuff. Nor "discuss" it. There's simply no point. It's like telling a two-year-old that they can't paint like Picasso, although on the surface, some of their drawings might resemble deconstructivist paintings.
Talking about real code might teach people something. Treating gibberish as if it were something else will not teach anyone anything. And I don't care if AI is a reality if it produces gibberish.
Not by your fault, of course.
folks,
I made my points about this being about more than one context (above post).
I can't make you open up to that kind of exchange if you decide this is "purely about code" and "discussing it (code)".
I am still quite … astonished … that, even if all this were correct (which it isn't by any standards of discourse), you are so willing to appease all this "communicative overreach", while always arguing solely with those on the receiving end of this kind of communicative transgression.
if that is your stance, in such times, then so be it. for me this culture of "patrimony" and "buddying" is outdated.
I know design school people, as I know a lot of other professional and cultural contexts. nowhere would I go along with this old narrative that "people ripping others up in communication" are just out there for the good. again, everyone's choice.
I made my arguments, and would quite look forward to seeing them being discussed likewise (i.e. having a collective, non-forced understanding about how to deal w/ "makeshift" and "AI" code, as much as the problem of slicing PDFs in ways that are fit for context, including meaningful automatic renaming).
as to communication culture (and I wouldn't see myself fit to speak for "coding culture", even though I have worked productively w/ many coders myself), you make your choice about style and "codes". and I make mine.
but I appreciate the demonstrated effort to infuse some sensible approach into the "learning to code" discussion, @Stephen_C! appreciated.
PS: fun fact: I have a hunch that @chrillek and I might even be from the same country. I never thought this argument about "this is his/her culture", in this way, is any good, especially as I have worked in many transcultural contexts myself…
Sorry you don't seem to understand, but rather feel hurt. Yes, it is a cultural thing, and I can guarantee I have worked for much longer and with many more cultures and professions than you have. Not to brag. I have even taught that. But that is beside the point: pride is the issue. And when you are ready to swallow your pride, you can get ready to learn, especially from the tough ones, the ones that will rip your work. I'd rather take a grumpy Nobel Prize winner than a nice tinkerer, anytime.
no, I am not "personally" hurt.
you don't seem to understand that.
I am making arguments about the cultures and codes of collective communication, and about multidimensional exchange/communication/rationalities, especially in forums.
otherwise I don't want to question your personal confidence, even though I don't know where you take your knowledge about me from. also, things like "Tinkerers" vs. "Nobel Prize Laureates" are, I think, a very unhelpful and misplaced framing here (again, rather talking about a very loaded culture of clear-cu(l)t "personalities", all with "their place" to speak).
and these "personalizations" are really of no interest to me, and I see them, in face of the multiplicity of aspects raised, as reductive framings. especially as they are all leading into the same river: sanctioning offensive, topically narrowed-down and "closing down" (instead of "opening" / welcoming) communication. that is: "slanted", non-reciprocal communication.
I am not hurt. but I surely learn about standards of communication here. and the approach to collective, shared and mutual learning.
I think we've talked this one out. Closing shop…