PDF upside down after OCR conversion: how do I flip it right side up?

hemicyon · December 25, 2024, 10:59pm

I’m new to DTTG. I imported a pdf into my inbox on my iPad mini and converted it to a searchable pdf there. After the conversion finished, the pdf was upside down. The pdf is in fact searchable, but I can’t figure out how to rotate it right side up. Any help appreciated, thanks.

chrillek · December 26, 2024, 7:30am

I don’t think it’s possible to rotate PDFs in DTTG. At least I didn’t find a way when I tried recently.

hemicyon · December 26, 2024, 8:54am

Oh, hm, that’s going to be a little frustrating if converting to a searchable pdf keeps flipping the files upside down every time.

chrillek · December 26, 2024, 9:19am

I never noticed that, but I rarely OCR in DTTG. I tried it just now and the PDF was not rotated at all. Are you sure that the un-OCR’d PDF was not already rotated? Having OCR do that makes no sense, I think.

rmschne · December 26, 2024, 9:20am

Have you seen it happen again? This is the first time that issue has been mentioned here. Perhaps it was previously OCR’ed or rotated by other software before it entered DEVONthink To Go? OCR just adds a text layer, so why it would rotate the page is a mystery. Or there is a bug. Dunno.

troejgaard · December 26, 2024, 11:16am

Just to make sure… Have you checked that it’s not simply the screen orientation of your iPad that rotated?

hemicyon · December 26, 2024, 4:53pm

Positive. I made the pdf, and the orientation was definitely correct to begin with. And the original pdf is preserved in DTTG in the correct orientation next to the OCR’d file that is upside down.

Yes, I tried OCR’ing the file again, and it flipped again. I also tried reimporting the file and OCR’d that one with the same result. I made the pdf file. It was correctly oriented and was not processed or edited anywhere else before I opened it in DTTG.

Yes. The original un-OCR’d file is still there and oriented correctly, and the OCR’d version above it is upside-down.

This wouldn’t really be an issue if I could just rotate it back, and it seems strange that such a basic function isn’t available.

And I’m noticing now, while taking screenshots, that although the search window appears to list the found results correctly, the matches highlighted in the file itself are wildly off.

troejgaard · December 26, 2024, 5:35pm

Alright. I usually have the rotation lock on, so I have sometimes been tricked when I turned it off and forgot to turn it back on.

It actually is possible to rotate PDFs. From the manual, under View and Edit Documents > PDFs:

Open Page Editing, select the relevant pages (looks like this document only has one page) and tap the rotate button until the rotation is correct. It rotates clockwise, 90° at a time.

What about other files?

By the way, is OCR even necessary for that file? How was it originally created? It looks like something that would already have a text layer.

hemicyon · December 26, 2024, 6:11pm

~~I don’t see this in DTTG. Is this only for the desktop version of the software? I’m on the mobile version.~~ Ah, I see now, it is in DTTG! Cheers! (For whatever reason I just wasn’t seeing the paper sheet icon with the folded corner.)

This is the first time I’m attempting to do anything in DTTG, and this is the very first file I’ve opened in it. For testing, I tried a light version of the same page, and OCR’ing did not flip the orientation of that file; however, for accessibility reasons, I cannot work from the light version and need the dark version. I know DTTG has the option to Use Dark Background, but because this works by inverting all colours, it makes the charts (images) unreadable.

Yes, it’s from a lost webpage preserved on the Wayback Machine. I know there are a variety of ways to save webpages as pdfs while maintaining the view format (i.e., not just ‘print as pdf’), but since I don’t do it very often I always have to figure it out again. This time I used a Firefox extension called FireShot, though this was my first time using it.

Even after rotating it back to the correct orientation, though, the OCR results are still wonky.

troejgaard · December 26, 2024, 9:08pm

I never OCR in DEVONthink To Go, so I’m not familiar with its limitations. Maybe white text on a dark background can trip it up?

What are your OCR settings? Particularly Auto correct orientation, and maybe Quality.

It looks like the text content itself is actually fine, but somehow the text layer is not mapped properly to the page. What do you see if you convert the PDF to plain text?

I never heard of the FireShot extension before. It looks like it basically creates a screenshot of the website, at least with the free version – that is: a pure image file, optionally wrapped in a PDF? There must be a better solution.

Just out of curiosity, what “accessibility reasons” necessitate white text on a dark background? If you’re comfortable sharing, of course.

Your requirements are somewhat specific. I assume you use an extension like Dark Reader, then used FireShot to save the page as it looked with that. Does it have to be PDF? If not, maybe try the SingleFile extension. This saves everything necessary in a self-contained HTML file – including “dark mode” applied by an extension. That also means the layout is dynamic, which might be nice on an iPad mini.

Since you only use DEVONthink To Go, does that mean you don’t have a Mac? Otherwise, the simplest option for PDF is Safari’s “Export as PDF” command (found in the File menu). This preserves the look of the page in the browser window but, crucially, also keeps the text content and most links. That way you don’t need an extra OCR step.

That method gives me this result:
Linux scarf from Ulla (Safari Export PDF w Noir).pdf (530.5 KB)
(I use the Noir extension for Safari, this is the built-in “Black” theme)

Source: https://web.archive.org/web/20051125005203/http://www.dabne.net/carolina/geekknit/linuxscarf-en.htm

BLUEFROG · December 26, 2024, 10:47pm

Is this something you can reliably reproduce?
- If so, please ZIP and post the PDF before OCR for us to examine. I’m guessing there’s an issue with the PDF, not the OCR engine.

hemicyon · December 27, 2024, 12:15am

For me, DTTG reliably flips the dark mode file and does wonky things with the OCR. The light mode file is OCR’d as expected without being flipped. Both files were made exactly the same way and have the same dimensions and resolution.

FireShot Capture 005 - Linux scarf from Ulla - web.archive.org.pdf (2.9 MB)
FireShot Capture 006 - Linux scarf from Ulla - web.archive.org.pdf (2.9 MB)

drew · December 27, 2024, 12:28am

I thought this was the “expected behavior” of OCR-ing an image in DT. I have thus far been unable to make it leave the page’s rotation alone (same with deskewing). If anyone can fix this behavior, I would love it, as DT also lacks the ability to batch rotate PDFs (is this feature coming?)

hemicyon · December 27, 2024, 12:40am

Agreed. It was the first solution in my search results, and I went with it because I needed a pdf to work with right away.

Spot on.

I suppose this depends. I need to be able to annotate and mark up the files with the Apple Pencil. I can definitely do this with pdfs; can the same be done with SingleFile’s HTML files? I’m guessing not, but it’s worth asking.

I do have a Mac. Firefox has been my default browser for some time now, so I wasn’t aware Safari had this feature. I’ll have to make a macro in Keyboard Maestro to automate this for future use. Cheers again!

BLUEFROG · December 27, 2024, 2:48am

I don’t see an answer to whether DTTG’s Settings > OCR Settings > Auto correct orientation is enabled or not.

I am seeing no rotation using Convert > To Searchable PDF on either document you posted. However, I’m also getting no useful text layer on the dark document.

hemicyon · December 27, 2024, 5:17am

This was apparently enabled by default.

When I disabled auto-rotate, it did not generate any OCR for me either. I only tested the dark file though.