Best practices to create epub from pdf

Any suggestions?
I'd be starting from DT (the PDF is already OCRd…).

I searched for “epub” in the “DEVONthink Handbook” for you and found nothing about it being able to create an ePUB from anything. It can index, search, and view them. I suggest you start with a web search for this capability. I see a lot of websites offering it, and there are surely other tools you can purchase to do what you want, but that is out of scope for DEVONthink.

In general, producing ePUB from PDF will not be feasible. ePUB is basically HTML, i.e. a structured format. PDF is anything but structured.
There might be exceptions, but I doubt that it's worthwhile investigating further.

You might want to see the calibre documentation on converting specific file types; scroll approximately halfway down for PDF. The website states:

To re-iterate PDF is a really, really bad format to use as input. If you absolutely must use PDF, then be prepared for an output ranging anywhere from decent to unusable, depending on the input PDF.

Disclaimer: I haven’t used calibre myself - it just came up when I had a quick duckduckgo on how to convert pdf to epub.
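For anyone who wants to try it anyway: calibre ships a command-line tool called ebook-convert that takes an input file and an output file. Below is a minimal sketch of calling it from Python, assuming calibre is installed and ebook-convert is on your PATH. The helper names and file names are made up for illustration, and the quality caveat quoted above still applies.

```python
import shutil
import subprocess


def build_convert_command(pdf_path, epub_path):
    """Return the argument list for calibre's ebook-convert CLI."""
    # ebook-convert infers the formats from the file extensions.
    return ["ebook-convert", pdf_path, epub_path]


def convert(pdf_path, epub_path):
    """Run the conversion; raises if calibre's CLI is not available."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is calibre installed?")
    subprocess.run(build_convert_command(pdf_path, epub_path), check=True)


# Hypothetical usage (file names are placeholders):
# convert("scan.pdf", "scan.epub")
```

Expect to open the result in an ePUB reader and check it page by page; nothing here fixes the structural problem with PDF as input.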

It’s not really a calibre-specific issue. PDF is like a graphics format. Basically, you can say

goto 10, 20
set the font to Helvetica Bold 20 point
write "Hi "
store the current position in x
set the font to Helvetica 12 point
write "I'm going crazy with all this"
restore position from x
set the font to Helvetica Bold 20 point 
write "there,"

In HTML, you’d want to have something like (and it would be a lot of fun already to assemble the h1 element from the PDF above)

<h1>Hi there,</h1>
<p>I'm going crazy with all this</p>

and you’d have the corresponding CSS (approximately)

h1 { font-family: "Helvetica"; font-size: 20pt; font-weight: bold; }
p { font-family: "Helvetica"; font-size: 12pt; font-weight: normal; }

Now imagine the PDF contains a 20 point piece of text in Times Roman – is that another h1 element? Or just something that the original author wanted to appear in 20 point Times Roman?
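To make the guessing game concrete, here is a rough sketch of what a converter has to do with the drawing commands above. Everything in it is an assumption for illustration: the TextRun type, the coordinates, the "20 point means heading" rule, and the "A pull quote" run are all invented, and real converters use far more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    x: float      # horizontal position of the run
    y: float      # vertical position (assumed to grow downward)
    font: str
    size: float   # point size
    text: str


def reading_order(runs):
    """Sort runs top-to-bottom, then left-to-right, ignoring drawing order."""
    return sorted(runs, key=lambda r: (r.y, r.x))


def guess_blocks(runs, heading_size=20.0):
    """Naively tag runs as h1 or p by font size, merging runs on the same line."""
    blocks = []  # each entry is (tag, text, y)
    for run in reading_order(runs):
        tag = "h1" if run.size >= heading_size else "p"
        if blocks and blocks[-1][0] == tag and blocks[-1][2] == run.y:
            prev_tag, prev_text, prev_y = blocks[-1]
            blocks[-1] = (prev_tag, prev_text + run.text, prev_y)
        else:
            blocks.append((tag, run.text, run.y))
    return [(tag, text) for tag, text, _ in blocks]


# Runs in the order the PDF drew them; positions are invented.
sample_runs = [
    TextRun(10, 20, "Helvetica-Bold", 20, "Hi "),
    TextRun(10, 50, "Helvetica", 12, "I'm going crazy with all this"),
    TextRun(40, 20, "Helvetica-Bold", 20, "there,"),
    TextRun(10, 80, "Times-Roman", 20, "A pull quote"),
]

print(guess_blocks(sample_runs))
# [('h1', 'Hi there,'), ('p', "I'm going crazy with all this"), ('h1', 'A pull quote')]
```

The heading gets reassembled only because the two pieces happen to share a y value, and the Times Roman run is tagged h1 purely because of its size. Whether that is right is exactly what the PDF cannot tell you.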

Basically, PDF is like drawing whatever comes to your mind whenever it comes to your mind (as long as it stays on the same page). HTML is about structure, it has no idea of pages, and representation is preferably managed by CSS, not in the HTML itself. Also, HTML can reflow (i.e. you can make the window smaller and the text follows). No such luck with PDF.

Sure; calibre just happened to be available as an example. I preferred referencing it over the online services which also claim to offer such conversions (and which appear first when using pertinent search terms); I would personally not choose to trust any such services with one of my PDFs.

You could try to

  • convert PDF to RTF in DEVONthink
  • remove unnecessary stuff from RTF
  • use online RTF to ePub converter

Why do you want an EPUB if you already have a PDF?

It is easier to read on iOS devices…

If that’s your goal, then you could split your PDFs into RTFs.

The conversion from PDF to RTF is not perfect, but it’s a nice way to make stuff easier to read. I don’t delete the PDF; the RTF is really only for reading on iPhone.

In cases where I want to keep the split RTFs (instead of deleting them after reading) I open the unsplit RTF in Nisus Writer and remove page numbers etc. via regex. It’s a bit more work but this way one can get quite good results.
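The same kind of cleanup can be done outside Nisus Writer once the text is exported. This is only a sketch in Python of the idea, not the regex I actually use: the pattern and the function name are assumptions, and it only drops lines that consist of nothing but a page number (optionally prefixed with "Page").

```python
import re

# A line that is only a 1-4 digit number, optionally preceded by "Page".
PAGE_NUMBER_LINE = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def strip_page_numbers(text):
    """Drop lines that look like bare page numbers; keep everything else."""
    kept = [line for line in text.splitlines() if not PAGE_NUMBER_LINE.match(line)]
    return "\n".join(kept)


sample = (
    "First paragraph ends here.\n"
    "12\n"
    "Second paragraph.\n"
    "Page 13\n"
    "In 2019 sales rose."
)

print(strip_page_numbers(sample))
```

Numbers inside a sentence survive, but a legitimate line containing only a number (a year on its own line, say) would be removed too, so check the result before deleting anything.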
