Hi, for a family history project, I’m trying to archive some family obituaries from the web by clipping them with the Firefox plugin in PDF uncluttered format, so that any photos are captured too. The newly created document’s ‘Kind’ is PDF+Text, which I thought meant it had been OCR’d. But neither searching in DT nor searching within the document for a unique text string gives any results.
But maybe I’m off base and OCR isn’t a thing for web clipping?
OCR is usually only necessary for scans/photos, but it might also be necessary for poorly created PDF documents. Does converting the PDF document to plain/rich text produce the expected results?
I clipped the same page as rich text. Now I can find the document in DT by searching for a unique word in it, but using Cmd+F to search within the document returns no results for that word.
So I’m unclear how to get a webpage into DT so that it is searchable both in DT and within the document itself.
And I had assumed that a document whose Kind is “PDF+Text” had been OCR’d, but maybe that’s not always true?
Good to know that a wildcard character is required if the wildcard checkbox is checked. Thanks!
So here is the top of the page that wasn’t in the earlier screenshot of the rich text web clip. The PDF web clip is much better: the clipped document looks just like the webpage. But the PDF web clip isn’t searchable in DT. I think I found a workaround. If I right-click the PDF web clip and select OCR > to searchable PDF, I get a popup asking whether I want to convert this searchable PDF again. If I click Convert, the PDF web clip becomes searchable.
Am I missing a setting that would make the web clipper perform OCR automatically, so I don’t have to do it manually?