Searching for documents that contain specific links

Is it possible to search documents that contain specific URLs?

I have tons of PDFs that were generated by the Fireshot extension. All URLs in these PDFs are masked as getfireshot.com/xx, and I want to detect those PDFs, but searching didn't do the trick. Any help would be much appreciated.

Post a screen cap of your search and the info inspector for one of the captured files.

Here they are:

As far as I know it’s not possible to search URLs inside a PDF.

You could try Script: Extract PDF URLs to find PDFs that do or do not contain given URLs. It's possible to

  • Filter by URL start.
  • Filter by URL end.
  • Filter by URL start and URL end.
  • Choose whether the filter should "include" or "exclude" the passed lists.

so you should be able to find what you’re looking for.
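To make the include/exclude logic concrete, here is a minimal Python sketch. It is not the script itself; it assumes the URLs have already been extracted from a PDF into a list of strings, and the names `url_matches` and `filter_urls` are hypothetical. Because the masked links are described only as getfireshot.com/xx, the example assumes they may use either http or https.

```python
from typing import Iterable, List, Sequence


def url_matches(url: str,
                starts: Sequence[str] = (),
                ends: Sequence[str] = ()) -> bool:
    """True if the URL starts with any prefix in `starts` AND ends with any
    suffix in `ends`. An empty sequence places no constraint on that side."""
    start_ok = not starts or any(url.startswith(s) for s in starts)
    end_ok = not ends or any(url.endswith(e) for e in ends)
    return start_ok and end_ok


def filter_urls(urls: Iterable[str],
                starts: Sequence[str] = (),
                ends: Sequence[str] = (),
                mode: str = "include") -> List[str]:
    """Keep the URLs that match ("include") or that do not match ("exclude")."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"unknown mode: {mode!r}")
    keep_matches = mode == "include"
    return [u for u in urls if url_matches(u, starts, ends) == keep_matches]


if __name__ == "__main__":
    # Hypothetical URLs pulled from one PDF.
    urls = ["https://getfireshot.com/ab", "https://example.org/page"]
    prefixes = ["https://getfireshot.com/", "http://getfireshot.com/"]
    # A PDF counts as a Fireshot capture if at least one URL survives the filter.
    print(bool(filter_urls(urls, starts=prefixes)))
```

With this approach, a PDF would be flagged when `filter_urls(..., mode="include")` returns a non-empty list for the getfireshot.com prefixes.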

Thanks for the suggestion and for letting me know about such a great script, but I'm not sure how it would help me. As far as I understand, your script takes PDFs as input, but finding those PDFs is exactly what I'm struggling with. Should I select all PDFs and then run the script?

Yes, there's no other way. It's probably easiest to assign a label or a tag to each matching record to collect them.

The script is fast, so even with thousands of records you should be done soon. However, I wouldn't run it on all of them at once; it's better to work in batches of a few hundred records.
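The suggested workflow (process records in batches of a few hundred and tag each match so they can be collected afterwards) can be sketched in Python as follows. This is only an illustration: records are modeled as plain dicts with hypothetical `name`, `urls`, and `tags` keys, and the tag name `fireshot` and the batch size of 200 are assumptions, not settings of the actual script.

```python
from typing import Callable, Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for i in range(0, len(items), size):
        yield items[i:i + size]


def tag_matching(records: Sequence[Dict], is_match: Callable[[Dict], bool],
                 tag: str, batch_size: int = 200) -> int:
    """Add `tag` to every record for which `is_match` is True, one batch at a
    time. Returns how many records received the tag during this call."""
    newly_tagged = 0
    for batch in batches(records, batch_size):
        for rec in batch:
            if is_match(rec):
                tags: List[str] = rec.setdefault("tags", [])
                if tag not in tags:  # don't count or duplicate existing tags
                    tags.append(tag)
                    newly_tagged += 1
    return newly_tagged


def is_fireshot(rec: Dict) -> bool:
    """Hypothetical check: any URL pointing at getfireshot.com."""
    return any(u.startswith(("https://getfireshot.com/", "http://getfireshot.com/"))
               for u in rec.get("urls", []))
```

Because already-tagged records are skipped, rerunning a batch that was interrupted does no harm; the matching PDFs can then be gathered by filtering on the tag.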

If you need help let me know.

Thanks a lot!

What is this document and where did it come from?
Can you start a support ticket and send me the file?

Sure