Download Manager Frustration

RobH · November 6, 2019, 2:33am

Since learning that DT has a download manager a couple months ago, I’ve played around with it several times, but was not able to download anything that I considered useful. I think, mainly, because I don’t really understand the process or the options.

Today, I’m trying to download a website and convert it into a PDF for annotation, and I can’t figure it out. The site is in blog format, and each part of the story is on a different page (with no option for displaying all on a single page). I’ve read the manual, I’ve read the section in “Take Control of DEVONthink,” and I’ve gotten the site downloaded into the inbox, but I cannot figure out how to convert the whole thing into a PDF.

I read a post on here about a script that would do it, but it appears to be no longer available. Is there anywhere that has detailed information on how to use the Download Manager?

cgrunenberg · November 6, 2019, 8:42am

By default the download manager just downloads the added web resources (like download managers in browsers). Optionally it can also download additional files like linked images, scripts, stylesheets etc. or follow links to grab a complete website. However, it does not change the format of downloaded webpages, e.g. it can’t convert them to PDF or webarchives. What’s the URL of the website?

rkaplan · November 6, 2019, 12:04pm

Serious question - could you perhaps give us a practical example of how the Download Manager can be helpful for a specific URL and a specific purpose?

I have tried out just about every other feature in DT3 and found good use for it. But I, too, am scratching my head and thinking I must be missing something with the download manager.

Let me say what I was hoping it would do - and I suspect this is something others perceived too. I imagined that I could choose a URL such as a weather site and set the download manager to take a screenshot at some regular interval such as hourly or daily so I would wind up with an archive of the website content on an ongoing basis. I think it might be possible to do that currently, but the data would be stored in individual files without an obvious way to put it back together as a website easily.

What is an example of a practical real-world use of the Download Manager?

cgrunenberg · November 6, 2019, 12:27pm

E.g. just like the download manager of a browser while viewing web pages or documents; the only major difference is that the files can be downloaded to a database. Or to grab offline copies of websites for archiving.

rkaplan · November 6, 2019, 1:54pm

OK - so if I am going to download something I know I will save in DT3, you are suggesting rather than use Chrome or Safari and then move from Downloads to DT3, use DEVONthink’s browser so it goes directly into the database?

cgrunenberg · November 6, 2019, 2:06pm

Only while you’re viewing webpages with DEVONthink, then an Option-click (or the contextual menu) can be used to add downloads.

BLUEFROG · November 6, 2019, 2:45pm

Help > Documentation > Windows > Download Manager ??

rkaplan · November 6, 2019, 2:49pm

OK I can see how that can be useful.

I just tried it by logging into my Dropbox from DT3

I chose one file and put it in the queue. Then when I started the download manager, it picked up tons of links from the Dropbox page and crashed DT3 (the app closed). So I set download manager to not follow links and tried it two more times. Same issue each time - DT3 crashed completely.

cgrunenberg · November 6, 2019, 3:04pm

Please choose Help > Report Bug while pressing the Alt modifier key and send the result to cgrunenberg - at - devon-technologies.com - thanks!

BLUEFROG · November 6, 2019, 3:14pm

I chose one file and put it in the queue

How…?

rkaplan · November 6, 2019, 3:15pm

Done - thanks

rkaplan · November 6, 2019, 3:20pm

With the “Add Link to Downloads” context menu by right-clicking an item in Dropbox

BLUEFROG · November 6, 2019, 3:25pm

Did you have the options set to Only Added Files?

Also, that’s not a link to the file itself in Dropbox.

That’s a download of a 15MB MP4. Clearly not the file itself.

rkaplan · November 6, 2019, 3:42pm

Yes, it is a link to download a .pdf file. I was right-clicking on 6-10-19.pdf below:

I changed to “Only Added Files” and it worked correctly as you suggested - thank you.

It seems the Download Manager can be very helpful but can quickly get out of hand following large numbers of links - I guess I need to think through the various options carefully.

Thanks.

BLUEFROG · November 6, 2019, 3:45pm

I don’t download websites with it, as I have no such need. But I use the Only Added Files option many times daily in Support.

And remember, we have no control over the link from any web page. The Dropbox link isn’t very useful when you think it will be downloading the actual payload, not a link to it. (But that’s also their standard M.O. anyways; giving links but not the data.)

RobH · November 6, 2019, 6:14pm

I don’t use any other download manager, so I have no conception of its function.

As noted in the OP, I read the manual and Kissel’s book. Neither really provided me with the in-depth information I was looking for.

What I’m trying to do is review a series of blog posts for a class. I want to annotate it as a PDF in DT, but there is no way to view all posts at once on the site; they exist as separate pages. I was hoping to use DT to download these pages and concatenate them into a single file (file type doesn’t matter at this point, because conversion can happen separately). Regardless of what I seem to select in options, what I get with the Download Manager is this:

The pages I want have the naming convention of “0001.html” (so 0002.html, 0003.html, etc.) and the “0001” file in the image above shows the first page. If I click on a link in that, I do go to the next (or other) page, but where is it? I can’t find it, so I can’t do anything with it, other than view it. Nowhere in the files that were downloaded are the pages I want. Which means I have a lot of stuff I don’t want and can’t do anything with the pages I do want.
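For readers who want to try this outside DEVONthink, here is a minimal sketch of fetching sequentially numbered pages and joining them into one file. It is not a Download Manager feature. The base address `https://example.com/blog`, the page range, and the helper names `page_urls`, `fetch`, and `concatenate` are all assumptions made for illustration.

```python
from urllib.request import urlopen


def page_urls(base_url, first, last, width=4):
    """Build URLs such as <base_url>/0001.html for pages first..last."""
    base = base_url.rstrip("/")
    return [f"{base}/{n:0{width}d}.html" for n in range(first, last + 1)]


def fetch(url):
    """Download one page and return its text (assumes UTF-8 content)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def concatenate(pages):
    """Join (url, html) pairs into one string, marking where each page starts."""
    parts = [f"<!-- source: {url} -->\n{html}" for url, html in pages]
    return "\n<hr>\n".join(parts)


# Example use (hypothetical address, needs network access):
#   urls = page_urls("https://example.com/blog", 1, 3)
#   combined = concatenate((u, fetch(u)) for u in urls)
#   open("combined.html", "w", encoding="utf-8").write(combined)
```

The combined file is not strictly valid HTML, since every page keeps its own wrapper, but it is usually good enough to open in a browser and print or convert to PDF in a separate step.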

If I start over and load the page into the built-in browser and use the capture function, one page at a time, I get this:

Every page has the same name, likely because that’s the page title. While I can sort these by their URL, it’s still a pain of a work flow.
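Since the captures differ only by URL, one way to tell them apart is to derive a name from the last part of each address. The sketch below only works out the new names from (title, URL) pairs. It does not touch DEVONthink; applying the names to the records would have to be done by hand or with DEVONthink's own scripting, which is not shown here. The function names and sample URLs are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def name_from_url(title, url):
    """Append the URL's last path component (without extension) to the title."""
    stem = PurePosixPath(urlparse(url).path).stem
    return f"{title} {stem}" if stem else title


def rename_plan(captures):
    """Return (old_name, new_name) pairs, ordered by URL."""
    ordered = sorted(captures, key=lambda item: item[1])
    return [(title, name_from_url(title, url)) for title, url in ordered]
```

With zero-padded page names such as 0001.html, sorting by URL also gives the reading order.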

Can the DL Manager download these pages for me, or is this not expected behavior for this?

Oh, BTW, the Download Manager will appear on every desktop space, so when I CMD-Tab to DT3, that window pops up in the current desktop, instead of bringing me to the one that contains DT.

BLUEFROG · November 6, 2019, 8:00pm

No I wouldn’t say this would be an expected use of the Download Manager.

A clarification: The Download Manager wouldn’t be a useful tool for trying to merely pluck individual HTML files from a site. Would they be the byproduct of downloading a site? Yes, but the results would have to be sifted through.

Also, yes there could be some utility in an AppleScript method, but this would likely not contain local resources for images and stylesheets as I interpreted were desired in the initial post.

What is the URL of a page you’re trying to capture?

t_hayash · November 6, 2019, 10:34pm

I realize that you’d like to automate this process (and it may be possible using a script, but I’m not a script kid), but it may be just as effective, if not more so, to use the sorter to save the web page as a PDF file or whatever other file type you like. You can capture the page as a web archive, PDF, RTF, etc. I sometimes find that unless I have a strong reason to create an automated system, it’s easier to go for the manual method and save the time of creating the automated tool. Perhaps if the site is 100s of pages the automated method would be more efficient…

RobH · November 7, 2019, 4:20am

I worked with Jim via PM on this issue, because the URL I was working with stated it may have graphic images and I didn’t want to post it, only to have someone click on it unknowingly (it’s a blog dealing with Hurricane Katrina). I’m posting the results of Jim’s guidance for anyone that comes afterwards looking for help.

I think the issue I had was that I wasn’t selecting the correct “Files” for the Download Manager to download. Here is a screenshot of what Jim sent me:

screencap1

In my case, I used the default settings, which (for whatever reason) had only Style Sheets and (I think) Embedded Images checked.

The other setting was where the downloads were being saved. To download to a folder (and not DT3), you need to make the selection here:

BLUEFROG · November 7, 2019, 4:43pm

As an addendum…
You don’t have to use the external folder. That is what I personally use, as I use an external location for transient support files.

Otherwise, downloads will go to a Downloads folder in the root of the chosen database.

Also, the download process will likely download excess materials. You can curate them or leave them. The root domain folder that’s created should be sufficient to keep, but we can’t guarantee that nothing else will be needed.
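If the downloads went to an external folder, the sifting can be done with a short script. This is a rough sketch that assumes the wanted pages follow the four-digit naming mentioned earlier in the thread (0001.html, 0002.html, and so on); the pattern and the function name `wanted_pages` are illustrative, not part of DEVONthink.

```python
import re
from pathlib import Path

# Assumed naming convention: exactly four digits followed by ".html".
PAGE_PATTERN = re.compile(r"^\d{4}\.html$")


def wanted_pages(download_root):
    """Return the numbered page files under download_root, sorted by file name."""
    root = Path(download_root)
    matches = [p for p in root.rglob("*.html") if PAGE_PATTERN.match(p.name)]
    return sorted(matches, key=lambda p: p.name)
```

It only lists the matching files and leaves everything else in place, so nothing that a page might still depend on (images, stylesheets) gets removed.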

Additionally, the internal structure of a site could easily vary, so there’s not a “one size fits all” setting here.

And lastly, the issues of copyrights and intellectual property must be considered. There is certainly grey area in some of these matters, so we aren’t advocating downloading entire websites without thinking about these things.

Be considerate; be good.