I would love to have DA scrape metadata from any page that interests me.
For instance, here is a science news story from BBC News (which I already have in a DTPro database). The <meta> tags in the header contain, among other things, the publication date, the headline, the content type (the fact that it's a story), and a description of the story. Two tags contain the reporter's byline, and an element at the bottom of the story contains the reporter's email address. I would love to be able to capture all of that metadata during download and have it mapped automatically to Dublin Core and FOAF elements.
I imagine I could script the extraction of this metadata myself after download (I'm already looking at how to do this for my existing DTPro 1.5.4 records), but it would be more efficient to scrape the page at download time. Also in the interest of efficiency, scraping wouldn't have to happen during the initial search. It could be an option after I'd read a page summary in DA, or even a background process for pages already in the archive.
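For the do-it-yourself route, an after-download script might look roughly like the sketch below. It uses Python's standard `html.parser` to collect `<meta name="..." content="...">` pairs and any `mailto:` links, then renames them to Dublin Core and FOAF terms. The meta names in `META_TO_DC` are assumptions for illustration, not verified BBC markup; check the page source before relying on them. The byline is left out because which tags hold it depends on the page.

```python
from html.parser import HTMLParser

# ASSUMPTION: hypothetical <meta name="..."> values (lowercased) mapped to
# Dublin Core elements. Check the real page source for the actual names.
META_TO_DC = {
    "headline": "dc:title",
    "originalpublicationdate": "dc:date",
    "contentflavor": "dc:type",
    "description": "dc:description",
}


class MetaScraper(HTMLParser):
    """Collect <meta name=... content=...> pairs and mailto: links."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # lowercased meta name -> content
        self.mailto = []  # every mailto: href, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        a = dict(attrs)
        if tag == "meta" and a.get("name") and a.get("content") is not None:
            self.meta[a["name"].lower()] = a["content"]
        elif tag == "a" and (a.get("href") or "").lower().startswith("mailto:"):
            self.mailto.append(a["href"])


def scrape(html):
    """Return a dict of Dublin Core / FOAF terms found in the page."""
    parser = MetaScraper()
    parser.feed(html)
    parser.close()
    record = {dc: parser.meta[name]
              for name, dc in META_TO_DC.items() if name in parser.meta}
    if parser.mailto:
        # foaf:mbox takes a mailto: URI, so the href is kept as-is.
        record["foaf:mbox"] = parser.mailto[0]
    return record


if __name__ == "__main__":
    page = """<html><head>
    <meta name="Headline" content="Example headline" />
    <meta name="OriginalPublicationDate" content="2008/01/01" />
    </head><body><a href="mailto:reporter@example.com">Email</a></body></html>"""
    print(scrape(page))
    # {'dc:title': 'Example headline', 'dc:date': '2008/01/01', 'foaf:mbox': 'mailto:reporter@example.com'}
```

Keeping the mapping in a plain dictionary means adapting the script to another site's markup only requires editing `META_TO_DC`, not the parser.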
There. It’s a big request, but I believe it’s going to become more important as interest grows in repurposing data for semantic publishing.