hoard.it : bootstrapping the NAW

Hey Jeremy

I’m not the one who built the templates or “shape” definitions, so DZ might be better placed to answer…but certainly in my experience (and I think in Dan’s build, too), the more structured the data, the better. On sites like http://www.ingenious.org.uk for example, we (NMSI) built metadata into the HEAD, and this makes it a breeze for hoard.it to parse out content as the original site intended, rather than as we (hoard.it) are interpreting it. I also heard Dan swearing at the British Museum markup, which was hugely inconsistent from page to page… 🙂
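To give a flavour of why HEAD metadata is so easy to work with, here is a rough sketch in Python. It is not hoard.it’s actual code, and the field names in the usage note (DC.title, DC.creator) are made up for illustration; I’m not claiming those are the names Ingenious uses.

```python
from html.parser import HTMLParser


class HeadMeta(HTMLParser):
    """Collect <meta name=... content=...> pairs that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            a = dict(attrs)
            # Only keep tags that carry both a name and a content value.
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"].lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def head_metadata(html):
    """Return a dict of lower-cased meta names to their content strings."""
    parser = HeadMeta()
    parser.feed(html)
    parser.close()
    return parser.meta
```

Calling head_metadata() on a page whose HEAD contains, say, a DC.title and a DC.creator tag would hand back a plain dictionary keyed on "dc.title" and "dc.creator", with no guessing about which bit of the page body is the title.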

Certainly in my original idea, the system was envisaged to work with a cascading approach. So, for instance, it would look in the first instance for evidence of fully structured data associated with the page (say RDF, an API or RSS) and use that if found; if not, it would look for microformats, then POSH (plain old semantic HTML), finally defaulting to “unstructured” HTML if none of the above was found.
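The cascade might be sketched roughly like this in Python. This is only an illustration of the idea, not anything hoard.it actually runs: the tier names, the list of microformat class names and the list of “semantic” tags are all assumptions of mine, and an API can’t really be detected from the page itself, so that case is left out.

```python
from html.parser import HTMLParser

# Hypothetical detection hints, checked from most to least structured.
STRUCTURED_TYPES = {"application/rdf+xml", "application/rss+xml"}
MICROFORMAT_CLASSES = {"vcard", "vevent", "hentry", "hreview"}
POSH_TAGS = {"address", "cite", "dl", "blockquote", "abbr"}


class Signals(HTMLParser):
    """Record which kinds of structure are present anywhere in the page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # <link type="application/rss+xml"> or an RDF equivalent
        if tag == "link" and (a.get("type") or "").lower() in STRUCTURED_TYPES:
            self.found.add("structured")
        # Any element carrying a known microformat root class
        if set((a.get("class") or "").split()) & MICROFORMAT_CLASSES:
            self.found.add("microformat")
        # Crude stand-in for POSH: the page uses semantic elements
        if tag in POSH_TAGS:
            self.found.add("posh")


def pick_tier(html):
    """Return the richest tier found, falling back to 'unstructured'."""
    parser = Signals()
    parser.feed(html)
    parser.close()
    for tier in ("structured", "microformat", "posh"):
        if tier in parser.found:
            return tier
    return "unstructured"
```

The point is just the ordering: the whole page is scanned once, and the richest source found wins, so a page with both an RSS link and an hCard would be treated as “structured” rather than “microformat”.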

I think in this instance, because of the rapid dev, the templates don’t do any of this automatically; instead, the approach is decided when DZ writes the template for each site. So in the Ingenious examples, Dan looked at the source, saw that structure and used it to drag out the data. On other sites, it didn’t exist, so he just used the lowest common denominator: HTML markup.

We are now thinking about ways in which you could relate the structured content and the markup. That’s one of our next stages, and it would be great to talk to you about your aspirations, too…

Cheers

Mike