hoard.it : bootstrapping the NAW

Well done guys, only looked at Mike’s description about, and the pre-alpha months ago, so this is an initial response (from a techie again, sorry!)

As you know, I did some playing with the idea of a microformat or some sort of POSH for museum objects, and using a bookmarklet to let users build and annotate their own collection of stuff, sort of like a domain-specific but richer del.icio.us (like hoard.it, an API is a key aim for accessing the data. Difference is I didn’t get around to making it…). The drawback of that approach, of course, is that the web techs need to put their data out in that format. On the other hand, it’s possibly more flexible than a (basic) screen scraping approach, in that the structure for your POSH should be (a) explicitly documented somewhere and (b) more like a schema than a template (which might say “look for the div just after the header, take the H1 and paras in it and do….” – microformats give you the confidence to make more sophisticated transformations).

What you’ve done get’s over a huge bump by avoiding the need for museum web techs to actually do anything to their HTML (a lot of the time) – instead the hard work is done by whoever makes the templates. So this is a great kick-start. I wonder, though, whether once hoard.it has shown the community the benefits of client-side gathering by bookmarklets, or server-side spidering, the next step might be to try to settle on and evangelise some POSH conventions. My experiments were aimed at requiring a minumum – one could just indicate the existence of a better source of data on an object (perhaps the metadata in the head of a page, or some XML source) and the gatherer could grab that instead of the embedded POSH, and the plan was for it then to be able to accept either the POSH format OR something like CDWALite, SPECTRUM XML etc.

So, do you think there’s scope for POSH to push this further?