Scraping, scripting, hacking

I just finished my talk at Mashed Library 2009 – an event for librarians wanting to mash and mix their data. My talk was almost definitely a bit overwhelming, judging by the backchannel, so I thought I’d bang out a quick blog post to try and help those I managed to confuse.

My talk was entitled “Scraping, Scripting and Hacking your way to API-less data”, and was intended to give a high-level overview of some of the techniques that can be used to “get at data” on the web when the “nice” options of feeds and APIs aren’t available to you.

The context of the talk was this: almost everything we’re talking about with regard to mashups, visualisations and so on relies on data being available to us. In the cutting edge of Web2 apps, everything has got an API, a feed, a developer community. In the world of museums, libraries and government, this just isn’t the case. Data is usually held on-page as html (xhtml if we’re lucky), and programmatic access is nowhere to be found. If we want to use that data, we need to find other ways to get at it.
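
To make “getting at it” a bit more concrete, here is a minimal sketch of the simplest kind of scraping: pulling rows out of an HTML table using only Python’s standard library. The page fragment, the `TableScraper` class and the `scrape_table` helper are all made up for illustration (they aren’t from the talk), and real pages are usually far messier than this, so treat it as a starting point rather than a recipe.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a collection listing published only as HTML.
SAMPLE_HTML = """
<table id="objects">
  <tr><th>Title</th><th>Date</th></tr>
  <tr><td>Steam locomotive</td><td>1829</td></tr>
  <tr><td>Difference engine</td><td>1832</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Once the data is in a list of dictionaries like this, turning it into JSON, CSV or a feed is the easy part; the fragile bit is the parsing, which will break whenever the page layout changes.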

My slides are here:

[slideshare id=1690990&doc=scrapingscriptinghacking-090707060418-phpapp02]

A few people asked that I provide the URLs I mentioned together with a bit of context. Many of the slides above have links to examples, but here’s a simple list for those who’d prefer that:

Phew. Now I can see why it was slightly overwhelming 🙂

If you love something, set it free

Last week, I had the privilege of being asked to be one of the keynote speakers at a conference in Amsterdam called Kom je ook? (literally “Are you coming too?”), presented in English as “Heritage Upgrade”. It describes itself as “a symposium for cultural heritage institutions, theatres and museums”.

I was particularly excited about this one: first, my partner keynoters were Nina Simon (Museum Two) and Shelley Bernstein (Community Manager at the Brooklyn Museum) – both very well known and very well respected museum and social web people. Second (if I’m allowed to generalise): “I like the Dutch” – I like their attitude to new media, to innovation and to culture in general; and third – it looked like fun.

Nina talked about “The Participatory Museum” – in particular she focussed on an oft-forgotten point: the web isn’t social technology per se; it is just a particularly good tool for making social technology happen. The fact that the online medium allows you to track, access, publish and distribute is a good reason for using the web, BUT the fact that this happens to populate one space shouldn’t limit your thinking to that space, and shouldn’t alter the fact that this is always, always about people and the ways in which they come together. The changing focus of the museum, moving from being a content provider to being a platform provider, also rang true with me in so many ways. Nina rounded off with “ten tips for social technology” (slide 12 and onwards).

Shelley gave another excellent talk on the incredible work she is doing at the Brooklyn Museum. She and I shared a session on Web2 at Museums and the Web 2007, and once again it is the genuine enthusiasm and authenticity permeating everything she does that really come across. This isn’t “web2 for web2’s sake” – this is genuine, pithy, risky, real content from enthused audiences who really want to take part in the life of the museum.

My session was on setting your data and content free:

[slideshare id=768086&doc=mikeellisifyoulovesomethingsetitfreefinal-1227110930707512-9&w=425]

Hopefully the slides speak for themselves, but in a nutshell my argument is that although we’ve focussed heavily on the social aspects of Web2.0 from a user perspective, it is the stuff going on under the hood which really pushes the social web into new and exciting territory. It is the data sharing, the mashing, the APIs and the feeds which are at the heart of this new generation of web tools. We can resist the notion of free data by pretending that people use the web (and our sites) in a linear, controlled way, but the reality is we have fickle and intelligent users who will get to our content any which way. Given this, we can either push back against freer content by pretending we can lock it down, or – as I advocate – do what we can to give users access to it.