Are synapses intelligent?

It’s hard not to be fascinated by the emerging and developing conversations around museums and the Semantic Web. Museums, apart from anything else, have lots of stuff, and a constant problem finding ways of intelligently presenting and cross-linking that stuff. Search is ok if you know what you’re looking for but browse as an alternative is usually a terribly pedestrian experience, failing to match the serendipity and excitement you get in a physical exhibition or gallery.

During the Museums and the Web conference, there was a tangible thread of conversation and thought around the API’d museum, better ways of doing search, and varied opinions about openness and commerce, but always there was the endless tinnitus of the semantic web never far away from people’s consciousnesses.

As well as the ongoing conversation, there were some planned moments, among them a workshop run by Eric Miller (ex-W3C sem web guru), Ross Parry’s presentation and discussion of the “Cultural Semantic Web” AHRC-funded think tank, and the coolness of Open Calais being applied to museum collections data by Seb Chan at the Powerhouse (article on ReadWrite Web here – nice one Seb!).

During the week I also spent some time hanging out with George Oates and Aaron Straup Cope from Flickr, and it’s really from their experiences that some thoughts started to emerge which I’ve been massaging to the surface ever since.

Over a bunch of drinks, George told me a couple of fairly mind-blowing statistics about the quantity of data on Flickr: more than 2 billion images which are being uploaded at a rate of more than 3 million a day….

What comes with these uploads is data – huge, vast, obscene quantities of data – tags, users, comments, links. And that vat of information has a value which is hugely amplified because of the sheer volume of stuff.

To take an example: at the individual tag level, the flaws of misspellings and inaccuracies are annoying and troublesome, but at a meta level these inaccuracies are ironed out; flattened by sheer mass: a kind of bell-curve peak of correctness. At the same time, inferences can be drawn from the connections and proximity of tags. If the word “cat” appears consistently – in millions and millions of data items – next to the word “kitten” then the system can start to make some assumptions about the related meaning of those words. Out of the apparent chaos of the folksonomy – the lack of formal vocabulary, the anti-taxonomy – comes a higher-level order. Seb put it the other way round by talking about the “shanty towns” of museum data: “examine order and you see chaos”.

The total “value” of the data, in other words, really is way, way greater than the sum of the parts.
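The cat/kitten inference can be sketched in a few lines. This is only an illustration of counting tag co-occurrence, not a description of how Flickr does it: the function names and the sample data are made up, and a real system would work over millions of items rather than five.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(tagged_items):
    """Count how often each unordered pair of tags appears on the same item."""
    pairs = Counter()
    for tags in tagged_items:
        # Normalise case and drop duplicates within one item.
        unique = sorted({t.strip().lower() for t in tags if t.strip()})
        pairs.update(combinations(unique, 2))
    return pairs

def related(tag, pairs, top=3):
    """Return the tags seen most often alongside `tag`, most frequent first."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return [t for t, _ in scores.most_common(top)]

# Made-up sample data, standing in for a very large photo collection.
photos = [
    ["cat", "kitten", "cute"],
    ["Cat", "kitten"],
    ["cat", "kitten", "garden"],
    ["cat", "sofa"],
    ["dog", "garden"],
]
pairs = cooccurrence(photos)
print(related("cat", pairs, top=1))
```

With enough items, the odd misspelt or stray tag contributes a count of one or two and simply drops out of the top of the list, which is the "ironing out" effect described above.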

This is massively, almost inconceivably powerful. I talked with Aaron about how this might one day be released as a Flickr API: a way of querying the “clusters” in order to get further meaning from phrases or words submitted. He remained understandably tight-lipped about the future of Flickr, but conceptually this is an important idea, and leads the thinking in some interesting directions.

On the web, ideas like the wisdom of crowds or massively distributed systems are hardly new. We really is better than me.

I got thinking about how this can all be applied to the Semantic Web. It increasingly strikes me that the distributed nature of the machine-processable, API-accessible web carries many similar hallmarks. Each of those distributed systems – the Yahoo! Content Analysis API, the Google postcode lookup, Open Calais – is essentially a dumb system. But hook them together; start to patch the entire thing into a distributed framework, and things take on an entirely different complexion.

I’ve supped many beers with many people over “The Semantic Web”. Some have been hardcore RDF types – with whom I usually lose track at about paragraph three of our conversation, but stumble blindly on in true “just be confident, hopefully no-one will notice you don’t know what you’re talking about” style. Others have been more “like me” – in favour of the lightweight, top-down, “easy” approach. Many people I’ve talked to have simply not been given (or able to give) any good examples of what or why – and the enduring (by now slightly stinky, embarrassing and altogether fishy) albatross around the neck of anything SW is that no-one seems to be doing it in ways that anyone ~even vaguely normal~ can understand.

Here’s what I’m starting to gnaw at: maybe it’s here. Maybe if it quacks like a duck, walks like a duck (as per the recent Becta report by Emma Tonkin at UKOLN) then it really is a duck. Maybe the machine-processable web that we see in mashups, API’s, RSS, microformats – the so-called “lightweight” stuff that I’m forever writing about – maybe that’s all we need. Like the widely accepted notion of scale and we-ness in the social and tagged web, perhaps these dumb synapses when put together are enough to give us the collective intelligence – the Semantic Web – that we have talked and written about for so long.

Here’s a wonderful quote from Emma’s paper to finish:

“By ‘semantic’, Berners-Lee means nothing more than ‘machine processable’. The choice of nomenclature is a primary cause of confusion on both sides of the debate. It is unfortunate that the effort was not named ‘the machineprocessable web’ instead.”

Museums and the Web 2008: roundup

Ok. Obviously the intention was to live-blog the sessions I went to during Museums and the Web, but in the end it all comes down (unfortunately) to time, of which there simply isn’t enough (except when waiting for a damn plane). I’m working on an API using a RESTful approach to sort this out but I’m having trouble with the bending of spacetime and a glitch in vbscript which means you can’t get at the right bit of the EnergyEquivalence 2.0 DOM. Bear with me. Maybe it’s better in Ruby…

Anyway. Here are some highlights for me, in rough order of appearance:

No API? FOI…

Frankie Roberto (Science Museum) and Seb Chan (Powerhouse) gave a hugely entertaining and interesting talk within the topic area “Aggregating Museum Data”. David Bearman introduced it: “I’m not supposed to be biased, but this is my favourite session..”

Frankie’s approach is outlined in his paper, but briefly he asked the question “what if we look at the aggregate of museum collections instead of the detail?”. He got a bunch of data from several museums by submitting a Freedom Of Information request. There were some great moments: the matrix of which museums responded (most didn’t) was one of them; the final application display using Google Maps was another. But most of all he coined the phrase “Good Enough” around museum data, which is very much aligned with my philosophy of “just do it”.

Seb showed some awesome stuff using Open Calais on museum collections at the Powerhouse and a whole load of other cool stuff around GeoRSS, OpenSearch and so on. He also came out with some great sound-bites: “look closely at order and you see mess” and “tagging: it’s a bit 2007”…

One thing that I really liked was a checkbox he had built into the CMS next to machine-generated data which asked human editors: “has a human verified this data?”. A nice touch, and presumably useful not only for checking (in an aggregate sense) how accurate the machine has been but also for tweaking the final UI accordingly: “this data is machine generated, don’t trust it quite as much…”, or whatever.

Again, very interesting and eye-opening. Funny, too – I loved the fact that “Ray Oscilloscope” had been identified by the semantic engine as a person…it may become my new pseudonym…
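To make the “has a human verified this data?” idea concrete, here is a rough sketch of how such a flag might be stored and used. Nothing here reflects the Powerhouse’s actual CMS: the `MachineTag` class, the `correct` field (the editor’s verdict) and both helper functions are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MachineTag:
    label: str                      # text the entity extractor produced
    kind: str                       # e.g. "person" or "place"
    verified: bool = False          # has a human editor ticked the box?
    correct: Optional[bool] = None  # assumed extra field: the editor's verdict

def machine_accuracy(tags):
    """Share of human-checked tags the machine got right, or None if none are checked."""
    checked = [t for t in tags if t.verified]
    if not checked:
        return None
    return sum(1 for t in checked if t.correct) / len(checked)

def display_label(tag):
    """Return the text to show in the UI, or None to hide a rejected tag."""
    if tag.verified and tag.correct is False:
        return None
    if tag.verified:
        return tag.label
    return f"{tag.label} (machine generated, unverified)"

# Made-up sample records.
tags = [
    MachineTag("Ray Oscilloscope", "person", verified=True, correct=False),
    MachineTag("Sydney", "place", verified=True, correct=True),
    MachineTag("Montreal", "place"),
]
print(machine_accuracy(tags))
print(display_label(tags[2]))
```

The same flag does both jobs mentioned above: the aggregate check on the machine, and the “don’t trust it quite as much” caveat in the interface.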

Openness

On Friday, Brian and I ran a session on Openness. The people at the session were great: It was a lively and engaging debate, looking at some of the questions around openness in the museum community; how we measure value; how financial gain can be held up next to marketing exposure and so on. Seb made a great point which stayed with me about how museums have got into the habit of ascribing value to individual objects rather than to the bit which actually adds value: the context, the exhibition, the experience.

Search and semantics

Two more sessions stood out for me: first, the NMOLP presentation from the V&A in London. I have a number of concerns about the general approach this project is taking, but on the plus side they’re looking at OpenSearch to deliver cross-museum searching, and that’s (hopefully) going to be a good thing. I just hope that the Google Coop example I put up at http://www.museumcollections.org.uk/ a while back can be beaten: the point of me doing this was partly to illustrate the ease with which groups of museums can be added to cross-domain search. I’m worried about NMOLP developing their own search ranking protocol, for example, when there’s a pretty good one out there in the shape of PageRank and the Google Enterprise. I’m sure they know what they’re doing, and look forward very much to the end result. Let’s hope it’s got a public API 🙂

Nate did a rather better post on this session over here with some interesting comments, too.
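For anyone wondering why OpenSearch makes cross-museum searching relatively easy: each site publishes a URL template, and a client only has to fill in the blanks. Below is a minimal sketch of filling an OpenSearch 1.1 style template, where `{searchTerms}` is required and a trailing `?` marks a parameter as optional. The two museum endpoints are hypothetical (example.org addresses), and this says nothing about how NMOLP is actually built.

```python
import re
from urllib.parse import quote_plus

def fill_template(template, search_terms, **optional):
    """Fill an OpenSearch-style URL template such as '...?q={searchTerms}&p={startPage?}'."""
    values = {"searchTerms": search_terms, **optional}

    def substitute(match):
        name, is_optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote_plus(str(values[name]))
        if is_optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)

# Hypothetical endpoints, for illustration only.
templates = {
    "Museum A": "http://museum-a.example.org/search?q={searchTerms}&page={startPage?}",
    "Museum B": "http://museum-b.example.org/find?terms={searchTerms}&format=rss",
}
urls = {name: fill_template(t, "steam engine", startPage=1) for name, t in templates.items()}
for name, url in urls.items():
    print(name, url)
```

Adding another museum to the search is then a matter of adding one more template to the dictionary, which is the “ease of adding groups of museums” point in a nutshell.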

The final one I’m going to post about here is a session on the Delphi Toolkit which was great because it illustrated with real world examples what these kinds of emerging semantic technologies do for the end-user. And I think the SW is an area badly lacking in examples.

Closing Plenary

The whole conference closed with what I thought was a very disappointing plenary from Clifford Lynch. Obviously only a personal opinion, but I felt that after a hugely positive, buzzing and engaging week, this was a very slow, low-energy and – most importantly – misrepresentative wrap-up to what had gone on. (I also felt at several points that he was just plain wrong about some of the stuff he talked about…)

Here’s my “direction of travel” gut feel for what actually went on during the week:

  • We’re doing some very cool stuff using some great new approaches and technologies.
  • We’re starting to see the benefits of open access to our content, both in terms of Creative Commons and programmatic access via API’s or syndication.
  • We’re – at last – worrying less and doing more.
  • We’re beginning to see the benefits of community, not just the coolness.
  • Finally: we’re up for collaborating and sharing in more open and positive ways than ever before.

So that’s that. Now I’m in an airport, heading homeward. Bye for now…

Happy Birthday, Electronic Museum…

[image: //www.flickr.com/photos/diongillard/2402287771/]

12th April 2008 was the Electronic Museum’s 1st birthday. All together now, hip-hip…etc.

I started blogging at the beginning of the Museums and the Web conference on April 12th 2007. Even then I was seriously late into the game: many other bloggers have been posting since 2000 or even earlier. I’d held off for 3 main reasons:

First, if you haven’t blogged then you don’t understand the drug that it is. I started never really intending to continue, but sucked the smoke deep and never looked back.

Second, I genuinely felt (and still feel!) that there are issues within the tech and museum sectors that need attention: in other words, I have something to say. Hopefully you agree…

Finally, I spent at least a year being put off by the technology. Like many first bloggers I began with Blogger.com. Back then (and I haven’t looked recently), it was a clumsy tool; basic functionality (actually, I have to say, blocking technology), bad templates, etc. I dabbled with TypePad (ouch, unless you’ve got a degree in CompSci) and then finally settled on WordPress. It is a genius bit of user-centredness and like many great bits of tech, actually encourages you to step in and use it.

In case you’re interested and are a bit of a stat-head: just the other day I passed the 20,000 views mark; I have written 135 posts which have gathered 307 comments. Scarily, I’ve had more rogue comments caught by Akismet than visits: currently 22,263.

So. Just remains for me to say thank you for reading and commenting. Please keep doing so, and feel free to let me know whether there’s stuff you’d like me to focus on (or not focus on). It’ll be interesting to see what the coming year brings 🙂

(Thanks to diongillard for the image)

API: “the nubby bits on Lego”

Aaron Cope from Flickr gave a good talk this morning entitled “The API As Curator” which meandered its way around but contained some gem quotes and ideas:

“once upon a time I was a painter, and then the web happened”

“you do art to share it”

“the web: it seemed a perfect way around the gallery system, which as an artist is the bane of your life”

“I come in peace”

“making the web’s plumbing non-scary”

“if you’re talking about the web then sooner or later you have to talk about computer programming”

and my favourite one of all:

“An API is the nubby bits on Lego”

He focused in general on the importance of both the developer as a permanent and valued member of any creative web team and the process of development itself: iterative, always changing, rapid-cycle. He stressed how important this is to anyone trying to remain innovative and creative online.

But “nubby bits” is still the piece that stays with me 🙂

Museums and the Web day 3 (or day 1..)

Ok. It’s opening plenary time here at Museums and the Web 2008. I didn’t manage to do any blogging yesterday – that’s what an entire day of workshops followed by immediate dinner and wine does to you…

Michael Geist is the guest speaker: “technology advocate and trouble maker”. I like him already 🙂

Michael spent his talk going through a number of sites and examples, some of which will be very familiar to us web types; others a little less well known. The examples which particularly jumped out for me (for two different reasons) were the Facebook group Fair Copyright for Canada which was started by Michael, and his example of opening up the book “In the Public Interest” for free download.

The Facebook group example was particularly powerful because it caused demonstrable change in the real world. This was actually a running thread through many of the sites that Michael showed: virtual experiences are one thing, but “real” world responses to these virtual experiences are happening too, and that’s a hugely important thing to focus on. I’ve used this to defend Twitter recently (yes, I know the irony, having said bad things about lifestreaming before…) – Twitter has recently got me back in touch with people out here in the real world, and that gives it a legitimacy and power that it doesn’t necessarily have “just” online.

The “In the Public Interest” example demonstrated (although Michael didn’t give any actual figures) that free download actually increased sales. I like this because it continues to support the Scarcity vs Scale argument which I’ve pitched on this blog previously. It’s a very pertinent discussion; Brian and I are giving a paper on Openness on Friday at which we’ll be focusing on open content (among other things). Already this week – and in my experience, always within the sector – this discussion rumbles alongside most things we try to do on the web: API provision, Web 2.0, UGC or getting collections databases online. The more evidence there is that this approach works (or not!), the better.

The overriding message from Michael for me is that online activity causes, extends, pushes “real” activity in very valuable and increasingly tangible ways.

Museums and the Web – Tuesday

So here I am in Montreal for Museums and the Web 2008. The journey was ok apart from the obligatory 2 hour delay out of Heathrow. Someone apparently spotted a snowflake on the runway so everything ground to a halt while they dispatched the emergency extreme weather squad to sort it out.

They know how to do weather over here. It’s obviously not snowed for a while but there are still remnant piles, 6-7 feet thick just knocking around the town. Show that to anyone in the UK and the transport infrastructure would have fallen apart in seconds.

So – this week at Museums and the Web: Today – pre conference Semantic Web workshop. Wednesday, I’m running a blogging workshop with Brian Kelly in which I’ll be talking about this blog: why I do it, how it’s going, what I’ve learnt. The afternoon is my workshop on mashups. Slides and stuff for all the above coming shortly.

Then Thursday the conference sessions start. Friday and I’m back in front of people with Brian for our paper ‘what does openness mean to museums?’.

Meanwhile, I’ve provided Jennifer and David with OneTag for the week – the aim in a nutshell is to try and capture the ‘buzz’ around the conference by aggregating any blog posts and tweets tagged ‘mw2008’ and do stuff with this content. J + D have found a bunch of willing volunteers to blog alongside the people like me who’d be doing it anyway. Basically, everyone is being encouraged to tag and post as much as possible.
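The aggregation idea boils down to pulling items from several sources into one timeline and then slicing it in different ways. The sketch below is not OneTag’s code; the `Item` class, the function names and the sample items are all invented to show the general shape.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from itertools import chain

@dataclass(frozen=True)
class Item:
    published: datetime
    source: str   # e.g. "blog" or "twitter"
    author: str
    text: str

def timeline(*feeds):
    """Merge any number of feeds into a single newest-first list."""
    return sorted(chain.from_iterable(feeds), key=lambda item: item.published, reverse=True)

def busiest_authors(items, top=3):
    """One possible 'view': who is posting the most about the event."""
    return Counter(item.author for item in items).most_common(top)

# Made-up sample items.
blog_posts = [
    Item(datetime(2008, 4, 10, 9, 30), "blog", "alice", "Opening plenary notes"),
    Item(datetime(2008, 4, 11, 14, 0), "blog", "bob", "Thoughts on openness"),
]
tweets = [
    Item(datetime(2008, 4, 10, 9, 45), "twitter", "alice", "Great quote in the plenary #mw2008"),
]
merged = timeline(blog_posts, tweets)
for item in merged:
    print(item.published.isoformat(), item.source, item.author, item.text)
```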

Have a look at:

More later.

Introducing OneTag

You might have noticed I’ve been a bit quiet on the blog front for the last couple of weeks. This is because I’m having a drive to send some ideas partying and have therefore been knee-deep coding my latest project most evenings.

I’ve put together an idea for people who run conferences or events. It’s called OneTag (www.onetag.org). It’s very simple conceptually, although as I’m discovering, a complete *dog* to code… The idea is that it aggregates all the “buzz” about a particular (live) event and then provides the means to view this in different ways. Find out more at http://www.onetag.org/ot/about.asp.

Usual “it’s a beta” disclaimers apply…

I’ve agreed with David Bearman and Jennifer Trant that I’ll be trialling the system during the Museums and the Web 2008 conference in Montreal.

I need your help…

First off, if you’re going to the conference and intend to blog, twitter or upload any photos then the global tag follows the same pattern as previous years and is therefore mw2008. If you’re blogging then just add this as a tag or category; if you’re twittering then please use the hashtag #mw2008 as part of your tweet.

Second, if you’re the owner of a blog or other social networking site, will be blogging about the conference and have feed addresses you can supply me with, then let me know in the comments or via email and I’ll add these to the OneTag aggregator.

Finally, if you’d like to get access to the mw2008 OneTag feeds and views to help me test them then do feel free to get in touch – again, via email if you know it or using the comments to this post. Alternatively, tweet me direct at http://twitter.com/dmje.
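Deciding whether a given post or tweet belongs to the event is the simplest part of the job. Here is a small sketch of one way to do it, matching either a blog tag/category or a hashtag in the tweet text. The function names are hypothetical, and this is not necessarily how OneTag itself matches things.

```python
import re

HASHTAG = re.compile(r"#(\w+)")

def tweet_tags(text):
    """Return the lower-cased hashtags found in a piece of text."""
    return {tag.lower() for tag in HASHTAG.findall(text)}

def is_tagged(event_tag, *, text="", categories=()):
    """True if the hashtags in `text` or the given categories include the event tag."""
    wanted = event_tag.lower().lstrip("#")
    labels = {c.strip().lower() for c in categories}
    return wanted in labels or wanted in tweet_tags(text)

print(is_tagged("mw2008", text="Loving the plenary #MW2008"))
print(is_tagged("mw2008", categories=["Conferences", "mw2008"]))
print(is_tagged("mw2008", text="mw2008 mentioned without the hash"))
```

Matching on whole hashtags (rather than searching for the bare string) avoids picking up lookalikes such as #mw2008montreal, at the cost of missing tweets where people forget the hash.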

I’m at the stage where as many critical eyes as possible are going to help muchly..

Thanks in advance!

Scarcity vs scale

I’ve been finishing off the openness paper this week (taking me a long time to get my ideas together at the mo..) and doing some thinking around how you manage to still make money in this brave new world of free, open, readily available everything. Actually, let’s not call it making money but creating value, either in a financial or social sense.

Ian Rogers (Yahoo), who had posted before about the music industry tendency to ostrich the very obvious problems of their industry (today highlighted by the EMI news of 2,000 redundancies) has written a looooooong but very insightful post about where it all goes from here.

The article is really worth taking the time to read in its entirety, but the bit which really caught my eye and got me thinking in terms of the whole commercial – value – assets – openness debate was the opening phrase, and title of the presentation:

“Losers wish for scarcity. Winners leverage scale”

Think about the importance of what is being said here for a minute: In the traditional world of marketing, selling, commerce, the value of something is largely determined by scarcity. This is still the way of the [physical] world in many ways today. We buy diamonds because they are rare; we phone a plumber because he has unique skills and knows how to fix the boiler better than we do; we go to museums to see things which we can’t see anywhere else.

The problem that the music industry has – and the cultural sector – is that once you move these endeavours online the entire equation changes shape, radically.

Whereas Amazon or other retailers with “real” product sit on top of the pile by increasing value both by leveraging scale (number of visitors buying books increasing incrementally as traffic increases) and scarcity (they are the ones who ship the books, which are themselves a product, and hence valuable by their scarcity..), the ones who have to think harder are those who have content as their product. That’s EMI, iTunes, The Guardian…and museums. Why? Because as soon as you put something on the web, it can be (and is) duplicated, copied, downloaded, mashed and borrowed…

To date, the general response to opening museum content up – and yes, *gasp*, maybe making it free – has been, understandably, “er. what? our images [other stuff] – free? certainly not”.

Let’s unpick this a little bit more. Instead of free, substitute “more free” – think about museums actively encouraging people to “borrow” images with “embed this in your blog/myspace page” links next to any assets displayed on page. This is effectively what web browsers (and certainly Google Images) do anyway, with a simple right click / copy-paste. Extend it and you’ve got an API model – “use this content on your website”. We as a sector know very very well that this happens already – I’ve talked before about the 9% referral figure we used to get on the Science Museum website from MySpace: all from embedded images. The point here is that people are doing this already whether we like it or not.
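An “embed this” link is not much more than a ready-made chunk of HTML that carries a link back and a credit with it. A minimal sketch of generating one is below; the function name, the markup and the example.org URLs are all assumptions for illustration rather than anything a particular museum site does.

```python
from html import escape

def embed_snippet(image_url, page_url, title, credit):
    """Build a copy-and-paste HTML snippet that shows an image and links back to its source page."""
    return (
        f'<a href="{escape(page_url)}">'
        f'<img src="{escape(image_url)}" alt="{escape(title)}">'
        f"</a><br>{escape(title)}, {escape(credit)}"
    )

# Hypothetical object record.
snippet = embed_snippet(
    image_url="http://museum.example.org/images/123.jpg",
    page_url="http://museum.example.org/objects/123",
    title='Stephenson\'s "Rocket"',
    credit="Example Museum",
)
print(snippet)
```

The design point is that the borrowed image arrives with its link and credit attached, so the museum gets the referral and the attribution it would otherwise lose to a plain right-click and copy.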

This is a limited example, but the point is that some kind of disruption is required to make this new market work for us. In the music industry, companies like Amie Street are breaking valuable new ground by defining new business models for (music) content. In this example, music tracks start off cheap/free and get more expensive the more they are downloaded. It’s a brilliant and highly respected model.
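The “starts cheap/free and gets more expensive the more it is downloaded” model is easy to express as a function. The sketch below is purely illustrative: the thresholds, step size and price ceiling are invented numbers, not Amie Street’s actual pricing rules.

```python
def track_price_cents(downloads, free_until=100, step_every=50, step_cents=5, ceiling_cents=100):
    """Price in cents of a track that starts free and rises in steps as downloads accumulate.

    All default values are made up for illustration.
    """
    if downloads < 0:
        raise ValueError("downloads cannot be negative")
    if downloads < free_until:
        return 0
    # One step at the threshold, then one more for every `step_every` further downloads.
    steps = 1 + (downloads - free_until) // step_every
    return min(ceiling_cents, steps * step_cents)

for n in (0, 99, 100, 150, 1000, 5000):
    print(n, track_price_cents(n))
```

Popularity, not scarcity, sets the price here: an unknown track costs nothing to try, and demand itself pushes the value up until it hits the cap.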

For museums, one of the first barriers to overcome is understanding what value the long tail has – when no museums carry on-page advertising (correct me if you know of one), we’re hard pushed to ascribe value to a page view. We’re still as a sector struggling with the basic notions of how to measure success, let alone confident enough to suggest that the commercial models we have might be wrong, or at least flawed.

I’m not (at the moment!) suggesting that we should close all our picture library operations: what I am saying is that the historical tendency to be closed, guarded and scarce simply doesn’t work. It’s not just that users “abuse” this already (and we’d be spending a LOT of futile time trying to prevent the MySpace 9% from embedding our images), it’s that there is something really rather important going on here. Museums – we’ve said it before – are completely at home in the long tail.

Somewhere along the line we’ll understand the importance of embracing rather than denying the proliferation of copying, pasting, borrowing. To get there we need to be better at understanding what value is, and that’s hard.

[* image: curious 1950s bloke outside Bath Habitat with a wicked foot-cranked guitar-playing machine and violin. Bet he hasn’t considered music rights.]

The progress of content

I’m just helping Brian Kelly author a paper on Openness in Museums for the Museums and the Web conference later in the year. It just struck me that the movement of content around the web has followed / is following a pattern a little bit like this:

Phase I: content held as HTML within sites. Little or no interoperability. Content mostly viewed “on site”

Phase II: content held as XHTML within sites. Better markup means better SEO. Better SEO means that content starts to find its way out to the wider web

Phase III: content held as XHTML but also key bits of content (news in particular) syndicated out via RSS

Phase IV: content held as XHTML/XML; key segments syndicated via RSS (and some RDF) but additional movement of data via some “islands” of additional functionality such as API’s.

Phase V: content held as XHTML/XML, some/all syndicated via RSS, RDF, API’s but additional standards (OAuth, OpenSearch, Microformats etc) begin to ensure further interoperability between disparate sites

It’s a bit of a brain dump and please feel free to take it apart in the comments, but I thought I’d share it with you 🙂

I’d say most big commercial sites are firmly at Phase III but moving towards IV; museums are mostly at Phase II but moving (slowly!) towards Phase III…
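For a sense of how small the step from Phase II to Phase III can be, here is a minimal sketch that turns a list of news items into an RSS 2.0 feed using only Python’s standard library. The `build_rss` function, the dictionary keys and the example.org URLs are assumptions for illustration; a real site would pull the entries from its CMS.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def build_rss(title, link, description, entries):
    """Serialise a channel and its entries as an RSS 2.0 document (returned as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["published"])
    return ET.tostring(rss, encoding="unicode")

# Made-up news item.
feed = build_rss(
    title="Example Museum news",
    link="http://museum.example.org/",
    description="Latest news from the Example Museum",
    entries=[
        {
            "title": "New gallery opens",
            "link": "http://museum.example.org/news/new-gallery",
            "published": datetime(2008, 4, 10, 9, 0, tzinfo=timezone.utc),
        }
    ],
)
print(feed)
```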