What to do about Facebook?

Ah, Facebook.

On the one hand:

…this is the single most dynamic, engaged and engaging platform for user generated content that there has ever been. 500 million people, converging on a single web application. Wait, read that again – 500 MILLION people. That’s a noticeable chunk of the entire global population.

That’s a totally, utterly and completely insane amount of user penetration. And when you use it (I don’t, much, but I watch my wife and her friends and I dip in to see what is happening every so often) – it’s obvious why. Facebook is slick, it’s user-focused and it’s all about the connections. Critical mass + friends + photos? Of course it works.

On the other hand:

…Facebook is regularly cited – actually, cited is way too gentle a word – screamed about – for being EVIL. Reasons vary, but they tend to focus on what is seen as a hugely lax approach to privacy. Actually, it’s layer upon layer of laxness – from totally baffling privacy controls to requiring a PhD to delete your account to the latest “facetracking by default” functionality. It’s a general “don’t give a ****” thing, it appears.

When it isn’t privacy, it’s concerns about “domination of the web” (particularly now things such as the Like Button are out in the wild) or how closed their so-called “Open Graph” is in reality, or the possibility that Zuckerberg did something wrong once or – well, go read “10 reasons to delete your Facebook account” for some more.

And here’s the tension, beautifully summed up by Jason Scott in a stunningly entertaining rant about Facebook. Cover your body organs if you’re of a sensitive disposition:

People aren’t just eating Facebook’s Shit Sherbet of overnight upgrades, of lack of guarantees and standards, of enveloping tendrils of web standard breaking. They are shoveling it down. They’re grabbing two crazy handfuls of Facebook every minute of every day when they’re not forced to walk down a hallway or look up from their phones or ipads or laptops or consoles. They’re grabbing buckets of Facebook and finding ways to shove it down with one hand while pawing around for a second bucket.  People have bought the fuck in.

So what to make of this? For someone like me – a generalist who straddles two very different groups of people – the tension is very often felt. I have geeks in one ear talking about open standards, pushing for privacy controls and hoping upon hope that the Semantic Web will get here one day. In the other ear I hear people who couldn’t give a monkeys about open standards, probably have “password” as their password, and seem remarkably relaxed about posting pictures of themselves hunched over a bucket bong. With these people there’s no denying the pleasure, the engagement, the rich content and the opportunities that Facebook offers.

One thing seems sure: rant as much as you like, but there’s no escaping. Facebook – in fact, big companies of all sizes – will dominate the internet landscape for a long time to come, and they’ll always find success. There’s a reason why there is no alternative to Twitter, no alternative to Google and no alternative to Facebook: these are the places where everyone goes. It’s horrible, and uncomfortable, and we all wish people weren’t so terribly dumb, but the fact of the matter is – people choose social, and they do it at the expense of – well, lots of things: privacy, openness, safety. The utility of these tools is easy to underestimate in the general scheme of things, especially if you’re a geek – but utility, ease, sociability are the non-geek world’s open standards, the defining shape of their lives.

This seems to be the sting in the tail of large-scale social web activity. In order for it to be compelling, it requires a large social graph. In order for a large social graph to work, you normally need a big company or concern behind the scenes. Where there’s a big company, there’s money. Where there’s money, ethics almost always start being eroded. Bang.

I don’t think anyone should be under any illusions that everyone is going to delete their Facebook account (when they can work out how) just yet. Learning, awareness, going into this with eyes (yours, or your institution’s) open seems to be the only possible answer to the question in the title. A moderate, not polarised approach.

Clearly I’m getting old.

The Brooklyn Museum API – Q&A with Shelley Bernstein and Paul Beaudoin

The concept and importance of museum-based APIs are notions that I’ve written about consistently (boringly, probably) both on this blog and elsewhere on the web. Programmatic and open access to data is – IMO – absolutely key to ensuring the long-term success of online collections.

Many conversations have been going on about how to make APIs happen over the last couple of years, and I think we’re finally seeing these conversations move away from niche groups of enthusiastic developers (e.g. Mashed Museum) into a more mainstream debate which also involves budget holders and strategists. These conversations have been aided by metrics from social media sites like Twitter which indicate that API access figures sometimes outstrip “normal web” browsing by a factor of 10 or more.

On March 4th 2009, Brooklyn Museum announced the launch of their API, the latest in a series of developments around their online collection. Brooklyn occupies a space which generates a fair amount of awe in museum web circles: Shelley Bernstein and team are always several steps in front of the curve – innovating rapidly, encouraging a “just do it” attitude, and most importantly, engaging wholly with a totally committed tribe of users. Many other museums try to do social media. Brooklyn lives social media.

So, as they say – without further ado – here’s Shelley and Paul talking about what they did, how they did it, and why.

Q: First and foremost, could you please introduce yourselves – what your main roles and responsibilities are and how you fit within the museum.

Shelley Bernstein, Chief of Technology. I manage the department that runs the Museum’s helpdesk, Network Administration, Website, gallery technology, and social media.

Paul Beaudoin, Programmer. I push data around on the back-end and build website features and internal tools.

Q: Can you explain in as non-technical language as possible what exactly the Brooklyn API is, and what it lets people do?

SB: It’s basically a way outside programmers can query our Collections data and create their own applications using it.

Q: Why did you decide to build an API? What are the main things you hope to achieve …and what about those age old “social web” problems like authority, value and so-on?

SB: First, practical… in the past we’d been asked to be a part of larger projects where institutions were trying to aggregate data across many collections (like d*hub). At the time, we couldn’t justify allocating the time to provide data sets which would become stale as fast as we could turn over the data. By developing the API, we can create this one thing that will work for many people so it no longer becomes a project every time we are asked to take part.

Second, community… the developer community is not one we’d worked with before. We’d recently had exposure to the indicommons community at the Flickr Commons and had seen developers like David Wilkinson do some great things with our data there. It’s been a very positive experience and one we wanted to carry forward into our Collection, not just the materials we are posting to The Commons.

Third, community+practical… I think we needed to recognize that ideas about our data can come from anywhere, and encourage outside partnerships. We should recognize that programmers from outside the organization will have skills and ideas that we don’t have internally and encourage everyone to use them with our data if they want to. When they do, we want to make sure we get them the credit they deserve by pointing our visitors to their sites so they get some exposure for their efforts.

Q: How have you built it? (Both from a technical and a project perspective: what platform, backend systems, relationship to collections management / website; also how long has it taken, and how have you run the project?)

PB: The API sits on top of our existing “OpenCollection” code (no relation to namesake at http://www.collectiveaccess.org) which we developed about a year ago. OpenCollection is a set of PHP classes sitting on top of a MySQL database, which contains all of the object data that’s been approved for Web.

All that data originates in our internal collections management systems and digital asset systems. SSIS scripts run nightly to identify approved data and images and push them to our FreeBSD servers for processing. We have several internal workflow tools that also contribute assets like labels, press releases, videos, podcasts, and custom-cropped thumbnails. A series of BASH and PHP scripts merge the data from the various sources and generate new derivatives as required (ImageMagick). Once compiled, new collection database dumps and images are pushed out to the Web servers overnight. Everything is scheduled to run automatically so new data and images approved on Monday will be available in the wee hours Tuesday.
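An aside from me rather than from Paul: the "merge the data from the various sources" step is easier to picture with a little code. What follows is only a minimal sketch in Python (the real pipeline is BASH and PHP, and I haven't seen it). The field names `object_id` and `approved`, and both function names, are my own assumptions for illustration.

```python
def merge_sources(*sources):
    """Merge records from several sources into one record per object ID.

    Each source is a list of dicts carrying a hypothetical "object_id" key.
    Scalar fields from later sources overwrite earlier ones; list fields
    (labels, videos and so on) are concatenated.
    """
    merged = {}
    for source in sources:
        for record in source:
            object_id = record["object_id"]
            target = merged.setdefault(object_id, {"object_id": object_id})
            for key, value in record.items():
                if key == "object_id":
                    continue
                if isinstance(value, list):
                    target.setdefault(key, []).extend(value)
                else:
                    target[key] = value
    return merged


def build_web_dump(collections, *extras):
    """Keep only objects flagged as approved in the collections data,
    enriched with whatever the other sources contribute."""
    approved_ids = {r["object_id"] for r in collections if r.get("approved")}
    merged = merge_sources(collections, *extras)
    return {oid: rec for oid, rec in merged.items() if oid in approved_ids}
```

The point of the sketch is the ordering: approval is decided once, in the collections data, and everything else is layered on top, so an unapproved object never reaches the web dump however many videos or labels it has.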

The API itself took about four weeks to build and document (documentation may have consumed the better part of that). But that seems like a misleading figure because so much of the API piggy-backs on our existing codebase. OpenCollection itself – and all of the data flow scripts that support it – took many months to build.

Cool diagrams. Every desk should have some.

Q: How did you go about communicating the benefits of an API to internal stakeholders?

SB: Ha, well we used your hoard.it website as an example of what can happen if we don’t! The general discussion centered around how we can work with the community and develop a way people can do this under our own terms, the alternative being that people are likely to do what they want anyway. We’d rather work with, than against. It also helped us immensely that an API had been released by DigitalNZ, so we had an example out there that we could follow.

Q: It’s obviously early days, but how much interest and take-up have you had? How much are you anticipating?

SB: We are not expecting a ton, but we’ve already seen a lot of creativity flowing which you can check out in our Application Gallery. We already know of a few things brewing that are really exciting. And Luke over at the Powerhouse is working on getting our data into d*hub already, so stay tuned.

Q: Can you give us some indication of the budget – at least ballpark, or as a % compared to your annual operating budget for the website?

SB: There was no budget specifically assigned to this project. We had an opening of time where we thought we could slot in the development and took it. Moving forward, we will make changes to the API and add features as time can be allocated, but it will often need to be secondary to other projects we need to accomplish.

Q: How are you dealing with rights issues?

SB: Anything that is under copyright is being delivered at a very small thumbnail size (100px wide on the longest side) for identification purposes only.
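Another aside from me: the thumbnail rule is simple arithmetic, and a few lines of Python make it concrete. This is a sketch of the sizing calculation only, assuming the limit applies to whichever edge is longer and that small images are never scaled up. The function name and the no-upscaling choice are my assumptions, not Brooklyn's code.

```python
def thumbnail_size(width, height, longest_side=100):
    """Return (width, height) scaled so the longer edge is at most
    longest_side pixels, preserving the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= longest_side:
        return width, height  # assumption: never upscale small images
    scale = longest_side / longest
    # Clamp to 1px so very thin images don't collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 3000 by 2000 pixel image comes out at 100 by 67, which is enough to identify an object and not much use for anything else.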

Q: What restrictions do you place on users when accessing, displaying and otherwise using your data?

SB: I’m not even going to attempt to summarize this one. Here’s the Terms of Service – everyone go get a good cup of coffee before settling down with it.

Q: You chose a particular approach (REST) to expose your collections. Could you talk a bit about the technical options you considered before coming to this solution, and why you preferred REST to these others?

PB: Actually it’s been pointed out that our API isn’t perfectly RESTful, so let me say first that, humbly, we consider our API REST-inspired at best. I’ve long been a fan of REST and tend to gravitate to it in principle. But when it comes down to it, development time and ease of use are the top concerns.

At the time the API was spec’ed we decided it was more important to build something that someone could jump right into than something meeting some aesthetic ideal. Of course those aren’t mutually exclusive goals if you have all the dev time in the world, but we don’t. So we thought about our users and looked to the APIs that seemed to be getting the most play (Flickr, DigitalNZ, and many Google projects come to mind) and borrowed aspects we thought worked (API keys, mindful use of HTTP verbs, simple query parameters) and left out the things we thought were extraneous or personally inappropriate (complicated session management, multiple script gateways). The result is, I think, a lightweight API with very few rules and pretty accommodating responses. You don’t have to know what an XSD is to jump in.
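For the non-developers, an aside from me: "API keys plus simple query parameters" means a request is little more than a URL. Here is a minimal Python sketch of that style. To be loud about it, the base URL, the `objects` resource and the parameter names are all hypothetical placeholders of mine and not the real Brooklyn Museum API; check their documentation for the actual ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.org/v1"


def build_query_url(resource, api_key, **params):
    """Build a GET URL from a resource name, an API key and plain
    key=value query parameters. Parameters set to None are dropped."""
    query = {"api_key": api_key}
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


url = build_query_url("objects", "DEMO_KEY", keyword="blue vase", limit=5)
print(url)
```

No sessions, no envelopes, no schema to read first: a key to identify yourself and a handful of parameters to say what you want.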

Q: What advice would you give to other museums / institutions wanting to follow the API path?

SB: You mean other than “do it” <insert grin here>? No, really, if it’s right for the institution and their goals, they should consider it. Look to the DigitalNZ project and read this interview with their team (we did and it inspired us). Try and not stress over making it perfect first time out, just try and see what it yields…then adjust as you go along. Obviously, the more institutions that can open their data in this way, the richer the applications can become.


Many, many thanks to Shelley and Paul for putting in the time to answer my questions. You can follow the development of the Brooklyn Museum collections and API over on their blog, or by following @brooklynmuseum on Twitter. More importantly, go build something cool 🙂

(Selling) content in a networked age

I’m just back from Torquay where I’d been asked to speak at the 32nd annual UKSG conference. I first came across UKSG more than a year ago when they asked me to speak at a London workshop they were hosting. Back then, I did a general overview of APIs from a non-technical perspective.

This time around, my presentation was about opening up access to content: the title “If you love your content, set it free?” builds on some previous themes I’ve talked and written about. Presenting on “setting content free” to a room of librarians and publishers is always likely to be difficult. Both groups are – either directly or indirectly – dependent on income from published works. I’m also neither publisher nor librarian, and although I spent some time working for Waterstone’s Online and know bits about the book trade, my knowledge is undoubtedly hopelessly out of date.

Actually, I had two very receptive crowds (thank you for coming if you were there!) and some really interesting debate around the whole notion of value, scarcity and network effects.

Like any sector, publishers and librarians have their own language, their own agendas and their own histories of successes and failures. Also like any sector, they are often challenged to spend time thinking about the bigger picture. Day jobs are about rights and DRM, OPAC and tenure. They aren’t (usually) about user experience, big-picture strategy or considering and comparing approaches from other sectors.

What I wanted to do with the presentation was to look at some of the big challenges which face (commercial) material in the networked world by thinking a bit more holistically about people’s relationship with that content, and the modes of use that they apply to the stuff that they acquire via this networked environment.

The – granted, rather challenging – title of the presentation is actually a question cunningly disguised as a statement. Or maybe it’s a statement cunningly disguised as a question. I lost track. The thing I was trying to do with this questatement (and some people missed this, more fool me for being too subtle) was to say: “Look, here’s how many people are talking about content now: they’re making it free and available; they’re encouraging re-use; they’re providing free and open APIs. They’re understanding that users are fickle, content-hungry and often unfussy about the origin of that content. What, exactly, do we do in an environment like this? What are the strategies that might serve us best? Can we still sell stuff, and if so, how?”

The wider proposition (that content fares rather better when it is freed on the network than when it is tethered and locked down) is a source of fairly passionate debate. I’ve written extensively about Paulo Coelho’s experiments in freeing his books, about APIs, about “copywrong”, about value, authority and authenticity. The suggestion that if you free it up you will see more cultural capital is starting to be established in museums and galleries. The suggestion that you might, just might, increase your financial capital by opening up is for the most part considered PREPOSTEROUS to publishers. Giving away PDFs increases book sales? Outrageous. Apart from the only example I’ve actually seen documented, of course, which is Coelho’s, and that seems to indicate a completely different story.

There are fine – and all the finer the closer you examine them – levels of detail. Yes, an academic market is vastly different from a popular one: you don’t have the scale of the crowd, the articles are used in different ways, the works are generally shorter, the audiences worlds apart. But nonetheless, Clay Shirky’s robust (if deeply depressing) angle on the future – sorry, lack of future – of the newspaper industry needs close examination in any content-rich sector. I don’t think anyone can deny the core proposition he holds up: that the problems that (newspaper) publishing solves (printing, marketing and distribution) are no longer problems in the networked age. I don’t think that what he’s saying is that we won’t have newspapers in the future, and he’s definitely not saying that we won’t need journalists. What he is saying – and this was the angle I focused on in my slides – is that this change is akin to living through a revolution. And with this revolution needs to come revolutionary responses and understanding that the change is far bigger and more profound than almost anyone can anticipate. The open API is one such response (The Guardian “Open Platform” being an apposite example). Free PDFs / paid books is another. Music streaming and the killing of DRM is another.

Revolutions are uncomfortable. The wholesale examination of an entire industry is horrifically uncomfortable. Just take a look at the music business and you’ll see a group of deeply unhappy executives sitting around the ashes of a big pile of CDs as they mourn the good ol’ times. But over there with music, new business models are also beginning to evolve and emerge from these ashes. Spotify is based on streaming, Last.fm is based on social, Seeqpod is a lightweight wrapper for Google searches, The Pirate Bay ignores everyone else and provides stuff for free.

Which ones are going to work? Which ones will make money? Which ones will work but displace the money-making somewhere else? The simple answer, of course, is that no-one really knows. Some models will thrive, others will fail. Some will pave a new direction for the industry, others we’ll laugh at in five years’ time.

So where can the answers be found? Predictably for me, I think all sectors (including academic publishing!) need to take a punt and do some lightweight experimentation. I think they need to be trying new models of access based around personalisation, attention data and identity. They need to examine who gets paid, how much and when. They need to be setting stuff free in an environment where they can measure – effectively – the impact of this freedom across a range of returns, from marketing to cultural to financial. If they do this then they’re at least going to have some solid intelligence to use when deciding which models to take ahead. And it may be that this particular industry isn’t as challenged as most people assume, and that the existing models can carry on – lock it down, slap on some DRM, charge for access. It’d be far less uncomfortable if this was the case. But at least that decision would be made with some solid knowledge backing it up.

Open Access is one clear way of forging this debate ahead. But once you get under the apparently simple hood of the OA proposition, it actually turns out that not only are many institutions simply ignoring guidelines to produce OA versions of published works but that the payment models are complicated and based on a historical backdrop which to many seems inherently broken. I’d be interested to hear from someone with way more knowledge than me on the successes and failures or market research done on setting content free in this way.

It was clear to me in talking to a range of people at UKSG – librarians, publishers, content providers – that there are huge swathes of evidence missing – surprising, perhaps, from sectors which pride themselves on accuracy and academic rigour. When I asked “how many people aren’t coming to your site because search engines can’t see your content?” or “what is your e-commerce drop-out rate?” or “how much of your stuff do you estimate is illegally pirated?”, very few had coherent – (or even vague) (or any!) – answers.

More telling, perhaps, is that the informal straw poll question I posed to various people during the conference: “Do you feel that this is a healthy industry?” was almost always answered with a negative response. And when I asked why, the near-consistent reply was: “It’s too complicated; too political; too entangled” or from one person: “the internet has killed us”.

I’m really not as naive as I sometimes appear 🙂 I know how terribly, terribly hard it is to unpick enormous, political and emotive histories. When I suggest that “we need to start again”, I’m obviously not suggesting that we can wipe the slate clean and redefine the entire value proposition across a multi-billion dollar, multi-faceted industry. But I think – simply – that awareness of the networked environment, a knowledge of how people really use the web today and an open mind that things might need to change in profound ways are very powerful starting points in what will clearly be an ongoing, fraught and fascinating discussion.

Lights, bushels.

Brian has written a short post about universities actively trying to stop promotional material (yes – promotional material) finding freedom on the web. How funny is that?

On a related note, Sarah Perez from ReadWriteWeb did a post a couple of days ago about hidden image resources in the so-called “deep web”. The list of links is great – I particularly like Calisphere and this collection of the 1906 SF earthquake. Lovely.

A couple of things though – first, surely Perez is wrong to suggest that these images are “the deep web”? I did a couple of tests looking for images via Google and it all seemed to be spidered ok. This one for instance was found via a Google search for the image title. It also appears on Google Image Search. Granted, you’d likely not find it given the quantity of other stuff, but it is definitely being spidered, so to me that means it’s not Deep Web. I may have missed something..

The finer point is more interesting, which is about what these institutions have done (or not) to promote these exceptionally fine collections. I haven’t looked into it any further in these cases but it’s familiar territory (you know, the whole open content, CC licensing, Flickr-usage, watermarking, marketing gubbins).

That’s where it comes back to Brian’s post – the content is great, the hard work has been done: the digitisation, the cataloguing, the site design. Then at the last hurdle, fear seems to strike. Better hide the content, you know, in case someone – like – uses it.

Go figure.

Museums and the Web day 3 (or day 1..)

Ok. It’s opening plenary time here at Museums and the Web 2008. I didn’t manage to do any blogging yesterday – that’s what an entire day of workshops followed by immediate dinner and wine does to you…

Michael Geist is the guest speaker: “technology advocate and trouble maker”. I like him already 🙂

Michael spent his talk going through a number of sites and examples, some of which will be very familiar to us web types; others a little less well known. The examples which particularly jumped out for me (for two different reasons) were the Facebook group Fair Copyright for Canada which was started by Michael, and his example of opening up the book “In the Public Interest” for free download.

The Facebook group example was particularly powerful because it caused demonstrable change in the real world. This was actually a running thread through many of the sites that Michael showed: virtual experiences are one thing, but “real” world responses to these virtual experiences are happening too, and that’s a hugely important thing to focus on. I’ve used this to defend Twitter recently (yes, I know the irony, having said bad things about lifestreaming before…) – Twitter has recently got me back in touch with people out here in the real world, and that gives it a legitimacy and power that it doesn’t necessarily have “just” online.

The “In the Public Interest” example demonstrated (although Michael didn’t give any actual figures) that free download actually increased sales. I like this because it continues to support the Scarcity vs Scale argument which I’ve pitched on this blog previously. It’s a very pertinent discussion; Brian and I are giving a paper on Openness on Friday at which we’ll be focusing on open content (among other things). Already this week – and in my experience, always within the sector – this discussion rumbles alongside most things we try to do on the web: API provision, Web 2.0, UGC or getting collections databases online. The more evidence there is that this approach works (or not!), the better.

The overriding message from Michael for me is that online activity causes, extends, pushes “real” activity in very valuable and increasingly tangible ways.