The Brooklyn Museum API – Q&A with Shelley Bernstein and Paul Beaudoin

The concept and importance of museum APIs are topics I’ve written about consistently (boringly, probably) both on this blog and elsewhere on the web. Programmatic, open access to data is – IMO – absolutely key to ensuring the long-term success of online collections.

Many conversations have been going on about how to make APIs happen over the last couple of years, and I think we’re finally seeing these conversations move away from niche groups of enthusiastic developers (e.g. Mashed Museum) into a more mainstream debate which also involves budget holders and strategists. These conversations have been aided by metrics from social media sites like Twitter, which indicate that API access figures sometimes outstrip “normal web” browsing by a factor of 10 or more.

On March 4th 2009, Brooklyn Museum announced the launch of their API, the latest in a series of developments around their online collection. Brooklyn occupies a space which generates a fair amount of awe in museum web circles: Shelley Bernstein and team are always several steps ahead of the curve – innovating rapidly, encouraging a “just do it” attitude and, most importantly, engaging wholly with a totally committed tribe of users. Many other museums try to do social media. Brooklyn lives social media.

So, as they say – without further ado – here’s Shelley and Paul talking about what they did, how they did it, and why.

Q: First and foremost, could you please introduce yourselves – what are your main roles and responsibilities, and how do you fit within the museum?

Shelley Bernstein, Chief of Technology. I manage the department that runs the Museum’s helpdesk, network administration, website, gallery technology, and social media.

Paul Beaudoin, Programmer. I push data around on the back-end and build website features and internal tools.

Q: Can you explain in as non-technical language as possible what exactly the Brooklyn API is, and what it lets people do?

SB: It’s basically a way for outside programmers to query our Collections data and create their own applications using it.

Q: Why did you decide to build an API? What are the main things you hope to achieve… and what about those age-old “social web” problems like authority, value and so on?

SB: First, practical… in the past we’d been asked to be a part of larger projects where institutions were trying to aggregate data across many collections (like d*hub). At the time, we couldn’t justify allocating the time to provide data sets which would become stale as fast as we could turn over the data. By developing the API, we can create this one thing that will work for many people, so it no longer becomes a project every time we are asked to take part.

Second, community… the developer community is not one we’d worked with before. We’d recently had exposure to the indicommons community at the Flickr Commons and had seen developers like David Wilkinson do some great things with our data there. It’s been a very positive experience and one we wanted to carry forward into our Collection, not just the materials we are posting to The Commons.

Third, community+practical… I think we needed to recognize that ideas about our data can come from anywhere, and encourage outside partnerships. Programmers from outside the organization will have skills and ideas that we don’t have internally, and we should encourage everyone to use them with our data if they want to. When they do, we want to make sure they get the credit they deserve by pointing our visitors to their sites so they get some exposure for their efforts.

Q: How have you built it? (Both from a technical and a project perspective: what platform, backend systems, relationship to collections management / website; also how long has it taken, and how have you run the project?)

PB: The API sits on top of our existing “OpenCollection” code (no relation to its namesake at http://www.collectiveaccess.org), which we developed about a year ago. OpenCollection is a set of PHP classes sitting on top of a MySQL database, which contains all of the object data that’s been approved for the Web.

All that data originates in our internal collections management and digital asset systems. SSIS scripts run nightly to identify approved data and images and push them to our FreeBSD servers for processing. We have several internal workflow tools that also contribute assets like labels, press releases, videos, podcasts, and custom-cropped thumbnails. A series of BASH and PHP scripts merge the data from the various sources and generate new derivatives as required (using ImageMagick). Once compiled, new collection database dumps and images are pushed out to the Web servers overnight. Everything is scheduled to run automatically, so new data and images approved on Monday will be available in the wee hours of Tuesday.
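
To make that flow a little more concrete, here’s a minimal sketch of what one nightly merge step might look like. To be clear, this is not Brooklyn’s actual code – the database schema, table names, and file paths are all invented for illustration:

```php
<?php
// Hypothetical sketch of one nightly merge step: combine object records
// approved for the Web with label text from a separate workflow tool,
// then write a compiled dump for the Web servers to pick up.
// Schema, table, and file names are all invented for illustration.

$db = new PDO('mysql:host=localhost;dbname=staging', 'user', 'pass');

// Join the approved object data with its curatorial label, if any.
$rows = $db->query(
    'SELECT o.object_id, o.title, l.label_text
       FROM objects o
       LEFT JOIN labels l ON l.object_id = o.object_id
      WHERE o.web_approved = 1',
    PDO::FETCH_ASSOC
);

// Compile everything into a single dump file for the overnight push.
$out = fopen('/data/web/dumps/collection.json', 'w');
foreach ($rows as $row) {
    fwrite($out, json_encode($row) . "\n");
}
fclose($out);

// Separate scripts would generate image derivatives (ImageMagick)
// and rsync /data/web out to the Web servers before morning.
```

In the real pipeline, something like this would be one of several scripts chained together by a scheduler such as cron.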

The API itself took about four weeks to build and document (documentation may have consumed the better part of that). But that figure is a little misleading, because so much of the API piggybacks on our existing codebase. OpenCollection itself – and all of the data flow scripts that support it – took many months to build.

Cool diagrams. Every desk should have some.

Q: How did you go about communicating the benefits of an API to internal stakeholders?

SB: Ha, well, we used your hoard.it website as an example of what can happen if we don’t! The general discussion centered around how we can work with the community and develop a way people can do this on our own terms, the alternative being that people are likely to do what they want anyway. We’d rather work with than against. It also helped us immensely that an API had been released by DigitalNZ, so we had an example out there that we could follow.

Q: It’s obviously early days, but how much interest and take-up have you had? How much are you anticipating?

SB: We are not expecting a ton, but we’ve already seen a lot of creativity flowing which you can check out in our Application Gallery. We already know of a few things brewing that are really exciting. And Luke over at the Powerhouse is working on getting our data into d*hub already, so stay tuned.

Q: Can you give us some indication of the budget – at least ballpark, or as a % compared to your annual operating budget for the website?

SB: There was no budget specifically assigned to this project. We had a window of time where we thought we could slot in the development, and took it. Moving forward, we will make changes to the API and add features as time can be allocated, but it will often need to be secondary to other projects we need to accomplish.

Q: How are you dealing with rights issues?

SB: Anything that is under copyright is being delivered at a very small thumbnail size (100px on the longest side) for identification purposes only.
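
As an aside, that “longest side” constraint maps neatly onto ImageMagick’s shrink-only geometry flag. A hypothetical one-liner (the file names here are invented, not Brooklyn’s actual script):

```php
<?php
// Hypothetical illustration of the thumbnail rule: constrain the longest
// side of a rights-restricted image to 100px. The ">" flag means
// shrink-only, so images already smaller are left untouched.
exec(sprintf(
    'convert %s -resize "100x100>" %s',
    escapeshellarg('restricted_work.jpg'),
    escapeshellarg('restricted_work_100px.jpg')
));
```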

Q: What restrictions do you place on users when accessing, displaying and otherwise using your data?

SB: I’m not even going to attempt to summarize this one. Here’s the Terms of Service – everyone go get a good cup of coffee before settling down with it.

Q: You chose a particular approach (REST) to expose your collections. Could you talk a bit about the technical options you considered before arriving at this solution, and why you preferred REST over the others?

PB: Actually, it’s been pointed out that our API isn’t perfectly RESTful, so let me say first that, humbly, we consider our API REST-inspired at best. I’ve long been a fan of REST and tend to gravitate to it in principle. But when it comes down to it, development time and ease of use are the top concerns.

At the time the API was spec’d, we decided it was more important to build something that someone could jump right into than something meeting some aesthetic ideal. Of course, those aren’t mutually exclusive goals if you have all the dev time in the world, but we don’t. So we thought about our users, looked to the APIs that seemed to be getting the most play (Flickr, DigitalNZ, and many Google projects come to mind), borrowed the aspects we thought worked (API keys, mindful use of HTTP verbs, simple query parameters), and left out the things we thought were extraneous or inappropriate for us (complicated session management, multiple script gateways). The result is, I think, a lightweight API with very few rules and pretty accommodating responses. You don’t have to know what an XSD is to jump in.
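
To give a flavor of how low that barrier is, here’s a sketch of what a keyed, parameter-driven request against a REST-style collection API might look like. The endpoint URL, method name, and parameters below are illustrative stand-ins, not the documented Brooklyn Museum API:

```php
<?php
// Hypothetical example: query a REST-style collection API with an API key
// and simple query parameters, then read the JSON response. The endpoint
// and parameter names are invented for illustration.

$params = http_build_query(array(
    'method'  => 'collection.search',
    'api_key' => 'YOUR_API_KEY',
    'keyword' => 'tiffany',
    'format'  => 'json',
));

$response = file_get_contents('http://example.org/api/?' . $params);
$data = json_decode($response, true);

if (isset($data['results'])) {
    foreach ($data['results'] as $object) {
        echo $object['title'], "\n";
    }
}
```

The “few rules” approach shows even in a toy like this: one GET request, one key, flat parameters, and a response you can consume without a schema in hand.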

Q: What advice would you give to other museums / institutions wanting to follow the API path?

SB: You mean other than “do it” <insert grin here>? No, really: if it’s right for the institution and their goals, they should consider it. Look to the DigitalNZ project and read this interview with their team (we did, and it inspired us). Try not to stress over making it perfect the first time out; just try it, see what it yields, then adjust as you go along. Obviously, the more institutions that can open their data in this way, the richer the applications can become.

_______

Many, many thanks to Shelley and Paul for putting in the time to answer my questions. You can follow the development of the Brooklyn Museum collections and API over on their blog, or by following @brooklynmuseum on Twitter. More importantly, go build something cool 🙂

Limiting addiction

Before the brave new world of cloud computing, selling and buying software was a pretty straightforward thing. It’d either be shareware, in which case you’d download and walk away, or there would be some kind of time- or function-limited demo which you’d upgrade (if you liked it) at some point in the future.

Since stuff went cloudy, life has become a little more complicated in the world of software business models. Recently, I’ve happened across a number of services that approach their business models in different ways, and I thought it’d be interesting to compare and contrast.

Pic from http://tinyurl.com/5hwvvp

One of the models that has now become popular carries the buzzword “Freemium”. It’s essentially not a whole lot different from a downloaded bit of software which is functionally crippled in some way. In the Freemium model, the end user gets access to free stuff, but something is deliberately withheld that ultimately convinces them to upgrade to some kind of paid – “premium” – service. A classic example is the widely lauded hosted project management software by 37 Signals, Basecamp. You’re encouraged to sign up for free, and your account then gives you access to a limited version of the software. In this particular instance, you get a single project and no file uploads, but apart from that everything is the same as the paid version. The unpaid version gives you enough of a glimpse of the tool’s usefulness to realise that the extra functionality in the paid version is likely to be worth paying for.

Another example is SugarSync, the cloud file-syncing service. Interestingly, while they used to have a “get 2Gb free” freemium model, they have now switched to “10Gb for 45 days” on their free plan. We’ll examine why in a moment.

The Freemium model is based around something I’m going to call limiting addiction. The service provider is attempting to strike a fine balance between provision and lack of provision of service. In the Basecamp example, 37 Signals want users to find the service useful enough that they see the value, but not so useful that the free version is all they need. In this particular example, they’ve got things right: a single project without file uploads gets you far enough to see the value of the service, but not far enough to actually run any projects usefully. Net result? You upgrade to premium.

Let’s examine SugarSync now. When they first launched, they offered a time-unlimited 2Gb of storage space under a free account. I don’t know the inside track, but I’m betting that users found the service too useful – 2Gb is, after all, a fair amount of synced disk space – and that not enough of them were upgrading to the premium editions. The Freemium model in this case offered too much for nothing.

In the last couple of days I’ve signed up for a service called Spotify. It’s a downloadable app which lets you stream pretty much any music on demand to your desktop. I love the service, possibly more than anything I’ve come across this year – I’m an avid last.fm and Seeqpod fan, but Spotify goes a step further: it is fast, reliable, and content-rich. It is pretty much the perfect music application for me. The Spotify business model is interesting: you can get music for free, but it is ad-supported – every 3-4 songs there is an audio advert. To remove the ads you can either pay 99p a day or £9.99 a month.

All well and good. But (and I hate to say this in case they’re reading…) Spotify is offering too much for free. The ads just aren’t annoying enough or frequent enough for me to bother paying the premium. The value is – as with the original SugarSync model – too high. I’m addicted, but not limited, by the free version of the software. I’m a customer waiting to happen: if those ads were more in my face – if the software were more limiting – I’d almost certainly pay to get rid of them. I’m sure Spotify will get this right in the end: they’re currently still in closed beta and very early into their market, and I’m delighted that I get so much value for nothing for the time being. If they’re savvy, though, they’ll continuously tweak their business plan until they have the perfect freemium balance. And what is that balance? A high addiction -> limitation -> upgrade rate.

There’s one more example that I’d like to look at, and this one fails for reasons that I’ll highlight in a minute. The example is a web prototyping tool called Protoshare. Rather than opting for a Freemium / functionally limited model, they’ve gone for a 30-day free trial. During those 30 days you get full access to all the functionality – but after that you have to pay.

Again, all well and good. However, there’s a subtlety here which I think Protoshare have missed. It’s this: people like me who do IA work for a living tend to work on a client/project basis. This means I have two options when it comes to Protoshare. I either start the 30-day trial at an arbitrary (between projects) time – which is fine, and pressure-free, but like most people, if I’m not using a tool for a specific purpose, I don’t fully evaluate it under real-life conditions. Or I use Protoshare for a specific project. In this scenario, however, I’m putting myself way out on a limb – I haven’t had a chance to test the software before committing to it, and there’s no way I’m going to do a piece of paid (and timetabled…) work for a client without using a tool that I know and trust. Net result? I walk away from a product which could be exactly what I need – and would pay for. If Protoshare had a limiting addiction model, I’d probably have signed up by now.

There are infinite ways of cutting the Freemium business model: you can do it around paid support, functionality, disk space, speed, look and feel, etc. The extent to which various facets of the software are measured and valued is key, and – as the Protoshare example shows – really quite subtle as well. It’s not a case of “X has value Y, do Z”; it is more about considering the software against the likely market, users, use scenarios and so on. Ultimately, though, all these approaches are about giving a taste of something which tempts you enough to want more, but doesn’t satisfy. If you fail to tempt enough (Protoshare) or satisfy too much (Spotify), it’s very likely that you’ll miss markets, or revenue, or both.