The Brooklyn Museum API – Q&A with Shelley Bernstein and Paul Beaudoin

The concept and importance of museum-based API’s are notions that I’ve written about consistently (boringly, probably) both on this blog and elsewhere on the web. Programmatic and open access to data is – IMO – absolutely key to ensuring the long-term success of online collections.

Many conversations have been going on about how to make API’s happen over the last couple of years, and I think we’re finally seeing these conversations move away from niche groups of enthusiastic developers (eg. Mashed Museum ) into a more mainstream debate which also involves budget holders and strategists. These conversations have been aided by metrics from social media sites like Twitter which indicate that API access figures sometimes outstrip “normal web” browsing by a factor of 10 or more.

On March 4th 2009, Brooklyn Museum announced the launch of their API, the latest in a series of developments around their online collection. Brooklyn occupies a space which generates a fair amount of awe in museum web circles: Shelley Bernstein and team are always several steps in front of the curve – innovating rapidly, encouraging a “just do it” attitude, and most importantly, engaging wholly with a totally committed tribe of users. Many other museum try to do social media. Brooklyn lives social media.

So, as they say – without further ado – here’s Shelley and Paul talking about what they did, how they did it, and why.

Q: First and foremost, could you please introduce yourselves – what your main roles and responsibilities are and how you fit within the museum.

Shelley Bernstein, Chief of Technology. I manage the department that runs the Museum’s helpdesk, Network Administration, Website, gallery technology, and social media.

Paul Beaudoin, Programmer. I push data around on the back-end and build website features and internal tools.

Q: Can you explain in as non-technical language as possible what exactly the Brooklyn API is, and what it lets people do?

SB: It’s basically a way outside programmers can query our Collections data and create their own applications using it.

Q: Why did you decide to build an API? What are the main things you hope to achieve …and what about those age old “social web” problems like authority, value and so-on?

SB: First, practical… in the past we’d been asked to be a part of larger projects where institutions were trying to aggregate data across many collections (like d*hub). At the time, we couldn’t justify allocating the time to provide data sets which would become stale as fast as we could turn over the data. By developing the API, we can create this one thing that will work for many people so it no longer become a project every time we are asked to take part.

Second, community… the developer community is not one we’d worked with before. We’d recently had exposure to the indicommons community at the Flickr Commons and had seen developers like David Wilkinson do some great things with our data there. It’s been a very positive experience and one we wanted to carry forward into our Collection, not just the materials we are posting to The Commons.

Third, community+practical… I think we needed to recognize that ideas about our data can come from anywhere, and encourage outside partnerships. We should recognize that programmers from outside the organization will have skills and ideas that we don’t have internally and encourage everyone to use them with our data if they want to. When they do, we want to make sure we get them the credit they deserve by pointing our visitors to their sites so they get some exposure for their efforts.

Q: How have you built it? (Both from a technical and a project perspective: what platform, backend systems, relationship to collections management / website; also how long has it taken, and how have you run the project?)

PB: The API sits on top of our existing “OpenCollection” code (no relation to namesake at http://www.collectiveaccess.org) which we developed about a year ago. OpenCollection is a set of PHP classes sitting on top of a MySQL database, which contains all of the object data that’s been approved for Web.

All that data originates in our internal collections management systems and digital asset systems. SSIS scripts run nightly to identify approved data and images and push them to our FreeBSD servers for processing. We have several internal workflow tools that also contribute assets like labels, press releases, videos, podcasts, and custom-cropped thumbnails. A series of BASH and PHP scripts merge the data from the various sources and generate new derivatives as required (ImageMagick). Once compiled new collection database dumps and images are pushed out to the Web servers overnight. Everything is scheduled to run automatically so new data and images approved on Monday will be available in the wee hours Tuesday.

The API itself took about four weeks to build and document (documentation may have consumed the better part of that). But that seems like a misleading figure because so much of the API piggy-backs on our existing codebase. OpenCollection itself – and all of the data flow scripts that support it – took many months to build.

Cool diagrams. Every desk should have some.

Q: How did you go about communicating the benefits of an API to internal stakeholders?

SB: Ha, well we used your hoard.it website as an example of what can happen if we don’t! The general discussion centered around how we can work with the community and develop a way people can can do this under our own terms, the alternative being that people are likely to do what they want anyway. We’d rather work with, than against. It also helped us immensely that an API had been released by DigitalNZ , so we had an example out there that we could follow.

Q: It’s obviously early days, but how much interest and take-up have you had? How much are you anticipating?

SB: We are not expecting a ton, but we’ve already seen a lot of creativity flowing which you can check out in our Application Gallery. We already know of a few things brewing that are really exciting. And Luke over at the Powerhouse is working on getting our data into d*hub already, so stay tuned.

Q: Can you give us some indication of the budget – at least ballpark, or as a % compared to your annual operating budget for the website?

SB: There was no budget specifically assigned to this project. We had an opening of time where we thought we could slot in the development and took it. Moving forward, we will make changes to the API and add features as time can be allocated, but it will often need to be secondary to other projects we need to accomplish.

Q: How are you dealing with rights issues?

SB: Anything that is under copyright is being delivered at a very small thumbnail size (100px wide on the longest size) for identification purposes only.

Q: What restrictions do you place on users when accessing, displaying and otherwise using your data?

SB: I’m not even going to attempt to summarize this one. Here’s the Terms of Service – everyone go get a good cup of coffee before settling down with it.

Q: You chose a particular approach (REST) to expose your collections. Could you talk a bit about the technical options you considered before coming to this solution, and why you preferred REST to these others?

PB: Actually it’s been pointed out that our API isn’t perfectly RESTful, so let me say first that, humbly, we consider our API REST-inspired at best. I’ve long been a fan of REST and tend to gravitate to it in principal. But when it comes down to it, development time and ease of use are the top concerns.

At the time the API was spec’ed we decided it was more important to build something that someone could jump right into than something meeting some aesthetic ideal. Of course those aren’t mutually exclusive goals if you have all the dev time in the world, but we don’t. So we thought about our users and looked to the APIs that seemed to be getting the most play (Flickr, DigiNZ, and many Google projects come to mind) and borrowed aspects we thought worked (api keys, mindful use of HTTP verbs, simple query parameters) and left out the things we thought were extraneous or personally inappropriate (complicated session management, multiple script gateways). The result is, I think, a lightweight API with very few rules and pretty accommodating responses. You don’t have to know what an XSD is to jump in.

Q: What advice would you give to other museums / institutions wanting to follow the API path?

SB: You mean other than “do it” <insert grin here>? No, really, if it’s right for the institution and their goals, they should consider it. Look to the DigitalNZ project and read this interview with their team (we did and it inspired us). Try and not stress over making it perfect first time out, just try and see what it yields…then adjust as you go along. Obviously, the more institutions that can open their data in this way, the richer the applications can become.

_______

Many, many thanks to Shelley and Paul for putting in the time to answer my questions. You can follow the development of the Brooklyn Museum collections and API over on their blog, or by following @brooklynmuseum on Twitter. More importantly, go build something cool 🙂

Follow me, not in a weird way but because it's what all the cool kids are doing:

Or... get small parts of my brain in your inbox

Follow me, not in a weird way but because
it's what all the cool kids are doing: