Oct 09, 2006: The evolution of metadata at the BBC

The Bulletin of ASIS&T published a nice piece by Karen Loasby on the evolution of metadata creation and usage at the BBC. Karen skips from 2002 to 2004 to the present, and takes a peek ahead to 2008. It's a quick read, and Karen's depiction of the BBC's transformation mirrors much of what's happening (or soon will happen) across the industry. I'll summarize here, though with my comments added, I'm afraid my posting may be longer than Karen's original article.

Four years ago, metadata hadn't yet penetrated the BBC widely, and was chiefly seen as a means of bumping up page rankings in Google and in the BBC's own site search engine. The organization took a fragmented approach to developing and applying metadata, and "for the most part the keywords made the site search worse." But by 2004, things had "shifted from improving search to the possibility of powering feeds and aggregation pages".

I find this shift quite interesting: lately so many see metadata as a panacea in much the same way they've adored personalization, portals, CMS, and search engines in the past. But metadata isn't a technology. That's always made me scratch my head—how can something be sexy without being a technology?—but perhaps metadata's elevation to silver bullet status makes sense because it's been increasingly combined with technologies (such as feeds and aggregation).

Anyway, back to Karen's piece: two years ago, the BBC began to invest more heavily in ambitious controlled vocabularies to cover all programmatic content; understandably, content authors rebelled against the huge burden of formally tagging their documents. Semi-automated classification approaches held promise but were still relatively untried; in general, tagging of BBC content was quite uneven. Naturally there was a backlash, and as folksonomic approaches became more familiar, individuals at the BBC began experimenting with them as an alternative to traditional metadata.

This backlash included many who "believed we shouldn't presume to categorize content for our users". In retrospect, this belief almost seems quaint. We saw this across the industry: in their rush to damn the old ways and embrace folksonomies, usually smart people completely missed the point. These two varieties of metadata (traditional vocabularies and folksonomies) aren't and never have been mutually exclusive. Instead, they can and should function as actors in what I call a metadata ecology, where both serve important and often symbiotic roles. Additionally, the old/new distinction was a false one: input from users and authors has, to varying degrees, always been a factor in developing formal controlled vocabularies. And, ironically, the people who see classification as imposing an unwelcome and authoritarian worldview don't question the subtleties of retrieval algorithms and the worldviews they represent.

But back to the BBC: the storm clouds that seemed to portend a clash between metadata camps simply blew past. How come? Here are a few theories, in Karen's words:

  • The rise of digital program content needing tagging. Some of the program metadata is less subjective than for news stories, and it is harder to argue that the BBC shouldn't be describing this content. The audience expects us to have brand and actor metadata.
  • A focus on audio-video content (AV) rather than Web articles. There is no text to search so metadata is a necessity, not a nice-to-have.
  • Increasing production of data-driven prototypes that can demonstrate the possibilities of metadata. One prototype, the Open Archive, also made use of a rich store of ready-made metadata from the internal BBC program catalog.

Additionally, Karen notes that a true BBC metadata ecology is starting to take shape: "A compromise solution (known as the metadata threshold) allows for free-text tagging that is absorbed into formal CVs when enough content is tagged with that term. The solution aims to combine cheap and responsive tagging with unambiguous aggregation power. So far it has been very successful at slashing overheads." Ah, metadata ecological goodness. It's so gratifying to see that the smart people at the BBC have decided to ignore the hyperbole and false dichotomies of philosophy and practice, and instead focus on getting the damned job done. Kudos.
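To make the threshold idea concrete, here is a minimal Python sketch. It is not the BBC's implementation, which Karen's article doesn't describe; the class name `MetadataThreshold`, the tag-normalization rule and the threshold value are all hypothetical. Authors tag freely, and once enough distinct pieces of content carry the same term, that term is promoted into the controlled vocabulary and becomes eligible to drive an aggregation page.

```python
from __future__ import annotations

from collections import defaultdict


class MetadataThreshold:
    """Hypothetical sketch: free-text tags are promoted into a controlled
    vocabulary (CV) once enough distinct content items carry them."""

    def __init__(self, threshold: int) -> None:
        # Assumed policy: number of distinct items needed before promotion.
        self.threshold = threshold
        self.controlled_vocabulary: set[str] = set()
        # Normalized term -> IDs of the content items tagged with it.
        self._items_by_tag: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _normalize(tag: str) -> str:
        # Assumption: collapse case and whitespace so "Doctor  Who " == "doctor who".
        return " ".join(tag.split()).lower()

    def tag(self, item_id: str, free_text_tag: str) -> bool:
        """Record a free-text tag; return True only if this call promoted it into the CV."""
        term = self._normalize(free_text_tag)
        if not term:
            return False
        # A set means re-tagging the same item never inflates the count.
        self._items_by_tag[term].add(item_id)
        if term not in self.controlled_vocabulary and len(self._items_by_tag[term]) >= self.threshold:
            self.controlled_vocabulary.add(term)
            return True
        return False

    def aggregation_page(self, term: str) -> list[str]:
        """Items for a CV term; terms still below the threshold aren't aggregated."""
        term = self._normalize(term)
        if term not in self.controlled_vocabulary:
            return []
        return sorted(self._items_by_tag[term])


if __name__ == "__main__":
    mt = MetadataThreshold(threshold=3)
    mt.tag("clip-1", "Doctor Who")
    mt.tag("clip-2", "doctor who")
    print(mt.aggregation_page("Doctor Who"))   # [] (only two items so far)
    print(mt.tag("clip-3", " Doctor  Who "))   # True (third item crosses the threshold)
    print(mt.aggregation_page("Doctor Who"))   # ['clip-1', 'clip-2', 'clip-3']
```

The appeal of this arrangement, as Karen describes it, is that tagging stays cheap and responsive while only terms with demonstrated use earn the unambiguous aggregation power of the formal vocabulary. A real system would presumably add editorial review, synonym mapping and disambiguation before promotion; those steps are left out here.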

Looking ahead a couple of years, Karen expresses concern that the BBC is "still not really tapping into end user language". I happen to know that the BBC is investing in search analytics, which, by surfacing users' actual requests for information in their own words, should go a long way toward closing that loop.

Thanks to Karen and her colleagues for their continued level-headedness in information architecture practice—the BBC is truly a leader in the field—and for sharing what they've learned with the rest of us. I hope you'll enjoy her article as much as I did.

