Dec 01, 2001: Damning Metadata

I can't remember which IA blog pointed me to Doug Kaye's blog, but I found his frustration with metadata to be... well, frustrating.

I'll get into why in just a moment, but first, it's interesting to note a bit of an anti-metadata backlash of late. The pendulum swings away: back in the mid-90s (boy, does it feel strange to say that!), Argus would try to sell clients on the value of developing controlled vocabularies and thesauri. We often heard this response: "Nope, we have this great new search engine, and it will solve all of our users' information problems. No need to ever manually 'touch' our content." Just like that. End of discussion.

During the past year or two, a wave of painful realization swept these same folks. The search engine snake oil had dissolved, leaving a residue of poor performance and general dyspepsia. Now, finally believing that "Taxonomies are Chic", they were interested in hiring Argus to create vocabularies to describe their content. All of their content. Which, of course, was entirely unrealistic. And so we went about trying to convince these people not to classify everything, only the most important content.

Now there must be some sort of counter-counter-movement afoot: people who've experimented with classification schemes, and were disappointed to find that, yet again, there was no silver bullet to be found, just as with search engines. I don't know if Doug Kaye is one of those poor souls afflicted with silver bulletitis, but he is down on metadata for two reasons:

"First, every required step acts as a deterrent to the use of the system. I've found that to be true in every software product or web-based system with which I've been involved. In some cases (such as an on-line dating service for which I was CTO) I've actually tested it. The more you ask, the less likely people are to participate."

Of course, Doug is raising an important point: metadata is about process as much as syntax and semantics. But intelligent metadata design doesn't ignore procedural issues, such as how the work is going to get done and who's going to do it. Sometimes it makes sense to have authors suggest metadata for their own content, sometimes separate subject matter experts, sometimes indexers, and sometimes you use software. In certain cases, you use some combination of the above. There are countless factors that influence these decisions, not the least of which are how dynamic and ephemeral your content is, how much of it there is, and how much you can spend on it.

More from Doug:

"Second, contrived taxonomies typically associated with metadata are a disaster. I've tested this, too. No one person--or committee--can design a taxonomy for the ideas of others. Library science is inadequate for the range of knowledge and thought are encountered with weblogs."

Huh?

Weblogs are certainly diverse, and classification, as noted above, is no panacea. But library science has done a passable job at classifying something even broader than weblogs: the entirety of human knowledge that is found in the Library of Congress. Sure, you'll find many problems with LoC classification, but considering its age and non-digital inception, you could do a lot worse. Certainly author-supplied keywords can be... a lot worse.

Personally, I'm sure glad that that committee at the National Library of Medicine came up with MeSH headings to represent the ideas that all those medical researchers have been coming up with for years. Accessible medical research might have been what saved my dad's life last summer.

Instead of throwing out babies with bathwater, we need to create value by selecting and combining the subset of architectural approaches--search engines and classification schemes included--that are most appropriate for each unique situation.

I wish this damned pendulum would stop swinging soon.

Comment: Paula Thornton (Dec 17, 2001)

"...Doug is raising an important point: metadata is about process as much as syntax and semantics."

Metadata...that's why I started using the term meta'stuff' (metapeople, metaprocess...the great collection of everything that we believed belonged in an Information Services component...today would be a corporate portal). Funny too. I tried discussing metadata management with Verity, because they didn't allow for access to the data their automated classification tool came up with, and they thought I was nuts...obviously, it was metadata that made Joe Busche (technical director of MetaTagger at Interwoven) and I great colleagues in crime.

Content commentary: similar to your 'overlaps', I always say that there are only two basic activities an individual can perform on a site -- 'search' and 'do'. And while it can be argued that 'search' is a 'do utility', the distinctions help to divide two distinct categories of activities that I think are important for 'precise' IA activities -- requiring two distinct skillsets (all IA assignments should be in teams of at least 2). The 'search' resource is more aligned to LIS experience, the 'do' resource is more aligned to a mix between a business analyst, a technical writer, and a learning theorist (process design, labeling/chunking and cognition optimization).

As for the direct thoughts you had about content and links, I still firmly believe in the concept that Apalu has pursued (but can't successfully tell the story of the possibilities). I have a market analysis piece that I prepared for their investors that I'm willing to share. The potential of the tool (not as would be concluded from the site) is detailed in the paper. It requires a lot of manual labor, but it allows us to 'hardwire' content to natural language (in addition to a taxonomy). If Interwoven were to blend this technology with the metadata managed by MetaTagger (particularly the 'vanilla' version they promised to release -- not associated with TeamSite), then we'd really have something powerful.

Comment: Jim Williams (Mar 28, 2002)

hi Paula,

I am very interested in your paper on Apalu's strategy for applying NL to content. I've been looking for solutions like this in the call center space - their approach sounds interesting.