Dec 02, 2003: Skip This Rant and Read Shirky

As you can tell from my title, I'm a fan of this short but brilliant piece by Clay Shirky: "The Semantic Web, Syllogism, and Worldview". Shirky shreds many assumptions behind the Semantic Web by exposing a major crack in its ontological foundation: a reliance on syllogisms ("If A=B, and B=C, then A=C"). These little packages of logical goodness may sound nice on paper, but often require us to make huge generalizations and don't occur frequently in real life, where the Semantic Web is supposed to actually help us.

Shirky goes on to describe the Semantic Web as a kind of reverse engineering of the old artificial intelligence problem: AI doesn't work in ambiguous, multi-domain environments, like the real world for instance. So the Semantic Web's approach turns the tables by moving much of the burden from technology onto the shoulders of content authors: "Since it's hard to make machines think about the world, the new goal is to describe the world in ways that are easy for machines to think about." To that end, ontologists rely on metadata-driven solutions. And as Shirky and many others point out, this leap of faith assumes "...that many important aspects of the world can be specified in an unambiguous and universally agreed-on fashion". Any cataloger will tell you that such faith isn't well placed.

And yet, to my complete shock, I increasingly hear the word metadata uttered with the same breathy excitement as such other recent panaceas as push, portals, and personalization. I'm aghast, yet in a sort of ironically pleasant way, as I've had to explain for years and years what metadata is and how it can be an important part of my clients' complete IA breakfast. But where a balance of approaches, including metadata, makes sense, we instead encounter an attitude that a single silver bullet will do the trick cleanly and simply.

This all puts "LIS IAs" like me in an increasingly compromising position. With a background in librarianship, I ought to be gaga over metadata, muttering a mantra of "subject, author, title... subject, author, title...". Yet I find myself recommending that extensive investments in metadata be postponed, at least in the enterprise environment, in favor of less expensive and more feasible architectural approaches that won't go down in flames and force my clients into bankruptcy.

Why am I so uneasy with large metadata-driven approaches? One problem: in many environments, those espousing metadata as "the answer" don't recognize that there are really two types of metadata to wrangle with: structural (think attributes or fields) and semantic (descriptive values or controlled vocabularies that populate those attributes). Each of these can require an extensive investment to think through, develop, implement, and, perhaps most importantly, maintain. People's information needs are moving targets, as is an organization's content; the metadata that connect them naturally need to evolve as well.

Already have metadata? Then you might have another problem on your hands. Different business units or applications might make use of different metadata attributes. Your author is my composer is her creator; how do we resolve these differences? These differing "world views," as Shirky describes them, make it challenging to achieve structural interoperability: all of our sources of metadata agreeing on which attributes to employ, and what to name them.
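To make the attribute mismatch concrete, here is a minimal sketch in Python of a structural crosswalk: a lookup that renames each source's attributes to a shared name. Everything in it is an assumption for illustration; the source names ("library", "music", "web"), the field names, and the `to_common` helper are hypothetical and not drawn from any real system.

```python
# Hypothetical crosswalk: each source's attribute name -> a shared attribute name.
# Source names and field names are invented for illustration.
ATTRIBUTE_CROSSWALK = {
    "library": {"author": "creator", "title": "title"},
    "music": {"composer": "creator", "work": "title"},
    "web": {"creator": "creator", "name": "title"},
}


def to_common(source, record):
    """Rename a record's attributes to the shared names.

    Returns the renamed record plus a list of attributes that have no
    agreed-on equivalent, which is where the real negotiation starts.
    """
    mapping = ATTRIBUTE_CROSSWALK[source]
    common, unmapped = {}, []
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            unmapped.append(field)
    return common, unmapped


record, leftovers = to_common(
    "music",
    {"composer": "J. S. Bach", "work": "Goldberg Variations", "key": "G major"},
)
print(record)     # attributes renamed to the shared vocabulary
print(leftovers)  # attributes nobody has agreed on yet
```

The code is the easy part. The table at the top is the expensive part, because someone has to get every business unit to sign off on it and then keep it current.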

Even if we can all agree to the same metadata attributes, we still have more conflicting world views to tackle. What you call "cell phone," Sales refers to as "digital assistant," Marketing brands as "communications solution," and Support labels "mobile phone". Anyone who's tried to get their company to agree on a vocabulary as seemingly simple as product names knows what I'm talking about. Achieving this sort of semantic merging is perhaps the most difficult aspect of metadata development; while there are complicated methods to merge semantic metadata, such as cross-walking terms and developing meta-thesauri, very few organizations have the money or brain power to take this on.
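For a rough sense of what cross-walking terms looks like in its simplest form, here is a sketch in Python that maps the variant labels above onto one preferred term. The choice of "mobile phone" as the preferred term, the `PREFERRED_TERM` table, and both helper functions are assumptions made for illustration; a real meta-thesaurus would also need broader, narrower, and related terms.

```python
# Hypothetical synonym table: variant labels -> one preferred term.
# Picking "mobile phone" as the preferred term is an arbitrary assumption.
PREFERRED_TERM = {
    "cell phone": "mobile phone",
    "digital assistant": "mobile phone",
    "communications solution": "mobile phone",
    "mobile phone": "mobile phone",
}


def normalize_term(label):
    """Return (term, known): the preferred term if mapped, else the cleaned label."""
    key = " ".join(label.lower().split())  # lowercase and collapse whitespace
    if key in PREFERRED_TERM:
        return PREFERRED_TERM[key], True
    return key, False


def merge_values(labels):
    """Collapse labels into preferred terms, keeping unresolved labels separate."""
    merged, unresolved = set(), set()
    for label in labels:
        term, known = normalize_term(label)
        (merged if known else unresolved).add(term)
    return sorted(merged), sorted(unresolved)


terms, unknown = merge_values(["Cell Phone", "communications  solution", "handset"])
print(terms)    # labels that resolved to a preferred term
print(unknown)  # labels someone still has to make a decision about
```

Note that this only works once the records being merged already share the same attribute; a lookup like this does nothing to settle whose term wins.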

Shirky sums up many metadata challenges with a concise statement: "it's easy to get broad agreement in a narrow group of users, or vice-versa, but not both." Hey, if you don't make your metadata structurally interoperable, you can't have semantic merging. How can you have any pudding if you don't eat yer meat? Silly Pink Floyd references aside, I've tried to capture all of this in a nifty diagram (40 Kb PDF file).

If this all sounds like the Mother of all Migraines, you're spot on. Probably too challenging for most of us, even if our content management systems suddenly sprouted new code to adequately support metadata management. And even if we could get everyone in our organizations--or, in the case of the Semantic Web, every web publisher on the planet--to share the same "world views" as expressed by metadata.

Well, thanks for getting through this rant. I promise I'll return to concise, reasonable blogging mode soon. In the meantime, just read the Shirky article.

Comment: Adrian Howard (Dec 2, 2003)

I'd take a look at http://www.poorbuthappy.com/ease/semantic/ - a nice review of Shirky's piece along with the many, many responses.

I'm with the "mostly a straw man" camp. I've yet to meet anybody whose definition of "semantic web" matches Shirky's definition.

Comment: Lou (Dec 2, 2003)

Adrian, looks like great stuff; thanks for the pointer.

Comment: victor (Dec 3, 2003)

True that boil-the-ocean projects are bad, but semantic web projects don't have to be. Enter Paul Ford, who - besides countering Shirky's misguided arguments[1] - recently released Harpers.org, a semantic web project that cost under US $100K and was done by a handful of people[2].

It's like Flash: the technology isn't the problem, bad design is the problem. In both cases we simply need to learn more about how to do this right. This is a long-term vision that won't be solved in the short term, especially by those with little patience[3].

[1] http://ftrain.com/ContraShirky.html

[2] http://ftrain.com/AWebSiteForHarpers.html

[3] http://www.noisebetweenstations.com/personal/weblogs/tinderbox/informat/knowledg/semantic.shtml

Comment: Clay Shirky (Dec 7, 2003)

_I'm with the "mostly a straw man" camp. I've yet to meet anybody whose definition of "semantic web" matches Shirky's definition._

This is true, but not for the reasons you think.

The reason I don't lead with the canonical definition of the Semantic Web in the opening paragraph of that piece is that there _is_ no canonical definition. I've run into a number of people, notably Paul Ford and Tim Bray, who say "Of course that AI-in-the-sky stuff is nonsense", but who then fail to provide anything other than a generic motherhood-and-apple-pie definition that boils down to "metadata is good."

Adrian Howard continues the pattern of ignoring the actual quotes in the middle of the article, which are quite frankly insane, all from people purporting to explain the Semantic Web. He also offers no definition himself.

Having seen the criticism of that piece over the last month, I am convinced that the Semantic Web is so vague as to be undefinable, and that if anyone with any standing in the community put up a mission statement and said "There, this is what we are all working on", it would ignite a civil war.

As for Harper's, Paul's project uses Semantic Web technologies, and is a very good site, _except for the semantics_. Take a look at its classification of e.g. the Democratic Party:

"This [category] is The Democratic Party, a political party and a United States bureaucracy. It is part of United States Government Bureaucracies, which is part of Government Bureaucracies, which is part of Organizations & Bureaucracies, which is part of Connections, which is part of Harpers.org."

Aren't you glad we got that all worked out?

The Democratic Party is not in fact a Government bureaucracy, much less part of the Government, and the ways in which the Democratic Party relates to the Government are totally different than the way it relates to Connections, a sub-section of Harper's, or to Harper's as a whole. It's just this sort of difficulty in defining "facts", as Paul suggests Harper's is doing, and then automating their handling, that undermines the success of Harper's as an avatar of the Semantic Web.

Don't get me wrong, I think Paul has built a terrific site for Harper's, but the terrificness has everything to do with the flexible linking structure, and nothing to do with the semantics of the links themselves, which produce results like the one quoted above.

Comment: victor (Dec 8, 2003)

RE>Aren't you glad we got that all worked out?

Well, yes, actually. I rather like having Harper's world view neatly outlined in a reusable taxonomy, just in case I want to reuse it.

Beyond the physical sciences, all taxonomies are subjective. They simply reflect the material at hand and the point of view of the people practicing in those fields. The Semantic Web can't change that; it just makes the bits you want to use easier to use.

Comment: Denis (Dec 15, 2003)

*About the syllogism*... as I understood the articles, there is a conceptual mistake.

Aristotle described the syllogism to explain how people sometimes think, but it is not a rule for finding answers :)

He knew that syllogisms can result in absurd sentences... :)

Comment: Lou (Dec 21, 2003)

Seems like some of the debate here centers on whether we're discussing the Semantic Web or semantic webs. The latter are certainly feasible; Harper's is an example, and many of us have been working on similar concepts, though we might call them content models or information models. Clearly, AI approaches will produce greater value in narrow domains. But if we're considering the full Web as the scope of the Semantic Web, then we're truly boiling the oceans, as Victor describes it.

We can learn to develop localized semantic webs based on some of the thinking that's gone on regarding the Semantic Web (as well as from metadata development, content markup, and other related areas that have been around for years and years). But it's time to abandon the pipe dream of a singular, monolithic Semantic Web; it's impractical, and most likely not even feasible.

Comment: Lou (Dec 21, 2003)

Good stuff on Semantic Web versus semantic webs, this time from Very Smart Guy Alex Wright:

http://www.agwright.com/blog/archives/cat_semantic_web.html

Clearly, I'm a bit behind on the discussion; glad that folks like Peter van Dijck and Alex Wright are blogging it all for the rest of us!
