louisrosenfeld.com


Aug 12, 2001: Response to my most recent Lame-Brained Theory

In my August 1 Bloug entry, I baited y'all with the following statement:

"Search algorithms haven't really changed in decades, and we probably won't see any radically different search algorithms in our lifetimes. Aside from hardware-based improvements, such as processing speed, the automated aspects of the search process simply won't get much better."

I was hoping to get a response from Someone Who Knows More Than I Do™. And it was my lucky day; Microsoft's Dave Billick, an information retrieval expert whom I knew back in his Ann Arbor days, was kind enough to respond:

I'd say search algorithms have changed in a few basic ways, at least in the Web context. The use of linked to/from data and the like are bits of information we never had available before. This is potentially valuable information (but I'm not convinced yet). But it is new. And there's tons more data for TF/IDF logic to work on. And there's lots more ability to spam results. So algorithms have changed--I'm not saying they are necessarily any better--but maybe mostly to deal with issues not directly related to results quality for end-users.

I'm not sure I'd agree with Dave; for example, I don't see the linked to/from stuff as being much different than what Eugene Garfield did with the Science Citation Index back in the '60s. I do agree that the scale of everything--including data--has exploded. And while we do have more data to learn from, we also have more stuff for those algorithms to chew through. The more we have, the worse retrieval algorithms will perform. So even small incremental algorithmic improvements can't, IMHO, keep pace with exponential growth in content and complexity.
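To make the Garfield comparison concrete, here is a minimal Python sketch of the simplest form of link analysis: count how many distinct pages link to each page, just as a citation index counts how many papers cite each paper, then rank by that count. The link graph and page names are hypothetical, and this is an illustration of the idea, not how any particular search engine actually ranks pages.

```python
from collections import Counter

def inbound_link_counts(links):
    """Count how many distinct pages link to each page.

    `links` maps each page to the pages it links to, much as a
    citation index maps each paper to the papers it cites.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # a page "cites" a target only once
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts

def rank_by_inbound_links(links):
    """Return all pages ordered by inbound-link count, most-linked first."""
    counts = inbound_link_counts(links)
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(pages, key=lambda p: (-counts[p], p))

# Hypothetical toy link graph; page names are made up for illustration.
toy_web = {
    "a.html": ["c.html", "b.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html", "c.html"],
}

print(rank_by_inbound_links(toy_web))  # ['c.html', 'a.html', 'b.html', 'd.html']
```

Swap "page" for "paper" and "link" for "citation" and the code is unchanged, which is the point of the comparison.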

Dave then took aim at my two knuckle-headed predictions; round one:

Why would you believe "users will become better searchers"? I'd say there is zero evidence that will ever happen. Are online library catalog searches better today than they were 5, 10, 20 years ago? Were they ever very good in spite of all kinds of efforts to help? No. Only old-fashioned hands-on training has shown any benefit as far as I know. Are people any better at programming their VCRs today than 10 years ago? Or at fixing their cars themselves? Or using any advanced features of any desktop applications? No. Did students get better at using paper card catalogs in libraries? Most often, the inherent technology gets better but the generic user stays at a certain expertise plateau.

Dave is right, of course. But I think I am too: we don't need to program our VCRs to survive and prosper. But we increasingly will have to become better users of information in order to survive and prosper. A century ago doctors weren't especially effective at finding information. Today they'd have to be considered pretty advanced searchers because their field demands it. I think the same thing will happen to many other fields, and the information explosion will certainly spawn some new fields that are all about finding, synthesizing and consuming quality information.

Dave's response to my second prediction:

I hope the "presentation and organization of search results will improve" but again I don't see any reason to expect any huge leap there. I theorize that what we really need are good techniques to reprocess results lists without lots of typing or having to know lots about the data itself. Just think of the various ways people process and organize stuff in their homes: shopping lists, mail, books, laundry, photos, tools in the garage, bills, etc. These same paradigms will be the basis of whatever evolves in IR. So I envision ways to more easily ask to see results by most current or oldest, or those that contain some additional criteria (such as filtering a set of Vancouver hotels by price first, then by distance from some venue, etc.), or by color or size, or whatever... without having to type everything. I think this is much less a UI problem. It may be that the biggest leap in IR is partially an ergonomic one combining truly dependable voice recognition with smarter query parsing so I can just say "show me non-smoking hotels in Vancouver under $200 a day within 15 minutes walking distance of the ferry dock". "And I mean Vancouver, Canada, not Washington." I believe the natural language folks are making good progress in the query parsing part (but it still requires lots of memory and CPU cycles); good enough voice recognition is probably a ways out.
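Dave's idea of reprocessing a results list, rather than retyping the query, can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions: the `Hotel` record, its fields, the `refine` helper, and the sample data are all hypothetical, and the lambda filters stand in for whatever clicks, menus, or spoken commands a real interface might offer. Each refinement narrows the list already in hand, and sorting on a date field gives the "most current or oldest" view.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Hotel:
    # Hypothetical result record; fields chosen to match Dave's example.
    name: str
    city: str
    country: str
    smoking: bool
    price_per_day: int          # US dollars
    walk_minutes_to_dock: int
    listed_on: date

def refine(results, *criteria):
    """Apply each criterion in turn to the results already retrieved."""
    for keep in criteria:
        results = [r for r in results if keep(r)]
    return results

# Made-up sample results for a query like "hotels in Vancouver".
results = [
    Hotel("Harbour Inn", "Vancouver", "Canada", False, 180, 10, date(2001, 7, 1)),
    Hotel("Seaside Lodge", "Vancouver", "Canada", True, 150, 5, date(2001, 6, 1)),
    Hotel("Grand Plaza", "Vancouver", "Canada", False, 260, 12, date(2001, 8, 1)),
    Hotel("Riverside Motel", "Vancouver", "USA", False, 90, 8, date(2001, 5, 1)),
    Hotel("Quay Hotel", "Vancouver", "Canada", False, 120, 14, date(2001, 8, 5)),
]

# "Non-smoking hotels in Vancouver under $200 a day within 15 minutes
#  walking distance of the ferry dock", applied one refinement at a time.
shortlist = refine(
    results,
    lambda h: h.country == "Canada",        # Vancouver, Canada, not Washington
    lambda h: not h.smoking,
    lambda h: h.price_per_day < 200,
    lambda h: h.walk_minutes_to_dock <= 15,
)

# Re-sort the same shortlist, newest listing first, with no new query.
newest_first = sorted(shortlist, key=lambda h: h.listed_on, reverse=True)
print([h.name for h in newest_first])  # ['Quay Hotel', 'Harbour Inn']
```

Because every step operates on the list already retrieved, the user never has to restate the whole query to try a different ordering or drop one criterion.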

Here's why I'm more optimistic, at least for small leaps rather than a huge one: as with retrieval, there are many existing tools and approaches that can improve presentation. But unlike retrieval, presentation is something that users can interact with directly. Put another way: we're blissfully ignorant of what we're missing when a search algorithm misses out on some useful content. But we're more likely to notice a really dumb ranking order or a silly set of content elements that are displayed for each hit. So there will be more impetus to intelligently combine and configure results presentation tools than retrieval algorithms.

Anyway, I really appreciate Dave's thoughts; any other takers?
