
Jun 12, 2002: Crazy. Brilliant. Reasonable?

Rick Starbuck, an interactive design specialist from Oakland, attended our information architecture seminars in San Francisco. Suitably inspired, he poses a very interesting search-related question:

On a search results page, adding a blurb that says something like "the following documents also contained words you DID NOT search on. Select a word from the list to narrow your results." The feature would merely find the most common words (top five, top ten?) across the entire set of results, excluding the words originally searched for, and display them in a dropdown list or similar control. Do you know anyone who is doing this? Does it sound reasonable/crazy/brilliant?

I don't know, maybe brilliant in a reasonably crazy way. It's definitely a neat idea. The problem is that in some cases, the most common words are not at all useful. For example, "education" would show up way too frequently in an education-related site. But there probably are situations where this approach could be very helpful, and in the education example, it would work if you were willing to do the manual work of identifying and adding "too common" terms (like "education") to your stopwords list.
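As a rough illustration of Rick's idea, here is a minimal Python sketch. It assumes the result set is available as a list of plain-text strings. The names `suggest_terms`, `tokenize`, `STOPWORDS`, and `extra_stopwords` are made up for this example and do not come from any particular search engine. It counts how many result documents each word appears in, skips the query terms and stopwords, and returns the most frequent words.

```python
import re
from collections import Counter

# Hypothetical base stopword list; a real site would use a much longer one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def suggest_terms(results, query, extra_stopwords=(), top_n=5):
    """Return the top_n words found in the most result documents,
    excluding the query terms and any stopwords."""
    excluded = STOPWORDS | set(extra_stopwords) | set(tokenize(query))
    counts = Counter()
    for text in results:
        # Count each word once per document (document frequency).
        counts.update(set(tokenize(text)) - excluded)
    return [word for word, _ in counts.most_common(top_n)]


results = [
    "Information architecture for education sites",
    "Education and building architecture",
    "Information design in education",
]
print(suggest_terms(results, "architecture", top_n=2))
# Adding the site's "too common" term to the stopwords removes it.
print(suggest_terms(results, "architecture", extra_stopwords={"education"}, top_n=1))
```

The `extra_stopwords` argument is where the manual work described above would go: each site would have to maintain its own list of "too common" terms.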

Has anyone encountered this before, or have any reactions?

(I'll now brace myself for the inevitable responses from LIS comrades, who'll say something along the lines of "Dialog launched this feature back in 1981. Lou, were you sleeping on that day of Online Searching 501 as well?" And my answer will be, unfortunately, yes.)


Comment: Prentiss Riddle (Jun 12, 2002)

At this point I think we reach the Garden of Forking Paths: there are a number of follow-up questions one might reasonably add to search results, and it becomes a matter of usability research to decide which of them is actually beneficial to a given audience within a given domain. (All of these features, by the way, might be desirable for power users, but power users don't necessarily need this kind of after-the-fact "Now would you like to do X?" help -- they need a rich set of features at their fingertips from the outset.)

To wit:

-- Narrow your search by adding frequent terms from the result (AND)

-- Narrow your search by excluding frequent terms from the result (NOT)

-- Broaden your search with synonyms from a thesaurus (OR)

-- Broaden your search by stemming your search terms (wildcards and OR)

-- Broaden your search by cross-referencing results to a subject classification

-- Narrow your search by cross-referencing results to subcategories in a subject classification

-- Broaden (or is it narrow?) your search with a "people who liked this also liked that" mapping

-- Broaden (or is it narrow?) your search by cross-reference to other metadata (author/creator, etc.)

-- etc.

Rick's suggestion sounds like a good one, but only one of many possibilities. Some of these may be obviously better or worse in certain domains (e.g., in a book or music search you want to offer a branch to author or artist). Is any of them clearly superior in a "generic" web search? Dunno.

Comment: Rich Wiggins (Jun 12, 2002)

It's clever and it's out-of-the-box thinking, but in practice it's a bad idea. Common words in the set of documents on the hit list just don't equate to a good handle for anything.

Far more useful to offer:

-- People who liked the documents in this hit list also liked [these documents] (i.e., à la Amazon with similar books)

-- The keywords you typed are often used with these [other keywords]

Either of those would require a lot of trapping of user behavior but would be useful.
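To make the second suggestion concrete, here is a minimal sketch. It assumes the "trapped" user behavior is simply a log of past query strings. The function name `related_keywords` and the log format are invented for this illustration. It tallies which other keywords appear in the same logged queries as a given keyword.

```python
from collections import Counter


def related_keywords(query_log, keyword, top_n=3):
    """Return the top_n keywords that most often share a logged query
    with the given keyword."""
    keyword = keyword.lower()
    counts = Counter()
    for query in query_log:
        terms = set(query.lower().split())
        if keyword in terms:
            # Count every other term that was typed alongside the keyword.
            counts.update(terms - {keyword})
    return [term for term, _ in counts.most_common(top_n)]


query_log = [
    "information architecture",
    "architecture school",
    "information architecture books",
    "dog food",
]
print(related_keywords(query_log, "architecture", top_n=1))
```

A real system would also need to decide how far back the log should go and how to treat rare or misspelled queries. The sketch ignores both.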

Notice how well Google is doing spelling correction these days? Hint: it's not an English dictionary.

Comment: ssn (Jun 13, 2002)

I think that what you are describing is a "query expansion technique", known as "relevance feedback".

I found this PowerPoint presentation that addresses the subject:

Comment: Andrew (Jun 13, 2002)

You could also offer: "these are the five (or ten) most common searches on the site." Or, don't even call them "searches". Create dynamic links based on that day's (or week's or whatever) popular topics. "Make search results" pages for these that aren't just lists of links, but rather lists of links with short text summaries for each. Now your searchers are creating pages that can be helpful.

Comment: Andew Gilmartin (Jun 13, 2002)

How do the search results relate to what you did not find? Simple structure information like "you found 5 of the 100 level 1 documents and 10 of the 2,500 level 2 documents" or content information like "you found 5 documents that contain critical terms found in only 15% of the overall collection."
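The structural version of this could be sketched as follows. It assumes every document in the collection carries a "level" label and that the search returns document IDs. The function `coverage_by_level` and the dictionary layout are hypothetical.

```python
from collections import Counter


def coverage_by_level(collection_levels, hit_ids):
    """Map each level to (documents found, documents in the collection)."""
    totals = Counter(collection_levels.values())
    found = Counter(collection_levels[doc_id] for doc_id in hit_ids)
    return {level: (found[level], totals[level]) for level in sorted(totals)}


# Hypothetical collection: document ID -> level.
collection_levels = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
for level, (found, total) in coverage_by_level(collection_levels, ["a", "c", "d"]).items():
    print(f"you found {found} of the {total} level {level} documents")
```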

Comment: Bo Madsen (Jun 13, 2002)

It should be possible to implement the idea by utilizing the Google API (http://www.google.com/apis). Developers: show us what you can do!

Note: Google has released an API that lets developers interact with Google's databases and applications.

Comment: Matt Clarke (Jun 24, 2002)

If the result set of an initial search is large, what terms would you like to appear on the screen to help narrow the search? It is not the words that are most common throughout the result set, but the terms that maximally differentiate between clusters within the result set.

e.g. I search for "architecture" and get a huge result set. The search engine does a cluster analysis and finds that a sizable subset contain the word "information" and a largely disjoint subset contain the word "building". So those two words are offered to the user as a way of restricting the results to the type of architecture they are looking for.
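A much-simplified sketch of this idea follows. It is not a real cluster analysis. It only looks for the pair of terms that each cover a sizable share of the results while overlapping as little as possible, which is enough to reproduce the "information" versus "building" example. The function name `differentiating_pair` and the `min_share` threshold are assumptions made for this illustration.

```python
import re
from itertools import combinations


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def differentiating_pair(results, query, min_share=0.3):
    """Return the two terms whose result subsets are each sizable and
    overlap the least, or None if no such pair exists."""
    query_terms = tokenize(query)
    docs_by_term = {}
    for index, text in enumerate(results):
        for term in tokenize(text) - query_terms:
            docs_by_term.setdefault(term, set()).add(index)

    # Keep only terms that appear in a sizable share of the results.
    min_docs = min_share * len(results)
    sizable = {t: d for t, d in docs_by_term.items() if len(d) >= min_docs}

    best, best_score = None, None
    for a, b in combinations(sorted(sizable), 2):
        overlap = len(sizable[a] & sizable[b])
        coverage = len(sizable[a] | sizable[b])
        # Prefer low overlap first, then high combined coverage.
        score = (overlap / coverage, -coverage)
        if best_score is None or score < best_score:
            best, best_score = (a, b), score
    return best


results = [
    "information architecture for the web",
    "information architecture and navigation",
    "building architecture in Chicago",
    "building architecture and design",
]
print(differentiating_pair(results, "architecture"))
```

Unlike a plain frequency count, this passes over "and" even though it is as common as the other two words, because its documents overlap with both subsets.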

Comment: Ron Lusk (Jul 1, 2002)

AltaVista in its early days had some sort of search enhancement applet which could graphically show you the clusters of words on pages you had found. For an off-the-top-of-my-head example, a search on "beagle" might give you a cluster of pages with recurring words like "labrador", "dog", "feeding", "howling"; and another (much smaller) cluster with "legal", "shyster", "attorney", and so forth, reflecting the word's use in different domains.

Comment: ssn (Jul 9, 2002)

Teoma tries to explore semantic networks of hubs and authorities.

Comment: Matt Clarke (Jul 30, 2002)

For anyone still following this discussion, have a look at http://vivisimo.com. Their "clustering engine" does just about what I suggested above.

Comment: mss@interchange.ubc.ca (Aug 4, 2002)


Search engine capabilities

Comments are now closed for this entry.

Comment spam has forced me to close comment functionality for older entries. However, if you have something vital to add concerning this entry (or its associated comments), please email your sage insights to me (lou [at] louisrosenfeld dot com). I'll make sure your comments are added to the conversation. Sorry for the inconvenience.