

Dec 11, 2001: Dreaming of Links

This actually came to me in a dream. I was sitting in my childhood home in Katonah, New York, talking with none other than Peter Merholz and Jesse James Garrett (don't worry, guys, you are definitely not regulars in my dreams). I don't remember why we were there or what we were discussing, but this came to me:

The humble hyperlink is really quite a useful string of text. It's not unlike a user's search query. In fact, it often stands in for a user's query. And, if I can use the term loosely, it's also a form of author-supplied indexing. At the same time! And authors are much happier to create links within their content than to index it the old-fashioned way. That means there are lots of rich links out there that we might take advantage of.

So, a hyperlink is both:

  • a form of indexing, as determined by content authors; and
  • a query, when selected by a user.

But it gets better: a link can be a key to not one but two contexts:

  • the content it represents (i.e., where it goes to); and
  • the content where it occurs (i.e., its starting point).
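To make the link's double life a bit more concrete, here's a minimal Python sketch. It assumes a link is nothing more than its anchor text plus the page it sits on and the page it points to; the `Link` class, the naive tokenizer, and the page names are all hypothetical. Filing each anchor term under the link's destination is the "author-supplied indexing" side, running the same anchor text back through that index is the "query" side, and the source and target fields are the two contexts.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str  # the page where the link occurs (its starting point)
    target: str  # the page it points to (its destination)
    anchor: str  # the link text the author wrote


def tokenize(text: str) -> list[str]:
    """Deliberately naive: lowercase and split on whitespace."""
    return text.lower().split()


def build_anchor_index(links: list[Link]) -> dict[str, set[str]]:
    """Author-supplied indexing: file each anchor term under the link's destination."""
    index: dict[str, set[str]] = defaultdict(set)
    for link in links:
        for term in tokenize(link.anchor):
            index[term].add(link.target)
    return dict(index)


def run_as_query(link: Link, index: dict[str, set[str]]) -> set[str]:
    """The same anchor text treated as a user's query: pages indexed under every term."""
    results: set[str] | None = None
    for term in tokenize(link.anchor):
        docs = index.get(term, set())
        results = set(docs) if results is None else results & docs
    return results or set()


def contexts(link: Link) -> tuple[str, str]:
    """The two contexts a link keys into: (starting point, destination)."""
    return link.source, link.target


links = [
    Link("a.html", "b.html", "bungee jumping"),
    Link("c.html", "b.html", "jumping safety"),
    Link("a.html", "d.html", "job postings"),
]
index = build_anchor_index(links)
print(sorted(index["jumping"]))       # ['b.html']
print(run_as_query(links[2], index))  # {'d.html'}
print(contexts(links[0]))             # ('a.html', 'b.html')
```

A real system would obviously need stemming, stop words, and a lot more links, but the shape is the same: one string, indexed by an author and reused as a query.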

Can we derive benefit from these characteristics? We already know that the hyperlink creates a meaningful connection between two documents (the starting point and the destination). So could we do something like this:

  1. User clicks on link.
  2. Link retrieves destination document.
  3. User clicks on handy new "MORE" button in browser.
  4. Link is then executed as a search query against a collection of documents that are similar to the original "destination" document. Or similar to the "starting point" document. Or both.
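Here's one way the MORE steps might be prototyped, as a rough Python sketch under some big assumptions: documents are plain bags of words, "similar" means cosine similarity of raw term counts, a collection is just the top-k most similar pages, and "or both" is read as the union of the two collections (intersection would be the stricter choice). The `more()` function, its `use` parameter, and the toy corpus are made up for illustration.

```python
from __future__ import annotations

import math
from collections import Counter


def vector(text: str) -> Counter:
    """Bag-of-words term counts (naive whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_collection(seed_id: str, corpus: dict[str, str], k: int) -> set[str]:
    """Collection A or B: the k pages most similar to the seed page (seed excluded)."""
    seed = vector(corpus[seed_id])
    scored = sorted(
        ((cosine(seed, vector(text)), doc_id) for doc_id, text in corpus.items() if doc_id != seed_id),
        reverse=True,
    )
    return {doc_id for score, doc_id in scored[:k] if score > 0}


def more(anchor: str, source_id: str, target_id: str, corpus: dict[str, str],
         k: int = 3, use: str = "both") -> list[str]:
    """Hypothetical MORE button: rerun the anchor text as a query, restricted to pages
    similar to the destination ("destination"), the starting point ("source"), or both."""
    pool: set[str] = set()
    if use in ("destination", "both"):
        pool |= similar_collection(target_id, corpus, k)
    if use in ("source", "both"):
        pool |= similar_collection(source_id, corpus, k)
    pool -= {source_id, target_id}  # the user has already seen these two
    query = set(anchor.lower().split())
    hits = [(sum(vector(corpus[d])[t] for t in query), d) for d in pool]
    # Rank by how often the query terms occur; drop pages that don't match at all.
    return [d for score, d in sorted(hits, key=lambda h: (-h[0], h[1])) if score > 0]


corpus = {
    "a": "extreme sports guide bungee jumping skydiving",
    "b": "bungee jumping tips cords harness",
    "c": "bungee jumping locations bridges",
    "d": "skydiving schools extreme sports",
    "e": "job postings bungee jumping instructor wanted apply today",
}
print(more("bungee jumping", "a", "b", corpus, k=2))  # ['c']
```

Page e mentions "bungee jumping" too, but it's a job ad. With k=2 it doesn't make it into the pool the anchor text is searched against; widen the destination collection to k=4 and it creeps back in. So the size of collections A and B is what does the disambiguating here.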

So, if we assume that a link is a query that moves us from Document 'a' to Document 'b', perhaps we can extrapolate, using the same link to create Document Collection 'A' and Document Collection 'B'? Can we use those new collections to create context and reduce ambiguity as we continue our search for that nice string of text embedded in the original link?

Obviously this approach won't work well when clicking on a link as generic as "home page." But it might when clicking on a more precise link like "bungee jumping." It might even help with a fairly generic link like "job postings," where it would be nice to narrow the set of possible search results by providing more context.

Anyway, it's probably either 1) been tried before and proven to be a really dumb idea, or 2) so dumb an idea that it never was tried. But heck, it came to me in a dream, and if it's really so dumb, Peter and Jesse deserve at least part of the blame.


Comment: Prentiss Riddle (Dec 18, 2001)

Congratulations on a productive dream! (Seen "Waking Life" yet?) Your idea does sound familiar, in a couple of ways. It sounds like you've merged the Google idea (i.e., the existence and context of a link provide useful information about the quality and relevance of the object linked to) with the Alexa/Amazon "more like this" idea.

In fact, Google does have a "similar pages" feature -- see for example:

http://www.google.com/search?hl=en&num=10&q=related:www.louisrosenfeld.com/

...but this doesn't really reflect the "directional" information in a link. Are you sure you want to use the link from a to b to create two separate document collections A and B? Maybe what you really want is to create a single collection AxB, which might be translated as "the set of documents resembling document b which are linked to by documents resembling document a". I have no idea whether this might or might not be useful, but it's certainly intriguing.
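The AxB collection can be sketched in a few lines as well. This is a toy Python version under loud assumptions: "resembling" means Jaccard word overlap at or above an arbitrary threshold (0.3 here), a page counts as resembling itself, and the link graph is a plain list of (source, target) pairs. Every name and document in it is hypothetical.

```python
from __future__ import annotations


def terms(text: str) -> set[str]:
    """Naive word set: lowercase, split on whitespace."""
    return set(text.lower().split())


def jaccard(x: set[str], y: set[str]) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def resembling(seed_id: str, corpus: dict[str, str], threshold: float) -> set[str]:
    """Pages whose word overlap with the seed meets the threshold (the seed included)."""
    seed = terms(corpus[seed_id])
    return {doc_id for doc_id, text in corpus.items() if jaccard(seed, terms(text)) >= threshold}


def a_x_b(a: str, b: str, corpus: dict[str, str],
          edges: list[tuple[str, str]], threshold: float = 0.3) -> set[str]:
    """Documents resembling b that are linked to by documents resembling a."""
    A = resembling(a, corpus, threshold)
    B = resembling(b, corpus, threshold)
    # Keep the direction of each link: source must be in A, target must be in B.
    return {target for source, target in edges if source in A and target in B}


corpus = {
    "a": "extreme sports guide",
    "a2": "extreme sports news",
    "z": "cooking recipes guide",
    "b": "bungee jumping tips",
    "b2": "bungee jumping safety tips",
    "b3": "bungee jumping humor",
}
edges = [("a", "b"), ("a2", "b2"), ("z", "b3")]
print(sorted(a_x_b("a", "b", corpus, edges)))  # ['b', 'b2']
```

Page b3 resembles b just fine, but the only page pointing at it is a cooking page that doesn't resemble a, so it drops out. That directional filter is what separates AxB from a plain "more like b" list.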

I wonder how one tests interesting ideas like this. Certainly one could start from scratch and build one's own spider and search engine, but there's a pretty high entry cost to collecting a significant-sized corpus. I imagine that Google and AltaVista and Inktomi, etc., must have a skunkworks for testing new search methods, but I doubt that they provide access to outsiders. Is there a similar testbed in an open academic environment? The closest thing I can think of is that the Internet Archive makes their 100-terabyte collection available to researchers, but last I knew it wasn't optimized for real-time access methods other than their own Wayback Machine. Sounds like a general experimental web IR platform would be a useful thing for some university to start up somewhere, with a good chance that it would pay for itself in technology transfer down the line.

