Sep 01, 2006: Guidelines for the use of search query data

I imagine that, by now, you've heard of the recent AOL search data fiasco. AOL researchers released millions of queries from hundreds of thousands of searchers for research purposes (a really Good Thing). Unfortunately, the queries weren't properly scrubbed, jeopardizing the privacy of many of those searchers (a really, really, really Bad Thing).

Naturally, this snafu has raised red flags among many privacy advocates (not to mention people writing books on search analytics). Front page coverage in the New York Times has a way of drawing attention to an issue, and not surprisingly, important people have taken notice. US Congressman Edward Markey has renewed his call for legislation to limit and in certain cases prohibit the use of search query data.

Like any new tool, search analytics is a double-edged sword. The AOL debacle—and the subsequent invasion of thousands of individuals' privacy—represents one of the worst possible outcomes of keeping query data around. But before we throw out the baby with the bathwater, we need to consider the good that search analytics can bring.

For example, public health could suffer if the Centers for Disease Control and the National Institutes of Health were prevented from analyzing users' search queries. Those institutions would lose an effective tool for diagnosing problems with their content, metadata, and search systems, reducing the ability of tens of thousands of people to find authoritative health information on these sites.

That's just the tip of the iceberg, so let's stop for a moment, take a deep breath, and consider reasonable solutions before draconian ones. Like any other tool, search analytics isn't inherently good or bad. We simply haven't yet had a dialog on the appropriate use of query data. Rather than jumping straight to legislation, let's start with guidelines. Before we invoke the legal system, let's first try self-policing and careful consideration, see what we can learn, and do our best to make improvements. Then, with the benefit of experience, we'll be better prepared to determine whether legislation is indeed necessary.

My co-author, Rich Wiggins, has taken a stab at what might ultimately constitute a rational policy for the use of search query data. Please have a look and consider adding your own comments.

