
Nov 11, 2003: Search Log Analysis Tools

On November 19 I'll be giving a short talk on search log analysis at the Southeastern Michigan UPA meeting here in Ann Arbor. (It'll be followed by an AIfIA F2F meeting; come to both!)

I'm usually pretty surprised at how few UX people are even aware of search log analysis, much less understand its value as a user research technique, so I'll introduce it and run attendees through an exercise. If you want to come, it's free for members, $8 for non-members; 6:30pm at Soar Technology (3600 Green Court, Suite 600); RSVP to uid@compuware.com.

I'm pretty ignorant about what are considered the best tools for generating search log reports. I've asked the opinion of one of the world's leading experts, Avi Rappoport of SearchTools.com, but she's pretty frustrated by what's out there, which makes me pessimistic.

But before we send up the white flag, it'd be nice to ask around a bit more. I'd love for people to comment on what reporting tools (remember, for search log analysis) they love, like, or tolerate; perhaps we can collect some communal knowledge. I'll also be poking around, and will share what I find here.


Comment: Michael (Nov 11, 2003)

My organization mainly uses Perl to create detailed reports. As a digital library using the charge-back model, we need to track usage heavily along various dimensions. Common usage tracking is per person, department, business unit, and subscription (e.g. publication).

For intranet sites, implementing an authentication process hooked into HR databases should be a no brainer, especially if you have to charge for information services. Then the brunt of the work is getting a programmer who knows how to parse huge amounts of text and generate reports periodically on these dimensions so you can show your management the revenue per product, usage over time, etc. This seems a necessary measure to determine, for example, where dollars can be saved or where products need to be promoted.
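The kind of per-dimension rollup described above can be sketched in a few lines; this is a minimal illustration (not Michael's actual Perl), assuming a hypothetical tab-separated log with user, department, and resource columns:

```python
from collections import Counter

def usage_by_dimension(log_lines, field_index):
    """Count log entries per value of one dimension (e.g. department)."""
    counts = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > field_index:
            counts[fields[field_index]] += 1
    return counts

# Hypothetical log format: user <TAB> department <TAB> resource
log = [
    "alice\tchemistry\tjournal-a",
    "bob\tchemistry\tjournal-b",
    "carol\tlaw\tjournal-a",
]
by_dept = usage_by_dimension(log, 1)  # department is column index 1
```

The same function run with a different column index gives per-person or per-subscription counts, which is what makes charge-back reporting over several dimensions cheap once the log format is fixed.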

Visualizing your log data also helps in making sense of how your users understand the organization of your site. For this kind of work we pass our huge monthly Apache logs through GraphViz to get a sense of how people are moving around from node to node on our site.
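As a rough sketch of that log-to-GraphViz idea (the record format here is hypothetical, assuming referrer-to-page pairs have already been parsed out of the access log), one can emit DOT text that the dot or neato tools will render:

```python
from collections import Counter

def transitions_to_dot(records):
    """records: iterable of (referrer_path, requested_path) pairs.
    Returns GraphViz DOT text with transition counts as edge labels."""
    edges = Counter(records)
    lines = ["digraph site {"]
    for (src, dst), n in sorted(edges.items()):
        lines.append('  "%s" -> "%s" [label="%d"];' % (src, dst, n))
    lines.append("}")
    return "\n".join(lines)

dot = transitions_to_dot([
    ("/", "/search"),
    ("/", "/search"),
    ("/search", "/results"),
])
```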

Perhaps the problem with finding good search log analysis tools is that they offer only a limited set of descriptions of user interactions, without much analysis. You only get so much information from a web server log. I think part of the problem in getting good data out for reports is that you need to be tracking a lot more than just hits to pages. You can only describe what you are actively watching. It reminds me of the saying "garbage in, garbage out" in some ways.

Comment: Gene (Nov 11, 2003)

I typically work with Verity Ultraseek, which will give you a query term frequency report. I import that into Access so I can query and filter the data, and then export it to Excel if I need a chart.

Most often, I'll give my client a list of the top 100 or 300 unique query terms. I usually check for synonyms or search concepts manually, which slows things down quite a bit. This step could be automated if my clients used a controlled vocabulary. However, they're often surprised to find that people search for, say, "H2S" and "hydrogren sulfide" instead of "hydrogen sulphide." Go figure ;)

You've probably seen this article Lou, but I thought I'd include a link anyway:

Better Search Engine Design: Beyond Algorithms

Comment: Nick (Nov 11, 2003)

I have used a variety of tools, from WebTrends and BridgeTrack to obscure shareware log analysis tools. My experience has been that I "tolerate" these tools. Nothing I have seen produces reports organized and labeled exactly as I would expect. There has also been a lot of variation between these tools' results and the raw log files. What qualifies as a "visit"? Often a single session shows up as multiple visits; hits get conflated with page views; and so on. The metrics are improperly implemented in most tools, even the high-end ones. So my hack solution is to use a combination of three offline tools, one server tool, and one router tool. It gets the job done, but it involves a larger time commitment to analyze the five sources of data and come up with a solid set of results.

Comment: Ben (Nov 12, 2003)

We use a home-grown tool - every hit to the search page saves the searching user, term, # of hits returned, and each result clicked to a database. We can then run raw data, frequency, and session reports. From any of those, we can drill-down to report on each individual term ("foo" was searched for 37 times, each time with 12 results returned; "foo.doc" was clicked on 35 times as a result - we use that last bit to help set Best Bets for search terms).
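A home-grown setup like the one Ben describes can be sketched with a small relational schema; the table and column names below are hypothetical, not Ben's actual design, shown here with SQLite for self-containment:

```python
import sqlite3

# Hypothetical schema: one row per search, one row per clicked result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE searches (
    id INTEGER PRIMARY KEY,
    user TEXT, term TEXT, num_hits INTEGER
);
CREATE TABLE clicks (
    search_id INTEGER REFERENCES searches(id),
    result TEXT
);
""")
conn.execute("INSERT INTO searches VALUES (1, 'u1', 'foo', 12)")
conn.execute("INSERT INTO searches VALUES (2, 'u2', 'foo', 12)")
conn.execute("INSERT INTO clicks VALUES (1, 'foo.doc')")
conn.execute("INSERT INTO clicks VALUES (2, 'foo.doc')")

# Drill-down: how often was each result clicked for a given term?
rows = conn.execute("""
    SELECT c.result, COUNT(*) FROM clicks c
    JOIN searches s ON s.id = c.search_id
    WHERE s.term = 'foo' GROUP BY c.result
""").fetchall()
```

A query like the last one is exactly the kind of report that feeds Best Bets: the most-clicked result for a term is a strong candidate.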

Comment: Walter Underwood (Nov 13, 2003)

I work on Verity Ultraseek as well as use it, and I wrote the query frequency report that Gene mentions. A few months ago, we added new reports, some based on clickthrough measurements (which result a user clicked on). We now have: top queries by frequency, top queries with no hits, a month-to-month trend report, and some traffic reports. For the top queries, we report the average number of results pages viewed and the average number of results clicked on. These measure user navigation actions: going to another page of results or clicking on a result. Together, they give a fair picture of how many actions a user takes to find an answer for each query.

Measuring "no hits" queries is important, because most users do not know how to modify their query and try again. If they get no hits, they are stuck, and they leave.
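Two of the reports described above (top queries by frequency, top no-hit queries) reduce to simple counting; a minimal sketch, assuming the search log has already been parsed into (query, hit_count) pairs:

```python
from collections import Counter

def query_reports(records, top_n=10):
    """records: (query, hit_count) pairs from a search log.
    Returns (top queries by frequency, top zero-hit queries)."""
    freq = Counter(q for q, _ in records)
    no_hits = Counter(q for q, hits in records if hits == 0)
    return freq.most_common(top_n), no_hits.most_common(top_n)

top, top_no_hits = query_reports([
    ("foo", 12), ("foo", 12), ("bar", 0), ("baz", 3), ("bar", 0),
])
```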

Query logs are gold, because they show the terms that your visitors use, instead of the ones that you use. Every other measurement on your site describes your links that use your words. This is where the visitors speak.

And I don't know of any good general tools for search log reports. Before we included tools in Ultraseek, we recommended using Awk or Perl on our tab-separated logfile.


Comment: Ken (Nov 13, 2003)

I do a lot of search log analysis -- we record the searches (and resulting click-throughs from the results list) in a text file. Much of what I do is in Excel. I've found that the most interesting log entries are those where there are 0 hits or "too many." "Too many," of course, is a relative thing -- for some of our searchable collections, it could be 20-30, for others, 200-300.

Excessive hits signify either that the query is impossibly vague or, often, that our indexing is not as careful as it could be (particularly useful in our main resource database, where we've written brief abstracts of the ~700 most useful resources). The 0-hits results are gold. They tell us, in the aggregate, what people are looking for that they can't find. And often we have it, just not under the "right" terminology. In that case, we update the indexing terms.

Also of interest are the searches that return a few results (that appear to my eye, anyway, to respond to the query) where the user doesn't click through to a result. I don't know what to make of those.

We also do query sorting -- stemming them to the first 7-10 characters using Excel (split columns), and then sorting the 'stemmed' list to find the most searched-for concepts, in a quick & dirty way.
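That quick-and-dirty prefix "stemming" is easy to reproduce outside Excel; a minimal sketch, truncating each query to its first seven characters and counting the groups:

```python
from collections import Counter

def prefix_groups(queries, width=7):
    """Group queries by a crude prefix 'stem' and count each group."""
    return Counter(q.strip().lower()[:width] for q in queries)

groups = prefix_groups([
    "taxonomy", "taxonomies", "Taxonomy software", "thesaurus",
])
most = groups.most_common(1)[0]  # the most searched-for 'concept'
```

It is crude (unrelated queries that share a prefix will collapse together), but as Ken says, it surfaces the most searched-for concepts quickly.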

Comment: James Robertson (Nov 13, 2003)

Hi Lou, this is something that I definitely include in all my seminars, and I rate it as one of the most useful *automatic* research processes that can be used on a site.

I typically recommend implementing two reports:

1. Most popular search terms - obviously the information that people are most interested in, perhaps more resources should be applied to these areas.

2. Failed search terms (searches that returned 0 hits) - these have a number of causes: there is no information available (write some?); the language & terminology used doesn't match (important for building a thesaurus; fix using search engine synonyms); implementation problems mean that content that does exist doesn't come up (fix the source problems); users can't spell (spell-check queries, or use synonyms).

Unfortunately, I haven't seen much in the way of reports built into search engine software, which is strange and disappointing. Thankfully, most provide raw logs which can then easily be manipulated using Perl, etc. (We've done this in the past.)

Cheers, James

Comment: Lou (Nov 16, 2003)

I'm posting this on behalf of Avi Rappoport, who tells me that her ^@#!$* server blew up. This is her list of things she'd like from a good search log report:
* How many searches were there for each week/month/quarter/year?
* What are the top 1% of queries? (a little clustering by stemming is a good thing here, if your engine stems for search)
* What are the top 10% of no-matches queries?
* What are the top 10% of low-matches queries? (one to 4 hits, or more if it's a big site)
* How many empty searches?
* How did these all change over the last week/fortnight/month etc.?
* How do the changes correlate to changes in the site, search engine, company profile?
* Are there any queries in the rest of the log that are showing significant increases over these periods?
* What are the patterns in the less-frequent queries -- are people looking for names? places? web site addresses?
* Note: If you have Search Zones, you need to create reports for each zone, which will change the no-matches information significantly.
* Note 2: after one fascinating event, in which there were 16,700 queries on the same offensive term in one hour, I think some session tracking is also in order...
* What were the top pages that got hits from the search results? What were the queries that sent them there?
* What queries sent users to the best pages in our site?
* If you have access to the web site log, and your query is a GET which includes the query in the URL, *and* the web site log includes referrer information, you can extract the last two above.
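That last extraction trick can be sketched briefly: if the referrer is a search-results URL carrying the query in a GET parameter, the query can be pulled back out of the web site log. This assumes a hypothetical parameter name of `q`; real engines vary:

```python
from urllib.parse import urlparse, parse_qs

def query_from_referrer(referrer, param="q"):
    """Return the search query embedded in a search-results referrer
    URL, or None if the referrer carries no such parameter."""
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None

# Hypothetical referrer field from a combined-format access log entry:
q = query_from_referrer("http://example.com/search?q=hydrogen+sulfide&page=1")
```

Joining that extracted query with the requested URL in the same log line gives "which queries sent users to which pages."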

Comment: Lou (Nov 16, 2003)

I think it's come up before, but this tool looks promising:


Anyone familiar with it?

Comment: Kyle (Nov 17, 2003)

I have heard great things about MondoSearch: http://www.mondosoft.com/mondosearch.asp, but it's only for the .net platform.

Comment: Martin (Nov 18, 2003)

I just have a note for Kyle's comment. It is correct that MondoSearch will work on a .Net platform - but it also works on systems not running .Net.

It can only be installed on Windows systems, but it will have no trouble indexing a site running from a Unix server.

More info on http://www.mondosoft.com :-)

Comment: Kyle (Nov 18, 2003)

I'm sorry, you are right, Martin. I just noticed that you only need the .Net framework if you want to install the InformationManager product (the product that is the link between MondoSearch and BehaviorTracking): http://www.mondosoft.com/ms-requirements-plug-and-play.asp

Comment: Ken (Nov 18, 2003)

I've been using Analog both professionally and personally since the mid 1990s. It's a fantastic tool for showing aggregate use of web sites or pages. Assuming your log files include referring URLs, you can generate nice reports of the searches that led your users to your site.

Analog is less useful for tracking individual user sessions -- what a specific user did, click by click, through your site. Webtrends handles that well, but is not free. Analog's a great tool for finding out how many users hit how many pages, which links are broken, who's referring to your site (at least, who's generating traffic), what people who find your site are looking for, and what they're downloading when they're there.

Comment: Lou (Nov 18, 2003)

Ken, does Webtrends allow you to separate out search-related data? And to do so in the context of individual sessions?

Comment: Ken (Nov 19, 2003)

I'm not sure about the search-related data in Webtrends; I only used it a few times, not recently, and I never upgraded it when I got a new version of Windows.

Webtrends is generally quite powerful at reporting "user sessions" -- through the use of cookies if your server is so configured, or by tracking IP addresses if not -- so I would think search queries and results would be one of its report options.

Comment: Wil Reynolds (Nov 20, 2003)

The good thing about using a log analysis tool to analyze your search is that you can track user behavior after the search.

Just analyzing the keywords people type in is tactical; analyzing the behavior of users AFTER the search is more strategic.

Are users that use your search more / less likely to take your desired result than the average user?

Are users that use search showing certain behaviors unlike those that do not use search?

Some of this may be TOO high-level, but I figured I would chime in with my little 2 cents!

Good luck everyone!

Comment: Azy Mazlita (Dec 1, 2003)

This reply might be a bit late for Lou, but just to share my thoughts with everyone. I do agree with Avi Rappoport that the search log analysis tools out there are disappointing. For a start, there are no specifically designed search log analysis tools that I am aware of. The search-analysis functionality is normally just an add-on to standard log analysis tools such as WebTrends, Analog, or Superstats, and it only points out "normal" observations such as which keywords bring users to your site via search engines. Research done by the Excite group (Amanda Spink, Jansen) looks at millions of query terms submitted by users to Excite. They used specially designed software to extract the average number of terms submitted, terms added or removed during reformulations, spelling errors, and boolean operators. You may want to refer to their publications for more details.

I became interested in search log analysis because I was frustrated by what's out there and believed there is other, more useful information that we can dig out of the logs. I proposed a model for search log data that enables me to extract interesting observations beyond the typical query-terms analysis, such as (1) how many links a user clicks before reformulating (useful if the search engine would like to suggest alternative terms), (2) how many reformulations occur in a session (too many reformulations of the same query subject might not be a good sign), or (3) the ranking of the results selected. From our current research, we found that users are more inclined to resubmit queries than to navigate the result lists (that is why Google is doing so well; people know that they can get to the right web site by resubmitting queries rather than by navigating the web).
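The per-session metrics in (1)-(3) can be sketched from an ordered event stream; the event format here is a hypothetical simplification of the model, not the actual one from the paper:

```python
def session_stats(events):
    """events: ordered (session_id, action, value) triples, where action
    is 'query' or 'click'. Counts queries and result clicks per session;
    reformulations = queries after the first one in a session."""
    stats = {}
    for session, action, _ in events:
        s = stats.setdefault(session, {"queries": 0, "clicks": 0})
        if action == "query":
            s["queries"] += 1
        elif action == "click":
            s["clicks"] += 1
    for s in stats.values():
        s["reformulations"] = max(s["queries"] - 1, 0)
    return stats

stats = session_stats([
    ("s1", "query", "ia"), ("s1", "click", "r3"),
    ("s1", "query", "information architecture"), ("s1", "click", "r1"),
    ("s2", "query", "search logs"),
])
```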

You may want to refer to http://www2003.org/cdrom/papers/poster/p061/p61-mathassan.html for further info. The full version of this paper has been submitted to publication and still under review, so can't say much more, sorry.

Hope that explains something.

Comment: Martin (Jun 15, 2004)

I previously replied to this thread, back then merely to correct a point regarding MondoSearch and its .NET integration (dated Nov 18, 2003).

However, the thread is really about log analysis and search analysis in general. Please don't see this as a feeble attempt at promoting Mondosoft's products ahead of other tools. When push comes to shove, it's the individual site and a tweaked setup that will determine whether one product is better than another.
Instead I would just like to add that a log analysis tool alone is certainly not going to give you all the information about what happens when a user enters your website...

Three main areas are important when investigating a user's actions:

1) Referral status:
Did the user come from a global engine, and what did he/she search for there? Collecting this information is important not only for SEO purposes, but also for making sure you reach the desired audience. (Collecting referring sites in general, search engines or not, can be crucial for a commercial website.)

2) User actions:
What pages is the user visiting on the site, and are they submitting contact forms, etc.?

3) Local searches:
Are users doing local searches? Do they find what they are looking for? And do you have a good tool to provide you with this data?
(If your commercial site doesn't have local search, you are missing out: http://www.useit.com/alertbox/20010513.html)

In other words, webserver log analysis alone will not do. In a perfect world you would only need one tool for all 3 areas - unfortunately the available tools are not quite there yet.

Speaking for myself, it is no secret that we use an analysis tool called HBX on http://www.mondosoft.com. This tool relies on some JavaScript variables being sent to a server in order to provide useful data. It collects keywords from the different global engines and other useful referral info. You can even set up campaigns in order to track whether your various ads around the web are generating ROI. All in all, useful information. BUT without information from the local search engine, we would not know if a visiting prospect was just passing by or actually interested in our products. That knowledge could turn cold calling into proactive sales!

The search analysis tool mentioned is called BehaviorTracking. Earlier it was only available to people who already had a MondoSearch search engine, but a so-called BT Connector has been made available that works with other search engines as well, giving you statistics that will tell you whether visitors find what they are looking for.
More info on http://www.behaviortracking.com

Sorry for the long read. Hope the information was useful and not too self-promotional ;-)
Just remember: relying on log analysis alone keeps you from knowing everything that is happening on your site...

Best regards

Comment: Paul (Jul 16, 2005)

I am using awp-hosting, and they have free search analysis tools that are very easy to use. Well, I guess you do have to find your way around a bit. Can anyone tell me: do all reports record visits the same way? I think not, since I noticed different reports show different visit counts. The numbers are close, though. If anyone wants to check out my art site, it is http://www.leasurefineart.com
You'll find realist oil paintings of the Morro Bay area.

Comments are now closed for this entry.

Comment spam has forced me to close comment functionality for older entries. However, if you have something vital to add concerning this entry (or its associated comments), please email your sage insights to me (lou [at] louisrosenfeld dot com). I'll make sure your comments are added to the conversation. Sorry for the inconvenience.