Sep 20, 2002: Yet More on the 80/20 Rule and IA

I'm giving a brief panel presentation at the University of Michigan Business School this afternoon. Another riff on Pareto's Principle. Did you know that his first name was Vilfredo? If you'd like a gander at my PowerPoint slides, they're at http://www.louisrosenfeld.com/presentations/020920-umbs.ppt (868 KB).

Comment: vanderwal (Sep 20, 2002)

Very nicely done Lou.

Comment: Rich Wiggins (Sep 22, 2002)

Re Pareto (and Zipf and Bradford) -- Some of you may have seen a thread on Sigia-l about the Best Bets-style service that we built at Michigan State. I've always been a big believer in search log analysis but only after I built a Best Bets service did I see how steep the curve is. Out of a sample of 200,000 searches, it only takes 500 unique phrases to account for 40% of all searches performed.
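For anyone who wants to check this kind of figure against their own logs, here is a minimal Python sketch of the calculation. It is not the Michigan State code; the function name `coverage_of_top_phrases`, the normalization (trimming whitespace and lowercasing) and the tiny `sample_log` are all assumptions made for illustration.

```python
from collections import Counter


def coverage_of_top_phrases(queries, top_n):
    """Return the fraction of all searches covered by the top_n most common phrases.

    Assumption: two queries count as the same phrase if they match after
    trimming whitespace and lowercasing. Real logs may need more cleanup.
    """
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / total


# Hypothetical sample data, one search per entry.
sample_log = [
    "campus map", "library hours", "campus map", "Campus Map ",
    "parking", "library hours", "email", "campus map",
]

for n in (1, 2, 3):
    share = coverage_of_top_phrases(sample_log, n)
    print(f"top {n} phrases cover {share:.0%} of searches")
```

Run against a real log with `top_n=500`, the same function would give the share of searches that the 500 most common phrases account for.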

As you can see, the curve is even more asymptotic than in Lou's lovely Powerpoint. What blew my mind is that the curve from my real-world data matches the Zipf curve quite precisely:


I am looking for folks who are willing to share their curves. (Hmmm, that sounded funny...) How do these curves vary for different information spaces? I can share a log analysis script written in Perl if that'd help.
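To compare curves across sites, one rough approach is to line up each phrase's observed share of searches against the share an idealized Zipf distribution would give the same rank. The Python sketch below is not the Perl script mentioned above. It assumes the simplest form of Zipf's law (frequency proportional to 1/rank, exponent 1), and the helper names are hypothetical.

```python
from collections import Counter


def observed_shares(queries):
    """Share of all searches for each unique phrase, most frequent first."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def zipf_shares(n, exponent=1.0):
    """Shares predicted for ranks 1..n if frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def compare_to_zipf(queries, exponent=1.0):
    """Return (rank, observed share, Zipf share) rows for side-by-side comparison."""
    observed = observed_shares(queries)
    expected = zipf_shares(len(observed), exponent)
    ranks = range(1, len(observed) + 1)
    return list(zip(ranks, observed, expected))


# Hypothetical sample: "a" searched twice, "b" once.
for rank, obs, exp in compare_to_zipf(["a", "a", "b"]):
    print(f"rank {rank}: observed {obs:.3f}, Zipf {exp:.3f}")
```

Plotting the two columns on log-log axes is the usual way to judge the fit by eye. A different `exponent` may match a given information space better than 1.0.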

Comment: Iñigo Arbildi (Sep 24, 2002)

Hi, Lou & Blougers!
I read about Pareto's principle applied to information retrieval and it rang a bell... Have a look at http://firstmonday.org/issues/issue7_7/bates/, where Marcia J. Bates makes a good point, reminding us of the Bradford Distribution (an application of Pareto's to the Library Sciences) in information retrieval. BTW, her article is a good summary of common dotcom mistakes in IR.
There is a good explanation of the Bradford Distribution at http://www.aslib.co.uk/jdoc/1998/jun/06.html

Comment: Ron Zeno (Sep 27, 2002)

Thank you for the information and references on Pareto's Principle. For a long time I've wondered where the "80/20 rule" came from and whether or not it was meaningful. I'd always suspected that, like George Miller's 7±2, it was a questionable research result that was being completely misapplied to make arguments appear more valid.

Now I know I was wrong. Miller's research results actually have obscure but valid applications. Pareto's appears to have none at all - it is just a simple rhetorical device.

Comment: Lou (Sep 27, 2002)

Actually, I'd disagree slightly. Pareto's is a rhetorical device *and* has valid applications in many fields.

Comment: Lou (Sep 27, 2002)

Iñigo, thanks for the cites. We actually had some good discussion around the Bates article right here in Bloug in July. A better URL for the Bates article is http://www.firstmonday.org/issues/issue7_7/bates/ ; and the Bloug discussion is available at http://louisrosenfeld.com/home/bloug_archive/000100.html . I'm looking forward to reading the Bradford article.

Comment: Iñigo Arbildi (Sep 30, 2002)

Ooops! It seems that I didn't take much care copying/pasting the URL ;-) Sorry about that. Of course, Lou was already aware of both the article and the right URL. Thanks, Lou! I really enjoyed your previous discussions about the Bates article.

Comments are now closed for this entry.

Comment spam has forced me to close comment functionality for older entries. However, if you have something vital to add concerning this entry (or its associated comments), please email your sage insights to me (lou [at] louisrosenfeld dot com). I'll make sure your comments are added to the conversation. Sorry for the inconvenience.