Sep 20, 2002: Yet More on the 80/20 Rule and IA

I'm giving a brief panel presentation at the University of Michigan Business School this afternoon. Another riff on Pareto's Principle. Did you know that his first name was Vilfredo? If you'd like a gander at my PowerPoint slides, they're at http://www.louisrosenfeld.com/presentations/020920-umbs.ppt (868 kbytes).

Comment: vanderwal (Sep 20, 2002)

Very nicely done Lou.

Comment: Rich Wiggins (Sep 22, 2002)

Re Pareto (and Zipf and Bradford) -- Some of you may have seen a thread on SIGIA-L about the Best Bets-style service that we built at Michigan State. I've always been a big believer in search log analysis, but only after I built a Best Bets service did I see how steep the curve is. Out of a sample of 200,000 searches, it takes only 500 unique phrases to account for 40% of all searches performed.

As you can see, the curve is even steeper than in Lou's lovely PowerPoint. What blew my mind is that the curve from my real-world data matches the Zipf curve quite precisely:


I am looking for folks who are willing to share their curves. (Hmmm, that sounded funny...) How do these curves vary for different information spaces? I can share a log analysis script written in Perl if that'd help.
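For anyone who wants to produce a comparable curve from their own logs, here is a minimal sketch in Python. It is not the Perl script mentioned above, and every name in it (`normalize`, `coverage_curve`, `zipf_curve`, `phrases_needed`) is hypothetical. It assumes the log has already been reduced to one query string per search, and that an idealized Zipf curve means the frequency of the phrase at rank r is proportional to 1/r.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants count together."""
    return " ".join(query.lower().split())


def coverage_curve(queries):
    """Cumulative share of all searches covered by the top-k phrases, k = 1..n."""
    counts = Counter(normalize(q) for q in queries if q.strip())
    total = sum(counts.values())
    running = 0
    curve = []
    for _, n in counts.most_common():  # most frequent phrase first
        running += n
        curve.append(running / total)
    return curve


def zipf_curve(n_phrases, s=1.0):
    """Cumulative share predicted when rank-r frequency is proportional to 1 / r**s."""
    weights = [1.0 / r ** s for r in range(1, n_phrases + 1)]
    total = sum(weights)
    running = 0.0
    curve = []
    for w in weights:
        running += w
        curve.append(running / total)
    return curve


def phrases_needed(curve, share):
    """Smallest k such that the top k phrases cover at least `share` of searches."""
    for k, covered in enumerate(curve, start=1):
        if covered >= share:
            return k
    return len(curve)


if __name__ == "__main__":
    # Made-up sample queries, for illustration only.
    sample = ["campus map", "Campus  Map", "library", "email",
              "campus map", "parking", "library"]
    observed = coverage_curve(sample)
    predicted = zipf_curve(len(observed))
    for k, (obs, pred) in enumerate(zip(observed, predicted), start=1):
        print(f"top {k}: observed {obs:.2f}, Zipf {pred:.2f}")
    print("phrases needed for 40%:", phrases_needed(observed, 0.40))
```

Running `phrases_needed(observed, 0.40)` against a real log would show how many unique phrases it takes to reach the 40% mark, and comparing `observed` with `predicted` gives a rough sense of how closely a given information space follows the Zipf shape.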

Comment: Iñigo Arbildi (Sep 24, 2002)

Hi, Lou & Blougers!
I read about Pareto's principle applied to information retrieval and it rang a bell... Have a look at http://firstmonday.org/issues/issue7_7/bates/, where Marcia J. Bates makes a good point in reminding us of the Bradford Distribution (an application of Pareto's to library science) in information retrieval. BTW, her article is a good summary of common dotcom mistakes in IR.
There's a good explanation of the Bradford Distribution at http://www.aslib.co.uk/jdoc/1998/jun/06.html

Comment: Ron Zeno (Sep 27, 2002)

Thank you for the information and references on Pareto's Principle. For a long time I've wondered where the "80/20 rule" came from and whether or not it was meaningful. I'd always suspected that, like George Miller's 7±2, it was a questionable research result that was being completely misapplied to make arguments appear more valid.

Now I know I was wrong. Miller's research results actually have obscure but valid applications. Pareto's appears to have none at all - it is just a simple rhetorical device.

Comment: Lou (Sep 27, 2002)

Actually, I'd disagree slightly. Pareto's is a rhetorical device *and* has valid applications in many fields.

Comment: Lou (Sep 27, 2002)

Iñigo, thanks for the cites. We actually had some good discussion around the Bates article right here in Bloug in July. A better URL for the Bates article is http://www.firstmonday.org/issues/issue7_7/bates/ ; the Bloug discussion is available at http://louisrosenfeld.com/home/bloug_archive/000100.html ; and I'm looking forward to reading the Bradford article.

Comment: Iñigo Arbildi (Sep 30, 2002)

Ooops! It seems that I didn't take much care copying/pasting the URL ;-) Sorry about that. Of course, Lou was already aware of both the article and the right URL. Thanks, Lou! I really enjoyed your previous discussions about the Bates article.

