
EK Ch 17: Power laws and rich-get-richer phenomena (with an application of Web Spam detection: Spam, Damn Spam and Statistics)


Presentation Transcript


  1. EK Ch 17: Power laws and rich-get-richer phenomena (with an application of Web Spam detection: Spam, Damn Spam and Statistics)

  2. Numbers: your grades so far in this class; the weight of an apple; the temperature in Chicago on July 4th; the height of a Dutch man; the speed of a car on I-90. Most instances are typical, and seeing a rare value is very surprising. These numbers are well characterized by the average and the standard deviation.

  3. City populations (by rank)
     1. New York 8,310,212
     2. Los Angeles 3,834,340
     3. Chicago 2,836,658
     230. Cambridge, MA 101,335
     240. Gainesville, FL 95,447
     250. McKinney, TX 54,369
     A few cities with high population; many cities with low population.

  4. City populations

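  5. Power Law: the fraction $f(k)$ of items with popularity $k$ is proportional to $k^{-c}$, that is, $f(k) \propto k^{-c}$. Taking logarithms, $\log f(k) = \log a - c \log k$ for some constant $a$, so on a log-log plot a power law appears as a straight line with slope $-c$.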
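The straight-line relationship on a log-log plot suggests a simple way to estimate the exponent c: fit a line to log f(k) against log k and negate the slope. The sketch below is only an illustration under stated assumptions. The function name `fit_power_law_exponent` and the synthetic data (constant 0.5, exponent 2.1) are made up for this example and do not come from the slides. On real, noisy data a plain least-squares fit can be biased, so treat this as a first look rather than a rigorous estimator.

```python
import math


def fit_power_law_exponent(ks, fs):
    """Estimate c in f(k) ~ a * k**(-c) by least squares on the log-log data.

    Hypothetical helper for illustration; requires all ks and fs to be positive.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(f) for f in fs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line through the points (log k, log f(k)).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The slope of the line is -c, so negate it to recover c.
    return -slope


# Synthetic, noise-free data (assumed values): f(k) = 0.5 * k**(-2.1).
ks = list(range(1, 101))
fs = [0.5 * k ** -2.1 for k in ks]
c_hat = fit_power_law_exponent(ks, fs)
print(f"estimated exponent: {c_hat:.4f}")
```

Because the synthetic data follows the power law exactly, the fitted exponent matches the one used to generate it up to floating-point error.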

  6. City populations

  7. Number of Web page in-links (Broder+)

  8. Other examples

  9. Length of the URL’s host

  10. Number of host name resolutions to a single IP

  11. Web page out-degrees

  12. Web page in-degrees

  13. Word count variance

  14. Content evolution

  15. Cluster size

  16. … because they care to know ;-)

  17. Why does data exhibit power laws? Imitation → Power law

  18. Constructing the web
      1. Pages are created in order, named 1, 2, …, N.
      2. When created, page j links to a page by:
         a) With probability p, picking a page i uniformly at random from 1, …, j-1 and linking to i.
         b) With probability (1-p), picking a page i uniformly at random from 1, …, j-1 and linking to the page that i links to (imitation).
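The construction on this slide can be simulated directly. The following is a minimal sketch, not the authors' code: the names `build_web` and `in_degrees`, the seed, and the parameter values are assumptions made for illustration. One detail is not specified on the slide: page 1 has no earlier page to link to, so it has no out-link. The sketch assumes that if the copy rule picks a page without an out-link, the new page links to that page directly.

```python
import random
from collections import Counter


def build_web(n_pages, p, seed=0):
    """Simulate the copying model; returns link[j] = the page that page j links to.

    Hypothetical helper. Page 1 has no out-link (assumption, see lead-in).
    """
    rng = random.Random(seed)
    link = {}
    for j in range(2, n_pages + 1):
        # Pick an earlier page i uniformly at random from 1, ..., j-1.
        i = rng.randint(1, j - 1)
        if rng.random() < p or i not in link:
            # Rule a): link to i itself (also the fallback when i has no out-link).
            link[j] = i
        else:
            # Rule b): imitate i by linking to the page that i links to.
            link[j] = link[i]
    return link


def in_degrees(link, n_pages):
    """In-degree of every page 1..n_pages, as a list indexed by page - 1."""
    counts = Counter(link.values())
    return [counts.get(page, 0) for page in range(1, n_pages + 1)]


n_pages = 10_000
link = build_web(n_pages, p=0.2)
degs = in_degrees(link, n_pages)
print("largest in-degree:", max(degs))
print("pages with no in-links:", sum(1 for d in degs if d == 0))
```

With a small p, most links are copied, so a few early pages typically collect a large share of the in-links while most pages receive none. Plotting the fraction of pages with in-degree k against k on log-log axes is a natural next step.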

  19. The rich get richer. 2 b) With prob. (1-p), pick page i uniformly at random and link to the page that i links to. (Figure labels: 3/4, 1/4)

  20. The rich get richer. 2 b) With prob. (1-p), pick page i uniformly at random and link to the page that i links to. Equivalently, 2 b) With prob. (1-p), pick a page with probability proportional to its in-degree and link to it.
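The equivalence between the two versions of rule 2 b) can be checked on a small example. The sketch below uses a hypothetical four-link web in which pages 2, 3 and 4 link to page 1 and page 5 links to page 2. It was chosen so that the probabilities come out as 3/4 and 1/4 and is not necessarily the figure from the slides. It assumes the random page i is drawn only from pages that have an out-link. The function names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def copy_target_distribution(link):
    """Distribution of the target when a page i with an out-link is picked
    uniformly at random and its link is copied (hypothetical helper)."""
    n = len(link)
    dist = Counter()
    for target in link.values():
        # Each linking page is picked with probability 1/n and contributes its target.
        dist[target] += Fraction(1, n)
    return dict(dist)


def in_degree_distribution(link):
    """Distribution that picks each page with probability proportional to its in-degree."""
    indeg = Counter(link.values())
    total = sum(indeg.values())
    return {page: Fraction(d, total) for page, d in indeg.items()}


# Assumed example web: link[j] = the page that page j links to.
link = {2: 1, 3: 1, 4: 1, 5: 2}
by_copying = copy_target_distribution(link)
by_in_degree = in_degree_distribution(link)
print(by_copying)
print(by_in_degree)
```

Both distributions give page 1 probability 3/4 and page 2 probability 1/4. A page with in-degree d is the target of exactly d of the links that can be copied, which is why copying a random page's link favors pages that are already popular.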

  21. Food for thought: Why is Harry Potter popular? If we could replay history, would we still read Harry Potter, or would it be some other book?

  22. Information cascades and the rich. Information cascade = some people get a little bit richer by chance. Rich-get-richer dynamics = those randomly rich people then get a lot richer very fast.

  23. Music download site – 8 worlds
      • “Let’s go driving,” Barzin
      • “Silence is sexy,” Einstürzende Neubauten
      • “Go it alone,” Noonday Underground
      • “Picadilly Lilly,” Tiger Lillies
      • “Let’s go driving,” Barzin
      • “Silence is sexy,” Einstürzende Neubauten
      • “Go it alone,” Noonday Underground
      • “Picadilly Lilly,” Tiger Lillies
      18 59 3 7 47 10 2 1
