Why are power laws important?

Why are power laws important? [Slide acks to Sharad Goel, Chris Anderson]

Heavy heads and long tails
• Universality, and that it makes us think outside of the CLT
• Effect of “head” vs. “tail”
  • e.g. 10 elements account for 10% of mass, getting the next 10% needs 100 elements, the next 10% needs 1000, …
  • Suggests optimizing heavily for the head. However, that is not enough: e.g. search engines are really differentiated by how they handle tail queries.
• Optimizing system design:
  • Optimizing caching patterns for search queries
  • How to distribute query loads in web repositories
• Tail issues in modeling:
  • Ad-matching/targeting: tail queries/ads will have sparser data
  • Mobile communication patterns
  • “Limitless” stores
• In machine learning
  • Often makes model selection problematic
  • The naïve way was to assume a Gaussian and treat everything else as “outliers”
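The head-versus-tail arithmetic can be checked numerically. The sketch below assumes a Zipf-like law (the weight of rank r is proportional to r^(-exponent)) over a finite catalogue. The function name `items_needed`, the catalogue size, and the exponent are illustrative choices, not taken from the slides. Under that assumption, each further equal slice of mass needs roughly a constant multiple of the items needed so far. The exact 10/100/1000 figures on the slide depend on the catalogue and are not reproduced here.

```python
import numpy as np

def items_needed(n_items, exponent, targets):
    """For each target mass fraction, return the smallest number of
    top-ranked items whose combined weight reaches that fraction.

    Assumption: item weights follow rank ** (-exponent).
    """
    ranks = np.arange(1, n_items + 1)
    weights = ranks ** (-float(exponent))
    cumulative = np.cumsum(weights)
    cumulative /= cumulative[-1]          # normalise so the total mass is 1
    # searchsorted gives the first index whose cumulative mass >= target
    return [int(np.searchsorted(cumulative, t)) + 1 for t in targets]

if __name__ == "__main__":
    targets = [0.2, 0.4, 0.6, 0.8]
    for t, k in zip(targets, items_needed(1_000_000, 1.0, targets)):
        print(f"{t:.0%} of mass needs the top {k} items")
```

With exponent 1, the counts grow geometrically, which is the pattern the slide describes: the head is cheap to cover, while each further slice of the tail costs many more items.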

Two different effects in “limitless stores”
• Impacts on direct revenue:
  • iTunes, Amazon, Netflix: essentially limitless selection
  • Over traditional stores, e.g. Barnes & Noble, Blockbuster, Walmart
  • Vast majority of products are “misses” or “non-hits”
• More user-centric effects
  • How many people consume tail content?

Two business models: concentrating on the head vs. tail

Long tail Effect

Technology enabling monetization of the tail

Who is consuming the tail: possible hypotheses
• Most people are sheep: the majority of consumers prefer popular offerings.
• Most are eccentric: most are a bit eccentric, consuming both popular and specialty products. Most would be affected by a larger product selection.
[Broder, Goel, Gabrilovich, Pang ’10]

Long Tail of Netflix
• Ranks 1-100: 15% of consumption
• Ranks 3000 and beyond: 15% of consumption

Consumer Satisfaction
• A consumer is “satisfied” if at least 90% of their demand can be found.
• 35% of Netflix users and 70% of Yahoo Music users are not satisfied by a traditionally sized inventory.
• Increasing inventory size causes a larger increase in satisfied users than in revenue, i.e. a larger increase in the potential customer base.
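The satisfaction criterion can be written down directly. This is a minimal sketch, assuming each user's demand is recorded as a list of popularity ranks of the items they consumed, and that an inventory of size k stocks exactly the k most popular items. The function name `satisfied_fraction` and the toy histories are hypothetical, not data from the study.

```python
def satisfied_fraction(user_histories, inventory_size, threshold=0.9):
    """Fraction of users for whom at least `threshold` of their consumed
    items fall within the top `inventory_size` popularity ranks.

    Assumption: every history is a non-empty list of 1-based ranks.
    """
    satisfied = 0
    for ranks in user_histories:
        in_stock = sum(1 for r in ranks if r <= inventory_size)
        if in_stock >= threshold * len(ranks):
            satisfied += 1
    return satisfied / len(user_histories)

if __name__ == "__main__":
    # Hypothetical toy data: one head-only user, one mixed, one tail-heavy.
    histories = [
        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        [1, 2, 3, 4, 5, 6, 7, 8, 900, 950],
        [20, 700, 1500, 9000],
    ]
    for size in (100, 1000, 10000):
        print(size, satisfied_fraction(histories, size))
```

Note that a user who consumes mostly head content can still be unsatisfied: a handful of tail items is enough to fall below the 90% threshold.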

Measuring eccentricity of a user
• Null model: user chooses an item with probability proportional to its popularity.
• Eccentricity of a user: median rank of the items consumed.
• Finding: on average, users are less eccentric (i.e. more satisfied by head content) than the null model predicts.
• However: engaged users are a lot more eccentric than light ones.
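The eccentricity measure and its null-model baseline can be sketched as follows. Assumptions not stated on the slide: the null model draws items independently with replacement, the baseline is the mean of the simulated medians, and `popularity` is a list of weights ordered by rank (most popular first). The function names are illustrative.

```python
import random
import statistics

def eccentricity(ranks):
    """Eccentricity of a user: median popularity rank of items consumed."""
    return statistics.median(ranks)

def null_model_eccentricity(popularity, n_consumed, n_trials=1000, seed=0):
    """Average eccentricity of simulated users who pick `n_consumed` items
    with probability proportional to popularity (with replacement)."""
    rng = random.Random(seed)
    ranks = range(1, len(popularity) + 1)   # rank 1 = most popular item
    medians = []
    for _ in range(n_trials):
        draws = rng.choices(ranks, weights=popularity, k=n_consumed)
        medians.append(statistics.median(draws))
    return statistics.mean(medians)

if __name__ == "__main__":
    # Hypothetical Zipf-like popularity over 1000 items.
    popularity = [1.0 / r for r in range(1, 1001)]
    user = [1, 2, 3, 4, 5]                   # a head-only user
    print("user:", eccentricity(user))
    print("null:", null_model_eccentricity(popularity, len(user)))
```

A user whose eccentricity is below the null-model value is consuming more head content than popularity alone would predict.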

Models of Networks

Why?
• Explanatory: understanding the main characteristics of networks in terms of simple intuition
  • Rich gets richer: growth + preferential attachment → power law; growth + copying → power law with bipartite cliques
• Predictive: how the network is going to grow in the future
  • Predicting links, load, maybe use in learning?
• Algorithmic: testing/analyzing performance of algorithms
  • Algorithms for clustering, routing etc. could often be “hard” in theory
  • Would like to test their performance on “perturbed” versions of real data; on “possible worlds”
• Compression
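The "growth + preferential attachment" mechanism can be illustrated with a minimal simulation. This sketch assumes the simplest variant: each arriving node adds a single edge to an existing node chosen with probability proportional to its current degree. The function names and the single-edge choice are assumptions for illustration, not the specific model used in the lecture.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a graph one node at a time; each new node links to one
    existing node chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]          # start from a single edge
    # Each node appears in `endpoints` once per incident edge, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges

def degree_counts(edges):
    """Map each node to its degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

if __name__ == "__main__":
    degree = degree_counts(preferential_attachment(5000))
    histogram = Counter(degree.values())
    for d in sorted(histogram)[:5]:
        print(f"degree {d}: {histogram[d]} nodes")
    print("max degree:", max(degree.values()))
```

Most nodes end up with degree 1 while a few early nodes collect very many links, which is the heavy-tailed shape the slide refers to.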

Some usually desirable properties
• Heavy tails
  • Degree, eigenvalue power laws.
  • Eigenvalue power laws follow from degree (Mihail, Papadimitriou et al.)
• Clustering coefficient
  • Fraction of closed triangles
• Short paths between most pairs
  • “effective diameter”
• Presence of cliques
  • more so in web-related graphs
• Assortativity?
  • Are there more links between high-degree nodes than expected?
• Growth behavior
  • Does the graph sparsify/densify as it grows?
  • Densification power law (Leskovec et al.)
• Structure of communities
  • What do the communities look like? Later in the course…
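One common reading of "fraction of closed triangles" is the global clustering coefficient (transitivity): among all connected triples u–w–v centred on some node w, the fraction in which u and v are also linked. The sketch below assumes that definition and an undirected graph stored as a dict of neighbour sets. The function name `transitivity` is illustrative.

```python
from itertools import combinations

def transitivity(adj):
    """Global clustering coefficient of an undirected simple graph.

    adj: dict mapping each node to the set of its neighbours.
    Returns closed triples / all connected triples (0.0 if there are none).
    """
    closed = 0
    triples = 0
    for centre, neighbours in adj.items():
        # Every unordered pair of neighbours forms a triple centred here.
        for u, v in combinations(neighbours, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0

if __name__ == "__main__":
    # Triangle a-b-c with a pendant node d attached to a.
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(transitivity(graph))
```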

Zoo of models
• Random static
  • G(n,p), G(w), Kronecker (stochastic)
• Random evolutionary
  • Preferential
  • Copying
  • Affiliation
• Optimization based
  • HOT model: each node optimizes some criterion (e.g. average distance + centrality measure) when joining. The main difference from game-theoretic models is that nodes decide only once and do not respond to others’ decisions.
• Note: none of these try to address the “why” (i.e. incentive) question
  • Realm of algorithmic game-theoretic models
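As a reference point for the random static family, G(n,p) is the simplest to write down: every one of the n(n-1)/2 possible undirected edges is included independently with probability p. The sketch below is a direct quadratic-time sampler. The function name `gnp` and the parameter values are illustrative.

```python
import random

def gnp(n, p, seed=0):
    """Sample an undirected G(n,p) graph as a list of edges (u, v), u < v.

    Each of the n*(n-1)/2 possible edges is kept independently
    with probability p.
    """
    rng = random.Random(seed)
    return [(u, v)
            for u in range(n)
            for v in range(u + 1, n)
            if rng.random() < p]

if __name__ == "__main__":
    n, p = 200, 0.05
    edges = gnp(n, p)
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    print("edges:", len(edges))
    print("mean degree:", sum(degree) / n, "max degree:", max(degree))
```

Degrees in G(n,p) are binomially distributed and concentrate around the mean, so this model by itself does not produce heavy-tailed degrees.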
