
Multilayer Filtering or The Dangerous Economics of Spam Control 2008 MIT Spam Conference






Presentation Transcript


  1. Multilayer Filtering, or The Dangerous Economics of Spam Control. 2008 MIT Spam Conference. By Alena Kimakova and Reza Rajabiun, York University and COMDOM Software, Toronto, Canada

  2. I.1 Spam as an empirical problem • Two historical observations (2002-2008) • A) Spam ratio in 2002 = 20%-30% of all email messages; spam ratio in 2008 = 70%-90% of all email messages. Increased sophistication of spam (pdf, image, search engine, etc.) • B) Increased sophistication and accuracy of statistical content filters: 98% accuracy, 0.1% false positives (Cormack and Lynam, 2007) • Empirical puzzle: Why more spam after the adoption of technical and regulatory countermeasures?

  3. I.2 Methodology: Positive analysis • How can we explain the growth and sophistication of spam? • Hypothesis: A technological trade-off between speed and accuracy facing network owners and operators. • Approach combines: A) Game theoretical models: Large volumes of spam because of asymmetries in the distribution of filter quality across the Internet. B) Evolution of the technological possibilities frontier facing ISPs and other operators from the early 2000s. • Problem with existing studies in economics and computer science: Do not account for incentives of spammers and ISPs. • General point: Importance of interdisciplinary cooperation between economists and computer scientists in designing spam filtering bundles and regulatory countermeasures.

  4. II.1 Technological Choice • Advances in content filtering accuracy → Constrained sensory threat • However: High noise/signal ratio → Network costs of spam rise. Of particular concern in developing countries with relatively lower: a) Bandwidth b) Processing capacity c) Administrative capacity. Spam and the Digital Divide (Rajabiun, 2007) • The literature in computer science and economics almost exclusively focuses on the false negative/positive problem

  5. II.2 End user and network costs • More realistic assumption: End user (E) and network costs of spam (N) are likely to be closely linked. • General problem facing an ISP (server-level problem): Costs of Spam = C( E(E1, E2), N(E1, E2, S) ), where E1 is the expected false negative rate, E2 is the expected false positive rate, and S is the number of servers. • Theory: Little is known about these relationships, but they are not static. • Practice: Can be estimated for individual ISPs based on: a) Accounting information b) Features of antispam systems available at a point in time
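The cost function on this slide can be sketched numerically. The weights and functional forms below are illustrative assumptions only (the slide does not specify them); the point is that spam passing the filter (E1) imposes costs on every server it traverses, so a coarse filter looks much worse once network costs are counted.

```python
# Hypothetical sketch of the per-ISP cost function C(E, N) from the slide.
# All weights are illustrative assumptions, not the authors' calibration.

def end_user_cost(e1, e2, w_fn=1.0, w_fp=50.0):
    """End-user cost E(E1, E2): false positives (lost legitimate mail)
    weighted far more heavily than false negatives (spam delivered)."""
    return w_fn * e1 + w_fp * e2

def network_cost(e1, e2, servers, per_server=10.0):
    """Network cost N(E1, E2, S): spam that slips past the filter (E1)
    consumes bandwidth and processing on each of the S servers."""
    return per_server * servers * e1 + 0.1 * e2

def total_cost(e1, e2, servers):
    return end_user_cost(e1, e2) + network_cost(e1, e2, servers)

# An accurate-but-slow filter versus a fast-but-coarse one, same network.
accurate = total_cost(e1=0.02, e2=0.001, servers=5)
coarse = total_cost(e1=0.30, e2=0.001, servers=5)
```

Under these assumed weights, the accurate filter dominates once network costs are included, which is the linkage between E and N the slide argues for.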

  6. II.3 Antispam technology • Basic filtering methods available since the late 1990s • Server level: Adoption of (fuzzy) fingerprinting (2001-2005) and reputation based systems (2004-2006) upstream (fast, but not accurate) • End user level: Statistical (Bayesian) content filters (accurate, but not fast) • Other technical + public policy measures: Aiming to increase the costs of sending spam (Hashcash, civil/criminal law, do not call registries) • Optimal choice of filter depends on identity of end user/ISP Upstream ISPs more sensitive to speed, downstream to accuracy • Divergence between (socially) optimal and actual technological choice
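The server-level versus end-user-level split above amounts to a layered pipeline: a fast upstream check first, with the slower, more accurate content filter applied only to what survives. The sketch below is illustrative (not any vendor's implementation); the fingerprint set and token rules are invented for the example.

```python
# Illustrative multilayer filter: fast fingerprinting upstream,
# slower content analysis downstream. Data below is hypothetical.

import hashlib

KNOWN_SPAM_FINGERPRINTS = {
    hashlib.sha1(b"cheap meds click here").hexdigest(),
}

def fingerprint_layer(message: bytes) -> bool:
    """Fast but brittle: an exact digest match, defeated by any variant
    of the message (hence spammers' response of generating variants)."""
    return hashlib.sha1(message.lower()).hexdigest() in KNOWN_SPAM_FINGERPRINTS

def content_layer(message: bytes) -> bool:
    """Stand-in for a statistical content filter: accurate but slower."""
    spam_tokens = {b"meds", b"cheap", b"click"}
    tokens = set(message.lower().split())
    return len(tokens & spam_tokens) >= 2

def classify(message: bytes) -> str:
    if fingerprint_layer(message):  # cheap upstream check first
        return "spam"
    return "spam" if content_layer(message) else "ham"
```

A mutated variant sails past the fingerprint layer but is still caught downstream, which is the speed/accuracy asymmetry the slide describes.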

  7. III.1 The long tail • Distribution of taste for spam for each sub-network: not normal • Khong (2004): Mechanisms that connect spammers and those with a taste for spam → first best solution (open channel argument); blocking and filters second best • Empirically: More spam after widespread adoption of open channels rather than less. • Loder et al. (2004): Attention Bond Mechanism (ABM) first best because it allows for price negotiations between senders and receivers. • Basic economic assumption: The subjective theory of value. Ex: Search for affordable drugs for the uninsured in the U.S. • The long tail in natural sciences: Phase transition/multiple equilibria • Game theory: Strategic complementarities

  8. III.2 Sender side countermeasures • In microeconomic theory: Long-tailed distributions associated with markets where markups are invariant to the number of sellers (e.g. mutual funds) • Margins for spammers, or expected response rates, are invariant to the number of spammers at play • Implications: Legal sanctions and IP reputation systems increase the costs of spamming and drive some spammers out of the market, but do not thin out the market. • Intuition: As in wars against prostitution and drugs, “hang them all” strategies are ineffective and increase social costs (Becker-Friedman).

  9. IV.1 Strategic conflicts • Trivial model of spam: Tragedy of the commons → Generic solution is to increase costs on spammers, but results in escalating spam wars. • Empirically: Increasing sender costs since the early 2000s, but more spam. • Escalation → Development and adoption of new spamming techniques • Androutsopoulos et al. (2005) → 2-player game between senders and receivers has a single Nash equilibrium → Settles in infinitely repeated games, unless there are changes to underlying technologies or the taste for spam

  10. IV.2 Spam growth and filter quality • Reshef and Solan (2006): Blame filters for growth of spam due to differences in filter quality. When costs of sending messages are not too high → Effect of improved filter quality on total volume of spam is ambiguous • Eaton et al. (2008): Complementarities between filters and sender side countermeasures. Improving filtering alone results in more spam. Given ineffective sender side countermeasures, they suggest receiver side payments (as in SMS systems). • Kearns (2005): Spam as a source of both costs and revenues for ISPs → economic incentive to adopt inefficient filters

  11. V.1 Speed versus accuracy • Existing literature: Even if they could read end user preferences accurately, upstream backbone providers do not have sufficient financial incentives to adopt the right technological countermeasures. • Argument here: Not necessarily because of financial factors alone. • Hypothesis: ISPs faced technological trade-offs in terms of speed vs. accuracy • Coordination failure not between senders and receivers as in the tragedy of the commons or Khong (2004), but between upstream and downstream entities/servers. • Downstream better off with less incoming spam, but cannot force upstream to do the optimal filtering for them.

  12. V.2 Bundles and layers • Bundles of countermeasures facing spammers: a) Ad hoc feature selection rules (late 1990s): centralized b) Fingerprinting/checksum filters (2000-2005): centralized c) IP reputation/authentication mechanism (2004-2006): centralized d) Statistical content filters (since late 1990s): distributed • Asymmetric filter quality (2000-2006): (b and c) fast relative to 1st generation statistical content filters (5x), but less accurate (-5% and -30% respectively). • Response by income smoothing spammers: higher noise/signal ratio, more variants, one shot BGP spectrum agility

  13. VI.1 The response • A) Coordination by operators to strengthen authentication protocols (SPF, DKIM). Problem: A wide range of techniques is available to bypass, and even use, the protocols as an instrument for sending more spam! • B) Closing the gap between fast and accurate filters: Further optimization of the methods for distributed content scanning, learning, and classification. 1st versus 2nd generation Bayesian filters (CRM114, Bogofilter, COMDOM)
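For readers unfamiliar with the statistical filters the slide contrasts, a minimal naive Bayes (Bayesian) content filter of the "1st generation" kind can be sketched as below. This is a toy: production filters such as CRM114 and Bogofilter use far more sophisticated tokenization, storage, and training regimes, and the training data here is invented.

```python
# Minimal naive Bayes spam filter: train word counts per class, then
# score a message by the summed log-odds of its tokens (Laplace smoothed).

import math
from collections import Counter

class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        tokens = text.lower().split()
        self.counts[label].update(tokens)
        self.totals[label] += len(tokens)

    def score(self, text):
        """Log-odds that the message is spam; > 0 means spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        log_odds = 0.0
        for tok in text.lower().split():
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + vocab)
            log_odds += math.log(p_spam / p_ham)
        return log_odds

f = NaiveBayesFilter()
f.train("spam", "buy cheap meds now")
f.train("ham", "meeting agenda for monday")
```

Per-token probability lookups like these are the variable cost of scanning that the next slides discuss; 2nd generation designs attack exactly that per-message work.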

  14. VI.2 Evolution of Bayesian content filters

  15. VI.3 Findings • Technological trade-off between speed and accuracy now closed with distributed 2nd generation Bayesian filters (at least 30x differential in throughput relative to 1st generation). Note: Fingerprinting was 5x faster than 1st generation Bayesian filters in terms of throughput • Fixed versus variable costs of message processing → Substantial reductions in variable costs of scanning, minor improvements in fixed costs of classification

  16. VII. Summary • More spam is an instrument for: a) Evading filters b) Searching for people with a taste for spam • Normative question for policy makers: Should spamming be illegal? • Legal sanctions may induce a moral hazard problem and potentially exacerbate the problem at the aggregate level, as spammers adopt more costly strategies/technologies (especially important for developing countries). • For designers of antispam systems/bundles: Should we retain layers that aim to increase the costs of spamming through ad hoc centralized control (e.g. IP reputation, fingerprinting)?
