Click Trajectories: End-to-End Analysis of the Spam Value Chain

Authors: Kirill Levchenko, Andreas Pitsillidis, Neha Chachra, Brandon Enright, Márk Félegyházi, Chris Grier, Tristan Halvorson, Chris Kanich, Christian Kreibich, He Liu, Damon McCoy, Nicholas Weaver, Vern Paxson, Geoffrey M. Voelker, Stefan Savage

Source: IEEE Symposium on Security and Privacy, 2011

Reporter: MinHao Wu

Outline

  • Introduction

  • Related work

  • Data collection methodology

  • Analysis

  • Conclusion

Introduction

  • Spam-based advertising is a business

  • While it has engendered both widespread antipathy and a multi-billion dollar anti-spam industry, it continues to exist because it fuels a profitable enterprise

  • This paper quantifies the full set of resources employed to monetize spam email, including naming, hosting, payment, and fulfillment

  • Collect spam-advertised URLs

    • We draw on data sources of varying types; some are provided by third parties, while others we collect ourselves.

    • We focus on the URLs embedded within such email, since these are the vectors used to drive recipient traffic to particular Web sites.

    • The “bot” feeds tend to be focused spam sources, while the other feeds are spam sinks composed of a blend of spam from a variety of sources.

  • Crawler data

    • DNS Crawler

      • From each URL, we extract both the fully qualified domain name and the registered domain suffix.

      • For example, for each domain we see, we extract both the full name and its registered suffix.

      • We ignore URLs with IPv4 addresses (just 0.36% of URLs) or invalidly formatted domain names, as well as duplicate domains already queried within the last day.
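
The extraction step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `extract_domains` and `PUBLIC_SUFFIXES` are hypothetical names, the suffix set is a tiny stand-in for a full public-suffix list, and `foo.bar.co.uk` is a made-up example domain. Tracking of domains already queried within the last day is left out.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for a public-suffix list; a real crawler would
# load a complete list instead of hard-coding a few entries.
PUBLIC_SUFFIXES = {"com", "net", "org", "uk", "co.uk"}


def extract_domains(url):
    """Return (fqdn, registered_domain), or None if the URL should be skipped."""
    host = urlsplit(url).hostname
    if not host:
        return None
    try:
        ipaddress.IPv4Address(host)
        return None  # URLs with raw IPv4 addresses are ignored
    except ValueError:
        pass  # not an IPv4 literal, so treat it as a domain name
    labels = host.rstrip(".").lower().split(".")
    if len(labels) < 2 or any(not label for label in labels):
        return None  # invalidly formatted domain name
    # Try the longest candidate suffix first, then keep one more label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a bare public suffix
            return ".".join(labels), ".".join(labels[i - 1:])
    return None  # suffix not covered by the toy list


print(extract_domains("http://foo.bar.co.uk/path"))
```

With the toy suffix set, the made-up URL yields the full name `foo.bar.co.uk` and the registered suffix `bar.co.uk`.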

  • Web Crawler

    • The Web crawler replicates the experience of a user clicking on a spam-advertised URL.

    • It captures any application-level redirects (HTML, JavaScript, Flash)

    • For this study we crawled nearly 15 million URLs, of which we successfully visited and downloaded correct Web content for over 6 million

  • Content Clustering and Tagging

    • We focus exclusively on businesses selling three categories of spam-advertised products (pharmaceuticals, replicas, and software) because they are reportedly among the most popular goods advertised in spam.

  • Content clustering

    • The content clustering process uses a clustering tool to group together Web pages that have very similar content.

    • The tool uses the HTML text of the crawled Web pages as the basis for clustering.

    • If the page fingerprint exceeds a similarity threshold with a cluster fingerprint, the tool places the page in that cluster.

    • Otherwise, it instantiates a new cluster with the page as its representative.
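
A minimal sketch of this threshold-based clustering loop follows. It is not the tool used in the study: the fingerprint here is simply the set of 4-character substrings of the HTML, similarity is the Jaccard index, and the `threshold=0.75` default, the function names, and the sample pages are all assumptions made for illustration.

```python
def fingerprint(html, q=4):
    """Set of all q-character substrings of the page text (a simple q-gram fingerprint)."""
    return {html[i:i + q] for i in range(max(len(html) - q + 1, 1))}


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_pages(pages, threshold=0.75):
    """Greedily cluster (url, html) pairs; threshold is an assumed default."""
    clusters = []  # each cluster keeps its representative's fingerprint and its URLs
    for url, html in pages:
        fp = fingerprint(html)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(fp, cluster["fingerprint"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["urls"].append(url)  # similar enough: join the closest cluster
        else:
            # otherwise the page becomes the representative of a new cluster
            clusters.append({"fingerprint": fp, "urls": [url]})
    return clusters


# Hypothetical sample pages, for illustration only.
pages = [
    ("a", "<html>Buy cheap pills online today</html>"),
    ("b", "<html>Buy cheap pills online today!</html>"),
    ("c", "<html>Luxury replica watches for sale</html>"),
]
for cluster in cluster_pages(pages):
    print(cluster["urls"])
```

Comparing each page only against cluster representatives keeps the loop simple, at the cost of making the result depend on the order in which pages arrive.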

  • Category tagging

    • The clusters group together URLs and domains that map to the same page content.

    • We identify interesting clusters using generic keywords found in the page content, and we label those clusters with category tags—“pharma”, “replica”, “software”—that correspond to the goods they are selling.
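
Keyword-based category tagging of this kind might look like the sketch below. The keyword lists are invented for illustration (the slides do not give the actual keywords), and `tag_cluster` is a hypothetical helper that scores a cluster's representative page text against each list.

```python
import re

# Hypothetical keyword lists; the actual keywords are not given in the slides.
CATEGORY_KEYWORDS = {
    "pharma": {"pharmacy", "pills", "prescription"},
    "replica": {"replica", "watches", "handbags"},
    "software": {"software", "oem", "license"},
}


def tag_cluster(page_text):
    """Return the category whose keywords match the most words, or None."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # untagged if nothing matches


print(tag_cluster("Cheap pills from our online pharmacy"))
print(tag_cluster("Local weather report"))
```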

  • Program tagging

    • We focus entirely on clusters tagged with one of our three categories, and identify sets of distinct clusters that belong to the same affiliate program.

    • We do so by examining the raw HTML for common implementation artifacts and by making product purchases.

    • We assigned program tags to 30 pharmaceutical, 5 software, and 10 replica programs that dominated the URLs in our feeds.

  • Purchasing

    • We also purchased goods being offered for sale.

    • We attempted 120 purchases, of which 76 were authorized and 56 settled.

    • Of those that settled, all but seven products were delivered.

    • We confirmed via tracking information that two undelivered packages were sent several weeks after our mailbox lease had ended; two additional transactions received no follow-up email.
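
The purchase figures above form a simple funnel, and the arithmetic can be checked with a short script. The counts come from the slide; the variable names and the `rate` helper are only illustrative.

```python
# Counts taken from the slide above.
attempted, authorized, settled = 120, 76, 56
undelivered = 7
delivered = settled - undelivered  # "all but seven" of the settled purchases


def rate(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


funnel = {
    "authorized / attempted": rate(authorized, attempted),
    "settled / authorized": rate(settled, authorized),
    "delivered / settled": rate(delivered, settled),
    "delivered / attempted": rate(delivered, attempted),
}
for step, pct in funnel.items():
    print(f"{step}: {pct}%")
```

So 49 products were delivered, which is 87.5% of the settled purchases.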

  • Operational protocol

    • We placed our purchases via VPN connections to IP addresses located in the geographic vicinity of the mailing addresses used.

    • This constraint is necessary to avoid failing common fraud checks that evaluate consistency between IP-based geolocation, mailing address and the Address Verification Service (AVS) information provided through the payment card association.

Analysis

  • Click Support

  • Realization

  • Redirection

    • Some Web sites will redirect the visitor from the initial domain found in a spam message to one or more additional sites, ultimately resolving to the final Web page.

    • 32% of crawled URLs in our data redirected at least once; of such URLs, roughly 6% did so through public URL shorteners, 9% through well-known “free hosting” services, and 40% were to a URL ending in .html.
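
To make the notion of a redirection chain concrete, the sketch below walks a chain offline. It assumes the crawler has already recorded each observed redirect in a dictionary; `follow_redirects`, the `max_hops` limit, and the `.example` URLs are all hypothetical.

```python
def follow_redirects(start_url, redirects, max_hops=10):
    """Return the URLs visited, from the spam-advertised URL to the final page."""
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # stop on a redirect loop
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


# Hypothetical recorded redirects: a shortener, then a free-hosting page, then the storefront.
redirects = {
    "http://short.example/abc": "http://free-host.example/page.html",
    "http://free-host.example/page.html": "http://shop.example/",
}
chain = follow_redirects("http://short.example/abc", redirects)
print(" -> ".join(chain))
print("redirected at least once:", len(chain) > 1)
```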

  • Intervention analysis

    • For any given registered domain used in spam, the defender may choose to intervene by either blocking its advertising (e.g., filtering spam) or disrupting its click support.

  • Anti-spam interventions need to be evaluated in terms of two factors:

    • their overhead to implement and

    • their business impact on the spam value chain.


Conclusion

  • We have characterized the use of key infrastructure (registrars, hosting, and payment) for a wide array of spam-advertised business interests.

  • We have used this data to provide a normative analysis of spam intervention approaches.