Large-Scale Monitoring of DHT Traffic. Ghulam Memon – University of Oregon Reza Rejaie – University of Oregon Yang Guo – Corporate Research, Thomson Daniel Stutzbach – Stutzbach Enterprises. International Workshop on Peer-to-Peer Systems (IPTPS) 2009, Boston MA . Introduction.

Large-Scale Monitoring of DHT Traffic

Ghulam Memon – University of Oregon

Reza Rejaie – University of Oregon

Yang Guo – Corporate Research, Thomson

Daniel Stutzbach – Stutzbach Enterprises

International Workshop on Peer-to-Peer Systems (IPTPS) 2009, Boston MA.


Introduction

  • Distributed Hash Tables (DHTs) provide a scalable approach to distributed content management, e.g., file sharing.

  • DHTs have been an active area of research since 2001.

  • DHTs have recently been deployed in real-world applications,

    • e.g., Kad, Azureus, Mojito.

  • Characterizing traffic in widely deployed DHTs allows us to:

    • Identify opportunities for performance improvement.

    • Detect anomalous behavior.

  • Accurately capturing traffic in a widely deployed DHT is challenging.

IPTPS 2009 Boston, MA.


Challenges in Capturing DHT Traffic

  • A common approach for capturing DHT traffic is to use instrumented peers as monitors.

  • A small number of monitors cannot capture an accurate view of traffic.

  • A large number of monitors is expensive and may change and/or disrupt the DHT,

    • e.g., 8 monitors per peer [Steiner:DBISP2P 2007].

      Goal: capture a representative view of DHT traffic efficiently without changing or disrupting the system.


Classifying DHT traffic

Two types of messages are observed by peer p:

  • Routing Traffic: messages that are routed by, but not destined to, peer p.

    • Depends on DHT geometry and peer visibility.

  • Destination Traffic: messages that are destined to peer p.

    • Reflects how the DHT is used.

[Figure: a request from Pr routed through intermediate peers Pi to the target peer Pt]

We focus on capturing destination traffic


This paper presents

  • Montra, a new approach to capture DHT traffic efficiently and accurately without disrupting the system

    • Montra should be applicable to most DHTs

  • Validation of Montra on a deployed DHT, Kad.

  • Preliminary characterization of Kad traffic


Key Idea

  • Real-world DHTs add redundancy to cope with churn:

    • Each file is published at multiple peers

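    • Each search operation identifies multiple peers

  • If monitor peer Pm is the closest peer to the target peer Pt, Pm will observe all the destination traffic of Pt, since a request for an ID whose closest peer is Pt is also sent to the next-closest peers, which include Pm.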
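The key idea can be illustrated with a minimal sketch, assuming a Kademlia-style XOR distance (Kad is Kademlia-based) and assuming every request for an ID is sent to the k peers closest to that ID. The 4-bit IDs mirror the 0xe/0xf example used in this talk; `k_closest`, the ID width and k = 2 are hypothetical choices for illustration, not Montra's or Kad's actual code.

```python
# Minimal sketch, assuming a Kademlia-style XOR metric and that every request
# for a key is sent to the k peers closest to that key. The 4-bit IDs and
# k = 2 are illustrative only; they are not Kad's real parameters.


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two IDs."""
    return a ^ b


def k_closest(peers: list[int], key: int, k: int) -> list[int]:
    """Return the k peers closest to `key` under the XOR metric."""
    return sorted(peers, key=lambda p: xor_distance(p, key))[:k]


peers = [0x0, 0x1, 0x8, 0x9, 0xE]  # hypothetical live peers; Pt = 0xE
monitor = 0xF                       # Pm: differs from Pt only in the lowest bit

without_monitor = k_closest(peers, key=0xE, k=2)
with_monitor = k_closest(peers + [monitor], key=0xE, k=2)
print([hex(p) for p in without_monitor])  # ['0xe', '0x8']
print([hex(p) for p in with_monitor])     # ['0xe', '0xf']

# For every 4-bit key whose closest live peer is Pt, the monitor lands in the
# top 2 once it joins, so it sees each request that Pt receives as a destination.
for key in range(16):
    if k_closest(peers, key, k=1) == [0xE]:
        assert monitor in k_closest(peers + [monitor], key, k=2)
```

Because Pm differs from Pt only in the lowest bit, no third peer can be closer than Pm to an ID in Pt's region under this metric, which is why one well-placed monitor per target suffices in this simplified model.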


Key Idea

  • The request originator (Pr) searches for the destination of content ID 0xe.

  • Node 0xe (Pt) is the closest peer to the requested ID 0xe.

  • Monitor 0xf (Pm), the next-closest peer, also receives and captures the request.

[Figure: Routing Table and ID Space, with the target Pt at 0xe, the monitor Pm at 0xf, and the request originator Pr]

  • Placing one monitor per peer will provide an accurate view of traffic.

  • How can we avoid or minimize the impact on the system?


Minimally Visible Monitors (MVMs)

  • To minimize disruption to the system, we use Minimally Visible Monitors (MVMs).

    • MVMs are visible only to (i.e., exchange messages only with) their target peer.

  • Deploying a large number of MVMs causes minimal or no disruption to the system.

    • Each MVM only slightly changes the routing table of its target peer.

[Figure: ID space with requests from several peers Pr arriving at the target Pt and its MVM Pm]


Identifying Destination Peers

[Figure: monitors Pm in the 4-bit zone 0xa, which contains peers such as 0xa8, 0xa9, 0xac, 0xad, 0xae and 0xaf]

  • In the presence of churn and packet loss, a single peer (or MVM) cannot reliably identify its destination traffic.

    • Peers closer to the requested ID may exist that the monitor does not know about.

    • This requires a regional view of traffic.

  • We monitor all peers in a contiguous zone of the ID space, e.g., the 4-bit zone 0xa.

    • We periodically crawl the zone to detect all of its peers.

  • All the captured requests within a zone have a destination in that zone.

  • Destination peers are identified during post-processing.

    • For each captured request, find the closest monitored peer.
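The zone test and the post-processing step can be sketched as follows, assuming an XOR distance metric and a zone defined by a fixed ID prefix. The 8-bit IDs echo the 0xa zone example; `in_zone`, `assign_destinations` and the captured request keys are hypothetical names and data, not Montra's implementation.

```python
# Minimal sketch of zone membership and destination assignment, assuming an
# XOR distance metric. 8-bit IDs are used for readability; Kad IDs are 128 bits.
ID_BITS = 8


def in_zone(node_id: int, prefix: int, prefix_bits: int, id_bits: int = ID_BITS) -> bool:
    """True if the top `prefix_bits` bits of `node_id` equal `prefix`."""
    return node_id >> (id_bits - prefix_bits) == prefix


def assign_destinations(request_keys: list[int], monitored: list[int]) -> dict[int, int]:
    """Post-processing: map each captured request key to the closest monitored peer."""
    return {k: min(monitored, key=lambda p: p ^ k) for k in request_keys}


# Hypothetical crawl result for the 4-bit zone 0xa.
zone_peers = [0xA8, 0xA9, 0xAC, 0xAD, 0xAE, 0xAF]
assert all(in_zone(p, prefix=0xA, prefix_bits=4) for p in zone_peers)
assert not in_zone(0xB3, prefix=0xA, prefix_bits=4)

# Hypothetical request keys captured by the zone's MVMs.
captured = [0xA8, 0xAB, 0xAE]
destinations = assign_destinations(captured, zone_peers)
print({hex(k): hex(v) for k, v in destinations.items()})
# {'0xa8': '0xa8', '0xab': '0xa9', '0xae': '0xae'}
```

Key 0xab has no peer with that exact ID, so it is attributed to 0xa9, the monitored peer at the smallest XOR distance. This only works if the crawl has found every peer in the zone, which is why the zone is crawled periodically.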


Validation

  • We quantify the accuracy of Montra from two angles, using the Kad network:

    • Content Accuracy: What fraction of destination traffic per zone is captured?

    • Peer Accuracy: How accurately does Montra determine destination peers?

  • Validation Methodology:

    • Instrumented Source

    • Instrumented Destination


Instrumented Source Validation

  • Use an instrumented Kad client to send requests for random IDs in a zone (Instrumented Source).

    • Log all requests and their destinations.

  • Monitor the same zone using Montra.

  • Compare source and monitor logs to determine content and peer accuracy.

  • Uses a synthetic workload, but the requests are distributed over the entire zone.
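The log comparison can be sketched as below. The exact formulas are an assumption made for illustration: content accuracy is taken as the fraction of source requests the monitors captured, and peer accuracy as the fraction of captured requests attributed to the same destination the source logged. The `accuracy` function and the logs are hypothetical.

```python
# Minimal sketch: compare the instrumented source's log with Montra's log.
# Assumed definitions (hypothetical, for illustration):
#   content accuracy = fraction of source requests that the monitors captured
#   peer accuracy    = fraction of captured requests whose inferred destination
#                      matches the destination logged at the source


def accuracy(source_log: dict[str, int], monitor_log: dict[str, int]) -> tuple[float, float]:
    """Both logs map a request ID to a destination peer ID; source_log must be non-empty."""
    captured = [rid for rid in source_log if rid in monitor_log]
    content = len(captured) / len(source_log)
    correct = sum(1 for rid in captured if monitor_log[rid] == source_log[rid])
    peer = correct / len(captured) if captured else 0.0
    return content, peer


# Hypothetical logs: r4 was missed, r3 was attributed to the wrong peer.
source = {"r1": 0xA8, "r2": 0xA9, "r3": 0xAE, "r4": 0xAF}
monitor = {"r1": 0xA8, "r2": 0xA9, "r3": 0xAF}
content_acc, peer_acc = accuracy(source, monitor)
print(f"content accuracy = {content_acc:.2f}, peer accuracy = {peer_acc:.2f}")
# content accuracy = 0.75, peer accuracy = 0.67
```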


Instrumented Destination Validation

  • Use an instrumented Kad client to passively observe and log requests (Instrumented Destination).

  • Monitor the same zone simultaneously.

  • Compare destination and monitor logs.

    • Matching relies on heuristics.

  • Uses a real-world workload, but the requests are localized to the instrumented destination.


Results

[Figure: Content Accuracy and Peer Accuracy]

  • Zone size decreases as the zone prefix length grows.

  • Both figures show similar results.

  • Instrumented Source: increasing the zone size beyond a 6-bit zone degrades accuracy.

    • The time needed to crawl zones of 5 bits or fewer delays the addition of MVMs.

  • Instrumented Destination: zone size has minimal impact on accuracy.

    • MVMs are promptly added around the instrumented destination.
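A back-of-the-envelope sketch of why shorter prefixes are harder to monitor: each bit removed from the prefix doubles the slice of ID space, and so roughly doubles the peers a crawl must find before MVMs can be placed. This assumes peer IDs are spread uniformly over Kad's 128-bit ID space; the network size `N` and both helper functions are hypothetical, not measurements or code from the study.

```python
# Back-of-the-envelope sketch: each extra prefix bit halves a zone, so every
# bit removed doubles the number of peers a crawler must find before MVMs can
# be placed. Assumes peer IDs are spread uniformly over Kad's 128-bit ID space;
# the population N is a hypothetical round number, not a measurement.
KAD_ID_BITS = 128


def ids_in_zone(prefix_bits: int, id_bits: int = KAD_ID_BITS) -> int:
    """Number of IDs covered by a zone with a `prefix_bits`-bit prefix."""
    return 2 ** (id_bits - prefix_bits)


def expected_zone_peers(total_peers: int, prefix_bits: int) -> float:
    """Expected peers in the zone if IDs are uniformly distributed."""
    return total_peers / 2 ** prefix_bits


N = 1_000_000  # hypothetical network size
for b in range(4, 9):
    print(f"{b}-bit zone: 2^{KAD_ID_BITS - b} IDs, ~{expected_zone_peers(N, b):,.0f} peers")
```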


Publish Request Rate

[Figure: publish request rate across zones, for Keywords and Files]

  • How does the request rate vary across zones?

  • The heavily skewed behavior is consistent across zones.

  • Each zone has some hot keywords and files.

  • The publish rate is higher for keywords than for files.

    • Many common names occur in filenames.

  • See the paper for more results.
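One way to quantify the skew in a single zone is sketched below, assuming a log of requested keys (keyword or file hashes) collected over a fixed observation window. The `top_share` metric, the window length and the log itself are hypothetical illustrations, not Kad data or the paper's analysis code.

```python
from collections import Counter

# Minimal sketch for quantifying skew in one zone's publish requests.
# Assumes a log of requested keys (keyword or file hashes) over a fixed window;
# the log below is hypothetical, not Kad data.


def request_rates(keys: list[str], window_s: float) -> dict[str, float]:
    """Requests per second for each key over the observation window."""
    return {k: c / window_s for k, c in Counter(keys).items()}


def top_share(keys: list[str], top_fraction: float) -> float:
    """Fraction of all requests that go to the most-requested `top_fraction` of keys."""
    counts = sorted(Counter(keys).values(), reverse=True)
    n_top = max(1, int(len(counts) * top_fraction))
    return sum(counts[:n_top]) / len(keys)


# 100 hypothetical requests: two hot keys and a long tail of one-off keys.
log = ["hot1"] * 50 + ["hot2"] * 30 + ["warm"] * 10 + [f"cold{i}" for i in range(10)]
rates = request_rates(log, window_s=3600)
print(f"{len(rates)} keys, top 10% of keys receive {top_share(log, 0.10):.0%} of requests")
# 13 keys, top 10% of keys receive 50% of requests
```

Computing `top_share` per zone and comparing the values is one simple way to check whether the skew is consistent across zones.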


Relation Between Published and Searched Content

[Figure: Balance for Files and Keywords]

  • What is the balance between supply and demand for a file?

  • Balance = Pub. / (Pub. + Sear.), so 0 means only searched and 1 means only published.

  • 15% of files are searched but never published.

    • Newly popular files that are not yet widely available.

  • 60% of files are published but never searched.

    • Files that were popular in the past and remain highly available.

  • 95% of keywords are published but never searched.

    • A very small pool of keywords is actually used.
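The Balance metric and the two extreme categories can be sketched as follows, assuming per-item publish and search counts have already been extracted from the captured traffic. The `balance` and `summarize` helpers and the counts are hypothetical and do not reproduce the reported percentages.

```python
# Minimal sketch of the Balance metric, Balance = Pub. / (Pub. + Sear.),
# where Pub. and Sear. are the publish and search request counts observed
# for one file or keyword. The per-item counts below are hypothetical.


def balance(publishes: int, searches: int) -> float:
    """1.0 = published but never searched; 0.0 = searched but never published."""
    total = publishes + searches
    if total == 0:
        raise ValueError("no requests observed for this item")
    return publishes / total


def summarize(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Share of items at each extreme of the Balance range."""
    values = [balance(p, s) for p, s in counts.values()]
    return {
        "searched_never_published": sum(v == 0.0 for v in values) / len(values),
        "published_never_searched": sum(v == 1.0 for v in values) / len(values),
    }


# item -> (publish count, search count), hypothetical
counts = {"f1": (0, 4), "f2": (7, 0), "f3": (3, 3), "f4": (5, 0)}
print(summarize(counts))
# {'searched_never_published': 0.25, 'published_never_searched': 0.5}
```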


Conclusion

  • Montra is a new technique for capturing DHT traffic accurately and efficiently without disrupting the system.

  • Montra’s accuracy was validated on the Kad network.

  • We presented an initial characterization of traffic in Kad.

  • Ongoing work:

    • Further evaluation of Montra on other DHTs, e.g., Azureus, Mojito

    • Further analysis of captured traffic in Kad and other DHTs

    • Exploring other uses of Montra, e.g., detecting botnet command-and-control (C&C) traffic


Search Request Rate

[Figure: search request rate across zones, for Keywords and Files]

  • Search-file and search-keyword requests have the lowest range of request rates.

    • This reflects user behavior.

  • User behavior for search keywords differs across zones.

    • Some zones have more popular keywords.

  • User behavior for search files is consistent across zones.
