Explore the evolution of Distributed Hash Tables (DHTs) in P2P systems from key placement strategies to Consistent Hashing and lookup algorithms like Chord. Discover the challenges, applications, and advancements in decentralized directory services.
Relevant Conferences • IPTPS (International Workshop on Peer-to-Peer Systems) • ICDCS (IEEE International Conference on Distributed Computing Systems) • NSDI (USENIX Symposium on Networked Systems Design and Implementation) • PODC (ACM Symposium on Principles of Distributed Computing) • SIGCOMM
Areas of Research Focus • Gnutella-Inspired • The “Directory Service” Problem • BitTorrent-Inspired • The “File Distribution” Problem • P2P Live Streaming • P2P and Net Neutrality
The Applications and The Problems • Napster, Gnutella, KaZaA/FastTrack, Skype • Look for a particular piece of content/object and find which peer has it: the “directory service” problem • Challenge: how to offer a scalable directory service in a fully decentralized fashion • Arrange a direct transfer from that peer: the “punch a hole in the firewall” problem
Decentralized Directory Services • Structured Networks • DHT (Distributed Hash Tables) • A very active research area from 2001 to 2004 • Limitation: lookup by keys only • Multi-Attribute DHT • Limited support for query-based lookup • Unstructured Networks • Various improvements to basic flooding-based schemes
What Is a DHT? • Single-node hash table: key = Hash(name) put(key, value) get(key) -> value • How do I do this across millions of hosts on the Internet? • Distributed Hash Table
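To make the single-node interface concrete, here is a minimal Python sketch; the file name, value, and helper names are illustrative and not from the slides.

```python
# A minimal sketch of the single-node case: key = Hash(name),
# put(key, value), get(key) -> value, using SHA-1 as the later slides do.
import hashlib

table = {}

def key_for(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

def put(key: int, value: bytes) -> None:
    table[key] = value

def get(key: int) -> bytes:
    return table[key]

k = key_for("some-file.mp3")
put(k, b"file data...")
assert get(k) == b"file data..."
```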
Distributed Hash Tables • Chord • CAN • Pastry • Tapestry • Symphony • Koorde • etc.
The Problem • Publisher: Put(key=“title”, value=file data…) • Client: Get(key=“title”) • [Diagram: publisher and client reaching nodes N1–N6 across the Internet] • Key placement • Routing to find the key
Key Placement • Traditional hashing • Nodes numbered from 1 to N • Key is placed at node (hash(key) % N) • Why traditional hashing has problems: when a node joins or leaves, N changes and almost every key maps to a different node
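A small illustrative experiment (assumed key names and node counts, not from the slides) showing why hash(key) % N is brittle: changing N remaps nearly every key.

```python
# With placement hash(key) % N, going from 100 to 101 nodes remaps almost
# every key, so nearly all stored objects would have to move.
import hashlib

def node_for(key: str, n_nodes: int) -> int:
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % n_nodes

keys = [f"object-{i}" for i in range(10_000)]
moved = sum(node_for(k, 100) != node_for(k, 101) for k in keys)
print(f"{moved / len(keys):.0%} of keys move")   # typically ~99%
```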
Consistent Hashing: IDs • Key identifier = SHA-1(key) • Node identifier = SHA-1(IP address) • SHA-1 distributes both uniformly • How to map key IDs to node IDs?
Consistent Hashing: Placement • A key is stored at its successor: the node with the next-higher ID • [Diagram: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80; K80 is stored at N90]
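A minimal sketch of successor placement on the slide's 7-bit ring; the helper names are mine, while the node and key IDs follow the diagram.

```python
# Keys are stored at their successor: the next node clockwise on the ring.
import bisect

ID_BITS = 7
RING = 2 ** ID_BITS

node_ids = sorted([32, 90, 105])              # N32, N90, N105

def successor(key_id: int) -> int:
    """Node responsible for key_id: the next-higher node ID (with wraparound)."""
    i = bisect.bisect_left(node_ids, key_id % RING)
    return node_ids[i % len(node_ids)]

for k in (5, 20, 80):
    print(f"K{k} -> N{successor(k)}")         # K5 -> N32, K20 -> N32, K80 -> N90
```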
Basic Lookup • [Diagram: a query “Where is key 80?” travels around a ring of nodes N10, N32, N60, N90, N105, N120 until it can be answered: “N90 has K80”]
“Finger Table” Allows log(N)-Time Lookups • [Diagram: node N80’s fingers span ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring]
Finger i Points to the Successor of n + 2^i • [Diagram: from node N80, the finger for 112 = 80 + 2^5 points to its successor, N120]
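A sketch of the finger-table rule, using a hypothetical node set that includes the slide's example (N80's finger for 112 pointing to N120).

```python
# finger[i] of node n = successor((n + 2^i) mod 2^M), for i = 0 .. M-1.
import bisect

M = 7                                         # 7-bit identifier space
RING = 2 ** M
node_ids = sorted([10, 32, 60, 80, 90, 105, 120])

def successor(ident: int) -> int:
    i = bisect.bisect_left(node_ids, ident % RING)
    return node_ids[i % len(node_ids)]

def finger_table(n: int) -> list[int]:
    return [successor((n + 2 ** i) % RING) for i in range(M)]

print(finger_table(80))                       # finger 5 targets 112 -> N120
```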
Lookups Take O(log N) Hops • [Diagram: Lookup(K19) crosses a ring of nodes N5, N10, N20, N32, N60, N80, N99, N110 in a few hops to reach the node responsible for K19]
Chord Lookup Algorithm Properties • Interface: lookup(key) → IP address • Efficient: O(log N) messages per lookup • N is the total number of servers • Scalable: O(log N) state per node • Robust: survives massive failures • Simple to analyze
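A simplified, iterative sketch of the lookup (not the full Chord protocol: no joins, failures, or successor lists), reusing the same hypothetical node set idea.

```python
# Each step forwards the query to the finger that most closely precedes the
# key, so each hop roughly halves the remaining distance on the ring.
import bisect

M = 7
RING = 2 ** M
node_ids = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def successor(ident: int) -> int:
    i = bisect.bisect_left(node_ids, ident % RING)
    return node_ids[i % len(node_ids)]

def fingers(n: int) -> list[int]:
    return [successor((n + 2 ** i) % RING) for i in range(M)]

def in_arc(x: int, a: int, b: int, inclusive: bool) -> bool:
    """True if x lies on the clockwise arc from a to b (excluding a)."""
    if a < b:
        return a < x < b or (inclusive and x == b)
    return x > a or x < b or (inclusive and x == b)

def closest_preceding(n: int, key: int) -> int:
    """Farthest finger of n that still precedes key on the ring."""
    for f in reversed(fingers(n)):
        if in_arc(f, n, key, inclusive=False):
            return f
    return n

def lookup(start: int, key: int) -> tuple[int, int]:
    """Return (node responsible for key, number of hops taken)."""
    n, hops = start, 0
    while not in_arc(key, n, successor((n + 1) % RING), inclusive=True):
        nxt = closest_preceding(n, key)
        if nxt == n:                          # no closer finger known
            nxt = successor((n + 1) % RING)
        n, hops = nxt, hops + 1
    return successor((n + 1) % RING), hops

print(lookup(32, 19))                         # e.g. (20, 3): N20 holds K19
```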
Related Studies on DHTs • Many variations of DHTs • Different ways to choose the fingers • Ways to make it more robust • Ways to make it more network efficient • Studies of different DHTs • What happens when peers join and leave (churn)? • Applications built using DHTs • Trackerless BitTorrent • Beehive, a P2P-based DNS system
Directory Lookups: Unstructured Networks • Example: Gnutella • Support more flexible queries • Typically, precise “name” search is a small portion of all queries • Simplicity • High resilience against node failures • Problem: scalability • Flooding: # of messages ~ O(N·E)
Flooding-Based Searches • Duplication increases as TTL increases in flooding • Worst case: a node A is interrupted by N · q · degree(A) messages
Problems with Simple TTL-Based Flooding • Hard to choose the TTL: • For objects that are widely present in the network, small TTLs suffice • For objects that are rare in the network, large TTLs are necessary • The number of query messages grows exponentially as the TTL grows
Idea #1: Adaptively Adjust TTL • “Expanding Ring” • Multiple floods: start with TTL=1; increment TTL by 2 each time until the search succeeds • Success varies by network topology
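A toy sketch of expanding-ring search under an assumed graph and query model (not Gnutella's wire protocol); the graph maps each peer to its neighbor list, and all names are illustrative.

```python
# Flood with TTL = 1; if the object is not found, retry with TTL + 2.
from collections import deque

def flood(graph, start, ttl, has_object):
    """BFS-style flood limited to `ttl` hops; returns (found, messages sent)."""
    seen, messages = {start}, 0
    frontier = deque([(start, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if has_object(node):
            return True, messages
        if t == 0:
            continue
        for nbr in graph[node]:
            messages += 1                     # every forwarded query costs a message
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, t - 1))
    return False, messages

def expanding_ring(graph, start, has_object, max_ttl=9):
    ttl, total = 1, 0
    while ttl <= max_ttl:
        found, msgs = flood(graph, start, ttl, has_object)
        total += msgs
        if found:
            return True, total
        ttl += 2                              # grow the ring and flood again
    return False, total

# Tiny example: a ring of 12 peers where only peer 7 stores the object.
ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
print(expanding_ring(ring, start=0, has_object=lambda n: n == 7))
```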
Idea #2: Random Walk • Simple random walk • takes too long to find anything! • Multiple-walker random walk • N agents each walking T steps visit as many nodes as 1 agent walking N·T steps • When to terminate the search: check back with the query originator once every C steps
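A small sketch of the multiple-walker random walk under the same assumed graph model; the walker count and the check-back interval C are illustrative parameters.

```python
# Each walker moves to a random neighbor per step and checks back with the
# query originator every `check_every` steps to decide whether to stop.
import random

def random_walk_search(graph, start, has_object,
                       walkers=16, max_steps=10_000, check_every=4):
    found = False                              # stands in for the query originator
    positions = [start] * walkers
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            if has_object(node):
                found = True
            positions[i] = random.choice(graph[node])
        if step % check_every == 0 and found:  # walkers check back every C steps
            return True, step * walkers        # rough message count
    return False, max_steps * walkers

# Tiny example: 200 peers, each with 4 random neighbors; peer 123 has the object.
peers = list(range(200))
graph = {p: random.sample([q for q in peers if q != p], 4) for p in peers}
print(random_walk_search(graph, start=0, has_object=lambda n: n == 123))
```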
Flexible Replication • In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters • Limited node storage => what’s the optimal replication density distribution? • In Gnutella, only nodes that query an object store it => r_i ∝ p_i • What if we have different replication strategies?
Optimal r_i Distribution • Goal: minimize Σ_i (p_i / r_i), subject to Σ_i r_i = R • Calculation: • introduce a Lagrange multiplier λ; find the r_i and λ that minimize: Σ_i (p_i / r_i) + λ · (Σ_i r_i − R) => −p_i / r_i² + λ = 0 for all i => r_i ∝ √p_i
Square-Root Distribution • General principle: to minimize Σ_i (p_i / r_i) under the constraint Σ_i r_i = R, make r_i proportional to the square root of p_i • Other application examples: • Bandwidth allocation to minimize expected download times • Server load balancing to minimize expected request latency
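A small sketch, with a hypothetical popularity distribution, of allocating a total replica budget R in proportion to √p_i.

```python
# Setting r_i proportional to sqrt(p_i) minimizes sum_i p_i / r_i for a
# fixed total sum_i r_i = R.
import math

def sqrt_allocation(popularity, R):
    roots = [math.sqrt(p) for p in popularity]
    total = sum(roots)
    return [R * r / total for r in roots]

p = [0.5, 0.3, 0.15, 0.05]                    # hypothetical query distribution
print(sqrt_allocation(p, R=1000))             # replica counts summing to R
```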
Achieving Square-Root Distribution • Suggestions from some heuristics • Store an object at a number of nodes proportional to the number of nodes visited in order to find the object • Each node uses random replacement • Two implementations: • Path replication: store the object along the path of a successful “walk” • Random replication: store the object randomly among the nodes visited by the agents
KaZaA • Use Supernodes • Regular Nodes : Supernodes = 100 : 1 • A simple way to scale the system by a factor of 100
Modeling and Understanding BitTorrent • Analysis based on modeling • View it as a type of gossip algorithm • Usually does not model the tit-for-tat aspects • Assumes perfectly connected networks • Statistical modeling techniques • Mostly published in PODC or SIGMETRICS • Simulation studies • Different assumptions about bottlenecks • Varying levels of detail in modeling the data transfer • Published in ICDCS and SIGCOMM
Studies on the Effect of BitTorrent on ISPs • Observation: P2P contributes to cross-ISP traffic • SIGCOMM 2006 publication on studies of Japanese backbone traffic • Attempts to improve network locality of BitTorrent-like applications • ICDCS 2006 publication • Academic P2P file sharing systems • Bullet, Julia, etc.
Techniques to Alleviate the “Last Missing Piece” Problem • Apply network coding to pieces exchanged between peers • Pablo Rodriguez Rodriguez, Microsoft Research (recently moved to Telefonica Research) • Use a different piece-replication strategy • Dahlia Malkhi, Microsoft Research • “On Collaborative Content Distribution Using Multi-Message Gossip” • Associate an “age” with file segments
Network Coding • Main Feature • Allowing intermediate nodes to encode packets • Making optimal use of the available network resources • Similar Technique: Erasure Codes • Reconstructing the original content of size n from (roughly) any n symbols out of a large universe of encoded symbols
Network Coding in P2P: The Model • Server • Dividing the file into k blocks • Uploading blocks at random to different clients • Clients (Users) • Collaborating with each other to assemble the blocks and reconstruct the original file • Exchanging information and data with only a small subset of others (neighbors) • Symmetric neighborhood and links
Network Coding in P2P • Assume a node with blocks B1, B2, …, Bk • Pick random numbers C1, C2, …, Ck • Construct a new block E = C1·B1 + C2·B2 + … + Ck·Bk • Send E and (C1, C2, …, Ck) to a neighbor • Decoding: collect enough linearly independent E’s and solve the linear system • If all nodes pick the vector C randomly, chances are high that after receiving ~k blocks a node can recover B1 through Bk
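A hedged sketch of this scheme. For simplicity it works over the prime field GF(257) rather than the GF(2^8) arithmetic common in practice; block sizes, counts, and function names are illustrative.

```python
# Blocks are vectors of field symbols; a node sends a random linear
# combination E plus its coefficient vector C, and a receiver recovers the
# original blocks once it has k linearly independent combinations.
import random

P = 257                                        # prime field modulus (assumption)

def encode(blocks, rng=random):
    """Return (coefficient vector C, combined block E) for one coded packet."""
    k, n = len(blocks), len(blocks[0])
    coeffs = [rng.randrange(P) for _ in range(k)]
    combined = [sum(c * b[j] for c, b in zip(coeffs, blocks)) % P
                for j in range(n)]
    return coeffs, combined

def decode(packets, k):
    """Gaussian elimination mod P on rows [C | E]; returns blocks B1..Bk."""
    rows = [list(c) + list(e) for c, e in packets]
    for col in range(k):
        pivot = next(r for r in range(col, len(rows)) if rows[r][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)    # modular inverse of the pivot
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P
                           for a, b in zip(rows[r], rows[col])]
    return [row[k:] for row in rows[:k]]

original = [[random.randrange(P) for _ in range(8)] for _ in range(4)]
packets = [encode(original) for _ in range(6)]   # a couple of extra packets
assert decode(packets, k=4) == original          # for (near-)certain independence
```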
Motivations • Internet Applications: • PPLive, PPStream, etc. • Challenge: QoS Issues • Raw bandwidth constraints • Example: PPLive utilizes the significant bandwidth disparity between “University nodes” and “Residential nodes” • Satisfying the demands of content publishers
P2P Live Streaming Can’t Stand on Its Own • P2P as a complement to IP-Multicast • Used where IP-Multicast isn’t enabled • P2P as a way to reduce server load • By sourcing parts of streams from peers, server load might be reduced by 10% • P2P as a way to reduce backbone bandwidth requirements • When core network bandwidth isn’t sufficient
It’s All TCP’s Fault • TCP: per-flow fairness • Browsers • 2-4 TCP flows per web server • Contact a few web servers at a time • Short flows • P2P applications: • Much higher number of TCP connections • Many more endpoints • Long flows
When and How to Apply Traffic Shaping • Current practice: application recognition • Needs: • An application-agnostic way to trigger the traffic shaping • A clear statement to users about what happens