Introduction • Widespread unstructured P2P network • Currently between 200,000 & 300,000 hosts • Ideal as a research test bed • Large scale network demonstrates the need for scalable P2P protocols • A Gnutella client has 4-10 TCP connections to other peers • UDP is used for signaling traffic, and an “ultra-peer” state was created to gain some of the benefits of server-based networks
Introduction (Cont.) • “Ultra-peer” status is self-assigned by powerful peers and provides some extra functionality compared to ordinary nodes • Many freely available Gnutella clients exist • Some of the most popular are: • Limewire • Bearshare • Morpheus • Shareaza • Shareaza has the fastest-growing number of users • It has a very pleasant GUI and also connects to eDonkey and BitTorrent
Its Main Features • This protocol underlies much of the current file-sharing activity on the Internet. • It is based on TCP/IP and HTTP. • A file sharing network (FSN) is a set of machines that exchange files using Gnutella. • To connect to a Gnutella network, you need the IP address of a single machine that is already part of the network.
Gnutella • Peer-to-peer indexing and searching service. • Peer-to-peer point-to-point file downloading using HTTP. • A Gnutella node needs a server (or a set of servers) to “start up”. • gnutellahosts.com provides a service with reliable initial connection points, but this introduces a new single point of failure!
Gnutella vs. Napster • Like Napster, distributed file storage and transmission • Added the ability to distribute file discovery • Ask your direct peers who else they know • Query those machines directly
Concepts of Unstructured Services • There are many interesting ideas being explored: • Breaking shared files into many parts, both to increase bandwidth (parallel I/O) and to increase security of content, as no one site can access files without cooperation from its peers • This type of technology makes censorship very hard. • MojoNation has a load balancing and scheduling algorithm in the form of micro payments to reward those who contribute most to the community of peers. • Gnutella, which is a family of related products, is usually described as a P2P search engine, as its interface is nearer that of a search engine than a Web file system
Characteristics • Gnutella is a distributed system for file sharing • provides means for network discovery • provides means for file searching and sharing • Defines a network at the application level • Employs the concept of peer-to-peer • all hosts are equal (symmetry) • there is no central point • search is anonymous, but IP addresses are revealed when downloading
Connection • Once you establish a connection to the first servent, you announce your presence. • The first servent will pass on that message to all the servents that it is connected to, and so on. • These servents all reply with data about themselves: • how many files each is sharing • how many kilobytes the files take up • This already adds up to a lot of traffic!
Gnutella File Sharing Model • Users register files with network neighbors • Search across the network to find files to copy • Does not require a centralized broker (as Napster does) [Figure: peers Alice, Bob, Carol and Ted; the query “Where is Final Fantasy 4?” is passed between peers, the answer “Carol has Final Fantasy 4” is passed back, and Final Fantasy 4 is then copied from Carol]
Resource Discovery Decentralized File-sharing Model • Peers have same capability and responsibility • The communication between peers is symmetric • There is no central directory server • Index on the metadata of shared files is stored locally among all peers • Examples: • Gnutella • FreeServe • MojoNation
Resource Discovery Decentralized (Cont.) • Every user acts as a client, a server or both (servent) • A user connects to the framework and becomes a member of the community, allowing others to connect through him/her • Users speak directly to other users with no intermediate or central authority • No one entity controls the information that passes through the community
Resource Discovery Advantages and Disadvantages • Advantages: • Inherent scalability • Avoidance of “single point of litigation” problem • Fault Tolerance • Disadvantages: • Slow information discovery • More query traffic on the network
Unstructured Decentralized Services • There are some 200 Napster clones available to support this area http://www.ultimateresourcesite.com/napster/main.htm • Currently the most popular is Imesh [http://www.imesh.com], which has some 2 million users and can share any type of file. • Some of the best known file sharing systems are • MojoNation [http://www.mojonation.net] • Freenet [http://freenet.sourceforge.net/] • Gnutella [http://gnutella.wego.com/] • These three are not server based like Napster; instead they support waves of software agents, expressing resource availability and interest, that propagate among an informal, dynamic network of peers
DFS Variations • DFS: Distributed File Sharing
P2P File Sharing Benefits • Cost sharing • Resource aggregation • Improved scalability/reliability • Anonymity/privacy • Dynamism
Management/Placement Challenges • Per-node state • Bandwidth usage • Search time • Fault tolerance/resiliency
Gnutella in Detail • Share any type of file (not just music) • Decentralized search, unlike Napster • You ask your neighbors for files of interest • Neighbors ask their neighbors, and so on • TTL field quenches messages after a number of hops • Users with matching files reply to you Figure from http://computer.howstuffworks.com/file-sharing.htm
The Gnutella protocol (v0.4) • PING – Notify a peer of your existence • PONG – Reply to a PING request • QUERY – Find a file in the network • RESPONSE (QUERYHIT) – Give the location of a file • PUSHREQUEST (PUSH) – Request a server behind a firewall to push a file out to a client.
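To make the message types above concrete, here is a minimal sketch of how a v0.4 message header could be packed and unpacked. It assumes the header layout and type codes of the v0.4 protocol specification (16-byte message ID, 1-byte type, 1-byte TTL, 1-byte hop count, 4-byte little-endian payload length); the function names `pack_header` and `unpack_header` are illustrative, and the constants should be checked against the specification before use.

```python
import struct

# Payload type codes, assumed from the v0.4 specification.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte message ID, type, TTL, hops, little-endian payload length.
# "<" disables padding, so the header is exactly 23 bytes.
HEADER = struct.Struct("<16sBBBI")


def pack_header(msg_id, kind, ttl, hops, payload_len):
    """Serialize one message header (hypothetical helper)."""
    return HEADER.pack(msg_id, kind, ttl, hops, payload_len)


def unpack_header(data):
    """Parse the first 23 bytes of a message into a dict."""
    msg_id, kind, ttl, hops, payload_len = HEADER.unpack(data[:HEADER.size])
    return {"id": msg_id, "kind": kind, "ttl": ttl,
            "hops": hops, "payload_len": payload_len}


header = pack_header(b"\x01" * 16, QUERY, 7, 0, 12)
print(len(header), unpack_header(header)["kind"] == QUERY)
```

A forwarding servent would decrement `ttl` and increment `hops` before passing the message on, and drop it once `ttl` reaches zero.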
Joining the Gnutella Network • The new node connects to a well-known “Anchor” node. • It then sends a PING message to discover other nodes. • PONG messages are sent in reply from hosts offering new connections with the new node. • Direct connections are then made to the newly discovered nodes. [Figure: Gnutella network; the new node sends a PING to anchor node A, the PING is forwarded through the network, and PONGs are returned]
Properties of Flooding • Searching by flooding: • If you don’t have the file you want, query 7 of your partners. • If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10. • Requests are flooded, but there is no tree structure. • No looping, but packets may be received twice Note: Play Gnutella animation at: http://www.limewire.com/index.jsp/p2p
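The flooding behaviour above (a hop limit, no loops, but duplicate deliveries) can be sketched as a small simulation. This is an illustration only: the adjacency-list `graph`, the function name `flood`, and the use of a shared `seen` set in place of each servent's own list of already-seen message IDs are all assumptions made for brevity.

```python
from collections import deque


def flood(graph, source, ttl):
    """Flood a query from `source`; return (nodes reached, duplicate deliveries)."""
    seen = {source}          # stands in for per-node "already seen this ID" tables
    duplicates = 0
    queue = deque([(source, None, ttl)])   # (node, sender, remaining hops)
    while queue:
        node, sender, remaining = queue.popleft()
        if remaining == 0:
            continue         # TTL exhausted: do not forward any further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue     # never send the query back where it came from
            if neighbor in seen:
                duplicates += 1   # delivered again, then dropped (no loop)
                continue
            seen.add(neighbor)
            queue.append((neighbor, node, remaining - 1))
    return seen - {source}, duplicates


# A triangle A-B-C with a tail C-D.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
print(flood(graph, "A", ttl=2))
```

In the example, B and C each receive the query a second time from the other, which is the "received twice" case, while D is only reachable because the hop limit is at least 2.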
Query flooding • Gnutella • no hierarchy • use bootstrap node to learn about others • join message • Send query to neighbors • Neighbors forward query to all attached neighbors (floods) • If queried peer has object, it sends message back to querying peer [Figure: overlay network with arrows labeled “join” and “query”]
More on query flooding • Pros • peers have similar responsibilities: no group leaders • highly decentralized • no peer maintains directory info • Cons • excessive query traffic • query radius: a query may not find content even when it is present in the network • bootstrap node still required • maintenance of overlay network
About Flooding • There is nothing that stops a servent from flooding its network region with messages. • Cost of maintaining the network • Cost of searching for a file
Breadth-First Search (BFS) [Figure legend: source, forward query, processed query, found result, forward response]
Resource Discovery Pros and Cons • Benefits: • Peers speak directly with no central authority • Nobody owns the Gnutella Network and nobody can shut it down • No central point of failure • Limited per-node state • Isolated node failure can quickly and automatically be worked around • Free loading • Scalability • Drawbacks: • Searches are less effective and can be slow • Bandwidth intensive • Gnutella network evolving to include “controlled decentralization” (Limewire, Bearshare, ToadNode)
Searching for a File [Figure: Gnutella network; QUERY messages spread from peer to peer and QUERYHIT messages return] • A node broadcasts its QUERY to all its peers, who in turn broadcast to their peers. • Nodes route QUERYHITs, which contain file location details, back to the sender along the QUERY path. • To download files, a direct connection is made using the details of the host in the QUERYHIT messages.
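The reverse-path routing of QUERYHITs described above can be sketched as follows. This is a simplified model, not the protocol itself: the function `search`, the `files` mapping, and the single `came_from` table (standing in for the per-node record of which neighbor a QUERY arrived from) are hypothetical names chosen for the illustration.

```python
from collections import deque


def search(graph, files, source, wanted, ttl):
    """Flood a QUERY, then route each QUERYHIT back along the query path.

    Returns a list of (holder, path), where path is the hop sequence the
    QUERYHIT follows from the holder back to the querying node.
    """
    came_from = {source: None}   # node -> neighbor the QUERY arrived from
    hits = []
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != source and wanted in files.get(node, ()):
            # Follow the recorded upstream neighbors back to the source.
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits.append((node, path))
        if remaining == 0:
            continue
        for neighbor in graph[node]:
            if neighbor not in came_from:   # ignore duplicate deliveries
                came_from[neighbor] = node
                queue.append((neighbor, remaining - 1))
    return hits


graph = {"Alice": ["Bob", "Ted"], "Bob": ["Alice", "Carol"],
         "Ted": ["Alice", "Carol"], "Carol": ["Bob", "Ted"]}
files = {"Carol": {"Final Fantasy 4"}}
print(search(graph, files, "Alice", "Final Fantasy 4", ttl=2))
```

Once the QUERYHIT arrives, the download itself bypasses the overlay: the querying node opens a direct connection to the holder named in the hit.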
Free Riding • File sharing networks rely on users sharing data • Two types of free riding • Downloading but not sharing any data • Not sharing any interesting data • On Gnutella • 15% of users contribute 94% of content • 63% of users never responded to a query • Didn’t have “interesting” data Data from E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella”
Summary of Gnutella’s Features • Decentralized • No single point of failure • Not as susceptible to denial of service • Cannot ensure correct results • Flooding queries • Search is now distributed but still not scalable
Initial Problems and Fixes • Freeloading: WWW sites offering search/retrieval from the Gnutella network without providing file sharing or query routing • Fix: block file-serving to browser-based non-file-sharing users • Prematurely terminated downloads: • software bugs • long download times over modems • modem users run a Gnutella peer only briefly (a Napster problem also!), or a user becomes overloaded • Fix: peer can reply “I have it, but I am busy. Try again later”
Initial Problems and Fixes 2 • 2000: average size of reachable network only 400-800 hosts • Why so small? • modem users: not enough bandwidth to provide search routing capabilities: routing black holes • Fix: create peer hierarchy based on capabilities • previously: all peers identical, most were modem black holes • connection preferencing: • favors routing to well-connected peers • favors reply to clients that themselves serve a large number of files: prevents freeloading • Limewire gateway functions as Napster-like central server on behalf of other peers • for searching purposes
Gnutella Enhancements • Pings/Pongs can consume up to 50% of bandwidth • Solutions: • Pong Limiting • Pong Caching • Ping Multiplexing • http://www.limewire.com/index.jsp/pingpong
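One way to picture pong caching and pong limiting is a servent that stores recently received PONGs and answers an incoming PING from that store instead of forwarding the PING onward. The sketch below is an assumption-laden illustration of that idea: the class `PongCache`, the `max_age` expiry, the `limit` parameter, and the example addresses are all hypothetical and are not taken from any client's implementation.

```python
import time


class PongCache:
    """Remember recent PONGs so PINGs can be answered locally."""

    def __init__(self, max_age=3.0, clock=time.monotonic):
        self.max_age = max_age    # seconds a cached PONG stays valid (assumed value)
        self.clock = clock        # injectable clock, handy for testing
        self._pongs = {}          # (ip, port) -> (pong info, time stored)

    def store(self, address, info):
        """Cache a PONG received from `address`."""
        self._pongs[address] = (info, self.clock())

    def answer_ping(self, limit=10):
        """Return up to `limit` fresh cached PONGs (pong limiting)."""
        now = self.clock()
        # Drop entries older than max_age before answering.
        self._pongs = {addr: (info, stored)
                       for addr, (info, stored) in self._pongs.items()
                       if now - stored <= self.max_age}
        return [(addr, info) for addr, (info, _) in list(self._pongs.items())[:limit]]


cache = PongCache()
cache.store(("10.0.0.1", 6346), {"files": 12, "kilobytes": 40000})
print(cache.answer_ping(limit=5))
```

Because the PING is answered from the cache, it does not have to be re-flooded to every neighbor, which is where the bandwidth saving would come from.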
Gnutella Enhancements 2 • Cache query responses • Results • Evolving Protocol • Gnutella Developer Forum • UltraPeers • Alternative query routing algorithms
Can Heterogeneity Make Gnutella Scale? • Ideas • Replace query flooding with multiple random walks • Proactive replication • #replicas proportional to sqrt(request rate) • Result: two orders of magnitude improvement in terms of query time, per-node load and message traffic
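The square-root replication rule can be illustrated with a short calculation. The sketch below simply splits a fixed replica budget in proportion to the square root of each file's request rate; the function name `sqrt_replication`, the file names, the rates, and the budget are made-up values for the example.

```python
import math


def sqrt_replication(request_rates, total_replicas):
    """Split a replica budget in proportion to sqrt(request rate)."""
    weights = {name: math.sqrt(rate) for name, rate in request_rates.items()}
    total = sum(weights.values())
    return {name: total_replicas * weight / total
            for name, weight in weights.items()}


# Hypothetical request rates (requests per hour) and a budget of 160 replicas.
rates = {"popular": 100, "average": 25, "rare": 1}
print(sqrt_replication(rates, 160))
```

Here the popular file is requested 100 times as often as the rare one but receives only 10 times as many replicas, so rarely requested files remain reasonably easy for a random walk to find.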
Can Heterogeneity Make Gnutella Scale? 2 • Gnutella assumption: • All peers are equal • Not true! Heterogeneity among P2P peers (dial-up users vs. college users) • Evolve topology to match node capacities • Use random walks over this topology
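A random-walk search can be sketched as follows. The sketch assumes a plain adjacency-list overlay and runs the walkers one after another rather than in parallel; `random_walk_search`, `walkers`, and `max_steps` are illustrative names and defaults, and the walk is uniform rather than biased toward high-capacity nodes.

```python
import random


def random_walk_search(graph, files, source, wanted,
                       walkers=4, max_steps=50, rng=None):
    """Send `walkers` random walks from `source`; stop at the first hit.

    Returns (holder or None, number of messages sent).
    """
    rng = rng or random.Random()
    messages = 0
    for _ in range(walkers):
        node = source
        for _ in range(max_steps):
            node = rng.choice(graph[node])   # forward to one random neighbor
            messages += 1
            if wanted in files.get(node, ()):
                return node, messages
    return None, messages


graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
files = {"D": {"song.mp3"}}
print(random_walk_search(graph, files, "A", "song.mp3", rng=random.Random(1)))
```

Unlike flooding, each step costs exactly one message, so the total traffic is bounded by `walkers * max_steps` regardless of how well connected the nodes are.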
Can Heterogeneity Make Gnutella Scale? 3 • Solution outline • c_i: capacity of node i; in[j,i]: messages from j->i; out[i,j]: messages from i->j • Init: in[i,j] = out[i,j] = 0, OutMax[i,j] = c_i/d_i • Update according to the messages received/sent • Check if overloaded • If so, redirect a high-input neighbor to a neighbor with high OutMax (spare capacity) • Intuitively, take yourself out of the loop • If no such node can be found, ask the neighbor to throttle back • Result: average query length reduces from 70 to 2-9 hops • depending on topology
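The solution outline above leaves several details open, so the following is only a rough sketch under explicit assumptions: d_i is taken to be the number of neighbors of node i, a node counts as overloaded when its total incoming messages exceed c_i, and spare capacity toward a neighbor is OutMax minus the messages already sent to it. The class `Node` and its method names are hypothetical.

```python
class Node:
    """Bookkeeping for one node i in the flow-control outline."""

    def __init__(self, capacity, neighbors):
        self.capacity = capacity                         # c_i
        self.incoming = {n: 0 for n in neighbors}        # in[j,i]
        self.outgoing = {n: 0 for n in neighbors}        # out[i,j]
        share = capacity / len(neighbors)                # c_i / d_i (d_i assumed = degree)
        self.out_max = {n: share for n in neighbors}     # OutMax[i,j]

    def record_in(self, neighbor, count=1):
        self.incoming[neighbor] += count

    def record_out(self, neighbor, count=1):
        self.outgoing[neighbor] += count

    def overloaded(self):
        # Assumption: overloaded means total input exceeds capacity.
        return sum(self.incoming.values()) > self.capacity

    def relieve(self):
        """Decide what to do when overloaded; None if not overloaded."""
        if not self.overloaded():
            return None
        heavy = max(self.incoming, key=self.incoming.get)   # high-input neighbor
        spare = {n: self.out_max[n] - self.outgoing[n]
                 for n in self.out_max if n != heavy}
        target = max(spare, key=spare.get, default=None)
        if target is not None and spare[target] > 0:
            return ("redirect", heavy, target)   # take yourself out of the loop
        return ("throttle", heavy)               # no spare capacity anywhere


node = Node(capacity=10, neighbors=["a", "b", "c"])
node.record_in("a", 8)
node.record_in("b", 3)
print(node.relieve())
```

In the example the node receives 11 messages against a capacity of 10, so it asks its heaviest sender `a` to connect to whichever other neighbor has the most spare capacity.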
Measurement Results • Who is sharing what? • August 2000
Problems With Gnutella • Protocol scalability • Message broadcast technique imposes limitations on the network size: packets per message = $\sum_{i=0}^{\mathrm{TTL}} \mathrm{noPeers}^{i}$ • In November 2000 dial-up bandwidth barrier reached • Overlay network efficiency • Random selection of peers results in inefficient use of the underlying network • Redundant traffic generated on the Internet
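Reading the slide's formula as a sum of noPeers^i for i from 0 to TTL (an interpretation of the extracted text), the growth in traffic is easy to check numerically. The function name `packets_per_message` is illustrative, and the formula is an upper bound that ignores overlapping neighborhoods.

```python
def packets_per_message(no_peers, ttl):
    """Upper bound on packets generated by one broadcast message."""
    return sum(no_peers ** i for i in range(ttl + 1))


# 7 peers per node and a hop limit of 10, as in the flooding example.
print(packets_per_message(7, 10))
```

With 7 peers per node and a hop limit of 10, the bound is 329,554,457 packets for a single message, which shows why the broadcast technique limits the network size.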
Heterogeneous Connection Quality in Gnutella • 35% have upstream bottleneck bandwidth of at least 100 Kbps • only 8% have at least 10 Mbps bandwidth • 22% have bandwidth of 100 Kbps or less
Why Look at Gnutella • Widespread unstructured P2P network • Currently between 200,000 & 300,000 hosts • 2006: still heavily in use by about 2 million users • Gnutella clients (among others): • LimeWire • Morpheus • BearShare • OpenCola • Shareaza • Shareaza has the fastest-growing number of users • It has a very pleasant GUI and also connects to eDonkey and BitTorrent • Ideal as a research test bed • Large scale network demonstrates the need for scalable P2P protocols
Limewire: Improvement on Gnutella • Creation of a peer hierarchy based on capabilities • previously: all peers identical, most were modem black holes • connection preferencing: • favors routing to well-connected peers • favors reply to clients that themselves serve a large number of files: prevents freeloading • Limewire gateway functions as Napster-like central server on behalf of other peers • for searching purposes
Limewire • The Limewire P2P file sharing program connects to the Gnutella P2P network • Limewire client software is widely recognized for its clean user interface that does not contain adware • Sometimes billed as the “fastest file sharing program” • Limewire claims to offer relatively good search and download performance • Free Limewire software downloads are available for Windows, Linux and Macintosh operating systems • Limewire Pro pay clients also exist
BearShare • The BearShare P2P file sharing program is a popular free software client for the Gnutella P2P network • Both free and pay downloads of BearShare file sharing programs exist
Shareaza • Shareaza is an up-and-coming P2P file sharing program • This client offers an extremely powerful search engine capable of connecting to multiple popular P2P networks including eDonkey, BitTorrent and Gnutella • Shareaza file sharing software includes intelligence for detecting fake and/or corrupted files • The free Shareaza download also contains no ads or spyware • As the installed base of Shareaza client users grows, expect Shareaza to become an even better P2P file sharing program
Anonymous? • The person you are getting the file from knows who you are • That’s not anonymous. • Other protocols exist where the owner of the files doesn’t know the requester. • Peer-to-peer anonymity exists
Summary • peer-to-peer networking: applications connect to peer applications • focus: decentralized method of searching for files • each application instance serves to: • store selected files • route queries (file searches) from and to its neighboring peers • respond to queries (serve file) if file stored locally • Gnutella history: • 3/14/00: released by AOL, almost immediately withdrawn • too late: 23K users on Gnutella at 8 am this AM • many iterations to fix poor initial design (poor design turned many people off) • What we care about: • How much traffic does one query generate? • How many hosts can it support at once? • What is the latency associated with querying? • Is there a bottleneck?