Peer-to-Peer Computing: The hype, the hard problems and quest for solutions

Krishna Kant

Ravi Iyer

Vijay Tewari

Intel Corporation


Outline

  • Section I

    • Overview of P2P

    • P2P Framework

      • Overview of distributed computing frameworks

      • Additional P2P framework requirements

      • P2P Middleware

  • Section II

    • Taxonomy of P2P applications

    • Research Issues

  • Section III

    • Preliminary Performance Modeling

  • Conclusion


Goals for Section I

  • Examine the early beginnings of Peer-to-Peer.

  • Look at some possible definitions of Peer-to-Peer.

  • Get a general idea of Peer-to-Peer applications and frameworks.

  • Identify the requirements of Peer-to-Peer applications.


P2P Beginnings

[Figure: mediated file lookup. (1) Peer A asks the Mediator, “Where is X?”; (2) the Mediator answers, “Peer B has it”; (3) Peer A copies X from Peer B.]

  • Interest kindled by distributed file-sharing applications

    • Napster: Mediated digital music swapping. (http://www.napster.com)
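
The mediated lookup can be sketched in a few lines. This is only an illustrative model, not Napster's actual protocol: the `Mediator` and `Peer` classes and all of their method names are hypothetical.

```python
class Peer:
    """A peer that stores files locally and serves them to other peers."""

    def __init__(self, name):
        self.name = name
        self.files = {}                      # file name -> content

    def share(self, mediator, filename, content):
        self.files[filename] = content
        mediator.register(filename, self)    # only the location goes to the mediator

    def fetch(self, mediator, filename):
        owner = mediator.lookup(filename)    # ask the mediator who has the file
        if owner is None:
            return None
        content = owner.files[filename]      # copy directly from the owning peer
        self.files[filename] = content
        return content


class Mediator:
    """Central index: knows where files are, never holds the files themselves."""

    def __init__(self):
        self.index = {}                      # file name -> list of peers

    def register(self, filename, peer):
        self.index.setdefault(filename, []).append(peer)

    def lookup(self, filename):
        owners = self.index.get(filename, [])
        return owners[0] if owners else None


mediator = Mediator()
peer_a, peer_b = Peer("A"), Peer("B")
peer_b.share(mediator, "X", b"song bytes")
result = peer_a.fetch(mediator, "X")
```

The point of the design is that the mediator carries only the index, so the bulk transfer never passes through it.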


P2P Beginnings

  • Gnutella: Fully distributed file sharing. (http://gnutella.wego.com)

  • Freenet: Distributed file sharing with anonymity and key-based search. (http://freenet.sourceforge.net)

[Figure: decentralized search among Peers A, B, C and D in six numbered steps. The query “Where is File (Key) X?” is passed from peer to peer, the reply “C: I have it.” is relayed back, and the requester then issues “GET File (Key) X (HTTP)” and receives File X.]


We had them already!

  • Using idle CPU cycles on home PCs, e.g., SETI@home

    • Involves scanning of radio telescope images for extraterrestrial life.

    • Chunks of data downloaded by home PCs, processed and results returned to the coordinator.

    • Similar schemes used for other heavy-duty computational problems.

  • Idle disk and main memory on workstations exploited in a number of network of workstations (NOW) projects.

[Figure: a Master sends raw data to Peers 1 to 4; each peer does the data crunching and returns processed data to the Master.]
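
The coordinator pattern above can be sketched as follows. This is a toy, single-machine stand-in: threads play the role of home PCs, and `crunch`, `run_master` and the chunking scheme are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def crunch(chunk):
    """Stand-in for the real analysis: here, just sum the samples in a chunk."""
    return sum(chunk)


def run_master(raw_data, chunk_size, n_peers):
    # The master splits the raw data into chunks, one work unit per chunk.
    chunks = [raw_data[i:i + chunk_size] for i in range(0, len(raw_data), chunk_size)]
    # Each "peer" processes chunks independently and returns its result.
    with ThreadPoolExecutor(max_workers=n_peers) as peers:
        return list(peers.map(crunch, chunks))


processed = run_master(list(range(100)), chunk_size=25, n_peers=4)
total = sum(processed)
```

Peers never talk to each other here, which is why this style of computation needs so little communication.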


Newer Applications

  • P2P streaming media distribution

    • CenterSpan (C-Star Multisource Peer Streaming)

      • Mediated, Secure P2P platform for distributing digital content.

      • Partition content and encrypt each segment. Distribute segments amongst peers. Redundant distribution for reliability.

      • Download segments from local cache, peers or seed servers.

      • http://www.centerspan.com

    • vTrails

      • vtCaster: At stream source. Creates network topology tree based on end users (vtPass client software).

      • Dynamically optimizes tree.

      • Content distributed in a tiered manner.

      • http://www.vtrails.com


Newer Applications

  • P2P Collaboration

    • Groove (http://www.groove.net)

      • Real time, small group interaction and collaboration.

      • Fundamental notion around a “shared space”

        • Each member of the group owns a copy of the “shared space”.

        • Changes made to the “shared space” by one member are propagated to each member of the group (Store and forward if some member is offline).

      • Platform is secure.

        • PKI for user authentication.

        • End to end encryption.

        • Groove components are digitally signed
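
A minimal sketch of the shared-space idea: every member keeps a full copy, and changes are queued for members who are offline. It is not Groove's implementation. The `Member` and `SharedSpace` classes are hypothetical, and keeping the pending queue in one object is a simplification for illustration.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.space = {}                      # this member's own copy of the shared space


class SharedSpace:
    def __init__(self, members):
        self.members = members
        self.pending = {m.name: [] for m in members}   # queued changes per member

    def change(self, author, key, value):
        author.space[key] = value
        for m in self.members:
            if m is author:
                continue
            if m.online:
                m.space[key] = value         # propagate immediately
            else:
                self.pending[m.name].append((key, value))   # store for later

    def reconnect(self, member):
        member.online = True
        for key, value in self.pending[member.name]:        # forward queued changes
            member.space[key] = value
        self.pending[member.name].clear()


alice, bob, carol = Member("alice"), Member("bob"), Member("carol")
space = SharedSpace([alice, bob, carol])
carol.online = False
space.change(alice, "agenda", "v1")
```

After `space.reconnect(carol)`, Carol's copy catches up with the others.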


So, what is P2P?

  • Hype: A new paradigm that can

    • Unlock vast idle computing power of the Internet, and

    • Provide unlimited performance scaling.

  • Skeptic’s view: Nothing new, just distributed computing “re-discovered” or made fashionable.

  • Reality: Distributed computing on a large scale

    • No longer limited to a single LAN or a single domain.

    • Autonomous nodes, no controlling/managing authority.

    • Heterogeneous nodes intermittently connected via links of varying speed and reliability.

  • A tentative definition:

    • A dynamic network (peers can come & go as they please)

    • No central controlling or managing authority.

    • A node can act both as a “client” and as a “server”.


P2P Platforms

  • Legion, University of Virginia, Now owned by “Avaki” Corp.

  • Globe, Vrije Univ., Netherlands

  • Globus, Developed by a consortium including Argonne Natl. Lab and USC’s Information Sciences Institute.

  • JXTA, Open source P2P effort started by Sun Microsystems.

  • .NET by Microsoft Corp.

  • WebOS, University of Washington

  • Magi, Endeavors Technology

  • Groove


Avaki (Legion)

  • Objective: Wide-area O/S functionality via distributed objects.

  • Middleware infrastructure for distributed resource sharing in a mutually distrustful environment.

  • Global O/S services built on top of local O/S

*Source: Peer-to-Peer Computing by David Barkai (Intel Press)


Avaki (Legion)

  • Naming: LOID (location-independent object ID), current object address & object name.

  • Persistent object space: generalization of file-system (manages files, classes, hosts, etc.)

  • Communication: RPC-like, except that the results can be forwarded to the real consumer directly.

  • Security: RSA keys are a part of LOIDs; encryption, authentication, and digesting are provided.

  • Local autonomy: Objects call local O/S services for all management, protection and scheduling.

  • Active objects: objects represent both processes and methods.

  • Overall: Comprehensive WAN O/S, but not targeted as a general P2P enabler.


    Globe

    • Objective: Another model for WAN O/S.

    • Distributed passive object model. Processes are separate entities that bind to objects.

    • Each object consists of 4 subobjects:

      • Semantics subobject for functionality.

      • Communication subobject for inter-object communication.

      • Replication subobject for replica handling including consistency maintenance.

      • Control subobject for control flow within the object.

    • Binding to object includes two steps:

      • Name & location lookup and contact address creation.

      • Selecting an implementation of the interface.

    • Overall: Similar to Legion, except that processes and objects are not tightly integrated.


    Globus

    • Objective: Grid computing, integration of existing services.

    • Defines a collection of services, e.g.,

      • Service discovery protocol

      • Resource location & availability protocol

      • Resource replication service

      • Performance monitoring service

    • Any service can be defined and becomes part of the “system”.

    • Higher level services can be built on top of basic ones.

    • Preserves site autonomy. Existing legacy services can be offered unaltered.

    • Overall: Excellent reusability. Unconstrained toolbox approach => Very difficult to join two “islands”.


    JXTA

    • Objective: A low-level framework to support P2P applications:

      • Avoids any reference to specific policies or usage models.

      • Not targeted for any specific language, O/S, runtime environment, or networking model.

      • All exchanges are XML based.

    • Base concepts for

      • Identifiers

      • Advertisements

      • Peers

      • Peer Groups

      • Pipes

    • At the highest abstraction defines a set of protocols using the base concepts:

      • Peer Discovery protocol: Discovery of peers, resources, peer groups etc.

      • Peer Resolver Protocol

      • Peer Information Protocol

      • Peer Membership protocol.

      • Pipe binding protocol

      • Peer endpoint protocol.


    JXTA

    Source: White Paper on Project JXTA: A Technology Overview by Li Gong


    Microsoft .NET in the context of P2P

    • Objective: An enabler of general XML/SOAP based web services.

    • Message transfer via SOAP (simple object access protocol) over HTTP.

    • Kerberos based user authentication.

    • Extensive class library.

    • Emphasizes global user authentication via the Passport service (the user is distinct from the device being used).

    • HailStorm supports personal services that can be accessed via SOAP from any entity.


    MAGI

    • Enabler for collaborative business applications.

    *Source: Peer-to-Peer Computing by David Barkai (Intel Press)


    Magi

    • Magi: Micro-Apache Generic Interface, an extension of Apache project.

    • Superset of HTTP using

      • WebDAV: Web Distributed Authoring and Versioning protocol, which provides locking services, discovery & assignment services, etc. for web documents.

      • SWAP (simple workflow access protocol) that supports interaction between running services (e.g., notification, monitoring, remote stop/synchronization, etc.)

    • Intended for servers; client interface is HTTP.


    WebOS

    • Objective: WAN O/S that can dynamically push functionality to various nodes depending on loading.

    • Outgrowth of the Berkeley NOW (network of workstations) project.

    • Consists of a number of components

      • Global naming: Mapping a service to multiple nodes, load balancing & failover.

      • Wide-area file system (with transparent caching and cache coherency).

      • Security & Authentication w/ fine-grain capability control.

      • Process control: Support for remote process execution.

    • Project no longer active, parts of it being used elsewhere.

    • Overall: Dynamic configurability useful for P2P environment.


    Groove

    • Groove (http://www.groove.net)

      • Real time, small group interaction and collaboration.

      • Fundamental notion around a “shared space”

        • Each member of the group owns a copy of the “shared space”.

        • Changes made to the “shared space” by one member are propagated to each member of the group (Store and forward if some member is offline).

      • Platform is secure.

        • PKI for user authentication.

        • End to end encryption.

        • Groove components are digitally signed


    Requirements for P2P Applications

    • Local autonomy: No control or management by a central authority.

    • Scalability: Support collaboration of arbitrarily large number of nodes.

    • Security & Privacy: All accesses are authenticated and authorized.

    • Fault Tolerance: Assured progress with up to k failures anywhere.

    • Interoperability: Any peer that follows the protocol can participate irrespective of platform, OS, etc.

    • Responsiveness: Satisfy the latency expectations of the application.

    • Non-imposing: Allows machine user full resource usage whenever desired without affecting responsiveness.

    • Simplicity: Setting up a P2P application or participating in one should require minimum of manual intervention.

    • Auto-optimization: Ability to dynamically reconfigure the application (number of nodes, functionality, etc.).

    • Extensibility: Dynamic addition of functionality.


    P2P Services

    • Basic.

      • Network Services.

      • Naming.

      • Event and Exception management services.

      • Storage Services

      • Metadata services

      • Security Services

    • Advanced.

      • Search and Discovery.

      • Administrative and Auditing.

      • File services akin to a virtual file system.

      • User and group management services.

      • Resource management services.

      • Digital Rights management.

      • Replication and Migration services.


    From Services to Possible Layers

    [Figure: layered P2P service stack. Layer labels: Location Independent Services; Sharable Resources; Naming, Discovery, Directory; Administration, Monitoring; Standards; Policies; Identity, Presence, Community; Security; Availability; Communications. The items below annotate the layers.]

    • Availability from unreliable components

    • Replication

    • Striping

    • Failover

    • Guaranteed message queuing

    • Authorization

    • Integrity

    • Privacy

    • Web of trust

    • Certification

    • DRM


    From Services to Possible Layers

    [Figure: the layered P2P service stack (Location Independent Services down to Communications), annotated with the items below.]

    • Local Autonomy

    • IT allocation of resources

    • Self administration – reliable whole from unreliable parts

    • Resource monitoring

    • Payment tracking

    • CPU, storage, memory

    • Bandwidth

    • I/O devices

    • Capability discovery

    • Name space management

    • Metadata management

    • Discovery & location of peers, services, resources, users

    • User / group identity

    • Authentication

    • Persistence

      • Beyond a session

      • Across multiple devices


    Questions?


    Part 2: Taxonomy & Research Issues

    • Goals:

      • To introduce a taxonomy for classifying P2P applications and environments.

      • To elaborate upon some major research issues.


    P2P Taxonomy

    • Consider two types of properties:

      • Application characteristics

      • Environmental characteristics

    • Application Characteristics:

      • Resource (or data) storage: organized or scattered.

      • Resource control: organized or scattered.

      • Resource usage: isolated or collaborative.

      • Consistency constraints: loose or tight.

      • QoS constraints: loose (e.g., non-real-time), moderate (e.g., online transaction processing or query/response), or tight (e.g., streaming media).


    P2P Taxonomy Cont’d

    • Environmental characteristics:

      • Network latency: Ranges from uniformly low (e.g., for a high-speed LAN) to highly variable (e.g., for general WAN).

      • Security concerns: Ranges from low (e.g., corporate intranet) to high (e.g., public WAN).

      • Scope of failures: Ranges from occasional isolated failures (e.g., a laboratory network of workstations) to network partitioning.

      • Connectivity: Ranges from always-on (e.g., nodes in a business LAN) to occasional-on (e.g., mobile devices).

      • Heterogeneity: Ranges from complete homogeneity to complete heterogeneity (in platform, O/S, protocols etc.).

      • Stability: Ranges from highly stable (i.e., only planned, occasional changes/upgrades) to unpredictable.

    • Convenient to aggregate them as “friendly” and “hostile”.


    Research Issues

    • Intelligent caching of search results.

    • Intelligent object retrieval

      • Retrieval by properties rather than URL.

      • Need distributed indexing mechanisms.

      • Directing searches to more promising and less loaded nodes.

    • Multiparty synchronization and communication that scales to thousands of nodes.

    • For home computers: Utilize idle computing resources w/o significant communication requirements.

    • Unobtrusive use: If the owner wants to use the resources, get out of the way quickly.

    • Low latency service handoff protocols.


    Research Issues Cont’d

    • Distributed load balancing that scales to thousands of geographically distributed nodes.

    • Stitching traffic from multiple paths to reduce latency or losses for real-time applications.

    • Access control in a mutually suspicious environment (foreign objects on your machine must protect themselves from you, and you from these objects).

    • Effective mapping of the application topology to the physical topology.

    • Architectural features to

      • Efficiently propagate requests and responses w/o significant CPU involvement

      • Squelch duplicate, orphaned or very late responses.


    Additional P2P Issues

    • Communicating with peers behind NAT devices and firewalls.

    • Naming and addressing peers that do not have DNS entries.

    • Coping with intermittent connectivity & presence (e.g., queued transfers).

    • Authentication of users independent of devices.

    • Digital rights management.

    • On demand task migration w/o breaking the application.

    • Efficient distributed information location and need based content migration.

    • Scalability to huge number of peers (e.g., 100M):

      • Peer state management

      • Discovery and presence management (intermittent connectivity & slow last mile links)

      • Certificate management and authentication.


    Part 3: Performance Study

    Goals:

    1. Define a performance model including

    - Network model

    - File storage and access model

    - File caching and propagation model

    2. Discuss sample results

    3. Discuss Architectural impacts


    P2P Network Characteristics

    • Desirable characteristics

      • Adequate representation of ad hoc nature of the network.

      • Expected to contain a few special sites (well-known, content rich, substantial resources, etc.)

      • Heavy-tailed nature of connectivity.

    • Other Issues

      • Dynamic changes to the network

        • Direct modeling not required if rate of change << request rate.

        • Metadata consistency issues still need to be considered.

      • Mapping of virtual P2P network on physical network

        • P2P applications generally don’t pay attention to mapping.

        • “Virtual links” between P2P neighbors are essentially statistically identical.

        • Better modeling is possible, but difficult to calibrate.


    P2P Node Model

    • Consider a 3-tier model for nodes

      • tier-1: Well-known, resource-rich, always on & part of network.

        • Similar to traditional server nodes (globally known sites in Gnutella)

        • Henceforth called distinguished nodes.

      • tier-2: “Hub” nodes (reasonably resource rich & mostly on)

        • Contribute storage/files in addition to requesting them.

        • May join/leave the network, but at time-scale >> req-response time.

        • Henceforth called undistinguished nodes.

      • tier-3: Infrequently connected or primarily “client” functionality

        • No need to represent these explicitly in the network

        • Requests/responses from these appear to originate from tier-1/2 nodes that they home on.


    P2P Network Model

    • Use a random graph model to represent topology.

      • Traditional G(n,p) RG model too simplistic.

    • Use a 2-tier non-uniform model built as follows:

      • Start with a degree-$K_d$ regular graph of $N_d$ distinguished nodes.

      • Add $N_u$ undistinguished nodes sequentially as follows:

        • The new node connects to $K$ other nodes.

        • $K$: a constant or an integer-valued random variable in the range $1..K_{max}$.

        • Each connection targets an undistinguished node with probability $q_u$ (this may not be possible for the first $K_{max}$ nodes).

        • Distinguished node target: uniform distribution over all distinguished nodes.

        • Undistinguished node target: Zipf($\alpha$) over the existing undistinguished nodes.

        • At most one connection is allowed between any pair of nodes.

      • $\alpha$ controls the decay rate of the nodal degree.

        • $\alpha = 0$ => uniform distribution => very slow decay. Used here for simplicity.
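
The construction steps can be sketched as a small generator. Several details are assumptions made for illustration and are not taken from the slides: the regular graph is built as a ring lattice (which needs an even degree), $K$ is a constant, Zipf ranks follow arrival order, and the function names are hypothetical.

```python
import random


def zipf_choice(n, alpha, rng):
    """Pick a rank in 0..n-1 with probability proportional to 1/(rank+1)**alpha."""
    weights = [1.0 / (r + 1) ** alpha for r in range(n)]
    return rng.choices(range(n), weights=weights)[0]


def build_graph(n_dist, k_dist, n_undist, k, q_u, alpha=0.0, seed=0):
    """Return adjacency sets; nodes 0..n_dist-1 are the distinguished nodes."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_dist)}
    # Regular graph of distinguished nodes as a ring lattice (assumes even k_dist < n_dist).
    for i in range(n_dist):
        for step in range(1, k_dist // 2 + 1):
            j = (i + step) % n_dist
            adj[i].add(j)
            adj[j].add(i)
    undist = []
    for node in range(n_dist, n_dist + n_undist):
        adj[node] = set()
        want = min(k, n_dist + len(undist))   # cannot link to more nodes than exist
        attempts = 0
        while len(adj[node]) < want and attempts < 1000:
            attempts += 1
            if undist and rng.random() < q_u:
                target = undist[zipf_choice(len(undist), alpha, rng)]
            else:
                target = rng.randrange(n_dist)
            if target not in adj[node]:       # at most one edge per node pair
                adj[node].add(target)
                adj[target].add(node)
        undist.append(node)
    return adj


graph = build_graph(n_dist=10, k_dist=4, n_undist=90, k=3, q_u=0.5)
```

With `alpha=0.0` the Zipf weights are all equal, which matches the uniform case used in the slides.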


    Topological properties

    • Some network properties can be derived analytically.

    • Outline of analysis (see http://kkant.ccwebhost.com/download.htm)

      • Degree distribution:

        • Distinguished nodes are at level 0; each new node defines a new level.

        • $P_n(l_2, l)$: Prob(a level-$l$ node has degree $n$ when the current level is $l_2$).

        • Get recurrence equations for $P_n(l_2, l)$ and hence its PGF $\phi(z \mid l_2, l)$.

        • Get the average degree $D_{at}(l_2, l)$ at level $l$ when the current level is $l_2$.

        • Can be adapted for computing the undistinguished degree of a node.

      • Number of nodes reached in $h$ hops:

        • $R_h$ matrix: $R_h(i, j)$ is the probability of reaching level $i$ from level $j$ in exactly $h$ hops.

        • Compute $R_h(i, j)$ by enumerating all unique paths of length $h$.

        • Compute $G(l_2, h)$, the average number of nodes reached in $h$ hops starting from level $l_2$.

      • Request and response traffic at a level-$l$ node:

        • $n_{reqs}$ = number of requests reaching undistinguished nodes in $h$ hops = $1 + \sum_h G(l_2, h)$.

        • $n_{resps} = 1 + \sum_h h\,G(l_2, h)$, since a response from $h$ hops away goes through $h$ nodes.

      • Nodal utilization & node engineering:

        • Easy to ensure that nodal utilization does not exceed some limit.

    • Queuing properties are generally intractable; explored via simulation.
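
The request and response counts can be worked through numerically. The sketch below assumes that G(l2, h) counts the nodes first reached at exactly h hops, which is what summing over h suggests. The per-hop values are made up for illustration and are not results from the study.

```python
def traffic(nodes_per_hop):
    """nodes_per_hop[h-1] holds G(l2, h) for h = 1, 2, ...

    Returns (n_reqs, n_resps) following the two sums on the slide.
    """
    n_reqs = 1 + sum(nodes_per_hop)
    # A response from h hops away is handled by h nodes on its way back.
    n_resps = 1 + sum(h * g for h, g in enumerate(nodes_per_hop, start=1))
    return n_reqs, n_resps


# Hypothetical per-hop reach: 5 nodes at hop 1, 20 at hop 2, 40 at hop 3.
n_reqs, n_resps = traffic([5, 20, 40])
```

Responses grow faster than requests because each extra hop adds one more relay per response.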


    Sample Results - 100 nodes

    undist.   no. of   nodes     undist.   resps    traffic
    prob.     hops     reached   reached   /node    /node
    0.05      1        5.9       3.3       4.9      6.1
    0.05      2        55.2      44.5      103.6    146.5
    0.05      3        99.1      85.8      235.2    320.5
    0.05      4        100       90.0      238.8    328.8
    0.05      5        100       90.0      238.8    328.8
    0.50      1        5.9       4.3       4.9      8.4
    0.50      2        34.3      23.8      61.7     82.3
    0.50      3        91.0      73.9      231.7    304.0
    0.50      4        99.9      89.4      267.5    356.9
    0.50      5        100       89.6      267.7    357.3
    0.95      1        5.9       5.3       4.9      10.6
    0.95      2        28.6      22.6      50.3     73.6
    0.95      3        76.7      63.8      194.6    258.4
    0.95      4        98.5      87.4      281.8    369.2
    0.95      5        99.7      89.3      287.8    377.2


    Sample Results - 500 nodes

    undist.   no. of   nodes     undist.   resps    traffic
    prob.     hops     reached   reached   /node    /node
    0.05      1        6.0       3.6       5.0      6.2
    0.05      2        243.7     232.7     480.5    711.5
    0.05      3        499.7     488.6     1248.4   1737.0
    0.05      4        500.0     490.0     1249.6   1739.6
    0.50      1        6.0       4.7       5.0      8.5
    0.50      2        95.7      84.2      184.3    264.6
    0.50      3        483.5     465.1     1347.8   1812.4
    0.50      4        500.0     490.0     1413.9   1903.9
    0.95      1        6.0       5.8       5.0      10.7
    0.95      2        35.1      29.1      63.2     91.7
    0.95      3        163.5     137.1     448.3    582.4
    0.95      4        405.7     367.7     1417.2   1782.7


    Simulation of Random Graphs

    • Simulating a random graph model is a hard problem

      • Model represents a large number of topologies that the actual network might take.

      • Too many instances to simulate explicitly and then average the results.

      • Example: 2 dist & 3 undist nodes, each connects to 2 nodes => 6 distinct topologies.

    • Possible approaches to simulation:

      • Average case analysis

      • Model with limited set of instances.

      • Direct simulation of probabilistic model.


    Average case analysis

    • Intended environment

      • To study performance of an “average” network defined by RG model.

      • No dynamic changes to the topology possible.

    • Graph construction

      • Start with the regular graph of distinguished nodes (as usual).

      • For adding undistinguished nodes, work with only the average connectivities $K_d$ and $K_u$ for an incoming node.

      • Always connect to the existing node with minimum connectivity.

      • $\lfloor K_d \rfloor$ and $\lceil K_d \rceil$ can be used successively to handle non-integer $K_d$ values (similarly for $K_u$).

    • Characteristics/issues

      • Simple, only one graph to deal with in simulation.

      • Gives correct avg reachability and nodal utilizations.

      • All queuing metrics (including avg response time) are underestimated.
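
Two steps of the average-case construction lend themselves to a short sketch: handling a non-integer average connectivity, and always linking to the least-connected node. The first function reads the slide as alternating between the floor and the ceiling of the average, which is an interpretation. The function names and the tie-breaking rule (lowest node id first) are assumptions.

```python
import math


def link_counts(k_avg, n_new):
    """Per-node link counts, each floor(k_avg) or ceil(k_avg), that average to k_avg."""
    return [math.floor(i * k_avg) - math.floor((i - 1) * k_avg)
            for i in range(1, n_new + 1)]


def min_degree_targets(degree, k):
    """Choose the k existing nodes with the smallest current degree."""
    return sorted(degree, key=lambda n: (degree[n], n))[:k]


counts = link_counts(2.5, 4)                 # alternates between 2 and 3 links
degree = {0: 3, 1: 1, 2: 2, 3: 1}            # hypothetical current degrees
targets = min_degree_targets(degree, 2)
```

Because every choice is deterministic, the procedure yields a single graph, which is what keeps this approach cheap to simulate.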


    Constrained Connectivity

    • Intended environment

      • To capture most likely scenarios of connectivity.

      • Accommodate both static topologies and slowly changing topologies.

    • Graph construction and simulation

      • For the entering level-$l_2$ node, analytically estimate $D_{at}(l_2, l)$ at all $l$.

      • Allow a connection to a level-$l$ node only if degree($l$) falls in the range (min..max) × $D_{at}(l_2, l)$.

      • Found that min = 0.5 and max = 1.5 is quite adequate.

      • Generate a limited set (~100) instances of the graph.

      • During simulation, each query randomly selects one instance.

    • Characteristics/issues

      • Avoids highly asymmetric topologies => queuing properties are underestimated.

      • All generated instances are given equal weight. Relative weights can be estimated but very expensive.


    Probabilistic Graph Emulation

    • Intended environment

      • To study overall performance when the topology is defined by the random graph model.

      • Accommodate fast changing or unstable topologies.

    • Method:

      • For each node $i$, estimate the relative probability $q_{ij}$ of having an edge to node $j \neq i$.

      • A query coming from node $k$ to node $i$ is sent to node $j$ with probability $q_{ij}/(1 - q_{ik})$.

      • This virtual topology for the query is used to return responses as well.

    • Characteristics/Issues

      • Method dependent on analytic calculation of edge probabilities to neighbors.

      • Single simulation automatically visits various instances in the correct proportion.

      • No explicit control over which instances are visited => Reliable results may take a very long time.

      • Very expensive and difficult to handle complex operations (e.g., file migration).
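
The forwarding rule can be illustrated with a few lines of code. The sketch assumes that the relative probabilities at node i sum to 1 over all other nodes, so that removing the node the query came from and rescaling by 1 - q_ik gives a proper distribution. Treating the rule as a choice of one next hop is also an assumption, as are the function names and the example values.

```python
import random


def forwarding_probs(q_i, came_from):
    """q_i maps each candidate node j to q_ij (assumed to sum to 1 over j != i)."""
    scale = 1.0 - q_i.get(came_from, 0.0)
    return {j: q / scale for j, q in q_i.items() if j != came_from}


def pick_next_hop(q_i, came_from, rng):
    probs = forwarding_probs(q_i, came_from)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes])[0]


# Hypothetical edge probabilities at node i; the query arrived from node "k".
q_i = {"j1": 0.5, "j2": 0.3, "k": 0.2}
probs = forwarding_probs(q_i, came_from="k")
hop = pick_next_hop(q_i, "k", random.Random(0))
```

No explicit graph instance is ever built: each query draws its own virtual topology as it travels.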


    File Size & Access Distribution

    • Using a 2-segment model:

      • Small sizes: Distribution generally irregular; uniform is a reasonable model.

      • Large sizes: Pareto tail with decay rate $1 < \alpha < 2$ is quite reasonable.

    • Adopted distribution:

      • Uniform distribution in the small-size range 400 bytes to 4 KB.

      • Pareto distribution with a minimum value of 4 KB and a mean of 40 KB => $\alpha = 1.11$.

      • 40 KB mean is typical for web pages, but too small for MP3 files.

    • “File category” provides a link between file size and its “popularity”. Needed to model higher access rate of small files.

      • Chose 9 categories (equally spaced in the log domain), delimited by the boundaries

        400 B, 1.265 KB, 4 KB, 12.65 KB, 40 KB, 126.5 KB, 400 KB, 1.265 MB, 4 MB, 12.65 MB

    • File access distribution:

      • Across categories, distribution specified by a discrete mass function:

        (0.07, 0.14, 0.2018, 0.20, 0.14, 0.098, 0.0686, 0.048, 0.0336)

      • This increases linearly first and then decays geometrically w/ factor 0.7.

      • Within each category, assume uniform access distribution.
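
The numbers on this slide can be checked with a short calculation. The Pareto decay rate follows from the standard mean formula, mean = α·x_min/(α − 1). The listed sizes are read here as ten boundaries half a decade apart that delimit the nine categories, which is an interpretation. The variable names are illustrative.

```python
X_MIN_KB, MEAN_KB = 4.0, 40.0

# Pareto mean = alpha * x_min / (alpha - 1)  =>  alpha = mean / (mean - x_min)
alpha = MEAN_KB / (MEAN_KB - X_MIN_KB)

# Ten boundaries in bytes, each sqrt(10) times the previous one, starting at 400 B.
boundaries = [400 * 10 ** (k / 2) for k in range(10)]

# Access probability per category, as listed on the slide.
pmf = [0.07, 0.14, 0.2018, 0.20, 0.14, 0.098, 0.0686, 0.048, 0.0336]
total_mass = sum(pmf)

# Ratio of successive probabilities in the tail (the geometric part).
tail_ratios = [pmf[i + 1] / pmf[i] for i in range(3, 8)]
```

Here `alpha` comes out at about 1.11, the mass function sums to 1, and the tail ratios are all close to 0.7, in line with the slide.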


    Parameters: File Copies

    • Each search in a P2P network may result in multiple “hits”.

    • Need only dist. of hits; precise modeling of search mechanism not needed.

    • Use file copies for this:

      • Each file has C copies in the range (1..Cmax) with a given distribution.

      • A file is now identified by the triplet: (category, file_no, copy_no) where file_no is a unique id (e.g., sequence no) of files in a category.

    • This allows following capabilities:

      • Unique searches specified by the file-id triplet.

      • Non-unique searches specified by (category, file_no).

      • Replication control and fault-tolerant operation.

    • File copy parameters:

      • Distribution may be related to the nature of the file (not considered here).

      • Separate distributions allowed for files allocated to dist & undist nodes.

      • Assuming a triangular distribution with Cmax = 20, and mode Cmode= 5 for all nodes => Mean no of copies = 8.667.
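
The quoted mean follows from the triangular distribution's mean, (min + max + mode)/3. A minimal sketch is shown below. Rounding the continuous sample to an integer copy count is an assumption, and `sample_copies` is a hypothetical helper.

```python
import random

C_MIN, C_MAX, C_MODE = 1, 20, 5

# Mean of a triangular distribution: (low + high + mode) / 3
mean_copies = (C_MIN + C_MAX + C_MODE) / 3


def sample_copies(rng):
    # random.triangular(low, high, mode) returns a float; round it to a copy count.
    value = round(rng.triangular(C_MIN, C_MAX, C_MODE))
    return max(C_MIN, min(C_MAX, value))


rng = random.Random(1)
samples = [sample_copies(rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)
```

`mean_copies` evaluates to about 8.667, and the sample mean should land close to it.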


    File Assignment to Nodes

    • Assignment of copies to nodes:

      • Assign copies at a fixed distance so as to distribute them evenly across the network.

      • Apply an offset for each round of copy assignment to avoid bunching up.

      • Do not assign more than one copy of a file to a node.

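    • Algorithm (C-style pseudocode; triangular_rv() and assign_file() are helper routines, and the initial values of tot_offset and wraps are assumed):

      tot_offset = 0;                                 // running start position across files
      offset = 1 + n_nodes / no_files;                // if too few files, get an offset to avoid bunching
      for (file_no = 0; file_no < no_files; file_no++) {       // loop over all files
          n_copies = triangular_rv(1, Cmax, Cmode);   // generate random number of copies
          if (n_copies > n_nodes) n_copies = n_nodes; // don't allow more copies than nodes
          distance = n_nodes / n_copies;              // distance for copy allocation
          tot_offset = (tot_offset + offset) % n_nodes;
          node_no = tot_offset;                       // node for the assignment of the first copy
          wraps = 0;                                  // times the walk has returned to its start
          for (copy_no = 0; copy_no < n_copies; copy_no++) {
              assign_file(node_no, file_no, size);
              node_no = (node_no + distance) % n_nodes;        // next node for assignment
              if (copy_no < n_copies - 1 && node_no == (tot_offset + wraps) % n_nodes) {
                  node_no = (node_no + 1) % n_nodes;           // step past a node that already holds a copy
                  wraps++;
              }
          }  // loop over copies
      }  // loop over files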


    Query Characteristics

    • Assumptions:

      • No queries (searches) started from distinguished nodes since these nodes are essentially “servers”.

      • Identical query arrival process at each undistinguished node.

    • Arrival process model

      • An on-off process with identical Pareto distributions for the on and off periods:

        $P(X > x) = (x/T)^{-\gamma}$ for $x > T$

      • Assume $T = 12$ secs and $\gamma = 1.4$, which gives E(X)=30 secs.

      • Constant inter-arrival time of 4 secs during the on-period, no traffic during the off-period.

      • Total traffic at a node is the superposition of arrivals from all reachable nodes.

      • Approximately a self-similar process with Hurst parameter $H = (3 - \gamma)/2 = 0.8$ when the number of reachable nodes is large.

    • Query properties:

      • Each query specifies a file (category, file_no) w/ given access characteristics.

      • Shown results do not specify copy_no => Multiple hits possible for each query.

      • Query percolates for h “hops”. (h=3 can cover 95% of nodes for chosen graph).

      • If a query arrives at a node more than once, it is not propagated.
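
The hop-limited propagation with duplicate suppression can be sketched as a breadth-first flood. The adjacency list, the set of file holders and the function name are hypothetical; the sketch counts reach and hits only and ignores timing.

```python
from collections import deque


def flood_query(adj, source, max_hops, holders):
    """Flood a query for at most max_hops hops.

    Returns (nodes reached, nodes that report a hit). A node that has already
    seen the query does not propagate it again.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    hits = []
    while frontier:
        node, hops = frontier.popleft()
        if node in holders and node != source:
            hits.append(node)
        if hops == max_hops:
            continue                          # hop limit reached: do not forward
        for nbr in adj[node]:
            if nbr not in seen:               # duplicate arrivals are dropped
                seen.add(nbr)
                frontier.append((nbr, hops + 1))
    return seen - {source}, hits


# Hypothetical 6-node network; nodes 3 and 5 hold a copy of the file.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
reached, hits = flood_query(adj, source=0, max_hops=2, holders={3, 5})
```

With a limit of two hops only node 3 answers; raising `max_hops` to 4 also reaches node 5, so the query can return multiple hits.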


    File Retrieval

    • Query Response:

      • Query reaching a node generates found/not found response, which travels backwards along the search path.

      • Querying node runs a timer Tu; all responses after the timeout are ignored.

      • Currently there is no concept of retrying timed-out requests.

      • Distribution of Tu: Triangular in the range (3,14) secs with mean 8.0 secs.

    • File retrieval:

      • Randomly choose one of the positively responding nodes for file retrieval.

      • Requested file(s) are obtained directly (i.e., do not follow the response path).

      • Retrieved file may be optionally cached at the requesting node.
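A small sketch of the timeout and source-selection steps, under stated assumptions. A triangular distribution on (low, high) with mode m has mean (low + high + m)/3, so a range of (3, 14) secs with mean 8.0 secs implies a mode of 7 secs. The response tuple layout and the function names are hypothetical.

```python
import random


def timeout_mode(low, high, mean):
    """Mode of a triangular distribution with the given range and mean."""
    return 3.0 * mean - low - high


def accepted_responses(responses, timeout):
    """Keep nodes whose positive response arrived before the timer expired.

    responses: list of (node, delay_secs, found) tuples.
    """
    return [node for node, delay, found in responses if found and delay <= timeout]


def pick_source(responses, rng, low=3.0, high=14.0, mean=8.0):
    """Draw the timer Tu, drop late responses, pick one responder at random."""
    timeout = rng.triangular(low, high, timeout_mode(low, high, mean))
    candidates = accepted_responses(responses, timeout)
    return rng.choice(candidates) if candidates else None
```

Returning None when no positive response arrives in time mirrors the model's lack of retries: the query simply fails.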

    • File cache flushing

      • Used as a way of modeling dynamic changes in tier-3 nodes (which are not represented).

      • A cache flush represents a tier-3 user disconnecting and being replaced by another statistically identical tier-3 node.

      • Number of cycles before cache flushing: Zipf with min=30, max=120 and a=1.0.
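One way to draw the number of cycles before a cache flush is sketched below. The slide does not spell out the exact form of the bounded Zipf distribution, so the sketch assumes P(k) proportional to k^(-a) over the integers from min to max; other readings (for example Zipf over ranks mapped onto the range) are possible. The function names are hypothetical.

```python
import random


def zipf_weights(lo, hi, a):
    """Support and unnormalised weights, assuming P(k) ~ k**(-a) on [lo, hi]."""
    ks = list(range(lo, hi + 1))
    ws = [k ** (-a) for k in ks]
    return ks, ws


def sample_flush_cycles(rng, lo=30, hi=120, a=1.0):
    """Number of cycles a node keeps its cache before it is flushed."""
    ks, ws = zipf_weights(lo, hi, a)
    return rng.choices(ks, weights=ws, k=1)[0]
```

Under this assumed form, shorter cache lifetimes are more likely than longer ones, with a flush after 30 cycles four times as likely as one after 120 cycles.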


    Simulation Results


    Major Observations


    Conclusions & Future Work

    • Covered in the tutorial:

      • Introduced major developments relevant to P2P computing.

      • Introduced sample middleware functionality to support P2P applications.

      • Introduced a taxonomy for classifying P2P computing applications and environments.

      • Discussed major research issues to be resolved.

      • Proposed a random graph model for P2P networks and studied its properties.

      • Studied some performance issues for P2P deployments using detailed simulation of file-sharing applications.

    • Potential Future Work

      • Further refinement of middleware functionality and taxonomy as newer P2P applications emerge.

      • More comprehensive performance studies, particularly going beyond simply file-sharing.


    Backup


    Goals

    • Define Peer-to-Peer.

    • Give a general idea of Peer-to-Peer applications and frameworks.

    • Identify the requirements of Peer-to-Peer applications.

    • Examine a taxonomy for Peer-to-Peer.

    • Performance. (Not clear what we write here)


    Ad-hoc Collaborative Computing

    • Several applications, e.g., telemedicine, military planning, video-conferencing, document editing

      • A group of peers discover one another and form an ad-hoc network.

      • Peers set up the necessary communication channels (perhaps secure) and distribute objects.

      • Peers do arbitrary real-time computation perhaps involving multiparty synchronization.

      • Results are collected and the network disbanded.


    JXTA

    • At the highest abstraction defines a set of protocols:

      • Peers & peer groups: An arbitrary grouping of peers; group members share resources & services.

      • Services: A basic set defined (e.g., discovery, membership, access control, resolver, communication, etc.)

      • Pipes: Unidirectional, asynchronous communication channels. A peer can dynamically connect/disconnect to any existing pipe within the peer group.

      • Messages: Arbitrary sized w/ src and dest addresses in URI form.

      • Advertisements: A “properties” record needed for name resolution, availability, etc. Specified as an XML document.


    P2P Services

    • Basic.

      • Network Services.

        • Core communication functionality.

        • Enable communication on various network topologies, such as direct or via firewalls.

        • Enable communication in the face of intermittent connectivity.

      • Event and Exception management services.

        • Publish and subscribe model.

      • Storage Services

        • Low level File services.

      • Metadata services

        • Generic mechanism for publishing and obtaining Metadata for

          • Devices

          • Resources (Files, CPU, Memory etc)


    P2P Services

    • Security Services

      • Identification

      • Authentication

      • Access Control

      • Integrity

      • Confidentiality

      • Audit Trail

    • User and group management services.

    • Resource management and Placement services.

  • Advanced.

    • Naming.

    • Search.

    • Discovery.

    • Administrative.

    • Auditing.

    • File services


    Additional P2P Issues

    • Communicating with peers behind NAT devices and firewalls.

    • Naming and addressing peers that do not have DNS entries.

    • Coping with intermittent connectivity & presence (e.g., queued transfers).

    • Authentication of users independent of devices.

    • Digital rights management.

    • On demand task migration w/o breaking the application.

    • Efficient distributed information location and need based content migration.

    • Scalability to huge number of peers (e.g., 100M):

      • Peer state management

      • Discovery and presence management (intermittent connectivity & slow last mile links)

      • Certificate management and authentication.


    Web Sites of Interest

    • Napster (http://www.napster.com)

    • Gnutella (http://gnutella.wego.com)

    • Freenet (http://freenet.sourceforge.net)

    • JXTA (http://www.jxta.org)

    • Avaki Corp (http://www.avaki.com)

    • Legion (http://legion.virginia.edu)

    • Globe (http://www.cs.vu.nl/~steen/globe)

    • Globus (http://www.globus.org)

    • Microsoft .Net (http://www.microsoft.com/net)


    Web Sites of Interest

    • Peer-to-Peer Working Group. (http://www.p2pwg.org)

    • CenterSpan (http://www.centerspan.com)

    • vTrails (http://www.vtrails.com)

    • SETI@home (http://setiathome.ssl.berkeley.edu)

