.: DRAFT :.

Collaborative Content Delivery

A peer-to-peer solution for web-based publish/subscribe

Werner Vogels, Robbert van Renesse, Ken Birman
Dept. of Computer Science, Cornell University

Presentation duality …
  • The case for Collaborative Content Delivery


  • The innovative technology used to build the system
    • Spectacularly scalable technology
    • Secure, reliable, robust & fast
    • A solution to many distributed management problems
Late night reading

Epidemic Theory of Infectious Diseases and its Applications, N.T.J. Bailey, Hafner Press, Second Edition, 1975

The Problem
  • Access to real-time information at syndicated news sites is highly inefficient
  • An estimated 70%-80% of the bandwidth is wasted on redundant transport both at the consumer and at the publisher
  • Consumers frequently return to the website to receive timely updates
Isn’t this solved already?
  • RSS – channels provide summaries for processing by bots.
    • But the mechanism remains “pull”
  • HTTP delta encoding should reduce bandwidth cost
  • News feeds from major vendors
    • “push” is the right model for frequently changing data with timely delivery
    • Proprietary formats and high fees
    • Email summary as cheap alternative
    • Still high bandwidth cost at the publisher
  • Hybrid “push/pull” by organizations exploiting distributed content delivery
Scale is a major obstacle
  • No coordinated action by syndication sites to provide a shared information-push infrastructure
  • The one-to-many technologies used currently are inherently not scalable
  • No technology is available that can deliver data from thousands of publishers to millions of subscribers in real time.
We can do better
  • Current push solutions fail to exploit the collaborative power of the Internet
  • Ideally a publisher injects one update into the world and all interested subscribers receive it.
  • In this model all consumers collaborate to route the information to the right subscribers
  • The information arrives at all desktops within tens of seconds after publishing
Peer-to-Peer Solution
  • P2P is the only approach to a cost-effective, scalable solution
  • Subscribers weave an ad-hoc infrastructure for subscription-based routing
  • Scalable, autonomous & decentralized management
  • High level of robustness and reliability in message delivery
  • Authentication of publishers
Emerging technologies
  • Astrolabe, CAN, Chord and Pastry are emerging research technologies.
  • Astrolabe is the furthest along in
    • Scalability
    • Security integration
    • Manageability
    • Firewall, proxy and NAT support
  • Complete technology that we are now using to develop applications
Astrolabe Mariner
  • A system for ultra-scalable, distributed state management
    • Robust, through the use of epidemic techniques
    • Scalable, through the use of information aggregation and fusion
    • Secure, through certificates
    • Flexible, through secure mobile code
  • Simulated, Emulated, Tested and Deployed.


Robust and Scalable Technology for Distributed System Monitoring, Management and Data Mining

Distributed Systems Management
  • Is extremely important in the deployment of large systems
  • Scalable management of applications and systems is still a major quest
  • Management technology needs to be integrated into applications
  • The management subsystem is often more complex than the application itself
  • Information/state management system
  • Monitors the dynamically changing state of sets of distributed resources
  • Reports summaries to its consumers
  • Uses information hierarchies to organize the data
  • Uses aggregation techniques to continuously compute the summary nodes in the system
Current use of Mariner
  • Monitor and control applications, systems and infrastructure
  • Resource discovery
  • Collaboration management
  • Coordination of distributed tasks
  • Edge-caching control
  • CDN dynamic management
  • You can view Mariner as a large database with information about the global system
  • None of this information resides on a single server
  • Each principal has a row in the virtual database, which it is allowed to update with <attribute, value> pairs.
  • A principal can directly access only the rows of other nodes in its own zone and the rows of the intermediate zones on its path to the root.
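
The row-ownership rule above can be illustrated with a minimal sketch. The class name `ZoneTable` and its methods are hypothetical and are not Mariner's API; a real deployment would check signed certificates rather than compare names.

```python
class ZoneTable:
    """One zone's slice of the virtual database: one row per principal.

    Hypothetical illustration only, not the Mariner API.
    """

    def __init__(self, name):
        self.name = name
        self.rows = {}  # principal -> {attribute: value}

    def update(self, caller, principal, attribute, value):
        # A principal may write <attribute, value> pairs only to its own row.
        if caller != principal:
            raise PermissionError(f"{caller} may not update the row of {principal}")
        self.rows.setdefault(principal, {})[attribute] = value

    def read(self, principal):
        # Rows inside the zone are readable by the zone's members.
        return dict(self.rows.get(principal, {}))
```

Here `table.update("alice", "alice", "load", 0.3)` succeeds, while `table.update("bob", "alice", "load", 9.9)` raises `PermissionError`.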
Mariner in a single zone
  • The lowest level of the hierarchy can be a node, or something finer-grained if the application requires it
  • The zone's security key is needed to add a new column; the user's key is needed to update a row
Scalability through Hierarchy
  • Leaves are organized into zones
  • Each leaf has a self-managed attribute list
  • The base zone is the collection of the individual attribute lists of its leaves
  • Each intermediate zone is the collection of attribute lists constructed by aggregating the information in its child zones
  • Each list has some basic attributes that Mariner uses to manage itself, such as contact lists, timestamps, etc.
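
As a rough sketch of this structure, the zone tree can be modelled as below. The `Zone` class, its field names and the zone names are assumptions made for illustration; they do not come from the Mariner code base.

```python
class Zone:
    """A node in the zone hierarchy (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # Self-managed at a leaf; computed by aggregation at intermediate zones.
        self.attributes = {}
        if parent is not None:
            parent.children.append(self)

    def table(self):
        """The zone's table: one attribute list per child."""
        return {child.name: child.attributes for child in self.children}

    def path_to_root(self):
        """Names of the zones from this one up to the root."""
        zone, path = self, []
        while zone is not None:
            path.append(zone.name)
            zone = zone.parent
        return path


root = Zone("root")
east = Zone("east", root)
pc1 = Zone("pc1", east)
pc1.attributes["load"] = 0.4
```

With this setup, `east.table()` holds the attribute list of `pc1`, and `pc1.path_to_root()` lists the zones whose tables the leaf would replicate.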
Simple Hierarchy

[Figure: example hierarchy; labels include New Jersey and San Francisco]

Information Aggregation
  • Aggregation functions are programmable
  • Subset of SQL
  • Code is embedded in aggregation function certificates (AFC)
  • Signed certificate is installed into an attribute list
  • Used to construct (new) attributes in zones of the hierarchy
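
To give a feel for an SQL-style aggregation function, the sketch below uses Python's built-in `sqlite3` module as a stand-in query engine. The exact SQL subset accepted by the system is not specified in these slides, so the table name `children`, the columns `load` and `nmembers`, and the query itself are all assumptions.

```python
import sqlite3


def aggregate_zone(child_rows, query):
    """Run an aggregation query over the rows of a zone's children.

    SQLite is only a stand-in for the real aggregation engine.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE children (name TEXT, load REAL, nmembers INTEGER)")
    conn.executemany("INSERT INTO children VALUES (?, ?, ?)", child_rows)
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    row = cursor.fetchone()
    conn.close()
    # The result becomes the zone's attribute list in its parent's table.
    return dict(zip(columns, row))


rows = [("pc1", 0.4, 1), ("pc2", 1.9, 1), ("pc3", 0.7, 1)]
summary = aggregate_zone(
    rows,
    "SELECT MIN(load) AS min_load, SUM(nmembers) AS nmembers FROM children",
)
```

The resulting `summary` is a single row that summarizes the three children, which is the kind of value an intermediate zone would expose to its parent.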
Epidemic Dissemination
  • Each Astrolabe instance maintains all the zones on its path to the root
  • No centralized servers for intermediate zones
  • Consequently each instance has a copy of the root zone
  • Replication is achieved through gossip techniques.
  • Guarantees eventual consistency
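
One common way to obtain eventual consistency from gossip is a "newest timestamp wins" merge of replicas. The sketch below assumes that rule and assumes that each row's timestamps are issued by a single owner, so two versions of a row never share a timestamp; the slides do not state the actual merge rule.

```python
def merge(local, remote):
    """Merge two replicas of a zone table.

    Each replica maps row key -> (value, timestamp). For every row the
    version with the newer timestamp is kept (assumed rule).
    """
    merged = dict(local)
    for key, (value, timestamp) in remote.items():
        if key not in merged or timestamp > merged[key][1]:
            merged[key] = (value, timestamp)
    return merged


replica_a = {"n1": ("x", 1), "n2": ("y", 5)}
replica_b = {"n1": ("z", 3), "n3": ("w", 2)}
after_exchange = merge(replica_a, replica_b)
```

Because the merge gives the same result in either direction, two instances that exchange state end up with identical tables, regardless of the order in which gossip messages arrive.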
AFC propagation
  • The output of an AFC includes a copy of itself, which results in a copy of the AFC in the parent zone
    • Reaches the root and the leaves of other zones
  • Adoption – check the ancestors' lists to find new AFCs
  • Spreads through the system in the order of tens of seconds.
  • Certificates have an expiration date; unless they are refreshed, aggregation eventually halts
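
The adoption and expiry behaviour can be sketched as follows. The `AFC` record, the `adopt` function and the idea that a refresh is a certificate of the same name with a later expiry are assumptions for illustration; signature checking is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AFC:
    """Hypothetical aggregation function certificate."""
    name: str
    code: str
    expires_at: float


def adopt(installed, ancestor_lists, now):
    """Drop expired AFCs, then adopt unexpired ones found in ancestor lists."""
    result = {n: afc for n, afc in installed.items() if afc.expires_at > now}
    for afcs in ancestor_lists:
        for afc in afcs:
            current = result.get(afc.name)
            # A later expiry for the same name is treated as a refresh.
            if afc.expires_at > now and (
                current is None or afc.expires_at > current.expires_at
            ):
                result[afc.name] = afc
    return result


old = AFC("min_load", "SELECT MIN(load) AS min_load", expires_at=100.0)
fresh = AFC("min_load", "SELECT MIN(load) AS min_load", expires_at=200.0)
```

If no refreshed certificate arrives before `old` expires, `adopt` returns an empty set for that name and the aggregation stops, which matches the last bullet above.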
I’ll skip
  • Aggregation function details
  • Mobile code details
  • Eventual consistency
  • Certificates
  • Authentication
  • Firewalls & NATs
Robustness through Gossip
  • Use of epidemic techniques to disseminate data and AFCs
  • Pure peer-to-peer communication
  • Fully autonomous progress
  • Actions based on probability theory
  • Robustness improves with scale
  • Fixed low overhead, independent of scale
  • Control as well as Data transport
  • Conceptually: each zone periodically picks another zone at random and exchanges the state of those zones
  • Slightly more complex because there are virtual zones …
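
The conceptual model (pick a random peer, exchange state) can be simulated in a few lines. This is a flat, single-zone simulation with made-up parameters; it ignores the hierarchy and virtual zones and only illustrates how quickly one update spreads.

```python
import random


def rounds_until_informed(n, seed=0):
    """Count gossip rounds until all n nodes have seen one update.

    Each round every node picks a peer uniformly at random and the two
    exchange state, so the update flows in both directions.
    """
    rng = random.Random(seed)
    informed = [False] * n
    if n > 0:
        informed[0] = True  # node 0 holds the fresh update
    rounds = 0
    while not all(informed):
        rounds += 1
        snapshot = list(informed)  # state at the start of the round
        for node in range(n):
            peer = rng.randrange(n)
            if snapshot[node] or snapshot[peer]:
                informed[node] = informed[peer] = True
    return rounds
```

Calling `rounds_until_informed(1000)` and then `rounds_until_informed(100000)` should show the round count growing far more slowly than the population, which is the intuition behind "fixed low overhead, independent of scale": each node still sends only one exchange per round.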
Gossip target selection
  • Each instance updates the attributes it issues and evaluates the AFCs that depend on them
  • An agent (instance) will gossip on behalf of those zones for which it is a contact, with a rate depending on configuration
  • At each level pick at random a child from the contact list and exchange state
  • Failure detection
    • If no update seen for an agent in time Tfail, remove it from the system
  • Integration
    • After partitions, crashes, etc., renegade trees can be formed
    • Use of broadcast, multicast, hints, to discover other agents
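
A minimal sketch of target selection and failure detection is shown below. The function names, the shape of the contact lists and the use of plain timestamps are assumptions; `t_fail` corresponds to Tfail above.

```python
import random


def pick_gossip_targets(contacts_by_level, self_name, rng=random):
    """For each level, pick one random contact other than ourselves."""
    targets = {}
    for level, contacts in contacts_by_level.items():
        candidates = [c for c in contacts if c != self_name]
        if candidates:
            targets[level] = rng.choice(candidates)
    return targets


def expire_agents(last_seen, now, t_fail):
    """Keep only agents whose last update is at most t_fail old."""
    return {agent: ts for agent, ts in last_seen.items() if now - ts <= t_fail}
```

For example, `expire_agents({"a": 100.0, "b": 40.0}, now=110.0, t_fail=30.0)` keeps only agent `a`, because nothing has been heard from `b` for longer than `t_fail`.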
Subscription routing
  • At the leaves the subscribers store subscription information
  • Aggregation functions combine the subscriptions of participants into subscriptions for the zone
  • Publishers use

zone.send(subscription, data)

which is forwarded if the zone has children that match the subscription
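
A simplified sketch of this forwarding rule follows. Only the call shape `send(subscription, data)` comes from the slide; the class `SubscriptionZone`, the use of plain set union as the aggregation function and the topic names are assumptions.

```python
class SubscriptionZone:
    """Hypothetical zone that aggregates and routes subscriptions."""

    def __init__(self, name, children=None, subscriptions=None):
        self.name = name
        self.children = children or []
        self._own = set(subscriptions or [])  # only meaningful at a leaf
        self.delivered = []

    def subscriptions(self):
        # Aggregation: a zone subscribes to whatever its children subscribe to.
        if not self.children:
            return set(self._own)
        combined = set()
        for child in self.children:
            combined |= child.subscriptions()
        return combined

    def send(self, subscription, data):
        if not self.children:
            if subscription in self._own:
                self.delivered.append(data)
            return
        # Forward only into child zones that match the subscription.
        for child in self.children:
            if subscription in child.subscriptions():
                child.send(subscription, data)


a = SubscriptionZone("a", subscriptions={"sports"})
b = SubscriptionZone("b", subscriptions={"tech"})
c = SubscriptionZone("c", subscriptions={"tech", "sports"})
root = SubscriptionZone("root", children=[
    SubscriptionZone("west", children=[a, b]),
    SubscriptionZone("east", children=[c]),
])
root.send("tech", "item-1")
```

After the call, `b` and `c` have received `item-1` and `a` has not; a publication with no matching subscription never leaves the root.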

Routing infrastructure
  • Each zone dynamically selects 2-3 routing nodes, using AFCs that take various load factors into account
  • These nodes receive news items for their children in their zone
  • Forwarding based on the individual subscription information
  • Redundancy used to achieve robustness and reliability
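
Routing-node selection could look roughly like the sketch below, which picks the least-loaded nodes of a zone. Using a single numeric load per node and breaking ties by name are assumptions; the slides only say that several load factors are involved.

```python
import heapq


def select_routing_nodes(loads, k=3):
    """Return the names of the k least-loaded nodes, lightest first.

    `loads` maps node name -> load figure (hypothetical single metric).
    """
    lightest = heapq.nsmallest(k, ((load, name) for name, load in loads.items()))
    return [name for _, name in lightest]


routers = select_routing_nodes({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}, k=2)
```

A publisher would then hand each news item to every node in `routers`, so the item still gets through if one of the selected nodes fails.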