
Introduction

Introduction. Zachary G. Ives University of Pennsylvania CIS 700 – Internet-Scale Distributed Computing January 13, 2004. Welcome!. To the initial version of the Penn Systems Seminar First of an ongoing series, focusing on systems research topics of general interest


Presentation Transcript


  1. Introduction Zachary G. Ives University of Pennsylvania CIS 700 – Internet-Scale Distributed Computing January 13, 2004

  2. Welcome! • To the initial version of the Penn Systems Seminar • First of an ongoing series, focusing on systems research topics of general interest • Format: reading and discussion (no homework or exams) • Independent Study encouraged to supplement the seminar • Our focus: P2P and distributed ad hoc systems

  3. What Is the Vision of Peer-to-Peer Computing? Loose coupling, auto configuration: • No central administration • Scalability • Adaptability/resiliency • Nodes contribute as well as consume resources • System continues as peers join and leave

  4. How Does P2P Work? • P2P infrastructure forms an overlay network over the real Internet, which supports: • Schemes for distributing resources (data, computation) without a directory structure • Unstructured: query by flooding or over advertisements • Structured: query according to an algorithm that organizes the peers into a consistent structure (hash table, tree, …) • Graceful handling of loss or gain of nodes • Replication “where appropriate” • Provides reliability/availability • Improves performance (self-tuning) • More on this later, from Honghui
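One way to picture the "structured" case on this slide is a consistent-hashing ring, sketched below. This is an illustrative assumption rather than the scheme of any particular system named in the seminar: the `HashRing` class, the peer names, and the use of SHA-1 are all hypothetical. Each key is owned by the first peer clockwise from the key's hash, so no directory is needed, and a departing peer hands off only its own keys.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (SHA-1 is an illustrative choice)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical structured overlay: a key belongs to the first peer clockwise."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(p), p) for p in peers)

    def owner(self, key: str) -> str:
        points = [pos for pos, _ in self._ring]
        # Wrap around to the first peer when the key hashes past the last one.
        i = bisect.bisect_right(points, ring_position(key)) % len(self._ring)
        return self._ring[i][1]

    def join(self, peer: str) -> None:
        bisect.insort(self._ring, (ring_position(peer), peer))

    def leave(self, peer: str) -> None:
        self._ring.remove((ring_position(peer), peer))


ring = HashRing(["peer-a", "peer-b", "peer-c"])
keys = [f"doc-{i}" for i in range(100)]

before = {k: ring.owner(k) for k in keys}
ring.leave("peer-b")  # graceful handling of a lost node
after = {k: ring.owner(k) for k in keys}

# Only keys that peer-b owned change hands; every other key stays put.
moved = [k for k in keys if before[k] != after[k]]
```

The unstructured alternative would replace `owner` with a flood of the query to neighbors; the ring trades that broadcast cost for the bookkeeping needed to keep the structure consistent.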

  5. The Promise of P2P • Major challenge for applications is generally scalability • Traditional systems definition: • Scalability of systems to numbers of requests, clients, etc. • But we need “human” scalability as well: • Avoid human administration, tuning, oversight, custom code • Self-administering; auto-tuning • Providing the “right” abstractions • Human contributors often create heterogeneity among components, data, participation levels, etc. • Aspects of P2P should help with all of these

  6. The Central Questions: Goals of this Seminar • “What is the killer app for a P2P substrate?” • Is there more to this P2P idea than pirating music and searching for little green men (and women)? • What applications can benefit from P2P-like techniques? • What are their key properties? • What programming models are most appropriate for building such applications? • How can P2P techniques be improved to better support the applications we want to build? • Security, trust, reliability, consistency, …

  7. Some P2P Applications • Early in the semester: examining apps built over P2P overlay networks • We’ll start with two projects here at Penn • We’d like to talk with you if you’re interested in working or collaborating on these projects! • BRIEF overviews of the issues – more detailed talks later in the semester • Later: P2P games • First: Orchestra – P2P meets data integration…

  8. Key Problem: Coordinating Efforts between Collaborators • Today, to collaboratively edit structured data, we centralize • For many applications, this isn’t a good model, e.g.: • Bioinformatics groups have multiple standard schemas and warehouses for genomic information – each group wants to incorporate the info of the others, but have it in their own format, with their own unique information preserved, and with the ability to override info from elsewhere • Different neuroscientists may have data from measuring electrical activity in the same part of the brain – they may want to share common information but maintain their specific local information; each scientist wants the ability to control when their updates are propagated (Work in progress with Nitin Khandelwal; other contributors: Murat Cakir, Charuta Joshi, Ivan Terziev)

  9. The Orchestra System: Infrastructure for Collaborative Data Sharing • Each participant is a logical peer, with some XML schema that is mapped to at least one other peer’s schema • Schemas’ contents are logically synchronized initially and then on demand [Figure: participants Part1, Part2, and Part3 with Schema 1, Schema 2, and Schema 3, linked by mappings between XML schemas. Updates (+ XML tree A, - XML tree B) from participant 3 reach the other participants as translated updates (+ XML tree A’, - XML tree B’ and + XML tree A’’, - XML tree B’’).]

  10. Some Challenges in Orchestra • Mappings • How to express them • Using them to translate updates, queries • Inconsistency • How to represent conflicts • How to resolve them • Update propagation • Consistency with intermittent connectivity • Scaling • To many updates • To many queries [Slide annotations group the challenges as “logical & semantics-level” and “implementation-level (P2P-based)”.]

  11. Mappings • Some peers may be replicas • Others need mappings, expressed as “views” • Views: functions from one schema to another • Can be inverted (may lose some information) • Can be “chained” when there is no direct connection • (Much research in generating these automatically [DDH00][MB01], …) • Prior work on propagating updates through relational views [BD82][K85][C+96]… • Ensuring the mapping specifies a deterministic, side-effect-free translation • Algorithmically applying the translation • Ongoing work with Nitin Khandelwal: • Extending the model to handle (unordered) XML • Challenge: dealing with XML’s nesting and its repercussions
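The idea of mappings as invertible, chainable views can be sketched in a few lines. This is a simplified illustration under stated assumptions: records are flat dictionaries rather than XML trees, and the `Mapping` class, the `rename` and `drop` helpers, and the field names are all hypothetical, not part of Orchestra.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Record = Dict[str, str]  # flat stand-in for an XML instance (an assumption)


@dataclass(frozen=True)
class Mapping:
    """A view from one schema to another, with a best-effort inverse."""
    forward: Callable[[Record], Record]
    backward: Callable[[Record], Record]

    def invert(self) -> "Mapping":
        return Mapping(self.backward, self.forward)

    def then(self, other: "Mapping") -> "Mapping":
        """Chain two mappings when there is no direct connection."""
        return Mapping(
            lambda r: other.forward(self.forward(r)),
            lambda r: self.backward(other.backward(r)),
        )


def rename(old: str, new: str) -> Mapping:
    """Lossless mapping: a field is renamed in both directions."""
    return Mapping(
        lambda r: {(new if k == old else k): v for k, v in r.items()},
        lambda r: {(old if k == new else k): v for k, v in r.items()},
    )


def drop(field_name: str) -> Mapping:
    """Lossy mapping: the inverse cannot restore the dropped field."""
    return Mapping(
        lambda r: {k: v for k, v in r.items() if k != field_name},
        lambda r: dict(r),
    )


schema1_to_2 = rename("gene", "locus")
schema2_to_3 = drop("notes")
schema1_to_3 = schema1_to_2.then(schema2_to_3)  # chained through schema 2

record = {"gene": "BRCA1", "notes": "local annotation"}
translated = schema1_to_3.forward(record)
round_trip = schema1_to_3.invert().forward(translated)
```

Here `translated` no longer carries `notes`, and `round_trip` recovers the `gene` field but not the dropped annotation, which is the information loss the slide warns about. Both functions are deterministic and free of side effects, matching the requirement on update translation.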

  12. A Globally Consistent Model that Encodes Conflicts • Even in the presence of conflicts, want a “global state” (from perspective of some schema) when we synchronize • Allows us to determine what’s agreed-upon, what’s conflicting • Can define conflict resolution strategies • Goal: “union of all states” with a way of specifying conflicts • Define conditional XML tree based on a subset of c-tables [IM84] • Each peer p_i has a boolean flag P_i representing “perspective i” [Figure: conditional XML tree with a root and two auth subtrees, one conditioned on P1 and one on P2, with values Lee and Smith.]
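A conditional tree of this kind might be sketched as follows. It is a deliberate simplification of c-tables: each node's condition is just a set of perspectives under which it exists, and the `CNode` class and both helper functions are hypothetical. Which author each perspective sees (Lee under P1, Smith under P2) is also an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class CNode:
    label: str
    # Perspectives under which this node exists; None means unconditional.
    visible_to: Optional[FrozenSet[str]] = None
    children: List["CNode"] = field(default_factory=list)


def project(node: CNode, perspective: str):
    """Return the plain tree seen from one perspective, or None if hidden."""
    if node.visible_to is not None and perspective not in node.visible_to:
        return None
    kids = []
    for child in node.children:
        projected = project(child, perspective)
        if projected is not None:
            kids.append(projected)
    return (node.label, kids)


def conditional_nodes(node: CNode):
    """List the nodes that are not agreed upon by every perspective."""
    found = []
    if node.visible_to is not None:
        found.append((node.label, sorted(node.visible_to)))
    for child in node.children:
        found.extend(conditional_nodes(child))
    return found


# "Union of all states": both versions of auth live in one global tree.
tree = CNode("root", children=[
    CNode("auth", frozenset({"P1"}), [CNode("Lee")]),
    CNode("auth", frozenset({"P2"}), [CNode("Smith")]),
])

view_p1 = project(tree, "P1")
view_p2 = project(tree, "P2")
disputed = conditional_nodes(tree)
```

Projecting onto a perspective yields an ordinary tree, while `disputed` lists what a conflict resolution strategy would have to decide.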

  13. Propagating Updates with Intermittent Connectivity • How to synchronize among n peers (even assuming the same schema)? • Not all are connected simultaneously • Usual approaches: • Locking (doesn’t scale) • Epidemic algorithms (only eventually consistent) • Approach: • “Shadow instance” of the schema, replicated within the other peers of the network • Everyone syncs with the shadow instance • Benefits: state is deterministic after each sync
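The slide does not spell out the synchronization protocol, so the following is only a guess at its flavor: a three-way comparison between a peer's local state, the state it saw at its last sync, and the shadow instance. The `Peer` class is hypothetical, a single dictionary stands in for the replicated shadow instance, and deletions are not handled.

```python
class Peer:
    """Hypothetical peer that syncs only with the shadow instance."""

    def __init__(self, name: str):
        self.name = name
        self.local = {}  # this peer's current values
        self.base = {}   # shadow contents as of this peer's last sync

    def sync(self, shadow: dict) -> dict:
        """Exchange changes with the shadow; return unresolved conflicts."""
        conflicts = {}
        for key in set(self.local) | set(self.base) | set(shadow):
            mine = self.local.get(key)
            base = self.base.get(key)
            theirs = shadow.get(key)
            if mine != base and theirs != base and theirs != mine:
                # Both sides changed the value differently since the last sync.
                conflicts[key] = (mine, theirs)
                continue
            if mine != base:
                shadow[key] = mine  # only this peer changed it: publish
            # Adopt the agreed value and remember it as the new base.
            self.local[key] = self.base[key] = shadow[key]
        return conflicts


shadow = {}
a, b = Peer("a"), Peer("b")

a.local["x"] = 1
a.sync(shadow)            # a publishes x = 1
b.sync(shadow)            # b picks it up later; the peers never talk directly

a.local["x"] = 2          # both peers edit while disconnected
b.local["x"] = 3
a_conflicts = a.sync(shadow)
b_conflicts = b.sync(shadow)
```

Because each peer compares only against the shadow, the outcome of a sync depends on the shadow's state and the peer's own edits, not on which other peers happen to be online.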

  14. Scaling, Using P2P Techniques • Update synchronization • Key problem: find values conflicting with “shadow instance” • Partition the “shadow instance” across the network • Query execution • Partition computation across multiple peers (PIER does this) • Query optimization • Optimization breaks the query into sub-problems, uses dynamic programming to build up estimates of the costs of applying operators • Can recast as recursion + memoization • Use P2P overlay to distribute each recursive step • Memoize results at every node • Why is this useful? Suppose 2 peers ask the same query!
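The "recursion + memoization" recasting of query optimization can be illustrated with a toy join-ordering problem. Everything concrete here is an assumption: the relation names, cardinalities, the single fixed selectivity, and the cost model (sum of intermediate result sizes) are invented for the sketch, and a local `lru_cache` stands in for a memo table that a P2P overlay would spread across nodes.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics for three relations.
CARDINALITY = {"R": 1000, "S": 100, "T": 10}
SELECTIVITY = 0.01  # assumed to be the same for every join


def result_size(relations: frozenset) -> float:
    """Estimated size of joining all the given relations."""
    size = 1.0
    for name in relations:
        size *= CARDINALITY[name]
    return size * SELECTIVITY ** (len(relations) - 1)


@lru_cache(maxsize=None)
def best_plan(relations: frozenset):
    """Return (cost, plan) for the cheapest way to join the relations.

    Each call is one sub-problem; in a distributed setting the frozenset
    could serve as the key that routes the call to a peer.
    """
    if len(relations) == 1:
        (name,) = relations
        return 0.0, name
    best = None
    members = sorted(relations)
    for k in range(1, len(members)):
        for left_names in combinations(members, k):
            left = frozenset(left_names)
            right = relations - left
            left_cost, left_plan = best_plan(left)
            right_cost, right_plan = best_plan(right)
            cost = left_cost + right_cost + result_size(relations)
            if best is None or cost < best[0]:
                best = (cost, (left_plan, right_plan))
    return best


cost, plan = best_plan(frozenset({"R", "S", "T"}))
stats = best_plan.cache_info()
```

The cache statistics show sub-problems being reused within a single optimization. If a second peer asks the same query, every sub-problem is already memoized, which is the payoff the slide points to.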

  15. Current Status • Have a basic strategy for addressing many of the problems in collaborative data sharing • Initial sketches of the core algorithms • Need to develop them further • … And to implement (and validate) them in a real system!
