
Tapestry: Architecture and Status



Presentation Transcript


  1. Tapestry: Architecture and Status
     UCB ROC / Sahara Retreat, January 2002
     Ben Y. Zhao, ravenben@eecs.berkeley.edu

  2. Why Tapestry
     • Today's Internet
       • Route failures not uncommon
       • BGP too slow to recover; redundant routes unexploited
       • IPv4 constrains deployment of new protocols
         • IP multicast, security protocols (DDoS traceback), …
     • Wide-area applications straining existing systems
       • Scalable management of large-scale resources
     • Our goals
       • Wide-area scalable network overlay
       • Highly fault-tolerant routing / location
       • Introspective / self-tuning platform
       • Support application-specific protocols
       • Efficient (bandwidth, latency) data delivery
       • Pass wide-area solutions on to the application layer

  3. What is Tapestry?
     • A prototype of a decentralized, fault-tolerant, adaptive overlay infrastructure (Zhao, Kubiatowicz, Joseph et al. 2000)
     • Network substrate of OceanStore
     • Routing: suffix-based hypercube, similar to Plaxton, Rajaraman, Richa (SPAA '97)
     • Decentralized location: virtual hierarchy per object with cached location references
     • Dynamic algorithms using local information
     • Core API (sketched below):
       • publishObject(ObjectID)
       • routeMsgToObject(ObjectID)
       • routeMsgToNode(NodeID)
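The three-call core API above is small enough to sketch. Below is a minimal, hypothetical Java rendering; the interface name, signatures, and the SHA-1 GUID helper are illustrative assumptions, not Tapestry's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// The three core calls listed on the slide; signatures are assumed.
interface TapestryNode {
    // Leave location pointers along the path to the object's root.
    void publishObject(byte[] objectId);

    // Route a message toward a nearby replica of the object.
    void routeMsgToObject(byte[] objectId, byte[] msg);

    // Route a message to the live node whose ID best matches nodeId.
    void routeMsgToNode(byte[] nodeId, byte[] msg);
}

class Guids {
    // 160-bit IDs (e.g. SHA-1), matching the slide's namespace size.
    static byte[] guid(String name) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-1")
                            .digest(name.getBytes(StandardCharsets.UTF_8));
    }
}
```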

  4. Routing and Location
     • Namespace (nodes and objects)
       • 160-bit names → ~2^80 names before name collision
       • Each object has its own hierarchy rooted at its root node: f(ObjectID) = RootID, via a dynamic mapping function
     • Suffix routing from A to B
       • At the h-th hop, arrive at the nearest node hop(h) such that hop(h) shares a suffix of length h digits with B
       • Example: 5324 routes to 0629 via 5324 → 2349 → 1429 → 7629 → 0629 (traced in the sketch below)
     • Object location
       • Root responsible for storing the object's location
       • Publish / search both route incrementally toward the root
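To make the hop-by-hop suffix matching concrete, here is a small self-contained sketch that traces the slide's 5324 → 0629 example; the base-10 four-digit IDs and helper names are simplifying assumptions.

```java
// Traces how each hop extends the matched suffix by one digit.
public class SuffixRoute {
    // Length of the longest common suffix of two equal-length digit strings.
    static int sharedSuffix(String a, String b) {
        int n = 0;
        while (n < a.length()
               && a.charAt(a.length() - 1 - n) == b.charAt(b.length() - 1 - n)) n++;
        return n;
    }

    public static void main(String[] args) {
        String[] path = { "5324", "2349", "1429", "7629", "0629" };
        String dest = "0629";
        // Hop h should share a suffix of length h with the destination.
        for (int h = 1; h < path.length; h++)
            System.out.println(path[h] + " shares suffix of length "
                               + sharedSuffix(path[h], dest) + " with " + dest);
    }
}
```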

  5. Tapestry Mesh: Incremental Suffix-Based Routing
     [Figure: mesh of nodes (0x43FE, 0x79FE, 0x23FE, 0x993E, 0x73FF, …) connected by routing edges labeled with digit levels 1–4]

  6. Object Location: Randomization and Locality

  7. Fault-Tolerant Routing
     • Strategy:
       • Detect failures via soft-state probe packets
       • Route around a problematic hop via backup pointers
     • Handling (sketched below):
       • 3 forward pointers per outgoing route (2 backups)
       • Second-chance algorithm for intermittent failures
       • Upgrade backup pointers to replace failed routes
     • Protocols:
       • First Reachable Link Selection (FRLS)
       • Proactive duplicate packet routing
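A hedged sketch of one route entry with a primary and two backups, plus the second-chance rule for intermittent failures; all names and the two-miss threshold are assumptions, not the actual Tapestry implementation.

```java
import java.net.InetSocketAddress;

class RouteEntry {
    private final InetSocketAddress[] hops = new InetSocketAddress[3]; // primary + 2 backups
    private int misses = 0; // consecutive probe misses on the primary

    RouteEntry(InetSocketAddress primary, InetSocketAddress b1, InetSocketAddress b2) {
        hops[0] = primary; hops[1] = b1; hops[2] = b2;
    }

    // A single miss does not demote the primary; it gets a second chance
    // so that intermittent failures do not thrash the table.
    void onProbeMiss() {
        if (++misses >= 2) {
            InetSocketAddress failed = hops[0];
            hops[0] = hops[1]; hops[1] = hops[2]; hops[2] = failed; // promote backups
            misses = 0;
        }
    }

    void onProbeAck() { misses = 0; }

    InetSocketAddress nextHop() { return hops[0]; }
}
```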

  8. Talk Outline
     • Tapestry overview
     • Architecture
     • Evaluation
     • Brocade
     • Conclude

  9. Architecture Background
     • OceanStore implementation
       • Java with asynchronous I/O
       • Event-based, stage-driven architecture (Sandstorm – M. Welsh)
     [Figure: layered stack — Applications / OceanStore / Tapestry / Sandstorm (async I/O, event arch.) / Java Virtual Machine / Operating System]

  10. Key Stages
      • StaticTClient / Federation
        • Uses config files to bootstrap the initial Tapestry
      • DynamicTClient
        • Integrates new nodes into a static Tapestry
      • Router
        • Primary handler of routing and location
      • Patchwork
        • Introspective monitoring and fault detection
      [Figure: Applications / OceanStore above; StaticTClient, DynamicTClient, Router, Patchwork stages; Sandstorm (async I/O, event arch.) below — the stage pattern is sketched after this slide]
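All four stages follow Sandstorm's event-driven pattern: each stage drains its own event queue asynchronously. The following is not Sandstorm's real API, just a generic Java stand-in illustrating the style.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Generic stage: producers enqueue events; the stage's thread consumes them.
abstract class Stage<E> implements Runnable {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();

    void enqueue(E event) { queue.add(event); }

    // Each stage runs on its own thread, handling one event at a time.
    public void run() {
        try {
            while (true) handleEvent(queue.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    protected abstract void handleEvent(E event);
}

// E.g. the Router stage would subclass Stage<RouteMsg> and dispatch on
// message type (publish, unpublish, route-to-node, route-to-object).
```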

  11. Static TClient
      • Federation used as rendezvous point
      • Pair-wise pings used to generate route tables
      • Federation used as global barrier to begin (barrier sketched below)
      1. S_i says hello to F
      2. F informs the group of S_i
      3. Nodes do pair-wise pings
      4. Nodes signal readiness
      5. Barrier reached at F, which signals start
      [Figure: nodes S and federation F exchanging the messages above]
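Steps 4–5 amount to a barrier at the federation node. A minimal sketch using a countdown latch; the class name is an assumption and the "start" broadcast is omitted.

```java
import java.util.concurrent.CountDownLatch;

class FederationBarrier {
    private final CountDownLatch ready;

    FederationBarrier(int expectedNodes) { ready = new CountDownLatch(expectedNodes); }

    // Step 4: each node S_i reports once its route tables are built.
    void signalReady() { ready.countDown(); }

    // Step 5: F waits until every node has reported, then would
    // broadcast "start" to the group.
    void awaitAllReady() throws InterruptedException { ready.await(); }
}
```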

  12. Dynamic TClient: Node Integration
      • Hill-climb to find the nearest gateway (sketched below)
      • Route to surrogate / copy routes
      • Move relevant objects to the new root
      • Directed multicast notifies nearby nodes
      [Figure: new node S, gateway G, federation F; legend: routes request, routes response, moving object pointers, directed multicast]
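The first step, hill-climbing to the nearest gateway, can be sketched generically: keep moving to whichever neighbor has lower measured latency until none improves. The helper signatures below are hypothetical.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.ToLongFunction;

class GatewaySearch {
    // Greedy hill-climb: move to any neighbor with strictly lower latency
    // until no neighbor improves; the result is a locally nearest gateway.
    static <N> N hillClimb(N start, Function<N, List<N>> neighbors,
                           ToLongFunction<N> latencyMs) {
        N best = start;
        long bestRtt = latencyMs.applyAsLong(best);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (N candidate : neighbors.apply(best)) {
                long rtt = latencyMs.applyAsLong(candidate);
                if (rtt < bestRtt) { best = candidate; bestRtt = rtt; improved = true; }
            }
        }
        return best;
    }
}
```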

  13. Routing / Location
      • Router class
      • Maintains (sketched below):
        • RoutingTable: [ ][ ] of RouteEntries
        • ObjectPointers: Hash(Guid) → PublishInfo, Hash(Guid) → LastHop
      • Handles:
        • Object publication / unpublication / mobile objects
        • Route / location message handling
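A Java sketch of the Router state listed above; RouteEntry and PublishInfo are stubs, and representing Hash(Guid) as a hex string is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class RouteEntry { /* primary + backup next hops, as sketched earlier */ }
class PublishInfo { /* cached location pointer(s) for a published object */ }

class RouterState {
    static final int DIGITS = 40; // 160-bit IDs as 40 hex digits
    static final int BASE = 16;

    // RoutingTable: one RouteEntry per (digit position, digit value).
    final RouteEntry[][] routingTable = new RouteEntry[DIGITS][BASE];

    // ObjectPointers, following the slide's two maps:
    final Map<String, PublishInfo> publishInfo = new HashMap<>(); // Hash(Guid) -> PublishInfo
    final Map<String, String> lastHop = new HashMap<>();          // Hash(Guid) -> LastHop
}
```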

  14. Patchwork
      • Fault-handling / introspective stage
      • Granulated periodic beacons measure loss and network latency to entries in the routing table
      • Promote / demote routes within a single RouteEntry (bookkeeping sketched below)
      [Figure: Router with routes A, B, C into the network; beacons detect a failed link]
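Illustrative bookkeeping for the promote/demote decision: periodic beacons feed per-route loss and latency estimates. The EWMA smoothing constant and the comparison rule are assumptions, not Patchwork's actual policy.

```java
class RouteHealth {
    double lossRate = 0.0;   // EWMA of beacon loss
    double latencyMs = 0.0;  // EWMA of beacon RTT
    private static final double ALPHA = 0.125; // smoothing factor (assumed)

    void onBeaconAck(double rttMs) {
        lossRate += ALPHA * (0.0 - lossRate);
        latencyMs += ALPHA * (rttMs - latencyMs);
    }

    void onBeaconLoss() { lossRate += ALPHA * (1.0 - lossRate); }

    // A route with lower loss (then lower latency) should be promoted
    // within its RouteEntry.
    boolean betterThan(RouteHealth other) {
        if (lossRate != other.lossRate) return lossRate < other.lossRate;
        return latencyMs < other.latencyMs;
    }
}
```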

  15. Deployment Status
      • Object location
        • Publish / unpublish / route to object
        • Mobile objects (backtracking unpublish)
        • Active deletes, confirmation of non-existence
      • General routing
        • Route to node, redundant routes
        • Soft-state fault detection, limited optimization
        • Advanced policies for fault recovery
      • Dynamic integration
        • Integration with limited optimizations
        • Best-effort fault-resilient integration mechanisms
        • Background threads for optimization / refresh

  16. Talk Outline
      • Tapestry overview
      • Architecture
      • Evaluation
      • Brocade
      • Conclude

  17. Generalized Results
      • Cached object pointers
        • Efficient lookup for nearby objects
        • Reasonable storage overhead
      • Multiple object roots
        • Improve availability under attack
        • Improve performance and performance stability
      • Reliable packet delivery
        • Redundant pointers approximate optimal reachability
        • FRLS, a simple fault-tolerant UDP protocol

  18. First Reachable Link Selection
      • Use periodic UDP packets to gauge link condition
      • Packets routed to the shortest "good" link (selection sketched below)
      • Assumes IP cannot correct its routing table in time for packet delivery
      [Figure: Tapestry links A–E over IP; when no good link remains, no path exists to the destination]
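FRLS reduces to scanning candidate links in shortest-path order and taking the first one whose probes look healthy. A minimal sketch, assuming a loss-rate threshold defines "good"; the types and threshold are assumptions.

```java
import java.util.List;

class Frls {
    static class Link {
        final String nextHop;
        final double recentLossRate; // maintained by periodic UDP probes
        Link(String nextHop, double recentLossRate) {
            this.nextHop = nextHop; this.recentLossRate = recentLossRate;
        }
    }

    // links are ordered shortest path first; lossThreshold is an assumed cutoff.
    static Link firstReachable(List<Link> links, double lossThreshold) {
        for (Link l : links)
            if (l.recentLossRate <= lossThreshold) return l;
        return null; // no good link: no path (cf. the figure)
    }
}
```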

  19. Some Numbers
      • Measurements
        • PIII 800, Linux 2.2.18, IBM JDK 1.3
        • Simulating 6 nodes (4 StaticTC, 1 federation, 1 DynamicTC)
        • Publishing / locating ~10 objects
        • PublishMsg, RouteMsg: ~0–2 ms
        • Integration: ~2600 ms (with pings)
      • Integration messages (assuming latency data is available):
        • 2 × n (routing and objects) + 16M (directed multicast notification), M ≈ 3

  20. Talk Outline
      • Tapestry overview
      • Architecture
      • Evaluation
      • Brocade
      • Conclude

  21. Landmark Routing on P2P
      • Brocade
        • Exploits non-uniformity
        • Minimizes wide-area routing hops / bandwidth
      • Secondary overlay on top of Tapestry
        • Select super-nodes by administrative domain
        • Divide the network into cover sets
        • Super-nodes form a secondary Tapestry
        • Advertise each cover set as local objects
      • Routing (A → B) uses Brocade to route directly into B's local network

  22. Brocade Mechanisms
      • Selective utilization
        • Nodes cache the local cover set
        • Only use Brocade if the destination is not in the cache
      • Forwarding messages to super-nodes
        • Super-node does IP snooping
        • Direct: cover set caches the super-node
      • Inter-domain routing, A → B (sketched below):
        • A → SN(A) via IP
        • SN(A) finds SN(B) via Tapestry location
        • SN(B) → B via Tapestry / Chord / Pastry / CAN
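The three inter-domain steps can be sketched as follows; every interface and method here is a hypothetical stand-in for the corresponding overlay operation, not Brocade's actual code.

```java
class BrocadeRouting {
    interface SuperNode {
        // Locate the destination's super-node by looking up its cover set
        // (advertised as a local object) via Tapestry location.
        SuperNode locateSuperNodeFor(byte[] destId);

        // Deliver within the destination's domain using the local overlay
        // (Tapestry / Chord / Pastry / CAN).
        void deliverIntoCoverSet(byte[] destId, byte[] msg);
    }

    // Step 1, A -> SN(A), happens over IP (or via snooping) before this call.
    static void route(SuperNode snOfA, byte[] destId, byte[] msg) {
        SuperNode snOfB = snOfA.locateSuperNodeFor(destId); // step 2
        snOfB.deliverIntoCoverSet(destId, msg);             // step 3
    }
}
```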

  23. Brocade Routing RDP
      [Plot: routing RDP (relative delay penalty); local cover set cache on; interdomain:intradomain latency = 3:1; packet simulator, transit-stub topology, 4096 Tapestry nodes, 16 super-nodes]

  24. Brocade Bandwidth Usage
      [Plot: bandwidth usage; local cover set cache on; bandwidth unit: sizeof(Msg) × hops]

  25. Ongoing / Future Work
      • Fill in full functionality
        • Fault-handling policies, introspection, self-repair
      • More realistic experiments
        • Artificial topologies on the SOSS simulator
        • Larger-scale dynamic integration experiments
      • Code development
        • External deployment / code release
        • Sprint programmable routers
        • Academic networks
        • Introspective measurement platform
      • Implementing applications (Bayeux, Brocade, …)

  26. For More Information
      Tapestry and related projects (and these slides):
      http://www.cs.berkeley.edu/~ravenben/tapestry
      OceanStore: http://oceanstore.cs.berkeley.edu
      Related papers:
      http://oceanstore.cs.berkeley.edu/publications
      http://www.cs.berkeley.edu/~ravenben/publications
      ravenben@eecs.berkeley.edu
