
Centrifuge: Integrated Lease Management and Partitioning for Cloud Services

Centrifuge: Integrated Lease Management and Partitioning for Cloud Services. 7th USENIX Conference on Networked Systems Design and Implementation, 2010 (NSDI’10). Seminar presentation for CSE 708 by Ruchika Mehresh, Department of Computer Science and Engineering, 22nd February 2011.


Presentation Transcript


  1. Centrifuge: Integrated Lease Management and Partitioning for Cloud Services 7th USENIX Conference on Networked Systems Design and Implementation, 2010 (NSDI’10) Seminar Presentation for CSE 708 by Ruchika Mehresh, Department of Computer Science and Engineering, 22nd February 2011

  2. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  3. Problem statement Managing and operating on a large amount of in-memory state while maintaining the responsiveness of a cloud service. Microsoft’s Live Mesh service defined the requirements and constraints for solving this problem.

  4. Microsoft’s Live Mesh service • Large-scale commercial cloud service • In operation since April 2008 • As of March 2009, Centrifuge was actively used by 5 Live Mesh component services (List)

  5. Terminology • Lease • Technique to ensure that only one server at a time is responsible for a given piece of state. • Partitioning • Process of assigning each piece of state to an in-memory server; requests are then sent to the appropriate servers.

  6. Sample Application: Cloud-based rendezvous service [Diagram: devices D1 and D2 send requests to front-end web servers, which forward them to back-end application servers. Request (D1): my IP address is x. Request (D2): what is D1’s IP address? Response (D2): x.]
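As a rough illustration of the flow on this slide, the sketch below routes each device ID to the single back-end server responsible for it and answers the rendezvous query from that server’s in-memory state. All names and the plain hash-based assignment are assumptions for illustration only; Centrifuge itself uses manager-directed leases over consistently hashed ranges, described later.

    import hashlib

    BACKENDS = {"backend-0": {}, "backend-1": {}, "backend-2": {}}

    def owner_of(device_id):
        # Pick the back-end responsible for this device (plain hashing here;
        # the real system uses manager-directed leases instead).
        h = int(hashlib.sha1(device_id.encode()).hexdigest(), 16)
        return sorted(BACKENDS)[h % len(BACKENDS)]

    def register(device_id, ip):        # Request (D1): "My IP address is x"
        BACKENDS[owner_of(device_id)][device_id] = ip

    def resolve(device_id):             # Request (D2): "What is D1's IP address?"
        return BACKENDS[owner_of(device_id)].get(device_id)

    register("D1", "10.0.0.7")
    print(resolve("D1"))                # Response (D2): 10.0.0.7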

  7. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  8. Requirements • Large number of small objects • Every object can be handled by one server • Memory is expensive – no replication • Effective partitioning • Freedom from fine-grained leasing • Load balancing • Easy addition, removal, or crash handling of servers

  9. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  10. Architecture • Back-end servers hold object leases, keep the objects in memory, and process requests • Front-end servers take care of incoming requests and forward them to the appropriate back-end servers • Manager-directed leasing and partitioning: a logically centralized service implemented using a replicated state machine

  11. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  12. Manager Service • Executes manager-directed leasing • The Manager controls how the key space is partitioned and how leases are assigned • Key space is mapped to variable-length ranges using consistent hashing • One range lease per virtual node • 64 virtual nodes per owner library (Question) • Every leased range has a lease generation number
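A minimal sketch of the range-assignment idea on this slide: each owner contributes several virtual nodes on a consistent-hash ring, and each virtual node is leased the variable-length key range ending at its ring position. The constants, hash choice, and names below are assumptions for illustration (the paper uses 64 virtual nodes per owner library, and the Manager, not the owners, decides the assignment); lease generation numbers are omitted.

    import bisect, hashlib

    VNODES_PER_OWNER = 4    # illustration only; the paper uses 64 per owner library

    def h(s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % 2**32

    def build_ring(owners):
        # Each (position, owner) point is one virtual node; the virtual node is
        # leased the variable-length range ending at its position on the ring.
        return sorted((h(f"{o}#{i}"), o) for o in owners
                      for i in range(VNODES_PER_OWNER))

    def owner_for_key(ring, key):
        points = [p for p, _ in ring]
        i = bisect.bisect_right(points, h(key)) % len(ring)   # wrap around the ring
        return ring[i][1]

    ring = build_ring(["owner-A", "owner-B", "owner-C"])
    print(owner_for_key(ring, "device-42"))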

  13. Manager Service • Live deployment: 3 standby servers and 5 Paxos-running servers; can tolerate 2 machine failures • The Paxos group runs Paxos, serves as the state store, and executes the leader election protocol • Only the current leader interacts with Lookups and Owners • Only the current leader executes the logic for partitioning and lease management

  14. Manager Service – Leader Election • Changes to the leader are applied to the Paxos group first • Standby servers check for candidacy periodically • A new leader reads in the state from the Paxos group • Lookups/Owners contact the leader directly • What happens when the leader does not respond? • Server addition and removal: the leader calculates the desired (re)assignments, recalls leases, then grants the new ones (see the sketch below) • 60-second leases
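The sketch below illustrates only the ordering mentioned in the last bullets: when the leader repartitions after an owner is added or removed, a moved range is recalled from its old owner before being granted to its new one. The data model and function names are assumptions, not the paper’s code.

    LEASE_SECONDS = 60    # lease duration from the slide

    def repartition(lease_table, desired):
        # lease_table and desired both map range_id -> owner.
        moves = [(r, old, desired[r]) for r, old in lease_table.items()
                 if desired[r] != old]
        for range_id, old_owner, _ in moves:          # step 1: recall old leases
            print(f"recall {range_id} from {old_owner}")
            del lease_table[range_id]
        for range_id, _, new_owner in moves:          # step 2: grant new leases
            print(f"grant {range_id} to {new_owner} for {LEASE_SECONDS}s")
            lease_table[range_id] = new_owner

    table = {"r1": "owner-A", "r2": "owner-A", "r3": "owner-B"}
    repartition(table, {"r1": "owner-A", "r2": "owner-C", "r3": "owner-B"})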

  15. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  16. Lookup library • Maintains a complete copy of the lease table • ~200 KB per lease table (100 × 64 × 32 B) • Advantage: local object-to-owner mapping • When a remote confirmation at the owner misses (due to a change in the lease generation number), a loss notification is generated • Incremental changes are copied from the Manager every 30 seconds (Question)
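A quick check of the size estimate on this slide, assuming roughly 100 owners, 64 virtual-node ranges per owner, and about 32 bytes per lease-table entry (the per-entry size is the slide’s figure, not something verified here):

    owners, vnodes_per_owner, bytes_per_entry = 100, 64, 32
    print(owners * vnodes_per_owner * bytes_per_entry, "bytes")   # 204800 B ≈ 200 KB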

  17. Lookup-Manager Protocol • Entries in the change log are truncated every 5 minutes • The Manager sends the entire lease table when the lookup’s LSN is too old, or when size(change log) > size(lease table)
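A hedged sketch of that decision, with sizes approximated by entry counts and all names invented for illustration: the lookup reports the last sequence number (LSN) it has seen, and the Manager replies either with a delta of newer entries or with the full lease table.

    def updates_for(lookup_lsn, change_log, oldest_lsn_kept, lease_table):
        # Entries newer than what the lookup has already seen.
        deltas = [e for e in change_log if e["lsn"] > lookup_lsn]
        # Fall back to the full lease table if the lookup's LSN predates the
        # retained change log, or if the delta would be bigger than the table.
        if lookup_lsn < oldest_lsn_kept or len(deltas) > len(lease_table):
            return ("full-table", lease_table)
        return ("delta", deltas)

    log = [{"lsn": i, "range": f"r{i % 3}", "owner": "owner-B"} for i in range(5, 9)]
    table = {"r0": "owner-A", "r1": "owner-B", "r2": "owner-C"}
    print(updates_for(7, log, oldest_lsn_kept=5, lease_table=table))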

  18. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  19. Owner Library • Owners send lease request/renew messages to the Manager every 15 seconds • 3 consecutive lost/delayed requests lose the lease • The Manager responds with the (complete) renewed/granted lease information • Why are lease grants different from renewals?
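An illustrative owner-side view of the two numbers on this slide (15-second renewals, lease considered lost after 3 consecutive misses); everything apart from those intervals is an assumption.

    RENEW_INTERVAL_S = 15      # owners renew every 15 seconds (from the slide)
    MAX_MISSED = 3             # 3 consecutive lost/delayed renewals lose the lease

    def renewal_step(missed, renew_ok):
        # Returns (new consecutive-miss count, whether the lease is still held).
        missed = 0 if renew_ok else missed + 1
        return missed, missed < MAX_MISSED

    missed, holding = 0, True
    for ok in [True, False, False, False]:   # simulate four renewal attempts
        missed, holding = renewal_step(missed, ok)
    print(holding)                           # False: the owner must stop serving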

  20. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  21. Other issues • Dealing with clocks: clock rate synchronization • Dealing with message races • Solution: add two sequence numbers (sender’s and owner’s) • Drop the racing message • Random backoff, then send again
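A minimal sketch of the race-handling idea: each message carries both a sender sequence number and the owner sequence number it was built against; if the owner has since moved on, the stale (racing) message is dropped and the sender retries after a random backoff. The field names and constants are assumptions for illustration.

    import random, time

    OWNER_SEQ = 7    # the owner's current sequence number for this range/object

    def deliver(msg):
        # A message built against an older owner sequence number raced with a
        # change at the owner, so it is dropped rather than applied.
        return "applied" if msg["owner_seq"] == OWNER_SEQ else "dropped"

    print(deliver({"sender_seq": 1, "owner_seq": 7}))   # applied
    print(deliver({"sender_seq": 2, "owner_seq": 6}))   # dropped (racing message)
    time.sleep(random.uniform(0.0, 0.1))                # random backoff before resending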

  22. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  23. API • Semantics of Lookup() are that it returns hints • If hint fails, caller retries after a short backoff • New lease generation number (without flag) at Owner represents node crash • Caller resends the earlier subscribe message • All lookup libraries signal a LossNotificationUpcall() on appropriate ranges • OwnershipChangeUpcall(): • To initialize data structures when some new range of the key space has been granted, or to garbage collect the associated state
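A sketch of what “Lookup() returns hints” means for a caller: use the hinted owner, and if the call fails (for example because the lease has moved), back off briefly and look up again. The function signatures and retry policy here are assumptions for illustration, not the Centrifuge API itself.

    import time

    def call_with_hint(lookup, send, key, retries=3, backoff_s=0.05):
        for _ in range(retries):
            owner, generation = lookup(key)     # hint: may already be stale
            ok, reply = send(owner, generation, key)
            if ok:
                return reply
            time.sleep(backoff_s)               # short backoff, then look up again
        raise RuntimeError("owner hint kept failing")

    # Toy stand-ins for the real lookup library and network send:
    print(call_with_hint(lambda k: ("backend-1", 3),
                         lambda owner, gen, k: (True, f"{k} handled by {owner}"),
                         "device-42"))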

  24. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  25. Performance Evaluation • Observation of Centrifuge in production for 2.5 months (Dec 2008 to March 2009) • 130 Owners; 1,000 Lookups; 8 Manager servers • Questions: Is the Centrifuge Manager a bottleneck in steady state? How well does Centrifuge handle high-churn events? How stable are production servers?

  26. Performance Evaluation • Low CPU and network utilization in steady state for all servers, slightly higher for the current leader • Network bursts on 12/16/2008 and 1/15/2009 when standbys take over; security patches were rolled out on these two days • 1.5 months of steady activity; 10 leases lost over this period • Conclusions: unplanned owner failures are quite rare; owner recovery is reasonably rapid (7 Owners recovered in <10 min); message races are very rare in steady state (12 drops in 1.5 months)

  27. Performance Evaluation • 2.5-hour window during the second security rollout period • 9:20 pm: servers 1 and 3 restarted, server 2 took over • 9:45 pm: server 2 restarted, server 1 took over • The number of owners dips at periodic intervals because the patch is applied to groups of servers at regular intervals • Conclusions: the burst in network traffic is due to lookup servers copying entire lease tables; during high churn, the observed load at the leader (Manager service) is small; during the quiet period of Jan 8 to Jan 13, the API success rate is 100% over 53 million invocations

  28. Performance Evaluation • Testbed: around 2,000 front-end server instances, maintaining a 10:1 Lookups:Owners ratio • Servers restarted more rapidly than in the production environment • (Measurements taken at the Manager leader)

  29. Outline (Structure) • Problem statement • Requirements • Architecture • Manager service • Lookup library • Owner library • Other issues • API • Performance evaluation • Conclusion

  30. Conclusion • Simplifies building scalable application tiers with in-memory state • Combines leasing and partitioning to handle a massive number of objects • Freedom from fine-grained leases • Manager-directed leasing that scales well • Avoids lease fragmentation • Non-traditional API in which clients cannot request leases

  31. Questions-1 (Santoshb) • Any reasons for choosing 64 virtual nodes per owner library? Can this be increased in more complex systems? • Most of these parameters were derived experimentally. Although the authors do not directly explain this choice, it is most likely a good fit for the observed workload. Adding more virtual nodes can adversely affect performance, much as installing too many virtual machines on one server does. Back

  32. Questions-2 (Santoshb) • What are the bottlenecks to extending the current design across multiple data centers? • Centrifuge is designed to work within a datacenter (refer to Section 2.4). If it had to be extended to multiple datacenters, network reliability would be the first and foremost issue; it is not so much a question of bottlenecks as of feasibility. With too many message races or too much latency involved, Centrifuge cannot work efficiently. Maintaining strict clock rate synchronization across datacenters would also be a challenge, and this is just the tip of the iceberg.

  33. Questions-3 (Fatih) • Each lookup library has a replicated copy of the lease table and these lookups can have stale data. How does the system handle this stale data? • In slides. Back

  34. Questions-4 (Lavone_R) • What do you think the drawbacks and shortcomings of Centrifuge are with regard to its lease mechanism policies, its partitioning implementation, and its infrastructure that enables services to run on pools of in-memory servers? • Centrifuge works well only within a datacenter, so we cannot move replica copies close to a site and make one the primary copy (as in PNUTS); applications like that would also need fine-grained leases. The clock-rate synchronization requirement is strict. Chubby is better suited to loosely coupled distributed systems, but it handles a different kind of problem (file system, sessions). For the problem specification here, Centrifuge seems to be the best option.

  35. Questions-5 (Yong Wang) • Section 2 says “each object can be fully handled by one machine holding an exclusive [lease]”; does that mean the reading or writing of each object is serializable? Why does that simplify the design? Is there any consideration about keeping consistency? • Yes, the requests are inherently serializable because only one server is responsible for applying changes to a particular object (refer to Section 4.2). This assumption simplifies the design: if a single server could not handle a popular object within a lease range, additional lease reassignments would be needed, complicating the design. There are no consistency issues here (refer to Section 4.2).

  36. Thank You !!

  37. List of Live Mesh Services • File sharing and synchronization across devices • Notifications of activity on these shared files • A virtual desktop that is hosted in the datacenter • File manipulation through a web browser • Connectivity to remote devices Back
