Comparison of Cloud Providers Presented by Mi Wang
Motivation • Internet-based cloud computing has gained tremendous momentum. A growing number of companies provide public cloud computing services. • How to compare their profermance? • Help customer choose a provider that best fits its performance and cost needs. • Help cloud provider know the right direction for improvements
A Benchmark for Clouds • Introduction • The goal of benchmarking a software system is to evaluate its average performance under a particular workload • TPC benchmarks are widely used today in evaluating the performance of computer systems. • Transaction Processing Performance Council (TPC) is a non-profit organization to define transaction processing and database benchmarks . • However, they are not sufficient for analyzing novel cloud services
Requirements of a Cloud Benchmark • Features and Metrics • The main advantages of cloud computing are scalability, pay-per-use and fault-tolerance • A benchmark for the cloud should test the these features and provide appropriate metrics for them.
Requirements of a Cloud Benchmark • Architectures • Clouds may have different service architectures • A cloud benchmark should be general enough to cover the different architectural variants
Problems of TPC-W • The TPC-W benchmark specifies an online bookstore that consists of 14 web interactions allowing to browse, search, display, update and order the products of the store. • The main measured parameter is WIPS, the number of web interactions per second(WIPS) that the system can handle. • TPC-W measure the cost by the ratio of total cost to maximum WIPS
Problems of TPC-W • TPC-W is designed for transactional database systems. Cloud systems may not offer strong consistency constraints it requires. • WIPS is not for adaptable and scalable systems. Ideal clouds would compensate increasing load by adding new processing units. It is not possible to report the maximum WIPS.
Problems of TPC-W • Metric for cost is not applicable for clouds. Different price-plans and the lot-size prevent to calculate a single $/WIPS number. • TPC-W does not reflect technical evolution of web applications • TPC-W lacks adequate metrics for measuring the features of cloud systems like scalability, pay-per use and fault-tolerance
Ideas for a new benchmark • Features • Should analyze the ability of a dynamic system to adapt to a changing load in terms of scalability and costs • Should run in different locations • Should comprise web interactions that resemble the access patterns of Web 2.0 like applications. Multimedia content should also be included.
Ideas for a new benchmark • Configurations: A new benchmark can choose between three different levels of consistency • Low: All web interactions use only the BASE (Basically Available, Soft-State, Eventually Consistent) guarantees • Medium: The web interactions use a mix of consistency guarantees ranging from BASE to ACID. • High: All web interactions use only the ACID guarantees.
Ideas for a new benchmark • Metrics: Scalability • Ideally, clouds should scale linearly and infinitely with a constant cost per WI • Increasing the issued WIPS over time and continuously counting the WI in a given response time. Measure the deviation between issued WIPS and answered WI
Ideas for a new benchmark • Metrics: Scalability Correlation Coefficient R2 R2(0, 1), R2 = 1 indicates perfect linear scaling fi+4 fi+3 fi+2 fi+1 yi+4 yi+3 yi+2 OR Non-linear regression function f(x) = xb b(0, 1), b = 1 indicates perfect linear scaling f(x) = xb yi+1 fi yi
Ideas for a new benchmark • Metrics: Cost • Measure the cost in dollars per WIPS • Price plans might cause variations of the $/WIPS • Measure average and standard deviation of cost per WIPS Plans are fully utilized
Ideas for a new benchmark • Metrics: Fault tolerance • Failure is defined as a certain percentage of the resources used for the application is shut down • Clouds would be able to replace these resources automatically • Measure the ratio between WIPS in RT and Issued WIPS Failure
Goals of CloudCmp • Provide performance and cost information about various cloud providers • Help a provider identify its under-performing services compared to its competitors. • Provide a fair comparison • Characterizing all providers using the same set of workloads and metrics • Skip specialized services that only a few providers offer
Goals of CloudCmp • Reduce measurement overhead and monetary costs • periodically measure each provider at different times of day across all its locations • Comply with cloud providers’ use policies • Cover a representative set of cloud providers
Method: Select Provider • Amazon AWS • Microsoft Azure • Google AppEngine • Rackspace CloudServers
Method: Identify core functionality • Elastic compute cluster • The cluster includes a variable number of virtual instances that run application code. • Persistent storage • The storage service keeps the state and data of an application and can be accessed by application instances through API calls. • Intra-cloud network • The intra-cloud network connects application instances with each other and with shared services. • Wide-area network. • The content of an application is delivered to end users through the wide-area network from multiple data centers (DCs) at different geographical locations.
Method: Identify core functionality • Services offered by the providers
Method: Choose Performance Metrics • Elastic compute cluster • Provides virtual instances that host and run a customer’s application code. • Is charged per usage: • IaaS: Time of an instance remains allocated • PaaS: CPU cycles consumes • Elastic: can dynamically scale up and down the number of instances
Method: Choose Performance Metrics • Metrics to compare Elastic compute cluster • Benchmark finishing time • how long the instance takes to complete the benchmark tasks • Scaling latency • time taken by a provider to allocate a new instance after a customer requests it • Cost per benchmark • cost to complete each benchmark task
Method: Choose Performance Metrics • Persistent Storage • Three common types: table, blob, and queue • Tablestorage is designed to store structural data like conventional databases • Blob storage is designed to store unstructured blobs such as binary objects • Queuestorage implements a global message queue to pass messages between different instances • Two pricing models: • Based on CPU cycles consumed by an operation • Fixed cost per operation
Method: Choose Performance Metrics • Metrics to compare Persistent Storage • Operation response time • Time for a storage operation to finish • Time to consistency • time between when data is written to the storage service and when all reads for the data return consistent and valid results • Cost per operation
Method: Choose Performance Metrics • Intra-cloud Network • Connects a customer’s instances among themselves and with the shared services offered by a cloud • None of the providers charge for traffic within their data centers. Inter-datacenter traffic is charged based on the amount of data • Metrics to compare Intra-cloud Network • Path capacity: TCP throughput • Latency
Method: Choose Performance Metrics • Wide-area Network • The collection of network paths between a cloud’s data centers and external hosts on the Internet • Metrics to compare Wide-area Network • optimal wide-area network latency • The minimum latency between testers’ nodes and any data center owned by a provider
Implementation: Computation Metrics • Benchmark tasks • A modified set of Java-based benchmark tasks from SPECjvm2008 that satisfies constraints of all the providers • Metrics: Benchmark finishing time • Run the benchmark tasks on each of the virtual instance types provided by the clouds, and measure their finishing time • Run instances of the same benchmark task in multiple threads to test multi-threading performance
Implementation: Computation Metrics • Metrics: Cost per benchmark • Multiply the published per hour price by finishing time for those who charge by time • Using billing API for those who charge by CPU cycle. • Metrics: Scaling latency • Repeatedly request new instances and record the requesttime and available time • Divide the latency into two segments to locate the performance bottleneck • Provisioning latency: Request time to powered-on time • Booting latency: Powered-on time to available time
Implementation: Storage Metrics • Benchmark tasks • Use Java-based client to test API to get, put or query data from the service • Non-Java-based clients are also tested • Mimic streaming workload to avoid the potential impact of memory or disk bottlenecks at the client’s side
Implementation: Storage Metrics • Metrics: Response time • The time from when the client instance begins the operation to when the last byte reaches the client • Metrics: Throughput • The maximum rate that a client instance obtains from the storage service
Implementation: Storage Metrics • Metrics: Time to Consistency • Write an object to a storage service, then repeatedly read the object and measure how long it takes before correct result is returned • Metrics: Cost per operation • Via billing API
Implementation: Network Metrics • Metrics: Intra-cloud Network Throughput and Latency • Allocate a pair of instances in the same or different data centers, run standard tools such as iperfand ping between the two instances • Metrics: Optimal Wide-area Network Latency • Run an instance in each data center owned by the provider and ping these instances from over 200 nodes on PlanetLab(a group of computers available as a testbed for computer networking and distributed systems research). Record the smallest RTT • For AppEngine: collect the IP addresses of the instance from each of the PlanetLab nodes. Ping all of these IP addresses from each of the PlanetLabnodes
Results • Anonymizethe identities of the providers in our results, and refer to them as C1 to C4 (But it is easy to see that: C1 – AWS, C2 – Rackspace, C3 – AppEngine, C4 – Microsoft Azure) • Test all instance types offered by C2 and C4, and the general-purpose instances from C1 • Refers to instance types as provider.i, i denotes the tier of service • Compare instances from both Linux and Windows for experiments depend on the type of OS. Test Linux instances for others
Results • Cloud instances tested
Results: Elastic Compute Cluster • Metrics: Benchmark finishing time
Results: Elastic Compute Cluster • Price-comparable instances offered by different providers have widely different CPU and memory performance. • The instance types appear to be constructed in different ways. • For C1, the high-end instances may have faster CPUs • For C4, all instances might share the same type of physical CPU • C2 may be lightly loaded during the test • The disk I/O intensive task exhibits high variation on some C1 and C4 instances, probably due to interference from other colocated instances
Results: Elastic Compute Cluster • Metrics: Performance at Cost
Results: Elastic Compute Cluster • For single-threaded tests, the smallest instances of most providers are the most cost-effective • For multi-threaded tests, the high-end instances are not more cost-effective than the low-end ones. • The prices of high-end instances are proportional to the number of CPU cores • Bounded by memory bus and I/O bandwidth • For parallel applications it might be more cost-effective to use more low-end instances
Results: Elastic Compute Cluster • Metrics: Scaling Latency
Results: Elastic Compute Cluster • Metrics: Scaling Latency • All cloud providers can allocate new instances quickly with the average scaling latency below 10 minutes • Windows instances appear to take longer time to create than Linux ones • For C1, Windows ones have larger booting latency, possibly due to slower CPUs • For C2, provisioning latency of the Windows instances is much larger. It is likely that C2 may have different infrastructures to provision Linux and Windows instances.
Results: Persistent Storage • Table Storage • Test the performance of three operations: get, put, and query • Each operation runs against two pre-defined data tables: a small one with 1K entries, and a large one with 100K entries • Repeat each operation several hundred times • C2 is not tested because it does not provide a table service
Results: Persistent Storage • Table Storage: Response Time • The three services perform similarly for both get and put operations • For the query operation, C1 appears to have a better indexing strategy • None of the services show noticeable performance degradation in multiple concurrent operations
Results: Persistent Storage • Table Storage: Time to Consistency • 40% of the get operations in C1 see inconsistency when triggered right after a put, Other providers exhibit no such inconsistency • C1 does provide an API option to request strong consistency but disables it by default.
Results: Persistent Storage • Table Storage: Cost per operation • Both C1 and C3 charge lower cost for get/put than query • C4 charges the same across operations and can improve its charging model by accounting for the complexity of the operation.
Results: Persistent Storage • Blob Storage: Response time
Results: Persistent Storage • Blob Storage: Response time • Blobs of different sizes may stress different bottlenecks. The latency for small blobs can be dominated by one-off costs whereas that for large blobs can be determined by service throughput, network bandwidth, or client-side contention. • C2’s store may be tuned for read heavy workload.
Results: Persistent Storage • Blob Storage: Response time of multiple concurrent operations • C1 and C4’s blob service throughput is well-tuned for multiple concurrent operations
Results: Persistent Storage • Blob Storage: Maximum Throughput • C1 and C2’s blob service throughput is close to their intra-datacenter network bandwidth • C4’s blob service throughput of a large instancealso corresponds to TCP throughput inside its datacenter, and may not be constrained by instance itself.
Results: Persistent Storage • Blob Storage: Cost per operation • The charging models are similar for all three providers and are based on the number of operations and the size of the blob. • No differences
Results: Persistent Storage • Queue Storage: Response time • Message size is 50 Byte
Results: Persistent Storage • Queue Storage • No significant performance degradation is found in sending up to 32 concurrent messages • Response time of the queue service is on the same order of magnitude as that of the table and blob services. • Both services charge similarly–1 cent per 10K operations