Cloud Computing for Machine Learning and Cognitive Applications

Cloud Computing Textbooks: Kai Hwang, Cloud Computing for Machine Learning and Cognitive Applications, The MIT Press, 2017 (distributed by 開發圖書); Kai Hwang and Min Chen, Big-Data Analytics for Cloud, IoT and Cognitive Computing, Wiley, 2017; Google, Machine Learning Crash Course with TensorFlow APIs, https://developers.google.com/machine-learning/crash-course/

Info • Time: Mon. 14:10~16:00 (Room 207), 16:10~17:00 (Lab 501) • Evaluation: • Practicums 20% • Operation Tests (Twice) 40% • Midterm (Written Test) 20% • Final (Written Test) 20% • Website: http://ares.ee.nchu.edu.tw/Course.files/cc1071/index.html

Outline • Principles of Cloud Computing Systems • Virtual Machines, Docker Containers, and Server Clusters • Cloud Architectures and Service Platform Design • MapReduce Software Frameworks and CUDA GPU Architectures • Supervised Machine Learning • Unsupervised Machine Learning • Deep Learning • Reinforcement Learning • Cloud Building with OpenStack and ML Programming with TensorFlow

Lecture 1: Principles of Cloud Computing Systems

Elastic Cloud Systems for Scalable Computing • Traditional computer systems have emphasized high-performance computing (HPC) applications: Computations • In terms of raw speed in batch processing • The new demand for network-based computing requires high-throughput computing (HTC) systems: Services • Built with parallel and distributed computing technologies • Triggered the upgrading of many data centers into Internet clouds • Can serve millions of users simultaneously

Elastic Cloud Systems for Scalable Computing (cont.) • The driving forces behind cloud computing • The ubiquity of broadband and wireless networking, falling storage costs, and progressive improvements in Internet computing software • Users can demand more capacity at peak hours, reduce costs, experiment with new services, and remove unneeded capacity • Service providers can increase system utilization via multiplexing, virtualization, and dynamic resource provisioning • The concept of cloud computing • Evolved from cluster, grid, and utility computing

Elastic Cloud Systems for Scalable Computing (cont.) • A computing cluster consists of interconnected stand-alone computers • Work cooperatively as a single integrated resource • Typically built around a low-latency, high-bandwidth interconnection network

Elastic Cloud Systems for Scalable Computing (cont.) • A computing grid offers an integrated resource • Couples computers, software/middleware, special instruments, and people and sensors together • Constructed across LAN, WAN, or Internet backbone networks often at a regional, national, or global scale

Elastic Cloud Systems for Scalable Computing (cont.) • Utility computing focuses on a business model • Customers receive computing resources from a paid service provider like grid/cloud platforms • Cluster and grid computing leverage the use of many computers in parallel • Utility computing provides the computing resources • Cloud computing leverages dynamic resources to deliver a large number of services to end users • Frees up users to focus on user applications development by outsourcing the job execution to cloud providers

Enabling Technologies • Clouds are enabled by the progress in developing new hardware, software, and networking technologies • These technologies play instrumental roles in making cloud computing a reality • Most of these technologies are mature enough today to meet the increasing demand • The rapid progress in multicore CPUs, memory chips, and disk arrays • Possible to build faster data centers with huge storage spaces • Resource virtualization • Enables rapid cloud deployment with HTC and disaster recovery capabilities

Enabling Technologies (cont.) • The progress in SoA, Web 2.0 standards, and Internet performance • All contributed to the emergence of cloud services • Clouds are designed to serve numerous tenants over massive volumes of data • The availability of large-scale, distributed storage systems lays the underlying foundation of today’s data centers • Cloud computing has benefited greatly from • The progress made in license management and automatic billing techniques

Convergence of Technologies • Cloud computing is enabled by the convergence of four technologies • Hardware virtualization and multicore chips • Possible to have dynamic configurations in clouds • Cloud computing explores multicore and parallel computing technologies • Utility and grid computing technologies • Lay the necessary foundation for cloud computing • Recent advances in service oriented architecture (SoA), Web 2.0, and mashups of platforms • Push the cloud another step forward • Autonomic computing and automated data center operations have enabled cloud computing

Convergence of Technologies (cont.)

Convergence of Technologies (cont.) • To realize data-intensive systems • One needs to integrate hardware, Internet, and data centers • Today’s Internet technology emphasizes SoA and Web 2.0 services • The widespread use of data centers with virtualization techniques can be applied to automate the resource provisioning process in clouds • Cloud computing presents technological challenges in almost all aspects of computer science and engineering • e.g., Users may demand new network-efficient processors, scalable memory and storage schemes, distributed operating systems

Convergence of Technologies (cont.) • Middleware for machine virtualization, new programming models, effective resource management, and application program development • Necessary to facilitate mobile cloud computing in various IoT application domains

Evolution of Scalable Distributed/Parallel Computing • The general computing trend is toward increased leveraging of shared web resources over the Internet • The emphasis shifts from HPC to HTC • Cloud computing and web service platforms focus more on HTC than HPC applications • The HTC paradigm pays more attention to high-flux multicomputing • Internet searches and web services are requested by millions of users simultaneously • The new performance goal • Shifted from raw speed to high throughput, measured as the number of tasks completed per unit of time
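The HTC performance goal on this slide, the number of tasks completed per unit of time, can be written as a one-line metric. The sketch below is only an illustration; the function name and the sample measurement are assumptions, not figures from the course.

```python
def throughput(tasks_completed: int, elapsed_seconds: float) -> float:
    """Return tasks completed per second over a measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tasks_completed / elapsed_seconds


# Hypothetical measurement: 1,200,000 requests served in one minute.
print(throughput(1_200_000, 60.0))  # 20000.0
```

An HPC benchmark would instead report how fast a single job finishes; the HTC view only counts how many independent tasks finish in the window.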

Evolution of Scalable Distributed/Parallel Computing (cont.) • Cost, energy efficiency, security, and reliability in clouds are also of vital importance • In this era of big data, we face a data deluge problem • Data comes from IoT sensors, lab experiments, simulations, society archives, and the web in all scales and formats • Preservation, movement, and access of massive data sets require generic tools • Supporting high-performance, scalable file systems, databases, algorithms, workflow, and visualization • A new data-centric paradigm of scientific discovery is based on data-intensive technologies

Evolution of Scalable Distributed/Parallel Computing (cont.) • New tools for data capture, data creation, and data analysis are needed • The cloud technologies are driven by the surge of interest in the data deluge situation • The Internet and World Wide Web are used by billions of people every day • Large data centers or clouds must be designed to provide not only big storage • But also distributed computing power to satisfy the requests from a large number of users simultaneously • The emergence of public or hybrid clouds requires upgrading many data centers • Using larger server clusters, distributed file systems, and high-bandwidth networks

Evolution of Scalable Distributed/Parallel Computing (cont.) • With massive smartphone and tablet usage requesting services • The cloud engines, distributed storage, and mobile networks must interact closely with the Internet • To deliver mashup services in web-scale mobile computing over the social and media networks • Advances in virtualization make it possible to use Internet clouds • To process a huge number of service requests • The differences among clusters, grid systems, and clouds may become blurred • Some view the clouds as computing clusters with modest changes in virtualization

Evolution of Scalable Distributed/Parallel Computing (cont.) • Other users consider cloud platforms a form of utility computing or service computing • Anticipate the effective processing of huge data sets generated by web services, social networks, and IoT • Cloud is a transformative approach • Promises much more than the data center model • Basically changes how to interact with information • The cloud provides services on demand • e.g., Infrastructure, platform, or software • At the platform level, MapReduce offers a new programming model • Transparently handles data parallelism with natural fault tolerance capability
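To make the MapReduce programming model mentioned on this slide concrete, here is a minimal single-process word-count sketch. It only imitates the map, shuffle, and reduce phases in plain Python; it is not the Hadoop API, it runs on one machine, and it has none of the fault tolerance a real framework provides. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["cloud data cloud", "data center"]))
```

Because each call to `map_phase` depends only on its own document, a framework is free to run the map calls on different nodes, which is the data parallelism the slide refers to.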

Data Center Growth and Cost Breakdown • A large data center may be built with thousands of servers • Smaller ones are built with only hundreds of servers • The cost to build and maintain data center servers has increased over the years • The cost of utilities exceeds the cost of hardware after just three years • About 60% of the cost to run a data center goes to management and maintenance • Typically only 30% of data center costs goes toward purchasing IT equipment • The server purchase cost has not increased much over time

Data Center Growth and Cost Breakdown (cont.) • Growth and cost breakdown of data centers over the years

Low-Cost Design Philosophy • The basic architecture and design considerations of data centers • Built with commodity hardware devices • Almost all cloud platforms choose x86 processors • Low-cost terabyte disks and gigabit Ethernet are used • The software layer handles the network traffic balancing, fault tolerance, and expandability • High-end switches or routers may be cost-prohibitive in building data centers • Using high-bandwidth networks may not fit the economics of cloud computing • Focuses more on the performance/price ratio, storage, and energy efficiency than on sheer speed

Virtualized Resources in Cloud Systems • Working with large data sets will typically mean sending the computations to the data • Rather than copying the data to the workstations • This reflects the trend in IT • Moves computing and data from desktops to large data centers • On-demand provision of software, hardware, and data as a service is available • Data explosion promoted the idea of cloud computing • A cloud is a pool of virtualized computer resources

Virtualized Resources in Cloud Systems (cont.) • A cloud can host a variety of different workloads • Including batch-style backend jobs and interactive, user-facing applications • A cloud allows workloads to be deployed and scaled out quickly • Through the rapid provisioning of virtual machines (VMs) or physical machines (PMs) • Supports redundant, self-recovering, highly scalable programming models • Allow workloads to recover from many unavoidable hardware or software failures • Should be able to monitor resource use promptly • To enable rebalancing of allocations when needed

Internet Clouds • Cloud computing applies a virtualized platform with elastic resources on demand • By dynamically provisioning hardware, software, and data sets • The idea is to use server clusters and huge databases at data centers • Cloud computing leverages its low cost and simplicity to benefit users and providers • Machine virtualization has enabled such cost-effectiveness • Cloud computing is intended to satisfy many user applications simultaneously

Internet Clouds (cont.) • The cloud ecosystem must be designed to be secure, trustworthy, and dependable • Some computer users think of a cloud as a centralized resource pool • Others consider a cloud to be a server cluster that practices distributed computing over all the servers used • Cloud computing is a large-scale distributed computing paradigm • Driven by economies of scale • A pool of virtualized, dynamically scalable, managed computing power, storage, platforms, and services • Delivered on demand to external customers over the Internet

Internet Clouds (cont.) • Common characteristics of Internet clouds • The cloud platform offers a scalable computing paradigm built around the data centers • Cloud resources are dynamically provisioned by data centers upon user demand • The cloud system provides compute, storage, and flexible platforms for upgraded web services • Cloud computing relies heavily on the virtualization of all kinds of resources • Cloud computing defines a new paradigm • For collective computing, data consumption, and delivery of information services over the Internet • Clouds stress the cost of ownership reduction in mega data centers

Internet Clouds (cont.) • Traditional systems have encountered several performance bottlenecks • System maintenance, poor utilization, and increasing costs of hardware/software upgrades • Cloud computing resolves or provides relief from these difficulties as an on-demand computing paradigm

Cloud Computing versus On-Premise Computing • Conventional computing systems involve • Buying the hardware equipment • Acquiring the necessary system software • Installing the system and testing the configuration • Executing the application codes and managing resources, etc. • In the case of clouds • All hardware and software resources are leased from the provider • Without much capital investment on the part of users • Only the execution phase incurs a service charge • One can easily save 80-95% of the cost to execute small jobs by using the cloud
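The saving quoted on this slide is a ratio between what the same small job costs on owned equipment and what it costs when only the execution phase is charged. The sketch below shows that arithmetic under made-up numbers; the hourly rate, the hours used, and the on-premise cost are all hypothetical and are not taken from the textbook.

```python
def pay_as_you_go_cost(hours_used: float, hourly_rate: float) -> float:
    """Cloud cost when only the execution time is billed."""
    return hours_used * hourly_rate


def saving_fraction(on_premise_cost: float, cloud_cost: float) -> float:
    """Fraction of the on-premise cost saved by running in the cloud."""
    if on_premise_cost <= 0:
        raise ValueError("on_premise_cost must be positive")
    return 1.0 - cloud_cost / on_premise_cost


# Hypothetical small job: 200 VM-hours at 0.5 currency units per hour,
# compared with 1000 units to buy, install, and run equipment on premise.
cloud = pay_as_you_go_cost(hours_used=200, hourly_rate=0.5)
saving = saving_fraction(on_premise_cost=1000.0, cloud_cost=cloud)
print(f"{saving:.0%}")  # 90%
```

With these assumed inputs the saving lands inside the 80-95% range the slide mentions; a job that keeps the hardware busy all year would give a much smaller figure.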

Cloud Computing versus On-Premise Computing (cont.) • Very appealing to small businesses • Eliminates the need to invest in permanent and expensive computers or servers • Traditional computing applications are primarily executed on local hosts • Being on the premise • e.g., Desktops, desk-side workstations, notebooks, tablets, etc • Differs from cloud computing primarily in resource control and infrastructure management • All resources must be acquired by the users except networking shared between users and the provider • Creates a heavy burden and operating expense on the part of the users

Cloud Computing versus On-Premise Computing (cont.) • In cloud computing • Users entrust their program execution to a remote cloud through the Internet • Thereby eliminating such an expense • Cloud computing differs from network computing or outsourced computing • Users leave all infrastructure management and program execution to the cloud platform • The cloud acts as a compute/storage rental company • Users lease the computing power from the cloud providers • A cloud platform provides many VMs that execute dedicated services to users

Cloud Computing versus On-Premise Computing (cont.) • Both separately and simultaneously • Can benefit individuals, families, communities, and organizations, concurrently • Three basic cloud service models • Infrastructure as a service (IaaS) or infrastructure cloud • Platform as a service (PaaS) or platform cloud • Software as a service (SaaS) or application cloud • In IaaS clouds like AWS EC2 • The user only needs to worry about application software deployment • The VMs are jointly deployed by the user and provider

Cloud Computing versus On-Premise Computing (cont.) • The vendors are responsible for providing the remaining hardware and networks • In PaaS clouds like Google App Engine • Both application codes and VMs are jointly deployed by the user and vendor • The remaining resources are provided by vendors • In the SaaS model like the Salesforce cloud • Everything is provided by the vendor including the application software • Compare three basic cloud service models with the on-premise computing paradigm • In five types of hardware and software resources

Cloud Computing versus On-Premise Computing (cont.) • Storage, servers, VMs, networking, and applications • Cloud computing reduces a user’s infrastructure management burden from two resources to none • As one moves from IaaS to PaaS and SaaS services • The advantages of separating the application from resources investment and management
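The comparison described across these slides can be captured as a small lookup table. The table below is one reading of the bullets (who handles each of the five resources under each model), since the original comparison figure is not part of this transcript; the labels "user", "shared", and "vendor" are illustrative names.

```python
RESOURCES = ("storage", "servers", "VMs", "networking", "applications")

# One reading of the slides: who is responsible for each resource.
RESPONSIBILITY = {
    "on-premise": {"storage": "user", "servers": "user", "VMs": "user",
                   "networking": "shared", "applications": "user"},
    "IaaS": {"storage": "vendor", "servers": "vendor", "VMs": "shared",
             "networking": "vendor", "applications": "user"},
    "PaaS": {"storage": "vendor", "servers": "vendor", "VMs": "shared",
             "networking": "vendor", "applications": "shared"},
    "SaaS": {"storage": "vendor", "servers": "vendor", "VMs": "vendor",
             "networking": "vendor", "applications": "vendor"},
}


def user_involved(model: str) -> list:
    """Resources the user still handles, alone or jointly with the vendor."""
    return [r for r in RESOURCES if RESPONSIBILITY[model][r] != "vendor"]


for model in RESPONSIBILITY:
    print(model, len(user_involved(model)), user_involved(model))
```

Under this encoding the user is involved in two resources for IaaS and PaaS and in none for SaaS, which matches the "from two resources to none" statement on the slide.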

Cloud Design Objectives • Six design objectives for cloud computing • Shifting computing from desktops to data centers • Uses computer processing, storage, and software delivery over the Internet • Service provisioning and cloud economics • Providers supply cloud services by signing service-level agreements (SLAs) with consumers and end users • Services must be economically feasible with efficiency in computing, storage, power consumption, etc. • Pricing models are based on a pay-as-you-go policy • Scalability in performance • The cloud platforms and software and infrastructure services must be able to increase in performance as the number of users increases

Cloud Design Objectives (cont.) • Data privacy protection • The concern regarding users’ data and record privacy must be addressed • To make clouds a successful and trusted service • High-quality of cloud services • The QoS of cloud computing must be standardized to make clouds interoperable among multiple providers • New standards and interfaces • Solving the data lock-in problem associated with data centers or cloud providers • Universally accepted application programming interfaces (APIs) and access protocols are needed • To provide high portability and flexibility of virtualized applications

Cloud Development Trends • Many executable application codes are much smaller than the web-scale data sets to process • Cloud computing avoids large data movement during execution • Results in less traffic on the Internet and better network utilization • The core of a cloud is the server cluster or VM cluster • The cluster nodes are used as compute nodes • A few control nodes are used to manage and monitor the cloud activities
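The first bullets argue that shipping a small program to the data is cheaper than shipping the data to the program. A rough transfer-time estimate makes the gap visible. The sizes below (a 10 MB executable, a 1 TB data set) and the 1 Gbps link are assumptions chosen for illustration, and the model ignores latency and protocol overhead.

```python
def transfer_seconds(num_bytes: int, bandwidth_bytes_per_s: int) -> float:
    """Idealized time to move num_bytes over a link of the given bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / bandwidth_bytes_per_s


GIGABIT_LINK = 125_000_000          # 1 Gbps expressed in bytes per second
CODE_BYTES = 10_000_000             # hypothetical 10 MB executable
DATA_BYTES = 1_000_000_000_000      # hypothetical 1 TB data set

move_code = transfer_seconds(CODE_BYTES, GIGABIT_LINK)
move_data = transfer_seconds(DATA_BYTES, GIGABIT_LINK)
print(f"ship code to data: {move_code:.2f} s")
print(f"ship data to code: {move_data:.0f} s")
```

Under these assumptions moving the code takes a fraction of a second while moving the data takes over two hours, which is why the slide says cloud computing avoids large data movement during execution.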

Cloud Development Trends (cont.) • The scheduling of user jobs on a cloud • Requires assigning the work to virtual clusters created for users • The gateway nodes • Provide the access points of the service from the outside world • Also can be used for security control of the entire cloud platform • In physical clusters, users expect a static demand of resources • Clouds are designed to face fluctuating workloads • Thus dynamically demand varying amounts of resources
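The contrast between static demand in physical clusters and fluctuating demand in clouds can be sketched as a sizing rule: provision just enough VMs for the current load. This is a simplified illustration, not a production autoscaler; the per-VM capacity and the workload trace are hypothetical.

```python
import math


def vms_needed(request_rate: float, capacity_per_vm: float, min_vms: int = 1) -> int:
    """Smallest number of VMs that can absorb the current request rate."""
    if capacity_per_vm <= 0:
        raise ValueError("capacity_per_vm must be positive")
    return max(min_vms, math.ceil(request_rate / capacity_per_vm))


# Hypothetical workload trace in requests per second; each VM handles 500.
workload = [120, 950, 4300, 800]
print([vms_needed(rate, 500) for rate in workload])  # [1, 2, 9, 2]
```

A static cluster would have to be sized for the peak (9 VMs here) at all times, whereas the cloud releases the extra capacity as soon as the load drops.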

Cloud Development Trends (cont.) • Private clouds will satisfy this demand more efficiently • The scaling is a fundamental requirement for data centers • The server clusters are built with thousands to even a million servers (nodes) • e.g., Microsoft has a data center in the Chicago area that has 100,000 8-core servers housed in 50 containers • A data center uses local disks attached to server nodes plus memory cache • Networks in data centers are primarily IP-based commodity networks like 10 Gbps Ethernet • Optimized for Internet access
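Reading the Microsoft example as 100,000 servers with 8 cores each, spread over 50 containers, the scale works out as follows. The even split across containers is an assumption made for the arithmetic.

```python
servers = 100_000
cores_per_server = 8
containers = 50

total_cores = servers * cores_per_server
servers_per_container = servers // containers  # assumes an even split

print(total_cores)             # 800000
print(servers_per_container)   # 2000
```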

Generic Cloud Architecture • The cloud platform is envisioned as a massive cluster of servers • These servers are provisioned on demand • To perform collective web services or distributed applications using data center resources • Formed dynamically by the provisioning, or de-provisioning, of servers, software, and databases • Servers in the cloud can be PMs or VMs • User interfaces are applied to request services • The provisioning tool establishes the cloud system to deliver the requested service • The cloud platform demands distributed storage and accompanying services

Generic Cloud Architecture (cont.) • The computing resources are built in data centers • Typically owned and operated by third-party providers • Consumers do not need to know the underlying technologies • In a cloud, software becomes a service • The cloud demands a high level of trust in the massive data retrieved from large data centers • A framework must be built to process large-scale data stored in the storage system • Demands a distributed file system over the database • Other resources are added into a cloud platform • Including the storage area networks, database systems, firewalls, and security devices

Generic Cloud Architecture (cont.)

Generic Cloud Architecture (cont.) • Web service providers offer special APIs • Enable developers to exploit Internet clouds • Monitoring and metering units • Used to track the usage and performance of provisioned resources • The bottom layer is the hardware and hosting machine infrastructure • The top layer includes the cloud applications for user services • The middle layer is called middleware • For virtualization and resource management purposes
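A metering unit, as described on this slide, accumulates per-tenant usage so that a pay-as-you-go bill can be computed later. The class below is a minimal in-memory sketch; the class name, the resource names, and the unit prices are all hypothetical, and a real cloud would persist these records and sample them continuously.

```python
from collections import defaultdict
from typing import Dict, Tuple


class Meter:
    """Accumulates resource usage per (tenant, resource) pair."""

    def __init__(self) -> None:
        self.usage: Dict[Tuple[str, str], float] = defaultdict(float)

    def record(self, tenant: str, resource: str, amount: float) -> None:
        """Add one usage sample, e.g. VM-hours or GB stored."""
        self.usage[(tenant, resource)] += amount

    def bill(self, tenant: str, unit_prices: Dict[str, float]) -> float:
        """Total charge for one tenant under the given unit prices."""
        return sum(
            amount * unit_prices[resource]
            for (owner, resource), amount in self.usage.items()
            if owner == tenant
        )


meter = Meter()
meter.record("alice", "vm_hours", 3)
meter.record("alice", "vm_hours", 2)
meter.record("alice", "storage_gb", 10)
meter.record("bob", "vm_hours", 7)

prices = {"vm_hours": 0.5, "storage_gb": 0.25}  # hypothetical prices
print(meter.bill("alice", prices))  # 5.0
```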

Virtual Machines • Multiple VMs can be started and stopped on demand on a single PM • To meet accepted service requests • Providing maximum flexibility to configure various partitions of resources on the same PM • To meet the specific requirements of different service requests • Multiple VMs can concurrently run applications based on different operating system environments on a single PM • The VMs on the same PM are isolated from one another • The software infrastructure of a platform • Must handle all resource management and do most of the maintenance automatically
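Starting and stopping VMs on one PM amounts to partitioning that machine's resources among guests that may run different operating systems. The toy model below tracks only CPU cores and memory; it is a sketch of the bookkeeping, not a hypervisor, and the class and VM names are invented for illustration.

```python
class PhysicalMachine:
    """Tracks how a PM's cores and memory are partitioned among VMs."""

    def __init__(self, cpu_cores: int, memory_gb: int) -> None:
        self.free_cores = cpu_cores
        self.free_memory_gb = memory_gb
        self.vms = {}  # name -> (cores, memory_gb, guest_os)

    def start_vm(self, name: str, cores: int, memory_gb: int, guest_os: str) -> bool:
        """Start a VM if the PM has enough free resources; report success."""
        if (name in self.vms or cores > self.free_cores
                or memory_gb > self.free_memory_gb):
            return False
        self.vms[name] = (cores, memory_gb, guest_os)
        self.free_cores -= cores
        self.free_memory_gb -= memory_gb
        return True

    def stop_vm(self, name: str) -> None:
        """Stop a VM and return its resources to the free pool."""
        cores, memory_gb, _ = self.vms.pop(name)
        self.free_cores += cores
        self.free_memory_gb += memory_gb


pm = PhysicalMachine(cpu_cores=8, memory_gb=32)
print(pm.start_vm("web", 2, 8, "Linux"))      # True
print(pm.start_vm("db", 4, 16, "Windows"))    # True
print(pm.start_vm("batch", 4, 8, "Linux"))    # False: only 2 cores are free
pm.stop_vm("web")
print(pm.start_vm("batch", 4, 8, "Linux"))    # True
```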

Virtual Machines (cont.) • Software must detect the status of each node and perform the tasks accordingly • e.g., Server joining and leaving • Cloud computing providers have built numerous data centers all over the world • e.g., Google and Microsoft • Each data center may have thousands of servers • Location is chosen to reduce power and cooling costs • The data centers are often built around a hydroelectric power station • Common characteristics of cloud operations • Defined by the US NIST (National Institute of Standards and Technology)
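One common way for software to detect servers joining and leaving is a heartbeat table with a timeout. The slide does not say which mechanism a given cloud uses, so the sketch below is an assumption for illustration; timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class Membership:
    """Tracks live nodes from periodic heartbeats."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> bool:
        """Record a heartbeat; return True if the node has just joined."""
        joined = node not in self.last_seen
        self.last_seen[node] = now
        return joined

    def reap(self, now: float) -> List[str]:
        """Remove and return nodes silent for longer than the timeout."""
        departed = sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
        for node in departed:
            del self.last_seen[node]
        return departed


cluster = Membership(timeout=10.0)
print(cluster.heartbeat("node-a", now=0.0))   # True: node-a joins
print(cluster.heartbeat("node-b", now=5.0))   # True: node-b joins
print(cluster.heartbeat("node-a", now=8.0))   # False: already a member
print(cluster.reap(now=16.0))                 # ['node-b']
```

After `reap` reports a departed node, the management software would reschedule that node's work elsewhere, which is the "perform the tasks accordingly" step on the slide.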

Virtual Machines (cont.) • Massive scale in terms of the number of servers used • Often in the thousands, up to a million • Homogeneity in the component servers used • Often low-cost x86 servers • Smart Clouds, Virtualization and Mashup Services • Virtualization of servers is heavily used • To provide multi-tenant user services • Low-cost commodity software • Such as using Linux-based hosts • Resilient computing is required • With fault tolerance and fast disaster recovery • Geographical distribution of multiple data centers • To reduce access latency

Virtual Machines (cont.) • The cloud operation is often service oriented • To provide infrastructure, platforms, and applications • Better security and data protection are necessary • Enforced in service-level agreements • To support the above, today’s cloud systems must feature • Resource pooling and measurable services • Rapid elasticity in system reconfiguration • Broadband network access • All of the above are needed to support the IaaS, PaaS, and SaaS service models • Overall, users demand on-demand self-service in any cloud construction

Virtual Machines (cont.)

Cloud Platforms • A public cloud is built over the Internet • Can be accessed by any user who has paid for the service • Owned by service providers • Accessed by subscription • Well-known public clouds • The Google App Engine (GAE), Amazon Web Services (AWS), Microsoft Azure, IBM Smart Cloud, Salesforce Sales Cloud, etc. • These providers offer a publicly accessible remote interface for creating and managing VM instances within the system • Community clouds are a growing subclass of public clouds
