
Advanced Data Management in Cloud Computing

Explore the fundamentals of cloud computing and cloud data management, including Google's GFS, Bigtable, and MapReduce, as well as Yahoo's Hadoop. This course also covers the challenges of cloud data management and the development of distributed systems and cloud computing platforms.

Presentation Transcript


  1. Cloud Computing and Cloud Data Management. Jiaheng Lu, Renmin University of China, www.jiahenglu.net. Frontier workshop on "Advanced Data Management".

  2. Outline • Overview of cloud computing • Google cloud computing technologies: GFS, Bigtable, and MapReduce • Yahoo cloud computing technologies and Hadoop • Challenges of cloud data management

  3. Renmin University's new course "Distributed Systems and Cloud Computing" • Overview of distributed systems • Survey of distributed cloud computing technologies • Distributed cloud computing platforms • Distributed cloud computing application development

  4. Part I: Overview of Distributed Systems • Chapter 1: Introduction to distributed systems • Chapter 2: Client-server architecture • Chapter 3: Distributed objects • Chapter 4: Common Object Request Broker Architecture (CORBA)

  5. Part II: Survey of Cloud Computing • Chapter 5: Introduction to cloud computing • Chapter 6: Cloud services • Chapter 7: Comparison of cloud-related technologies • 7.1 Grid computing and cloud computing • 7.2 Utility computing and cloud computing • 7.3 Parallel and distributed computing and cloud computing • 7.4 Cluster computing and cloud computing

  6. Part III: Cloud Computing Platforms • Chapter 8: The three core technologies of the Google cloud platform • Chapter 9: Technologies of the Yahoo cloud platform • Chapter 10: Technologies of the Aneka cloud platform • Chapter 11: Technologies of the Greenplum cloud platform • Chapter 12: Technologies of the Amazon Dynamo cloud platform

  7. Part IV: Development on Cloud Computing Platforms • Chapter 13: Development on the Hadoop system • Chapter 14: Development on the HBase system • Chapter 15: Development on the Google Apps system • Chapter 16: Development on the MS Azure system • Chapter 17: Development on the Amazon EC2 system

  8. Cloud computing

  9. Why do we use cloud computing?

  10. Why do we use cloud computing? Case 1: You write a file and save it on your computer; if the computer goes down, the file is lost. Files stored in the cloud are always available and never lost.

  11. Why do we use cloud computing? Case 2: To use IE: download, install, use. To use QQ: download, install, use. To use C++: download, install, use. And so on. With cloud computing, you simply get the service from the cloud.

  12. What are the cloud and cloud computing? Cloud: resources or services delivered on demand over the Internet, with the scale and reliability of a data center.

  13. What are the cloud and cloud computing? Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

  14. Characteristics of cloud computing • Virtual: software, databases, Web servers, operating systems, storage, and networking are provided as virtual servers. • On demand: processors, memory, network bandwidth, and storage can be added or removed as needed.

  15. Types of cloud service • SaaS: Software as a Service • PaaS: Platform as a Service • IaaS: Infrastructure as a Service

  16. SaaS • A software delivery model • No hardware or software to manage • Service delivered through a browser • Customers use the service on demand • Instant scalability

  17. SaaS examples • Your current CRM package cannot handle the load, or you simply don't want to host it in-house: use a SaaS provider such as Salesforce.com. • Your email is hosted on an Exchange server in your office and it is very slow: outsource it to a Hosted Exchange service.

  18. PaaS • A platform delivery model • Platforms are built on infrastructure, which is expensive • Estimating demand is not a science! • Platform management is not fun!

  19. PaaS examples • You need to host a large file (5 MB) on your website and make it available to 35,000 users for only two months: use CloudFront from Amazon. • You want to offer storage services on your network for a large number of files and you do not have the storage capacity: use Amazon S3.
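
A minimal sketch of the second example above, assuming the boto3 SDK and already-configured AWS credentials; the bucket name and file paths are placeholders, not values from the slides:

    # Sketch: push files into Amazon S3 so storage capacity is no longer our problem.
    import boto3

    s3 = boto3.client("s3")

    # Create a bucket once (bucket names must be globally unique; this one is made up).
    s3.create_bucket(Bucket="example-course-materials")

    # Upload a local file; S3 handles the storage side without any capacity planning here.
    s3.upload_file("lecture-notes.pdf", "example-course-materials", "notes/lecture-notes.pdf")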

  20. IaaS • A computer infrastructure delivery model • A platform virtualization environment • Computing resources such as storage and processing capacity • Virtualization taken a step further

  21. IaaS examples • You want to run a batch job but don't have the infrastructure to run it in a timely manner: use Amazon EC2. • You want to host a website, but only for a few days: use Flexiscale.
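
A minimal sketch of the EC2 batch-job example, again assuming boto3 and configured credentials; the AMI id, instance type, and instance count are placeholders:

    # Sketch: rent compute for a batch job, then give it back when the job is done.
    import boto3

    ec2 = boto3.client("ec2")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder machine image
        InstanceType="c5.large",          # placeholder instance type
        MinCount=1,
        MaxCount=10,                      # scale out just for the duration of the job
    )
    instance_ids = [i["InstanceId"] for i in response["Instances"]]

    # ... run the batch job on the instances ...

    # Release the infrastructure so we pay only for the hours actually used.
    ec2.terminate_instances(InstanceIds=instance_ids)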

  22. Cloud computing and other computing techniques

  23. The 21st Century Vision Of Computing Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET) project which seeded the Internet, said: “As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’ which, like present electric and telephone utilities, will service individual homes and offices across the country.”

  24. The 21st Century Vision Of Computing Sun Microsystems co-founder Bill Joy also indicated: “It would take time until these markets mature to generate this kind of value. Predicting now which companies will capture the value is impossible. Many of them have not even been created yet.”

  25. The 21st Century Vision Of Computing

  26. Definitions: cluster, grid, cloud, utility

  27. Definitions: Utility. Utility computing is the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility.

  28. Definitions: Cluster. A computer cluster is a group of linked computers working together closely so that in many respects they form a single computer.

  29. Definitions: Grid. Grid computing is the application of several computers to a single problem at the same time, usually a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data.

  30. Definitions: Cloud. Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

  31. Grid Computing & Cloud Computing • They share a lot of commonality in intention, architecture, and technology. • They differ in programming model, business model, compute model, applications, and virtualization.

  32. Grid Computing & Cloud Computing • The problems are mostly the same: • manage large facilities; • define methods by which consumers discover, request, and use resources provided by the central facilities; • implement the often highly parallel computations that execute on those resources.

  33. Grid Computing & Cloud Computing: Virtualization • Grids do not rely on virtualization as much as clouds do; each individual organization maintains full control of its resources. • For clouds, virtualization is an indispensable ingredient of almost every deployment.

  34. Any questions or comments?

  35. Outline • Overview of cloud computing • Google cloud computing technologies: GFS, Bigtable, and MapReduce • Yahoo cloud computing technologies and Hadoop • Challenges of cloud data management

  36. Google Cloud computing techniques

  37. The Google File System

  38. The Google File System (GFS) • A scalable distributed file system for large, distributed, data-intensive applications • Multiple GFS clusters are currently deployed • The largest ones have: • 1000+ storage nodes • 300+ terabytes of disk storage • and are heavily accessed by hundreds of clients on distinct machines

  39. Introduction • Shares many of the same goals as previous distributed file systems • performance, scalability, reliability, etc. • The GFS design has been driven by four key observations of Google's application workloads and technological environment

  40. Intro: Observations 1 • 1. Component failures are the norm • Constant monitoring, error detection, fault tolerance, and automatic recovery are integral to the system • 2. Huge files (by traditional standards) • Multi-GB files are common • I/O operations and block sizes must be revisited

  41. Intro: Observations 2 • 3. Most files are mutated by appending new data • This is the focus of performance optimization and atomicity guarantees • 4. Co-designing the applications and APIs benefits the overall system by increasing flexibility

  42. The Design • A cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients

  43. The Master • Maintains all file system metadata • namespace, access control info, file-to-chunk mappings, chunk (and replica) locations, etc. • Periodically communicates with chunkservers via HeartBeat messages to give instructions and check state

  44. The Master • Helps make sophisticated chunk placement and replication decisions using global knowledge • For reading and writing, the client contacts the Master to get chunk locations, then deals directly with chunkservers • The Master is therefore not a bottleneck for reads/writes
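
A toy sketch (in Python, not Google's actual code) of the read path just described: the client asks the master only for metadata, then fetches the bytes directly from a chunkserver, so bulk data never flows through the master.

    # Toy sketch of the GFS read path: metadata from the master, data from a chunkserver.

    class Master:
        def __init__(self):
            # filename -> list of (chunk_handle, [addresses of replica chunkservers])
            self.chunk_table = {}

        def lookup(self, filename, chunk_index):
            handle, locations = self.chunk_table[filename][chunk_index]
            return handle, locations

    class Chunkserver:
        def __init__(self):
            self.chunks = {}  # chunk_handle -> bytes stored on this server

        def read(self, handle, offset, length):
            return self.chunks[handle][offset:offset + length]

    def gfs_read(master, chunkservers, filename, chunk_index, offset, length):
        # 1. Ask the master where the chunk lives (metadata only).
        handle, locations = master.lookup(filename, chunk_index)
        # 2. Read the data directly from one replica (here simply the first one listed).
        return chunkservers[locations[0]].read(handle, offset, length)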

  45. Chunkservers • Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle • The handle is assigned by the master at chunk creation • Chunk size is 64 MB • Each chunk is replicated on 3 servers by default
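
A small worked example of the fixed 64 MB chunk size: before asking the master for a chunk handle, a client turns a byte offset in a file into a chunk index plus an offset within that chunk.

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide

    def locate(byte_offset: int) -> tuple[int, int]:
        """Map a position in a file to (chunk index, offset within that chunk)."""
        return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

    # Byte 200,000,000 of a file falls in chunk 2 (the third chunk), 65,782,272 bytes in.
    print(locate(200_000_000))  # -> (2, 65782272)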

  46. Clients • Linked into applications through the file system API • Communicate with the master and chunkservers for reading and writing • Master interactions are for metadata only • Chunkserver interactions are for data • Clients cache only metadata; the data itself is too large to cache

  47. Chunk Locations • The Master does not keep a persistent record of the locations of chunks and replicas • It polls chunkservers for this information at startup and when chunkservers join or leave • It stays up to date by controlling the placement of new chunks and through the HeartBeat messages used to monitor chunkservers
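
A toy sketch of this idea: the master treats chunk locations as soft state and simply rebuilds them from whatever the chunkservers report in their HeartBeat messages (the class and method names here are made up for illustration).

    from collections import defaultdict

    class MasterLocationCache:
        """Soft-state view of which chunkservers currently hold which chunks."""

        def __init__(self):
            # chunk_handle -> set of chunkserver addresses that reported holding it
            self.locations = defaultdict(set)

        def on_heartbeat(self, chunkserver_addr, held_chunk_handles):
            # Forget what this server claimed last time, then record its current report.
            for holders in self.locations.values():
                holders.discard(chunkserver_addr)
            for handle in held_chunk_handles:
                self.locations[handle].add(chunkserver_addr)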

  48. Operation Log • Record of all critical metadata changes • Stored on Master and replicated on other machines • Defines order of concurrent operations • Also used to recover the file system state
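
A toy sketch of an operation log in the same spirit: every critical metadata change is appended (and, in a real system, replicated) before it takes effect, and the namespace can be rebuilt after a crash by replaying the log in order. The file format and record fields below are invented for illustration.

    import json

    class OperationLog:
        def __init__(self, path="oplog.jsonl"):  # placeholder log file
            self.path = path

        def append(self, record):
            # One critical metadata change per line, written before it is applied.
            with open(self.path, "a") as f:
                f.write(json.dumps(record) + "\n")

        def replay(self):
            # Rebuild the metadata (filename -> list of chunk handles) from the log.
            state = {}
            with open(self.path) as f:
                for line in f:
                    op = json.loads(line)
                    if op["type"] == "create":
                        state[op["file"]] = []
                    elif op["type"] == "add_chunk":
                        state[op["file"]].append(op["handle"])
            return state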
