
大规模数据处理 Massive Data Processing

Massive Data Processing (大规模数据处理). http://net.pku.edu.cn/~course/cs402/2014. Yan Hongfei (闫宏飞), School of Electronics Engineering and Computer Science (信息科学技术学院), Peking University, 7/1/2014. Outline: What is MDP? MDP course schedule and content. Massive Data Processing: data-intensive information processing, where the relevant datasets are too large to fit in memory and must be held on disk.


Presentation Transcript


  1. Massive Data Processing (大规模数据处理) http://net.pku.edu.cn/~course/cs402/2014 Yan Hongfei (闫宏飞), School of Electronics Engineering and Computer Science (信息科学技术学院), Peking University, 7/1/2014

  2. Outline
     • What is MDP?
     • MDP course schedule and content

  3. Massive Data Processing
     • Data-intensive information processing
        • The relevant datasets are too large to fit in memory and must be held on disk.
        • Data-intensive processing is beyond the capability of any individual machine and requires clusters.
     • Big data problems
     • Focus on MapReduce programming
     • An entry-level course
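To make "too large to fit in memory and must be held on disk" concrete, here is a minimal Python sketch of streaming over a file instead of loading it whole. The function names `streaming_sum` and `write_sample` and the one-integer-per-line file layout are assumptions made for illustration; they do not come from the course material.

```python
import os
import tempfile


def write_sample(path, n):
    """Write the integers 1..n, one per line (hypothetical sample data)."""
    with open(path, "w", encoding="utf-8") as f:
        for i in range(1, n + 1):
            f.write(f"{i}\n")


def streaming_sum(path):
    """Sum one integer per line while keeping only the current line in memory."""
    total = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object yields lines lazily, one at a time
            line = line.strip()
            if line:  # skip blank lines
                total += int(line)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "numbers.txt")
        write_sample(sample, 1000)
        print(streaming_sum(sample))
```

Memory use stays roughly constant no matter how large the file grows, because only one line is held at a time. This covers only the single-machine half of the problem; when one disk or one CPU is no longer enough, the work has to be spread over a cluster, which is where MapReduce comes in.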

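  4. Characteristics of big data
     • Volume (量大) refers to its complexity
        • Many small datasets with complex structure are considered big data even though they occupy little physical space.
        • A large database that occupies a lot of storage but has a simple structure is not considered big data.
     • Variety (样多) refers to the diversity of structures
        • For example: mixed-structure, semi-structured, and unstructured data such as text, audio, and video.
     • Velocity (速度) refers to the rate at which data is generated and analyzed
        • Some applications require real-time or near-real-time processing.
     • Veracity (真实性), Value (价值)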

  5. What is MapReduce?
     • Programming model for expressing distributed computations at a massive scale
     • Execution framework for organizing and performing such computations
     • Open-source implementation called Hadoop
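The programming model can be sketched with the classic word-count example. The Python below simulates the map, shuffle, and reduce steps in a single process; it is not the Hadoop API, and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a dict of doc_id -> text."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"d1": "big data is big", "d2": "data on disk"}
    print(word_count(docs))
```

In a real execution framework the programmer typically supplies only the map and reduce functions. The framework runs many map tasks in parallel across the cluster, performs the shuffle over the network, and handles scheduling and machine failures.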

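  6. Course organization and schedule
     • Class time
        • Tuesdays and Thursdays (starting at 8:30), Teaching Building 3, Room 201 (三教201)
        • Instructors: Yan Hongfei (闫宏飞), Peng Bo (彭博)
        • Teaching assistants: Li Sui (李睢), Jiang Han (江翰)
     • Teaching components
        • Lectures, assignments, lab guidance, Q&A
     • Grading
        • The course is assignment-centered; grades are based on assignments and reports
     • Course website
        • Web http://net.pku.edu.cn/~course/cs402/2014
        • Group http://groups.google.com/group/cs402pku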

  7. Textbooks
     • [Lin] Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, 2013.1.
     • [Tom] Tom White, Hadoop: The Definitive Guide, O'Reilly, 3rd ed., 2012.5.

  8. This schedule is tentative and subject to change without notice

  9. Course registration
     • Each student registers individually, using a web browser
     • http://net.pku.edu.cn/~course/cs402/2014/regcourse.html

  10. Thank You! Q&A
