
Mining Massive Datasets: Course Overview

This course provides an overview of mining massive datasets. It covers data-intensive scalable computing, cloud computing, MapReduce and Hadoop, data mining, and machine learning. Prerequisites are data structures, algorithm design and analysis, linear algebra, probability theory, and programming in Java and C++.

Presentation Transcript


  1. Mining Massive Datasets, Lecture 0: Course Overview
     Wu-Jun Li, Department of Computer Science and Engineering, Shanghai Jiao Tong University

  2. General Information
     • Instructor: Wu-Jun Li (李武军)
        • Email: liwujun@cs.sjtu.edu.cn
        • Homepage: http://www.cs.sjtu.edu.cn/~liwujun
        • Office: Rm 3-537, SEIEE Building
        • Office Hours: Tue 14:00-15:00
     • Course web site: http://www.cs.sjtu.edu.cn/~liwujun/course/mmds.html
     • Teaching Assistant: Zhi-Qin Yu (余志琴)
        • Email: xiaoyu199175@gmail.com
        • Office Hours: TBD; Rm 3-503, SEIEE Building
     • Time and Venue: Mon 14:00-15:40; Wed 10:00-11:40; Fri 08:00-09:40; Rm 105, Dong Shang Yuan (东上院 105)

  3. Textbook
     Anand Rajaraman and Jeffrey D. Ullman. Mining of Massive Datasets. Cambridge University Press, 2011. You can download it from the book website (http://i.stanford.edu/~ullman/mmds.html).

  4. Reference Books
     • Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, Second Edition, 2006. (The English reprint edition can be bought through China-Pub.)
     • Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
     • Chuck Lam. Hadoop in Action. Manning Publications, First Edition, 2010.
     • Jingyu Zhou, Wu-Jun Li, and Minyi Guo. Feitian Open Platform Programming Guide: Alibaba Cloud Computing in Practice (《飞天开放平台编程指南-阿里云计算的实践》). Publishing House of Electronics Industry, March 2013.

  5. Course Topics
     • Data-Intensive Scalable Computing (DISC)
        • Cloud Computing
        • MapReduce and Hadoop
     • Data Mining and Machine Learning
        • Basics: supervised learning; unsupervised learning; matrix factorization
        • Large-scale (distributed) implementations with Hadoop
     • Data-Intensive Applications
        • Search, link analysis, recommender systems, mining data streams, advertising on the Web

  6. Prerequisites
     • Data structures
     • Design and analysis of algorithms
     • Linear algebra
     • Probability theory
     • Programming languages: Java, C++

  7. Grading Scheme
     • Class attendance (10%)
     • Homework (20%)
     • Exam (40%): final exam (40%)
     • Project (30%): groups of 3 students

  8. Late Assignments
     Assignments turned in late will be penalized 20% per late day.

  9. Academic Honor Code
     Honesty and integrity are central to academic work. All your submitted assignments must be entirely your own (or your own group's). Any student found cheating or plagiarizing will receive a final score of zero for this course.

  10. Questions?