The upcoming midterm exam is scheduled for March 13, 2014, from 11:00 AM to 12:20 PM in 101 Davis. Prepare for a closed-book exam covering critical topics from "Doing Data Science," including statistical inference, exploratory data analysis (EDA), model fitting, and probability distributions. Key algorithmic concepts such as linear regression, K-NN, and K-means will be examined. Understand HDFS and MapReduce fundamentals as well. The exam format includes five questions of varied weight and will assess your grasp of key concepts and their applications.
General Information • Date: 3/13/2014 • Time: 11:00 AM to 12:20 PM • Location: 101 Davis • Closed book, closed notes
Topics • Doing Data Science text: Ch. 2 • Statistical inference, exploratory data analysis, and data science process • Population and samples, sample sizes • Data model • Statistical model • Algorithms • Fitting a model • Probability distributions • EDA: plots, graphs and summaries • One question
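As a study aid for the population/sample and EDA-summary topics, here is a minimal sketch using only the Python standard library. The population is simulated (a normal distribution with mean 50 and standard deviation 10, chosen purely for illustration), and the helper name `summarize` is hypothetical, not something from the text.

```python
import random
import statistics

def summarize(values):
    """Return a few EDA summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
    }

# Assumption: a simulated population, not real data.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A sample is a subset drawn from the population; its summaries
# estimate the population's summaries, with some sampling error.
sample = rng.sample(population, 500)

pop_summary = summarize(population)
sample_summary = summarize(sample)
print("population mean:", round(pop_summary["mean"], 2))
print("sample mean:    ", round(sample_summary["mean"], 2))
```

Rerunning with a smaller sample size (say 20 instead of 500) typically makes the sample mean wander further from the population mean, which is the intuition behind the sample-size topic.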
Topics (contd.) • Doing Data Science: Ch. 3 • Comparison of algorithms and statistical models • Three basic algorithms • Linear regression • K-NN (supervised classification) • K-means (unsupervised clustering) • Intuitive idea • Algorithmic steps for each of these algorithms • Representative examples • Why and when would you use each of these algorithms? • 2 questions
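To make the "algorithmic steps" concrete, here is a rough sketch of two of the three algorithms in plain Python: a one-predictor least-squares line fit and K-means in its usual iterative form (Lloyd's algorithm). The function names `fit_line` and `kmeans`, the toy data, and the fixed random seed are illustrative assumptions, not the textbook's code.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=20, seed=0):
    """K-means on 2-D points: assign each point to its nearest
    centroid, recompute centroids, repeat until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: pick k initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # step 2: assign every point to the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # step 3: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # step 4: stop when assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

Linear regression needs a numeric outcome to predict, while K-means needs no labels at all; that difference is a good starting point for the "why and when" question.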
Topics: Lin & Dyer’s text • Hadoop: HDFS as in Chapter 2 • MapReduce: MR data-flow including combiners and partitioners • 2 questions
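The MapReduce data flow (map, combine, partition, shuffle, reduce) can be traced with a small in-memory word-count simulation. This is only a sketch for studying the flow: real Hadoop jobs are written against the Hadoop API, and the function names and the byte-sum hash used here are assumptions made for illustration.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def combiner(pairs):
    """Local aggregation of one mapper's output, before the shuffle.
    It cuts down the number of pairs sent over the network."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partitioner(key, num_reducers):
    """Decide which reducer receives a key. A simple deterministic
    hash (illustrative) so all values of a key go to one reducer."""
    return sum(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    """Sum all partial counts for one key."""
    return key, sum(values)

def run_job(splits, num_reducers=2):
    # One mapper (plus its combiner) per input split.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            # Shuffle: route each pair to its reducer's partition.
            partitions[partitioner(key, num_reducers)][key].append(value)
    output = {}
    for part in partitions:
        for key in sorted(part):  # reducers see their keys in sorted order
            word, total = reducer(key, part[key])
            output[word] = total
    return output

counts = run_job([["a b a"], ["b c", "a"]])
print(counts)
```

Because word count's reduce step is a sum (associative and commutative), the same logic works as a combiner; that does not hold for every reducer, for example one that computes a mean directly.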
Bloomberg Tech Talk on ML • Building Intelligent solution • See the presentation • Up to slide#16 (No NLP or MT) • 1 question
Format • 5 questions, not equally weighted • HDFS: direct • Doing Data Science Ch. 2: direct • MR and K-NN: slightly tricky • K-means: direct • Questions will test your understanding of the concepts • Example: what is the effect of a large K vs. a small K in K-NN?
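The example question about large versus small K can be explored with a tiny K-NN classifier. The sketch below uses made-up 2-D points, including one deliberately mislabeled point, and a hypothetical helper `knn_predict`; it is meant to build intuition, not to serve as a model answer.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest
    training points (squared Euclidean distance)."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up data: four "A" points near the origin, six "B" points far away,
# and one noisy "B" label sitting inside the "A" region.
train = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
    ((0.5, 0.5), "B"),  # mislabeled / noisy point
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"),
    ((6, 6), "B"), ((7, 7), "B"), ((6, 7), "B"),
]
query = (0.6, 0.6)  # clearly inside the "A" region

small_k = knn_predict(train, query, k=1)   # follows the single noisy neighbor
medium_k = knn_predict(train, query, k=3)  # noise is outvoted by real neighbors
large_k = knn_predict(train, query, k=11)  # every point votes: overall majority wins
print(small_k, medium_k, large_k)
```

A very small K lets one noisy neighbor decide the label (high variance), while a K close to the training-set size ignores locality and returns the overall majority class (high bias). A moderate K sits between the two.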
Seating for the exam • Exam paper format: each question is followed by space for the answer • Designated seating: the plan will be announced