CC5212-1 Procesamiento Masivo de Datos (Massive Data Processing), Otoño (Autumn) 2014

CC5212-1 Procesamiento Masivo de Datos, Otoño 2014. Aidan Hogan (aidhog@gmail.com). Wrap-Up. Course Marking: 45% for Weekly Labs (~3% a lab!), 35% for Final Exam, 20% for Small Class Project. Final Exam (35%): next Tuesday, 9am, room to be confirmed.

Presentation Transcript


  1. CC5212-1 Procesamiento Masivo de Datos, Otoño 2014
     Aidan Hogan, aidhog@gmail.com
     Wrap-Up

  2. Course Marking
     • 45% for Weekly Labs (~3% a lab!)
     • 35% for Final Exam
     • 20% for Small Class Project

  3. Final Exam (35%)
     • Next Tuesday, 9am
     • Room to be confirmed
     • Goal: test your understanding of concepts
     • Coding is covered by the labs/project
     • No syntax-writing questions!
     • But there will be design and syntax-reading questions
     • Max. three hours (it won't take that long!)
     • Not marking you on English
     • If really stuck, write in Spanish!
     • Four questions (marked on best three) …

  4. The following is not a legally binding agreement. It is just a helpful guide to what's important.

  5. Q1: Distributed Systems

  6. Q1: Distributed Systems (Slides)
     Slides:
     • [02] MDP-02-Intro-Dist-Sys-20140317.pptx
     • [03] MDP-03-Fallacies-CAP-20140324.pptx
     • [04] MDP-04-Consensus-Paxos-20140331.pptx
     e.g., [02, S. 3–7] = slide deck [02], slides 3 to 7
     Names per the homepage: http://aidanhogan.com/teaching/cc5212-1/
     Slides indicated are only a guide!

  7. Q1: Distributed Systems (Topics)
     Possible topics:
     • Advantages/disadvantages of a distributed system [02, S. 3–7]
     • Five distributed-system design goals [02, S. 9–11]
     • Distributed architectures (P2P vs. client–server, fat/thin clients, n-tier, etc.) [02, S. 14–31]
     • Java RMI (high-level) [02, S. 47–59]
     • Eight fallacies of distributed computing [03, S. 12–20]
     • Consensus basics (fail-stop vs. Byzantine, synchronous vs. asynchronous, goals) [04, S. 4–22]
     • Consensus protocols (2PC, 3PC, Paxos) [04, S. 25–55]
     • The CAP theorem will appear in Q4
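For the consensus-protocol topic, the voting logic of two-phase commit (2PC) can be sketched in a few lines. This is a minimal, failure-free sketch assuming in-process participants that never crash or time out; the `Participant` class and `two_phase_commit` function are hypothetical names, not taken from the slides. What it leaves out (participants that voted yes must block if the coordinator fails before announcing its decision) is part of what motivates 3PC and Paxos.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant: votes in phase 1, applies the outcome in phase 2."""
    name: str
    can_commit: bool
    state: str = "init"

    def prepare(self) -> bool:
        # Phase 1 (voting): a "yes" vote promises the participant is ready to commit
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision: str) -> None:
        # Phase 2 (completion): apply the coordinator's global decision
        self.state = decision


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1: the coordinator collects a vote from every participant
    votes = [p.prepare() for p in participants]
    # Commit only if the vote is unanimously "yes"; a single "no" aborts
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: broadcast the decision to all participants
    for p in participants:
        p.finish(decision)
    return decision


print(two_phase_commit([Participant("A", True), Participant("B", False)]))  # aborted
```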

  8. GFS (HDFS) / MapReduce (Hadoop)

  9. Q2: GFS (HDFS) / MapReduce (Hadoop)
     Slides:
     • [05] MDP-05-DFS-MapReduce-20140407.pptx
     • [06] MDP-06-Hadoop-20140414.pptx
     • [07] MDP-07-Pig-20140421.pptx
     e.g., [02, S. 3–7] = slide deck [02], slides 3 to 7
     Names per the homepage: http://aidanhogan.com/teaching/cc5212-1/
     Slides indicated are only a guide!

  10. Q2: GFS (HDFS) / MapReduce (Hadoop)
      Possible topics:
      • Google File System (reads, writes, fault tolerance) [05, S. 11–27]
      • MapReduce (incl. a design question) [05, S. 36–46; 06, S. 11–16]
      • HDFS/Hadoop (architecture) [06, S. 18–20]
      • Pig (high-level: give the result for a given input and script) [07]
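For the MapReduce design question, it can help to trace the map, shuffle and reduce phases by hand. Below is a single-process sketch of the classic word-count job, assuming simple whitespace tokenisation; `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names, not the Hadoop API, which runs the same three phases across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in the document
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Reduce: sum all the partial counts for one word
    return word, sum(counts)


def run_mapreduce(docs: dict[str, str]) -> dict[str, int]:
    # Shuffle: group intermediate values by key, as the framework would
    groups: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    # Reduce each key group, visiting keys in sorted order (as after the sort phase)
    return dict(reduce_fn(k, groups[k]) for k in sorted(groups))


print(run_mapreduce({"d1": "a rose is a rose", "d2": "a daisy"}))
# {'a': 3, 'daisy': 1, 'is': 1, 'rose': 2}
```

In Hadoop, a combiner that pre-sums counts on the map side would reduce shuffle traffic for this job, since addition is associative and commutative.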

  11. Information Retrieval

  12. Q3: Information Retrieval (Slides)
      Slides:
      • [08] MDP-08-Search-20140428.pptx
      • [09] MDP-09-Ranking-20140505.pptx
      e.g., [02, S. 3–7] = slide deck [02], slides 3 to 7
      Names per the homepage: http://aidanhogan.com/teaching/cc5212-1/
      Slides indicated are only a guide!

  13. Q3: Information Retrieval (Topics)
      Possible topics:
      • Crawling (high-level: multi-threading, (D)DoS, robots.txt, sitemaps, distribution, bow-tie) [08, S. 18–32]
      • Inverted indexes (data structure, normalisation, Heaps' law, Zipf's law, Elias encoding, etc.) [08, S. 36–51]
      • Ranking (relevance vs. importance, TF-IDF, vector space model, etc.) [09, S. 9–31]
      • PageRank (concept, random surfer, calculation) [09, S. 35–56]
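The PageRank calculation can be checked with a short power-iteration sketch of the random-surfer model. It assumes a damping factor of 0.85 (a common choice; the slides' value may differ) and assumes that a dangling page (no out-links) sends the surfer to a uniformly random page; `pagerank` and its parameters are hypothetical names.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Power-iteration PageRank under the random-surfer model.

    links maps each page to the pages it links to; d is the damping factor,
    i.e. the probability that the surfer follows a link instead of jumping.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iters):
        # Every page receives its share of the random jump: (1 - d) / n
        new = {v: (1.0 - d) / n for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                # Follow a link: u's rank is split equally among its out-links
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page (assumption): jump to any page uniformly
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({page: round(r, 3) for page, r in ranks.items()})  # c scores highest here
```

Because each iteration hands out exactly (1 - d) in random jumps plus d times the current total, the scores keep summing to 1, which is a handy sanity check when working by hand.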

  14. Bring a Calculator!
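The TF-IDF and vector-space-model calculations listed under Q3 are the kind of arithmetic a calculator helps with, and a small script can double-check answers worked by hand. This sketch assumes raw term frequency, idf = log10(N / df) and whitespace tokenisation; TF-IDF has many variants and the slides may use a different one. The function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """TF-IDF weights per document, assuming tf = raw count and idf = log10(N / df)."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in
    df = Counter(term for terms in tokenised for term in set(terms))
    vectors = []
    for terms in tokenised:
        tf = Counter(terms)
        vectors.append({t: tf[t] * math.log10(n / df[t]) for t in tf})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Vector space model: similarity is the cosine of the angle between two vectors
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vecs = tf_idf_vectors(["big data big ideas", "big graphs", "data streams"])
print(round(cosine(vecs[0], vecs[1]), 3))
```

Note that a term occurring in every document gets idf = log10(1) = 0, so it contributes nothing to ranking.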

  15. NoSQL and Querying

  16. Q4: NoSQL and Querying (Slides)
      Slides:
      • [03] MDP-03-Fallacies-CAP-20140324.pptx
      • [10] MDP-10-Intro-to-NoSQL-20140512.pptx
      • [11] MDP-11-BigTable+Cassandra.pptx
      e.g., [02, S. 3–7] = slide deck [02], slides 3 to 7
      Names per the homepage: http://aidanhogan.com/teaching/cc5212-1/
      Slides indicated are only a guide!

  17. Q4: NoSQL and Querying (Topics)
      Possible topics:
      • CAP theorem [03, S. 23–39] (note: out of order)
      • The database landscape [10, S. 10]
      • Key–value stores (data model, operations, distribution, consistent hashing, replication, Dynamo, Merkle trees) [10, S. 18–38]
      • Document stores (high-level) [10, S. 44–45]
      • Tabular/column-family stores (data model, Bigtable, sorting, tablets, column families, SSTables, writes, reads, compactions, hierarchy, Bloom filters) [11, S. 17–36]
      • Graph databases (high-level) [11, S. 45–52]
      • Cassandra (high-level) [11, S. 61–69]
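Consistent hashing and replication in key–value stores can be illustrated with a minimal ring. In this sketch each node and key is hashed onto the same ring (MD5 is an assumed choice), a key belongs to the first node clockwise from its position, and replicas go to the next nodes clockwise, in the spirit of Dynamo's preference list. Dynamo also assigns several virtual nodes per machine; this sketch omits that. `ConsistentHashRing` and `preference_list` are hypothetical names.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Map any string onto the ring [0, 2**128) using MD5 (an assumed hash choice)
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Minimal consistent-hashing ring: a key belongs to the first node clockwise."""

    def __init__(self, nodes: list[str]) -> None:
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str, replicas: int = 1) -> list[str]:
        # First node at or after the key's position; index len(ring) wraps to 0
        start = bisect.bisect_left(self._positions, ring_position(key))
        count = min(replicas, len(self._ring))
        # Replicate on the next `count` distinct nodes walking clockwise
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.preference_list("user:42", replicas=2))
```

The payoff is that adding or removing a node only moves the keys in the arc next to it; every other key keeps its owner.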

  18. Final Exam (35%) Recap
      • Next Tuesday, 9am
      • Room to be confirmed
      • Goal: test your understanding of concepts
      • Coding is covered by the labs/project
      • No syntax-writing questions!
      • But there will be design and syntax-reading questions
      • Max. three hours (it won't take that long!)
      • Not marking you on English
      • If really stuck, write in Spanish!
      • Four questions (marked on best three) …
