
Student Workshop Readout






Presentation Transcript


  1. Student Workshop Readout Big Data and Cloud Read By: Parag Deshmukh NetApp Confidential - Internal Use Only

  2. Schedule

  3. Sample Research Areas in Big Data and Cloud
  • Warehouse scale computing
  • Big Data Algorithms and Data Structures by Giridhar Nag Yasa
  • Resource management at scale
  • Cloud Resource Management using Machine Learning by P C Nagesh
  • Issues in multi-tenant environments
  • Security in Cloud by Srinivasan Narayanamurthy
  • Reliable computing with unreliable components
  • Reliability in Cloud by Ranjit Kumar

  4. Brainstorming Outcome

  5. Group 1 (Sai Susarla) Sunil Kumar (IISc), Sandeep Kumar (IISc), Shashank Gupta (IIT Bombay)
  • The lemma model used for index building has grown beyond memory size
  • Tasks are uneven in their complexities
  • Problem: distribute work for even utilization of the cluster while handling a lemma model that is larger than memory
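The even-utilization goal for uneven tasks can be sketched with a classic greedy heuristic: sort tasks by estimated cost and always hand the next one to the least-loaded worker. This is a minimal illustration, not the group's actual method; the task costs and worker count below are invented for the example.

```python
# Sketch: longest-processing-time (LPT) greedy assignment of uneven tasks
# to workers, aiming for even cluster utilization. Costs are illustrative.
import heapq

def assign_tasks(task_costs, n_workers):
    """Assign tasks (cost estimates) to workers, largest first,
    always to the currently least-loaded worker."""
    loads = [(0, w) for w in range(n_workers)]  # min-heap of (load, worker)
    heapq.heapify(loads)
    assignment = {w: [] for w in range(n_workers)}
    for cost in sorted(task_costs, reverse=True):
        load, w = heapq.heappop(loads)
        assignment[w].append(cost)
        heapq.heappush(loads, (load + cost, w))
    return assignment

tasks = [9, 7, 6, 5, 4, 3, 2]  # uneven task complexities
plan = assign_tasks(tasks, 3)
print({w: sum(c) for w, c in plan.items()})
```

Handling a model larger than memory would additionally require partitioning the lemma model itself (e.g. by key range) so each worker only loads the shard its tasks touch; that partitioning is the harder, open part of the problem the group poses.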

  6. Group 2 (Vipul Mathur) Vineet P (ATG), Lavanya T (IISc), B. Ramakrishna (IIT Delhi), Nikhil Krishnan (IISc), S. Sree Vivek (IIT Chennai)
  • How do we secure inline-deduped uploads?
  • A scheme for making sure a user actually has the data before deduplicating uploads
  • Data redundancy: dedup, replication and erasure coding
  • Can we find the appropriate level of redundancy to feed dedup vs. replication vs. erasure coding mechanisms?
  • Accessing petabytes of data at small block granularity is inefficient
  • Can we learn the "appropriate" block size for a file using regression and change it dynamically?

  7. Group 3 (Ajay Bakre) Birenjith Sasidharan (IISc), Manjeet Dahiya (IIT Delhi), Priyanka Kumar (IIT Patna)
  • "Aadhaar" dedup problem
  • What data structures can be used to avoid perturbations in the fingerprint store?
  • What should the layout of the data store be, and/or how should the dedup algorithm change, so that the dedup algorithm has a deterministic response time irrespective of repository size?
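One common way to get a lookup time that does not grow with repository size is to partition the fingerprint store by a fixed-width hash prefix, so every dedup query probes exactly one bucket. The sketch below illustrates that idea only; the bucket count, hash choice, and class shape are assumptions, not the group's design.

```python
# Sketch: a fingerprint store partitioned by hash prefix, so a dedup
# lookup always touches exactly one bucket regardless of repository size.
import hashlib

class FingerprintStore:
    def __init__(self, bucket_bits=16):
        self.bucket_bits = bucket_bits
        self.buckets = [set() for _ in range(1 << bucket_bits)]

    def _bucket(self, fp: bytes) -> set:
        # The first bucket_bits of the fingerprint select the bucket:
        # one probe, independent of how many fingerprints are stored.
        idx = int.from_bytes(fp[:4], "big") >> (32 - self.bucket_bits)
        return self.buckets[idx]

    def insert(self, block: bytes) -> bool:
        """Return True if the block was new (stored), False if deduped."""
        fp = hashlib.sha256(block).digest()
        bucket = self._bucket(fp)
        if fp in bucket:
            return False
        bucket.add(fp)
        return True

store = FingerprintStore()
print(store.insert(b"hello"))   # new block
print(store.insert(b"hello"))   # duplicate, deduped
```

At Aadhaar scale the buckets would live on disk, and keeping each bucket's on-disk size bounded (so one probe means one bounded I/O) is exactly the layout question the slide raises.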

  8. Group 4 (Ameya Usgaonkar) N. Prakash (IISc), V. Lalitha (IISc), Priyanka Singla (IISc)
  • De-duplication and RAIDing
  • Both operate at the level of 4K blocks; is there any advantage to designing them jointly?

  9.

  10.

  11. Table arrangement for breakout session

  12. Workshop Readout (Three Ideas) Table 2 Students: Nikhil, Vivek, Lavanya, Ramakrishna NetApp: Vineet, Vipul

  13. A: Secure Deduped Uploads
  • How do we secure inline-deduped uploads?
  • A scheme for making sure a user actually has the data before deduplicating uploads
  • Insecure scheme: User sends H1(D) to Server; Server matches and dedups. If a malicious person gets hold of H1(D), they can ask for D.
  • Secure scheme: Server generates a nonce r; User sends H2(H1(D), r) to Server; Server matches and dedups. This is secure, as H1(D) is never sent over the network.
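The nonce-based scheme above can be sketched as a small challenge-response exchange. H1 and H2 are modeled here as SHA-256 (an assumption; the slide does not name the hash functions), and the server index, function names, and wire format are all illustrative.

```python
# Sketch of the slide's nonce scheme: the server challenges with a fresh
# nonce r, and the client answers H2(H1(D), r), which an eavesdropper who
# never saw H1(D) on the wire cannot forge.
import hashlib
import os

def h1(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def h2(digest: bytes, nonce: bytes) -> bytes:
    return hashlib.sha256(digest + nonce).digest()

# Server side: stores H1(D) for blocks that are already uploaded.
server_index = {h1(b"some stored block")}

def server_challenge() -> bytes:
    return os.urandom(16)  # fresh nonce r per upload attempt

def server_verify(response: bytes, nonce: bytes) -> bool:
    # Server recomputes H2(fp, r) over its candidate fingerprints.
    return any(h2(fp, nonce) == response for fp in server_index)

# Client side: derives H1(D) from the data it holds and answers the
# challenge; H1(D) itself never crosses the network.
r = server_challenge()
response = h2(h1(b"some stored block"), r)
print(server_verify(response, r))
```

Because r is fresh per attempt, a captured response cannot be replayed against a later challenge, which is what makes the match-then-dedup step safe.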

  14. B: Data Redundancy: Dedup, Replication and Erasure Coding
  • Considerations:
  • Dedup removes redundancy in data; replication for performance adds redundancy
  • Replication for reliability vs. erasure coding
  • Can we find the appropriate level of redundancy to feed dedup vs. replication vs. erasure coding mechanisms?
  • Ideas:
  • Learn or specify an activity level for m users
  • No dedup, possible replication for active data
  • Heavy dedup for cold data
  • Erasure coding for reliability is not needed if performance replicas also provide reliability, or if non-deduped copies exist
  • Summary: derive and use a function f(m) to select the appropriate redundancy level, taking into account dedup, replication and erasure coding.
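The proposed f(m) can be sketched as a policy function from an activity level to a redundancy configuration. The thresholds, replica counts, and erasure-coding parameters below are illustrative assumptions standing in for whatever a learned f(m) would produce.

```python
# Sketch of f(m): map an activity level in [0, 1] to a redundancy policy
# mixing dedup, replication, and erasure coding, per the slide's ideas.
def f(activity: float) -> dict:
    if activity > 0.7:
        # Hot data: no dedup, replicate for performance; the performance
        # replicas also provide reliability, so no erasure coding needed.
        return {"dedup": False, "replicas": 3, "erasure_coding": None}
    if activity > 0.2:
        # Warm data: dedup plus one extra replica.
        return {"dedup": True, "replicas": 2, "erasure_coding": None}
    # Cold data: heavy dedup; reliability via an erasure code with
    # (data, parity) fragments.
    return {"dedup": True, "replicas": 1, "erasure_coding": (10, 4)}

for m in (0.9, 0.5, 0.05):
    print(m, f(m))
```

In practice the activity level would be learned per dataset (or specified per user, as the slide suggests) and the policy re-evaluated as data cools, migrating blocks from replicated to deduped, erasure-coded storage.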

  15. C: Variable Block Sizes
  • Accessing petabytes of data at small block granularity is inefficient.
  • Can we learn the "appropriate" block size for a file using regression based on:
  • Access patterns: sequential vs. random
  • File sizes
  • Duplication factor
  • Track changes in patterns over time
  • Vary block size to adapt
  • Reliability methods affected: "block checksums"
  • Considerations:
  • Can a single file have variable block sizes?
  • Is it possible to change block sizes over time?
  • Use multiples of a single block size.
  • Start with a prediction based on the user's profile.
  • Hot vs. cold data should have different block sizes.
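The "regression over access pattern, file size, and duplication factor" idea, combined with the "multiples of a single block size" consideration, can be sketched as below. The linear weights are placeholders where a fitted regression model would go; the feature scaling, clamping range, and 4K base are assumptions for illustration.

```python
# Sketch: choose a per-file block size as a power-of-two multiple of a
# 4K base block, from the features listed on the slide. The weights are
# illustrative stand-ins for a fitted regression.
import math

BASE = 4096  # single block size; larger sizes are multiples of it

def predict_block_size(seq_ratio, file_size, dup_factor):
    """seq_ratio: fraction of sequential accesses in [0, 1];
    file_size in bytes; dup_factor >= 1 (higher = more duplicate data)."""
    # Toy linear model over log-scaled features (weights are assumptions):
    # sequential access and large files push toward big blocks, high
    # duplication pushes toward small blocks (better dedup at 4K).
    score = (4.0 * seq_ratio
             + 0.3 * math.log2(max(file_size, BASE) / BASE)
             - 1.0 * math.log2(dup_factor))
    # Clamp to a power-of-two multiple of BASE between 4K and 1M.
    exponent = min(max(round(score), 0), 8)
    return BASE << exponent

# Large, sequentially read file with little duplication -> big blocks.
print(predict_block_size(0.9, 1 << 30, 1.0))
# Small, randomly accessed, highly duplicated file -> base 4K blocks.
print(predict_block_size(0.1, 64 * 1024, 8.0))
```

Restricting predictions to power-of-two multiples of the base block keeps block checksums and on-disk layout simple, which speaks to the slide's "reliability methods affected" and "use multiples of a single block size" points.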

  16.
