2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010)

April 26, 2010, Raleigh, NC, USA. In association with the 19th Annual World Wide Web Conference (WWW2010).

Albert_Lan

Presentation Transcript


  1. 2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010). April 26, 2010, Raleigh, NC, USA. In association with the 19th Annual World Wide Web Conference (WWW2010).

  2. Making Sense of Mountains of Data
     Embedded Analytics, Dashboards, Mashups, Financial Planning, Scorecards, Billions of mobile devices
     Search, Online Transaction Processing System, Feedback/Action
     Structured
     • Clickstream, CRM
     • Claim data (text, picture, video)
     • Call data records
     • Location Tracking (GPS)
     • iPhone, Vehicle Use Data
     • $ Trans tracking (across borders & IP providers)
     Continuous arrival of high-volume information (evolving, highly variant; structured, semi-structured, unstructured)
     Semi-/Un-structured: Auto/Cross Correlation Analytics, Predictive Analytics
     Semi-structured
     • Feeds: Census Bureau Data; Market Data, Weather Data; Sensor data
     Semi-/Un-structured: Deep & Wide Analytics. Fine grained: individual product and customer at a time and place. Petabytes -> Exabytes
     • Web Data (for search)
     • Web Buzz data (for reputation analysis)

  3. Massive Data Analytic Platforms
     [Diagram labels: M, C, R, Partition/Sort]
     • Google: Original MapReduce implementation
     • Microsoft: Dryad
     • Yahoo!, Facebook, and many others: Hadoop
     • Ecosystems: Hive, Pig, Jaql, Zookeeper
     • Alternatives to Map/Reduce, e.g. Pregel
     • “Easy” parallelism
     • Scalability
     • Fault-Tolerance
     • Elastic
     • Flexibility
     • Cost / Performance
     • 1000’s processors
     • Petabytes of data
     • …and growing

  4. Chairpeople Perspective
     • Other parallel systems technology and customers
     • Parallel Database: enterprise data warehousing
     • Parallel ETL (extraction, transformation, load)
     • Search and text analytics
     • Hadoop and related technologies
     • Finance, Telco, Healthcare, Retail, Government, …

  5. Questions Posed in Call For Papers
     • What kinds of problems are people trying to solve?
     • How are existing massive-scaleout platforms used, and what extensions would be helpful?
     • Other kinds of platforms for different problems?
     • How to integrate with existing environments such as data warehouses?
     • Challenges in managing massive datasets?
     • Legal/moral challenges associated with mining these data sets?

  6. Agenda (morning)
     9:00 - 10:30: Session 1
     • Introduction and Welcome
     • Invited Talk: "Hadoop: An Industry Perspective". Dr. Amr Awadallah, CTO, VP-Engineering, Cloudera
     10:30 - 11:00: Coffee Break*
     11:00 - 12:30: Session 2
     • Distributed Indexing of Web Scale Datasets for the Cloud. Ioannis Konstantinou, Evangelos Angelou, Dimitrios Tsoumakos, Nectarios Koziris; National Technical University of Athens
     • Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce. Joos-Hendrik Böse (1), Artur Andrzejak (2), Mikael Högqvist (2); (1) Intl. Comp. Sci. Institute, (2) Zuse Institute Berlin (ZIB)
     • Efficient Updates for a Shared Nothing Analytics Platform. Katerina Doka (3), Dimitrios Tsoumakos (4), Nectarios Koziris (3); (3) National Technical University of Athens, Greece, (4) University of Cyprus
     12:30 - 1:30: Lunch*

  7. Agenda (afternoon)
     1:30 - 3:30: Session 3
     • Invited Talk: "Large Scale Applications on Hadoop in Yahoo". Dr. Vijay Narayanan, Yahoo! Labs Silicon Valley
     • Extracting User Profiles from Large Scale Data. Michal Shmueli-Scheuer, Haggai Roitman, David Carmel, Yosi Mass, David Konopnicki; IBM Research, Haifa
     • A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids. Sudha Sadasivam, G. Baktavatchalam; PSG College of Technology
     3:30 - 4:00: Coffee Break*
     4:00 - 5:30: Session 4
     • Towards Scalable RDF Graph Analytics on MapReduce. Padmashree Ravindra, Vikas Deshpande, Kemafor Anyanwu; North Carolina State University
     • SPARQL Basic Graph Pattern Processing with Iterative MapReduce. Jaeseok Myung, Jongheum Yeon, Sang-goo Lee; Seoul National University
     • Parallelizing Random Walk with Restart for Large-Scale Query Recommendation. Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng; National Chiao Tung University, Hsinchu, Taiwan

  8. Workshop Chairs
     • Ullas Nambiar, IBM India Research Lab, New Delhi, India
     • John McPherson, IBM Almaden Research Center, USA
     • David Konopnicki, IBM Haifa Research Lab, Israel
     Steering Committee
     • Rakesh Agrawal, Microsoft Search Labs, Mountain View, CA, USA
     • Alon Halevy, Google Inc., Mountain View, CA, USA
     Invited Speakers
     • Amr Awadallah, CTO, VP-Engineering, Cloudera, "Hadoop: An Industry Perspective"
     • Vijay Narayanan, Yahoo! Labs Silicon Valley, "Large Scale User Modeling on Hadoop"
     Program Committee
     • Amr Awadallah, Cloudera, USA
     • Andrew McCallum, University of Massachusetts Amherst, USA
     • Assaf Schuster, Technion - Israel Institute of Technology
     • Gautam Das, University of Texas, Arlington, USA
     • Jimeng Sun, IBM Watson Research Center, USA
     • John Shafer, Microsoft Search Labs, USA
     • Kevin Chang, University of Illinois at Urbana-Champaign, USA
     • Kun Liu, Yahoo! Labs, USA
     • Louiqa Raschid, University of Maryland, College Park, USA
     • Michal Shmueli-Scheuer, IBM Haifa Research Lab, Israel
     • Michael Sheng, University of Adelaide, Australia
     • Mong Li Lee, National University of Singapore, Singapore
     • Rajeev Gupta, IBM India Research Lab, India
     • Vanja Josifovski, Yahoo Research, USA
     • Yannis Sismanis, IBM Almaden Research Center, USA
     • Yi Chen, Arizona State University, USA
     • Wen-syan Li, SAP, China
     Acknowledgements
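
The diagram labels on slide 3 (M, C, R, PartitionSort) appear to stand for the map, combine, reduce, and partition/sort stages of a MapReduce job. Assuming that reading, the following is a minimal single-process sketch of how those stages fit together, using word counting as the workload. It is an illustration only, not the API of Hadoop or any other platform named in the slides; every function name here (`map_phase`, `combine`, `partition_sort`, `reduce_phase`, `word_count`) is hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Map (M): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combine (C): pre-aggregate counts locally, before any shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition_sort(pairs, num_reducers):
    """Partition/Sort: route each key to one reducer, then sort by key."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers].append((key, value))
    return [sorted(p, key=itemgetter(0)) for p in partitions]


def reduce_phase(sorted_pairs):
    """Reduce (R): sum the values of each run of equal keys."""
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def word_count(splits, num_reducers=2):
    """Run map and combine per input split, then partition/sort and reduce."""
    combined = []
    for split in splits:  # each split stands in for one mapper's input
        mapped = [pair for line in split for pair in map_phase(line)]
        combined.extend(combine(mapped))
    result = {}
    for partition in partition_sort(combined, num_reducers):
        result.update(reduce_phase(partition))
    return result


if __name__ == "__main__":
    print(word_count([["big data big"], ["data cloud"]]))
```

In a real cluster each split and each partition would be handled by a separate process or machine. The combine step is optional and exists only to shrink the data that has to be moved between the map and reduce sides.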
