2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010)

April 26, 2010, Raleigh, NC, USA
In association with the 19th Annual World Wide Web Conference (WWW2010)

Making Sense of Mountains of Data

Embedded Analytics, Dashboards, Mash-ups, Financial Planning, Scorecards
Billions of mobile devices
Search, Online Transaction Processing, System Feedback/Action

Continuous arrival of high-volume information (evolving, highly variant; structured, semi-structured, and unstructured)

Structured
• Clickstream, CRM
• Claim data (text, picture, video)
• Call data records
• Location tracking (GPS)
• iPhone, vehicle use data
• $ Trans tracking (across borders & IP providers)

Semi-/Un-structured: Auto/Cross Correlation Analytics, Predictive Analytics

Semi-structured
• Feeds:
• Census Bureau data
• Market data, weather data
• Sensor data

Semi-/Un-structured: Deep & Wide Analytics
Fine grained: individual product and customer at a time and place
Petabytes -> Exabytes
• Web data (for search)
• Web buzz data (for reputation analysis)

Massive Data Analytic Platforms

[Diagram: M, C, and R nodes with a Partition/Sort stage]

• Google: Original MapReduce implementation
• Microsoft: Dryad
• Yahoo!, Facebook, and many others: Hadoop
• Ecosystems: Hive, Pig, Jaql, Zookeeper
• Alternatives to Map/Reduce, e.g. Pregel

• “Easy” parallelism
• Scalability
• Fault-Tolerance
• Elastic
• Flexibility
• Cost / Performance

• 1000s of processors
• Petabytes of data
• …and growing
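As a rough illustration of the pipeline on this slide, the sketch below runs a word count through map, combine, partition/sort, and reduce steps in a single process. It assumes the M, C, and R labels in the diagram stand for Map, Combine, and Reduce; the function names, the sample input, and the choice of word count are hypothetical and are not taken from the talk or from any particular platform's API.

```python
from collections import defaultdict
from itertools import groupby


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def combine(pairs):
    """Combine: pre-aggregate counts locally, before any data is shuffled."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(pairs, num_reducers):
    """Partition: route each key to one reducer bucket (deterministic toy hash)."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[sum(key.encode()) % num_reducers].append((key, value))
    return buckets


def reduce_phase(bucket):
    """Sort the bucket by key, then reduce: sum the partial counts per key."""
    bucket.sort(key=lambda kv: kv[0])
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(bucket, key=lambda kv: kv[0])
    }


def run_job(splits, num_reducers=2):
    """Run every stage in-process; a real platform distributes these steps."""
    combined = []
    for split in splits:  # each split stands in for one mapper's input
        mapped = [pair for record in split for pair in map_phase(record)]
        combined.extend(combine(mapped))
    result = {}
    for bucket in partition(combined, num_reducers):
        result.update(reduce_phase(bucket))
    return result


# Hypothetical input: two splits, one record each.
splits = [["the cloud the data"], ["data on the cloud"]]
counts = run_job(splits)
print(counts)
```

Because each key lands in exactly one bucket, the reducers never need to coordinate, which is one way to read the "easy" parallelism bullet above.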

Chairpeople Perspective
• Other parallel systems technology and customers
• Parallel Database: enterprise data warehousing
• Parallel ETL (extraction, transformation, load)
• Search and text analytics
• Hadoop and related technologies
• Finance, Telco, Healthcare, Retail, Government, …

Questions Posed in Call For Papers
• What kinds of problems are people trying to solve?
• How are existing massive-scaleout platforms used, and what extensions would be helpful?
• Other kinds of platforms for different problems?
• How to integrate with existing environments such as data warehouses?
• Challenges in managing massive datasets?
• Legal/moral challenges associated with mining these data sets?

Agenda (morning)

9:00 - 10:30: Session 1
• Introduction and Welcome
• Invited Talk: "Hadoop: An Industry Perspective"
  Dr. Amr Awadallah, CTO, VP-Engineering, Cloudera

10:30 - 11:00: Coffee Break*

11:00 - 12:30: Session 2
• Distributed Indexing of Web Scale Datasets for the Cloud
  Ioannis Konstantinou, Evangelos Angelou, Dimitrios Tsoumakos, Nectarios Koziris; National Technical University of Athens
• Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce
  Joos-Hendrik Böse (1), Artur Andrzejak (2), Mikael Högqvist (2); (1) Intl. Comp. Sci. Institute, (2) Zuse Institute Berlin (ZIB)
• Efficient Updates for a Shared Nothing Analytics Platform
  Katerina Doka (3), Dimitrios Tsoumakos (4), Nectarios Koziris (3); (3) National Technical University of Athens, Greece, (4) University of Cyprus

12:30 - 1:30: Lunch*

Agenda (afternoon)

1:30 - 3:30: Session 3
• Invited Talk: "Large Scale Applications on Hadoop in Yahoo"
  Dr. Vijay Narayanan, Yahoo! Labs Silicon Valley
• Extracting User Profiles from Large Scale Data
  Michal Shmueli-Scheuer, Haggai Roitman, David Carmel, Yosi Mass, David Konopnicki; IBM Research, Haifa
• A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids
  Sudha Sadasivam, G. Baktavatchalam; PSG College of Technology

3:30 - 4:00: Coffee Break*

4:00 - 5:30: Session 4
• Towards Scalable RDF Graph Analytics on MapReduce
  Padmashree Ravindra, Vikas Deshpande, Kemafor Anyanwu; North Carolina State University
• SPARQL Basic Graph Pattern Processing with Iterative MapReduce
  Jaeseok Myung, Jongheum Yeon, Sang-goo Lee; Seoul National University
• Parallelizing Random Walk with Restart for Large-Scale Query Recommendation
  Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng; National Chiao Tung University, Hsinchu, Taiwan

Workshop Chairs
• Ullas Nambiar, IBM India Research Lab, New Delhi, India
• John McPherson, IBM Almaden Research Center, USA
• David Konopnicki, IBM Haifa Research Lab, Israel

Steering Committee
• Rakesh Agrawal, Microsoft Search Labs, Mountain View, CA, USA
• Alon Halevy, Google Inc., Mountain View, CA, USA

Invited Speakers
• Amr Awadallah, CTO, VP-Engineering, Cloudera, "Hadoop: An Industry Perspective"
• Vijay Narayanan, Yahoo! Labs Silicon Valley, "Large Scale User Modeling on Hadoop"

Program Committee
• Amr Awadallah, Cloudera, USA
• Andrew McCallum, University of Massachusetts Amherst, USA
• Assaf Schuster, Technion - Israel Institute of Technology
• Gautam Das, University of Texas, Arlington, USA
• Jimeng Sun, IBM Watson Research Center, USA
• John Shafer, Microsoft Search Labs, USA
• Kevin Chang, University of Illinois at Urbana-Champaign, USA
• Kun Liu, Yahoo! Labs, USA
• Louiqa Raschid, University of Maryland, College Park, USA
• Michal Shmueli-Scheuer, IBM Haifa Research Lab, Israel
• Michael Sheng, University of Adelaide, Australia
• Mong Li Lee, National University of Singapore, Singapore
• Rajeev Gupta, IBM India Research Lab, India
• Vanja Josifovski, Yahoo Research, USA
• Yannis Sismanis, IBM Almaden Research Center, USA
• Yi Chen, Arizona State University, USA
• Wen-syan Li, SAP, China

Acknowledgements
