TOWARDS A BIG DATA COMMUNITY CHALLENGE
Tilmann Rabl, Florian Stegmaier, Michael Granitzer and Hans-Arno Jacobsen
3rd Workshop on Big Data Benchmarking, July 16-17, Xi'an, China
BIG DATA – WHY COMMUNITY CHALLENGES MATTER
• Big Data is a major buzzword in the scientific world
  • Conferences, workshops, tutorials, panels
  • Component benchmarks, end-to-end systems, etc.
• Variety leads to incomparability of results
• Research communities run challenges to
  • … enable comparability of results
  • … foster evolution of a research field
• “Kites rise highest against the wind, not with it.” (W. Churchill)
WHAT SHOULD BE IN THE FOCUS? DATA! HOW SHOULD IT BE? INTERESTING!
“[...] other communities, like information retrieval, natural language processing, or Web research, have a much richer and agile culture in creating, disseminating, and re-using interesting new data resources for scientific experimentation [...]” – G. Weikum, SIGMOD Blog
HOW ARE “THE OTHERS” DOING?
• Information retrieval community:
  • TREC, TRECVid (task-based, measurable scientific impact)
  • CLEF Initiative (task-based, benchmarking initiatives)
• Multimedia community:
  • Multimedia Grand Challenge (tasks defined by “global players”, e.g., Yahoo! and Microsoft)
  • Open Source Software Competition (fosters community activities)
• Semantic Web community:
  • Linked Data Cup (data generation)
  • Semantic Web in-Use (mashup creation)
SUCCESSFUL COMMUNITY CHALLENGES: TAKE-HOME MESSAGE
• Challenges are not a single event
• Ongoing process, running through different stages:
  • Data generation
  • Solving restricted, high-impact issues
  • Fostering open source frameworks
  • Assembling mashups
• Accepted by the community
BRAINSTORMING AREA: STRUCTURE OF THE CHALLENGE
• Challenge needs to be focused on specific tasks:
  • Tasks assemble a “Big Data pipeline”
  • Specified by academia and industry
• Hybrid approach to engage participants:
  • Utilize benchmark activities
  • Computing tasks on “Open Data”
TIME TO BREAK OUT!
• Discussions should focus on:
  • Where to find large-scale, interesting “open” data sets?
  • Which tasks could form a sophisticated Big Data pipeline that allows a broad range of implementations?
• BREAKOUT HOW-TO:
  • Same breakout and student groups as yesterday
  • Prepare one slide for each question