"Stories" in data and the roles of crowdsourcing – views of a Web miner Bettina Berendt Dept. of Computer Science KU Leuven, Belgium http://people.cs.kuleuven.be/~bettina.berendt/ Thanks to: Ilija Subašić, Markus Luczak-Rösch, and Laura Drăgan
Crowd-sourcing the truth? Wikipedia (here: the Gaza Flotilla Raid)
Challenge 5: vagueness - reprise Challenge 4: More specifically
The “live crowdsourcing activity” • Goal: crowdsource data citation metadata • Motivation 1 / possible extension • Motivation 2 / case study
The data Datasets Publications [People]
The datasets Preloaded: USEWOD datasets DBpedia SWDF Bio2RDF LinkedGeoData BioPortal OpenBioMed
The datasets Preloaded: Generic (!) Versions/releases References
The datasets Add new: Name* Version Release date URL
The publications Preloaded: USEWOD workshop papers
The publications Add new: Title* Authors Year URL
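The two "Add new" forms above can be sketched as plain record types. This is a minimal sketch, not the tool's actual data model: the field types, the free-text release date, and the rule that a starred field must be non-empty are assumptions, and the example paper title is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    name: str                            # required (marked * on the slide)
    version: Optional[str] = None
    release_date: Optional[str] = None   # free text; the date format is an assumption
    url: Optional[str] = None

    def __post_init__(self):
        # Assumption: a starred field must contain non-whitespace text.
        if not self.name.strip():
            raise ValueError("Dataset name is required")


@dataclass(frozen=True)
class Publication:
    title: str                           # required (marked * on the slide)
    authors: Tuple[str, ...] = ()
    year: Optional[int] = None
    url: Optional[str] = None

    def __post_init__(self):
        if not self.title.strip():
            raise ValueError("Publication title is required")


# A generic (unversioned) dataset entry and a hypothetical publication.
dbpedia = Dataset(name="DBpedia")
paper = Publication(title="An example workshop paper")
```

Keeping every field except the starred one optional mirrors the forms and keeps data input minimal.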
The task Capture which dataset is used in which publication and how
Data representation Datasets Publications Connections between them schema.org prov:Entity ?
Data representation Datasets Publications Connections between them schema.org prov:Entity prov:Derivation
The task Capture which dataset is used in which publication and how
Connections Publication – Publication Publication – Dataset Dataset – Publication Dataset – Dataset
Connections Publication – Publication citation
Connections Publication – Dataset Dataset – Publication mentions describes evaluates analyses compares
Connections Dataset – Dataset extends includes overlaps transformation of generalisation of
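The connection vocabulary from the last three slides can be written down as a lookup table keyed by the kinds of the two endpoints. This is only a sketch: the names `ALLOWED` and `check_connection` are hypothetical, and applying the same label list to both Publication – Dataset and Dataset – Publication is an assumption based on the slide showing one shared list.

```python
# Labels shared by the Publication-Dataset and Dataset-Publication slide.
PUB_DATASET_LABELS = {"mentions", "describes", "evaluates", "analyses", "compares"}

# Allowed relation labels per (source kind, target kind) pair.
ALLOWED = {
    ("Publication", "Publication"): {"citation"},
    ("Publication", "Dataset"): PUB_DATASET_LABELS,
    ("Dataset", "Publication"): PUB_DATASET_LABELS,  # assumption: same list both ways
    ("Dataset", "Dataset"): {
        "extends",
        "includes",
        "overlaps",
        "transformation of",
        "generalisation of",
    },
}


def check_connection(source_kind: str, relation: str, target_kind: str) -> bool:
    """Return True if the relation label is listed for this pair of kinds."""
    return relation in ALLOWED.get((source_kind, target_kind), set())


print(check_connection("Publication", "evaluates", "Dataset"))  # True
print(check_connection("Dataset", "citation", "Dataset"))       # False
```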
Data representation Subclasses of prov:Derivation (inverse of Publication-DS)
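One way to read "subclasses of prov:Derivation" is through the qualified-derivation pattern of PROV-O, where the derived entity points to a derivation node that is typed with the specific relation. The sketch below only builds Turtle text. The `ex:` namespace, the class naming scheme, the example IRIs, and the choice of which entity counts as "derived" are all assumptions, since the slide does not spell them out.

```python
# Prefix declarations; the ex: namespace is hypothetical.
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/datacite#> .\n"
)


def relation_class(relation: str) -> str:
    """Turn a label such as 'transformation of' into 'ex:TransformationOf'."""
    return "ex:" + "".join(word.capitalize() for word in relation.split())


def derivation_turtle(derived_iri: str, relation: str, source_iri: str) -> str:
    """Describe 'derived_iri was derived from source_iri' with a typed derivation."""
    cls = relation_class(relation)
    return (
        f"{cls} rdfs:subClassOf prov:Derivation .\n"
        f"<{derived_iri}> prov:qualifiedDerivation [\n"
        f"    a {cls} ;\n"
        f"    prov:entity <{source_iri}>\n"
        f"] .\n"
    )


# Hypothetical IRIs: a publication that evaluates a dataset.
doc = PREFIXES + derivation_turtle(
    "http://example.org/pub/1", "evaluates", "http://example.org/ds/dbpedia"
)
print(doc)
```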
The task Capture which dataset is used in which publication and how
Lessons learned: Data is dirty, even when it comes from experts. Focus on the task: make everything else simpler, minimise data input.
Questionnaire results Inconclusive results on the suitability of the vocabulary, but interesting answers to “What questions would this information answer for you?”: “What are popular datasets?” “Which datasets are facilitators for research on X?” “What publications are related through a dataset (but don't mention each other)?”
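Two of the questions quoted above can be answered directly from the collected connections. The sketch below assumes connections are stored as (publication, relation, dataset) tuples and citations as (citing, cited) pairs. The sample records and function names are made up for illustration and are not results from the activity.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample data.
uses = [
    ("paper_a", "evaluates", "DBpedia"),
    ("paper_b", "analyses", "DBpedia"),
    ("paper_c", "mentions", "DBpedia"),
    ("paper_c", "describes", "SWDF"),
]
citations = {("paper_a", "paper_b")}


def publications_per_dataset(uses):
    """Group the publications connected to each dataset."""
    grouped = defaultdict(set)
    for publication, _relation, dataset in uses:
        grouped[dataset].add(publication)
    return grouped


def popular_datasets(uses):
    """Rank datasets by the number of distinct publications connected to them."""
    grouped = publications_per_dataset(uses)
    counts = Counter({ds: len(pubs) for ds, pubs in grouped.items()})
    return counts.most_common()


def linked_but_uncited(uses, citations):
    """Pairs of publications that share a dataset but do not cite each other."""
    cited = {frozenset(pair) for pair in citations}
    result = set()
    for dataset, pubs in publications_per_dataset(uses).items():
        for a, b in combinations(sorted(pubs), 2):
            if frozenset((a, b)) not in cited:
                result.add((a, b, dataset))
    return sorted(result)


print(popular_datasets(uses))
# [('DBpedia', 3), ('SWDF', 1)]
print(linked_but_uncited(uses, citations))
# [('paper_a', 'paper_c', 'DBpedia'), ('paper_b', 'paper_c', 'DBpedia')]
```

Counting distinct publications rather than raw connections means a paper that both describes and evaluates a dataset is counted once.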
Outlook (1): Dimensions of crowdsourcing What is outsourced Who is the crowd How is the task designed How are the results validated How can the process be optimised [Quinn & Bederson, 2012]