Christophe Reffay, UMR STEF, ENS Cachan - IFÉ, Christophe.Reffay@ens-cachan.fr. Methodology for transliteracy research (the case of Concours Castor): sharing research data inside the Translit project and beyond. 27 June 2013, AIERI, Dublin, Ireland.

Presentation Transcript


  1. Methodology for transliteracy research (the case of Concours Castor)
      Sharing research data inside the Translit project and beyond
      Christophe Reffay, UMR STEF, ENS Cachan - IFÉ, Christophe.Reffay@ens-cachan.fr
      27 June 2013, AIERI, Dublin, Ireland

  2. Introduction: Towards a world with (open) data
      • Check (transparency)
          • Open Notebook Science (J.-C. Bradley, 2006)
      • Make data tangible [replication possible]
          • Dataverse (G. King, 2007)
          • Benchmarks: compare algorithms on datasets in computer science (e.g. MLcomp, IPOL, …)
      • Many others…

  3. Logics motivating data sharing in humanities (S. Duchesne, 2013)
      • Patrimonial: historical documents
      • Economical: data are hard to build
      • Scientific: make the proof
      (Diagram labels: Publication, Data)

  4. Data sharing initiatives in humanities
      • France
          • Calico (Bruillard)
          • Mulce (Reffay & Chanier, 2007)
          • Huma-Num (TGE Adonis): beQuali (Duchesne), DataPublication (Chanier), …
      • United Kingdom
          • ESDS Qualidata in UK Data Service
      • Datacite: a list of 650 repositories
      Digitally available data make sharing technically manageable.

  5. What makes datasets…
      • Sharable?
          • Visibility: standard metadata (OAI)
          • Access: public or not? Long term, curation
          • Ethics: consent / anonymisation
      • Reusable? (readable and computable)
          • Documentation (data, process, context)
          • Structure (transparent, manageable)
          • Format (interoperable)
      • Interesting? / Re-analysable?
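
The "Ethics: consent / anonymisation" item above can be illustrated with a minimal sketch. It assumes contestant records carry `Lastname` and `Firstname` fields, as in the contest database shown later in the talk. The function name `pseudonymise`, the secret key and the sample record are hypothetical, and keyed hashing is only one common technique, not necessarily the one used in the project. It yields pseudonyms rather than full anonymity: other fields may still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical secret kept by the data provider; never shipped with the dataset.
SECRET_KEY = b"replace-with-a-private-random-key"


def pseudonymise(record, key=SECRET_KEY):
    """Return a copy of a contestant record with the names replaced by a stable pseudonym."""
    identity = f"{record['Lastname']}|{record['Firstname']}".lower().encode("utf-8")
    # Keyed hash: the same person always gets the same pseudonym,
    # but the pseudonym cannot be recomputed without the key.
    digest = hmac.new(key, identity, hashlib.sha256).hexdigest()
    shared = {k: v for k, v in record.items() if k not in ("Lastname", "Firstname")}
    shared["Pseudonym"] = digest[:12]
    return shared


# Invented sample record, for illustration only.
contestant = {"Lastname": "Dupont", "Firstname": "Alice", "Gender": "F", "Team_id": 42}
print(pseudonymise(contestant))
```

Because the pseudonym is stable, results of the same contestant can still be linked across tables after the names are dropped.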

  6. “Translit” project: Trans-literacy on Media - Information - Computer
      • Researchers from 3 different cultures
          • Looking at their common concepts
          • Sharing some vocabulary
          • Analysing common or transferable skills
      • Having shared or separate experiments
          • Observation of information search tasks
          • Computer Science: Beaver contest

  7. Broadcast? or Tuned sharing path?
      • Whole documented package (all at once)
          • Huge effort for the provider
          • Is it adapted to potential re-users?
      • A path to be built by both parties
          • General description of the context and global description of data (as a “Data paper”)
          • Declaration of interest, then start of reuse
          • Tuning data towards new research questions

  8. Beaver contest: Introduction (fr: Concours Castor Informatique)
      • Goal: discover some principles
          • Specific to / useful in informatics
      • Directly available from any connected classroom (duration: 45 min.)
      • With fun, interactive tasks
      • Without any prerequisite
      • Teams: 1 or 2 pupils
      • 1 registration per group (by the teacher)
      • For 2012: 90 794 pupils, 721 schools

  9. A task example: The Text Machine
      (Figure: interactive task with “Paste” buttons and a question panel)

  10. Some statistics publicly available
      Source: http://castor-informatique.fr/resultats.php (June 2013)

  11. Data and documentation available for research
      • The data collected during the contest
          • More than 90 000 participants (single/pair)
          • 721 schools
          • All results for each task/team in a database
          • All interactions registered in a database
          • Some observations in classrooms
      • Contest rules (web site documentation)
      • All the tasks (web site: try it)
      • Questionnaires (2011; 2012)
      • Interviews (coming soon…)

  12. The current format: SQL Databases
      Schema diagram (tables and fields):
      • Contest: #Contest_id, Name, Level, Year, Status, NbMin, Folder
      • Question: #Question_id, Key, Folder, Name, AnswerType, ExpectedAnswer
      • C-Q (Contest-Question): MinScore, MaxScore, Order
      • School: #School_id, Name, Region, NbStudents, Validated
      • Group: #Group_id, #School_Id, Grade, GradeDetails, #Access, Name, NbStudents, Nb_Team, #Contest_Id, StartTime, IsUnofficial
      • Team: #Team_id, #group_id, StartTime, EndTime, Score, IsUnofficial
      • T-Q (Team-Question): Answer, Score, Date
      • Contestant: #Cont_id, Lastname, Firstname, Gender, #Team_id, GlobalRank, RankInSchool
      Example: List of scores and times for all official teams in the 2012 contest for grade 6 in the “Bordeaux” region.
      SQL request:
          SELECT DISTINCT T.ID, T.`startTime`, Q.key, TQ.`score`, TQ.`date`
          FROM `group` G
          JOIN `school` S ON (G.schoolID = S.ID)
          JOIN `team` T ON (T.groupID = G.ID)
          JOIN `team_question` TQ ON (T.ID = TQ.teamID)
          JOIN `question` Q ON (TQ.questionID = Q.ID)
          WHERE T.isUnofficial = 2 AND S.`region` = "bordeaux" AND G.`grade` = 6
          ORDER BY T.ID
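
The SQL request of this slide can be tried out without access to the contest database. The sketch below is a hedged illustration: it builds a tiny in-memory SQLite database whose tables keep only the columns the request touches, and fills it with invented rows. The request is adapted to SQLite (double-quoted identifiers instead of backticks, region and grade passed as bound parameters, and a second sort key added so the output order is deterministic). Treating `isUnofficial = 2` as "official" simply follows the filter on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema and invented rows: only the columns used by the request are kept.
conn.executescript("""
CREATE TABLE school (ID INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE "group" (ID INTEGER PRIMARY KEY, schoolID INTEGER, grade INTEGER);
CREATE TABLE team (ID INTEGER PRIMARY KEY, groupID INTEGER, startTime TEXT, isUnofficial INTEGER);
CREATE TABLE question (ID INTEGER PRIMARY KEY, "key" TEXT);
CREATE TABLE team_question (teamID INTEGER, questionID INTEGER, score INTEGER, date TEXT);
INSERT INTO school VALUES (1, 'School A', 'bordeaux'), (2, 'School B', 'paris');
INSERT INTO "group" VALUES (10, 1, 6), (11, 1, 7), (12, 2, 6);
INSERT INTO team VALUES (100, 10, '09:00', 2), (101, 10, '09:00', 1),
                        (102, 11, '10:00', 2), (103, 12, '10:00', 2);
INSERT INTO question VALUES (1, 'text_machine'), (2, 'other_task');
INSERT INTO team_question VALUES (100, 1, 6, '09:05'), (100, 2, 0, '09:20'),
                                 (101, 1, 6, '09:06'), (102, 1, 3, '10:04'),
                                 (103, 1, 6, '10:07');
""")

QUERY = """
SELECT DISTINCT T.ID, T.startTime, Q."key", TQ.score, TQ.date
FROM "group" G
JOIN school S ON G.schoolID = S.ID
JOIN team T ON T.groupID = G.ID
JOIN team_question TQ ON T.ID = TQ.teamID
JOIN question Q ON TQ.questionID = Q.ID
WHERE T.isUnofficial = 2 AND S.region = ? AND G.grade = ?
ORDER BY T.ID, Q."key"
"""

# One row per (team, task) for official grade-6 teams of the chosen region.
rows = conn.execute(QUERY, ("bordeaux", 6)).fetchall()
for row in rows:
    print(row)
```

With the toy rows, only team 100 passes all three filters, so the result holds one line per task that team answered.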

  13. Result for Request 1 (3096 lines)

  14. Example N°2: number of teams per grade and region
      SQL request:
          SELECT G.grade, S.region, COUNT(*)
          FROM `group` G
          JOIN `team` T ON (T.groupID = G.ID)
          JOIN `school` S ON (G.schoolID = S.ID)
          WHERE G.contestID = 6 AND T.isUnofficial = 2
          GROUP BY G.grade, S.region
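
As a hedged illustration of this second request, the sketch below runs it on an invented in-memory SQLite database and then pivots the flat result into a grade-by-region table. The toy rows, the `nb_teams` alias, the added `ORDER BY` and the `table` dictionary are assumptions made for the example. The value `contestID = 6` is taken from the slide as is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows; only the columns used by the request are modelled.
conn.executescript("""
CREATE TABLE school (ID INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE "group" (ID INTEGER PRIMARY KEY, schoolID INTEGER, contestID INTEGER, grade INTEGER);
CREATE TABLE team (ID INTEGER PRIMARY KEY, groupID INTEGER, isUnofficial INTEGER);
INSERT INTO school VALUES (1, 'bordeaux'), (2, 'paris');
INSERT INTO "group" VALUES (10, 1, 6, 6), (11, 1, 6, 7), (12, 2, 6, 6), (13, 2, 5, 6);
INSERT INTO team VALUES (100, 10, 2), (101, 10, 2), (102, 10, 1),
                        (103, 11, 2), (104, 12, 2), (105, 13, 2);
""")

rows = conn.execute("""
SELECT G.grade, S.region, COUNT(*) AS nb_teams
FROM "group" G
JOIN team T ON T.groupID = G.ID
JOIN school S ON G.schoolID = S.ID
WHERE G.contestID = 6 AND T.isUnofficial = 2
GROUP BY G.grade, S.region
ORDER BY G.grade, S.region
""").fetchall()

# Pivot the flat (grade, region, count) rows into table[grade][region].
table = {}
for grade, region, nb_teams in rows:
    table.setdefault(grade, {})[region] = nb_teams

for grade, per_region in table.items():
    print(grade, per_region)
```

Team 102 is dropped by the `isUnofficial` filter and team 105 by the `contestID` filter, which is why they do not appear in the counts.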

  15. Example N°2: number of teams per grade and region
      Resulting table:

  16. Location of the 721 schools (2012)

  17. First steps of the sharing path?
      For each participant/team, define:
      • Level (grade 6 to 13), Group, Region
      • Gender (M / F / MM / FF / MF)
      • For each task: Right / Wrong / No answer => Statistics gender/level/tasks
      • For each task: Content of answers => Didactics
      • For each task: Time/sequence => Behaviour
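
The derived variables listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `team_gender` and `outcome`, the sample teams and the expected answers are all invented, an unanswered task is assumed to be stored as `None` or an empty string, and a plain equality test against the expected answer stands in for the contest's real scoring rules.

```python
from collections import Counter


def team_gender(genders):
    """Map the genders of a team's 1 or 2 members to M / F / MM / FF / MF."""
    if not 1 <= len(genders) <= 2 or any(g not in ("M", "F") for g in genders):
        raise ValueError(f"unexpected team composition: {genders!r}")
    # Reverse sort puts "M" first, so a mixed pair is always reported as "MF".
    return "".join(sorted(genders, reverse=True))


def outcome(answer, expected):
    """Classify one task answer as 'right', 'wrong' or 'no answer'."""
    if answer is None or answer == "":
        return "no answer"
    return "right" if answer == expected else "wrong"


# Invented sample data, for illustration only.
expected = {"text_machine": "B", "other_task": "3"}
teams = [
    {"genders": ["F", "M"], "answers": {"text_machine": "B", "other_task": ""}},
    {"genders": ["M"], "answers": {"text_machine": "A", "other_task": "3"}},
    {"genders": ["F", "F"], "answers": {"text_machine": "B", "other_task": "3"}},
]

# Count outcomes per (team gender, task, outcome): the raw material
# for the gender/level/tasks statistics.
stats = Counter()
for team in teams:
    for task, exp in expected.items():
        key = (team_gender(team["genders"]), task, outcome(team["answers"].get(task), exp))
        stats[key] += 1

for key, count in sorted(stats.items()):
    print(key, count)
```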

  18. Gender/Tasks Statistics (grade=6)
      An interesting comparison with Social Network Analysis: Jacob L. Moreno (1943) measured affinity networks and showed that pupils aged 11-13 prefer same-gender peers.

  19. Documenting the research process on the fly => “Roadmap” (I. Quentin)
      • Intermediary hypotheses & objectives
      • The rationales for giving up
      • The needs for new data
      • The origin of the data (who/when/where)
      • All needed access information
      • How data were collected
      • Analysis methods and their accuracy

  20. Suggestions - Proposal
      • Clean up your data ASAP
          • Reference version in open formats
      • Document your investigation process
          • For yourself, your team/project
          • For forthcoming partners
      => Make your data visible & accessible
      • Make your data interesting for someone else: communicate/publish a "Data Paper"
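
One way to produce a "reference version in open formats" is to export each database table to CSV. The sketch below is only an illustration: the helper `export_table_csv` is hypothetical, and the toy `school` table with its single invented row stands in for the real database.

```python
import csv
import io
import sqlite3


def export_table_csv(conn, table, out):
    """Write every row of `table` to the text stream `out` as CSV, with a header line."""
    # A table name cannot be a bound parameter, so quote it as an identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Toy database with one invented row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE school (ID INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute("INSERT INTO school VALUES (1, 'School A', 'bordeaux')")

buffer = io.StringIO()
export_table_csv(conn, "school", buffer)
print(buffer.getvalue())
```

Writing to a file instead only requires passing a stream opened with `open(path, "w", newline="", encoding="utf-8")`, as the `csv` module documentation recommends.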

  21. Circles for access to data
      Diagram labels: Any Internet user; Any researcher (under contract? authenticated?); Data Paper?; Translit project participants; Information search task observation; Concours Castor Informatique

  22. Documenting a dataset
      • Define guidelines for each kind of data
          • Questionnaires, interviews, observations
          • Databases (traces, results, …)
      • Define a manageable way to build this
          • Only useful information (upon request)
          • Written by requesters?
      • Capitalise on requests and documentation

  23. Thank you. Merci. "Go raibh maith agat".
      Christophe Reffay, UMR STEF, ENS Cachan - IFÉ, Christophe.Reffay@ens-cachan.fr