1 / 39

~15’000 HEP theorists scratch their heads to make sense of all that stuff and then some more

INSPIRE A new information system for High-Energy Physics. Lessons learnt. Salvatore Mele, CERN On behalf of the INSPIRE collaboration OR2010 | Madrid | July 6 th 2010. ~15’000 High Energy Physics (HEP) scientists smash stuff at the speed of light to produce new stuff.

dregina
Download Presentation

~15’000 HEP theorists scratch their heads to make sense of all that stuff and then some more

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. INSPIREA new information system forHigh-Energy Physics.Lessons learnt.Salvatore Mele, CERNOn behalf of the INSPIRE collaborationOR2010 | Madrid | July 6th 2010

  2. ~15’000 High Energy Physics (HEP) scientists smash stuff at the speed of light to produce new stuff

  3. ~15’000 HEP theorists scratch their heads to make sense of all that stuff and then some more

  4. Krause et al. CERN-OPEN-2007-014 …and they write 10k papers/year ‘bout it 90% of papers… …on theory … 3 authors

  5. The “preprint culture” L.Goldschmidt-Clermont, 1965, http://eprints.rclis.org/archive/00000445/02/communication_patterns.pdf • Scientific journals of ‘60s too slow for HEP • Mass-mail preprints to institutes worldwide • Ante litteram (institute-pays) Open Access • Leading libraries catalogue & serve preprints • From the ‘70s SPIRES: e-Catalogue CERN Library, circa 1960

  6. Something “vague but exciting” @CERN (T. Berners-Lee at CERN, early ‘90s)

  7. Ist Web Site in the U.S.

  8. 1991: arXiv.org Discovery and first plateaus Steady state & constant output Conference contributions WWW

  9. Why don’t we need mandates?Why don’t we need advocacy?

  10. What do scientists want ?

  11. VisibilityAccelerationImpact

  12. Visibility

  13. Where do HEP scientists look for info? Gentil-Beccot et al. arxiv:0804.2701 • Survey of 2’000+ scientists (10% of community) • OA tools answer scientists’ information needs • Google as proxy of arXiv, SPIRES, publishers

  14. Acceleration

  15. Ten years in the life of a HEP article • SPIRES counts: citations to/from preprints/articles • Citation peaks at publications • Scientific discourse proceeds on discipline repository

  16. Impact

  17. Citation augmentation • Discipline repository yields immense avantage • Five times more citations for articles in arXiv • 20% of 2-year citations occur before publication

  18. By the way, do they read journals?

  19. 97% of HEP journals’ content is in arXiv

  20. Gentil-Beccotet al. arxiv:0906.5418 Given a choice, only one in five (maybe ten) scientists goes to the published version of an article. ∼30,000 clicks (choice between arXiv and journal) Publisher server 18% arXiv 82% (As many scientists as analyzed here go straight to arXiv) ( ) Still, peer-review and journals are indispensable SCOAP3.org will solve this conundrum

  21. Enter INSPIRE What is it? What does it do? What will it do? Why shall I care?

  22. INSPIRE in a nutshell • Joint Project of CERN, DESY, Fermilab and SLAC • Inspired by survey HEP scientists (2000+ respondents) expecting the future from ageing SPIRES infrastructure • Unify DESY/Fermilab/SLAC SPIRES content with CERN Invenio Open Source Digital Library Platform • 860’000 Records/500’000 Full-text/All that matters in HEP • 40 years of manual curation; arXiv and publishers feeds

  23. Invenio – CERN Open Source Digital Library solution http://invenio-software.org Designed for scientific libraries with 0.2-10M records Now also powering OpenAIRE “Orphan” repository (Listen to S. Kaplun’s talk today @1500 Room 2)

  24. Who do we know? • 80K names (20K affiliation histories, 25K e-mails) • 860K papers with authors and affiliations • 22M ‘signatures’ on papers Automatic Disambiguation Henning Weiler – PhD@CERN “Who was where when and wrote on what with whom ?” E.g. Scan 5.4M signatures. Attribute 5.3M to 250 authors ! E.g. 963 papers by "Chen, G”, 98% attributed to 21 authors

  25. Coming soon Content, metadata and feature enhancement

  26. Back to the users • Personal libraries, alerts • Claim-my-papers (with arXiv and ORCID) • Submit theses and old non-arXiv material • Attach non-text material (high level data files) • OCR of library holdings (with D4Science-II) • Advanced feeds (with ADS, arXiv, Publishers)

  27. Full-text search

  28. Figure extraction

  29. Coming later • Holistic recommender system (logs, cites, text) • Crowdsourcing of keywording (tagging) • Semantic layer (did-you-mean and classification) • ORE’ish aggregation of cite/people/data/papers • (Semantic) image search • Platform for high-level data preservation • NOTE: Looking for PhD/Post-Docs – e-mail me

  30. Lesson learnt • Understand users/authors drivers and barriers • Invest in value-added service for scientists • Harvesting and curation • Visibility, accessibility, efficiency • Co-operation, collaboration, partnerships • Aim – ACCELERATE SCIENCE

  31. Thank you ! Salvatore.Mele@cern.ch http://inspirebeta.net Additional resources: R. Heuer et al. Innovation in Scholarly Communication: Vision and Projects from High-Energy Physics http://arxiv.org/abs/0805.2739 A.Gentil-Beccot et al. Information Resources in High-Energy Physics: Surveying the Present Landscape and Charting the Future Course http://arxiv.org/abs/0804.2701 A.Gentil-Beccot et al. Citing and Reading Behaviors in HEP: How a Community Stopped Worrying about Journals and Learned to Love Repositories http://arxiv.org/abs/0906.5418

More Related