Where indicated slides licensed under

Dealing with software: the research data issues http://dx.doi.org/10.6084/m9.figshare.1150298 26 August 2014, Dealing with Data Conference Neil Chue Hong (@npch), Software Sustainability Institute ORCID: 0000-0002-8876-7606 | N.ChueHong@software.ac.uk. Project funding from. Supported by.

Presentation Transcript


  1. Dealing with software: the research data issues • http://dx.doi.org/10.6084/m9.figshare.1150298 • 26 August 2014, Dealing with Data Conference • Neil Chue Hong (@npch), Software Sustainability Institute • ORCID: 0000-0002-8876-7606 | N.ChueHong@software.ac.uk

  2. “Re-” is the new black

  3. The Research Cycle • Research is a continuous cycle. When we publish we are contributing to the body of knowledge. • Cycle stages (diagram labels): Interpret, Test, Revise, Publish, Create • Research outputs: Data, Paper, Software

  4. Research/Reuse/Reward Cycle • Reuse is also a cycle. We build our research on the work of others. Reward mechanisms should encourage reuse. • Diagram labels: Research, Reuse, Interpret, Index, Test, Revise, Publish, Identify, Create, Reward, Cite

  5. The current process • Start research • Write software • Use software • Produce results • Publish research paper, which mentions software and data • Release data • Release software • This process is simple but does not reward production or reuse of good software and data. It also has a long contribution cycle.

  6. “Re-”positories: Backup | Sharing | Archiving of software

  7. Differing roles, different repositories • Roles: backup, sharing, archiving • Timescales • Policy • Licensing • Ingest • Metadata • Assurance

  8. Versioning • Why do we version? • To indicate a change • To allow sharing • To confer special status • Version control systems make this easy, and the concepts of a person and an output are there but not unique. • Diagram labels: Public v1, Public v2, Public v3, Personal v3, Personal v3a, Personal v1, Personal v2, Personal v2a, Personal v2a

  9. Granularity • What do we define? • Useful units of reuse • Function • Algorithm • Program • Library / Suite / Package • …

  10. Boundary • What do we choose to identify: • Workflow? • Software that runs workflow? • Software referenced by workflow? • Software dependencies? • What’s the minimum citable part?

  11. Authorship • Which authors have had what impact on each version of the software? • Who had the largest contribution to the scientific results in a paper? • Can micro-attribution work? Can track author, but not contribution? • http://beyond-impact.org/?p=175 • Why do we identify? • To measure • To restrict • To communicate • To include • (Figure: OGSA-DAI project statistics from Ohloh)

  12. Code as a Research Object • What if you could assign DOIs to code easily? • Could we make software more reusable? • http://mozillascience.org/code-as-a-research-object-a-new-project/ • https://guides.github.com/activities/citable-code/
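To make the idea in slide 12 concrete, here is a minimal Python sketch of what a DOI for code would allow: turning the identifier into a resolvable link and a one-line citation. The `SoftwareRecord` class, its fields, and the citation layout are illustrative assumptions, not a format proposed in the talk. The only external fact relied on is that DOIs resolve through `https://doi.org/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareRecord:
    """Hypothetical minimal metadata for a citable software release."""
    authors: str
    title: str
    year: int
    doi: str
    version: str = ""  # optional: which release the DOI points at


def doi_url(doi: str) -> str:
    """Normalise a DOI (bare or already prefixed) to a resolver URL."""
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi.strip()


def cite(record: SoftwareRecord) -> str:
    """Format a one-line citation (the layout is an assumption)."""
    version = f" (version {record.version})" if record.version else ""
    return (f"{record.authors} ({record.year}). "
            f"{record.title}{version}. {doi_url(record.doi)}")
```

For example, passing this presentation's own DOI, `http://dx.doi.org/10.6084/m9.figshare.1150298`, to `doi_url` yields the equivalent `https://doi.org/` link. A record for a software release would be cited the same way, with the version string tying the citation to one specific release.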

  13. A better process? • Start research • Write software • Adapt/extend software • Use software • Produce results • Publish research paper, which references software and data papers • Identify existing software • Release software • Release data • Publish software paper • Publish data paper • Software and data papers are needed as proxies for rewarding reuse, but this process enables a shorter contribution cycle for data and software.

  14. Alternative Metrics

  15. One-click challenge • “One-click” archiving of a significant version of software in a code repository to a suitable institutional repository • “Suitable” repository: • Clear access / deposit / preservation policy • Adherence to standards • Ability to easily “transfer” in / out • Allows use of appropriate licenses for code • Sustainability of hosting organisation • Ability to monitor, check integrity • Provides permanent unique identifiers • Proposing a hackday to make this happen
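One of the "suitable repository" criteria in slide 15 is the ability to monitor and check integrity. As a rough illustration of what that could mean for an archived software version, the Python sketch below records a SHA-256 checksum for every file at deposit time and later reports any file that has been added, removed, or altered. The function names and the manifest layout are assumptions for illustration and are not taken from any particular repository platform.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List paths that were added, removed, or altered since deposit."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A deposit tool might call `build_manifest` once when a significant version is archived, store the result alongside the deposit, and run `changed_files` periodically. An empty list means the stored copy still matches what was deposited.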

  16. Summary • Software is an important output of the research cycle, and should be rewarded • Repositories play an important role in the research cycle, including for software • But software has specific issues with regard to research data management • Tooling is needed to lower barriers to deposit

  17. Further information • This presentation: • Slides: http://dx.doi.org/10.6084/m9.figshare.1150298 • Abstract: http://dx.doi.org/10.6084/m9.figshare.1150299 • Where does it go from here: the place of software in digital repositories • http://www.research.ed.ac.uk/portal/en/publications/where-does-it-go-from-here-the-place-of-software-in-digital-repositories(ab6130c6-aee6-4972-9256-8ea0eb1862c9).html • Software Papers: improving the reusability and sustainability of scientific software • http://dx.doi.org/10.6084/m9.figshare.795303 • Software Sustainability Institute • http://www.software.ac.uk/ Supported by EPSRC Grant EP/H043160/1
