eScience at: - PowerPoint PPT Presentation

Presentation Transcript

  1. eScience at: Barend Mons WHAT I WILL NOT TALK ABOUT:

  2. Goal: in silico knowledge discovery via pattern recognition in big data. Articles shown: Nature, June 6, 2012; Nature, June 10, 2012; Cell, March 16, 2012; Nature, June 21, 2012; Genetics in Medicine, March 2011

  3. The Data Challenge: no option to go on like this
     • Computer speed and storage capacity are doubling every 18 months, and this rate is steady
     • DNA sequence data has been doubling every 6-8 months over the last 3 years and looks set to continue doing so for this decade
     Guy Cochrane, ENA, EMBL-EBI
     Soon enough, data stewardship and analysis will be THE limiting factor in eScience
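The gap between the two doubling times on this slide can be made concrete with a small calculation. The sketch below is illustrative only: the 10-year horizon is an assumption taken from the phrase "this decade", and `growth_factor` is a hypothetical helper name, not something from the talk.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_months`."""
    return 2 ** (months_elapsed / doubling_months)

HORIZON_MONTHS = 120  # assumption: a 10-year horizon ("this decade")

# Doubling times quoted on the slide.
compute_growth = growth_factor(HORIZON_MONTHS, 18)   # computer speed and storage
data_growth_slow = growth_factor(HORIZON_MONTHS, 8)  # DNA data, slower bound
data_growth_fast = growth_factor(HORIZON_MONTHS, 6)  # DNA data, faster bound

print(f"compute/storage: x{compute_growth:,.0f}")
print(f"DNA sequence data: x{data_growth_slow:,.0f} to x{data_growth_fast:,.0f}")
```

Under these assumptions the data volume outgrows compute and storage by a factor of a few hundred to roughly ten thousand over the horizon, which is the sense in which stewardship and analysis become the limiting factor.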

  4. Diagram labels: All Legacy Information; New Dataset; New Insights; User

  5. Nanopublications & Cardinal Assertions
     A Nanopublication is the smallest unit of publishable information. It contains:
     • Assertion: a statement of concepts in terms of one or more ‘subject -> predicate -> object’ (triple) relationships.
     • Provenance:
       - Attribution: who made this assertion, when and where?
       - Supporting information: any other information that is relevant to the assertion (e.g. this assertion is only valid in humans under 18).
     A Cardinal Assertion aggregates all ‘n’ Nanopublications making the same assertion. It therefore has 1 assertion and ‘n’ provenances, eliminating redundancy.
     Diagram: ‘n’ Nanopublications (1 identical assertion, ‘n’ different provenances) -> 1 Cardinal Assertion
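The aggregation described on this slide can be sketched as a simple data structure. This is a minimal illustration, not the actual nanopublication schema: the class names, field names, and the example triples and authors are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Provenance:
    """Attribution plus optional supporting information (hypothetical fields)."""
    who: str
    when: str
    where: str
    supporting: str = ""


@dataclass(frozen=True)
class Nanopublication:
    assertion: Tuple[Triple, ...]  # one or more triples
    provenance: Provenance


@dataclass
class CardinalAssertion:
    assertion: Tuple[Triple, ...]  # stored once
    provenances: List[Provenance]  # one entry per contributing nanopublication


def aggregate(nanopubs: Iterable[Nanopublication]) -> List[CardinalAssertion]:
    """Group nanopublications that make the same assertion."""
    grouped: Dict[Tuple[Triple, ...], List[Provenance]] = defaultdict(list)
    for nanopub in nanopubs:
        grouped[nanopub.assertion].append(nanopub.provenance)
    return [CardinalAssertion(a, p) for a, p in grouped.items()]


# Made-up example data: two sources assert the same triple, one asserts another.
same = (("gene_X", "is_associated_with", "disease_Y"),)
other = (("drug_Z", "inhibits", "gene_X"),)
nanopubs = [
    Nanopublication(same, Provenance("author_A", "2011", "journal_1")),
    Nanopublication(same, Provenance("author_B", "2012", "database_2")),
    Nanopublication(other, Provenance("author_C", "2012", "journal_3")),
]
cardinal = aggregate(nanopubs)
```

Here three nanopublications collapse into two cardinal assertions; the repeated assertion is stored once and keeps both provenances.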

  6. Managing volume & complexity: combining Cardinal Assertions with Concept Profiles reduces the amount of data by ≈99.999996%. Individual Nanopublications: > 10^14; individual Cardinal Assertions: > 10^11; individual Concept Profiles: ≈ 4 × 10^6
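The reduction percentage on this slide can be checked from its own counts. The sketch below assumes the flattened figures read 4 × 10^6 concept profiles and 10^14 nanopublications, and it treats the slide's "≈" and ">" values as exact for the sake of the arithmetic.

```python
from fractions import Fraction

# Counts as read from the slide (assumed exponents; the slide gives bounds and approximations).
nanopublications = 10 ** 14
concept_profiles = 4 * 10 ** 6

# Share of items removed when going from nanopublications to concept profiles.
reduction_percent = (1 - Fraction(concept_profiles, nanopublications)) * 100

print(f"reduction: {float(reduction_percent):.6f}%")
```

With these inputs the result agrees with the ≈99.999996% quoted on the slide.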

  7. Information compression: from 10^14 nanopublications, to 10^11 cardinal assertions, to a concept web of only 10^6 Knowlets: from a 1.5 M euro machine to my local server. Data reduction for KD!

  8. Traditional Publishing >>>> eScience Publishing
     • Computer Reasoning (takes charge)
     • Giga size of datasets (beyond narrative)
     • Collaborative Intelligence (calls for a million minds)
     • Irreversible movement (to OA)

  9. Data and information interoperability in the digital science era. Diagram labels (levels numbered 1 to 6): New Insights; Confirmational reading (full provenance); Research Community; Publication; App-GUI (×4); Multiple Analytics-Enabling Environments (any format); Interoperable Exchange Environment (RDF-Open PHACTS-type); Any Format; Databases (curated); Patient Blogs; Clinical Data (TranSmart); Data Sets (TranSmart)

  10. The Safe Data Harbour Ground Plan. Diagram labels: Public Terminal; Published Commons; Data and Analysis Market Place; Freemium Terminal; Today; Mixed Public/Secure Terminal (patient data); High Security Terminal