
Semantic Application for Digital Repositories

Semantic Application for Digital Repositories. Fabrizio Gagliardi, EMEA & LATAM Director, Technical Computing, MSR External Research, Microsoft Corporation. Microsoft Research’s Commitment to Science: Advancement of Science, Global Collaboration, Technology Excellence, Interoperability.


Presentation Transcript


  1. Semantic Application for Digital Repositories • Fabrizio Gagliardi, EMEA & LATAM Director, Technical Computing, MSR External Research, Microsoft Corporation

  2. Microsoft Research’s Commitment to Science • Advancement of Science • Global Collaboration • Technology Excellence • Interoperability • Putting computing into science… • Applying Microsoft products and research technologies to advance the scientific research and engineering innovation process • Putting science into computing… • Ensuring that research community requirements are factored into future versions of Microsoft software

  3. Scholarly Communications: Project Overview • Current or Completed Projects • Cornell – arXiv.org + Word 2007 (and repository interoperability via SWORD) • MIT / Broad Institute – Authoring (Word 2007) + data for research reproducibility • MSR – CMT++ interoperability with data + metadata transfer/exchange (conference management tool enhancements) • LiveLabs – eJournal publishing online service (community publishing tool) • UC San Diego / PLoS – Semantic mark-up of scholarly articles (+ submission) • Chem4Word with Office & Cambridge University – Create add-in to Word 2007 to facilitate drawing of chemical compounds and equations • Johns Hopkins University – Digital Archive for Astronomy/Astrophysics data (storage, preservation and access) • Planets Project / EU (with MSR – Cambridge) – OpenXML and file format preservation + interoperability • eChemistry Project (Cornell, Penn State, Indiana, Cambridge, Southampton) – ORE exemplar: access to compound chemical info objects (cross-repository access to open chemistry data) • British Library – Researcher Information Centre (RIC) online workflow tool for scientists and researchers • Creative Commons Add-in for Office 2007 – evolving the Word 2003 effort • University of Southampton (UK) – Port ePrints Repository Software for installation on the Windows platform • University of Manchester / “MyExperiment” Project – social networking for scientists • ORE Acceleration Project (OAI – Object Reuse & Exchange) – Alpha spec development • Indiana University – Toolbox for Social Networking (SRT) • UK National Archives – Virtual PC / Emulation of legacy systems to facilitate preservation • National Library of Medicine / NCBI – “PubMed Int’l” UK version of PubMed + NLM DTD • Pipeline • DRIVER 2 (EU) – Infrastructure integration across a network of European research repositories

  4. Research Output Repository Platform Goals • A platform for building services and tools for research output repositories • Papers, Videos, Presentations, Lectures, References, Data, Code, etc. • Relationships between stored entities • Enable a tools and services ecosystem for “research output” repositories on MS technologies Execution • Utilizing OAI-ORE, SWORD, and other community protocols • In development, deployment within MSR in early Q4 • Beta release to the community in late Q4 • Built on SQL Server 2008 + Entity Framework • Using WPF and Silverlight for UI

  5. Research Output Repository Platform Non-goals • A generic platform for asset management • Support the lifecycle of publications • Compete with existing repository solutions Goals • Create a platform for building “research output” repositories • Engage with the digital library and scholarly communications community • Become the “research output” repository for MSR (RMCr project) • Papers, Videos, Presentations, Lectures, References, Data, Code, etc. • Support an ecosystem of services and tools • Available to the community for free (we are still considering the open source route) • Build an easy-to-install collection of basic services and tools

  6. An Ecosystem of Research Repositories • Support of harvesting & federation to/from Institutional Repositories: arXiv.org, DSpace, ePrints, Fedora, etc. • Entities + Relationships can be synched to cloud storage so that they are: Always Available, Sharable, Mixable, Harvestable • Researchers manage their personal research entities (data, citations, documents, workflows, etc.)

  7. Current Project Status • Limited Tech Preview release due June 2008 • Public Beta targeted for Aug/Sept 2008 For more details • Contact: • Alex Wade (Program Manager) / alex.wade@microsoft.com • Community Forum: • http://community.research.microsoft.com/forums/90.aspx

  8. eScience and Semantic Computing meet the Cloud The cyberinfrastructure for the next generation of researchers

  9. The Future: Software plus Services for Science? • Expect scientific research environments to follow similar trends to the commercial sector • Leverage computing and data storage in the cloud • Scientists already experimenting with Amazon S3 and EC2 services, with mixed results • For many of the same reasons • Siloed research teams, no resource sharing across labs • High storage costs • Low resource utilization • Excess capacity • High costs of reliably keeping machines up-to-date • Little support for developers, system operators

  10. A smart cyberinfrastructure • Collective intelligence • If last.fm can recommend what song to broadcast to me based on what my friends are listening to, why can't the cyberinfrastructure of the future recommend articles of potential interest based on what the experts in the field that I respect are reading? • Examples are already emerging, but the process is manual (Connotea, BioMedCentral Faculty of 1000 ...) • Automatic correlation of scientific data • Smart composition of services and functionality • Cloud computing to aggregate, process, analyze and visualize data
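
The recommendation idea on this slide can be sketched in a few lines. This is only an illustrative sketch, not a description of any existing service: the function name `recommend`, the `follows` and `reading_lists` structures, and all user and article names are hypothetical. It ranks articles by how many of the experts a reader follows have read them, skipping what the reader has already seen.

```python
from collections import Counter


def recommend(reader, follows, reading_lists, top_n=3):
    """Rank unread articles by how many followed experts have read them.

    All arguments are hypothetical illustration data:
    follows maps a reader to the experts they respect,
    reading_lists maps each person to the articles they have read.
    """
    already_read = set(reading_lists.get(reader, []))
    counts = Counter()
    for expert in follows.get(reader, []):
        # Use a set so an expert counts at most once per article.
        for article in set(reading_lists.get(expert, [])):
            if article not in already_read:
                counts[article] += 1
    # Most-read first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:top_n]]


# Made-up example data.
follows = {"me": ["alice", "bob"]}
reading_lists = {
    "me": ["paper-a"],
    "alice": ["paper-a", "paper-b", "paper-c"],
    "bob": ["paper-b", "paper-d"],
}

print(recommend("me", follows, reading_lists))
```

A real system would need the reading activity to be captured automatically, which is exactly the step the slide notes is still manual today.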

  11. A world where all data is linked… • Important/key considerations • Formats or “well-known” representations of data/information • Pervasive access protocols are key (e.g. HTTP) • Data/information is uniquely identified (e.g. URIs) • Links/associations between data/information • Data/information is inter-connected through machine-interpretable information (e.g. paper X is about star Y) • Social networks are a special case of ‘data networks’ Attribution: Richard Cyganiak
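
The “paper X is about star Y” example can be made concrete with a minimal sketch of linked data as subject–predicate–object triples. Everything here is an assumption for illustration: the `example.org` URIs, the vocabulary terms `isAbout` and `writtenBy`, and the helper functions are hypothetical, and a production system would more likely use an RDF library than plain tuples.

```python
# Hypothetical URIs: every item is uniquely identified, as the slide suggests.
PAPER_X = "http://example.org/paper/X"
STAR_Y = "http://example.org/star/Y"
AUTHOR_Z = "http://example.org/person/Z"
IS_ABOUT = "http://example.org/vocab/isAbout"
WRITTEN_BY = "http://example.org/vocab/writtenBy"

# Each link is a (subject, predicate, object) triple.
triples = [
    (PAPER_X, IS_ABOUT, STAR_Y),
    (PAPER_X, WRITTEN_BY, AUTHOR_Z),
]


def objects(store, subject, predicate):
    """Return everything that `subject` links to via `predicate`."""
    return [o for s, p, o in store if s == subject and p == predicate]


def subjects(store, predicate, obj):
    """Return everything that links to `obj` via `predicate`."""
    return [s for s, p, o in store if p == predicate and o == obj]


# "Which papers are about star Y?" becomes a machine-answerable query.
print(subjects(triples, IS_ABOUT, STAR_Y))
```

Because the links themselves are data, the same two helpers answer questions about authorship, topics, or a social network without any change to the storage format.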

  12. …and stored/processed/analyzed in the cloud • [Figure: Vision of Future Research Environment with both Software + Services. Labels: visualization and analysis services; scholarly communications; domain-specific services; search; books; citations; blogs & social networking; reference management; instant messaging; identity; mail; project management; notification; document store; storage/data services; knowledge management; compute services; virtualization; knowledge discovery] • The Microsoft Technical Computing mission to reduce time to scientific insights is exemplified by the June 13, 2007 release of a set of four free software tools designed to advance AIDS vaccine research. The code for the tools is available now via CodePlex, an online portal created by Microsoft in 2006 to foster collaborative software development projects and host shared source code. Microsoft researchers hope that the tools will help the worldwide scientific community take new strides toward an AIDS vaccine.

  13. Thank you for your attention

  14. Emergence of a New Research Paradigm? • A thousand years ago – Experimental Science • Description of natural phenomena • Last few hundred years – Theoretical Science • Newton’s Laws, Maxwell’s Equations… • Last few decades – Computational Science • Simulation of complex phenomena • Today – eScience or Data-centric Science • Unify theory, experiment, and simulation • Using data exploration and data mining • Data captured by instruments • Data generated by simulations • Data generated by sensor networks • Scientists overwhelmed with data • Computer Science and IT companies have technologies that will help (With thanks to Jim Gray)

  15. Today Scientists... • Annotate, share, discover data • Custom, standalone tools • Conferences, Journals • Publication process is long, subscriptions, discoverability issues • Collaborate on projects, exchange ideas • Email, F2F meetings, video-conferences • Use workflow tools to compose services • Domain-specific services/tools Web users... • Generate content on the Web • Blogs, wikis, podcasts, videocasts, etc. • Form communities • Social networks, virtual worlds • Interact, collaborate, share • Instant messaging, web forums, content sites • Consume information and services • Search, annotate, syndicate

  16. Data can be easily produced http://ecrystals.chem.soton.ac.uk Thanks to Jeremy Frey

  17. Data and services can be easily composed • Taverna Workflow • Compose services from the Web • SensorMap • Functionality: Map navigation • Data: sensor-generated temperature, video camera feed, traffic feeds, etc.

  18. Data is easily accessible With thanks to Catharine van Ingen

  19. Data is easily shareable • Sloan Digital Sky Survey / SkyServer • http://cas.sdss.org/dr5/en/

  20. Today… Computers are great tools for huge amounts of data • For example, Google and Microsoft both have copies of the Web for indexing purposes

  21. Tomorrow… Computers will still be great tools for huge amounts of data • We would like computers to also help with the automatic processing of the world’s information

  22. Semantic Computing

  23. What is Semantic Computing? • Set of concepts and technologies • Data modeling • Relationships • Ontologies • Machine learning (entity extraction) • Inference, reasoning • Data, information, knowledge… Current technologies Possibilities for innovation

  24. Semantics • Term used to refer to the concept of “meaning” • The linguistics, AI, Natural Language Processing, etc. communities have been working on “meaning” and “knowledge” related technologies for decades • Pragmatic approach to Semantic Computing • Emergence of a new breed of technologies to capture meaning (RDF, OWL, etc.) • Combine with the pervasiveness of the Web community technologies such as folksonomies …

  25. A word about the “Semantic Web” • The term is used to describe a set of technologies used to represent data, concepts, and their relationships • It has become a buzzword, like Web 2.0 • We prefer the term “Semantic Computing”, which is about modeling data in ways that can be automatically processed by computers

  26. Semantic Computing • Some efforts are driven by the traditional “knowledge engineering” community • Engaged in building well-controlled ontologies • Important for domain-specific vocabularies with data formats and relationships specific to a community • Model does not easily scale to the Internet • Some efforts are driven by the Web 2.0 community • Focus on the pervasiveness of Web protocols/standards • Emphasis on microformats (small, flexible, embeddable structures) • Exploit evolving and ever-expanding vocabularies such as folksonomies and tag clouds
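
To illustrate the folksonomy approach mentioned above, here is a minimal sketch of how free-form user tags can be aggregated into tag-cloud weights. The `tag_cloud` function, the `(user, resource, tag)` record layout, and the sample data are all hypothetical; the only normalization assumed is trimming whitespace and lower-casing.

```python
from collections import Counter


def tag_cloud(taggings, min_count=1):
    """Aggregate free-form tags into {tag: count} weights.

    taggings is a list of hypothetical (user, resource, tag) records.
    Tags are trimmed and lower-cased so "Astronomy" and "astronomy " merge.
    """
    counts = Counter(tag.strip().lower() for _, _, tag in taggings)
    return {tag: n for tag, n in counts.items() if n >= min_count}


# Made-up tagging activity; no controlled vocabulary is imposed on users.
taggings = [
    ("user1", "http://example.org/paper/X", "Astronomy"),
    ("user2", "http://example.org/paper/X", "astronomy"),
    ("user2", "http://example.org/paper/X", "sdss"),
    ("user3", "http://example.org/dataset/1", "astronomy "),
]

print(tag_cloud(taggings))
```

Unlike a curated ontology, the vocabulary here grows with whatever users type, which is why it scales easily but offers weaker guarantees about meaning.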
