
Tools for Repositories: Microsoft Research & the Scholarly Information Ecosystem

Tools for Repositories: Microsoft Research & the Scholarly Information Ecosystem. Lee Dirks, Alex Wade & Oscar Naim External Research Division / Microsoft Corporation. featuring Joe Townsend Unilever Centre for Molecular Informatics University of Cambridge. Agenda.


Presentation Transcript


  1. Tools for Repositories: Microsoft Research & the Scholarly Information Ecosystem Lee Dirks, Alex Wade & Oscar Naim External Research Division / Microsoft Corporation featuring Joe Townsend Unilever Centre for Molecular Informatics University of Cambridge

  2. Agenda

  3. Microsoft Research MSR Cambridge • Research lab locations: • Redmond, Washington (Sep, 1991) • San Francisco, California (Jun, 1995) • Cambridge, United Kingdom (July, 1997) • Beijing, China (Nov, 1998) • Mountain View, California (July, 2001) • Bangalore, India (Jan, 2005) • Cambridge, MA (July, 2008) MSR Asia MSR India

  4. Microsoft External Research • Division within Microsoft Research focused on partnerships between academia, industry and government to advance computer science, education, and research in fields that rely heavily upon advanced computing • Supporting groundbreaking research to help advance human potential and the wellbeing of our planet • Developing advanced technologies and services to support every stage of the research process • Microsoft External Research is committed to interoperability and to providing open access, open tools, and open technology

  5. Worldwide External Research Themes Community and Geographic Outreach Advanced Research Tools and Services

  6. Mission • Optimize and extend Microsoft software to meet the specific needs of the academic community • Our approach: • Conduct applied projects to enhance academic productivity by evolving Microsoft’s scholarly communication offerings • Microsoft External Research is uniquely positioned to drive this initiative across Microsoft

  7. business model (dual benefit)

  8. The Scholarly Communication Lifecycle Excel 2010 Windows Server HPC “Astoria” / “Pop Fly” Collaboration SharePoint LiveMeeting Office Live • Office 2010: • Word • PowerPoint • Excel • OneNote • Tablet PC/UMPC Office OpenXML XPS Format SQL Server & Entity Framework Rights Management Data Protection Manager Discoverability FAST MSR Academic Search “Bookweb” SharePoint 2010 Word 2010 + PowerPoint 2010 WPF & Silverlight “Sea Dragon” / “PhotoSynth” / “Deep Zoom”

  9. Goal: Transform Scholarly Communication • Interoperability is essential • Actively lobby and drive for consensus around technical standards and standardized protocols proactively adopted by the community; enable broad community engagement • Customers have told Microsoft that interoperability is OUR responsibility • Leverage Existing Community Protocols, Practices, Guidelines, etc. • Example – metadata conventions / taxonomies / ontologies: a traditional strength for libraries – and a critical component in enabling Web 2.0 • Optimize for data-driven research • To both data (scientific) and to information (scholarly publications) • Reproducible research + computational science • Properly document / annotate scholarly output • Data preservation (and provenance) should be baseline • Documentation of the data’s provenance • Preservation needs to be like “accessibility” features – i.e., assumed as required • Semantic knowledge discovery & social networking • Harnessing collective intelligence must be a consideration – since accessing research is a core step in the life-cycle. Enable knowledge discovery • Optimize for Web 2.0 scenarios and allow end-users/experts to find things more easily

  10. Who we work with

  11. Membership / Participation • DataCite is an international consortium to establish easier access to scientific research data on the Internet, increase acceptance of research data as legitimate, citable contributions to the scientific record, and to support data archiving that will permit results to be verified and re-purposed for future study. • The Open Planets Foundation has been established to provide practical solutions and expertise in digital preservation, building on the €15 million investment made by the European Union and Planets consortium. OPF members benefit from the Planets results, new developments and the growing OPF community that includes experts at some of the most prestigious research, technology and memory institutions in Europe. • The Confederation of Open Access Repositories (COAR) is a not-for-profit association of repository initiatives launched in October 2009. It aims to enhance greater visibility and application of research outputs through global networks of Open Access digital repositories. • The Coalition for Networked Information (CNI) is an organization dedicated to supporting the transformative promise of networked information technology for the advancement of scholarly communication and the enrichment of intellectual productivity. Membership includes some 200 institutions representing higher education, publishing, network and telecommunications, information technology, and libraries and library organizations. • ICSTI, the International Council for Scientific and Technical Information, offers a unique forum for interaction between organizations that create, disseminate and use scientific and technical information. ICSTI's mission cuts across scientific and technical disciplines, as well as international borders, to give member organizations the benefit of a truly global community. • CrossRef is a not-for-profit membership association whose mission is to enable easy identification and use of trustworthy electronic content by promoting the cooperative development and application of a sustainable infrastructure. CrossRef's general purpose is to promote the development and cooperative use of new and innovative technologies to speed and facilitate scholarly research.

  12. Membership / Participation (cont’d) Global Research Library 2020 (GRL2020) brings together researchers and stakeholders from diverse domains and countries to deliberate how the challenges ahead can best be addressed. Particular emphasis is placed on both the technical and non-technical hurdles that need addressing as instrumental in defining the steps to advance towards such an infrastructure. BioMed Central & Microsoft Research seek to recognize researchers who have published in BioMed Central’s journals and have demonstrated leadership in the sharing, standardization, publication, or re-use of biomedical research data at our Annual Research Awards.

  13. Working with publishers

  14. Outreach: Engagements + Dialogue Publishers Partner Ecosystem Societies & Related

  15. Scholarly Communications: Projects Overview Current or Completed Projects: • Cornell – arXiv.org + Word 2007 (and repository interoperability via SWORD) • MIT / Broad Institute – Authoring (Word 2007) + data for research reproducibility • MSR – CMT++ interoperability with data + metadata transfer/exchange (conference management tool enhancements) • LiveLabs – eJournal publishing online service (community publishing tool) • UC San Diego / PLoS – Semantic mark-up of scholarly articles (+ submission) • Chem4Word with Office & Cambridge University – Create add-in to Word 2007 to facilitate drawing of chemical compounds and equations • Johns Hopkins University – Digital Archive for Astronomy/Astrophysics data (storage, preservation and access) • Planets Project / EU (with MSR – Cambridge) – OpenXML and file format preservation + interoperability • eChemistry Project (Cornell, Penn State, Indiana, Cambridge, Southampton) – ORE exemplar: access to compound chemical info objects (cross-repository access to open chemistry data) • British Library – Researcher Information Centre (RIC) online workflow tool for scientists and researchers • Creative Commons Add-in for Office 2007 – evolving the Word 2003 effort • University of Southampton (UK) – Port ePrints Repository Software for installation on the Windows platform • University of Manchester / “MyExperiment” Project – social networking for scientists • ORE Acceleration Project (OAI – Object Reuse & Exchange) – Alpha spec development • UK National Archives – Virtual PC / Emulation of legacy systems to facilitate preservation • National Library of Medicine / NCBI – “PubMed Int’l” UK version of PubMed + NLM DTD • DRIVER 2 (EU) – Infrastructure integration across a network of European research repositories

  16. GenePattern Reproducible Research Add-in Services: Connects to GenePattern database Relationships: Inline graphics are synchronized to dataset Data: Control and execute query pipelines into GenePattern Data: Resulting data (and provenance) stored within Word document Source code and binary: http://GenepatternWordAddin.codeplex.com

  17. Creative Commons Add-in for Office 2007 Intent: Insert Creative Commons licenses from within Office 2007 Services: Integrates with Creative Commons Web API to create new licenses Relationships: license information stored as RDF XML within the document OOXML Source code and binary: http://ccaddin2007.codeplex.com
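The slide says license information is stored as RDF XML inside the document. A minimal Python sketch of what such a fragment could look like follows. It assumes the Creative Commons `cc:license` vocabulary; the function name `build_license_rdf` and the work URI are hypothetical, and the add-in's actual output format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Namespaces of the RDF syntax and the Creative Commons rights vocabulary.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC_NS = "http://creativecommons.org/ns#"


def build_license_rdf(work_uri: str, license_uri: str) -> str:
    """Return an RDF/XML string stating that work_uri is under license_uri.

    Hypothetical helper for illustration; not the add-in's real code.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("cc", CC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # <cc:Work rdf:about="..."> identifies the licensed document.
    work = ET.SubElement(root, f"{{{CC_NS}}}Work", {f"{{{RDF_NS}}}about": work_uri})
    # <cc:license rdf:resource="..."/> points at the chosen license.
    ET.SubElement(work, f"{{{CC_NS}}}license", {f"{{{RDF_NS}}}resource": license_uri})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "urn:example:my-article" is a placeholder identifier.
    print(build_license_rdf("urn:example:my-article",
                            "http://creativecommons.org/licenses/by/3.0/us/"))
```

Keeping the license as a URI rather than free text is what lets other tools read the rights statement without parsing prose.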

  18. Ontology Add-in for Word 2007 Services: Ontology download web service • John Wilbanks • Phil Bourne • Lynn Fink Intent: Term recognition & disambiguation Relationships: Ontology browser Source code and binary: http://research.microsoft.com/ontology/

  19. Article Authoring Add-in for Word 2007 Services: repository deposit via SWORD Structure: Read, convert, and author NLM XML documents Relationships: ORE Resource Map creation Relationships: Citation lookup and reference management Structure: Client-side XML validation Binary (version 2.0): http://research.microsoft.com/authoring/ This work is licensed under a Creative Commons Attribution 3.0 United States License.
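Repository deposit via SWORD is, at the HTTP level, a POST of a packaged file to a collection URL. The Python sketch below builds such a request without sending it. It assumes the SWORD 1.3 profile headers (`X-Packaging`, `X-No-Op`); the collection URL, file name, and helper name `build_sword_deposit` are hypothetical and do not come from the add-in.

```python
import urllib.request


def build_sword_deposit(collection_url: str, package: bytes, filename: str,
                        packaging: str, dry_run: bool = True) -> urllib.request.Request:
    """Build (but do not send) a SWORD-style deposit request.

    Illustrative only: header names follow the SWORD 1.3 profile as an
    assumption; check the target repository's service document.
    """
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        # Identifies the package format so the server knows how to unpack it.
        "X-Packaging": packaging,
    }
    if dry_run:
        # Asks the server to validate the deposit without storing it.
        headers["X-No-Op"] = "true"
    return urllib.request.Request(collection_url, data=package,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    request = build_sword_deposit(
        "http://repository.example.org/sword/collection",  # placeholder URL
        b"PK",                                              # placeholder payload
        "article.zip",
        "http://purl.org/net/sword-types/METSDSpaceSIP",    # example packaging identifier
    )
    print(request.get_method(), request.full_url)
```

A real deposit would also need authentication, and the accepted packaging formats are repository specific.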

  20. Chem4Word - Chemistry Drawing in Word Author/edit 1D and 2D chemistry. Change chemical layout styles. • Peter Murray-Rust • Joe Townsend • Jim Downing Intent: Recognizes chemical dictionary and ontology terms Relationships: Navigate and link referenced chemistry Data: Semantics stored in Chemistry Markup Language
      <?xml version="1.0" ?>
      <cml version="3" convention="org-synth-report" xmlns="http://www.xml-cml.org/schema">
        <molecule id="m1">
          <atomArray>
            <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
            <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
            <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
            <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
            <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
            <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
            <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
            <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
          </atomArray>
          <bondArray>
            <bond atomRefs2="a1 a2" order="1" />
            <bond atomRefs2="a2 a3" order="1" />
            <bond atomRefs2="a2 a4" order="2" />
            <bond atomRefs2="a1 a5" order="1" />
            <bond atomRefs2="a1 a6" order="1" />
            <bond atomRefs2="a1 a7" order="1" />
            <bond atomRefs2="a3 a8" order="1" />
          </bondArray>
        </molecule>
      </cml>
      Intelligence: Verifies validity of authored chemistry Available soon: http://research.microsoft.com/chem4word/
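Because the chemistry is stored as CML rather than as a picture, other programs can read it back. The Python sketch below parses a shortened copy of the slide's molecule (2D coordinates omitted for brevity) and derives a molecular formula in Hill order. The function name `molecular_formula` is hypothetical and the code is not part of Chem4Word.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace declared on the <cml> root element in the slide's example.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Same atoms and bonds as the slide; x2/y2 coordinates dropped for brevity.
CML_SAMPLE = """<cml version="3" convention="org-synth-report" xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" /> <atom id="a2" elementType="C" />
      <atom id="a3" elementType="O" /> <atom id="a4" elementType="O" />
      <atom id="a5" elementType="H" /> <atom id="a6" elementType="H" />
      <atom id="a7" elementType="H" /> <atom id="a8" elementType="H" />
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1" /> <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" /> <bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" /> <bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
    </bondArray>
  </molecule>
</cml>"""


def molecular_formula(cml_text: str) -> str:
    """Count atoms by element and format them in Hill order."""
    root = ET.fromstring(cml_text)
    counts = Counter(atom.get("elementType")
                     for atom in root.iterfind(".//cml:atom", CML_NS))
    if "C" in counts:
        # Hill order: carbon, then hydrogen, then the rest alphabetically.
        first = [e for e in ("C", "H") if e in counts]
        order = first + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order)


if __name__ == "__main__":
    print(molecular_formula(CML_SAMPLE))  # prints C2H4O2
```

Two carbons, four hydrogens, and two oxygens with one C=O double bond correspond to acetic acid, which is what the slide's bond list describes.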

  21. Data Curation Add-in for Microsoft Excel PROPOSED • Microsoft Research, in partnership with California Digital Library’s Curation Center • Collaboration with Trisha Cruse & John Kunze • Part of DataONE (an NSF DataNet Project) • Proposed functionality under consideration: • Support for versioning, so that revision history and the original raw data can be easily protected and recovered, • Standardized date/time stamps so that researchers can easily determine when the data were created and last updated, • A “workbook builder” allowing researchers to select from globally shared standardized layouts for capturing data, • Ability to export metadata in a standard format (e.g., a DataCite citation or an EML document that describes the dataset(s) in a workbook) so that researchers can readily share their data, • Ability to select from a globally shared vocabulary of terms for data descriptions (e.g., column names), and as needed to add new terms to the globally shared vocabulary, to enable wide collaboration between researchers, • Ability to import term descriptions from the shared vocabulary and annotate them locally to refine their definitions as used in the dataset, • “Speed bumps” to discourage use of macros and customizations that would impede interoperation of data imported from Excel into other applications, and • Ability to deposit data and metadata directly into a data archive to enable compliance with funding agency requirements to preserve and publish research data.
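Two of the proposed features, standardized date/time stamps and export of a DataCite citation, can be illustrated with a short Python sketch. It assumes ISO 8601 UTC timestamps and a "Creator (Year): Title. Publisher. Identifier" citation layout. Both function names and all sample values, including the DOI, are hypothetical; the add-in was only a proposal and its actual formats are not given in the slides.

```python
from datetime import datetime, timezone


def iso_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC stamp."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def datacite_style_citation(creators: list[str], year: int, title: str,
                            publisher: str, identifier: str) -> str:
    """Assemble a citation string in a DataCite-like layout (assumed)."""
    return f"{'; '.join(creators)} ({year}): {title}. {publisher}. {identifier}"


if __name__ == "__main__":
    created = datetime(2010, 7, 6, 9, 30, tzinfo=timezone.utc)
    print(iso_timestamp(created))
    # All values below are made up for illustration.
    print(datacite_style_citation(["Doe, J.", "Roe, R."], 2010,
                                  "Stream temperature readings",
                                  "Example Data Archive",
                                  "doi:10.1234/example"))
```

Storing stamps in UTC avoids ambiguity when workbooks move between collaborators in different time zones.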

  22. Research Information Centre (RIC) Collaborative environment for researchers Personal site for each researcher and project site for each project Federated search, co-search, annotations, tags, ratings, etc. Social networking, real-time communication, blogs, wikis Project site navigation and tool based on project lifecycle Available soon: http://research.microsoft.com/ric/

  23. Project Trident: Scientific Workflow Workbench Author, Execute and Monitor Workflows View data products, performance metrics, and provenance data Compose and modify workflows via drag & drop canvas Organize collection of individual workflow activities Available now: http://research.microsoft.com/collaboration/tools/trident.aspx

  24. oreChem – The Chemical Semantic Web • Geoffrey Fox • Carl Lagoze • Jeremy Frey • Simon Coles • Peter Murray-Rust • Jim Downing • Nico Adams • Lee Giles • Karl Mueller • Prasenjit Mitra • Demonstrating: • Large collaboration project focusing on interoperability • At-source capture of chemistry data • Chemical structure search • Compound object authoring • Retrospective harvesting of chemistry data • Reuse through common ORE data model • Semantic authoring • Virtualized triple storage [Diagram labels: Semantic storage; experiments; scientists; documents; molecules; text; data; Compound document authoring; Mash-up (re-use) of data; measurements]

  25. Zentity – a Research Output Repository Platform Native support for RSS, OAI-PMH, OAI-ORE, AtomPub and SWORD Default web UI with CSS support and custom ASP.Net controls Flexible data model enables many scenarios and can be easily extended over time A semantic computing platform to store and expose relationships between digital assets Binary (version 1.0): http://research.microsoft.com/zentity/

  26. How to engage with Microsoft External Research • Tell us more about your projects, your workflows, your issues. We’re always in “requirements gathering mode” • Email us directly at scholar@microsoft.com with questions and ideas • Especially if you are already utilizing Microsoft technologies • Download our add-ins and try them, and then give us feedback! • Let us know if we can facilitate a connection with the appropriate product group(s) • Follow announcements via our RSS feed or via our Facebook group

  27. Questions? Lee Dirks Director—Education & Scholarly Communication Microsoft External Research ldirks@microsoft.com http://research.microsoft.com/people/ldirks URL – http://www.microsoft.com/scholarlycomm/ Facebook: Scholarly Communication at Microsoft

  28. "I am pleased that Microsoft is taking innovative steps to support more open, efficient, and effective scholarly communication in the digital networked environment. For example, the free eJournal Service gives many scholarly societies a valuable new option for online publication and a way to avoid taking on high costs. The Article Authoring and Creative Commons add-ins to Word also are good news, offering capacities that could bring down production costs and allow authors to better manage their intellectual property rights." – Heather Joseph, Executive Director, SPARC (Scholarly Publishing & Academic Resources Coalition) "Partnering with members of the scholarly community, Microsoft External Research is working to facilitate the next step in the transformation of scholarly communications with networking tools built into Microsoft products. The Article Authoring add-in for Microsoft Word 2007 permits authors to produce documents directly in the format used by the NLM's PubMed Central repository, and is a significant step towards producing next-generation documents semantically tied to distributed network databases and relevant ontologies. … We look forward to further enhancements, permitting autonomous discovery of related documents, relevant materials, and other linkages, accelerating the move towards a better integrated scholarly knowledge network." – Paul Ginsparg, professor of Physics, Computing and Information Science at Cornell University (and founder of arXiv.org) “NCBI welcomes Microsoft’s decision to support NLM format XML in the Article Authoring add-in for Microsoft Word. NLM’s archival format for electronic documents has been adopted by the Library of Congress and the British Library, and directly supporting this standard in Word is an important step toward simplifying the process to archive the scientific literature. It also opens doors to new possibilities to integrate data and tools with the traditional scientific authoring process.” – James M. Ostell, Ph.D. – Chief, Information Engineering Branch, National Center for Biotechnology Information (NCBI), NLM, NIH “Technology that effectively addresses the increasing need to integrate the research lifecycle and provide a holistic end-to-end perspective has the potential to revolutionize the way academics collect data, publish findings and preserve information. Companies that work closely with academia can understand how their products might benefit the scholarly workflow and so inform their product development. Microsoft is engaged with the academic community and is releasing a series of tools aimed at streamlining the academic workflow.” – Daniel Pollock, Vice President & Lead Analyst at Outsell, Inc., a research and advisory firm specializing in the information and education industries “The Article Authoring add-in for Microsoft Word 2007 will enable scholars and scholarly publishers to use the familiar Word environment for writing, editing, and tagging scholarly articles in the industry standard NLM XML DTD. With about two million articles authored and published every year, the potential impact on this Add-In should not be underestimated.” – Ahmed Hindawi, CEO of Hindawi Publishing Corporation “There are fundamental shifts taking place in how we manage the flow of scientific knowledge, and they bring demand for new tools that expand our choices for knowledge sharing and collaboration. We’re thrilled that Microsoft has taken these important steps to meet that demand.” – John Wilbanks, Vice President of Science at Creative Commons (from http://creativecommons.org/weblog/entry/8661)

  29. Other relevant web services technologies from

  30. Microsoft Translator Query-time translation Embeddable widget Bilingual side-by-side viewer http://www.microsofttranslator.com/AddIn.aspx http://www.microsofttranslator.com/dev/ajax/

  31. Bing Translator Bilingual Viewer

  32. Now, we have the essential ingredients for real-time translation of science • National science databases in multiple languages • Federated search • Multilingual translation on both front and back end of the user experience A public-private partnership, introduced as Multilingual WorldWideScience.org Beta

  33. With the launch of Multilingual WorldWideScience.org, we are . . . • Opening vast reservoirs of heretofore under-utilized scientific knowledge • Providing equal access to science for anyone on the Internet • Promoting scientific collaboration, participation, and transparency . . . and accelerating scientific discovery!

  34. Discovery & Visualization Tools • NodeXL • Network visualization & exploration for Excel • http://nodexl.codeplex.com • Gazer • Silverlight Control for graphical network browsing • FacetLens • WPF control for faceted browsing • http://research.microsoft.com/cue/facetlens/

  35. Document Conversion Service Convert to and from Word, ODF, WordPerfect, Rich Text, and UOF View documents in various formats Compare original and converted documents http://odf-converter.sourceforge.net/

  36. Microsoft Research – Academic Research Beta http://academic.research.microsoft.com • Based out of MSR Asia (in Beijing) • Formerly known as “Libra” • Focus = Computer Science • Expanding to include www.arXiv.org (Physics) • Key functionality includes • Find top papers in a domain (20+ domains within Computer Science) • Easily search the top papers, authors, conferences, and journals for a topic • See details about a specific paper, author, conference or journal • Quickly find relationships between authors (with a visual explorer) • Get a related Bing Answer

  37. Conference Management Tool (CMT) Service for Academic Conference Management http://cmt.research.microsoft.com Supports conference workflow [Diagram labels: Bidding; Author Feedback; Camera Ready Submissions; Paper Submission; Paper Assignment; Paper Decision Making; Time; Reviewing; Author Notification; Discussions; Sessions and Presentations; Peer Reviewing; Conference Capture & Online Publishing] Includes features/functionality: • Peer-reviewing of academic conferences/workshops • Conference capture and online publishing • Interoperability with other scholarly publication services (SWORD compliant) • arXiv.org, eJournal, etc. • Web service for managing academic conference workflows • A no-cost hosted service sponsored by MSR since 1999

  38. Usage Statistics • 240+ conferences used CMT in the past 12 months • Includes large conferences such as CVPR, VLDB, ACM SIGMOD • 40K+ distinct users from 90+ different countries • ~15K papers managed
