1 / 20

EUDAT Towards a European Collaborative Data Infrastructure

EUDAT Towards a European Collaborative Data Infrastructure. Alison Kennedy and Rob Baxter Jan 2012. Outline of the talk. EUDAT concept EUDAT consortium EUDAT service approach Expected benefits and challenges of a Collaborative Data Infrastructure.

fuller
Download Presentation

EUDAT Towards a European Collaborative Data Infrastructure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EUDATTowards a European Collaborative Data Infrastructure Alison Kennedy and Rob Baxter Jan 2012

  2. Outline of the talk • EUDAT concept • EUDAT consortium • EUDAT serviceapproach • Expectedbenefits and challenges of a Collaborative Data Infrastructure

  3. EUDAT Key facts and objectives

  4. The current data infrastructurelandscape: challenges and opportunities • Long history of data management in Europe: several existing data infrastructures dealing with established and growing user communities (e.g., ESO, ESA, EBI, CERN) • New Research Infrastructures are emerging and are also trying to build data infrastructure solutions to meet their needs (CLARIN, EPOS, ELIXIR, ESS, etc.) • A large number of projects providing excellent data services (EURO-VO, GENESI-DR, Geo-Seas, HELIO, IMPACT, METAFOR, PESI, SEALS, etc.) • However, most of these infrastructures and initiatives address primarily the needs of a specific discipline and user community • Challenges • Compatibility, interoperability, and cross-disciplinary research • how to re-use and recombine data in new scientific contexts (i.e. across disciplinary domains) • Data growth in volume and complexity (the so-called “data tsunami”) • strong impact on costs threatening the sustainability of the infrastructure • Opportunities • Potential synergies do exist: although disciplines have different ambitions, they have common basic needs and service requirements that can be matched with generic pan-European services supporting multiple communities, thus ensuring at the same time greater interoperability. • Strategyneeded at pan-Europeanlevel

  5. Towards a Collaborative Data Infrastructure Source: HLEG report, p. 31 • EUDAT willfocus on building a generic data infrastructurelayeroffering a trusteddomain for long term data preservationwithservices to store, identify, authenticate and minethese data. • Thisneedbedone in closecollaborationwith the Communities • Collaboration the communities involved in designing specific services and the data centers willing to provide generic solutions (coreservicesmustmatchcommunitiesrequirements) • Communityservicescanalsobeincorporated into the common data serviceinfrastructurewhentheyare of use to othercommunities.

  6. EUDAT core services Core services arebuilding blocks ofEUDAT‘s Common Data Infrastructure mainlyincluded on bottomlayerofdataservices • Fundamental Core Services • Long-termpreservation • Persistent identifierservice • Data accessandupload • Workspaces • Web executionandworkflowservices • Single Sign On (federated AAI) • Monitoringandaccountingservices • Network services • Extended Core Services (community-supported) • Joint metadataservice • Joint dataminingservice No need to match the needs of all at the same time, addressing a group of communities can be very valuable, too

  7. The EUDAT Consortium

  8. The EUDAT Communities

  9. The EUDAT Communities

  10. The EUDAT Communities (byfield) • EUDAT targetsallscientificdisciplines(disciplineneutral): • To enable the capture and identifycross-disciplinerequirements • To involving the scientists of all the communities in the shaping of the • infrastructure and itsservices

  11. EUDAT Services Activities – Iterative Design • EUDAT’s Services activity is concerned with identification of the types of data services needed by the European research communities, delivering them through a federated data infrastructure and supporting their users • 1. Capturing Communities Requirements (WP4) • Services to be deployed must be based on user communities needs • Strong engagement and collaboration with user communities (EUDAT communities and beyond) to capture requirements • 2. Building the services (WP5) • Userrequirementsmustbematchedwithavailabletechnologies • Need to identify: • availabletechnologies and tools to develop the required services (technologyappraisal) • gaps and marketfailuresthatshouldbeaddressedby EUDAT researchactivities • Services must be designed, built and tested in a pre-production test bed environment and made available to WP4 for evaluation by their users • 3. Deploying the services and operating the federatedinfrastructure (WP6) • Services mustbedeployed on the EUDAT infrastructure and made available to users, withinterfaces for cross-site, cross-communityoperation • Infrastructure mustprovidefull life cycle data management services, ensuring the authenticity, integrity, retention and preservation of data, especiallythosemarked for long-termarchiving.

  12. Capturing Communities Requirements – First steps • 1st round of interviews with initial communities (October-December 2011) • Series of f2f and phone interviews with ENES, EPOS, CLARIN, VPH and Lifewatch • Goals: • Understand how data is organised in each community • Collect first wishes and specific requirements from a common data service layer • Outcome • Differences and commonalities found, both on data organisations (e.g. databases, PID systems, etc.) and wishes (e.g. safe/dynamic replication needs). • First shortlist of services identified • 2nd round of interviews to refine analysis (December-February 2012) • 1st EUDAT User Forum, 7-8 March 2012, Barcelona • Getting additional communities involved in the design process and discussion • Capturing communities requirements • Continuing existing work with EUDAT core communities • Expanding the analysis to the ”second circle” and other interested communities

  13. Building the services SSO Federated AAI Registries Accounting Monitoring PID Integration Replication Hosting services Up/down load Services Workspace services Workflow engines Meta data services Long Term Preservation Web execution services Policy/Rule Based services

  14. Building the services Scounting for technologies Tecnology appraisal Argus VOMS Moonshot Match technologies and identify gaps Task 5.1 Cilogon Shibboleth SAM-Nagios Propose solutions Deisa Accounting DPMDB BDII EPIC Handle Integration Adapt services to match requirements Candidate Services Replix Cloud Beehup FTS Integrate with community and SP services Task 5.2 Weblicht Hadoop EasyStore Webdav HPSS ActiveBPEL Test and evaluate with communities DMF Hcatalog iCAT dCache Many more… Productize services iRODS TSM * Existing technologies

  15. Site A Site A Site A Site A Deputy Deputy Deputy Deputy Site B Site C Site D Site A SUPPORT SUPPORT SUPPORT SUPPORT Services Services Services Services Storage Storage Storage Storage Compute Compute Compute Compute Network Network Network Network Security Security Security Security Operating the federated infrastructure (WP6) Coordination & Support Service Provisioning ResourceProvisioning QA, Security & Trust

  16. Different classes of generic services: community-specific, multi-community, domain-specific and cross-domain services

  17. Key strategicpillars for sustainability Sustainability

  18. EUDAT Timeline 1st User Forum 2nd User Forum 3rd User Forum 4th User Forum EUDAT Kick-Off SustainabilityPlan Cross- Community Services Fullcore Services deployed First Services available USER REQUIREMENTS SERVICE DESIGN SERVICE DEPLOYMENT Service deployment 2012 2013 2014 2015

  19. Expectedbenefits of a Collaborative Data Infrastructure • Cost-efficiency through shared resources and economies of scale • Better exploitation of synergies between communities and service providers • Support to existing scientific communities’ infrastructures and smaller communities • Trans-disciplinarity • Inter-disciplinary collaboration • Communities from different disciplines working together to build services • Data sharing between disciplines – re-use and re-purposing • Each discipline can solve only part of a problem • Cross-border services • Data nowadays distributed across states, countries, continents, research groups are international • User-driven infrastructure • User-centric approach in designing the services, testing and evaluation • Strategic user empowerment in governance approach • Sustainability • Ensuring wide access to and preservation of data • Greater access to existing data and better management of data for the future • Increased security by managing multiple copies in geographically distant locations • Put Europe in a competitive position for important data repositories of world-wide relevance

  20. How to join the EUDAT initiative? • EUDAT has now 25 partners coming from 13 countries • Scaling the infrastructure to other countries and partners is good (increase in complexity and richness, new solutions and practices, etc.) and is needed to build up a pan-European solution • EUDAT project team currently defining best way to integrate new partners to the initiative • User forums open to all stakeholders interested in adapting their solutions or contributing to the design of the infrastructure • Associated membership also being defined to allow external partners to follow and to contribute to the activities of the project.

More Related