
Pasquale Pagano D4Science Technical Director National Research Council, ISTI-CNR

D4Science technical features and opportunities of the grid infrastructure for large scale data management. www.d4science.eu


Presentation Transcript


  1. D4Science technical features and opportunities of the grid infrastructure for large scale data management. Pasquale Pagano, D4Science Technical Director, National Research Council, ISTI-CNR. www.d4science.eu

  2. A closer look at gCube technology • gCube architectural overview • Services, layers & specifications • The talk is not about this [Diagram labels: Presentation Services; Information Organisation Services; Information Retrieval Services; Core Services]

  3. Outline • D4Science Mission • D4Science World • E-Infrastructure • Technology • Service • D4Science Exploitation • Data Management • VREs • Summing Up D4Science technical features

  4. D4Science mission: to provide a scientific e-Infrastructure that • removes all heterogeneity, sustainability, scalability, and other technical concerns from the minds of scientists, • hides all related complexities from their perception, and • enables them to focus on their science and collaborate on common research challenges. gCube is a framework to manage e-infrastructures where it is possible to define, host, and maintain dynamic Virtual Research Environments (VREs) capable of satisfying the collaboration needs of distributed Virtual Organizations (VOs).

  5. From a testbed to a production ecosystem [Timeline: Oct. ’04, Nov. ’07, Jan. ’08, Oct. ’09, Dec. ’09, Sept. ’11]

  6. From a testbed to a production ecosystem [Timeline: Oct. ’04, Nov. ’07, Jan. ’08, Oct. ’09, Dec. ’09, Sept. ’11; chart labels: functionality, gCube, gLite]

  7. D4Science: a Threefold World

  8. Infrastructure vs. e-Infrastructure • An infrastructure is the set of basic physical and organizational structures and facilities (roads, power supplies, ...) needed for the operation of a society or enterprise • The D4Science e-Infrastructure provides support for effective consumption of shared resources: • hardware-bound resources (i.e. networks, storage, instruments, and computational resources), • system-level software resources (i.e. basic middleware services), • and application-level software resources (i.e. data sources and services).

  9. Infrastructure vs. e-Infrastructure • An infrastructure • connects remote places by providing facilities to assist supported resources and consumers • has policies • The D4Science e-Infrastructure • enables scientific communities to cooperate within a coherent model, regardless of the location of their research facilities • enforces policies

  10. D4Science e-Infrastructure (1/2) • Facilitates the life of scientists by hiding the complexity [Diagram labels: D4Science e-Infrastructure; 50 Gb; Data analysis]

  11. D4Science e-Infrastructure (2/2) • Facilitates the life of scientists by supporting collaboration [Diagram labels: D4Science e-Infrastructure; 50 Gb; 50 Gb; access; share]

  12. D4Science as e-Infrastructure: Key Features • The D4Science e-Infrastructure provides scientists with • Easy-to-use tools for infrastructural resource registration and management • Cost-effective tools for data resource registration, metadata generation, and curation • Seamless access to shared, distributed, and heterogeneous resources organized in dynamically created Virtual Research Environments

  13. e-Infrastructure Resources The D4Science managed resources are:

  14. e-Infrastructure Resources [cont.]

  15. e-Infrastructure [Diagram labels: Site A, Site B, Site C]

  16. Virtual Organization • A Virtual Organization (VO) specifies how a set of users can access a set of resources • by defining • what is shared, • who is allowed to share, • the conditions under which sharing occurs • and enforcing the authentication and authorization policies. [Diagram label: VO]
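
The three things a VO defines (what is shared, who is allowed to share, and under which conditions) can be pictured with a minimal Python sketch. This is not gCube code: the `VirtualOrganization` class, its fields, the `can_access` method, and the sample names are all hypothetical, and authentication is assumed to have happened before the check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class VirtualOrganization:
    """Hypothetical model of a VO; not part of the gCube API."""
    name: str
    shared_resources: Set[str] = field(default_factory=set)   # what is shared
    members: Set[str] = field(default_factory=set)            # who is allowed to share
    # Conditions under which sharing occurs, keyed by resource id.
    conditions: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def can_access(self, user: str, resource: str) -> bool:
        """Authorization check: membership, sharing, then the per-resource condition."""
        if user not in self.members or resource not in self.shared_resources:
            return False
        condition = self.conditions.get(resource)
        return condition(user) if condition else True


# Illustrative data only; the names are made up for this sketch.
vo = VirtualOrganization(
    name="fisheries",
    shared_resources={"catch-statistics", "species-maps"},
    members={"alice", "bob"},
    conditions={"catch-statistics": lambda user: user == "alice"},
)
```

With this data, `vo.can_access("bob", "species-maps")` succeeds because no extra condition is attached to that resource, while a non-member is refused regardless of the resource.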

  17. Virtual Research Environment (1/3) • VRE scenarios • Data needs to be assessed before being made publicly exploitable by the VO members. • A restricted set of users has to collaborate to refine processes and implement showcases. • Products generated through elaboration of data or simulation have to be validated by expert users. Is the VO adequate to represent a growing aggregation of resources tailored to satisfy the evolving needs of the user community? • NO, it is not!

  18. Virtual Research Environment (2/3) • A Virtual Research Environment (VRE) is • a distributed and dynamically created environment • where a subset of resources can be assigned to a subset of users • for a limited timeframe. VRE resources can be published in the VO at any time by the VRE data managers. [Diagram labels: VO, VRE 1, VRE 2]
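
The VRE definition above (a subset of resources, assigned to a subset of users, for a limited timeframe, with optional publication to the VO) can be sketched as follows. All names here are hypothetical and the dates are arbitrary; the sketch only illustrates the idea and does not reflect how gCube implements VREs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Set


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model of a VRE; not part of the gCube API."""
    name: str
    users: Set[str]
    resources: Set[str]
    valid_from: datetime
    valid_until: datetime

    def grants(self, user: str, resource: str, at: datetime) -> bool:
        """Access needs the right user, the right resource, and an open time window."""
        in_window = self.valid_from <= at < self.valid_until
        return in_window and user in self.users and resource in self.resources


def publish_to_vo(vre: VirtualResearchEnvironment, resource: str, vo_resources: Set[str]) -> None:
    """A VRE data manager makes one VRE resource visible to the whole VO."""
    if resource not in vre.resources:
        raise KeyError(f"{resource!r} is not a resource of VRE {vre.name!r}")
    vo_resources.add(resource)


# Illustrative data only.
vo_resources = {"maps"}
vre = VirtualResearchEnvironment(
    name="assessment",
    users={"alice", "bob"},
    resources={"raw-data"},
    valid_from=datetime(2010, 1, 1),
    valid_until=datetime(2010, 7, 1),
)
publish_to_vo(vre, "raw-data", vo_resources)
```

Keeping publication as an explicit step mirrors the first scenario on the previous slide: data stays inside the VRE until it has been assessed.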

  19. Virtual Research Environment (3/3) • A Virtual Research Environment (VRE) supports cooperative activities like • data analysis and processing; • data generation, integration, enrichment, and curation; • production of new knowledge using specialized tools

  20. Infrastructure, Virtual Organisation and VRE [Diagram labels: Infrastructure, VO, VRE]

  21. D4Science as Technology: Key Features • gCube Core (gCore) • simplifies and standardizes all systemic aspects of service development; • promotes the adoption of best practices in multiprogramming and distributed programming • gCube Enabling Services • lift the Grid approach for batch job execution and resource sharing to Web Services deployment and invocation • in an SOA-empowered e-Infrastructure

  22. gCore: innovation in development • An initiative to reduce complexity in the design and implementation of gCube services • an application framework for the consolidation of existing services and the development of new ones • the gCube Core Framework (gCF) • An initiative to meet the needs of system administrators, infrastructure managers, and resource providers • an easy-to-install, self-contained sandbox to participate in the D4Science-empowered e-Infrastructure • the gCube Core Distribution (gHN)

  23. gCube Enabling Services – IS gCube provides an Information and Monitoring System where a rich set of resources, including computing, storage, service, data, metadata, and applications, can be, independently of their type: • registered, discovered, and accessed • monitored, shared in a controlled way, and accounted. Is a simple Registry sufficient to manage a growing set of heterogeneous resources? • NO, it is not!

  24. gCube Enabling Services – IS [cont.] • gCube Information System: collects information about the capabilities and status of all resources: • Glue schema for computational and storage resources • profiles for gCube services and their running instances • profiles for content and metadata collections • Currently it manages more than 100 M operations per year, serving more than 300 web services
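
To make the idea of type-independent registration, discovery, and status monitoring concrete, here is a minimal in-memory sketch. The `ResourceProfile` and `InformationSystem` classes, their methods, and the sample profiles are assumptions made for illustration; the real gCube Information System is a distributed service with its own profile schemas and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceProfile:
    """Hypothetical profile: one record per resource, whatever its type."""
    resource_id: str
    resource_type: str                     # e.g. "service", "collection", "storage"
    properties: Dict[str, str] = field(default_factory=dict)
    status: str = "unknown"


class InformationSystem:
    """Toy registry with discovery and status updates (illustration only)."""

    def __init__(self) -> None:
        self._profiles: Dict[str, ResourceProfile] = {}

    def register(self, profile: ResourceProfile) -> None:
        self._profiles[profile.resource_id] = profile

    def update_status(self, resource_id: str, status: str) -> None:
        self._profiles[resource_id].status = status

    def discover(self, resource_type: Optional[str] = None, **properties: str) -> List[ResourceProfile]:
        """Return profiles matching the type (if given) and all requested properties."""
        matches = []
        for profile in self._profiles.values():
            if resource_type is not None and profile.resource_type != resource_type:
                continue
            if all(profile.properties.get(k) == v for k, v in properties.items()):
                matches.append(profile)
        return matches


# Illustrative profiles; identifiers and property names are made up.
info_system = InformationSystem()
info_system.register(ResourceProfile("svc-search", "service", {"site": "A"}))
info_system.register(ResourceProfile("svc-storage", "service", {"site": "B"}))
info_system.register(ResourceProfile("col-reports", "collection", {"schema": "DC"}))
info_system.update_status("svc-search", "running")
```

A call such as `info_system.discover("service", site="A")` returns only the matching service profile, while `info_system.discover()` returns every registered resource regardless of type.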

  25. gCube Information System [Diagram labels: gHN embedded; Mandatory]

  26. gCube Enabling Services – dynamic application building • gCube VRE Management System: • manages services and applications • reduces deployment costs • reduces operational costs and application porting timeframes • grants execution only to certified software. It reduces the costs related to e-Infrastructure ownership, maintenance, and upgrade without compromising the essence of secure sharing. [Diagram labels: VO, VRE 1, VRE 2]

  27. gCube VRE Management System [Diagram labels: gHN embedded; Mandatory]

  28. D4Science as a Services Provider: Key Features • gCube Service Frameworks: a tailored set of services to effectively manage all resources by providing seamless discovery, access, and retrieval of data, metadata, and annotations through a variety of tools and protocols • gCube Documentation: a tailored set of manuals to maximise the exploitation of the functionality by users, developers, and system administrators.

  29. gCube Services – powerful information model • gCube Data Management System • Persistently stores compound objects • Manages heterogeneous metadata • Supports metadata cleaning, enrichment, and transformation by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies • Supports programmatic/manual annotation of content, e.g. data provenance • Supports content linking • Provides support for collections • Supports collection sharing across VREs [Diagram labels: describe, similar to, aggregate, VRE 1, VRE 2, C 1, C 2, C 3]

  30. gCube Data Management System

  31. gCube Services – flexible IR • gCube Search Management provides an XML-based query language over full text, geospatial, and temporal information • Maximizes the usefulness of resources available to VRE users by • promoting resource sharing • avoiding suboptimal usage • Combines information retrieval and data processing capabilities [Diagram label: VRE 1]

  32. gCube Search Management • Search types • Structured data (fielded search / XML search) • Semi-structured data (XML search) • Geospatial / temporal data (R-Tree) • Content-based search • Full text search • Image similarity search • Access • XML-based Query Language • Web user interface (portal / search portlets) • Command line UI • Retrieval • Incremental result delivery • Automatic caching • Result persistence
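
Two of the retrieval features listed above, incremental result delivery and automatic caching, can be illustrated with a small generator-based sketch. The `search` function, the `CachingSearch` class, and the sample documents are hypothetical; the substring match stands in for a real full text index, and none of this is the gCube Search API.

```python
from typing import Dict, Iterator, List, Sequence


def search(documents: Sequence[str], term: str, batch_size: int = 2) -> Iterator[List[str]]:
    """Yield matches batch by batch so a client can start reading early."""
    batch: List[str] = []
    for doc in documents:
        if term.lower() in doc.lower():
            batch.append(doc)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


class CachingSearch:
    """Wraps search() and replays fully consumed result sets from a cache."""

    def __init__(self, documents: Sequence[str]) -> None:
        self._documents = documents
        self._cache: Dict[str, List[str]] = {}
        self.scans = 0  # number of times the documents were actually scanned

    def results(self, term: str, batch_size: int = 2) -> Iterator[List[str]]:
        key = term.lower()
        if key in self._cache:
            cached = self._cache[key]
            for start in range(0, len(cached), batch_size):
                yield cached[start:start + batch_size]
            return
        self.scans += 1
        collected: List[str] = []
        for batch in search(self._documents, term, batch_size):
            collected.extend(batch)
            yield batch
        # Only a result set that was read to the end is cached.
        self._cache[key] = collected


# Illustrative documents only.
docs = ["Tuna catch 2008", "Indian Ocean tuna map", "Land use report", "Tuna species profile"]
engine = CachingSearch(docs)
first_run = list(engine.results("tuna"))
second_run = list(engine.results("tuna"))
```

The second call is served from the cache, so `engine.scans` stays at 1 while both runs deliver the same batches.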

  33. gCube Services – collaboration Collaborative Environment: a workspace where users can • share • Private data • Data process results • Annotation • Process definition • Derived data • collaborate • to define new document templates, new documents • to tune applications and processes to compare execution results • … opens unique opportunities for virtual collaborations • Contains both objects owned by the workspace owner and objects the workspace owner has been allowed to see, e.g. group objects

  34. Exploiting D4Science: Data Management

  35. Data Resources Staging • Data and metadata formats, access protocols • Compound object, relationships, policies • Data, descriptive and provenance metadata • Data workflow specification

  36. Data Resources Staging I

  37. Data Resources Staging II [Diagram labels: AquaMaps IO; EEA Report IO; URI; global cover; Indian Ocean; report / part; multi media & multi part; alternative views (a 2D map and 13 Global 3D views); …; talk; Asia; data]

  38. Data Resources Staging III • AquaMaps IO • Descriptive metadata in proprietary format • Data and metadata generated by filtering and rendering relational DB data • Standard classification (Phylum – Class – Order – Family – Species) • Data provenance injected • EEA Report IO • Descriptive metadata in DC • Data and metadata generated by scavenging web pages • Data provider classification (e.g. Agriculture, Land use) • Data provenance injected

  39. Data Resources Staging IV • based on a scripting language • abstracting over the gCube powerful data model • 3 object types: {collection, resource, relationship} • Each object has a set of properties • Each object has a unique “external identifier” • equipped with common data manipulation constructs, e.g., XSLT, XPath • provided with predefined data and metadata importers • hiding infrastructure complexities • Very compact workflow specifications
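
The data model described above (three object types, each with a set of properties and a unique external identifier) can be sketched with Python dataclasses. The class names, the `StagingArea` container, and the sample identifiers are assumptions for illustration; the actual staging scripts use their own scripting language, which is not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StagedObject:
    """Common part of every object: an external identifier and properties."""
    external_id: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Resource(StagedObject):
    pass


@dataclass
class Collection(StagedObject):
    members: List[str] = field(default_factory=list)   # external ids of resources


@dataclass
class Relationship(StagedObject):
    source: str = ""   # external id of the source object
    target: str = ""   # external id of the target object


class StagingArea:
    """Hypothetical container that enforces uniqueness of external identifiers."""

    def __init__(self) -> None:
        self._objects: Dict[str, StagedObject] = {}

    def add(self, obj: StagedObject) -> StagedObject:
        if obj.external_id in self._objects:
            raise ValueError(f"duplicate external identifier: {obj.external_id}")
        self._objects[obj.external_id] = obj
        return obj

    def get(self, external_id: str) -> StagedObject:
        return self._objects[external_id]


# Illustrative objects; identifiers and property values are made up.
area = StagingArea()
area.add(Resource("report-1", {"title": "Indian Ocean report"}))
area.add(Resource("report-1/part-1", {"format": "text/html"}))
area.add(Relationship("rel-1", {"type": "is-part-of"}, source="report-1/part-1", target="report-1"))
area.add(Collection("reports", {"provider": "example"}, members=["report-1"]))
```

Because every object carries its own external identifier, relationships and collections can refer to other objects by identifier alone, which keeps a workflow specification compact.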

  40. Exploiting D4Science: The VREs

  41. AquaMaps Grid implementation of the current AquaMaps.org approach • Benefits from the computing capabilities • Adds advanced filtering • Manages integration of different data sources • Generates provenance data • 5 seconds to generate an AquaMaps object • Up to hundreds of concurrent generations • Bulk support • Still to come: a facility to compare maps

  42. FCPPS VRE Provides support for the generation of fisheries and aquaculture country reports • Uses annotations as a means for the editors to communicate on specific topics and sections • Supports aggregation of evolving data • Enriched with a rich set of metadata • Generates provenance data • HTML publishing with a variety of XSLT • OpenXML export • Text, Images, TimeSeries

  43. ICIS VRE Offers a set of tools to manage capture statistics • Supports the complete time series (TS) lifecycle • Supports validation, curation, and analysis • Provides support for data reallocation • Produces uniform data sets • Generates provenance data • Multiple key families support • Filtering, grouping, and aggregation • Union • Still to come: facilities to perform complex reallocation rules • Still to come: facilities to compare large TSs

  44. Summing up

  45. Exploitation Models A new user community can exploit gCube / D4Science • By creating a new infrastructure • Different communities can run their own infrastructure • The new community provides all resources • By joining the D4Science infrastructure • The production infrastructure currently serves two user communities (Earth Monitoring and Fisheries Management) • The new community provides part of the resources

  46. VOs & VREs building • A VRE brings together different types of resources through a well-defined, cost-effective process by offering a rich variety of functionality to access and exploit them. • The creation of the community environment is simple and easy: • A new VO can join one infrastructure in less than 1 day • A new VRE can be deployed in less than 1 hour • Many automatic deployment & configuration operations managed via the gCube Portal

  47. D4Science & the Grid • Grid is controlled sharing of computing and storage facilities • D4Science provides controlled sharing of • Computing and storage facilities • Services and applications • Data, metadata and related resources • To offer control-oriented and cross-domain content-oriented applications to store, describe, curate, annotate, search, select, merge, and transform heterogeneous information • In the landscape of an on-demand created collaborative environment (VRE)

  48. Specifications, Standards & Technologies [Diagram labels: WS-*, WSRF, X-*, WS-BPEL, JSR, Glue Schema, GSI-Security, Java, Globus Toolkit, gLite, gCube] • More exploited: • DC • ISO19* • More coming: • OAI-PMH & OAI-ORE • WS-DAI • OpenSearch • OpenGIS-related https://quality.wiki.d4science.research-infrastructures.eu/quality/index.php/Standards

  49. The gCube Technology is open source. Questions?

  50. gCube Main Links • gCube software • http://software.d4science.research-infrastructures.eu/ • gCube Administrator Guide • https://wiki.gcore.research-infrastructures.eu/gCube/index.php/Administrator_Guide • gCube User Guide • https://technical.wiki.d4science.research-infrastructures.eu/documentation/index.php/User%27s_Guide • gCube Developer Guide • https://technical.wiki.d4science.research-infrastructures.eu/documentation/index.php/Developer%27s_Guide
