
Scientific Data Infrastructure in CAS

Scientific Data Infrastructure in CAS. Dr. Jianhui Li (lijh@cnic.cn), Scientific Data Center, Computer Network Information Center, Chinese Academy of Sciences. Topics: scientific databases; scientific data infrastructure; application-enabled environments and typical applications.


Presentation Transcript


  1. Scientific Data Infrastructure in CAS. Dr. Jianhui Li (lijh@cnic.cn), Scientific Data Center, Computer Network Information Center, Chinese Academy of Sciences

  2. • Scientific databases
     • Scientific data infrastructure
     • Application-enabled environments and typical applications
     • Software and toolkits (scientific data collection, curation, and publishing; data analysis and visualization…)
     • Middleware (scientific data grid middleware, internet-based storage service middleware…)
     • Massive storage system
     • Data-intensive computing facilities
     • High-speed network

  3. DRC: Data Resource Center
     • A new organization responsible for data preservation, curation, and access services in CAS
     • Long-term preservation of important data
     • [Diagram labels: Data Resource Center; collaborator; technology service; network storage space; management system; staff; mass data; application service; data online service; mass data analysis and processing system environment; mass data backup]

  4. Infrastructure for DRC
     • High-speed network
        • 2 Gbps link with CSTNET
        • 2 Gbps link with CSTNET-CNGI
        • GLORIAD
     • Data-intensive computing facilities
        • ~1000 CPU core clusters + Scientific Computing Grid (~200 Tflops)
     • Massive storage system
        • 1 PB online disk + 5 PB tape
     • Construction of a storage network will start this year
        • 1 center + 1 archive center + 10 storage nodes around China
        • Over 20 PB

  5. Scientific Databases (SDB)
     • A long-term mission started in 1986, funded by CAS
        • Many institutes involved
        • Long-term, large-scale collaboration
        • Data from research, for research
     • Collecting multi-discipline research data and promoting data sharing
     • More than 350 research databases and 400 datasets from 61 institutes
     • Over 60 TB of data available for open access and download: http://www.csdb.cn

  6. Scientific Databases (cont.) SDB contents: Physics & Chemistry, Geosciences, Biosciences, Atmospheric & Ocean Science, Energy Science, Material Science, Astronomy & Space Science

  7. Scientific Databases (cont.)
     • Database integration
        • Resource database
        • Reference database
        • Application-oriented database
     • [Diagram labels: application-oriented database; reference database; resource database; research databases]

  8. Scientific Databases (cont.)
     • 8 resource databases
        • Geoscience
        • Biodiversity
        • Chemistry
        • Astronomy
        • Space science
        • Microbiology and virus
        • Material science
        • Environment
     • 2 reference databases
        • China Species
        • Compound
     • 4 application-oriented databases
        • High Energy (ITER)
        • Western Environment Research
        • Ecology research
        • Qinghai Lake Research

  9. CAS Scientific Data Grid
     • Based on the Scientific Data Grid Middleware (SDG)
     • SDG is built upon the Scientific Databases and supports finding and accessing large-scale, distributed, heterogeneous scientific data uniformly and conveniently, in a secure and proper way
     • Building scientific data application grids according to domain requirements
     • Integrates distributed data, analysis tools, and storage and computing facilities, providing a uniform data service interface
     • 4 pilot grids
        • Bioscience grid
        • Geoscience grid
        • Chemistry grid
        • Astronomy and space science grid
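The idea of a uniform data service interface over heterogeneous, distributed sources can be illustrated with a minimal Python sketch. This is not the SDG middleware API, which the slides do not show. The class names (`DataNode`, `TableNode`, `KeyValueNode`, `Gateway`) and the record layout are hypothetical, and the security aspect the slide mentions is not modelled.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class DataNode(ABC):
    """Hypothetical adapter around one data source (not an SDG class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def search(self, keyword: str) -> List[Dict[str, str]]:
        """Return matching records in one common shape."""


class TableNode(DataNode):
    """Source that keeps its records as rows (a list of dicts)."""

    def __init__(self, name: str, rows: List[Dict[str, str]]) -> None:
        super().__init__(name)
        self._rows = rows

    def search(self, keyword: str) -> List[Dict[str, str]]:
        kw = keyword.lower()
        return [
            {"node": self.name, "title": row["title"]}
            for row in self._rows
            if kw in row["title"].lower()
        ]


class KeyValueNode(DataNode):
    """Source that keeps its records as an id -> description mapping."""

    def __init__(self, name: str, items: Dict[str, str]) -> None:
        super().__init__(name)
        self._items = items

    def search(self, keyword: str) -> List[Dict[str, str]]:
        kw = keyword.lower()
        # Sort by id so results come back in a stable order.
        return [
            {"node": self.name, "title": desc}
            for _id, desc in sorted(self._items.items())
            if kw in desc.lower()
        ]


class Gateway:
    """Single entry point that fans a query out to every registered node."""

    def __init__(self, nodes: Iterable[DataNode]) -> None:
        self._nodes = list(nodes)

    def search(self, keyword: str) -> List[Dict[str, str]]:
        results: List[Dict[str, str]] = []
        for node in self._nodes:
            results.extend(node.search(keyword))
        return results
```

A caller would build a `Gateway` from any mix of nodes and call `search("lake")` once. Each node hides its own storage layout behind the same method, which is the property the slide describes as "uniform".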

  10. Function Framework of SDG
     • A scalable and integrated data sharing environment
     • Provides services for grid users, grid managers, and resource providers
     • Operated by the operation center, science gateways, and data nodes
     • [Diagram labels: User; Grid Manager; Resource Provider; Operation Center; Science Gateway; Data Node]

  11. Access Scientific Data Grid
     • [Diagram labels: Science gateway and access portal; application-oriented databases; reference databases; external data source; resource databases; research databases; grid middleware; software tool]

  12. VisualDB - Power your database
     • A toolkit to manage, publish, and share scientific databases through a visual configuration interface, without writing code
     • A database integration access broker
     • A data quality assessment tool
     • A database access and usage statistics tool

  13. Function Framework of VisualDB

  14. Catalog Builder

  15. Security Center

  16. Data Forge

  17. vReport

  18. Application-enabled environments and typical applications
     • Domain-specific data-intensive application environment
        • Supports one specific research area
        • Integrates scientific data, storage, computing, analysis models, and tools
        • An easy-to-use, friendly interactive interface
        • Scalable, user-defined data processing workflow
     • Typical pilot systems
        • Remote sensing data on-demand access and processing service environment
        • CFCI - China FLUX Cyber-Infrastructure
        • DarwinTree: molecular data analysis and application environment
        • Atmospheric science data integration analysis platform

  19. Atmospheric science data integration analysis platform • Status quo

  20. Atmospheric science data integration analysis platform
     • Problems
        • Atmospheric data has reached the TB level and is distributed.
        • The disk and memory of a personal computer limit the research work.
        • Many algorithms written by researchers cannot be shared easily.

  21. Architecture of the Scientific Data Analysis Online Platform
     • [Diagram labels: researcher; web browser (1. customize, 2. visualize); algorithm model; define workflow; data finding; algorithm chosen; computing for workflow; result; distributed data; combined with data and model; iterative use]

  22. Five-step iterative workflow: select data, choose algorithm, configure parameters, plot, analyse result
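The five steps can be sketched as one small Python function. This is an illustration only, not the platform's code: the algorithm registry, the single `window` parameter, and the text-based stand-in for plotting are all assumptions, since the slides name the steps but show no implementation.

```python
from typing import Callable, Dict, List, Sequence

Algorithm = Callable[[Sequence[float], int], List[float]]


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def lagged_difference(values: Sequence[float], window: int) -> List[float]:
    """Difference between each value and the one `window` steps earlier."""
    if not 1 <= window < len(values):
        raise ValueError("window must be between 1 and len(values) - 1")
    return [values[i + window] - values[i]
            for i in range(len(values) - window)]


# Hypothetical registry standing in for the platform's shared algorithms.
ALGORITHMS: Dict[str, Algorithm] = {
    "moving_average": moving_average,
    "lagged_difference": lagged_difference,
}


def text_plot(series: Sequence[float]) -> List[str]:
    """Stand-in for step 4: one row of '#' marks per value, not a real plot."""
    return [f"{v:8.2f} " + "#" * max(0, int(v)) for v in series]


def run_workflow(dataset: Sequence[float], start: int, end: int,
                 algorithm: str, window: int) -> Dict[str, object]:
    selected = list(dataset[start:end])      # 1. select data
    func = ALGORITHMS[algorithm]             # 2. choose algorithm
    series = func(selected, window)          # 3. apply the configured parameter
    plot = text_plot(series)                 # 4. plot
    summary = {                              # 5. analyse result
        "min": min(series),
        "max": max(series),
        "count": len(series),
    }
    return {"series": series, "plot": plot, "summary": summary}
```

The iterative part is the caller's loop: inspect the summary, change the selection, algorithm, or `window`, and call `run_workflow` again.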

  23. Select data

  24. Choose algorithm

  25. Config param

  26. plot and result

  27. Thank you!
