GRID, Cloud Technologies, Big Data


Presentation Transcript


  1. GRID, Cloud Technologies, Big Data. V.V. Korenkov, Director of LIT JINR, Head of the Department of Distributed Information and Computing Systems (RIVS), Dubna University

  2. Grids, clouds, supercomputers, etc.
  • Grids: collaborative environment; distributed resources (political/sociological); commodity hardware (also supercomputers); (HEP) data management; complex interfaces (a bug, not a feature)
  • Supercomputers: expensive; low-latency interconnects; peer-reviewed applications; parallel/coupled applications; traditional interfaces (login); also supercomputing grids (DEISA, TeraGrid)
  • Clouds: proprietary (implementation); economies of scale in management; commodity hardware; virtualisation for service provision and for encapsulating the application environment; details of physical resources hidden; simple interfaces (too simple?)
  • Volunteer computing: a simple mechanism to access millions of CPUs; difficult if (much) data is involved; control of the environment ✓; community building, involving people in science; potential for huge amounts of real work
  Many different problems, amenable to different solutions; there is no single right answer. (Mirco Mazzucato, Dubna, 19-12-09; Ian Bird)

  3. The "cloud computing" concept
  • Everything is a service (XaaS):
  • AaaS: applications as a service
  • PaaS: platform as a service
  • SaaS: software as a service
  • DaaS: data as a service
  • IaaS: infrastructure as a service
  • HaaS: hardware as a service
  • The realization of the long-standing dream of computing delivered as an ordinary public utility: scalability, and payment for actual use (pay-as-you-go)
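
To make the pay-as-you-go idea above concrete, here is a minimal sketch of usage-metered billing. All rates and consumption figures are invented for illustration; they do not come from any real provider's price list.

```python
# Minimal sketch of pay-as-you-go billing (hypothetical rates, not a real price list).
RATES = {
    "IaaS_vm_hour": 0.05,      # assumed price per VM-hour
    "SaaS_user_month": 8.00,   # assumed price per user per month
    "DaaS_gb_month": 0.02,     # assumed price per GB-month of stored data
}

def monthly_cost(vm_hours: float, users: int, gb_stored: float) -> float:
    """Bill only for what was actually consumed -- the essence of pay-as-you-go."""
    return (vm_hours * RATES["IaaS_vm_hour"]
            + users * RATES["SaaS_user_month"]
            + gb_stored * RATES["DaaS_gb_month"])

# 3 VMs running the whole month (720 h each), 10 users, 500 GB stored:
print(f"{monthly_cost(3 * 720, 10, 500):.2f} USD")  # 108 + 80 + 10 = 198.00
```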

  4. What is cloud computing? Centralization of IT resources; virtualization of IT resources; dynamic IT resource management; ubiquitous access to resources; automation of IT processes; simplification of IT services; standardization of IT infrastructure.

  5. Real-World Problems Taking Us BEYOND PETASCALE
  Compute requirements by application area: Aerodynamic Analysis: 1 Petaflops; Laser Optics: 10 Petaflops; Molecular Dynamics in Biology: 20 Petaflops; Aerodynamic Design: 1 Exaflops; Computational Cosmology: 10 Exaflops; Turbulence in Physics: 100 Exaflops; Computational Chemistry: 1 Zettaflops.
  Example real-world challenges: full modeling of an aircraft in all conditions; green airplanes; genetically tailored medicine; understanding the origin of the universe; synthetic fuels everywhere; accurate extreme weather prediction. What we can just model today takes <100 TFlops.
  [Chart: Top500 #1 and summed performance, extrapolated from 1 GFlops (1993) to 1 ZFlops (~2029)]
  Source: Dr. Steve Chen, "The Growing HPC Momentum in China", June 30th, 2006, Dresden, Germany

  6. Reach Exascale by 2018: from GigaFlops to ExaFlops
  ~1987: sustained GigaFlop; ~1997: sustained TeraFlop; 2008: sustained PetaFlop; ~2018: sustained ExaFlop. (Numbers are based on the Linpack benchmark; dates are approximate.)
  "The pursuit of each milestone has led to important breakthroughs in science and engineering." Source: IDC, "In Pursuit of Petascale Computing: Initiatives Around the World", 2007

  7. Top 500

  8. The Grid concept
  "A grid is a system that: coordinates the use of resources in the absence of centralized control over those resources; uses standard, open, general-purpose protocols and interfaces; and delivers high-quality service" (Ian Foster, "What is the Grid?", 2002).
  • Grid models: Distributed Computing; High-Throughput Computing; On-Demand Computing; Data-Intensive Computing; Collaborative Computing
  • The interdisciplinary character of the grid: the technologies being developed are applied in high-energy physics, cosmophysics, microbiology, ecology, meteorology, and various engineering and business applications.
  • Virtual organizations (VO)

  9. [Diagram: grid middleware links mobile access, supercomputers, PC clusters, mass storage, sensors, experiments, workstations, visualization, the Internet and networks] The grid is a means for the shared use of computing power and data storage via the Internet.

  10. Enter a New Era in Fundamental Science. The Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever built, is the most exciting turning point in particle physics. Exploration of a new energy frontier: proton-proton and heavy-ion collisions at E_CM up to 14 TeV. LHC ring: 27 km circumference. Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM, LHCf, MoEDAL. (Korea and CERN / July 2009)

  11. Large Hadron Collider: collisions of proton beams… observed in giant detectors.

  12. [Diagram: LHC experiment data flow; up to 1.25 GB/sec (ions)] (Ian.Bird@cern.ch)

  13. Tier 0 – Tier 1 – Tier 2
  • Tier-0 (CERN): data recording; initial data reconstruction; data distribution
  • Tier-1 (11 centres): permanent storage; re-processing; analysis
  • Tier-2 (>200 centres): simulation; end-user analysis
  (Ian.Bird@cern.ch)
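
The division of labour between the tiers can be read as a dispatch table. The sketch below simply restates the slide's tier responsibilities in code form; the helper function and task names are illustrative, not part of any actual WLCG middleware.

```python
# The WLCG tier model as a dispatch table (responsibilities taken from the slide;
# the routing helper itself is purely illustrative).
TIER_TASKS = {
    "Tier-0": {"data recording", "initial reconstruction", "data distribution"},
    "Tier-1": {"permanent storage", "re-processing", "analysis"},
    "Tier-2": {"simulation", "end-user analysis"},
}

def tiers_for(task: str) -> list[str]:
    """Return every tier level whose mandate includes the given task."""
    return [tier for tier, tasks in TIER_TASKS.items() if task in tasks]

print(tiers_for("simulation"))     # ['Tier-2']
print(tiers_for("re-processing"))  # ['Tier-1']
```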

  14. The Worldwide LHC Computing Grid (WLCG)

  15. GRID, as a brand-new understanding of the possibilities of computers and computer networks, foresees a global integration of information and computing resources. At the initiative of CERN, the EU DataGrid project started in January 2001 with the purpose of testing and developing advanced grid technologies; JINR was involved in this project. The LCG (LHC Computing Grid) project was a continuation of EU DataGrid. The main task of the new project was to build a global infrastructure of regional centres for processing, storing and analysing the data of the physics experiments at the Large Hadron Collider (LHC). 2003 – the Russian consortium RDIG (Russian Data Intensive Grid) was established to provide full-scale participation of JINR and Russia in the implementation of the LCG/EGEE project. 2004 – the EGEE (Enabling Grids for E-sciencE) project was started; CERN is its head organization, and JINR is one of its executors. 2010 – the EGI-InSPIRE project (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe).

  16. Normalized CPU time and jobs by country (2013)
  All countries: normalized CPU time 12,797,893,444; jobs 446,829,340
  Russia: normalized CPU time 345,722,044; jobs 12,836,045
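
The shares implied by this table follow from simple division; here is a short check using the figures copied from the slide:

```python
# Russia's share of 2013 WLCG usage, computed directly from the slide's numbers.
cpu_all, cpu_ru = 12_797_893_444, 345_722_044
jobs_all, jobs_ru = 446_829_340, 12_836_045
print(f"CPU share: {cpu_ru / cpu_all:.1%}")    # -> 2.7%
print(f"Job share: {jobs_ru / jobs_all:.1%}")  # -> 2.9%
```

So Russia contributed roughly 2.7% of the normalized CPU time and 2.9% of the jobs in 2013.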

  17. JINR Central Information and Computing Complex (CICC), the JINR-LCG2 Tier2 site. The CICC comprises 2582 cores; disk storage capacity 1800 TB; availability and reliability = 99%. ~18 million tasks were executed in 2010-2013. JINR covers 40% of the RDIG share of LHC computing. [Charts: foreseen computing resources to be allocated for the JINR CICC; RDIG normalized CPU time (HEPSPEC06) per site, 2010-2013]

  18. Multifunctional centre for data processing, analysis and storage • Tier2 level grid-infrastructure to support the experiments at LHC (ATLAS, ALICE, CMS, LHCb), FAIR (CBM, PANDA) as well as other large-scale experiments; • a distributed infrastructure for the storage, processing and analysis of experimental data from the accelerator complex NICA; • a cloud computing infrastructure; • a hybrid architecture supercomputer; • educational and research infrastructure for distributed and parallel computing.

  19. Collaboration in the area of WLCG monitoring
  • The Worldwide LHC Computing Grid (WLCG) today includes more than 170 computing centres, where more than 2 million jobs are executed daily and petabytes of data are transferred between sites.
  • Monitoring of the LHC computing activities and of the health and performance of the distributed sites and services is a vital condition for the success of LHC data processing.
  • For several years CERN (IT department) and JINR have collaborated on the development of applications for WLCG monitoring: the WLCG Transfer Dashboard; monitoring of the XRootD federations; the WLCG Google Earth Dashboard; the Tier3 monitoring toolkit. (A sketch of such an availability check follows below.)
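
A per-site availability check of the kind these dashboards aggregate can be sketched as follows. The endpoint URL and the JSON layout are hypothetical, assumed only for the sketch; this is not the real WLCG monitoring API.

```python
# Hypothetical sketch of polling per-site availability, in the spirit of the
# WLCG dashboards listed above. The URL and the response schema are invented.
import json
from urllib.request import urlopen

MONITORING_URL = "https://example.org/wlcg/availability"  # placeholder endpoint

def flag_degraded_sites(threshold: float = 0.95) -> list[str]:
    """Return the names of sites whose reported availability falls below threshold."""
    with urlopen(MONITORING_URL) as resp:
        sites = json.load(resp)  # assumed shape: [{"name": ..., "availability": ...}]
    return [s["name"] for s in sites if s["availability"] < threshold]
```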

  20. JINR distributed cloud grid-infrastructure for training and research. There is a demand for a special infrastructure that could serve as a platform for training, research, development, testing and evaluation of modern technologies in distributed computing and data management. Such an infrastructure was set up at LIT by integrating the JINR cloud and the educational grid infrastructure of sites located at the following organizations: Institute of High-Energy Physics (Protvino, Moscow region); Bogolyubov Institute for Theoretical Physics (Kiev, Ukraine); National Technical University of Ukraine "Kyiv Polytechnic Institute" (Kiev, Ukraine); L.N. Gumilyov Eurasian National University (Astana, Kazakhstan); B. Verkin Institute for Low Temperature Physics and Engineering of the National Academy of Sciences of Ukraine (Kharkov, Ukraine); Institute of Physics of the Azerbaijan National Academy of Sciences (Baku, Azerbaijan). [Diagram: scheme of the distributed cloud grid-infrastructure; main components of modern distributed computing and data management technologies]

  21. Big Data Has Arrived at an Almost Unimaginable Scale
  Business e-mails sent per year: 3000 PB; content uploaded to Facebook each year: 182 PB; ATLAS managed data volume: 130 PB; Google search index: 98 PB; ATLAS annual data volume: 30 PB; health records: 30 PB; LHC annual data: 15 PB; YouTube: 15 PB. (Also shown for comparison: Library of Congress, climate data, Nasdaq, US census.)
  Source: http://www.wired.com/magazine/2013/04/bigdata (NEC 2013)

  22. Evolving PanDA for Advanced Scientific Computing (NEC 2013)
  • The proposal titled "Next Generation Workload Management and Analysis System for BigData" – Big PanDA – started in Sep 2012 (funded by DOE).
  • Generalization of PanDA as a meta-application, providing location transparency of processing and data management, for HEP and other data-intensive sciences and a wider exascale community.
  • There are three dimensions to the evolution of PanDA: making PanDA available beyond ATLAS and high-energy physics; extending beyond the grid (leadership computing facilities, high-performance computing, Google Compute Engine (GCE), clouds, Amazon Elastic Compute Cloud (EC2), university clusters); integration of the network as a resource in workload management. (A toy sketch of location-transparent brokering follows below.)
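
"Location transparency" here means that a user submits a job without naming a resource, and a broker chooses among grid, HPC and cloud back-ends. Below is a toy sketch of that idea under stated assumptions: the resource pool and the data-locality scoring are invented and are not PanDA's actual brokering algorithm.

```python
# Toy broker illustrating location-transparent workload management in the spirit
# of Big PanDA. The resource pool and the scoring heuristic are illustrative only.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    kind: str             # "grid", "hpc", or "cloud"
    free_slots: int
    data_locality: float  # fraction of the job's input data already on-site, 0..1

def pick_resource(resources: list[Resource], needed_slots: int) -> Resource:
    """Among resources with enough free slots, pick the one needing the least data movement."""
    eligible = [r for r in resources if r.free_slots >= needed_slots]
    return max(eligible, key=lambda r: r.data_locality)

pool = [Resource("EC2", "cloud", 5000, 0.0),
        Resource("Titan", "hpc", 300, 0.2),
        Resource("ATLAS-Tier2", "grid", 800, 0.9)]
print(pick_resource(pool, 200).name)  # -> ATLAS-Tier2 (input data already local)
```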

  23. Leadership Computing Facilities: Titan. (Slide from Ken Read, Big Data Workshop)

  24. Tier1 center
  The Federal Target Programme project: "Creation of an automated system of data processing for experiments at the LHC of Tier-1 level and maintenance of Grid services for a distributed analysis of these data". Duration: 2011–2013.
  • March 2011 – proposal to create an LCG Tier1 center in Russia (an official letter by the Minister of Science and Education of Russia A. Fursenko was sent to CERN DG R. Heuer): NRC KI for ALICE, ATLAS and LHCb; LIT JINR (Dubna) for the CMS experiment.
  • September 2012 – the proposal was reviewed by the WLCG OB, and the JINR and NRC KI Tier1 sites were accepted as new "Associate Tier1" sites. Full resources in 2014, to meet the start of the next working LHC session.

  25. JINR Tier1 Connectivity Scheme

  26. JINR CMS Tier-1 progress
  • Engineering infrastructure (uninterruptible power supply and climate control);
  • High-speed reliable network infrastructure with a dedicated reserved channel to CERN (LHCOPN);
  • Computing system and storage system based on disk arrays and high-capacity tape libraries;
  • 100% reliability and availability.

  27. [Map: WLCG Tier-1 centres] CERN; US-BNL; US-FNAL; Ca-TRIUMF; UK-RAL; De-FZK; Lyon/CCIN2P3; Bologna/CNAF; Barcelona/PIC; Amsterdam/NIKHEF-SARA; NDGF; Taipei/ASGC; Russia: NRC KI and JINR

  28. Frames for Grid cooperation of JINR
  • Worldwide LHC Computing Grid (WLCG)
  • Enabling Grids for E-sciencE (EGEE) – now EGI-InSPIRE
  • RDIG development
  • Project with BNL, ANL, UTA: "Next Generation Workload Management and Analysis System for BigData"
  • Tier1 center in Russia (NRC KI, LIT JINR)
  • 6 projects at CERN
  • BMBF grant "Development of the grid-infrastructure and tools to provide joint investigations performed with participation of JINR and German research centers"
  • "Development of grid segment for the LHC experiments", supported in the frame of the JINR-South Africa cooperation agreement
  • Development of a grid segment at Cairo University and its integration into the JINR GridEdu infrastructure
  • JINR - FZU AS Czech Republic project "The grid for the physics experiments"
  • NASU-RFBR project "Development and support of LIT JINR and NSC KIPT grid-infrastructures for distributed CMS data processing of the LHC operation"
  • JINR-Romania cooperation: Hulubei-Meshcheryakov programme
  • JINR-Moldova cooperation (MD-GRID, RENAM)
  • JINR-Mongolia cooperation (Mongol-Grid)
  • Project GridNNN (National Nanotechnological Net)

  29. Summary
  • Training of highly qualified IT personnel;
  • A modern infrastructure (supercomputers, grid, cloud) for education, training and project work;
  • Centres of advanced IT technologies, on whose basis scientific schools are formed and high-level projects are carried out in broad international cooperation involving undergraduate and postgraduate students;
  • Participation in megaprojects (LHC, FAIR, NICA, PIC), within which new IT technologies (WWW, Grid, Big Data) are created;
  • International student schools and summer internships at high-technology organizations (CERN, JINR, NRC "Kurchatov Institute").
