
Cyberinfrastructure Transforming Science and Engineering Alan Blatecky Office of Cyberinfrastructure Rutgers University



Presentation Transcript


  1. Cyberinfrastructure Transforming Science and Engineering Alan Blatecky Office of Cyberinfrastructure Rutgers University February 8, 2012
  2. Framing the Challenge: Science and Society Transformed by Data Modern science Data- and compute-intensive Integrative, multiscale Multi-disciplinary Collaborations for Complexity Individuals, groups, teams, communities Sea of Data Age of Observation Distributed, central repositories, sensor-driven, diverse, etc.
  3. Explosive Growth in Size, Complexity, and Data Rates Enormous static or streaming data sets are generated by modern experiments and observations Automatic extraction of new knowledge about the physical, biological and cyber world continues to accelerate Infusion of computation into science and engineering is revolutionizing research Multi-cores, concurrent and parallel algorithms, virtualization and advanced server architectures will enable data mining and machine learning, and discovery of “Big Data” A word cloud generated from all of the content from the Dealing with Data special section. From Science (Feb 11, 2011) 331 (6018). Reprinted with permission from AAAS.
  4. Computer Architecture Trends Continuing growth in number of cores Increased use of hybrid accelerators Continued dependence on DRAM memory Advances in interconnect technologies will slow; more complex memory subsystems will be deployed Power consumption becoming ever more important because of cost and performance Application performance will be dominated by data movement Clouds and data centers will play an increasingly larger role in data and compute infrastructure
  5. Software Challenges Simulation and model scalability is a major requirement for algorithm research and development Parallel programming research is required to address order of magnitude changes in compute resources New operating systems, architectures, and file systems Research in uncertainty quantification, fault tolerance, verification and validation, complex simulation, and cybersecurity The software workforce and expertise being produced are inadequate; this is a threat to discovery and competitiveness Focus on sustainability and usability is essential
  6. Cyberinfrastructure Ecosystem (CIF21) Organizations Universities, schools Government labs, agencies Research and Medical Centers Libraries, Museums Virtual Organizations Communities Scientific Instruments Large Facilities, MREFCs, telescopes Colliders, shake tables Sensor Arrays - Ocean, environment, weather, buildings, climate, etc. Expertise Research and Scholarship Education Learning and Workforce Development Interoperability and operations Cyberscience Discovery Collaboration Education Data Databases, Data repositories Collections and Libraries Data Access; storage, navigation management, mining tools, curation, privacy Computational Resources Supercomputers Clouds, Grids, Clusters Modeling, Visualization Compute services Data Centers Networking Campus, national, international networks Research and experimental networks End-to-end throughput Cybersecurity Software Applications, middleware Software development and support Cybersecurity: access, authorization, authentication Maintainability, sustainability, and extensibility
  7. CIF21 – a metaphor A goal of Virtual Proximity: “you are one with your resources” Continue to collapse the barrier of distance and remove geographic location as an issue ALL resources (including people) are virtually present, accessible and secure End-to-end integrated resources Science, simulation, discovery, innovation, education are the metrics An organizing fabric and foundation for science, engineering and education
  8. CIF21 Principles Builds national infrastructure for S&E Leverages common methods, approaches, and applications – focus on interoperability Catalyzes other CI investments across NSF Provides focus and is a vehicle for coordinating efforts and programs Based upon a shared governance model involving every directorate and office AD Steering Committee Working group of Division Directors and Program Officers Managed as a coherent program by OCI Spiral Development methodology
  9. Data
  10. Scientific Data Challenges [Figure residue. Scale labels: Giga Bytes, Tera Bytes, Peta Bytes, Exa Bytes (bytes per day). Dimensions: Volume/Growth, Useful Lifetime, Distribution, Data Access. Data sources shown: Square Kilometer Array; Climate, Environment; Genomics; TeraGrid, Blue Waters; LHC; LSST; many smaller datasets. Years shown: 2012, 2020.]
  11. A Multi-tiered and Multi-Disciplinary Landscape Modeling and Simulation Communities Population, Climate, Environment Communities Observational & Experimental Communities Data-enabled Science Data Content Data Storage
  12. Data Intensive Science Data is becoming a major driver for all science, education, and engineering Increase in volume of simulation-based data will strain and break existing usage models Need significant investments in data analytics, tools and applications development Storage solutions and models already a critical problem New sustainability models for data stewardship need to be developed CDS&E workforce expertise is becoming more critical; from algorithm development to data creators and technicians
  13. Foundational Research in Large-Scale Data Management and Analysis Collection, Storage, and Management of “Big Data” Data representation, storage, and retrieval New parallel data architectures, including clouds Data management policies, including privacy and access Communication and storage devices with extreme capacities Sustainable economic models for access and preservation Data Analytics and Machine Learning Computational, mathematical, statistical, and algorithmic techniques for modeling high dimensional data Learning, inference, prediction, and knowledge discovery for large volumes of dynamic data sets, information infusion Data mining to enable automated hypothesis generation, event correlation, and anomaly detection Research in Data Sharing and Collaboration Tools for distant data sharing, real time visualization, and software reuse of complex data sets Cross disciplinary information and knowledge sharing Remote operation and real time access to distant data sources and instruments Sloan Digital Sky Survey telescope. Credit: FermilabPhotoz.
  14. Computing
  15. Advanced Computing Infrastructure (ACI) The continuing rapid change in computing technologies, coupled with the exponential growth and importance of data for science, engineering and education, requires a new NSF vision and strategy for Advanced Computing Infrastructure
  16. ACI Vision The National Science Foundation will be a leader in creating and deploying a comprehensive portfolio of advanced computing infrastructure, programs and other resources to facilitate cutting-edge foundational research in computational and data-enabled science and engineering (CDS&E) and its applications to all disciplines.  The NSF will also build on its leadership role to promote human capital development and education in CDS&E to benefit all fields of science and engineering.
  17. ACI Strategies Foundational research to fully exploit parallelism and concurrency through innovations in computational models and languages, mathematics and statistics, algorithms, compilers, operating and run-time systems, middleware, software tools, application frameworks, virtual machines, and advanced hardware. Applications research and development in use of high-end computing resources in partnerships with scientific domains, including new computational, mathematical and statistical modeling, simulation, visualization and analytic tools, aggressive domain-centric applications development, and deployment of scalable data management systems. Building, testing, and deploying both sustainable and innovative resources into a collaborative ecosystem that encompasses integration/coordination with campus and regional systems, networks, cloud services, and/or data centers in partnerships with scientific domains.
  18. ACI Strategies (cont'd) Development of comprehensive education and workforce programs, from deep expertise in computational, mathematical and statistical simulation, modeling, and CDS&E to developing the technical workforce and enabling career paths in science, academia, government, and industry. Development and evaluation of transformational and grand challenge community programs that support contemporary complex problem solving by engaging a comprehensive and integrated approach to science, utilizing high-end computing, data, networking, facilities, software, and multidisciplinary expertise across communities, other government agencies, and international partnerships.
  19. Evolving Dimensions and Ecology of ACI Alternative performance measures are relevant to new computing environments: Desktop computers and hand-held portable devices with powerful multi-core processors Graphics processors now used to accelerate scientific apps Clusters of multi-core computers on campuses Enormous web service datacenters “Supercomputers” All exploit parallelism in some form or other
  20. Software
  21. Software CDS&E SW as the modality for CF21 and CDS&E in the 21st Century Software is essential to every aspect of CI – “the glue” Drivers, middleware, runtime, programming systems/tools, applications, … Software crisis? Software complexity is impeding the use of Cyberinfrastructure Science apps have 10^3 to 10^6+ lines, have bugs Developed over decades – long lifecycles (~35 years) Software/systems design/engineering issues Emergent rather than by design Quality of science in question CI SW
  22. Parallelism and Concurrency are the Only Solutions for Exponential Growth in Computing Power “In the future, all software must be able to exploit multiple processors to enter into a new virtuous cycle with successive generations of parallel hardware that expands software capabilities and generates new applications.” - The Future of Computing Performance: Game Over or Next Level? 2011, the National Academy of Sciences.
  23. Computing and Computational Science Centered on Parallelism and Concurrency Programming languages to enable effective expression of parallelism and concurrency at every scale Algorithms to better exploit parallelism and concurrency Parallel architectures to achieve energy- and power-efficiency, resilient and secure systems, possibly customized for applications Techniques to map legacy apps onto parallel architectures Rethinking the canonical computing “stack” – applications, programming language, compiler, run-time systems, OS, architecture Computing education to include teaching of parallelism and concurrency
  24. Creating Scalable Software Development Environments Create a software ecosystem that scales from individual or small groups of software innovators to large hubs of software excellence Focus on innovation Focus on sustainability
  25. This is a golden time for innovation in computing architectures and software The end of the exponential growth in single processor performance marks the end of an era in computing. The next generation of discoveries requires fundamental breakthroughs at both the hardware and the software levels. With a shift of emphasis from HPC to ACI, there is a focus on the broader base of data and computation intensive science and engineering. Image Credit: Exploratorium.
  26. Applications
  27. The Single-Processor Performance Plateau is Problematic Scientists and engineers have an increasing appetite and need for speed and performance accentuated by emergence of massive data sets not just do old things faster but the ability to answer new questions (e.g., in physics, materials, biology, climate). Support of national defense and intelligence community will always demand increasingly more processing power Consumer needs and enterprise applications e.g., search and data mining, real-time decision-making, digital content creation, speech recognition, product design
  28. BIO NSF HPC resource use is dominated by computational modeling Stewart, U. Indiana: Adding high memory per node capacity to XSEDE to accommodate genome assembly Goff, U. Arizona: Scaling phylogenetics tree visualizations to 0.5M taxa
  29. BIO and Data Processing Spaulding, U. Wisconsin: Quantitative trait mapping of plant root development. Automatically captured growth images processed in parallel using Open Science Grid Kelling, Cornell: Crowd-sourced bird observations modeled against annual and seasonal variation in climate parameters. Computation performed on TACC/Ranger
  30. Cyber-infrastructure: EarthCube Goal: to transform the conduct of research in geosciences by supporting community-based cyberinfrastructure to integrate data and information for knowledge management across the Geosciences. Community: More than 600 members subscribed to EarthCube web site. More than 140 on-site participants at planning Charrette and almost 140 virtual participants. GEO-OCI Partnership
  31. A Unifying Architecture and Technology Advances Will Lead to Convergence Modes of Support Well-Connected through EarthCube Loosely or Not Connected
  32. Transient & Data-intensive Astronomy New era: seeing events as they occur (Almost) here now ALMA, EVLA in radio IceCube neutrinos On horizon 24-42m optical? LIGO south? LSST = SDSS (40TB) every night! SKA = exabytes Simulations integrate all physics Astronomy 1500-2010 was passive. No longer!
  33. Disruptions Changing role of business and industry in advanced computing, data and software Ubiquitous availability requires new workflow approaches, software and algorithms Data and new challenges in conducting science change the relationship between researcher and institution
  34. Some observations Science and Scholarship are team sports Competitiveness and success will come to those who can put together the best team, and can marshal the best resources and capabilities Collaboration/partnerships will change significantly Growth of dynamic coalitions and virtual organizations International collaboration will become even more important Ownership of data plus low cost fuels growth and number of data systems Growth in both distributed systems and local systems More people want to access more data Federation and interoperability become more important
  35. More observations Innovation and discovery will be driven by analysis Mining vast amounts of new and disparate data Collaboration and sharing of information Mobility and personal control will continue to drive innovation and research communities Gaming, virtual worlds, social networks will continue to transform the way we do science, research and education The Internet has collapsed six degrees of separation and is creating a world with two or three degrees.
  36. Challenges Embrace the new world and culture of cyberinfrastructure Make multi-disciplinarity a basis for research and education Leverage diversity as a strength social networking, crowd sourcing, sensor networks Develop grand challenge communities “long tail” efforts contribute significantly Address data challenges Entire data-life cycle
  37. Challenges (2) Take the opportunity to “leap frog” capabilities by investing smart Jump to next-generation approaches End-to-end capabilities; to campus and desktop Create models and prototypes that can scale Develop next generation CDS&E education, expertise and experience
  38. New NSF Programs CREATIV SAVI I-CORPS CIF21 BIGDATA National Data Infrastructure ACI
  39. End