
Data, Data Everywhere



Presentation Transcript


  1. Data, Data Everywhere Why We Need Broadband Connectivity By Ruzena Bajcsy

  2. Who Generates the Data? • Astronomers • Biologists • High Energy Physicists • Geophysicists • Archeologists and Anthropologists • Psychologists • Engineers • Artists

  3. Center for Information Technology Research in the Interest of Society A Year of Innovation and Accomplishment UC Santa Cruz

  4. Solving Societal-Scale Problems • Energy Conservation • Emergency Response and Homeland Defense • Transportation Efficiency

  5. Solving Societal-Scale Problems • Monitoring Health Care • Land and Environment • Education

  6. Societal-Scale Systems [Diagram: a massive cluster of clusters connected by Gigabit Ethernet (the “server”) delivers scalable, reliable, secure services to information appliances and MEMS sensors (the “clients”); properties: secure, non-stop utility; diverse components; adapts to interfaces/users; always connected]

  7. [Images dated February 2000, February 2001, August 2001, February 2002]

  8. Seismic Monitoring of Buildings: Before CITRIS • $8,000 each

  9. Seismic Monitoring of Buildings: With CITRIS • Wireless motes, $70 each

  10. Ad-hoc sensor networks work • 29 Palms Marine Base, March 2001 • 10 Motes dropped from an airplane landed, formed a wireless network, detected passing vehicles, and radioed information back • Intel Developers Forum, Aug 2001 • 800 Motes running TinyOS hidden in auditorium seats started up and formed a wireless network as participants passed them around • tinyos.millennium.berkeley.edu

  11. Recent Progress: Energy Efficiency and Smart Buildings • Arens, Culler, Pister, Orens, Rabaey, Sastry

  12. The Inelasticity of California’s Electrical Supply [Chart: power-exchange market price for electricity ($/MWh, 0–800) versus load (20,000–45,000 MW), California, Summer 2000]

  13. How to Address the Inelasticity of the Supply • Spread demand over time (or reduce peak) • Make cost of energy • visible to end-user • function of load curve (e.g. hourly pricing) • “demand-response” approach • Reduce average demand (demand side) • Eliminate wasteful consumption • Improve efficiency of equipment and appliances • Improve efficiency of generation and distribution network (supply side) Enabled by Information!
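
To make the “demand-response” idea concrete, here is a minimal sketch (not from the original slides; the tariff, hourly prices, and load profiles are all hypothetical) of how an hourly tariff exposes the cost of peak consumption compared with a flat rate, and why shifting load off-peak lowers the bill:

    # Demand-response illustration: the same daily energy costs more under
    # hourly pricing when consumption is concentrated in the peak hours,
    # so shifting load off-peak (or shaving the peak) lowers the bill.
    # All numbers below are made up for illustration.

    FLAT_RATE = 0.12  # $/kWh, hypothetical flat tariff

    # Hypothetical hourly prices ($/kWh): cheap overnight, expensive in the late-afternoon peak.
    hourly_price = [0.06] * 7 + [0.10] * 6 + [0.30] * 5 + [0.10] * 6   # 24 hourly entries

    def bill(load_kwh_per_hour):
        """Return (flat-rate cost, hourly-priced cost) for one day's load profile."""
        flat = sum(load_kwh_per_hour) * FLAT_RATE
        hourly = sum(l * p for l, p in zip(load_kwh_per_hour, hourly_price))
        return flat, hourly

    # Peaky profile: most usage lands in the 5 expensive hours.
    peaky   = [0.5] * 7 + [1.0] * 6 + [3.0] * 5 + [1.0] * 6
    # Shifted profile: same total energy, but the bulk moved to cheap overnight hours.
    shifted = [2.0] * 7 + [1.0] * 6 + [0.9] * 5 + [1.0] * 6
    assert abs(sum(peaky) - sum(shifted)) < 1e-9   # identical total consumption

    print("peaky  : flat=$%.2f  hourly=$%.2f" % bill(peaky))    # hourly cost is much higher
    print("shifted: flat=$%.2f  hourly=$%.2f" % bill(shifted))  # shifting load cuts the hourly bill

The flat-rate bill is identical for both profiles; only the time-varying price makes the peak visible to the end-user.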

  14. Energy Consumption in Buildings (US 1997) (units: quads per year; 1 quad/yr = 1.05 EJ/yr)
  End Use                     Residential   Commercial
  Space heating                   6.7           2.0
  Space cooling                   1.5           1.1
  Water heating                   2.7           0.9
  Refrigerator/Freezer            1.7           0.6
  Lighting                        1.1           3.8
  Cooking                         0.6           -
  Clothes dryers                  0.6           -
  Color TVs                       0.8           -
  Ventilation/Furnace fans        0.4           0.6
  Office equipment                 -            1.4
  Miscellaneous                   3.0           4.9
  Total                          19.0          15.2
  Source: Interlaboratory Working Group, 2000

  15. A Three-Phase Approach • Phase 1: Passive Monitoring • The availability of cheap, connected (wired or wireless) sensors makes it possible for the end-user to monitor the energy usage of buildings and individual appliances and act on it. • Primary feedback on usage • Monitor health of the system (30% inefficiency!) • Phase 2: Quasi-Active Monitoring and Control • Combining the monitoring information with instantaneous feedback on the cost of usage closes the feedback loop between end-user and supplier. • Phase 3: Active Energy Management through Feedback and Control: Smart Buildings and Intelligent Appliances • Adding instantaneous and distributed control functionality to the sensing and monitoring functions increases energy efficiency and user comfort.
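
As a rough illustration of the Phase 1 idea (a sketch under assumed data, not part of the slides): once per-appliance readings exist, even a trivial analysis, such as flagging appliances that never drop below some standby threshold, gives the end-user actionable feedback on wasted energy.

    # Phase-1-style passive monitoring sketch: given per-appliance power samples
    # (watts), estimate each appliance's standby floor, i.e. the power it draws
    # even when "idle". A persistently high floor is a candidate for waste.
    # The sample data and the 10 W threshold are hypothetical.

    readings = {                      # appliance -> recent power samples (W)
        "refrigerator": [120, 115, 2, 3, 118, 2],
        "set_top_box":  [22, 21, 23, 22, 21, 22],   # never idles below ~20 W
        "laptop":       [45, 30, 1, 1, 38, 1],
    }

    STANDBY_THRESHOLD_W = 10.0

    for name, samples in readings.items():
        floor = min(samples)          # crude estimate of standby draw
        if floor > STANDBY_THRESHOLD_W:
            print(f"{name}: standby draw ~{floor:.0f} W - possible waste")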

  16. Cory Hall Energy Monitoring Network • 50 nodes on 4th floor • 30 sec sampling • 250K samples to database over 6 weeks • Moved to Intel Lab – come play!

  17. Smart Buildings • Dense wireless network of sensor, control, and actuator nodes • Task/ambient conditioning systems allow conditioning in small, localized zones, to be individually controlled by building occupants and environmental conditions • Joint projects among BWRC/BSAC, Center for the Built Environment (CBE), IEOR, Intel Lab, LBNL

  18. Control of HVAC Systems [Diagrams: conventional overhead system vs. underfloor air distribution]

  19. Control of HVAC Systems • Underfloor systems can save energy because air near the ceiling can be allowed to get hotter • Project with CBE (Arens, Federspiel) • Need temperature sensors at different heights • Simulation results • Hot August day in Sacramento • Underfloor HVAC saves 46% of energy • Future: test in instrumented room

  20. More sensors – air velocity • Uses time of flight of sound to determine 3D air velocity • Significance • Heat transfer (energy) • Air quality • Perception of temperature
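
The time-of-flight principle behind such a sensor can be written down directly: for a transducer pair separated by distance L, sound travelling with the airflow arrives sooner than sound travelling against it, and the along-path velocity component follows from the two transit times; three orthogonal pairs give the 3D vector. The sketch below is illustrative only (the path length and timings are made up), not the actual sensor firmware:

    # Ultrasonic (time-of-flight) anemometer principle:
    #   t_ab = L / (c + v),   t_ba = L / (c - v)
    # for path length L, speed of sound c, and air velocity component v along
    # the path. Solving the pair gives v = (L / 2) * (1/t_ab - 1/t_ba),
    # independent of c (and hence of temperature). Three orthogonal transducer
    # pairs yield the full 3D velocity. Numbers below are hypothetical.

    def axis_velocity(t_ab: float, t_ba: float, path_m: float) -> float:
        """Air velocity component (m/s) along one transducer pair."""
        return 0.5 * path_m * (1.0 / t_ab - 1.0 / t_ba)

    L = 0.20  # transducer spacing in metres (hypothetical)

    # Simulated transit times (seconds) for three orthogonal axes,
    # corresponding to roughly (0.5, -0.2, 0.1) m/s at c ~ 343 m/s.
    times = {
        "x": (L / (343.0 + 0.5), L / (343.0 - 0.5)),
        "y": (L / (343.0 - 0.2), L / (343.0 + 0.2)),
        "z": (L / (343.0 + 0.1), L / (343.0 - 0.1)),
    }

    velocity = {axis: axis_velocity(t_ab, t_ba, L) for axis, (t_ab, t_ba) in times.items()}
    print(velocity)   # ~ {'x': 0.5, 'y': -0.2, 'z': 0.1}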

  21. Smart Dust Goes National • Academia:UCSD, UCLA, USC, MIT, Rutgers, Dartmouth, U. Illinois UC, NCSA, U. Virginia, U. Washington, Ohio State • Industry:Intel, Crossbow, Bosch, Accenture, Mitre, Xerox PARC, Kestrel • Government:National Center of Supercomputing, Wright Patterson AFB

  22. Why Broadband Connectivity When Memory Is So Cheap? • Because users want to interact with the data in real time • Users need to access the data at the right time and at the right place • They need to access data in the right format • They want the right amount of data

  23. Examples • Distributed computation • Cluster technology • The Berkeley Millennium Project

  24. Cluster Counts • NOW (circa 1994): 4-proc HP -> 36-proc SPARC10 -> 100-proc Ultra1 • Millennium Central Cluster (Intel Donation) • 99 Dell 2300/6400/6450 Xeon Dual/Quad: 332 processors • Total: 211GB memory, 3TB disk • Myrinet 2000 + 1000Mb fiber ethernet • OceanStore/ROC cluster, Astro cluster, Math cluster, Cory cluster, more • CITRIS Pilot Cluster: 3/2002 deployment (Intel Donation) • 4 Dell Precision 730 Itanium Duals: 8 processors • Total: 20GB memory, 128GB disk • Myrinet 2000 + 1000Mb copper ethernet

  25. Current Network

  26. CITRIS Network Rollout

  27. Network Rollout • Millennium Cluster • Keep existing Nortel 1200/1100/8600 • New Foundry FastIron 1500 • CITRIS Cluster • New Foundry FastIron 1500 • Backbone • 2 Foundry BigIron 8000 • Cost of expansion $280K (SimMillennium)

  28. Millennium Cluster Tools • Rootstock Installation • Ganglia Cluster Monitoring • gEXEC – remote execution/load balancing • Pcp – parallel copying/job staging All in production, open source, cluster community development on sourceforge.net

  29. Rootstock Installation Tool • Installation configuration stored centrally • Build local cluster specific root from central root • Install/reinstall cluster nodes from local rootstock • http://rootstock.millennium.berkeley.edu/ • Has become basis for http://rocks.npaci.edu/ cluster distribution.

  30. Ganglia Monitoring • Coherent distributed hash of cluster information • Static: cpu speed, total memory, software versions, boottime, upgradetime etc. • Dynamic: load, cpu idle, memory available, system clock, etc. • Heartbeat • Customizable with simple API for any other metric • Data is exchanged in well defined XML and XDR • Lightweight – small memory footprint and minimal communication (tunable). • Scalable – tested on several 512+ node clusters • Trusted hosts - feature allows clusters of clusters to be linked within a single monitoring and execution domain. • Ported to Linux, FreeBSD, Solaris, AIX, and IRIX, +active development by community for other ports • Dell Open Cluster Group seriously evaluating this as basis for their cluster computing tool distribution. “The only monitoring that scales over 64 nodes”
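
The slide’s “well-defined XML” exchange suggests a simple way to consume the data. Below is a minimal client sketch that pulls a cluster snapshot from a gmond daemon and extracts one metric per host; the host name is hypothetical, and the element/attribute names (HOST, METRIC, NAME, VAL, load_one) are assumptions based on gmond’s XML dump format:

    # Minimal client for the Ganglia-style XML telemetry described above.
    # By default a gmond daemon dumps its cluster state as XML to any TCP
    # client that connects (port 8649); it closes the connection when done.
    import socket
    import xml.etree.ElementTree as ET

    GMOND_HOST = "frontend.example.edu"    # hypothetical monitoring host
    GMOND_PORT = 8649                      # gmond's default XML port

    def fetch_cluster_state(host: str, port: int = GMOND_PORT) -> bytes:
        """Connect to gmond and read the XML snapshot until the daemon closes."""
        chunks = []
        with socket.create_connection((host, port), timeout=5) as sock:
            while True:
                data = sock.recv(65536)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    def load_per_host(xml_bytes: bytes) -> dict:
        """Map host name -> one-minute load average, taken from METRIC elements."""
        root = ET.fromstring(xml_bytes)
        loads = {}
        for host in root.iter("HOST"):
            for metric in host.iter("METRIC"):
                if metric.get("NAME") == "load_one":
                    loads[host.get("NAME")] = float(metric.get("VAL"))
        return loads

    if __name__ == "__main__":
        snapshot = fetch_cluster_state(GMOND_HOST)
        for name, load in sorted(load_per_host(snapshot).items()):
            print(f"{name:20s} load_one={load:.2f}")

A scheduler (or gEXEC, below) can use exactly this kind of per-host snapshot to pick the least-loaded nodes.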

  31. gEXEC – remote execution • History • Glunix from NOW • rEXEC from Millennium • gEXEC UCB/CalTech collaboration • Lightweight – minimal number of threads on frontend + fanout • Decentralized – no central point of failure • Fault tolerant – fallback ability + failure checks at runtime • Interactive – feels like a single machine • Load balanced from Ganglia Monitoring data • Scalable to at least 512 nodes. • Unix authorization plus cluster keys e.g. gexec -n 3 hostname gexec -n 0 render -in input.${VNN} -out output.${VNN}

  32. Pcp – parallel copy • Newest addition to cluster suite • Fanout copy of files/directories to nodes • Scalable • Used for job staging • Future of this tool is to wrap it up as an option into gEXEC.
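
To illustrate the fanout idea only (this is not pcp’s actual implementation; the use of ssh/scp, passwordless authentication between nodes, and the node names are assumptions): the source copies to one node, then every node that already holds the file helps push it onward, so distributing to N nodes takes roughly log2(N) rounds instead of N sequential copies.

    # Fanout (tree) file distribution sketch: each host that already has the
    # file becomes a sender in the next round, so the number of rounds grows
    # logarithmically with cluster size. Illustration only, not pcp itself.
    import subprocess

    def fanout_copy(src_path: str, dest_path: str, nodes: list) -> None:
        """Copy src_path to dest_path on every node using a binary fanout tree."""
        have = ["localhost"]          # hosts that already hold the file
        need = list(nodes)            # hosts still waiting for it
        while need:
            procs = []
            # Every current holder pushes to one waiting node per round, in parallel.
            for sender in have[:len(need)]:
                target = need.pop(0)
                if sender == "localhost":
                    cmd = ["scp", "-q", src_path, f"{target}:{dest_path}"]
                else:
                    # Assumes passwordless ssh between cluster nodes.
                    cmd = ["ssh", sender, "scp", "-q", dest_path, f"{target}:{dest_path}"]
                procs.append((target, subprocess.Popen(cmd)))
            for target, proc in procs:
                if proc.wait() == 0:
                    have.append(target)   # target can act as a sender next round
                # (failed targets are simply skipped in this sketch)

    # Hypothetical usage: stage one input file onto 16 compute nodes.
    # fanout_copy("/tmp/job-input.dat", "/tmp/job-input.dat",
    #             ["node%02d" % i for i in range(1, 17)])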

  33. Known Sites Using Ganglia Cluster Toolkit • Most popular cluster and distributed computing software on sourceforge.net • Over 7,000 downloads since its 1/2002 release • Centre National De La Recherche Scientifique http://www.in2p3.fr • SDSC http://www.sdsc.edu • IE&M http://iew3.technion.ac.il/ • GMX http://www.gmx.fr • CAS, Chemical Abstracts Service http://www.cas.org • Keldysh Institute of Applied Mathematics (Russia) http://www.kiam1.rssi.ru • LUCIE (Linux Universal Config. & Install Engine) http://matsu-www.is.titech.ac.jp/~takamiya/lucie/ • Mellanox Technologies http://www.mellanox.co.il/ • TerraSoft Solutions (PowerPC Linux) http://terraplex.com/tss_about.shtml • Intel http://www.intel.com/ • BellSouth Internet Services http://services.bellsouth.net/external/ • ArrayNetworks http://www.clickarray.com/ • MandrakeSoft http://www.mandrakesoft.com • Technische Universitat Graz http://www.TUGraz.at/ • GeoCrawler http://www.geocrawler.com/ • Cray http://www.cray.com/ • Unlimited Scale http://www.unlimitedscale.com/ • UCSF Computer Science http://cs.usfca.edu/ • RoadRunner http://www.houston.rr.com • Veritas Geophysical Integrity http://www.veritasdgc.com • Dow http://www.dow.com/ • The Max Planck Society for the Advancement of Science http://www.mpg.de • Lockheed Martin http://www.lockheedmartin.com • Duke University http://www.duke.edu • Framestore Computer Film Company http://www.framestore-cfc.com • nVidia http://www.nvidia.com/ • SAIC http://www.saic.com • Paralogic http://www.plogic.com/ • Singapore Computer Systems Limited http://www.scs.com.sg/ • Hughes Network Solutions http://www.hns.com • University of Washington, Computer Science http://www.cs.washington.edu • Experian http://www.experian.com • L'Universite de Geneva http://www.unige.ch • Purdue Physics Department http://www.physics.purdue.edu/ • Atos Origin Engineering Services http://www.aoes.nl/ • Teraport http://www.teraport.se • Daresbury Laboratory http://www.dl.ac.uk • Clinica Sierra Vista http://www.clinicasierravista.org • LondonTown http://www.londontown.com/ • National Hellenic Research Foundation http://www.eie.gr • RightNow Technologies http://www.rightnow.com/ • Idaho National Engineering and Environmental Laboratory http://www.inel.gov • WesternGeco http://www.westerngeco.com • 80/20 Software Tools http://rc.explosive.net • Optiglobe Brazil http://www.optiglobe.com.br • Brunel University http://www.brunel.ac.uk • Cinvestav Instituto Politecnico Nacional http://www.ira.cinvestav.mx • Conexant http://www.hotrail.com • Dell http://www.dell.com/ • SuSE Linux http://www.suse.de • Arabic on Linux http://www.planux.com • Delgado Community College, New Orleans http://www.dcc.edu • Boeing http://www.boeing.com • RedHat http://www.redhat.com/ • University of Pisa, Italy http://www.df.unipi.it • Ecole Normale Superieure De Lyon http://www.ens-lyon.fr • iMedium http://www.imedium.com • Moving Picture Company http://www.moving-picture.com • Professional Service Super Computers http://www.pssclabs.com • AlgoNomics http://www.algonomics.com • Ocimum Biosolutions http://www.ocimumbio.com • Caltech http://www.caltech.edu • VitalStream http://www.publichost.com • Sandia National Laboratory http://www.sandia.gov/ • UC Irvine http://www.uci.edu • Guide Corporation http://www.guidecorp.com/ • Matav http://www.matav.hu • Math Tech, Denmark http://www.math-tech.dk • Istituto Trentino Di Cultura http://www.itc.it • Compaq http://www.compaq.com/ • National Research Council Canada http://www.nrc.ca • Overture http://www.overture.com • Petroleum Geo-Services http://www.pgs.com • National Research Laboratory of the US Navy http://www.nrl.navy.mil • White Oak Technologies, Inc. http://www.woti.com/

  34. Grid computing • Working with key cluster software developers from research and industry to standardize cluster tools within the Global Grid Forum (GGF).

  35. CITRIS Cluster • Goal is to build a production level cluster environment that supports and is driven by CITRIS applications • NOW mostly experimental • Millennium ½ developmental ½ production • Clusters adopted as primary compute platform • ~800 current Millennium users • 65% average CPU utilization on Millennium cluster, many times 100% utilization • 50% of top 20 PACI users compute on Linux clusters for development and production runs.

  36. [Diagram: CITRIS cluster: 2 frontend nodes; 100 dual-Itanium compute nodes (1 TFlop, 1.6 TB memory); 10 storage nodes with 50 TB of Fibre Channel storage; Myrinet 2000 and Gigabit Ethernet interconnect through Foundry FastIron 1500 and BigIron 8000 switches to the campus core]

  37. Steve Brenner Project: Large Molecular Sequence and Structure Databases • These databases are in gigabytes • They provide web services in which low latency is important • They often work remotely • The campus 70Mbit limit is increasingly saturated, making it impossible to effectively provide services and do the work • They need tele/video conferencing over IP

  38. Background of the Brain Imaging Center at Berkeley • Campus-wide resource dedicated to Functional Magnetic Resonance Imaging (FMRI) research • Non-invasive “neuroimaging” technique used to investigate the blood-flow correlates of neural activity • BIC houses a Varian 4 Tesla scanner and a Neuroimaging Computational Facility, providing collaboration among neuroscientists, physicists, chemists, statisticians, and EE and CS scientists

  39. Current LAN • Due to the high volume of data, we established high-speed connections between computers in buildings around the campus • The LAN consists of two Cisco Catalyst 6500 switches connected by optical fiber and communicating at Gigabit Ethernet speed • Workstations are connected to the network at Fast Ethernet speed (100 Mbits/sec, full duplex)

  40. WAN Needs • Geographically distributed collaborative researchers and immense data sets make high speed networking a priority. • Collaborations exist between researchers at UCSD, UCSF, UC Davis, Stanford, Varian Inc. and NASA Ames. • With spiral imaging, we will soon be capable of generating data in excess of 1MB/s per scanner
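
A rough aside (not on the original slide): 1 MB/s is about 8 Mbit/s, so even a handful of scanners streaming concurrently would claim a noticeable share of the already-saturated 70 Mbit campus link mentioned on slide 37, before any other traffic is counted.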

  41. CITRIS Network in Smart Classroom
