Discovery Net

Discovery Net. Yike Guo, John Darlington (Dept. of Computing), John Hassard (Depts. of Physics and Bioengineering), Bob Spence (Dept. of Electrical Engineering), Tony Cass (Department of Biochemistry), Sevket Durucan (T. H. Huxley School of Environment)


Presentation Transcript


  1. Discovery Net
     • Yike Guo, John Darlington (Dept. of Computing)
     • John Hassard (Depts. of Physics and Bioengineering)
     • Bob Spence (Dept. of Electrical Engineering)
     • Tony Cass (Department of Biochemistry)
     • Sevket Durucan (T. H. Huxley School of Environment)
     • Imperial College London

  2. AIM
     • To design, develop and implement an infrastructure to support real-time processing, interaction, integration, visualisation and mining of massive amounts of time-critical data generated by high-throughput devices.

  3. The Consortium
     • Industry Connection: 4 spin-off companies + related companies (AstraZeneca, Pfizer, GSK, Cisco, IBM, HP, Fujitsu, Gene Logic, Applera, Evotec, International Power, Hydro Quebec, BP, British Energy, …)

  4. Industrial Contribution
     • Hardware: sensors (photodiode arrays, hybrid photodiodes, PMTs), systems (optics, mechanical systems, DSPs, FPGAs)
     • Software: analysis packages, algorithms, data warehousing and mining systems
     • Intellectual Property: access to IP portfolio suite at no cost
     • Data: raw and processed data from biotechnology, pharmacogenomic, remote sensing (GUSTO installations, satellite data from geo-hazard programmes) and renewable energy data (from our own remote tidal power systems)

  5. High Throughput Sensing Characteristics
     Diagram labels: Distributed Reference DBs, Distributed Users, Collaborative applications, Distributed Devices, Distributed warehousing
     • Different devices but same computational characteristics
     • Data intensive & data dispersive
     • Large scale, heterogeneous, distributed data
     • Real-time data manipulation
     • Need to calibrate, integrate, analyse
     Discovery issues: Distributed Knowledge Discovery, Management; Incremental, Interactive Discovery & Collaborative Discovery
     Information issues: annotations, semantics, reference, integrated view of data
     Data issues: different measurements for same object: Data registration, normalisation, calibration & quality control
     GRID issues: wide area, high volume, scalability (data, users), collaboration
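As an illustration of the "different measurements for same object" data issue, the following minimal sketch shows one common way to put readings from two devices on a comparable scale: per-device z-score normalisation. The slides do not say which normalisation method Discovery Net uses, so the technique, the function name `zscore`, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore(readings):
    """Rescale readings to zero mean and unit (population) standard deviation.

    Hypothetical helper: the slides name "normalisation" but not a method.
    """
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        # A constant signal carries no spread; map every reading to 0.
        return [0.0 for _ in readings]
    return [(x - mu) / sigma for x in readings]


# Made-up readings of the same object from two devices with different scales.
device_a = [10.0, 12.0, 14.0]
device_b = [100.0, 120.0, 140.0]

norm_a = zscore(device_a)
norm_b = zscore(device_b)
```

After normalisation the two series coincide, because device B's readings in this toy example are a pure rescaling of device A's. Real devices would also need registration (aligning what is measured) and calibration against a reference, which this sketch does not attempt.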

  6. DNet Architecture
     • High Throughput Sensing (HTS) Applications: Large-scale Dynamic Real-time Decision support; Large-scale Dynamic System Knowledge Discovery
     • Grid-based Knowledge Discovery (based on Kensington Discovery Platform): Grid-based Data Mining, Collaborative Visualisation
     • Information Structuring: Information Integration & Composition, Semantics & Domain-based Ontologies, Sharing
     • Distributed Data Engineering: Data Registration, Data Normalisation, Data Quality
     • High Throughput Computing Services (based on Globus & ORB Infrastructure): Utilising Grid Infrastructure for HT Computing
     • Grid Basic Infrastructure: Globus/Condor/SRB

  7. Testbed Applications
     HTS Applications: Large-scale Dynamic Real-time Decision support; Large-scale Dynamic System Knowledge Discovery
     • Renewable energy Applications: Tidal Energy; connections to other renewable initiatives (solar, biomass, fuel cells), & to CHP and baseload stations
       Throughput (GB/s): 1-10. Size (petabytes): 1-10. Node number: >20000. Operations: Structuring, Mining, Optimisation, RT decisions
     • Remote Sensing Applications: Air Sensing, GUSTO; Geological, geohazard analysis
       Throughput (GB/s): 1-100. Size (petabytes): 10-100. Node number: >50000. Operations: Image Registration, Visualisation, Predictive Modelling, RT decisions
     • Bio Chip Applications: Protein-folding chips: SNP chips, Diff. Gene chips using LFII; Protein-based fluorescent micro arrays
       Throughput (GB/s): 1-1000. Size (petabytes): 10-1000. Node number: >10000. Operations: Data Quality, Visualisation, Structuring, Clustering
     Distributed Dynamic Knowledge Management

  8. Large-scale urban air sensing applications: GUSTO
     Each GUSTO air pollution system produces 1 kbit per second, or about 10^10 bits per year. We expect to increase the number of systems from the present 2 to over 20,000 over the next 3 years, to reach a total of 0.6 petabytes of data within the 3-year ramp-up. The useful information comes from time-resolved correlations among remote stations, and with other environmental data sets.
     [Figure: NO simulant, 6.7.2001, with "You are here" marker]
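The arithmetic behind these GUSTO figures can be checked with a short sketch. It assumes 1 kbit = 1000 bits, a 365-day year, continuous operation, and decimal petabits/petabytes; the function names are illustrative and not taken from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, assuming a 365-day year


def bits_per_year(rate_bits_per_s):
    """Bits produced in one year by a single system running continuously."""
    return rate_bits_per_s * SECONDS_PER_YEAR


def fleet_bits_per_year(n_systems, rate_bits_per_s=1000.0):
    """Bits produced in one year by n_systems identical systems."""
    return n_systems * bits_per_year(rate_bits_per_s)


per_system = bits_per_year(1000.0)       # one system at 1 kbit/s
fleet = fleet_bits_per_year(20_000)      # 20,000 systems at 1 kbit/s

fleet_petabits = fleet / 1e15            # decimal petabits per year
fleet_petabytes = fleet / 8 / 1e15       # decimal petabytes per year

print(f"per system: {per_system:.3e} bits/year")
print(f"fleet: {fleet_petabits:.3f} petabits/year, {fleet_petabytes:.3f} petabytes/year")
```

Under these assumptions a single system yields roughly 3 × 10^10 bits per year, in line with the order of magnitude quoted on the slide. A fleet of 20,000 systems yields about 0.63 petabits per year, which is roughly 0.08 petabytes, so the quoted 0.6 figure appears to match the annual volume at full deployment when counted in bits rather than bytes.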

  9. Renewables characterised by
     • large number of small units
     • often in remote areas
     • wireless connectivity
     • fluctuating, unpredictable loading
     As the total exceeds 12%, grid control becomes very difficult without an RT e-grid.
     There is large potential in embedded generation renewable sources – they will dominate in new build (nuclear, hydro and carbon) power stations. Decentralised power is the new paradigm.
     Electrical grid
     • active management
     • RT monitoring
     • RT control
     • minute-to-minute security
     • pan-network optimisation
     This requires very high bandwidth RT remote station data acquisition, warehousing and analysis.

  10. The IC Advantage
      The IC infrastructure: microgrid for the testbed
      Diagram labels: End devices, Floor switches, Building Router Switches, Core Router Switches, Central Computing Facilities, workstation cluster, SMP, storage, wireless, Proposed Firewall, London MAN/JANET
      • Access to disparate off-campus sites: IC hospitals, Wye College etc.
      • More than 12,000 end devices
      • 10 Mb/s – 1 Gb/s to end devices
      • 1 Gb/s between floors
      • 10 Gb/s to backbone
      • 10 Gb/s between backbone router matrix and wireless capability
      • 2 x 1 Gb/s to LMAN II (10 Gb/s scheduled 2004)
      ICPC Resource
      • 150 Gflops processing
      • >100 GB memory
      • 5 TB of disk storage
      £3m SRIF funding
      • Network upgrade
      • +20 TB of disk storage
      • +25 TB of tape storage
      • 3 clusters (> 1 Tera Flops)
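To give a feel for how the quoted link speeds relate to the quoted storage sizes, the sketch below estimates ideal transfer times for the 5 TB disk store over three of the listed links. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits/s), full link utilisation and no protocol overhead, so real transfers would take longer; the function name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(size_bytes, link_bits_per_s):
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_s


DISK_BYTES = 5e12  # 5 TB of disk storage (decimal terabytes assumed)

links = {
    "1 Gb/s between floors": 1e9,
    "2 x 1 Gb/s to LMAN II": 2e9,
    "10 Gb/s to backbone": 10e9,
}

# Ideal transfer time in hours for each link.
hours = {name: transfer_seconds(DISK_BYTES, rate) / 3600 for name, rate in links.items()}

for name, h in hours.items():
    print(f"{name}: {h:.2f} hours")
```

Under these idealised assumptions, moving 5 TB takes about 11.1 hours at 1 Gb/s, about 5.6 hours over the 2 x 1 Gb/s link, and about 1.1 hours at 10 Gb/s.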

  11. Particle Physics and Astronomy Research Council (PPARC)
      • ASTROGRID (http://www.astrogrid.ac.uk/)
      • a ~£5M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory

  12. Particle Physics and Astronomy Research Council (PPARC)
      • GridPP (http://www.gridpp.ac.uk/)
      • to develop the Grid technologies required to meet the LHC computing challenge
      • collaboration with international grid developments in Europe and the US

  13. EPSRC Testbeds (1)
      • MyGrid: Personalised extensible environments for data-intensive in silico experiments in biology
      • Distributed Aircraft Maintenance Environment
      • RealityGrid: closely couple high performance computing, high throughput experiment and visualization

  14. EPSRC Testbeds (2)
      • GEODISE: Grid Enabled Optimisation and DesIgn Search for Engineering
      • CombiChem: Combinatorial Chemistry Structure-Property Mapping
      • Discovery Net: High Throughput Sensing
