
Tokyo Institute of Technology Commodity Grid Computing Infrastructure (and other Commodity Grid resources in Japan)



Tokyo Institute of Technology Commodity Grid Computing Infrastructure (and other Commodity Grid resources in Japan). Satoshi Matsuoka Professor, GSIC & Dept. Mathematical and Computing Sciences Tokyo Institute of Technology. “Commodity Grid” Resources Starting April, 2002 (under our control).


Presentation Transcript


Tokyo Institute of Technology Commodity Grid Computing Infrastructure (and other Commodity Grid resources in Japan)

Satoshi Matsuoka

Professor, GSIC & Dept. Mathematical and Computing Sciences Tokyo Institute of Technology


“Commodity Grid” Resources Starting April 2002 (under our control)

  • 1. My Lab Machines: ~900 CPUs, ~2 TeraFlops, IA32

  • 2. Titech (Campus) Grid Resources: ~800 CPUs, 1.3 TeraFlops, IA32 “Blades”

  • 3. “Nationwide” Commodity Grid Experiment: ~several hundred IA32 CPUs (2H2002)

  • Total: 1700~2000 processors in 2002

    • 256 Processors for APGrid Testbed


Titech GSIC Matsuoka Lab Grid Cluster Infrastructure (4Q2001)

1H2002 Total: 6 Clusters, 890 Procs, ~2 TFlops (Peak), >100 TeraBytes


Full Assistance from AMD, Donation of Athlon CPUs

Gfarm/LHC/ATLAS, Bioinformatics Apps

Presto III(1) – April 2001 – 80-node Athlon 1.33 GHz

206 Gigaflops Peak

Top500 2001/June, 439th

The first-ever AMD-Powered Cluster on the Top500

Presto III(2) – Oct. 2001 – Dual 256 proc AthlonMP 1.2 GHz

614 GigaFLOPS Peak

Top500 2001/Nov., 331.7 GigaFlops, 86th

Presto III(3) – April 2002 – Dual 512 proc AthlonMP 1900+

1.6 TeraFlops Peak, 1 TFlops Linpack

100 Terabytes (for LHC/ATLAS)

Presto III Athlon Cluster (2001-2002): 1.6 TeraFlops, 100 Terabytes
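
The peak figures quoted for Presto III(2) and Presto III(3) can be reproduced with a simple product of processor count, clock rate, and floating-point operations per cycle. The sketch below is illustrative only: the value of 2 flops per cycle per Athlon processor and the 1.6 GHz clock for the AthlonMP 1900+ are assumptions not stated on the slides, and the function name is hypothetical.

```python
# Rough peak-performance arithmetic for the Presto III clusters.
# ASSUMPTION (not on the slides): each Athlon processor retires
# 2 floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 2


def peak_gflops(processors, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GigaFlops: processors x GHz x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle


# Presto III(2): 256 AthlonMP processors at 1.2 GHz (from the slide).
presto_2_peak = peak_gflops(256, 1.2)

# Presto III(3): 512 AthlonMP 1900+ processors.
# ASSUMPTION: the 1900+ model number corresponds to a 1.6 GHz clock.
presto_3_peak = peak_gflops(512, 1.6)

# Linpack efficiency of Presto III(2): measured 331.7 GigaFlops vs. peak.
linpack_efficiency = 331.7 / presto_2_peak

print(f"Presto III(2) peak: {presto_2_peak:.1f} GFlops")
print(f"Presto III(3) peak: {presto_3_peak:.1f} GFlops")
print(f"Presto III(2) Linpack efficiency: {linpack_efficiency:.0%}")
```

Under these assumptions the computed peaks agree with the 614 GigaFlops and 1.6 TeraFlops figures above, and the November 2001 Linpack result corresponds to a bit over half of peak.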


Grid Clusters at TITECH Matsuoka Lab


Dependable, Fault-tolerant clustering for the Grid

Parakeet Fault Tolerant MPI

Fault Tolerant GridRPC

Plug&Play Clustering

Extended Parakeet

Lucie Dynamic Cluster installer

Heterogeneous Clustering

Heterogeneous Omni OpenMP

Heterogeneous High Performance Linpack (HPL)

Multiple Cluster Coupling on the Grid

Grid Projects on Clusters

Ninf-G GridRPC

SOAP/XML GridRPC prototype

GFarm – middleware for Petascale data processing

Grid Performance Benchmarking and Monitoring

Bricks Parallel Grid Simulator

JiPANG Jini-based Grid Portal

Java for Cluster and the Grid

OpenJIT Flexible high-performance JIT compiler

JavaDSM – Secure and Portable Java DSM System for Clusters

Titech Grid – Campus Grid Infrastructure

Matsuoka Lab Grid Clustering Project


2. Background for the Titech Campus Commodity Grid

  • Titech GSIC operates 3 supercomputers

    • 16 proc SX-5, 256 proc Origin2K, 64 proc AlphaServer GS320, 400GFlops total

    • All are heavily utilized (99.9% for SX-5)

    • Annual rental budget: $5 million

    • We have 4 years left on 6-year rental contract

    • All may disappear from the Top500 in Heidelberg (June, 2002)

    • We don’t have large extra money for new stuff or staff

  • Chicken-and-Egg Problem for Grid Adoption

    • Most Japanese SC centers share the same problem


Titech Campus Grid - System Image

  • Titech Grid is a large-scale, campus-wide, pilot commodity Grid deployment for next generation E-Science application development within the Campuses of Tokyo Institute of Technology (Titech)

  • High-density blade PC server systems consisting of 800 high-end PC processors installed at 13 locations throughout the Titech Campuses, interconnected via the Super TITANET backbone.

  • The first campus-wide pilot Grid system deployment in Japan, providing next-generation high-performance “virtual parallel computer” infrastructure for high-end computational E-Science.

Suzukake-dai Campus

30km

Super TITANET(1-4Gbps)

NEC Express 5800 Series Blade Servers

24-processor Satellite Systems @ each department × 12 systems

Oo-okayama Campus

Grid-wide Single System Image via Grid middleware: Globus, Ninf-G, Condor, NWS, …

GSIC Main Servers: (256 processors) × 2 systems in just 5 cabinets

Super SINET (10 Gbps MOE National Backbone Network) to other Grids

800-processor high-perf blade servers, > 1.2 TeraFlops, over 25 Terabytes storage


Titech Campus Commodity Grid (NEC) April 2002


Titech Grid Campus Sites

  • 15 installation sites amongst 2 campuses

  • 18 participating departments

    • Univ.-wide solicitation and applications thereof

    • Each department lists its own apps – Bioinfo, CFD, Nanotech, Env. Sci., etc.

Comp. Eng. C

Math. Comp.Science C

Oo-okayama (10 sites)

Suzukakedai (5 sites)


How we implement the Field of Dreams

  • Fact: Departments lack space, power, air-conditioning, maintenance expertise etc.

  • Technological solutions

    • High density, high performance blade design

      • 2x density cf. 1U rack server design

      • GSIC – 512 P3 1.4G in just 5 19inch racks

      • Department – 24 P3 1.4G in a small desk-sized unit, can be run off a wall plug (Just 2K Watts/Cluster)

      • High Operational Temperature (33 degrees Celsius)

    • Employ remote server management technologies for low-level management

      • Need to be firewall friendly

  • And of course, all the Cluster & Grid middleware

    • Globus, Condor, Sun GridEngine, NWS, MPICH-G, Cactus, Ninf-G, Gfarm, Lucie…

  • The first Titech Grid is just an operational prototype

    • Proposal for 60 TeraFlops, 1.6 Petabyte Campus Grid Federation w/Univ. Tsukuba


3. Nationwide Commodity Grid

4) Grid-Enabled, Terascale Mathematical Optimization Libraries and Apps

-Non-Convex Quadratic Optimization using SCRM

-Higher-order polynomial solving w/Homotopy meth.

-BMI optimization for control theory apps

-Parallel GA for Genome Informatics apps

2) Highly-Reliable Commodity Cluster Middleware

a) Nonintrusive FT

b) Dynamic Plug&Play

c) Heterogeneity

Cyclic Polynomial All-Solutions:

$$x_1 + x_2 + x_3 = c_1, \qquad x_1 x_2 + x_1 x_3 + x_2 x_3 = c_2, \qquad x_1 x_2 x_3 = c_3$$

3) Scalable and FT extensions for GridRPC

a) >million task parallelism

b) FT under various fault models

c) High-level and generalized GridRPC-API

Structural Optimization Probs

Protein NMR structural prediction

GridRPC

Titech

(Super)SINET

Kyoto-U

AIST

Tokushima-U

Objective: sustain 1 Teraflop for a week at 1/100 cost

1) Commodity Nationwide PC Cluster Testbed

> 1000 processors, multi-teraflops
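
For the three-variable polynomial system shown on this slide, the left-hand sides are the elementary symmetric polynomials, so by Vieta's formulas every solution (x1, x2, x3) is an ordering of the roots of t^3 - c1 t^2 + c2 t - c3 = 0. The sketch below finds all three roots at once with the Durand-Kerner iteration. Note that this is a small stand-in for illustration, not the homotopy method the slide refers to; the function name, starting points, and example constants are assumptions.

```python
def durand_kerner(coeffs, iterations=200):
    """Find all roots of a monic polynomial simultaneously.

    coeffs: coefficients, highest degree first, with coeffs[0] == 1.
    Returns a list of complex roots.
    """
    degree = len(coeffs) - 1
    # Conventional starting points: powers of a non-real, non-unit number.
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]
    for _ in range(iterations):
        for i in range(degree):
            # Evaluate the polynomial at roots[i] with Horner's rule.
            value = 0
            for c in coeffs:
                value = value * roots[i] + c
            # Product of differences to every other current estimate.
            spread = 1
            for j in range(degree):
                if j != i:
                    spread *= roots[i] - roots[j]
            roots[i] -= value / spread
    return roots


# Hypothetical constants: c1 = 6, c2 = 11, c3 = 6.
c1, c2, c3 = 6.0, 11.0, 6.0
x1, x2, x3 = durand_kerner([1.0, -c1, c2, -c3])

# Residuals of the original system; all three should be close to zero.
residuals = (
    abs(x1 + x2 + x3 - c1),
    abs(x1 * x2 + x1 * x3 + x2 * x3 - c2),
    abs(x1 * x2 * x3 - c3),
)
print(sorted(round(x.real, 6) for x in (x1, x2, x3)), residuals)
```

The all-solutions problem at larger sizes does not reduce to a single univariate polynomial this easily, which is where path-tracking methods and Grid-scale parallelism become relevant.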


Ex. TeraScale GA Optimization Challenges on the Grid

Apply to difficult problem domains where experts spend considerable time on trial and error to find an optimal solution

Example 1: Lens Design

Search for curvature, spacing, material, etc. to achieve an optimal image: takes an expert weeks, GA-based design a few minutes

NMR Protein Structure Analysis

Optimize protein structure according to the observed NMR signal (takes an expert months to a year)

Evaluating a solution is extremely costly, so it requires massive parallelization (millions of GridRPC calls)

Plan to determine structures of up to 30,000 Da class proteins, the limit of current NMR scans
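
The pattern described here, a genetic algorithm whose costly fitness evaluations are farmed out as many independent remote calls, can be sketched locally. In the minimal example below a thread pool stands in for the Grid, and each call to the hypothetical `evaluate` function plays the role of one GridRPC call. The toy fitness function, parameter values, and all names are assumptions for illustration; none of this is the Ninf-G API or the actual lens or NMR objective.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def evaluate(candidate):
    """HYPOTHETICAL stand-in for one costly remote evaluation.

    Toy objective: higher is better, maximum 0 when every gene is 0.5.
    """
    return -sum((gene - 0.5) ** 2 for gene in candidate)


def evolve(generations=30, population_size=40, genes=4, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genes)]
                  for _ in range(population_size)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        for _ in range(generations):
            # Fan out: one independent evaluation per candidate.
            scores = list(pool.map(evaluate, population))
            ranked = [c for _, c in sorted(zip(scores, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
            # Keep the better half, refill with mutated crossovers.
            parents = ranked[:population_size // 2]
            children = []
            while len(parents) + len(children) < population_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, genes)
                child = a[:cut] + b[cut:]
                i = rng.randrange(genes)
                child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.05)))
                children.append(child)
            population = parents + children
        scores = list(pool.map(evaluate, population))
    return max(zip(scores, population), key=lambda pair: pair[0])


best_score, best_candidate = evolve()
print(best_score, best_candidate)
```

Because the evaluations within a generation are independent, the fan-out step is the part that would map onto large numbers of remote calls; selection and crossover stay on the client.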


APGrid Issues

  • The everlasting Private Address Issue

    • Globus CANNOT SPAN across private addresses

      • VPN currently the only solution

    • Grid folks in US (and EU) are too ignorant

      • Private Addresses a Norm in Business Computing

      • Should really move on to IPv6 and IPSec

  • CA/RA/CP Issue

    • Membership issues as well – Do we support > 1000 users?

  • Resource Brokering

  • Stable Testbed

    • Can’t just be something that goes up temporarily

  • Systems Testbed or Applications Testbed

    • But maybe not enough resources for semi-production
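
As a small illustration of the private address issue listed above, the sketch below uses Python's standard `ipaddress` module to flag which testbed nodes sit on non-public addresses and would therefore need a VPN or similar workaround to be reached from another site. The node names and addresses are made up for the example, and the helper name is hypothetical.

```python
import ipaddress


def needs_vpn(address):
    """Return True if the address is not publicly routable."""
    return ipaddress.ip_address(address).is_private


# HYPOTHETICAL testbed nodes; names and addresses are illustrative only.
nodes = {
    "cluster-a-node01": "192.168.1.10",
    "cluster-b-node01": "10.0.0.5",
    "gateway": "8.8.8.8",
}

unreachable = sorted(name for name, addr in nodes.items() if needs_vpn(addr))
print("Nodes behind private addresses:", unreachable)
```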

