
Evolution of Glen Culler’s Architectures for Interactive Scientific Computing

David Culler. SC2000 Masterworks, Nov. 7, 2000.


Presentation Transcript


  1. Evolution of Glen Culler’s Architectures for Interactive Scientific Computing
     David Culler
     SC2000 Masterworks, Nov. 7, 2000

  2. Plan for this session
     • Evolution of GJC’s array processors
     • “Internal architecture that expresses algebra of bilinear forms”
     • Video of GJC presentation at the ACM Conference on the History of the Personal Computer
     GJC Evolution

  3. Timeline
     • 1951 RadLab
     • 1954 Ramo Wooldridge
     • 1959 UCSB
     • 1961 RW-400 Culler-Fried System
     • 1963 UCSB On-Line Computer Classroom
     • 1964 Teleputer
     • 1966 IBM 360 On-Line System
     • 1969 Culler-Harrison Inc.
     • 1970 MP32A Sonar Signal Processor
     • 1972 CHI AP120 Array Processor
     • 1974 CHI AP90B
     • 1975 FPS AP120B
     • 1976 UCLA Plasma Simulation System
     • 1979 LPCAP
     • 1981 CHI-5 General Purpose AP
     • 1982 Motorola Single Chip APU
     • 1985 Culler-7 MP AP Unix Computer Server
     • 1986 Personal Supercomputer
     • 1990 Star 910/VP Vector Workstation

  4. MP32A - 1970
     • 16-bit fixed-point processor @ 6 MHz
     • Multiple operations per microinstruction
     • 28-bit instructions
     • 2-cycle multiply
     • Parallel memories
        • 64-word scratch pad
        • 512-word fixed + 64-word writable instruction memory
        • 64 KW instruction & data memory
     • SONAR signal processing

  5. AP120: 1972-73
     • CHI Serial 1
     • DARPA acoustical research center at Moffett Field

  6. AP120 (CHI Serial 2) – 1974
     • Constructed to perform signal analysis and speech compression
     • Used for real-time digital speech transmission on the ARPANET
        • with SRI, Lincoln Lab, ISI
     • Basis for the Floating Point Systems AP120B

  7. Floating Point Systems AP120B (1975)
     • 6 MHz (167 ns), 38-bit floating point, 64-bit instructions
     • Independent floating Add (2-stage) and Mult (3-stage) – peak 12 MFLOPS
     • Memories
        • Two 32-word data pads (DX, DY) – 2 per cycle
        • 2560-word fixed table memory – 1 per cycle, 2-cycle delay
        • 64 KW data memory – ½ per cycle, 3-cycle delay
        • 512-word instruction memory
     • Two blocks of 32-word accumulators (dx, dy)
     • Address indexing & counting (SPAD & ALU)
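The 12 MFLOPS peak follows from the 6 MHz clock and the two independent pipelines. A minimal Python sketch, assuming one add and one multiply can each issue every cycle; the function names are hypothetical, and the two-accumulator dot product is a common way to keep a 2-stage adder busy, shown as an illustration rather than as documented AP120B microcode:

```python
def cycle_time_ns(clock_mhz):
    """Cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def peak_mflops(clock_mhz, flops_per_cycle):
    """Peak MFLOPS when flops_per_cycle operations issue every cycle."""
    return clock_mhz * flops_per_cycle


def dot_two_partial_sums(x, y):
    """Dot product that alternates between two running sums.

    Consecutive products go to different accumulators, so each add depends
    on the add issued two steps earlier rather than the one just before it.
    Functionally this equals an ordinary dot product.
    """
    acc = [0.0, 0.0]
    for i, (a, b) in enumerate(zip(x, y)):
        acc[i % 2] += a * b
    return acc[0] + acc[1]


print(round(cycle_time_ns(6), 1))                  # 166.7
print(peak_mflops(6, 2))                           # 12
print(dot_two_partial_sums([1, 2, 3], [4, 5, 6]))  # 32.0
```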

  8. UCLA Plasma Simulation Interactive (PSI) System - 1976
     • MP32A: scheduling and control
     • FPS AP120B: most calculation
        • 6 MHz, parallel pipelined multiply-add
     • Four CHI IOPs: data movement
        • Fixed-program microprocessors, 4-way transfer at ¼ MW/s
     • Math System Language interactive interpreter
     • 2½-D million-particle simulation: 6 MFLOPS “out of core”
     • 3D magnetohydrodynamics @ 4 MFLOPS
     • Particle-by-particle or grid-by-grid, not vectorization
     • 4x IBM 360/91 at 1/160th the cost
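Reading the last bullet as 4x the performance of an IBM 360/91 at 1/160th of its cost, the implied performance-per-dollar advantage is simple arithmetic on the slide's figures (not an additional measurement):

```latex
\frac{\mathrm{perf}_{\mathrm{PSI}} / \mathrm{cost}_{\mathrm{PSI}}}
     {\mathrm{perf}_{360/91} / \mathrm{cost}_{360/91}}
  = \frac{4}{1/160} = 640
```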

  9. LPCAP - 1979
     • 12/24-bit fixed-point speech processor
     • Statistical models of speech
     • Linear Predictive Coding
     • Very small form factor (large shoe box)
     • Used in ground-air comm
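Linear Predictive Coding fits a short all-pole predictor to each frame of speech. A minimal floating-point sketch of one standard way to compute the coefficients (autocorrelation followed by the Levinson-Durbin recursion); the function names are hypothetical, and the LPCAP’s 12/24-bit fixed-point arithmetic is not modeled:

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] from autocorrelation r.

    The predictor is x[n] ~= sum(a[j] * x[n - j] for j in 1..order).
    Returns (coefficients, final prediction error).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Usage: a decaying frame where each sample is half the previous one.
frame = [0.5 ** n for n in range(30)]
coeffs, err = levinson_durbin(autocorrelation(frame, 2), 2)
# coeffs is approximately [0.5, 0.0]
```

Working from the autocorrelation lets the recursion solve the order-p normal equations in O(p²) steps instead of a general matrix solve.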

  10. CHI-5 General-Purpose AP (1980)
     • 16/32/48-bit fixed-point speech processor with parallel memories
     • Stand-alone or hosted operation
     • Very fast macro-micro dispatch
     • Program sequencer (80-bit x 3 KW)
     • Three 16-bit adders (linked to form 32 or 48)
     • Parallel storage
        • Four accumulators
        • 16/32-bit main memory + 16 address registers
        • Two 1024x16-bit array memories
        • 32-bit ROM table memory
     • Extensive bussing
        • Host block transfer, A/D D/A 8 kHz, serial ports

  11. Motorola APU: 1982
     • 3 micron CMOS platinum silicide, 4 MHz, 100 pin
     • 16 MHz multiplexed instruction port (78-bit instr)
     • 30.5 K transistors, 296x305 mils
     • 20 16-bit data buses, 184 control lines
     • 16/32-bit fixed or floating point array and signal processor
     • Data arithmetic processor
        • 1 multiply, 3 add, 4 accum, multiplier storage
     • Array memory address controllers
        • 2D 9-point stencil matrix addressing
     • External X, Y, R busses
     • Control => micro-nets of array processors
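A 9-point stencil updates each grid point from its 3x3 neighborhood, which is the access pattern the APU’s array memory address controllers are described as supporting. A minimal Python sketch of that pattern, with a hypothetical function name and arbitrary weights; it does not model the APU’s actual addressing hardware:

```python
def nine_point_stencil(grid, weights):
    """Apply a 3x3 weight array to every interior point of a 2D grid.

    Each output point reads its 9 neighbors (including itself); boundary
    points are left at 0.0 to keep the sketch short.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    total += weights[di + 1][dj + 1] * grid[i + di][j + dj]
            out[i][j] = total
    return out


# Usage: unit weights sum the 3x3 neighborhood.
grid = [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]
ones = [[1, 1, 1] for _ in range(3)]
print(nine_point_stencil(grid, ones)[1][1])  # 36.0
```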

  12. Culler-7 (1985 – PC AT)
     • 2-16 MFLOPS Linpack @ 250K$ - 1 M$
     • Bipolar TTL
     • 1-4 Computer Processors + Kernel Proc
        • A, XY, & D machine per processor
        • Dual 64-bit data busses
        • 96-bit instructions (48 A, 48 XY)
     • Memories
        • Kernel memory (2 MB)
        • Global Data memory (5-42 MB, 32-bit VAS)
        • Program memory (256 KB real, 32 MB virtual)
        • Array memory – 4 x 16 KB

  13. Personal Supercomputer (1986)
     • ¼ Cray 1S under 100K$ (< 6K$ per MIPS)
     • PC-AT / Sun 3 days
     • 3-4 MFLOPS DP Linpack (387 does 0.02)
     • 200-bit wide instruction
     • Multiple levels of parallelism
        • Multiple processors
        • XY and A machines per processor
        • Multiple operations per instruction in each
     • Very high delivered/peak
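Taking the slide’s Linpack figures at face value, the advantage over a 387 works out as follows (arithmetic on the quoted numbers only):

```latex
\frac{3\ \mathrm{MFLOPS}}{0.02\ \mathrm{MFLOPS}} = 150,
\qquad
\frac{4\ \mathrm{MFLOPS}}{0.02\ \mathrm{MFLOPS}} = 200
```

That is, roughly 150 to 200 times the 387’s double-precision Linpack rate.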

  14. Star 910/VP (1990)
     • 40 MHz SPARC (Cypress chip set)
     • TI 8847 CMOS vector processor
        • 80 MFLOPS DP, 160 MFLOPS SP
        • Vector DMA, vector cache
        • 1.3 GB/s
     • 320 MB/s shared memory system
     • 18 MFLOPS Linpack for 200K$
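The last bullet implies a price/performance figure that can be read directly off the slide (simple division, no additional data):

```latex
\frac{\$200{,}000}{18\ \mathrm{MFLOPS}} \approx \$11{,}100 \text{ per Linpack MFLOPS}
```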
