SLAAC: Systems Level Applications of Adaptive Computing

Presentation Transcript


  1. SLAAC: Systems Level Applications of Adaptive Computing. DARPA/ITO Adaptive Computing Systems PI Meeting, Napa, California, April 13-14. Presented by: Bob Parker, Deputy Director, Information Sciences Institute

  2. System Level Applications of Adaptive Computing
  • Utilizing three phases of adaptive computing components: large current-generation FPGAs; rapid reconfigurable and/or fine-grain FPGAs; hybrid FPGAs integrating multiple constituent technologies
  • Scalable embedded baseboard: gigabit/sec networking, modular adaptive compute modules, smart network-based control software, algorithm mapping tools
  • Developing reference platforms: flight-worthy deployable system; low-cost researcher's kit
  • Milestones ('97-'01): lab demo of an ACS-implemented SAR ATR algorithm; first generation of reference platforms; embedded SAR ATR demo of ACS HW (clear, 1 Mpixel/s, 6TT); embedded SAR ATR demo (CC&D, 1 Mpixel/s, 6TT); embedded SAR ATR demo (CC&D, 10 Mpixel/s, 6TT)
  • Goal: significant reduction in power, weight, volume, and cost for several challenging DoD embedded applications (SAR ATR, sonar beamforming, IR ATR, others)
  • Team members: USC/ISI (lead), BYU, UCLA, Sandia National Labs

  3. SLAAC Objectives
  • Define a system-level, open, distributed, heterogeneous adaptive systems architecture
  • Design, develop, and evolve scalable reference platforms implementing the adaptive systems architecture
  • Validate the approach by deploying reference platforms in multiple defense application domains: SAR ATR, sonar beamforming, IR ATR, others

  4. SLAAC Affiliates
  [Diagram: challenge problem owners and their applications surrounding the SLAAC developers (ISI, BYU, UCLA, Sandia), the ACS research community, and component developers]
  • Sonar beamforming - NUWC
  • IR ATR - NVL
  • SAR/ATR - Sandia
  • Ultra wide-band coherent RF; multi-dimensional image processing - LANL
  • Electronic countermeasures - Lockheed Martin

  5. SLAAC Architecture
  [Diagram: a network links a host (control processor + network interface processor) to nodes, each pairing a control processor and network interface processor with an ACS device, DSP device, or sensor. Hardware examples shown: Myricom L5 baseboard, SLAAC1 board, Myricom L4, Orca board, UCLA board]

  6. SLAAC Programming Model
  • Single host program controls a distributed system of nodes and channels
  • System is dynamically allocated at runtime
  • Multiple hosts compete for nodes
  • Channels stream data between host and nodes
  [Diagram: host connected through the network to nodes 1, 2, 3]

  7. Runtime System
  [Stack: Application / Runtime System / Messaging Layer / Network Layer]
  • System Layer - high-level programming interface (e.g., ACS_Create_System(Node_list, Channel_list))
  • Node Layer - hides device-specific information (e.g., FPGA configuration)
  • Control Layer - node-independent communication commands (i.e., blocking and non-blocking message-passing primitives)
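To make the system-layer programming model concrete, here is a minimal host-program sketch in C. Only ACS_Create_System appears on the slide; the handle types, the exact signature, the channel-write call, and the teardown call are hypothetical stand-ins for illustration, not the actual SLAAC API.

```c
/* Sketch of a host program against the SLAAC system layer.
 * ACS_Create_System comes from the slide; everything else here
 * (handle types, ACS_Write_Channel, ACS_Destroy_System, the guessed
 * signature) is a hypothetical stand-in for illustration only. */
#include <stddef.h>

typedef int ACS_System;                  /* hypothetical handle types */
typedef struct { int id; } ACS_Node;
typedef struct { int src, dst; } ACS_Channel;

/* From the slide: allocate a system of nodes and channels at runtime. */
extern ACS_System ACS_Create_System(ACS_Node *nodes, size_t n_nodes,
                                    ACS_Channel *chans, size_t n_chans);
/* Hypothetical: stream a buffer into a channel; release the system. */
extern int  ACS_Write_Channel(ACS_System s, int chan,
                              const void *buf, size_t len);
extern void ACS_Destroy_System(ACS_System s);

int main(void)
{
    /* Three nodes in a pipeline: host -> 0 -> 1 -> 2 -> host. */
    ACS_Node    nodes[3] = { {0}, {1}, {2} };
    ACS_Channel chans[4] = { {-1, 0}, {0, 1}, {1, 2}, {2, -1} }; /* -1 = host */

    ACS_System sys = ACS_Create_System(nodes, 3, chans, 4);

    unsigned char frame[1024] = {0};
    ACS_Write_Channel(sys, 0, frame, sizeof frame);  /* push data downstream */

    ACS_Destroy_System(sys);
    return 0;
}
```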

  8. Remote Node Processing Alternatives
  [Diagram: two placements of the application/runtime/messaging stack - on the host (the node carries only the FPGA) versus on the remote node (a node-resident runtime sits next to the FPGA)]
  • Runtime on the host: less power required from the compute node
  • Runtime on the node: less latency between application and low-level control

  9. Runtime and Debugging
  • Interactive debugger: all system-layer C functions provided in a command-line interface; symbolic VHDL debugging support using readback; single-step clock; scriptable
  • SLAAC runtime: monitor system state; hardware diagnostics
  • Other tools: network traffic monitors (MPI based?); load balancing; visualization tools

  10. Runtime Status
  • Complete: System Layer API specification; Control Layer API specification (partially simulated)
  • Scheduled: May - VHDL simulation of SLAAC board; June - implementation of basic runtime system functions

  11. Development Platform Path
  [Diagram: the SLAAC runtime system (System/Node/Control layers) carried across three platform generations]
  • Low-cost COTS development platform: SBC with Myrinet PMC card and SLAAC1 PMC board
  • Improved compute density: SBCs with external Myrinet network and SLAAC double-wide PMC cards
  • Fully embedded platform: L5 baseboard with SBCs on an embedded P0/Myrinet network - improved development environment

  12. Hardware Platforms and Software Development
  [Diagram: host/node software stacks, trading performance against risk - application and runtime over a COTS OS with a node O.S., versus a custom network interface with no node O.S.]
  • With node O.S.: low-risk development path; standards compliant (MPI, VxWorks); recompile to change platforms; general-purpose programming environment at node level; bandwidth limited by MPI
  • No node O.S.: custom network interface program (exploits GM); direct network/compute connection; immature development environment - SLAAC provides the programming environment; maximum bandwidth
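As a generic illustration of the standards-compliant path above, a host rank can stream a data tile to a compute-node rank with plain MPI. This is ordinary MPI usage shown for orientation, not SLAAC code; run with at least two ranks (e.g., mpirun -np 2).

```c
/* Generic MPI point-to-point streaming between a host rank and a
 * compute-node rank -- illustrates the "standards compliant (MPI)"
 * path; plain MPI, not SLAAC-specific code. */
#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    unsigned char tile[4096];                     /* one data tile */

    if (rank == 0) {                              /* host side */
        memset(tile, 0xAB, sizeof tile);
        MPI_Send(tile, sizeof tile, MPI_UNSIGNED_CHAR,
                 1, /*tag*/ 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                       /* compute-node side */
        MPI_Recv(tile, sizeof tile, MPI_UNSIGNED_CHAR,
                 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ...hand the tile to the local runtime / FPGA here... */
    }

    MPI_Finalize();
    return 0;
}
```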

  13. BYU/UCLA Domain-Specific Compilers for ATR
  • BYU: Focus of Attention (FOA) - image morphology "CYTO" code feeds a neighborhood processor generator, which emits structural VHDL for Synopsys logic synthesis; hand-optimized neighborhood operators (Viewlogic library) are used ahead of Xilinx place & route, which produces the FPGA configuration. Maps "CYTO" neighborhood operations to pre-defined FPGA blocks; high packing density enables a single configuration
  • UCLA: Template Matching - templates feed a correlator generator, which emits structural VHDL for Synopsys logic synthesis and Xilinx place & route. Optimizes the VHDL using template overlap; creates an optimized template subset with a minimum number of reconfigurations
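The generator-to-structural-VHDL step in both flows can be pictured with a toy example: a C program that prints a VHDL entity chaining N instances of a pre-defined operator block. The entity and component names here are invented, and the real BYU/UCLA generators are far more elaborate; this only shows the shape of the technique.

```c
/* Toy illustration of a module generator: a C program that emits
 * structural VHDL instantiating a chain of pre-defined operator
 * blocks.  Names (op_chain, neigh_op) are made up for this sketch. */
#include <stdio.h>

int main(void)
{
    const int n = 4;  /* number of pipeline stages to instantiate */

    printf("entity op_chain is\n");
    printf("  port (clk : in bit;\n");
    printf("        d   : in  bit_vector(7 downto 0);\n");
    printf("        q   : out bit_vector(7 downto 0));\n");
    printf("end entity;\n\n");
    printf("architecture structural of op_chain is\n");
    printf("  type stage_t is array (0 to %d) of bit_vector(7 downto 0);\n", n);
    printf("  signal s : stage_t;\n");
    printf("begin\n");
    printf("  s(0) <= d;\n");
    for (int i = 0; i < n; i++)        /* one component instance per stage */
        printf("  u%d: entity work.neigh_op port map (clk, s(%d), s(%d));\n",
               i, i, i + 1);
    printf("  q <= s(%d);\n", n);
    printf("end architecture;\n");
    return 0;
}
```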

  14. The UCLA Testbench
  • The Mojave board can interface to the i960 board development system for in-house testing (as shown), or with the Myricom LANai board
  [Diagram: Mojave board with static FPGA, bus connector, host processor, system processor, external PCI slot, and PCI bus]

  15. SLAAC1 Board
  [Block diagram: three user FPGAs (X0, X1, X2) connected left/right and through crossbars (X0_XBAR, XP_XBAR); PMC and PCI buses; PROM, clock, and jumper block; FIFO data (64 pins) and FIFO control (~16 pins); 256K x 18 SRAM on an external memory bus; clock, configuration, and inhibit signals]

  16. Surveillance Challenge Problem - SAR/ATR
  System parameter: current -> challenge (scale factor)
  • SAR area coverage rate (sq nm/day @ 1 ft res.): 1,000 -> 40,000* (40x: FOA, Indexer, Ident.)
  • Number of target classes: 6 -> 30 (5x: Indexer, Ident.)
  • Level/difficulty of CC&D: low -> high (100x: Indexer; 10x: Ident.)
  * 40,000 sq nm/day @ 1 ft resolution corresponds to a data rate of 40 Megapixels/sec

  17. Project Benefit Includes Improved Compute Density

  18. ATR Flight Demo System
  For: 1 Mpixel/sec with 6 target configurations (targets-in-the-clear scenario)
  • Baseline 1996 system hardware architecture: systolic - 3 algorithm modules; SIMD - 1 algorithm module; early 2-level multiprocessor/DSP - 3 algorithm modules
  • 1997 flight demo system hardware architecture: 2-level multiprocessor/DSP - 8 algorithm modules (1 additional algorithm module implemented over the baseline system)
  • Baseline 1996 system (5 VME chassis): power 1,680 W; volume 17.5 ft3; weight 354 lbs; power-volume-weight product 10,407,600 W-ft3-lbs
  • 1997 flight demo system (2 VME chassis): power 453 W; volume 7 ft3; weight 124 lbs; product 393,204 W-ft3-lbs (a ~26.47x reduction)
  • The 2-level multiprocessor/DSP configuration implements the algorithms (with an additional algorithm module) with better performance and significantly lower power, size, and weight than the baseline implementation

  19. Common SAR/ATR - DARPA ACS & EHPC FY97 Testbed
  [Diagram: testbed elements spanning deployable and laboratory hardware, feeding next-generation embeddable HPC technologies]
  • Real-time deployable element (Common ATR Model Year 1, Joint STARS): systolic Datacube/custom; systolic CYTO; SIMD CNAPS; MIMD multicomputers (PowerPC, HPSC)
  • Laboratory development element: MIMD Myrinet; SIMD Myrinet; SIMD CPP DAP; systolic SBC/HIPPI; workstations (SUN/SGI/PC); MIMD Intel Paragon; MIMD SGI; RAID (data)

  20. JSTARS ATR Processor
  • PowerPC multicomputer: 13 commercial Motorola VMEbus CPU boards; 200 MHz 603e PowerPC per board; 5.2 GFLOPS peak
  • Commercial Myrinet high-speed communications: 1.28 Gbits/sec full duplex; crosspoint topology
  • SHARC multicomputer: 4 Sanders HPSC processor boards; eight 33 MHz Analog Devices SHARC DSP processors per board; 3.2 GFLOPS peak; Myrinet high-speed communications

  21. DARPA SAR ATR EHPC Testbed Experiments in Action
  • TMD/RTM real-time ATR delivered 6/97: FOA, SLD, MPM, MSE, CRM, LPM, & PGA
  • Supported 5 real-time ESAR/ATR airborne flight exercises: 2 engineering check-out flights; 3 Phase III evaluation flights
  • Features: 1 Mpixel/sec, 6 target configurations, targets-in-the-clear scenario; large-scale dynamic range capability; modular, scalable, plug & play architecture; 2 VMEbus chassis ATR system; heterogeneous two-level multicomputer, COTS PowerPC and Sanders SHARC
  • RTM ATR Advanced Technology Demonstration: this work was performed under the sponsorship of the Air Force Aeronautical Systems Center and the Air Force Research Laboratory (formerly Rome Laboratory)

  22. Joint STARS SAR/ATR Transition
  Description:
  • Jointly managed, USAF ASC/FBXT and AFRL/IF
  • Provided JSTARS with a real-time ATR capability
  • Leveraged prior Service & DARPA investments
  • Sandia developed the ATR system; Northrop Grumman developed the ESAR system and led the integration of both systems onto the aircraft
  • The ATR system enables an image analyst to identify threats in real time by prescreening large amounts of data for potential targets
  Accomplishments:
  • Developed airborne real-time SAR/ATR system
  • Demonstrated initial system at the Pentagon (Sep 96)
  • All-COTS system implementation (Apr 97)
  • Full system integrated on T3 aircraft (Aug 97)
  • Engineering/integration flights completed with fully operational system (Sep 97)
  • Three real-time demonstration flights (Oct 97)
  • Operationally significant Pd/FAR performance

  23. BYU - FOA and Indexing
  [Pipeline diagram: sensor -> preprocessor -> detection/focus of attention -> indexer (coarse data: location, angle estimate) -> identification (fine data: target ID, confidence)]
  • Superquant FOA: 1-pass adaptive threshold technique; produces ROI blocks for indexing; >7.8 Gbops/second/image; 1 Mpixel/second, FY98; 40 Mpixel/second, FY01
  • CC&D indexing: algorithm definition in process
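For intuition about what a one-pass adaptive-threshold FOA stage does, here is a minimal sketch in that spirit: it tracks a running background estimate and reports image blocks with many over-threshold pixels as regions of interest. The constants, the update rule, and the block test are all illustrative; this is not the Superquant algorithm itself.

```c
/* Single-pass adaptive-threshold detection in the spirit of an FOA
 * stage: flag pixels that exceed a running background estimate by a
 * margin, and report dense blocks as ROIs.  Illustrative only. */
#include <stdio.h>

#define W 256
#define H 256
#define BLK 32            /* ROI block size in pixels */
#define MARGIN 40.0f      /* detection margin over the running mean */

void foa(unsigned char img[H][W])
{
    float mean = 128.0f;                  /* running background estimate */
    int hits[H / BLK][W / BLK] = {0};

    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            float p = img[y][x];
            mean += 0.001f * (p - mean);  /* adapt to local statistics */
            if (p > mean + MARGIN)        /* bright-pixel detection */
                hits[y / BLK][x / BLK]++;
        }

    for (int by = 0; by < H / BLK; by++)  /* emit blocks with enough hits */
        for (int bx = 0; bx < W / BLK; bx++)
            if (hits[by][bx] > 8)
                printf("ROI block (%d,%d)\n", bx, by);
}

int main(void)
{
    static unsigned char img[H][W];       /* synthetic scene: one bright patch */
    for (int y = 100; y < 120; y++)
        for (int x = 60; x < 80; x++)
            img[y][x] = 250;
    foa(img);
    return 0;
}
```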

  24. BYU - SAR ATR Status
  • Non-adaptive FOA: Wildforce PCI platform; 3 months to retarget to the SLAAC board
  • Compilation strategies - current approach: "VHDL synthesis from scratch" with a traditional tool flow
  • Planned approach (July 1999): "gate array" approach - fixed chip floorplan, regular arrays; 30x speedup, compile < 1 hour; ~10% efficiency loss

  25. Sonar Beamforming with 96-Element TB-23
  • Goals: first real-time matched field algorithm deployment; 1000x more beams (51 beams -> 51,000 beams); ranging + "look forward" capability; demonstrate adaptation among algorithms at sea; validate FPGAs for signal processing
  • Computation: 2-stage (coarse and fine); 16 Gop/sec, 2.5 GB memory, 80 MB/sec I/O
  • Approach: use k-w + matched field algorithms; leverage ACS to provide coarse-grain RTR; environmental adaptability; multiple-resolution processing
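Matched-field processing is considerably richer than plain beamforming, but a generic time-domain delay-and-sum loop gives a feel for the per-beam workload behind the 16 Gop/sec figure. The sample rate and element spacing below are assumed values for illustration, not TB-23 specifications.

```c
/* Generic time-domain delay-and-sum beamformer for a uniform line
 * array -- shown for intuition about per-beam workload; the matched
 * field algorithm on this slide is considerably richer. */
#include <math.h>
#include <stdio.h>

#define NSENS 96          /* TB-23 element count from the slide */
#define NSAMP 1024
#define FS    4000.0      /* sample rate, Hz (assumed) */
#define C     1500.0      /* sound speed in water, m/s */
#define D     1.0         /* element spacing, m (assumed) */

/* Sum sensor traces with integer-sample steering delays for angle theta;
 * the per-sensor delay could be precomputed outside the time loop. */
double beam_power(const float x[NSENS][NSAMP], double theta)
{
    double power = 0.0;
    for (int t = 0; t < NSAMP; t++) {
        double acc = 0.0;
        for (int s = 0; s < NSENS; s++) {
            int d = (int)lround(s * D * sin(theta) / C * FS);
            int idx = t - d;
            if (idx >= 0 && idx < NSAMP)
                acc += x[s][idx];
        }
        power += acc * acc;
    }
    return power / NSAMP;
}

int main(void)
{
    const double PI = 3.14159265358979;
    static float x[NSENS][NSAMP];   /* zeros -- stand-in for hydrophone data */
    for (int b = 0; b < 5; b++) {   /* sweep a handful of steering angles */
        double theta = -PI / 2 + b * (PI / 4);
        printf("beam %d: power %g\n", b, beam_power(x, theta));
    }
    return 0;
}
```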

  26. Stealth Advancements
  [Chart: broadband quieting comparison, 1960-2010 - noise levels of submarine classes (ALFA, VICTOR I, VICTOR III, IMPROVED VICTOR III, AKULA, IMPROVED AKULA, SEVERODVINSK versus 594, 637, 688, 688I, SSN-21, NSSN) declining over time; annotation: lead ship keel laid Dec 93. Credit: NUWC]

  27. Sonar Status
  • Algorithm identified by NUWC 4/3/98; validation in process
  • BYU mapping of NUWC algorithm: underway for Wildforce board; map to SLAAC boards when available
  • Sonar module generation - operational generators include pipelined multipliers & CORDIC units; C/C++ programs generating VHDL code; generators used in the Wildforce mapping
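The CORDIC units mentioned above compute rotations (e.g., sin/cos) using only shifts and adds; each iteration of the software sketch below corresponds to one pipeline stage in a generated hardware unit. This is textbook fixed-point CORDIC for reference, not the BYU generator's output.

```c
/* Fixed-point CORDIC rotation computing cos/sin of an angle.  Each
 * loop iteration corresponds to one pipeline stage of a hardware
 * CORDIC unit like those the sonar module generators emit. */
#include <stdint.h>
#include <stdio.h>
#include <math.h>

#define ITERS 16
#define SCALE (1 << 14)                    /* Q2.14 fixed point */

static int32_t atan_tab[ITERS];            /* atan(2^-i) table, Q2.14 */
static int32_t cordic_gain;                /* 1/K gain correction, Q2.14 */

static void cordic_init(void)
{
    double k = 1.0;
    for (int i = 0; i < ITERS; i++) {
        atan_tab[i] = (int32_t)(atan(ldexp(1.0, -i)) * SCALE);
        k *= 1.0 / sqrt(1.0 + ldexp(1.0, -2 * i));
    }
    cordic_gain = (int32_t)(k * SCALE);
}

/* Rotate (1/K, 0) by `angle` (radians in Q2.14, |angle| < pi/2); each
 * shift-add iteration is one pipeline stage in the hardware version. */
static void cordic(int32_t angle, int32_t *cos_out, int32_t *sin_out)
{
    int32_t x = cordic_gain, y = 0, z = angle;
    for (int i = 0; i < ITERS; i++) {
        int32_t dx = x >> i, dy = y >> i;
        if (z >= 0) { x -= dy; y += dx; z -= atan_tab[i]; }
        else        { x += dy; y -= dx; z += atan_tab[i]; }
    }
    *cos_out = x;
    *sin_out = y;
}

int main(void)
{
    cordic_init();
    int32_t c, s;
    cordic((int32_t)(0.5 * SCALE), &c, &s);   /* angle = 0.5 rad */
    printf("cos %.4f  sin %.4f\n", (double)c / SCALE, (double)s / SCALE);
    return 0;
}
```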

  28. Timeline - Baseline + Option (FY98-FY01)
  • FY98: feasibility study; NUWC specifies algorithm; BYU preliminary mapping to ACS
  • FY99: 1st mapping; sonar module generators; 1st SLAAC ACS board available; BYU end-to-end lab demo complete and delivered to ISI
  • FY00: advanced algorithm development on RRP; sonar subcompilers; advanced SLAAC ACS boards available; submarine I/F specified and I/F construction begun; ISI delivers demo system to NUWC for testing; SEA TEST (summer 2000)
  • FY01: advanced algorithm development; top-level compiler

  29. SLAAC Conclusions
  • Great early success in deployed capability
  • Interesting runtime tradeoffs
  • Significant risk reduction through COTS standards
  • Promising simulation results - headed for hardware
  • Adaptive systems are hard - but getting easier
  http://www.east.isi.edu/SLAAC
