Stream-Based Wireless Computing: Innovations in Media Processing and Reconfigurable Architectures

‘Stream’-based wireless computing
Sridhar Rajagopal
Research group meeting, December 17, 2002
The figures used in the slides are borrowed from papers at VT and Stanford.

Motivation
• ‘Stream’-based computing: what does it mean?
  • Not a well-defined term
  • ‘computation’ that uses a flow of self-guided information
  • ‘sequence of data’
  • Related to the flow of data through an architecture
• Application to implementing wireless algorithms

Outline
• Stallion
  • reconfigurable computing at Virginia Tech
  • ‘stream’-based computing #1
  • Custom Configurable Machines (CCM)
• Imagine
  • media processing at Stanford
  • ‘stream’-based computing #2
  • programmable architectures

Stallion at VT
• Wormhole Run-Time Reconfiguration (RTR)
• coarse-grained structure
• reconfiguration using ‘streams’

‘Stream’ packets
(Figures: a stream packet; stream flow through the architecture)

Functional description of PE

Stream module description
4 states:
• IDLE – reconfiguration in progress
• BUSY – doing work
• PROGRAM – load reconfiguration data
• PASS – packet meant for the next module
Need to output one packet per cycle
• VALID flag – maintains synchronization
• set INVALID instead of inserting wait states
• strip information off the stack
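The one-packet-per-cycle rule and the stripping of information off the stack can be illustrated with a small C++ model. This is only a sketch under stated assumptions: the `Packet` layout, the `StreamModule` class, the "add a constant" operation, and the exact conditions that select each state are all hypothetical and are not taken from the Stallion design. In particular, IDLE is modeled only as the state a module sits in before it has been programmed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a stream module; not the Stallion packet format.
enum class State { IDLE, BUSY, PROGRAM, PASS };

struct Packet {
    bool valid = false;             // VALID/INVALID flag used to keep modules in sync
    bool is_config = false;         // true: header packet carrying configuration words
    std::vector<int> config_stack;  // back() belongs to the next unprogrammed module
    int data = 0;                   // payload of a data packet
};

class StreamModule {
public:
    State state() const { return state_; }

    // Consume one packet and always emit exactly one packet (no wait states).
    Packet step(const Packet& in) {
        Packet out;                              // INVALID by default
        if (!in.valid) return out;               // nothing arrived: emit INVALID
        if (in.is_config) {
            if (configured_) {                   // header is meant for a later module
                state_ = State::PASS;
                return in;
            }
            if (in.config_stack.empty()) return out;
            state_ = State::PROGRAM;             // take our own word off the stack
            addend_ = in.config_stack.back();
            configured_ = true;
            out = in;
            out.config_stack.pop_back();
            out.valid = !out.config_stack.empty();  // nothing left: send INVALID
            return out;
        }
        if (!configured_) return out;            // data before configuration: INVALID
        state_ = State::BUSY;                    // assumed operation: add a constant
        out = in;
        out.data = in.data + addend_;
        return out;
    }

private:
    State state_ = State::IDLE;
    bool configured_ = false;
    int addend_ = 0;
};

// Usage: one header programs two chained modules, then data flows through both.
inline int run_two_module_demo() {
    StreamModule m0, m1;
    Packet cfg;
    cfg.valid = true;
    cfg.is_config = true;
    cfg.config_stack = {20, 10};                 // 10 goes to m0, 20 to m1
    Packet tail = m1.step(m0.step(cfg));
    assert(!tail.valid);                         // header fully consumed
    Packet d;
    d.valid = true;
    d.data = 1;
    return m1.step(m0.step(d)).data;             // 1 + 10 + 20
}
```

In this model the header programs its own path: each module removes only the word addressed to it and forwards the rest, so later data packets need no addressing at all.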

Processing layer
• Static section
  • configures the reconfigurable section
  • buffers data during reconfiguration and sends ‘IDLE’ packets
• Reconfigurable section
  • processing of the data is done here
• Higher layers convert the algorithm to data and configuration patterns

Cart before the horse: Colt before the Stallion
• Colt architecture (also at VT)
• IFU mesh – mesh of interconnected functional units

Stallion chip
(Figure: chip block diagram; bus labels include 16-bit data and 4-control)

IFU mesh in Stallion
• Dashed lines – skip buses
• Can send operands over one or more IFUs

IFU details
• Only the left input can do barrel shifting
• ALU based on LUT
• Control register – stores control information for reconfiguration
• Optional delay register – provides latency to synchronize path lengths of different pipeline streams
• Conditional unit
• Output control unit
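A rough C++ sketch of the datapath described above may help. It rests on several assumptions not stated in the slides: the barrel shifter is modeled as a 16-bit rotate, the LUT-based ALU is modeled as a 4-entry truth table applied to each bit pair, and the names `IfuConfig`, `Ifu`, `rotl16`, and `lut_alu` are hypothetical. The conditional unit and output control unit are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical IFU configuration; the real control-register layout is not given.
struct IfuConfig {
    unsigned rotate = 0;     // barrel-shift amount applied to the LEFT operand only
    std::uint8_t lut = 0;    // truth table: bit index = (left_bit << 1) | right_bit
    bool use_delay = false;  // route the result through the optional delay register
};

// Rotate a 16-bit word left by n bits (assumed barrel-shifter behaviour).
inline std::uint16_t rotl16(std::uint16_t x, unsigned n) {
    n &= 15u;
    if (n == 0) return x;
    return static_cast<std::uint16_t>((x << n) | (x >> (16u - n)));
}

// Apply the 4-entry truth table to every bit position of the two operands.
inline std::uint16_t lut_alu(std::uint16_t a, std::uint16_t b, std::uint8_t lut) {
    unsigned out = 0;
    for (unsigned i = 0; i < 16; ++i) {
        unsigned idx = (((a >> i) & 1u) << 1) | ((b >> i) & 1u);
        out |= ((static_cast<unsigned>(lut) >> idx) & 1u) << i;
    }
    return static_cast<std::uint16_t>(out);
}

class Ifu {
public:
    explicit Ifu(const IfuConfig& cfg) : cfg_(cfg) {}

    // One cycle: shift the left input, combine through the LUT, optionally delay.
    std::uint16_t step(std::uint16_t left, std::uint16_t right) {
        std::uint16_t result = lut_alu(rotl16(left, cfg_.rotate), right, cfg_.lut);
        if (!cfg_.use_delay) return result;
        std::uint16_t delayed = delay_reg_;  // emit last cycle's result
        delay_reg_ = result;
        return delayed;
    }

private:
    IfuConfig cfg_;
    std::uint16_t delay_reg_ = 0;
};

// Usage: truth table 0x6 is XOR, 0x8 is AND, 0xE is OR.
inline void ifu_demo() {
    IfuConfig cfg;
    cfg.rotate = 4;
    cfg.lut = 0x6;
    Ifu ifu(cfg);
    assert(ifu.step(0x000F, 0x0FF0) == 0x0F00);  // (0x000F rotl 4) XOR 0x0FF0
}
```

The delay register matters when two pipeline paths of different length meet at one IFU: enabling it on the shorter path adds one cycle so both operands arrive together.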

Radio testbed at VT
(Figure label: Stallion)

Worm-hole routing
• stream = worm, architecture = holes
• multiple, independent streams can wind their way through the chip simultaneously
• parts of the system can be processing while other parts are reconfiguring
• GOAL: Layered Software Radio Architecture

‘Stream’ processing at Stanford
• Speeding up media applications
• Need lots of computations per memory reference
• Lots of data and sub-word parallelism
• Current GPP architectures do not have enough ALUs
• ‘Stream’ processors to the rescue

Special-purpose processors
• Lots (100s) of ALUs
• Fed by dedicated wires/memories

Care and feeding of ALUs
(Figure labels: Instr. Cache, IP, IR, Regs; Instruction Bandwidth, Data Bandwidth)
• ‘Feeding’ structure dwarfs the ALU

Architecture implications
• Tremendous opportunities
  • media problems have lots of parallelism and locality
  • VLSI technology enables 100s of ALUs/chip (1000s soon)
  • (in 0.18 µm: 0.1 mm² per integer adder, 0.5 mm² per FP adder)
• Challenging problems
  • locality - global structures won’t work
  • explicit parallelism - ILP won’t keep 100 ALUs busy
  • memory - streaming applications don’t cache well
• It’s time to try some new approaches

Register file organization
• Register file functions:
  • short-term storage for intermediate results
  • communication between multiple function units
• Global register files don’t scale with #ALUs
  • need more registers to hold more results (grows with #ALUs)
  • need more ports to connect all of the units (grows with #ALUs²)

Register files dwarf ALUs

Distributed register files
• Distributed register files mean:
  • not all functional units can access all data
  • each functional unit input/output no longer has a dedicated route from/to all register files

Stream processing
(Figure: input data Image 0 and Image 1 each flow as streams through two convolve kernels; a SAD kernel combines the results into the output Depth Map)
• Little data reuse (pixels never revisited)
• Highly data parallel (output pixels not dependent on other output pixels)
• Compute intensive (60 operations per memory reference)
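To make the SAD (sum of absolute differences) stage concrete, here is a minimal C++ sketch that compares a window of one filtered image row against shifted windows of the other and keeps the shift with the smallest SAD. Choosing the best shift this way is an assumption about how the depth map is formed; the function names, the single-row simplification, and the window parameters are hypothetical and not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

// Sum of absolute differences between row0[x .. x+window) and the window of
// row1 starting at x + shift. The caller must keep both windows in range.
inline int sad(const std::vector<int>& row0, const std::vector<int>& row1,
               std::size_t x, std::size_t shift, std::size_t window) {
    int sum = 0;
    for (std::size_t i = 0; i < window; ++i)
        sum += std::abs(row0[x + i] - row1[x + shift + i]);
    return sum;
}

// Try every shift from 0 to max_shift and return the one with the lowest SAD.
// Each output depends only on its own windows, so outputs are independent.
inline std::size_t best_shift(const std::vector<int>& row0,
                              const std::vector<int>& row1,
                              std::size_t x, std::size_t max_shift,
                              std::size_t window) {
    std::size_t best = 0;
    int best_sad = std::numeric_limits<int>::max();
    for (std::size_t shift = 0; shift <= max_shift; ++shift) {
        if (x + shift + window > row1.size()) break;  // window would run off the row
        int s = sad(row0, row1, x, shift, window);
        if (s < best_sad) {
            best_sad = s;
            best = shift;
        }
    }
    return best;
}

// Usage: the 5,9,5 feature sits two pixels further right in the second row.
inline void sad_demo() {
    std::vector<int> row0 = {0, 0, 5, 9, 5, 0, 0, 0};
    std::vector<int> row1 = {0, 0, 0, 0, 5, 9, 5, 0};
    assert(best_shift(row0, row1, 2, 3, 3) == 2);
}
```

Every candidate shift costs several arithmetic operations per pixel while each pixel is fetched only once, which is the kind of high operations-per-memory-reference ratio the slide points to.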

Stream programming
• Streams
  • Communication

    void main() {
        Stream<int> a(256);
        Stream<int> b(256);
        Stream<int> c(256);
        Stream<int> d(1024);
        ...
        example1(a, b, c);
        example2(c, d);
        ...
    }

• Kernels
  • Computation

    KERNEL example1(istream<int> a, istream<int> b, ostream<int> c) {
        loop_stream(a) {
            int ai, bi, ci;
            a >> ai;
            b >> bi;
            ci = ai * 2 + bi * 3;
            c << ci;
        }
    }
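For readers without the stream toolchain, the `example1` kernel can be mimicked in standard C++. The sketch below is an assumption-laden stand-in: it uses `std::vector<int>` in place of the `Stream<int>`, `istream<int>`, and `ostream<int>` types, returns the output stream instead of taking it as a parameter, and stops at the shorter input, none of which is claimed to match the real stream language semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standard-C++ stand-in for the example1 kernel: read one element from each
// input stream, compute locally, append one element to the output stream.
inline std::vector<int> example1(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    std::vector<int> c;
    c.reserve(a.size());
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        int ai = a[i];
        int bi = b[i];
        c.push_back(ai * 2 + bi * 3);
    }
    return c;
}

// Usage: 1*2 + 10*3 = 32 and 2*2 + 20*3 = 64.
inline void example1_demo() {
    std::vector<int> c = example1({1, 2}, {10, 20});
    assert(c.size() == 2);
    assert(c[0] == 32);
    assert(c[1] == 64);
}
```

The split mirrors the slide: the caller only moves whole streams between kernels, while all arithmetic lives inside the kernel body.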

Stream Processor
• Instructions are Load, Store, and Operate
  • operands are streams
• Operate performs a compound stream operation
  • read elements from input streams
  • perform a local computation
  • append elements to output streams
  • repeat until input stream is consumed
  • (e.g., triangle transform)
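The three instruction types can be pictured with a toy C++ model in which whole streams, not scalars, are the operands. Everything here is hypothetical: `StreamMachine`, its flat `memory` vector standing in for off-chip memory, and the use of `std::function` for the kernel are illustration choices, not the processor's actual instruction set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Stream = std::vector<float>;

// Toy machine with stream-level Load, Operate, and Store.
struct StreamMachine {
    std::vector<float> memory;  // stands in for off-chip memory

    // Load: copy `count` elements starting at `offset` into a stream.
    // Assumes offset + count <= memory.size().
    Stream load(std::size_t offset, std::size_t count) const {
        auto first = memory.begin() + static_cast<std::ptrdiff_t>(offset);
        return Stream(first, first + static_cast<std::ptrdiff_t>(count));
    }

    // Operate: read each input element, compute locally, append to the output,
    // and repeat until the input stream is consumed.
    static Stream operate(const Stream& in,
                          const std::function<float(float)>& kernel) {
        Stream out;
        out.reserve(in.size());
        for (float x : in) out.push_back(kernel(x));
        return out;
    }

    // Store: write a stream back to memory at `offset`, growing it if needed.
    void store(const Stream& s, std::size_t offset) {
        if (memory.size() < offset + s.size()) memory.resize(offset + s.size());
        std::copy(s.begin(), s.end(),
                  memory.begin() + static_cast<std::ptrdiff_t>(offset));
    }
};

// Usage: load two elements, double them, store the result past the inputs.
inline void stream_machine_demo() {
    StreamMachine m;
    m.memory = {1.0f, 2.0f, 3.0f, 4.0f};
    Stream s = m.load(1, 2);
    Stream t = StreamMachine::operate(s, [](float x) { return x * 2.0f; });
    m.store(t, 4);
    assert(m.memory.size() == 6);
    assert(m.memory[4] == 4.0f);
    assert(m.memory[5] == 6.0f);
}
```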

Imagine Stream Processor
(Figure: Host and Network connect through the Stream Controller and Network Interface; the Stream Register File sits between the Streaming Memory System with four SDRAMs and ALU Clusters 0 through 7, which are driven by a Microcontroller)

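Arithmetic clusters
(Figure: each cluster holds functional units labeled +, +, +, *, *, / and a CU, fed by Local Register Files through a Cross Point, with links to and from the SRF and to the Intercluster Network)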

Bandwidth hierarchy
(Figure: SDRAM, Stream Register File, and ALU clusters, with bandwidths of 2 GB/s at the SDRAM, 32 GB/s at the Stream Register File, and 544 GB/s inside the ALU clusters)
• VLIW clusters with shared control
• 41.2 32-bit operations per word of memory bandwidth
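The last bullet can be turned into an implied arithmetic rate with a short calculation. This sketch assumes that the 2 GB/s figure is the memory bandwidth and that a word is 32 bits (4 bytes); the constant and function names are hypothetical.

```cpp
#include <cassert>

// Figures from the slide; the word size is an assumption.
constexpr double kMemoryBandwidthGBs = 2.0;  // GB/s, taken to be memory bandwidth
constexpr double kBytesPerWord = 4.0;        // assumed 32-bit words
constexpr double kOpsPerWord = 41.2;         // 32-bit operations per memory word

// Memory words delivered per second, in units of 10^9.
constexpr double giga_words_per_second() {
    return kMemoryBandwidthGBs / kBytesPerWord;
}

// Arithmetic rate implied by sustaining kOpsPerWord for every word fetched,
// in units of 10^9 operations per second.
constexpr double implied_giga_ops_per_second() {
    return giga_words_per_second() * kOpsPerWord;
}

// Usage: 2 GB/s / 4 B = 0.5 Gword/s, and 0.5 * 41.2 is about 20.6 Gops/s.
inline void bandwidth_demo() {
    assert(giga_words_per_second() == 0.5);
    assert(implied_giga_ops_per_second() > 20.59);
    assert(implied_giga_ops_per_second() < 20.61);
}
```

Under these assumptions the memory system supplies only half a billion words per second, so reaching roughly 20 billion operations per second depends on the register file levels capturing almost all operand traffic.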

Conclusions
• ‘Streams’ shown to be promising for reconfigurable computing
  • wireless may need reconfigurability
• ‘Streams’ shown to be promising for media processing
  • wireless may have similar workloads
• Important to understand pros and cons of different methodologies for good wireless architectures
• Important to have the right tools
