Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board

Presented by Damon Van Buren

SEAKR Engineering

MAPLD 2004

Submission 133

The Sensor Bandwidth Problem
  • Commercial satellite imaging systems are experiencing growth in imaging capability...
    • Higher resolution: < 1 m
    • Larger images: > 10k pixels in width and height
    • More spectral components
      • Panchromatic
      • Red/Green/Blue
      • Multi-spectral
  • Improved capabilities are leading to high sensor data rates
    • Data output rates > 2 Gbps for some systems
  • Providing storage and downlink bandwidth for the data is becoming a significant challenge for system designers
    • The largest data recorders can store less than 20 minutes of data at 2 Gbps
    • Downlinks must be several hundred Mbps to downlink 15 minutes of data in under an hour
    • Data storage and high-bandwidth downlinks require significant power
  • By reducing the amount of image data, compression provides a solution to the bandwidth problem!
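The storage and downlink figures on this slide follow from simple arithmetic. A minimal sketch, assuming a hypothetical 300 GB recorder (the slide gives no capacity) and the 2 Gbps sensor rate quoted above; the 1.7:1 ratio is the lossless JPEG2000 figure quoted on a later slide.

```python
# Back-of-the-envelope sizing for the sensor bandwidth problem.
# Assumption (illustrative, not from the slides): a 300 GB recorder.

SENSOR_RATE_GBPS = 2.0  # sensor output rate from the slide (> 2 Gbps)

def recorder_minutes(capacity_gbytes: float, rate_gbps: float) -> float:
    """Minutes of sensor data a recorder of the given capacity can hold."""
    return capacity_gbytes * 8 / rate_gbps / 60

def downlink_mbps(capture_min: float, window_min: float, rate_gbps: float,
                  compression_ratio: float = 1.0) -> float:
    """Downlink rate needed to send capture_min minutes of data within window_min minutes."""
    bits = capture_min * 60 * rate_gbps * 1e9 / compression_ratio
    return bits / (window_min * 60) / 1e6

print(recorder_minutes(300, SENSOR_RATE_GBPS))                  # 20.0
print(downlink_mbps(15, 60, SENSOR_RATE_GBPS))                  # 500.0
print(round(downlink_mbps(15, 60, SENSOR_RATE_GBPS, 1.7)))      # 294
```

Fifteen minutes of 2 Gbps data sent within an hour needs 500 Mbps uncompressed; at 1.7:1 the same data needs roughly 294 Mbps.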
Desired Compressor Features
  • Real Time
    • Compression must be performed in real time, prior to storage.
    • High throughput (> 2 Gbps)
  • Excellent Performance in Lossy and Lossless Modes
    • Purchasers of satellite imagery are sensitive to reductions in image quality caused by lossy compression.
    • Scientific users prefer undistorted (bit-true) data.
  • Space-Qualified
    • Must survive hazards of launch and space operation, including radiation.
  • Low Risk
    • Satellite imaging companies seek high-reliability solutions.
  • Low Cost
    • Commercial customers require cost effective solutions.
  • Flexible
    • The ability to support varying compression ratios and contents would allow more effective use of available storage and bandwidth.
JPEG2000 Algorithm
  • JPEG2000 is an excellent choice for satellite image compression.
    • Latest still image compression standard from the JPEG committee
  • Meets two key requirements for satellite image compression:
    • Excellent performance in both lossy and lossless modes.
      • ~1.7 to 1 lossless compression for typical satellite imagery - 70% improvement!
      • Visually lossless compression > 2 to 1 - 100% improvement in storage and downlink performance.
    • Very flexible:
      • Many options for compressed images.
  • Other advantages:
    • International Standard
    • Wavelet based
      • High-quality lossy images at compression ratios > 100:1
    • Packet oriented
      • Allows random access to the compressed code stream.
      • Makes compressed data more robust in the presence of bit errors.
      • Allows selection of image quality, spatial region, resolution, and color component after compression.
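Packet orientation is what makes selection after compression cheap. A sketch of the layer-resolution-component-position (LRCP) progression order, one of the progression orders defined in JPEG2000 Part 1, simplified here to one precinct per resolution (an assumption). With layers outermost, keeping only the first layers is a plain truncation of the packet sequence, while resolution selection means dropping the packets of the finest levels.

```python
from itertools import product

def lrcp_order(layers: int, resolutions: int, components: int, precincts: int = 1):
    """List (layer, resolution, component, precinct) tuples in LRCP progression order."""
    # itertools.product varies the rightmost index fastest, so the layer index
    # is the outermost (slowest-varying) loop, as in LRCP.
    return list(product(range(layers), range(resolutions),
                        range(components), range(precincts)))

packets = lrcp_order(layers=3, resolutions=4, components=1)
print(len(packets))   # 12
print(packets[:2])    # [(0, 0, 0, 0), (0, 1, 0, 0)]

# Quality selection: keep only packets from the first two layers.
keep_two_layers = [p for p in packets if p[0] < 2]
# Resolution selection: drop the finest resolution level (r = 3).
half_res = [p for p in packets if p[1] < 3]
print(keep_two_layers == packets[:8], len(half_res))  # True 9
```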
JPEG2000 Implementation Challenges
  • JPEG2000 is a very complex algorithm.
    • More Features = More Complexity.
  • Operation intensive
    • Several hundred operations per pixel, because each bit must be processed many times across the wavelet transform, entropy coding, MQ coding, packet generation, etc.
  • Complex
    • Many different stages to produce compressed output.
      • Wavelet transform.
      • Quantization.
      • Context generation.
      • Arithmetic coding.
      • Packet generation.
    • Many parameters must be tracked individually for each code block (64x64).
  • Memory intensive
    • Each pixel must be accessed many times, so many small buffers are needed to get good throughput.
  • Few processors are capable of implementing JPEG2000 at high rates!
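The repeated access comes from bit-plane coding: each code block is coded one bit plane at a time, in up to three passes per plane. A sketch of the resulting counts, assuming a simple model in which every pass scans the whole code block and a hypothetical 12 bit planes; sign bits are not counted.

```python
def coding_passes(bit_planes: int) -> int:
    """Coding passes for a code block with `bit_planes` non-zero magnitude bit planes.

    The most significant bit plane receives only a cleanup pass; every later
    bit plane receives significance propagation, magnitude refinement and cleanup.
    """
    return 3 * bit_planes - 2 if bit_planes > 0 else 0

def scan_visits(width: int, height: int, bit_planes: int) -> int:
    """Coefficient visits if every pass scans the whole code block (simple model)."""
    return width * height * coding_passes(bit_planes)

def magnitude_bits(width: int, height: int, bit_planes: int) -> int:
    """Each coefficient's bit in each plane is handled by exactly one pass."""
    return width * height * bit_planes

# A 64x64 code block with 12 bit planes (hypothetical bit depth).
print(coding_passes(12), scan_visits(64, 64, 12), magnitude_bits(64, 64, 12))
# 34 139264 49152
```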
High-Performance Processing Using Xilinx FPGAs
  • Xilinx FPGAs have many advantages for fast parallel processing:
    • Millions of gates.
    • System clocks of several hundred MHz.
    • High speed I/O
      • 622 Mbps LVDS
      • Multi-Gigabit serial I/O
    • Hundreds of internal block RAMs.
    • Hundreds of internal 18-bit multipliers.
  • Xilinx FPGAs are available in space-qualified versions:
    • Radiation testing is complete on the Virtex and Virtex-II devices.
      • ~200 kRad total dose, latchup immune.
    • Radiation testing to begin on the Virtex-II Pro devices soon.
  • Xilinx FPGAs are very flexible, reducing risk:
    • May be reprogrammed an unlimited number of times.
    • Configurations may be uploaded at any time during the mission to fix errors or add new capability.
  • Xilinx FPGAs are the best solution for fast compression in space!
Challenges for Xilinx Use in Space
  • The effects of radiation in spacecraft electronics are well known.
    • Caused primarily by charged particles.
    • May cause permanent damage over time by ionizing SiO2 (total dose).
    • May also cause errors in digital logic by upsetting registers (single event effects).
    • Mitigation techniques are used to reduce or eliminate the effect of radiation upsets.
      • Triple Modular Redundancy (TMR) uses voting to select the correct output from 3 separate instances of the design.
  • Mitigation of radiation effects in SRAM-based FPGAs presents an additional challenge:
    • As with other digital electronics, the functional logic of the device is susceptible to upset, however...
    • Another layer of logic (configuration logic) controls the routing of the part, giving the device its capability to be reprogrammed to perform different functions.
    • Configuration logic is also susceptible to radiation upsets.
  • Xilinx FPGAs require system level mitigation strategies in addition to the device level mitigation techniques (such as TMR) that are commonly used for space electronics.
    • Configuration data must be continuously re-written, or scrubbed using a read-and-correct approach.
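TMR voting reduces to a bitwise 2-of-3 majority function. A Python model of the voter logic (in the FPGA this is combinational logic, not software): it masks an upset in any single copy but cannot recover when two copies are upset in the same bit, which is one reason upsets must not be allowed to accumulate.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)

golden = 0b1011_0110
upset = golden ^ 0b0000_1000                       # single-event upset flips bit 3 of one copy
print(tmr_vote(golden, upset, golden) == golden)   # True: a single upset is masked
print(tmr_vote(upset, upset, golden) == golden)    # False: two matching upsets win the vote
```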
SEAKR’s RCC Board Processing Solutions
  • SEAKR has developed a line of Reconfigurable Computing (RCC) products based on Xilinx FPGAs.
    • RCC 1 – 4x Virtex 1000s
    • RCC 2 – 4x Virtex II 6000s
    • RCC 3 (NTRCC) – 4x Virtex II Pro 70/100s
  • Boards include system-level upset mitigation (scrub) for the Xilinx devices.
    • Configuration data is continuously read and checked for errors.
    • Errors are corrected by overwriting the corrupted frames, without interrupting the operation of the device.
  • Other devices on board employ radiation mitigation strategies as well:
    • Radiation hardened
    • EDAC
  • Boards also have dedicated resources to support high-performance processing:
    • High speed I/O.
    • External memories.
  • Industry standard form-factor: 6U Compact PCI.
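A minimal model of one readback-and-correct scrub pass, assuming the readback is compared frame by frame against a golden copy (for example, a shadow configuration held by the configuration controller). The function name and frame sizes are hypothetical; real Virtex readback goes through the SelectMAP or JTAG configuration interface, and a practical scrubber may compare checksums rather than full frames.

```python
from typing import List

def scrub_pass(device_frames: List[bytearray], golden_frames: List[bytes]) -> List[int]:
    """One readback-and-correct pass over a modeled configuration memory.

    Each frame read back from the device is compared with the golden copy;
    any mismatching frame is overwritten with the golden data.
    Returns the indices of the frames that were repaired.
    """
    repaired = []
    for i, golden in enumerate(golden_frames):
        if bytes(device_frames[i]) != golden:   # readback differs from golden copy
            device_frames[i][:] = golden        # rewrite only the corrupted frame
            repaired.append(i)
    return repaired

golden = [bytes([i] * 8) for i in range(4)]     # four toy frames of 8 bytes
device = [bytearray(f) for f in golden]
device[2][5] ^= 0x10                            # simulate a configuration upset
print(scrub_pass(device, golden))   # [2]
print(scrub_pass(device, golden))   # []
```

Only the corrupted frame is rewritten, which mirrors the slide's point that correction happens without interrupting the rest of the device.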
Network RCC (NTRCC)
  • Four Xilinx XC2VP70-6FF1704 FPGA coprocessors (COPs)
    • Design compatible with XC2VP100-6FF1706 and V2P-X
  • (4) banks of 1Mx36 Quad Data Rate (QDR) SRAMs for each COP
  • 512MB of DDRII Shared SDRAM memory for prototype
    • 1GB of 128M x 64 EDAC (R-S) Protected DDRII SDRAM shared memory (19.2Gbps @150MHz) using 1Gbit memory
  • Network IF
    • (2) parallel 16bit RapidIO ports to front panel (8 Gbps)
    • (1) 4x3.125 Gbps serial port to front panel (>10Gbps)
    • 4x3.125 Gbps ports from NIC to each COP (>10Gbps)
    • 4x3.125 Gbps ports from each COP to each neighbor COP (>10Gbps)
  • Shared Data Buses
    • COP Interconnect Bus (~4.224 Gbps)
    • cPCI 32-bit, 33 MHz
  • Read and write COP configurations via cPCI
  • Extended 6U form factor
  • Configuration RAM SEU detection and correction
    • DDRII SDRAM on configuration controller for shadow config program storage
  • Non-Volatile memory for 16 different configurations (1 Gbit Flash)
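Several of the bandwidth and capacity figures follow from bus width, clock, and line coding. A sketch of the arithmetic, assuming that the 19.2 Gbps figure is the peak rate of the 64-bit data path transferring on both clock edges, and that the 3.125 Gbps lanes use 8b/10b line coding (typical for RocketIO and serial RapidIO links, but not stated on the slide).

```python
def ddr_bandwidth_gbps(bus_bits: int, clock_mhz: float) -> float:
    """Peak DDR bandwidth: one transfer on each clock edge."""
    return bus_bits * clock_mhz * 2 / 1000

def serial_payload_gbps(lanes: int, lane_gbaud: float, coding_efficiency: float = 0.8) -> float:
    """Payload rate of a multi-lane serial link (8b/10b coding assumed: 8 of every 10 bits)."""
    return lanes * lane_gbaud * coding_efficiency

def memory_bytes(words: int, bits_per_word: int) -> int:
    """Capacity of a words x bits_per_word memory, in bytes."""
    return words * bits_per_word // 8

print(round(ddr_bandwidth_gbps(64, 150), 3))         # 19.2 Gbps
print(round(serial_payload_gbps(4, 3.125), 3))       # 10.0 Gbps payload (12.5 Gbaud raw)
print(memory_bytes(128 * 2**20, 64) // 2**30)        # 1 (GiB)
```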
NTRCC Layout
  • 24 Layer board
  • MicroVias, blind vias, via-in-pad
  • High speed 3.125 Gbps Serial links
  • 82 pages of schematic capture
  • 10 weeks of PCB layout time
Implementation of the JPEG2000 Algorithm
  • The JPEG2000 core has been in development for over a year.
    • Eventual target data rate 600 Mbps/device.
    • Written in VHDL.
    • Simulations performed in Modelsim.
    • Synthesis in Synplify_Pro.
  • Targeted to the NTRCC-R in summer 2004.
    • Targeted to a reduced version of the NTRCC with a single coprocessor.
    • Take advantage of improved external memory throughput.
    • Ultimately use the high-speed serial I/O to move image information on the board.
  • Designed for high throughput.
    • Cycle efficient coding style.
    • Highly parallel design.
    • Pipelined architecture.
    • Rolling wavelet transform.
  • Designed for flexible output file format.
    • Output is divided into quality layers for easy selection of compression ratio.
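With a layered output, choosing a compression ratio becomes a truncation decision: keep the leading layers of each tile until a byte budget is reached. A minimal sketch with hypothetical per-layer sizes; how the core actually allocates bytes to layers is not described on the slide.

```python
from itertools import accumulate
from typing import List

def layers_within_budget(layer_sizes: List[int], budget_bytes: int) -> int:
    """Number of leading quality layers whose cumulative size fits the budget."""
    count = 0
    for total in accumulate(layer_sizes):
        if total > budget_bytes:
            break
        count += 1
    return count

# Hypothetical per-layer byte counts for one compressed tile.
sizes = [4_000, 6_000, 10_000, 20_000]
print(layers_within_budget(sizes, 12_000))   # 2
print(layers_within_budget(sizes, 50_000))   # 4
```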
JPEG2000 Coding Steps
  • Image is broken into tiles
  • Tiles are wavelet transformed
    • 5/3 reversible or 9/7 irreversible, also user defined.
    • Selectable number of transform levels.
  • Each subband from the transform is further broken up into code blocks (typically 32x32 or 64x64) for entropy coding.
  • Each code block is entropy coded, starting from the top bit plane and working down.
    • The current bit of each pixel is passed to an arithmetic coder, along with context information.
    • The MQ encoder takes advantage of any skewing of the probability for each context, and adapts contexts as the coding progresses.
  • Packets are formed by combining the entropy coder outputs from a single resolution.
  • Tile parts are formed from all the packets in a given bit plane.
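The reversible 5/3 transform is normally computed with two integer lifting steps (predict, then update), which is what allows bit-exact lossless reconstruction. A one-dimensional sketch of one decomposition level, using the JPEG2000 Part 1 lifting equations with whole-sample symmetric extension at the borders. A 2-D transform applies it to rows and then columns; a line-based ("rolling") hardware version can apply the same steps to a few buffered rows at a time, although the core's actual datapath is not described here.

```python
from typing import List, Tuple

def fwd_53(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible 5/3 lifting transform (1-D, even length >= 2).

    Returns (lowpass s, highpass d), using symmetric extension at the borders.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict: d[i] = x[2i+1] - floor((x[2i] + x[2i+2]) / 2)
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]   # mirror x[n] -> x[n-2]
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update: s[i] = x[2i] + floor((d[i-1] + d[i] + 2) / 4)
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]                    # mirror d[-1] -> d[0]
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d

def inv_53(s: List[int], d: List[int]) -> List[int]:
    """Inverse lifting: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x

x = [10, 12, 15, 11, 9, 8, 20, 22]
s, d = fwd_53(x)
print(s, d)              # [10, 15, 7, 19] [0, -1, -6, 2]
print(inv_53(s, d) == x) # True: lossless round trip
```

Python's `//` rounds toward negative infinity, which matches the floor operations in the lifting equations for negative values as well.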
JPEG2000 Architecture Drivers
  • To achieve high data rates, the processing must be parallelized as much as possible.
  • The “tall pole in the tent” is the arithmetic coding, because the coding of a single data bit with its context can take several clock cycles.
  • Significance propagation coding is also a challenge, because each coefficient must be accessed many times, as each bit plane is processed.
  • Other operations, such as wavelet transform, code block loading, and packet generation are much more efficient, and require fewer parallel paths.
  • A pipelined architecture with many entropy coders in parallel was used to achieve the required throughput.
Architecture Description
  • Processes 256x256 tiles.
  • Pipelined architecture, using separate external memories for image, tile, and compressed data storage.
  • 19 Entropy coders working in parallel to improve throughput, one for each code block.
    • 64x64 code blocks.
  • FIFO buffering between the stages improves data flow efficiency.
  • A rolling wavelet transform is used to reduce memory accesses and improve efficiency.
  • Entropy coder outputs are formed into layers, giving each tile a progressive output format.
  • Tile parts are interleaved as the image tiles are processed.
  • Performs lossy or lossless compression.
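The figure of 19 entropy coders is consistent with a simple tally of 64x64 code blocks in a 256x256 tile, assuming three wavelet decomposition levels (the slide does not state the number of levels): 12 blocks from the first-level detail subbands, 3 from the second, 3 from the third, and 1 from the remaining LL subband. A sketch of the tally:

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def code_blocks_per_tile(tile: int, levels: int, cb: int) -> int:
    """Number of cb x cb code blocks in a square tile after `levels` dyadic wavelet levels.

    Assumes a power-of-two tile aligned to the code-block grid and no precinct limits.
    """
    total = 0
    size = tile
    for _ in range(levels):
        size //= 2                            # detail subbands at this level are size x size
        total += 3 * ceil_div(size, cb) ** 2  # HL, LH and HH subbands
    total += ceil_div(size, cb) ** 2          # remaining LL subband
    return total

for levels in (2, 3, 4):
    print(levels, code_blocks_per_tile(256, levels, 64))
# 2 16
# 3 19
# 4 22
```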
NTRCC-R Implementation Results
  • The JPEG2000 encoder was targeted to the V2Pro 70 FPGA on the NTRCC-R.
    • Lossless or Lossy compression.
    • Data precision up to 13 bits.
  • Simulation and Routing Results:
    • Slices: 30043 out of 33088, 90%
    • Block RAMS: 148 out of 328, 45%
    • Max system clock ~43 MHz without optimization.
  • Hardware Throughput:
    • ~140 Mbps w/ 33 MHz clock (depending on image).
    • ~180 Mbps w/ 43 MHz clock.
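The two measurements give roughly the same throughput per MHz, which allows a rough projection to the 66 MHz optimization target. A sketch, assuming (as a simplification) that throughput scales linearly with clock rate.

```python
# Measured hardware results from the slide: clock (MHz) -> throughput (Mbps).
measured = {33: 140, 43: 180}

def mbps_per_mhz(mbps: float, mhz: float) -> float:
    return mbps / mhz

def projected_mbps(mbps: float, mhz: float, target_mhz: float) -> float:
    """Naive projection assuming throughput scales linearly with clock."""
    return mbps_per_mhz(mbps, mhz) * target_mhz

for mhz, mbps in measured.items():
    print(f"{mhz} MHz: {mbps_per_mhz(mbps, mhz):.2f} Mbps/MHz")
print(f"66 MHz projection: {projected_mbps(180, 43, 66):.0f} Mbps")
print(f"Needed for 600 Mbps at 66 MHz: {600 / 66:.2f} Mbps/MHz")
```

Under this assumption, 66 MHz alone gives about 276 Mbps; reaching 600 Mbps at 66 MHz would need about 9.1 Mbps per MHz, so improvements in pipelining and cycles per coded bit are needed in addition to clock rate.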
JPEG2000 Floorplan
  • The Pro 70 Device is quite full!
Planned Improvements
  • Optimize design to hit 66 MHz.
    • Un-optimized design will operate at up to 43 MHz.
    • Use of asynchronous FIFOs will allow optimal clocking of various parts of the design.
  • Improve pipelining of code block loader and wavelet transform.
    • Allow “autonomous” operation of each stage, so that operations take place as soon as input data and output buffers are ready.
  • Make use of additional QDR SRAMs available to each coprocessor by creating separate buffers for wavelet transform and packetizer output.
    • NTRCC has 4 QDR memories for each coprocessor.
  • Arithmetic coder bypass.
    • Arithmetic coder requires > 2 cycles per bit coded, on average.
  • 9/7 wavelet transform with quantization.
    • Use of the 9/7 wavelet results in better SNR and max error performance for lossy compression.
  • Add RapidIO serial interface to Network Interface Chip (NIC).
Conclusions
  • The JPEG2000 core is expected to provide a valuable option for satellite imagery systems.
    • Compression will result in a dramatic improvement in system performance.
    • Lossless compression will allow ~70% more image data to be stored and downlinked by a system.
    • Lossy compression will allow even greater improvements.
  • NTRCC hardware is an excellent platform for the compressor.
    • High bandwidth interconnect and I/O (several Gbps).
    • High bandwidth external memories.
    • Excellent processing capability with the Virtex-II Pro devices.
  • The sky’s the limit!
    • Target rate of 600 Mbps per device appears to be a realistic goal.
    • Some improvements are left to be made to the clock rate and pipelining of the design.
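The planned 9/7 path pairs the irreversible transform with quantization. JPEG2000 Part 1 uses a deadzone scalar quantizer, q = sign(y) floor(|y| / Δ), with one step size Δ per subband; the reconstruction offset r is a decoder choice (0.5, the interval midpoint, is assumed here). A minimal sketch of quantizing and reconstructing a single wavelet coefficient:

```python
import math

def quantize(y: float, step: float) -> int:
    """Deadzone scalar quantizer: q = sign(y) * floor(|y| / step)."""
    return int(math.copysign(math.floor(abs(y) / step), y))

def dequantize(q: int, step: float, r: float = 0.5) -> float:
    """Reconstruct at offset r inside the quantization interval (r = 0.5 is the midpoint)."""
    if q == 0:
        return 0.0
    return math.copysign((abs(q) + r) * step, q)

print(quantize(7.3, 2.0), quantize(-7.3, 2.0), quantize(1.9, 2.0))  # 3 -3 0
print(dequantize(3, 2.0), dequantize(-3, 2.0))                      # 7.0 -7.0
```

The zero bin spans (-Δ, Δ), twice the width of the other bins, which zeroes many small high-frequency coefficients and is where most of the lossy gain comes from.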