Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board

Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board. Presented by Damon Van Buren SEAKR Engineering MAPLD 2004 Submission 133. The Sensor Bandwidth Problem. Commercial satellite imaging systems are experiencing growth in imaging capability...


Presentation Transcript


  1. Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board Presented by Damon Van Buren SEAKR Engineering MAPLD 2004 Submission 133

  2. The Sensor Bandwidth Problem • Commercial satellite imaging systems are experiencing growth in imaging capability... • Higher resolution: < 1 m • Larger images: >10k image width and height • More spectral components • Panchromatic • Red/Green/Blue • Multi-spectral • Improved capabilities are leading to high sensor data rates • Data output rates > 2 Gbps for some systems • Providing storage and downlink bandwidth for the data is becoming a significant challenge for system designers • The largest data recorders can store less than 20 minutes of data at 2 Gbps • Downlinks must be several hundred Mbps to downlink 15 minutes of data in under an hour • Data storage and high-bandwidth downlinks require lots of power • By reducing the amount of image data, compression provides a solution to the bandwidth problem!
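The storage and downlink figures on this slide follow from simple rate arithmetic. The sketch below reproduces them; the function names are illustrative only, and decimal units (1 Gbit = 1000 Mbit, 1 GB = 8 Gbit) are assumed.

```python
def stored_gigabytes(rate_gbps, minutes):
    """Data volume in gigabytes produced by a sensor at rate_gbps for the given minutes."""
    return rate_gbps * minutes * 60 / 8


def downlink_rate_mbps(sensor_rate_gbps, capture_minutes, downlink_minutes):
    """Downlink rate in Mbps needed to send the captured data within downlink_minutes."""
    total_megabits = sensor_rate_gbps * 1000 * capture_minutes * 60
    return total_megabits / (downlink_minutes * 60)


# 20 minutes of recording at 2 Gbps
print(stored_gigabytes(2.0, 20))        # 300.0
# 15 minutes of 2 Gbps data sent in 60 minutes
print(downlink_rate_mbps(2.0, 15, 60))  # 500.0
```

Under these assumptions, a recorder holding 20 minutes of 2 Gbps data needs roughly 300 GB, and sending 15 minutes of data in one hour needs 500 Mbps, consistent with "several hundred Mbps".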

  3. Desired Compressor Features • Real Time • Compression must be performed in real time, prior to storage. • High throughput (> 2 Gbps) • Excellent Performance in Lossy and Lossless Modes • Purchasers of satellite imagery are sensitive to reductions in image quality caused by lossy compression. • Scientific users prefer undistorted data (bit true). • Space-Qualified • Must survive hazards of launch and space operation, including radiation. • Low Risk • Satellite imaging companies seek high reliability solutions. • Low Cost • Commercial customers require cost effective solutions. • Flexible • The ability to support varying compression ratios and contents would allow more effective use of available storage and bandwidth.

  4. JPEG2000 Algorithm • JPEG2000 is an excellent choice for satellite image compression. • Latest still image compression standard from the JPEG committee • Meets two key requirements for satellite image compression: • Excellent performance in both lossy and lossless modes. • ~1.7 to 1 lossless compression for typical satellite imagery - 70% improvement! • Visually lossless compression > 2 to 1 - 100% improvement in storage and downlink performance. • Very flexible: • Many options for compressed images. • Other advantages: • International Standard • Wavelet based • High quality lossy images with comp. ratios > 100:1 • Packet oriented • Allows random access to the compressed code stream. • Makes compressed data more robust in the presence of bit errors. • Allows selection of image quality, spatial region, resolution, and color component after compression.
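The "70%" and "100%" improvement figures are a direct consequence of the compression ratio: at a ratio of r to 1, the same storage or link carries r times as much imagery. A minimal sketch, with illustrative function names and the 2 Gbps sensor rate from the earlier slide taken as the raw input:

```python
def extra_capacity_pct(ratio):
    """Percent more image data that fits in the same storage at ratio:1 compression."""
    return (ratio - 1.0) * 100.0


def compressed_rate_mbps(raw_rate_mbps, ratio):
    """Data rate after compression at ratio:1."""
    return raw_rate_mbps / ratio


print(round(extra_capacity_pct(1.7)))          # 70
print(round(extra_capacity_pct(2.0)))          # 100
print(round(compressed_rate_mbps(2000, 1.7)))  # 1176
```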

  5. JPEG2000 Implementation Challenges • JPEG2000 is a very complex algorithm. • More Features = More Complexity. • Operation intensive • Several hundred operations per pixel, because each bit must be processed many times, for the wavelet transform, entropy coding, MQ coding, packet generation, etc. • Complex • Many different stages to produce compressed output. • Wavelet transform. • Quantization. • Context generation. • Arithmetic coding. • Packet generation. • Many parameters must be tracked individually for each code block (64x64). • Memory intensive • Each pixel must be accessed many times, so many small buffers are needed to get good throughput. • Few processors are capable of implementing JPEG2000 at high rates!

  6. High-Performance Processing Using Xilinx FPGAs • Xilinx FPGAs have many advantages for fast parallel processing: • Millions of gates. • System clocks of several hundred MHz. • High speed I/O • 622 Mbps LVDS • Multi-Gigabit serial I/O • Hundreds of internal block RAMs. • Hundreds of internal 18 bit multipliers. • Xilinx FPGAs are available in space-qualified versions: • Radiation testing is complete on the Virtex and Virtex-II devices. • ~200 kRad total dose, latchup immune. • Radiation testing to begin on the Virtex-II Pro devices soon. • Xilinx FPGAs are very flexible, reducing risk: • May be re-programmed an effectively unlimited number of times. • Configurations may be uploaded at any time during the mission to fix errors or add new capability. • Xilinx FPGAs are the best solution for fast compression in space!

  7. Challenges for Xilinx Use in Space • The effects of radiation in spacecraft electronics are well known. • Caused primarily by charged particles. • May cause permanent damage over time by ionizing SiO2 (total dose). • May also cause errors in digital logic by upsetting registers (single event effects). • Mitigation techniques are used to reduce or eliminate the effect of radiation upsets. • Triple Modular Redundancy (TMR) uses voting to select the correct output from 3 separate instances of the design. • Mitigation of radiation effects in SRAM-based FPGAs presents an additional challenge: • As with other digital electronics, the functional logic of the device is susceptible to upset, however... • Another layer of logic (configuration logic) controls the routing of the part, giving the device its capability to be reprogrammed to perform different functions. • Configuration logic is also susceptible to radiation upsets. • Xilinx FPGAs require system level mitigation strategies in addition to the device level mitigation techniques (such as TMR) that are commonly used for space electronics. • Configuration data must be continuously re-written, or scrubbed using a read-and-correct approach.
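The TMR voting mentioned on this slide can be illustrated with a bitwise majority function. This is a software model for illustration only, not the VHDL used on the board; the name `tmr_vote` and the sample values are assumptions.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant outputs: each result bit matches at least two inputs."""
    return (a & b) | (a & c) | (b & c)


golden = 0b10110010
upset = golden ^ 0b00001000  # one copy suffers a single-bit upset

# The two unaffected copies outvote the corrupted one.
print(tmr_vote(golden, upset, golden) == golden)  # True
```

A voter of this kind masks an upset in any one copy of the functional logic. It does not by itself repair an upset in the configuration memory, which is why the slide calls for scrubbing as well.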

  8. SEAKR’s RCC Board Processing Solutions • SEAKR has developed a line of Reconfigurable Computing (RCC) products based on the Xilinx FPGAs. • RCC 1 – 4x Virtex 1000s • RCC 2 – 4x Virtex II 6000s • RCC 3 (NTRCC) – 4x Virtex II Pro 70/100s • Boards include system-level upset mitigation (scrub) for the Xilinx devices. • Configuration data is continuously read and checked for errors. • Errors are corrected by overwriting the corrupted frames, without interrupting the operation of the device. • Other devices on board employ radiation mitigation strategies as well: • Radiation hardened • EDAC • Boards also have dedicated resources to support high-performance processing: • High speed I/O. • External memories. • Industry standard form-factor: 6U Compact PCI.
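The read-and-correct scrub described above can be sketched as a loop over configuration frames. This is a simplified software model under stated assumptions: frames are plain byte strings, a golden copy is available for comparison, and a CRC-32 stands in for whatever error check the board actually uses. The real configuration interface is not modeled, and `scrub` is a hypothetical name.

```python
import zlib


def scrub(device_frames, golden_frames):
    """Check each frame against the golden copy and overwrite any that differ.

    Returns the indices of the frames that were repaired. Frames that pass
    the check are left untouched, so the rest of the device is not disturbed.
    """
    repaired = []
    for index, golden in enumerate(golden_frames):
        if zlib.crc32(device_frames[index]) != zlib.crc32(golden):
            device_frames[index] = golden  # rewrite only the corrupted frame
            repaired.append(index)
    return repaired


golden = [bytes([i] * 8) for i in range(4)]
device = list(golden)
device[2] = bytes([2] * 7 + [3])  # simulate a single-bit upset in frame 2

print(scrub(device, golden))  # [2]
print(device == golden)       # True
```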

  9. Network RCC (NTRCC) • Four Xilinx XC2VP70-6FF1704 FPGA coprocessors (COPs) • Design compatible with XC2VP100-6FF1706 and V2P-X • (4) banks of 1Mx36 Quad Data Rate (QDR) SRAMs for each COP • 512MB of DDRII Shared SDRAM memory for prototype • 1GB of 128M x 64 EDAC (R-S) Protected DDRII SDRAM shared memory (19.2 Gbps @ 150 MHz) using 1 Gbit memory • Network IF • (2) parallel 16-bit RapidIO ports to front panel (8 Gbps) • (1) 4x3.125 Gbps serial port to front panel (>10 Gbps) • 4x3.125 Gbps ports from NIC to each COP (>10 Gbps) • 4x3.125 Gbps ports from each COP to each neighbor COP (>10 Gbps) • Shared Data Buses • COP Interconnect Bus (~4.224 Gbps) • cPCI 32-bit 33 MHz • Read and write COP configurations via cPCI • Extended 6U form factor • Configuration RAM SEU detection and correction • DDRII SDRAM on configuration controller for shadow config program storage • Non-Volatile memory for 16 different configurations (1 Gbit Flash)
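Several of the bandwidth and capacity figures on this slide can be checked with back-of-the-envelope arithmetic. The sketch below assumes two transfers per clock on the DDR data bus and reports the raw line rate of the serial links, before any line-coding overhead; the function names are illustrative.

```python
def ddr_peak_gbps(bus_bits, clock_mhz):
    """Peak DDR bandwidth in Gbps: two transfers per clock on a bus_bits-wide bus."""
    return bus_bits * clock_mhz * 2 / 1000


def capacity_bytes(words, word_bits):
    """Memory capacity in bytes for the given word count and data width."""
    return words * word_bits // 8


print(ddr_peak_gbps(64, 150))                     # 19.2
print(capacity_bytes(128 * 2**20, 64) == 2**30)   # True (128M x 64 bits is 1 GB)
print(4 * 3.125)                                  # 12.5 (raw rate of a 4-lane serial port)
```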

  10. Network RCC Block Diagram

  11. NTRCC Layout • 24 Layer board • MicroVias, blind vias, via-in-pad • High speed 3.125 Gbps Serial links • 82 pages of schematic capture • 10 weeks of PCB layout time

  12. Implementation of the JPEG2000 Algorithm • The JPEG2000 core has been in development for over a year. • Eventual target data rate 600 Mbps/device. • Written in VHDL. • Simulations performed in Modelsim. • Synthesis in Synplify_Pro. • Targeted to the NTRCC-R summer ‘04. • Targeted to a reduced version of the NTRCC with a single coprocessor. • Take advantage of improved external memory throughput. • Ultimately use the high-speed serial I/O to move image information on the board. • Designed for high throughput. • Cycle efficient coding style. • Highly parallel design. • Pipelined architecture. • Rolling wavelet transform. • Designed for flexible output file format. • Output is divided into quality layers for easy selection of compression ratio.

  13. JPEG2000 Block Diagram

  14. JPEG2000 Coding Steps • Image is broken into tiles • Tiles are wavelet transformed • 5/3 reversible or 9/7 irreversible, also user defined. • Selectable number of transform levels. • Each subband from the transform is further broken up into code blocks (typically 32x32 or 64x64) for entropy coding. • Each code block is entropy coded, starting from the top bit plane and working down. • The current bit of each pixel is passed to an arithmetic coder, along with context information. • The MQ encoder takes advantage of any skewing of the probability for each context, and adapts contexts as the coding progresses. • Packets are formed by combining the entropy coder outputs from a single resolution. • Tile parts are formed from all the packets in a given bit plane.
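The reversible 5/3 transform named above is commonly computed with integer lifting steps, which is what makes lossless reconstruction possible. The following is a minimal one-dimensional, single-level sketch in software, assuming an even-length input and symmetric extension at the boundaries. It is not the rolling hardware implementation described in the presentation, and the function names are illustrative.

```python
def fwd53(x):
    """One level of the reversible 5/3 lifting transform on an even-length list of ints.

    Returns (low, high): the low-pass and high-pass coefficient lists.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: each odd sample minus the floor of the mean of its even neighbors.
    high = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # mirror at right edge
        high.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: each even sample plus a rounded quarter of the neighboring details.
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]  # mirror at left edge
        low.append(x[2 * i] + ((prev + high[i] + 2) >> 2))
    return low, high


def inv53(low, high):
    """Exact inverse of fwd53: undo the update step, then the predict step."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - ((prev + high[i] + 2) >> 2)
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = high[i] + ((left + right) >> 1)
    return x


samples = [10, 12, 14, 13, 9, 7, 200, 3]
low, high = fwd53(samples)
print(inv53(low, high) == samples)  # True
```

Because every step adds or subtracts an integer computed from values that the inverse can recompute exactly, the round trip is bit true regardless of the rounding inside each step.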

  15. JPEG2000 Architecture Drivers • To achieve high data rates, the processing must be parallelized as much as possible. • The “tall pole in the tent” is the arithmetic coding, because the coding of a single data bit with its context can take several clock cycles. • Significance propagation coding is also a challenge, because each coefficient must be accessed many times, as each bit plane is processed. • Other operations, such as wavelet transform, code block loading, and packet generation are much more efficient, and require fewer parallel paths. • A pipelined architecture with many entropy coders in parallel was used to achieve the required throughput.

  16. Architecture Description • Processes 256x256 tiles. • Pipelined architecture, using separate external memories for image, tile, and compressed data storage. • 19 Entropy coders working in parallel to improve throughput, one for each code block. • 64x64 code blocks. • FIFO buffering between the stages improves data flow efficiency. • A rolling wavelet transform is used to reduce memory accesses and improve efficiency. • Entropy coder outputs are formed into layers, giving each tile a progressive output format. • Tile parts are interleaved as the image tiles are processed. • Performs lossy or lossless compression.
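The count of 19 entropy coders is consistent with one coder per code block if the 256x256 tile is decomposed to three wavelet levels. The number of levels is an assumption here, since the slides do not state it; the sketch below simply counts code blocks per subband for a square tile, with an illustrative function name.

```python
def code_blocks_per_tile(tile_size, levels, block_size):
    """Count code blocks in a square tile after a dyadic wavelet decomposition.

    Each level yields three detail subbands (HL, LH, HH) of half the previous
    size; the final LL subband is added once. Subbands smaller than a code
    block still occupy one (partially filled) block each.
    """
    total = 0
    size = tile_size
    for _ in range(levels):
        size = (size + 1) // 2
        blocks_per_side = -(-size // block_size)  # ceiling division
        total += 3 * blocks_per_side ** 2
    total += (-(-size // block_size)) ** 2  # remaining LL subband
    return total


print(code_blocks_per_tile(256, 3, 64))  # 19
```

Under this assumption the breakdown is 12 blocks at the first level (three 128x128 subbands), 3 at the second, 3 at the third, and 1 for the final LL subband.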

  17. NTRCC-R Implementation Results • The JPEG2000 encoder was targeted to the V2Pro 70 FPGA on the NTRCC-R. • Lossless or Lossy compression. • Data precision up to 13 bits. • Simulation and Routing Results: • Slices: 30043 out of 33088, 90% • Block RAMs: 148 out of 328, 45% • Max system clock ~43 MHz without optimization. • Hardware Throughput: • ~140 Mbps w/ 33 MHz clock (depending on image). • ~180 Mbps w/ 43 MHz clock.
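The utilization percentages and the throughput per clock cycle implied by these results can be recomputed directly from the quoted numbers. The helper names are illustrative.

```python
def utilization_pct(used, available):
    """Resource utilization as a percentage."""
    return 100.0 * used / available


def bits_per_clock(throughput_mbps, clock_mhz):
    """Average input bits processed per clock cycle."""
    return throughput_mbps / clock_mhz


print(round(utilization_pct(30043, 33088), 1))  # 90.8 (slices)
print(round(utilization_pct(148, 328), 1))      # 45.1 (block RAMs)
print(round(bits_per_clock(140, 33), 2))        # 4.24
print(round(bits_per_clock(180, 43), 2))        # 4.19
```

Both operating points work out to a little over 4 bits per clock, which suggests that throughput in this design tracks the clock rate closely.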

  18. JPEG2000 Floorplan • The Pro 70 Device is quite full!

  19. Planned Improvements • Optimize design to hit 66 MHz. • Un-optimized design will operate at up to 43 MHz. • Use of asynchronous FIFOs will allow optimal clocking of various parts of the design. • Improve pipelining of code block loader and wavelet transform. • Allow “autonomous” operation of each stage, so that operations take place as soon as input data and output buffers are ready. • Make use of additional QDR SRAMs available to each coprocessor by creating separate buffers for wavelet transform and packetizer output. • NTRCC has 4 QDR memories for each coprocessor. • Arithmetic coder bypass. • Arithmetic coder requires > 2 cycles per bit coded, on average. • 9/7 wavelet transform with quantization. • Use of the 9/7 wavelet results in better SNR and max error performance for lossy compression. • Add RapidIO serial interface to Network Interface Chip (NIC).
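A rough projection shows how much of the 600 Mbps per-device target the clock increase alone could deliver. The sketch assumes throughput scales linearly with clock frequency, which is a simplification, and uses the measured 180 Mbps at 43 MHz as the baseline; the function name is illustrative.

```python
def scaled_throughput_mbps(measured_mbps, measured_mhz, target_mhz):
    """Projected throughput if bits per clock stay constant at a new clock rate."""
    return measured_mbps * target_mhz / measured_mhz


projected = scaled_throughput_mbps(180, 43, 66)
remaining_factor = 600 / projected

print(round(projected, 1))         # 276.3
print(round(remaining_factor, 2))  # 2.17
```

Under this assumption, reaching 66 MHz would give about 276 Mbps, leaving roughly a 2.2x gain to come from the other listed items such as improved pipelining, autonomous stage operation, and additional memory buffers.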

  20. Conclusions • The JPEG2000 core is expected to provide a valuable option for satellite imagery systems. • Compression will result in a dramatic improvement in system performance. • Lossless compression will allow ~70% more image data to be stored and downlinked by a system. • Lossy compression will allow even greater improvements. • NTRCC hardware is an excellent platform for the compressor. • High bandwidth interconnect and I/O (several Gbps). • High bandwidth external memories. • Excellent processing capability with the Virtex-II Pro devices. • The sky’s the limit! • Target rate of 600 Mbps per device appears to be a realistic goal. • Some improvements are left to be made to the clock rate and pipelining of the design.
