
Virtex-6 Investigations: Porting the CP FPGA Design

This investigation compares the Virtex-E and Virtex-6 FPGA families, exploring the feasibility of porting the CP FPGA design to Virtex-6 and determining the resource utilization and timing differences between them.


Presentation Transcript


  1. Virtex-6 Investigations
     • Motivation
     • Virtex-E/Virtex-6 Comparison
     • Porting the CP FPGA to Virtex-6
     • Next Steps
     • Conclusion
     Ian Brawn

  2. Virtex-6 Investigations
     • Motivation
       • Virtex-6 is a 'baseline' technology choice for the Phase 1 upgrade
       • The Phase 2 Level 0 Trigger Processor is also likely to use a comparable technology
       • Its role is similar to that of the current Level 1 trigger processor:
         • Low, fixed latency
         • Algorithms of comparable form and complexity (i.e., not several orders of magnitude more complex)
       • Desirable to learn the capabilities of Virtex-6
       • May also provide some guide to the future development of FPGAs
         • (Virtex-7 devices have been announced but data sheets are not yet available)

  3. Virtex-E Resources
     • Virtex-E family released ~2000
     • Used as the benchmark for the Virtex-6 investigations because it is widely used in the current L1 Calo Trigger
     • XCV1000E used for the CP FPGA (+ CMM Crate & System Merger FPGAs)

  4. Virtex-6 Resources
     • Virtex-6 family released 2009
     • LXT devices; SXT devices: extra DSP resources; HXT devices: extra MGT resources
     • Compared to the XCV1000E, the XC6VLX75T has ~x2 LUTs, x4 FFs, x10 RAM
     • Only an approximate comparison is possible because the structure of the families is different
       • e.g., 6-input LUTs in Virtex-6; 4-input LUTs in Virtex-E

  5. Multiplication in a Virtex-6
     • DSP48E1 slice:
       • 25-bit pre-adder
       • 25 x 18 multiplier
       • 48-bit accumulator
       • Pattern detect logic
       • Optional pipelining
     • @ 600 MHz, 3 cycles necessary for pre-add, multiply, accumulate
     • Manipulate algorithms to implement divisions as multiplications, e.g. Threshold < Esum1/Esum2 → Esum2 x Threshold < Esum1
     • Possible to have multiply/divide operations in V6 without breaking the resource/latency budget
     • (50-page manual on this component, which I've not fully digested)
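
The division-to-multiplication rewrite on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the firmware: the function name and the use of plain integers are assumptions, and the rewrite is valid only when Esum2 is positive (multiplying both sides by a negative value would flip the inequality).

```python
def ratio_exceeds_threshold(esum1: int, esum2: int, threshold: int) -> bool:
    """Evaluate threshold < esum1 / esum2 without performing a division.

    Hypothetical helper for illustration. Assumes esum2 > 0, so that
    multiplying both sides of the inequality by esum2 keeps its direction.
    """
    if esum2 <= 0:
        raise ValueError("esum2 must be positive for the rewrite to hold")
    # One multiply and one compare replace the divide and compare.
    return esum2 * threshold < esum1


if __name__ == "__main__":
    # 10 / 3 is about 3.33, which exceeds a threshold of 3.
    print(ratio_exceeds_threshold(10, 3, 3))
    # 9 / 3 is exactly 3, which does not exceed a threshold of 3.
    print(ratio_exceeds_threshold(9, 3, 3))
```

Because the comparison uses exact integer arithmetic, it gives the same answer as an exact division followed by a compare, which is what makes it attractive for a multiplier-based block such as the DSP48E1.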

  6. Virtex-E / Virtex-6 Speed Comparison
     • Virtex-E
       • We use them mostly at 40 MHz, in some places at 160 MHz
       • Data sheet: "synchronous system clock rates up to 240 MHz"
     • Virtex-6
       • Data sheet quotes a maximum frequency of 600 MHz for internal logic
     • Implies Virtex-6 is x2.5 faster, but only if we assume the same amount of processing can be squeezed into each clock cycle in the two families
     • [Figure: timing diagram with clock and data traces, data transformations A, B, C, D, and set-up time]

  7. Porting CP FPGA design to a Virtex-6
     • Main motivation is to understand how much processing we can fit into each Virtex-6 clock cycle
       • Better latency estimate
     • Also provides a guide to resource usage on Virtex-6
       • Sam has ported the CMM-Jet design to Virtex-6 for different reasons: baseline design for CMM++
     • Chose the CP FPGA design to port
       • Time-critical design on the real-time path
       • Most complex algorithms
       • Thank you to Richard for providing the source code
     • Porting
       • Upgraded Mentor tools....
       • Re-implemented block RAMs
       • Re-implemented relationally-placed macros (time-critical areas of design with fixed placing)
       • Minor re-design of clock tree
         • Virtex-6 components allowed & required simplified design
       • All straightforward
     • Caveats
       • Interested in what I could learn, not in producing a working design
       • Ignored fine timing constraints on IO
       • Specific to this design, with no wider implications for latency
       • Probably not the most efficient implementation of the design in Virtex-6

  8. CPFPGA: Virtex-E/6 Resource Utilization

                       Virtex-E           Virtex-6
     Device            XCV1000E-6BG560    XC6VLX75T-3FF784
     LUTs              62%                28%
     Flip-Flops        27%                6%
     BLOCKRAMs         20%                4%
     External IOBs     46%                52%

     • No. of flip-flops in Virtex-6 lower than expected
       • Haven't yet investigated why
       • Less logic duplication to meet timing requirements?
     • RAM not used efficiently in Virtex-6
     • IO is the one area where things haven't improved
       • Will need to rely more heavily on serialised data
       • Can use GTX for incoming calorimeter data (e.g., from SNAP12 @ 6.24 Gb/s/channel)
       • Also use GTX for data sharing required by overlapping algorithm windows? (latency considerations)

  9. CPFPGA: Virtex-E/6 Timing Comparison
     • Ideally, re-time registers in Virtex-6 to optimise for the 40 MHz clock and recalculate latency
       • A lot of work for a purely academic exercise
     • A quicker exercise, which yields approximately the same information, is to shrink the clock period
       • No changes were made to the design here; just tightened clock constraints
     • For a fair comparison, also established how fast the design can be run on Virtex-E
       • Normally run at 40 MHz (160 MHz clock for some logic)
       • This doesn't mean it can't be run faster

  10. CPFPGA: Virtex-E/6 Timing Results (1)
     • In Virtex-E, CPFPGA minimum clock period is 19 ns
       • Remember the caveats: fine timing of IO would be destroyed at this speed
       • This is just a measure of how fast we can run the internal algorithmic processing
     • In Virtex-6, CPFPGA minimum clock period is 10 ns
       • ~x2 as fast as Virtex-E
     • However, these figures were shown to be dominated by the x4 clock speed logic
       • Very little processing is performed between clock cycles here
       • Signal latency is dominated by routing delays
     • Therefore these aren't good measures of comparative speeds for algorithmic operations

  11. CPFPGA: Virtex-E/6 Timing Results (2)
     • To get a better measurement of speed for algorithmic operations, implemented a vertical slice through the algorithm block
       • No logic at x4 clock speed
       • (Not enough IO to implement the whole algorithm block, hence vertical slice)
     • Results
       • In Virtex-E, CPFPGA minimum clock period is 11 ns
       • In Virtex-6, CPFPGA minimum clock period is 4 ns
       • Which is close to the ~x2.5 increase in speed estimated from the data sheet
     • Re-optimising the design for the structure of Virtex-6 would probably provide a further increase in speed, but not by 100%
     • In the 9 years from Virtex-E to Virtex-6, speed has increased by ~x2.5
     • It seems unlikely that speed is going to increase by an order of magnitude during the lifetime of this project
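
As a quick check of the speed ratios quoted in the talk, the arithmetic can be written out as below. The helper name is an assumption made for this sketch; the input numbers are the data-sheet frequencies (240 MHz and 600 MHz) and the minimum clock periods from the two timing-results slides. With the rounded periods quoted, the vertical-slice ratio works out to 2.75, in line with the approximate x2.5 taken from the data sheets.

```python
def speedup(old_period_ns: float, new_period_ns: float) -> float:
    """Speed ratio implied by two minimum clock periods (hypothetical helper)."""
    if old_period_ns <= 0 or new_period_ns <= 0:
        raise ValueError("clock periods must be positive")
    return old_period_ns / new_period_ns


# Data-sheet maximum clock rates in MHz: Virtex-6 over Virtex-E.
datasheet_ratio = 600 / 240

# Full CP FPGA design, dominated by the x4 clock speed logic: 19 ns vs 10 ns.
full_design_ratio = speedup(19, 10)

# Vertical slice through the algorithm block: 11 ns vs 4 ns.
vertical_slice_ratio = speedup(11, 4)

if __name__ == "__main__":
    for label, ratio in [
        ("data sheet", datasheet_ratio),
        ("full design", full_design_ratio),
        ("vertical slice", vertical_slice_ratio),
    ]:
        print(f"{label}: x{ratio:.2f}")
```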

  12. Next steps
     • Implement potential Phase-2 L0 algorithms to calculate latency
       • e-μ coincidence veto
         • Use hit map from e/γ algorithm
         • Assume μ arrive as list of candidates with PT, η & φ (worst case for concurrent processing)
         • Use Virtex-6 CAM
           • Ternary mode
           • 1–512 wide, 16–4096 deep
       • e-jet coincidence veto
       • Invariant mass
     • Investigate Virtex-6 IO as part of wider investigation into data transport in Phase-2 L0
     • [Figure: e-μ veto algorithm (not intended to show true size of CAM). Labels: μ list (PT, η, φ), Threshold & Map, CAM, e/γ hit map, qualified e/γ hit map, example data in, output]
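
For readers unfamiliar with a ternary-mode CAM, the matching rule can be modelled in software as below. This is a generic sketch, not the Virtex-6 CAM core or the veto algorithm from the slide: the entry layout, the names `TernaryEntry` and `cam_lookup`, and the idea of using a packed coordinate as the key are all assumptions made for illustration. Each stored entry carries a care mask, and a key matches when it agrees with the stored value on every bit the mask marks as significant.

```python
from typing import List, NamedTuple


class TernaryEntry(NamedTuple):
    """One stored CAM word (hypothetical layout for this sketch)."""
    value: int      # bit pattern to compare against
    care_mask: int  # 1 = bit must match, 0 = don't-care bit


def cam_lookup(entries: List[TernaryEntry], key: int) -> List[int]:
    """Return the indices of all entries that match the key.

    A hardware CAM compares every entry in parallel; this software
    model simply loops over them.
    """
    return [
        index
        for index, entry in enumerate(entries)
        if ((key ^ entry.value) & entry.care_mask) == 0
    ]


if __name__ == "__main__":
    entries = [
        TernaryEntry(value=0b1010, care_mask=0b1111),  # exact match only
        TernaryEntry(value=0b1000, care_mask=0b1100),  # low two bits ignored
    ]
    print(cam_lookup(entries, 0b1010))
    print(cam_lookup(entries, 0b1001))
```

The don't-care bits are what would let a single entry cover a range of neighbouring key values, at the cost of a wider stored word per entry.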

  13. Conclusion
     • FPGA architecture has advanced in the decade since we implemented the current L1 Calo Trigger Processor
       • Particular features such as DSP blocks are of interest
       • Size of devices has risen by > order of magnitude
       • But speed has increased more slowly: ~x2.5
       • No. of IO pins hasn't increased at all
         • High-speed serial IO available, but at latency cost
     • For the Phase 1 Upgrade and Phase 2 Level 0 processor
       • More complex algorithms (e.g., involving multiplication) are within our scope
       • But latency concerns haven't been eliminated by FPGA progress
