1. Design Techniques for Million Gate, High Speed FPGAs
2. Agenda
The Problem
State-of-the-Art Technology
Design Issues
Performance Oriented Design
3. The Problem
4. State-of-the-Art : 2000 Technology
Gate Count
Frequency
Clock Domains
Computer Hardware
Design Software
RTL Language
Design
5. “Those who cannot remember the past are condemned to repeat it.”
6. State-of-the-Art : Technology
7. State-of-the-Art : Gate Count
8. State-of-the-Art : Frequency
9. State-of-the-Art : Clock Domains
10. State-of-the-Art : Computer Design Hardware
11. State-of-the-Art : RTL Language
Abstract Data Types
Design reusability
Compiled concepts
Design Management
Structure replication
12. State-of-the-Art : Design
In Packaged Power’s design creation stage, we use the Renoir editors to create your design.
Since a picture is worth a thousand words, being able to express your design with graphics makes it easier to understand and to describe to other members of your team. With Packaged Power you can express your design in any of the five editors: block diagrams, truth tables, flow charts, state machines, and text.
13. State-of-the-Art : Failures
14. State-of-the-Art : FPGA
APEX and Virtex at 3+ Million Gates
Maximum operating frequency is ~200 MHz (pushing 300 MHz)
Large blocks of memory
Embedded processors (PowerPC, ARM, MIPS)
Copper interconnect
On the FPGA front it isn’t much better. This is why Exemplar Logic, using our ASIC expertise, is working with Xilinx, Altera, and other FPGA vendors to promote a new design methodology. This is essential if we are going to take advantage of the newfound silicon.
One fact is that as devices grow, so will your design environment.
15. The Development Gap
Today we find that there has been a very rapid increase in the available silicon that designers can use to implement their systems. In fact, the growth rate of available silicon is exceeding the ability of designers to effectively implement and, more importantly, VERIFY these large designs.
The design gap is mainly due to deep sub-micron effects on timing constraints. By contrast, the verification gap is directly related to the size of current designs and to the fact that the verification flow has not really changed for years.
As a leading supplier of verification technology, Mentor Graphics has developed new verification tools, among them a true next-generation equivalence checker that will help close this verification gap.
16. System / SOC Design Methodology
17. Adjusting to a New Methodology
Team Design
IP Logic
More software content
Heavy with memory
Less synthesis / more chip level assembly
Why? Because designs are getting larger: not in terms of block size, but in terms of the number of blocks.
IP, bottom-up design, incremental design: if you are not using this methodology today, you will be in your next design. You need a tool that is designed for this challenge.
18. Effects of the Design Flow
19. ASIC versus FPGA design
20. A Designer’s Life
21. How to make a better designer
Provide proper training
Designers went to college to learn digital logic design, but most have had fewer than 10 hours of RTL training.
Provide a proven Design Methodology
Enforce Design for Quality techniques
Quality circuits are always easier to manufacture and are the most profitable.
Functionality is only a minor part of the design process. Using Performance Oriented Design techniques is the key to successful product development.
22. Performance Oriented Design Techniques
RTL Coding Styles
Design Architecture trade-offs
Design Structure
Timing Optimization
Physical Optimization
23. Coding style impact
Coding style does impact performance
It affects FPGAs more than ASICs
Different levels of RTL
Different descriptions give different results
Tools are also part of the equation
Different tools give different results
Learn to know your tool!
24. The Keys to Language Synthesis
Data Types
Packages
Ports
Hierarchy
Combinational Logic
Relational Operators
Arithmetic Operators
Sequential Logic
Memory
IOs
25. Structuring A Design
A design should read like a book.
Table of contents : An explanation of the design structure.
Logical flow from beginning to end.
Chapters : Logical breaks in a design.
Commentary : Comments on complex structure in the design.
26. Source Code Control
27. Hierarchy
28. Understand what the RTL does!!
29. Serial / Priority Structure
30. Parallel Structure
31. Tri-State
32. Bi-directional Buffer
33. Relational Operators
34. Addition Operators
35. Resource Sharing (when it really hurts)
36. Multiplication Operator
37. Pipelined Multipliers
Improve timing by introducing parallelism
Registers introduced by pipelining may have a modest area impact
Requirements
Certain constructs in the input RTL source code description
Output of the multiplier must be registered.
Optimal pipeline stages = log2(input data bus width)
A 16 bit databus => optimal pipeline value of 4;
32 bit bus => optimal pipeline value of 5.
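The rule of thumb above can be checked with a short calculation. This is a minimal Python sketch; the function name `optimal_pipeline_stages` is hypothetical, and rounding up for bus widths that are not powers of two is an assumption the slide does not state.

```python
def optimal_pipeline_stages(bus_width: int) -> int:
    """Return log2(bus_width), rounded up (the rounding is an assumption)."""
    if bus_width < 1:
        raise ValueError("bus width must be a positive integer")
    # For integers >= 1, (n - 1).bit_length() equals ceil(log2(n))
    return (bus_width - 1).bit_length()


for width in (16, 32):
    print(width, optimal_pipeline_stages(width))
```

For the two widths quoted on the slide this gives 4 and 5 stages respectively.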
38. A little Algebra goes a long way
Minimize all arithmetic equations to eliminate operators.
Frequency increased dramatically.
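As one illustration of the kind of simplification meant here (the expression is an assumed example, not taken from the slides), factoring `a*b + a*c` into `a*(b + c)` removes one multiplier while computing the same value. The Python sketch below only checks the equivalence over a small range; it does not model synthesis or timing.

```python
import itertools


def original(a: int, b: int, c: int) -> int:
    # Two multipliers and one adder
    return a * b + a * c


def simplified(a: int, b: int, c: int) -> int:
    # One multiplier and one adder after factoring out a
    return a * (b + c)


def equivalent(limit: int = 8) -> bool:
    """Exhaustively compare both forms for inputs in [-limit, limit)."""
    values = range(-limit, limit)
    return all(
        original(a, b, c) == simplified(a, b, c)
        for a, b, c in itertools.product(values, repeat=3)
    )


print(equivalent())
```

In fixed-width hardware the intermediate sum `b + c` needs one extra bit to stay exact, so the rewritten form should be sized accordingly.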
39. D Flip-flop
40. Complex Clock Enables
41. Latches
42. Counter
43. State Machine
Tools have made progress with FSM compilers
Reachability analysis, highly optimal results
Extended encoding techniques
Without FSM ‘one hot’ is often the best choice
Deflates the next state decoding logic ‘cloud’
FSM compiler without ‘Safe’ State
Implements the functionality, however the state machine may not be totally bullet proof
The ‘Safe’ option
‘default’ switch in the case may be ignored
Recovery logic is implemented to go back to the reset state
The ‘Exact’ implementation
You want a better match with simulation
Performance is not an obstacle
Your design works in a harsh environment
Check with Tom Hill
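The difference between a plain one-hot machine and a ‘safe’ one can be sketched behaviorally. The Python model below is an illustration under stated assumptions, not a description of how any particular FSM compiler works: the state names, the transition order, and the helper names are all hypothetical. Each state gets one bit, and the safe variant sends any illegal state-register value back to the reset state.

```python
STATES = ["IDLE", "LOAD", "RUN", "DONE"]  # hypothetical state names

# One-hot encoding: state i is a word with only bit i set
ENCODING = {name: 1 << i for i, name in enumerate(STATES)}
RESET = ENCODING["IDLE"]


def is_one_hot(value: int) -> bool:
    # Exactly one bit set: nonzero, and clearing the lowest set bit gives zero
    return value != 0 and (value & (value - 1)) == 0


def next_state(current: int, safe: bool = True) -> int:
    """Advance IDLE -> LOAD -> RUN -> DONE -> IDLE."""
    legal = is_one_hot(current) and current in ENCODING.values()
    if not legal:
        # Recovery logic: a safe machine returns to reset;
        # without it the machine stays stuck in the illegal state
        return RESET if safe else current
    # Rotate the single hot bit left by one position, wrapping around
    width = len(STATES)
    return ((current << 1) | (current >> (width - 1))) & ((1 << width) - 1)


print(next_state(ENCODING["DONE"]) == RESET)
print(next_state(0b0110) == RESET)
```

The extra legality check is the cost of the safe option: it adds logic in front of the state register, which is why it trades some performance for robustness.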
44. State Machine
Try the ‘safe’ for Synplicity
45. Read Only Memory (ROM)
46. Single Port RAMs
Look at Synplicity limitations
Check with Tom Hill
47. Dual Port RAMs
Look at Synplicity limitations
Check with Tom Hill
48. Content Addressable Memory (CAM)
Use a CAM when address translation is needed.
Use CAMs for sparsely used addresses.
CAMs replace large priority encoders.
60% area reduction
80% timing reduction
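Behaviorally, a CAM is searched by content and returns the location where that content is stored, which is what makes it a fit for address translation over a sparse address space. A minimal Python model follows; the class name, method names, and sample addresses are hypothetical, and the dictionary stands in for the parallel compare that real CAM hardware performs.

```python
from typing import Dict, Optional


class SimpleCam:
    """Behavioral model of a CAM: search by stored content, get back its slot."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self._slot_of: Dict[int, int] = {}  # content -> slot index

    def write(self, slot: int, content: int) -> None:
        if not 0 <= slot < self.depth:
            raise IndexError("slot out of range")
        # Drop whatever content previously occupied this slot
        self._slot_of = {c: s for c, s in self._slot_of.items() if s != slot}
        self._slot_of[content] = slot

    def lookup(self, content: int) -> Optional[int]:
        # Hardware compares every entry at once; a dict lookup models the result
        return self._slot_of.get(content)


cam = SimpleCam(depth=4)
cam.write(0, 0x8000_0000)  # sparse addresses (made-up values)
cam.write(1, 0x0000_1000)
print(cam.lookup(0x0000_1000))  # slot holding that address
print(cam.lookup(0xDEAD_BEEF))  # a miss returns None
```

A priority encoder over the same sparse range would need one comparator input per possible address, while the CAM only needs one entry per address actually in use.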
49. Checklist for performance
Pipeline for high performance
Make hardware work in parallel
Optimize late-arriving signals
Control arithmetic circuits
Use IP and hard-macros
50. Parallel Gates
51. Attributes
Attributes enable...
Mapping control
DLLs setup
IOB flop control
Ram initialization
Soft macros for speed
Synthesis attributes helpful for...
Improved usability
Name preservation
Replication
Resource sharing
Speed / area control
FSM encoding
52. Physical Optimization
Floor Plan your FPGA.
Produces a faster circuit
Circuit is more predictable and repeatable.
Timing convergence occurs quickly.
Back Annotate real timing data.
Allows a second pass of synthesis to work on the real critical paths.
53. FPGA High-Level Floorplanner
Tight links to Exemplar’s synthesis tool.
Position blocks into regions of device
Generates area constraints
Required for new Incremental design flow
Useful for Design Planning
56. Constraint Based Clustering
57. Logic Replication
Another cause of excessive delay can be traced to high-fanout nets. Many times a designer can inadvertently infer many loads, for example on a bus, without realizing it.
This is where LeonardoSpectrum’s TrueTiming logic replication algorithm kicks in. By accurately identifying the correct path, we can then apply logic replication to remove excessive loading on a path, thus reducing the delays caused by long routes across the design.
Our TrueTiming algorithms are designed to help you meet timing. Combined with LeonardoSpectrum's Incremental flow, where only the nets that have been changed are required to be rerouted, TimeCloser technology gets you to market FAST.
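The effect of replication on a high-fanout net can be illustrated with a small calculation. The Python sketch below is a generic illustration, not LeonardoSpectrum's algorithm: the function name, the load names, and the round-robin split are assumptions, and a real tool would choose the split from timing and placement data.

```python
from typing import List


def replicate_driver(loads: List[str], max_fanout: int) -> List[List[str]]:
    """Split the loads of one net among the fewest driver copies
    such that no copy drives more than max_fanout loads."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies = max(1, -(-len(loads) // max_fanout))  # ceiling division
    # Deal loads round-robin so the copies are evenly balanced
    return [loads[i::copies] for i in range(copies)]


loads = [f"reg_{i}" for i in range(10)]  # hypothetical load names
groups = replicate_driver(loads, max_fanout=4)
print(len(groups), [len(g) for g in groups])
```

With ten loads and a fanout limit of four, the driver is copied three times and each copy drives three or four loads instead of ten.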
58. Critical Path Restructuring
59. User Applied Physical Constraints
Preserve signals
Assign nets to secondary routing resources
Specify fanout on net by net basis
60. Design Techniques for Million Gate, High Speed FPGAs