
Miloš Krstić, Dr.-Ing.

Asynchronous and Synchronous Design Techniques for Communication Systems Applications. Faculty of Electronic Engineering, Nis, Serbia. Miloš Krstić, Dr.-Ing. Overview: Motivation; Synchronous design solutions; GALS - State of the Art; Introduction to request-driven GALS technique




Presentation Transcript


  1. Asynchronous and Synchronous Design Techniques for Communication Systems Applications. Faculty of Electronic Engineering, Nis, Serbia. Miloš Krstić, Dr.-Ing.

  2. Overview • Motivation • Synchronous design solutions • GALS - State of the Art • Introduction to request-driven GALS technique • Asynchronous wrapper for request-driven GALS blocks • GALSification of the baseband processor for IEEE 802.11a standard • Testing our GALS baseband design • Design-flow and implementation • Experimental results • Conclusions

  3. Motivation – Key Design Issues for Wireless Systems • A system integration framework for complex digital blocks is needed to avoid clock-skew and timing-closure problems. • Lowering EMI is very important in a mixed-signal environment. • Minimizing power consumption is a key issue in mobile systems. • We aim to achieve high data throughput with low latency.

  4. Challenges with Synchronous Design • Most digital systems today operate synchronously. • However, the complexity of wireless communication systems is growing enormously.

  5. Synchronous solutions • There are synchronous solutions for the integration, power and EMI problems. • System integration: deskewing circuits, hybrid networks, DLLs, PLLs… • Reduction of power consumption: clock gating, voltage scaling… • Reduction of EMI: clock modulation, clock jittering…

  6. System Integration – Synchronous Solutions • Distributing low-jitter clocks in the presence of power-supply noise is increasingly challenging. • Power consumption and complexity are very high. • Justified only for high-performance ASICs. [Figure: Clock distribution network with deskewing circuit (Geannopoulos and Dai 1998)]

  7. Power Reduction – Synchronous Solutions • $P_{dyn} = A \cdot C_{eff} \cdot V_{dd}^{2} \cdot f$ • Some power-saving techniques are based on activity reduction; an example is clock gating. • Others reduce the supply voltage and/or the frequency; examples are voltage scaling and dynamic voltage scaling.
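
The dynamic power formula on this slide can be explored numerically. The sketch below is only an illustration: it reads A as the switching activity factor, Ceff as the effective switched capacitance, Vdd as the supply voltage and f as the clock frequency, and the operating point in `BASE` is hypothetical, not taken from the presentation.

```python
def dynamic_power(activity, c_eff, v_dd, freq):
    """Dynamic power in watts: P_dyn = A * C_eff * V_dd^2 * f."""
    return activity * c_eff * v_dd ** 2 * freq


# Hypothetical operating point (illustrative numbers only).
BASE = {"activity": 0.2, "c_eff": 1e-9, "v_dd": 2.5, "freq": 80e6}

base = dynamic_power(**BASE)

# Activity reduction (the idea behind clock gating): halve A.
gated = dynamic_power(**{**BASE, "activity": 0.1})

# Voltage and frequency scaling: halve both V_dd and f.
scaled = dynamic_power(**{**BASE, "v_dd": 1.25, "freq": 40e6})

print(f"baseline: {base * 1e3:.1f} mW")
print(f"half activity: {gated / base:.3f} of baseline")
print(f"half Vdd and f: {scaled / base:.3f} of baseline")
```

Because Vdd enters quadratically, halving voltage and frequency together cuts the modelled power to one eighth, while halving the activity only halves it.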

  8. Power Reduction – Clock Gating • Pros: • Significant power reduction • Cons: • Increases gate count, needs additional control logic • Not effective when used for less than several clock periods • Makes clock-tree design even harder [Figure labels: FUB, FUB, clock, clock enable]

  9. Power Reduction – Voltage Scaling • Pros: • Saves power very effectively • Cons: • Additional power delivery network • Needs special care at the interfaces between power domains [Figure labels: Fast, Slow, Slow, Low Supply Voltage, High Supply Voltage]

  10. EMI Reduction • One powerful method to reduce EMI is the spread-spectrum technique, which modulates the signal and spreads the energy over a wider frequency range. • Another possibility is the introduction of clock jitter. • Finally, asynchronous circuit design reduces EMI very effectively.

  11. GALS as a design technique • We mentioned several methods and tools for managing each design challenge separately. • There is almost no technique that addresses all these issues at the same time. • However, GALS techniques have the potential to solve some of the most challenging design issues of SoC integration of communication systems.

  12. What is GALS? • GALS is an abbreviation for Globally-Asynchronous Locally-Synchronous systems. [Figure labels: Req, Ack, Data]

  13. GALS as a Powerful Design Technique • In wireless communication systems, GALS can address the main design challenges. • GALS makes data transfer between the blocks very easy. • Design problems such as timing closure or clock-tree generation are limited to the level of much smaller local blocks. • Decoupling the local blocks from a central clock source reduces spectral noise considerably. • Power saving is automatically integrated in the asynchronous wrapper.

  14. Power reduction with GALS • The clock signal is the dominant source of power consumption. • First estimations showed that about 30% power savings could be expected in the clock net due to the application of GALS. • Recently, some more pessimistic power estimation figures were presented. • GALS techniques offer independent setting of frequency and voltage levels for each locally synchronous module. • When using dynamic voltage scaling (DVS), an average energy reduction of up to 30% can be reached. [Figure: Power distribution in high-performance CPU]

  15. Potential for reducing EMI with GALS • We have simulated the noise generated on the power supply line in the synchronous and in the request-driven GALS system. • GALS introduces a reduction of about 20 dB. [Figure: two simulated supply-noise spectra, level in dB versus frequency from 0.5 to 4.5 GHz]

  16. Classical GALS approach • Published in Jens Muttersbach et al., "Globally-Asynchronous Locally-Synchronous Architectures to Simplify the Design of On-Chip Systems," in Proc. of ASIC/SOC Conference, pp. 317-321, Sept. 1999. [Figure labels: Locally Synchronous Module 1, Locally Synchronous Module 2, Output port, Input port, Data handshake, Local Clock Generator 1, Local Clock Generator 2, Asynchronous Wrapper 1, Asynchronous Wrapper 2, stretch1, stretch2]

  17. Pausable Clock Generator

  18. Main challenges of the typical GALS methods • In many solutions, data transfer and throughput are critical problems. Most of them can perform a data transfer only every second cycle of the local clock. • Some described circuits can theoretically transfer data every clock cycle. However, intensive stretching of the pausable clock generator will significantly diminish the practical performance. • The latency of the transferred data is not known in advance and may vary significantly from one data transfer to the next. • It is not very practical to use ring oscillators for local clock generation. • All solutions are oriented towards very general applications. They are not optimised for specific systems and environmental demands.

  19. Basic concept of the request-driven operation • This approach covers point-to-point communication with very intensive but bursty data transfer. • When receiving an input burst, a GALS block can operate in a request-driven mode. • When there is no input activity, the data stored inside the locally synchronous pipeline has to be flushed out. A local clock generator then drives the GALS block. • A time-out function controls the transition from request-driven operation to local clock generation mode.
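
The mode switching described on this slide can be sketched as a small behavioural model. This is a simplified illustration, not the wrapper from the presentation: the step-based timing, the `timeout` and `pipeline_depth` parameters, and the rule that one token leaves the pipeline per local clock cycle are all assumptions made for the example.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    REQUEST_DRIVEN = "request-driven"
    LOCAL_CLOCK = "local clock generation"


def simulate(requests, timeout=2, pipeline_depth=4):
    """Return the mode after each time step.

    requests: iterable of booleans, True when an input request arrives.
    timeout: quiet steps tolerated before leaving request-driven mode (assumed).
    pipeline_depth: maximum number of tokens held in the pipeline (assumed).
    """
    mode = Mode.IDLE
    quiet = 0       # steps since the last request
    in_flight = 0   # tokens still inside the synchronous pipeline
    trace = []
    for req in requests:
        if req:
            # Every request clocks the block: request-driven operation.
            mode = Mode.REQUEST_DRIVEN
            quiet = 0
            in_flight = min(in_flight + 1, pipeline_depth)
        elif mode is Mode.REQUEST_DRIVEN:
            quiet += 1
            if quiet >= timeout:
                # Time-out: hand over to the local clock if data must be flushed.
                mode = Mode.LOCAL_CLOCK if in_flight else Mode.IDLE
        elif mode is Mode.LOCAL_CLOCK:
            # Each local clock cycle pushes one token out of the pipeline.
            in_flight -= 1
            if in_flight == 0:
                mode = Mode.IDLE
        trace.append(mode)
    return trace


burst = [True, True, False, False, False, False, False, False]
for step_no, mode in enumerate(simulate(burst)):
    print(step_no, mode.value)
```

With a two-token burst and `timeout=2`, the model stays request-driven for one quiet step, switches to local clock generation for two flush cycles, and then goes idle.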

  20. Request-driven asynchronous wrapper • The local clock can be generated either internally or externally.

  21. What can we gain from this GALS technique? • Reliable and fast transfer of large bursts of data is achieved. Data transfer is possible at every clock cycle of the synchronous block. • In request-driven mode there is no arbitration at the input port. Consequently, the circuit responds immediately to input requests. • The clock speed is determined by the master and not by the slower participant in the communication. • The local clock can be generated internally or externally. • The proposed architecture offers an efficient power-saving mechanism, similar to clock gating. • EMI should be reduced due to varying delays and frequencies in different asynchronous wrappers.

  22. Building the wrapper components - input port • The input port has to control the dataflow according to a ‘broad’ 4-phase handshake protocol. • The input port consists of a speed-independent (SI) input controller along with a few additional gates that provide glitch-free transitions of the input signals.
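
As a reminder of what a 4-phase handshake looks like, the sketch below lists the four events of one transfer. It assumes the usual push-channel reading of the 'broad' data-validity scheme (data held valid from the rising request until the falling acknowledge); the function name and the trace format are hypothetical, and the real input port is an asynchronous circuit, not sequential code.

```python
def broad_four_phase(words):
    """Yield (event, req, ack, data_valid, word) after each handshake event.

    Assumes a push channel with 'broad' validity: the sender keeps the word
    stable from req+ until ack- has been observed.
    """
    req = ack = 0
    for word in words:
        req = 1
        yield ("req+", req, ack, True, word)   # sender offers valid data
        ack = 1
        yield ("ack+", req, ack, True, word)   # receiver acknowledges
        req = 0
        yield ("req-", req, ack, True, word)   # sender withdraws the request
        ack = 0
        yield ("ack-", req, ack, False, word)  # only now may the data change


for event in broad_four_phase([0xA5, 0x3C]):
    print(event)
```

Both wires return to zero after every transfer, which is what distinguishes the 4-phase (return-to-zero) protocol from a 2-phase one.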

  23. Input controller specification • The input controller is modeled as an AFSM (asynchronous finite state machine). • The controller is specified according to burst-mode requirements. • The burst-mode AFSM is implemented as a ‘Huffman machine’ without explicit latches. [Figure: State graph of the input controller, with Idle mode, Request-driven mode, Local clock generation mode and Transitional mode] [Figure: Huffman machine with a hazard-free combinational network, inputs A, B, C, outputs X, Y, Z, and fed-back state (several bits)]

  24. Input controller implementation • The burst-mode input controller is synthesized using the 3D tool, which supports 2-level hazard-free logic minimization and achieves optimal state assignment:
      REQ_INT = REQ_A1 REQ_INT + ACKC' REQ_INT + REQ_A1 ACKC' ST' ACKEN'
      ACK_A = ACKC' REQ_INT + REQ_A1 RST + ACKC' ST ACKI1' ACKEN Z0' + REQ_A1 ACKC' ST' ACKEN'
      ACKEN = ACKI1 + REQ_A1 ACKEN + ST ACKEN
      RST = STOP + ACKC' REQ_INT + REQ_A1 RST + ST RST + ACKC' ST ACKI1' ACKEN Z0' + REQ_A1 ACKC' ST' ACKEN'
      REQ_I1 = REQ_A1 ST ACKI1' ACKEN'
      Z0 = ACKI1 + REQ_A1' ACKC + REQ_A1' ST' Z0 + ACKC' ACKEN Z0 + ACKC ACKEN' Z0
  • Logic equations are automatically converted into synthesizable structural VHDL code with our 3DC tool. • Formal analysis of the asynchronous wrapper is performed.
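
The sum-of-products equations on this slide can be evaluated directly in software. The sketch below transcribes them one-to-one (juxtaposition is AND, + is OR, ' is NOT) and iterates until the fed-back signals stop changing. Treating REQ_A1, ACKC, ST, ACKI1 and STOP as primary inputs is an assumption based only on the fact that they never appear on a left-hand side, and the pass-by-pass iteration is a crude unit-delay model, not a substitute for the hazard analysis done by the synthesis tool.

```python
INPUTS = ("REQ_A1", "ACKC", "ST", "ACKI1", "STOP")   # assumed primary inputs
FEEDBACK = ("REQ_INT", "ACKEN", "RST", "Z0")         # appear on both sides


def step(s):
    """One evaluation pass of the equations; s maps signal name to 0 or 1."""
    def n(x):
        return 1 - x

    req_a1, ackc, st, acki1, stop = (s[k] for k in INPUTS)
    req_int, acken, rst, z0 = (s[k] for k in FEEDBACK)

    # Product terms shared by several equations.
    t1 = n(ackc) & req_int
    t2 = req_a1 & n(ackc) & n(st) & n(acken)
    t3 = n(ackc) & st & n(acki1) & acken & n(z0)

    return {
        "REQ_INT": (req_a1 & req_int) | t1 | t2,
        "ACK_A": t1 | (req_a1 & rst) | t3 | t2,
        "ACKEN": acki1 | (req_a1 & acken) | (st & acken),
        "RST": stop | t1 | (req_a1 & rst) | (st & rst) | t3 | t2,
        "REQ_I1": req_a1 & st & n(acki1) & n(acken),
        "Z0": (acki1 | (n(req_a1) & ackc) | (n(req_a1) & n(st) & z0)
               | (n(ackc) & acken & z0) | (ackc & n(acken) & z0)),
    }


def settle(inputs, state, max_passes=16):
    """Re-evaluate the equations until no signal changes (unit-delay model)."""
    signals = {**state, **inputs}
    for _ in range(max_passes):
        updated = {**signals, **step(signals)}
        if updated == signals:
            return signals
        signals = updated
    raise RuntimeError("equations did not settle")


quiet = dict.fromkeys(INPUTS, 0)
reset_state = dict.fromkeys(FEEDBACK, 0)
print(settle(quiet, reset_state))
print(settle({**quiet, "REQ_A1": 1}, reset_state))
```

In this model the all-zero assignment is a fixed point, and raising only REQ_A1 from that state settles with REQ_INT, ACK_A and RST high.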

  25. Externally-driven GALS Wrapper

  26. Clock Management Unit

  27. Baseband processor for WLAN • The goal of our project is to develop a single-chip wireless broadband communication system in the 5 GHz band. • The modem is compliant with the IEEE 802.11a WLAN standard. • The system uses Orthogonal Frequency Division Multiplexing (OFDM) with data rates ranging from 6 to 54 Mbit/s. • The synchronous baseband processor was implemented as an ASIC (700k gates).

  28. Baseband Processor • The baseband processor includes receiver and transmitter datapath structures. • Very complex blocks are implemented, such as the Viterbi decoder, FFT, IFFT, CORDIC processors, ... [Figure: Structure of the synchronous baseband processor. Transmitter blocks: Scrambler, Pilot scrambler, Encoder, Input buffer, Guard interval insertion, Mapper, Interleaver, IFFT, Preamble insertion, Pilot insertion, Signal field generator. Receiver blocks: Parallel converter, Descrambler, Viterbi decoder, Deinterleaver, Demapper, Buffer 20 - 80, Channel estimator, FFT, Synchronizer datapath, Mapper, Synchronizer tracking, Interleaver, Encoder, Buffer 80 - 20. Legend: 80 Msps block, 20 Msps block]

  29. Design challenges in the baseband processor • Design of the baseband processor involves challenges such as: - several clock domains, - global clock tree generation, - large number of clock leaves (36 k flip-flops), - clock skew handling, - timing closure between the different modules, - clock gating, - power consumption, - EMI. • Our request-driven GALS architecture was developed as a possible solution for those problems.

  30. GALS partitioning • The partitioning process has to take into account possible power saving. [Figure: GALS partitioning of the baseband processor. Block labels: Tx_1, Tx_2, Tx_3, Rx_1, Rx_2, Rx_3, Rx_TRA, Token rate adaptation, FIFO TA, Tx_int (async-sync interface), Rx_int (async-sync interface), Activation interface, grouped around the transmitter and receiver blocks of the synchronous design. Legend: 80 Msps block, 20 Msps block, Rate adaption block, Interface block]

  31. Test strategy • We are using a hardware tester which is strictly cycle based and cannot react to asynchronous output signals of the circuit. • The GALS arbitration processes preclude cycle-level determinism. • We want to be able to run very complex functional tests internally. • The applied test technique should support system diagnosis. • A test strategy based on Built-In Self-Test (BIST) is proposed. • BIST reduces the effort for generating a test program and enables us to use a synchronous tester.

  32. Design for Testability in GALS • The test pattern generators (TPG) and test data evaluators (TDE) are based on the linear feedback shift register structure with embedded additional logic. • A central BIST controller controls the test procedure. • We can run hierarchical tests. • This BIST technique can be used as a method for prototype verification. • In combination with the scan approach, BIST can even be used as a basis for the manufacturing test.
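
A linear feedback shift register can serve both as pattern generator and as response compactor, which the sketch below illustrates. It is a generic textbook-style example under stated assumptions: the 4-bit width, the feedback taps, the stand-in block `circuit_under_test` and the injected fault are all hypothetical and unrelated to the actual BIST hardware of the baseband processor.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def lfsr_step(state):
    """Advance a 4-bit Fibonacci LFSR by one shift (feedback = bit0 XOR bit1)."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << (WIDTH - 1))


def test_patterns(seed=0b0001):
    """Pattern generator: yield LFSR states until the sequence repeats."""
    state = seed
    while True:
        yield state
        state = lfsr_step(state)
        if state == seed:
            return


def signature(responses, seed=0):
    """Response compactor: fold each response word into a shifting register."""
    sig = seed
    for response in responses:
        sig = lfsr_step(sig) ^ (response & MASK)
    return sig


def circuit_under_test(x):
    """Hypothetical 4-bit block standing in for a real datapath."""
    return (3 * x + 1) & MASK


patterns = list(test_patterns())
golden = [circuit_under_test(p) for p in patterns]

# Emulate a defect by flipping one bit of a single response word.
faulty = list(golden)
faulty[5] ^= 0b0100

print("patterns:", patterns)
print("golden signature:", signature(golden))
print("faulty signature:", signature(faulty))
```

The generator walks through all 15 non-zero 4-bit states before repeating, and because the compactor is linear with an invertible shift, a single flipped response bit always changes the final signature.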

  33. Design flow • We have used our in-house 0.25 µm CMOS process. • The asynchronous wrapper is equivalent to about 1.3 k inverter gates; the tunable clock generation alone takes 0.9 k gates. • The asynchronous wrapper has a throughput of up to 150 Msps in request-driven mode and 100 Msps in local mode. This application needs 80 Msps. [Figure: Design flow. Asynchronous wrappers: AFSM specification, 3D logic synthesis, 3DC tool (translation from 3D to structural VHDL), formal analysis (LoLA). Synchronous blocks: functional specification, VHDL description. Common flow: abstract behavioural simulation (ModelSim), gate mapping (Synopsys DC), realistic behavioural simulation (ModelSim), timing-driven synthesis (Synopsys DC), post-synthesis simulation (ModelSim), power estimation (Prime Power), layout (Cadence Silicon Encounter), back annotation (ModelSim), power estimation (Prime Power), tape-out]

  34. Area and power distribution • Area and power statistics are based on the synthesized netlist data. Locally synchronous blocks occupy around 90% of the total area, the BIST circuitry requires around 3.5%, interface blocks 2.9%, and asynchronous wrappers 2%. • Power estimation with the Prime Power tool has been performed, based on the switching activities in a realistic transceiver scenario. Synchronous datapath logic uses most of the power (around 52.4%), local synchronous clock trees use 34.5%, async-to-sync interfaces 7%, and asynchronous wrappers 2.9%. • After layout, the estimated power consumption is 324.6 mW.
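
To get a feel for the absolute numbers, the quoted power shares can be converted to milliwatts. Applying the percentages to the 324.6 mW post-layout estimate is an assumption made here for illustration; the slide does not say that both figures come from the same estimation run.

```python
TOTAL_MW = 324.6  # post-layout estimate quoted on the slide

# Power shares in percent, as quoted on the slide.
SHARES = {
    "synchronous datapath logic": 52.4,
    "local clock trees": 34.5,
    "async-to-sync interfaces": 7.0,
    "asynchronous wrappers": 2.9,
}

# Assumption: the shares apply to the post-layout total.
breakdown_mw = {name: TOTAL_MW * pct / 100 for name, pct in SHARES.items()}
unlisted_pct = 100 - sum(SHARES.values())

for name, mw in breakdown_mw.items():
    print(f"{name}: {mw:.1f} mW")
print(f"not itemised on the slide: {unlisted_pct:.1f} %")
```

Under that assumption the asynchronous wrappers account for under 10 mW, and about 3.2% of the total is not itemised.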

  35. Implementation results • Our GALS baseband processor has been fabricated and tested. • The total number of pins is 120 and the silicon area including pads is 45.1 mm². • The measured dynamic power dissipated in the purely synchronous baseband processor was 332 mW; for the GALS baseband processor it was slightly lower, at 328 mW.

  36. Improving System Integration with GALS • Synchronous baseband processor challenges and how GALS handles them: - several clock domains: solved by the GALS architecture, - global clock tree generation: no global clock in GALS, - large number of clock leaves: clock leaves distributed over GALS blocks, - clock skew handling: clock skew is reduced from 660 ps to 486 ps, - timing closure between blocks: communication between the blocks through handshaking, - clock gating: clock gating embedded in the asynchronous wrapper.

  37. EMI measurement (I) • The supply voltage variation spectrum of the inner processor core is measured. [Figure: measured spectrum, annotated ~5 dB]

  38. EMI measurement (II) • Additionally, instantaneous cycle-to-cycle supply voltage peaks are reduced from 140 mV (synchronous design) to less than 100 mV (GALS). • This reduction can be very important for mixed-signal designs and for secure systems. • An application with fine-grained GALS partitioning can lead to results closer to the theoretical maximum reduction.
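
The peak-voltage figures on this slide can be put on the same decibel scale used for the spectra. The conversion below is standard amplitude-ratio arithmetic; using exactly 100 mV for the GALS design is an assumption, since the slide only says "less than 100 mV", so the result is a lower bound.

```python
import math


def amplitude_ratio_db(v_reference, v_reduced):
    """Reduction of a voltage amplitude expressed in decibels."""
    return 20 * math.log10(v_reference / v_reduced)


SYNC_PEAK_MV = 140.0   # synchronous design, from the slide
GALS_PEAK_MV = 100.0   # upper bound for the GALS design, from the slide

peak_reduction_db = amplitude_ratio_db(SYNC_PEAK_MV, GALS_PEAK_MV)
print(f"peak reduction: at least {peak_reduction_db:.1f} dB")
```

A drop from 140 mV to 100 mV corresponds to roughly 2.9 dB, so any peak below 100 mV means a larger reduction.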

  39. Conclusions • GALS can be successfully used as a design technique in wireless communication systems. • The main goal of simplifying the system integration was achieved. • Furthermore, we achieved a significant reduction of supply noise and a slightly lower dynamic power consumption.

  40. Future activities in GALS area • Automation of design flow for GALS systems. • Further activities in reducing EMI. [Figure labels: GALS synthesis; Modelling & Verification; Flexible building of model; System description; Netlist; Rules for partitioning; Abstract verification; Layout; Circuit synthesis; High-level datapath model; Layout scripts; GALS libraries; Clock jitter; Gates; Jitter generators; Asynchronous basic components; EMI analysis; Parameterized wrapper; FPGA synthesis; Asynchronous FPGA wrapper; Desynchronisation] Thank you very much for your attention.
