Explore the AMULET3i, a next-gen asynchronous ARM with innovative Harvard core, developed within the OMI ATOM project. Learn about tools like LARD and BALSA for efficient design and synthesis in asynchronous circuits. Discover future challenges and alternative timing solutions for high-performance SoC designs.
AMULET3i - asynchronous SoC Steve Furber - sfurber@cs.man.ac.uk • Agenda: • AMULET3i • Design tools • Future problems
AMULET3 • a third generation asynchronous ARM • performance comparable with ARM9 • radically new internal organisation • based on reorder buffer • Harvard core, unified I/D memory • under development within the OMI ATOM project • first application as part of a telecommunications controller
AMULET3 core organisation • Harvard core • forward from reorder buffer • out-of-order completion • in-order register update • aborts handled at writeback
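The reorder-buffer behaviour listed on this slide (forwarding, out-of-order completion, in-order register update, aborts handled at writeback) can be illustrated with a small software model. This is a generic sketch, not the AMULET3 circuit: the class `ReorderBuffer`, its method names, and the dictionary-based entries are all hypothetical, chosen only to show the ordering rules.

```python
from collections import deque


class ReorderBuffer:
    """Toy model: results complete in any order, registers update in program order."""

    def __init__(self, size):
        self.size = size
        self.slots = deque()   # entries held in program (allocation) order
        self.regs = {}         # architectural register file

    def allocate(self, dest):
        """Reserve a slot at issue time; returns the entry handle."""
        if len(self.slots) >= self.size:
            raise RuntimeError("reorder buffer full")
        entry = {"dest": dest, "value": None, "done": False, "abort": False}
        self.slots.append(entry)
        return entry

    def complete(self, entry, value, abort=False):
        """An execution unit delivers its result, possibly out of order."""
        entry["value"] = value
        entry["done"] = True
        entry["abort"] = abort

    def forward(self, reg):
        """Read the newest value of reg; None means the newest producer is still pending."""
        for entry in reversed(self.slots):
            if entry["dest"] == reg:
                return entry["value"] if entry["done"] else None
        return self.regs.get(reg)

    def writeback(self):
        """Retire completed entries from the head only, so register updates stay in order."""
        retired = []
        while self.slots and self.slots[0]["done"]:
            entry = self.slots.popleft()
            if entry["abort"]:
                self.slots.clear()   # discard the aborting entry and everything younger
                break
            self.regs[entry["dest"]] = entry["value"]
            retired.append(entry["dest"])
        return retired
```

In this model a younger instruction can finish first and have its result forwarded, but nothing reaches the register file until every older entry has completed, and an abort detected at the head discards all younger results.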
AMULET3H local bus RAM • segmented memory • I & D ports arbitrate at each block • quad-word I & D line buffers
AMULET3 tools • LARD • behavioural modelling tool for async design • 10x designer productivity vs Asim • Petrify • much enhanced FORCAGE descendant • can handle wider range of circuits • Balsa • synthesis tool used for DMA controller
Tools - LARD • Language for Asynchronous Research and Development • parallel processes with communication primitives • extensive data types • modelling of elapsed time • used to model AMULET3 • available from AMULET web site
Tools - LARD • Features • time view • block view • HLL debug • test generation • co-simulation • Platforms • UNIX/Linux
Tools - BALSA • Synthesis system for asynchronous circuits • similar to Philips ‘Tangram’ • used for AMULET3H DMA controller • direct HLL to netlist compilation • syntax directed translation • peephole optimisation
Tools - Petrify • Petri Net modelling tool • for low-level asynchronous circuits • speed-independent synthesis • technology mapping • very powerful • can be tricky to use • extensively used to design AMULET3 modules
AMULET3 validation • workstations now powerful enough to run ARM validation suite under TimeMill • around 8 CPU-weeks total • testing full functionality now very hard • very complex system-on-chip • design aimed at high performance • timing margins much reduced • validation complex and uncertain
AMULET3 - problems • high performance target • timing margins must be small • timing is hard to verify • very dependent on accurate extraction, models • modelling tools are imperfect • e.g. crosstalk • bus wire delay 1.5ns +/- 1ns crosstalk • careful layout gives 0.9ns +/- 0.15ns • how can we be sure such factors are OK?
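The crosstalk figures on this slide can be turned into a quick worked calculation. The sketch below assumes a bundled-data scheme in which the matched delay must cover the worst-case data delay; the function names and the optional safety margin are hypothetical, and delays are held as integer picoseconds to keep the arithmetic exact.

```python
def delay_window_ps(nominal_ps, crosstalk_ps):
    """Return (best_case, worst_case) wire delay in picoseconds."""
    return nominal_ps - crosstalk_ps, nominal_ps + crosstalk_ps


def matched_delay_ps(nominal_ps, crosstalk_ps, margin_percent=0):
    """Smallest matched delay that covers the worst case, plus an optional margin."""
    _, worst = delay_window_ps(nominal_ps, crosstalk_ps)
    return worst * (100 + margin_percent) // 100


# Figures from the slide, converted to picoseconds.
uncontrolled = delay_window_ps(1500, 1000)   # 1.5ns +/- 1ns
careful = delay_window_ps(900, 150)          # 0.9ns +/- 0.15ns

for name, (best, worst) in (("uncontrolled", uncontrolled), ("careful layout", careful)):
    print(f"{name}: {best}..{worst} ps, spread {worst - best} ps")
```

With the slide's numbers the worst case falls from 2500 ps to 1050 ps and the uncertainty window from 2000 ps to 300 ps, which is why the result depends so heavily on how well extraction and crosstalk models can be trusted.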
The Future • timing accuracy is getting harder • wire delays will become more significant • crosstalk will get worse • on-chip transistor variance will increase • higher speeds will lead to higher noise • will delay-matching be viable? • alternatives are dual-rail or other DI codes • incur significant area and power overheads
Alternatives to bundled data • Delay-insensitive codes • timing encoded in data • dual-rail encoding • 100% area overhead cf. bundled data • significant power cost • e.g. NCL from Theseus • deal just announced with Motorola • use conventional synthesis tools • timing closure ceases to be an issue
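A minimal sketch of dual-rail encoding shows where "timing encoded in data" and the 100% overhead come from. It assumes the common convention of a (true, false) wire pair per bit with an all-zero spacer between values; the function names are hypothetical and this is not a model of NCL specifically.

```python
def dual_rail_encode(value, width):
    """Encode each bit as a (true_rail, false_rail) pair with exactly one rail high."""
    return [(1, 0) if (value >> i) & 1 else (0, 1) for i in range(width)]


def is_complete(wires):
    """A codeword is valid once every pair has exactly one rail high.

    (0, 0) means the bit has not arrived yet (spacer); (1, 1) is illegal.
    """
    return all(t != f for t, f in wires)


def dual_rail_decode(wires):
    """Return the value, or None while the codeword is still incomplete."""
    if not is_complete(wires):
        return None
    return sum(t << i for i, (t, _) in enumerate(wires))


word = dual_rail_encode(0b1010, 4)
wire_count = 2 * len(word)   # 8 wires for 4 bits: 100% more than 4 data wires
```

The receiver needs no matched delay: it detects arrival from the data wires themselves, at the cost of two wires per bit.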
Alternatives to bundled data • Delay-insensitive codes • N-of-M codes • 3-of-6 code • 50% area overhead • 3 transitions to send 4 bits • 2-of-7 code • 75% area overhead • 2 transitions to send 4 bits • well-suited to inter-chip communication • may suit on-chip buses
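The overhead figures for the N-of-M codes can be checked with a short calculation. The sketch below enumerates the codewords of an N-of-M code and maps 4-bit values onto them; assigning values to codewords in lexicographic order is an arbitrary assumption made for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def n_of_m_codewords(n, m):
    """All ways of raising exactly n of m wires, as tuples of wire indices."""
    return list(combinations(range(m), n))


def wire_overhead_percent(m, data_bits):
    """Extra wires relative to sending data_bits on plain data wires."""
    return 100 * (m - data_bits) / data_bits


def encode(value, n, m):
    """Map a value to the set of wires that make a transition (assumed ordering)."""
    return n_of_m_codewords(n, m)[value]


def decode(wires, n, m):
    """Recover the value once exactly n wires have fired; None while incomplete."""
    if len(wires) != n:
        return None
    return n_of_m_codewords(n, m).index(tuple(sorted(wires)))


for n, m in ((3, 6), (2, 7)):
    count = len(n_of_m_codewords(n, m))
    print(f"{n}-of-{m}: {count} codewords, "
          f"{wire_overhead_percent(m, 4):.0f}% wire overhead, {n} transitions per 4 bits")
```

Both codes have at least 16 codewords (20 and 21 respectively), so each carries 4 bits; the 2-of-7 code spends one more wire than 3-of-6 to save one transition per symbol.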
Conclusions • complex async design is feasible • standard tools are just about survivable • additional tools improve productivity • ideal design flow: • LARD-like specification • formal verification of high-level properties • automated synthesis onto module library • timing closure is the major problem • may ultimately rule out bundled data