Precision Timed Embedded Systems Using TickPAD Memory

Precision Timed Embedded Systems Using TickPAD Memory Matthew M Y Kuo* Partha S Roop* Sidharta Andalam† Nitish Patel* *University of Auckland, New Zealand †TUM CREATE, Singapore

Introduction • Hard real time systems • Need to meet real time deadlines • Catastrophic events may occur when missed • Synchronous execution approach • Good for hard real time systems • Deterministic • Reactive • Aids static timing analysis • Well bounded programs • No unbounded loops or recursions

Synchronous Languages • Executes in logical time • Ticks • Sample input → computation → emit output • Synchronous hypothesis • Tick are instantaneous • Assumes system is executes infinitely fast • System is faster than environment response • Worst case reaction time • Time between two logical ticks • Languages • Esterel • Scade • PRET-C • Extension to C

PRET-C • Light-weight multithreading in C • Provides thread safe memory access • C extension implemented as C macros

Introduction • Practical System require larger memory • Not all applications fit on on-chip memory • Require memory hierarchy • Processor memory gap [1] Hennessy, John L., and David A. Patterson. Computer Architecture: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann, 2011.

Introduction • Traditional approaches • Caches • Scratchpads • However, • Scant research for memory architectures tailored for synchronous execution and concurrency.

Caches CPU Main Memory

Caches CPU Main Memory • Traditionally Caches • Small fast piece of memory • Temporal locality • Spatial locality • Hardware Controlled • Replacement policy Cache

Caches CPU Main Memory • Hard real time systems • Needs to model the architecture • Compute the WCRT • Caches models • Trade off between length of computation time and tightness • Very tight worse case estimate is not scalable Cache

Scratchpad CPU Main Memory • Scratchpad Memory (SPM) • Software controlled • Statically allocated • Statically or dynamically loaded • Requires an allocation algorithm • e.g. ILP, Greedy SPM

Scratchpad CPU Main Memory • Hard real time systems • Easy to compute tight the WCRT • Reduces the worst case performance • Balance between amount of reload points and overheads • May perform worst than cache in the worst case performance SPM

TickPAD CPU Main Memory Cache SPM • Good at overall performance • Hardware controlled • Good at worst case performance • Easy for fast and tight static analysis

TickPAD CPU Main Memory TPM Cache SPM • Good at overall performance • Hardware controlled • Good at worst case performance • Easy for fast and tight static analysis

TickPAD CPU Main Memory TPM • TickPAD Memory • TickPAD - Tick Precise Allocation Device • Memory controller • Hybrid between caches and scratchpads • Hardware controlled features • Static software allocation • Tailored for synchronous languages • Instruction memory

TickPAD Design flow

PRET-C main int main() { init(); PAR(t1,t2,t3); ... } void thread t1() { compute; EOT; compute; EOT; } t1 t3 t2

PRET-C main Computation int main() { init(); PAR(t1,t2,t3); ... } void thread t1() { compute; EOT; compute; EOT; } t1 t3 t2

PRET-C main Spawn children threads int main() { init(); PAR(t1,t2,t3); ... } void thread t1() { compute; EOT; compute; EOT; } t1 t3 t2

PRET-C main End of tick – Synchronization boundaries int main() { init(); PAR(t1,t2,t3); ... } void thread t1() { compute; EOT; compute; EOT; } t1 t3 t2

PRET-C main Child thread terminate int main() { init(); PAR(t1,t2,t3); ... } void thread t1() { compute; EOT; compute; EOT; } t1 t3 t2

PRET-C main Main thread resume int main() { init(); PAR(t1,t2,t3); ... } void thread t1() { compute; EOT; compute; EOT; } t1 t3 t2

PRET-C Execution main t1 t3 t2 Sample inputs Time

PRET-C Execution main t1 t3 t2 main Time

PRET-C Execution main t1 t3 t2 main t1 Time

PRET-C Execution main t1 t3 t2 main t1 t2 Time

PRET-C Execution main t1 t3 t2 main t1 t2 t2 Time

PRET-C Execution main t1 t3 t2 Emit Outputs main t1 t2 t2 Time

PRET-C Execution main t1 t3 t2 1 tick (reaction time) main t1 t2 t2 Time

PRET-C Execution main t1 t3 t2 local tick main t1 t2 t2 Time

Assumptions 4 Instructions 1 Cache Line Takes 1 burst transfer from main memory Cache miss, takes 38 clock cycles [2] buffer Buffers are 1 cache line in size Each instructions takes 2 cycles to execute 2. J. Whitham and N. Audsley. The Scratchpad Memory Management Unit for Microblaze: Implémentation, Testing, and Case Study. Technical Report YCS-2009-439, University of York, 2009.

TickPAD - Overview

TickPAD - Overview • Spatial memory pipeline • To accelerate linear code

TickPAD - Overview • Associative loop memory • For predictable temporal locality • Statically allocated and Dynamically loaded

TickPAD - Overview • Tick address queue • Stores the resumptions address of active threads

TickPAD - Overview • Tick instruction buffer • Stores the instructions at the resumption of the next active thread • To reduce context switching overhead at state/tick boundaries

TickPAD - Overview • Command table • Stores a set of commands to be executed by the TickPAD controller.

TickPAD - Overview • Command buffer • A buffer to store operands fetched from main memory • Command requiring 2+ operands

Spatial Memory Pipeline • Cache – on miss • Fetches from main memory on to cache • First instruction miss, subsequence instructions on that line hits • Requires history of cache needed for timing analysis • Scratchpad – unallocated • Executes from main memory • Miss cost for all instructions • Simple timing analysis

Spatial Memory Pipeline • Memory controller • Single line buffer • Simple analysis • Analyse previous instruction • First instruction miss, subsequence instructions on that line hits Main Memory CPU

Spatial Memory Pipeline • Computation required many lines of instructions • Exploit spatial locality • Predictability prefetch the next line of instructions • Add another buffer

Spatial Memory Pipeline • To preserve determinism • Prefetch only active if no branch

Spatial Memory Pipeline

Precision Timed Embedded Systems Using TickPAD Memory

Precision Timed Embedded Systems Using TickPAD Memory

Presentation Transcript

Embedded Systems

EMBEDDED SYSTEMS

Online Memory Compression for Embedded Systems

Embedded Systems

The Case for Precision Timed (PRET) Machines

Predictable Programming on a Precision Timed Architecture

Dynamic Memory Management for new embedded systems

Scheduling Using Timed Automata

Embedded Systems

Precision Timed Machine (PRET)

Embedded Systems

Reasoning about Timed Systems Using Boolean Methods

Predictable Programming on a Precision Timed Architecture

Precision Timed (PRET) Architecture

Programming Memory-Constrained Networked Embedded Systems

Reasoning about Timed Systems Using Boolean Methods

Predictable Programming on a Precision Timed Architecture

EMBEDDED SYSTEMS