
Predictable Programming on a Precision Timed Architecture

Hiren D. Patel

UC Berkeley

hiren@eecs.berkeley.edu

Joint work with:

Ben Lickly, Isaac Liu, Edward A. Lee - UC Berkeley

Sungjun Kim, Stephen A. Edwards - Columbia University

Edwards and Lee - Case for PRET

Patel, UC Berkeley, PRET

  • 2007 – Edwards and Lee made a case for precision timed computers (PRET machines)
    • Predictability
    • Repeatability

S. A. Edwards and E. A. Lee, The case for the precision timed (PRET) machine. In Proceedings of the 44th Annual Conference on Design Automation (San Diego, California, June 04 - 08, 2007). DAC '07. ACM, New York, NY, 264-265.

Edwards and Lee - Case for PRET

  • Unpredictability
    • Difficulty in determining timing behavior through analysis
  • Non-repeatability
    • Lack of guarantee that every execution yields the same timing behavior
  • Brittleness
    • Small changes have big effects on timing behavior

Brittleness

Source: www.skycontrol.net

  • Expensive affair
  • Tight coupling of software and hardware
  • Reliance on testing for validation
  • Upgrading difficult
  • Solution: stockpile

But wait …

Sebastian Altmeyer, Christian Hümbert, Björn Lisper, and Reinhard Wilhelm. Parametric Timing Analysis for Complex Architectures. In Proceedings of the 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'08), pages 367-376, Kaohsiung, Taiwan, August 2008. IEEE Computer Society.

  • Real-time scheduling
    • Worst-case execution time
      • Detailed model of hardware
      • Large engineering effort
      • Valid for particular hardware models
    • Interrupts, inter-process communication, locks …
      • Bench testing
    • Brittle

Precise Timing and High Performance

Traditional | Alternative
Caches | Scratchpads
Deep out-of-order pipelines | Thread-interleaved pipelines
Function-only ISAs | ISAs with timing instructions
Function-only languages | Languages and programming models with timing
Best-effort communication | Fixed-latency communication
Time-sharing | Multiple independent processors

Outline

  • Introduction
  • Related Work
  • PRET Machine
  • Programming Example
  • Future Work
  • Conclusion

Related Work

  • Java Optimized Processor
    • Schoeberl et al. [2003]
  • Timing instructions
    • Ip and Edwards [2006]
  • Reactive processors
    • Von Hanxleden et al. [2005]
    • Salcic et al. [2005]
  • Virtual Simple Architecture
    • Mueller et al. [2003]

Semantics of Timing Instructions

  • Deadline instructions
    • Denote the required execution time of a block
  • When decoded
    • Stall instruction if timer value is not 0
    • Otherwise set timer value to new value

Example:

    deadi $t0, 10
    [Straight Line Block 0]
    deadi $t0, 8
    [Straight Line Block 1]
    deadi $t0, 0
L0: deadi $t0, 10
    [Loop Block]
    b L0

Tracing A Program Fragment

A: deadi $t0, 6
B: sethi %hi(0x3f800000), %g1
C: or %g1, 0x200, %g1
D: st %g1, [ %fp + -12 ]
E: deadi $t0, 8
F: …

$t0 over successive cycles: 0, 6, 5, 4, 3, 2, 1, 0, 8
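The deadline semantics can be replayed in a few lines of Python. This is a minimal sketch and not the PRET simulator: it assumes a single thread, one instruction per timer tick, and a timer that is sampled at the start of each cycle. The function name `trace_deadlines` and the tuple encoding of instructions are hypothetical.

```python
def trace_deadlines(program):
    """Return the deadline timer value seen at the start of every cycle.

    program is a list of (opcode, argument) pairs. Only "deadi" is
    interpreted; every other opcode is treated as a one-cycle instruction.
    Assumption: the timer ticks down once per cycle and stops at 0.
    """
    timer = 0
    pc = 0
    trace = []
    while pc < len(program):
        trace.append(timer)
        opcode, arg = program[pc]
        if opcode == "deadi":
            if timer != 0:
                timer -= 1      # stall: same instruction is retried next cycle
                continue
            timer = arg         # previous deadline has expired, load the new one
        elif timer > 0:
            timer -= 1          # ordinary instruction, timer keeps counting down
        pc += 1
    trace.append(timer)         # value visible to the next instruction (F)
    return trace


# Instructions A to E of the fragment; B, C and D are ordinary instructions.
FRAGMENT = [("deadi", 6), ("sethi", None), ("or", None), ("st", None), ("deadi", 8)]
print(trace_deadlines(FRAGMENT))
```

Under these assumptions the printed list is the same sequence of `$t0` values as in the trace: instruction E stalls for three cycles so that the block B to D always takes six cycles.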

Precision Timed Architecture

  • Scratchpad memories
  • Round-robin thread scheduling
  • Thread-interleaved pipeline
  • Time-triggered main memory access

Memory Hierarchy

[Figure: core with six scratchpad memories (SPM), DMA, and main memory]

  • Clocks
    • Main clock
    • Derived clocks
  • Instruction and data scratchpad memories
    • 1 cycle access latency
  • Main memory
    • 16 MB size
    • Latency of 50 ns
    • Frequency: 250 MHz
      • ~13 cycles latency
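
The ~13 cycle figure follows from the two numbers above: 50 ns at 250 MHz is 12.5 clock periods, which rounds up to 13 whole cycles. A small sketch of that arithmetic (the function name `latency_in_cycles` is hypothetical):

```python
def latency_in_cycles(latency_ns: int, clock_mhz: int) -> int:
    """Round a latency up to whole clock cycles.

    ns * MHz gives thousandths of a cycle, so divide by 1000 and
    round up using integer arithmetic.
    """
    return -(-(latency_ns * clock_mhz) // 1000)


print(latency_in_cycles(50, 250))  # 50 ns at 250 MHz
```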

Thread-interleaved Pipeline

[Figure: pipeline stages Fetch, Decode, Reg. Access, Execute, Memory, WriteBack, separated by pipeline registers F/D, D/R, R/E, E/M, M/W. Annotations: decrement deadline timers; stall if deadline instruction; check main memory access; increment PC]

  • Thread stalls
    • Main memory access
    • Multi-cycle operations
    • Deadline instructions
  • Replay mechanism
    • Execute same PC next iteration
    • Multi-cycle ALU ops replay instructions
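
The replay mechanism can be sketched as follows. This is a simplified model under stated assumptions, not the actual pipeline: each thread gets one slot per round in round-robin order, and an instruction that needs k passes is re-issued at the same PC until its last pass. The function name `finish_cycles` and the latency-list encoding are hypothetical.

```python
def finish_cycles(programs, max_cycles=10_000):
    """Return, per thread, the cycle in which its last instruction completes.

    programs[t] is a non-empty list of latencies, one per instruction,
    counted in passes through the pipeline (1 = no replay needed).
    """
    n = len(programs)
    pc = [0] * n
    passes_left = [0] * n
    finished = [None] * n
    for cycle in range(max_cycles):
        t = cycle % n                      # round-robin thread selection
        if finished[t] is not None:
            continue                       # thread has nothing left to issue
        if passes_left[t] == 0:
            passes_left[t] = programs[t][pc[t]]
        passes_left[t] -= 1
        if passes_left[t] == 0:
            pc[t] += 1                     # instruction done: increment PC
            if pc[t] == len(programs[t]):
                finished[t] = cycle
        # otherwise the PC is unchanged and the instruction is replayed
        if all(f is not None for f in finished):
            break
    return finished


# Thread 1 starts with a three-pass operation; threads 0 and 2 do not stall.
print(finish_cycles([[1, 1, 1], [3, 1], [1]]))
print(finish_cycles([[1, 1, 1], [1, 1], [1]]))
```

In both runs thread 0 finishes in the same cycle: a stall in one thread only costs that thread its own slots, which is the isolation property the interleaved pipeline is meant to provide.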

Time-Triggered Access through Memory Wheel
  • Decouple thread’s access pattern
  • Time-triggered access
  • Best-case access time
    • If accessed 1st cycle
  • Worst-case access time
    • If accessed 2nd cycle of window
    • 90 cycles until thread0 completes

[Figure: memory wheel with windows for thread0 through thread5; five accesses are marked "On time"]
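
The best and worst cases can be checked with a short model. This sketch assumes six equal windows of 13 cycles each (the main memory latency stated earlier), that an access can only start in the first cycle of the thread's own window, and that it then occupies the whole window. The names `WINDOW_CYCLES`, `NUM_THREADS` and `access_latency` are hypothetical.

```python
WINDOW_CYCLES = 13                          # assumed: one memory access per window
NUM_THREADS = 6                             # thread0 .. thread5
WHEEL_PERIOD = WINDOW_CYCLES * NUM_THREADS  # cycles per full rotation


def access_latency(thread_id: int, request_cycle: int) -> int:
    """Cycles from the request until the access completes."""
    window_start = thread_id * WINDOW_CYCLES
    offset = (request_cycle - window_start) % WHEEL_PERIOD
    wait = (WHEEL_PERIOD - offset) % WHEEL_PERIOD   # 0 if exactly on the window start
    return wait + WINDOW_CYCLES


print(access_latency(0, 0))   # request in the 1st cycle of thread0's window
print(access_latency(0, 1))   # request in the 2nd cycle: wait for the next rotation
```

With these assumptions a request that just misses the start of its window waits 77 cycles and then takes 13, which reproduces the 90-cycle worst case. The bound depends only on the wheel, not on what the other threads do.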

Tool Flow

GCC 3.4.4, SystemC 2.2, Python 2.4

[Figure: tool flow. Labels: C programs; timing instructions; boot code; GCC to compile boot code and program code; Motorola SREC files]

Simple Mutual Exclusion Example

[Figure annotations: Write to output; Write to shared data; Read from shared data]

  • Producer followed by Consumer and Observer
    • Consumer and Observer execute together
  • Loop rate of two rotations of memory wheel
    • 1st for Producer to write
    • 2nd for Consumer and Observer to read
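
The schedule above can be modelled without any locks. The sketch below is illustrative only: it steps through cycles directly instead of using deadline instructions, and it assumes a rotation of 78 cycles (six windows of 13 cycles). The names `WHEEL_ROTATION`, `LOOP_PERIOD` and `simulate` are hypothetical.

```python
WHEEL_ROTATION = 78                 # assumed cycles per memory wheel rotation
LOOP_PERIOD = 2 * WHEEL_ROTATION    # loop rate: two rotations


def simulate(iterations: int):
    """Run the producer/consumer/observer loop for a number of periods."""
    shared = None
    consumer_log, observer_log = [], []
    for cycle in range(iterations * LOOP_PERIOD):
        phase = cycle % LOOP_PERIOD
        if phase == 0:
            # 1st rotation: only the Producer touches the shared data
            shared = cycle // LOOP_PERIOD
        elif phase == WHEEL_ROTATION:
            # 2nd rotation: Consumer and Observer read together
            consumer_log.append(shared)
            observer_log.append(shared)
    return consumer_log, observer_log


print(simulate(3))
```

Because writes and reads are confined to different rotations, every value is read exactly once by both readers and no read can overlap a write. Mutual exclusion comes from timing alone.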

Video Game Example

[Figure: three threads (Main-Control Thread, Graphic Thread, VGA-Driver Thread). Commands pass through an Even Queue and an Odd Queue; pixel data passes through an Even Buffer and an Odd Buffer. Labels: Update Screen (sync request); Swap (when sync requested and when odd queue empty); Sync (after queue swapped); Refresh (sync request); Swap (when sync requested and when vertical blank); Sync (after buffer swapped)]

Timing Requirements

Signal | Timing Requirement | Pixel Cycles
V. Sync | 64 µs | 1611
V. Back-porch | 1.02 ms | 25679
Draw 480 lines | 15.25 ms |
V. Front-porch | 350 µs | 8811
H. Sync | 3.77 µs | 96
H. Back-porch | 1.89 µs | 48
Draw 640 pixels | 25.42 µs |
H. Front-porch | 0.64 µs | 16

Timing Implementation

  • Pixel-clock using derived clock
    • 25.175 MHz
    • ~39.72 ns cycle period
  • Drawing 16 pixels
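
The relation between the pixel clock and the cycle counts in the timing requirements can be recomputed as a sanity check. This sketch uses exact decimal arithmetic and rounds half up; the names `PIXEL_CLOCK_HZ`, `period_ns` and `pixel_cycles` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

PIXEL_CLOCK_HZ = Decimal("25175000")   # 25.175 MHz pixel clock


def period_ns() -> Decimal:
    """Length of one pixel-clock cycle in nanoseconds."""
    return Decimal("1e9") / PIXEL_CLOCK_HZ


def pixel_cycles(duration_seconds: str) -> int:
    """Number of pixel-clock cycles in a duration, rounded to the nearest cycle."""
    cycles = Decimal(duration_seconds) * PIXEL_CLOCK_HZ
    return int(cycles.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(round(period_ns(), 2))     # cycle period in ns
print(pixel_cycles("64e-6"))     # V. Sync
print(pixel_cycles("1.02e-3"))   # V. Back-porch
print(pixel_cycles("350e-6"))    # V. Front-porch
```

The durations quoted in the requirements are themselves rounded, so a recomputed count can differ from the listed one by a cycle; H. Sync at 3.77 µs, for example, comes out one cycle short of 96.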

Future Work

  • Architecture
    • DMA
    • DDR2 main memory model
    • Thread synchronization primitives
    • Shared data between threads
  • Real-time Benchmarks
    • With timing requirements
  • Programming models
    • Memory allocation schemes
    • Synchronizations

Conclusion

  • What we want …
    • Time as a first class citizen of embedded computing
    • Predictability
    • Repeatability
  • Where we are at …
    • PRET cycle-accurate simulator
    • Release …

Extras

More on Brittleness
  • Small changes may have big effects on timing behavior

Theorem (Richard’s anomalies):

If a task set with fixed priorities, execution times, and precedence constraints is optimally scheduled on a fixed number of processors, then increasing the number of processors, reducing execution times, or weakening precedence constraints can increase the schedule length.

Richard L. Graham, “Bounds on the performance of scheduling algorithms”, in E. G. Coffman, Jr. (ed.), Computer and Job-Shop Scheduling Theory, John Wiley, New York, 1975.

Richard’s Anomalies
  • 9 tasks, 3 processors, priority list, precedence order, execution times.

[Figure: task graph and schedule. Tasks/execution times: T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9. Time axis marks: 0, 3, 12]

Richard’s Anomalies: Reducing Execution Times
  • eTime’ = eTime - 1

[Figure: task graph and schedule. Tasks/execution times: T1/2, T2/1, T3/1, T4/1, T5/3, T6/3, T7/3, T8/3, T9/8. Time axis marks: 0, 3, 12]

Richard’s Anomalies: More Processors
  • 4 processors

[Figure: task graph and schedule. Tasks/execution times: T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9. Time axis marks: 0, 3, 12, 15]

Richard’s Anomalies: Changing Priority List
  • L = (T1,T2,T4,T5,T6,T3,T9,T7,T8)

[Figure: task graph and schedule. Tasks/execution times: T1/3, T2/2, T3/2, T4/2, T5/4, T6/4, T7/4, T8/4, T9/9. Time axis marks: 0, 3, 12]
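
The anomalies can be reproduced with a small list scheduler. This is a sketch under assumptions: the precedence edges are not recoverable from the transcript, so the ones used here (T1 before T9, T4 before T5 to T8) are taken from Graham's classic nine-task example and should be read as an assumption. The function name `list_schedule` and the dictionaries are hypothetical.

```python
import heapq


def list_schedule(times, preds, priority, num_procs):
    """Greedy list scheduling; returns the schedule length.

    Whenever a processor is free, it starts the first task in the
    priority list whose predecessors have all finished.
    """
    done = set()
    running = []                  # min-heap of (finish_time, task)
    waiting = list(priority)
    free = num_procs
    now = 0
    while waiting or running:
        for task in list(waiting):
            if free == 0:
                break
            if all(p in done for p in preds.get(task, ())):
                waiting.remove(task)
                heapq.heappush(running, (now + times[task], task))
                free -= 1
        if not running:
            raise ValueError("no runnable task: precedence graph has a cycle")
        now = running[0][0]       # advance to the next completion
        while running and running[0][0] == now:
            done.add(heapq.heappop(running)[1])
            free += 1
    return now


TIMES = {"T1": 3, "T2": 2, "T3": 2, "T4": 2,
         "T5": 4, "T6": 4, "T7": 4, "T8": 4, "T9": 9}
# Assumed precedence order (Graham's example): T1 < T9, T4 < T5..T8.
PREDS = {"T9": ["T1"], "T5": ["T4"], "T6": ["T4"], "T7": ["T4"], "T8": ["T4"]}
ORDER = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9"]
NEW_ORDER = ["T1", "T2", "T4", "T5", "T6", "T3", "T9", "T7", "T8"]

base = list_schedule(TIMES, PREDS, ORDER, 3)
faster_tasks = list_schedule({t: c - 1 for t, c in TIMES.items()}, PREDS, ORDER, 3)
more_procs = list_schedule(TIMES, PREDS, ORDER, 4)
reordered = list_schedule(TIMES, PREDS, NEW_ORDER, 3)
print(base, faster_tasks, more_procs, reordered)
```

Under these assumptions the baseline finishes at 12, while shortening every task by one gives 13, adding a fourth processor gives 15, and the changed priority list gives 14. Each change that looks like an improvement lengthens the schedule.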

Brittleness Again…
  • In general, all task scheduling strategies are brittle
