Two Case Studies in Predictable Application Scheduling Using Rialto/NT

Michael B. Jones – Microsoft Research

John Regehr – University of Virginia

Stefan Saroiu – University of Washington

Application Case Studies
  • Two applications needing predictable execution on Windows 2000
    • Soft Modem Driver
    • Digital Audio Player
  • The case studies
    • analyze behavior on normal Windows 2000
    • study improvements possible using Rialto/NT CPU Reservation mechanism
Consumer Real-Time
  • General-purpose Operating Systems, such as Windows 2000:
    • maximize aggregate throughput
    • approximate fair sharing of the resources
  • Increasing use of time-dependent tasks
    • signal processing, audio, video
  • Need support for:
    • predictable scheduling for independently developed applications
    • low latency responses
    • explicit resource allocation mechanisms
Rialto/NT Abstractions
  • Two real-time software abstractions:
    • CPU Reservations – ongoing reservation for at least X time units out of every Y units for a thread
    • Time Constraints – one-shot time reservation for specified amount of work between start time and deadline
  • Case studies use only CPU Reservations
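A minimal sketch of the CPU Reservation abstraction as a data type, assuming nothing about the real Rialto/NT interface. The names `CpuReservation` and `admissible` are hypothetical, and the admission rule (total reserved share must not exceed one CPU) is an assumption rather than something stated on the slide. The two example reservations, 2 ms/8 ms and 1 ms/16 ms, are the ones used later in the case studies.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class CpuReservation:
    """At least `amount_ms` of CPU time out of every `period_ms` (hypothetical type)."""
    amount_ms: int
    period_ms: int

    @property
    def utilization(self) -> Fraction:
        # Fraction of the CPU that the reservation guarantees to its thread.
        return Fraction(self.amount_ms, self.period_ms)


def admissible(reservations) -> bool:
    """Assumed admission test: the reserved shares must fit on one CPU."""
    return sum((r.utilization for r in reservations), Fraction(0)) <= 1


modem = CpuReservation(amount_ms=2, period_ms=8)     # 25% of the CPU
decoder = CpuReservation(amount_ms=1, period_ms=16)  # 6.25% of the CPU
print(float(modem.utilization), float(decoder.utilization), admissible([modem, decoder]))
```

Exact fractions are used so that shares such as 1/16 add up without floating-point rounding.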
Rialto/NT Implementation
  • Rialto/NT developed on top of Windows 2000 priority scheduler
  • Limitations:
    • CPU Reservations must be integer multiples of milliseconds
    • Reservation periods must be power-of-two multiples of 1 ms
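The two limitations can be written as a small validity check. This is only a sketch that encodes the limitations literally as stated on this slide; `is_valid_reservation` is a hypothetical helper, not part of Rialto/NT, and the extra sanity checks (positive values, amount no larger than period) are assumptions.

```python
def is_valid_reservation(amount_ms, period_ms):
    """Check a reservation against the two limitations stated on the slide."""
    # Amounts (and, in this sketch, periods) must be whole milliseconds.
    if not (isinstance(amount_ms, int) and isinstance(period_ms, int)):
        return False
    # Assumed sanity checks: positive values, amount fits inside the period.
    if amount_ms <= 0 or period_ms <= 0 or amount_ms > period_ms:
        return False
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    return period_ms & (period_ms - 1) == 0


print(is_valid_reservation(2, 8), is_valid_reservation(1, 16), is_valid_reservation(2, 15))
```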

First Case Study

Predictable Scheduling for a Soft Modem

Why Study Soft Modems?
  • Signal Processing done on host CPU:
    • requires predictable scheduling
    • requires low latency responses
  • While coexisting with other system activities
    • Soft Modem is a background real-time task
  • Successful in home computer market:
    • Low cost
    • Easy to update – software upgrade
Methodology
  • Instrumented Windows 2000 performance kernel:
    • Logs predefined and custom events
    • Writes them to a memory buffer
    • Dumps buffers to disk at end of trace
  • Driver Software:
    • No source for signal processing code
  • Measurement Environment:
    • All experiments run with normal-priority spinning competitor thread
  • System:
    • Windows 2000 Professional
    • Pentium II 450 MHz (uniprocessor)
    • 384 MB ECC SDRAM - 100 MB allocated to logging
Vendor Driver Version - Processing in Interrupt (INT)
  • Operation of the modem:
    • 1. DMA transfers between A/D and D/A and physical memory
    • 2. When enough data samples have accumulated, the modem raises an interrupt
    • 3. Inside the ISR, process incoming data and provide outgoing samples before the buffers are exhausted
  • Uses input and output data buffers holding 512 16-bit samples (1024 bytes/buffer)
Three Additional Versions
  • DPC Version (DPC)
    • The ISR queues a DPC
    • DPC performs signal processing
  • Thread Version (THR)
    • The ISR queues a DPC that signals a thread via a semaphore
    • Thread performs signal processing
    • Experimented with several different priorities
  • Rialto/NT Version (RES)
    • Same as THR, but thread scheduled using Rialto/NT real-time periodic CPU Reservation
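The hand-off used by the THR and RES versions can be illustrated with ordinary user-level threads. This is an analogy only: a Python function stands in for the ISR and DPC, `threading.Semaphore` stands in for the kernel semaphore, and summing the buffer is a placeholder for the vendor's signal processing, whose source was not available. All names are hypothetical.

```python
import threading
from collections import deque

pending = deque()                          # buffers waiting to be processed
samples_ready = threading.Semaphore(0)     # counts buffers that were handed off
processed = []


def isr_and_dpc(buffer):
    """Stand-in for the ISR plus DPC: do almost no work, just hand off and signal."""
    pending.append(buffer)
    samples_ready.release()


def signal_processing_thread(n_buffers):
    """Stand-in for the modem thread: wake on the semaphore, then process."""
    for _ in range(n_buffers):
        samples_ready.acquire()
        buf = pending.popleft()
        processed.append(sum(buf))         # placeholder for real signal processing


worker = threading.Thread(target=signal_processing_thread, args=(3,))
worker.start()
for i in range(3):
    isr_and_dpc([i] * 512)                 # 512-sample buffers, as in the driver
worker.join()
print(processed)  # [0, 512, 1024]
```

The point of the structure is that the time-critical context only signals, so how soon the real work runs is decided by the thread scheduler rather than by interrupt priority.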
Interrupt Rate

3 different phases, interrupts very regular

Falls within PC 99 recommended interrupt rates of 3-16ms

Elapsed Times in ISR (INT)

Average of 1.8 ms, with a repeatable worst case of 3.3 ms

PC 99 recommends that the maximum time during which a driver-based modem disables interrupts not exceed 100 µs

CPU Utilization

14.7% sustained load on 450MHz Pentium II
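As a back-of-the-envelope check, the sustained load can be combined with the 1.8 ms average processing time reported for the ISR to estimate how often the modem needs service. The sketch assumes that essentially all of the 14.7% load comes from that per-interrupt processing, which the slides do not state explicitly.

```python
avg_processing_ms = 1.8   # average time per interrupt, from the ISR timing slide
cpu_load = 0.147          # sustained load reported on this slide

# If load = processing time / period, then period = processing time / load.
implied_period_ms = avg_processing_ms / cpu_load
print(round(implied_period_ms, 1))  # 12.2
```

Under that assumption the implied period of about 12 ms sits inside the 3-16 ms interrupt range recommended by PC 99 and close to the 12.5 ms figure that appears in the reservation results.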

Elapsed Times in ISR (DPC)

ISR times now small, typically < 6µs

Elapsed Times in Queued DPC

But now long DPC times: 1.8 ms average, 3.3 ms maximum (same as elapsed times in ISR for INT)

PC 99 recommends that the total execution time required for all queued DPCs should not exceed 500 µs
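To make the size of the violation concrete, the measured times can be compared with the two PC 99 limits quoted in the slides. The sketch below only restates numbers from the slides; `overrun_factor` is a hypothetical helper.

```python
PC99_MAX_INTERRUPTS_DISABLED_US = 100   # limit quoted for the INT version
PC99_MAX_QUEUED_DPC_TOTAL_US = 500      # limit quoted for the DPC version

measured_us = {"average": 1800, "worst case": 3300}   # 1.8 ms and 3.3 ms


def overrun_factor(measured, limit):
    """How many times larger the measured time is than the recommended limit."""
    return measured / limit


for label, t in measured_us.items():
    print(label,
          "ISR:", overrun_factor(t, PC99_MAX_INTERRUPTS_DISABLED_US),
          "DPC:", overrun_factor(t, PC99_MAX_QUEUED_DPC_TOTAL_US))
```

Moving the work from the ISR to a DPC shrinks the overrun from 18x-33x to 3.6x-6.6x, but both versions remain outside the recommendation.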

Samples Pending to be Processed (INT & THR 24)

Small relative to 512 sample buffer size

Samples Pending to be Processed (THR 8)

Unsurprisingly, contention kills modem

Latency Results
  • Set the multimedia timers to fire once every millisecond
  • Register a routine to be called every millisecond
  • Routine does very little work
    • Stores cycle counter value and sleeps again
  • Histograms show differences between recorded times and ideal times
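The measurement procedure above can be approximated in user space. This sketch uses `time.sleep` and `time.perf_counter_ns` as stand-ins for the Windows multimedia timer and the cycle counter, so it illustrates the method rather than reproducing the original tool; the function names and the 100 µs bucket width are assumptions.

```python
import time


def record_wakeups(n, period_s=0.001):
    """Wake up roughly every period, store a timestamp, and sleep again."""
    stamps = []
    for _ in range(n):
        time.sleep(period_s)
        stamps.append(time.perf_counter_ns())
    return stamps


def deviations_us(stamps_ns, period_us=1000):
    """Difference between each recorded time and its ideal time (start + k * period)."""
    start = stamps_ns[0]
    return [(t - start) / 1000 - k * period_us for k, t in enumerate(stamps_ns)]


def histogram(values, bucket_us=100):
    """Count deviations per bucket; keys are the lower edge of each bucket in µs."""
    counts = {}
    for v in values:
        bucket = int(v // bucket_us) * bucket_us
        counts[bucket] = counts.get(bucket, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(histogram(deviations_us(record_wakeups(50))))
```

On a general-purpose OS the histogram from `time.sleep` will be much wider than one from a dedicated timer, which is itself a small demonstration of the latency problem being measured.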
Coexisting Thread Latencies (INT)

Maximum 5313µs between wakeups

Coexisting Thread Latencies (DPC)

Maximum 4396µs between wakeups

Coexisting Thread Latencies (THR 24)

Maximum 2239µs between wakeups

What Have We Learned So Far?
  • Signal processing in the context of the interrupt handler is:
    • unnecessary
    • detrimental to the latencies and predictability of coexisting activities
  • Vendor choice understandable
    • For any priority there is a potentially unbounded delay between the interrupt and the thread running
  • In practice
    • Delays are reasonable for well-configured systems [Intel OSDI ’99]
    • Using interrupts is an extreme form of priority inflation
Two Possible Solutions
  • Rate Monotonic Analysis – determine the “right” priority assignments among all threads - two problems:
    • Assumes cooperative priority assignment among all threads - unrealistic
    • Working priority assignment dependent upon timing requirements of all threads
      • Changes in application mix may require changes in priority assignments
  • Use a time-based real-time scheduler
    • Such as Rialto/NT
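The dependence of a rate-monotonic assignment on the whole application mix can be seen in a few lines. This is a textbook sketch, not anything from Rialto/NT: shorter periods get higher priorities, and the Liu and Layland utilization bound n(2^(1/n) - 1) is the standard sufficient schedulability test. The kmixer and decoder periods are the ones reported in the second case study; the `video` thread and its 50 ms period are hypothetical.

```python
def rate_monotonic_priorities(periods_ms):
    """Assign priorities so that a shorter period means a higher number."""
    longest_first = sorted(periods_ms, key=periods_ms.get, reverse=True)
    return {name: rank + 1 for rank, name in enumerate(longest_first)}


def liu_layland_bound(n):
    """Utilization below which n periodic tasks are guaranteed schedulable."""
    return n * (2 ** (1 / n) - 1)


threads = {"kmixer": 10, "decoder": 100}
print(rate_monotonic_priorities(threads))   # {'decoder': 1, 'kmixer': 2}

threads["video"] = 50                        # hypothetical newcomer
print(rate_monotonic_priorities(threads))   # {'decoder': 1, 'video': 2, 'kmixer': 3}
```

Adding one thread changed the priority that kmixer must run at, which is the second problem listed above: every thread's correct priority depends on every other thread's timing requirements.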
Samples Pending to be Processed (RES 2ms/8ms – 25%)

Fits well within 512-sample buffer size

File Transfer Times

Results for 10 copies of 200,000 bytes each

For 1/8, 2/15, 3/17, 4/17, 7/20 no test passed

Modem Reservation Ranges

Sensitivity to both percentage and gaps

If period < 12.5ms, must get 14.7% to work

If period > 12.5ms, (period – amount) <= 12.5ms must also hold
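The two rules can be combined into one predicate and checked against the reservations listed under File Transfer Times. The sketch reads the gap rule as an upper bound of 12.5 ms on (period - amount), which is the reading under which 3/17, 4/17 and 7/20 fail even though each reserves more than 14.7%. `modem_reservation_ok` is a hypothetical helper, and the thresholds are empirical values from the slides, not guarantees.

```python
MIN_SHARE = 0.147     # CPU share the modem needs, from the utilization slide
MAX_GAP_MS = 12.5     # longest stretch the modem can go without CPU time


def modem_reservation_ok(amount_ms, period_ms):
    """True if the reservation gives the modem enough CPU, often enough."""
    share_ok = amount_ms / period_ms >= MIN_SHARE
    # For periods under 12.5 ms the gap is automatically short enough.
    gap_ok = (period_ms - amount_ms) <= MAX_GAP_MS
    return share_ok and gap_ok


failed_in_experiments = [(1, 8), (2, 15), (3, 17), (4, 17), (7, 20)]
print([modem_reservation_ok(a, p) for a, p in failed_in_experiments])
print(modem_reservation_ok(2, 8))   # the RES 2ms/8ms configuration
```

The first two failures are explained by the share (12.5% and 13.3%), the other three by gaps of 13-14 ms.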

Soft Modem Conclusions
  • Signal Processing in interrupt context is:
    • Unnecessary
    • Detrimental to the predictability and latencies of the coexisting activities
  • The DPC version has similar problems
  • Threads help alleviate these problems
    • Modem runs well with real-time priorities and non-real-time competition
    • However modem threads may interfere with other threads
  • Real-time scheduler allows
    • Control over modem’s degree of interference with other time-sensitive activities
    • Performance isolation for threads using reservations
Industry Perspective
  • Vendor did try their own THR version
    • Worked fine during normal load
    • However, modem was starved when:
      • Copying data between two IDE devices
      • Using USB scanner (Intel 440BX chipset) that turned off interrupts for 30-50 ms
    • Therefore they shipped the INT version
  • Vendor is willing to be a “good citizen” only if assured that others will be as well
  • Systematic latency timing verification of components is needed to enforce good behavior
Soft DSL is Coming
  • More demanding than soft modems
    • 4ms processing period
  • G.lite
    • 1.531Mbps downstream and 512Kbps upstream
    • ~ 25% of a 600 MHz Pentium III
  • Full rate DSL
    • 3.062Mbps downstream and 512Kbps upstream
    • Nearly 50% of a 600 MHz Pentium III
  • Soft Bluetooth period 312.5µs
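The CPU shares above translate into a per-period time budget. The sketch assumes the 4 ms processing period applies to both G.lite and full-rate DSL and treats the approximate shares ("~25%", "nearly 50%") as exact, so the results are rough figures only; `per_period_budget` is a hypothetical helper.

```python
def per_period_budget(cpu_share, period_ms, clock_mhz):
    """Return (busy milliseconds, CPU cycles) needed in each processing period."""
    busy_ms = cpu_share * period_ms
    # 1 MHz is one cycle per microsecond, so cycles = microseconds * MHz.
    cycles = busy_ms * 1000 * clock_mhz
    return busy_ms, cycles


print(per_period_budget(0.25, 4, 600))   # G.lite: (1.0, 600000.0)
print(per_period_budget(0.50, 4, 600))   # full rate: (2.0, 1200000.0)
```

Under these assumptions G.lite would need roughly a 1 ms/4 ms reservation and full-rate DSL roughly 2 ms/4 ms, both with a much shorter period than the soft modem.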
Further Soft Modem Studies
  • Software-based Digital Subscriber Line (SoftDSL) studies
  • Multiple Soft Modems within the same machine
  • Similar studies on multiprocessors

Second Case Study

Predictable Scheduling for Digital Audio

Methodology
  • Empirically reverse-engineer thread requirements in a complex, legacy soft real-time application
    • without use of source code
  • Assign CPU reservations to threads
    • without modifying the application
  • Measure application behavior during contention
Windows Media Player
  • Default player for mp3, wav, avi, mpeg
  • Experimental method
    • Modelled contention using spinning thread at various priorities
    • Gave CPU Reservations to media player threads
    • Played an mp3 song
      • Listened for glitches
      • Used instrumented kernel to detect buffer under-runs
Media Player Thread Structure (Simplified)

(*) Received CPU Reservations in some experiments.

MP3 Playback w/o Contention
  • Kmixer thread (top) runs every 10ms
  • MP3 decoder (4th line) runs every 100ms
  • Works fine
Starvation Caused by Competing Thread @ Priority 10
  • Media Player runs only when NT priority inversion avoidance logic kicks in
Media Player + Reservation
  • 1ms every 16ms reserved for decoder thread
  • Competing with priority 10 thread
  • Works fine
Priority Inversion Caused by Competing Thread
  • Competitor thread (priority 9) preempts MP3 decoder while holding Kmixer buffer lock
  • Kmixer misses next two time slots (x)
    • Starves, causes audio glitch
  • Fix: raise decoder priority before grabbing lock

Media Player Deadlock
  • Circular wait among Media Player threads
  • Deadlock broken by a timeout
  • Fix: file a bug report…
Media Player Results
  • Expected
    • In the presence of contention, the Windows priority scheduler allows real-time apps to starve
    • This can be fixed by giving real-time threads CPU Reservation
  • Unexpected
    • Competitor thread changes sequencing, exposes races in Media Player
      • Hard to write correct programs with many threads & mutexes
      • Fixed using priority ceiling emulation
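Priority ceiling emulation, the fix applied to the Kmixer buffer lock, can be modeled as a lock that raises the holder's priority to a fixed ceiling before acquiring and restores it on release. The sketch is a bookkeeping model only: Python threads have no priorities, so `Task.priority` is just a number, and the class names, the decoder's base priority of 8, and the ceiling of 24 are all hypothetical. Only the competitor's priority of 9 comes from the slides.

```python
import threading
from contextlib import contextmanager


class Task:
    """Hypothetical stand-in for a thread with a scheduling priority."""
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority


class CeilingLock:
    """Lock whose holder temporarily runs at the lock's ceiling priority."""
    def __init__(self, ceiling):
        self.ceiling = ceiling
        self._lock = threading.Lock()

    @contextmanager
    def held_by(self, task):
        saved = task.priority
        # Raise the priority BEFORE taking the lock, so a medium-priority
        # competitor cannot preempt the holder inside the critical section.
        task.priority = max(task.priority, self.ceiling)
        self._lock.acquire()
        try:
            yield
        finally:
            self._lock.release()
            task.priority = saved


COMPETITOR_PRIORITY = 9                          # from the priority inversion slide
decoder = Task("mp3_decoder", priority=8)        # hypothetical base priority
kmixer_buffer_lock = CeilingLock(ceiling=24)     # hypothetical ceiling

with kmixer_buffer_lock.held_by(decoder):
    priority_while_holding = decoder.priority

print(priority_while_holding, decoder.priority)  # 24 8
```

While the lock is held the decoder outranks the priority 9 competitor, so the critical section completes and Kmixer is not kept waiting on the buffer lock.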
Implications of Results
  • Periods of threads in complex legacy apps can be reverse engineered
    • Amounts are platform-dependent and harder to determine
  • Next step: store application requirements and use middleware to automatically assign reservations
    • No application support needed
    • Potentially a way around the chicken/egg problem of using reservations in a world of legacy OSs and applications
Possible Continued Media Experiments
  • Study software DVD player
    • CPU intensive and time sensitive
Overall Conclusions
  • Status quo insufficient
    • Applications either inflate their priorities
      • as did the soft modem driver
    • or are at the mercy of applications that may be run at higher priorities
      • as is the case with the digital audio player
  • CPU Reservations solve this problem
    • by allowing applications to reliably obtain the time they need
    • while allowing other applications to do the same
For More Information
  • See Mike Jones (mbj@microsoft.com):
    • http://research.microsoft.com/~mbj/
  • or John Regehr (regehr@cs.utah.edu):
    • http://www.cs.utah.edu/~regehr/
  • or Stefan Saroiu (tzoompy@cs.washington.edu):
    • http://www.cs.washington.edu/homes/tzoompy/
  • Related papers at Mike’s web site