Capriccio scalable threads for internet services
This presentation is the property of its rightful owner.
Sponsored Links
1 / 26

Capriccio: Scalable Threads for Internet Services PowerPoint PPT Presentation


  • 62 Views
  • Uploaded on
  • Presentation posted in: General

Capriccio: Scalable Threads for Internet Services. Rob von Behren, Jeremy Condit, Feng Zhou, Geroge Necula and Eric Brewer University of California at Berkeley {jrvb,jcondit,zf, necula, [email protected] http://capriccio.cs.berkeley.edu. The Stage. Highly concurrent applications

Download Presentation

Capriccio: Scalable Threads for Internet Services

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Capriccio scalable threads for internet services

Capriccio: Scalable Threads for Internet Services

Rob von Behren, Jeremy Condit, Feng Zhou, Geroge Necula and Eric Brewer

University of California at Berkeley

{jrvb,jcondit,zf, necula, [email protected]

http://capriccio.cs.berkeley.edu


The stage

The Stage

  • Highly concurrent applications

    • Internet servers & frameworks

      • Flash, Ninja, SEDA

    • Transaction processing databases

  • Workload

    • High performance

    • Unpredictable load spikes

    • Operate “near the knee”

    • Avoid thrashing!

Ideal

Peak: some resource at max

Performance

Overload: someresource thrashing

Load (concurrent tasks)


The price of concurrency

The Price of Concurrency

  • What makes concurrency hard?

    • Race conditions

    • Code complexity

    • Scalability (no O(n) operations)

    • Scheduling & resource sensitivity

    • Inevitable overload

  • Performance vs. Programmability

    • No current system solves

    • Must be a better way!

Threads

Ideal

Ease of Programming

Events

Threads

Performance


The answer better threads

The Answer: Better Threads

  • Goals

    • Simplify the programming model

      • Thread per concurrent activity

      • Scalability (100K+ threads)

    • Support existing APIs and tools

    • Automate application-specific customization

  • Tools

    • Plumbing: avoid O(n) operations

    • Compile-time analysis

    • Run-time analysis

  • Claim: User-Level threads are key


The case for user level threads

The Case for User-Level Threads

  • Decouple programming model and OS

    • Kernel threads

      • Abstract hardware

      • Expose device concurrency

    • User-level threads

      • Provide clean programming model

      • Expose logical concurrency

  • Benefits of user-level threads

    • Control over concurrency model!

    • Independent innovation

    • Enables static analysis

    • Enables application-specific tuning

App

User

Threads

OS


The case for user level threads1

The Case for User-Level Threads

  • Decouple programming model and OS

    • Kernel threads

      • Abstract hardware

      • Expose device concurrency

    • User-level threads

      • Provide clean programming model

      • Expose logical concurrency

  • Benefits of user-level threads

    • Control over concurrency model!

    • Independent innovation

    • Enables static analysis

    • Enables application-specific tuning

App

User

Threads

OS


Capriccio internals

Capriccio Internals

  • Cooperative user-level threads

    • Fast context switches

    • Lightweight synchronization

  • Kernel Mechanisms

    • Asynchronous I/O (Linux)

  • Efficiency

    • Avoid O(n) operations

    • Fast, flexible scheduling


Safety linked stacks

Safety: Linked Stacks

Fixed Stacks

  • The problem: fixed stacks

    • Overflow vs. wasted space

    • Limits thread numbers

  • The solution: linked stacks

    • Allocate space as needed

    • Compiler analysis

      • Add runtime checkpoints

      • Guarantee enough space until next check

Linked Stack


Linked stacks algorithm

Linked Stacks: Algorithm

  • Parameters

    • MaxPath

    • MinChunk

  • Steps

    • Break cycles

    • Trace back

  • Special Cases

    • Function pointers

    • External calls

    • Use large stack

3

3

2

5

2

4

3

6

MaxPath = 8


Linked stacks algorithm1

Linked Stacks: Algorithm

  • Parameters

    • MaxPath

    • MinChunk

  • Steps

    • Break cycles

    • Trace back

  • Special Cases

    • Function pointers

    • External calls

    • Use large stack

3

3

2

5

2

4

3

6

MaxPath = 8


Linked stacks algorithm2

Linked Stacks: Algorithm

  • Parameters

    • MaxPath

    • MinChunk

  • Steps

    • Break cycles

    • Trace back

  • Special Cases

    • Function pointers

    • External calls

    • Use large stack

3

3

2

5

2

4

3

6

MaxPath = 8


Linked stacks algorithm3

Linked Stacks: Algorithm

  • Parameters

    • MaxPath

    • MinChunk

  • Steps

    • Break cycles

    • Trace back

  • Special Cases

    • Function pointers

    • External calls

    • Use large stack

3

3

2

5

2

4

3

6

MaxPath = 8


Linked stacks algorithm4

Linked Stacks: Algorithm

  • Parameters

    • MaxPath

    • MinChunk

  • Steps

    • Break cycles

    • Trace back

  • Special Cases

    • Function pointers

    • External calls

    • Use large stack

3

3

2

5

2

4

3

6

MaxPath = 8


Linked stacks algorithm5

Linked Stacks: Algorithm

  • Parameters

    • MaxPath

    • MinChunk

  • Steps

    • Break cycles

    • Trace back

  • Special Cases

    • Function pointers

    • External calls

    • Use large stack

3

3

2

5

2

4

3

6

MaxPath = 8


Scheduling the blocking graph

Scheduling:The Blocking Graph

Web Server

  • Lessons from event systems

    • Break app into stages

    • Schedule based on stage priorities

    • Allows SRCT scheduling, finding bottlenecks, etc.

  • Capriccio does this for threads

    • Deduce stage with stack traces at blocking points

    • Prioritize based on runtime information

Open

Read

Write

Accept

Read

Write

Close


Resource aware scheduling

Resource-Aware Scheduling

  • Track resources used along BG edges

    • Memory, file descriptors, CPU

    • Predict future from the past

    • Algorithm

      • Increase use when underutilized

      • Decrease use near saturation

  • Advantages

    • Operate near the knee w/o thrashing

    • Automatic admission control


Thread performance

Thread Performance

  • Slightly slower thread creation

  • Faster context switches

    • Even with stack traces!

  • Much faster mutexes

Time of thread operations (microseconds)


Runtime overhead

Runtime Overhead

  • Tested Apache 2.0.44

  • Stack linking

    • 78% slowdown for null call

    • 3-4% overall

  • Resource statistics

    • 2% (on all the time)

    • 0.1% (with sampling)

  • Stack traces

    • 8% overhead


Web server performance

Web Server Performance


Future work

Future Work

  • Threading

    • Multi-CPU support

    • Kernel interface

  • (enabled) Compile-time techniques

    • Variations on linked stacks

    • Static blocking graph

    • Atomicity guarantees

  • Scheduling

    • More sophisticated prediction


Conclusions

Conclusions

  • Capriccio simplifies high concurrency

    • Scalable & high performance

    • Control over concurrency model

      • Stack safety

      • Resource-aware scheduling

      • Enables compiler support, invariants

  • Themes

    • User-level threads are key

    • Compiler techniques very promising


Apache blocking graph

Apache Blocking Graph


Microbenchmark buffer cache

Microbenchmark: Buffer Cache


Microbenchmark disk i o

Microbenchmark: Disk I/O


Microbenchmark producer consumer

Microbenchmark: Producer / Consumer


Microbenchmark pipetest

Microbenchmark: pipetest


  • Login