QoS Support in Operating Systems


Banu Özden, Bell Laboratories

Vision
  • Service providers will offer storage and computing services
    • through their distributed data centers
    • connected with high bandwidth networks
    • to globally distributed clients.
  • Clients will access these services via diverse devices and networks, e.g.:
    • mobile devices and wireless networks,
    • high-end computer systems and high bandwidth networks.
  • These services will become utilities (e.g., storage utility, computing utility).
  • Eventually resources will be exchanged and traded between geographically dispersed data centers to address fluctuating demand.
Motivation
  • QoS support for (server) applications:
    • web servers
    • video servers
  • Isolation and differentiation of different
    • entities serviced on the same platform
    • applications running on the same platform
  • QoS requirements:
    • client-based
    • service-based
    • content-based
Design Goals
  • QoS support in a general purpose operating system
  • Remain compatible with the underlying operating system
  • QoS parameters:
      • Isolation
      • Differentiation
      • Fairness
      • (Cumulative) throughput
  • Flexible resource management
      • capable of implementing a large set of provisioning needs
      • supports a large set of server applications without imposing significant changes to their design
Talk Outline
  • Schedulers
  • Reservation File System (reservfs)
  • Tagging
  • Web Server Experiments
  • Access Control and Profiles
  • Eclipse/BSD Status
  • Related Work
  • Future Work
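Proportional sharing
  • Generalized processor sharing (GPS)
      • $\phi_i$: weight of flow i
      • $W_i(t_1, t_2)$: service received by flow i in $[t_1, t_2]$
      • $F$: set of flows
      • For any flow i continuously backlogged in $[t_1, t_2]$: $\frac{W_i(t_1, t_2)}{W_j(t_1, t_2)} \ge \frac{\phi_i}{\phi_j}$ for all $j \in F$
      • Thus, rate of flow i in $[t_1, t_2]$ is: $r_i \ge \frac{\phi_i}{\sum_{j \in F} \phi_j}\, r$, where r is the server rate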
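As a rough illustration of the GPS rate guarantee, the sketch below divides a server's rate among the currently backlogged flows in proportion to their weights. The function name `gps_rates`, the flow names and the numbers are illustrative assumptions, not taken from the talk.

```python
from fractions import Fraction

def gps_rates(weights, backlogged, server_rate):
    """Return the GPS service rate of each backlogged flow.

    weights: dict mapping flow name -> weight
    backlogged: iterable of flow names that currently have queued work
    server_rate: total rate of the server
    """
    active = [f for f in backlogged if f in weights]
    total = sum(Fraction(weights[f]) for f in active)
    if total == 0:
        return {}
    # Idle flows get nothing; their share is redistributed in proportion
    # to the weights of the flows that are still backlogged.
    return {f: Fraction(weights[f]) / total * server_rate for f in active}

weights = {"a": 4, "b": 4, "c": 2}            # hypothetical flows
print(gps_rates(weights, ["a", "b", "c"], 100))
print(gps_rates(weights, ["a", "c"], 100))    # flow b is idle
```

When every flow is backlogged each one receives exactly its weighted fraction; when flow b goes idle, flows a and c both receive more than that fraction, which is why the guarantee is stated as a lower bound.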
QoS Guarantees
  • Fairness
  • Throughput
  • Packet delay
Schedulers in Eclipse
  • Resource characteristics differ
  • Different hierarchical proportional-share schedulers for resources
    • Link scheduler: WF2Q
    • Disk scheduler: YFQ
    • CPU scheduler: MTR-LS
    • Network input: SRP
Hierarchical GPS Example
  • [Figure: two share trees for one server. Proportional sharing (flat): company A page 1 = 0.4, company A page 2 = 0.4, company B = 0.2. Hierarchical proportional sharing: company A = 0.8 and company B = 0.2, with company A's share split 0.5/0.5 between page 1 and page 2.]

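Schedulers
  • Hierarchical proportional-sharing (GPS)
      • $D(n)$: descendant queue nodes of node n
      • $W_n(t_1, t_2) = \sum_{q \in D(n)} W_q(t_1, t_2)$: service received by scheduler node n in $[t_1, t_2]$
      • $S(n)$: set of immediate descendant nodes of the parent of node n
  • For any node n continuously backlogged in $[t_1, t_2]$: $\frac{W_n(t_1, t_2)}{W_m(t_1, t_2)} \ge \frac{\phi_n}{\phi_m}$ for all $m \in S(n)$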
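A minimal sketch of how hierarchical proportional sharing distributes a resource: each scheduler node splits its own share among the children that have backlogged work, in proportion to their weights. The tree encoding, the function names and the company/page numbers are assumptions chosen for illustration.

```python
from fractions import Fraction

def has_backlog(node, backlogged, path):
    """True if the leaf at `path`, or any leaf below it, has pending work."""
    if node is None:                      # None marks a queue (leaf) node
        return path in backlogged
    return any(has_backlog(child, backlogged, path + (name,))
               for name, (_, child) in node.items())

def leaf_shares(node, backlogged, share=Fraction(1), path=()):
    """Split `share` among the backlogged leaves below a scheduler node.

    node: dict {child_name: (weight, child_node)}, or None for a leaf
    backlogged: set of leaf paths (tuples of names) with pending work
    """
    if node is None:
        return {path: share} if path in backlogged else {}
    active = [(Fraction(str(weight)), child, path + (name,))
              for name, (weight, child) in node.items()
              if has_backlog(child, backlogged, path + (name,))]
    total = sum(weight for weight, _, _ in active)
    result = {}
    for weight, child, child_path in active:
        result.update(leaf_shares(child, backlogged,
                                  share * weight / total, child_path))
    return result

tree = {
    "A": (0.8, {"page1": (0.5, None), "page2": (0.5, None)}),
    "B": (0.2, None),
}
everyone = {("A", "page1"), ("A", "page2"), ("B",)}
print(leaf_shares(tree, everyone))
print(leaf_shares(tree, {("A", "page1"), ("B",)}))   # page2 is idle
```

In the second call the idle page's share stays inside company A's subtree, so company B's share is unchanged. That containment is the isolation property a flat scheduler with weights 0.4/0.4/0.2 would not provide.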
Link Aggregation
  • Need to incrementally scale bandwidth
  • Resource aggregation is emerging as a solution:
    • Grouping multiple resources into a single logical unit
  • QoS over such aggregated links?
Multi-Server Model
  • Multi Server Fair Queuing (MSFQ)
    • A packetized algorithm for a system with N links, each with a bandwidth of r, that approximates a GPS system with a single link with Nr bandwidth

  • [Figure: reference model (GPS, a single link of bandwidth Nr) beside the packetized scheduler (MSFQ, N links of bandwidth r each)]
Multi-Server Model (Contd.)
  • Goals:
    • Guarantee bandwidth and packet delay bounds that are independent of the number of flows
    • Allow flows to arrive and depart dynamically
    • Be work-conserving
  • Algorithm:
    • When a server is idle, schedule the packet that would complete transmission earliest under a single server GPS system with a bandwidth of Nr
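The scheduling rule can be sketched under one simplifying assumption: all packets are already queued at time 0, so ordering by the virtual finish tag (cumulative length divided by weight) matches the order of GPS finishing times. A full implementation would also have to track GPS virtual time as flows arrive and depart. The function name and the example flows are hypothetical.

```python
import heapq

def msfq_schedule(flows, num_servers, rate):
    """Assign packets to servers in order of GPS finish time.

    flows: dict name -> (weight, [packet lengths]); every packet is
           assumed to be queued at time 0
    Returns a list of (flow, seq, server, start, end) in scheduling order.
    """
    tagged = []
    for name, (weight, lengths) in flows.items():
        finish_tag = 0.0
        for seq, length in enumerate(lengths):
            finish_tag += length / weight        # virtual finish tag
            tagged.append((finish_tag, name, seq, length))
    tagged.sort()

    # Min-heap of (time the server becomes idle, server index).
    servers = [(0.0, s) for s in range(num_servers)]
    heapq.heapify(servers)
    schedule = []
    for _, name, seq, length in tagged:
        free_at, server = heapq.heappop(servers)  # next server to go idle
        end = free_at + length / rate
        schedule.append((name, seq, server, free_at, end))
        heapq.heappush(servers, (end, server))
    return schedule

example = msfq_schedule({"x": (1.0, [2.0]), "y": (0.4, [1.0])},
                        num_servers=2, rate=1.0)
for entry in example:
    print(entry)
```

In this example the packet of flow x is scheduled first because its GPS finish tag is smaller, yet the shorter packet of flow y completes earlier on the second server.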

Sigcomm 2001

MSFQ Preliminary Properties

  • [Figure: timelines comparing GPS with packetized service. Two-packet examples (arrivals a1, a2) under WFQ on one server and under MSFQ on serv1 and serv2, for time 0 to 4; a seven-packet example (arrivals a1 to a7) under GPS and under MSFQ on serv1 to serv3, for time 0 to 10.]

Multi-Server specific properties

  • Ordering: a pair of packets scheduled in the order of their GPS finishing times may complete in reverse order
  • GPS busy implies MSFQ busy, but the converse is not true
  • Non-coinciding busy periods
  • Work backlog?
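MSFQ Properties
  • [Figure: two plots of service versus time. The first compares GPS with MSFQ and marks the packet delay; the second compares GPSi with MSFQi for a single flow and marks the service discrepancy.]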
  • Maximum service discrepancy (buffer requirement)
  • Maximum packet delay
  • Maximum per-flow service discrepancy
Schedulers (contd.)
  • Disk scheduling with QoS
    • tradeoffs between QoS and total disk performance
      • driver queue management
      • queue depth
      • queue ordering
      • fragmentation
    • Hierarchical YFQ
  • CPU scheduling with QoS
    • lengths of CPU phases are not known a priori
    • cumulative throughput
    • Hierarchical MTR-LS
Eclipse’s Key Elements
  • Hierarchical, proportional share resource schedulers
  • Reservation, reservation file system (reservfs)
  • Tagging mechanism
  • Access and admission control, reservation domain
Reservations and Schedulers
  • (Resource) reservations
    • unit for QoS assignment
    • similar to the concept of a flow in packet scheduling
  • Hierarchical schedulers
    • a tree with two kinds of nodes:
      • scheduler nodes
      • queue nodes
      • each node corresponds to a reservation
  • Schedulers are dynamically reconfigurable
Web Server Example
  • Hosting two companies’ web sites, each with two web pages
  • [Figure: share trees for disk bandwidth, CPU cycles and network bandwidth; company A holds 0.8 and company B 0.2 of each resource, with 0.5/0.5 splits between page 1 and page 2]

Reservfs
  • [Figure: layered architecture. Applications (web server, video server) use the application interface of the reservation file system; the scheduler interface connects it to the CPU, link and disk schedulers, which manage the CPUs, network interfaces and disks.]
  • We built the reservation file system
    • to create and manipulate reservations
    • to access and configure resource schedulers
Reservfs
  • [Figure: the /reserv directory with one subdirectory per resource: cpu, fxp0, fxp1, da0]
  • Hierarchical
  • Each reservation directory corresponds to a node at a scheduler
  • Each resource is represented by a reservation directory under /reserv
Reservfs
  • Two types of reservation directories:
    • scheduler directories
    • queue directories
  • Scheduler directories are hierarchically expandable
  • Queue directories are not expandable
Reservfs
  • [Figure: the /reserv tree expanded; scheduler directories contain share, newqueue, newreserv and the queue q0, and queue directories contain share and backlog]
  • Scheduler directory:
    • share
    • newqueue
    • newreserv
    • special queue: q0
  • Queue directory:
    • share
    • backlog
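The two directory types can be modeled in memory as follows. This is only a sketch of the layout listed above, not the kernel implementation: the class names are invented, and picking the lowest unused index for a new q or r directory is an assumption based on the names that appear in the slides.

```python
class QueueDir:
    """Leaf reservation: holds only the share and backlog files."""
    def __init__(self, share=""):
        self.files = {"share": share, "backlog": ""}

class SchedulerDir:
    """Expandable reservation: share, newqueue, newreserv and q0."""
    def __init__(self, share=""):
        self.files = {"share": share}
        self.children = {"q0": QueueDir()}   # special queue, always present

    def _next_name(self, prefix):
        # Assumed naming rule: lowest unused index for this prefix.
        index = 0
        while f"{prefix}{index}" in self.children:
            index += 1
        return f"{prefix}{index}"

    def newqueue(self, share):
        """Model of creating a queue reservation under this scheduler."""
        name = self._next_name("q")
        self.children[name] = QueueDir(share)
        return name

    def newreserv(self, share):
        """Model of creating a nested scheduler reservation."""
        name = self._next_name("r")
        self.children[name] = SchedulerDir(share)
        return name

    def listing(self):
        return sorted(["share", "newqueue", "newreserv", *self.children])

da0 = SchedulerDir()
print(da0.newqueue("50 1000000"))   # a new queue directory
print(da0.newreserv("20 0"))        # a new scheduler directory
print(da0.listing())
```

Only `SchedulerDir` offers `newqueue` and `newreserv`, which mirrors the rule that scheduler directories are expandable and queue directories are not.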
Reservfs
  • [Figure: architecture diagram repeated, with the application interface and the scheduler interface of the reservation file system labeled]
Reservfs API
  • Creation of a new queue/scheduler reservation
    • fd = open("newqueue", O_CREAT) or fd = open("newreserv", O_CREAT)
    • returns the fd of the newly created share file
Creating Queue Reservation
  • fd = open("newqueue", O_CREAT)
  • [Figure: the /reserv tree (cpu, fxp0, fxp1, da0) before and after the call; a new queue directory q1, containing share and backlog, appears under da0]

Creating Scheduler Reservation
  • fd = open("newreserv", O_CREAT)
  • [Figure: the /reserv tree before and after the call; a new scheduler directory r0, containing share, newqueue, newreserv and q0, appears under da0]

Reservfs API
  • Changing QoS parameters
    • writing a weight and min value to the share file
  • Getting QoS parameters
    • reading the share file
  • Getting/setting queue parameters
    • reading/writing the backlog file
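Because QoS parameters are ordinary file contents, setting and reading them needs nothing beyond file I/O. The sketch below assumes the share file holds a weight and a minimum as two whitespace-separated integers, and it uses a temporary directory as a stand-in for a queue directory, since reservfs exists only on Eclipse/BSD. The helper names are hypothetical.

```python
import os
import tempfile

def set_share(queue_dir, weight, minimum):
    """Write "<weight> <min>" to the share file of a reservation directory."""
    with open(os.path.join(queue_dir, "share"), "w") as f:
        f.write(f"{weight} {minimum}\n")

def get_share(queue_dir):
    """Read the share file back as a (weight, min) pair of integers."""
    with open(os.path.join(queue_dir, "share")) as f:
        weight, minimum = f.read().split()
    return int(weight), int(minimum)

# Stand-in for a queue directory such as /reserv/fxp0/q1; here it is an
# ordinary temporary directory, not a real reservfs mount.
with tempfile.TemporaryDirectory() as queue_dir:
    set_share(queue_dir, 50, 1000000)
    result = get_share(queue_dir)
print(result)
```

On a real reservfs mount the same two helpers would be pointed at a path under /reserv, subject to the access rules the system enforces.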
Reservfs API

Command line output:

killerbee$ cd /reserv

killerbee$ ls -al

total 5

dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 .

drwxr-xr-x 20 root wheel 512 Sep 12 21:54 ..

dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 cpu

dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 fxp0

dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 fxp1

killerbee$ cd fxp0

killerbee$ ls -alR

total 6

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

-rw------- 1 root wheel 1 Sep 15 11:39 newqueue

-rw------- 1 root wheel 1 Sep 15 11:39 newreserv

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

-r-------- 1 root wheel 1 Sep 15 11:39 share

./q0:

total 4

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

-rw------- 1 root wheel 1 Sep 15 11:39 backlog

-rw------- 1 root wheel 1 Sep 15 11:39 share

Reservfs API

killerbee$ cd r0

killerbee$ ls -al

total 6

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

-rw------- 1 root wheel 1 Sep 15 11:39 newqueue

-rw------- 1 root wheel 1 Sep 15 11:39 newreserv

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

-r-------- 1 root wheel 1 Sep 15 11:39 share

killerbee$ echo "50 1000000" > newqueue

killerbee$ ls -al

total 6

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

-rw------- 1 root wheel 1 Sep 15 11:39 newqueue

-rw------- 1 root wheel 1 Sep 15 11:39 newreserv

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q1

-r-------- 1 root wheel 1 Sep 15 11:39 share

killerbee$ cd q1

killerbee$ ls -al

total 4

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

-rw------- 1 root wheel 1 Sep 15 11:39 share

-rw------- 1 root wheel 1 Sep 15 11:39 backlog

killerbee$ cat share

50 1000000

killerbee$

Reservfs Scheduler Interface
  • Schedulers register by providing the following interface routines via reservfs_register():

      • init(priv)
      • create(priv, parent, type)
      • start(priv, parent, type)
      • delete(priv, node)
      • get/set(priv, node, values, type)
Reservfs Implementation
  • Built via the vnode/vfs interface
  • A reserv{} structure represents each reservfs file
  • A reserv{} representing a directory contains a pointer to the corresponding node in the scheduler
  • Scheduler independent
  • Implements a garbage collection mechanism
Tagging
  • A request arriving at a scheduler must be associated with the appropriate reservation
  • Each request is tagged with a pointer to a queue node
    • mbuf{}, buf{} and proc{} are augmented
  • How is a request tagged?
Tagging (contd.)
  • For a file, its file descriptor is tagged with a disk reservation
  • For a connected socket, its file descriptor is tagged with a network reservation
  • For unconnected sockets, we provide a late tagging mechanism
  • Each process is tagged with a cpu reservation
  • We associate reservations with references to objects
Default List of a Process
  • Default reservations of a process, one for each resource
  • A list of tags (pointers to queue directories)
  • Used when a tag is otherwise not specified
  • Two new files are added for each process pid in /proc/pid
    • /proc/pid/default to represent the default list
    • /proc/pid/cdefault to represent the child default list
Default List of a Process (contd.)
  • Reading these files returns the names of the default queue directories, e.g.,

/reserv/cpu/q1

/reserv/fxp0/r2/q1

/reserv/da0/r1/q3

  • A process, with the appropriate access rights, can change the entries of default files
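A small sketch of how a tool might interpret the contents of a default file: it maps each resource (the first path component under /reserv) to its default queue directory. The one-path-per-line format is inferred from the example above and is an assumption, as is the function name.

```python
def parse_default_list(text, root="/reserv"):
    """Map each resource name to its default queue directory."""
    defaults = {}
    for line in text.splitlines():
        path = line.strip()
        if not path:
            continue
        if not path.startswith(root + "/"):
            raise ValueError(f"not under {root}: {path}")
        # The resource is the first component after the root, e.g. "fxp0".
        resource = path[len(root) + 1:].split("/", 1)[0]
        defaults[resource] = path
    return defaults

sample = """/reserv/cpu/q1
/reserv/fxp0/r2/q1
/reserv/da0/r1/q3
"""
defaults = parse_default_list(sample)
print(defaults["fxp0"])
```

The resulting dictionary has one entry per resource, matching the idea that a default list holds one reservation for each resource.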
Implicit Tagging
  • The file descriptor returned by open(), accept() or connect() is automatically tagged with default
  • The tag of the file descriptor of an unconnected socket is set to default at sendto() and sendmsg()
  • When a process forks, the child process is tagged with the default cpu reservation
Explicit Tagging
  • The tag of a file descriptor can be set/read with new commands to fcntl():
    • F_SET_RES
    • F_GET_RES
  • A new system call chcpures() to change the cpu reservation of a process
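The interplay between default tags and explicit retagging can be modeled in a few lines. This is an in-memory illustration only: the methods stand in for fcntl() with F_SET_RES / F_GET_RES and for chcpures(), whose exact argument forms the slides do not give, and the class and paths are hypothetical.

```python
class Process:
    """Toy model of a process with a default list and tagged descriptors."""
    def __init__(self, defaults):
        self.defaults = dict(defaults)   # resource -> default queue directory
        self.fd_tags = {}                # file descriptor -> queue directory
        self.cpu_tag = self.defaults["cpu"]
        self._next_fd = 3

    def open(self, resource):
        """A new descriptor takes the default reservation for its resource."""
        fd = self._next_fd
        self._next_fd += 1
        self.fd_tags[fd] = self.defaults[resource]
        return fd

    def set_res(self, fd, queue_dir):
        """Stand-in for the F_SET_RES command: retag an open descriptor."""
        self.fd_tags[fd] = queue_dir

    def get_res(self, fd):
        """Stand-in for the F_GET_RES command: read a descriptor's tag."""
        return self.fd_tags[fd]

    def chcpures(self, queue_dir):
        """Stand-in for chcpures(): change the process's CPU reservation."""
        self.cpu_tag = queue_dir

proc = Process({"cpu": "/reserv/cpu/q1", "da0": "/reserv/da0/r1/q3"})
fd = proc.open("da0")
print(proc.get_res(fd))                  # the default disk reservation
proc.set_res(fd, "/reserv/da0/r1/q4")    # explicit override for this fd
proc.chcpures("/reserv/cpu/q2")
print(proc.get_res(fd), proc.cpu_tag)
```

Retagging one descriptor leaves the default list untouched, so descriptors opened later still start from the defaults.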
Reservation Domains
  • Permissions of a process to use, create and manipulate reservations
  • The reservation domain of a process is independent of its protection domain
Reservations and Reservation Domains
  • [Figure: share trees for disk bandwidth, network bandwidth and CPU cycles; reserv A holds 0.8 and reserv B 0.2 of each resource, with 0.5/0.5 splits into reserv 1 and reserv 2; reservation domain 1 and reservation domain 2 are marked on the trees]

Reservfs Garbage Collection
  • Based on reference counts
    • every application that is using a specific node adds a reference on it (to the vnode)
  • Triggered by the vnode layer
    • when the last application finishes using the node, the node is garbage collected
  • An fcntl() command is available to keep a node even if no references to it exist
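The collection rule can be sketched as plain reference counting with a "keep" flag. The class and method names are invented for illustration; in Eclipse/BSD the counting is done on vnodes, and the flag corresponds to the fcntl() mentioned above.

```python
class ReservNode:
    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.pinned = False   # models the request to keep an unreferenced node

class ReservTable:
    """Toy table of reservation nodes collected by reference count."""
    def __init__(self):
        self.nodes = {}

    def create(self, name):
        self.nodes[name] = ReservNode(name)

    def acquire(self, name):
        self.nodes[name].refs += 1

    def pin(self, name):
        self.nodes[name].pinned = True

    def release(self, name):
        node = self.nodes[name]
        node.refs -= 1
        # Collect only when the last user is gone and nobody pinned the node.
        if node.refs == 0 and not node.pinned:
            del self.nodes[name]

table = ReservTable()
table.create("q1")
table.acquire("q1")
table.acquire("q1")
table.release("q1")
print("q1" in table.nodes)   # still referenced by one user
table.release("q1")
print("q1" in table.nodes)   # last reference dropped
```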
SRP Input Processing
  • Demultiplexes incoming packets
    • before network and higher-level protocol processing
  • Unprocessed input queue per socket
  • Processes input protocols in context of receiving process
  • Drops packets when per-socket queue is full
  • Avoids receive livelock
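The input path described above can be sketched as early demultiplexing into bounded per-socket queues, with protocol work deferred until the receiver asks for data. Packets are modeled as dictionaries with a `port` key, and all names here are assumptions made for the example.

```python
from collections import deque

class SocketQueue:
    """Bounded queue of unprocessed packets for one socket."""
    def __init__(self, limit):
        self.limit = limit
        self.packets = deque()
        self.dropped = 0

def demultiplex(packet, sockets):
    """Queue the raw packet on its socket, or drop it if the queue is full."""
    sock = sockets.get(packet["port"])
    if sock is None:
        return False
    if len(sock.packets) >= sock.limit:
        sock.dropped += 1          # cheap drop, before any protocol work
        return False
    sock.packets.append(packet)
    return True

def receive(sock, protocol_input):
    """Run protocol processing for one packet in the receiver's context."""
    if not sock.packets:
        return None
    return protocol_input(sock.packets.popleft())

sockets = {80: SocketQueue(limit=2)}
accepted = [demultiplex({"port": 80, "data": n}, sockets) for n in range(3)]
print(accepted, sockets[80].dropped)
print(receive(sockets[80], lambda pkt: pkt["data"]))
```

Because an overloaded socket loses packets before any protocol processing is spent on them, a flood aimed at one socket cannot consume the processing time that other receivers need.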
QoS Support for Web Server
  • Virtual hosting with Apache server:
    • separate Apache server for each virtual host
    • single Apache server for all virtual hosts
  • Eclipse/BSD isolates and differentiates performance of virtual hosts
    • multiple Apache servers: implicit tagging
    • single Apache server: explicit tagging
      • We implemented an Apache module for explicit tagging
Experimental Setup
  • Apache Web Server:
    • A multi-process server
    • (Pre)spawns helper processes
    • A process handles one request at a time
    • Each process calls accept() to service the next connection request
  • HTTP clients run on five different machines
  • Servers are running FreeBSD 2.2.8 or Eclipse/BSD 2.2.8 on a PC (266 MHz Pentium Pro, 64 MB RAM, 9 GB Seagate ST39173W fast wide SCSI disk)
  • Machines are connected with a 10/100 Mbps Ethernet switch
Experiments
  • Hosting two sites with two servers
  • [Figure: the /reserv tree with cpu, fxp0 and da0, each containing queues q0, q1 and q2; the reservation domains of server 1 and server 2 are marked on the tree]

Experiments
  • Hosting virtual hosts with a single Apache server
  • Four web sites
  • [Figure: the /reserv tree with cpu, fxp0 and da0, each containing queues q0 to q4]
Apache Module for Tagging
  • Apache code not modified: module added
  • Apache config defines which reservation to use based on “a rule”, e.g.,
    • directory-based
    • port-based
  • Module uses fcntl() and chcpures() for explicit tagging
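A rule-based choice of reservation might look like the sketch below, which checks port-based and directory-based rules in order and falls back to a default. The rule tuples, request fields and paths are hypothetical; this is not the module's actual configuration syntax, which the slides do not show.

```python
def pick_reservation(request, rules, default):
    """Return the queue directory selected by the first matching rule.

    request: dict with "port" and "path" keys
    rules: list of (kind, key, queue_dir), kind is "port" or "directory"
    """
    for kind, key, queue_dir in rules:
        if kind == "port" and request["port"] == key:
            return queue_dir
        if kind == "directory":
            prefix = key.rstrip("/") + "/"
            # Match whole path components only, so /siteA does not
            # accidentally match /siteAB.
            if request["path"].startswith(prefix):
                return queue_dir
    return default

rules = [
    ("directory", "/siteA", "/reserv/fxp0/q1"),
    ("port", 8080, "/reserv/fxp0/q2"),
]
print(pick_reservation({"port": 80, "path": "/siteA/index.html"},
                       rules, "/reserv/fxp0/q0"))
print(pick_reservation({"port": 8080, "path": "/other.html"},
                       rules, "/reserv/fxp0/q0"))
```

Once a reservation is chosen, a module of this kind would apply it to the connection and to the serving process through the explicit tagging calls.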
Access Control
  • Permissions of a process to use or modify the objects belonging to the reservfs
  • Currently, a process can use/modify reservations “below” its default list
  • Soon, Eclipse/BSD will have more sophisticated access control
    • process can have different permissions on a reservation (e.g., permission for tagging but not for modifying)
    • process can have permission on arbitrary set of reservations
Multiple Default Lists: Profiles
  • Multiple default lists (profiles) simplify explicit tagging
  • Server applications typically serve different entities (depending on client, content, etc.) with different QoS assignments
  • Global list of system-wide profiles
  • Profiles provide an easy way to manage and share “default” reservations of different entities
Eclipse/BSD Status
  • Derived from FreeBSD
    • 3.2
    • 2.2.8
  • FreeBSD compatible
  • Eclipse/BSD code is available at http://www.bell-labs.com/project/eclipse including:
    • reservfs
    • hierarchical network scheduling
    • hierarchical disk scheduling
    • hierarchical cpu scheduling
    • input scheduling
    • also, Apache module for tagging and other applications
Related Work
  • ALTQ
    • good for routers
    • not sufficient for QoS support in a general-purpose OS
  • Resource Containers
    • different from Reservation Domains
    • limited (similar to our Profiles)
    • not flexible enough to specify a number of useful provisioning needs
Future Work
  • QoS on cluster of servers
  • Support for fine-grained automatic tagging
  • More server applications
  • Supporting other QoS parameters
  • Other schedulers