QoS Support in Operating Systems

Banu Özden

Bell Laboratories

ozden@research.bell-labs.com



Vision

  • Service providers will offer storage and computing services

    • through their distributed data centers

    • connected with high bandwidth networks

    • to globally distributed clients.

  • Clients will access these services via diverse devices and networks, e.g.:

    • mobile devices and wireless networks,

    • high-end computer systems and high bandwidth networks.

  • These services will become utilities (e.g., storage utility, computing utility).

  • Eventually resources will be exchanged and traded between geographically dispersed data centers to address fluctuating demand.


Eclipse/BSD: an Operating System with Quality of Service Support

Banu Özden

ozden@research.bell-labs.com



Motivation

  • QoS support for (server) applications:

    • web servers

    • video servers

  • Isolation and differentiation of different

    • entities serviced on the same platform

    • applications running on the same platform

  • QoS requirements:

    • client-based

    • service-based

    • content-based



Design Goals

  • QoS support in a general purpose operating system

  • Remain compatible with the underlying operating system

  • QoS parameters:

    • Isolation

    • Differentiation

    • Fairness

    • (Cumulative) throughput

  • Flexible resource management

    • capable of implementing a large set of provisioning needs

    • supports a large set of server applications without imposing significant changes to their design



    Talk Outline

    • Schedulers

    • Reservation File System (reservfs)

    • Tagging

    • Web Server Experiments

    • Access Control and Profiles

    • Eclipse/BSD Status

    • Related Work

    • Future Work


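    Proportional sharing

    • Generalized processor sharing (GPS)

      $\phi_i$: weight of flow $i$

      $W_i(t_1, t_2)$: service received by flow $i$ in $[t_1, t_2]$

      $F$: set of flows

      • For any flow $i$ continuously backlogged in $[t_1, t_2]$: $\frac{W_i(t_1, t_2)}{W_j(t_1, t_2)} \ge \frac{\phi_i}{\phi_j}$ for all $j \in F$

      • Thus, the rate of flow $i$ in $[t_1, t_2]$ is: $r_i \ge \frac{\phi_i}{\sum_{j \in F} \phi_j}\, r$, where $r$ is the link rate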
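The GPS sharing rule can be illustrated with a small calculation: at any instant, each backlogged flow is served at a rate proportional to its weight among the flows that currently have work. The sketch below is only an illustration of that rule; the function name `gps_rates`, the flow names, and the weights are hypothetical and not taken from the talk.

```python
def gps_rates(weights, backlogged, link_rate):
    """Instantaneous GPS rate of each flow.

    weights: dict mapping flow name -> positive weight.
    backlogged: set of flow names that currently have queued work.
    link_rate: total bandwidth of the link.
    """
    active_weight = sum(weights[f] for f in backlogged)
    rates = {}
    for flow, weight in weights.items():
        if flow in backlogged:
            # A backlogged flow gets its weighted fraction of the link.
            rates[flow] = link_rate * weight / active_weight
        else:
            rates[flow] = 0.0
    return rates


# Hypothetical flows and weights, chosen only for illustration.
weights = {"a": 2, "b": 1, "c": 1}
all_busy = gps_rates(weights, {"a", "b", "c"}, link_rate=100.0)
c_idle = gps_rates(weights, {"a", "b"}, link_rate=100.0)
```

When flow `c` goes idle, its bandwidth is redistributed to `a` and `b` in proportion to their weights, so each backlogged flow never receives less than its guaranteed fraction.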



    QoS Guarantees

    • Fairness

    • Throughput

    • Packet delay



    Schedulers in Eclipse

    • Resource characteristics differ

    • Different hierarchical proportional-share schedulers for resources

      • Link scheduler: WF2Q

      • Disk scheduler: YFQ

      • CPU scheduler: MTR-LS

      • Network input: SRP


    Hierarchical GPS Example

    [Figure: two scheduling trees rooted at the server. Proportional sharing: company A page 1 (0.4), company A page 2 (0.4), company B (0.2). Hierarchical proportional sharing: company A (0.8) and company B (0.2), with company A's share split between page 1 (0.5) and page 2 (0.5).]


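    Schedulers

    • Hierarchical proportional-sharing (GPS)

      $Q(n)$: descendant queue nodes of node $n$

      $W_n(t_1, t_2) = \sum_{q \in Q(n)} W_q(t_1, t_2)$: service received by scheduler node $n$ in $[t_1, t_2]$

      $S(n)$: set of immediate descendant nodes of the parent of node $n$

    • For any node $n$ continuously backlogged in $[t_1, t_2]$: $\frac{W_n(t_1, t_2)}{W_m(t_1, t_2)} \ge \frac{\phi_n}{\phi_m}$ for all $m \in S(n)$, where $\phi_n$ is the weight of node $n$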
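To make the hierarchical rule concrete, the following sketch recomputes the company A / company B example: bandwidth is split among backlogged siblings by weight at each level of the tree. It is a toy model under stated assumptions, not Eclipse/BSD code; the dictionary layout and the names `hgps_rates` and `is_backlogged` are hypothetical.

```python
def is_backlogged(node, path, backlogged):
    """A queue (leaf) is backlogged if listed; a scheduler if any descendant queue is."""
    children = node.get("children")
    if not children:
        return path in backlogged
    return any(is_backlogged(child, f"{path}/{name}", backlogged)
               for name, child in children.items())


def hgps_rates(node, backlogged, rate=1.0, path=""):
    """Split `rate` among backlogged children by weight, recursively.

    Returns a dict mapping each backlogged queue path to its rate.
    """
    children = node.get("children")
    if not children:
        return {path: rate} if path in backlogged else {}
    active = {name: child for name, child in children.items()
              if is_backlogged(child, f"{path}/{name}", backlogged)}
    total = sum(child["weight"] for child in active.values())
    rates = {}
    for name, child in active.items():
        share = rate * child["weight"] / total
        rates.update(hgps_rates(child, backlogged, share, f"{path}/{name}"))
    return rates


# Tree from the example: A gets 0.8, B gets 0.2, A's pages split 0.5/0.5.
server = {"children": {
    "A": {"weight": 0.8, "children": {
        "page1": {"weight": 0.5},
        "page2": {"weight": 0.5},
    }},
    "B": {"weight": 0.2},
}}

everyone = hgps_rates(server, {"/A/page1", "/A/page2", "/B"})
page2_idle = hgps_rates(server, {"/A/page1", "/B"})
```

With every queue busy, the leaves receive 0.4, 0.4, and 0.2, the same as flat sharing. The difference appears when company A's page 2 goes idle: in the hierarchy its bandwidth stays inside company A (page 1 rises to 0.8), whereas flat sharing would hand part of it to company B.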


    Link Aggregation

    • Need to incrementally scale bandwidth

    • Resource aggregation is emerging as a solution:

      • Grouping multiple resources into a single logical unit

    • QoS over such aggregated links?


    Multi-Server Model

    • Multi Server Fair Queuing (MSFQ)

      • A packetized algorithm for a system with N links, each with a bandwidth of r, that approximates a GPS system with a single link with Nr bandwidth

    [Figure: the reference model, GPS on a single link of bandwidth Nr, beside the packetized scheduler, MSFQ on links of bandwidth r each.]



    Multi-Server Model (Contd.)

    • Goals:

      • Guarantee bandwidth and packet delay bounds that are independent of the number of flows

      • Allow flows to arrive and depart dynamically

      • Be work-conserving

    • Algorithm:

      • When a server is idle, schedule the packet that would complete transmission earliest under a single server GPS system with a bandwidth of Nr

    Sigcomm 2001
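The dispatch rule above can be sketched as a small event simulation. This is a simplified illustration, not the published algorithm's implementation: it assumes the GPS finish time of each packet is supplied by the caller rather than computed from a virtual clock, and the function name `msfq_dispatch` and the packet tuples are hypothetical.

```python
import heapq


def msfq_dispatch(packets, n_servers, rate):
    """Dispatch packets to idle servers in order of GPS finish time.

    packets: list of (name, arrival, length, gps_finish) tuples, where
        gps_finish is the packet's finish time in the reference GPS
        system of bandwidth n_servers * rate (supplied by the caller).
    Returns {name: (server, start, end)}.
    """
    pending = list(packets)
    # Each heap entry is (time the server becomes idle, server index).
    servers = [(0.0, s) for s in range(n_servers)]
    heapq.heapify(servers)
    schedule = {}
    while pending:
        free_at, server = heapq.heappop(servers)
        # An idle server waits until at least one packet has arrived.
        now = max(free_at, min(p[1] for p in pending))
        arrived = [p for p in pending if p[1] <= now]
        # MSFQ rule: pick the arrived packet that GPS would finish first.
        chosen = min(arrived, key=lambda p: p[3])
        pending.remove(chosen)
        name, _arrival, length, _gps_finish = chosen
        end = now + length / rate
        schedule[name] = (server, now, end)
        heapq.heappush(servers, (end, server))
    return schedule


# Two links of rate 1. The GPS finish times correspond to three flows with
# weights 8:1:1 on a single GPS link of rate 2 (an illustrative choice).
packets = [
    ("p1", 0.0, 4.0, 2.5),   # long packet, earliest GPS finish
    ("p2", 0.0, 1.0, 3.0),   # short packet, later GPS finish
    ("p3", 0.0, 1.0, 3.0),
]
schedule = msfq_dispatch(packets, n_servers=2, rate=1.0)
```

In this run `p1` is dispatched first but occupies one link until time 4, while `p2` and `p3` complete on the other link at times 1 and 2. Packets scheduled in GPS order can therefore complete in reverse order on multiple servers.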


    MSFQ Preliminary Properties

    [Figure: timing diagrams comparing packet service under GPS, single-server WFQ, and MSFQ. A two-packet example (arrivals a1, a2; servers serv1 and serv2; time 0 to 4) and a seven-packet example (arrivals a1 to a7; servers serv1 to serv3; time 0 to 10).]

    Multi-Server specific properties

    • Ordering: a pair of packets scheduled in the order of their GPS finishing times may complete in reverse order

    • GPS busy implies MSFQ busy, but the converse is not true

    • Non-coinciding busy periods

    • Work backlog?


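    MSFQ Properties

    [Figure: two service-versus-time plots. The first compares total service under GPS and MSFQ and marks the packet delay; the second compares per-flow service, GPSi and MSFQi, and marks the service discrepancy.]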

    • Maximum service discrepancy (buffer requirement)

    • Maximum packet delay

    • Maximum per-flow service discrepancy



    Schedulers (contd.)

    • Disk scheduling with QoS

      • tradeoffs between QoS and total disk performance

        • driver queue management

        • queue depth

        • queue ordering

        • fragmentation

      • Hierarchical YFQ

    • CPU scheduling with QoS

      • lengths of CPU phases are not known a priori

      • cumulative throughput

      • Hierarchical MTR-LS



    Eclipse’s Key Elements

    • Hierarchical, proportional share resource schedulers

    • Reservation, reservation file system (reservfs)

    • Tagging mechanism

    • Access and admission control, reservation domain



    Reservations and Schedulers

    • (Resource) reservations

      • unit for QoS assignment

      • similar to the concept of a flow in packet scheduling

    • Hierarchical schedulers

      • a tree with two kinds of nodes:

        • scheduler nodes

        • queue nodes

        • each node corresponds to a reservation

    • Schedulers are dynamically reconfigurable


    Web Server Example

    • Hosting two companies’ web sites, each with two web pages

    [Figure: scheduler trees for disk bandwidth, CPU cycles, and network bandwidth. In each tree company A has share 0.8 and company B has share 0.2; the page 1 and page 2 nodes below a company have share 0.5 each.]


    Reservfs

    [Figure: layered architecture. Applications (Web Server, Video Server) sit above the application interface of the reservation file system; below its scheduler interface are the CPU scheduler, the link scheduler, and disk schedulers 1 and 2, which manage the Net 1, Net 2, CPU 1, Disk1, Disk2, and Disk3 resources.]

    • We built the reservation file system

      • to create and manipulate reservations

      • to access and configure resource schedulers


    Reservfs

    [Figure: /reserv with one directory per resource: cpu, fxp0, fxp1, da0.]

    • Hierarchical

    • Each reservation directory corresponds to a node at a scheduler

    • Each resource is represented by a reservation directory under /reserv



    Reservfs

    • Two types of reservation directories:

      • scheduler directories

      • queue directories

    • Scheduler directories are hierarchically expandable

    • Queue directories are not expandable


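    Reservfs

    [Figure: the /reserv tree with resource directories cpu, fxp0, fxp1, and da0. Each holds a queue directory q0; the tree also shows a scheduler directory r1 and a queue directory q1. A scheduler directory contains share, newqueue, and newreserv; a queue directory contains share and backlog.]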

    • Scheduler directory:

      • share

      • newqueue

      • newreserv

      • special queue: q0

    • Queue directory:

      • share

      • backlog





    Reservfs API

    • Creation of a new queue/scheduler reservation

      • fd = open("newqueue", O_CREAT) or fd = open("newreserv", O_CREAT)

      • returns the fd of the newly created share file


    Creating Queue Reservation

    fd = open("newqueue", O_CREAT)

    [Figure: the /reserv tree (cpu, fxp0, fxp1, da0) before and after the call. Under da0, a new queue directory q1, containing share and backlog, appears beside q0, share, newqueue, and newreserv.]


    Creating Scheduler Reservation

    fd = open("newreserv", O_CREAT)

    [Figure: the /reserv tree (cpu, fxp0, fxp1, da0) before and after the call. Under da0, a new scheduler directory r0 appears beside q0 and q1; r0 contains share, newqueue, newreserv, and its own q0.]



    Reservfs API

    • Changing QoS parameters

      • writing a weight and min value to the share file

    • Getting QoS parameters

      • reading the share file

    • Getting/setting queue parameters

      • reading/writing the backlog file
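The directory semantics described on these slides can be mimicked with a small in-memory model. This is a toy sketch, not the kernel implementation: the class names, the numbering of new directories, and the initial share value "1 0" are assumptions made for illustration.

```python
class QueueDir:
    """Queue directory: holds a share file and a backlog file."""

    def __init__(self, weight, minimum):
        self.share = f"{weight} {minimum}"   # "weight min", as written to share
        self.backlog = "0"

    def listing(self):
        return ["backlog", "share"]


class SchedulerDir:
    """Scheduler directory: share, newqueue, newreserv, and a special q0."""

    def __init__(self, weight=1, minimum=0):
        self.share = f"{weight} {minimum}"
        self.children = {"q0": QueueDir(1, 0)}   # initial values are assumed

    def _next_name(self, prefix):
        count = sum(1 for name in self.children if name.startswith(prefix))
        return f"{prefix}{count}"

    def newqueue(self, weight, minimum):
        """Models creating a queue reservation through the newqueue file."""
        name = self._next_name("q")
        self.children[name] = QueueDir(weight, minimum)
        return name

    def newreserv(self, weight, minimum):
        """Models creating a scheduler reservation through newreserv."""
        name = self._next_name("r")
        self.children[name] = SchedulerDir(weight, minimum)
        return name

    def listing(self):
        return sorted(["newqueue", "newreserv", "share", *self.children])


fxp0 = SchedulerDir()
r0_name = fxp0.newreserv(1, 0)
r0 = fxp0.children[r0_name]
q1_name = r0.newqueue(50, 1000000)
```

After these calls `r0.listing()` contains `q0` and `q1` next to the three control files, and the new queue's share reads back as the weight and minimum that were supplied at creation.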



    Reservfs API

    Command line output:

    killerbee$ cd /reserv

    killerbee$ ls -al

    total 5

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 .

    drwxr-xr-x 20 root wheel 512 Sep 12 21:54 ..

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 cpu

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 fxp0

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 fxp1

    killerbee$ cd fxp0

    killerbee$ ls -alR

    total 6

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 newqueue

    -rw------- 1 root wheel 1 Sep 15 11:39 newreserv

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

    -r-------- 1 root wheel 1 Sep 15 11:39 share

    ./q0:

    total 4

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 backlog

    -rw------- 1 root wheel 1 Sep 15 11:39 share



    Reservfs API

    killerbee$ cd r0

    killerbee$ ls -al

    total 6

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 newqueue

    -rw------- 1 root wheel 1 Sep 15 11:39 newreserv

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

    -r-------- 1 root wheel 1 Sep 15 11:39 share

    killerbee$ echo "50 1000000" > newqueue

    killerbee$ ls -al

    total 6

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 newqueue

    -rw------- 1 root wheel 1 Sep 15 11:39 newreserv

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q1

    -r-------- 1 root wheel 1 Sep 15 11:39 share

    killerbee$ cd q1

    killerbee$ ls -al

    total 4

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 share

    -rw------- 1 root wheel 1 Sep 15 11:39 backlog

    killerbee$ cat share

    50 1000000

    killerbee$




    Reservfs Scheduler Interface

    • Schedulers register by providing the following interface routines via reservfs_register():

      • init(priv)

      • create(priv, parent, type)

      • start(priv, parent, type)

      • delete(priv, node)

      • get/set(priv, node, values, type)



    Reservfs Implementation

    • Built via vnode/vfs interface

    • A reserv{} structure represents each reservfs file

    • A reserv{} representing a directory contains a pointer to the corresponding node at the scheduler

    • Scheduler independent

    • Implements a garbage collection mechanism





    Tagging

    • A request arriving at a scheduler must be associated with the appropriate reservation

    • Each request is tagged with a pointer to a queue node

      • mbuf{}, buf{} and proc{} are augmented

    • How is a request tagged?



    Tagging (contd.)

    • For a file, its file descriptor is tagged with a disk reservation

    • For a connected socket, its file descriptor is tagged with a network reservation

    • For unconnected sockets, we provide a late tagging mechanism

    • Each process is tagged with a cpu reservation

    • We associate reservations with references to objects



    Default List of a Process

    • Default reservations of a process, one for each resource

    • A list of tags (pointers to queue directories)

    • Used when a tag is otherwise not specified

    • Two new files are added for each process pid in /proc/pid

      • /proc/pid/default to represent the default list

      • /proc/pid/cdefault to represent the child default list



    Default List of a Process (contd.)

    • Reading these files returns the names of the default queue directories, e.g.,

      /reserv/cpu/q1

      /reserv/fxp0/r2/q1

      /reserv/da0/r1/q3

    • A process, with the appropriate access rights, can change the entries of default files



    Implicit Tagging

    • The file descriptor returned by open(), accept() or connect() is automatically tagged with the default reservation

    • The tag of the file descriptor of an unconnected socket is set to the default at sendto() and sendmsg()

    • When a process forks, the child process is tagged with the default cpu reservation



    Explicit Tagging

    • The tag of a file descriptor can be set/read with new commands to fcntl():

      • F_SET_RES

      • F_GET_RES

    • A new system call chcpures() to change the cpu reservation of a process
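The interplay of implicit and explicit tagging can be pictured with a toy model of a process and its descriptors. It is only a sketch: the `Process` class and its methods are hypothetical stand-ins, the real interface being the fcntl() commands F_SET_RES and F_GET_RES, whose argument format the slides do not show. The queue directory `/reserv/fxp0/r2/q2` is likewise made up for the example.

```python
class Process:
    """Toy model of per-process default reservations and descriptor tags."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)   # resource name -> queue directory
        self.fd_tags = {}
        self._next_fd = 3

    def _new_fd(self, resource):
        fd = self._next_fd
        self._next_fd += 1
        # Implicit tagging: a new descriptor gets the process default.
        self.fd_tags[fd] = self.defaults[resource]
        return fd

    def open_file(self, disk="da0"):
        return self._new_fd(disk)

    def accept(self, interface="fxp0"):
        return self._new_fd(interface)

    def set_res(self, fd, queue_dir):
        """Stands in for the F_SET_RES fcntl() command (explicit tagging)."""
        self.fd_tags[fd] = queue_dir

    def get_res(self, fd):
        """Stands in for the F_GET_RES fcntl() command."""
        return self.fd_tags[fd]


proc = Process({
    "cpu": "/reserv/cpu/q1",
    "fxp0": "/reserv/fxp0/r2/q1",
    "da0": "/reserv/da0/r1/q3",
})
conn = proc.accept()                      # implicitly tagged with the default
page = proc.open_file()                   # implicitly tagged with the default
proc.set_res(conn, "/reserv/fxp0/r2/q2")  # explicitly retagged
```

The file descriptor keeps the process default for da0, while the connection is moved to a different network queue without touching the default list.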



    Reservation Domains

    • Permissions of a process to use, create and manipulate reservations

    • The reservation domain of a process is independent of its protection domain


    Reservations and Reservation Domains

    [Figure: scheduler trees for disk bandwidth, network bandwidth, and CPU cycles. In each tree reserv A has share 0.8 and reserv B has share 0.2, with child reservations reserv 1 and reserv 2 at 0.5 each. The reservations are grouped into Reservation domain 1 and Reservation domain 2.]



    Reservfs Garbage Collection

    • Based on reference counts

      • every application that is using a specific node adds a reference on it (to the vnode)

    • Triggered by the vnode layer

      • when the last application finishes using the node, the node is garbage collected

    • An fcntl() command is available to keep a node even when no references to it exist
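A reference-counting scheme of this kind can be sketched in a few lines. The model below is an assumption-laden illustration rather than the reservfs code: `ReservNode`, `set_keep`, and the `collected` flag are hypothetical names, and `set_keep` merely stands in for the fcntl() command mentioned above.

```python
class ReservNode:
    """Reference-counted reservation node with an optional keep flag."""

    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.keep = False
        self.collected = False

    def acquire(self):
        """An application starts using the node."""
        self.refs += 1

    def release(self):
        """An application stops using the node."""
        self.refs -= 1
        self._maybe_collect()

    def set_keep(self, keep):
        """Stands in for the fcntl() that preserves an unreferenced node."""
        self.keep = keep
        self._maybe_collect()

    def _maybe_collect(self):
        # Collect only when nobody uses the node and it is not pinned.
        if self.refs == 0 and not self.keep:
            self.collected = True


temporary = ReservNode("q1")
temporary.acquire()
temporary.acquire()
temporary.release()
temporary.release()          # last reference gone: node is collected

pinned = ReservNode("q2")
pinned.acquire()
pinned.set_keep(True)
pinned.release()             # no references left, but the node is kept
```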


    SRP Input Processing

    • Demultiplexes incoming packets

      • before network and higher-level protocol processing

    • Unprocessed input queue per socket

    • Processes input protocols in context of receiving process

    • Drops packets when per-socket queue is full

    • Avoids receive livelock
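The idea behind this input path can be sketched as follows: arrival-time work is limited to demultiplexing into a bounded per-socket queue, and protocol processing is deferred to the receiving process. This is a toy illustration under assumptions, not the SRP implementation; the class `SrpInput`, the use of port numbers as the demultiplexing key, and the placeholder "processing" step are all hypothetical.

```python
from collections import deque


class SrpInput:
    """Toy early-demultiplexing input path with bounded per-socket queues."""

    def __init__(self):
        self.queues = {}    # port -> deque of unprocessed packets
        self.limits = {}    # port -> maximum queue length
        self.dropped = 0

    def bind(self, port, limit):
        self.queues[port] = deque()
        self.limits[port] = limit

    def packet_arrived(self, port, packet):
        """Arrival-time work: demultiplex only, no protocol processing."""
        queue = self.queues.get(port)
        if queue is None or len(queue) >= self.limits[port]:
            self.dropped += 1    # dropped early, before costly processing
            return False
        queue.append(packet)
        return True

    def receive(self, port):
        """Runs in the receiving process: protocol processing happens here."""
        queue = self.queues[port]
        if not queue:
            return None
        return ("processed", queue.popleft())   # placeholder for real work


srp = SrpInput()
srp.bind(80, limit=2)
accepted = [srp.packet_arrived(80, p) for p in ("a", "b", "c")]
stray = srp.packet_arrived(81, "x")
first = srp.receive(80)
```

Because excess packets are discarded before any protocol work is spent on them, a flood aimed at one socket costs little CPU and cannot starve the processes that consume input, which is how this design sidesteps receive livelock.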





    QoS Support for Web Server

    • Virtual hosting with Apache server:

      • separate Apache server for each virtual host

      • single Apache server for all virtual hosts

    • Eclipse/BSD isolates and differentiates performance of virtual hosts

      • multiple Apache servers: implicit tagging

      • single Apache server: explicit tagging

        • We implemented an Apache module for explicit tagging



    Experimental Setup

    • Apache Web Server:

      • A multi-process server

      • (Pre)spawns helper processes

      • A process handles one request at a time

      • Each process calls accept() to service the next connection request

    • HTTP clients run on five different machines

    • Servers are running FreeBSD 2.2.8 or Eclipse/BSD 2.2.8 on a PC (266 MHz Pentium Pro, 64 MB RAM, 9 GB Seagate ST39173W fast wide SCSI disk)

    • Machines are connected with a 10/100 Mbps Ethernet switch


    Experiments

    • Hosting two sites with two servers

    [Figure: /reserv with resource directories cpu, fxp0, and da0, each holding queues q0, q1, and q2; the reservation domain of server 1 and the reservation domain of server 2 are marked.]


    CPU Intensive Workload

    [Figure: two result plots]


    Network Intensive Workload

    [Figure: result plot]


    Disk Intensive Workload

    [Figure: result plot]


    Input Intensive Workload

    [Figure: two result plots]


    Experiments

    • Hosting virtual hosts with a single Apache server

    • Four web sites

    [Figure: /reserv with resource directories cpu, fxp0, and da0, each holding queues q0, q1, q2, q3, and q4.]



    Apache Module for Tagging

    • Apache code not modified: module added

    • Apache config defines which reservation to use based on “a rule”, e.g.,

      • directory-based

      • port-based

    • Module uses fcntl() and chcpures() for explicit tagging
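A rule lookup of the kind such a module might perform can be sketched like this. The rule format, the function name `choose_reservation`, the precedence of directory rules over port rules, and the queue paths are all assumptions for illustration; the slides do not show the module's actual configuration syntax.

```python
def choose_reservation(rules, path, port):
    """Return the reservation for a request, or None if no rule matches.

    rules: list of ("directory", prefix, reservation) or
           ("port", number, reservation) tuples. The longest matching
           directory prefix wins; port rules are consulted afterwards.
    """
    best = None
    for kind, key, reservation in rules:
        if kind == "directory" and path.startswith(key):
            if best is None or len(key) > len(best[0]):
                best = (key, reservation)
    if best is not None:
        return best[1]
    for kind, key, reservation in rules:
        if kind == "port" and key == port:
            return reservation
    return None


# Hypothetical rules and queue directories.
rules = [
    ("directory", "/siteA/", "/reserv/fxp0/q1"),
    ("directory", "/siteA/video/", "/reserv/fxp0/q2"),
    ("port", 8080, "/reserv/fxp0/q3"),
]
video = choose_reservation(rules, "/siteA/video/clip.mpg", 80)
index = choose_reservation(rules, "/siteA/index.html", 80)
by_port = choose_reservation(rules, "/other/page.html", 8080)
unmatched = choose_reservation(rules, "/other/page.html", 80)
```

Once a reservation has been chosen, the module would tag the connection and the serving process with it; requests that match no rule would simply keep the default reservations.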


    Isolating Web Sites

    [Figure: result plots for Eclipse/BSD and for FreeBSD]





    Access Control

    • Permissions of a process to use or modify the objects belonging to the reservfs

    • Currently, a process can use/modify reservations “below” its default list

    • Soon, Eclipse/BSD will have more sophisticated access control

      • process can have different permissions on a reservation (e.g., permission for tagging but not for modifying)

      • process can have permission on arbitrary set of reservations


    Multiple Default Lists: Profiles

    • Multiple default lists (profiles) simplify explicit tagging

    • Server applications typically serve different entities (depending on client, content, etc.) with different QoS assignments

    • Global list of system-wide profiles

    • Profiles provide an easy way to manage and share “default” reservations of different entities





    Eclipse/BSD Status

    • Derived from FreeBSD

      • 3.2

      • 2.2.8

    • FreeBSD compatible

    • Eclipse/BSD code is available at http://www.bell-labs.com/project/eclipse, including:

      • reservfs

      • hierarchical network scheduling

      • hierarchical disk scheduling

      • hierarchical cpu scheduling

      • input scheduling

      • also, Apache module for tagging and other applications



    Related Work

    • ALTQ

      • good for routers

      • not sufficient for QoS support in a general-purpose OS

    • Resource Containers

      • different from Reservation Domains

      • limited (similar to our Profiles)

      • not flexible enough to specify a number of useful provisioning needs



    Future work

    • QoS on cluster of servers

    • Support for fine-grained automatic tagging

    • More server applications

    • Supporting other QoS parameters

    • Other schedulers



