QoS Support in Operating Systems

Banu Özden, Bell Laboratories


Vision

  • Service providers will offer storage and computing services through their distributed data centers, connected by high-bandwidth networks, to globally distributed clients.

  • Clients will access these services via diverse devices and networks, e.g.:

    • mobile devices and wireless networks,

    • high-end computer systems and high-bandwidth networks.

  • These services will become utilities (e.g., storage utility, computing utility).

  • Eventually resources will be exchanged and traded between geographically dispersed data centers to address fluctuating demand.


Eclipse/BSD: an Operating System with Quality of Service Support

Banu Özden


Motivation

  • QoS support for (server) applications:

    • web servers

    • video servers

  • Isolation and differentiation of different

    • entities serviced on the same platform

    • applications running on the same platform

  • QoS requirements:

    • client-based

    • service-based

    • content-based


Design Goals

  • QoS support in a general purpose operating system

  • Remain compatible with the underlying operating system

  • QoS parameters:

    • Isolation

    • Differentiation

    • Fairness

    • (Cumulative) throughput

  • Flexible resource management

    • capable of implementing a large set of provisioning needs

    • supports a large set of server applications without imposing significant changes to their design


    Talk Outline

    • Schedulers

    • Reservation File System (reservfs)

    • Tagging

    • Web Server Experiments

    • Access Control and Profiles

    • Eclipse/BSD Status

    • Related Work

    • Future Work


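    Proportional Sharing

    • Generalized processor sharing (GPS)

      $\phi_i$: weight of flow $i$

      $W_i(t_1, t_2)$: service received by flow $i$ in $(t_1, t_2]$

      $B$: set of flows

      • For any flow $i$ continuously backlogged in $(t_1, t_2]$: $\frac{W_i(t_1, t_2)}{W_j(t_1, t_2)} \ge \frac{\phi_i}{\phi_j}, \ \forall j \in B$

      • Thus, rate of flow $i$ in $(t_1, t_2]$ is: $r_i \ge \frac{\phi_i}{\sum_{j \in B} \phi_j}\, r$, where $r$ is the link rate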
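To make the GPS rate guarantee concrete, here is a minimal C sketch (an illustration, not Eclipse/BSD code) that computes the instantaneous rate of every flow on a link of rate r. The function name gps_rates and its parameters are hypothetical. Idle flows get rate 0 and their share is split among the backlogged flows in weight ratio, which is why the per-flow rate is a lower bound rather than an exact value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: instantaneous GPS rate of each flow.
 * weight[i]     : phi_i, the weight of flow i (must be > 0)
 * backlogged[i] : nonzero if flow i currently has queued work
 * link_rate     : r, the capacity of the link
 * rate[i]       : output; phi_i / (sum of phi_j over backlogged j) * r
 *                 for backlogged flows, 0 for idle flows              */
void gps_rates(const double *weight, const int *backlogged, size_t n,
               double link_rate, double *rate)
{
    double active_weight = 0.0;

    for (size_t i = 0; i < n; i++) {
        assert(weight[i] > 0.0);
        if (backlogged[i])
            active_weight += weight[i];
    }

    for (size_t i = 0; i < n; i++) {
        /* GPS is work-conserving: the capacity left unused by idle
         * flows is shared by the backlogged ones in weight ratio. */
        rate[i] = (backlogged[i] && active_weight > 0.0)
                      ? weight[i] / active_weight * link_rate
                      : 0.0;
    }
}
```

With weights 3, 1 and 4 on a link of rate 8, the three flows receive 3, 1 and 4 while all are backlogged; if the third flow goes idle, the first two receive 6 and 2, never less than their guaranteed 3 and 1.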


    QoS Guarantees

    • Fairness

    • Throughput

    • Packet delay


    Schedulers in Eclipse

    • Resource characteristics differ

    • Different hierarchical proportional-share schedulers for resources

      • Link scheduler: WF2Q

      • Disk scheduler: YFQ

      • CPU scheduler: MTR-LS

      • Network input: SRP


    Hierarchical GPS Example

    [Figure: the same server shared two ways. Hierarchical proportional sharing: company A 0.8 and company B 0.2, with company A's share split 0.5/0.5 between page 1 and page 2. Flat proportional sharing: company A page 1 0.4, company A page 2 0.4, company B 0.2.]
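The difference between the two arrangements appears only when some queue goes idle. The following C sketch (hypothetical names, not the Eclipse/BSD schedulers) assigns rates top-down through a tree: each node's rate is split among its backlogged children in proportion to their weights, with internal nodes acting as scheduler nodes and leaves as queue nodes. The test uses weights 4:1 and 1:1, the same ratios as 0.8/0.2 and 0.5/0.5 in the figure, to keep the arithmetic exact.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical tree node: internal nodes are scheduler nodes,
 * leaves are queue nodes.                                      */
struct hnode {
    double weight;                        /* share relative to siblings  */
    int backlogged;                       /* leaves only: has queued work */
    size_t nchildren;
    struct hnode *children[MAX_CHILDREN];
    double rate;                          /* output of hgps_assign()     */
};

/* A node is backlogged if it is a backlogged leaf or has a backlogged
 * descendant. */
static int hnode_active(const struct hnode *n)
{
    if (n->nchildren == 0)
        return n->backlogged;
    for (size_t i = 0; i < n->nchildren; i++)
        if (hnode_active(n->children[i]))
            return 1;
    return 0;
}

/* Give node n the rate `rate`, then split it among its backlogged
 * children in proportion to their weights, recursively. */
void hgps_assign(struct hnode *n, double rate)
{
    double active_weight = 0.0;

    assert(n->nchildren <= MAX_CHILDREN);
    n->rate = rate;
    for (size_t i = 0; i < n->nchildren; i++)
        if (hnode_active(n->children[i]))
            active_weight += n->children[i]->weight;

    for (size_t i = 0; i < n->nchildren; i++) {
        struct hnode *c = n->children[i];
        hgps_assign(c, (hnode_active(c) && active_weight > 0.0)
                           ? rate * c->weight / active_weight
                           : 0.0);
    }
}
```

With everything backlogged, each of company A's pages gets 0.4 of the server under either arrangement. When page 1 goes idle, hierarchical sharing gives page 2 all of company A's 0.8, while flat sharing gives it only 0.4 / (0.4 + 0.2), about 0.67, so company B absorbs part of company A's unused share.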


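    Schedulers

    • Hierarchical proportional-sharing (GPS)

      $D(n)$: descendant queue nodes of node $n$

      $W_n(t_1, t_2) = \sum_{q \in D(n)} W_q(t_1, t_2)$: service received by scheduler node $n$ in $(t_1, t_2]$

      $S(n)$: set of immediate descendant nodes of the parent of node $n$

    • For any node $n$ continuously backlogged in $(t_1, t_2]$: $\frac{W_n(t_1, t_2)}{W_m(t_1, t_2)} \ge \frac{\phi_n}{\phi_m}, \ \forall m \in S(n)$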



    Link Aggregation

    • Need to incrementally scale bandwidth

    • Resource aggregation is emerging as a solution:

      • Grouping multiple resources into a single logical unit

    • QoS over such aggregated links?


    Multi-Server Model

    • Multi-Server Fair Queuing (MSFQ)

      • A packetized algorithm for a system with N links, each with bandwidth r, that approximates a GPS system with a single link of bandwidth Nr

    [Figure: the reference model is a single-link GPS system of bandwidth Nr; the packetized scheduler is MSFQ serving N links of bandwidth r each]


    Multi-Server Model (Contd.)

    • Goals:

      • Guarantee bandwidth and packet delay bounds that are independent of the number of flows

      • Allow flows to arrive and depart dynamically

      • Be work-conserving

    • Algorithm:

      • When a server is idle, schedule the packet that would complete transmission earliest under a single server GPS system with a bandwidth of Nr

    Sigcomm 2001
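The MSFQ dispatch rule can be illustrated with a small C simulation. It is a sketch under strong simplifying assumptions, not the algorithm as published: each flow sends exactly one packet, all packets arrive at time 0, and the reference GPS finish times come from a straightforward fluid simulation of one link of rate C = Nr. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FLOWS 16

/* Assumption: one packet per flow, all arriving at time 0. */
struct packet {
    double len;         /* packet length                          */
    double weight;      /* weight phi of the packet's flow        */
    double gps_finish;  /* finish time in the reference GPS system */
    double msfq_done;   /* completion time under MSFQ             */
};

/* Fluid GPS on one link of rate C = N * r: find the next packet to
 * finish, advance time, and charge service to every unfinished packet
 * in proportion to its weight.                                        */
void gps_finish_times(struct packet *p, size_t n, double C)
{
    double remaining[MAX_FLOWS];
    int finished[MAX_FLOWS] = {0};
    double now = 0.0;

    assert(n <= MAX_FLOWS);
    for (size_t i = 0; i < n; i++)
        remaining[i] = p[i].len;

    for (size_t done = 0; done < n; done++) {
        double active_weight = 0.0, dt = -1.0;
        size_t next = 0;

        for (size_t i = 0; i < n; i++)
            if (!finished[i])
                active_weight += p[i].weight;
        for (size_t i = 0; i < n; i++) {      /* earliest completion */
            if (finished[i])
                continue;
            double t = remaining[i] / (C * p[i].weight / active_weight);
            if (dt < 0.0 || t < dt) {
                dt = t;
                next = i;
            }
        }
        now += dt;
        for (size_t i = 0; i < n; i++)
            if (!finished[i])
                remaining[i] -= dt * C * p[i].weight / active_weight;
        finished[next] = 1;
        p[next].gps_finish = now;
    }
}

/* MSFQ on N servers of rate r: whenever a server becomes idle, start
 * the waiting packet with the smallest GPS finish time.              */
void msfq_schedule(struct packet *p, size_t n, size_t nservers, double r)
{
    double free_at[MAX_FLOWS] = {0};   /* when each server becomes idle */
    int sent[MAX_FLOWS] = {0};

    assert(n <= MAX_FLOWS && nservers >= 1 && nservers <= MAX_FLOWS);
    for (size_t k = 0; k < n; k++) {
        size_t s = 0, best = n;
        for (size_t j = 1; j < nservers; j++)   /* earliest idle server */
            if (free_at[j] < free_at[s])
                s = j;
        for (size_t i = 0; i < n; i++)          /* smallest GPS finish */
            if (!sent[i] && (best == n || p[i].gps_finish < p[best].gps_finish))
                best = i;
        sent[best] = 1;
        p[best].msfq_done = free_at[s] + p[best].len / r;
        free_at[s] = p[best].msfq_done;
    }
}
```

For example, take two flows with weights 3 and 1 and packets of length 3 and 1.5, on N = 2 links of rate r = 1 (so C = 2). GPS finishes the first packet at 2.0 and the second at 2.25, so MSFQ starts the first packet first. Yet the second completes at 1.5, before the first at 3.0. This is the reverse-order completion listed among the MSFQ properties.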


    MSFQ Preliminary Properties

    [Figure: example schedules comparing GPS with WFQ and MSFQ for packets a1 to a7 on two and three servers (serv1 to serv3)]

    Multi-Server specific properties

    • Ordering: a pair of packets scheduled in the order of their GPS finishing times may complete in reverse order

    • If GPS is busy, MSFQ is busy, but the converse is not true

    • Non-coinciding busy periods

    • Work backlog?


    MSFQ Properties

    [Figure: cumulative service over time under GPS and MSFQ, in aggregate (illustrating packet delay) and for a single flow i (GPSi vs. MSFQi, illustrating service discrepancy)]

    • Maximum service discrepancy (buffer requirement)

    • Maximum packet delay

    • Maximum per-flow service discrepancy


    Schedulers (contd.)

    • Disk scheduling with QoS

      • tradeoffs between QoS and total disk performance

        • driver queue management

        • queue depth

        • queue ordering

        • fragmentation

      • Hierarchical YFQ

    • CPU scheduling with QoS

      • lengths of CPU phases are not known a priori

      • cumulative throughput

      • Hierarchical MTR-LS


    Eclipse’s Key Elements

    • Hierarchical, proportional share resource schedulers

    • Reservation, reservation file system (reservfs)

    • Tagging mechanism

    • Access and admission control, reservation domain


    Reservations and Schedulers

    • (Resource) reservations

      • unit for QoS assignment

      • similar to the concept of a flow in packet scheduling

    • Hierarchical schedulers

      • a tree with two kinds of nodes:

        • scheduler nodes

        • queue nodes

        • each node corresponds to a reservation

    • Schedulers are dynamically reconfigurable


    Web Server Example

    • Hosting two companies’ web sites, each with two web pages

    [Figure: one reservation tree per resource (network bandwidth, disk bandwidth, cpu cycles); in each tree company A has share 0.8 and company B 0.2, and a company's share is split 0.5/0.5 between its page 1 and page 2]


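    Reservfs

    [Figure: applications (web server, video server) use the reservation file system through its application interface; reservfs configures the CPU scheduler, link scheduler and disk schedulers through its scheduler interface; these schedulers manage the CPU, the network interfaces (Net 1, Net 2) and the disks (Disk1 to Disk3)]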

    • We built the reservation file system

      • to create and manipulate reservations

      • to access and configure resource schedulers


    Reservfs

    [Figure: the /reserv directory containing reservation directories cpu, fxp0, fxp1 and da0]

    • Hierarchical

    • Each reservation directory corresponds to a node at a scheduler

    • Each resource is represented by a reservation directory under /reserv


    Reservfs

    • Two types of reservation directories:

      • scheduler directories

      • queue directories

    • Scheduler directories are hierarchically expandable

    • Queue directories are not expandable


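    Reservfs

    [Figure: example tree under /reserv (cpu, fxp0, fxp1, da0): a scheduler directory such as r1 contains share, newqueue, newreserv and the special queue q0; queue directories such as q0 and q1 contain share and backlog]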

    • Scheduler directory:

      • share

      • newqueue

      • newreserv

      • special queue: q0

    • Queue directory:

      • share

      • backlog




    Reservfs API

    • Creation of a new queue/scheduler reservation

      • fd = open("newqueue", O_CREAT) or fd = open("newreserv", O_CREAT)

      • returns the fd of the newly created share file
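A user-level helper for this creation step might look like the following C sketch. It assumes, following the slides and the shell session later in the deck, that opening `<dir>/newqueue` with O_CREAT yields a descriptor for the new queue's share file and that writing "weight min" to it sets the QoS parameters. The helper name reserv_create_queue is hypothetical, and the O_WRONLY flag and 0600 mode are additions needed for an ordinary POSIX open() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create a queue reservation under the scheduler
 * directory `dir` (e.g. "/reserv/fxp0") and set its weight and min
 * value. Returns 0 on success, -1 with errno set on failure.         */
int reserv_create_queue(const char *dir, unsigned long weight,
                        unsigned long min)
{
    char path[1024], share[64];
    int fd, len;
    ssize_t written;

    assert(dir != NULL);
    if (snprintf(path, sizeof path, "%s/newqueue", dir) >= (int)sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* Assumed reservfs behavior: this creates the queue directory and
     * returns a descriptor for its share file. */
    fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    /* Same text as `echo "50 1000000" > newqueue`, trailing newline included. */
    len = snprintf(share, sizeof share, "%lu %lu\n", weight, min);
    written = write(fd, share, (size_t)len);
    if (written != len) {
        int saved = errno;
        close(fd);
        errno = (written < 0) ? saved : EIO;
        return -1;
    }
    return close(fd);
}
```

On a system without reservfs, the call simply fails. For example, reserv_create_queue("/nonexistent-reserv-dir", 50, 1000000) returns -1 with errno set to ENOENT.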


    Creating Queue Reservation

    [Figure: in the scheduler directory /reserv/da0, fd = open("newqueue", O_CREAT) creates a new queue directory q1, containing share and backlog, next to q0]


    Creating Scheduler Reservation

    [Figure: in /reserv/da0, fd = open("newreserv", O_CREAT) creates a new scheduler directory r0 containing share, newqueue, newreserv and its own q0]


    Reservfs API

    • Changing QoS parameters

      • writing a weight and min value to the share file

    • Getting QoS parameters

      • reading the share file

    • Getting/setting queue parameters

      • reading/writing the backlog file
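Because QoS parameters travel through the share file as text, an application needs little more than a formatter and a parser. The following C sketch assumes the two-number "weight min" layout seen in the `cat share` output of the command-line session (`50 1000000`). The struct share type and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical in-memory form of a reservfs share file. */
struct share {
    unsigned long weight;   /* proportional-share weight */
    unsigned long min;      /* minimum value             */
};

/* Format a share for writing into a share file. Returns the length
 * snprintf reports (a value >= size means the buffer was too small). */
int share_format(const struct share *s, char *buf, size_t size)
{
    assert(s != NULL && buf != NULL);
    return snprintf(buf, size, "%lu %lu\n", s->weight, s->min);
}

/* Parse text read back from a share file. Returns 0 on success, -1 if
 * the text does not start with two unsigned numbers.                 */
int share_parse(const char *text, struct share *s)
{
    assert(text != NULL && s != NULL);
    return sscanf(text, "%lu %lu", &s->weight, &s->min) == 2 ? 0 : -1;
}
```

Formatting {50, 1000000} produces the 11-character string "50 1000000\n", and parsing that string recovers the same weight and min.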


    Reservfs API

    Command line output:

    killerbee$ cd /reserv

    killerbee$ ls -al

    total 5

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 .

    drwxr-xr-x 20 root wheel 512 Sep 12 21:54 ..

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 cpu

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 fxp0

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:37 fxp1

    killerbee$ cd fxp0

    killerbee$ ls -alR

    total 6

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 newqueue

    -rw------- 1 root wheel 1 Sep 15 11:39 newreserv

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

    -r-------- 1 root wheel 1 Sep 15 11:39 share

    ./q0:

    total 4

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 backlog

    -rw------- 1 root wheel 1 Sep 15 11:39 share


    Reservfs API

    killerbee$ cd r0

    killerbee$ ls -al

    total 6

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 newqueue

    -rw------- 1 root wheel 1 Sep 15 11:39 newreserv

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

    -r-------- 1 root wheel 1 Sep 15 11:39 share

    killerbee$ echo "50 1000000" > newqueue

    killerbee$ ls -al

    total 6

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 newqueue

    -rw------- 1 root wheel 1 Sep 15 11:39 newreserv

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q0

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 q1

    -r-------- 1 root wheel 1 Sep 15 11:39 share

    killerbee$ cd q1

    killerbee$ ls -al

    total 4

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 .

    dr-xr-xr-x 0 root wheel 512 Sep 15 11:39 ..

    -rw------- 1 root wheel 1 Sep 15 11:39 share

    -rw------- 1 root wheel 1 Sep 15 11:39 backlog

    killerbee$ cat share

    50 1000000

    killerbee$




    Reservfs Scheduler Interface

    • Schedulers register by providing the following interface routines via reservfs_register():

      • init(priv)

      • create(priv, parent, type)

      • start(priv, parent, type)

      • delete(priv, node)

      • get/set(priv, node, values, type)
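The registration step can be pictured as a table of function pointers, as in the C sketch below. Only the routine names come from the slides. The argument and return types, the registry, reservfs_lookup(), and the stub scheduler are all hypothetical, and get/set are split into two entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Node kinds a scheduler can be asked to create. */
enum rnode_type { RNODE_SCHED, RNODE_QUEUE };

/* Routine names follow the slides; types are assumptions.
 * "delete" is spelled del to stay C++-friendly.            */
struct reservfs_sched_ops {
    int   (*init)(void *priv);
    void *(*create)(void *priv, void *parent, enum rnode_type type);
    int   (*start)(void *priv, void *parent, enum rnode_type type);
    int   (*del)(void *priv, void *node);
    int   (*get)(void *priv, void *node, long *values, int type);
    int   (*set)(void *priv, void *node, const long *values, int type);
};

#define MAX_SCHEDULERS 8

struct registered_sched {
    const char *resource;                  /* e.g. "cpu", "fxp0", "da0" */
    const struct reservfs_sched_ops *ops;
    void *priv;                            /* scheduler-private state   */
};

static struct registered_sched registry[MAX_SCHEDULERS];
static size_t nregistered;

/* Validate the ops table, let the scheduler initialize, then record it
 * so reservfs could expose it as /reserv/<resource>. Returns 0 or -1. */
int reservfs_register(const char *resource,
                      const struct reservfs_sched_ops *ops, void *priv)
{
    if (resource == NULL || ops == NULL || nregistered == MAX_SCHEDULERS)
        return -1;
    if (!ops->init || !ops->create || !ops->start || !ops->del ||
        !ops->get || !ops->set)
        return -1;
    if (ops->init(priv) != 0)
        return -1;
    registry[nregistered++] = (struct registered_sched){ resource, ops, priv };
    return 0;
}

const struct registered_sched *reservfs_lookup(const char *resource)
{
    assert(resource != NULL);
    for (size_t i = 0; i < nregistered; i++)
        if (strcmp(registry[i].resource, resource) == 0)
            return &registry[i];
    return NULL;
}

/* A do-nothing scheduler, only to exercise the interface; priv is an int*. */
static int stub_init(void *priv) { *(int *)priv = 1; return 0; }
static void *stub_create(void *priv, void *parent, enum rnode_type type)
{ (void)parent; (void)type; return priv; }
static int stub_start(void *priv, void *parent, enum rnode_type type)
{ (void)priv; (void)parent; (void)type; return 0; }
static int stub_del(void *priv, void *node)
{ (void)priv; (void)node; return 0; }
static int stub_get(void *priv, void *node, long *values, int type)
{ (void)priv; (void)node; (void)type; values[0] = 0; return 0; }
static int stub_set(void *priv, void *node, const long *values, int type)
{ (void)priv; (void)node; (void)values; (void)type; return 0; }

const struct reservfs_sched_ops stub_ops = {
    stub_init, stub_create, stub_start, stub_del, stub_get, stub_set
};
```

Keeping reservfs ignorant of scheduler internals behind such a table is one way to get the "scheduler independent" property listed for the implementation.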


    Reservfs Implementation

    • Built via vnode/vfs interface

    • A reserv{} structure represents each reservfs file

    • reserv{} representing a directory contains a pointer to the corresponding node at scheduler

    • Scheduler independent

    • Implements a garbage collection mechanism


    Talk Outline

    • Introduction

    • Schedulers

    • Reservation File System (reservfs)

    • Tagging

    • Web Server Experiments

    • Access Control and Profiles

    • Eclipse/BSD Status

    • Related Work

    • Future Work


    Tagging

    • A request arriving at a scheduler must be associated with the appropriate reservation

    • Each request is tagged with a pointer to a queue node

      • mbuf{}, buf{} and proc{} are augmented

    • How is a request tagged?


    Tagging (contd.)

    • For a file, its file descriptor is tagged with a disk reservation

    • For a connected socket, its file descriptor is tagged with a network reservation

    • For unconnected sockets, we provide a late tagging mechanism

    • Each process is tagged with a cpu reservation

    • We associate reservations with references to objects


    Default List of a Process

    • Default reservations of a process, one for each resource

    • A list of tags (pointers to queue directories)

    • Used when a tag is otherwise not specified

    • Two new files are added for each process pid in /proc/pid

      • /proc/pid/default to represent the default list

      • /proc/pid/cdefault to represent the child default list


    Default List of a Process (contd.)

    • Reading these files returns the names of the default queue directories, e.g.,

      /reserv/cpu/q1

      /reserv/fxp0/r2/q1

      /reserv/da0/r1/q3

    • A process, with the appropriate access rights, can change the entries of default files


    Implicit Tagging

    • The file descriptor returned by open(), accept() or connect() is automatically tagged with the default reservation

    • The tag of the file descriptor of an unconnected socket is set to the default at sendto() and sendmsg()

    • When a process forks, the child process is tagged with the default cpu reservation
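The descriptor-tagging rules can be modeled in a few lines of C. This is a user-level sketch, not kernel code: a tag is the name of a queue directory from the process's default list (the kernel stores a pointer to the queue node instead), and all type and function names are hypothetical. The fork rule is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-level model of implicit tagging. */
enum resource { RES_CPU, RES_NET, RES_DISK, RES_COUNT };

struct proc_model {
    const char *deflt[RES_COUNT];   /* default list, as in /proc/pid/default */
};

enum fd_kind { FD_FILE, FD_CONNECTED_SOCKET, FD_UNCONNECTED_SOCKET };

struct fd_model {
    enum fd_kind kind;
    const char *tag;                /* NULL until the descriptor is tagged */
};

/* open(): a file is tagged with the default disk reservation;
 * accept()/connect(): a connected socket with the default network one. */
struct fd_model implicit_tag_on_open(const struct proc_model *p,
                                     enum fd_kind kind)
{
    struct fd_model fd = { kind, NULL };

    assert(p != NULL);
    if (kind == FD_FILE)
        fd.tag = p->deflt[RES_DISK];
    else if (kind == FD_CONNECTED_SOCKET)
        fd.tag = p->deflt[RES_NET];
    /* Unconnected sockets stay untagged until data is sent (late tagging). */
    return fd;
}

/* sendto()/sendmsg() on an unconnected socket: the tag is set to the
 * default network reservation in effect at send time.                 */
void late_tag_on_send(const struct proc_model *p, struct fd_model *fd)
{
    assert(p != NULL && fd != NULL);
    if (fd->kind == FD_UNCONNECTED_SOCKET)
        fd->tag = p->deflt[RES_NET];
}
```

Late tagging matters because an unconnected socket's destination, and therefore the right network reservation, is only known when data is actually sent. If the process changes its default network reservation before calling sendto(), the socket picks up the new one.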


    Explicit Tagging

    • The tag of a file descriptor can be set/read with new commands to fcntl():

      • F_SET_RES

      • F_GET_RES

    • A new system call chcpures() to change the cpu reservation of a process


    Reservation Domains

    • Permissions of a process to use, create and manipulate reservations

    • The reservation domain of a process is independent of its protection domain


    Reservations and Reservation Domains

    [Figure: reservation trees for disk bandwidth, network bandwidth and cpu cycles, each with reserv A (0.8) and reserv B (0.2), each split 0.5/0.5 into reserv 1 and reserv 2; the figure marks reservation domain 1 and reservation domain 2 over these trees]


    Reservfs Garbage Collection

    • Based on reference counts

      • every application that is using a specific node adds a reference on it (to the vnode)

    • Triggered by the vnode layer

      • when the last application finishes using a node, the node is garbage collected

    • An fcntl() command is available to keep a node even if no references to it exist
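The collection policy amounts to reference counting with an override flag. Below is a minimal C model of it, assuming a node is reclaimed as soon as its count drops to zero unless it has been marked to persist. In Eclipse/BSD the count lives on the vnode and the vnode layer triggers collection. The struct, the helper names, and the keep flag that stands in for the fcntl() command are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted reservfs node. */
struct rnode {
    unsigned refs;      /* applications currently using the node    */
    bool keep;          /* stands in for the fcntl() "keep" command  */
    bool collected;     /* true once the node has been reclaimed     */
};

void rnode_ref(struct rnode *n)
{
    assert(!n->collected);
    n->refs++;
}

/* Drop one reference; reclaim the node when the last user goes away,
 * unless it has been marked to persist.                              */
void rnode_unref(struct rnode *n)
{
    assert(n->refs > 0 && !n->collected);
    if (--n->refs == 0 && !n->keep)
        n->collected = true;
}

/* Clearing the keep flag on an unreferenced node makes it collectable. */
void rnode_set_keep(struct rnode *n, bool keep)
{
    assert(!n->collected);
    n->keep = keep;
    if (!keep && n->refs == 0)
        n->collected = true;
}
```

A node marked with keep survives after its last user drops it, for example a reservation set up ahead of time by an administrator, and becomes collectable once the flag is cleared.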


    SRP Input Processing

    • Demultiplexes incoming packets

      • before network and higher-level protocol processing

    • Unprocessed input queue per socket

    • Processes input protocols in context of receiving process

    • Drops packets when per-socket queue is full

    • Avoids receive livelock
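The per-socket queue behind these points is a bounded FIFO that accepts or drops a packet at arrival time, before any protocol work is spent on it. The C sketch below illustrates that behavior; the queue capacity, integer packet stand-ins, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SOCKQ_CAP 4   /* hypothetical per-socket queue limit */

/* Per-socket queue of unprocessed packets: packets are demultiplexed
 * to their socket on arrival, but protocol processing is deferred
 * until the receiving process runs.                                   */
struct sockq {
    int pkts[SOCKQ_CAP];   /* stand-in for mbuf chains */
    size_t head, count;
    unsigned long drops;
};

/* Interrupt side, after early demultiplexing: enqueue, or drop at once
 * if this socket's queue is full. No protocol work is spent on a
 * packet that will be discarded.                                       */
bool sockq_enqueue(struct sockq *q, int pkt)
{
    assert(q != NULL);
    if (q->count == SOCKQ_CAP) {
        q->drops++;
        return false;
    }
    q->pkts[(q->head + q->count) % SOCKQ_CAP] = pkt;
    q->count++;
    return true;
}

/* Receiving-process side: take the oldest packet for protocol
 * processing, charged to that process. Returns false if empty.   */
bool sockq_dequeue(struct sockq *q, int *pkt)
{
    assert(q != NULL && pkt != NULL);
    if (q->count == 0)
        return false;
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % SOCKQ_CAP;
    q->count--;
    return true;
}
```

When six packets arrive for a socket whose process has not yet run, the first four are queued and the last two are dropped cheaply. Overload on one socket therefore costs little CPU, which is the property the slide credits with avoiding receive livelock.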


    Talk Outline

    • Introduction

    • Schedulers

    • Reservation File System (reservfs)

    • Tagging

    • Web Server Experiments

    • Access Control and Profiles

    • Eclipse/BSD Status

    • Related Work

    • Future Work


    QoS Support for Web Server

    • Virtual hosting with Apache server:

      • separate Apache server for each virtual host

      • single Apache server for all virtual hosts

    • Eclipse/BSD isolates and differentiates performance of virtual hosts

      • multiple Apache servers: implicit tagging

      • single Apache server: explicit tagging

        • We implemented an Apache module for explicit tagging


    Experimental Setup

    • Apache Web Server:

      • A multi-process server

      • (Pre)spawns helper processes

      • A process handles one request at a time

      • Each process calls accept() to service the next connection request

    • HTTP clients run on five different machines

    • Servers are running FreeBSD 2.2.8 or Eclipse/BSD 2.2.8 on a PC (266 MHz Pentium Pro, 64 MB RAM, 9 GB Seagate ST39173W fast wide SCSI disk)

    • Machines are connected with a 10/100 Mbps Ethernet switch


    Experiments

    • Hosting two sites with two servers

    [Figure: /reserv/cpu, /reserv/fxp0 and /reserv/da0 each contain queues q0, q1 and q2; the figure marks the reservation domain of server 1 and the reservation domain of server 2]








    Experiments

    • Hosting virtual hosts with a single Apache server

    • Four web sites

    [Figure: /reserv/cpu, /reserv/fxp0 and /reserv/da0 each contain queues q0 through q4]


    Apache Module for Tagging

    • Apache code not modified: module added

    • Apache config defines which reservation to use based on “a rule”, e.g.,

      • directory-based

      • port-based

    • Module uses fcntl() and chcpures() for explicit tagging
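The rule lookup at the heart of such a module can be sketched independently of Apache. The C code below is a hypothetical illustration: a request matches a rule either by URL directory prefix or by local port, and the first match names the queue directory to tag with. The rule table, field names, and select_reservation() are assumptions. A single reservation per rule is shown for brevity, although a real rule would likely name one per resource. The actual fcntl() (F_SET_RES) and chcpures() calls are omitted because the slides do not give their argument formats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule kinds, mirroring "directory-based" and "port-based". */
enum rule_kind { RULE_DIRECTORY, RULE_PORT };

struct tag_rule {
    enum rule_kind kind;
    const char *dir;        /* RULE_DIRECTORY: URL prefix, e.g. "/siteA/" */
    int port;               /* RULE_PORT: local port, e.g. 8080          */
    const char *reserv;     /* queue directory, e.g. "/reserv/fxp0/q1"   */
};

/* Return the reservation of the first matching rule, or NULL so the
 * caller keeps the process's default reservation.                   */
const char *select_reservation(const struct tag_rule *rules, size_t n,
                               const char *uri, int local_port)
{
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct tag_rule *r = &rules[i];
        if (r->kind == RULE_DIRECTORY && uri != NULL &&
            strncmp(uri, r->dir, strlen(r->dir)) == 0)
            return r->reserv;
        if (r->kind == RULE_PORT && r->port == local_port)
            return r->reserv;
    }
    return NULL;
}
```

With one directory rule for "/siteA/" and one port rule for 8080, a request for "/siteA/index.html" on port 80 selects the first rule's queue, a request for "/other" on port 8080 selects the second, and "/other" on port 80 falls back to the default. Returning NULL rather than a catch-all queue keeps unmatched requests on the reservations that implicit tagging already assigned.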


    Isolating Web Sites

    [Figure: results plot labeled Eclipse/BSD]



    Talk Outline

    • Introduction

    • Reservation File System (reservfs)

    • Tagging

    • Schedulers

    • Apache Web Server Experiments

    • Access Control and Profiles

    • Eclipse/BSD Status

    • Related Work

    • Future Work


    Access Control

    • Permissions of a process to use or modify the objects belonging to the reservfs

    • Currently, a process can use/modify reservations “below” its default list

    • Soon, Eclipse/BSD will have more sophisticated access control

      • process can have different permissions on a reservation (e.g., permission for tagging but not for modifying)

      • process can have permission on arbitrary set of reservations


    Multiple Default Lists: Profiles

    • Multiple default lists (profiles) simplify explicit tagging

    • Server applications typically serve different entities (depending on client, content, etc.) with different QoS assignments

    • Global list of system-wide profiles

    • Profiles provide an easy way to manage and share “default” reservations of different entities


    Talk Outline

    • Introduction

    • Reservation File System (reservfs)

    • Tagging

    • Schedulers

    • Apache Web Server Experiments

    • Access Control and Profiles

    • Eclipse/BSD Status

    • Related Work

    • Future Work


    Eclipse/BSD Status

    • Derived from FreeBSD

      • 3.2

      • 2.2.8

    • FreeBSD compatible

    • Eclipse/BSD code is available at http://www.bell-labs.com/project/eclipse, including:

      • reservfs

      • hierarchical network scheduling

      • hierarchical disk scheduling

      • hierarchical cpu scheduling

      • input scheduling

      • also, Apache module for tagging and other applications


    Related Work

    • ALTQ

      • good for routers

      • not sufficient for QoS support in a general-purpose OS

    • Resource Containers

      • different from Reservation Domains

      • limited (similar to our Profiles)

      • not flexible enough to specify a number of useful provisioning needs


    Future Work

    • QoS on cluster of servers

    • Support for fine-grained automatic tagging

    • More server applications

    • Supporting other QoS parameters

    • Other schedulers

