Maximizing Goodput via Co-scheduling of CPU and Network Capacity

Miron Livny

Computer Sciences Department

University of Wisconsin-Madison

[email protected]

(joint work with Jim Basney)



Allocated CPU hours per user (6/21/98 - 9/3/98)

400,000 CPU hours in 73 days on the 320 desktop machines of the UW-CS Condor pool (~17 hours per day per machine)
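
The per-machine figure follows from the totals quoted on this slide. A minimal check, using only those numbers (the function name is illustrative, not from the presentation):

```python
def hours_per_day_per_machine(cpu_hours, days, machines):
    """Average allocated CPU hours per machine per day."""
    return cpu_hours / (days * machines)

# Totals quoted on the slide: 400,000 CPU hours, 73 days, 320 machines.
avg = hours_per_day_per_machine(400_000, 73, 320)
print(f"{avg:.1f} hours per day per machine")
```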


Remote Execution Challenge

[Diagram: a Remote Resource (Memory, CPU, File System) connected over the Network to the Customer File System*, which holds the Executable, Checkpoint, Input Files, and Output Files.]

*May be distributed.


How Useful Is the Allocated Time?

[Timeline diagram of one allocation. Labels: Allocate, Preempt, X, Placement, Periodic Ckpt (twice), Preempt Ckpt, Remote I/O, Wait and See.]

Goodput = Allocation - Overhead


Goodput is the part of the allocation time during which the application makes forward progress.

Overhead = Placement + Migration + Periodic Checkpoints + Remote I/O + Wait and See
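
The two definitions above can be sketched as a small calculation. This is only an illustration: the class and field names, the use of seconds as the unit, and the clamp at zero are assumptions, and the sample numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class Overhead:
    """Overhead components named on the slide, in seconds (unit is an assumption)."""
    placement: float = 0.0
    migration: float = 0.0
    periodic_checkpoints: float = 0.0
    remote_io: float = 0.0
    wait_and_see: float = 0.0

    def total(self) -> float:
        return (self.placement + self.migration + self.periodic_checkpoints
                + self.remote_io + self.wait_and_see)

def goodput(allocation_s: float, overhead: Overhead) -> float:
    """Goodput = Allocation - Overhead, clamped at zero (the clamp is an assumption)."""
    return max(0.0, allocation_s - overhead.total())

# Hypothetical one-hour allocation with invented overhead values.
example = Overhead(placement=120, migration=90, periodic_checkpoints=60,
                   remote_io=200, wait_and_see=30)
print(goodput(3600, example))
```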


Placement

  • What: Transfer executable and checkpoint data

  • How much: Known in advance.

    • Executable: usually small

    • Checkpoint: application memory image

      • Can be large! (100MB+)

      • May include cached input data and intermediate file data

  • When: Triggered by Resource Manager when CPU is allocated


Migration

  • What: Transfer checkpoint data to a file system or a hot standby.

  • How much: Known in advance.

    • Workstation owner may limit time to migrate

    • Failure results in lost work

  • When: Initiated by workstation owner or triggered by Resource Manager to enforce priority order
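
A rough sketch of why the owner's time limit matters: if the checkpoint cannot be transferred within the limit, the migration fails and the work done since the last saved checkpoint is lost. The constant-bandwidth model and all names below are assumptions for illustration, not Condor's actual behavior.

```python
def migration_outcome(checkpoint_mb, bandwidth_mb_per_s, time_limit_s,
                      work_since_last_ckpt_s):
    """Model one migration attempt under an owner-imposed time limit.

    Assumes a constant transfer rate; real transfers share the network.
    """
    transfer_s = checkpoint_mb / bandwidth_mb_per_s
    if transfer_s <= time_limit_s:
        return {"succeeded": True, "transfer_s": transfer_s, "lost_work_s": 0.0}
    # Transfer is cut off at the limit; progress since the last checkpoint is lost.
    return {"succeeded": False, "transfer_s": float(time_limit_s),
            "lost_work_s": float(work_since_last_ckpt_s)}

# Hypothetical 100 MB checkpoint, 60 s limit, 30 minutes of unsaved work.
slow = migration_outcome(100, 1.0, 60, 1800)   # needs 100 s, limit is 60 s
fast = migration_outcome(100, 10.0, 60, 1800)  # needs 10 s
print(slow, fast)
```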


Remote I/O

  • What: Application Input/Output data

    • Read input files.

    • Write intermediate results.

    • Read intermediate results.

    • Write final results.

  • How much: The application may know, and may tell the system.

  • When: Initiated by the application's read and write system calls during the run.


Periodic Checkpoint

  • What: Transfer Checkpoint Data to file system.

  • How much: Known in advance.

  • When: Scheduled in advance by the shadow.

    • Reduces risk in case of a failed migration.

    • No deadline.

    • All remote resources are available.


Wait and See

  • What: Suspend application when resource is revoked

    • Wait and see whether the resource becomes available again shortly.

    • Shortens migration time limit.

    • Consumes local resources.

  • When: Initiated by owner activity

  • How long: Upper bound set by resource owner.
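
One way to picture the trade-off: the job stays suspended up to the owner's bound, and any time spent waiting is assumed here to come out of the time available for migration. That accounting, and every name in this sketch, is an assumption made for illustration.

```python
def wait_and_see(owner_busy_s, max_wait_s, migration_limit_s):
    """Decide what happens after the owner reclaims the machine.

    owner_busy_s is how long the owner stays active; it is unknown in
    practice and passed in here only so the outcome can be simulated.
    """
    if owner_busy_s <= max_wait_s:
        # The resource came back in time: resume in place, no transfer needed.
        return {"action": "resume", "suspended_s": owner_busy_s,
                "migration_budget_s": None}
    # The bound was reached: migrate with whatever time remains (assumed model).
    return {"action": "migrate", "suspended_s": max_wait_s,
            "migration_budget_s": max(0, migration_limit_s - max_wait_s)}

short_break = wait_and_see(owner_busy_s=30, max_wait_s=60, migration_limit_s=300)
long_session = wait_and_see(owner_busy_s=600, max_wait_s=60, migration_limit_s=300)
print(short_break, long_session)
```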


High Throughput Computing Layers

[Layer diagram, in the order listed on the slide:]

  • Application

  • Application Agent

  • Customer Agent

  • Environment Agent

  • Owner Agent

  • Local Resource Management

  • Resource


Who Does What in the Condor Environment?

  • Matchmaker

    • Initiates allocations

    • Preempts (re-matches) to transfer an allocation to a higher-priority customer.

  • Checkpoint Server(s)

    • Store checkpoints (may include data files).

  • File system (Unix, NFS, AFS)

    • Stores files.


Who Does What?

  • Shadow: Application Resource Manager

    • Application-level scheduling

    • Acts as a proxy for the application in the submit environment.

  • Owner Agent: Controls opportunistic resource

    • Owner may preempt application at any time.

    • Owner controls preemption policy.


Approaches for Maximizing Goodput

  • Co-matching (scheduling) of network, server, and CPU resources (matchmaker).

  • Support high-priority data transfers to/from checkpoint servers (checkpoint server).

  • Localized checkpointing (shadow).


… Approaches for Maximizing Goodput

  • Plan in advance for pre-scheduled events (external scheduler).

  • Reduce size of data to be transferred (checkpoint server and remote resource).

  • Monitor system goodput (all).


Challenges

  • Develop an effective model of the network and I/O capabilities of a Condor pool.

  • Obtain the information needed to build such a model.

  • Add co-matching of ClassAds to the matchmaking framework.

  • Develop a multi-resource, consumption-based priority scheme.


Matchmaker Co-matching

  • Problem: Bursty matchmaking causes network or server saturation

    • increases placement and checkpoint costs

    • slow placement results in underutilized CPUs

    • results in failed migrations

  • Approach: Don’t allow new matches to exceed predefined usage thresholds
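
The threshold idea can be sketched as a simple admission check: a new match is accepted only if its estimated network demand keeps total usage under a predefined fraction of capacity. This is an illustrative sketch, not the Condor matchmaker; the 0.8 threshold, the units, and all names are assumptions.

```python
def admit_matches(requests, capacity_mbps, threshold=0.8, in_use_mbps=0.0):
    """Admit matches in order while network usage stays under the threshold.

    requests: list of (name, estimated_demand_mbps) pairs.
    Returns (admitted_names, deferred_names).
    """
    limit = capacity_mbps * threshold
    admitted, deferred = [], []
    for name, demand in requests:
        if in_use_mbps + demand <= limit:
            admitted.append(name)
            in_use_mbps += demand
        else:
            # Deferring avoids saturating the network or checkpoint server.
            deferred.append(name)
    return admitted, deferred

# Hypothetical burst of four matches against a 100 Mbps link.
burst = [("a", 40), ("b", 30), ("c", 20), ("d", 5)]
admitted, deferred = admit_matches(burst, capacity_mbps=100)
print(admitted, deferred)
```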


… Matchmaker Co-matching

  • Application requests an allocation which provides the best possible goodput

    • large data and checkpoint files require high bandwidth to checkpoint server.

    • balance cost of application placement and checkpoint overheads with (estimated) allocation time.
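
A hedged sketch of the balance described above: for each candidate allocation, estimate the fraction of the allocation left after moving the data in (placement) and out (final checkpoint), then prefer the offer with the highest fraction. The cost model (data moved twice at a constant rate) and all names are assumptions.

```python
def expected_goodput_fraction(est_allocation_s, data_mb, bandwidth_mb_per_s):
    """Estimated share of the allocation spent on useful work."""
    # Assumed model: the data crosses the network once in and once out.
    overhead_s = 2 * data_mb / bandwidth_mb_per_s
    useful_s = max(0.0, est_allocation_s - overhead_s)
    return useful_s / est_allocation_s

def best_offer(offers, data_mb):
    """Pick the offer with the highest estimated goodput fraction."""
    return max(offers, key=lambda o: expected_goodput_fraction(
        o["est_allocation_s"], data_mb, o["bandwidth_mb_per_s"]))

# Hypothetical offers: A is long but far from the checkpoint server,
# B is shorter but has ten times the bandwidth.
offers = [
    {"name": "A", "est_allocation_s": 3600, "bandwidth_mb_per_s": 1.0},
    {"name": "B", "est_allocation_s": 1800, "bandwidth_mb_per_s": 10.0},
]
choice = best_offer(offers, data_mb=100)
print(choice["name"])
```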


… Matchmaker Co-matching

  • Best Fit vs. First Fit

    • Match lower priority requests with smaller network requirements first to increase cluster CPU utilization.

    • Preempt one of these requests when you match a high priority request with a large network requirement.
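
One possible reading of these two bullets, sketched under assumptions: spare network budget is filled with small requests first, and when a high-priority request with a large network requirement arrives, the lowest-priority running requests are preempted until it fits. The slide speaks of preempting one request; this sketch may preempt more. All names are hypothetical, and a larger `priority` value is assumed to mean higher priority.

```python
def fill_with_small_requests(waiting, free_net):
    """Match waiting requests, smallest network requirement first, while they fit."""
    matched = []
    for req in sorted(waiting, key=lambda r: r["net"]):
        if req["net"] <= free_net:
            matched.append(req)
            free_net -= req["net"]
    return matched, free_net

def match_high_priority(big, running, free_net):
    """Preempt lowest-priority running requests until `big` fits.

    Returns (fits, preempted_names).
    """
    preempted = []
    victims = sorted(running, key=lambda r: r["priority"])
    while free_net < big["net"] and victims:
        victim = victims.pop(0)
        preempted.append(victim["name"])
        free_net += victim["net"]
    return free_net >= big["net"], preempted

waiting = [
    {"name": "s1", "priority": 1, "net": 10},
    {"name": "s2", "priority": 2, "net": 20},
    {"name": "s3", "priority": 1, "net": 50},
]
running, free_net = fill_with_small_requests(waiting, free_net=40)
fits, preempted = match_high_priority({"name": "big", "priority": 9, "net": 20},
                                      running, free_net)
print([r["name"] for r in running], free_net, fits, preempted)
```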


Checkpoint Server Support

  • Prioritize data streams

    • high priority: migration streams

    • low priority: checkpoint read and periodic checkpoint write streams

  • Schedule periodic checkpoints in advance to avoid bursts of network traffic.

  • Schedule graceful shutdowns in advance to avoid vacate failures.
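
Stream prioritization could be sketched with a priority queue in which migration transfers are always served before checkpoint reads and periodic checkpoint writes. This is a minimal illustration, not the checkpoint server's implementation; the class, the stream-kind strings, and the first-in-first-out tie-breaking are assumptions.

```python
import heapq
import itertools

# Lower number is served first. Migration streams outrank the other two kinds.
PRIORITY = {"migration": 0, "checkpoint_read": 1, "periodic_checkpoint_write": 1}

class TransferQueue:
    """Serve pending transfers by stream priority, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, kind, job):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), kind, job))

    def next_transfer(self):
        if not self._heap:
            return None
        _, _, kind, job = heapq.heappop(self._heap)
        return kind, job

queue = TransferQueue()
queue.submit("periodic_checkpoint_write", "jobA")
queue.submit("checkpoint_read", "jobB")
queue.submit("migration", "jobC")
order = [queue.next_transfer() for _ in range(3)]
print(order)
```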


Shadow Support

  • Choose most efficient data access method per file

    • Locate checkpoint and file servers

  • Schedule periodic checkpoints in advance.


Minimize Data Size

  • compress checkpoints.

  • only checkpoint changes (diffs).

  • data staging.

  • checkpoint staging.

    • write checkpoint to local file system and schedule transfer when resources are available
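
The first two ideas, compression and checkpointing only the changes, can be illustrated with Python's standard `zlib` module and a block-level diff. The 4096-byte block size, the function names, and the in-memory byte strings standing in for checkpoint images are assumptions; this is not how Condor stores checkpoints.

```python
import zlib

BLOCK = 4096  # assumed block size for the diff

def compress_checkpoint(image: bytes) -> bytes:
    """Compress a checkpoint image before sending it over the network."""
    return zlib.compress(image)

def diff_checkpoint(old: bytes, new: bytes, block: int = BLOCK) -> dict:
    """Return {block_index: data} for the blocks of `new` that differ from `old`."""
    changed = {}
    for start in range(0, len(new), block):
        if new[start:start + block] != old[start:start + block]:
            changed[start // block] = new[start:start + block]
    return changed

def apply_diff(old: bytes, changed: dict, new_len: int, block: int = BLOCK) -> bytes:
    """Rebuild the new image from the old image plus the changed blocks."""
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for index, data in changed.items():
        out[index * block:index * block + len(data)] = data
    return bytes(out)

# Hypothetical 16 KB image in which a single byte changed between checkpoints.
old_image = bytes(16384)
modified = bytearray(old_image)
modified[5000] = 1
new_image = bytes(modified)

delta = diff_checkpoint(old_image, new_image)
print(len(delta), "changed block(s);",
      len(compress_checkpoint(new_image)), "bytes compressed")
```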


Goodput Measurements

  • Goodput/Allocation ratio measures health of the system

    • detect problem resources

    • detect overloaded subnets

    • measure QOS per application

  • Checkpoint transfer statistics measure network usage

    • success rate

    • throughput
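
A sketch of how the Goodput/Allocation ratio might be aggregated per machine or per subnet to flag problem resources. The record layout, the field names, and the 0.5 cut-off are assumptions chosen for illustration; the sample records are invented.

```python
from collections import defaultdict

def goodput_ratios(records, key):
    """Aggregate goodput / allocation per value of `key` (e.g. machine or subnet)."""
    allocated = defaultdict(float)
    useful = defaultdict(float)
    for rec in records:
        allocated[rec[key]] += rec["allocation_s"]
        useful[rec[key]] += rec["goodput_s"]
    return {k: useful[k] / allocated[k] for k in allocated if allocated[k] > 0}

def flag_unhealthy(ratios, threshold=0.5):
    """Names whose ratio falls below the (assumed) health threshold."""
    return sorted(k for k, ratio in ratios.items() if ratio < threshold)

records = [
    {"machine": "m1", "subnet": "a", "allocation_s": 100, "goodput_s": 90},
    {"machine": "m2", "subnet": "a", "allocation_s": 100, "goodput_s": 80},
    {"machine": "m3", "subnet": "b", "allocation_s": 100, "goodput_s": 20},
]
by_machine = goodput_ratios(records, "machine")
by_subnet = goodput_ratios(records, "subnet")
print(flag_unhealthy(by_machine), flag_unhealthy(by_subnet))
```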


Very Large Objects on the Network
