Planning on the grid

Planning on the Grid. With slides contributed by Ewa Deelman and Yolanda Gil. Thinking about applications of planning. You’ve seen Planning as X, X ∈ {SAT, CSP, ILP, …}. Now: Y as Planning, Y ∈ {Grid/Web services composition, …}.

Planning on the Grid

With slides contributed by

Ewa Deelman and Yolanda Gil


Thinking about applications of planning

You’ve seen Planning as X,

X ∈ {SAT, CSP, ILP, …}

Now: Y as Planning

Y ∈ {Grid/Web services composition, …}


Problem-solving on Grids

  • Users pool access to distributed resources (computers, instruments, data, …)

  • Applications are often composed of separate components run at several locations

  • Grid middleware tools, e.g. the Globus Toolkit, support job scheduling and resource discovery


The Computational Grid

  • Emerging computational and networking infrastructure

    • bring together compute resources, data storage systems, instruments, human resources

  • Enable entirely new approaches to applications and problem solving

    • remote resources the rule, not the exception

    • can solve ever bigger problems

  • Wide-area distributed computing

    • national and international

  • Facilitate collaborative environments

    • Sharing of data which can be expensive to produce (experimentation/simulation)


Example: LIGO Experiment (Laser Interferometer Gravitational-Wave Observatory)

  • Aims to detect gravitational waves predicted by the theory of relativity.

  • Can be used to detect

    • binary pulsars

    • mergers of black holes

    • “starquakes” in neutron stars

  • Two installations: in Louisiana (Livingston) and Washington State

    • Other projects: Virgo (Italy), GEO (Germany), Tama (Japan)

  • Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum.

  • Data collected during experiments is a collection of time series (multi-channel)

  • Analysis is performed in time and Fourier domains


LIGO’s Pulsar Search (Laser Interferometer Gravitational-Wave Observatory)

[Figure: pulsar search pipeline. Labels recovered from the diagram: archive; Interferometer; raw channels; Long time frames; Extract channel; Short time frames; 30 minutes; Short Fourier Transform; transpose; Single Frame; Time-frequency Image (axes: Hz, Time); Extract frequency range; Construct image; Find Candidate; event DB; Store]


Motivation: Using Today’s Grid

  • Users have high-level requirements naturally stated in terms of the application domain

    • Ex: Obtain frequency spectrum for signal S in instrument I and timeframe T

  • Users have to turn these requirements into executable job workflows in detailed scripts

    • Users must figure out which code generates desired products, which files contain it, physical location of the files, hosts that support execution given code requirements, availability of hosts, access policies, etc.

    • Users must query Grid middleware: metadata catalog, replica locator, resource descriptor and monitoring, etc.

  • Users must oversee execution


Problems with today’s Grid

  • Usability: users must be proficient in grid computing

  • Complexity: many interrelated choices and dead ends

  • Solution cost: finding any solution, regardless of cost, is already hard

  • Global cost: optimization is necessary when there is contention for resources

  • Reliability of execution: jobs must be resubmitted upon failure


Planning for workflow generation and maintenance

Outline:

  • Formalization as a planning problem

  • Integration with the grid middleware

  • Case study: planning for workflows in LIGO

  • The grid as a test bed for planning and scheduling research


[Figure: two stages of workflow generation. Labels: Abstract Workflow Generation; Concrete Workflow Generation]


Desiderata for workflow generator

  • Allow users to refer to data requirements by descriptions, not file names

    • Intuitive, requires far less input

  • Seek high-quality workflows according to a variable metric

  • Model a variety of constraints declaratively

    • Data dependencies, resource constraints, user access rights, …


Planning for workflow generation and maintenance

Outline:

  • Formalization as a planning problem

  • Integration with the grid middleware

  • Case study: planning for workflows in LIGO

  • The grid as a test bed for planning and scheduling research


Planning for workflow generation

  • Application components as operators

  • Desired data as goals

  • World state includes available hosts, existing data products, network bandwidths, …
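The mapping above (components as operators, desired data as goals, existing data in the world state) can be illustrated with a very small forward-search planner. This is only a sketch under simplifying assumptions: facts are plain strings, operators have no parameters, and the component names are hypothetical labels loosely based on the pulsar-search pipeline. It is not the planner used in the system described in these slides.

```python
from collections import deque
from typing import NamedTuple

class Operator(NamedTuple):
    name: str
    preconds: frozenset  # facts that must hold before the component runs
    adds: frozenset      # facts (data products) the component creates

def plan(initial, goals, operators):
    """Breadth-first forward search; returns a list of operator names or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goals <= state:
            return steps
        for op in operators:
            # Apply an operator only if it is applicable and adds something new.
            if op.preconds <= state and not op.adds <= state:
                nxt = state | op.adds
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None

# Hypothetical components, for illustration only.
OPS = [
    Operator("extract-channel", frozenset({"raw-frames"}), frozenset({"channel"})),
    Operator("short-fourier-transform", frozenset({"channel"}), frozenset({"sft"})),
    Operator("pulsar-search", frozenset({"sft"}), frozenset({"pulsar-result"})),
]

workflow = plan({"raw-frames"}, frozenset({"pulsar-result"}), OPS)
print(workflow)
```

If the initial state already contains an intermediate product such as `"sft"`, the search returns a shorter plan, which mirrors the reuse of existing data products.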


Existing tools for building workflows: abstract workflow generation

  • Chimera

    • Input-output transforms for files, in ‘Virtual Data Language’:

DV third1->pulsar(a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
    b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd"},
    t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
    fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234",
    fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");


Planning operator

(operator pulsar-search
  (preconds
    ((<start-time> 7143800)
     (<channel> LSC-AS-Q)
     (<fcenter> 0.5)
     (<right-ascension> 50)
     (<sample-rate> 20)
     …)
    (and
      (created "H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd")))
  (effects
    ()
    ((add
       (created "H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd")))))


Operator with metadata parameters

(operator pulsar-search
  (preconds
    ((<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band
                         <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band
                         <fcenter> <fband>)))
     …)
    (and
      (forall ((<sub-sft-file-group>
                 (and File-Group-Handle
                      (gen-sub-sft-range-for-pulsar-search
                        <f0> <fN> <start-time> <end-time>
                        <sub-sft-file-group>))))
        (and (sub-sft-group <start-time> <end-time>
                            <channel> <instrument> <format>
                            <f0> <fN> <sample-rate> <sub-sft-file-group>)
             (at <sub-sft-file-group> <host>)))))
  (effects
    ()
    ((add (created <file>))
     (add (pulsar <start-time> <end-time> <channel>
                  <instrument> <format>
                  <fcenter> <fband>
                  <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate>
                  <file>)))))


Operator with host identified

(operator pulsar-search
  (preconds
    ((<host> (or Condor-pool Mpi))
     (<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band
                         <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band
                         <fcenter> <fband>)))
     (<run-time> (and Number
                      (estimate-pulsar-search-run-time
                        <start-time> <end-time> <sample-rate>
                        <f0> <fN> <host> <run-time>)))
     …)
    (and (available pulsar-search <host>)
         (forall ((<sub-sft-file-group>
                    (and File-Group-Handle
                         (gen-sub-sft-range-for-pulsar-search
                           <f0> <fN> <start-time> <end-time>
                           <sub-sft-file-group>))))
           (and (sub-sft-group <start-time> <end-time>
                               <channel> <instrument> <format>
                               <f0> <fN> <sample-rate> <sub-sft-file-group>)
                (at <sub-sft-file-group> <host>)))))
  (effects
    ()
    ((add (created <file>))
     (add (at <file> <host>))
     (add (pulsar <start-time> <end-time> <channel>
                  <instrument> <format>
                  <fcenter> <fband>
                  <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate>
                  <file>)))))


Planning for workflow generation

  • Application components as operators

    • Parameters include host: plan is a concrete workflow

  • Desired data (in descriptive form) as goals

  • World state includes available hosts, existing data products, network bandwidths, …


Operator descriptions

  • Represent applying a given component at a particular location with fixed parameters, inputs and outputs.

  • Preconditions combine

    • data dependencies – derive input requirements from outputs

    • Task constraints – e.g. component must be run on an MPI machine
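How the two kinds of precondition combine can be sketched as a single applicability check. The classes, host names and facts below are hypothetical and chosen only for illustration; the host kinds `Mpi` and `Condor-pool` are taken from the operator shown earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Host:
    name: str
    kind: str             # e.g. "Mpi" or "Condor-pool"

@dataclass(frozen=True)
class Component:
    name: str
    inputs: tuple         # data products that must already be at the host
    allowed_kinds: tuple  # task constraint on the kind of host

def applicable(component, host, state):
    """True when both data dependencies and task constraints are met.

    `state` is a set of ("at", data, host_name) facts.
    """
    data_ok = all(("at", d, host.name) in state for d in component.inputs)
    task_ok = host.kind in component.allowed_kinds
    return data_ok and task_ok

# Hypothetical hosts and facts, for illustration only.
mpi = Host("cluster-a", "Mpi")
pool = Host("pool-b", "Condor-pool")
search = Component("pulsar-search", inputs=("sft-group",), allowed_kinds=("Mpi",))
state = {("at", "sft-group", "cluster-a"), ("at", "sft-group", "pool-b")}

print(applicable(search, mpi, state), applicable(search, pool, state))
```

Here the data dependency is satisfied on both hosts, but the task constraint rules out the Condor pool.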


Plan quality

Objective function may include

  • Performance – expected runtime, variance

  • Reliability – probability of failure, expected number of retries

  • Computational cost – use of ‘expensive’ resources, conformance to policies
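One simple way to combine such criteria is a weighted sum. The sketch below is an assumption for illustration: the slides do not say how the criteria are combined, and the field names, weights and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PlanEstimate:
    expected_runtime: float     # seconds
    failure_probability: float  # in [0, 1]
    resource_cost: float        # abstract units for 'expensive' resources

def plan_cost(est, w_runtime=1.0, w_failure=1000.0, w_cost=10.0):
    """Weighted-sum objective; lower is better. The weights are arbitrary."""
    return (w_runtime * est.expected_runtime
            + w_failure * est.failure_probability
            + w_cost * est.resource_cost)

# Two hypothetical candidate plans: one slower but reliable, one faster but risky.
reliable = PlanEstimate(expected_runtime=3600, failure_probability=0.01, resource_cost=5)
risky = PlanEstimate(expected_runtime=3000, failure_probability=0.5, resource_cost=20)

best = min([reliable, risky], key=plan_cost)
print(plan_cost(reliable), plan_cost(risky))
```

With these weights the slower plan wins because the faster one is penalized for its failure probability and resource use; different weights would change the ranking.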


Using local heuristics and global metrics

  • Need local heuristics since search space is intractable

    • e.g. prefer host for program with high-bandwidth connection to where the output is required

  • Need to test a global metric (e.g. overall runtime) since local heuristics can lead to globally poor solution

    • Create as many plans as possible, return best

    • Search control to eliminate redundant solutions
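The gap between a local heuristic and a global metric shows up even in a toy host-assignment problem. In this sketch the global metric is assumed to be the makespan when tasks on the same host run back to back; the task names, host names and runtimes are hypothetical. Exhaustive enumeration is feasible only for tiny instances, which is why search control matters in practice.

```python
from itertools import product

def makespan(assignment, runtimes):
    """Global metric: finish time when tasks on one host run back to back."""
    load = {}
    for task, host in assignment.items():
        load[host] = load.get(host, 0) + runtimes[task][host]
    return max(load.values())

def greedy(tasks, hosts, runtimes):
    """Local heuristic: each task independently picks its fastest host."""
    return {t: min(hosts, key=lambda h: runtimes[t][h]) for t in tasks}

def best_of_all(tasks, hosts, runtimes):
    """Enumerate every assignment and keep the one with the best global metric."""
    candidates = (dict(zip(tasks, choice))
                  for choice in product(hosts, repeat=len(tasks)))
    return min(candidates, key=lambda a: makespan(a, runtimes))

# Hypothetical runtimes in minutes.
tasks = ["t1", "t2"]
hosts = ["fast", "slow"]
runtimes = {"t1": {"fast": 10, "slow": 12}, "t2": {"fast": 10, "slow": 12}}

print(makespan(greedy(tasks, hosts, runtimes), runtimes))       # both tasks queue on "fast"
print(makespan(best_of_all(tasks, hosts, runtimes), runtimes))  # tasks are spread out
```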


Example search heuristics

(control-rule only-transfer-from-loc-with-greatest-bandwidth
  (if (and (current-ops (transfer-file))
           (current-goal (at <file> <dest>))
           (true-in-state (at <file> <loc1>))
           (true-in-state (at <file> <loc2>))
           (higher-bandwidth <loc1> <loc2> <dest>)))
  (then reject bindings ((<from-loc> . <loc2>))))

(control-rule prefer-mpi-to-condor-for-pulsar-search
  (if (and (current-ops (pulsar-search))
           (type-of <mpi> Mpi)
           (type-of <condor> Condor-pool)))
  (then prefer bindings ((<host> . <mpi>)) ((<host> . <condor>))))
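Read procedurally, the first rule discards any source location that has a higher-bandwidth alternative for the same destination, and the second orders candidate hosts so that MPI machines are tried first. A rough Python rendering of that reading follows; the function names, site names and bandwidth figures are hypothetical, and this is not how the planner itself evaluates control rules.

```python
def reject_dominated_sources(sources, dest, bandwidth):
    """Drop every source that has a strictly higher-bandwidth alternative."""
    best = max(bandwidth[(src, dest)] for src in sources)
    return [src for src in sources if bandwidth[(src, dest)] == best]

def prefer_mpi(hosts):
    """Order (name, kind) pairs so that Mpi hosts come before others."""
    return sorted(hosts, key=lambda h: h[1] != "Mpi")  # sorted() is stable

# Hypothetical locations and bandwidth figures (arbitrary units).
bandwidth = {("site-a", "site-c"): 100, ("site-b", "site-c"): 40}

print(reject_dominated_sources(["site-a", "site-b"], "site-c", bandwidth))
print(prefer_mpi([("pool-1", "Condor-pool"), ("cluster-1", "Mpi")]))
```

A reject rule prunes alternatives outright, while a prefer rule only changes the order in which they are explored, so the less preferred host remains reachable on backtracking.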


Planning for workflow generation and maintenance

Outline:

  • Formalization as a planning problem

  • Integration with the grid middleware

  • The grid as a test bed for planning and scheduling research


Generating the planning problem

  • Currently, static file representation for available hosts, bandwidths

  • Query grid services prior to planning to find which relevant files exist

    • Future versions will make dynamic queries

  • The goal is translated from the user request; the plan is translated into a DAG format suitable for the grid scheduler.
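Translating a plan into a DAG amounts to linking each step to the steps that produce its inputs. The sketch below assumes a plan is a list of (name, inputs, outputs) triples, which is a hypothetical representation, and uses Python's standard `graphlib` module (Python 3.9 or later) to recover a valid execution order. The actual DAG format expected by the grid scheduler is not shown in the slides.

```python
from graphlib import TopologicalSorter

def plan_to_dag(steps):
    """Map each step name to the set of steps that produce one of its inputs."""
    producer = {out: name for name, _, outputs in steps for out in outputs}
    return {name: {producer[i] for i in inputs if i in producer}
            for name, inputs, _ in steps}

# Hypothetical three-step plan; "raw@archive" already exists, so it has no producer.
steps = [
    ("transfer", ["raw@archive"], ["raw@host"]),
    ("sft", ["raw@host"], ["sft@host"]),
    ("search", ["sft@host"], ["result@host"]),
]

dag = plan_to_dag(steps)
order = list(TopologicalSorter(dag).static_order())
print(dag)
print(order)
```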


LIGO’s Pulsar Search at SC’02

  • Used LIGO’s data collected during the first scientific run of the instrument

  • Targeted a set of 1000 locations: known pulsars or random locations

  • Results of the analysis published to the LIGO Scientific Collaboration

  • Performed using LDAS and compute and storage resources at Caltech, University of Southern California, University of Wisconsin Milwaukee.


Summary: benefits of planning

  • Automating workflow composition

    • Just being addressed in Grid middleware

  • Reasoning with explicit descriptions of data

    • More intuitive for users

    • Far fewer inputs required than at file level

  • Better workflows by searching many plans


Planning for workflow generation and maintenance

Outline:

  • Existing Grid tools for workflow generation

  • Formalization as a planning problem

  • Integration with the grid middleware

  • The grid as a test bed for planning and scheduling research


Many areas of planning research relevant for grid

  • Planning for a dynamic environment: plan monitoring and repair, planning under uncertainty

  • Scheduling: resource reasoning, temporal reasoning

  • Plan quality: learning, acquiring preferences, local search planning

  • Planning for information gathering: integrating access to grid services with workflow creation

  • Domain modeling: handling multiple ontologies, acquiring metadata descriptions, acquiring operators


Fault-tolerant planning for a dynamic environment

  • Grid resources become unavailable, queue length & network bandwidth change

  • Exploring plan repair strategies, balance of work done off-line and on-line

  • Modeling failures, keeping statistics for creating plans more likely to succeed, conditional plans, ..


Fault-tolerant straw men

  • Current version: build fully detailed plan offline, resource allocation is fixed

    • Ignores world dynamics

  • Build abstract plan (without specifying hosts) offline, use a matchmaker online

    • Matchmaker makes local decisions only
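The second straw man can be sketched as a loop that binds each abstract task to whichever host looks best at the moment the task becomes ready. The snapshot structure, host names and queue lengths are hypothetical; real matchmakers use richer resource descriptions.

```python
def matchmake(abstract_plan, snapshots):
    """Bind each task to the least-loaded host visible when the task is ready.

    `snapshots[i]` maps host name -> queue length at the moment task i is ready;
    hosts missing from a snapshot are treated as unavailable.
    """
    concrete = []
    for task, queues in zip(abstract_plan, snapshots):
        host = min(queues, key=queues.get)  # purely local decision
        concrete.append((task, host))
    return concrete

abstract_plan = ["extract", "sft", "search"]
snapshots = [
    {"host-a": 0, "host-b": 3},
    {"host-a": 5, "host-b": 1},
    {"host-b": 2},  # host-a has become unavailable
]

bound = matchmake(abstract_plan, snapshots)
print(bound)
```

The loop adapts to the disappearance of `host-a`, but it never weighs the cost of moving data between hosts for consecutive tasks, which is the kind of global consideration a purely local matchmaker misses.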


Global reasoning is needed for resource allocation


Approaches for fault-tolerant planning in dynamic domains

  • RAX (Jonsson et al.) general framework. As implemented:

    offline: builds complete plan

    online: adjusts temporal intervals

  • Combining planning and scheduling

    offline: build several abstract plans

    online: reason about critical path to instantiate each plan

  • MDP/POMDP approaches

  • Open area..


Challenge: understanding when different approaches are more important

  • Hypotheses:

    • Uneven task distribution, in terms of computational and data expense and resource constraints, will indicate global planning

    • Time-dependency, e.g. need to re-plan during execution, will indicate local planning

  • Interesting project: use experiments in synthetic and real domains to test hypotheses and uncover new insights


Empirical tests with synthetic LIGO problems

  • Example: Problem requires 100 files on one machine. Vary the number that exist.
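A generator for this kind of synthetic problem could look like the following sketch. The file naming scheme, the random selection of existing files and the returned dictionary layout are all assumptions made for illustration.

```python
import random

def synthetic_problem(n_required=100, n_existing=0, seed=0):
    """Build a problem in which `n_existing` of the required files already exist."""
    rng = random.Random(seed)  # seeded so that experiments are repeatable
    required = [f"file-{i:03d}" for i in range(n_required)]
    existing = set(rng.sample(required, n_existing))
    to_create = [f for f in required if f not in existing]
    return {"required": required, "existing": existing, "to_create": to_create}

# Vary the number of files that already exist.
for k in (0, 25, 50, 100):
    problem = synthetic_problem(100, k)
    print(k, len(problem["to_create"]))
```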


Domain modeling

Current system:

Knowledge from several sources must be used

[Figure: architecture of the current system. Labels recovered from the diagram: Info from Grid services (RLS, MCS etc); State info (files, resources); existing data in files; task requirements; resource policies; User policies; available resources; Resource queues; Network bandwidth; Monolithic planner; KBs combined in one location; Comp. selector; Resource selector; Concrete tasks; Exec. monitor; Grid task schedulers]


Where does knowledge used by our planners come from?

(Operator …
  (preconditions ..)
  (effects ..))

[Figure: knowledge sources feeding the operator. Labels recovered from the diagram: task resource requirements; user policies & preferences; resource policies; data dependencies (VDL*)]

Each knowledge component is used for other purposes beyond planning


Automatically generated operators for several application domains

(Operator …
  (preconditions ..)
  (effects ..))

[Figure: knowledge sources feeding the operator. Labels recovered from the diagram: task resource requirements; policies; data dependencies (VDL*). Application domains listed: Digital sky survey, LIGO, GEO, Galaxy morphology, Tomography]

Investigating patterns of data descriptions for more efficient planning


Planning on the grid

  • Question: if operators are gathered from distributed services, can we still guarantee soundness and completeness?

  • Under what kinds of conditions?


Representing appropriate information units with metadata

  • E.g. Have 60,000 files, want to allocate 60 tasks each dealing with 1,000 files.

  • Previously, application components specified in terms of specific files:

    DV run59000->extractSFTData( input=[@{input:"nSFT.59000"},…,@{input:"nSFT.59999"}],
        output=[@{output:"eSFT.59000"},…,@{output:"eSFT.59999"}],
        t1="714384000", t2="714384063", freq="1008", band="4", instrument="H2");

    … 59 similar clauses…

    DV final->computeFStatistic( input=[@{input:"eSFT.00000"},…,@{input:"eSFT.59999"}],…);

[Figure labels: 1000 files; 60000 files]
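The arithmetic behind the example (60,000 files split into 60 tasks of 1,000 files each) can be sketched as a chunking function. The function name and the inclusive (first, last) index convention are assumptions for illustration.

```python
def chunk_ranges(total_files, files_per_task):
    """Return inclusive (first_index, last_index) pairs, one per task."""
    return [(start, min(start + files_per_task, total_files) - 1)
            for start in range(0, total_files, files_per_task)]

ranges = chunk_ranges(60_000, 1_000)
print(len(ranges), ranges[0], ranges[-1])
```

The last pair covers indices 59000 to 59999, matching the file names in the `run59000` clause above.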


Metadata representation

  • Replace with two clauses, two input predicates

    • A predicate now represents a range of files

    • Simpler to model, greater generality, more efficient for reasoner

      (operator run-extractSFTData-range
        (preconds
          ((<begin-file> Number)
           (<number-of-files> (and Number (> <number-of-files> 0)))
           (<local-begin-file> (and Number
                                    (gen-smaller-number <number-of-files> 1000 <begin-file>))))
          (and (range "eSFT" <begin-file> 2 1 <local-begin-file>)
               (range "nSFT" <local-begin-file> 2 1 999)))
        (effects ()
          ((add (range "eSFT" <begin-file> 2 <number-of-files>)))))


Requires library operators for ranges

  • E.g. if a range of files exists, then so does any subrange

  • Questions: what are the required operators? Similar to spatial calculus RCC-8?

    (operator subranges-exist
      (preconds
        ((<begin-file> Number)
         (<type> Object)
         (<number-of-files> (and Number (> <number-of-files> 0)))
         (<enclosing-begin> (and Number (gen-known-enclosing-begins <type> <begin-file>
                                          2 1 <number-of-files>)))
         (<enclosing-number-of-files>
           (and Number (gen-known-enclosing-number-of-files <type> <enclosing-begin>
                         2 1 <number-of-files>
                         <begin-file>))))
        (created-range <type> <enclosing-begin> 2 1 <enclosing-number-of-files>))
      (effects ()
        ((add (created-range <type> <begin-file> 2 1 <number-of-files>)))))
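The subrange inference can be sketched as an interval-containment test. Here a range is simplified to a (type, begin, count) triple; the extra numeric arguments that appear in the operators above are omitted because their meaning is not explained in the slides, so this is an assumption-laden illustration rather than a translation of the operator.

```python
def subrange_exists(known_ranges, kind, begin, count):
    """True if [begin, begin + count) lies inside some known range of the same kind."""
    return any(k == kind and b <= begin and begin + count <= b + n
               for k, b, n in known_ranges)

# Hypothetical known ranges: (type, first index, number of files).
known = [("eSFT", 0, 1000), ("nSFT", 1000, 500)]

print(subrange_exists(known, "eSFT", 200, 300))  # inside the known eSFT range
print(subrange_exists(known, "eSFT", 900, 200))  # runs past the end of it
```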


Conclusions

  • Implemented system takes data description requests from LIGO users, composes workflow and executes on the Grid

  • Planning and scheduling technologies can make a large contribution to Grid infrastructure

  • Many interesting challenges for planning and scheduling research from Grid applications

    http://www.isi.edu/ikcap/cognitive-grids

    http://www.isi.edu/~deelman/pegasus.htm


Koehler and Srivastava

  • Different approaches to specifying workflows by hand


WSDL service specification (no workflow specified)

<definitions targetNamespace="http://..."
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="OrderEvent"></message>
  <message name="TripRequest"></message>
  <message name="FlightRequest"></message>
  <message name="HotelRequest"></message>
  <message name="BookingFailure"></message>
  <portType name="pt1">
    <operation name="CToCI">
      <input message="TripRequest"/>
    </operation>
  </portType>
  <portType name="pt2">
    <operation name="CIToHS">
      <output message="HotelRequest"/>
    </operation>
  </portType>
  <portType name="pt3">
    <operation name="CIToFS">
      <output message="FlightRequest"/>
    </operation>
  </portType>
  ...
  <portType name="pt9">
    <operation name="RIToFS">
      <output message="BookingFailure"/>
    </operation>
  </portType>
</definitions>


BPEL4WS

<sequence>
  <receive partner="Customer"
           portType="pt1"
           operation="CToCI"
           container="OrderEvent">
  </receive>
  <flow>
    <invoke partner="HotelService"
            portType="pt2"
            operation="CIToHS"
            inputContainer="HotelRequest">
    </invoke>
    <invoke partner="FlightService"
            portType="pt3"
            operation="CIToFS"
            inputContainer="FlightRequest">
    </invoke>
  </flow>
  ...
</sequence>


Golog


Back-up slides


What is Needed

  • We need alternative foundations that offer

    • expressive representations

    • flexible reasoners

  • Many Artificial Intelligence (AI) techniques are relevant:

    • Planning to achieve given requirements

    • Searching through problem spaces of related choices

    • Using and combining heuristics

    • Expressive knowledge representation languages

    • Reasoners that can incorporate rules, definitions, axioms, etc.

    • Schedulers and resource allocation techniques


Existing tools for building workflows: abstract workflow generation

  • Chimera

    • Input-output transforms at level of actual files, in ‘Virtual Data Language’:

DV first1->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384000_64.gwf"},
    t1="714384000", t2="714384063", format="frame", channel="H2:LSC-AS-Q",
    instrument="H2");

DV first2->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384064_64.gwf"},
    t1="714384064", t2="714384127", format="frame", channel="H2:LSC-AS-Q",
    instrument="H2");

DV third1->pulsar(a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
    b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.123643_+2.56234.ilwd"},
    t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
    fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234",
    fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");



Existing tools 2: concrete planner

  • Assigns specific hosts and data locations for tasks

  • Makes random selection of resources and data

  • Provided a feasible solution

  • Reused existing data products



Sample Pulsar Search Results to Date

SC 2002 run:

  • Over 58 pulsar searches

  • Total of 330 tasks, 469 data transfers, 330 output files produced

  • Total runtime: 11:24:35

To date:

  • 185 pulsar searches

  • Total of 975 tasks, 1365 data transfers, 975 output files

  • Total runtime: 96:49:47

