
A Grid Approach to Geographically Distributed Data Analysis for Virgo

F. Barone, M. de Rosa, R. De Rosa, R. Esposito, P. Mastroserio, L. Milano, F. Taurino, G. Tortone

INFN Napoli, Università di Napoli “Federico II”, Università di Salerno

L. Brocco, S. Frasca, C. Palomba, F. Ricci

INFN Roma1, Università di Roma “La Sapienza”

GWADW 2002 – Isola d’Elba (Italy) – May 19-26 2002


Outline

  • scientific goals and requirements

  • basic concepts of GRID

  • what the Grid offers

  • layout of VIRGO Virtual Organisation

  • application to gravitational waves data analysis

  • conclusions


Scientific goals and requirements

  • the analyses of coalescing binaries and periodic sources need large computing power

    • ~ 300 Gflops for coalescing binaries search

    • ~ 1000 Gflops for periodic sources search

      computational grids make it possible to use computing resources available in different laboratories/institutions


GRID: a definition

GRID:

an infrastructure to allow the sharing and coordinated use of resources within large, dynamic and multi-institutional communities.


Basic resources of DataGrid Middleware

  • DataGrid is a European Community project (3 years) to develop Grid middleware and a testbed infrastructure on a European scale;

  • need to execute a program

    • Computing Element (CE)

  • need to access data

    • Storage Element (SE)

  • need to move data

    • network


Computing Element (CE)

GRID resource that provides CPU cycles

Examples:

  • clusters of PCs

  • supercomputers

  • ...


Storage Element (SE)

GRID resource that provides disk space to store files

Examples:

  • simple disk pool

  • large Mass Storage System

  • ...

    Data is accessible to all processes running on CEs via multiple protocols


Grid resource

  • A Grid resource provides a standard interface (protocol and API) that is common to that type of resource:

    • all CEs speak the same protocol (CE protocol), independently of the underlying batch system;

    • all SEs speak the same protocol (SE protocol), independently of the underlying Mass Storage System.
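The idea of one common interface in front of different local systems can be sketched in a few lines. This is only an illustration under stated assumptions: the class names, the `submit`/`enqueue` methods and the site names are hypothetical and are not the DataGrid CE protocol.

```python
from abc import ABC, abstractmethod


class BatchSystem(ABC):
    """Site-local scheduler hidden behind a CE (hypothetical interface)."""

    @abstractmethod
    def enqueue(self, executable: str) -> str:
        """Queue the executable and return a site-local job identifier."""


class PBSBatch(BatchSystem):
    """Stand-in for a site that schedules jobs with PBS."""

    def __init__(self) -> None:
        self._count = 0

    def enqueue(self, executable: str) -> str:
        self._count += 1
        return f"pbs-{self._count}"


class OtherBatch(BatchSystem):
    """Stand-in for a site that uses some other batch system."""

    def __init__(self) -> None:
        self._count = 0

    def enqueue(self, executable: str) -> str:
        self._count += 1
        return f"other-{self._count}"


class ComputingElement:
    """Grid-facing side: the same submit() call whatever runs underneath."""

    def __init__(self, name: str, batch: BatchSystem) -> None:
        self.name = name
        self._batch = batch

    def submit(self, executable: str) -> str:
        local_id = self._batch.enqueue(executable)
        return f"{self.name}/{local_id}"


# The caller uses one interface and never sees the batch system.
ces = [ComputingElement("ce-a", PBSBatch()), ComputingElement("ce-b", OtherBatch())]
job_ids = [ce.submit("/bin/hostname") for ce in ces]
```

The same pattern would apply to SEs, with a common read/write interface in front of a disk pool or a Mass Storage System.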


What the Grid offers

  • independence from execution location

    • the user doesn’t want to know where a job will run (which CE)

  • independence from data location

    • the user doesn’t want to know where the data are (which SE);

  • security

    • authentication, authorization;


Independence from execution location


Workload Management System

Resource Broker (RB): a Resource Broker tries to find a good match between the job requirements and preferences and the available resources, in particular CEs.

Job Submission Service (JSS): the Job Submission Service then guarantees reliable job submission and monitoring.


Scheduling criteria

  • authorization information

  • data availability

  • job requirements

  • job preferences

  • accounting
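A toy matchmaking function shows how such criteria could combine. It is a minimal sketch, not the DataGrid Resource Broker: the `CE` and `Job` fields, the ranking rule and all site names are assumptions made for illustration, and accounting is left out.

```python
from dataclasses import dataclass


@dataclass
class CE:
    name: str
    free_cpus: int
    allowed_vos: set
    close_ses: set  # names of SEs considered close to this CE


@dataclass
class Job:
    vo: str
    min_free_cpus: int = 1  # job requirement
    input_se: str = ""      # SE holding the input data, if any


def match(job, ces):
    """Return the best CE for the job, or None if nothing qualifies."""
    candidates = [
        ce for ce in ces
        if job.vo in ce.allowed_vos            # authorization information
        and ce.free_cpus >= job.min_free_cpus  # job requirements
    ]
    if not candidates:
        return None
    # Rank by data availability first, then by a preference for more free CPUs.
    return max(candidates, key=lambda ce: (job.input_se in ce.close_ses, ce.free_cpus))


# Hypothetical resources, loosely modelled on a three-site layout.
ces = [
    CE("site-a", 4, {"virgo"}, set()),
    CE("site-b", 2, {"virgo"}, set()),
    CE("site-c", 3, {"virgo"}, {"se-c"}),
    CE("site-d", 50, {"othervo"}, set()),
]
with_data = match(Job("virgo", 1, "se-c"), ces)
without_data = match(Job("virgo", 1), ces)
too_big = match(Job("virgo", 10), ces)
```

Filtering on hard constraints before ranking on preferences keeps the two kinds of criteria separate, which is the distinction the slide draws between job requirements and job preferences.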


Monitoring/Information System

  • The Resource Broker needs some information:

    • which resources are available?

    • what is their status?

  • The Resource Broker queries the Monitoring Information System to locate producers (CE, SE, ...) and then obtains data directly from the producers;


[Figure: status updates are “pushed” to the MIS; data are obtained from the CE]


Logging and bookkeeping

  • The LB service is a database of events concerning jobs and the other services of the Workload Management System (RB and JSS)

    • provides status info for jobs;

    • designed to be highly reliable and available;


Independence from data location


Replica Catalogue (RC)

  • With the Replica Catalogue, the same file (master) can exist in multiple copies (replicas)

    • LFN – Logical File Name: name for a set of replicas; example: lfn://virgo.org/virgofile-1.dat

    • PFN – Physical File Name: location of a replica; example: pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat

      it is up to the RB to translate an LFN into a PFN, locating the SE “close” to a CE
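The LFN-to-PFN translation can be pictured as a dictionary lookup followed by a choice among replicas. The sketch below assumes a catalogue held in memory and a table of close SEs per CE. Only the first LFN/PFN pair comes from the slide; the second replica, the CE names and the `resolve` function are hypothetical.

```python
from urllib.parse import urlparse

# Replica catalogue: one logical name mapped to every physical copy.
replica_catalogue = {
    "lfn://virgo.org/virgofile-1.dat": [
        "pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat",
        "pfn://se.example.org/virgo/virgofile-1.dat",  # hypothetical second replica
    ],
}

# Which SE hosts count as close to which CE (hypothetical table).
close_ses = {
    "ce-napoli": ["virgo-se.na.infn.it"],
    "ce-elsewhere": ["se.example.org"],
    "ce-isolated": [],
}


def resolve(lfn, ce_name):
    """Pick the replica on an SE close to the CE, else fall back to the first one."""
    replicas = replica_catalogue[lfn]
    for pfn in replicas:
        if urlparse(pfn).netloc in close_ses.get(ce_name, []):
            return pfn
    return replicas[0]


lfn = "lfn://virgo.org/virgofile-1.dat"
near_napoli = resolve(lfn, "ce-napoli")
near_elsewhere = resolve(lfn, "ce-elsewhere")
fallback = resolve(lfn, "ce-isolated")
```

The job itself refers only to the LFN, which is what makes it independent of the data location.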


GridFTP

  • GridFTP is an efficient data transfer protocol

  • Features:

    • GSI security;

    • multiple data channels for parallel transfers;

    • partial file transfers;

    • third-party (direct server-to-server) transfers;

    • interrupted transfer recovery;
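Two of the listed features, partial file transfers and multiple data channels, combine naturally: the file is cut into byte ranges and each range travels on its own channel. The sketch below only simulates this on an in-memory byte string with a thread pool. It does not use a GridFTP client, and `split_ranges`, `fetch` and `parallel_copy` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, channels):
    """Cut `size` bytes into up to `channels` contiguous (offset, length) ranges."""
    base, extra = divmod(size, channels)
    ranges = []
    offset = 0
    for i in range(channels):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def fetch(source, offset, length):
    """Stand-in for one partial transfer over one data channel."""
    return offset, source[offset:offset + length]


def parallel_copy(source, channels=4):
    """Copy `source` by fetching all ranges concurrently and reassembling them."""
    dest = bytearray(len(source))
    ranges = split_ranges(len(source), channels)
    with ThreadPoolExecutor(max_workers=channels) as pool:
        for offset, chunk in pool.map(lambda r: fetch(source, *r), ranges):
            dest[offset:offset + len(chunk)] = chunk
    return bytes(dest)


payload = b"0123456789"
copied = parallel_copy(payload, channels=4)
```

Because every chunk carries its own offset, a failed range can be fetched again on its own, which is the idea behind interrupted transfer recovery.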


[Figure: network bandwidth plot. Labels: “standard FTP” average bandwidth; GridFTP tests period; saturation of the lowest bandwidth; INFN Napoli – 34 Mbit/s; CNAF Bologna – 98 Mbit/s]



Layout of VIRGO Virtual Organisation

[Figure: sites of the VIRGO Virtual Organisation. INFN Roma1: Computing Element, Worker Nodes 1-2, User Interface. INFN Napoli: Computing Element, Worker Nodes 1-2, User Interface. CNAF-Bologna: Computing Element, Worker Nodes 1-3, Storage Element. Further labels in the diagram: E0 run, GARR, Storage Element (twice), Resource Broker, Information Index, Replica Catalogue.]


[Figure: job submission mechanism. Labels in the diagram: User Interface, Resource Broker, II, IS, Computing Element (three), Worker Nodes, Storage Element, PBS, OS.]

Job submission mechanism

  • The general scheme for distributed computation is the following:

    • multiple jobs are submitted from the Rome UI;

    • the Resource Broker interrogates the Information Index and submits each job to an available WN; the input data file is staged from the SE to the WN;

    • the output is sent back to the UI or published on the SE;

  • the Resource Broker automatically distributes the jobs among the nodes (according to the specifications in the JDL file) unless we decide to tie a given job to a particular node;

  • job scheduling at the node level is done via PBS.


Grid tests for coalescing binaries search 1/2

  • Algorithm: standard matched filters

  • Templates generated at PN order 2 with Taylor approximants

  • Data

    • VIRGO E0 run

    • start GPS time: 685112730

    • data length: 600 s

  •  Conditions

    • raw data resampled at 2 kHz

    • lower frequency: 60 Hz

    • upper frequency: 1 kHz

    • search space: 2 – 10 solar masses

    • minimal match: 0.97

  • number of templates: ~ 40000
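For readers unfamiliar with matched filtering, the core operation is a correlation of the data against a normalised template at every time offset. The sketch below is deliberately simplified and is not the search code used in the tests: it works in the time domain, assumes white noise, and uses a toy linear chirp instead of a PN order 2 Taylor approximant. All function names and numerical values other than the 2 kHz sampling rate are illustrative.

```python
import math
import random


def normalise(template):
    """Scale the template to unit norm so filter outputs are comparable."""
    norm = math.sqrt(sum(x * x for x in template))
    return [x / norm for x in template]


def matched_filter(data, template):
    """Correlate a unit-norm template against the data at every time offset."""
    t = normalise(template)
    n = len(t)
    return [
        sum(data[i + k] * t[k] for k in range(n))
        for i in range(len(data) - n + 1)
    ]


def toy_chirp(n, f0, f1, fs):
    """Sinusoid whose frequency rises linearly from f0 to f1 (illustrative only)."""
    duration = n / fs
    samples = []
    for k in range(n):
        t = k / fs
        phase = 2.0 * math.pi * (f0 * t + 0.5 * (f1 - f0) * t * t / duration)
        samples.append(math.sin(phase))
    return samples


random.seed(0)
fs = 2000.0  # Hz, the resampling rate quoted on the slide
template = toy_chirp(200, 60.0, 400.0, fs)

# Unit-variance white noise with the template injected at a known offset.
data = [random.gauss(0.0, 1.0) for _ in range(1000)]
inject_at = 300
for k, x in enumerate(template):
    data[inject_at + k] += 3.0 * x

output = matched_filter(data, template)
best_offset = max(range(len(output)), key=lambda i: abs(output[i]))
```

A production search would typically do the correlation with FFTs and weight it by the detector noise spectrum, repeating it for every template in the bank, which is where the computing cost comes from.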


Grid tests for coalescing binaries search 2/2

  • Step 1

    The data were extracted from the CNAF-Bologna Mass Storage System. The extraction process reads the VIRGO standard frame format, performs a simple resampling and publishes the selected data file on the Storage Element.

  • Step 2

    The search was performed by dividing the template space into 200 subspaces and submitting one job per template subspace from the Napoli User Interface. Each job reads the selected data file from the Storage Element (located at CNAF-Bologna) and runs on the Worker Nodes selected by the Resource Broker in the VIRGO VO. Finally, the output data of each job were retrieved from the Napoli User Interface.
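Step 2 amounts to cutting the template bank into 200 index ranges and writing one job description per range. The sketch below shows one way this could be scripted, assuming a bank of exactly 40000 templates (the slides quote ~ 40000). The executable name `cb_search`, its command-line options and the output file names are hypothetical, and the attribute names only follow the general `Name = value;` shape of a JDL file, so they should be checked against the DataGrid release in use.

```python
def split_templates(n_templates, n_jobs):
    """Return inclusive (first, last) template index ranges, one per job."""
    base, extra = divmod(n_templates, n_jobs)
    ranges = []
    start = 0
    for j in range(n_jobs):
        count = base + (1 if j < extra else 0)
        ranges.append((start, start + count - 1))
        start += count
    return ranges


def make_jdl(job_index, first, last, executable="cb_search"):
    """Build an illustrative job description for one template subspace."""
    out = f"cb_{job_index:03d}.out"
    err = f"cb_{job_index:03d}.err"
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "--first {first} --last {last}";',
        f'StdOutput = "{out}";',
        f'StdError = "{err}";',
        f'OutputSandbox = {{"{out}", "{err}"}};',
    ]
    return "\n".join(lines)


ranges = split_templates(40000, 200)
jdls = [make_jdl(j, first, last) for j, (first, last) in enumerate(ranges)]
```

With these numbers each job handles 200 templates, and since the jobs share only the input data file they can run on whichever Worker Nodes the Resource Broker selects.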


Grid tests for periodic sources search

The analysis for the periodic sources search is based on a hierarchical approach in which coherent steps, based on FFTs, and incoherent ones, based on the Hough Transform, alternate. At each iteration a more refined analysis is done on the selected candidates.

This procedure fits very well in a geographically distributed computational scheme.

The whole problem can be divided in a number of independent smaller tasks, each performed by a given computational node. E.g. each node can analyze a frequency band and/or a portion of the sky.
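The decomposition into independent tasks can be written as a product of frequency sub-bands and sky patches. This is a minimal sketch: the number of sub-bands, the sky patch labels and the function names are assumptions for illustration, and the 480-500 Hz band is used only as an example range.

```python
from itertools import product


def frequency_bands(f_min, f_max, n_bands):
    """Split [f_min, f_max] into n_bands equal, contiguous sub-bands (Hz)."""
    width = (f_max - f_min) / n_bands
    return [(f_min + i * width, f_min + (i + 1) * width) for i in range(n_bands)]


def make_tasks(f_min, f_max, n_bands, sky_patches):
    """One independent task per (frequency sub-band, sky patch) pair."""
    bands = frequency_bands(f_min, f_max, n_bands)
    return [{"band": band, "sky": sky} for band, sky in product(bands, sky_patches)]


# Hypothetical decomposition: 4 sub-bands and 2 sky patches give 8 tasks.
tasks = make_tasks(480.0, 500.0, 4, ["patch-0", "patch-1"])
```

Each task dictionary carries everything a node needs to know about its share of the work, so tasks can be handed out in any order.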

We have performed some preliminary tests to evaluate the DataGrid software with respect to our analysis problem.

For the GRID tests we have used the code for the Hough Transform. The source spin-down is not taken into account. The input of the code is given by a “peak map” in the time-frequency plane.


Grid tests for periodic sources search 1/2

The tests consist of two phases:

  • Production of input data on the SE;

  • Distributed computation.

  • We start from the raw data of engineering run E1 (~ 5 hours) and the steps are the following:

    • channel extraction;

    • decimation at 1 kHz;

    • generation of periodograms by computing interlaced and windowed FFTs (T_FFT = 4194.304 s);

    • peak selection (above two times the average noise);

      The resulting time-frequency peak map covers 20 Hz in frequency (from 480 to 500 Hz).
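The numbers quoted above fix the size of the peak map, and the selection rule is a simple threshold. The sketch below works this out under two assumptions that are not in the slides: the average noise is estimated as the plain mean of the periodogram, and every bin above threshold counts as a peak (no local-maximum test). The example periodogram values are made up.

```python
def select_peaks(periodogram, threshold_factor=2.0):
    """Return indices of bins whose power exceeds threshold_factor times the mean."""
    mean_power = sum(periodogram) / len(periodogram)
    return [i for i, p in enumerate(periodogram) if p > threshold_factor * mean_power]


T_FFT = 4194.304      # s, FFT duration from the slide
SAMPLE_RATE = 1000.0  # Hz, after decimation

samples_per_fft = round(T_FFT * SAMPLE_RATE)  # number of samples in one FFT
bin_width = 1.0 / T_FFT                       # Hz, frequency resolution
bins_in_band = round(20.0 / bin_width)        # bins covering 480-500 Hz

# Toy periodogram with a mean power of 2.0, so the threshold is 4.0.
toy = [1.0, 1.2, 0.8, 9.0, 1.1, 0.9, 1.0, 1.0]
peaks = select_peaks(toy)
```

The FFT length comes out as 2**22 samples, a power of two, and the 20 Hz band spans roughly 84 thousand frequency bins per periodogram.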


Grid tests for periodic sources search 2/2

  • Each computing node processes a subset of the whole frequency band. Each job runs according to this scheme:

    • reads its initial reference frequency and the velocity vector direction;

    • migrates on a worker node;

    • takes from the SE the input data corresponding to the frequency band associated with that job;

    • calculates the current frequency band of interest, i.e. the Doppler band;

    • calculates the Hough Transform;

    • iterates on the reference frequency until the full band has been processed.

  • The output of each job would be a set of candidates which will be followed in the next coherent phase.
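The Doppler band step can be made concrete. A source emitting at f0 is seen from the moving detector at a frequency shifted by at most f0·v/c, where v/c is about 1e-4 for the Earth's orbital motion. The sketch below uses that maximum shift as the band half-width, which is an assumption: the slides do not give the formula, and a real job would also use the velocity vector direction. The Hough Transform itself is not implemented; counting the peaks that fall inside the band stands in for it. Function names, step size and peak frequencies are hypothetical.

```python
EARTH_ORBITAL_SPEED = 29.8e3    # m/s, approximate
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_band(f0, beta=EARTH_ORBITAL_SPEED / SPEED_OF_LIGHT):
    """Frequency interval in which a source emitting at f0 can be observed."""
    return f0 * (1.0 - beta), f0 * (1.0 + beta)


def run_job(band_start, band_stop, step, peak_freqs):
    """Iterate on the reference frequency until the job's band is processed."""
    n_steps = int(round((band_stop - band_start) / step))
    results = []
    for i in range(n_steps):
        f0 = band_start + i * step
        low, high = doppler_band(f0)
        selected = [f for f in peak_freqs if low <= f <= high]
        # A real job would compute the Hough Transform of `selected` here.
        results.append((f0, len(selected)))
    return results


low, high = doppler_band(500.0)
half_width = (high - low) / 2.0  # about 0.05 Hz at 500 Hz
results = run_job(480.0, 481.0, 0.5, [480.01, 480.49, 480.52, 499.0])
```

Since the Doppler band is narrow compared with the band assigned to a job, each iteration needs only a small slice of the peak map taken from the SE.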


Conclusions

  • we have successfully verified that multiple jobs can be submitted and the output retrieved with small overhead time;

  • computational grids seem very suitable for performing data analysis for coalescing binaries and periodic sources searches;

  • Future plans

    • testing MPI-job submission for coalescing binaries search (feature provided in next DataGrid release);

    • testing the whole data analysis chain for periodic sources search;

    • first tests for network analysis among interferometers;

