
Potential Data Access Architectures using xrootd

OSG All Hands Meeting

Harvard University

March 7-11, 2011

Andrew Hanushevsky, SLAC

http://xrootd.org



Goals

  • Describe xrootd

    • What it is and what it is not

  • The architecture

    • The clustering model

  • Data access modes

    • How they relate to the xrootd architecture

  • Conclusion



What Is xrootd?

  • A file access and data transfer protocol

    • Defines POSIX-style byte-level random access for

      • Arbitrary data organized as files of any type

      • Identified by a hierarchical directory-like name

  • A reference software implementation

    • Embodied as the xrootd and cmsd daemons

      • The xrootd daemon provides access to data

      • The cmsd daemon clusters xrootd daemons together

    • Attempts to brand the software as Scalla have failed



What Isn’t xrootd?

  • It is not a POSIX file system

    • There is a FUSE implementation called xrootdFS

      • An xrootd client simulating a mountable file system

        • It does not provide full POSIX file system semantics

  • It is not a Storage Resource Manager (SRM)

    • Provides SRM functionality via BeStMan

  • It is not aware of any file internals (e.g., root files)

    • But is distributed with root and proof frameworks

      • As it provides unique & efficient file access primitives



Primary xrootd Access Modes

  • The root framework

    • Used by most HEP and many Astro experiments (MacOS, Unix and Windows)

  • POSIX preload library

    • Any POSIX compliant application (Unix only, no recompilation needed)

  • File system in User SpacE

    • A mounted xrootd data access system via FUSE (Linux and MacOS only)

  • SRM, globus-url-copy, gridFTP, etc

    • General grid access (Unix only)

  • xrdcp

    • The parallel stream, multi-source copy command (MacOS, Unix and Windows)

  • xrd

    • The command line interface for meta-data operations (MacOS, Unix and Windows)
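All of the access modes above address data through xroot URLs of the form root://host[:port]//path, where the path after the double slash is the hierarchical file name. A minimal illustrative sketch of how such a URL breaks down; the host name is hypothetical, and 1094 is assumed here as the conventional xrootd default port:

```python
# Sketch: splitting an xroot URL into its parts.
# "cluster.example.org" is a hypothetical host name.
from urllib.parse import urlparse

def parse_xroot_url(url):
    """Return (host, port, path) for a root://host[:port]//path URL."""
    u = urlparse(url)
    # The absolute file path follows a double slash; drop the extra "/".
    path = u.path[1:] if u.path.startswith("//") else u.path
    return u.hostname, u.port or 1094, path

print(parse_xroot_url("root://cluster.example.org//my/file"))
```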



What Makes xrootd Unusual?

  • A comprehensive plug-in architecture

    • Security, storage back-ends (e.g., tape), proxies, etc

  • Clusters widely disparate file systems

    • Practically any existing file system

      • Distributed (shared-everything) to JBODs (shared-nothing)

    • Unified view at local, regional, and global levels

  • Very low support requirements

    • Hardware and human administration


The Plug-In Architecture

  • Protocol Driver (XRD)
  • Protocol (1 of n) (xroot, proof, etc)
  • Authentication (gsi, krb5, etc)
  • Authorization (dbms, voms, etc)
  • lfn2pfn prefix encoding
  • Logical File System (ofs, sfs, alice, etc)
  • Physical Storage System (ufs, hdfs, hpss, etc)
  • Clustering (cmsd)

Replaceable plug-ins accommodate any environment. Let’s take a closer look at xrootd-style clustering.



Clustering

  • xrootd servers can be clustered

    • Increase access points and reliability

  • Uses highly effective clustering algorithms

    • Cluster overhead (human & non-human) scales linearly

    • Cluster size is not limited

    • I/O performance is not affected

  • Always pairs xrootd & cmsd servers

    • Symmetric cookie-cutter arrangement

      • Allows for a single configuration file
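The single shared configuration file works because every daemon in the cluster reads the same file and selects its role conditionally. A minimal sketch of what this can look like, using the xrootd configuration convention; the host name, export path, and port shown are hypothetical:

```
# One file shared by every xrootd/cmsd pair in the cluster.
# redirector.example.org and /data are hypothetical examples.
all.export /data
all.manager redirector.example.org:3121

if redirector.example.org
   all.role manager
else
   all.role server
fi
```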



A Simple xrootd Cluster

[Diagram: a Manager (a.k.a. Redirector) xrootd/cmsd pair fronting Data Servers A, B, and C; /my/file resides on servers A and C]

  1. Client sends open("/my/file") to the manager
  2. Manager asks its data servers: who has "/my/file"?
  3. Servers A and C answer: I DO!
  4. Manager tells the client: try open() at A
  5. Client sends open("/my/file") to server A



Recapping The Fundamentals

  • An xrootd-cmsd pair is the building block

    • xrootd provides the client interface

      • Handles data and redirections

    • cmsd manages xrootd servers (i.e., forms clusters)

      • Monitors activity and handles file discovery

  • Building blocks are stackable & replicable

    • Can create a wide variety of configurations

      • Much like you would do with LEGO® blocks

    • Extensive plug-ins provide adaptability



Exploiting Stackability

[Diagram: a Meta-Manager (a.k.a. Global Redirector) federates three distinct sites — ANL, SLAC, and UTA — each with its own Manager (a.k.a. Local Redirector) and data servers A, B, and C; /my/file is present at more than one site, so data is uniformly available]

  1. Client sends open("/my/file") to the meta-manager
  2. Meta-manager asks the site managers: who has "/my/file"?
  3. Each site manager asks its own data servers: who has "/my/file"?
  4. Managers whose servers hold the file answer: I DO!
  5. Meta-manager tells the client: try open() at ANL
  6. Client sends open("/my/file") to the ANL manager
  7. ANL manager tells the client: try open() at A
  8. Client sends open("/my/file") to server A

Because each level of managers queries all of its subscribers in parallel, this is an exponentially parallel search (i.e., O(2^n)).



Federated Distributed Clusters

  • Unites multiple site-specific data repositories

    • Each site enforces its own access rules

      • Usable even in the presence of firewalls

  • Scalability increases as more sites join

    • Essentially a real-time bit-torrent social model

      • Federations are fluid and changeable in real time

        • Provide multiple data sources to achieve high transfer rates

  • Increased opportunities for data analysis

    • Based on what is actually available
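The two-level lookup that makes federation scale can be sketched in Python: the meta-manager queries each site's manager, and each manager queries its own servers, so the number of nodes probed grows multiplicatively with each level. An illustrative simulation; the site and server layout is hypothetical:

```python
# Sketch of the federated, two-level "who has" search.

def site_lookup(site_servers, path):
    """One site's manager asks its data servers for the file."""
    return [s for s, files in site_servers.items() if path in files]

def federated_lookup(federation, path):
    """Meta-manager view: {site: [servers holding path]}, hits only."""
    hits = {site: site_lookup(servers, path)
            for site, servers in federation.items()}
    return {site: servers for site, servers in hits.items() if servers}

federation = {
    "ANL":  {"A": {"/my/file"}, "B": set()},
    "SLAC": {"A": set(), "B": {"/my/file"}},
    "UTA":  {"C": set()},
}
print(federated_lookup(federation, "/my/file"))
```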



What Federated Clusters Foster

  • Resilient analysis

    • Fetch the “last” missing file at run-time

      • Copy only when necessary

  • Adaptable analysis

    • Cache files where they are needed

      • Copy whatever analysis demands

  • Storage-starved analysis

    • Real-time access to data across multiple sites

      • Deliver to wherever the compute cycles are

(Copy, Cached, and Direct Data Access Architectures, respectively)



Copy Data Access Architecture

  • The built-in File Residency Manager drives

    • Copy On Fault

      • Demand driven (fetch to restore missing file)

    • Copy On Request

      • Pre-driven (fetch files to be used for analysis)
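The two File Residency Manager modes can be sketched as follows. This is an illustrative simulation of the policy only, with a dictionary standing in for the remote federation and a plain copy standing in for an xrdcp transfer; all names are hypothetical:

```python
# Sketch of the two FRM copy modes: copy-on-fault (demand driven) and
# copy-on-request (pre-driven staging before analysis).

def fetch_from_federation(path, remote, local):
    local[path] = remote[path]          # stand-in for an xrdcp transfer

def open_with_copy_on_fault(path, local, remote):
    """Copy On Fault: fetch only when a local open() would miss."""
    if path not in local:               # fault: file missing locally
        fetch_from_federation(path, remote, local)
    return local[path]

def copy_on_request(paths, local, remote):
    """Copy On Request: pre-stage the files an analysis will use."""
    for path in paths:
        if path not in local:
            fetch_from_federation(path, remote, local)
```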

[Diagram: the client runs xrdcp –x xroot://mm.org//my/file /my; the Meta-Manager (a.k.a. Global Redirector) locates /my/file at the ANL, SLAC, and UTA sites, each behind its own Manager (a.k.a. Local Redirector), and xrdcp copies the data using two sources]


Direct Data Access Architecture

  • Use servers as if all of them were local

    • Normal and easiest way of doing this

    • Latency may be an issue (depends on algorithms & CPU-I/O ratio)

      • Requires a cost-benefit analysis to see if it is acceptable
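A back-of-envelope version of that cost-benefit analysis: with direct remote access, each uncached read pays roughly one network round trip, so the penalty depends on the CPU-to-I/O ratio of the job. All numbers below are illustrative assumptions, not measurements:

```python
# Sketch: total job time if every read pays one network round trip.

def run_time(cpu_seconds, n_reads, rtt_seconds):
    return cpu_seconds + n_reads * rtt_seconds

# Hypothetical job: 1 hour of CPU, 100k reads; LAN vs WAN latency.
local  = run_time(cpu_seconds=3600, n_reads=100_000, rtt_seconds=0.0002)
remote = run_time(cpu_seconds=3600, n_reads=100_000, rtt_seconds=0.030)
print(f"local: {local:.0f}s  remote: {remote:.0f}s")
# A CPU-bound job tolerates the extra latency; an I/O-bound one may not.
```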

[Diagram: the client sends open("/my/file") to the Meta-Manager (a.k.a. Global Redirector), is redirected to a site Manager (a.k.a. Local Redirector) at ANL, SLAC, or UTA, and then reads the file directly from the data server that holds it]



Cached Data Access Architecture

  • Front servers with a caching proxy server

    • Clients access the proxy server for all data

      • Server can be central or local to client (i.e. laptop)

    • Data comes from proxy’s cache or other servers
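The proxy flow above reduces to a simple cache-or-fetch rule. An illustrative sketch, with a dictionary standing in for the remote federation; it is not the xrootd proxy plug-in itself:

```python
# Sketch of a caching proxy: serve from cache, else fetch and cache.

class CachingProxy:
    def __init__(self, origin):
        self.origin = origin    # stand-in for the remote federation
        self.cache = {}
        self.fetches = 0        # counts remote fetches (cache misses)

    def read(self, path):
        if path not in self.cache:          # miss: fetch and cache
            self.fetches += 1
            self.cache[path] = self.origin[path]
        return self.cache[path]             # hit: served locally
```

Repeated reads of the same file hit the cache, so only the first access pays the wide-area cost.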

[Diagram: the client sends open("/my/file") to its caching proxy server; the proxy serves cached data directly or locates the file through the Meta-Manager (a.k.a. Global Redirector) across the ANL, SLAC, and UTA sites]



Conclusion

  • The xrootd architecture promotes efficiency

    • Can federate almost any file system

    • Gives a uniform view of massive amounts of data

      • Assuming per-experiment common logical namespace

    • Secure and firewall friendly

    • Ideal platform for adaptive caching systems

    • Completely open source under a BSD license

  • See more at http://xrootd.org/



Acknowledgements

  • Current Software Contributors

    • ATLAS: Doug Benjamin

    • CERN: Fabrizio Furano, Lukasz Janyst, Andreas Peters, David Smith

    • Fermi/GLAST: Tony Johnson

    • FZK: Artem Trunov

    • LBNL: Alex Sim, Junmin Gu, Vijaya Natarajan (BeStMan team)

    • Root: Gerri Ganis, Bertrand Bellenet, Fons Rademakers

    • OSG: Tim Cartwright, Tanya Levshina

    • SLAC: Andrew Hanushevsky, Wilko Kroeger, Daniel Wang, Wei Yang

    • UNL: Brian Bockelman

    • UoC: Charles Waldman

  • Operational Collaborators

    • ANL, BNL, CERN, FZK, IN2P3, SLAC, UTA, UoC, UNL, UVIC, UWisc

  • US Department of Energy

    • Contract DE-AC02-76SF00515 with Stanford University
