
Overview of XtreemOS

Christine Morin

XtreemOS scientific coordinator

xtreemos-projectleader@irisa.fr

Phenix Workshop, Rennes

December 07, 2006

XtreemOS IP project is funded by the European Commission under contract IST-FP6-033576

Grid Environment & VO
[Figure labels: VO1, WAN, VO2]
  • Multiple users from different institutions
  • Multiple geographically distributed resources in different administrative domains
  • Large scale
    • Uncountable number of resources
  • Dynamicity
    • VO, users, resources

Overview of XtreemOS - Phenix Workshop, December 7, 2006

State of the Art
  • Current OS are not Grid-aware & not VO-aware
  • A variety of Grid middleware & Toolkits for Grid Computing
      • Resource management
      • Changing interfaces
      • Security pitfalls
      • Complexity for users, programmers & administrators

XtreemOS Objectives
  • Design & implement a reference open source Grid operating system based on Linux
    • Native support for virtual organizations
  • Validate the XtreemOS Grid OS with a set of real use cases on a large Grid testbed
  • Promote XtreemOS software in the Linux community and create communities of users and developers

XtreemOS Research Challenges
  • Identify fundamental functionalities to be embedded in Linux for secure application execution in Grids
  • Build a set of scalable, self-healing OS services for secure resource management in very large dynamic grids
  • Provide a simple Grid API compliant with POSIX while adding new functionality and supporting Grid-aware applications
  • Aggregate cluster resources into powerful grid nodes by integrating single system image mechanisms in Linux
  • Build an XtreemOS flavour for mobile devices enabling ubiquitous access to grid resources

XtreemOS Flavours
[Figure: XtreemOS flavours for a PC, a cluster (federation of PCs) and a mobile device (PDA, mobile phone); stack labels: Application, Middleware, XtreemOS, Linux, Computer]

XtreemOS Architecture
  • Scientific Applications, Business Applications
  • XtreemOS API
  • Application Management, Data Management, VO & Security
  • Infrastructure for Highly Available and Scalable Services
  • Linux-XOS: Grid-enabled Linux Operating System
    • Linux-XOS for Mobile Devices
    • Linux-XOS for Cluster
    • Linux-XOS for PC

XtreemOS Use Cases
  • 14 applications
    • Simulation applications (aerospace, energy)
    • Business applications
    • Bioinformatics application
    • Virtual reality application
    • Finance application
    • Telecom application

XtreemOS & Linux
  • Acceptance in the Linux community is key for the success of the XtreemOS project
    • Packaging for multiple Linux distributions
        • Mandriva Linux
        • Red Flag Linux
        • Debian
    • Integration in OSCAR
    • Get XtreemOS patches accepted in Linux OS

XtreemOS Project Phases
  • Phase 1 (M1-M6)
    • Specification of XtreemOS
  • Phase 2 (M7-M18)
    • Design and implementation of XtreemOS basic version
    • Preliminary experiments with LinuxSSI
  • Phase 3 (M19-M24)
    • Integration of all XtreemOS components
    • Delivery of first XtreemOS prototype
  • Phase 4 (M25-M48)
    • Evaluation with real use cases
    • Design and implementation of advanced features of XtreemOS
    • Public releases

XtreemOS Sub-projects
  • SP1 - Project Management
  • SP2 - Linux for Virtual Organizations
  • SP3 - Grid Support for Linux
  • SP4 - Software integration, packaging, experimentation & validation
  • SP5 - Communication, dissemination, exploitation & training

VO and Security Management

VO & Security Management

A VO can be seen as a temporary or permanent coalition of geographically dispersed entities (individuals, groups, organizational units or entire organizations) that pool resources, capabilities and information to achieve common objectives.

  • Legal or contractual arrangements between entities
  • Resources can be physical equipment or other capabilities such as knowledge, information or data

Some Lessons from the State of the Art
  • Open issues
    • Scalability of in-the-large VO management
      • Short-lived VOs
    • Ease of management of VO and VO identities
    • Security and VO policy enforcement at the node and site level

VO & Security Management
  • Key components of VO
    • Owner/administrator of the VO
    • A set of participating users in different participating domains
    • A set of participating resources in different participating domains
    • A set of roles which users/resources can play in the VO
    • A set of rules/policies on resource availability and access control
    • A (renewable) expiry time of the VO
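
The components listed above can be pictured as a small record type. The following is only a sketch: the class name, field names, the domain names and the rule that a policy is an allowed (role, resource) pair are all assumptions made for illustration, not part of the XtreemOS design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class VirtualOrganization:
    """Hypothetical record holding the key VO components."""
    name: str
    owner: str                                      # owner/administrator
    expires_at: datetime                            # renewable expiry time
    users: dict = field(default_factory=dict)       # user -> home domain
    resources: dict = field(default_factory=dict)   # resource -> home domain
    roles: dict = field(default_factory=dict)       # user -> role in the VO
    policies: set = field(default_factory=set)      # allowed (role, resource) pairs

    def renew(self, extension: timedelta) -> None:
        self.expires_at += extension

    def may_access(self, user: str, resource: str, now: datetime) -> bool:
        if now >= self.expires_at:
            return False                            # the VO has expired
        if user not in self.users or resource not in self.resources:
            return False                            # not a participant
        return (self.roles.get(user), resource) in self.policies


start = datetime(2006, 12, 7)
vo = VirtualOrganization("vo1", owner="alice", expires_at=start + timedelta(days=30))
vo.users["bob"] = "domain-a"
vo.resources["cluster-1"] = "domain-b"
vo.roles["bob"] = "member"
vo.policies.add(("member", "cluster-1"))
print(vo.may_access("bob", "cluster-1", start))
```

Calling renew() pushes the expiry time forward, which is one simple way to model a renewable lifetime.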

VO Lifecycle
  • VO identification
    • Identify and name VO candidates
  • VO formation
    • Creation and configuration of the VO according to the anticipated roles of members
  • VO operation
    • Members should be identified for effective logging and auditing
    • The VO should be able to classify resources into different access control levels for effective management
  • VO evolution
    • Managing change in participating entities or in their conditions of use
    • Members can be added and linked into a VO by authorization
    • Users can be classified at different levels with associated operation rights
  • VO dissolution
    • Non-persistent information should be deleted, credentials reclaimed, and users and resource providers notified
    • Should take place after all activities have finished
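
One way to read the lifecycle is as a small state machine. The sketch below is an assumption-laden illustration: the slides name the five phases but not the permitted transitions, so the ALLOWED table, the class names and the running_activities counter are hypothetical.

```python
from enum import Enum


class VOPhase(Enum):
    IDENTIFICATION = "identification"
    FORMATION = "formation"
    OPERATION = "operation"
    EVOLUTION = "evolution"
    DISSOLUTION = "dissolution"


# Assumed transitions: the phases come from the slide, the allowed moves do not.
ALLOWED = {
    VOPhase.IDENTIFICATION: {VOPhase.FORMATION},
    VOPhase.FORMATION: {VOPhase.OPERATION},
    VOPhase.OPERATION: {VOPhase.EVOLUTION, VOPhase.DISSOLUTION},
    VOPhase.EVOLUTION: {VOPhase.OPERATION, VOPhase.DISSOLUTION},
    VOPhase.DISSOLUTION: set(),
}


class VOLifecycle:
    def __init__(self):
        self.phase = VOPhase.IDENTIFICATION
        self.running_activities = 0

    def advance(self, target):
        if target not in ALLOWED[self.phase]:
            raise ValueError(f"cannot go from {self.phase.name} to {target.name}")
        if target is VOPhase.DISSOLUTION and self.running_activities > 0:
            # Dissolution should take place only after all activities have finished.
            raise RuntimeError("activities still running")
        self.phase = target
```

A VO that is operating can alternate between operation and evolution any number of times before it is dissolved.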

VO Management
  • Two levels
    • VO level (administration)
      • Performed by XtreemOS-G services
        • Distributed information management for membership tracking and accounting of users and resources
    • Node level
      • Performed by XtreemOS-F
      • Add mechanisms to Linux OS for recognizing, controlling, and enforcing usage of global Grid entities
        • Grid identity management
        • Resource access granting and accounting
        • VO policy checking, auditing and enforcing

Node Level VO Management
  • Minimal changes to the kernel code, to reduce the pressure of getting VO-related changes accepted by the Linux community
    • Keep changes localized in dynamically loadable kernel modules
  • Features
    • PAM plug-in based authentication
    • Static and dynamic identity mapping to local user/group ids
    • Kernel level key retention mechanisms
    • ACL mechanisms
      • A VO model that is flexible, secure, efficient and easy to sustain from a software engineering point of view
  • Investigation of synergies with existing security enhancements for Linux
    • Linux Security Module (LSM)
      • Refinement of access control and enforcement mechanisms
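
The idea of static and dynamic identity mapping can be illustrated in user space. This is a plain Python sketch, not PAM or kernel code: the class name, the identity string format and the UID pool are hypothetical, and the only point is the difference between a fixed mapping and a UID leased from a pool on first use.

```python
class IdentityMapper:
    """Maps global grid identities to local UIDs (illustrative only)."""

    def __init__(self, static_map, pool):
        self.static_map = dict(static_map)  # grid identity -> fixed local UID
        self.free = list(pool)              # UIDs available for dynamic leases
        self.leases = {}                    # grid identity -> leased UID

    def map(self, grid_id):
        if grid_id in self.static_map:
            return self.static_map[grid_id]          # static mapping wins
        if grid_id not in self.leases:
            if not self.free:
                raise RuntimeError("dynamic UID pool exhausted")
            self.leases[grid_id] = self.free.pop(0)  # lease on first use
        return self.leases[grid_id]

    def release(self, grid_id):
        uid = self.leases.pop(grid_id, None)
        if uid is not None:
            self.free.append(uid)                    # UID can be reused


mapper = IdentityMapper({"/VO=vo1/CN=alice": 1001}, range(60000, 60002))
print(mapper.map("/VO=vo1/CN=alice"), mapper.map("/VO=vo1/CN=bob"))
```

A dynamic pool avoids creating a local account for every grid user in advance, at the cost of having to release leases when sessions end.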

Infrastructure for Highly Available and Scalable Services

Infrastructure for Highly Available and Scalable Grid Services
  • Grid
    • Very large number of nodes that are distributed world-wide
    • Dynamicity: nodes join, leave, fail
  • Applications
    • Standalone (interact only with the user that launched them)
    • Services (present an interface to the outside world and can be invoked)
      • System level functionalities
      • Application-level functionalities
  • Targets of the infrastructure
    • XtreemOS-G services
    • Application-level services

Infrastructure for Highly Available and Scalable Grid Services
  • Management of collections of nodes

Infrastructure for Highly Available and Scalable Grid Services
  • Toolbox
    • Facilities to construct structured collections
      • Application initialization
      • DHT, N-dimensional matrix, ranked nodes
    • Distributed servers
      • Present a single stable address to the external world hiding the internal organization of the service
    • Virtual nodes
      • Fault tolerant groups of nodes capable of taking over each other’s tasks
    • Publish/Subscribe
      • Useful for applications and also to build structured collections
      • Fully decentralized implementation
    • Directory service
      • Node monitoring and failure detection
      • Adapt to the dynamicity of the monitored attributes
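
Of the structured collections named in the toolbox, the DHT is the easiest to sketch. The example below uses consistent hashing, a common DHT building block; the slides do not say which scheme XtreemOS uses, so the technique, the HashRing class and the node names are assumptions.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Place a key or node name on a 64-bit ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of the nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.join(node)

    def join(self, node):
        p = ring_position(node)
        bisect.insort(self._points, p)
        self._owner[p] = node

    def leave(self, node):
        p = ring_position(node)
        self._points.remove(p)
        del self._owner[p]

    def lookup(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        if not self._points:
            raise LookupError("empty ring")
        i = bisect.bisect_left(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("job-42"))
```

When a node joins, leaves or fails, only the keys owned by that node's arc of the ring change owner, which suits a collection whose membership is dynamic.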

Application Management

Application Management
  • Entities taking part in job execution
    • Job
      • One or more processes that collaborate to achieve a common goal
      • Resource allocation unit
    • Resources
      • Physical or virtual component of limited availability within a computer system
        • Have static and dynamic characteristics
  • Application execution management
    • Job submission and scheduling
    • Job and resource control
    • Job and resource monitoring

Application Life Cycle

Application Execution Management
  • AEM is as generic and flexible as possible
    • Does not target specific users or types of jobs
  • AEM allows users to exploit advantages of executing a job in a Grid
  • AEM provides an easy-to-use job submission, control and monitoring interface
    • Unix-like submission (with default description of requirements)
    • Batch-like submission
      • Requirements
      • Hints (additional information optionally provided by users)
    • Adaptive and accurate monitoring
  • AEM deals with Grid dynamicity
    • Job migration and checkpointing
    • Hide failures and changes from users as much as possible
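
The two submission styles can be contrasted with a short sketch. Nothing here is the AEM interface: the function names, the default requirement values, the node attributes and the preferred_site hint are hypothetical, chosen only to show requirements acting as hard filters and hints acting as soft preferences.

```python
from dataclasses import dataclass, field

DEFAULT_REQUIREMENTS = {"cpus": 1, "memory_mb": 512}  # assumed defaults


@dataclass
class JobDescription:
    command: list
    requirements: dict = field(default_factory=lambda: dict(DEFAULT_REQUIREMENTS))
    hints: dict = field(default_factory=dict)


def submit_unix_like(command):
    """Unix-like submission: the default description of requirements is used."""
    return JobDescription(command=command)


def submit_batch_like(command, requirements, hints=None):
    """Batch-like submission: explicit requirements plus optional hints."""
    merged = {**DEFAULT_REQUIREMENTS, **requirements}
    return JobDescription(command, merged, dict(hints or {}))


def candidates(job, nodes):
    """Nodes meeting every requirement; hints only influence the ordering."""
    ok = [n for n in nodes
          if all(n.get(k, 0) >= v for k, v in job.requirements.items())]
    preferred = job.hints.get("preferred_site")
    return sorted(ok, key=lambda n: n.get("site") != preferred)


nodes = [
    {"name": "a", "cpus": 2, "memory_mb": 1024, "site": "site-1"},
    {"name": "b", "cpus": 8, "memory_mb": 2048, "site": "site-2"},
    {"name": "c", "cpus": 4, "memory_mb": 512, "site": "site-1"},
]
job = submit_batch_like(["simulate"], {"cpus": 4}, {"preferred_site": "site-1"})
print([n["name"] for n in candidates(job, nodes)])
```

Treating hints as a sort key rather than a filter means a job is never rejected because a hint cannot be honoured.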

Application Execution Management
  • AEM has to guarantee that jobs access only authorized resources and stay within their usage limits
    • Jobs executed in the context of a grid user and a VO
    • Rely on VO and security management services (WP2.1, WP3.5)
  • Scalability and fault tolerance taken into account in the design of AEM
    • Most AEM services are scoped to a single job, which helps scalability
      • JExecMng and jMonitor could potentially have to manage hundreds of nodes
    • JobDirectory and jController need to be fault tolerant
    • WP3.2 services will be used as appropriate
      • Resource discovery
      • Distributed servers
  • Tight integration with the Linux OS
    • Enforcement of the usage of agreed resources (quota, access control)
      • Job-id to be known by XtreemOS-F
    • Users will have more information on, and control over, how their jobs are running
      • Performance metrics, errors that occurred, exit status, …
  • AEM provides a basic set of system-level functionalities
    • Users may rely on user-level services (e.g. workflow manager, SAGA runtime)

Data Management

Data Management
  • XtreemFS
    • Federated object-based file system for Grid environments
      • Centralised metadata servers replaced by a federation of metadata servers
        • Independence of participating organizations while maintaining a global view of the system
      • Designed with wide-area networks in mind
        • File replication
        • Location and access management based on an intelligent monitoring service
          • Access pattern-aware replication
      • Semantic naming and advanced query functions to allow users to find data in huge archives
    • Object Sharing Service (OSS)
      • Inter-process communication via volatile memory, mapped files, dynamically allocated objects and grid pipes

XtreemFS Components
  • Object Storage Device (OSD)
    • Data access in the file system
      • Read/write access, concurrency control
    • Object-based storage interface to hide complexity of underlying block-based storage mechanisms
  • Metadata and Replica Catalogue (MRC)
    • Maintenance of all file system metadata
      • POSIX metadata
      • Extended (user defined) metadata
      • Information on replica locations
  • Replica Management Service (RMS)
    • Decides when files have to be replicated and how the replicas are distributed among OSDs
    • Replica removal
  • Client
    • Hosts running the access layer (file system adapter or XtreemFS library)
      • Linux traditional file system interface for transparent access to MRC, OSD, RMS
      • Native XtreemFS interface

Object Storage Device (OSD)
  • Container of objects
    • Reliably store and retrieve data from physical media
    • Security enforcement for access to stored objects
      • Capabilities built by MRC and received with each request
    • Multi-object files
      • Striping and/or replication
      • Each file replica has its own striping policy
    • Transactional files
      • Changes performed on a local copy (and not forwarded to other OSDs) and committed or rolled back at a later time
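
Striping a multi-object file can be illustrated with a little arithmetic. The sketch assumes a simple round-robin policy with a fixed stripe size; the slide only says that each replica has its own striping policy, so the policy, the function names and the OSD names are hypothetical.

```python
def locate(offset, stripe_size, osds):
    """Return (object number, OSD, offset inside the object) for a byte offset."""
    obj = offset // stripe_size
    return obj, osds[obj % len(osds)], offset % stripe_size


def split_range(offset, length, stripe_size, osds):
    """Split a byte range into per-object pieces: (OSD, object, inner offset, size)."""
    pieces = []
    end = offset + length
    while offset < end:
        obj, osd, inner = locate(offset, stripe_size, osds)
        size = min(stripe_size - inner, end - offset)  # stop at the object boundary
        pieces.append((osd, obj, inner, size))
        offset += size
    return pieces


osds = ["osd0", "osd1", "osd2"]
print(split_range(2, 8, 4, osds))
```

Because consecutive objects land on different OSDs, the pieces of one large read can be fetched in parallel.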

Replica Management Service (RMS)
  • Take care of autonomous creation and deletion of replicas
  • Replication policies
    • Must satisfy security needs and comply with local regulations
      • Countries, real organizations, VOs, racks in a data centre
  • Replica creation
    • Gathering information from other services to decide when and where to create a replica
      • Each time a file is opened
        • RMS is contacted to see if a better replica should be created
          • Decision depends on the file size, OSD availability
          • A client may start accessing a “bad replica” during the creation of a new one
      • MRC may keep track of opens to predict future access from the previous ones
      • AEM can inform RMS that a job is about to start its execution
        • RMS can anticipate the creation of a new replica before the job execution
  • Removing “obsolete” replicas
    • Lack of free space, file or replica very seldom used, nearby replicas no longer useful, …
    • A replica can be removed at any time even while being used
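
The decision taken when a file is opened might look like the sketch below. The slide says only that the decision depends on file size and OSD availability; the cost figure, the size cap, the field names and the function name are assumptions added to make the example concrete.

```python
def pick_replica_target(file_size, current_cost, osds, max_size=10**9):
    """Return the name of the OSD to replicate to, or None to keep the current replica."""
    if file_size > max_size:
        return None  # assumed rule: very large files are not replicated on open
    better = [
        o for o in osds
        if o["available"] and o["free"] >= file_size and o["cost"] < current_cost
    ]
    if not better:
        return None  # no reachable OSD would improve on the current replica
    return min(better, key=lambda o: o["cost"])["name"]


osds = [
    {"name": "osd-a", "available": True, "free": 500, "cost": 5},
    {"name": "osd-b", "available": False, "free": 500, "cost": 1},
    {"name": "osd-c", "available": True, "free": 50, "cost": 2},
]
print(pick_replica_target(100, 10, osds))
```

Here cost stands for any access-cost estimate, such as network distance from the client; while a new replica is being created the client would keep using the existing one.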

MetaData and Replica Catalogue (MRC)
  • MRC
    • Acts logically as one service but will be composed of replicated service instances to improve availability and performance
    • Access control management
      • Support of a variety of policies
      • Volume ACL
  • Data model
    • Hierarchical directory structure and/or extended metadata
    • Core abstraction for controlling access to file metadata and file data is the volume
    • Files can be copied between volumes and links to files in other volumes can be created
  • Internal architecture
    • Exactly one meta object per physical object on a storage device
  • Open question: to what extent is it possible to decouple system components while preserving a global view of the system?

Object Sharing Service (OSS)
  • Inter-process communication via volatile memory, mapped files, dynamically allocated objects and grid pipes
    • All components designed to be scalable and fault tolerant to deal with the dynamic behaviour of the Grid
  • Features
    • Management of shared objects containing references
    • Object access detection
      • Page based
    • Object access monitoring to control false sharing and object replicas
    • Object consistency management
      • Strict, weak and transactional memory consistency models

LinuxSSI: Linux-XOS for Clusters

LinuxSSI: XtreemOS-F Cluster Flavour
  • LinuxSSI will leverage Kerrighed SSI OS for clusters
  • Five work directions for LinuxSSI
    • Scalability to hundreds of processors
    • LinuxSSI file system
    • Automatic reconfiguration of LinuxSSI
    • Checkpoint/restart mechanisms for parallel applications
    • Customizable scheduler

Scalability & Reconfiguration Management
  • Scalability to hundreds of processors
    • Removing hard limits on the number of nodes
    • Evaluating the scalability of Kerrighed internal algorithms
  • Automatic reconfiguration of LinuxSSI
    • Node addition, eviction or failure management
    • Leverage the existing mechanisms provided by Kerrighed in the HotPlug module

LinuxSSI File System
  • LinuxSSI file system
    • Exploitation of the disks attached to cluster nodes
      • Single name space (root file system)
      • Policies for placing/replicating data on disk
      • Efficient parallel accesses to large data volumes
    • Performance as a primary target in LinuxSSI basic version
    • LinuxSSI file system should keep working when failures occur
      • Better support for failures in the advanced version of LinuxSSI

Checkpoint/Restart in LinuxSSI
  • Checkpoint and restart of parallel application units in a cluster
    • Shared memory and message-passing programming models will be supported
    • Checkpointer multi-level architecture
      • Kernel checkpointer
        • Process/thread checkpointing
        • Based on Kerrighed mechanisms
        • Transparent or application-aware checkpointing
      • System checkpointer
        • Application unit checkpointing (inside a cluster)
        • Coordination of thread/process checkpoints for parallel applications
        • Configurable service
      • Grid checkpointer
        • Application checkpointing (an application may span multiple Grid nodes)
        • Coordination of application unit checkpoints for an application comprising multiple units
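
The three checkpointer levels delegate to one another, which the following sketch makes explicit. The class and method names are hypothetical and the kernel level is a stub that returns an image name; a real implementation would rely on the Kerrighed mechanisms mentioned above.

```python
class KernelCheckpointer:
    """Lowest level: checkpoints one process (stubbed as an image name)."""

    def checkpoint(self, pid):
        return f"image-{pid}"


class SystemCheckpointer:
    """Cluster level: coordinates the process checkpoints of one application unit."""

    def __init__(self, kernel):
        self.kernel = kernel

    def checkpoint_unit(self, pids):
        # A real coordinator would first bring the processes to a consistent state.
        return {pid: self.kernel.checkpoint(pid) for pid in pids}


class GridCheckpointer:
    """Grid level: coordinates unit checkpoints, one system checkpointer per cluster."""

    def __init__(self, clusters):
        self.clusters = clusters  # cluster name -> SystemCheckpointer

    def checkpoint_application(self, units):
        # units: cluster name -> list of pids forming the unit on that cluster
        return {c: self.clusters[c].checkpoint_unit(pids) for c, pids in units.items()}


grid = GridCheckpointer({
    "cluster-1": SystemCheckpointer(KernelCheckpointer()),
    "cluster-2": SystemCheckpointer(KernelCheckpointer()),
})
print(grid.checkpoint_application({"cluster-1": [1, 2], "cluster-2": [7]}))
```

Each level only knows about the level directly below it, so the grid checkpointer never needs to handle individual processes.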

Customizable Scheduler
  • Customizable scheduler
    • Long-term scheduler
      • Application admission in the cluster (job queuing system)
    • Load balancing scheduler
      • Balance the current workload between cluster nodes
  • Long-term scheduler
    • DRMAA standard interface
    • Adapted to take advantage of the SSI “virtual multiprocessor”
    • Resource sharing (a CPU may not be dedicated to a single application)
    • Advanced monitoring capabilities
  • Load balancing scheduler
    • Policy customization
      • Multilevel architecture (probes, analyzers, decision-making)
    • Self adaptation of policy based on the current state of the cluster
    • Advanced policies
      • Shared memory, IPC
  • Interaction with the Grid level services when needed
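
The multilevel architecture of the load balancing scheduler (probes, analyzers, decision-making) can be sketched as three small functions. The load metric (number of runnable processes), the tolerance and the one-migration-per-busy-node rule are assumptions; a customized policy would replace any of the three levels.

```python
from statistics import mean


def probe(cluster):
    """Probe level: collect one load figure per node."""
    return {name: info["runnable"] for name, info in cluster.items()}


def analyze(loads, tolerance=1):
    """Analyzer level: flag nodes that are far from the cluster average."""
    avg = mean(loads.values())
    busy = sorted((n for n, l in loads.items() if l > avg + tolerance),
                  key=loads.get, reverse=True)
    idle = sorted((n for n, l in loads.items() if l < avg - tolerance),
                  key=loads.get)
    return busy, idle


def decide(busy, idle):
    """Decision level: propose one migration from each busy node to an idle one."""
    return list(zip(busy, idle))


cluster = {"n1": {"runnable": 8}, "n2": {"runnable": 1}, "n3": {"runnable": 3}}
print(decide(*analyze(probe(cluster))))
```

Keeping the levels separate is what makes the policy customizable: a different probe or analyzer can be plugged in without touching the decision step.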

From LinuxSSI to LinuxSSI-XOS
  • Virtual organization support
    • Support of the kernel key retention system
      • Impact on the Ghost module
    • XtreemOS-G services will run as a single instance on a LinuxSSI cluster
      • Example: daemons in charge of mapping global user, VO and group identities onto the Linux UID/GID

XtreemOS Consortium
  • 19 partners
    • 1 public financial institution as coordinator
    • 9 research centers & universities
    • 9 industrial partners
      • 4 SME
  • 8 countries
    • Europe
      • France, Germany, Italy, Slovenia, Spain, The Netherlands, UK
    • China

XtreemOS Partners

Fact Sheet
  • Start date: June 1st, 2006
  • Duration: 4 years
  • Budget: approx. 30 Meuros (EC funding: 14.2 Meuros)
  • Website: http://www.xtreemos.eu
  • Administrative and financial coordinator: CDC, Jean-Noël Forget
  • Scientific and technical staff: more than 100 persons
