

Rule-Based Data Management Systems

Reagan W. Moore

Wayne Schroeder

Mike Wan

Arcot Rajasekar

{moore, schroede, mwan, [email protected]

http://www.sdsc.edu/srb

http://irods.sdsc.edu/


Topics

  • Managing distributed shared collections

    • Data grids

  • Control of name spaces - SRB

    • Production system

    • Data and trust virtualization

    • Infrastructure independence

  • Control of management policies - iRODS

    • Next generation technology

    • Management virtualization

    • Rules controlling remote operations

    • Constraints on the rules and remote operations


Data Management Applications

  • Data grids

    • Share data

  • Digital libraries

    • Publish data

  • Persistent archives

    • Preserve data

  • Real-time sensor streams

    • Data federation

  • Data analysis

    • Automate access to distributed data


Concepts

  • Distributed Data Management Concepts

    • Data virtualization

      • Manage the properties of a shared collection independently of the storage systems

    • Trust virtualization

      • Administrative domain independence

    • Federation

      • Managing interactions between data grids

  • Rule-based Data Management

    • Policy virtualization

      • Automating execution of management policies

      • Applying management policies to remote operations


Using a Data Grid - in Abstract

[Diagram: the user sends "Ask for data" to the Data Grid and receives "Data delivered".]

  • User asks for data from the data grid

  • The data is found and returned

    • Where & how details are hidden


Using a Data Grid - Details

[Diagram: two Storage Resource Broker (SRB) servers and a Metadata Catalog backed by a database (DB).]

  • User asks for data

  • Data request goes to SRB Server

  • Server looks up information in catalog

  • Catalog tells which SRB server has data

  • 1st server asks 2nd for data

  • The data is found and returned
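
The request path on this slide can be sketched in a few lines of Python. This is only an illustration of the lookup-and-forward idea, not the SRB implementation: the class names `MetadataCatalog` and `Server`, the server names, and the logical path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: logical file name -> name of the server holding it."""
    locations: dict[str, str] = field(default_factory=dict)

    def lookup(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical stand-in for one broker server."""
    name: str
    catalog: MetadataCatalog
    peers: dict[str, "Server"] = field(default_factory=dict)
    storage: dict[str, bytes] = field(default_factory=dict)

    def get(self, logical_name: str) -> bytes:
        # The server looks up the information in the catalog.
        holder = self.catalog.lookup(logical_name)
        if holder == self.name:
            return self.storage[logical_name]
        # The first server asks the second server for the data.
        return self.peers[holder].get(logical_name)


catalog = MetadataCatalog({"/home/demo/run1.dat": "server_b"})
server_b = Server("server_b", catalog, storage={"/home/demo/run1.dat": b"payload"})
server_a = Server("server_a", catalog, peers={"server_b": server_b})

# The user only talks to server_a; where the data lives stays hidden.
data = server_a.get("/home/demo/run1.dat")
```

The caller never names `server_b`, which is the point of the slide: the where and how are resolved through the catalog.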


Data Virtualization

  • Manage properties of each digital entity independently of the remote storage systems

    • Infrastructure independence

  • Properties of the shared collection

    • Name spaces

    • Persistent state information (location, size,…)

  • Manage standard operations

    • Map from client requests to standard operations

    • Map from standard operations to remote storage system protocol
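
A minimal sketch of this two-level mapping, under stated assumptions: the names `StorageDriver`, `MemoryDriver`, `UnixFileDriver`, and `SharedCollection` are hypothetical and are not SRB or iRODS APIs. The collection keeps the persistent state (location, size) itself, and each driver translates the standard operations into one storage system's protocol.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations that every storage system must support."""

    @abstractmethod
    def write(self, physical_name: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, physical_name: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    """Keeps objects in a dictionary; stands in for any non-file store."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def write(self, physical_name: str, data: bytes) -> None:
        self.blobs[physical_name] = data

    def read(self, physical_name: str) -> bytes:
        return self.blobs[physical_name]


class UnixFileDriver(StorageDriver):
    """Maps the standard operations onto ordinary file I/O under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, physical_name: str, data: bytes) -> None:
        with open(os.path.join(self.root, physical_name), "wb") as handle:
            handle.write(data)

    def read(self, physical_name: str) -> bytes:
        with open(os.path.join(self.root, physical_name), "rb") as handle:
            return handle.read()


class SharedCollection:
    """Owns the logical name space and the persistent state information."""

    def __init__(self, drivers: dict[str, StorageDriver]) -> None:
        self.drivers = drivers
        self.state: dict[str, tuple[str, str, int]] = {}  # name -> (resource, physical, size)
        self.counter = 0

    def put(self, logical_name: str, resource: str, data: bytes) -> None:
        physical_name = f"obj{self.counter}"  # physical naming is hidden from clients
        self.counter += 1
        self.drivers[resource].write(physical_name, data)
        self.state[logical_name] = (resource, physical_name, len(data))

    def get(self, logical_name: str) -> bytes:
        resource, physical_name, _size = self.state[logical_name]
        return self.drivers[resource].read(physical_name)


with tempfile.TemporaryDirectory() as root:
    grid = SharedCollection({"memory": MemoryDriver(), "disk": UnixFileDriver(root)})
    grid.put("/demo/a.txt", "memory", b"alpha")
    grid.put("/demo/b.txt", "disk", b"beta")
    print(grid.get("/demo/a.txt"), grid.get("/demo/b.txt"))
```

Clients use the same `put` and `get` calls for both resources; swapping a storage system means adding a driver, not changing the collection.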


Data Virtualization

Data Access Methods (C library, Unix, Web Browser)

Data Collection - data is organized as a shared collection

  • Storage Repository

    • Storage location

    • User name

    • File name

    • File context (creation date,…)

    • Access controls

  • Data Grid

    • Logical resource name space

    • Logical user name space

    • Logical file name space

    • Logical context (metadata)

    • Access constraints


Data Virtualization

Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system

  • Access Interface

  • Standard Access Actions

  • Data Grid

  • Standard Micro-services

  • Storage Protocol

  • Storage System


Standard Operations

  • File manipulation

    • POSIX I/O calls - open, close, read, write, seek, …

    • Register, replicate, checksum, synchronize

  • Bulk operations

    • Bulk data transport, metadata load

    • Parallel I/O streams

  • Remote procedures

    • Data filtering, subsetting, metadata extraction

    • Remote library execution (HDF5, DataCutter)


BaBar High-Energy Physics

  • A functioning international Data Grid for high-energy physics

  • Sites

    • Stanford Linear Accelerator

    • IN2P3, Lyon, France

    • Rome, Italy

    • San Diego

    • RAL, UK

  • Manchester-SDSC mirror

  • Moved over 300 TB of data

  • Increasing to 5 TB per day


Next Generation Technology

  • Every fault that occurs in the distributed environment is the responsibility of the data grid

    • Network outage / system crash / operator error

    • Minimize risk through checksums, replicas, synchronization, federation

  • Management of large collections is labor intensive

    • Initiation of recovery operations after remote system failure

  • Need to automate execution of management policies


Controlling Remote Operations

iRODS - integrated Rule-Oriented Data System

Support unique organizational / social management policies for each collection


Rule-based Data Management

  • Express assessment criteria through sets of required persistent state information

  • Express management policies as sets of rules controlling the execution of micro-services

  • Express capabilities as sets of micro-services

    • Manage persistent state information resulting from the application of rules controlling execution of remote micro-services


Management Virtualization

  • Examples of management policies

    • Integrity

      • Validation of checksums

      • Synchronization of replicas

      • Data distribution

      • Data retention

      • Access controls

    • Authenticity

      • Chain of custody - audit trails

      • Track required preservation metadata - templates

      • Generation of Archival Information Packages
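
Two of the integrity policies listed above, validation of checksums and synchronization of replicas, can be illustrated together. This is a sketch under assumptions: the function names, the choice of SHA-256, and the site labels are hypothetical, and the replicas are plain in-memory byte strings rather than remote files.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used for validation (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def validate_and_repair(replicas: dict[str, bytes], expected: str) -> list[str]:
    """Overwrite every replica whose checksum is wrong with a valid copy.

    Returns the names of the replicas that were repaired.
    """
    good = [name for name, data in replicas.items() if checksum(data) == expected]
    if not good:
        raise ValueError("no valid replica left to repair from")
    source = replicas[good[0]]
    repaired = []
    for name, data in list(replicas.items()):
        if checksum(data) != expected:
            replicas[name] = source  # synchronize the damaged replica
            repaired.append(name)
    return repaired


replicas = {"site_a": b"event data", "site_b": b"event dat\x00"}
repaired = validate_and_repair(replicas, checksum(b"event data"))
```

Run periodically, a check like this turns a labor-intensive recovery task into an automated policy.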


Rule-based Data Management

  • Rules required for standard operations

    • POSIX I/O control

    • Standard SRB operations

  • Administrator controlled rules to implement management policies

    • Administrative - adding / deleting users, resources

    • Data ingestion - pre-processing, post-processing

    • Data transport / deletion - parallel I/O streams, disposition

  • User-defined rules - create your own server-side workflow

    • Rule set for a particular collection, particular user group, particular storage system, particular micro-service


iRODS Rule

  • Each rule defines

    • Event

    • Condition

    • Action sets (micro-services and rules)

    • Recovery sets

  • Rule types

    • Atomic, applied immediately

    • Deferred, support deferred consistency constraints

    • Periodic, typically used to validate assertions
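
The four parts of a rule (event, condition, action set, recovery set) can be sketched as a small Python structure. This is not the iRODS rule syntax or rule engine: `Rule`, `apply_rule`, and the rollback order (undo completed actions in reverse) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Context = dict
MicroService = Callable[[Context], None]


@dataclass
class Rule:
    event: str                            # what triggers the rule
    condition: Callable[[Context], bool]  # when the rule applies
    actions: list[MicroService]           # micro-services to run in order
    recoveries: list[MicroService]        # recoveries[i] undoes actions[i]


def apply_rule(rule: Rule, event: str, ctx: Context) -> bool:
    """Run the rule's actions; on failure, undo the completed ones in reverse order."""
    if event != rule.event or not rule.condition(ctx):
        return False
    completed: list[MicroService] = []
    try:
        for action, recovery in zip(rule.actions, rule.recoveries):
            action(ctx)
            completed.append(recovery)
    except Exception:
        for recovery in reversed(completed):
            recovery(ctx)
        return False
    return True


def add_replica(ctx: Context) -> None:
    ctx["replicas"].append("archive")


def remove_replica(ctx: Context) -> None:
    ctx["replicas"].remove("archive")


ingest_rule = Rule(
    event="ingest",
    condition=lambda ctx: ctx["size"] > 0,
    actions=[add_replica],
    recoveries=[remove_replica],
)
state = {"size": 10, "replicas": []}
applied = apply_rule(ingest_rule, "ingest", state)
```

This corresponds to an atomic rule, applied immediately; deferred and periodic rules would queue or schedule the same call instead of running it inline.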


Rule-based Access

  • Associate security policies with each digital entity

    • Redaction, access controls on structures within a file

    • Time-dependent access controls (how long to hold data proprietary)

  • Associate access controls with each rule

    • Restrict ability to modify, apply rules

  • Associate access controls with each micro-service

    • Explicit control of operation execution within a given collection

    • Much finer control than provided by Unix read / write / execute permissions
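
As a hedged illustration of two of the controls above, a time-dependent proprietary period and per-micro-service permissions, consider the following sketch. `EntityPolicy`, its fields, and the user and micro-service names are hypothetical, not part of SRB or iRODS.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EntityPolicy:
    """Hypothetical security policy attached to one digital entity."""
    owner: str
    proprietary_until: date  # time-dependent access control
    allowed: dict[str, set[str]] = field(default_factory=dict)  # micro-service -> users

    def may_execute(self, user: str, micro_service: str, today: date) -> bool:
        if user == self.owner:
            return True
        # Once the proprietary period has expired, anyone may read.
        if micro_service == "read" and today >= self.proprietary_until:
            return True
        # Otherwise the user needs an explicit grant for this micro-service.
        return user in self.allowed.get(micro_service, set())


policy = EntityPolicy(
    owner="alice",
    proprietary_until=date(2008, 1, 1),
    allowed={"replicate": {"bob"}},
)
can_read_early = policy.may_execute("carol", "read", date(2007, 6, 1))
can_read_later = policy.may_execute("carol", "read", date(2008, 6, 1))
```

Because permissions are keyed by micro-service name, a user can be allowed to replicate an object without being allowed to read or delete it, which a three-bit permission mask cannot express.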


Federation Between Data Grids

Data Access Methods (Web Browser, DSpace, OAI-PMH)

  • Data Collection A - Data Grid

    • Logical resource name space

    • Logical user name space

    • Logical file name space

    • Logical rule name space

    • Logical micro-service name

    • Logical persistent state

  • Data Collection B - Data Grid

    • Logical resource name space

    • Logical user name space

    • Logical file name space

    • Logical rule name space

    • Logical micro-service name

    • Logical persistent state


Rule-based Federation

  • When registering a digital entity into another data grid, register required management rules along with the digital entity

    • Move management policies with data

  • Expectation that each operation on each digital entity can be controlled across federated data grids

    • Example is end-to-end encryption


Evolution of Rule-based Systems

  • Logical name spaces enable dynamic addition of new rules, micro-services, and state information

    • Apply new rules on one collection while applying old rule sets on a legacy collection

    • Can run old and new rule sets in parallel

  • Can build a system that manages its evolution

    • Can create rules that track the evolution of the rule-based system

    • Can create rules that govern migration to new rule sets
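
One way to picture old and new rule sets running side by side is a registry keyed by logical rule-set name, with each collection bound to one name. The sketch below is an assumption-laden illustration: `RuleRegistry`, the rule-set names, and the collection paths are hypothetical.

```python
from typing import Callable

RuleSet = dict[str, Callable[[str], str]]  # event name -> action on an object name


class RuleRegistry:
    """Hypothetical logical rule name space with per-collection bindings."""

    def __init__(self) -> None:
        self.rule_sets: dict[str, RuleSet] = {}
        self.bindings: dict[str, str] = {}              # collection -> rule-set name
        self.history: list[tuple[str, str, str]] = []   # (collection, old, new)

    def add_rule_set(self, name: str, rules: RuleSet) -> None:
        self.rule_sets[name] = rules  # new rule sets can be added at any time

    def bind(self, collection: str, rule_set: str) -> None:
        if rule_set not in self.rule_sets:
            raise KeyError(rule_set)
        previous = self.bindings.get(collection, "")
        self.bindings[collection] = rule_set
        self.history.append((collection, previous, rule_set))  # track the evolution

    def fire(self, collection: str, event: str, obj: str) -> str:
        return self.rule_sets[self.bindings[collection]][event](obj)


registry = RuleRegistry()
registry.add_rule_set("v1", {"ingest": lambda obj: f"stored {obj}"})
registry.add_rule_set("v2", {"ingest": lambda obj: f"stored and checksummed {obj}"})
registry.bind("/legacy", "v1")
registry.bind("/new", "v2")
old_result = registry.fire("/legacy", "ingest", "a.dat")
new_result = registry.fire("/new", "ingest", "b.dat")
registry.bind("/legacy", "v2")  # migrate the legacy collection
```

The `history` list is the simplest form of a record that tracks how the rule base itself has changed.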


Assessment Rules

  • Can build a system that monitors its own state information

    • Parse audit trails to verify accesses by authorized persons

    • Parse persistent state information for compliance with management rules

    • Test micro-services for compliance with rules

    • Audit all accesses to a collection

    • Compare system properties to desired outcomes
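
The first assessment above, parsing audit trails to verify that accesses were made by authorized persons, might look like the following sketch. The `AuditEntry` record layout and the user and collection names are assumptions; a real audit trail would be read from the persistent state catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit-trail record."""
    user: str
    collection: str
    operation: str


def unauthorized_accesses(
    trail: list[AuditEntry], authorized: dict[str, set[str]]
) -> list[AuditEntry]:
    """Return every entry whose user is not authorized for that collection."""
    return [
        entry
        for entry in trail
        if entry.user not in authorized.get(entry.collection, set())
    ]


trail = [
    AuditEntry("alice", "/babar", "read"),
    AuditEntry("mallory", "/babar", "read"),
    AuditEntry("bob", "/archive", "replicate"),
]
authorized = {"/babar": {"alice"}, "/archive": {"bob"}}
violations = unauthorized_accesses(trail, authorized)
```

An empty result is the desired outcome; a non-empty one is the evidence that the collection's access policy was not enforced.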


For More Information

Reagan W. Moore

San Diego Supercomputer Center

[email protected]

SRB: http://www.sdsc.edu/srb/

iRODS: http://irods.sdsc.edu/

