Rule-Based Data Management Systems

Rule-Based Data Management Systems. Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, sekar}@sdsc.edu http://www.sdsc.edu/srb http://irods.sdsc.edu/. Topics. Managing distributed shared collections Data grids Control of name spaces - SRB Production system

Presentation Transcript

Rule-Based Data Management Systems

Reagan W. Moore

Wayne Schroeder

Mike Wan

Arcot Rajasekar

{moore, schroede, mwan, sekar}@sdsc.edu

http://www.sdsc.edu/srb

http://irods.sdsc.edu/

Topics
  • Managing distributed shared collections
    • Data grids
  • Control of name spaces - SRB
    • Production system
    • Data and trust virtualization
    • Infrastructure independence
  • Control of management policies - iRODS
    • Next generation technology
    • Management virtualization
    • Rules controlling remote operations
    • Constraints on the rules and remote operations
Data Management Applications
  • Data grids
    • Share data
  • Digital libraries
    • Publish data
  • Persistent archives
    • Preserve data
  • Real-time sensor streams
    • Data federation
  • Data analysis
    • Automate access to distributed data
Concepts
  • Distributed Data Management Concepts
    • Data virtualization
      • Manage the properties of a shared collection independently of the storage systems
    • Trust virtualization
      • Administrative domain independence
    • Federation
      • Managing interactions between data grids
  • Rule-based Data Management
    • Policy virtualization
      • Automating execution of management policies
      • Applying management policies to remote operations
Using a Data Grid - in Abstract
  • User asks for data from the data grid
  • The data is found and returned
    • Where & how details are hidden

[Diagram: the user sends "Ask for data" to the Data Grid, which answers with "Data delivered"]
Using a Data Grid - Details

[Diagram: two Storage Resource Broker servers and a Metadata Catalog backed by a database (DB)]

  • User asks for data
  • Data request goes to SRB Server
  • Server looks up information in catalog
  • Catalog tells which SRB server has data
  • 1st server asks 2nd for data
  • The data is found and returned
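
The request flow above can be sketched in a few lines of Python. This is a minimal illustration, not SRB code: the class names `MetadataCatalog` and `Server`, the server names, and the logical path are all hypothetical, and the catalog is reduced to a dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: logical file name -> name of the server holding it."""
    locations: dict[str, str] = field(default_factory=dict)

    def locate(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical stand-in for a data grid server."""
    name: str
    catalog: MetadataCatalog
    peers: dict[str, "Server"] = field(default_factory=dict)
    local_files: dict[str, bytes] = field(default_factory=dict)

    def get(self, logical_name: str) -> bytes:
        # The server looks up the information in the catalog,
        # and the catalog tells it which server has the data.
        holder = self.catalog.locate(logical_name)
        if holder == self.name:
            return self.local_files[logical_name]
        # The first server asks the second server for the data.
        return self.peers[holder].get(logical_name)


catalog = MetadataCatalog({"/grid/home/demo/run1.dat": "server_b"})
server_a = Server("server_a", catalog)
server_b = Server(
    "server_b", catalog, local_files={"/grid/home/demo/run1.dat": b"payload"}
)
server_a.peers["server_b"] = server_b

# The user only talks to server_a; where the data lives stays hidden.
data = server_a.get("/grid/home/demo/run1.dat")
```

The caller never names `server_b`, which is the point of the slide: the location is resolved through the catalog rather than by the user.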
Data Virtualization
  • Manage properties of each digital entity independently of the remote storage systems
    • Infrastructure independence
  • Properties of the shared collection
    • Name spaces
    • Persistent state information (location, size,…)
  • Manage standard operations
    • Map from client requests to standard operations
    • Map from standard operations to remote storage system protocol
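
One way to picture the two mappings is a driver interface: client requests are expressed as a small set of standard operations, and each storage system supplies its own implementation of them. The sketch below assumes hypothetical names throughout (`StorageDriver`, `FileSystemDriver`, `TapeArchiveDriver`, `DataGrid`, the `/vault` prefix); the two drivers are in-memory stand-ins, not real storage protocols.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations that every storage driver implements (hypothetical)."""

    @abstractmethod
    def store(self, physical_path: str, data: bytes) -> None: ...

    @abstractmethod
    def fetch(self, physical_path: str) -> bytes: ...


class FileSystemDriver(StorageDriver):
    """Stand-in for a file system: a flat dict keyed by path."""

    def __init__(self) -> None:
        self._files = {}

    def store(self, physical_path, data):
        self._files[physical_path] = data

    def fetch(self, physical_path):
        return self._files[physical_path]


class TapeArchiveDriver(StorageDriver):
    """Stand-in for an archive with a different protocol: append-only records."""

    def __init__(self) -> None:
        self._tape = []  # list of (path, data) records

    def store(self, physical_path, data):
        self._tape.append((physical_path, data))

    def fetch(self, physical_path):
        # Scan from the newest record so the latest write wins.
        for path, data in reversed(self._tape):
            if path == physical_path:
                return data
        raise KeyError(physical_path)


class DataGrid:
    """Keeps name spaces and persistent state independent of the storage systems."""

    def __init__(self) -> None:
        self.resources = {}  # logical resource name -> driver
        self.catalog = {}    # logical file name -> persistent state

    def put(self, logical_name, resource, data):
        physical_path = f"/vault{logical_name}"
        self.resources[resource].store(physical_path, data)
        self.catalog[logical_name] = {
            "resource": resource,
            "path": physical_path,
            "size": len(data),
        }

    def get(self, logical_name):
        state = self.catalog[logical_name]
        return self.resources[state["resource"]].fetch(state["path"])


grid = DataGrid()
grid.resources["disk"] = FileSystemDriver()
grid.resources["tape"] = TapeArchiveDriver()
grid.put("/home/demo/a.dat", "disk", b"alpha")
grid.put("/home/demo/b.dat", "tape", b"beta!")
```

Clients call `grid.get` with a logical name in both cases; which driver answers is recorded only in the catalog.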
Data Virtualization

Data Access Methods (C library, Unix, Web Browser)

Data Collection

  • Storage Repository
    • Storage location
    • User name
    • File name
    • File context (creation date,…)
    • Access controls
  • Data Grid
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical context (metadata)
    • Access constraints

Data is organized as a shared collection


Data Virtualization

Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system

Layers, from client to storage:
  • Access Interface
  • Standard Access Actions
  • Data Grid
  • Standard Micro-services
  • Storage Protocol
  • Storage System


Standard Operations
  • File manipulation
    • POSIX I/O calls - open, close, read, write, seek, …
    • Register, replicate, checksum, synchronize
  • Bulk operations
    • Bulk data transport, metadata load
    • Parallel I/O streams
  • Remote procedures
    • Data filtering, subsetting, metadata extraction
    • Remote library execution (HDFv5, DataCutter)
BaBar High-Energy Physics
  • A functioning international Data Grid for high-energy physics
  • Sites
    • Stanford Linear Accelerator
    • IN2P3, Lyon, France
    • Rome, Italy
    • San Diego
    • RAL, UK

Manchester-SDSC mirror

Moved over 300 TB of data

Increasing to 5 TB per day

Next Generation Technology
  • Every fault that occurs in the distributed environment is the responsibility of the data grid
    • Network outage / system crash / operator error
    • Minimize risk through checksums, replicas, synchronization, federation
  • Management of large collections is labor intensive
    • Initiation of recovery operations after remote system failure
  • Need to automate execution of management policies
Controlling Remote Operations

iRODS - integrated Rule-Oriented Data System

Support unique organizational / social management policies for each collection

Rule-based Data Management
  • Express assessment criteria through sets of required persistent state information
  • Express management policies as sets of rules controlling the execution of micro-services
  • Express capabilities as sets of micro-services
    • Manage persistent state information resulting from the application of rules controlling execution of remote micro-services
Management Virtualization
  • Examples of management policies
    • Integrity
      • Validation of checksums
      • Synchronization of replicas
      • Data distribution
      • Data retention
      • Access controls
    • Authenticity
      • Chain of custody - audit trails
      • Track required preservation metadata - templates
      • Generation of Archival Information Packages
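
As an illustration of the first two integrity policies, the sketch below validates each replica against a recorded checksum and resynchronizes any replica that fails. The slides do not name a hash algorithm or a data layout, so SHA-256, the function names, and the replica dictionary are assumptions made for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the slides do not specify an algorithm.
    return hashlib.sha256(data).hexdigest()


def validate_and_repair(replicas: dict, expected: str) -> list:
    """Overwrite corrupted replicas from a good copy; return the repaired names."""
    good = [name for name, data in replicas.items() if checksum(data) == expected]
    if not good:
        raise RuntimeError("no valid replica remains")
    source = replicas[good[0]]
    repaired = []
    for name in list(replicas):
        if checksum(replicas[name]) != expected:
            replicas[name] = source  # synchronize from a validated replica
            repaired.append(name)
    return repaired


original = b"observation data"
expected = checksum(original)  # recorded as persistent state at ingestion
replicas = {"sdsc": original, "ral": b"observation dat\x00"}

repaired = validate_and_repair(replicas, expected)
```

Run periodically, a check like this is the kind of recovery operation the earlier slide describes as labor intensive when done by hand.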
Rule-based Data Management
  • Rules required for standard operations
    • POSIX I/O control
    • Standard SRB operations
  • Administrator controlled rules to implement management policies
    • Administrative - adding / deleting users, resources
    • Data ingestion - pre-processing, post-processing
    • Data transport / deletion - parallel I/O streams, disposition
  • User-defined rules, create your own server-side workflow
    • Rule set for a particular collection, particular user group, particular storage system, particular micro-service
iRODS Rule
  • Each rule defines
    • Event
    • Condition
    • Action sets (micro-services and rules)
    • Recovery sets
  • Rule types
    • Atomic, applied immediately
    • Deferred, support deferred consistency constraints
    • Periodic, typically used to validate assertions
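
The four parts of a rule can be modelled as a small data structure plus an interpreter. This is a sketch in Python, not iRODS rule syntax: the `Rule` class, `apply_rule`, and the example micro-services are hypothetical, and pairing each action with a recovery step that runs in reverse order after a failure is one plausible reading of "recovery sets". Only the atomic (immediate) rule type is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]
MicroService = Callable[[Context], None]


@dataclass
class Rule:
    event: str                           # what triggers the rule
    condition: Callable[[Context], bool]
    actions: List[MicroService]          # micro-services run in order
    recoveries: List[MicroService]       # recoveries[i] undoes actions[i]


def apply_rule(rule: Rule, event: str, ctx: Context) -> bool:
    """Run the rule's actions if it matches; roll back completed ones on failure."""
    if event != rule.event or not rule.condition(ctx):
        return False
    done = 0
    try:
        for action in rule.actions:
            action(ctx)
            done += 1
    except Exception:
        # Undo the actions that completed, most recent first.
        for recovery in reversed(rule.recoveries[:done]):
            recovery(ctx)
        return False
    return True


def make_replica(ctx):
    ctx["replicas"].append("backup")


def drop_replica(ctx):
    ctx["replicas"].remove("backup")


def verify_checksum(ctx):
    if ctx.get("corrupt"):
        raise ValueError("checksum mismatch")
    ctx["verified"] = True


def clear_verified(ctx):
    ctx.pop("verified", None)


on_ingest = Rule(
    event="put",
    condition=lambda ctx: str(ctx["collection"]).startswith("/archive"),
    actions=[make_replica, verify_checksum],
    recoveries=[drop_replica, clear_verified],
)

ok_ctx = {"collection": "/archive/demo", "replicas": ["primary"]}
ok = apply_rule(on_ingest, "put", ok_ctx)

bad_ctx = {"collection": "/archive/demo", "replicas": ["primary"], "corrupt": True}
failed = apply_rule(on_ingest, "put", bad_ctx)
```

In the failing case the replica created by the first action is removed again, so the collection ends in the state it started in.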
Rule-based Access
  • Associate security policies with each digital entity
    • Redaction, access controls on structures within a file
    • Time-dependent access controls (how long to hold data proprietary)
  • Associate access controls with each rule
    • Restrict ability to modify, apply rules
  • Associate access controls with each micro-service
    • Explicit control of operation execution within a given collection
    • Much finer control than provided by Unix r:w:x
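
A per-collection, per-micro-service access check with a time-dependent release date might look like the following. The table layout, group names, collection path, and dates are invented for illustration; the slides do not describe how iRODS stores these controls.

```python
from datetime import date

# Hypothetical access table:
# (collection, micro-service) -> (groups always allowed, date after which anyone may run it)
ACL = {
    ("/archive/survey", "read"): ({"curators", "owners"}, date(2030, 1, 1)),
    ("/archive/survey", "replicate"): ({"curators"}, None),
}


def may_execute(acl, collection, micro_service, groups, today):
    """Return True if a user in `groups` may run `micro_service` on `collection`."""
    entry = acl.get((collection, micro_service))
    if entry is None:
        return False  # default deny: unlisted operations are not permitted
    allowed_groups, public_after = entry
    if set(groups) & allowed_groups:
        return True
    # Time-dependent control: data stays proprietary until the release date.
    return public_after is not None and today >= public_after
```

For example, `may_execute(ACL, "/archive/survey", "read", {"public"}, date(2029, 12, 31))` is refused, while the same call dated `date(2030, 1, 1)` is allowed. Because the key includes the micro-service, replication can be restricted separately from reading, which plain file permission bits cannot express.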
Federation Between Data Grids

Data Access Methods (Web Browser, DSpace, OAI-PMH)

  • Data Grid (Data Collection A)
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
  • Data Grid (Data Collection B)
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
Rule-based Federation
  • When registering a digital entity into another data grid, register required management rules along with the digital entity
    • Move management policies with data
  • Expectation that each operation on each digital entity can be controlled across federated data grids
    • Example is end-to-end encryption
Evolution of Rule-based Systems
  • Logical name spaces enable dynamic addition of new rules, micro-services, and state information
    • Apply new rules on one collection while applying old rule sets on a legacy collection
    • Can run old and new rule sets in parallel
  • Can build a system that manages its evolution
    • Can create rules that track the evolution of the rule-based system
    • Can create rules that govern migration to new rule sets
Assessment Rules
  • Can build a system that monitors its own state information
    • Parse audit trails to verify accesses by authorized persons
    • Parse persistent state information for compliance with management rules
    • Test micro-services for compliance with rules
    • Audit all accesses to a collection
    • Compare system properties to desired outcomes
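
The first assessment, parsing audit trails to verify that accesses came from authorized persons, reduces to a filter over the trail. The record format below (dictionaries with `user`, `collection`, and `action` keys) and all names are assumptions for the sketch; a real audit trail would be read from the catalog.

```python
def unauthorized_accesses(audit_trail, authorized_users):
    """Return the audit entries whose user is not authorized for the collection."""
    violations = []
    for entry in audit_trail:
        # A collection with no authorization list permits nobody.
        allowed = authorized_users.get(entry["collection"], set())
        if entry["user"] not in allowed:
            violations.append(entry)
    return violations


audit_trail = [
    {"user": "alice", "collection": "/archive/survey", "action": "read"},
    {"user": "mallory", "collection": "/archive/survey", "action": "read"},
    {"user": "bob", "collection": "/archive/logs", "action": "write"},
]
authorized_users = {"/archive/survey": {"alice", "bob"}}

violations = unauthorized_accesses(audit_trail, authorized_users)
```

Here `violations` holds the entries for `mallory` and for `bob`, the latter because `/archive/logs` has no authorization list at all. Scheduled as a periodic rule, a check of this kind lets the system compare its recorded state against the desired outcome.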
For More Information

Reagan W. Moore

San Diego Supercomputer Center

moore@sdsc.edu

SRB: http://www.sdsc.edu/srb/

iRODS: http://irods.sdsc.edu/
