Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

University of California, Berkeley

School of Information Management and Systems

SIMS 257: Database Management


Lecture Outline

  • Review

    • Future of Database Systems

  • XML and DBMS

  • Grid-Based Digital Libraries

    • Data Grids

    • Grid-based IR

  • DBMS and usability


Accomplishments of DBMS Research

  • DBMS are now used in almost every computing environment to create, organize and maintain large collections of information, and this is largely due to the results of the DBMS research community’s efforts, in particular:

    • Relational DBMS

    • Transaction management

    • Distributed DBMS


Next Generation Database Systems

  • Where are we going from here?

    • Hardware is getting faster and cheaper

    • DBMS technology continues to improve and change

      • OODBMS

      • ORDBMS

    • Bigger challenges for DBMS technology

      • Medicine, design, manufacturing, digital libraries, sciences, environment, planning, etc.


Examples

  • NASA EOSDIS

    • Estimated 10^16 bytes (10 petabytes)

  • Computer-aided design

  • The Human Genome

  • Department store tracking

    • Mining non-transactional data (e.g., scientific data, text data?)

  • Insurance Company

    • Multimedia DBMS support


New Features

  • New Data types

  • Rule Processing

  • New concepts and data models

  • Problems of Scale

  • Parallelism/Grid-based DB

  • Tertiary Storage vs Very Large-Scale Disk Storage

  • Heterogeneous Databases

  • Memory Only DBMS


Coming to a Database Near You…

  • Browsability

  • User-defined access methods

  • Security

  • Steering long processes

  • Federated Databases

  • IR capabilities

  • XML

  • The Semantic Web(?)


Some things to consider

  • Bandwidth will keep increasing and getting cheaper (and go wireless)

  • Processing power will keep increasing

    • Moore’s law: the number of transistors on the most advanced chips doubles roughly every 18 months

  • Memory and Storage will keep getting cheaper (and probably smaller)

    • “Storage law”: Worldwide digital data storage capacity has doubled every 9 months for the past decade

  • Put it all together and what do you have?

    • “The ideal database machine would have a single infinitely fast processor with infinite memory with infinite bandwidth – and it would be infinitely cheap (free)” : David DeWitt and Jim Gray, 1992
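To see how quickly the two doubling rates above compound, here is a small worked calculation. It is a sketch that assumes the quoted doubling periods (18 months for Moore’s law, 9 months for the “storage law”) hold steadily over a 10-year horizon, which is an idealization; `growth_factor` is a hypothetical helper.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Growth factor after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


# Idealized: assumes each doubling period holds steadily for a whole decade
moore = growth_factor(10, 18)    # processing power, 18-month doubling
storage = growth_factor(10, 9)   # storage capacity, 9-month doubling

print(f"Processing growth over 10 years: x{moore:,.0f}")
print(f"Storage growth over 10 years:    x{storage:,.0f}")
print(f"Storage outpaces processing by:  x{storage / moore:,.0f}")
```

Under these assumptions processing power grows about 100-fold in a decade while storage grows about 10,000-fold, which is one reason data management, not computation, becomes the bottleneck.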


Lecture Outline

  • Review

    • Future of Database Systems

  • XML and DBMS

  • Grid-Based Digital Libraries

    • Data Grids

    • Grid-based IR

  • DBMS and usability


Standards: XML/SQL

  • As part of SQL3, an extension called SQL/XML (XML/SQL) is being created that provides a mapping between SQL tables and XML

  • The (draft) standard is very complex, but the ideas are actually pretty simple

  • Suppose we have a table called EMPLOYEE that has columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE, SALARY


Standards: XML/SQL

  • That table can be mapped to:

        <EMPLOYEE>
          <row>
            <EMPNO>000020</EMPNO>
            <FIRSTNAME>John</FIRSTNAME>
            <LASTNAME>Smith</LASTNAME>
            <BIRTHDATE>1955-08-21</BIRTHDATE>
            <SALARY>52300.00</SALARY>
          </row>
          <row> … etc. …
        </EMPLOYEE>
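The row-per-element mapping above is simple enough to sketch in a few lines. This Python sketch uses the standard library’s `xml.etree.ElementTree`; `table_to_xml` is a hypothetical helper, and it covers only the basic table, `<row>`, column-element convention shown on the slide, not the draft standard’s rules for escaping identifiers, representing NULLs, or generating XML Schemas.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal


def table_to_xml(table_name, columns, rows):
    """Map a table to XML: one element for the table, one <row> per tuple,
    and one child element per column, named after the column."""
    root = ET.Element(table_name)
    for values in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, values):
            col_el = ET.SubElement(row_el, col)
            # date.isoformat() gives YYYY-MM-DD; str(Decimal) keeps trailing zeros
            col_el.text = value.isoformat() if isinstance(value, date) else str(value)
    return root


columns = ["EMPNO", "FIRSTNAME", "LASTNAME", "BIRTHDATE", "SALARY"]
rows = [("000020", "John", "Smith", date(1955, 8, 21), Decimal("52300.00"))]
xml_text = ET.tostring(table_to_xml("EMPLOYEE", columns, rows), encoding="unicode")
print(xml_text)
```

Rows are passed as tuples in column order, so the same helper works for any table whose column names are already valid XML element names.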


Standards: XML/SQL

  • In addition, the standard says that an XML Schema must be generated for each table, and it also allows relationships between tables to be represented by nesting records from related tables in the XML.

  • It is not clear whether this has actually been implemented by anyone yet

    • There is actually something very similar in the Cheshire II interface to RDBMS
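Nesting related records can be sketched in the same style. This is a minimal illustration, not the standard’s actual nesting rules: it assumes a hypothetical DEPARTMENT table (DEPTNO, DEPTNAME) and a hypothetical WORKDEPT foreign key on EMPLOYEE, uses invented sample rows, and simply drops employees whose foreign key matches no department.

```python
import xml.etree.ElementTree as ET


def nest_rows(parent_name, parents, parent_key, child_name, children, foreign_key):
    """Emit each parent row with its matching child rows nested inside it.
    parents/children are lists of dicts; a child belongs to a parent when
    child[foreign_key] == parent[parent_key]."""
    root = ET.Element(parent_name)
    for parent in parents:
        row_el = ET.SubElement(root, "row")
        for col, value in parent.items():
            ET.SubElement(row_el, col).text = str(value)
        # Nest the child rows whose foreign key points at this parent row
        child_table = ET.SubElement(row_el, child_name)
        for child in children:
            if child[foreign_key] == parent[parent_key]:
                child_row = ET.SubElement(child_table, "row")
                for col, value in child.items():
                    if col != foreign_key:  # implied by the nesting, so omitted
                        ET.SubElement(child_row, col).text = str(value)
    return root


# Hypothetical sample data
departments = [{"DEPTNO": "D11", "DEPTNAME": "Design"}]
employees = [
    {"EMPNO": "000020", "LASTNAME": "Smith", "WORKDEPT": "D11"},
    {"EMPNO": "000030", "LASTNAME": "Jones", "WORKDEPT": "E21"},
]
tree = nest_rows("DEPARTMENT", departments, "DEPTNO", "EMPLOYEE", employees, "WORKDEPT")
print(ET.tostring(tree, encoding="unicode"))
```

Omitting the foreign-key column from nested rows is a design choice: the relationship is carried by the XML structure itself, so repeating the key would be redundant.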


Lecture Outline

  • Review

    • Future of Database Systems

  • XML and DBMS

  • Grid-Based Digital Libraries

    • Data Grids

    • Grid-based IR

  • DBMS and usability


Grid-based Digital Libraries

  • So what’s this Grid thing anyhow?

  • Data Grids and Distributed Storage

  • Grid-Based IR

  • Grid-Based Digital Libraries

    This lecture borrows heavily from presentations by Ian Foster (Argonne National Laboratory & University of Chicago), Reagan Moore, and others from the San Diego Supercomputer Center.


The Grid: On-Demand Access to Electricity

[Figure: quality and economies of scale plotted against time]

Source: Ian Foster


By Analogy, a Computing Grid

  • Decouples production and consumption

    • Enable on-demand access

    • Achieve economies of scale

    • Enhance consumer flexibility

    • Enable new devices

  • On a variety of scales

    • Department

    • Campus

    • Enterprise

    • Internet

Source: Ian Foster


Not Exactly a New Idea…

  • “The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.”

    • Fernando Corbató and Robert Fano, 1966

  • “We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.” Len Kleinrock, 1967

Source: Ian Foster


But Things Are Different Now

  • Networks are far faster (and cheaper)

    • Faster than computer backplanes

  • “Computing” is very different from pre-Net

    • Our “computers” have already disintegrated

    • E-commerce increases size of demand peaks

    • Entirely new applications & social structures

  • We’ve learned a few things about software

Source: Ian Foster


Computing Isn’t Really Like Electricity

  • I import electricity but must export data

  • “Computing” is not interchangeable but highly heterogeneous: data, sensors, services, …

  • This complicates things; but also means that the sum can be greater than the parts

    • Real opportunity: Construct new capabilities dynamically from distributed services

  • Raises three fundamental questions

    • Can I really achieve economies of scale?

    • Can I achieve QoS across distributed services?

    • Can I identify apps that exploit synergies?

Source: Ian Foster


Why the Grid? (1) Revolution in Science

  • Pre-Internet

    • Theorize &/or experiment, alone or in small teams; publish paper

  • Post-Internet

    • Construct and mine large databases of observational or simulation data

    • Develop simulations & analyses

    • Access specialized devices remotely

    • Exchange information within distributed multidisciplinary teams

Source: Ian Foster


Why the Grid? (2) Revolution in Business

  • Pre-Internet

    • Central data processing facility

  • Post-Internet

    • Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)

    • Business processes increasingly computing- & data-rich

    • Outsourcing becomes feasible => service providers of various sorts

Source: Ian Foster


New Opportunities Demand New Technology

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

Source: Ian Foster


Building an Open Grid

[Figure: Open Standards, Open Source, and Open Infrastructure, making up an Open Grid]


Grids and Open Standards

[Figure: increased functionality and standardization over time]

  • Custom solutions

  • Globus Toolkit

    • De facto standards

    • GGF: GridFTP, GSI

    • X.509, LDAP, FTP, …

  • Open Grid Services Architecture

    • Web services

    • GGF: OGSI, … (+ OASIS, W3C)

    • Multiple implementations, including Globus Toolkit

  • App-specific services


Open Grid Services Architecture

  • Service-oriented architecture

    • Key to virtualization, discovery, composition, local-remote transparency

  • Leverage industry standards

    • Internet, Web services

  • Distributed service management

    • A “component model for Web services”

  • A framework for the definition of composable, interoperable services

“The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002


Realizing a Service-Oriented Architecture: How Do I…

  • Create, name, manage, discover services?

  • Render resources, data, sensors as services?

  • Negotiate service level agreements?

  • Express & negotiate policy?

  • Organize & manage service collections?

  • Establish identity, negotiate authentication?

  • Manage VO membership & communication?

  • Compose services efficiently?

  • Achieve interoperability?


Web Services

  • XML-based distributed computing technology

  • Web service = a server process that exposes typed ports to the network

  • Described by the Web Services Description Language (WSDL), an XML document that contains

    • Type of message(s) the service understands & types of responses & exceptions it returns

    • “Methods” bound together as “port types”

    • Port types bound to protocols as “ports”

  • A WSDL document completely defines a service and how to access it
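To make the port type and port distinction concrete, the following Python sketch parses a heavily trimmed, hypothetical WSDL 1.1 document with `xml.etree.ElementTree` and lists its operations (“methods”), port types, bindings, and ports. The `CatalogService` names and the `urn:example:catalog` namespace are invented, and the messages, types, and SOAP binding details that a real WSDL document would contain are omitted, so this is not a deployable service description.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Trimmed, hypothetical WSDL 1.1 document (messages, types and SOAP details omitted)
WSDL_DOC = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:catalog" targetNamespace="urn:example:catalog">
  <portType name="CatalogPortType">
    <operation name="search"/>
    <operation name="getRecord"/>
  </portType>
  <binding name="CatalogSoapBinding" type="tns:CatalogPortType"/>
  <service name="CatalogService">
    <port name="CatalogPort" binding="tns:CatalogSoapBinding"/>
  </service>
</definitions>"""

root = ET.fromstring(WSDL_DOC)

# "Methods" (operations) bound together as port types
port_types = {
    pt.get("name"): [op.get("name") for op in pt.findall("wsdl:operation", WSDL_NS)]
    for pt in root.findall("wsdl:portType", WSDL_NS)
}
# Port types bound to protocols (bindings) ...
bindings = {b.get("name"): b.get("type") for b in root.findall("wsdl:binding", WSDL_NS)}
# ... and exposed on the network as ports
ports = {p.get("name"): p.get("binding")
         for p in root.findall("wsdl:service/wsdl:port", WSDL_NS)}

print(port_types)
print(bindings)
print(ports)
```

Following the chain port, binding, port type, operation is how a client works out what it can call and over which protocol.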


Open Grid Services Infrastructure

[Figure: anatomy of an OGSI grid service]

  • Client

    • Introspection: What port types? What policy? What state?

  • GridService interface (required)

  • Other standard interfaces: factory, notification, collections

  • Naming: a Grid Service Handle is mapped to a Grid Service Reference by handle resolution

  • Service data elements

  • Lifetime management

    • Explicit destruction

    • Soft-state lifetime

  • Data access

  • Implementation, running in a hosting environment/runtime (“C”, J2EE, .NET, …)
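The two lifetime models and handle resolution can be illustrated with a small simulation. This is a hypothetical sketch, not the OGSI interfaces: `SoftStateRegistry`, the `gsh://` handle strings, the URLs, and the time values are all invented. A handle resolves to its current reference only while the instance’s termination time lies in the future; clients keep an instance alive by extending that time, or remove it explicitly.

```python
from dataclasses import dataclass


@dataclass
class ServiceInstance:
    handle: str              # stable name (plays the role of a Grid Service Handle)
    reference: str           # current binding (plays the role of a Grid Service Reference)
    termination_time: float


class SoftStateRegistry:
    """Hypothetical hosting environment that keeps service instances alive
    only while clients keep extending their termination time (soft state)."""

    def __init__(self):
        self._instances = {}

    def create(self, handle, reference, now, lifetime):
        # Factory-style creation with an initial lifetime
        self._instances[handle] = ServiceInstance(handle, reference, now + lifetime)

    def extend(self, handle, now, lifetime):
        # Keep-alive: push the termination time forward, never backward
        inst = self._instances[handle]
        inst.termination_time = max(inst.termination_time, now + lifetime)

    def destroy(self, handle):
        # Explicit destruction
        self._instances.pop(handle, None)

    def sweep(self, now):
        # Soft-state expiry: drop instances whose lifetime has run out
        expired = [h for h, i in self._instances.items() if i.termination_time <= now]
        for h in expired:
            del self._instances[h]
        return expired

    def resolve(self, handle, now):
        # Handle resolution: map a handle to its current reference, if still alive
        self.sweep(now)
        inst = self._instances.get(handle)
        return inst.reference if inst else None


reg = SoftStateRegistry()
reg.create("gsh://example/job-1", "http://host-a:8080/job-1", now=0, lifetime=60)
reg.create("gsh://example/job-2", "http://host-b:8080/job-2", now=0, lifetime=60)
reg.extend("gsh://example/job-1", now=50, lifetime=60)  # job-1 now lives until t=110
print(reg.resolve("gsh://example/job-1", now=90))
print(reg.resolve("gsh://example/job-2", now=90))       # never extended, so expired
```

Soft state means a crashed or forgetful client cannot leak service instances forever: if nobody extends the lifetime, the hosting environment reclaims the instance on its own.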


The Grid as Enabler of 21st Century Science

  • Entirely new approaches to enquiry based on

    • Deep analysis of huge quantities of data

    • Interdisciplinary collaboration

    • Large-scale simulation

    • Smart instrumentation

  • Made possible by an infrastructure that provides access to, and integration of, resources & services without regard for location


Grid Infrastructure

  • Broadly deployed services in support of fundamental collaborative activities

    • Formation & operation of virtual organizations

    • Authentication, authorization, discovery, …

  • Services, software, and policies enabling on-demand access to critical resources

    • Computers, databases, networks, storage, software services,…

  • Operational support for 24x7 availability

  • Integration with campus and commercial infrastructures


The Foundations are Being Laid

[Figure: map of UK grid sites (Edinburgh, Glasgow, DL, Newcastle, Belfast, Manchester, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Soton), distinguishing Tier0/1, Tier2, and Tier3 facilities and 10 Gbps, 2.5 Gbps, 622 Mbps, and other links]


Data Grid Problem

  • “Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data”

  • Note that this problem:

    • Is common to many areas of science

    • Overlaps strongly with other Grid problems


Data Grids for High Energy Physics

[Figure: tiered data grid for high energy physics]

  • Detector and online system: there is a “bunch crossing” every 25 nsecs and there are 100 “triggers” per second; each triggered event is ~1 MByte in size. Data reaches the online system at ~PBytes/sec, which passes on ~100 MBytes/sec.

  • Tier 0: CERN Computer Centre, with an offline processor farm (~20 TIPS), receiving ~100 MBytes/sec

  • Tier 1: regional centres, e.g. FermiLab (~4 TIPS) and the France, Germany, and Italy Regional Centres, linked to Tier 0 at ~622 Mbits/sec or by air freight (deprecated)

  • Tier 2: Tier2 Centres (~1 TIPS each, e.g. Caltech), linked at ~622 Mbits/sec; HPSS mass storage appears at the larger centres

  • Tier 3: institutes (e.g. ~0.25 TIPS), linked at ~622 Mbits/sec, each with a physics data cache

  • Tier 4: physicist workstations (e.g. Pentium II 300 MHz), fed at ~1 MBytes/sec

1 TIPS is approximately 25,000 SpecInt95 equivalents.

Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

Image courtesy Harvey Newman, Caltech
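The rates in the figure are consistent with one another, as a quick calculation shows. The 25 ns crossing interval, 100 triggers per second, and ~1 MByte event size come from the figure; the 10^7 seconds of data taking per year is an assumption (a commonly used round figure for accelerator live time, not from the slide), and decimal units (1 TB = 10^6 MB, 1 PB = 10^9 MB) are used throughout.

```python
BUNCH_CROSSING_S = 25e-9      # one bunch crossing every 25 ns (from the figure)
TRIGGERS_PER_S = 100          # triggered events kept per second (from the figure)
EVENT_SIZE_MB = 1.0           # ~1 MByte per triggered event (from the figure)
LIVE_SECONDS_PER_YEAR = 1e7   # assumption: typical accelerator live time per year

crossing_rate_hz = 1 / BUNCH_CROSSING_S
rejection = crossing_rate_hz / TRIGGERS_PER_S            # crossings per kept event
stored_mb_per_s = TRIGGERS_PER_S * EVENT_SIZE_MB
stored_tb_per_day = stored_mb_per_s * 86_400 / 1e6        # if running all day
stored_pb_per_year = stored_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9

print(f"Bunch crossings per second: {crossing_rate_hz:,.0f}")
print(f"Trigger keeps 1 in {rejection:,.0f} crossings")
print(f"Stored rate: {stored_mb_per_s:.0f} MB/s, {stored_tb_per_day:.2f} TB/day")
print(f"Per year (assumed live time): {stored_pb_per_year:.1f} PB")
```

So the trigger has to discard all but roughly one crossing in 400,000, and the ~100 MBytes/sec it keeps adds up to about a petabyte per year under the assumed live time, which is why storage and analysis are spread across the tiers.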


Data Intensive Issues Include…

  • Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains

  • Respect local and global policies governing what can be used for what

  • Schedule resources efficiently, again subject to local and global constraints

  • Achieve high performance, with respect to both speed and reliability

  • Catalog software and virtual data


Data Intensive Computing and Grids

  • The term “Data Grid” is often used

    • Implies a distinct infrastructure, which it isn’t; but it is easy to say

  • Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, …

    • Security, resource management, information services, etc.

  • It is important to exploit commonalities, as it is very unlikely that multiple infrastructures can be maintained

  • Fortunately this seems easy to do!


Examples of Desired Data Grid Functionality

  • High-speed, reliable access to remote data

  • Automated discovery of “best” copy of data

  • Manage replication to improve performance

  • Co-schedule compute, storage, network

  • “Transparency” with respect to delivered performance

  • Enforce access control on data

  • Allow representation of “global” resource allocation policies


A Model Architecture for Data Grids

[Figure: model data grid architecture]

  • The application passes an attribute specification to the Metadata Catalog, which returns a logical collection and logical file name

  • The Replica Catalog maps the logical file name to multiple locations

  • Replica Selection chooses a replica using performance information & predictions (from MDS and NWS)

  • The selected replica is transferred with GridFTP, which uses separate control and data channels

  • Replicas are held at several locations (Replica Locations 1, 2, and 3), on storage such as disk caches, a tape library, and a disk array

Source: Arcot Rajasekar (SDSC)
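The flow above can be sketched as two catalog lookups followed by a cost-based choice. In this hypothetical Python sketch, plain dictionaries stand in for the Metadata Catalog and Replica Catalog, and fixed latency and bandwidth figures stand in for the performance predictions that MDS and NWS would supply; the names, locations, and numbers are invented, and the GridFTP transfer itself is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    location: str
    bandwidth_mb_s: float  # predicted bandwidth (the role NWS plays in the figure)
    latency_s: float       # predicted startup cost, e.g. tape mount vs. disk seek


# Hypothetical Metadata Catalog: attribute specification -> logical file names
METADATA_CATALOG = {
    ("instrument", "detector-A"): ["lfn://run42/events.dat"],
}

# Hypothetical Replica Catalog: logical file name -> physical replicas
REPLICA_CATALOG = {
    "lfn://run42/events.dat": [
        Replica("disk-cache.site1", bandwidth_mb_s=50, latency_s=0.1),
        Replica("tape-library.site2", bandwidth_mb_s=200, latency_s=120),
        Replica("disk-array.site3", bandwidth_mb_s=100, latency_s=0.5),
    ],
}


def predicted_seconds(replica: Replica, size_mb: float) -> float:
    """Simple cost model: startup latency plus size / bandwidth."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def select_replica(attribute, size_mb):
    """Attributes -> logical name -> replicas -> replica with lowest predicted time."""
    lfn = METADATA_CATALOG[attribute][0]
    best = min(REPLICA_CATALOG[lfn], key=lambda r: predicted_seconds(r, size_mb))
    return lfn, best.location


print(select_replica(("instrument", "detector-A"), size_mb=1_000))
print(select_replica(("instrument", "detector-A"), size_mb=100_000))
```

With these made-up numbers, a 1,000 MB request goes to the disk array, while a 100,000 MB request goes to the tape library, whose mount delay is amortized over the larger transfer: the “best” replica depends on the request, not just on the storage type.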


Data Grid Requirements

  • Seamless access to data and information stored at local and remote sites

  • Virtualization of data, collection and meta information

  • Handle Dataset Scaling – size & number

  • Integrate Data Collections & Associated Metadata

  • Handle Multiplicity of Platforms, Resource & Data Types

  • Handle Seamless Authentication

  • Handle Access Control

  • Provide Auditing Facilities

  • Handle Legacy Data & Methods

Source: Arcot Rajasekar (SDSC)


SRB as a Solution

  • The Storage Resource Broker is middleware

  • It virtualizes resource access

  • It mediates access to distributed heterogeneous resources

  • It uses a MetaCATalog (MCAT) to facilitate the brokering

  • It integrates data and metadata

[Figure: Application, SRB Server with MCAT, and distributed storage resources (database systems, archival storage systems, file systems, ftp, http, …): DB2, Oracle, Illustra, ObjectStore; HRM, HPSS, ADSM, UniTree; UNIX, NTFS, HTTP, FTP]

Source: Arcot Rajasekar (SDSC)
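The idea of virtualizing resource access can be sketched with a uniform driver interface. This is a hypothetical Python illustration, not SRB’s API: `StorageDriver`, `Broker`, the resource names, and the in-memory “file system” and “database” are stand-ins, and a dictionary plays the role of MCAT’s logical-to-physical mapping. The application asks for a logical name and never learns which kind of storage system holds the data.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Uniform read interface the broker uses for every kind of storage system."""

    @abstractmethod
    def read(self, physical_path: str) -> bytes:
        ...


class FileSystemDriver(StorageDriver):
    """Stand-in for a Unix/NT file system: paths map directly to file contents."""

    def __init__(self, files: dict):
        self.files = files

    def read(self, physical_path: str) -> bytes:
        return self.files[physical_path]


class DatabaseDriver(StorageDriver):
    """Stand-in for BLOBs stored in a DBMS; the 'path' has the form TABLE:KEY."""

    def __init__(self, blobs: dict):
        self.blobs = blobs

    def read(self, physical_path: str) -> bytes:
        table, key = physical_path.split(":", 1)
        return self.blobs[(table, key)]


class Broker:
    """Hypothetical SRB-like broker: a catalog maps each logical name to a
    (resource, physical path) pair, and the matching driver does the I/O."""

    def __init__(self, drivers: dict, catalog: dict):
        self.drivers = drivers  # resource name -> StorageDriver
        self.catalog = catalog  # logical name -> (resource name, physical path)

    def get(self, logical_name: str) -> bytes:
        resource, path = self.catalog[logical_name]
        return self.drivers[resource].read(path)


broker = Broker(
    drivers={
        "unix-fs-1": FileSystemDriver({"/data/a.txt": b"from a file system"}),
        "dbms-1": DatabaseDriver({("DOCS", "42"): b"from a database"}),
    },
    catalog={
        "/demo/collection/a.txt": ("unix-fs-1", "/data/a.txt"),
        "/demo/collection/b.txt": ("dbms-1", "DOCS:42"),
    },
)
print(broker.get("/demo/collection/b.txt"))
```

Adding a new kind of storage means writing one more driver; neither the catalog format nor the application code has to change.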


SDSC Storage Resource Broker & Meta-data Catalog

[Figure: SRB architecture]

  • Access interfaces: C, C++, Linux I/O, Unix shell, Java, NT browsers, Python, Prolog, Web, third-party copy

  • SRB, with remote proxies, DataCutter, and user-defined components

  • MCAT meta-data catalog: resource, user, application, and Dublin Core meta-data

  • Storage back ends

    • Databases: DB2, Oracle, Sybase

    • Archives: HPSS, ADSM, UniTree, DMF, HRM

    • File systems: Unix, NT, Mac OS X

Source: Arcot Rajasekar (SDSC)


SRB Single Sign-On

[Figure: SRB single sign-on. The application contacts the SRB master at (host, port) for identification and initialization; the user is authenticated by secure password, GSI, or SEA (with a CA issuing certificates) and checked against MCAT; the master spawns a server (SRB agent); and a session is established between the application and that agent.]

Source: Arcot Rajasekar (SDSC)


Federated SRB Operation

[Figure: peer-to-peer brokering between SRB servers]

  • A read application requests data by logical name or attribute condition

  • SRB servers broker the request peer-to-peer, consulting MCAT for:

    1. Logical-to-physical mapping

    2. Identification of replicas

    3. Access & audit control

  • Servers spawn SRB agents at the sites holding the replicas (R1, R2)

  • Data access can proceed in parallel from the replicas

Source: Arcot Rajasekar (SDSC)
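The three MCAT roles listed in the figure can be sketched together. In this hypothetical Python sketch, `MetadataCatalog` keeps the logical-to-physical mapping and replica list, an access-control list, and an audit log; `read` then tries the replicas one after another, skipping a server that is unavailable. Real SRB federation brokers requests between servers and can read in parallel; this sequential failover, and all names and paths below, are simplifications invented for illustration.

```python
from datetime import datetime, timezone


class MetadataCatalog:
    """Hypothetical MCAT-like catalog covering the three roles in the figure."""

    def __init__(self):
        self.replicas = {}   # logical name -> list of (server, physical path)
        self.acl = {}        # logical name -> set of users allowed to read
        self.audit_log = []  # (timestamp, user, logical name, outcome)

    def register(self, logical, server, path, readers):
        self.replicas.setdefault(logical, []).append((server, path))
        self.acl.setdefault(logical, set()).update(readers)

    def resolve(self, user, logical):
        # Access & audit control: every request is logged, granted or not
        allowed = user in self.acl.get(logical, set())
        self.audit_log.append((datetime.now(timezone.utc), user, logical,
                               "granted" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{user} may not read {logical}")
        # Logical-to-physical mapping plus identification of replicas
        return list(self.replicas[logical])


def read(mcat, servers, user, logical):
    """Try each replica in turn; a server that is unreachable is skipped."""
    for server, path in mcat.resolve(user, logical):
        store = servers.get(server)
        if store is not None and path in store:
            return store[path]
    raise FileNotFoundError(logical)


mcat = MetadataCatalog()
mcat.register("/collection/img1", "srb-site-1", "/vault/r1/img1", readers={"alice"})
mcat.register("/collection/img1", "srb-site-2", "/vault/r2/img1", readers={"alice"})
servers = {"srb-site-2": {"/vault/r2/img1": b"image bytes"}}  # site 1 is down
print(read(mcat, servers, "alice", "/collection/img1"))
```

Because the access check and audit entry happen in the catalog before any data moves, the policy is enforced the same way no matter which replica or server ends up serving the request.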


SRB Concepts

  • Abstraction of User Space

    • Single sign-on

    • Multiple authentication schemes

      • certificates, (secure) passwords, tickets, group permissions, roles

  • Virtualization of Resources

    • Resource Location, Type & Access transparency

    • Logical Resource Definitions - bundling

  • Abstraction of Data and Collections

    • Virtual Collections: Persistent Identifier and Global Name Space

    • Replication & Segmentation

  • Data Discovery – system & application metadata

    • User-defined Metadata – Structural & Descriptive

    • Attribute-based Access (path names become irrelevant)

  • Uniform Access Methods

    • APIs, Command Line, GUI Browsers, Web-Access (Portal, WSDL, CGI)

    • Parallel Access with both Client and Server-driven strategies

Source: Arcot Rajasekar (SDSC)
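Attribute-based access can be sketched as a query over user-defined metadata that returns logical names instead of path names. The collection, attribute names, and values below are hypothetical, and only equality conditions are supported in this sketch.

```python
# Hypothetical user-defined metadata attached to logical names in a collection
METADATA = {
    "/demo/images/m31.fits": {"object": "M31", "band": "infrared", "year": 2001},
    "/demo/images/m31-v.fits": {"object": "M31", "band": "visible", "year": 2002},
    "/demo/images/m42.fits": {"object": "M42", "band": "infrared", "year": 2002},
}


def find(metadata, **conditions):
    """Return (sorted) logical names whose metadata matches every condition."""
    return sorted(
        name
        for name, attrs in metadata.items()
        if all(attrs.get(key) == value for key, value in conditions.items())
    )


print(find(METADATA, object="M31"))
print(find(METADATA, band="infrared", year=2002))
```

The user states what they want (object, band, year), and the catalog supplies the names; where the data physically lives is left to the broker.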


OceanStore: Everyone’s Data, One Big Utility

“The data is just out there”

  • Separate information from location

    • Locality is only an optimization (an important one!)

    • Wide-scale coding and replication for durability

  • All information is globally identified

    • Unique identifiers are hashes over names & keys

    • Single uniform lookup interface replaces: DNS, server location, data location

    • No centralized namespace required (such as SDSI)

Source: John Kubiatowicz (UCB)
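A global identifier of this kind can be sketched with a cryptographic hash. This Python sketch assumes SHA-256 from `hashlib` and invents the key bytes and file name; OceanStore’s actual hash function and encoding may differ. Because anyone holding the owner’s key and the name can recompute the identifier, no central naming authority is needed to assign it.

```python
import hashlib


def object_guid(owner_public_key: bytes, name: str) -> str:
    """Hash over the owner's key and a human-readable name -> global identifier.
    SHA-256 is an assumption here, not necessarily what OceanStore uses."""
    h = hashlib.sha256()
    # Hash the key to a fixed-length digest first, so the boundary between
    # key and name is unambiguous
    h.update(hashlib.sha256(owner_public_key).digest())
    h.update(name.encode("utf-8"))
    return h.hexdigest()


# Hypothetical key material for two owners
alice_key = b"hypothetical public key bytes for alice"
bob_key = b"hypothetical public key bytes for bob"

g1 = object_guid(alice_key, "thesis/draft.txt")
g2 = object_guid(bob_key, "thesis/draft.txt")
print(g1[:16], g2[:16])  # same name, different owners, different identifiers
```

Two owners can use the same human-readable name without colliding, since the key is part of what gets hashed.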


Basic Structure: Irregular Mesh of “Pools”

Source: John Kubiatowicz (UCB)


Amusing Back-of-the-Envelope Calculation

  • How many files in the OceanStore?

    • Assume 10^10 people in the world

    • Say 10,000 files/person (very conservative?)

    • So 10^14 files in OceanStore!

    • If each file is 1 GB (not likely), that is 10^23 bytes, on the order of a mole of bytes!

  • Truly impressive number of elements…

  • … but small relative to physical constants

    • (courtesy Bill Bolosky, Microsoft)

Source: John Kubiatowicz (UCB)
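The arithmetic can be checked directly. The inputs are the slide’s own assumptions; Avogadro’s constant is the exact SI value.

```python
PEOPLE = 10**10             # assumed number of people, as on the slide
FILES_PER_PERSON = 10**4    # "10,000 files/person"
BYTES_PER_FILE = 10**9      # "1 gig files", a deliberately generous assumption
AVOGADRO = 6.02214076e23    # particles per mole (exact SI value)

files = PEOPLE * FILES_PER_PERSON
total_bytes = files * BYTES_PER_FILE

print(f"Files: {files:.0e}")
print(f"Bytes: {total_bytes:.0e} = {total_bytes / AVOGADRO:.2f} mol of bytes")
```

10^23 bytes is about a sixth of a mole, so “a mole of bytes” is right to within an order of magnitude.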


Utility-based Infrastructure

  • Service provided by confederation of companies

    • Monthly fee paid to one service provider

    • Companies buy and sell capacity from each other

[Figure: a confederation of providers (e.g., Sprint, AT&T, IBM, Pac Bell, and a “Canadian OceanStore”) jointly providing the OceanStore utility]

Source: John Kubiatowicz (UCB)


Lecture Outline

  • Review

    • Future of Database Systems

  • Grid-Based Digital Libraries

    • Data Grids

    • Grid-based IR

  • DBMS and usability


DBMS and Usability

  • What features would you like to see in DBMS?


DBMS and Usability

  • What do you hate about Database Management Systems?

    • From your experiences

    • In general

  • What do you like about Database Management Systems?

    • From your experience

    • In general


Next Week

  • Workshops to help you develop the final reports and presentations.

