
New Generation Database Systems: XML Databases and Grid-based Digital Libraries

University of California, Berkeley School of Information, IS 257: Database Management. Lecture outline: XML and DBMS; The Grid and DBMS; The Grid; Data Grids; Grid-based DBMS.

Presentation Transcript


  1. New Generation Database Systems: XML Databases and Grid-based Digital Libraries University of California, Berkeley School of Information IS 257: Database Management

  2. Lecture Outline • XML and DBMS • The Grid and DBMS • The Grid • Data Grids • Grid-based DBMS

  3. Lecture Outline • XML and DBMS • The Grid and DBMS • The Grid • Data Grids • Grid-based DBMS

  4. Standards: XML/SQL • As part of SQL3, an extension that provides a mapping from XML to the DBMS is being created, called XML/SQL (the standard itself is named SQL/XML) • The (draft) standard is very complex, but the ideas are actually pretty simple • Suppose we have a table called EMPLOYEE that has columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE, SALARY

  5. Standards: XML/SQL • That table can be mapped to:
     <EMPLOYEE>
       <row>
         <EMPNO>000020</EMPNO>
         <FIRSTNAME>John</FIRSTNAME>
         <LASTNAME>Smith</LASTNAME>
         <BIRTHDATE>1955-08-21</BIRTHDATE>
         <SALARY>52300.00</SALARY>
       </row>
       <row> … etc. …
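The mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (table name becomes the root element, each record becomes a `row`, each column becomes a child element), not an implementation of the standard. The in-memory SQLite database, the `table_to_xml` helper, and the choice to store every column as text (so that `52300.00` keeps its trailing zeros) are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Map every record of `table` to a <row> element under a root named after the table."""
    # Table names cannot be bound as SQL parameters; `table` is assumed trusted here.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(table)
    for record in cursor:
        row = ET.SubElement(root, "row")
        for name, value in zip(columns, record):
            if value is None:
                continue  # simplification: NULL columns are left out of the row
            # Each column becomes a child element named after the column.
            ET.SubElement(row, name).text = str(value)
    return root


# Hypothetical setup: the EMPLOYEE table from the slide, with all columns stored as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMPNO TEXT, FIRSTNAME TEXT, LASTNAME TEXT, "
    "BIRTHDATE TEXT, SALARY TEXT)"
)
conn.execute(
    "INSERT INTO EMPLOYEE VALUES ('000020', 'John', 'Smith', '1955-08-21', '52300.00')"
)

employee_xml = ET.tostring(table_to_xml(conn, "EMPLOYEE"), encoding="unicode")
print(employee_xml)
```

The printed document has one `row` element per record, with the same element names as the slide's example.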

  6. Standards: XML/SQL • In addition, the standard says that XML Schemas must be generated for each table, and it also allows relationships between tables to be represented by nesting records from tables in the XML • Variants of this are incorporated into the latest versions of ORACLE • (Slides from Oracle Web Site on ORACLE XML)
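One plausible reading of "nesting records" is that the records of a child table appear inside the matching record of a parent table. The sketch below shows that reading only; it does not follow the standard's actual rules. The DEPARTMENT table, the DEPTNO column, the sample values, and the `nest` helper are all hypothetical and do not come from the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: DEPARTMENT and the DEPTNO join column are not in the lecture.
departments = [{"DEPTNO": "D01", "DEPTNAME": "Research"}]
employees = [
    {"EMPNO": "000020", "LASTNAME": "Smith", "DEPTNO": "D01"},
    {"EMPNO": "000030", "LASTNAME": "Jones", "DEPTNO": "D01"},
]


def nest(parents, children, key, parent_tag, child_tag):
    """Nest each child record inside the parent record that shares its `key` value."""
    root = ET.Element(parent_tag)
    for parent in parents:
        row = ET.SubElement(root, "row")
        for name, value in parent.items():
            ET.SubElement(row, name).text = value
        nested = ET.SubElement(row, child_tag)
        for child in children:
            if child[key] != parent[key]:
                continue
            child_row = ET.SubElement(nested, "row")
            for name, value in child.items():
                if name != key:  # the join key is implied by the nesting
                    ET.SubElement(child_row, name).text = value
    return root


nested_xml = nest(departments, employees, "DEPTNO", "DEPARTMENT", "EMPLOYEE")
print(ET.tostring(nested_xml, encoding="unicode"))
```

Here the foreign key disappears from the employee rows because the enclosing department row already carries it. Whether to drop or keep it is a design choice of this sketch.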

  7. Lecture Outline • XML and DBMS • The Grid and DBMS • The Grid • Data Grids • Grid-based DBMS

  8. Grid-based Digital Libraries • So what’s this Grid thing anyhow? • Data Grids and Distributed Storage • Grid-Based IR • Grid-Based Digital Libraries • Note: This lecture borrows heavily from presentations by Ian Foster (Argonne National Laboratory & University of Chicago), Reagan Moore, and others from the San Diego Supercomputer Center

  9. The Grid: On-Demand Access to Electricity [Figure: quality and economies of scale plotted against time] Source: Ian Foster

  10. By Analogy, A Computing Grid • Decouples production and consumption • Enable on-demand access • Achieve economies of scale • Enhance consumer flexibility • Enable new devices • On a variety of scales: department, campus, enterprise, Internet Source: Ian Foster

  11. What is the Grid? “The short answer is that, whereas the Web is a service for sharing information over the Internet, the Grid is a service for sharing computer power and data storage capacity over the Internet. The Grid goes well beyond simple communication between computers, and aims ultimately to turn the global network of computers into one vast computational resource.” Source: The Global Grid Forum

  12. Not Exactly a New Idea … • “The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.” Fernando Corbato and Robert Fano, 1966 • “We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.” Len Kleinrock, 1967 Source: Ian Foster

  13. But, Things are Different Now • Networks are far faster (and cheaper) • Faster than computer backplanes • “Computing” is very different than pre-Net • Our “computers” have already disintegrated • E-commerce increases size of demand peaks • Entirely new applications & social structures • We’ve learned a few things about software Source: Ian Foster

  14. Computing isn’t Really Like Electricity • I import electricity but must export data • “Computing” is not interchangeable but highly heterogeneous: data, sensors, services, … • This complicates things; but also means that the sum can be greater than the parts • Real opportunity: Construct new capabilities dynamically from distributed services • Raises three fundamental questions • Can I really achieve economies of scale? • Can I achieve QoS across distributed services? • Can I identify apps that exploit synergies? Source: Ian Foster

  15. Why the Grid? (1) Revolution in Science • Pre-Internet • Theorize &/or experiment, alone or in small teams; publish paper • Post-Internet • Construct and mine large databases of observational or simulation data • Develop simulations & analyses • Access specialized devices remotely • Exchange information within distributed multidisciplinary teams Source: Ian Foster

  16. Why the Grid? (2) Revolution in Business • Pre-Internet • Central data processing facility • Post-Internet • Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B) • Business processes increasingly computing- & data-rich • Outsourcing becomes feasible => service providers of various sorts Source: Ian Foster

  17. The Information Grid: Imagine a web of data • Machine Readable • Search, Aggregate, Transform, Report On, Mine Data – using more computers, and fewer humans • Scalable • Machines are cheap – can buy 50 machines with 100 GB of memory and 100 TB disk for under $100K, and dropping • Network is now faster than disk • Flexible • Move data around without breaking the apps Source: S. Banerjee, O. Alonso, M. Drake - ORACLE

  18. The Foundations are Being Laid [Figure: map of sites (Edinburgh, Glasgow, DL, Newcastle, Belfast, Manchester, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Soton) and a network diagram with legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link]

  19. Data Grid Problem • “Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data” • Note that this problem: • Is common to many areas of science • Overlaps strongly with other Grid problems

  20. Data Grids for High Energy Physics [Figure: tiered data-flow diagram; labels as extracted] • Online System: ~PBytes/sec; there is a “bunch crossing” every 25 nsecs; there are 100 “triggers” per second; each triggered event is ~1 MByte in size • Offline Processor Farm: ~20 TIPS; ~100 MBytes/sec • Tier 0: CERN Computer Centre; ~100 MBytes/sec; ~622 Mbits/sec or Air Freight (deprecated) • Tier 1: FermiLab (~4 TIPS), France Regional Centre, Germany Regional Centre, Italy Regional Centre; ~622 Mbits/sec • Tier 2: Tier2 Centres (~1 TIPS each), Caltech (~1 TIPS); HPSS storage shown alongside the centres; ~622 Mbits/sec • Institutes: ~0.25 TIPS; physics data cache; ~1 MBytes/sec • Tier 4: Physicist workstations • Machines shown: Pentium II 300 MHz • 1 TIPS is approximately 25,000 SpecInt95 equivalents • Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server • Image courtesy Harvey Newman, Caltech
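The figures quoted on this slide can be checked with a little arithmetic. The sketch below uses only the numbers that appear on the slide (25 ns bunch crossings, 100 triggers per second, ~1 MByte events, ~622 Mbits/sec links, 25,000 SpecInt95 per TIPS, a ~20 TIPS farm). Two things are assumptions made for the example: decimal units (1 MByte = 10^6 bytes, 1 PByte = 10^15 bytes), and a link that runs continuously at its full nominal rate.

```python
# Numbers taken from the slide.
BUNCH_CROSSING_INTERVAL_S = 25e-9   # one bunch crossing every 25 ns
TRIGGERS_PER_S = 100                # events kept per second
EVENT_SIZE_BYTES = 1e6              # ~1 MByte per event (decimal units assumed)
LINK_BITS_PER_S = 622e6             # ~622 Mbits/sec wide-area link
SPECINT95_PER_TIPS = 25_000         # 1 TIPS is ~25,000 SpecInt95 equivalents
FARM_TIPS = 20                      # offline processor farm

# Crossing rate, and how many crossings occur for each event that is kept.
crossings_per_s = 1 / BUNCH_CROSSING_INTERVAL_S
crossings_per_trigger = crossings_per_s / TRIGGERS_PER_S

# Data rate after triggering: 100 events/s x 1 MByte = 100 MBytes/sec.
trigger_rate_bytes_per_s = TRIGGERS_PER_S * EVENT_SIZE_BYTES

# A 622 Mbit/s link expressed in bytes, and as a fraction of the triggered stream.
link_bytes_per_s = LINK_BITS_PER_S / 8
link_fraction = link_bytes_per_s / trigger_rate_bytes_per_s

# Farm capacity in SpecInt95 equivalents.
farm_specint95 = FARM_TIPS * SPECINT95_PER_TIPS

# Assumption: the link is fully used, 86,400 seconds a day, to move one petabyte.
petabyte_transfer_days = 1e15 / link_bytes_per_s / 86_400

print(f"crossings per second:       {crossings_per_s:.3g}")
print(f"crossings per kept event:   {crossings_per_trigger:.3g}")
print(f"triggered data rate (MB/s): {trigger_rate_bytes_per_s / 1e6:.0f}")
print(f"link rate (MB/s):           {link_bytes_per_s / 1e6:.2f}")
print(f"link / triggered stream:    {link_fraction:.2f}")
print(f"farm SpecInt95 equivalents: {farm_specint95}")
print(f"days to move 1 PB:          {petabyte_transfer_days:.0f}")
```

Under these assumptions a single 622 Mbits/sec link carries somewhat less than the ~100 MBytes/sec triggered stream, and moving a petabyte over it would take months. That may be why the diagram still lists air freight as an alternative.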

  21. Grids and Open Standards [Figure: increased functionality and standardization plotted against time] • Custom solutions • Globus Toolkit: de facto standards; GGF: GridFTP, GSI; X.509, LDAP, FTP, … • Open Grid Services Arch: Web services; GGF: OGSI, … (+ OASIS, W3C); multiple implementations, including Globus Toolkit • App-specific Services

  22. The Grid as Enabler of 21st Century Science • Entirely new approaches to enquiry based on • Deep analysis of huge quantities of data • Interdisciplinary collaboration • Large-scale simulation • Smart instrumentation • Enabled by an infrastructure that provides access to, and integration of, resources & services without regard for location

  23. Not only Science… • The Database world is moving to the Grid for large-scale applications • Oracle 10g is specifically designed to exploit clustered/grid computing using RACs (Real Application Clusters) • An example from the Information/Publishing world… • Presentation from Oracle about Thomson Legal’s use of Oracle 10g and RACs
