
Oracle Solutions for Data Archiving


Presentation Transcript


  1. Oracle Solutions for Data Archiving Luca Canali, CERN/IT-DM Distributed Database Operations Workshop November 11th, 2008

  2. Data life cycle management - WHAT • Use cases ultimately come from the experiments • They typically require removal of large chunks of data on a time-based selection • Related use case: make part of the data read-only

  3. Data life cycle management - WHY • Manageability • A database should not become a ‘data dump’ • Very large DBs are hard to manage • Performance • Some application-critical SQL statements may not scale well (full scans, hash joins, etc.) • More and more attention to SQL tuning is needed with very large tables • Cost • When the DB grows, extra HW resources are needed to maintain the service levels • e.g. extra spindles to provide the IOPS needed to read stale data

  4. Why it’s hard • There is no default Oracle mechanism for archiving parts of an application • It must be implemented as a custom solution • Rules on how to move and restore data must be put in place by the ‘data owners’ • What can help • Oracle features for bulk data movement • These typically require DBA privileges • Additional HW to hold the ‘offline data’

  5. Oracle and large data movements • Normal DML is too slow • delete operations in particular • Alternatives (two are sketched below) • DDL such as partition exchange/movement • Transportable tablespaces • CTAS (create table as select) • Data Pump
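A minimal sketch of two of these techniques, assuming a hypothetical EVENTS table range-partitioned by time and a staging table EVENTS_2008_01 with a matching column layout:

    -- Partition exchange: swaps the partition's segment with the standalone
    -- table as a metadata-only operation, so it is near-instant.
    ALTER TABLE events
      EXCHANGE PARTITION p_2008_01 WITH TABLE events_2008_01
      INCLUDING INDEXES WITHOUT VALIDATION;

    -- CTAS: bulk-copies a time-based slice using direct-path inserts,
    -- far faster than row-by-row DML.
    CREATE TABLE events_archive_2008 NOLOGGING PARALLEL 4
      AS SELECT * FROM events
         WHERE event_time < DATE '2009-01-01';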

  6. Oracle and large data sets • Tables need to allow the extraction of ‘chunks’ in a time-based manner • Oracle partitioning by time is one possibility (sketched below) • Beware of the many limitations of partitioning (it’s not a magic wand) • Do-it-yourself partitioning: generate multiple tables/schemas, for example one per year
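A sketch of time-based range partitioning; table and column names are hypothetical:

    -- Each year lands in its own partition, so a whole year can later be
    -- exchanged out or dropped as a single DDL operation.
    CREATE TABLE measurements (
      meas_id    NUMBER,
      meas_time  DATE NOT NULL,
      payload    VARCHAR2(4000)
    )
    PARTITION BY RANGE (meas_time) (
      PARTITION p_2007 VALUES LESS THAN (DATE '2008-01-01'),
      PARTITION p_2008 VALUES LESS THAN (DATE '2009-01-01'),
      PARTITION p_max  VALUES LESS THAN (MAXVALUE)
    );
    -- The do-it-yourself alternative is simply one table per period,
    -- e.g. MEASUREMENTS_2007, MEASUREMENTS_2008, ...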

  7. Additional challenge • Lookup tables/metadata need to be archived too • Besides the main table(s) to archive • This can involve a large number of tables • It is not easy to identify a ‘consistent set’ to archive
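One way to start mapping such a set is to walk the foreign keys in the data dictionary; a sketch, where LHC_APP and EVENTS are hypothetical names:

    -- List child tables whose foreign keys point at the main table,
    -- i.e. tables that must travel with it to keep the set consistent.
    SELECT c.owner, c.table_name, c.constraint_name
    FROM   dba_constraints c
           JOIN dba_constraints p
             ON  p.owner           = c.r_owner
             AND p.constraint_name = c.r_constraint_name
    WHERE  c.constraint_type = 'R'      -- referential (foreign key)
    AND    p.owner           = 'LHC_APP'
    AND    p.table_name      = 'EVENTS';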

  8. Technology • Main requirements • Must be able to restore the data if and when needed (over a time scale of 10 years, i.e. 3-4 Oracle versions) • Must keep track of what is where (see the sketch below) • Should use one of the few Oracle technologies suited to moving large chunks of data • Possible solutions • Move data to a specialized Oracle database that collects ‘application chunks’ • Export to files (flat files or Data Pump dumps) is not considered reliable given current experience
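What ‘keeping track of what is where’ could look like: a catalog table held in the archive database, one row per archived chunk (all names are hypothetical):

    CREATE TABLE archive_catalog (
      chunk_id        NUMBER        PRIMARY KEY,
      source_db       VARCHAR2(30)  NOT NULL,   -- production service name
      source_schema   VARCHAR2(30)  NOT NULL,
      source_table    VARCHAR2(30)  NOT NULL,
      time_range_lo   DATE          NOT NULL,   -- time-based selection bounds
      time_range_hi   DATE          NOT NULL,
      archive_schema  VARCHAR2(30)  NOT NULL,   -- where the chunk now lives
      archive_table   VARCHAR2(30)  NOT NULL,
      moved_on        DATE          DEFAULT SYSDATE,
      oracle_version  VARCHAR2(20)              -- version that wrote the chunk
    );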

  9. Possible Scenarios • Scenario 1 • Application owners identify consistent sets of data to be moved • The identified data sets are moved by the DBAs (sketched below) • Data is then dropped from production • requires considerable effort from the developers • gives the cleanest result
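A sketch of the DBA step, pulling an identified data set into the archive database over a database link; PROD_LINK, LHC_APP and LHC_ARCH are hypothetical names:

    -- On the archive DB: direct-path copy of the identified slice.
    CREATE TABLE lhc_arch.events_2008 NOLOGGING
      AS SELECT * FROM lhc_app.events@prod_link
         WHERE event_time < DATE '2009-01-01';

    -- Sanity check before anything is dropped from production.
    SELECT (SELECT COUNT(*) FROM lhc_app.events@prod_link
            WHERE event_time < DATE '2009-01-01') AS prod_rows,
           (SELECT COUNT(*) FROM lhc_arch.events_2008) AS archived_rows
    FROM   dual;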

  10. Possible Scenarios • Scenario 2 • A clone (complete copy) of the selected applications is taken (for example at the end of each year) and archived with the chosen archive technology • Application owners then take care of deleting from production, trusting that the data has been archived and is kept • easy to implement • more data than strictly needed is copied
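One way to take such a clone without intermediate dump files is a network-mode Data Pump job run on the archive database; a sketch, where LHC_APP and PROD_LINK are hypothetical names:

    -- Pull the whole application schema from production over a database
    -- link and land it under a year-stamped schema name in the archive DB.
    DECLARE
      h      NUMBER;
      state  VARCHAR2(30);
    BEGIN
      h := DBMS_DATAPUMP.OPEN(operation   => 'IMPORT',
                              job_mode    => 'SCHEMA',
                              remote_link => 'PROD_LINK');
      DBMS_DATAPUMP.METADATA_FILTER(h, 'SCHEMA_EXPR', 'IN (''LHC_APP'')');
      DBMS_DATAPUMP.METADATA_REMAP(h, 'REMAP_SCHEMA', 'LHC_APP', 'LHC_APP_2008');
      DBMS_DATAPUMP.START_JOB(h);
      DBMS_DATAPUMP.WAIT_FOR_JOB(h, state);
    END;
    /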

  11. Example of data movements
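One possible illustration, using the transportable tablespace route from slide 5; the tablespace name, directory object and file path are hypothetical:

    -- 1. Freeze the tablespace on production (this also covers the
    --    'make part of the data read-only' use case).
    ALTER TABLESPACE data_2008 READ ONLY;
    -- 2. Export its metadata with Data Pump (at the OS prompt):
    --      expdp system DIRECTORY=dpump_dir DUMPFILE=tts_2008.dmp
    --            TRANSPORT_TABLESPACES=data_2008
    -- 3. Copy the datafiles and the dump file to the archive host, then:
    --      impdp system DIRECTORY=dpump_dir DUMPFILE=tts_2008.dmp
    --            TRANSPORT_DATAFILES='/arch/oradata/data_2008_01.dbf'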

  12. Example of data purging after archive
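A sketch of the purge step itself, reusing the hypothetical names from the earlier sketches; it runs on production only after the chunk is confirmed present in the archive:

    -- DDL-based purge: dropping a partition is far faster than DELETE.
    ALTER TABLE lhc_app.events
      DROP PARTITION p_2008 UPDATE GLOBAL INDEXES;

    -- Do-it-yourself variant (one table per year): drop the whole table.
    -- DROP TABLE lhc_app.events_2008 PURGE;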

  13. Conclusions • LHC databases are growing fast • Offlining data offers significant performance and cost gains • Mechanisms for offlining data are application-dependent • DB services aim to provide ideas and tools to move and restore data • A large Oracle ‘Archive DB’ could be part of the solution

  14. Q&A Thank you
