140 likes | 146 Views
Oracle Solutions for Data Archiving. Luca Canali, CERN/IT-DM Distributed Database Operations Workshop November 11 th , 2008. Data life cycle management - WHAT. Use cases ultimately come from the experiments Typically require removal of large chunks of data on a time-based selection
E N D
Oracle Solutions for Data Archiving Luca Canali, CERN/IT-DM Distributed Database Operations Workshop November 11th, 2008
Data life cycle management - WHAT • Use cases ultimately come from the experiments • Typically require removal of large chunks of data on a time-based selection • Related use case: put read only part of data
Data life cycle management - WHY • Manageability • A database should not become a ‘data dump’ • Very large DBs are hard to manage • Performance • Some application-critical SQL statements may not scale well (full scans, hash joins, etc) • More and more attention to SQL tuning is needed with very large tables • Cost • When the DB grows extra HW resources are needed to maintain the service levels • ex: extra spindles to provide IOPS to read stale data
Why it’s hard • There is no default Oracle mechanism to archive parts of applications • It must be custom implemented • Rules must be set in place by the ‘data owners’ on how to move and restore • What can be of help • Oracle features for bulk data movement • Typically require DBA privileges • Additional HW to hold ‘offline data’
Oracle and large data movements • Normal DML is too slow • delete operations in particular • Alternatives • DDL such as partition exchange/movement • Transportable tablespace • CTAS (create table a select) • Datapump
Oracle and large data sets • Data (tables) need to have the possibility to allow extracting ‘chunks’ in a time-based manner • Oracle partitioning by time is a possibility • Beware of many limitations of partitioning (it’s not a magic wand) • Do-it-yourself partitioning by generating multiple tables/schema every year (for example)
Additional challenge • Lookup tables/metadata is needed too • Besides the main table(s) to archive • Can be a large number of tables • Not easy to identify a ‘consistent set’ to archive
Technology • Main requirements • Must be able to restore the data if and when need (over a time scale of 10 years, i.e. 3-4 Oracle versions) • Must keep track of what is where • Should use one of the few Oracle technologies for moving large chunks of data • Possible solutions • Move data to a specialized Oracle database to collect ‘application chunks’ • Export to files (flat files or datapump) is not considered reliable given current experience
Possible Scenarios • Scenario 1 • Application owners identify consistent sets of data to be moved • Identified data sets are moved by the DBAs • Data is then dropped from production. • requires considerable effort from the developers • gives the cleaner result
Possible Scenarios • Scenario 2 • A clone (complete copy) of the selected applications is taken (for example at the end of each year) and archived according to the chosen archive technology • Application owners then take care of deleting from production trusting that data is archived and kept. • easy to implement • more data than strictly needed is copied
Conclusions • LHC databases are growing fast • Offline data for significant performance and cost gains • Mechanisms for offlining are application dependent • DB services and ideas to provide tools to move and restore data • A large Oracle ‘Archive DB’ could be part of the solution
Q&A Thank you