Explore the complexities of managing petabyte tables with billions of rows, complex data structures, and diverse query workloads. Learn about smart partitioning, query execution challenges, federated database solutions, and backup strategies to optimize VLDB performance and reliability.
V²LDB
Boris Gelman
Vice President, Architecture, Information Services, VISA
bgelman@visa.com
V²LDB: The Concept
• V²LDB = Very Very Large Database:
  • New concept or change to VLDB concept?
• Data Structure:
  • Petabyte tables with 100s of billions of rows
  • Complex table structures
  • Non-uniform physical data representation of petabyte tables
• Query:
  • Well-defined subsets (index and/or partition) on tables: small (~10,000 rows) -> medium (~300,000) -> large (~1,000,000)
  • Undefined subsets: very large (~1,000,000,000 rows) -> very very large (~100,000,000,000)
  • Complex joins
  • Complex group by's and sorts
• Workload:
  • Multiple categories of queries running concurrently (transaction research, analytics, data mining)
  • Inserts and selects running concurrently against the same tables
  • 24 * 7 operation with very limited maintenance windows
  • Very strict SLAs
V²LDB: Problems
• Data Partitioning:
  • Smart partitioning: hash, expression, … -> hybrid multi-level partitioning
  • Smart partition manipulation: detach / attach partition online
• Query Execution:
  • Hash join on petabyte tables?
• Performance Tuning does not work:
  • Adaptive and buffer-pool aware query optimization?
  • System-category aware query optimization?
  • Optimizer efficiency?
• Backup/Restore does not work:
  • Data replication is not a substitute for backup: data corruption, application errors, human errors
  • Smart backup/restore related to smart data partitioning!
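The hybrid multi-level partitioning and online detach / attach mentioned on this slide can be illustrated with a small in-memory sketch. It is not taken from the presentation: the class `PartitionedTable`, the function `partition_key`, the choice of year-month as the level-1 expression, and the constant `HASH_BUCKETS` are all hypothetical, and a real database would implement these operations in its storage engine rather than in application code.

```python
import zlib
from collections import defaultdict
from datetime import date

HASH_BUCKETS = 4  # assumed number of second-level hash partitions


def partition_key(txn_date, account_id):
    """Two-level key: an expression on the date, then a hash of the account id."""
    level1 = f"{txn_date.year:04d}-{txn_date.month:02d}"  # expression partition
    level2 = zlib.crc32(account_id.encode("utf-8")) % HASH_BUCKETS  # hash partition
    return level1, level2


class PartitionedTable:
    """Hypothetical toy table whose rows live in (level1, level2) partitions."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, txn_date, account_id, amount):
        key = partition_key(txn_date, account_id)
        self.partitions[key].append((txn_date, account_id, amount))

    def detach(self, level1):
        """Remove every sub-partition under one level-1 value and return them."""
        keys = [k for k in self.partitions if k[0] == level1]
        return {k: self.partitions.pop(k) for k in keys}

    def attach(self, detached):
        """Put previously detached sub-partitions back into the table."""
        for key, rows in detached.items():
            self.partitions[key].extend(rows)

    def row_count(self):
        return sum(len(rows) for rows in self.partitions.values())
```

In this sketch a whole month can be detached as one unit (for example, to archive or back it up separately) while the remaining partitions stay available, which is one way to read the slide's link between smart partitioning and smart backup/restore.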
V²LDB: Problems
• Database Federation:
  • Single database system cannot hold a combination of ODS (> 1 PB) and cross-functional multi-subject DW (> 200 TB) - it is impractical
  • Data Abstraction Layer: federated tables partitioned across multiple database systems!
  • Federated Database is easier to maintain and backup, and availability is higher!
  • Federated Database Performance = Single Database System Performance!!!
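A data abstraction layer over federated tables might look roughly like the sketch below. It is an illustration under stated assumptions, not the architecture from the presentation: the class `FederatedTable`, the table name `txn`, and hash routing on `account_id` are hypothetical, and in-memory SQLite databases (Python's standard `sqlite3` module) stand in for the separate database systems.

```python
import sqlite3
import zlib


class FederatedTable:
    """Hypothetical abstraction layer: one logical table spread over several databases."""

    def __init__(self, member_count):
        # Each in-memory SQLite database stands in for a separate database system.
        self.members = [sqlite3.connect(":memory:") for _ in range(member_count)]
        for db in self.members:
            db.execute("CREATE TABLE txn (account_id TEXT, amount REAL)")

    def _member_for(self, account_id):
        # Route by a stable hash of the key so a row always maps to the same member.
        index = zlib.crc32(account_id.encode("utf-8")) % len(self.members)
        return self.members[index]

    def insert(self, account_id, amount):
        self._member_for(account_id).execute(
            "INSERT INTO txn VALUES (?, ?)", (account_id, amount)
        )

    def total_for_account(self, account_id):
        # Well-defined subset: only the member that owns the key is queried.
        row = self._member_for(account_id).execute(
            "SELECT COALESCE(SUM(amount), 0) FROM txn WHERE account_id = ?",
            (account_id,),
        ).fetchone()
        return row[0]

    def total(self):
        # Undefined subset: fan out to every member and merge the partial sums.
        return sum(
            db.execute("SELECT COALESCE(SUM(amount), 0) FROM txn").fetchone()[0]
            for db in self.members
        )
```

Callers see a single logical table; key-based lookups touch one member, while full scans fan out to all members and merge the partial results. The sketch says nothing about whether federated performance matches a single system, which is the open claim on the slide.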