Managing Petabyte Database Workloads: Netezza NPS System Challenges

This presentation covers the challenges of analyzing petabyte-scale databases on the Netezza NPS system: large data volumes, query performance, system software, hardware, and data management. It also outlines opportunities for university researchers to contribute solutions and explore new applications.


Presentation Transcript


  1. IISWC 2007 Panel: Analyzing Petabytes. Suchi Raman, Netezza Corp. http://www.netezza.com/

  2. Petabyte Database Workloads
     • Macro-analytic queries
     • Identify trends and patterns
     • Very large data volumes
     • Query times dominated by disk scan times
     • Micro-analytic queries
     • Short running queries
     • Query run once and stored
     • Pre-computed summaries
     • Data management
     • ETL load/unload
     • Backup/restore
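The point that macro-analytic query times are dominated by disk scan times can be illustrated with a back-of-the-envelope estimate. The sketch below is not from the presentation: the function name, the disk count, and the per-disk bandwidth are hypothetical values chosen only to show the arithmetic, and the model assumes every disk streams in parallel with no other bottleneck.

```python
def full_scan_seconds(table_bytes, num_disks, mb_per_sec_per_disk):
    """Best-case time to scan a table when all disks stream in parallel.

    Assumes data is spread evenly across disks and nothing else
    (CPU, network) limits throughput.
    """
    aggregate_bytes_per_sec = num_disks * mb_per_sec_per_disk * 1_000_000
    return table_bytes / aggregate_bytes_per_sec


PETABYTE = 10**15  # decimal petabyte, in bytes

# Hypothetical configuration: 1000 disks at 50 MB/s each.
seconds = full_scan_seconds(PETABYTE, num_disks=1000, mb_per_sec_per_disk=50)
print(f"{seconds / 3600:.1f} hours")  # prints "5.6 hours"
```

Even under these idealized assumptions a single full scan takes hours, which suggests why the later slides focus on raising effective disk bandwidth and minimizing disk reads.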

  3. Netezza NPS System

  4. Software challenges
     • Effective disk bandwidth
     • Optimal data layouts
     • Data compression
     • Increased effective disk bandwidth (and reliability!)
     • Upgrades and evolution of on-disk formats
     • Minimize disk reads (indexes, caches)
     • Query processing algorithms
     • Skew avoidance algorithms
     • Scheduling among queries, especially with mixed workloads combining large and small queries
     • System Monitoring/profiling
     • System monitoring during busy periods
     • Accurate profiling techniques
     • Data management challenges
     • High speed data path in/out of NPS system
     • Efficient/flexible data formats for load/unload
     • Infrastructure challenge – fast external devices for sourcing/sinking data
     • Custom functions (UDFs/UDAs) implemented within the system
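Two of the software challenges above lend themselves to a small numeric sketch: compression as a way to raise effective disk bandwidth, and data skew across parallel nodes. The code below is an illustration under stated assumptions, not Netezza's implementation. The function names are hypothetical, the compression model ignores decompression cost, and rows are placed on nodes with a simple modulo on an integer key as a stand-in for whatever distribution scheme the real system uses.

```python
from collections import Counter


def effective_bandwidth(raw_mb_per_sec, compression_ratio):
    """Logical MB/s delivered when each physical MB read expands
    to `compression_ratio` MB of data (decompression cost ignored)."""
    return raw_mb_per_sec * compression_ratio


def skew_factor(keys, num_nodes):
    """Ratio of the busiest node's row count to the mean row count.

    1.0 means perfectly even distribution. A parallel scan finishes
    only when the busiest node finishes, so a factor of k means the
    scan takes roughly k times longer than the balanced case.
    """
    counts = Counter(key % num_nodes for key in keys)
    per_node = [counts.get(node, 0) for node in range(num_nodes)]
    mean = sum(per_node) / num_nodes
    return max(per_node) / mean


# Hypothetical numbers: 50 MB/s raw disk rate, 3x compression.
print(effective_bandwidth(50, 3))          # 150

# Unique keys spread evenly; a single repeated key lands on one node.
print(skew_factor(range(1000), 10))        # 1.0
print(skew_factor([7] * 1000, 10))         # 10.0
```

The second skew case shows why distributing on a low-cardinality column is costly: all rows land on one of ten nodes, so the other nine sit idle during the scan.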

  5. Hardware challenges
     • Increased effective disk bandwidth (and reliability!)
     • Multi-core technology
     • Balancing CPU-to-disk ratio
     • Specialized engines (e.g., FPGA-based filtering)
     • Faster internal and external connectivity
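The interplay between the CPU-to-disk ratio and a filtering engine placed ahead of the CPU can be sketched with a simple throughput comparison. This is a toy model under assumed numbers, not a description of the NPS hardware: the function name, the rates, and the `filter_pass_fraction` parameter (the share of scanned bytes that survives filtering) are all hypothetical.

```python
def bottleneck(disk_mb_per_sec, cpu_mb_per_sec, filter_pass_fraction=1.0):
    """Return which stage limits throughput: 'cpu' or 'disk'.

    A filter stage between disk and CPU (for example, FPGA-based
    filtering) forwards only `filter_pass_fraction` of the scanned bytes.
    """
    cpu_input_mb_per_sec = disk_mb_per_sec * filter_pass_fraction
    return "cpu" if cpu_input_mb_per_sec > cpu_mb_per_sec else "disk"


# Without filtering, disks at 200 MB/s outrun a CPU that handles 100 MB/s.
print(bottleneck(200, 100))        # cpu

# If the filter passes only 25% of the bytes, the CPU sees 50 MB/s.
print(bottleneck(200, 100, 0.25))  # disk
```

In this model, early filtering shifts the bottleneck back to the disks, which is the balanced state a scan-dominated workload would aim for.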

  6. How can University Researchers contribute?
     • Explore new applications and data types
     • E.g., network traffic analysis
     • Geospatial data
     • Biological data types
     • Skew avoidance/scheduling algorithms
     • Applications built on UDFs/UDAs
     • Verification methods for optimizer algorithms
     • Platform improvements
     • Disk performance and reliability
     • FPGA filtering algorithms
     • Faster interconnect networks
     • Power and cooling improvements
