best practices in fast track parallel data warehouse and traditional data warehouse design l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design PowerPoint Presentation
Download Presentation
Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design

Loading in 2 Seconds...

play fullscreen
1 / 42

Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design - PowerPoint PPT Presentation


  • 183 Views
  • Uploaded on

SESSION CODE: BIE306. Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design. Jeff Spiller ESS Performance and Solutions Engineering COE (Center of Excellence) BI ATC (Business Intelligence - Advanced Technology Center).

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design' - vail


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
best practices in fast track parallel data warehouse and traditional data warehouse design

SESSION CODE: BIE306

Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design

Jeff Spiller

ESS Performance and Solutions Engineering COE (Center of Excellence)

BI ATC (Business Intelligence - Advanced Technology Center)

Fast Track, Microsoft SQL Server 2008 Parallel Data Warehouse and Traditional Data Warehouse Design

BI Best Practices and Tuning for Scaling SQL Server 2008

objectives for this session
Objectives for this Session
  • Understand why Fast Track and the SQL Server 2008 Parallel Data Warehouse (PDW) providefaster I/O bandwidth over traditional data warehouse designs
  • Understand differences between scale-up and scale-out data warehouse designs
  • Review reference configurations for traditional database design, Fast Track and PDW appliances
  • Discuss best practices which will allow customers to more effectively exploit hardware to support end user BI workloads
the continuum to sql server 2008 r2 scale up4
The Continuum to SQL Server 2008 R2 (scale-up)
  • Fast Track = scale up

OLTP systems

SSIS

Fast Track

the continuum to sql server 2008 r2 scale up parallel data warehouse pdw

Mid-

Tier

FastTrack

RDBMS

The Continuum to SQL Server 2008 R2 (scale-up) Parallel Data Warehouse (PDW)
  • Fast Track = scale up
  • PDW = scale out

OLTP systems

SSIS

PDW

the continuum to sql server 2008 r2 scale up parallel data warehouse pdw6

Mid-

Tier

FastTrack

RDBMS

SSAS

The Continuum to SQL Server 2008 R2 (scale-up) Parallel Data Warehouse (PDW)
  • Fast Track = scale up
  • PDW = scale out
  • Hub-and-spoke architecture to include support for SMP spokes

OLTP systems

SSIS

PDW

Hub

Traditional MD design

PDW

Mid-

Tier

RDBMS

RDBMS

SSAS

hp has two types of rolap reference configurations
HP Has Two Types of ROLAPReference Configurations

Microsoft/HP

Fast Track reference configurations

OR

SQL Server Parallel Data Warehouse (PDW)

SQL Server/HP

Traditional DW design reference configurations

Mmm, what will my logical & physical DB design look like ?

Lower hardware costs

Different logical and physical DB design philosophies

traditional rolap data warehouse data mart physical design
Traditional ROLAP (data warehouse/data mart) Physical Design
  • Create file group
    • 1 LUN may be physically stripped
  • Create table:
    • on 1 LUN
    • partition table across multiple LUNs for additional parallelism
    • balance workload across filegroups, LUNS - which, in turn, gets stripped across many physical disks
  • PRO: Data is distributed and I/Os are parallelized across multiple physical disk drives
  • CON: Little attention is traditionally paid to how or where data is physically loaded or indexes built

RAID 5

  • It is not uncommon to have hundreds of disk drives to support the I/O throughput requirements in a traditional DW environment
slide10
How does Fast Track and PDW get it’s speed ?

X-Ray view at the physical disk level

First let’s look at a traditional DW…..

How To Gain Speed While Reducing CostsLet’s lake a closer look under the hood to understand BI performance
traditional rolap physical database design loading data into the data warehouse mart
Traditional ROLAP Physical Database Design Loading data into the data warehouse/mart
  • ETL batch jobs are executed in parallel to complete the batch load window as fast as possible
    • Data from multiple fact tables data may be loaded simultaneously from to speed up ETL batch load window
    • Some data warehouses may use a trickle feed to load data, close to real time
    • Dimension table insert/updates may occur in parallel
    • Indexes may be updated in real time as rows are inserted or updated - or -
    • Indexesmay be rebuilt after data is loaded
  • Traditional ROLAP design tends to encourage physical disk fragmentation

Sequential data

Fact table

3rd day load

Fact table

6th day load

Fact table

5th day load

Fact table

2nd day load

Data is stored wherever it happens to land

Fact table

Initial load

traditional rolap physical database design trade offs using indexes in data warehouses data marts
Traditional ROLAP physical database designTrade-offs using indexes in data warehouses/data marts
  • ETL batch windows may take a long time because building indexes can be resource intensive
  • Indexes (duplicate data) consume a large amount of disk space (4x – 10x less space)
  • DBAs spend a lot of time managing and tuning indexes
  • Indexes may reduce the number of disk I/Os to service a query. But, at the cost of slower disk service times due to extra disk head movement “directly access data that index points to”
    • Proof is in monitoring ad-hoc query response times using PDW or Fast Track “index-lite” designs vs. traditional “index-heavy” database designs
  • Customized indexes and summary tables may speed up specific queries. But, ad-hoc queries may get inconsistent response times because of slower scan rates & long disk service times)

Duplicate data

Column

Index /

Column

Pre-Calculated

data

summary why fast track pdw provide high levels of i o throughput
SummaryWhy Fast Track & PDW provide high levels of I/O throughput…

Fast Track & PDW

Eliminating indexes and storing data sequentially will provide the fastest disk throughput rates

Summary table

Fast Track & PDW

Index-lite

Fastest sequential scan rates

Index

Index-lite is faster because there is less disk head movement

Traditional DW design with indexes & summary tables

Disk throughput is slower with indexes, aggregates and summary tables

how does fast track and pdw reduce data warehouse or data mart storage costs
How Does Fast Track and PDW Reduce Data Warehouse or Data Mart Storage Costs ?
  • Efficient data loads
  • Reduce or eliminate indexes

Example: Average disk Seek time is typically about 4ms; Full stroke is about 7.5ms.

At 15K RPM = 250 revolutions/sec. = 4ms for a full revolution = Average latency is about 2ms.

Fast Track & PDW are designed to stream large blocks of data sequentially which is even faster than “average latency” because disk heads are directly over the streaming data.

why does pdw and fast track want data to b e stored sequentially

Why does PDW and Fast Track want data to be stored sequentially ?

Why Does PDW and Fast Track Want Data to be Stored Sequentially ?

Track of data

Latency

Virtually

No seek time

Disk drive manufacturers sometimes quote average seek time and max/full seek time numbers

Seek time is typically 2 - 4x longer than average latency. By eliminating seek time you can have approximately 2 – 4x fewer disk drives in order to maintain a given throughput level.

Fast Track & PDW are designed to stream large blocks of data sequentially!

what is t he most efficient way for fast track and pdw to store data
What is the Most Efficient Way for Fast Track and PDW to Store Data ?
  • Outer tracks
    • Hold more data (Good for fact tables)
    • Outside tracks can pass more data under the heads in a single rotation (good for fact tables)
    • Inner tracks better for smaller dimension tables (and they should cached anyway &/or replicated in PDW)
  • Organizing data on the physical disk drives to physically eliminate seek time can be the difference of requiring 64 disks vs. 200 disk drives to provide the similar levels of I/O throughput
bottom line hp sql server and fast track concepts provide you with speed while reducing costs
BOTTOM LINE: HP, SQL Server and Fast Track ConceptsProvide You with Speed while Reducing Costs

In addition, HP and SQL Server PDW uses Massively Parallel Processing (MPP) to expand Fast Track concepts in a BI “appliance”

Fast scan rates

Fast Track and PDW get it’s speed from FAST scan rates !

hp has two types of rolap reference configurations fast track or pdw db design
HP Has Two Types of ROLAP – Reference ConfigurationsFast Track or PDW DB Design
  • Traditional DB design reference configurations
  • HP Business Intelligence Sizer for Microsoft SQL Server 2005/2008 http://h71019.www7.hp.com/ActiveAnswers/us/en/sizers/microsoft-sql-bi.html
  • Fast Track

http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA3-0347ENW.pdf

http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA3-0314ENW.pdf

  • SQL Server 2008 Parallel Data Warehouse (PDW) – “Appliance”

Fast Track or PDW

Traditional DB design

hp fast track data warehousing continuum
HP Fast Track Data Warehousing Continuum
  • Scales from SMB to Enterprise
    • Prescriptive guidance and optimized methodology for deploying a data warehouse
      • Targeted at query workloads patterned for large sequential data reads
    • Balanced hardware approach
    • Supports up to 48TB Data Warehouse at leading price/performance metrics
    • HP provides configurations,tested performance guidanceand best practices for deploying/operating/managing

Basic6 – 12TBDL38x w/MSA2000

Mainstream12 – 24TBDL585 G6 w/MSA2000

Mainstream16 – 32 TB DL580 G5 w/MSA2000 G2

Premium24 – 48 TBDL785 G6 w/MSA2000 G2

free your it pressures get more value
Free Your IT Pressures . . . Get More Value

Without HP Factory Express

With HP Factory Express

Faster time to solution

Free up valuable IT resources

Maximize your IT investment

chip manufacturer features breakdown fast track uses high end x86 servers
Chip Manufacturer Features BreakdownFast Track uses high end x86 servers
  • Intel Features (DL580)
    • Scalable to 24 cores (4 processor)
    • Clock Speeds (up to 2.67 GHz) – on 6-core model
    • Memory expandable to 256 GB
    • I/O expansion slots (11)
    • Up to 16 internal disks
    • x64 Architecture
    • 4U
  • AMD Features (DL785)
    • Scalable to 48 cores (8 processor)
    • Clock Speeds (up to 2.8 GHz) – on 6 core model
    • Memory expandable to 512 GB
    • I/O expansion slots (11)
    • Up to 8 internal disks
    • x64 Architecture
    • 7U
sql server 2008 techniques to improve performance
SQL Server 2008 Techniques to Improve Performance
  • Database Services
    • Separate the DB/NON DB workloads
      • SSIS/AS on a separate machine / SQL Server only machine
    • Compressing Fact Tables and Indexes
      • PDW automatically uses compression
      • Fast Track recommends compression
        • Compression Reduces I/O workload but Increases CPU utilization
        • Increases Query performance due to more pages being read via each I/O
        • Compression can be enabled selectively
    • Compressing Backup
sql server 2008 techniques to improve performance26
SQL Server 2008 Techniques to Improve Performance
  • Database Services
    • Table Partitioning
      • Align table storage to a business requirement driven boundary
        • Primary key considerations
        • PDW uses hash key to balance rows across the servers and storage subsystem
      • Indexes
        • Traditional – create indexes based upon query workload
        • Fast Track & PDW use no indexes or Index-lite
sql server 2008 techniques to improve performance27
SQL Server 2008Techniques to Improve Performance
  • Database Services
    • Star Schema Optimization
      • Queries that are directed towards star schema are optimized using bitmap filters. This greatly improves query performance by ignoring non-qualifying rows while the fact table is being read
sql server 2008 techniques to improve performance28
SQL Server 2008Techniques to Improve Performance
  • Database Services
    • Bit Map Filter Must Haves
    • In Your Database Design you MUST:
      • Fact tables are expected to have at least 100 pages. The optimizer considers smaller tables to be dimension tables
      • Only inner joins between a fact table and a dimension table are considered
      • The join predicate between the fact table and dimension table must be a single column join, but does not need to be a primary-key-to-foreign-key relationship. An integer-based column is preferred
      • Joins with dimensions are only considered when the dimension input cardinalities are smaller than the input cardinality from the fact table
        • Make sure table “stats” are updated
sql server 2008 techniques to improve performance29
SQL Server 2008 Techniques to Improve Performance
  • Database Services
    • Resource Governor
        • Only within a SQL Server instance
        • Controls CPU & memory resources used by SQL Server
          • Classify incoming connections and route their workloads to a specific group classification
          • User pool/group
            • Normal, Advanced and power users
            • Set workload group priority
        • Allows for Internal/Default and User created Resource Pools
    • PDW uses the Resource Governor
sql server analysis services 2008 techniques to improve performance
SQL Server Analysis Services 2008 Techniques to Improve Performance
  • SSAS
    • SSAS has to major components
      • Formula Engine (does most of the analysis work and tries to keep cells in memory) – Fast clock speeds are best
      • Storage Engine(if cells are not in memory, the Storage Engine gets the data from disk) – Goal is to minimize Storage Engine use and keep data in memory for the Formula Engine to use
        • Faster Storage (SSD) OR more disk drives for quicker responses to Storage Engine
    • Manage your partitions in your AS Database by query performance required
      • Because Large Cubes > 100 GB may not fit in memory. So we design the partitions to get into memory as quickly as possible.
    • Best Practice – less than 4 million cells per partition
sql server analysis services 2008 techniques to improve performance32
SQL Server Analysis Services 2008Techniques to Improve Performance
  • SSAS – Settings to Remember
    • Memory\TotalMemoryLimit is 80% of physical memory on the server by default
    • Manually set memory for SSAS cubes
      • This is especially important if you run SSAS cubes on the same server as a ROLAP data warehouse/mart
    • If set too high, SSAS will page out of Memory
      • SSAS page faulting is worse than simply allocating less memory to the cubes and forcing the Storage Engine to retrieve cells fro disk
    • Things to Monitor with System Monitor
      • SSAS:Memory\Memory Usage KB
      • SSAS:Proc Aggregations\Temp
      • Context Switches/sec

Tune memory

sql server analysis services 2008 techniques to improve performance33
SQL Server Analysis Services 2008Techniques to Improve Performance
  • SSAS – Process Measurement
    • WSRM
      • Windows System Resource Manager (WSRM) is a feature of Windows Server 2003, Enterprise and Datacenter editions
      • Using WSRM, administrators can control how CPU resources are allocated to applications, services, and processes
      • WSRM is useful in a consolidated environment (SQL Server, SSAS, SSIS, etc.)
      • Manage CPU and memory utilization
        • WSRM:Process\Actual Managed CPU% counters to monitor CPU utilization
      • Do not use WSRM to set SQL Server instance limits. SQL Server has it’s own settings you can tune
sql server analysis services 2008 techniques to improve performance34
SQL Server Analysis Services 2008Techniques to Improve Performance
  • SSAS – Optimization Strategy
    • Use FAST I/O via the DataDir property
      • Move active partitions to fast storage or SSD implementation
    • Use a Dedicated Temp Storage Directory via Tempdir
      • Keep drive contention minimized.
      • SSD is excellent for this purpose
    • Keep partitions sized reasonably < 4 million cells
      • Based upon the number of facts within your AS Database
    • Configure memory based upon your usage using the “Aggregation Design Wizard”
sql server integration services 2008 techniques to improve performance
SQL Server Integration Services 2008Techniques to Improve Performance
  • SSIS – Optimization Strategy
    • Parallelism
      • Multiple copies against the same source
        • Provides more pipeline to move data
        • Separate process space for each package
        • Must have a control process to resync the package from async mode
        • Provides more execution trees for more buffers to be allocated
sql server integration services 2008 techniques to improve performance36
SQL Server Integration Services 2008Techniques to Improve Performance
  • SSIS – Optimization Strategy
    • Blocking “Row-at-a-time inserts” may be better than bulk load
    • Parallelism and More Pipeline

Buffers are allocated via Execution Trees

Each of these Numbered Steps represents a new Execution Tree

Spawning multiple copies of the package with a horizontal partition of data will create more process space and execution trees

resources

Required Slide

Resources

Learning

  • Sessions On-Demand & Community
  • Microsoft Certification & Training Resources

www.microsoft.com/teched

www.microsoft.com/learning

  • Resources for IT Professionals
  • Resources for Developers

http://microsoft.com/technet

http://microsoft.com/msdn

slide39

Required Slide

Complete an evaluation on CommNet and enter to win!

slide40

Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st

http://northamerica.msteched.com/registration

You can also register at the North America 2011 kiosk located at registrationJoin us in Atlanta next year

slide41

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.