1 / 20

Designing VLDBs

Designing VLDBs. Praveen Srivatsa Director | AstrhaSoft Consulting blogs.asthrasoft.com/praveens | praveens@asthra.net. Table and Index Partitioning. Designed for … VLDB w/very large tables (100’s GB) Large machines with 8, 16, 32, and more ‘real’ CPUs

barb
Download Presentation

Designing VLDBs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Designing VLDBs Praveen Srivatsa Director|AstrhaSoft Consulting blogs.asthrasoft.com/praveens|praveens@asthra.net

  2. Table and Index Partitioning • Designed for … • VLDB w/very large tables (100’s GB) • Large machines with 8, 16, 32, and more ‘real’ CPUs • Replacing partitioned views where the partitions are in a single database • Improving large data set manageability • Loading • Efficient dropping of whole partitions

  3. How do you do Partitioning? • Before a table is partitioned two things have to be created, a partition Function and Schema • Partition Function • Each row of an index/table is assigned to a partition (numbered 1, 2, 3, ...) by using a “Partition Function” • SQL Server 2005 supports Range partitioning onlyon a single column • User defines the key column, the number of partitions, and the partition boundary points • Partition Scheme • Each partition for a partition function is mapped to a physical storage location (Filegroup) through a “Partition Scheme”

  4. Rules For Partitions • Restrictions are derived from the principles of no data movement and no scans • All target indexes must exist in the source • All indexes must be aligned (use the same partition function as was used on the table) • Corresponding target index is in the same filegroup as the source • There must be a constraint on the source to ensure all its data “fits” into the target partition

  5. Example Code • CREATE PARTITION FUNCTIONtesttablepart_fn (int) • AS RANGE LEFT FOR VALUES (-1,10, 20, 30) • CREATE PARTITION SCHEMEtesttablepart_ps • AS PARTITION testtablepart_fn TO • ([Filegroup4], [Filegroup1], [Filegroup1], [Filegroup3], [Filegroup2]) • The order of filegroups being displayed is significant • CREATE TABLE Employees (EmpId int, EmpName varchar(50)) ON testtablepart_ps(EmpId)

  6. Partitioning Key restrictions • Must be based on columns in the table • Restrictions similar to index key limitations • Only 1 column is permitted • No Text/NText/Image in partition key • No varchar(max) • No timestamp • Only “native types” – no user-defined types • Column values must be deterministic • Computed columns must be persisted (a new feature in 2005) • Maximum 1000 partitions per table in SQL Server 2005 • (practical limitation, not tested beyond 1000) • All partitions of a single table or index must be in a single database • Partitioned views can be used in conjunction with table partitioning to span databases and servers

  7. Partitioning Benefits • Allow easy management of very large tables and indexes (data scale-up) • For example Fast Insert or Delete of large quantities of data (per-partition) • Index defragmentation or rebuild on one partition using ALTER INDEX … PARTITION (<num>) • In some cases may experience some performance improvements for some operators, e.g. hash join would be more efficient • take advantage of collocation • joins of large tables

  8. Partitioning Benefits • Without interfering with access to the rest of a table: • Add 100's of millions of rows to a table that already has billions of rows • Delete 100's of millions of rows from that table • Solution … use partitions with ranges • Very fast (1 second), just metadata changes • Switch data in and out • Table and partition of another table • Two partitions of different tables • Bulk data loading (into separate table - then switch into main table) • Index maintenance per partition • Two key best practices to improve manageability • Index Alignment (partitioned like the table) • Storage Alignment (table partitions in the same filegroup\file)

  9. Partitioning Benefits • When you have to BCP 100 million rows into a table that already has 600 million row, need to reindex • Table has indexes => load is slow; about 10+ times slower than loading into a heap and creating indexes afterwards • Reindexing made table unavailable • Using a partitioned table, • can load into the table, and index it • while everyone still using the rest of the partitions • When done loading and indexing, just switch it in • Delete is slow – deleting rows is orders of magnitude slower than truncating a multi-GB table • But you don’t want to truncate the table, just delete 20 or 30% of it • With partitioning, just switch out the unneeded partition • Cost of running a utility usually grows linearly with the table size • DBCC CHECKTABLE, use it on a partition at a time

  10. Table Partitioning and Switching • Bring a ‘partition’ of new data in or take a ‘partition’ of old data out of a partitioned table very fast • A table has 12 billion rows divided into 12 months • You want to archive off the 13th month and then bring in the new month • You have partitioned by month • Data is ‘moved’ (not really) between tables by SWITCHing the pointers to physical data locations • DATA IS NOT MOVED or even scanned, so size doesn’t matter • It’s a metadata operation • Takes a second

  11. Example Switch Code • ALTER TABLE table_name1 SWITCH [PARTITION <partition_number1>] TO table_name2 [PARTITION <partition_number2>] • table_name1 is called “source” and table_name2 “target” of the SWITCH • Target table (or partition_number2) is empty

  12. SWITCH Partition • Performance – allows building new and removing old partition fast • Availability – allows adding new and removing old partition with minimal downtime Note: Schema modification lock is acquired for the duration of the ALTER TABLE … SWITCH • Per-partition manageability – enables taking single partition out and run utility on it • CREATE INDEX, ALTER INDEX … REBUILD • And other operations that are not intended to work on individual partition numbers, DBCC CHECKTABLE • Efficiently supports Sliding Window scenario

  13. Indexes can also be Partitioned • Syntax is the same as for tables • “ON PartitionScheme(col)” • They may be partitioned differently from the base table • However, index alignment is the best practice • If an index uses a similar partition function and same partitioning key as the base table, then the index is “aligned” • Similar means the same number of partitions and same boundary points • One-to-one correspondence between data in table and index partition • All index entries in one partition map to data in a single partition of the base table

  14. Can I Change a Partition? • Yes, but do you want to? Not if there are already 100’s of millions of rows • Plan partitioning very carefully and avoid a huge I/O problem later • But if you have to add or remove a partition: • SPLIT – adding a partition • Taking 1 partition and splitting it into 2 partitions • ALTER PARTITION FUNCTION pfname() SPLIT RANGE (new_boundary_value) • MERGE – removing a partition • Joining 2 partitions into 1 • ALTER PARTITION FUNCTION pfname() MERGE RANGE (old_boundary_value) • All affected tables are locked during the operation by EXCLUSIVE lock • Logging is in effect similarly as in INSERT INTO … SELECT FROM • No data moves in or out of the table; only rows of tables and indexes are moved from one partition to another • This is the I/O killer, so plan ahead of time to avoid doing this

  15. Partitioning Best Practices • Plan Plan Plan ahead and save yourself from pain • Ensure minimal data movement for • SPLIT • MERGE • Have many partition functions so any alteration of one function will have limited impact • Use storage and index alignment • Empty partition should be the one that is “SPLIT” • Watch how filegroups are assigned to partitions • Spread the filegroups over many drives – let SQL Server handle the I/O distribution (e.g. each partition on a separate disk may not be the best choice)

  16. Feedback / QnA • Your Feedback is Important! Please take a few moments to fill out our online feedback form For detailed feedback, use the form at http://www.connectwithlife.co.in/vtd/helpdesk.aspx Or email us at vtd@microsoft.com • Use the Question Manager on LiveMeeting to ask your questions now!

More Related