Mastering Indexes for Efficient Database Performance

Performing Indexing andFull-Text Searching Lesson 21

Skills Matrix

Indexes • Indexes can potentially speed resultset creation during a query, but only when a WHERE clause specifies an effective search argument. Indexes always slow data entry. • You must balance these competing goals to keep your database performing optimally.

Heaps • Tables with no clustered index in place are called heaps. • SQL Server stores tables on disk by allocating one extent (eight contiguous 8 KB pages) at a time in the database file. • When one extent fills with data, another is allotted. • These extents, however, aren’t physically next to each other in the database file; they’re scattered about. • That is part of what makes data access on a heap so slow: SQL Server needs to access an index allocation map (IAM) to find various extents for the table it’s searching.

Sysindexes Table • This system table is used to store index information: every table in your database has an entry in the sysindexes table whether or not the particular table has an index in place. • The First IAM column tells SQL Server exactly where to find the first index allocation map (IAM) page in the database. • This process of scanning the IAM page, and then scanning each extent of the table for the record needed, is called a table scan.

Clustered Indexes • Clustered indexes physically rearrange the data that users insert in your tables. • The arrangement of a clustered index on disk can be compared to that in a dictionary, because they both use the same storage paradigm. • It might help to visualize an index in SQL Server as an upside-down tree. • In fact, the index structure is called a B-tree (balanced-tree) structure.

Clustered Index • The beginning point of an index traversal is the root page of the clustered index. • Each intermediate level of the clustered index grows initially from the root, to two pages, to four pages, to eight pages, and so on. • Because an index is organized as a balanced inverted tree, each move from one index page eliminates half of the data to be scanned.

Clustered Index • When you perform a query on a column that is part of a clustered index (by using a SELECT statement), SQL Server must refer to the sysindexes table, where every table has a record. • Tables with a clustered index have a value of 1 in the indid column (unlike heaps, which have a value of 0). • Once the record has been located, SQL Server looks at the root column, which contains the location of the root page of the clustered index. • When SQL Server locates the root page of the index, it begins to search for your data.

Clustered Index • Each page in the clustered index has pointers, or links, to the index page just before it and the index page just after it. • Having these links built into the index pages eliminates the need for the IAM pages that heaps require. • SQL Server then looks through each intermediate-level page, where it may be redirected to another intermediate-level page or, finally, to the leaf level. • The leaf level in a clustered index is the end destination—the data you requested in your SELECT query.

Selectivity • Selectivity is the number of duplicate values in a column; low selectivity means a column has many duplicate values.

Modifying Data with a Clustered Index • Modifying data with a clustered index is the same—you use standard INSERT, UPDATE , and DELETE statements. • What makes this process intriguing is the way SQL Server has to store your data: it must be physically rearranged to conform to the clustered index parameters.

Modifying Data with a Clustered Index • On a heap, the data are inserted at the end of the table, which is the bottom of the last data page. • If there is no room on any of the data pages, SQL Server allocates a new extent and starts filling it with data.

Modifying Data with a Clustered Index • Because you’ve told SQL Server to physically rearrange your data by creating a clustered index, SQL Server no longer has the freedom to stuff data wherever room exists. • The data must be placed in order physically. • To help SQL Server accomplish this task efficiently, you need to leave a little room at the end of each data page on a clustered index. This blank space is referred to as the fill factor.

Nonclustered Index • Like its clustered cousin, the nonclustered index is a B-tree structure having a root page, intermediate levels, and a leaf level. • However, two major differences separate the index types. • The first is that the leaf level of the nonclustered index doesn’t contain the actual data; it contains pointers to the data that is stored in data pages. • The second big difference is that the nonclustered index doesn’t physically rearrange the data.

Nonclustered Index • When you search for data on a table with a nonclustered index, SQL Server first queries the sysindexes table, looking for a record that contains your table name and a value in the indid column from 2 to 251 (0 denotes a heap, and 1 is for a clustered index). • Once SQL Server finds this record, it looks at the root column to find the root page of the index (just like it did with a clustered index). • Once SQL Server has the location of the root page, it can begin searching for your data.

Clustered and Nonclustered Indexes

Covered Index and Included Columns • Covered index: • The index that contains all output fields required by the operation performed on that index. • The base table need not be accessed. • Included columns • You can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index. • By including nonkey columns, you can create nonclustered indexes that cover more queries.

Nonkey Columns • When using nonkey columns, you need to keep a few guidelines in mind: • Nonkey columns can be included only in nonclustered indexes. • Columns can’t be defined in both the key column and the INCLUDE list. • Column names can’t be repeated in the INCLUDE list. • Nonkey columns can’t be dropped from a table unless the index is dropped first.

Nonkey Columns • You must have at least one key column defined with a maximum of 16 key columns. • You can have only up to a maximum of 1,023 included columns. • The only changes allowed to nonkey columns are to nullability (from NULL to NOT NULL, and vice versa), and increasing the length of varbinary, varchar, or nvarchar columns.

Setting Relational Options • PAD_INDEX • When the PAD_INDEX option is set to ON, the percentage of free space that is specified by the fill factor is applied to the intermediate-level pages of the index. • When this is OFF, the intermediate-level pages are filled to the point at which there is room for one new record.

Setting Relational Options • FILLFACTOR • Specifies how full the database engine should make each page during index creation or rebuild. • Valid values are from 0 to 100. • Values of 0 and 100 are the same in that they both tell the database engine to fill the page to capacity, leaving room for only one new record. • Any other value specifies the amount of space to use for data; for instance, a fill factor of 70 tells the database engine to fill the page to 70 percent full with 30 percent free space.

Collations • Collations are used within databases to display and store an international character set, based on business requirements. • When returning data, you have the ability to retrieve the data in a collation type different from how it was stored. • When working with these multiple collations, you can invoke the COLLATE keyword and then specify the collation type you prefer to use.

Collations • You can use the COLLATE keyword in various ways and at several levels: • COLLATE on database creation • COLLATE on table creation • COLLATE by casting or expression • Collations supported by SQL Server

Filtered Index • SQL Server 2008 includes a new method of indexing known as Filtered Indexes. • These indexes work by providing a WHERE clause like other SQL commands to only select some rows of data. • The query optimizer will try to use a filtered index whenever the WHERE clause of a SELECT statement matches the WHERE clause of the index.

Filtered Statistics • A related new feature in SQL Server 2008 is Filtered Statistics. • Query execution plans can now use statistics filtered by a WHERE clause. • Filtered statistics are created along with filtered indexes although you can manually create filtered statistics using the CREATE STATISTICS command.

Partitioned Index • It’s usually best to partition a table and then create an index on the table so that SQL Server can partition the index for you based on the partition function and schema of the table. • However, you can partition indexes separately. • This is useful in the following cases: • The base table isn’t partitioned. • Your index key is unique but doesn’t contain the partition column of the table. • You want the base table to participate in collocated JOINs with other tables using different JOIN columns.

Partitioned Index • If you decide you need to partition your index separately, then you need to keep the following in mind: • The arguments of the partition function for the table and index must have the same data type. • Your table and index must define the same number of partitions. • The table and index must have the same partition boundaries.

Spatial Data • SQL Server 2008 and later versions support spatial data. This includes support for a planar spatial data type, geometry, which supports geometric data—points, lines, and polygons—within a Euclidean coordinate system. • The geography data type represents shapes on the Earth's surface, such as an area of land. • A spatial index on a geography column maps the geographic data to a two-dimensional, non-Euclidean space. • A spatial index is defined on a table column that contains spatial data (a spatial column). • Each spatial index refers to a finite space.

Primary Index • A primary key ensures that each of the records in your table is unique in some way. • It does this by creating a special type of index called a unique index. • An index is ordinarily used to speed up access to data by reading all of the values in a column and keeping an organized list of where the record that contains that value is located in the table.

Primary Index • A unique index not only generates that list, but it also prohibits duplicate values from being stored in the index. • If a user tries to enter a duplicate value in the indexed field, the unique index will return an error, and the data modification will fail. • When a column can be used as a unique identifier for a row (such as an identity column), it is referred to as a surrogate or candidate key.

Full-text Indexes • You perform full-text searching through a completely separate program that runs as a service, called the SQL Server FullText Search service, or msftesq, and that can be used to index all sorts of information from most of the BackOffice (or even non-Microsoft) products. • Full-text indexes are created using SQL Server tools, such as Management Studio, but they are maintained by the FullText Search service and stored on disk as files separate from the database.

Noise Words • You need to consider reducing meaningless successful hits by itemizing those common words or phrases in a file for use by Full-Text Search. • In SQL Server 2005 this is a text file located, by default, in the ftdata folder. Use Notepad or any similar text editor to add or delete noise words.

Summary • In this lesson, you first learned how SQL Server accesses and stores data when no index is in place. • Without a clustered index, the table is called a heap, and the data are stored on a first-come, first-served basis. • When accessing this data, SQL Server must perform a table scan, which means SQL Server must read every record in the table to find the data you’re seeking. • This can make data access slow on larger tables; but on smaller tables that are about one extent in size, table scans can be faster than indexing.

Summary • Next you learned how to accelerate data access by using indexes. • The first index you looked at was the clustered index. • This type of index physically rearranges the data in the database file. • This property makes the clustered index ideal for columns that are constantly being searched for ranges of data and that have low selectivity, meaning several duplicate values.

Summary • Next came nonclustered indexes. • These indexes don’t physically rearrange the data in the database but rather create pointers to the actual data. • This type of index is best suited to high-selectivity tables (few duplicate values) where single records are desired rather than ranges. • Then you learned how to create indexes using SQL Server Management Studio.

Summary • Finally, you found that full-text searching could greatly enhance SELECT queries by allowing you to find words or phrases within your text fields. • With this newfound knowledge about indexing, you’ll be able to speed up data access for your users.

Summary for Certification Examination • Know the difference between clustered and nonclustered indexes. • Clustered indexes physically rearrange the data in a table to match the definition of the index. • Nonclustered indexes are separate objects in the database that refer to the original table, without rearranging the table in any way.

Summary for Certification Examination • Understand full-text indexing. Understand what full-text indexing is for and how to manage it. • Full-text indexing runs as a separate service and is used to search columns of text for phrases, instead of just single words. • You have to repopulate the index occasionally to keep it up to date with the underlying table. • SQL Server can do this for you automatically if you want.

Summary for Certification Examination • Know the relational options for creating indexes. • In the “Setting Relational Options” section, you learned about the different relational options and what they do. • Familiarize yourself with these options for the exam.

Mastering Indexes for Efficient Database Performance

Mastering Indexes for Efficient Database Performance

Presentation Transcript

Full-Text Indexing

Indexing and Searching

CINAHL with Full Text Basic Searching

Indexing and Searching

SEARCHING THROUGH EBSCO MEDLINE AND CINAHL WITH FULL TEXT

Lecture 1 : Full-text Indexing

Tools for Text Indexing and SearchING

Searching and Accessing Full Text Journal Articles

Text Indexing

Semi-Automatic Indexing of Full Text Biomedical Articles

Indexing and Searching (File Structures)

Indexing and Searching

Introduction on biological sequence indexing, searching and text mining

Basic Text Processing and Indexing

Chapter 8 Indexing and Searching

Text Searching

Multimedia and Text Indexing

Full-Text Indexing

Searching and Indexing

Introduction on biological sequence indexing, searching and text mining

Indexing and Searching

Multimedia and Text Indexing