This SQL Server Index Maintenance Overview contains the following 3 topics:
- Determining Index Fragmentation with DBCC SHOWCONTIG
- Fixing Fragmented Indexes
- Distribution Statistics
Determining Index Fragmentation with DBCC SHOWCONTIG
SQL Server maintains the values within the index. However, as records are added, modified and removed from the table indexes become fragmented. When an index is fragmented it takes SQL Server more time to scan through the index. Therefore the effectiveness of the index decreases as the fragmentation level increases.
The index fragmentation level can be determined by executing DBCC SHOWCONTIG.
This DBCC statement accepts a table name and index name as parameters. If you only specify the table name, the statement is executed against the base index of this table. Alternatively you can omit both table and index name; if so, DBCC SHOWCONTIG will return fragmentation information for all indexes in the current database. Depending on the size of the table (or database, if table name isn't specified) running DBCC SHOWCONTIG can take a long time.
The following is a sample output from DBCC SHOWCONTIG:
DBCC SHOWCONTIG scanning 'sysobjects' table... Table: 'sysobjects' (1); index ID: 1, database ID: 8 TABLE level scan performed. - Pages Scanned................................: 25 - Extents Scanned..............................: 7 - Extent Switches..............................: 21 - Avg. Pages per Extent........................: 3.6 - Scan Density [Best Count:Actual Count].......: 18.18% [4:22] - Logical Scan Fragmentation ..................: 44.00% - Extent Scan Fragmentation ...................: 85.71% - Avg. Bytes Free per Page.....................: 1576.8 - Avg. Page Density (full).....................: 80.52%
Results may also be returned in a table format, in which case you get a few extra columns in the output.
This output tells us that the sysobjects table has 44% logical fragmentation. The scan density column shows that the best count is 4; this means that if there was no fragmentation in this index, then scanning through an index should have taken 4 extents. However, the actual count is 22. When an index is created, all extents are contiguously linked and index has no fragmentation. A level of fragmentation up to 15% is typically considered acceptable. Anything beyond that level can cause significant performance problems.
Fixing Fragmented Indexes
High levels of index fragmentation should be avoided. This can be accomplished by executing DBCC INDEXDEFRAG, DBCC DBREINDEX or CREATE INDEX WITH DROP EXISTING statements.
DBCC INDEXDEFRAG is an online operation (this means the index can be used even while fragmentation is being removed), and takes significantly less time than rebuilding an index. Unfortunately DBCC INDEXDEFRAG doesn't always do as good of a job as dropping and recreating an index. This statement takes database name, table name and index name as parameters. For example:
DBCC INDEXDEFRAG('pubs', 'authors', 'aunmind')
DBCC DBREINDEX rebuilds an index without dropping primary or unique constraints. This statement takes longer than DBCC INDEXDEFRAG, but it is guaranteed to remove fragmentation since it re-creates the index. Note that non-clustered indexes are NOT rebuilt when DBCC DBREINDEX is executed against the clustered index. DBCC DBREINDEX cannot change the index keys.
DBCC DBREINDEX doesn't require specified index names; it can rebuild all indexes within a table or all indexes within the database if you omit the table name. We could rebuild all indexes on the authors table as follows:
CREATE INDEX WITH DROP_EXISTING
The third alternative is using the WITH DROP_EXISTING clause of the CREATE INDEX statement. This statement performs better than the DROP INDEX statement followed by CREATE INDEX; however you must provide the name of the existing index. If the index keys haven't changed CREATE INDEX WITH DROP_EXISTING doesn't have to sort data again (as would be the case if you were to drop and re-create the index). Notice also that if you rebuild a clustered index then all non-clustered indexes are rebuilt automatically to reflect new clustering keys. We could rebuild the 'aunmind' index on the authors table as follows:
CREATE INDEX aunmind ON authors(au_lname, au_fname) WITH DROP_EXISTING
SQL Server 2005 introduces ALTER INDEX statement which allows you to remove some fragmentation and thereby improve the effectiveness of the index. The main options available with ALTER INDEX are REBUILD, REORGANIZE or DISABLE. The REBUILD option is equivalent to DBCC DBREINDEX with a slight difference: rebuilding the clustered index does NOT cause rebuilding non-clustered indexes unless you also supply ALL keyword. Furthermore, with Enterprise Edition of SQL Server you can specify WITH ONLINE option to rebuild the index while it remains online. In the background SQL Server builds a new index and substitutes the old one with the new after the new index is built. The DISABLE option advises the query engine to ignore the index when creating execution plans but the index definition remains intact. It can be re-enabled later by specifying REBUILD option with ALTER INDEX. If you disable the clustered index, no one will be able to query the data in the table because the clustered index contains the data pages at the leaf level. The REORGANIZE option is equivalent to DBCC INDEXDEFRAG. It removes index fragmentation without taking the index offline. For example, the following statement rebuilds ix_employee_name index on dim_employee table; the index remains online so that it can still be used by query optimizer:
ALTER INDEX ix_employee_name ON dim_employee REBUILD WITH (ONLINE = ON)
The next statement disables ix_employee_name index. Since this happens to be a non-clustered index table, data will remain accessible but the index will not be considered by query optimizer:
ALTER INDEX ix_employee_name ON dim_employee DISABLE
The following statement re-enables the index:
ALTER INDEX ix_employee_name ON dim_employee REBUILD
The statement below uses REORGANIZE keyword to remove index fragmentation. It is functionally equivalent to DBCC INDEXDEFRAG:
ALTER INDEX ix_employee_name ON dim_employee REORGANIZE
Another factor related to the effectiveness of an index is distribution statistics. Distribution statistics are generated automatically when you create an index and store information about index selectivity and distribution of key values. SQL Server query optimizer examines these statistics to determine the effectiveness of the index for each query. If distribution statistics are out of date, then the query optimizer might not evaluate index effectiveness correctly. This could lead query optimizer to use an index when it would be more efficient to scan a table or use a different index. To avoid such pitfalls you must ensure that the statistics stay current.
By default SQL Server computes distribution statistics automatically. However, you can disable automatic update of statistics for the entire database or for specific indexes. Updating statistics can be time consuming, especially on large tables that grow by millions of rows each day. An alternative is to update statistics manually, perhaps once a day during off-peak hours. To turn off automatic updating of statistics you can use the sp_autostats system stored procedure or use the STATISTICS_NORECOMPUTE clause of the CREATE INDEX statement.
To update statistics manually on a specific index execute UPDATE STATISTICS statement, as in:
UPDATE STATISTICS dbo.authors
Another option is to utilize the sp_updatestats system procedure to update statistics on all user tables within the current database.
The frequency of maintaining indexes can depend largely on the application type – whether you're trying to optimize read-only queries, reads as well as writes or bulk-loads.
Read-only databases are typically used for reporting applications that get refreshed with new data occasionally. In such environment indexes are not likely to get fragmented. However, if you're simply backing up a transactional database and restoring it for reporting, you might want to re-examine your indexes and perhaps even have a separate indexing scheme for a reporting application. In such environments you can examine the index fragmentation level and tune them after each data load, or even less frequently. You should not have to maintain indexes between data loads.
In transactional systems, where data is constantly added, read, modified and deleted indexes are likely to get fragmented often. For this type of environment, you might want to examine your indexes during each non-peak timeframe. Keep in mind that running index rebuilds can be quite time-consuming, so rebuilding all indexes each night might not be affordable. It is helpful to have a separate server where you can test rebuilding indexes and get a good estimate of how much time it will take for index maintenance on the production server.
In data warehousing systems it is not uncommon to drop indexes prior to each bulk-load operation, load the data and then rebuild indexes. The overhead of maintaining key values in the index during a bulk-load might be significant. In addition, adding thousands or millions of rows to the fact table might fragment the clustered index so badly that the index will be virtually useless. Therefore, we recommend at least running the index maintenance tasks after each bulk-load into a fact table, if not dropping and re-creating all indexes. Dimension tables, on the other hand, tend to be significantly smaller. Unless a dimension contains a million or more rows, updating such tables is usually digested with little harm to the indexes.
Updating statistics on every table in a data warehouse after each load can make significant difference in time it takes to process OLAP cubes.