Thursday, 14 June 2012

What is covering index ?


The term covering index does not mean a separate kind of index having a different internal structure. Rather, this term is used to describe a certain technique that is used to improve performance. It is discussed here because, as will be shown later, for most practical purposes, a clustering index can be thought of as a covering index.
Let's consider a simple but typical example. Suppose that we need to join two tables—Master, defined as (MasterId int primary key, Data int), and Detail, defined as (DetailId int primary key, Data int, MasterId int)—and select columns (MasterID, Detail.Data) where Master.Data satisfies a certain condition. The condition can be anything that is built directly on the value of Master.Data—for example, Data = @data, Data between @start and @end, or Data in (1, 2, 3). Let's also assume that, besides primary key indexes, the Master table has an index idxMaster built on (Data, MasterID), and the Detail table has an indexidxDetail built on (MasterID). The query plan will look as shown in Figure





What does that mean, exactly? Let's analyze it step by step:
  1. The query engine searches idxMaster, to find the entries that satisfy our condition for Master.Data, and it builds a list of corresponding Master.MasterIDvalues. This is fast, because:
    • Index pages are sorted by Master.Data, so the engine can jump directly to leaf index pages that satisfy our condition.
    • Those leaf index pages already contain values of Master.MasterId, so there is no need to jump somewhere else to read them.
  2. The engine selects entries from idxDetail having values of Detail.MasterID that match the list of Master.MasterID built during Step 1. This can be done by simply looping through the list of Master.MasterID and looking up each corresponding entry in idxDetail, or by using more complicated hash or mergealgorithms. The query engine determines which algorithm to use in each particular case, based on the relative cost of each algorithm. This, in turn, depends on the following criteria:
    • Selectivity of the index
    • Estimated number of lookups
    • Estimated number of rows to find.
    For the low relative values of the second and third criteria, the loop algorithm is preferred, whereas for higher values, the hash/merge algorithms become more efficient. The output of this step is a list of (MasterID, Detail:FID:PN:RN) values (see Figure 1).
  3. A list of heap pointers (FID:PN:RN), also known as bookmarks, is used to read randomly located heap data pages of the Detail table, and to retrieve values of the Data field in order to compile the requested result set (MasterID, Detail.Data). This step is called bookmark lookup, and, in our case, it is the most expensive one. The cost of bookmark lookup can be anywhere from 90 to 99 percent of the total query cost. This means that eliminating this step can improve the performance ten-fold or even one-hundred-fold. The cost of bookmark lookup depends on the fragmentation of data pages, the number of data pages to read, that availability of read-ahead optimization, the IO-to-CPU cost ratio, and other factors.
Please note that, in this example, bookmark lookup has only been performed on the Detail table, not on the Master table. This is because the required output fieldMasterID was a part of the index. This leads us to the definition of covering index: A covering index is the index that contains all output fields required by the operation performed on that index. Please note that if the Master table contained more fields, and at least one of these fields was present in the SELECT clause, the query engine would have to perform bookmark lookup on the Master table, and therefore idxMaster could not serve as a covering index anymore. That is why it is not possible to tell whether the index is covering without taking into consideration the query it is used in, and even the particular execution plan.
It is very easy now to devise a way to improve the performance of our query. Adding Detail.Data as a second field to idxDetail eliminates the need for bookmark lookup and considerably improves the performance.
Good candidates for covering indexes are junction tables (tables that exist to resolve many-to-many relationships). In the simplest case of a many-to-many relationship between T1 and T2, it is enough to have two indexes, (T1_Id, T2_Id) and (T2_Id, T1_Id), in order to save the query engine from ever doing bookmark lookups on this table.

No comments:

Post a Comment