ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • Clustered and Non clustered index
    DB/RDB 2020. 2. 23. 23:00

    1. Overview

    With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.

    With a non clustered index there is a second list that has pointers to the physical rows. You can have many non clustered indices, although each new index will increase the time it takes to write new records.

    It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

    Writing to a table with a clustered index can be slower if there is a need to rearrange the data.

    2. Clustered Index

    2.1 What is a clustered index

    A clustered index is a special index that physically orders the data according to the indexed columns. The leaf nodes of the index store the data for the rest of the columns in the table so when a lookup is performed on this type of index there are no other structures that need to be referenced.

    2.2 Why create a clustered index

    • When querying the table, it requires fewer IO operations since an extra lookup is not needed to get any/all of the non-key column data.  This data is stored in the leaf node of the index.
    • It gives us a way to reorganize the table data.  If the table has no clustered index it is then stored in a heap structure. 
      • When our data becomes fragmented over time due to DML operations the only way to fix this fragmentation is to reload all the data into a new table.  With a clustered index in place, we can run and index reorganizes or rebuilds to address the fragmentation which in some cases can be done online while the table is still accessible to other processes.

    2.3 When to use

    As a general rule of thumb is it's best to create a clustered index on a table when the same columns are heavily used in the WHERE clause portion of a query.

    Queries that perform a lot of range scans with the indexed columns in the WHERE clause can also benefit greatly from having a clustered index on these columns

    2.4 How to change clustered indexes

    A better solution would be to follow these steps:

    1. Drop all non-clustered indexes
    2. Drop clustered index
    3. Create new clustered index
    4. Re-create all non-clustered indexes (including dropped clustered index as non-clustered)

    2.5 Should the primary key be the clustered index

    As was noted earlier the main reason for creating a clustered index is to reduce the disk IO for queries where the same columns are heavily used in the WHERE clause as we will have to perform fewer lookups to get the actual column data.   These columns that are in the WHERE clause most often may or may not be the primary key column(s).  A simple example of this would be the case where the primary key column of a table is an identity column but most queries that hit the table using some other column(s) in the WHERE clause.  If we look at the HumanResources.Employee table we can see that there are many other alternate key columns for this table and it could be the case that one of these is used more often in the WHERE clause.  If that was the case then it would make sense to change one of these indexes to be clustered index and make the primary key non-clustered. 

    3. Non-Clustered Index

    3.1 What is a non clustered index

    A non-clustered index (or regular b-tree index) is an index where the order of the rows does not match the physical order of the actual data.  It is instead ordered by the columns that make up the index.  In a non-clustered index, the leaf pages of the index do not contain any actual data, but instead, contain pointers to the actual data.  These pointers would point to the clustered index data page where the actual data exists (or the heap page if no clustered index exists on the table).

    3.2 Why create non clustered index

    The main benefit to having a non-clustered index on a table is it provides fast access to data.  The index allows the database engine to locate data quickly without having to scan through the entire table.  As a table gets larger it is very important that the correct indexes are added to the table, as without any indexes query performance will drop off dramatically.

    3.3 When to use

    • when there is more than one set of columns that are used in the WHERE clause of queries that access the table. 
      • A second index (assuming there is already a clustered index on the primary key column) will speed up execution times and reduce IO for the other queries.
    • If your queries frequently require data to be returned in a certain order, having an index on these columns can reduce the amount of CPU and memory required as additional sorting will not need to be done since the data in the index is already ordered. 

    3.4 Non clustered indexes related to a clustered index

    The clustered index stores the actual data of the non-key columns in the leaf nodes of the index.  The leaf nodes of each non-clustered index do not contain any data and instead have pointers to the actual data page (or leaf node) of the clustered index.  The diagram below illustrates this point.

    4. Difference between Clustered Index and Non-clustered Index

    Clustered Index Non-clustered Index
    Clustered index will be created by default when you create a primary key on a column. So we can create one clustered index per table Non-clustered index will be created automatically when you create a unique key on a column. A table can have a number of unique keys, so we can create a number of non-clustered indexes per table
    The leaf level of a clustered index is the actual data page The leaf level of a non-clustered index does not consist of the actual data page
    Only one clustered index can be created Many non-clustered indexes can be created
    Physical ordering of the data of the column Logical ordering of the data of the column. It is more like a pointer to data
    Clustered index can improve the performance of data retrieval It should be created on columns that it used in joins, where, and order by clause.

    5. Reference

    https://www.youtube.com/watch?v=ITcOiLSfVJQ

    https://stackoverflow.com/questions/1251636/what-do-clustered-and-non-clustered-index-actually-mean

    https://www.mssqltips.com/sqlservertutorial/9132/sql-server-clustered-indexes/

    https://www.mssqltips.com/sqlservertutorial/9133/sql-server-nonclustered-indexes/

    https://www.wikitechy.com/interview-questions/sql/difference-between-clustered-and-non-clustered-index

    'DB > RDB' 카테고리의 다른 글

    S3, Block storage, File storage, and Object storage  (0) 2020.02.24
    DDL, DML, DCL, and TCL  (0) 2020.02.23
    MySQL Scaling  (0) 2020.02.23
    Dirty Read  (0) 2019.09.30
    Phantom Read  (0) 2019.09.30

    댓글

Designed by Tistory.