| 1 | +--- |
| 2 | +id: dbms-indexing |
| 3 | +title: DBMS - Indexing |
| 4 | +sidebar_label: Indexing |
| 5 | +sidebar_position: 1 |
| 6 | +description: Learn about different types of indexing in database management systems, their structures, and operations. |
| 7 | +--- |
| 8 | + |
| 9 | +DBMS - Indexing |
| 10 | +=== |
| 11 | + |
| 12 | +Data in a database is stored in the form of records. Every record has a key field that identifies it uniquely. |
| 13 | + |
| 14 | +Indexing is a data structure technique for efficiently retrieving records from database files based on the attributes on which the index is built. An index in a database system works much like the index at the back of a book: it maps a search key to the location of the matching data. |
| 15 | + |
| 16 | +Indexing Types |
| 17 | +--- |
| 18 | + |
| 19 | +An index is classified by the attribute it is built on. Indexing can be of the following types: |
| 20 | + |
| 21 | +### Primary Index |
| 22 | +- **Description:** Defined on an ordered data file. The data file is ordered on a key field, generally the primary key of the relation. |
| 23 | + |
| 24 | +### Secondary Index |
| 25 | +- **Description:** Defined on a field that the data file is not ordered on. The field may be a candidate key with a unique value in every record, or a non-key field with duplicate values. |
| 26 | + |
| 27 | +### Clustering Index |
| 28 | +- **Description:** Defined on an ordered data file. The data file is ordered on a non-key field. |
| 29 | + |
| 30 | +Ordered Indexing Types |
| 31 | +--- |
| 32 | + |
| 33 | +An ordered index is either dense or sparse. Multilevel indices and B+ trees build on these two forms to keep large indices searchable: |
| 34 | + |
| 35 | +### Dense Index |
| 36 | +- **Description:** There is an index record for every search key value in the database. |
| 37 | +- **Characteristics:** Makes searching faster, but requires more space to store the index records. |
| 38 | +- **Structure:** |
| 39 | + - Each index record contains a search key value and a pointer to the actual record on disk. |
| 40 | + |
| 41 | +```mermaid |
| 42 | +graph TD; |
| 43 | + A[Dense Index] --> B[Search Key 1] |
| 44 | + A --> C[Search Key 2] |
| 45 | + A --> D[Search Key 3] |
| 46 | + B --> E[Record Pointer 1] |
| 47 | + C --> F[Record Pointer 2] |
| 48 | + D --> G[Record Pointer 3] |
| 49 | +``` |
| 50 | + |
| 51 | +### Sparse Index |
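| 51 | +- **Description:** Index records are created for only some of the search key values, not for every one. |
| 52 | +- **Characteristics:** Each index record contains a search key and a pointer to the data on disk. A sparse index takes less space than a dense index, but a lookup may need extra sequential reads. |
| 53 | +- **Structure:** |
| 54 | + - To search for a record, find the index record with the largest search key that does not exceed the target and follow its pointer to the data. If the record is not at that location, search sequentially from there until the desired data is found. |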
| 56 | + |
| 57 | +```mermaid |
| 58 | +graph TD; |
| 59 | + A[Sparse Index] --> B[Search Key 1] |
| 60 | + A --> C[Search Key 2] |
| 61 | + A --> D[Search Key 3] |
| 62 | + B --> E[Record Pointer 1] |
| 63 | + C --> F[Record Pointer 2] |
| 64 | + D --> G[Record Pointer 3] |
| 65 | +``` |
| 66 | + |
| 67 | +### Multilevel Index |
| 67 | +- **Description:** Index records comprise search-key values and data pointers. The index is stored on disk along with the actual database files. |
| 68 | +- **Characteristics:** As the database grows, so do its indices. A large single-level index cannot be kept in main memory, so searching it costs multiple disk accesses. |
| 69 | +- **Structure:** |
| 70 | + - Break the index down into several smaller indices so that the outermost level is small enough to fit in a single disk block, which can easily be kept in main memory. |
| 72 | + |
| 73 | +```mermaid |
| 74 | +graph TD; |
| 75 | + A[Multilevel Index] --> B[Level 1 Index] |
| 76 | + B --> C[Level 2 Index 1] |
| 77 | + B --> D[Level 2 Index 2] |
| 78 | + C --> E[Data Pointer 1] |
| 79 | + C --> F[Data Pointer 2] |
| 80 | + D --> G[Data Pointer 3] |
| 81 | + D --> H[Data Pointer 4] |
| 82 | +``` |
| 83 | + |
| 84 | +### B+ Tree |
| 85 | +- **Description:** A balanced multiway search tree (not a binary tree: each node can have many children) that follows a multilevel index format. Leaf nodes hold the actual data pointers. |
| 86 | +- **Characteristics:** All leaf nodes stay at the same height, which keeps the tree balanced. Because the leaf nodes are linked in a list, the tree supports both random access and sequential access. |
| 87 | + |
| 88 | +```mermaid |
| 89 | +graph TD; |
| 90 | + A[B+ Tree] --> B[Internal Node] |
| 91 | + B --> C[Leaf Node 1] |
| 92 | + B --> D[Leaf Node 2] |
| 93 | + C --> E[Data Pointer 1] |
| 94 | + C --> F[Data Pointer 2] |
| 95 | + D --> G[Data Pointer 3] |
| 96 | + D --> H[Data Pointer 4] |
| 97 | + C -- Next Leaf --> D |
| 98 | +``` |
| 99 | + |
| 100 | +#### Structure of B+ Tree |
| 101 | +- **Internal Nodes:** |
| 102 | + - Contain at least ⌈n/2⌉ pointers, except the root node. |
| 103 | + - At most, an internal node can contain n pointers. |
| 104 | + |
| 105 | +- **Leaf Nodes:** |
| 106 | + - Contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values. |
| 107 | + - At most, a leaf node can contain n record pointers and n key values. |
| 108 | + - Every leaf node contains one block pointer P to point to the next leaf node, forming a linked list. |
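The linked list of leaf nodes is what makes sequential access cheap. The sketch below models only the leaf level, with a hypothetical `Leaf` class and helper names chosen for illustration. It also shows the ⌈n/2⌉ minimum occupancy as a small function.

```python
from math import ceil


class Leaf:
    """A leaf node: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf


def min_fill(n):
    """Minimum number of pointers in a non-root node that can hold n."""
    return ceil(n / 2)


def range_scan(leaf, low, high):
    """Collect keys in [low, high] by walking the leaf linked list."""
    found = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return found  # keys are sorted across leaves, so stop early
            if k >= low:
                found.append(k)
        leaf = leaf.next
    return found
```

In a full tree the scan would start at the leaf reached by descending from the root with `low`; here it starts at whichever leaf is passed in.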
| 109 | + |
| 110 | +#### B+ Tree Insertion |
| 111 | +1. **Insertion at Leaf Node:** |
| 112 | + - If a leaf node overflows, split the node into two parts. |
| 113 | + - Partition at i = ⌊(m+1)/2⌋. |
| 114 | + - The first i entries are stored in one node. |
| 115 | + - The rest of the entries (i+1 onwards) are moved to a new node. |
| 116 | + - The ith key is duplicated at the parent of the leaf. |
| 117 | + |
| 118 | +2. **Insertion at Non-leaf Node:** |
| 119 | + - If a non-leaf node overflows, split the node into two parts. |
| 120 | + - Partition the node at i = ⌈(m+1)/2⌉. |
| 121 | + - Entries up to i are kept in one node. |
| 122 | + - The rest of the entries are moved to a new node. |
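The two split rules above can be traced with a short sketch. It follows the steps literally and rests on one assumption that the text does not state: `m` is taken to be the maximum number of entries a node can hold, so an overflowing node holds `m + 1` entries. The function names are hypothetical, and the update of the parent node is left out.

```python
from math import ceil


def split_leaf(entries, m):
    """Split an overflowing leaf at i = floor((m + 1) / 2)."""
    i = (m + 1) // 2
    left, right = entries[:i], entries[i:]
    copied_up = left[-1]  # the ith key (1-based), duplicated in the parent
    return left, right, copied_up


def split_non_leaf(entries, m):
    """Split an overflowing non-leaf node at i = ceil((m + 1) / 2)."""
    i = ceil((m + 1) / 2)
    return entries[:i], entries[i:]
```

For `m = 4` and the overflowing entries `[5, 10, 15, 20, 25]`, the leaf rule gives `[5, 10]` and `[15, 20, 25]` with key 10 copied to the parent, while the non-leaf rule gives `[5, 10, 15]` and `[20, 25]`.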
| 123 | + |
| 124 | +#### B+ Tree Deletion |
| 125 | +1. **Deletion at Leaf Node:** |
| 126 | + - The target entry is searched and deleted. |
| 127 | + - If it is an internal node, delete and replace it with the entry from the left position. |
| 128 | + - After deletion, check for underflow. |
| 129 | + |
| 130 | +2. **Handling Underflow:** |
| 131 | + - If underflow occurs, redistribute entries from the sibling node to its left. |
| 132 | + - If redistribution from the left is not possible, redistribute from the sibling node to its right. |
| 133 | + - If redistribution is not possible from either side, merge the node with a neighbouring sibling. |
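The order of these checks can be sketched as below for leaf nodes represented as plain sorted lists. This is a simplified illustration under stated assumptions: `fix_underflow` is a hypothetical helper, a merge combines the node with a single sibling (the left one when it exists), and the matching updates to the parent's keys are omitted.

```python
def fix_underflow(left, node, right, min_entries):
    """Repair an underflowing leaf; return the new (left, node, right).

    A sibling that does not exist is passed as None, and a sibling that
    was merged away is returned as None.
    """
    if len(node) >= min_entries:
        return left, node, right  # no underflow, nothing to do
    if left and len(left) > min_entries:
        # Borrow the largest entry from the left sibling.
        return left[:-1], [left[-1]] + node, right
    if right and len(right) > min_entries:
        # Borrow the smallest entry from the right sibling.
        return left, node + [right[0]], right[1:]
    if left:
        return None, left + node, right  # merge with the left sibling
    if right:
        return left, node + right, None  # merge with the right sibling
    return left, node, right  # no siblings: the node is the root
```

With a minimum of two entries, `fix_underflow([10, 20, 30], [40], [50, 60], 2)` borrows 30 from the left sibling, while `fix_underflow([10, 20], [40], [50, 60], 2)` has no spare entry on either side and merges the node into `[10, 20, 40]`.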
| 134 | + |
| 135 | +In summary, indexing in DBMS is a crucial technique to enhance the speed and efficiency of data retrieval. Different indexing methods and structures are suited to various data and query types, ensuring optimized performance for diverse database operations. |