Commit 09915d8

committed
dbms-docs-sec-5-6-7-added
1 parent aa3054f commit 09915d8

File tree

9 files changed

+644
-0
lines changed

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
{
2+
"label": "Indexing and Hashing",
3+
"position": 5,
4+
"link": {
5+
"type": "generated-index",
6+
"description": "Explore various indexing techniques and hashing methods in DBMS."
7+
}
8+
}
Lines changed: 107 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
---
2+
id: dbms-hashing
3+
title: DBMS - Hashing
4+
sidebar_label: Hashing
5+
sidebar_position: 2
6+
description: Learn about different types of hashing in database management systems, their structures, and operations.
7+
---
8+
9+
DBMS - Hashing
10+
===
11+
12+
For a very large database, searching the index values through every level of an index structure before reaching the destination data block can become prohibitively slow. Hashing is an effective technique to calculate the direct location of a data record on the disk without using an index structure.
13+
14+
Hashing uses hash functions with search keys as parameters to generate the address of a data record.
15+
16+
Hash Organization
17+
---
18+
19+
- **Bucket:** A hash file stores data in buckets. A bucket is a unit of storage that typically corresponds to one complete disk block, which in turn can store one or more records.
20+
- **Hash Function:** A hash function, h, maps the set of all search keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.
21+
22+
### Static Hashing
23+
24+
In static hashing, the hash function always computes the same address for a given search-key value. For example, a mod-4 hash function can generate only 4 values (0 through 3). The output address for a given key is always the same, and the number of buckets remains fixed at all times.
25+
26+
#### Operation
27+
- **Insertion:** When a record is required to be entered using static hash, the hash function h computes the bucket address for search key K, where the record will be stored.
28+
- Bucket address = h(K)
29+
- **Search:** When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored.
30+
- **Delete:** This is simply a search followed by a deletion operation.
31+
32+
```mermaid
33+
graph TD;
34+
A[Static Hashing] --> B[Insertion]
35+
A --> C[Search]
36+
A --> D[Deletion]
37+
B --> E[Compute Bucket Address]
38+
C --> F[Retrieve Bucket Address]
39+
D --> G[Locate and Delete Record]
40+
```
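The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not a storage engine: the mod-4 function comes from the text, while `NUM_BUCKETS`, `buckets`, and the sample records are hypothetical names chosen for the example, and each bucket is modelled as an in-memory list rather than a disk block.

```python
NUM_BUCKETS = 4  # fixed for the lifetime of the file (static hashing)


def h(key: int) -> int:
    """Map a search key to a bucket address in 0..NUM_BUCKETS-1."""
    return key % NUM_BUCKETS


# Each bucket is modelled as a list of (key, record) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(key, record):
    """Store the record in the bucket whose address is h(key)."""
    buckets[h(key)].append((key, record))


def search(key):
    """Recompute h(key) and scan only that one bucket."""
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None


def delete(key):
    """A search followed by removal of the matching entry."""
    bucket = buckets[h(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            del bucket[i]
            return True
    return False


insert(10, "Asha")  # lands in bucket 10 % 4 = 2
insert(7, "Ben")    # lands in bucket 7 % 4 = 3
```

Because `h` is the only thing that decides placement, a search never touches any bucket other than the one it computes.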
41+
42+
#### Bucket Overflow
43+
Bucket overflow occurs when a record hashes to a bucket that is already full; this condition is also known as a collision. Every static hash scheme has to handle it, typically with overflow chaining or linear probing.
44+
45+
- **Overflow Chaining:** When buckets are full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called Closed Hashing.
46+
- **Linear Probing:** When a hash function generates an address at which data is already stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
47+
48+
```mermaid
49+
graph TD;
50+
A[Bucket Overflow] --> B[Overflow Chaining]
51+
A --> C[Linear Probing]
52+
B --> D[New Bucket Allocation]
53+
C --> E[Next Free Bucket]
54+
```
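The two remedies can be contrasted with a small sketch. It assumes a bucket capacity of 2 for chaining and one record per slot for probing; `Bucket`, `probe_insert`, and the sample keys are hypothetical names used only for illustration.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an optional overflow bucket."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain, if any

    def insert(self, key):
        if len(self.records) < self.capacity:
            self.records.append(key)
            return
        # Bucket is full: allocate (or reuse) a linked overflow bucket.
        if self.overflow is None:
            self.overflow = Bucket(self.capacity)
        self.overflow.insert(key)

    def chain_length(self):
        """Number of buckets in this chain, including this one."""
        return 1 + (self.overflow.chain_length() if self.overflow else 0)


def probe_insert(slots, key):
    """Linear probing: use the first free slot at or after key % len(slots)."""
    n = len(slots)
    start = key % n
    for step in range(n):
        pos = (start + step) % n
        if slots[pos] is None:
            slots[pos] = key
            return pos
    raise OverflowError("all slots are full")


# Keys 4, 8 and 12 all hash to address 0 under mod 4.
chained = Bucket(capacity=2)
for key in (4, 8, 12):
    chained.insert(key)

slots = [None] * 4
positions = [probe_insert(slots, key) for key in (4, 8, 12)]
```

With chaining the third key goes into a new bucket linked behind the first; with probing it spills into a neighbouring address, so a later search for it must probe forward from address 0 as well.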
55+
56+
### Dynamic Hashing
57+
58+
The problem with static hashing is that it does not expand or shrink as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extendible hashing.
59+
60+
#### Organization
61+
The prefix of an entire hash value is taken as a hash index. Only a portion of the hash value is used for computing bucket addresses. Every hash index has a depth value n that indicates how many bits are used to compute the bucket address. These n bits can address \(2^n\) buckets. When all these bits are consumed, that is, when all the buckets are full, the depth value is increased by one and twice as many buckets are allocated.
62+
63+
```mermaid
64+
graph TD;
65+
A[Dynamic Hashing] --> B[Hash Index]
66+
B --> C[Depth Value]
67+
C --> D[Compute Bucket Addresses]
68+
D --> E[Increase Depth and Allocate More Buckets]
69+
```
70+
71+
#### Operation
72+
- **Querying:** Look at the depth value of the hash index and use those bits to compute the bucket address.
73+
- **Update:** Perform a query as above and update the data.
74+
- **Deletion:** Perform a query to locate the desired data and delete it.
75+
- **Insertion:** Compute the address of the bucket.
76+
- If the bucket is already full:
77+
- Add more buckets.
78+
- Add additional bits to the hash value.
79+
- Re-compute the hash function.
80+
- Else:
81+
- Add data to the bucket.
82+
- If all the buckets are full, perform the remedies of static hashing.
83+
84+
```mermaid
85+
graph TD;
86+
A[Dynamic Hashing Operation] --> B[Querying]
87+
A --> C[Update]
88+
A --> D[Deletion]
89+
A --> E[Insertion]
90+
E --> F[Compute Bucket Address]
91+
F --> G{Bucket Full?}
92+
G -- Yes --> H[Add More Buckets]
93+
G -- No --> I[Add Data to Bucket]
94+
H --> J[Add Bits to Hash Value]
95+
H --> K[Re-compute Hash Function]
96+
```
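The insertion procedure can be sketched as a small extendible-hashing table. Several things here are assumptions made for the sake of a short example: keys are distinct non-negative integers used directly as hash values, the low-order bits are used for addressing (the text describes a prefix, and either choice works as long as it is applied consistently), and the names `ExtendibleHash`, `global_depth`, and `directory` are hypothetical. Each bucket also carries its own depth so that only the overflowing bucket is split.

```python
class ExtendibleHash:
    """Toy extendible hash table for distinct non-negative integer keys."""

    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.global_depth = 1  # number of bits used to address the directory
        # The directory has 2**global_depth slots; several slots may share a bucket.
        self.directory = [{"depth": 1, "keys": []}, {"depth": 1, "keys": []}]

    def _slot(self, key):
        # Use the global_depth low-order bits of the key as the directory index.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)]["keys"]

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket["keys"]) < self.bucket_size:
            bucket["keys"].append(key)  # the "else" branch: room is available
            return
        if bucket["depth"] == self.global_depth:
            # All addressing bits are consumed: add one bit, double the directory.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the full bucket using one additional bit.
        bucket["depth"] += 1
        sibling = {"depth": bucket["depth"], "keys": []}
        new_bit = 1 << (bucket["depth"] - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Re-compute the address of every affected key with the extra bit.
        pending = bucket["keys"] + [key]
        bucket["keys"] = []
        for k in pending:
            self.insert(k)


table = ExtendibleHash(bucket_size=2)
for key in range(8):
    table.insert(key)
```

Inserting the keys 0 through 7 with a bucket size of 2 forces the directory to double once, leaving four buckets of two keys each. Duplicate keys beyond the bucket size would never separate under this sketch, which is where the static-hashing remedies mentioned above come in.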
97+
98+
### Comparison Table
99+
100+
| Feature | Static Hashing | Dynamic Hashing |
101+
|--------------------|---------------------------|----------------------------|
102+
| Bucket Expansion | Fixed number of buckets | Buckets expand/shrink dynamically |
103+
| Collision Handling | Overflow chaining, linear probing | Bucket splitting, then the static-hashing remedies if needed |
104+
| Performance | Good for small databases | Better for large, dynamic databases |
105+
| Flexibility | Less flexible | Highly flexible |
106+
107+
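Hashing is a poor fit when the data is kept in some order and queries ask for a range of values, because a hash function scatters adjacent keys across unrelated buckets. It performs best for equality lookups on discrete, randomly distributed keys. Hash structures tend to be more complex to implement than ordered indexes, but as long as buckets do not overflow, a lookup, insertion, or deletion takes roughly constant time on average.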
Lines changed: 135 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,135 @@
1+
---
2+
id: dbms-indexing
3+
title: DBMS - Indexing
4+
sidebar_label: Indexing
5+
sidebar_position: 1
6+
description: Learn about different types of indexing in database management systems, their structures, and operations.
7+
---
8+
9+
DBMS - Indexing
10+
===
11+
12+
Data is stored in the form of records. Every record has a key field, which uniquely identifies it.
13+
14+
Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done. Indexing in database systems is similar to what we see in books.
15+
16+
Indexing Types
17+
---
18+
19+
Indexing is defined based on its indexing attributes. Indexing can be of the following types:
20+
21+
### Primary Index
22+
- **Description:** Defined on an ordered data file. The data file is ordered on a key field, generally the primary key of the relation.
23+
24+
### Secondary Index
25+
- **Description:** May be generated from a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values.
26+
27+
### Clustering Index
28+
- **Description:** Defined on an ordered data file. The data file is ordered on a non-key field.
29+
30+
Ordered Indexing Types
31+
---
32+
33+
Ordered Indexing can be of two types:
34+
35+
### Dense Index
36+
- **Description:** There is an index record for every search key value in the database.
37+
- **Characteristics:** Faster searching but requires more space to store index records.
38+
- **Structure:**
39+
- Index records contain search key value and a pointer to the actual record on the disk.
40+
41+
```mermaid
42+
graph TD;
43+
A[Dense Index] --> B[Search Key 1]
44+
A --> C[Search Key 2]
45+
A --> D[Search Key 3]
46+
B --> E[Record Pointer 1]
47+
C --> F[Record Pointer 2]
48+
D --> G[Record Pointer 3]
49+
```
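A dense index can be sketched as a mapping with exactly one entry per search-key value. In this illustration the data file is a hypothetical in-memory list, and a list position stands in for the disk address that a real index record would hold.

```python
# Hypothetical data file; the list position plays the role of a disk address.
data_file = [
    {"id": 101, "name": "Asha"},
    {"id": 205, "name": "Ben"},
    {"id": 309, "name": "Chen"},
]

# Dense index: one (search key -> record pointer) entry for every record.
dense_index = {record["id"]: position for position, record in enumerate(data_file)}


def lookup(key):
    """Follow the index entry straight to the record; no scan of the data file."""
    position = dense_index.get(key)
    return None if position is None else data_file[position]
```

The index has as many entries as the data file has search-key values, which is the space cost that buys the direct lookup.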
50+
51+
### Sparse Index
52+
- **Description:** Index records are not created for every search key.
53+
- **Characteristics:** Contains a search key and an actual pointer to the data on the disk.
54+
- **Structure:**
55+
- To search for a record, follow the index entry with the largest search key that does not exceed the target to its location in the data file. If the record is not at that location, scan sequentially from there until the desired data is found.
56+
57+
```mermaid
58+
graph TD;
59+
A[Sparse Index] --> B[Search Key 1]
60+
A --> C[Search Key 2]
61+
A --> D[Search Key 3]
62+
B --> E[Record Pointer 1]
63+
C --> F[Record Pointer 2]
64+
D --> G[Record Pointer 3]
65+
```
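A sparse index lookup can be sketched as a binary search over the index followed by a short sequential scan. The sketch assumes the data file is sorted on the search key and that the index holds the first key of each block; `blocks` and `sparse_keys` are hypothetical names, and the standard-library `bisect` module does the binary search.

```python
from bisect import bisect_right

# Hypothetical sorted data file, split into blocks of three (key, value) records.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: only the first search key of each block is indexed.
sparse_keys = [block[0][0] for block in blocks]


def lookup(key):
    # Find the last index entry whose key does not exceed the target.
    block_no = bisect_right(sparse_keys, key) - 1
    if block_no < 0:
        return None  # the key is smaller than every indexed key
    # Scan sequentially inside the block the index entry points to.
    for k, value in blocks[block_no]:
        if k == key:
            return value
        if k > key:
            break  # records are sorted, so the key cannot appear later
    return None
```

Here three index entries cover nine records, at the price of a scan inside one block per lookup.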
66+
67+
### Multilevel Index
68+
- **Description:** Index records comprise search-key values and data pointers. Stored on disk along with the actual database files.
69+
- **Characteristics:** As the database size grows, so does the size of the indices.
70+
- **Structure:**
71+
- Break down the index into several smaller indices to make the outermost level so small that it can be saved in a single disk block.
72+
73+
```mermaid
74+
graph TD;
75+
A[Multilevel Index] --> B[Level 1 Index]
76+
B --> C[Level 2 Index 1]
77+
B --> D[Level 2 Index 2]
78+
C --> E[Data Pointer 1]
79+
C --> F[Data Pointer 2]
80+
D --> G[Data Pointer 3]
81+
D --> H[Data Pointer 4]
82+
```
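The idea of indexing the index can be sketched with two levels. This is an illustrative layout only: `inner_blocks` stands for the level 2 index blocks stored on disk, `outer_keys` for the small level 1 index that fits in a single block, and the pointer strings are placeholders.

```python
from bisect import bisect_right

# Level 2: index blocks holding sorted (search key, data pointer) pairs.
inner_blocks = [
    [(10, "ptr-10"), (20, "ptr-20")],
    [(30, "ptr-30"), (40, "ptr-40")],
    [(50, "ptr-50"), (60, "ptr-60")],
]

# Level 1: one entry per inner block (its smallest key), small enough for one block.
outer_keys = [block[0][0] for block in inner_blocks]


def lookup(key):
    """Search the outer level first, then only the one inner block it selects."""
    i = bisect_right(outer_keys, key) - 1
    if i < 0:
        return None
    for k, pointer in inner_blocks[i]:
        if k == key:
            return pointer
    return None
```

Each lookup reads the outer index and a single inner block instead of the whole index.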
83+
84+
### B+ Tree
85+
- **Description:** A balanced multiway search tree (not a binary tree: each node can hold many keys) that follows a multi-level index format. Leaf nodes hold the actual data pointers.
86+
- **Characteristics:** Ensures all leaf nodes remain at the same height, thus balanced. Supports random access and sequential access.
87+
88+
```mermaid
89+
graph TD;
90+
A[B+ Tree] --> B[Internal Node]
91+
B --> C[Leaf Node 1]
92+
B --> D[Leaf Node 2]
93+
C --> E[Data Pointer 1]
94+
C --> F[Data Pointer 2]
95+
D --> G[Data Pointer 3]
96+
D --> H[Data Pointer 4]
97+
H --> I[Next Leaf Node]
98+
```
99+
100+
#### Structure of B+ Tree
101+
- **Internal Nodes:**
102+
- Contain at least ⌈n/2⌉ pointers, except the root node.
103+
- At most, an internal node can contain n pointers.
104+
105+
- **Leaf Nodes:**
106+
- Contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
107+
- At most, a leaf node can contain n record pointers and n key values.
108+
- Every leaf node contains one block pointer P to point to the next leaf node, forming a linked list.
109+
110+
#### B+ Tree Insertion
111+
1. **Insertion at Leaf Node:**
112+
- If a leaf node overflows, split node into two parts.
113+
- Partition at i = ⌊(m+1)/2⌋.
114+
- First i entries are stored in one node.
115+
- Rest of the entries (i+1 onwards) are moved to a new node.
116+
- ith key is duplicated at the parent of the leaf.
117+
118+
2. **Insertion at Non-leaf Node:**
119+
- Split node into two parts.
120+
- Partition the node at i = ⌈(m+1)/2⌉.
121+
- Entries up to i are kept in one node.
122+
- Rest of the entries are moved to a new node.
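The split rules above can be sketched on plain lists of keys. The text does not define m, so this sketch assumes m is the number of entries in the overflowing node; `split_leaf` and `split_internal` are hypothetical helper names. The leaf version follows the text in duplicating the i-th key at the parent; many textbook presentations copy up the first key of the new node instead.

```python
def split_leaf(entries):
    """Split an overflowing leaf at i = floor((m + 1) / 2)."""
    m = len(entries)          # assumption: m counts the entries in the full node
    i = (m + 1) // 2
    left, right = entries[:i], entries[i:]
    separator = left[-1]      # the i-th key (1-based), duplicated at the parent
    return left, right, separator


def split_internal(entries):
    """Split an overflowing non-leaf node at i = ceil((m + 1) / 2)."""
    m = len(entries)
    i = -(-(m + 1) // 2)      # integer ceiling of (m + 1) / 2
    return entries[:i], entries[i:]


left, right, separator = split_leaf([5, 10, 15, 20, 25])
```

For five entries the leaf keeps the first three, the new node receives the remaining two, and the third key is the one repeated in the parent.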
123+
124+
#### B+ Tree Deletion
125+
1. **Deletion at Leaf Node:**
126+
- The target entry is searched and deleted.
127+
- If the deleted key also appears in an internal node, remove it there and replace it with the entry from the left position.
128+
- After deletion, check for underflow.
129+
130+
2. **Handling Underflow:**
131+
- If underflow occurs, distribute the entries from the nodes left to it.
132+
- If distribution is not possible from the left, distribute from the nodes right to it.
133+
- If distribution is not possible from either the left or the right, merge the node with its left or right sibling.
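The order of the underflow remedies can be sketched on plain key lists. This is a simplification under stated assumptions: nodes are Python lists of sorted keys, `min_keys` is the minimum occupancy, the function name `fix_underflow` is hypothetical, and the matching update of separator keys in the parent is left out.

```python
def fix_underflow(node, left, right, min_keys):
    """Repair an underfull node in place and report which remedy was used."""
    if len(node) >= min_keys:
        return "ok"
    # 1. Try to redistribute from the left sibling.
    if left is not None and len(left) > min_keys:
        node.insert(0, left.pop())       # borrow the largest key on the left
        return "borrowed from left"
    # 2. Otherwise try the right sibling.
    if right is not None and len(right) > min_keys:
        node.append(right.pop(0))        # borrow the smallest key on the right
        return "borrowed from right"
    # 3. Neither sibling can spare a key: merge with one of them.
    if left is not None:
        left.extend(node)
        node.clear()
        return "merged into left"
    if right is not None:
        right[:0] = node
        node.clear()
        return "merged into right"
    return "ok"  # a node without siblings (the root) is allowed to be underfull


node, left_sibling, right_sibling = [30], [10, 15, 20], [40, 50]
action = fix_underflow(node, left_sibling, right_sibling, min_keys=2)
```

In the example the left sibling has a key to spare, so the node borrows 20 and no merge is needed.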
134+
135+
In summary, indexing in DBMS is a crucial technique to enhance the speed and efficiency of data retrieval. Different indexing methods and structures are suited to various data and query types, ensuring optimized performance for diverse database operations.
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
{
2+
"label": "Storage and Joins",
3+
"position": 4,
4+
"link": {
5+
"type": "generated-index",
6+
"description": "Explore various storage techniques and join operations in DBMS."
7+
}
8+
}
Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
---
2+
id: dbms-file-structure
3+
title: DBMS - File Structure
4+
sidebar_label: File Structure
5+
sidebar_position: 2
6+
description: Explore the different types of file structures in database management, including file organization methods and file operations.
7+
---
8+
9+
DBMS - File Structure
10+
===
11+
12+
Related data and information are stored collectively in file formats. A file is a sequence of records stored in binary format. A disk drive is formatted into several blocks that can store records. File records are mapped onto those disk blocks.
13+
14+
File Organization
15+
---
16+
17+
File Organization defines how file records are mapped onto disk blocks. We have four types of File Organization to organize file records:
18+
19+
### Heap File Organization
20+
- **Description:** When a file is created using Heap File Organization, the Operating System allocates memory area to that file without any further accounting details. File records can be placed anywhere in that memory area.
21+
- **Characteristics:** No ordering, sequencing, or indexing.
22+
- **Responsibility:** Software manages the records.
23+
24+
### Sequential File Organization
25+
- **Description:** Records are placed in the file in some sequential order based on a unique key field or search key.
26+
- **Characteristics:** In practice, it is not always possible to store all records in physically sequential order.
27+
- **Example:** Library cataloging system where books are stored based on a unique identifier.
28+
29+
### Hash File Organization
30+
- **Description:** Uses Hash function computation on some fields of the records. The output of the hash function determines the location of the disk block where the records are to be placed.
31+
- **Characteristics:** Efficient for retrieval when the search is based on the hashed attribute.
32+
- **Example:** Student records where student ID is used to determine storage location.
33+
34+
### Clustered File Organization
35+
- **Description:** Related records from one or more relations are kept in the same disk block.
36+
- **Characteristics:** Not based on primary key or search key.
37+
- **Use Case:** Used when accessing related data together.
38+
39+
```mermaid
40+
graph TD;
41+
A[File Organization] --> B[Heap File Organization]
42+
A --> C[Sequential File Organization]
43+
A --> D[Hash File Organization]
44+
A --> E[Clustered File Organization]
45+
```
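How the first three organizations decide where a record goes can be contrasted in a short sketch. Everything here is illustrative: records are hypothetical `(id, name)` tuples held in Python lists, `NUM_BLOCKS` is an assumed block count, and `insort` from the standard-library `bisect` module keeps the sequential file ordered.

```python
from bisect import insort

NUM_BLOCKS = 4  # assumed number of disk blocks for the hash file


def heap_insert(file, record):
    """Heap organization: the record goes wherever there is room (here, the end)."""
    file.append(record)


def sequential_insert(file, record):
    """Sequential organization: keep records ordered on the key field."""
    insort(file, record)  # tuples compare on their first element, the id


def hash_block(student_id):
    """Hash organization: the hash of the key field selects the disk block."""
    return student_id % NUM_BLOCKS


heap_file, sequential_file = [], []
for record in [(30, "Chen"), (10, "Asha"), (20, "Ben")]:
    heap_insert(heap_file, record)
    sequential_insert(sequential_file, record)
```

The heap file preserves arrival order, the sequential file ends up sorted by id, and `hash_block` ignores both and computes a location from the id alone.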
46+
47+
File Operations
48+
---
49+
50+
Operations on database files can be broadly classified into two categories:
51+
52+
1. **Update Operations**
53+
- **Description:** Change data values by insertion, deletion, or update.
54+
55+
2. **Retrieval Operations**
56+
- **Description:** Retrieve data without altering it, potentially with optional conditional filtering.
57+
58+
### Common File Operations
59+
- **Open:**
60+
- **Modes:** Read mode (data is read-only) and Write mode (data modification allowed).
61+
- **Characteristics:** Files in read mode can be shared; files in write mode cannot be shared.
62+
63+
- **Locate:**
64+
- **Description:** File pointer tells the current position where data is to be read or written.
65+
- **Function:** Can be moved forward or backward using find (seek) operation.
66+
67+
- **Read:**
68+
- **Description:** By default, the file pointer points to the beginning of the file when opened in read mode.
69+
- **Characteristics:** User can specify where to locate the file pointer.
70+
71+
- **Write:**
72+
- **Description:** Enables editing file contents, including deletion, insertion, or modification.
73+
- **Characteristics:** File pointer can be dynamically changed if allowed by the operating system.
74+
75+
- **Close:**
76+
- **Description:** Tells the operating system that the file is no longer in use, so the resources tied to it can be released.
77+
- **Function:**
78+
1. Removes all locks if in shared mode.
79+
2. Saves data to secondary storage if altered.
80+
3. Releases all buffers and file handlers associated with the file.
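These operations map closely onto Python's built-in file API, which the sketch below uses as a stand-in for a DBMS file manager. The 8-byte fixed-length record layout, the file name, and the sample values are assumptions made for the example; `open`, `seek`, `tell`, `read`, and `write` are the standard calls, and each `with` block closes the file on exit.

```python
import os
import tempfile

RECORD_SIZE = 8  # assumed fixed record length in bytes

path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Open in write mode and store three space-padded, fixed-length records.
with open(path, "wb") as f:
    for name in (b"alpha", b"bravo", b"delta"):
        f.write(name.ljust(RECORD_SIZE))

# Open in read mode: the file pointer starts at the beginning of the file.
with open(path, "rb") as f:
    start = f.tell()
    f.seek(2 * RECORD_SIZE)               # locate: jump to the third record
    third = f.read(RECORD_SIZE).rstrip()  # read it and drop the padding

# Open for update: move the pointer, then overwrite the second record in place.
with open(path, "r+b") as f:
    f.seek(1 * RECORD_SIZE)
    f.write(b"charlie".ljust(RECORD_SIZE))

# Leaving each "with" block closes the file, flushing changes to storage.
with open(path, "rb") as f:
    contents = f.read()
```

Fixed-length records make the locate step a simple multiplication; variable-length records would need an index or a scan to find the right offset.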
81+
82+
```mermaid
83+
graph TD;
84+
A[File Operations] --> B[Update Operations]
85+
A --> C[Retrieval Operations]
86+
B --> D[Insert]
87+
B --> E[Delete]
88+
B --> F[Update]
89+
C --> G[Select]
90+
C --> H[Filter]
91+
```
92+
93+
The organization of data inside a file plays a major role in how efficiently these operations can be performed. The method used to locate the file pointer to a desired record inside a file varies based on whether the records are arranged sequentially or clustered.
94+
95+
In summary, understanding the various file structures and their operations is crucial for efficient database management, ensuring optimal performance and reliability.
96+
```
