In databases, two common methods of accessing data are index scans and sequential scans. To optimize the performance of your database, it’s crucial to comprehend the distinctions between these methods.
Index scan
An index scan involves using an index structure to locate specific rows in a table. The database engine looks up the value being searched for in the index, and then uses a pointer to retrieve the corresponding row from the table. This method is particularly useful for queries that only need to access a small subset of rows in a table.
Imagine we have the following table with 10_000 rows with the index based on the column age
:
id | name | age |
---|---|---|
1 | John | 23 |
2 | Jane | 18 |
3 | Tom | 25 |
10000 | Mia | 19 |
When you run the query:
SELECT id, name FROM users WHERE age < 20 LIMIT 20;
Index scan performs selection data right from the index. We get 20 pointers and after that, with several jumps to the
disc database reads rows and provides the id
and name
to the end user.
Sequential scan
On the other hand, a sequential scan involves scanning every row in a table, whether or not it matches the query criteria. You might think that this method is slow, yes and no. The method is slow, but in some cases, it is better to use just sequential scan and ignore index scan.
Sequential scan jumps to that table, selects the correct rows, and returns them.
We have the same table, but this time we want to execute another query.
|
|
Databases are smart enough to optimize the performance of this query. A sequential scan jumps to the disk, reads data, and returns back. The key feature here is understanding that eventually, we need to read a lot of data from the table.
Using an index scan in this case will slow down our query. Here’s why:
- An index scan jumps to the index and fetches pointers (think about row IDs) according to the condition.
- Using pointers, it fetches data from the disk.
When we need to read, for example, 95% of the data from the disk, it might be much faster to just read it and ignore 5% of rows instead of reading data from the index and from the table itself.
Pros and cons
Index scan
Pros:
- Faster access to specific rows, especially in large tables
- Requires less disk I/O compared to sequential scans
- Can take advantage of sorted data in the index
Cons:
- Takes up space and needs maintenance to keep running
- Only useful for indexed columns
- Might not be the best option for small tables
Sequential scan
Pros:
- Can be faster for small tables or when examining all rows in a table
- No additional space or maintenance overhead is required
Cons:
- Slower for finding specific rows in large tables
- Requires more disk I/O compared to index scans
- Not optimized for pre-sorted data
Summary
In summary, index scans and sequential scans are both useful methods for accessing data in databases. The choice of method depends on the size of the table, the nature of the query, and the availability of appropriate indexes. Understanding the advantages and disadvantages of each method is important for optimizing database performance.