Database: Index scan vs Sequential scan

In databases, two common methods of accessing data are index scans and sequential scans. To optimize the performance of your database, it’s crucial to comprehend the distinctions between these methods.

Sequential scan reading data from disk

Index scan

An index scan involves using an index structure to locate specific rows in a table. The database engine looks up the value being searched for in the index, and then uses a pointer to retrieve the corresponding row from the table. This method is particularly useful for queries that only need to access a small subset of rows in a table.

Imagine we have the following table with 10_000 rows with the index based on the column age:

id	name	age
1	John	23
2	Jane	18
3	Tom	25
10000	Mia	19

When you run the query:

SELECT id, name FROM users WHERE age < 20 LIMIT 20;

Index scan performs selection data right from the index. We get 20 pointers and after that, with several jumps to the disc database reads rows and provides the id and name to the end user.

Sequential scan

On the other hand, a sequential scan involves scanning every row in a table, whether or not it matches the query criteria. You might think that this method is slow, yes and no. The method is slow, but in some cases, it is better to use just sequential scan and ignore index scan.

Sequential scan jumps to that table, selects the correct rows, and returns them.

We have the same table, but this time we want to execute another query.

1
SELECT id, name FROM users WHERE id > 10;

Databases are smart enough to optimize the performance of this query. A sequential scan jumps to the disk, reads data, and returns back. The key feature here is understanding that eventually, we need to read a lot of data from the table.

Using an index scan in this case will slow down our query. Here’s why:

An index scan jumps to the index and fetches pointers (think about row IDs) according to the condition.
Using pointers, it fetches data from the disk.

When we need to read, for example, 95% of the data from the disk, it might be much faster to just read it and ignore 5% of rows instead of reading data from the index and from the table itself.

Pros and cons

Index scan

Pros:

Faster access to specific rows, especially in large tables
Requires less disk I/O compared to sequential scans
Can take advantage of sorted data in the index

Cons:

Takes up space and needs maintenance to keep running
Only useful for indexed columns
Might not be the best option for small tables

Sequential scan

Pros:

Can be faster for small tables or when examining all rows in a table
No additional space or maintenance overhead is required

Cons:

Slower for finding specific rows in large tables
Requires more disk I/O compared to index scans
Not optimized for pre-sorted data

Summary

In summary, index scans and sequential scans are both useful methods for accessing data in databases. The choice of method depends on the size of the table, the nature of the query, and the availability of appropriate indexes. Understanding the advantages and disadvantages of each method is important for optimizing database performance.