SQL Performance Explained


Index for equality first, then for ranges

Given a table of employees and this query:

SELECT first_name, last_name, date_of_birth
FROM employees
WHERE date_of_birth >= ?
  AND date_of_birth <= ?
  AND subsidiary_id = ?

An index on (subsidiary_id, date_of_birth) is better than one on (date_of_birth, subsidiary_id) because it visits fewer index leaf nodes: with the equality column first, all matching entries sit next to each other in the index. The difference is negligible if the date filter is very selective; otherwise, the larger the date range, the worse the performance.
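A minimal sketch of the better index (standard SQL syntax; the index name is made up):

-- Equality column first, range column second: all matching entries
-- form one contiguous slice of the index leaf nodes.
CREATE INDEX emp_sub_dob ON employees (subsidiary_id, date_of_birth);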


Use longer prefixes in LIKE filters

When using a LIKE filter on an indexed column, only the prefix before the wildcard (%) is used as an index condition; the remaining characters don't narrow the scanned index range. The longer the prefix, the more efficient the lookup.

Consequently, a LIKE filter with a leading wildcard (e.g. "%ant") can't use an index and has to do a sequential scan over the whole table.
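For example, assuming an index on a last_name column (names are illustrative):

-- Can use the index: only entries starting with 'WIN' are scanned.
SELECT * FROM employees WHERE last_name LIKE 'WIN%';

-- Cannot use the index: the leading wildcard forces a full table scan.
SELECT * FROM employees WHERE last_name LIKE '%AND%';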


How to debug a slow query that uses an index

In general, queries are slow because they fetch too much data from the heap. To find the root cause, get the query's execution plan and look for:

  • Heap-table accesses → the query is reading too much data from the heap
  • Wide index range scans → the index scan is not selective, so it yields many candidate rows, which must be fetched from the heap to check whether they actually match
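In PostgreSQL, for instance, the plan with runtime statistics can be obtained like this (the query itself is illustrative):

EXPLAIN (ANALYZE, BUFFERS)
SELECT first_name, last_name
FROM employees
WHERE subsidiary_id = 42;

In the output, large "Buffers: shared read" counts and high "Rows Removed by Filter" numbers are typical signs of the two problems above.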


How long it takes to read a table block

The smallest unit of data that a database can read is a table block or a table page, which is usually 8KB. Depending on where that block is located, reading it takes different amounts of time:

  • From memory: 2μs
  • From SSD: 100μs
  • From HDD: 10ms

There are roughly two orders of magnitude between consecutive tiers, which makes it important to keep the indexes loaded in memory.
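To see the effect at scale: a query that touches 1,000 randomly scattered blocks costs about 2ms from memory, 100ms from SSD, and 10s from HDD.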


Only two tables are joined at a time

If you need to join 5 tables, 2 are joined first, then their result is joined with another table, and so on. The order in which you write the joins doesn't matter, because the database evaluates the possible join orders and chooses the one it estimates to be most efficient.

Furthermore, depending on table statistics, available indexes, and system resources, the actual joining is done using one of these methods: nested loops join, hash join, or merge join.

As a general guideline, to improve join performance, consider creating composite indexes that cover the columns in the WHERE clause, followed by the columns in the JOIN condition.
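A sketch of that guideline, on a hypothetical employees/sales schema (all names are made up): the filter column goes first, the join column second.

-- WHERE column (subsidiary_id) first, JOIN column (employee_id) second.
CREATE INDEX emp_filter_join ON employees (subsidiary_id, employee_id);

SELECT e.first_name, s.sale_date
FROM employees e
JOIN sales s ON s.employee_id = e.employee_id
WHERE e.subsidiary_id = 42;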


Nested loops join

How: It iterates over all rows of one table and, for each of them, searches for matching rows in the other table.

When: It's used for small datasets when the overhead of a more advanced join method is not justified, or when one of the tables has an index on the JOIN condition.

Tip: Ensure you have an index on the JOIN condition.

Complexity:

  • without any indexes: O(M * N) time. O(1) space
  • with indexes: O(M * log N) time. O(1) space
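Following the tip above, an index on the inner table's join column turns each inner lookup into an index probe (hypothetical names):

-- Without this index, the inner table is scanned once per outer row: O(M * N).
-- With it, each outer row costs one index descent instead: O(M * log N).
CREATE INDEX sales_employee_id ON sales (employee_id);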


Hash join

How: First, it builds a hash table from the smaller table, where the key is the JOIN key and the value is the necessary tuple data. Then, it scans the other table, hashing the JOIN key of each row and emitting the rows that are also present in the hash table.

When: It's used for joining large datasets with equality conditions, when the hash table built from the smaller input fits into the working memory (work_mem).

Tip: Ensure you have enough working memory to keep the hash table from spilling to disk.

Complexity:

  • O(M+N) time. O(M) space
  • O(M) to build the hash table + O(N) to scan the other table
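In PostgreSQL, for example, the working memory available to a hash or sort operation is controlled by the work_mem setting, which can be raised per session (the value below is illustrative):

-- Give this session more working memory so the hash table stays in RAM.
SET work_mem = '256MB';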


Merge join

How: Sorts both tables by the join condition and merges the results like a zipper. For very large datasets, it splits the dataset into chunks, sorts each chunk in memory, writes each sorted chunk into a temporary file on disk, and finally does a multi-way merge of the sorted files on disk.

When: It's used when at least one input dataset is pre-sorted or for datasets larger than the working memory.

Tip: Ensure datasets are pre-sorted or can be sorted efficiently.

Complexity:

  • for pre-sorted inputs: O(M+N) time. O(1) space
  • otherwise: O(M log M + N log N) time. O(M+N) space
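One way to provide pre-sorted inputs is an index on each side's join key, since B-tree indexes deliver rows in key order (hypothetical names):

CREATE INDEX employees_by_id ON employees (employee_id);
CREATE INDEX sales_by_employee ON sales (employee_id);
-- Both inputs can now be read in employee_id order straight from the
-- indexes, letting the merge start without an explicit sort step.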


Use indexes to pipeline ORDER BY clauses

The sorting operation of an ORDER BY is skipped if there's an index with the same ordering. Moreover, such an index allows the ORDER BY to be executed in a pipelined manner, i.e. if you request the top-N results, the query won't need to read and sort the entire result set first.

In contrast, non-pipelined ORDER BYs need to sort all rows, materialize the results, and then return the first N results. They take longer to execute as the table grows in size, whereas the pipelined ORDER BY is unaffected (it's affected by the depth of the index B-tree, but that grows much slower than the table size).
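A sketch of a pipelined top-N query (LIMIT syntax as in PostgreSQL; names are illustrative):

CREATE INDEX sales_by_date_desc ON sales (sale_date DESC);

-- Reads the first 10 index entries in order and stops; no full sort.
SELECT sale_id, sale_date
FROM sales
ORDER BY sale_date DESC
LIMIT 10;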


How a GROUP BY is executed

There are two ways: using a hash table or sorting and aggregating.

  1. Hash-based: the database builds a hash table keyed by the grouped columns, accumulating the aggregate as the value. The hash table is materialized in a temporary table and then returned to the client as the result.
  2. Sort-based: the database sorts the rows by the grouped columns, stores them in a temporary table, and then aggregates the results.

Method 1 uses less memory, because method 2 sorts the entire input set. However, method 2 can use an index to avoid the sorting step, in which case it can be pipelined, i.e. it can return results before reading the entire input set.
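A sketch of the pipelined, sort-based variant (hypothetical names):

CREATE INDEX sales_by_employee_id ON sales (employee_id);

-- If rows arrive in employee_id order via the index, each group can be
-- aggregated and emitted as soon as the next employee_id begins.
SELECT employee_id, SUM(amount) AS total
FROM sales
GROUP BY employee_id;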


Pagination methods

There are two ways to paginate results: offset and seek. The offset method uses the OFFSET keyword, whereas the seek method relies on a column filter to exclude the rows from the previous pages (in practice, this is usually the unique identifier of the last row from the previous page).

In terms of performance, the seek method is better because it doesn't have to process all previously paginated rows (as offset has to do) and is good for infinite scrolling use cases. However, it doesn't allow for page skipping, which can only be achieved with the offset method.
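The two methods side by side (LIMIT/OFFSET syntax as in PostgreSQL; names are illustrative):

-- Offset: fetches and discards the first 100 rows to return page 11.
SELECT sale_id, sale_date
FROM sales
ORDER BY sale_id
LIMIT 10 OFFSET 100;

-- Seek: starts right after the last row of the previous page.
SELECT sale_id, sale_date
FROM sales
WHERE sale_id > ?   -- last sale_id seen on the previous page
ORDER BY sale_id
LIMIT 10;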


IDEAS CURATED BY

ocpodariu

CURATOR'S NOTE

Tips to improve the performance of your SQL queries
