Tips to improve the performance of your SQL queries
Given a table of employees and this query:
SELECT first_name, last_name, date_of_birth
FROM employees
WHERE date_of_birth >= ? AND date_of_birth <= ?
AND subsidiary_id = ?
An index on (subsidiary_id, date_of_birth) is better than an index on (date_of_birth, subsidiary_id) because it visits fewer index leaf nodes: with the equality column first, both conditions narrow the scanned index range, whereas with the range column first, every entry in the date range must be scanned and then filtered on subsidiary_id. The difference is negligible if the date filter is very selective, but otherwise, the larger the date range, the worse the performance.
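A minimal sketch of the two alternatives, using the table from the query above (the index names are illustrative):

-- better: the equality column first, the range column last
CREATE INDEX emp_sub_dob ON employees (subsidiary_id, date_of_birth);

-- worse: the range column first forces a wider index scan
CREATE INDEX emp_dob_sub ON employees (date_of_birth, subsidiary_id);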
When using a LIKE filter on an indexed column, only the prefix before the wildcard (%) is used as an index condition; the remaining characters don't narrow the scanned index range. The longer the prefix, the more efficient the lookup.
Consequently, a LIKE filter with a leading wildcard (e.g. "%ant") can't use an index and has to do a sequential scan over the whole table.
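For example (the users table and name column are hypothetical):

SELECT * FROM users WHERE name LIKE 'ant%'; -- index range scan narrowed by the 'ant' prefix
SELECT * FROM users WHERE name LIKE '%ant'; -- leading wildcard: scans the whole table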
In general, queries are slow because they fetch too much data from the heap (the table itself). To find the root cause, get the query's execution plan and look for the most expensive operations, i.e. the steps that read, filter, or sort the most rows.
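In PostgreSQL, for instance, the execution plan (with actual row counts and buffer I/O) can be obtained like this; other databases have equivalents (e.g. EXPLAIN PLAN in Oracle, EXPLAIN in MySQL):

EXPLAIN (ANALYZE, BUFFERS)
SELECT first_name, last_name
FROM employees
WHERE subsidiary_id = 42;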
The smallest unit of data that a database can read is a table block (also called a table page), which is usually 8KB. Depending on where that block is located, reading it takes a different amount of time: serving it from the in-memory buffer cache is roughly an order of magnitude faster than reading it from disk, making it important to have the indexes loaded into memory.
If you need to join 5 tables, the database joins 2 of them first, then joins their result with a third table, and so on. It doesn't matter in which order you write the joins, because the query optimizer evaluates the possible join orders and chooses the one it estimates to be the most efficient.
Furthermore, depending on table statistics, available indexes, and system resources, the actual joining is done using one of these methods: nested loops join, hash join, or merge join.
As a general guideline, to improve join performance, consider creating composite indexes that cover the columns in the WHERE clause, followed by the columns in the JOIN condition, as in the sketch below.
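As a sketch, for a query that filters employees on subsidiary_id and joins on a hypothetical department_id column:

SELECT e.first_name, d.name
FROM employees e
JOIN departments d ON d.id = e.department_id
WHERE e.subsidiary_id = ?;

-- WHERE column first, then the JOIN column
CREATE INDEX emp_sub_dept ON employees (subsidiary_id, department_id);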
Nested loops join
How: It iterates over all rows of one table and, for each of them, searches for the matching rows in the other table.
When: It's used for small datasets, when the overhead of a more advanced join method isn't justified, or when one of the tables has an index on the JOIN condition.
Tip: Ensure you have an index on the JOIN condition.
Complexity: O(n × m) without an index; roughly O(n log m) with an index on the inner table's join key.
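A sketch of the tip, reusing the hypothetical employees/departments join from above: indexing the join key of the table that is probed inside the loop turns each probe into an index lookup instead of a scan.

-- index the join key of the inner (probed) table
CREATE INDEX emp_dept_idx ON employees (department_id);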
Hash join
How: First, it builds a hash table from the smaller table, where the key is the JOIN key and the value is the necessary tuple data. Then it scans the other table, hashes the JOIN key of each row, and emits the rows that are also present in the hash table.
When: It's used for joining large datasets with equality conditions, when the hash table fits into the working memory (work_mem).
Tip: Ensure you have enough working memory so the hash table doesn't spill to disk.
Complexity: O(n + m), assuming the hash table fits in memory.
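In PostgreSQL, for instance, the per-operation working memory is controlled by the work_mem setting; a session-level sketch (tables are the hypothetical ones from above):

SET work_mem = '256MB'; -- size it so the smaller table's hash table fits
EXPLAIN ANALYZE
SELECT e.first_name, d.name
FROM employees e
JOIN departments d ON d.id = e.department_id;

If work_mem is too small, the plan may show the hash being processed in multiple batches, i.e. spilling to disk.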
Merge join
How: It sorts both tables by the JOIN condition and merges the results like a zipper. For very large datasets, it splits the data into chunks, sorts each chunk in memory, writes each sorted chunk to a temporary file on disk, and finally does a multi-way merge of the sorted files.
When: It's used when at least one input is already sorted, or for datasets larger than the working memory.
Tip: Ensure the inputs are pre-sorted (e.g. by an index on the JOIN keys) or can be sorted efficiently.
Complexity: O(n log n + m log m) dominated by the sorts; O(n + m) if both inputs are already sorted.
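A sketch of the tip: indexes on both join keys can deliver the inputs already sorted, so the sort step is skipped entirely.

-- pre-sorted input for a merge join on the hypothetical department keys
CREATE INDEX emp_dept_idx ON employees (department_id); -- as in the earlier sketch
CREATE INDEX dept_id_idx ON departments (id);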
The sorting operation of an ORDER BY is skipped if there's an index with the same ordering. Moreover, that index allows the ORDER BY to be executed in a pipelined manner, i.e. if you request the Top-N results, the query doesn't need to read and sort the entire result set first.
In contrast, a non-pipelined ORDER BY needs to sort all rows, materialize the result, and only then return the first N rows. It takes longer to execute as the table grows, whereas the pipelined ORDER BY is largely unaffected: its cost depends on the depth of the index B-tree, which grows much more slowly than the table size.
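A minimal sketch, reusing the employees table from the first example (the index name is illustrative):

CREATE INDEX emp_dob_idx ON employees (date_of_birth);

-- pipelined Top-N: the database can stop after the first 10 index
-- entries instead of sorting the whole table
SELECT first_name, last_name
FROM employees
ORDER BY date_of_birth
LIMIT 10;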
There are two ways to aggregate rows (e.g. for a GROUP BY): using a hash table (1) or sorting and aggregating (2).
(1) uses less memory, because (2) has to sort the entire input set. However, (2) can use an index to avoid the sorting step, and in that case it can be pipelined, i.e. it can return results before reading the entire input set.
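A sketch with the employees table: the same query can be executed either way, and an index on the grouping column lets the database aggregate pre-sorted rows and emit each group as soon as it's complete.

-- without an index: hash aggregation over the whole input
-- with this index: sorted, pipelined aggregation becomes possible
CREATE INDEX emp_sub_idx ON employees (subsidiary_id);

SELECT subsidiary_id, COUNT(*)
FROM employees
GROUP BY subsidiary_id;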
There are two ways to paginate results: offset and seek. The offset method uses the OFFSET keyword, whereas the seek method relies on a column filter to exclude the rows from the previous pages (in practice, this is usually the unique identifier of the last row from the previous page).
In terms of performance, the seek method is better because it doesn't have to process all previously paginated rows (as offset has to do) and is good for infinite scrolling use cases. However, it doesn't allow for page skipping, which can only be achieved with the offset method.
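A sketch of both methods, assuming a unique id column and a page size of 20:

-- offset: reads and discards all rows of the previous pages
SELECT * FROM employees ORDER BY id LIMIT 20 OFFSET 100;

-- seek: starts right after the last row of the previous page
SELECT * FROM employees
WHERE id > ? -- id of the last row on the previous page
ORDER BY id
LIMIT 20;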