How Modern Indexing works in PostgreSQL

 

Indexing is one of the most important parts of Database Management Systems

 

Indexing makes data retrieval fast; the DBMS does not need to scan every row of the table
Because of indexing, DBMs do not need to load the entire file in memory (without indexing, DBMSs have to load page by page of a table against which a query is made), which improves memory optimization

 

What is Indexing and how does it works internally?

 

Note: This article is written taking note of PostgreSQL

 

Before moving into the internals of indexing, let’s see the types of indexing and how they are created
Types of indexing
  1. Clustered Indexing :
  2. Non-Clustered Indexing :
MySQL stores only non clustered index as a separate file on disk, but PostgreSQL treats every type of indexing as non-clustered and saves it as a separate file on disk

 

How indices are created by query


The following query creates an index on the Email column on the users table

 

CREATE INDEX idx_users_email
ON users(email);

 

Assuming the table is already containing thousands of records

 

idx_users_email will be the name of the index; it will be stored as metadata of that index
As you run the above query, Postgres starts scanning the table in our example. It starts scanning the Users table

 

Before we go further lets understand how a table is stored on the disk

 

DBMs stores the table as a file. Suppose the table contains 10k records, then that file is divided into pages, and each page is of 8 kb
Now back to the indexing process, which is in progress, while scanning the table dbms or in our case Postgre it scans according to key which is ‘Email’ column, it sorts alphabetically and starts storing in the indexing file (Note that Postgres by default sorts records according to key which is used to create an index), but it only stores certain info like primary key and most importantly the physical (not actual but logical address which may be used to calculate actual physical record address on the Disk) address of that record, so that Postgres can directly locate that record while fetching
Suppose for 10 k records, the indexing file stores 100 pages

 

Organization of the Index File

 

Each Index has root pages called page 0 
Each page has a max limit, which shows Postgre that what the upper limit so that it can determine the required value is on the same page or the next page
Example: Charlie is pointing to page 1
Michal is pointing to page 2 as M comes after C
Zoey is on the last page
So if the query is finding value Daniel, then Postgres will start on page 0. If Daniel is smaller than Michel, it will be on page 2
In this way, the indexing file is organized

 

Note: when you create an index on an existing table, it will take some time for sorting data and creating a new index file on disk

 

Now we will see what will actually happen, like how the index file is loaded in memory and loaded in the B+ Tree, and how it finds the physical address and retrieves the record
For example, Postre receives a query to retrieve the record of a user whose email is ‘daniel@gmail.com’, as we have indexing for the email column. Postgres loads the first page of the index file, as each page is 8 KB, it is easy to calculate the first page and load in memory

 

Now Postgres loads page 0 in the memory buffer and starts the process as given below
It looks for the page header, in that header, it finds the pd_special pointer, which shows PostgreSQL that index metadata is stored at the end
For extra verification, it checks flags BTP-ROOT AND BTP-META

 

Next important thing this page will contain is line pointers, these are the physical addresses of each record against the key, they become useful when PostgreSQL finds right page for a given key

 

Index page contains ranges against uppar limits, means it notes down ranges according to available data, for example table contains records and in our example our key is Email field, now suppose there are 3k records whose emails starts with letter ‘a’ then pages of these records will be more, if there are only 1k records whose email is starting from letter ‘z’ then pages for these records will be less, so in this way page range is maintained.

 

All ranges will be listed down, another example being if the table has  some records whose emails starts with ‘a’ but subsequent letters are missing, it means there are no records whose emails start with letter ‘b’ or ‘c’ instead there are records whose emails directly starting from letter ‘i’  then page 2 will contain records whose emails starts with letter ‘i’, 
All these range calculations are done while creating the index 
So now we have ranges, and it is easy to find the appropriate page

 

After finding the right page, that page is loaded into the B tree (Balanced Tree Structure), and that page is called the leaf page.
Leaf page and every other page will have a Tuple ID called as TID (Because any page can be a leaf page, it depends on which column you are using and which records are on which pages). By using this TID, Postgres will calculate the actual disk address of that record

 

How does Postgres find right TID? In our case Tuple is it Daniel?

 

Postgres does binary search on all TIDs that are loaded in the leaf node, because the leaf node may contain thousands of tuples, and scanning top to bottom is time-consuming
Once the physical address is found, postgress locate that record directly. Because PostgreSQL is written in C they may be using the function fseek()/iseek() or pread(), which can read a record based on physical address

 

OS System Calls level optimization in Modern Indexing

 

Page 0 is always needed, so Postgres keeps page 0 in RAM always, because reading everytime from the disk is time is time-consuming
Modern indexing uses a system call in Linux called io_uring which makes it possible to make synchronous call to disk, most of the system calls to disk were async this tradition was broken by advent of io_uring, developed by a Linux Kernel Engineer from Meta named Jens Asboe
We will cover the exciting story of io_uring and its entire functionality in some other article

 

We cover deep topics like Garbage Collector Internals, Linux Kernel Internals, Hypervisor Internals, Distributed Computing, Open Source System Code Snippets explained, Custom Memory Management, Also we cover exciting Tech Stories, Underdog Companies stories, Extraordinary Developers stories.

Our Contents are useful for CS Students to CTO of any company. Consider subscribing to our newsletter and it’s free.

We publish 3 deeply researched articles 3 times a week

 

Subscribe to our free Newsletter

Subscription Form

 

 

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top