How databases store data on the disk

Easy explanation of how HDDs and SSDs store data from database

Feb 19, 2026

Appreciate you reading this newsletter, I am writing as I learn and explore these topics in depth. I hope the content and resources are helping you level up as an engineer. If you want me to write about something specific, just message me or comment on the post.
Note: This article on was originally published on my personal website - https://pradyumnachippigiri.dev/blogs/database-storage-hdd-ssd..
Also, checkout all my Linkedin tech posts here → https://pradyumnachippigiri.dev/blogs/linkedin-posts

Here’s what I learned this week

Let Start,,

Understanding how databases store table data such as rows and columns on disk forms the foundational groundwork for understanding how databases work internally. It helps us understand how indexes operate, how multi-level indexes are structured, and how B-trees and B+ trees efficiently store and access indexed data. But before that lets understand this from ground up

Let’s see how it works below for HDDs, and then the similar concept applies to the SSDs.

Disk and Disk Structure

A disk is a non-volatile storage device used to store data persistently. Because it is non-volatile, the data on disk remains intact even when the system loses power (unlike RAM). This is why disks are essential for storing permanent files, databases, and operating system data.

Would highliy recommend you watch this 3-minute youtube video, to get an idea of how an actual real HDD looks, so that you get a sense of what we are going to discuss in this article today :

The disk is composed of the following components :

Platters: Magnetic disks where data is recorded on both surfaces; and each surface is nothing but a set of concentric circles. (logic)
Spindle (spindle motor): Rotates the spindle, which spins all the platters at a fixed RPM. (rotations per minute)
Tracks: Individual concentric circular rings on the platter.
Sector: This is a geometric sector, meaning just a portion of a disk. (shaped like a slice of pie.)
Block (OS/filesystem block): The intersection of a track and a sector is called a block. Hence, any location on the disk, i.e., block address, can be identified by the track number and sector number. The block size is usually decided by the manufacturer. (earlier one block was around 512 bytes, now it is around 4KB+)
Arm: Moves the read/write heads inward or outward to position them over the required track and sector.
top view of disk

How data is accessed from the disk ?

Question: So how do we access the disk to read and write?
Answer : Well, we use disk APIs, which are OS-level system calls like read() and write()to read data from disk and write data back to disk.

Question : But how much data is accessed at a time?
Answer: At the OS/filesystem level, it doesnt like to deal with sectors etc, so data is transferred in blocks (usually 4 KB). Even if we ask for a few bytes, the OS typically returns the bytes thats contained in those 4KB blocks. Meaning it is the minimum amount of data which can be read/write by an I/O operation.

At the database level, the DB does not like to deal with these blocks, it thinks its too small a unit, so it works with pages (e.g., 8 KB / 16 KB). A page is the DB’s logical unit, and it usually spans multiple blocks. So whenever a we request data (using sql query for rows), the OS ends up fetching bytes from one block or a set of blocks (depending on how much we asked for). Meaning if the disk returns bytes from 2 blocks say 8KB, then the db considers it as 1 page of 8KB. (Let’s solve an example together to get a clear understanding of this..)

As discussed, a data page is a fixed-size chunk (e.g., 8 KB) that the database reads from disk into memory. If the underlying disk block size is 4 KB, then one 8 KB page is stored across two disk blocks, so the OS reads two blocks to bring one page into RAM.

The page starts with a header that stores metadata like page id and free space pointers.
Actual rows are stored sequentially in the data area as bytes.
At the bottom of the page is a slot directory, which is an array of offsets. Each slot entry stores the byte position where a particular row begins inside the page.

Question : But after accessing the data where are we transferring it and storing it?
Answer : The data that’s read (the block/blocks) is first brought into our working memory (RAM). From there, the CPU can process it efficiently. And you might know that inside our working memory, we organize and manipulate that data using data structures. (arrays, hash maps, trees, etc.).

Lets try and put everything together and understand the high level flow of how data flows in and out..

Read Path

DB finds the page where the rows live and identifies offset on disk for those rows. (If an index exists, the database uses the index to directly identify the page containing the row. If no index exists, the database scans pages sequentially until it finds the row.)
DB asks the OS to read the bytes from that page file from certain offset till a certain offset.
OS checks filesystem cache; if missing, OS reads from disk and returns to the db.
DB places the page into the buffer pool (main memory) and reads the rows from that page..

Write Path

DB finds the page where the rows live and pulls it into the buffer pool..
DB updates the row in memory.
DB writes a journal/WAL entry and persists it to disk
The page stays in memory and may receive more writes before being flushed back to disk later (reducing I/Os)
Inserts/deletes follow a similar pattern (details vary)

How data is organized on a disk with an example

Lets consider our block size of the disk designed by our manufacturer to be 4KB. And the data page size of our database to be 8 KB. And we have a table of employee records containing 1000 records like this :

Employee id → 16 bytes 
Employee Name → 16 bytes 
Employee Address → 32 bytes 
Total bytes of one row = 64 bytes

Now, Say if we want to query and return one row (employee_id = 999) :

SELECT * FROM employee_table WHERE employee_id = 999;

We considered our page size to be 8KB right, with 8060 bytes being the empty record area.

one 8KB page = two 4KB blocks.

But since we are considering only 8060 bytes (as some bytes are taken away by header, and offset.. ), and 1 row is 64 bytes so :

Number of rows in 1 page  = 8060 / 64
                             = 125.93
                             = 125 rows

So, one page will actually have 125 rows, and not 128 rows.

To cover all the rows, the total number of pages needed would be :

The total number of pages required = 1000 / 125 
                                   = 8 pages

Rows are packed into pages; pages are fetched from disk using blocks. In order to fetch 8 pages in terms of OS level blocks :

The number of blocks to fetch 1 page  = 8192 / 4096 
                                      = 2 blocks

Each 8KB page is stored on disk across 2 contiguous 4KB blocks (meaning these 125 rows are spread across 2 sequential blocks). The OS reads those 2 blocks to get back the data to the page in memory

So searching for employee_id = 999 may require scanning all 8 pages (16 blocks), even though we need only one row.

Indexing

Now imagine if we had a table of size 100000, and if had to find the last record then we had to read all the → 800 pages, that’s 1600 OS blocks worth of data to scan. As table size grows, a full scan requires reading more and more pages, which increases I/O cost.

The number of blocks we need to access is directly proportional to the read access time. So we need to reduce the number of block access.

Thats where indexing comes in..

Let;s index the table on employee id

The next question comes as to where would we store the index table. ?

The index table also would be stored on the disk.. , so how many blocks would it take.?

Employee id → 16 bytes 
Record pointer → assuming 6 bytes
Total bytes of one row = 22 bytes

The number of rows in one page = 8060 / 22
                               = 366.3 ~ 367

Number of pages required to cover all rows of the index = 1000 / 367
                                                        = 2.72 ~ 3 pages

So the index table would take at most 6 OS blocks.

Now the best part is: to access the index, at maximum we may read 6 blocks (3 pages). Once we find the key, we get its pointer and in a real database that pointer usually does not point directly to the “row object.” Instead, it stores something like:

(Page ID, Slot Offset)

Meaning:

Jump to the correct data page that contains the row
Use the page’s slot directory to locate the exact row inside that page

So after the index lookup, we only need to read 1 data page (≈ 2 OS blocks) to fetch the row.

So the total blocks needed becomes:

6 blocks (index) + 2 blocks (data page) = 8 blocks

So the drastic change we saw is: without index we may read 16 blocks (8 pages), and with an index it may drop to 8 blocks (~4 DB pages).

Multi-Level Indexing

So far, our index only had 1,000 entries.

It fit into just 3 pages (6 blocks).

But what if the table had 1 million records?

Now the index would also grow proportionally.

If 1,000 rows → 3 index pages
Then 1,000,000 rows → 3,000 index pages.
That’s: 3,000 pages × 2 blocks = 6,000 OS blocks

Now imagine searching for an employee ID inside such a large index.

If we had to scan the entire index page by page, we would again be reading thousands of blocks.

So the index itself becomes expensive to search.

So to solve this, we don’t scan the entire index.

Instead, we build an index on top of the index.

Think of it like this:

We group the 3,000 index pages into ranges.

Then we create a small “guide index” that stores:

First employee_id in each index page
Pointer to that index page
multi-level indexing

Now instead of scanning 3,000 index pages:

We first read the small guide index.
It tells us which index page to go to.
Then we read that specific index page.
Then we read the actual data page.

So instead of scanning thousands of pages, we now read only:

1 guide index page
1 lower-level index page
1 data page

That’s just 3 pages total.

If we keep adding more records, we can keep adding more “guide levels”.

This naturally forms a tree-like structure.

Instead of manually managing multiple index layers, databases use a self-balancing tree structure that:

Automatically grows when data grows
Automatically reorganizes itself
Keeps search cost logarithmic

This structure is called a B-tree (or more commonly in databases, a B+ tree).

We will talk about B- trees and B+ trees in the next articles..

How the data storage in SSD works ?

In SSDs, there are no spinning platters. Instead, data is stored across multiple NAND flash chips (often exposed as separate targets inside the SSD) and managed by an NVMe controller (as shown above). The important thing is: even on SSDs, the OS still talks to storage in terms of fixed-size blocks (e.g., 4KB) using logical block addresses.

So when the database requests an 8KB page, the OS issues reads for the corresponding 2 blocks. The NVMe controller then routes those block reads to the correct target / flash chip, fetches the data from the underlying flash pages, and returns the bytes back into RAM. From the database’s point of view, the page-based model stays the same only the physical storage mechanics under the hood change.

If you liked this article, please do like, subscribe, and drop a comment to spark a discussion, so we can all learn from each other. And if you think it’ll help someone, share it with your friends or on social media.

If you’d like 1:1 help (system design, backend, DSA, resume review etc.), you can book a session with me on Topmate

If you found this useful and want to support my work, you can also buy me a coffee ☕
buymeacoffee.com/cpradyumnao..

I also write short form content on a daily basis on X and Linkedin. Please do follow there too..

References

B-trees vs B+ trees

Evolving Engineer

Discussion about this post

Ready for more?