Scanning an entire database to find a single value is a terrible idea and leads to terrible performance, so indexing is useful. The main idea is to keep some additional metadata on the side that acts as a signpost to help locate the data.

An index slows down writes, because it has to be updated on every write, but speeds up reads.

Hash Indexes

The section starts with the example of a key-value store that appends every write to a log file.
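A minimal sketch of a hash index, assuming an append-only log file plus an in-memory dictionary that maps each key to the byte offset of its most recent record. The class name `HashIndexedLog`, the `key,value` line format, and the restriction that keys contain no commas or newlines are assumptions made for illustration only.

```python
import os


class HashIndexedLog:
    """Append-only key-value log with an in-memory hash index (illustrative)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the most recent record
        open(path, "ab").close()  # make sure the log file exists

    def set(self, key, value):
        # Assumed record format: one "key,value" line per write.
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # the record will start here
            f.write(record)
        self.index[key] = offset  # the signpost now points at the newest record

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record, no scan
            line = f.readline().decode("utf-8").rstrip("\n")
        _, value = line.split(",", 1)
        return value
```

Writes stay cheap (one append plus one dictionary update), and a read is a single seek. The trade-off in this sketch is that every key has to fit in memory.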

How do we avoid running out of disk space? Break the log into segments of a certain size: once the current segment reaches that size, close it and make subsequent writes to a new segment file.
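The rollover rule can be sketched as follows. To keep the example short, each segment is a `bytearray` standing in for a segment file; `SegmentedLog` and `max_segment_bytes` are hypothetical names, not from any particular database.

```python
class SegmentedLog:
    """Append-only log split into size-limited segments (illustrative)."""

    def __init__(self, max_segment_bytes):
        self.max_segment_bytes = max_segment_bytes
        self.segments = [bytearray()]  # the last entry is the active segment

    def append(self, record: bytes):
        active = self.segments[-1]
        # Roll over when the record would push a non-empty segment past the limit.
        if active and len(active) + len(record) > self.max_segment_bytes:
            active = bytearray()
            self.segments.append(active)
        active.extend(record)
```

Older segments are never modified after rollover, which is what makes it safe to process them in the background.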

Segments can also be compacted: throw away duplicate keys and keep only the most recent update for each key. After compaction, several segments can be merged together into a new file. (Both steps need to be done in a background thread.)
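A small sketch of compaction and merging, assuming each segment is an in-memory list of `(key, value)` records in write order and that segments are passed oldest first. The function names are illustrative.

```python
def compact(segment):
    """Keep only the most recent value for each key within one segment."""
    latest = {}
    for key, value in segment:  # later records overwrite earlier ones
        latest[key] = value
    return list(latest.items())


def merge_segments(segments):
    """Merge segments given oldest first; newer segments win on duplicate keys."""
    latest = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value
    return list(latest.items())
```

For example, `compact([("a", 1), ("b", 2), ("a", 3)])` keeps `3` for `a` and `2` for `b`. Because the input segments are only read, the merged output can be written to a new file and swapped in afterwards.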

Several things to keep in mind:

SSTables and LSM-Trees

If the sequence of key-value pairs in a segment is sorted by key, the format is called a Sorted String Table, or SSTable.

Each key only appears once within each merged segment file.
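Because every segment is sorted by key, segments can be merged in a single sequential pass, much like the merge step of mergesort. The sketch below assumes segments are lists of `(key, value)` pairs, each sorted by key with no duplicates inside a segment, and passed oldest first; `merge_sstables` is a hypothetical helper name.

```python
import heapq


def merge_sstables(segments):
    """Merge sorted segments (oldest first) into one sorted, deduplicated list."""
    # Tag each record with the negated segment position so that, for equal
    # keys, the record from the newest segment sorts first.
    tagged = [
        [(key, -age, value) for key, value in segment]
        for age, segment in enumerate(segments)
    ]
    merged = []
    for key, _, value in heapq.merge(*tagged, key=lambda r: (r[0], r[1])):
        if merged and merged[-1][0] == key:
            continue  # a newer value for this key was already emitted
        merged.append((key, value))
    return merged
```

The output is itself sorted with one entry per key, so it is again a valid SSTable, and the merge never needs to hold more than one record per input segment in memory at a time.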