Overview and Background

The Delta Lake Format

When data is saved using the Delta Lake format, the data files themselves are just standard Parquet files, accompanied by additional metadata.

Parquet Files

The Apache Parquet file format is column-oriented, meaning that the values of each column are stored next to each other, rather than row by row. In addition to the actual data, a Parquet file contains metadata, including the schema and structure of the file.
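
One way to see this metadata, sketched here with PyArrow (an assumption on my part; any Parquet reader exposes the same footer information, and the file path is illustrative):

```python
import pyarrow.parquet as pq

# The footer of a Parquet file carries the schema and per-row-group
# statistics alongside the column data itself.
pf = pq.ParquetFile("/tmp/some_file.parquet")  # illustrative path
print(pf.schema_arrow)                          # schema stored in the footer
print(pf.metadata.num_row_groups, pf.metadata.num_rows)  # structure info
```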

When Spark writes a dataset in Parquet format, it creates multiple files, since each partition writes its own portion of the data as a separate part file. Splitting large datasets into pieces that can be processed in parallel can dramatically increase performance.
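
As a quick illustration (paths and sizes are made up for the sketch), writing a repartitioned DataFrame produces one part file per partition:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

# A million-row DataFrame split across four partitions.
df = spark.range(0, 1_000_000).repartition(4)

# Each partition is written in parallel as its own part file, e.g.
# part-00000-<uuid>.snappy.parquet, part-00001-<uuid>.snappy.parquet, ...
df.write.mode("overwrite").parquet("/tmp/demo_parquet")
```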

Writing a Delta File

When writing in Delta format, the Parquet part files are still written. In addition, a _delta_log folder is created that contains the transaction log, recording every single operation performed on the data.
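
A minimal sketch of writing in Delta format with the delta-spark package (the session setup follows the standard Delta quickstart; the table path is illustrative):

```python
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

df = spark.range(0, 1_000_000)
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta")

# The table directory now holds the Parquet part files plus a _delta_log/
# folder containing JSON commit files such as 00000000000000000000.json.
```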

The Delta Lake Transaction Log

The transaction log, also known as the DeltaLog, is a sequential record of every transaction performed on a Delta Lake table since its creation. It is at the core of many important features, including ACID transactions, scalable metadata handling, and time travel.

The main goal of the transaction log is to enable multiple readers and writers to operate on a given version of a dataset simultaneously, and to provide the execution engine with additional information for more performant execution. It always shows the user a consistent view of the data and serves as a single source of truth: the central repository that tracks all changes.
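
Because each commit is just a newline-delimited JSON file of actions, the log can be inspected directly. A small sketch, reusing the illustrative table path from above:

```python
import json
from pathlib import Path

log_dir = Path("/tmp/demo_delta/_delta_log")
for commit in sorted(log_dir.glob("*.json")):
    print(f"--- {commit.name} ---")
    for line in commit.read_text().splitlines():
        action = json.loads(line)
        # Each line is one action, e.g. {'commitInfo': ...}, {'add': ...}
        print(list(action.keys()))
```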

Reading a Delta File

When a system reads a Delta Lake table, it iterates through the transaction log to “compile” the current state of the table. The sequence of events is to read the transaction log files first, then read the part files that the log references. If a part file is not recorded in the delta log, it won’t be read, even if the Parquet file physically exists.
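
A minimal read sketch (assumes the Delta-enabled spark session from the earlier sketch; the path is illustrative). Time travel works the same way, replaying the log only up to the requested version:

```python
# Spark replays _delta_log to find which part files make up the current
# version, then reads only those files.
current = spark.read.format("delta").load("/tmp/demo_delta")
current.show(5)

# Time travel: reconstruct an earlier version from the same log.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/demo_delta")
```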

note: most of the details are omitted, since this note only aims to provide a high-level understanding

Scaling Massive Metadata

Checkpoint Files

Checkpoint files in Parquet format are saved in the _delta_log folder. By default, the Delta Lake writer generates a new checkpoint every 10 commits.

A checkpoint file saves the entire state of the table at a given point in time, containing the add file, remove file, update metadata, commit info, etc. actions, with all of their context information. Because this list is saved in native Parquet format, Spark can read the checkpoint quickly instead of replaying a long chain of small JSON commit files.
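
A sketch of locating and reading a checkpoint directly (assumes the Delta-enabled spark session from earlier; the table path is illustrative):

```python
import json
from pathlib import Path

log_dir = Path("/tmp/demo_delta/_delta_log")

# _last_checkpoint is a small JSON pointer to the newest checkpoint version.
last = json.loads((log_dir / "_last_checkpoint").read_text())

# Checkpoint files are ordinary Parquet, named by 20-digit version number.
checkpoint = spark.read.parquet(
    str(log_dir / f"{last['version']:020d}.checkpoint.parquet")
)
checkpoint.printSchema()  # top-level columns such as add, remove, metaData
```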