Lakehouse Architectural Pattern with Delta Lake

Delta Lake Technologies

A Delta Table is a collection of components kept together using Delta Lake technologies:

  • Delta files (Parquet files)
  • Delta transaction logs (stored in object storage)
  • The metastore (optional)

Delta Transaction Logs

Each commit to the table is recorded as a JSON file in a special folder named _delta_log, located at the root of the table's directory.
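
To illustrate how the log defines a table's contents, here is a minimal sketch, not Delta Lake's actual reader. It assumes a log made only of JSON commit files containing add and remove actions, and it ignores checkpoints and every other action type. The helpers write_commit and build_demo_table and the Parquet file names are hypothetical, and real add entries carry more fields than shown here.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data files."""
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are named by zero-padded version number, so sorting by
    # file name replays them in commit order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


def write_commit(table_path, version, actions):
    """Hypothetical helper: write one simplified commit file for the demo."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    commit = log_dir / f"{version:020d}.json"
    commit.write_text("\n".join(json.dumps(a) for a in actions) + "\n")


def build_demo_table(root):
    """Hypothetical helper: two commits, the second replacing one file."""
    write_commit(root, 0, [{"add": {"path": "part-0000.parquet"}},
                           {"add": {"path": "part-0001.parquet"}}])
    write_commit(root, 1, [{"remove": {"path": "part-0000.parquet"}},
                           {"add": {"path": "part-0002.parquet"}}])
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_table(root)
        print(sorted(active_files(root)))
```

The point of the sketch is that the Parquet files alone do not say which of them belong to the current version of the table. That information comes from replaying the log.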

Why metastore?

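The Delta transaction log, not the metastore, is what turns a directory of Parquet files on the data lake into a Delta Table and provides features such as ACID transactions. Registering the Delta Table in the metastore gives it a name and makes it discoverable, which is why the metastore is listed as optional above.

When using Spark SQL and Delta Lake, queries that refer to a table by name work only if the table is registered in the metastore. An unregistered table can still be queried by its storage path.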
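
As a rough sketch of the two ways Spark SQL can address a Delta Table, the helpers below only build the SQL strings, so they run without a Spark installation. The function names, the path, and the table name are hypothetical. Each statement is meant to be passed to spark.sql(...) in a session configured for Delta Lake.

```python
def path_query(table_path):
    """SQL that reads a Delta table by storage path (no metastore entry needed)."""
    return f"SELECT * FROM delta.`{table_path}`"


def register_statement(table_name, table_path):
    """SQL that registers an existing Delta table under a name in the metastore."""
    return (f"CREATE TABLE IF NOT EXISTS {table_name} "
            f"USING DELTA LOCATION '{table_path}'")


def name_query(table_name):
    """SQL that reads a table by name (requires prior registration)."""
    return f"SELECT * FROM {table_name}"


if __name__ == "__main__":
    # Hypothetical path and table name, for illustration only.
    path = "/data/lake/events"
    for sql in (path_query(path),
                register_statement("events", path),
                name_query("events")):
        print(sql)
```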

Delta Lake as a key component of a modern data warehouse

Delta Lake provides a critical layer of a modern data warehouse (i.e., a Lakehouse) by bringing structure and reliability to data lakes in support of any downstream data use case.

Features

Delta Lake brings data warehousing capabilities to data lakes without restricting them to structured data. Delta Lake:

  1. Is an open source ACID table storage layer over cloud object stores
  2. Adds quality and performance to data lakes
  3. Is based on the open Parquet format

Delta Lake was initially developed at Databricks and open-sourced in 2019.
