Data Engineering Podcast

Version Your Data Lakehouse Like Your Software With Nessie

March 10th, 2024  •  40 mins 55 secs

Data lakehouse architectures are gaining popularity because of the flexibility and cost-effectiveness they offer. The catalog is the link that bridges the gap between data lake and warehouse capabilities. Its primary purpose is to tell the query engine what data exists and where it is stored, but the Nessie project aims to go beyond that simple utility. In this episode, Alex Merced explains how Nessie's branching and merging functionality lets you apply the versioning semantics you already know from Git to your data lakehouse.