Open Lakehouse Engineering/Apache Iceberg Lakehouse Engineering
A Directory of Resources
In today's data landscape, the concept of the Open Lakehouse has emerged as a beacon of flexibility and innovation. An Open Lakehouse is a specialized form of data lakehouse (bringing data warehouse-like functionality and performance to data on a data lake) that is uniquely characterized by its commitment to open standards and technologies. At the core of this paradigm are tools such as Apache Iceberg, Nessie, and Apache Arrow, which collectively empower organizations to build highly efficient, scalable, and interoperable data ecosystems.
Unlike conventional data lakehouses, which may tightly couple a dataset's storage format, governance, and optimization to a single vendor with few alternatives, an Open Lakehouse prioritizes the avoidance of vendor lock-in, ensuring that organizations maintain full control over their data infrastructure. This approach not only fosters a more adaptable and resilient data environment but also encourages a collaborative, community-driven development ethos that is instrumental in driving the field forward.
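A core capability these open table formats provide is snapshot-based time travel: every commit produces a new table snapshot, and older snapshots remain readable. Here is a deliberately simplified, hypothetical Python sketch of that idea; this is not Apache Iceberg's actual API or file layout, just a toy model for intuition:

```python
# Toy model of snapshot-based "time travel" -- the core idea behind
# table formats like Apache Iceberg. NOT Iceberg's real API.

class ToyTable:
    def __init__(self):
        self._snapshots = []  # each snapshot is an immutable tuple of rows

    def append(self, rows):
        """Commit new rows as a fresh snapshot; old snapshots stay readable."""
        current = self._snapshots[-1] if self._snapshots else ()
        self._snapshots.append(current + tuple(rows))
        return len(self._snapshots) - 1  # snapshot id

    def read(self, snapshot_id=None):
        """Read the table as of a given snapshot (defaults to latest)."""
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        return list(self._snapshots[snapshot_id])

t = ToyTable()
s0 = t.append([{"id": 1, "region": "east"}])
s1 = t.append([{"id": 2, "region": "west"}])
print(len(t.read(s0)))  # 1 row as of the first commit
print(len(t.read()))    # 2 rows at the latest snapshot
```

Real Iceberg tables implement this with metadata files and manifest lists on object storage, which is what allows any compatible engine to read a consistent snapshot without a proprietary service in the middle.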
A key platform enabling open lakehouses is Dremio, a cutting-edge lakehouse platform that epitomizes the Open Lakehouse philosophy. Dremio seamlessly integrates various data sources, leveraging the power of open-source technologies to unify data management and analytics. This integration allows for an unprecedented level of flexibility and efficiency, making Dremio an indispensable tool for organizations looking to harness the full potential of their data. Dremio enables maximum decentralization of data by combining the right features for data virtualization (decentralized data), data lakehouse (decentralized access to a single copy of a dataset), and data mesh (decentralized data curation).
This directory is a comprehensive resource for anyone looking into Open Lakehouse Engineering. Whether you're a seasoned data professional or just starting out, the following resources will guide you through building and managing an Open Lakehouse, ensuring you're well-equipped to leverage these exciting technologies to their fullest extent.
If you are new to the data space, I recommend starting with this playlist, which covers lakehouse engineering, data modeling, big data concepts, and more.
Getting Started with Open Lakehouses
No Code Setup of a Data Lakehouse on your Laptop with Dremio & Minio using Docker Desktop
Blog: Creating an Iceberg Lakehouse on your Laptop with Dremio/Minio/Nessie
Blog: BI Dashboard Acceleration: Cubes, Extracts, and Dremio’s Reflections
Hands-on Articles
Blog: Creating an Iceberg Lakehouse with Spark, Minio, Dremio, Nessie
Blog: Connecting to Dremio Using Apache Arrow Flight in Python
Blog: Exploring the Architecture of Apache Iceberg, Delta Lake, and Apache Hudi
Blog: How to Create a Lakehouse with Airbyte, S3, Apache Iceberg, and Dremio
Blog: 3 Ways to Convert a Delta Lake Table Into an Apache Iceberg Table
Blog: Getting Started with Project Nessie, Apache Iceberg, and Apache Spark Using Docker
Video: Apache Superset & Dremio: How to Run Superset from Docker and Connect to Dremio Cloud


