Apache Data Lakehouse Weekly: December 30, 2025 – January 5, 2026
The opening days of 2026 bring a mix of holiday-quiet development activity and significant momentum building for the year ahead across Apache Iceberg, Polaris, Arrow, and Parquet. The community enters the new year with major releases approved, summit preparations accelerating, and foundational design work advancing across all four projects.
Apache Iceberg
The Iceberg community kicked off 2026 with several key initiatives that signal the project’s continued maturation and ecosystem expansion.
Iceberg Summit 2026 CFP Open: The call for papers for Apache Iceberg Summit 2026 is now live, with submissions closing January 18, 2026. Planned for April 8-9 in San Francisco, the summit offers a dedicated forum for sharing real-world use cases, integrations, and best practices. The selection committee discussions wrapped up in late 2025, and the community is now rallying speakers from across vendors, users, and contributors.
Atlanta Meetup Scheduled: The Apache Iceberg Meetup ATL announced its first event of 2026 on January 21, continuing the grassroots community engagement that has grown throughout 2025. Organizers are working on a structured CFP process to encourage diverse presenters and topics throughout the year.
OAuth2 Manager v2 Proposal: Contributors shared a comprehensive design document for overhauling Iceberg’s OAuth2 authentication manager. The proposal acknowledges that while seamless evolution would be ideal, practical reality requires careful migration planning across multiple minor versions. The document includes detailed deprecation and transition roadmaps, with discussion planned for the January 2026 catalog meeting.
Apache Polaris
Polaris enters 2026 with continued momentum toward graduation, building on the strong foundation established in 2025.
Generic Table Capability Stabilizing: Following multiple iterations throughout 2025, the “Generic Table” feature is expected to graduate from beta status in the upcoming 1.3.0-incubating release. This capability allows Polaris to catalog external table formats like Apache Hudi and Delta Lake alongside Iceberg tables, positioning Polaris as a truly multi-format catalog solution.
Community Growth: Regular community syncs continued through year-end 2025, with development sprints scheduled to focus on documentation, onboarding, and open issues. The project’s expanding PPMC (Podling Project Management Committee) reflects healthy governance maturation as Polaris moves toward full Apache graduation.
Integration Testing Enhancements: With AWS credits now available to the project, contributors discussed expanding integration testing against real cloud infrastructure, particularly for IAM AssumeRole flows and credential vending that are difficult to simulate locally.
Apache Arrow
The Arrow project closed out 2025 and entered 2026 with its characteristic multi-language consistency and mature release cadence.
Leadership Continuity: Antoine Pitrou, a long-standing Arrow core contributor and PMC member, was formally appointed PMC Chair, reinforcing governance stability and providing continued technical vision from one of the project's most experienced maintainers.
Go 18.5.0 Released: Arrow Go shipped version 18.5.0 comprising 38 commits from 17 contributors. These regular updates ensure the Go implementation maintains feature parity with Arrow’s C++, Rust, and Java implementations—critical for lightweight analytics and ETL pipelines that rely on Go.
TimestampWithOffset Format Addition: The community successfully voted to add a new canonical type for timestamps that encode UTC offset directly with each value. This enhancement eliminates ambiguity when interpreting timestamps across systems with different time zones or daylight saving settings, addressing a long-standing pain point in cross-system data exchange.
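To make the idea concrete, here is a minimal Python sketch of a value that carries its own UTC offset. It is not the Arrow API: the `TimestampWithOffset` tuple, its field names, and the choice of microseconds plus an offset in minutes are assumptions made for illustration, and the storage layout actually voted on may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class TimestampWithOffset(NamedTuple):
    """Hypothetical record: a UTC instant plus the writer's offset from UTC."""
    utc_micros: int      # microseconds since the Unix epoch, in UTC
    offset_minutes: int  # writer's local offset from UTC at that instant


def encode(local: datetime) -> TimestampWithOffset:
    """Split an offset-aware datetime into (UTC instant, offset in minutes)."""
    offset = local.utcoffset()
    if offset is None:
        raise ValueError("datetime must be offset-aware")
    micros = (local - EPOCH) // timedelta(microseconds=1)
    return TimestampWithOffset(micros, int(offset.total_seconds() // 60))


def decode(value: TimestampWithOffset) -> datetime:
    """Rebuild the original wall-clock time from the stored instant and offset."""
    tz = timezone(timedelta(minutes=value.offset_minutes))
    return (EPOCH + timedelta(microseconds=value.utc_micros)).astimezone(tz)


# Two events at the same instant, recorded in different local offsets.
new_york = datetime(2026, 1, 5, 9, 0, tzinfo=timezone(timedelta(hours=-5)))
paris = datetime(2026, 1, 5, 15, 0, tzinfo=timezone(timedelta(hours=1)))

a, b = encode(new_york), encode(paris)
print(a.utc_micros == b.utc_micros)   # same instant
print(decode(a).isoformat())          # original local wall-clock time survives
```

Because the offset travels with each value, a reader can recover both the absolute instant and the wall-clock time the writer saw, without consulting a column-level time zone or a time zone database.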
Apache Parquet
Parquet begins 2026 with a major release vote and early discussions about the format’s long-term evolution.
1.17.0 Release Vote Passes: The vote for Apache Parquet 1.17.0 RC0 passed on January 2, 2026, with binding approvals and successful integration testing against Iceberg, Trino, and Spark. This release drops Java 8 support, setting Java 11 as the new minimum runtime—a significant modernization that aligns Parquet with contemporary Java library standards and enables the use of modern language features.
FSST String Encoding Work Continues: Contributors advanced design discussions around FSST (Fast Static Symbol Table) compression for string and byte array encoding. FSST replaces frequently occurring byte sequences with short codes drawn from a small symbol table, and sharing that table across multiple column pages can significantly reduce file size and improve scan performance for string-heavy datasets.
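The symbol-table idea can be sketched in a few lines of Python. This is a toy illustration, not the FSST algorithm as published and not anything proposed for Parquet: the function names are hypothetical, the table is built with a simple frequency heuristic rather than FSST's iterative training, and no attention is paid to speed. It assumes the general shape of such schemes: up to 255 symbols of at most 8 bytes each, with one code reserved as an escape for literal bytes.

```python
from collections import Counter

ESCAPE = 255  # reserved code: the next byte is a literal, not a symbol


def build_table(samples, max_symbols=255, max_len=8):
    """Pick the substrings (2..max_len bytes) with the largest estimated savings."""
    gains = Counter()
    for s in samples:
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                gains[s[i:i + n]] += n - 1  # each use replaces n bytes with 1 code
    return [sym for sym, _ in gains.most_common(max_symbols)]


def compress(data, table):
    """Greedy longest-match encoding; unmatched bytes are escaped."""
    by_len = sorted(range(len(table)), key=lambda c: -len(table[c]))
    out = bytearray()
    i = 0
    while i < len(data):
        for code in by_len:
            if data.startswith(table[code], i):
                out.append(code)
                i += len(table[code])
                break
        else:
            out += bytes([ESCAPE, data[i]])
            i += 1
    return bytes(out)


def decompress(blob, table):
    """Expand codes back to symbols; an ESCAPE copies the following byte as-is."""
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += table[blob[i]]
            i += 1
    return bytes(out)


# One table trained on sample values, then reused for every value.
values = [b"https://example.com/a", b"https://example.com/b", b"https://example.com/c"]
table = build_table(values)
packed = [compress(v, table) for v in values]
print([len(v) for v in values], [len(p) for p in packed])
```

Each value is compressed independently against the same table, so a reader can decode a single string without touching its neighbors. That property is what makes this family of encodings attractive for columnar scans.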
V3 Format Groundwork: While no concrete proposal has emerged, informal discussions suggest the community is beginning to scope what a Parquet V3 format might include. Topics under consideration include improved metadata layouts, enhanced bloom filter indexing, and cloud-native encoding optimizations—all aimed at reducing scan times without breaking backward compatibility.
Cross-Project Themes
Several patterns emerge across the four projects as 2026 begins:
Java Modernization: Both Iceberg and Parquet are raising their minimum Java requirements (Parquet to Java 11, with Iceberg discussions around Java 17), enabling modern language features and cleaner dependency management while gradually phasing out legacy runtime support.
Community Events and Engagement: From Iceberg’s summit planning to local meetups and regular community syncs, all four projects demonstrate strong investment in face-to-face collaboration and knowledge sharing. These forums help translate mailing list technical discussions into practical implementation guidance.
Format Evolution Discussions: While Iceberg explores V4 features and Parquet considers V3 possibilities, both projects balance innovation with stability. The focus remains on completing current format generation features before introducing breaking changes, ensuring production users have stable, fully-featured platforms.
Looking Ahead
The first weeks of 2026 set an ambitious tone for the lakehouse ecosystem. Key developments to watch include the Iceberg Summit in April, Polaris’s graduation timeline, continued Arrow format enhancements, and the completion of Parquet 1.17.0 rollout. As these projects mature, expect tighter integration, shared learnings, and continued focus on production-grade reliability.


Appreciate the roundup. The Parquet 1.17.0 Java 11 bump is actually a pretty significant signal: it shows the ecosystem is finally willing to drop legacy runtime support in favor of modern tooling. The FSST string compression work is interesting too, especially for workloads with high-cardinality string columns where dictionary encoding alone doesn't cut it. Curious how the V3 format discussion will handle backward compat, since that's historically been Parquet's biggest strength against newer formats.