<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data, Lakehouse and AI with Alex Merced]]></title><description><![CDATA[Data, Lakehouse and AI with Alex Merced is a deep dive into the architecture shaping modern analytics. Each edition explores data lakehouse design, open table formats like Apache Iceberg, catalog strategy, semantic layers, query acceleration, and the rise]]></description><link>https://amdatalakehouse.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!h4k8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13abec9b-070a-4bbd-82f6-b3e9ddf01c5a_1024x1024.png</url><title>Data, Lakehouse and AI with Alex Merced</title><link>https://amdatalakehouse.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 17 May 2026 20:35:37 GMT</lastBuildDate><atom:link href="https://amdatalakehouse.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Alex Merced]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[amdatalakehouse@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[amdatalakehouse@substack.com]]></itunes:email><itunes:name><![CDATA[Alex Merced]]></itunes:name></itunes:owner><itunes:author><![CDATA[Alex Merced]]></itunes:author><googleplay:owner><![CDATA[amdatalakehouse@substack.com]]></googleplay:owner><googleplay:email><![CDATA[amdatalakehouse@substack.com]]></googleplay:email><googleplay:author><![CDATA[Alex Merced]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[AI Weekly: Voice Models, Custom Silicon, MCP Goes Enterprise (May 7–13, 
2026)]]></title><description><![CDATA[This week, OpenAI shipped three voice models in the API and a security variant of GPT-5.5.]]></description><link>https://amdatalakehouse.substack.com/p/ai-weekly-voice-models-custom-silicon</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/ai-weekly-voice-models-custom-silicon</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Fri, 15 May 2026 13:03:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TyFV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TyFV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TyFV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!TyFV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!TyFV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!TyFV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TyFV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2255611,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/197620308?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TyFV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!TyFV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!TyFV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!TyFV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd99370-2842-4787-8e5c-2c61305021ce_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>This week, OpenAI shipped three voice models in the API and a security variant of GPT-5.5. 
Anthropic doubled Claude Code rate limits using SpaceX Colossus compute. Thinking Machines released its first model, a 276B-parameter system built for 200-millisecond real-time interaction. Google launched Googlebooks and Gemini Intelligence at its Android Show. Anthropic released Claude for Legal with 20+ MCP connectors. Cursor landed in Microsoft Teams. The connective tissue across all of these stories: AI coding tools, processing hardware, and standards work are all maturing in parallel, and each maturation is reshaping the others.</p><h2><strong>AI Coding Tools: Cursor in Teams, Copilot CLI Iterates Fast</strong></h2><p>Cursor expanded its surface area this week. The Cursor team <a href="https://www.cryptointegrat.com/p/ai-news-may-12-2026">announced on May 11</a> that Cursor is now available in Microsoft Teams. Users can mention <a href="https://dev.to/cursor">@cursor</a> in any Teams channel to delegate tasks to an agent or pull information from Cursor directly into Teams. The integration matters because it moves Cursor out of the editor and into the place where engineering teams actually coordinate work. Async code review, ticket triage, and PR follow-ups now happen in the same surface as the conversation about them.</p><p>GitHub Copilot CLI shipped five releases between May 5 and May 11, 2026. Version 1.0.41 (May 5) reduced startup time by rendering the UI immediately while authentication ran in the background. Version 1.0.42 (May 6) improved MCP server error messages. Version 1.0.43 (May 6) added server-side model routing for Auto mode. Version 1.0.44 (May 8) fixed path completion flickering and enabled mid-input slash commands. Version 1.0.45 (May 11) added an /autopilot slash command to toggle between interactive and autopilot modes. 
Five releases in seven days reflects the cadence the Copilot CLI team has settled into since the <a href="https://www.havoptic.com/tools/github-copilot">tool went GA earlier this year</a>.</p><p>Anthropic doubled Claude Code rate limits overnight on May 8, 2026. The increase came from a SpaceX partnership that added 300 megawatts of new compute, the equivalent of more than 220,000 Nvidia GPUs, in under a month. The Colossus One facility was originally built for xAI&#8217;s Grok training workloads. Anthropic now uses it for burst compute on developer-facing products. <a href="https://aitoolsrecap.com/Blog/ai-news-may-8-2026">Claude Code users had reported hitting output limits during peak hours</a>, and the doubled limits resolve that pressure point without raising prices.</p><p>The competition between Claude Code, Cursor, and GitHub Copilot continues to harden. Claude Code runs on Opus 4.7 with <a href="https://www.nipralo.com/blogs/best-ai-coding-tools-2026">SWE-bench Verified at 87.6%, SWE-bench Pro at 64.3%, and CursorBench at 70%</a>, according to Anthropic&#8217;s April 16 release. Cursor pushed Composer 2 and parallel agents in April. GitHub paused new sign-ups for Copilot Pro and Pro+ ahead of the <a href="https://scrimba.com/articles/best-ai-coding-assistants-2026/">June 1 transition to usage-based billing</a>. Each tool now has a distinct positioning. Claude Code is the surface-agnostic agent for senior developers. Cursor is the daily-driver IDE. Copilot is the GitHub-integrated extension for organizations with existing GitHub investment.</p><p>Cursor 3.3 shipped in May 2026 with <a href="https://dev.to/rachef_khoulod_a166c693fa/cursor-3-in-2026-the-ai-code-editor-that-changed-how-i-ship-software-e4d">/multitask for spawning parallel subagents</a> instead of running them in sequence. Vulnerability Scanner runs scheduled scans for known CVEs and outdated dependencies. Context usage breakdown shows engineers exactly what their agent is consuming. 
MCP connection stability got patched in the same release. Cursor 3, which shipped in April 2026, already changed the architecture by putting all local and cloud agents in a single sidebar. Agents kicked off from mobile, Slack, GitHub, and Linear all appear in one workspace view.</p><p>Cursor also launched its Security Review beta in May for Teams and Enterprise plans. Two always-on agents anchor the offering. <a href="https://releasebot.io/updates/cursor">Security Reviewer checks every PR</a> for vulnerabilities, auth regressions, privacy risks, agent tool auto-approvals, and prompt injection attacks. It leaves inline comments at the exact diff location with severity and remediation guidance. Vulnerability Scanner runs scheduled scans for known vulnerabilities, outdated dependencies, and configuration issues with optional Slack notifications. Cursor introduced canvases in the Agents Window during the same period. Canvases let agents build interactive visual interfaces for PR reviews, eval analysis, and data dashboards rather than walls of text. They use React-based components and live alongside the terminal, browser, and source control as durable artifacts.</p><p>Cursor&#8217;s Bugbot is moving from $40 per seat per month to usage-based billing on June 8, 2026. Teams will be billed from on-demand spend. Individuals will be billed from included usage. The average Bugbot run costs $1.00 to $1.50 depending on PR size and complexity. The pricing change reflects the wider industry shift toward per-action billing for AI coding tools, which is the same direction GitHub Copilot is heading with its June 1 transition.</p><p>GitHub&#8217;s billing change is the bigger story for many teams. The Opus premium-request multiplier on Copilot jumped from 15x to 27x in the same transition window. Teams running Opus-heavy workloads on Copilot pay considerably more than Claude Max plan subscribers using the same model through Anthropic&#8217;s direct channel. 
The pricing gap is the reason a growing number of engineering teams are evaluating whether to use Claude Code as a primary surface and keep Copilot for GitHub-native workflows.</p><p>Claude Code itself stopped being only a CLI in 2026. It runs in the shell, as VS Code and JetBrains extensions, as a GitHub Action that opens PRs, and inside claude.ai on web and mobile. Subagents, skills, hooks, and plan mode turn it into a per-repo configuration rather than a per-session tool. The agent runs anywhere the engineer works. That surface-agnostic posture is part of why the SpaceX compute deal mattered. Capacity constraints on Claude Code show up in five surfaces at once, and the rate limit increase helped all five.</p><p>A security note worth flagging. Cursor patched a vulnerability in version 2.5 that let a malicious Git repository trigger arbitrary code execution through the agent. The patch is in place. No public reports of in-the-wild abuse have surfaced. Teams running older Cursor versions need to update. The broader question the bug surfaced is how engineering teams handle repo trust when AI agents have shell access. The combination of agent capability and unverified inputs is a category of risk that did not exist a year ago, and the security tooling industry is starting to respond.</p><h2><strong>AI Processing: Tesla AI5 Tape-Out, Google TPU Customer Roster Expands</strong></h2><p>Tesla taped out its AI5 chip on April 15, 2026, with details continuing to land this week. The chip is dual-sourced from TSMC Arizona and Samsung Texas. According to <a href="https://aitoolsrecap.com/Blog/tesla-ai5-chip-tape-out-optimus-robots-supercomputers">Tesla&#8217;s stated specs</a>, AI5 delivers roughly 8x the compute and 5x the bandwidth of the current AI4 hardware. A single AI5 chip approximates an NVIDIA H100 for Tesla&#8217;s specific inference workloads. A dual AI5 setup approximates an NVIDIA Blackwell at a fraction of the cost and power. 
Tesla claims AI5 uses roughly one-third the power of Blackwell and runs at under 10% of the cost.</p><p>Tesla&#8217;s strategic positioning is the more significant part of the announcement. Musk confirmed that AI4 is sufficient for Full Self-Driving safety levels, so Tesla owners do not need to be retrofitted. AI5 is built for Optimus humanoid robots and supercomputer clusters. Configurations of 5 to 12 AI5 chips per board will form the backbone of Tesla&#8217;s training infrastructure for FSD v15 and future Optimus models. Engineering samples are expected in late 2026, with volume production targeted for 2027. AI6 is already in development, with tape-out targeted for December 2026.</p><p>Google&#8217;s TPU customer roster reshaped the AI chip market the week before. At Google Cloud Next 2026 on April 22, <a href="https://techcrunch.com/2026/04/22/google-cloud-next-new-tpu-ai-chips-compete-with-nvidia/">Google unveiled the TPU 8t and TPU 8i</a>, claiming 2.8x better price-performance than the prior Ironwood generation. Anthropic expanded to multiple gigawatts of next-generation TPU capacity for Claude training and serving. Meta signed a multibillion-dollar multiyear deal in February 2026. OpenAI now takes TPU capacity, which is the most significant signal because OpenAI trains on Microsoft-procured Nvidia clusters. A confirmed OpenAI booking on Google silicon is the first visible crack in the assumption that Nvidia GPUs are the only serious substrate for frontier AI.</p><p>The TPU 8i has 384MB of SRAM, triple the amount in the prior Ironwood generation. It pairs with Google&#8217;s custom Arm-based Axiom CPU. The 8i is built for inference and AI agents. The 8t targets training workloads, with what Google calls a development-cycle compression &#8220;from months to weeks.&#8221; Citadel Securities built quantitative research software on TPUs. All 17 U.S. Energy Department national laboratories use AI co-scientist software built on the chips. 
Broadcom co-designed TPU 8t under the codename &#8220;Sunfish.&#8221; MediaTek handles TPU 8i under the codename &#8220;Zebrafish.&#8221; The dual-vendor co-design pattern matters because it gives Google supply chain options. It also gives both Broadcom and MediaTek concrete reference designs they can extend to other hyperscaler customers.</p><p>Anthropic also announced a SpaceX compute deal on May 8 alongside the Claude Code rate limit increase. The deal covers 220,000+ GPUs at the Colossus One data center. <a href="https://aitoolsrecap.com/Blog/ai-news-may-8-2026">The deal does not reduce Anthropic&#8217;s TPU commitment</a>, since both arrangements run in parallel. Compute supply is the bottleneck for every frontier lab in 2026, and Anthropic, OpenAI, and Meta are all running multi-vendor strategies across Nvidia, Google TPUs, AMD Instinct, and now SpaceX-operated capacity.</p><p>Anthropic&#8217;s revenue context puts the SpaceX deal in perspective. Dario Amodei <a href="https://www.roborhythms.com/anthropic-spacex-deal-doubled-claude-limits/">confirmed in early May</a> that Anthropic grew 80x in Q1 2026, blowing past the internal plan that called for 10x growth. The annualized revenue run rate crossed $30 billion, up from $9 billion at the end of 2025. The number of customers spending $1 million per year doubled from 500 to over 1,000 in two months. That kind of revenue growth would push any AI lab into emergency compute procurement mode. The SpaceX deal, the multi-gigawatt Amazon and Google partnerships, the $30 billion Azure capacity arrangement through Microsoft and Nvidia, and the <a href="https://www.anthropic.com/news/higher-limits-spacex">$50 billion Fluidstack US AI infrastructure investment</a> are the practical response.</p><p>The orbital compute angle in the SpaceX announcement deserves a footnote. Anthropic and SpaceX expressed shared interest in developing multi-gigawatt orbital data center capacity over the coming years. 
The engineering timeline for that is measured in years, not quarters. The signal is what matters. Frontier AI labs and the compute providers that serve them are now planning for a future where terrestrial power, land, and cooling cannot keep pace with model training demands. Whether orbital capacity becomes real or stays speculative, the fact that two serious companies put it in a press release tells you where the industry thinks the compute ceiling is heading.</p><p>The environmental angle around Colossus One also bears mention. The Memphis facility runs on gas turbines that were initially installed without Clean Air Act permits or pollution control devices. <a href="https://simonwillison.net/2026/May/7/xai-anthropic/">Memphis residents have raised concerns</a> about air quality and documented increases in hospital admissions tied to poor air quality near the site. Protests against the data center&#8217;s environmental footprint have continued through 2026. Anthropic committed to cover any consumer electricity price increases caused by its US data centers and is considering local investment in communities that host its facilities. Whether those commitments are sufficient remediation for the Memphis-specific concerns is an open question that the broader AI infrastructure conversation will continue to surface.</p><p>The Anthropic compute strategy is now multi-vendor, multi-region, and multi-substrate. Claude runs on AWS Trainium, Google TPUs, and Nvidia GPUs. The recent collaboration with Amazon includes inference capacity in Asia and Europe to satisfy data residency requirements for regulated industries. Location decisions focus on democratic countries with stable legal frameworks and secure supply chains. The single-vendor compute strategy that dominated 2023 and 2024 is gone. Portfolio diversification across the entire AI infrastructure stack has replaced it.</p><p>The wider trend is clear. Custom silicon is no longer a hyperscaler-only story. 
Tesla has joined the small group of companies that design AI chips from the ground up and manufacture at scale. Apple, Google, Amazon, and Meta are all building their own. Anthropic has multi-gigawatt commitments on both Nvidia and TPU substrates. The &#8220;single-vendor AI substrate&#8221; narrative that justified Nvidia&#8217;s valuation premium has its first real counter-example.</p><h2><strong>AI Standards &amp; Protocols: Anthropic Claude for Legal Ships 20+ MCP Connectors</strong></h2><p>Anthropic released Claude for Legal on May 12, 2026. The launch includes 12 practice-area plugins and more than 20 MCP connectors. The plugins include Commercial Legal, Corporate Legal (including M&amp;A diligence), Employment Legal, Privacy Legal, Product Legal, Regulatory Legal, AI Governance Legal, IP Legal, and Litigation Legal. Each plugin starts with a setup interview that learns the team&#8217;s playbooks, escalation chains, risk calibration, and house style.</p><p>The MCP connector list reads like the operational stack of a modern law firm. Anthropic connected Claude to DocuSign, Box, Thomson Reuters (CoCounsel Legal), Harvey, Relativity, Everlaw, and Microsoft 365. The Thomson Reuters integration is bidirectional. CoCounsel Legal is rebuilt on Anthropic&#8217;s technology, and Claude can now call CoCounsel as a tool. The foundation model is both the underlying layer and a caller of the application built on top of it. Anthropic also confirmed that <a href="https://aitoolsrecap.com/Blog/ai-news-may-12-2026">legal became the number one power-user job function in Claude Cowork</a>, with over 3x the usage of any other function.</p><p>The launch is a stress test for MCP as a production protocol. MCP shipped in November 2024 as Anthropic&#8217;s open standard for connecting AI models to tools and data. 
By February 2026, <a href="https://dev.to/pockit_tools/mcp-vs-a2a-the-complete-guide-to-ai-agent-protocols-in-2026-30li">MCP crossed 97 million monthly SDK downloads</a> across Python and TypeScript. The protocol is now under the Linux Foundation&#8217;s Agentic AI Foundation (AAIF), with Anthropic, OpenAI, and Block as co-founders. Microsoft embedded MCP into Windows 11 and Copilot. Google DeepMind confirmed support in Gemini. AWS, Cloudflare, and Bloomberg all sit on the AAIF.</p><p>The MCP 2026 roadmap published in March 2026 has four priority areas. First, transport evolution to make Streamable HTTP work statelessly at scale. Second, agent communication primitives, closing lifecycle gaps in the Tasks primitive. Third, governance maturation with a formal contributor ladder. Fourth, enterprise readiness with audit trails, SSO-integrated auth, and gateway patterns. <a href="https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/">The Tasks primitive shipped as experimental</a> and now needs production iteration. Retry semantics on transient failures and expiry policies for completed tasks are the two concrete gaps to close.</p><p>Google&#8217;s Agent-to-Agent (A2A) protocol sits alongside MCP, not in competition. MCP connects agents to tools and data. A2A connects agents to other agents. Microsoft, AWS, and Google all support both. The two-layer stack uses MCP for tool access and A2A for agent coordination. It has become the architectural default for enterprise multi-agent deployments. <a href="https://www.getmaxim.ai/articles/top-5-enterprise-mcp-gateway-solutions-in-2026/">Industry data shows enterprise MCP adoption crossed 78% in production AI teams</a>, and the public registry surpassed 9,400 servers. 
Enterprise MCP gateways from Kong, Docker, and others now centralize authentication, audit trails, and tool-level access control.</p><p>The Anthropic Claude for Legal launch is significant precisely because it deploys MCP at production scale for a regulated industry. Legal aid organizations and public defenders also get access through Claude for Nonprofits at discounted pricing. The launch rattled legal tech stocks. RELX, Thomson Reuters, and Wolters Kluwer shares fell on the February plugin announcement, and the May 12 release is considerably larger in scope.</p><p>Anthropic followed Claude for Legal with <a href="https://www.anthropic.com/news/claude-for-small-business">Claude for Small Business on May 13, 2026</a>. The bundle covers connectors for QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, and Microsoft 365. It includes 15 ready-to-run skills covering finance, operations, sales, marketing, HR, and customer service. Workflows handle payroll planning, month-end close, business performance monitoring, campaign management, invoice chasing, and cash-flow forecasting. The package runs on top of Claude Cowork, the multi-step automation platform Anthropic launched in January 2026, and ships as a toggle install in the desktop app.</p><p>The Small Business release is the fifth market-specific Claude product Anthropic has shipped since the start of 2025. The other four target life science researchers, schools, attorneys, and financial professionals. Each bundle uses the same MCP-based connector architecture and the same Claude Cowork automation layer. The pattern is clear. Anthropic uses MCP and Claude Cowork as a platform substrate, with vertical packaging on top. The vertical packaging is what closes deals. The substrate is what makes the verticals shippable in quick succession.</p><p>Claude for Small Business comes with a 10-city tour starting May 14 in Chicago. 
Tulsa, Dallas, Hamilton Township, Baton Rouge, Birmingham, Salt Lake City, Baltimore, San Jose, and Indianapolis follow. Each stop is a free half-day training workshop for 100 local small business leaders. Attendees get a one-month Claude Max subscription. Anthropic also partnered with Workday and the Local Initiatives Support Corporation on a Solopreneurship Accelerator Program funding 15 aspiring entrepreneurs in 2026 with seed capital, Claude credits, and an AI-first curriculum. A separate partnership with three Community Development Financial Institutions targets small business access to capital.</p><p>The competitive pressure on traditional SaaS vendors keeps building. Salesforce, ServiceNow, Intuit, DocuSign, and Box have all seen their stocks decline year to date and over the last 12 months as Anthropic&#8217;s offerings expand into territory those companies have historically owned. Dario Amodei warned at the Briefing: Financial Services event in early May that some SaaS vendors will go bankrupt if they cannot keep pace with the AI shift. That framing is harder to dismiss when Anthropic&#8217;s own annualized revenue run rate grew from $9 billion at the end of 2025 to more than $30 billion by early May. The market is repricing on the assumption that AI-native delivery will collapse a meaningful share of seat-based SaaS revenue.</p><p>MCP makes that competitive shift feasible at the protocol level. The same connector that lets Claude pull data from QuickBooks for a small business owner lets Claude pull data from Thomson Reuters for a corporate legal team. Anthropic ships one protocol stack and one automation layer, then packages it for a dozen verticals. The traditional SaaS business cannot easily respond because the value proposition for those tools was the workflow integration with the user. 
When a model can build the integration through MCP, the workflow lock-in weakens fast.</p><h2><strong>Also Worth Noting</strong></h2><p>Thinking Machines released TML-Interaction-Small on May 11, 2026, the first model from Mira Murati&#8217;s lab. The 276 billion-parameter mixture-of-experts system uses 12 billion active parameters. The architecture processes audio, video, and text in 200-millisecond micro-turns rather than waiting for users to finish speaking. <a href="https://techcrunch.com/2026/05/11/thinking-machines-wants-to-build-an-ai-that-actually-listens-while-it-talks/">The model achieves 0.40-second turn-taking latency</a>, roughly the speed of natural human conversation. Soumith Chintala, PyTorch co-creator, became CTO after co-founders Barret Zoph and Luke Metz left for OpenAI in January. The research preview is available to a limited group of researchers, with broader access planned for later in 2026. The model&#8217;s interaction-first design reflects a hypothesis the team has been making publicly for months. Real-time multimodal interaction is a different category of capability than the request-response pattern that dominates most current AI products.</p><p>Google streamed its Android Show on May 12, 2026, one week before Google I/O 2026. Two announcements anchored the event. Googlebooks are premium Gemini-first laptops from Acer, Asus, Dell, HP, and Lenovo, shipping this fall. Every Googlebook features a signature &#8220;Glowbar&#8221; light bar on the keyboard. Gemini Intelligence is the new agentic AI layer running underneath Android. It takes data from one app and completes multistep tasks across other apps without the user switching between them. Gemini Intelligence rolls out to the latest Samsung Galaxy and Google Pixel phones starting this summer, then to Wear OS, Android Auto, Android XR, and Googlebooks. The Glowbar is a physical signal that an AI agent is acting on the user&#8217;s behalf. 
That kind of hardware-level affordance for AI activity is the same direction Apple is exploring with iOS 26 visual indicators for active Siri agents.</p><p>OpenAI shipped three voice models on May 8. GPT-Realtime-2 brings GPT-5-class reasoning to real-time voice. GPT-Realtime-Whisper handles transcription workloads. GPT-Realtime-Translate handles speech-to-speech translation. The same day, <a href="https://aitoolsrecap.com/Blog/ai-news-may-8-2026">OpenAI released GPT-5.5-Cyber to vetted security teams in limited preview</a>. The release is a direct response to Anthropic&#8217;s Claude Mythos Preview, which has been used under Project Glasswing to identify zero-day vulnerabilities. ElevenLabs reported crossing $500 million in annual recurring revenue after a Series D round on the same day, with cuts to voice and agentic API pricing. The voice infrastructure category is one of the most competitive surfaces in AI right now. OpenAI, ElevenLabs, Cartesia, Deepgram, and Thinking Machines are all pushing on the same set of latency and quality benchmarks from different starting points.</p><p>White House National Economic Council Director Kevin Hassett confirmed on May 7 that the White House is drafting an executive order requiring AI models to be vetted before public release. The Commerce Department has already expanded its voluntary pre-release testing program to include Google, Microsoft, xAI, OpenAI, and Anthropic. Pennsylvania filed suit against Character.AI on May 8 over AI personas misrepresenting themselves as qualified medical professionals. Connecticut&#8217;s comprehensive AI bill and Iowa&#8217;s chatbot safety law both advanced in the same week. 
State-level regulatory activity is outpacing federal action, and patchwork compliance is becoming a real operational concern for AI companies serving US users.</p><p>Unsloth joined the PyTorch Ecosystem on May 11, bringing its open-source AI training and inference acceleration tools into the official ecosystem. The move expands PyTorch&#8217;s official tooling footprint and gives Unsloth&#8217;s acceleration layer more institutional weight. The Fivetran 2026 Agentic AI Readiness Index, released May 8, found that only 15% of organizations have a data foundation capable of safely running AI agents at production scale, even though nearly 60% have already invested millions in the technology. That gap between agent ambition and data foundation readiness is the practical reality every enterprise AI program is now confronting. The lakehouse stack work happening in parallel inside the Apache Iceberg, Polaris, Arrow, and Parquet communities is what closes that gap, and the timing of the Fivetran report against the AI infrastructure investment surge is not a coincidence.</p><h2><strong>What to Watch Next Week</strong></h2><p>Google I/O 2026 runs the week of May 19 and is the biggest single event on the calendar. A Gemini 3 production rollout, Android XR partner announcements, and additional TPU 8 customer disclosures are all on the likely agenda. Anthropic&#8217;s 10-city Small Business Tour kicks off May 14 in Chicago, and the early-week press coverage will set expectations for SMB AI adoption. State-level AI regulatory activity in Pennsylvania, Connecticut, and Iowa should keep moving, and the drafting of the federal executive order bears close watching. On the coding tools side, GitHub Copilot&#8217;s June 1 usage-based billing transition is the next major pricing event, and Cursor&#8217;s continued shipping cadence on the 3.x release line should bring more agent surface changes. 
The MCP and A2A protocol communities are running ongoing working group sessions on the Tasks primitive iteration and the Streamable HTTP stateless transport, with concrete proposals expected through the spring.</p><h2><strong>Resources to Go Further</strong></h2><p>The AI landscape changes fast. Here are tools and resources to help you keep pace.</p><p><strong>Try Dremio Free.</strong> Experience agentic analytics and an Apache Iceberg-powered lakehouse. <a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=05-13-2026&amp;utm_content=alexmerced">Start your free trial</a></p><p><strong>Learn Agentic AI with Data.</strong> Dremio&#8217;s agentic analytics features let your AI agents query and act on live data. <a href="https://www.dremio.com/use-cases/agentic-ai/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=05-13-2026&amp;utm_content=alexmerced">Explore Dremio Agentic AI</a></p><p><strong>Join the Community.</strong> Connect with data engineers and AI practitioners building on open standards. <a href="https://developer.dremio.com/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=05-13-2026&amp;utm_content=alexmerced">Join the Dremio Developer Community</a></p><p><strong>Book: The 2026 Guide to AI-Assisted Development.</strong> Covers prompt engineering, agent workflows, MCP, evaluation, security, and career paths. <a href="https://www.amazon.com/2026-Guide-AI-Assisted-Development-Engineering-ebook/dp/B0GQW7CTML/">Get it on Amazon</a></p><p><strong>Book: Using AI Agents for Data Engineering and Data Analysis.</strong> A practical guide to Claude Code, Google Antigravity, OpenAI Codex, and more. 
<a href="https://www.amazon.com/Using-Agents-Data-Engineering-Analysis-ebook/dp/B0GR6PYJT9/">Get it on Amazon</a></p>]]></content:encoded></item><item><title><![CDATA[Apache Data Lakehouse Weekly: May 7–13, 2026]]></title><description><![CDATA[The post-summit translation work that has dominated 2026 turned into shipped artifacts this week.]]></description><link>https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-may</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-may</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Thu, 14 May 2026 13:02:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mQ2I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mQ2I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mQ2I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!mQ2I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!mQ2I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!mQ2I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mQ2I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2165092,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/197619513?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mQ2I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 424w, 
https://substackcdn.com/image/fetch/$s_!mQ2I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!mQ2I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!mQ2I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c260db0-abb7-4320-882c-bff9373896e1_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The post-summit translation work that has dominated 2026 turned into shipped artifacts this week. Iceberg cut a 1.11.0 release candidate on the strength of weeks of design follow-ups. Polaris published a security-focused 1.4.1 patch release alongside four coordinated CVE disclosures and announced 1.5.0 planning for next week. Arrow&#8217;s Rust subproject opened three release votes in a single day. Parquet finally shipped Java 1.17.1 after a year between releases and turned its attention to the next wave of format-level proposals. The connective tissue across all four projects: production hardening at scale, AI-workload-driven format design, and the slow consolidation of governance frameworks around AI-assisted contribution.</p><h2><strong>Apache Iceberg</strong></h2><p>Iceberg&#8217;s biggest news this week is the 1.11.0 release candidate that Aihua Xu <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13483.html">opened for voting on May 9</a>. The RC1 vote drew rapid engagement from P&#233;ter V&#225;ry, Yuya Ebihara, Steven Wu, Kevin Liu, Steve Loughran, Russell Spitzer, Amogh Jahagirdar, Talat Uyarer, Manu Zhang, Ajay Yadav, and huaxin gao, with verification work splitting between binary checks, Snowflake build tests, and Trino downstream validation. The thread surfaced enough discussion that Aihua had to address questions about Spark integration coverage, version notes, and licensing audit follow-ups across multiple replies before the vote could close cleanly. 
This is the first 1.x release of 2026 carrying the full weight of V3 production maturity, and the depth of the verification work reflects how seriously contributors are treating it as a stability anchor while V4 design continues in parallel.</p><p>The release candidate landed against a backdrop of unblocking work that Ryan Blue <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13425.html">drove through the LICENSE updates thread</a>. The discussion threaded through commentary from Russell Spitzer, Steven Wu, Aihua Xu, Jean-Baptiste Onofr&#233;, Steve, roryqi, John Zhuge, P&#233;ter V&#225;ry, Fokko Driesprong, and Kevin Liu, with the conversation focused on ensuring that 1.11 would ship with a clean LICENSE/NOTICE chain that matched what contributors had actually merged. Apache release engineering depends on these audits being thorough, and the discussion shows the community treating LICENSE correctness as a release blocker rather than a checkbox.</p><p>Beyond the release work, the V4 design conversations continued advancing on multiple fronts. Ryan Blue opened a fresh <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13443.html">DISCUSS thread on partition tuples in V4</a> that quickly drew responses from Steven Wu, Anoop Johnson, Amogh Jahagirdar, Russell Spitzer, and Micah Kornfield. The thread is one of the more architecturally significant V4 conversations &#8212; partition tuples affect how metadata represents partition state when single-file commits replace manifest lists, and the decision shapes how column statistics, manifest delete vectors, and root manifests interact at scale. 
Amogh&#8217;s multiple replies on the thread reflect the same depth of analysis that has anchored his work on the broader one-file commits proposal with Russell.</p><p>Ryan also opened a <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13347.html">DISCUSS thread on a compact bitmap format</a> that drew engagement from Maximilian Michels, Andrei Tserakhau, Guy Khazma, Anoop Johnson, and Amogh Jahagirdar. The proposal targets one of the practical efficiency issues with the current Roaring bitmap encoding used for delete vectors &#8212; for sparse delete sets across very large data files, the metadata overhead matters at scale, and a more compact format that preserves the operational semantics could materially affect the storage footprint of MOR tables in production. Anoop Johnson separately opened a <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13376.html">V4 Aggregate Column Stats DISCUSS thread</a> that pushes on the same broader theme &#8212; making metadata cheaper to scan as table sizes grow.</p><p>The catalog-side design work also stayed active. EJ Wang&#8217;s <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13454.html">First-class Tag concept in Iceberg REST Catalog DISCUSS thread</a> drew responses from Yufei Gu and Andrei Tserakhau, building on the broader labels-and-metadata conversation that Andrei has been driving for months. Steven Wu opened a <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13520.html">VOTE on adding the CatalogObjectIdentifier schema</a> that drew binding +1 votes through the week from Yufei Gu, Russell Spitzer, huaxin gao, Christian Thiel, Alexandre Dutra, Jean-Baptiste Onofr&#233;, and Steve. Yuya Ebihara&#8217;s <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13464.html">DISCUSS thread on recursive namespace listing</a> drew responses from Ajantha Bhat and Yufei Gu &#8212; a quality-of-life REST API change that would matter most to catalogs federating across many tenants. 
Prashant Singh&#8217;s <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13452.html">DISCUSS thread on credential management for KMS/Vault and table-level encryption</a> pulled in feedback from Sreesh Maheshwar, Chris Lu, Gidon Gershinsky, and &#193;d&#225;m Szita. Alexandre Dutra opened a <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13373.html">DISCUSS thread on passing arbitrary information to request signers</a> &#8212; a thread that builds on the months of work he&#8217;s been leading on remote signing semantics.</p><p>Yuya Ebihara also opened a <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13385.html">DISCUSS thread on adding HashiCorp Vault KMS support</a> that drew engagement from Steve Loughran, Romain Manni-Bucau, and Jean-Baptiste Onofr&#233;. Vault is the de facto key management standard for self-hosted environments, and bringing first-class Vault support into the encryption layer closes one of the bigger gaps for teams running Iceberg outside the major cloud KMS providers.</p><p>The Rust subproject continued shipping. Shawn Chang opened the <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13360.html">Iceberg Rust 0.9.1 release candidate vote</a>, worked through two intermediate RCs before <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13404.html">RC3 drew binding +1s</a> from Renjie Liu, Kevin Liu, Matt Butrovich, Jean-Baptiste Onofr&#233;, Kurtis Wright, Maximilian Michels, Sung Yun, and Fokko Driesprong, and then <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13476.html">announced the 0.9.1 release</a>. This is the fifth Iceberg Rust release in seven months &#8212; a cadence the community would not have predicted at the start of 2025. The Rust implementation&#8217;s DataFusion integration makes it a serious alternative for teams that want Iceberg without a JVM dependency, and the cadence reflects how much of that downstream traffic is actually shipping. 
Kurtis Wright separately opened a <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13485.html">Curiosity thread on checksums in Iceberg libraries</a> that drew responses from Russell Spitzer, Steve Loughran, Daniel Weeks, and Andrei Tserakhau &#8212; the kind of cross-library integrity question that matters more as Iceberg deployments expand beyond Java.</p><p>The community also welcomed Andrei Tserakhau as a <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13390.html">new committer</a>, with congratulations rolling in from Matt Topol, Neelesh Salian, Eduard Tudenh&#246;fner, Amogh Jahagirdar, Micah Kornfield, Kevin Liu, huaxin gao, Steven Wu, Alex Stephen, Sung Yun, Fokko Driesprong, Renjie Liu, Maximilian Michels, Gang Wu, P&#233;ter V&#225;ry, Drew, Talat Uyarer, Russell Spitzer, Shawn Chang, and Kurtis Wright. Andrei has anchored the labels-in-LoadTableResponse work across the spring, and the committer recognition reflects how that proposal moved from idea to multi-implementation POC across Polaris, Unity Catalog, Lakekeeper, and PyIceberg.</p><p>Local meetups continue to anchor community activity. 
Endi Caushi confirmed the <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13384.html">Boston Iceberg meetup for May 6</a>, Lester Martin announced the <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13353.html">Atlanta meetup for May 13</a>, Viktor Kessler advertised the Iceberg Community Meetup Europe events in <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13343.html">Barcelona</a> and <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13336.html">Erlangen, Germany</a> for May plus the <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13407.html">June London</a> and <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13518.html">Amsterdam</a> meetups, and Danica Fine <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13477.html">announced the Seattle Iceberg Community Meetup for June 25</a>. Sung Yun shared the <a href="https://mail-archive.com/dev@iceberg.apache.org/msg13383.html">Iceberg Summit 2026 Selection Committee retrospective notes</a> &#8212; Kevin Liu&#8217;s reply suggests retrospectives like this are exactly the kind of community-building work that lets the summit grow without breaking the volunteer model that makes it possible.</p><h2><strong>Apache Polaris</strong></h2><p>The defining Polaris story of the week is the 1.4.1 patch release and the four coordinated CVE disclosures that accompanied it. Jean-Baptiste Onofr&#233; opened the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04570.html">1.4.1 RC0 release vote on May 5</a>, drew rapid +1s from Robert Stupp, Dmitri Bourlatchkov, Russell Spitzer, and Sung Yun, and <a href="https://mail-archive.com/dev@polaris.apache.org/msg04577.html">closed the vote successfully</a> before <a href="https://mail-archive.com/dev@polaris.apache.org/msg04582.html">announcing the 1.4.1 release</a>. 
The patch addresses the KMS upgrade bug, Helm packaging fixes, and the security disclosures that landed alongside it.</p><p>The CVE disclosures themselves are the more significant artifact. Jean-Baptiste posted four coordinated advisories: <a href="https://mail-archive.com/dev@polaris.apache.org/msg04578.html">CVE-2026-42809 covering staged table creation credential abuse</a>, <a href="https://mail-archive.com/dev@polaris.apache.org/msg04579.html">CVE-2026-42810 covering literal wildcard handling in IAM resource patterns</a>, <a href="https://mail-archive.com/dev@polaris.apache.org/msg04580.html">CVE-2026-42811 covering scope leakage in GCS credentials</a>, and <a href="https://mail-archive.com/dev@polaris.apache.org/msg04581.html">CVE-2026-42812 covering write.metadata.path protection</a>. Four CVEs in one release window is not a small event for a project that graduated to top-level status three months ago, but the visible coordinated disclosure &#8212; patches first, public advisories second &#8212; is exactly the security posture enterprise deployments need to see. The bugs concentrate around credential vending and resource pattern handling, which is where most cross-tenant exposure surfaces in a catalog that issues subscoped credentials on behalf of clients.</p><p>With 1.4.1 out the door, attention turned immediately to the 1.5.0 cycle. Jean-Baptiste opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04592.html">DISCUSS thread asking whether Polaris should target 1.5.0 next week</a>. The thread drew responses from Dmitri Bourlatchkov, Ajantha Bhat, Yufei Gu, Alexandre Dutra, and Robert Stupp, with Jean-Baptiste replying multiple times to refine the scope. 
Monthly release cadence has been the explicit commitment since the project graduated, and the 1.5.0 conversation reflects the community&#8217;s discipline about staying on that schedule even after a security-focused patch release week.</p><p>Anand Kumar Sankaran&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04590.html">Uptaking 1.4.1 and turning on table metrics persistence thread</a> gives a useful window into what production adopters are actually doing with the new release. Dmitri Bourlatchkov&#8217;s reply walks through the configuration mechanics &#8212; a small example of the community&#8217;s responsiveness to deployment-side questions that wouldn&#8217;t have had this kind of visible support a year ago when much of that traffic still went through bilateral vendor channels.</p><p>Jean-Baptiste also drafted the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04586.html">Polaris May 2026 board report</a>, which drew commentary from Francois Papon, Robert Stupp, Dmitri Bourlatchkov, Yufei Gu, and Adnan Hemani. The board report is the formal artifact the PMC submits to the ASF board, and the visible drafting on the dev list reinforces the project&#8217;s open-governance posture. The community also welcomed Sung Yun to the PMC, with Robert Stupp&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04560.html">announcement</a> drawing congratulations from Jean-Baptiste Onofr&#233;, Alexandre Dutra, Keith Chapman, James Rowland-Jones, Kevin Liu, Yufei Gu, Dmitri Bourlatchkov, and Michael Collado. Sung Yun has anchored the REST Catalog &#8220;Trusted Iceberg Client&#8221; terminology work on the Iceberg side and brings strong cross-project coordination credentials to the Polaris PMC.</p><p>The DISCUSS pipeline stayed dense. 
Bill Bejeck opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04630.html">Diagnostics shell prototype for Polaris thread</a> that drew engagement from Dmitri Bourlatchkov, Jean-Baptiste Onofr&#233;, and Yufei Gu. Jean-Baptiste opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04645.html">Polaris server custom assembly tool thread</a> that drew responses from Dmitri Bourlatchkov and Yufei Gu. Robert Stupp opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04652.html">DISCUSS thread on enabling advisory Copilot PR review for documentation and test omissions</a> that drew responses from Jean-Baptiste Onofr&#233;, Dmitri Bourlatchkov, Yong Zheng, and Yufei Gu &#8212; exactly the kind of AI-assistance governance conversation that mirrors the Iceberg and Parquet AI contribution policy work happening in parallel. Robert also opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04611.html">Guardrails for security-sensitive changes thread</a> that drew a response from Jean-Baptiste Onofr&#233; &#8212; a conversation that lands with extra weight in the same release window as the four CVE disclosures.</p><p>Tornike Gurgenidze opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04640.html">DISCUSS thread on storage credential-vending SPI changes</a>, pushing the SPI surface that lets vendors plug in alternative credential vending strategies. Srinivas Rishindra opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04647.html">DISCUSS thread on event persistence architecture and a global sanitization pipeline</a>. EJ Wang continued the AI-readability conversation with a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04622.html">DISCUSS thread on linking the AI-generated Code Wiki from project docs</a>. 
Dmitri Bourlatchkov opened a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04623.html">DISCUSS thread on adjusting the renameTable response code to 204</a> that drew responses from N&#225;ndor Koll&#225;r and Yufei Gu &#8212; a small REST API conformance question that matters for client interoperability. Anand Kumar Sankaran&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04617.html">feat: Configurable STS session names thread</a> drew a reply from Dmitri Bourlatchkov.</p><h2><strong>Apache Arrow</strong></h2><p>Arrow&#8217;s Rust subproject ran its tightest release week of the year. Andrew Lamb opened <a href="https://mail-archive.com/dev@arrow.apache.org/msg34667.html">Apache Arrow Rust 58.3.0 RC1</a>, <a href="https://mail-archive.com/dev@arrow.apache.org/msg34672.html">Apache Arrow Rust 57.3.1 RC1</a>, and <a href="https://mail-archive.com/dev@arrow.apache.org/msg34675.html">Apache Arrow Rust 56.2.1 RC1</a> within a single planned cluster of patch and minor releases. All three drew rapid verification from Ed Seidl, Bryce Mecum, Ra&#250;l Cumplido, and L. C. Hsieh, and Andrew posted <a href="https://mail-archive.com/dev@arrow.apache.org/msg34691.html">the three RESULT messages</a> confirming all three votes had passed. Running three concurrent release votes in a single week is a real engineering benchmark for arrow-rs. It reflects the maintenance load of supporting multiple active release lines (58.x as the current line, 57.x and 56.x as supported back-versions), and it shows that the verification community has reached the scale where this kind of parallel release cadence is sustainable. Andrew&#8217;s earlier heads-up that <a href="https://mail-archive.com/dev@arrow.apache.org/msg34650.html">planned patch releases were coming this week</a> set the cadence expectations clearly.</p><p>Beyond Rust releases, the design surface stayed lively. 
Antoine Pitrou opened a <a href="https://mail-archive.com/dev@arrow.apache.org/msg34668.html">DISCUSS thread on field/schema/custom metadata restriction to UTF8</a> that drew engagement from Rusty Conover, Raphael Taylor-Davies, and Dewey Dunnington. The thread sits at the intersection of cross-language compatibility and forward extensibility. Arrow metadata that is strictly UTF8 is easier to exchange across languages, but the constraint also limits what extension types can carry in that metadata. Richie Black opened a <a href="https://mail-archive.com/dev@arrow.apache.org/msg34687.html">DISCUSS thread on column default value metadata changes to FlightSql.proto</a> &#8212; a JDBC interoperability question that matters for cross-system data engineering as Flight SQL adoption grows.</p><p>The pyarrow-stubs donation vote that has been building since Rok Mihevc opened it in April <a href="https://mail-archive.com/dev@arrow.apache.org/msg34630.html">drew further engagement</a> as Rok closed out the vote and confirmed the donation could move toward formal acceptance. The donation brings type stubs for pyarrow into the official Arrow project rather than leaving them in a community-maintained external repository. It is a small but meaningful signal that the Python community is investing in pyarrow&#8217;s type-checking surface as much as its runtime behavior.</p><p>The Erlang language binding moved forward. Benjamin Philip continued advancing the <a href="https://mail-archive.com/dev@arrow.apache.org/msg34628.html">Arrow Erlang grant documents thread</a> with Sutou Kouhei, working through the IP grant paperwork that the ASF requires for code donations. Erlang users are a niche audience for Arrow, but the grant work is the same procedural foundation that every language expansion follows, and the visibility of the process is what makes Arrow&#8217;s multi-language footprint sustainable. 
Antoine Pitrou&#8217;s <a href="https://mail-archive.com/dev@arrow.apache.org/msg34655.html">announcement of the Apache Arrow / Parquet meetup in Paris</a> drew enthusiasm from Sutou Kouhei and Marc Deveaux &#8212; the kind of cross-project meetup that reflects how interleaved the Arrow and Parquet communities have become.</p><p>Mandukhai Alimaa&#8217;s <a href="https://mail-archive.com/dev@arrow.apache.org/msg34604.html">DISCUSS thread on a canonical BigDecimal extension type</a> and Andrew Lamb&#8217;s <a href="https://mail-archive.com/dev@arrow.apache.org/msg34610.html">arrow-rs security policy discussion</a> continued threading toward production hardening. The security policy work in particular reflects a project that is being deployed in commercial-grade scenarios where formal vulnerability disclosure paths matter as much as the underlying code quality. The Nishant Avasthi-led <a href="https://mail-archive.com/dev@arrow.apache.org/msg34642.html">DISCUSS on adding Apache Arrow support for IBM Db2 via ADBC</a> drew a response from Ian Cook &#8212; another quietly significant adoption signal, since Db2 brings a category of enterprise mainframe and traditional database workloads into Arrow&#8217;s data interchange footprint.</p><h2><strong>Apache Parquet</strong></h2><p>Parquet finally shipped its first Java release of the year. Gang Wu opened the <a href="https://mail-archive.com/dev@parquet.apache.org/msg27296.html">parquet-java 1.17.1 RC0 vote</a>, drew +1s from Steve Loughran, Fokko Driesprong, Russell Spitzer, Daniel Weeks, and Xinli shang, and <a href="https://mail-archive.com/dev@parquet.apache.org/msg27314.html">announced the 1.17.1 release</a>. The release closes the gap that opened after parquet-java 1.17.0 shipped in January 2026, and resolves the long-running <a href="https://mail-archive.com/dev@parquet.apache.org/msg27212.html">DISCUSS thread on a new parquet-java release</a> that Manu Zhang opened in late March. 
The release cadence question has been one of the more honest community conversations of the year &#8212; parquet-java has historically shipped less frequently than its sister projects, and Manu&#8217;s thread surfaced the real costs (slower bug-fix delivery, encoding feature lag) that justified the patch release effort.</p><p>Russell Spitzer&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27277.html">DISCUSS thread on GH-3547 automated release for Parquet</a> drew engagement from Arnav Balyan, Gang Wu, and Fokko Driesprong on the broader question of how to bring parquet-java&#8217;s release infrastructure closer to the cadence of the format and Rust projects. Automated release tooling is what eventually makes monthly-class release cadences sustainable, and the Polaris release engineering work has been one of the visible reference implementations the Parquet community can point at when designing its own approach.</p><p>The community welcomed Ed Seidl as a <a href="https://mail-archive.com/dev@parquet.apache.org/msg27305.html">new committer</a>, with Micah Kornfield announcing the recognition and congratulations following from Andrew Lamb, Gang Wu, and Ra&#250;l Cumplido. Ed has driven multiple format-level threads through 2026, including the path_in_schema optionality proposal and engaged commentary on the FlatBuffer footer redesign work. The committer recognition reflects sustained spec-design work over the year.</p><p>The format-level conversation continued at high intensity. Will Edwards opened a <a href="https://mail-archive.com/dev@parquet.apache.org/msg27322.html">DISCUSS thread on how readers handle Parquet files with future extensions</a> &#8212; a forward-compatibility question that becomes more urgent as the format adds Variant, Geospatial, and now the proposed File logical type. 
Daniel Weeks opened a <a href="https://mail-archive.com/dev@parquet.apache.org/msg27258.html">DISCUSS thread on supporting non-contiguous pages</a> that drew engagement from Andrew Bell, Adrian Garcia Badaracco, Micah Kornfield, Will Edwards, and Andrew Lamb. Non-contiguous pages would let writers split a column&#8217;s pages across the file rather than requiring them adjacent &#8212; a non-trivial format change with implications for how readers do projection pushdown and how object storage range reads can be coalesced. Andrew Bell&#8217;s separate <a href="https://mail-archive.com/dev@parquet.apache.org/msg27279.html">Wide Schemas thread</a> drew responses from Andrew Lamb, Adrian Garcia Badaracco, and Steve Loughran &#8212; wide-table workloads are precisely the AI/ML use case that has been pulling format design forward all year.</p><p>Micah Kornfield&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27271.html">DISCUSS thread on remaining open spec-level questions for ALP</a> drew engagement from Andrew Lamb on closing out the ALP encoding work that passed an earlier vote. Arnav Balyan opened a <a href="https://mail-archive.com/dev@parquet.apache.org/msg27282.html">DISCUSS thread on adding AGENTS.md to parquet-java</a> that drew responses from Aaron Niskode-Dossett, Andrew Lamb, and Micah Kornfield, and a separate <a href="https://mail-archive.com/dev@parquet.apache.org/msg27299.html">DISCUSS thread on an AI tooling policy for Parquet</a> that drew a response from Fokko Driesprong. 
The AGENTS.md and AI tooling policy threads are the Parquet community&#8217;s version of the same conversation Iceberg and Polaris have been running in parallel &#8212; how AI-assisted contribution fits into Apache governance and what guardrails the project wants to set.</p><p>Martin Prammer&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27295.html">Datasets Project &#8212; Raincloud thread</a> drew responses from Arnav Balyan on the broader question of how the Parquet community thinks about reference datasets for testing and benchmarking. Dewey Dunnington&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27319.html">Geography test files with statistics thread</a> continued the geospatial spec stabilization work. Julien Le Dem ran the regular Parquet sync on May 6, with the <a href="https://mail-archive.com/dev@parquet.apache.org/msg27270.html">meeting notes thread</a> setting the agenda for the format-level work that played out across the rest of the week.</p><h2><strong>Cross-Project Themes</strong></h2><p>The clearest pattern this week is the maturation of release engineering across all four projects. Iceberg cut a major 1.11.0 RC. Polaris shipped a patch release coordinated with four CVE disclosures and immediately planned the next monthly minor. Arrow ran three Rust release votes in parallel. Parquet shipped its first patch release of the year and started seriously planning automated release tooling. Each project is operating at a release cadence that would have been hard to sustain a year ago, and the cumulative effect is a lakehouse stack where every component is shipping at predictable, professional intervals. That is the difference between a research stack and infrastructure, and the lakehouse stack has clearly crossed into the latter.</p><p>The second pattern is the consistent attention to AI-assisted contribution governance. 
Iceberg has the published AI contribution policy work that Holden Karau, Kevin Liu, Steve Loughran, and Sung Yun pushed through March. Polaris ran the AI-generated Code Wiki linking thread and the advisory Copilot PR review thread this week. Parquet ran the AGENTS.md and AI tooling policy threads in parallel. These conversations are not happening in isolation &#8212; they reflect a coordinated community position that AI tools are welcome in Apache contribution flows but require explicit governance, disclosure, and review patterns. The pattern is consistent enough across projects that it looks like an emerging Apache-wide norm rather than four parallel one-offs.</p><p>The third pattern is the continued translation of AI workload pressure into format-level proposals. Iceberg&#8217;s compact bitmap format, V4 aggregate column stats, and partition tuples work all push on metadata efficiency at scale. Parquet&#8217;s wide schemas thread, the File logical type proposal that&#8217;s still resolving, and the non-contiguous pages discussion all target the AI workloads where data shapes don&#8217;t match the assumptions the format was originally designed around. Arrow&#8217;s BigDecimal canonical extension type and the metadata UTF8 thread both push on cross-language interop for AI-pipeline data. These are not coincidences &#8212; they&#8217;re four projects responding to the same pressure from the same workloads, on the same timeline.</p><p>The fourth pattern is the visible security maturation across the stack. Polaris&#8217;s four CVE disclosures landed with formal coordinated advisories, separate patches, and clear remediation guidance. Iceberg&#8217;s KMS/Vault credential management work, Arrow&#8217;s security policy discussion, and Parquet&#8217;s careful handling of forward-compatibility questions all reflect a shared posture that the lakehouse stack is being deployed in environments where formal security disclosure paths are required, not optional. 
The CVEs being public is itself a healthy signal &#8212; projects with bad security posture don&#8217;t disclose, they hide.</p><h2><strong>Looking Ahead</strong></h2><p>Watch for the Iceberg 1.11.0 vote to close and the release to ship in the coming days. The Polaris 1.5.0 planning email is likely the next major release-side artifact, with the 1.4.1 security work as the floor and 1.5.0 feature scoping as the ceiling. The Iceberg V4 design work &#8212; partition tuples, compact bitmap format, aggregate column stats &#8212; is converging toward formal proposals that should land in the coming weeks alongside the long-anticipated formal single-file commits write-up. The Parquet release engineering automation thread should converge into a concrete proposal, and the AGENTS.md and AI tooling policy discussions should harden into something the community can adopt.</p><p>On the Arrow side, the field/schema metadata UTF8 thread and the FlightSql column default value proposal both look ready to mature into more formal proposals, and the pyarrow-stubs donation should land formally. The Iceberg Community Meetup Europe series across Barcelona, Erlangen, London, Amsterdam, and Basel &#8212; plus the Atlanta and Seattle North American meetups &#8212; will continue translating the dev-list conversations into in-person community building. 
Iceberg Summit 2026 session recordings will continue rolling out on YouTube, and the next round of Apache board reports across all four projects will set the formal narrative for what shipped in May.</p><div><hr></div><h2><strong>Resources &amp; Further Learning</strong></h2><p><strong>Get Started with Dremio</strong></p><ul><li><p><a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-05-13&amp;utm_content=alexmerced">Try Dremio Free</a> &#8212; Build your lakehouse on Iceberg with a free trial</p></li><li><p><a href="https://www.dremio.com/use-cases/lake-to-iceberg-lakehouse/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-05-13&amp;utm_content=alexmerced">Build a Lakehouse with Iceberg, Parquet, Polaris &amp; Arrow</a> &#8212; Learn how Dremio brings the open lakehouse stack together</p></li></ul><p><strong>Free Downloads</strong></p><ul><li><p><a href="https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html">Apache Iceberg: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li><li><p><a href="https://hello.dremio.com/wp-apache-polaris-guide-reg.html">Apache Polaris: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li></ul><p><strong>Books by Alex Merced</strong></p><ul><li><p><a href="https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/">Architecting an Apache Iceberg Lakehouse</a></p></li><li><p><a href="https://www.amazon.com/Enabling-Agentic-Analytics-Apache-Iceberg-ebook/dp/B0GQXT6W3N/">Enabling Agentic Analytics with Apache Iceberg and Dremio</a></p></li><li><p><a href="https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands/dp/B0GQNY21TD/">The 2026 Guide to Lakehouses, Apache Iceberg and Agentic AI</a></p></li><li><p><a 
href="https://www.amazon.com/Book-Using-Apache-Iceberg-Python/dp/B0GNZ454FF/">The Book on Using Apache Iceberg with Python</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[The Metadata Structure of Modern Table Formats]]></title><description><![CDATA[This is Part 2 of a 15-part Apache Iceberg Masterclass. Part 1 covered why table formats exist.]]></description><link>https://amdatalakehouse.substack.com/p/the-metadata-structure-of-modern</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/the-metadata-structure-of-modern</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Wed, 13 May 2026 13:02:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8aOU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8aOU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8aOU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!8aOU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 848w, 
https://substackcdn.com/image/fetch/$s_!8aOU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!8aOU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8aOU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1619880,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196012762?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8aOU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 424w, 
https://substackcdn.com/image/fetch/$s_!8aOU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!8aOU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!8aOU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf28830d-5b14-4d7a-bb61-101681dab5e9_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>This is Part 2 of a 15-part <a href="https://iceberglakehouse.com/posts/">Apache Iceberg Masterclass</a>. <a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-01/">Part 1</a> covered why table formats exist. This article breaks down exactly how each format organizes its metadata.</p><p>The metadata structure of a table format determines everything: how fast queries start planning, how efficiently concurrent writes are handled, how schema changes propagate, and how much overhead accumulates over time. Two formats can both claim &#8220;ACID support&#8221; and &#8220;time travel&#8221; while having fundamentally different mechanisms under the hood.</p><h2><strong>Table of Contents</strong></h2><ol><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-01/">What Are Table Formats and Why Were They Needed?</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-02/">The Metadata Structure of Current Table Formats</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-03/">Performance and Apache Iceberg&#8217;s Metadata</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-04/">Technical Deep Dive on Partition Evolution</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-05/">Technical Deep Dive on Hidden Partitioning</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-06/">Writing to an Apache Iceberg Table</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-07/">What Are Lakehouse Catalogs?</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-08/">Embedded Catalogs: S3 Tables and MinIO AI Stor</a></p></li><li><p><a 
href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-09/">How Iceberg Table Storage Degrades Over Time</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-10/">Maintaining Apache Iceberg Tables</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-11/">Apache Iceberg Metadata Tables</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-12/">Using Iceberg with Python and MPP Engines</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/">Streaming Data into Apache Iceberg Tables</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-14/">Hands-On with Iceberg Using Dremio Cloud</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/">Migrating to Apache Iceberg</a></p></li></ol><h2><strong>Apache Iceberg: The Metadata Tree</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!opVx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!opVx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!opVx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 848w, 
https://substackcdn.com/image/fetch/$s_!opVx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!opVx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!opVx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Iceberg's three-layer metadata architecture from catalog to metadata.json to manifest list to manifest files to data files&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Iceberg's three-layer metadata architecture from catalog to metadata.json to manifest list to manifest files to data files" title="Iceberg's three-layer metadata architecture from catalog to metadata.json to manifest list to manifest files to data files" srcset="https://substackcdn.com/image/fetch/$s_!opVx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 424w, 
https://substackcdn.com/image/fetch/$s_!opVx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!opVx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!opVx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37189408-fc81-4e01-a366-3ea425a8d832_800x800.webp 1456w" sizes="100vw"></picture></div></a></figure></div><p>Iceberg organizes metadata into a tree with four levels. Each level adds specificity and enables pruning at query planning time.</p><p><strong>Level 1: Catalog pointer.</strong> The catalog (a REST catalog, <a href="https://www.dremio.com/platform/open-catalog/">Dremio Open Catalog</a>, AWS Glue, or Hive Metastore) stores a pointer to the current <code>metadata.json</code> file. This pointer is the single source of truth for the table&#8217;s current state.</p><p><strong>Level 2: Metadata file (</strong><code>metadata.json</code><strong>).</strong> A JSON file containing the table&#8217;s schema (with column IDs), partition spec, sort order, table properties, and a list of snapshots. Each snapshot represents a complete, immutable version of the table. When the table is updated, a new <code>metadata.json</code> is created with the new snapshot appended to the list.</p><p><strong>Level 3: Manifest list (Avro).</strong> Each snapshot points to exactly one manifest list. The manifest list is a table of contents: it lists all the manifest files that make up this snapshot and includes partition-level summary statistics for each manifest. These summaries let the query planner skip entire manifests that cannot contain data matching the query filter.</p><p><strong>Level 4: Manifest files (Avro).</strong> Each manifest file tracks a set of data files and delete files. For each file, the manifest stores the file path, file size, row count, partition values, and column-level statistics (lower bound, upper bound, value count, null count, NaN count). These per-file statistics enable file-level pruning during query planning.</p><p>The key insight is that each level progressively narrows the search space. 
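</p><p>The narrowing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Iceberg&#8217;s reader: <code>Manifest</code>, <code>DataFile</code>, and <code>plan_scan</code> are hypothetical names, the real structures are Avro records, and the sketch assumes a single integer column that serves as both the partition value and the filter column.</p>

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    lower: int  # lower bound of the filter column in this file
    upper: int  # upper bound of the filter column in this file


@dataclass
class Manifest:
    path: str
    part_lower: int  # partition summary as kept in the manifest list
    part_upper: int
    files: list = field(default_factory=list)


def overlaps(lower: int, upper: int, value: int) -> bool:
    """True when value could fall inside the closed range [lower, upper]."""
    return value >= lower and upper >= value


def plan_scan(manifest_list: list, value: int) -> list:
    """Return paths of data files that may contain rows equal to value."""
    selected = []
    for manifest in manifest_list:
        # Level 3: skip the whole manifest using its partition summary.
        if not overlaps(manifest.part_lower, manifest.part_upper, value):
            continue
        # Level 4: only now inspect the per-file column bounds.
        for data_file in manifest.files:
            if overlaps(data_file.lower, data_file.upper, value):
                selected.append(data_file.path)
    return selected


# Hypothetical table: two manifests covering disjoint value ranges.
manifests = [
    Manifest("m1.avro", 1, 10,
             [DataFile("a.parquet", 1, 5), DataFile("b.parquet", 6, 10)]),
    Manifest("m2.avro", 11, 20, [DataFile("c.parquet", 11, 20)]),
]
print(plan_scan(manifests, 7))
```

<p>For the filter value 7, the second manifest is skipped without its file entries being examined, which is the saving the manifest list is meant to provide.</p><p>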
A query engine using <a href="https://www.dremio.com/blog/apache-iceberg-metadata-for-performance/">Dremio</a> or Spark reads the catalog pointer (1 request), loads the metadata file (1 read), checks the manifest list to skip irrelevant manifests (1 read, many skips), then reads only the relevant manifests to find the actual data files to scan. For a petabyte table, this can reduce planning from minutes of directory listing to milliseconds of metadata traversal.</p><h2><strong>Delta Lake: The Sequential Transaction Log</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K4_0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K4_0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!K4_0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!K4_0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!K4_0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!K4_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Delta Lake's transaction log structure with JSON commits, Parquet checkpoints, and the reader process&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Delta Lake's transaction log structure with JSON commits, Parquet checkpoints, and the reader process" title="Delta Lake's transaction log structure with JSON commits, Parquet checkpoints, and the reader process" srcset="https://substackcdn.com/image/fetch/$s_!K4_0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!K4_0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!K4_0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!K4_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bfeb96-6a17-4bf0-9001-98a63013b9c2_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Delta Lake uses a simpler, linear structure. All metadata lives in the <code>_delta_log/</code> directory alongside the data.</p><p><strong>JSON commit files</strong> (<code>00000000000000000000.json</code>, <code>00000000000000000001.json</code>, ...) 
record each transaction as a set of actions: <code>add</code> (a new data file), <code>remove</code> (a file marked for deletion), <code>metaData</code> (schema or property change), and <code>protocol</code> (version requirements). Each commit file is sequentially numbered.</p><p><strong>Parquet checkpoint files</strong> are created every 10 commits (by default). A checkpoint is a Parquet file that summarizes the cumulative state of the table at that version, essentially a snapshot of all currently-active <code>add</code> actions. This prevents readers from having to replay hundreds of small JSON files.</p><p><code>_last_checkpoint</code> is a small file pointing to the most recent checkpoint. The read process is: find the latest checkpoint, load it, then replay any JSON commits after it.</p><p>The tradeoff: Delta&#8217;s log is simple and easy to reason about, but it does not have the multi-level pruning that Iceberg&#8217;s manifest tree provides. File-level statistics exist in the add actions but are not organized hierarchically. For very large tables (millions of files), the planning phase can be slower because there is no intermediate pruning layer equivalent to Iceberg&#8217;s manifest list.</p><h2><strong>Apache Hudi: The Timeline</strong></h2><p>Hudi stores metadata in the <code>.hoodie/</code> directory as a sequence of &#8220;instants&#8221; on a timeline. Each instant represents an operation (commit, compaction, rollback, clean) and transitions through three states: <code>REQUESTED</code>, <code>INFLIGHT</code>, and <code>COMPLETED</code>.</p><p>The timeline is split into two parts:</p><p><strong>Active timeline</strong> contains recent instants that are needed for current read and write operations. The file naming pattern is <code>&lt;timestamp&gt;.&lt;action_type&gt;.&lt;state&gt;</code>. 
For example, <code>20250429010500.commit.completed</code> indicates a completed write operation.</p><p><strong>Archived timeline</strong> contains older instants that have been moved to <code>.hoodie/archived/</code> to keep the active timeline lean. Hudi 1.0 introduced an LSM-based timeline that compacts archived instants into Parquet files for efficient long-term storage.</p><p>Hudi&#8217;s timeline tracks more granular operation types than other formats: <code>commit</code> (COW write), <code>delta_commit</code> (MOR write), <code>compaction</code>, <code>clean</code> (garbage collection), <code>rollback</code>, <code>savepoint</code>, and <code>replace</code> (clustering). This granularity reflects Hudi&#8217;s focus on complex write patterns like CDC pipelines.</p><h2><strong>Apache Paimon: Snapshots and LSM Trees</strong></h2><p>Paimon&#8217;s metadata is organized around snapshots and buckets. Each partition is divided into a fixed number of buckets, and each bucket contains an independent LSM (Log-Structured Merge) tree.</p><p>The snapshot metadata tracks which data files and changelog files belong to each bucket at each point in time. Inside each bucket, the LSM tree structure contains multiple &#8220;sorted runs&#8221; (levels) of Parquet files. When data is written, it lands in level 0 as a small sorted file. Background compaction merges small files into larger ones at higher levels.</p><p>This is fundamentally different from the other formats because Paimon&#8217;s metadata structure is designed for continuous streaming writes rather than batch commits. The LSM tree handles high-frequency inserts and updates efficiently by buffering writes in memory and flushing them as sorted runs.</p><h2><strong>DuckLake: SQL Database as Metadata</strong></h2><p>DuckLake takes the most radical departure. 
Instead of storing metadata as files in object storage, all metadata lives in a traditional SQL database (PostgreSQL, MySQL, SQLite, or DuckDB itself).</p><p>The metadata database contains tables for: schemas, snapshots, data files, column statistics, and table properties. When a query engine needs to plan a query, it issues a single SQL query against the metadata database instead of reading multiple metadata files from object storage.</p><p>The tradeoff is a dependency on a running database process for metadata management. The benefit is dramatically simpler metadata access patterns and the ability to use SQL for metadata operations like listing snapshots, finding files, and checking statistics.</p><h2><strong>Side-by-Side Comparison</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OwKq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OwKq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!OwKq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!OwKq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!OwKq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OwKq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52c84316-c5c4-4350-95bd-170525710576_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Five approaches to table metadata from file-based to database-backed&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Five approaches to table metadata from file-based to database-backed" title="Five approaches to table metadata from file-based to database-backed" srcset="https://substackcdn.com/image/fetch/$s_!OwKq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!OwKq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 848w, 
https://substackcdn.com/image/fetch/$s_!OwKq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!OwKq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c84316-c5c4-4350-95bd-170525710576_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!0P7P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0P7P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 424w, https://substackcdn.com/image/fetch/$s_!0P7P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 848w, https://substackcdn.com/image/fetch/$s_!0P7P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 1272w, https://substackcdn.com/image/fetch/$s_!0P7P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0P7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png" width="1456" height="1408" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1408,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:407064,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196012762?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0P7P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 424w, https://substackcdn.com/image/fetch/$s_!0P7P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 848w, https://substackcdn.com/image/fetch/$s_!0P7P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 1272w, https://substackcdn.com/image/fetch/$s_!0P7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c40e3d3-b896-486e-b1e0-a26683634758_1514x1464.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For teams building on multiple engines, Iceberg&#8217;s metadata structure provides the best combination of planning efficiency and engine independence. 
<a href="https://www.dremio.com/blog/apache-iceberg-delta-lake-apache-hudi-a-comparison/">Dremio</a> uses Iceberg&#8217;s metadata tree to achieve fast query planning even on tables with millions of files, and its <a href="https://www.dremio.com/platform/reflections/">Columnar Cloud Cache</a> caches frequently-accessed metadata locally to further reduce planning latency.</p><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-03/">Part 3</a> covers how query engines use Iceberg&#8217;s metadata specifically for performance optimization.</p><h3><strong>Books to Go Deeper</strong></h3><ul><li><p><a href="https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/">Architecting the Apache Iceberg Lakehouse</a> by Alex Merced (Manning)</p></li><li><p><a href="https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands-ebook/dp/B0GQL4QNRT/">Lakehouses with Apache Iceberg: Agentic Hands-on</a> by Alex Merced</p></li><li><p><a href="https://www.amazon.com/Constructing-Context-Semantics-Agents-Embeddings/dp/B0GSHRZNZ5/">Constructing Context: Semantics, Agents, and Embeddings</a> by Alex Merced</p></li><li><p><a href="https://www.amazon.com/Apache-Iceberg-Agentic-Connecting-Structured/dp/B0GW2WF4PX/">Apache Iceberg &amp; Agentic AI: Connecting Structured Data</a> by Alex Merced</p></li><li><p><a href="https://www.amazon.com/Open-Source-Lakehouse-Architecting-Analytical/dp/B0GW595MVL/">Open Source Lakehouse: Architecting Analytical Systems</a> by Alex Merced</p></li></ul><h3><strong>Free Resources</strong></h3><ul><li><p><a href="https://drmevn.fyi/linkpageiceberg">FREE - Apache Iceberg: The Definitive Guide</a></p></li><li><p><a href="https://drmevn.fyi/linkpagepolaris">FREE - Apache Polaris: The Definitive Guide</a></p></li><li><p><a 
href="https://hello.dremio.com/wp-resources-agentic-ai-for-dummies-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Agentic AI for Dummies</a></p></li><li><p><a href="https://hello.dremio.com/wp-resources-agentic-analytics-guide-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Leverage Federation, The Semantic Layer and the Lakehouse for Agentic AI</a></p></li><li><p><a href="https://forms.gle/xdsun6JiRvFY9rB36">FREE with Survey - Understanding and Getting Hands-on with Apache Iceberg in 100 Pages</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[How to Build a Semantic Layer: A Step-by-Step Guide]]></title><description><![CDATA[Most teams start building a semantic layer the wrong way: they open their BI tool, create a few calculated fields, and call it done.]]></description><link>https://amdatalakehouse.substack.com/p/how-to-build-a-semantic-layer-a-step</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/how-to-build-a-semantic-layer-a-step</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Mon, 11 May 2026 13:01:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nbyj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nbyj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!nbyj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!nbyj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!nbyj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!nbyj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nbyj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/792df065-3947-4267-b9e4-3c440ece8033_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1097319,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/189277245?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nbyj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!nbyj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!nbyj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!nbyj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F792df065-3947-4267-b9e4-3c440ece8033_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EvmH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EvmH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!EvmH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!EvmH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!EvmH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EvmH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp" width="640" height="640" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Building a semantic layer &#8212; Bronze, Silver, and Gold tiers&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Building a semantic layer &#8212; Bronze, Silver, and Gold tiers" title="Building a semantic layer &#8212; Bronze, Silver, and Gold tiers" srcset="https://substackcdn.com/image/fetch/$s_!EvmH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!EvmH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!EvmH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!EvmH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59da2dcc-22e9-4393-83ac-9c202523a6a9_640x640.webp 1456w" sizes="100vw"></picture></div></a></figure></div><p>Most teams start building a semantic layer the wrong way: they open their BI tool, create a few calculated fields, and call it done. Six months later, three dashboards define &#8220;churn&#8221; differently, nobody trusts the numbers, and the data team is debugging metric discrepancies instead of building new features.</p><p>A well-built semantic layer prevents all of that. 
Here&#8217;s how to do it right.</p><h2><strong>Start With Metrics, Not Data Models</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!45n2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!45n2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!45n2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!45n2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!45n2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!45n2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp" width="640" height="640" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Stakeholders aligning on unified metric definitions&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Stakeholders aligning on unified metric definitions" title="Stakeholders aligning on unified metric definitions" srcset="https://substackcdn.com/image/fetch/$s_!45n2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!45n2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!45n2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!45n2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d989f4e-8b99-48e3-84fa-48430b658141_640x640.webp 1456w" sizes="100vw"></picture></div></a></figure></div><p>Before writing a single line of SQL, sit down with stakeholders from Sales, Finance, Marketing, and Product. Agree on the top 5-10 business metrics your organization uses to make decisions.</p><p>For each metric, document:</p><ul><li><p><strong>The calculation</strong>: Revenue = SUM(order_total) WHERE status = &#8216;completed&#8217; AND refunded = FALSE</p></li><li><p><strong>The owner</strong>: Who is accountable for this definition?</p></li><li><p><strong>The grain</strong>: Daily? Monthly? Per customer?</p></li><li><p><strong>The refresh cadence</strong>: Real-time? Daily batch? Weekly?</p></li></ul><p>This exercise is harder than it sounds. You will discover that &#8220;Monthly Active Users&#8221; has three competing definitions. That&#8217;s the point. 
The semantic layer can&#8217;t resolve disagreements that haven&#8217;t been surfaced yet.</p><p><strong>Output</strong>: A metric glossary. This becomes the source document for everything you build next.</p><h2><strong>Map Your Data Sources</strong></h2><p>Inventory every system that feeds into your analytics:</p><table><thead><tr><th>Source Type</th><th>Examples</th><th>Access Pattern</th></tr></thead><tbody><tr><td>Transactional databases</td><td>PostgreSQL, MySQL, SQL Server</td><td>Federated query (read-only)</td></tr><tr><td>Cloud data lakes</td><td>S3 (Parquet/Iceberg), Azure Data Lake</td><td>Direct scan or catalog</td></tr><tr><td>SaaS platforms</td><td>Salesforce, HubSpot, Stripe</td><td>API extraction or replication</td></tr><tr><td>Spreadsheets</td><td>Google Sheets, Excel</td><td>One-time import or scheduled sync</td></tr></tbody></table><p>Not all sources need to be replicated into a central store. Federation lets you query data where it lives without the cost and complexity of ETL pipelines. Platforms like <a href="https://www.dremio.com/get-started?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Dremio</a> connect to dozens of sources and present them in a single namespace, so your semantic layer can span everything without data movement.</p><h2><strong>Design the Three-Layer View Structure</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5N_z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5N_z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 424w, 
https://substackcdn.com/image/fetch/$s_!5N_z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!5N_z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!5N_z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5N_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Bronze, Silver, and Gold data layers in the Medallion Architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Bronze, Silver, and Gold data layers in the Medallion Architecture" title="Bronze, Silver, and Gold data layers in the Medallion Architecture" 
srcset="https://substackcdn.com/image/fetch/$s_!5N_z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!5N_z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!5N_z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!5N_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce52c60-cd32-491a-a04e-fd6d981f60af_640x640.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The most effective semantic layer architecture uses three layers of SQL views, commonly called the Medallion Architecture.</p><h3><strong>Bronze Layer (Preparation)</strong></h3><p>Create one view per raw source table. Apply no business logic. Just make the data human-readable:</p><ul><li><p>Rename cryptic columns: <code>col_7</code> &#8594; <code>OrderDate</code>, <code>cust_id</code> &#8594; <code>CustomerID</code></p></li><li><p>Cast types to standard formats: strings to dates, integers to decimals</p></li><li><p>Normalize timestamps to UTC</p></li><li><p>Avoid SQL reserved words as column names: <code>Timestamp</code>, <code>Date</code>, and <code>Role</code> force double-quoting in every downstream query. Use <code>EventTimestamp</code>, <code>TransactionDate</code>, and <code>UserRole</code> instead.</p></li></ul><p>Bronze views should be boring. Their only job is to make raw data safe to work with.</p><h3><strong>Silver Layer (Business Logic)</strong></h3><p>This is where your metric glossary becomes code. Silver views join Bronze views, deduplicate records, filter invalid data, and apply business rules.</p><p>Example:</p><pre><code><code>CREATE VIEW silver.orders_enriched AS
SELECT
    o.OrderID,
    o.OrderDate,
    o.Total AS OrderTotal,
    c.Region,
    c.Segment
FROM bronze.orders_raw o
JOIN bronze.customers_raw c ON o.CustomerID = c.CustomerID
WHERE o.Total &gt; 0 AND o.Status = 'completed';
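
-- Illustrative sketch, not part of the original example: one way the Bronze
-- view that the Silver view above reads from might be defined. The source
-- table sales_db.orders and its column names (ord_id, col_7, total_amt,
-- cust_id, order_status) are hypothetical, and CAST syntax and type names
-- vary by SQL engine. Bronze only renames and casts; it adds no business logic.
CREATE VIEW bronze.orders_raw AS
SELECT
    ord_id                            AS OrderID,
    CAST(col_7 AS DATE)               AS OrderDate,   -- string to date
    CAST(total_amt AS DECIMAL(18, 2)) AS Total,       -- integer to decimal
    cust_id                           AS CustomerID,
    order_status                      AS Status
FROM sales_db.orders;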
</code></code></pre><p>Each Silver view encodes exactly one business concept. &#8220;Revenue&#8221; is defined in one place. Every dashboard, notebook, and AI agent that needs revenue queries this view. No exceptions.</p><h3><strong>Gold Layer (Application)</strong></h3><p>Gold views are pre-aggregated for specific consumers. A BI dashboard gets <code>monthly_revenue_by_region</code>. An AI agent gets <code>customer_360_summary</code>. A finance report gets <code>quarterly_financial_summary</code>.</p><p>Gold views don&#8217;t add new business logic. They aggregate and reshape Silver views for performance and usability.</p><h2><strong>Document Everything &#8212; or Let AI Help</strong></h2><p>An undocumented semantic layer is a semantic layer nobody uses. Every table and every column should have a description that explains:</p><ul><li><p>What the data represents</p></li><li><p>Where it comes from</p></li><li><p>Any known limitations or caveats</p></li></ul><p>This is tedious work. Modern platforms accelerate it with AI. Dremio&#8217;s generative AI, for example, can auto-generate Wiki descriptions by sampling table data, and suggest Labels (tags like &#8220;PII,&#8221; &#8220;Finance,&#8221; &#8220;Certified&#8221;) for governance and discoverability. The AI provides a 70% first draft. Your data team fills in the domain-specific context.</p><p>This documentation serves two audiences: human analysts browsing the catalog, and AI agents that need context to generate accurate SQL. Both benefit from rich, accurate descriptions.</p><h2><strong>Enforce Access Policies at the Layer</strong></h2><p>Security should be embedded in the semantic layer, not applied after the fact in each tool. Two patterns:</p><p><strong>Row-Level Security</strong>: Filter what data a user can see based on their role. A regional manager sees only their region&#8217;s data. 
The SQL view applies the filter automatically.</p><p><strong>Column Masking</strong>: Mask sensitive columns (SSN, email, salary) for roles that don&#8217;t need them. Analysts see <code>****@email.com</code>. Data engineers see the full value.</p><p>The advantage of enforcing policies at the semantic layer: every downstream query inherits the rules, whether the query comes from a dashboard, a Python notebook, or an AI agent. No gaps.</p><h2><strong>Start Small, Then Expand</strong></h2><p>Don&#8217;t try to model your entire data landscape on day one. Start with:</p><ul><li><p>3-5 core metrics from your glossary</p></li><li><p>The 2-3 source systems those metrics depend on</p></li><li><p>One Bronze &#8594; Silver &#8594; Gold pipeline per metric</p></li></ul><p>Validate by running the same question across two different tools (a BI dashboard and a SQL notebook, for example). If both return the same number, the semantic layer is working. If they don&#8217;t, fix the Silver view definition before adding more.</p><p>Once the first metrics are stable, expand incrementally. Add new sources, new Silver views, new Gold views. Each addition is low-risk because the layered structure isolates changes.</p><h2><strong>What to Do Next</strong></h2><p>Pick the metric your organization argues about the most. Define it explicitly in a Silver view. Test it against the current dashboards. If the numbers match, you&#8217;ve validated the approach. 
If they don&#8217;t, you&#8217;ve just found the inconsistency that&#8217;s been silently costing your organization trust.</p><p><a href="https://www.dremio.com/get-started?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Try Dremio Cloud free for 30 days</a></p>]]></content:encoded></item><item><title><![CDATA[AI Weekly: Free Web Tools, MCP Production Wins, Trusted-Compute Models (April 30–May 6, 2026)]]></title><description><![CDATA[This week pushed three concrete lines forward at once.]]></description><link>https://amdatalakehouse.substack.com/p/ai-weekly-free-web-tools-mcp-production</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/ai-weekly-free-web-tools-mcp-production</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Fri, 08 May 2026 13:00:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0gcD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0gcD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0gcD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 424w, 
https://substackcdn.com/image/fetch/$s_!0gcD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!0gcD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!0gcD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0gcD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1125408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196677487?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!0gcD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!0gcD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!0gcD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!0gcD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24082042-6be5-45de-bf5d-0c8d5c29a34f_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>This week pushed three concrete lines forward at once. Vercel open-sourced an AI security harness, TinyFish made paid web search and fetch APIs free for AI agents, and Jama Software shipped the first Model Context Protocol server for engineering management. Underneath that, Z.ai&#8217;s GLM-5.1 went live inside a trusted execution environment, and Anthropic previewed a proactive assistant for its work product. Here is what shipped and why each piece matters.</p><h2><strong>AI Coding Tools: Vercel Ships deepsec, TinyFish Drops Its Search Paywall</strong></h2><p>Vercel <a href="https://www.cryptointegrat.com/p/ai-news-may-5-2026">open-sourced deepsec</a>, an AI-powered security harness that uses Claude and Codex to scan large codebases for vulnerabilities. The tool is CLI-first, scales to over 1,000 concurrent sandboxes, and works with any pluggable coding agent through Vercel&#8217;s AI Gateway or your own subscription. The pitch is straightforward: most coding agents handle one repo at a time, but security audits need to fan out across dozens of services in parallel. deepsec treats the agent as a worker pool and scales horizontally.</p><p>TinyFish made its <a href="https://www.cryptointegrat.com/p/ai-news-may-5-2026">Web Search and Fetch APIs free for all developers and AI agents</a> on May 5. The free tier supports Claude Code, Cursor, Codex, and other major agent frameworks, with no credit card required and what TinyFish calls generous rate limits. 
Web access has been a paid bottleneck for agent workflows since 2024, and a free tier from a vendor that already serves the agent ecosystem will put pricing pressure on the rest of the search-API market.</p><p>Anthropic also previewed a proactive assistant called <a href="https://www.cryptointegrat.com/p/ai-news-may-5-2026">Orbit</a> for Claude Cowork. Orbit will pull insights from Gmail, Slack, GitHub, Calendar, Drive, and Figma, then surface them without the user asking. The product is reportedly a Max-tier feature, and Orbit Apps were also referenced in the leaks. The combination of always-on context and proactive surface area is the next step beyond chat-only agent products.</p><h2><strong>AI Processing: GLM-5.1 Runs FP8 Inside a Trusted Execution Environment</strong></h2><p>Z.ai&#8217;s GLM-5.1 <a href="https://www.cryptointegrat.com/p/ai-news-may-5-2026">went live on the 0G Private Computer</a> running FP8 inside a Trusted Execution Environment on May 5. The model is a 754-billion-parameter Mixture-of-Experts release with 40 billion active parameters per token, shipped under the MIT license on April 7. Running it inside a TEE means the weights and prompts stay encrypted from the host operating system and cloud provider, which closes the residual trust gap that has slowed enterprise self-hosting of large open-weight models.</p><p>The 0G deployment matters for a specific reason. GLM-5.1 was <a href="https://winbuzzer.com/2026/04/09/z-ai-releases-glm-5-1-754b-model-tops-swe-bench-pro-xcxwbn/">trained entirely on Huawei Ascend 910B chips</a> with no Nvidia or AMD GPUs, scores 58.4 percent on SWE-Bench Pro, and sustains autonomous task execution for over 8 hours. 
Putting that capability behind a TEE on a third-party serving platform is the first time a frontier-tier open-weight model has been delivered with hardware-backed confidentiality outside a hyperscaler.</p><p>Air Street&#8217;s State of AI report for May noted that the UK AI Security Institute now estimates <a href="https://press.airstreet.com/p/state-of-ai-may-2026">frontier cyber-offence capability is doubling every four months</a>, with both Anthropic&#8217;s Claude Mythos Preview and OpenAI&#8217;s GPT-5.5 clearing a 32-step end-to-end cyber-attack range in a single month. The compute side of the picture stayed dense as well. Anthropic raised an <a href="https://press.airstreet.com/p/state-of-ai-may-2026">additional $40 billion from Google plus $5 billion from Amazon</a>, packaged with $100 billion of AWS spend and chip deals with Google and Broadcom reportedly worth hundreds of billions.</p><h2><strong>Standards &amp; Protocols: First MCP Server for Engineering Management</strong></h2><p>Jama Software <a href="https://itbusinessnet.com/2026/05/jama-software-launches-model-context-protocol-mcp-server/">launched the first MCP Server for engineering management software</a> on May 4. Jama Connect 9.35 lets engineers work in Claude, Codex, Cursor, GitHub Copilot, Visual Studio, or any MCP-compatible tool while keeping the existing Traceability Information Model, permissions, lifecycle workflows, and audit trails intact. The pitch from CTO Jim Davidson is that AI engineering agents need Spec Driven Development to deliver compliant velocity gains, and MCP is now the standard pipe for that integration.</p><p>Unity AI also entered open beta this week with built-in MCP Server support, alongside the AI Gateway for third-party AI integrations. Game studios get a built-in agent tuned for Unity workflows plus the option to plug in any MCP-compatible client. The pattern across both releases is clear: vertical product vendors are no longer asking whether to support MCP. 
They are shipping it as a default integration surface alongside their native UIs.</p><p>The Model Context Protocol&#8217;s <a href="https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/">2026 roadmap</a> sets the priorities behind these adoptions. Lead Maintainer David Soria Parra named four focus areas: stateless transport for horizontal scaling, server discovery through .well-known URLs, task lifecycle for retry semantics and result expiry, and enterprise-readiness work covering audit trails, SSO, and gateway behavior. The June 2026 specification cycle is targeted for the stateless transport changes. Agentic AI Foundation governance, two specification releases through 2025, and a 500-plus public server count have moved MCP from an experiment into the production layer it was designed to be.</p><h2><strong>Resources to Go Further</strong></h2><p>The AI landscape changes fast. Here are tools and resources to help you keep pace.</p><p><strong>Try Dremio Free</strong>: Experience agentic analytics and an Apache Iceberg-powered lakehouse. <a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=05-06-2026&amp;utm_content=alexmerced">Start your free trial</a></p><p><strong>Learn Agentic AI with Data</strong>: Dremio&#8217;s agentic analytics features let your AI agents query and act on live data. <a href="https://www.dremio.com/use-cases/agentic-ai/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=05-06-2026&amp;utm_content=alexmerced">Explore Dremio Agentic AI</a></p><p><strong>Join the Community</strong>: Connect with data engineers and AI practitioners building on open standards. 
<a href="https://developer.dremio.com/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=05-06-2026&amp;utm_content=alexmerced">Join the Dremio Developer Community</a></p><p><strong>Book: The 2026 Guide to AI-Assisted Development</strong>: Covers prompt engineering, agent workflows, MCP, evaluation, security, and career paths. <a href="https://www.amazon.com/2026-Guide-AI-Assisted-Development-Engineering-ebook/dp/B0GQW7CTML/">Get it on Amazon</a></p><p><strong>Book: Using AI Agents for Data Engineering and Data Analysis</strong>: A practical guide to Claude Code, Google Antigravity, OpenAI Codex, and more. <a href="https://www.amazon.com/Using-Agents-Data-Engineering-Analysis-ebook/dp/B0GR6PYJT9/">Get it on Amazon</a></p>]]></content:encoded></item><item><title><![CDATA[Apache Data Lakehouse Weekly: April 30–May 6, 2026]]></title><description><![CDATA[The release wave that defined late April carried straight into early May, with Arrow shipping two more votes in seven days, Polaris settling into post-1.4.0 stabilization mode, and the Iceberg dev list staying focused on V4 design follow-ups from the summit.]]></description><link>https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-b6f</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-b6f</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Thu, 07 May 2026 13:03:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!MQ-w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!MQ-w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MQ-w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!MQ-w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!MQ-w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!MQ-w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MQ-w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1065104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196668927?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MQ-w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!MQ-w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!MQ-w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!MQ-w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66bdef4b-c6b3-4576-b72e-8f08ca514dd1_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The release wave that defined late April carried straight into early May, with Arrow shipping two more votes in seven days, Polaris settling into post-1.4.0 stabilization mode, and the Iceberg dev list staying focused on V4 design follow-ups from the summit. The clearest story of the week is Arrow&#8217;s release engineering: the arrow-rs 58.2.0 vote that opened on April 28 closed cleanly on May 2, and the Arrow .NET 23.0.0 vote opened the same day and passed by May 5. Two votes, two passing results, three days apart: a cadence that would have been unimaginable a year ago, when the project was still navigating its full-stack release cycle. 
Iceberg&#8217;s design lists stayed in absorption mode as contributors continued to translate post-summit alignments into formal specification work, and Parquet&#8217;s dev list remained dense with format-level threads that have been simmering since the ALP encoding vote closed in April.</p><h2><strong>Apache Iceberg</strong></h2><p>Iceberg&#8217;s dev list ran quieter this week than the Arrow and Polaris lists, but the design conversations that have anchored 2026 continued to advance in the background. The V4 metadata.json optionality direction &#8212; the proposal to treat catalog-managed metadata as a first-class supported mode while preserving static-table portability through explicit opt-in semantics &#8212; is still the project&#8217;s defining specification conversation, with Anton Okolnychyi, Yufei Gu, Shawn Chang, Steven Wu, and Russell Spitzer continuing to push edge cases on portability guarantees and Spark driver behavior. The single-file commits proposal that Russell Spitzer and Amogh Jahagirdar have been advancing remains on track for a formal write-up that should land on the dev list in the coming weeks.</p><p>P&#233;ter V&#225;ry&#8217;s <a href="https://www.mail-archive.com/dev@iceberg.apache.org/msg12972.html">efficient column updates proposal</a> for wide tables continues to attract collaboration. Anurag Mantripragada and G&#225;bor Kaszab are working alongside P&#233;ter on POC benchmarks for both the Iceberg-native and Parquet-native approaches, with the latency and metadata footprint improvements making this one of the more practically grounded V4 proposals on the list. 
The design &#8212; write only the columns that change on each commit, then stitch the result at read time &#8212; is squarely aimed at petabyte-scale feature stores with thousands of embedding and model-score columns, and that workload pressure is precisely what&#8217;s pulling the V4 spec design forward.</p><p>The <a href="https://www.mail-archive.com/dev@iceberg.apache.org/msg13144.html">labels in LoadTableResponse proposal</a> that Andrei Tserakhau drove through March continues to anchor the catalog-managed metadata conversation. The design lets each catalog (Polaris, Unity Catalog, Lakekeeper) surface internal metadata such as ownership, cost attribution, and semantic context through a standard optional field on table loads, without forcing requirements onto catalogs that don&#8217;t track that data. The cross-implementation POCs that Andrei published &#8212; Polaris (PR #4048), Unity Catalog (PR #1417), Lakekeeper (PR #1676), and the PyIceberg client (PR #3191) &#8212; remain useful reference points as the spec change progresses through review. Iceberg Summit 2026 session recordings continued rolling out on the project&#8217;s YouTube channel, and the published AI contribution policy that Holden Karau, Kevin Liu, Steve Loughran, and Sung Yun pushed through March remains the next concrete deliverable to track.</p><h2><strong>Apache Polaris</strong></h2><p>Polaris transitioned from release-week intensity into stabilization mode this week. The 1.4.0 release that Adnan Hemani <a href="https://mail-archive.com/dev@polaris.apache.org/msg04499.html">announced on April 23</a>, followed by the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04551.html">Python CLI 1.4.0 release</a> on April 28, gave the project its first major release pair as a graduated top-level project. 
The post-launch issues that Alexandre Dutra surfaced &#8212; the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04512.html">Helm chart repo inconsistency</a>, the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04513.html">release workflow failure in step 4</a>, the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04514.html">Artifact Hub request</a>, and the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04544.html">KMS-related upgrade bug</a> &#8212; are exactly the kind of friction a project surfaces in its first independent release cycle. Yufei Gu has continued to triage most of the upgrade-path issues, and the Helm packaging questions are converging toward resolution.</p><p>Design discussions stayed active alongside the post-release stabilization. EJ Wang&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04485.html">DISCUSS thread on AGENTS.md for Polaris</a> &#8212; the proposal to add agent-readable repository metadata so coding agents can pick up the project conventions consistently &#8212; continued building toward a concrete implementation proposal, which the previous newsletter flagged as the next deliverable to watch. ITing Lee&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04430.html">proposal to add OpenLineage to Polaris</a> has accumulated the volume of review feedback from Adnan Hemani, Jean-Baptiste Onofr&#233;, Yufei Gu, and Michael Collado that it needs to move toward an implementation RFC. 
Yufei&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04486.html">thread on narrowing the scope of SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION</a> drew further engagement from Dmitri Bourlatchkov and Dennis Huo, and Alexandre Dutra&#8217;s URL path decoding and PolarisPrivilege grant validation threads continued to be active points of discussion.</p><p>Jean-Baptiste Onofr&#233;&#8217;s confirmation that Polaris is back on a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04476.html">monthly release cadence</a> means a 1.4.1 patch release or 1.5.0 planning email is the natural next step. Given the volume of upgrade-path issues that surfaced after 1.4.0, a quick 1.4.1 to address the KMS bug and Helm packaging fixes seems the more likely path before the project moves on to 1.5.0 feature scoping.</p><h2><strong>Apache Arrow</strong></h2><p>Arrow&#8217;s release engine kept running. Andrew Lamb&#8217;s <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34631.html">arrow-rs 58.2.0 RC1 vote</a> that opened on April 28 closed on May 2, with <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34637.html">the release approved</a> by 6 +1 votes (4 binding) and immediately published to crates.io. Bryce Mecum, Ed Seidl, Jeffrey Vo, Ra&#250;l Cumplido, and L. C. Hsieh carried the verification work, with L. C. Hsieh casting the final binding +1 from an Intel Mac on April 29. 
The 58.2.0 release continues the monthly arrow-rs cadence that has held since 58.1.0 shipped in March, and 59.0.0 remains scheduled as a major release that may include breaking changes.</p><p>Curt Hagenlocher opened the <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34638.html">Arrow .NET 23.0.0 RC0 vote</a> on May 2 &#8212; the same day arrow-rs 58.2.0 was approved &#8212; and <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34653.html">the vote passed</a> on May 5 with 5 binding +1s from Bryce Mecum, Adam Reeve, Ra&#250;l Cumplido, Sutou Kouhei, and Curt himself. Sutou Kouhei verified on Debian sid with .NET SDK 8.0.413, and Curt ported verify_rc.sh to PowerShell as part of the validation. Curt is now working through the post-vote release tasks, including a 401 issue with the GitHub release download script that he flagged for follow-up. The .NET 23.0.0 release continues the steady cadence the .NET implementation has settled into since the 22.0.0 cycle.</p><p>Beyond releases, the design conversations stayed lively. The <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34576.html">pyarrow-stubs donation vote</a> that Rok Mihevc opened on April 14 continued building toward a final tally. Emil Sadek&#8217;s <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34619.html">ADBC Logo Proposal</a> drew further engagement from Nic Crane, Julian Hyde, and Rusty Conover, and Benjamin Philip&#8217;s <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34628.html">Arrow Erlang grant documents thread</a> continued the project&#8217;s expansion into more language ecosystems. 
Andrew Lamb&#8217;s <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34610.html">arrow-rs security policy discussion</a> and Mandukhai Alimaa&#8217;s <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34604.html">canonical BigDecimal extension type proposal</a> both continued to draw input as the project tightens its production posture.</p><h2><strong>Apache Parquet</strong></h2><p>Parquet&#8217;s lists stayed dense. Manu Zhang&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27212.html">DISCUSS thread on a new parquet-java release</a> continued attracting input from Steve Loughran, Aaron Niskode-Dossett, Fokko Driesprong, Julien Le Dem, Gang Wu, and Rahil C, with the conversation now narrowing on a target version and ship date for what would be the next parquet-java release after 1.17.0. Isma&#235;l Mej&#237;a&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27247.html">thread soliciting code reviews for Java performance optimization work</a> continued with Steve Loughran picking up the review load.</p><p>The format-level proposals continued evolving. Will Edwards&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27142.html">DISCUSS thread on an alternative to the FlatBuffer footer with a lightweight byte-offset index</a> kept drawing design feedback from Andrew Lamb, Ed Seidl, Jan Finis, Alkis Evlogimenos, Raphael Taylor-Davies, and Andrew Bell. Ed Seidl&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27197.html">proposal to make path_in_schema optional</a> continued attracting commentary from Gang Wu, Steve Loughran, and Micah Kornfield. 
Andrew Lamb&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27192.html">thread on where VariantJsonParser should live</a> &#8212; the cross-project boundary question between Parquet and Iceberg&#8217;s variant tooling &#8212; kept moving with input from Steve Loughran and Gang Wu.</p><p>The Geospatial work continued moving toward closure. Milan Stefanovic&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27136.html">Geospatial CRS string format clarification</a> drew further input from Dewey Dunnington and Micah Kornfield, and Jan Finis&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27214.html">question on RLE bitpack page-edge validity</a> continued the kind of spec-edge clarification work that matters for cross-implementation interoperability. The Parquet sync that Julien Le Dem ran on April 22 set the agenda for the design work that&#8217;s now playing out across the dev list.</p><h2><strong>Cross-Project Themes</strong></h2><p>This week&#8217;s clearest pattern is post-graduation Polaris finding its operational footing alongside Arrow&#8217;s well-established release cadence. Two Arrow votes in four days, the Polaris 1.4.x stabilization wave, Iceberg&#8217;s quiet absorption of summit alignments, and Parquet&#8217;s dense format-level work together make the lakehouse stack feel less like four separate projects and more like one coordinated platform. The arrow-rs 58.2.0 release in particular landed inside a single five-day vote window &#8212; proposed April 28, approved May 2, published to crates.io the same day &#8212; which is a useful benchmark for how tight Apache release engineering can run when the verification community is engaged.</p><p>The second pattern is the continued translation of post-summit alignments into spec work. 
The V4 metadata.json optionality direction, the labels-in-LoadTableResponse proposal, the AGENTS.md thread for Polaris, the OpenLineage RFC, the Parquet footer redesign work, and the Geospatial spec clarifications are all converging on the same broader question: what does the lakehouse stack look like when the workloads it powers shift from analytical SQL to AI agents and ML feature engineering? Each design conversation makes more sense if you assume the next decade&#8217;s workload mix looks meaningfully different from the last decade&#8217;s.</p><p>The third pattern is enterprise-readiness work surfacing in real time. Polaris&#8217;s KMS upgrade bug, Helm packaging issues, OAuth2 Manager v2 design, and credential-subscoping scope discussion are all the work of a project being deployed at scale rather than a project being built. The visible triage on the dev list rather than behind closed doors is a healthy signal.</p><h2><strong>Looking Ahead</strong></h2><p>Watch for a Polaris 1.4.1 patch release vote to address the KMS bug and Helm packaging issues that surfaced after 1.4.0, with 1.5.0 planning to follow. The AGENTS.md discussion should firm into a concrete implementation proposal, and the Polaris OpenLineage RFC has the volume of feedback it needs to move toward an implementation. On the Iceberg side, the formal V4 single-file commits write-up, the V4 metadata.json optionality direction, and the published AI contribution policy remain the next concrete deliverables to track. The labels-in-LoadTableResponse spec PR (apache/iceberg#15750) should converge toward merge as the cross-catalog POCs validate the design.</p><p>On the Arrow side, the pyarrow-stubs donation vote should close in the coming days, and arrow-go and arrow-cpp release planning will shape what ships in May and June. 
For Parquet, Manu Zhang&#8217;s parquet-java release thread should converge on a target version, the path_in_schema optionality proposal looks ready for a formal vote, and the FlatBuffer-footer alternative is on track for a more formal design document. Iceberg Summit 2026 session recordings will continue rolling out on YouTube &#8212; the V4 design talks and production case studies from Apple, Bloomberg, and Pinterest are particularly worth catching as they land.</p><div><hr></div><h2><strong>Resources &amp; Further Learning</strong></h2><p><strong>Get Started with Dremio</strong></p><ul><li><p><a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-05-06&amp;utm_content=alexmerced">Try Dremio Free</a> &#8212; Build your lakehouse on Iceberg with a free trial</p></li><li><p><a href="https://www.dremio.com/use-cases/lake-to-iceberg-lakehouse/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-05-06&amp;utm_content=alexmerced">Build a Lakehouse with Iceberg, Parquet, Polaris &amp; Arrow</a> &#8212; Learn how Dremio brings the open lakehouse stack together</p></li></ul><p><strong>Free Downloads</strong></p><ul><li><p><a href="https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html">Apache Iceberg: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li><li><p><a href="https://hello.dremio.com/wp-apache-polaris-guide-reg.html">Apache Polaris: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li></ul><p><strong>Books by Alex Merced</strong></p><ul><li><p><a href="https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/">Architecting an Apache Iceberg Lakehouse</a></p></li><li><p><a href="https://www.amazon.com/Enabling-Agentic-Analytics-Apache-Iceberg-ebook/dp/B0GQXT6W3N/">Enabling Agentic Analytics with Apache Iceberg and 
Dremio</a></p></li><li><p><a href="https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands/dp/B0GQNY21TD/">The 2026 Guide to Lakehouses, Apache Iceberg and Agentic AI</a></p></li><li><p><a href="https://www.amazon.com/Book-Using-Apache-Iceberg-Python/dp/B0GNZ454FF/">The Book on Using Apache Iceberg with Python</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[What Are Table Formats and Why Were They Needed?]]></title><description><![CDATA[This is Part 1 of a 15-part Apache Iceberg Masterclass. This article covers the fundamental question: what problem do table formats solve, and why does the choice between them matter?]]></description><link>https://amdatalakehouse.substack.com/p/what-are-table-formats-and-why-were</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/what-are-table-formats-and-why-were</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Wed, 06 May 2026 13:02:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wFmj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wFmj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wFmj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 424w, 
https://substackcdn.com/image/fetch/$s_!wFmj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!wFmj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!wFmj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wFmj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1729452,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196011553?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!wFmj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!wFmj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!wFmj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!wFmj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50711fe8-ed29-4780-a0b1-20f10519707f_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>This is Part 1 of a 15-part <a href="https://iceberglakehouse.com/posts/">Apache Iceberg Masterclass</a>. This article covers the fundamental question: what problem do table formats solve, and why does the choice between them matter?</p><p>A data lake without a table format is a collection of files. It has no concept of a transaction, no mechanism to prevent two writers from producing corrupted state, and no efficient way to determine which files belong to the current version of a table. Table formats exist because the gap between &#8220;a pile of Parquet files&#8221; and &#8220;a reliable analytical table&#8221; is enormous, and bridging it requires a formal metadata specification.</p><h2><strong>Table of Contents</strong></h2><ol><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-01/">What Are Table Formats and Why Were They Needed?</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-02/">The Metadata Structure of Current Table Formats</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-03/">Performance and Apache Iceberg&#8217;s Metadata</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-04/">Technical Deep Dive on Partition Evolution</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-05/">Technical Deep Dive on Hidden Partitioning</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-06/">Writing to an Apache Iceberg Table</a></p></li><li><p><a 
href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-07/">What Are Lakehouse Catalogs?</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-08/">Embedded Catalogs: S3 Tables and MinIO AI Stor</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-09/">How Iceberg Table Storage Degrades Over Time</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-10/">Maintaining Apache Iceberg Tables</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-11/">Apache Iceberg Metadata Tables</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-12/">Using Iceberg with Python and MPP Engines</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/">Streaming Data into Apache Iceberg Tables</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-14/">Hands-On with Iceberg Using Dremio Cloud</a></p></li><li><p><a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/">Migrating to Apache Iceberg</a></p></li></ol><h2><strong>The World Before Table Formats</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WtnV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WtnV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 424w, 
https://substackcdn.com/image/fetch/$s_!WtnV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!WtnV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!WtnV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WtnV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;How table formats solved the chaos of raw data lakes with a structured metadata layer&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How table formats solved the chaos of raw data lakes with a structured metadata layer" title="How table formats solved the chaos of raw data lakes with a structured metadata layer" 
srcset="https://substackcdn.com/image/fetch/$s_!WtnV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!WtnV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!WtnV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!WtnV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566d5808-df34-4db4-975d-1514b7beb11a_800x800.webp 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>Before table formats, data lakes relied on a simple convention: data was organized into directories in object storage (S3, ADLS, GCS), and the <a href="https://cwiki.apache.org/confluence/display/hive/design#Design-HiveMetastore">Hive Metastore</a> tracked which directories corresponded to which partitions.</p><p>This approach had five critical problems:</p><p><strong>No atomic commits.</strong> If a Spark job wrote 500 new Parquet files and failed after writing 300, readers could see the 300 files from the incomplete write. There was no mechanism to make all 500 files visible at once or none of them. Cleanup required manual intervention or custom garbage collection scripts.</p><p><strong>Expensive query planning.</strong> To determine which files to scan, the engine issued <code>LIST</code> requests against object storage. S3 returns up to 1,000 objects per request. A table with 100,000 files required at least 100 sequential HTTP calls before query execution could even start. At Netflix, query planning for large tables could take minutes just from directory listing.</p><p><strong>Schema changes required rewrites.</strong> Adding a column to a Hive table meant either rewriting every file (expensive) or accepting that old files had a different schema than new files (confusing). Renaming a column could not be done safely without a full table rewrite because Hive resolved columns by name or position, not by a stable identity.</p><p><strong>No time travel.</strong> Once data was overwritten, the previous version was gone. 
There was no snapshot history, no ability to roll back a bad write, and no way to reproduce a query result from last Tuesday.</p><p><strong>Exposed partitioning.</strong> Users had to know the physical partition layout. A table partitioned by <code>year</code> and <code>month</code> required queries to explicitly filter on those columns using the exact partition column names (<code>WHERE year = 2024 AND month = 3</code>). If partitioning changed, every downstream query broke.</p><h2><strong>What a Table Format Actually Is</strong></h2><p>A table format is a specification that defines how to organize metadata about data files so that query engines can treat them as reliable, transactional tables. It sits between the query engine and the physical files.</p><p>The core responsibilities of every table format:</p><ul><li><p><strong>File tracking</strong>: Maintain an explicit list of which data files belong to the current version of the table, eliminating directory listing</p></li><li><p><strong>Atomic commits</strong>: Make all changes to a table visible to readers at once through a single metadata pointer swap</p></li><li><p><strong>Schema management</strong>: Track the table schema and its evolution history, allowing safe column adds, drops, renames, and reorders</p></li><li><p><strong>Partition management</strong>: Define how data is partitioned and enable query pruning without exposing the physical layout to users</p></li><li><p><strong>Snapshot history</strong>: Maintain a history of table states for time travel, rollback, and auditing</p></li><li><p><strong>Statistics</strong>: Store column-level min/max values and other statistics to enable file skipping during query planning</p></li></ul><p>The data files themselves are still standard <a href="https://parquet.apache.org/">Parquet</a> or ORC. 
The table format adds a metadata layer on top that gives those files the properties of a database table.</p><h2><strong>The Five Table Formats</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DnF7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DnF7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!DnF7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!DnF7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!DnF7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DnF7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Timeline showing the evolution from Hive Metastore through Hudi, Iceberg, Delta Lake, Paimon, and DuckLake&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Timeline showing the evolution from Hive Metastore through Hudi, Iceberg, Delta Lake, Paimon, and DuckLake" title="Timeline showing the evolution from Hive Metastore through Hudi, Iceberg, Delta Lake, Paimon, and DuckLake" srcset="https://substackcdn.com/image/fetch/$s_!DnF7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!DnF7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!DnF7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!DnF7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b49e32e-ece9-4ca3-a458-e54ee8d32a89_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Five table formats exist today, each born from a different problem and optimized for a different workload.</p><h3><strong>Apache Iceberg</strong></h3><p>Iceberg started at Netflix in 2017, created by Ryan Blue to solve Netflix&#8217;s petabyte-scale query planning problems. It uses a three-layer metadata tree: a <code>metadata.json</code> file points to a manifest list, which points to manifest files, which track individual data files with column-level statistics.</p><p>Iceberg&#8217;s defining feature is its <a href="https://iceberg.apache.org/spec/">formal specification</a>. Any engine that follows the spec can read and write Iceberg tables correctly. 
This makes Iceberg the most engine-neutral format. Spark, Trino, Flink, <a href="https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/">Dremio</a>, Snowflake, BigQuery, Athena, StarRocks, and DuckDB all support it.</p><p>Iceberg also introduced <a href="https://www.dremio.com/blog/fewer-accidental-full-table-scans-brought-to-you-by-apache-icebergs-hidden-partitioning/">hidden partitioning</a> and partition evolution, which are covered in depth in Parts 4 and 5 of this series.</p><h3><strong>Delta Lake</strong></h3><p>Databricks open-sourced Delta Lake in 2019. It stores metadata as a sequential transaction log (<code>_delta_log/</code>) made up of JSON commit files and Parquet checkpoint files. Each commit appends a new log entry describing which files were added or removed.</p><p>Delta Lake&#8217;s design prioritizes simplicity within the Spark ecosystem. Its strongest features are Liquid Clustering (adaptive data organization that replaces static partitioning) and UniForm (automatic generation of Iceberg-compatible metadata so other engines can read Delta tables as Iceberg).</p><h3><strong>Apache Hudi</strong></h3><p>Hudi originated at Uber in 2016, designed specifically for Change Data Capture (CDC) pipelines that needed to upsert millions of records per hour. Hudi uses a timeline-based metadata architecture where each commit, compaction, and rollback is an &#8220;action instant.&#8221;</p><p>Hudi offers both Copy-on-Write (rewrite entire files on update) and Merge-on-Read (write deltas and merge at read time) table types, plus record-level indexing for fast point lookups. It excels when your primary workload involves frequent row-level updates and deletes.</p><h3><strong>Apache Paimon</strong></h3><p>Paimon evolved from Flink Table Store at Alibaba and entered Apache incubation in 2023. 
It uses <a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree">LSM-tree</a> based storage internally, making it the most streaming-native table format.</p><p>Tables in Paimon are divided into partitions and then further into buckets, each containing an independent LSM tree. This structure enables high-throughput streaming writes with millisecond-level latency. Paimon supports multiple merge engines (deduplication, partial update, aggregation) that determine how records with the same primary key are resolved.</p><h3><strong>DuckLake</strong></h3><p>DuckLake is the newest entry, released by DuckDB Labs and MotherDuck in 2025. It takes a fundamentally different approach: instead of storing metadata as files in object storage, DuckLake stores all metadata in a standard SQL database (PostgreSQL, MySQL, SQLite, or DuckDB itself).</p><p>This means a single SQL query resolves all metadata (schema, file list, statistics) instead of the multiple HTTP requests required by file-based metadata formats. 
The tradeoff is a dependency on a running database for the metadata layer and currently limited engine support (primarily DuckDB).</p><h2><strong>Where Each Format Excels</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gjLl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gjLl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!gjLl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!gjLl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!gjLl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gjLl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Positioning chart showing where Iceberg, Delta Lake, Hudi, Paimon, and DuckLake sit on batch vs streaming and single vs multi-engine axes&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Positioning chart showing where Iceberg, Delta Lake, Hudi, Paimon, and DuckLake sit on batch vs streaming and single vs multi-engine axes" title="Positioning chart showing where Iceberg, Delta Lake, Hudi, Paimon, and DuckLake sit on batch vs streaming and single vs multi-engine axes" srcset="https://substackcdn.com/image/fetch/$s_!gjLl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!gjLl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!gjLl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!gjLl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb361351e-11bf-4b5d-992b-e513da8b0be8_800x800.webp 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7xNl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!7xNl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 424w, https://substackcdn.com/image/fetch/$s_!7xNl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 848w, https://substackcdn.com/image/fetch/$s_!7xNl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 1272w, https://substackcdn.com/image/fetch/$s_!7xNl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7xNl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png" width="1456" height="1176" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1176,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:286414,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196011553?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7xNl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 424w, https://substackcdn.com/image/fetch/$s_!7xNl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 848w, https://substackcdn.com/image/fetch/$s_!7xNl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 1272w, https://substackcdn.com/image/fetch/$s_!7xNl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c59de22-fd62-4fdc-b2b9-359a1654f8ce_1538x1242.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The key insight: each format reflects the priorities of the team that built it. Netflix needed multi-engine reads at petabyte scale (Iceberg). Uber needed high-frequency upserts (Hudi). Alibaba needed real-time streaming from Flink (Paimon). Databricks needed Spark-optimized simplicity (Delta). DuckDB Labs wanted SQL-native metadata management (DuckLake).</p><h2><strong>Why Iceberg Has Become the Default</strong></h2><p>Iceberg has achieved the broadest adoption for three reasons:</p><ol><li><p><strong>Specification-first design.</strong> Iceberg&#8217;s <a href="https://iceberg.apache.org/spec/">spec</a> is independent of any engine or vendor. Any team can build a conforming implementation. This created a network effect: more engine support attracted more users, which attracted more engine support.</p></li><li><p><strong>No engine dependency.</strong> Unlike Delta Lake&#8217;s historical Spark dependency or Paimon&#8217;s Flink focus, Iceberg was designed from day one to work across engines. A table written by Spark can be read by <a href="https://www.dremio.com/blog/apache-iceberg-delta-lake-apache-hudi-a-comparison/">Dremio</a>, Trino, Flink, or Snowflake without conversion.</p></li><li><p><strong>Industry convergence.</strong> Snowflake, AWS (Athena, EMR), Google (BigQuery), and Databricks (via UniForm) have all adopted Iceberg as an interoperability standard. When the major cloud vendors align on a format, it becomes the safe choice for long-term investments.</p></li></ol><p>That said, Iceberg is not universally superior. 
Hudi&#8217;s record-level indexing makes it faster for point lookups on upsert-heavy tables. Paimon&#8217;s LSM-tree architecture handles continuous streaming ingestion with lower latency than Iceberg&#8217;s batch-oriented commit model. DuckLake&#8217;s SQL-based metadata is simpler for single-engine, local-first analytics.</p><p>The rest of this series focuses on Iceberg because its design decisions and capabilities represent the state of the art for multi-engine analytical lakehouses. <a href="https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-02/">Part 2</a> examines the metadata structures of all five formats in detail.</p><h3><strong>Books to Go Deeper</strong></h3><p>To learn more about Apache Iceberg and the lakehouse architecture, check out these resources:</p><ul><li><p><a href="https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/">Architecting the Apache Iceberg Lakehouse</a> by Alex Merced (Manning)</p></li><li><p><a href="https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands-ebook/dp/B0GQL4QNRT/">Lakehouses with Apache Iceberg: Agentic Hands-on</a> by Alex Merced</p></li><li><p><a href="https://www.amazon.com/Constructing-Context-Semantics-Agents-Embeddings/dp/B0GSHRZNZ5/">Constructing Context: Semantics, Agents, and Embeddings</a> by Alex Merced</p></li><li><p><a href="https://www.amazon.com/Apache-Iceberg-Agentic-Connecting-Structured/dp/B0GW2WF4PX/">Apache Iceberg &amp; Agentic AI: Connecting Structured Data</a> by Alex Merced</p></li><li><p><a href="https://www.amazon.com/Open-Source-Lakehouse-Architecting-Analytical/dp/B0GW595MVL/">Open Source Lakehouse: Architecting Analytical Systems</a> by Alex Merced</p></li></ul><h3><strong>Free Resources</strong></h3><ul><li><p><a href="https://drmevn.fyi/linkpageiceberg">FREE - Apache Iceberg: The Definitive Guide</a></p></li><li><p><a href="https://drmevn.fyi/linkpagepolaris">FREE - Apache Polaris: The Definitive Guide</a></p></li><li><p><a 
href="https://hello.dremio.com/wp-resources-agentic-ai-for-dummies-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Agentic AI for Dummies</a></p></li><li><p><a href="https://hello.dremio.com/wp-resources-agentic-analytics-guide-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Leverage Federation, The Semantic Layer and the Lakehouse for Agentic AI</a></p></li><li><p><a href="https://forms.gle/xdsun6JiRvFY9rB36">FREE with Survey - Understanding and Getting Hands-on with Apache Iceberg in 100 Pages</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[What is Dremio? The Unified Lakehouse and AI Platform]]></title><description><![CDATA[If you manage a modern data stack, you likely spend the majority of your time and compute budget moving data around.]]></description><link>https://amdatalakehouse.substack.com/p/what-is-dremio-the-unified-lakehouse</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/what-is-dremio-the-unified-lakehouse</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Tue, 05 May 2026 18:41:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DjRz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DjRz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!DjRz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!DjRz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!DjRz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!DjRz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DjRz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20ed5628-1500-462b-925a-2911b092696f_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1439604,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196574481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DjRz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!DjRz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!DjRz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!DjRz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20ed5628-1500-462b-925a-2911b092696f_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>If you manage a modern data stack, you likely spend the majority of your time and compute budget moving data around. You pull data from an operational database, stage it in object storage, transform it, load it into a data warehouse, and finally export it into BI extracts. This DIY approach creates fragile pipelines, delayed insights, and vendor lock-in.</p><p>Dremio exists to eliminate this complexity. Backed by 11 years of engineering development, it is a unified analytics platform that lets you query data where it lives, govern it securely, and interact with it using built-in Agentic AI.</p><p>To understand what Dremio does, view it as a three-part platform: a Federated Query Engine, an Iceberg Lakehouse Platform, and an Agentic AI Layer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EaxM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EaxM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 424w, 
https://substackcdn.com/image/fetch/$s_!EaxM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!EaxM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!EaxM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EaxM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Dremio's Three-Part Platform Overview: Federated Query Engine, Iceberg Lakehouse, and Agentic AI&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dremio's Three-Part Platform Overview: Federated Query Engine, Iceberg Lakehouse, and Agentic AI" title="Dremio's Three-Part Platform Overview: Federated Query Engine, Iceberg Lakehouse, and Agentic AI" 
srcset="https://substackcdn.com/image/fetch/$s_!EaxM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!EaxM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!EaxM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!EaxM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe978cd96-42fd-4efd-a73e-f440a905e799_800x800.webp 1456w" sizes="100vw"></picture></div></a></figure></div>
<h2><strong>Pillar 1: The Federated Query Engine</strong></h2><p>At its core, Dremio is an execution engine built on the principle of &#8220;Query, Don&#8217;t Move.&#8221;</p><p>Instead of forcing you to centralize all your data into a single proprietary warehouse, Dremio acts as a logical abstraction layer. When a user or BI dashboard submits a SQL query, Dremio parses the request, identifies the underlying data sources, and generates optimized sub-queries. It pushes down filters and aggregations to the source systems, retrieves the minimal necessary data, and executes the final joins in memory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j7lJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j7lJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!j7lJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 848w, 
https://substackcdn.com/image/fetch/$s_!j7lJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!j7lJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j7lJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Federated Query Engine splitting a single query to Amazon S3, PostgreSQL, and Oracle&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Federated Query Engine splitting a single query to Amazon S3, PostgreSQL, and Oracle" title="Federated Query Engine splitting a single query to Amazon S3, PostgreSQL, and Oracle" srcset="https://substackcdn.com/image/fetch/$s_!j7lJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 424w, 
https://substackcdn.com/image/fetch/$s_!j7lJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!j7lJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!j7lJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172cf300-039e-4252-bbc7-568bb61cdaba_800x800.webp 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>This architecture eliminates the serialization tax and allows for <strong>Zero-Copy Data Movement</strong>. Query federation has historically been hard to scale. Dremio scales it by combining Apache Arrow&#8217;s in-memory columnar execution, pushdown of filters and aggregations to the source systems, and Iceberg-based Reflections. Together, these give Dremio a performance advantage over federation tools that lack them. You bypass complex, multi-stage ETL pipelines entirely while maintaining interactive analytics speed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sAEi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sAEi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!sAEi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!sAEi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!sAEi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sAEi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Comparison of a massive ETL pipeline against a direct zero-copy pointer to raw storage&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Comparison of a massive ETL pipeline against a direct zero-copy pointer to raw storage" title="Comparison of a massive ETL pipeline against a direct zero-copy pointer to raw storage" srcset="https://substackcdn.com/image/fetch/$s_!sAEi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!sAEi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 848w, 
https://substackcdn.com/image/fetch/$s_!sAEi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!sAEi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb11b9644-500b-4555-8501-187abc1e8eb0_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Pillar 2: The Iceberg Lakehouse Platform</strong></h2><p>While federation is a great starting place to operationalize your 
data analytics rapidly, you ideally want the majority of your analytics to operate directly from your data lake using Apache Iceberg tables. Shifting workloads to Iceberg provides three major advantages:</p><ol><li><p><strong>Reduction in costs:</strong> You rely on cheaper object storage (like Amazon S3, ADLS, or Google Cloud Storage) while eliminating the need for duplicative storage and expensive ETL pipelines.</p></li><li><p><strong>Tool interoperability:</strong> Open standards ensure better collaboration between teams, allowing data engineers, analysts, and data scientists to interact with the exact same data using different compute engines.</p></li><li><p><strong>Autonomous performance management:</strong> Dremio automatically optimizes your Iceberg tables and accelerates their performance with background Reflections. This makes a lakehouse feel as fast and easy to use as a traditional warehouse, but without the premium costs.</p></li></ol><p>By natively supporting Apache Parquet and Apache Iceberg, Dremio brings relational database capabilities (like ACID transactions, schema evolution, and time travel) directly to your object storage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!510G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!510G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 424w, 
https://substackcdn.com/image/fetch/$s_!510G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!510G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!510G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!510G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Iceberg Lakehouse Architecture showing the hierarchy from catalog to metadata to Parquet files&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Iceberg Lakehouse Architecture showing the hierarchy from catalog to metadata to Parquet files" title="Iceberg Lakehouse Architecture showing the hierarchy from catalog to metadata to Parquet files" 
srcset="https://substackcdn.com/image/fetch/$s_!510G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!510G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!510G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!510G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54900e2e-3a94-4fc9-a5d4-a08d7bb1ddb1_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To manage this open ecosystem securely, Dremio integrates tightly with Apache Polaris. Polaris serves as a neutral, open catalog that provides centralized governance, role-based access control (RBAC), and credential vending. It ensures that whether you query data using Dremio, Apache Spark, or Apache Flink, every engine respects the same security policies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ze9S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ze9S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!Ze9S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!Ze9S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!Ze9S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ze9S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Apache Polaris Governance acting as an umbrella over multiple query engines&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Apache Polaris Governance acting as an umbrella over multiple query engines" title="Apache Polaris Governance acting as an umbrella over multiple query engines" srcset="https://substackcdn.com/image/fetch/$s_!Ze9S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!Ze9S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Ze9S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!Ze9S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04ba8a8-8806-4e88-8ab5-33405674db92_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>However, querying raw files on object storage can occasionally bottleneck at large scales. 
Dremio solves this with <strong>Autonomous Reflections</strong>. Instead of relying on data engineers to manually build and maintain materialized views or OLAP cubes, Dremio monitors query patterns and automatically materializes optimized data structures in the background. When a user runs a query, the engine transparently routes it to the Reflection, delivering sub-second BI performance directly on the lakehouse.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7ORQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7ORQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!7ORQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!7ORQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!7ORQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7ORQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Autonomous Reflections Lifecycle: Query Monitoring, Background Materialization, and Instant Acceleration&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Autonomous Reflections Lifecycle: Query Monitoring, Background Materialization, and Instant Acceleration" title="Autonomous Reflections Lifecycle: Query Monitoring, Background Materialization, and Instant Acceleration" srcset="https://substackcdn.com/image/fetch/$s_!7ORQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!7ORQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!7ORQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!7ORQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaafd88-d876-45a9-be94-9ace626e26ef_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Pillar 3: The Agentic AI Layer</strong></h2><p>A fast query engine is useless if users cannot find or understand the data. Dremio bridges this gap by integrating artificial intelligence deeply into the platform.</p><p>The foundation of this layer is the AI-powered semantic layer. 
It maps raw tables and columns into clean, business-friendly concepts through SQL Views, tags, wikis, lineage, and a knowledge graph, with built-in semantic search to put that context to use. This governed semantic layer ensures that both human analysts and AI agents interpret the data identically.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JwJK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JwJK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!JwJK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!JwJK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!JwJK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JwJK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agentic AI Layer Overview showing the Semantic Layer feeding both Human Analysts and AI Agents&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agentic AI Layer Overview showing the Semantic Layer feeding both Human Analysts and AI Agents" title="Agentic AI Layer Overview showing the Semantic Layer feeding both Human Analysts and AI Agents" srcset="https://substackcdn.com/image/fetch/$s_!JwJK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!JwJK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!JwJK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!JwJK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff65bed2-f44b-440d-8f16-98cc005af56f_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For human users, Dremio includes a built-in AI Agent. Users simply type a natural language request, such as &#8220;Show top customers by revenue,&#8221; and the agent instantly translates it into a highly optimized SQL query based on the context embedded in the semantic layer. 
But it goes beyond translation: the agent immediately executes the query and can automatically generate interactive data visualizations or insights based on the results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n3yv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n3yv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!n3yv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!n3yv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!n3yv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n3yv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Built-in AI Agent Flow translating natural language into SQL, executing it, and generating a visual chart&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Built-in AI Agent Flow translating natural language into SQL, executing it, and generating a visual chart" title="Built-in AI Agent Flow translating natural language into SQL, executing it, and generating a visual chart" srcset="https://substackcdn.com/image/fetch/$s_!n3yv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!n3yv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!n3yv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!n3yv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae0d7ab-5884-4c14-b2c1-3e4a6147d96c_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For system automation, Dremio provides a Model Context Protocol (MCP) Server. The Dremio MCP Server allows external AI assistants and local IDEs to securely connect to the lakehouse, with built-in support for Dremio&#8217;s semantic layer. 
The server registers tools for semantic discovery and query execution, enabling AI agents to autonomously research and analyze data on your behalf.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yVjF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yVjF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!yVjF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!yVjF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!yVjF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yVjF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Dremio MCP Server Architecture connecting a Local AI Assistant to the Lakehouse&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dremio MCP Server Architecture connecting a Local AI Assistant to the Lakehouse" title="Dremio MCP Server Architecture connecting a Local AI Assistant to the Lakehouse" srcset="https://substackcdn.com/image/fetch/$s_!yVjF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!yVjF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!yVjF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!yVjF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F223ab264-3a8f-4d63-bbd4-92d70b6aecd6_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Finally, Dremio brings Generative AI directly into your data pipelines through Native AI SQL Functions. Functions like <code>AI_COMPLETE</code>, <code>AI_GENERATE</code>, and <code>AI_CLASSIFY</code> allow you to process unstructured data directly within a <code>SELECT</code> statement. 
You can extract structured fields from raw PDF blobs or classify customer sentiment without ever moving the data to an external machine learning service.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h883!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h883!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!h883!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!h883!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!h883!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h883!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Native AI SQL Functions extracting structured data from a raw PDF document&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Native AI SQL Functions extracting structured data from a raw PDF document" title="Native AI SQL Functions extracting structured data from a raw PDF document" srcset="https://substackcdn.com/image/fetch/$s_!h883!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!h883!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!h883!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!h883!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2939e2bd-962d-4b00-a980-84ebfcfdecd4_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Conclusion</strong></h2><p>Dremio is not a traditional data warehouse. It is a unified platform that eliminates data silos through a federated query engine, secures your object storage with an Iceberg-based lakehouse, and accelerates insights with an Agentic AI layer.</p><p>By building on open standards like Apache Iceberg, Apache Parquet, Apache Arrow, and Apache Polaris, you maintain full control of your data. You achieve interactive BI performance without vendor lock-in.</p><p>Ready to build your open data architecture? 
Take the next step:</p><ul><li><p><strong><a href="https://www.dremio.com/get-started">Try the free trial</a></strong></p></li><li><p><strong>Learn more about Dremio at a workshop or webinar</strong> (<a href="https://www.dremio.com/events">Events</a> and <a href="https://www.dremio.com/workshops">Workshops</a>)</p></li><li><p><strong>Download free books:</strong></p><ul><li><p><a href="https://drmevn.fyi/linkpageiceberg">FREE - Apache Iceberg: The Definitive Guide</a></p></li><li><p><a href="https://drmevn.fyi/linkpagepolaris">FREE - Apache Polaris: The Definitive Guide</a></p></li><li><p><a href="https://hello.dremio.com/wp-resources-agentic-ai-for-dummies-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Agentic AI for Dummies</a></p></li><li><p><a href="https://hello.dremio.com/wp-resources-agentic-analytics-guide-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Leverage Federation, The Semantic Layer and the Lakehouse for Agentic AI</a></p></li><li><p><a href="https://forms.gle/xdsun6JiRvFY9rB36">FREE with Survey - Understanding and Getting Hands-on with Apache Iceberg in 100 Pages</a></p></li><li><p><a href="https://www.puppygraph.com/ebooks/apache-iceberg-digest-vol-1">FREE - The Apache Iceberg Digest: Vol1</a></p></li></ul></li></ul>]]></content:encoded></item><item><title><![CDATA[What Is a Semantic Layer? 
A Complete Guide]]></title><description><![CDATA[Ask three teams in your company how they calculate &#8220;revenue&#8221; and you&#8217;ll get three answers.]]></description><link>https://amdatalakehouse.substack.com/p/what-is-a-semantic-layer-a-complete</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/what-is-a-semantic-layer-a-complete</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Mon, 04 May 2026 13:03:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tyTN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tyTN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tyTN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!tyTN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!tyTN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tyTN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tyTN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1104502,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/189276974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tyTN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!tyTN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 848w, 
https://substackcdn.com/image/fetch/$s_!tyTN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!tyTN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa02964d-95ab-4dff-b65e-ed5622baa44f_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Ask three teams in your company how they calculate &#8220;revenue&#8221; and you&#8217;ll get three answers. 
Sales counts bookings. Finance counts recognized revenue. Marketing counts pipeline value. All three call it &#8220;revenue.&#8221; All three get different numbers. Nobody knows which one is right.</p><p>This is the problem a semantic layer solves.</p><h2><strong>What a Semantic Layer Actually Is</strong></h2><p>A semantic layer is a logical abstraction between your raw data and the people (or AI agents) querying it. It maps technical database objects &#8212; tables, columns, join paths &#8212; to business-friendly terms like &#8220;Revenue,&#8221; &#8220;Active Customer,&#8221; or &#8220;Churn Rate.&#8221;</p><p>It&#8217;s not a database. It doesn&#8217;t store data. It&#8217;s a layer of definitions, calculations, and context that ensures every query against your data produces consistent results, regardless of which tool or person runs it.</p><p>The concept isn&#8217;t new. Business Objects introduced &#8220;universes&#8221; in the 1990s &#8212; metadata models that let users drag and drop business concepts instead of writing SQL. What&#8217;s changed is scope. 
Modern semantic layers are universal (not tied to one BI tool), AI-aware (they provide context to language models), and governance-integrated (they enforce access policies alongside definitions).</p><h2><strong>What a Semantic Layer Contains</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kNZ1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kNZ1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!kNZ1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!kNZ1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!kNZ1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kNZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp" width="640" height="640" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Five key components of a semantic layer connected to a central hub&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Five key components of a semantic layer connected to a central hub" title="Five key components of a semantic layer connected to a central hub" srcset="https://substackcdn.com/image/fetch/$s_!kNZ1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!kNZ1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!kNZ1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!kNZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7535e330-1a53-483e-87d5-e968970f4f99_640x640.webp 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A complete semantic layer includes six components:</p><table><thead><tr><th>Component</th><th>What It Does</th></tr></thead><tbody><tr><td><strong>Virtual datasets (Views)</strong></td><td>SQL-defined business logic applied once and reused everywhere</td></tr><tr><td><strong>Metric definitions</strong></td><td>Canonical calculations for KPIs (e.g., MRR = SUM of active subscription revenue)</td></tr><tr><td><strong>Documentation</strong></td><td>Human- and machine-readable descriptions of tables, columns, and relationships</td></tr><tr><td><strong>Labels and tags</strong></td><td>Categorization for governance (PII, Finance) and discovery</td></tr><tr><td><strong>Join relationships</strong></td><td>Pre-defined join paths so users don&#8217;t need to know foreign keys</td></tr><tr><td><strong>Access policies</strong></td><td>Row-level security and column masking enforced at the layer</td></tr></tbody></table><p>The key insight: these components serve both human analysts and AI agents. 
When an AI generates SQL from a natural language question, it consults this same layer to understand what &#8220;revenue&#8221; means, which tables to join, and which columns to filter.</p><h2><strong>How It Works in Practice</strong></h2><p>Here&#8217;s what happens when someone queries data through a semantic layer:</p><ol><li><p>A user (or AI agent) asks: &#8220;What was revenue by region last quarter?&#8221;</p></li><li><p>The semantic layer translates:</p><ul><li><p>&#8220;Revenue&#8221; &#8594; <code>SUM(orders.total) WHERE orders.status = 'completed'</code></p></li><li><p>&#8220;Region&#8221; &#8594; <code>customers.region</code></p></li><li><p>&#8220;Last quarter&#8221; &#8594; <code>WHERE order_date BETWEEN '2025-10-01' AND '2025-12-31'</code></p></li></ul></li><li><p>The query engine generates optimized SQL against the underlying data sources.</p></li><li><p>Results are returned using business terms, not raw column names.</p></li></ol><p>The user never writes SQL. The AI never guesses at column names. The metric definition is applied identically whether the query runs in a dashboard, a Python notebook, or a chat interface.</p><h2><strong>Why It Matters Now More Than Ever</strong></h2><p>Three trends are making semantic layers essential, not optional.</p><p><strong>AI agents need business context.</strong> Large language models generating SQL tend to hallucinate column names, use incorrect aggregation logic, and join tables incorrectly unless they have explicit definitions to work from. A semantic layer provides that grounding. 
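</p><p>As a rough illustration of that grounding, the translation steps from the walkthrough above can be sketched in Python with the standard-library <code>sqlite3</code> module. This is a minimal sketch, not how Dremio or any specific product implements its semantic layer: the <code>SEMANTIC_LAYER</code> dictionary, the <code>build_query</code> and <code>run_demo</code> helpers, the <code>customer_id</code> join key, and the sample rows are all hypothetical. Only the revenue, region, and last-quarter definitions come from the walkthrough.</p>
<pre>
```python
import sqlite3

# Hypothetical semantic-layer definitions: business terms mapped to SQL fragments.
# The revenue, region, and last-quarter fragments mirror the walkthrough;
# the dictionary layout and the join key are assumptions made for this sketch.
SEMANTIC_LAYER = {
    "metrics": {"revenue": "SUM(orders.total)"},
    "metric_filters": {"revenue": "orders.status = 'completed'"},
    "dimensions": {"region": "customers.region"},
    "time_ranges": {
        "last quarter": "orders.order_date BETWEEN '2025-10-01' AND '2025-12-31'"
    },
    "joins": "orders JOIN customers ON orders.customer_id = customers.id",
}


def build_query(metric, dimension, time_range):
    """Translate business terms into one SQL statement using the shared definitions."""
    layer = SEMANTIC_LAYER
    dim_sql = layer["dimensions"][dimension]
    return (
        f"SELECT {dim_sql} AS {dimension}, {layer['metrics'][metric]} AS {metric} "
        f"FROM {layer['joins']} "
        f"WHERE {layer['metric_filters'][metric]} AND {layer['time_ranges'][time_range]} "
        f"GROUP BY {dim_sql} ORDER BY {dimension}"
    )


def run_demo():
    """Run the translated query against a tiny in-memory dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, customer_id INTEGER,
            total REAL, status TEXT, order_date TEXT
        );
        INSERT INTO customers VALUES (1, 'EMEA'), (2, 'AMER');
        INSERT INTO orders VALUES
            (1, 1, 100.0, 'completed', '2025-10-15'),
            (2, 1, 50.0,  'cancelled', '2025-11-01'),
            (3, 2, 200.0, 'completed', '2025-12-20'),
            (4, 2, 75.0,  'completed', '2026-01-05');
        """
    )
    rows = conn.execute(build_query("revenue", "region", "last quarter")).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # The cancelled order and the January order are excluded by the shared definitions.
    print(run_demo())
```
</pre>
<p>Because every caller goes through the same definitions, a dashboard, a notebook, and an AI agent asking the same question would all get the same completed-orders revenue figure.</p><p>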
This is why platforms like <a href="https://www.dremio.com/blog/agentic-analytics-semantic-layer/?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Dremio embed a semantic layer directly into the query engine</a> &#8212; it&#8217;s the context that makes the AI accurate instead of confidently wrong.</p><p><strong>Self-service analytics demands accessibility.</strong> Business users want to query data without filing a ticket. Exposing raw database schemas to non-technical users creates more problems than it solves. A semantic layer presents data in terms people already understand.</p><p><strong>Governance requires centralized definitions.</strong> GDPR, CCPA, and industry regulations require organizations to know what data they have, who can access it, and how it&#8217;s used. A semantic layer centralizes these definitions and enforces access policies in one place instead of across dozens of tools.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vtLw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vtLw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!vtLw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 848w, 
https://substackcdn.com/image/fetch/$s_!vtLw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!vtLw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vtLw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Without vs. with a semantic layer &#8212; from metric chaos to alignment&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Without vs. with a semantic layer &#8212; from metric chaos to alignment" title="Without vs. 
with a semantic layer &#8212; from metric chaos to alignment" srcset="https://substackcdn.com/image/fetch/$s_!vtLw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!vtLw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!vtLw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!vtLw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ba55d-e927-4f1b-999c-9ec45ee48372_640x640.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 
0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Common Misconceptions</strong></h2><p><strong>&#8220;It&#8217;s just a data catalog.&#8221;</strong> A data catalog is an inventory &#8212; it tells you what data exists. A semantic layer defines what data <em>means</em> and how to calculate it. You need both. They&#8217;re complementary, not interchangeable. (See: Semantic Layer vs. Data Catalog)</p><p><strong>&#8220;It&#8217;s just a BI tool feature.&#8221;</strong> Some BI tools include semantic models (Looker&#8217;s LookML, Power BI&#8217;s datasets). But these are tool-specific. If your organization uses three BI tools, you maintain three separate semantic models. A universal semantic layer defines metrics once and serves them to every tool.</p><p><strong>&#8220;It adds a performance penalty.&#8221;</strong> Modern semantic layers don&#8217;t just translate queries &#8212; they optimize them. Dremio, for example, uses <a href="https://www.dremio.com/blog/5-ways-dremio-reflections-outsmart-traditional-materialized-views/?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Reflections</a> (pre-computed, physically optimized data copies) to accelerate queries that pass through its semantic layer. The result is often faster than querying raw tables directly.</p><h2><strong>What to Do Next</strong></h2><p>Pick your organization&#8217;s five most important metrics. Ask two different teams how each one is calculated. If the answers don&#8217;t match, that&#8217;s your signal. 
You don&#8217;t have a semantic layer problem &#8212; you have a trust problem, and a semantic layer is how you fix it.</p><p><a href="https://www.dremio.com/get-started?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Try Dremio Cloud free for 30 days</a></p>]]></content:encoded></item><item><title><![CDATA[Semantic Layer: The Definitive Guide]]></title><description><![CDATA[The term &#8220;semantic layer&#8221; has been part of the data industry&#8217;s vocabulary for over 35 years.]]></description><link>https://amdatalakehouse.substack.com/p/semantic-layer-the-definitive-guide</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/semantic-layer-the-definitive-guide</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Fri, 01 May 2026 13:23:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5jrL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5jrL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5jrL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 424w, 
https://substackcdn.com/image/fetch/$s_!5jrL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!5jrL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!5jrL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5jrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1439281,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196111291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5jrL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!5jrL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!5jrL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!5jrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe806063b-fdf0-4ce5-b5e9-e6537c6f3d2d_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>The term &#8220;semantic layer&#8221; has been part of the data industry&#8217;s vocabulary for over 35 years. It first appeared in a 1991 patent filing by Business Objects, and it has since been reinvented, abandoned, and reinvented again across three distinct eras of data architecture. Today, it sits at the center of one of the most consequential design debates in the industry: should the semantic layer be a standalone product you bolt onto your stack, or a native capability of the platform that already manages your data?</p><p>This guide covers the full arc: what a semantic layer is, where it came from, how it split into two competing architectural approaches, and why the choice between them determines whether your AI agents produce accurate answers or hallucinated nonsense.</p><h2><strong>What a Semantic Layer Actually Is</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aQyB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aQyB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 424w, 
https://substackcdn.com/image/fetch/$s_!aQyB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!aQyB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!aQyB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aQyB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The semantic layer sits between raw data sources and consumers, providing metric consistency, access governance, and query abstraction&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The semantic layer sits between raw data sources and consumers, providing metric consistency, access governance, and query abstraction" title="The semantic layer sits between raw data sources and consumers, providing metric consistency, access governance, and query abstraction" 
srcset="https://substackcdn.com/image/fetch/$s_!aQyB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!aQyB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!aQyB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!aQyB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe06e22ba-33c4-43d2-803e-ac0a3e6b5af6_800x800.webp 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>A semantic layer is an abstraction that maps the physical structure of your data (table names, column names, join logic, filter conditions) to the business terms that people actually use (revenue, churn rate, active customer, cost per acquisition). It sits between the raw data and every consumer of that data: BI dashboards, AI agents, Python notebooks, Excel spreadsheets, and custom applications.</p><p>The semantic layer has three core responsibilities:</p><p><strong>Metric consistency.</strong> When the finance team says &#8220;revenue,&#8221; they mean recognized revenue net of refunds. When the sales team says &#8220;revenue,&#8221; they mean bookings including pending deals. Without a semantic layer, both teams write their own SQL, get different numbers, and spend the next two weeks arguing about which dashboard is right. A semantic layer defines &#8220;revenue&#8221; once, in one place, and every downstream consumer uses that definition.</p><p><strong>Access governance.</strong> The semantic layer controls who sees what. A marketing analyst querying customer data should not see Social Security numbers. A regional manager should only see data for their region. These rules (row-level security, column masking, role-based access) are defined at the semantic layer and enforced consistently regardless of which tool is doing the querying.</p><p><strong>Query abstraction.</strong> Business users and AI agents should not need to know that &#8220;customer churn rate&#8221; requires joining three tables, filtering out test accounts, calculating a 90-day rolling window, and dividing by the active customer count from the prior period. 
The <a href="https://www.dremio.com/platform/unified-analytics/ai-semantic-layer/">semantic layer</a> encapsulates that logic in a reusable definition. Consumers ask for &#8220;churn rate&#8221; and get the right answer.</p><h2><strong>The Origin Story: Business Objects, 1991</strong></h2><p>The semantic layer was invented to solve a simple problem: business users could not write SQL.</p><p>In 1991, Business Objects filed a patent for a &#8220;relational database access system using semantically dynamic objects.&#8221; The product feature was called &#8220;Universes.&#8221; It worked like this: a data architect would build a metadata model that mapped physical database tables and join paths into business-friendly objects (&#8220;Customer,&#8221; &#8220;Product,&#8221; &#8220;Sales Amount&#8221;). Report builders could then drag and drop these objects to create queries without touching SQL.</p><p>This was a significant advance. Before Universes, generating a report from a relational database required either a developer who understood the schema or a business user willing to learn SQL. Business Objects eliminated that requirement entirely.</p><p>IBM&#8217;s Cognos followed with &#8220;Framework Manager,&#8221; which served the same purpose: map the physical database into a logical, business-friendly model. SAP built InfoProviders and BEx queries on top of SAP BW. Microsoft introduced SQL Server Analysis Services.</p><p>Every major enterprise BI vendor in the 1990s built some version of a semantic layer. But they all shared the same fundamental limitation: <strong>the semantic layer was proprietary and locked to a single vendor&#8217;s BI tool.</strong> If you built your metrics in Business Objects Universes, those definitions did not carry over to Cognos. If you modeled your data in SSAS, Tableau could not read it. 
The semantic layer existed, but it was a walled garden.</p><h2><strong>OLAP Cubes: The Implicit Semantic Layer</strong></h2><p>Running parallel to the relational semantic layer was the OLAP (Online Analytical Processing) cube. Products like SQL Server Analysis Services, Cognos TM1, and Oracle Essbase pre-computed data into multidimensional structures: dimensions (Customer, Product, Time), measures (Revenue, Quantity, Profit), and hierarchies (Year &gt; Quarter &gt; Month &gt; Day).</p><p>The cube itself functioned as a semantic layer. Business users did not query tables; they navigated dimensions. They did not write SQL; they used MDX (Multidimensional Expressions) or simply clicked through pivot-table interfaces. The business logic was baked into the cube&#8217;s structure.</p><p>OLAP cubes worked well for their era. Pre-computing aggregations meant that analytical queries returned in seconds, even on the hardware of the early 2000s. But they had three fatal weaknesses:</p><ol><li><p><strong>Rigidity.</strong> Adding a new dimension or changing a hierarchy required rebuilding the cube, which could take hours for large datasets. Business requirements change faster than cubes can be rebuilt.</p></li><li><p><strong>Cost.</strong> Cubes stored pre-aggregated copies of data. For large organizations, this meant maintaining terabytes of redundant, pre-computed data on expensive storage.</p></li><li><p><strong>Specialization.</strong> Operating an OLAP cube required specialized skills (MDX, cube design, aggregation strategies) that most data teams did not have.</p></li></ol><p>As cloud data warehouses like Snowflake, BigQuery, and Redshift made raw compute cheap and fast, the need for pre-aggregation declined. You could run the analytical query directly against the detail data and get results in seconds. 
The cube&#8217;s primary value proposition, speed through pre-computation, was no longer unique.</p><h2><strong>The Self-Service Era and the Loss of the Semantic Layer</strong></h2><p>The 2010s brought a dramatic shift. Self-service BI tools like Tableau and Power BI connected directly to databases, bypassing the semantic layer entirely. This was marketed as democratization: give every analyst direct access to the data, and they will find their own insights.</p><p>For small teams, this worked. For organizations with more than a handful of analysts, it created a problem that the industry calls &#8220;metric drift.&#8221; Without a centralized semantic layer, each analyst wrote their own SQL. Each SQL query embedded its own business logic. Revenue was calculated five different ways by five different people, and no one could agree on which number was correct.</p><p>The first response to metric drift came from <a href="https://cloud.google.com/looker">Looker</a>, founded in 2012, which introduced LookML as a code-based semantic layer. You defined your metrics, dimensions, and relationships in version-controlled modeling files. This was a meaningful evolution because it separated the semantic logic from the BI tool&#8217;s proprietary report format. Google acquired Looker for $2.6 billion in 2019, validating that the semantic layer was worth owning. But LookML was still tied to Looker&#8217;s ecosystem. If you used Tableau or Power BI as your primary BI tool, LookML did not help.</p><p>The broader industry realization was clear: <strong>skipping the semantic layer does not eliminate the need for one. It just distributes the problem across every team and every dashboard, where it becomes harder to find and harder to fix.</strong></p><h2><strong>Dremio: The Semantic Layer Built Into the Query Engine From Day One</strong></h2><p>While Looker was coupling the semantic layer to a BI tool, a different approach was emerging. 
Dremio was founded in 2015 by Tomer Shiran and Jacques Nadeau, creators of and contributors to the Apache Drill project. When Dremio publicly launched in July 2017, it introduced what it called a &#8220;governed, self-service semantic layer&#8221; as a core architectural component, not an add-on.</p><p>The key difference: Dremio&#8217;s semantic layer was integrated directly into a high-performance query engine. From its first release, the platform shipped with:</p><ul><li><p><strong>Virtual Datasets (Views).</strong> SQL-defined business logic that users could create, share, and layer on top of any connected data source. No data movement required.</p></li><li><p><strong>Reflections.</strong> Patented, transparent materialized views that the query optimizer substitutes automatically. Users query the governed view; Dremio serves the fastest available Reflection behind the scenes.</p></li><li><p><strong>Federated access.</strong> The semantic layer worked across data sources (S3, HDFS, relational databases) from the start, not just against a single warehouse.</p></li></ul><p>Dremio added Wikis and Labels (Tags) in subsequent releases, providing Markdown-formatted documentation and classification metadata directly attached to datasets in the catalog. This meant the semantic layer was not just a set of views; it included the context that made those views discoverable and understandable.</p><p>This was architecturally distinct from every other semantic layer on the market. AtScale (founded 2013) and Cube (open-sourced 2019) built the semantic layer as a separate product. Dremio built it into the same platform that executed the queries and managed the catalog. 
That design decision would become increasingly important as AI agents entered the picture.</p><h2><strong>The Modern Resurgence: Two Divergent Paths</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SQox!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SQox!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!SQox!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!SQox!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!SQox!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SQox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The semantic layer evolved from 1991 through OLAP cubes and self-service BI into two divergent paths: standalone products and platform-integrated semantic layers&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The semantic layer evolved from 1991 through OLAP cubes and self-service BI into two divergent paths: standalone products and platform-integrated semantic layers" title="The semantic layer evolved from 1991 through OLAP cubes and self-service BI into two divergent paths: standalone products and platform-integrated semantic layers" srcset="https://substackcdn.com/image/fetch/$s_!SQox!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!SQox!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!SQox!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!SQox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff170ea82-b05b-4ded-a9d0-6e045fe064ee_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>By the early 2020s, the semantic layer was firmly back. dbt Labs acquired Transform (the creators of MetricFlow) in February 2023 to build a code-based metrics layer. Cube had open-sourced its API-first semantic layer in 2019 and launched Cube Cloud commercially in 2021. 
AtScale had been building its enterprise virtualization layer since 2013.</p><p>The market had split into two fundamentally different architectural forms, and the choice between them has significant consequences for how your data platform operates.</p><p><strong>Path 1: The semantic layer as a standalone product.</strong> Companies like AtScale (2013), Cube (2019), and dbt (MetricFlow, 2023) built the semantic layer as a separate service that sits between your data warehouse and your BI tools. You deploy it as its own infrastructure, manage it as its own system, and integrate it with your existing stack.</p><p><strong>Path 2: The semantic layer as a platform feature.</strong> <a href="https://www.dremio.com/blog/agentic-analytics-semantic-layer/">Dremio</a> (2017) integrated the semantic layer directly into its query engine and data catalog from the start. There is no separate service to deploy. The semantic layer is a native capability of the same platform that stores, governs, and queries your data.</p><p>Both approaches solve the metric consistency problem. They differ in how they solve it, what they require operationally, and how well they extend to AI-driven analytics.</p><h2><strong>Path 1: The Semantic Layer as a Standalone Product</strong></h2><p>Three standalone semantic layer products dominate the current market. Each targets a different architecture and team profile.</p><h3><strong>AtScale (Founded 2013)</strong></h3><p>AtScale, founded by veterans of the Yahoo data team, positions itself as a &#8220;universal semantic layer&#8221; for large enterprises. It creates a virtualization layer across multiple data warehouses (Snowflake, BigQuery, Databricks), presenting a unified semantic model to BI tools. 
Its strongest feature is native MDX and DAX compatibility, which makes it a natural fit for organizations with heavy Excel and SSAS dependencies.</p><p>AtScale excels when you have data spread across multiple warehouses and need a single semantic model that works across all of them. The tradeoff is infrastructure complexity and licensing cost. AtScale requires dedicated infrastructure, and its enterprise pricing model reflects its positioning.</p><h3><strong>Cube (Open-Sourced 2019)</strong></h3><p><a href="https://cube.dev/">Cube</a> started as Statsbot in 2016 before pivoting to become an open-source, API-first semantic layer in 2019. It provides REST, GraphQL, SQL, MDX, and DAX APIs, making it the most flexible option for embedded analytics and customer-facing dashboards. Cube Cloud launched commercially in 2021. Cube&#8217;s pre-aggregation engine can deliver sub-second responses for complex queries by pre-computing results and caching them.</p><p>Cube excels when your primary consumer is a custom application, not a BI tool. The tradeoff is operational overhead: Cube runs as a separate server, requires its own infrastructure, and demands expertise in designing pre-aggregation strategies to achieve optimal performance.</p><h3><strong>dbt Semantic Layer (MetricFlow, 2023)</strong></h3><p>The dbt Semantic Layer is powered by MetricFlow, the technology dbt Labs acquired when it purchased Transform in February 2023. It lets teams define metrics as code in YAML files within their existing dbt projects. Metrics are version-controlled, reviewed via pull requests, and deployed alongside your dbt transformations. In late 2025, dbt Labs moved MetricFlow to an Apache 2.0 license, signaling a commitment to open, portable metric definitions.</p><p>The dbt Semantic Layer excels when your team is already a dbt shop and wants metrics managed in the same Git-based workflow as transformations. 
The tradeoff is that it requires dbt Cloud for the serving layer, lacks native caching (relying on the underlying warehouse for query execution), and is less suited for high-concurrency embedded applications.</p><h3><strong>The Structural Tradeoff of Standalone Products</strong></h3><p>All three standalone products share the same architectural limitation: they exist as a separate layer of infrastructure that must integrate with your data catalog, your governance system, and your query engine.</p><p>This means:</p><ul><li><p><strong>Another system to operate.</strong> You deploy it, monitor it, upgrade it, and debug it.</p></li><li><p><strong>Governance is a separate concern.</strong> Access policies defined in your catalog or warehouse must be replicated or synced with the semantic layer. Any gap is a security risk.</p></li><li><p><strong>No native execution.</strong> Standalone semantic layers define metrics but do not execute queries. They translate user requests into SQL and send that SQL to an external engine. If the engine and the semantic layer disagree on the data model, you get wrong results.</p></li><li><p><strong>Sync lag.</strong> When you change a table schema, add a column, or update governance rules, the semantic layer must be updated separately. Until it is, your definitions are stale.</p></li></ul><p>For teams with a single data warehouse, a strong DevOps practice, and a primary use case that matches one of these products, standalone semantic layers work well. 
For teams managing federated data across multiple sources, or teams building AI-driven analytics, the gap between &#8220;definition&#8221; and &#8220;execution&#8221; creates friction that compounds over time.</p><h2><strong>Path 2: The Semantic Layer as a Platform Feature</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!akkQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!akkQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!akkQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!akkQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!akkQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!akkQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Dremio's architecture integrating semantic layer, query engine, and open catalog in a single platform&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dremio's architecture integrating semantic layer, query engine, and open catalog in a single platform" title="Dremio's architecture integrating semantic layer, query engine, and open catalog in a single platform" srcset="https://substackcdn.com/image/fetch/$s_!akkQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!akkQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!akkQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!akkQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cf92d92-91e6-4dc4-a431-57fd97352862_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The alternative is to build the semantic layer into the same platform that manages your data catalog, governs access, and executes queries. This is the approach <a href="https://www.dremio.com/blog/the-ai-foundation-of-the-agentic-lakehouse/">Dremio</a> takes.</p><p>In Dremio, the semantic layer is not a separate product you bolt on.
It is a native set of capabilities (views, wikis, labels, lineage, knowledge graph) integrated with the <a href="https://www.dremio.com/platform/enterprise-data-catalog/">Open Catalog</a> (built on Apache Polaris), the MPP query engine (built on Apache Arrow), and the governance system (Fine-Grained Access Control, row-level security, column masking).</p><p>This matters because the three activities that define a semantic layer (defining metrics, governing access, and executing queries) all happen in the same system. There is no handoff, no sync, no governance gap.</p><h2><strong>How Dremio&#8217;s Semantic Layer Works</strong></h2><p>Dremio&#8217;s <a href="https://www.dremio.com/platform/unified-analytics/ai-semantic-layer/">AI Semantic Layer</a> is built from five components that work together (views, wikis, labels, lineage, and the knowledge graph), with semantic search indexing them for discovery.</p><h3><strong>Views (Virtual Datasets)</strong></h3><p>Views are the foundation. A view is a SQL-defined virtual dataset that encapsulates business logic: joins, filters, calculations, and transformations. You write the SQL once, and every consumer (BI tool, AI agent, Python notebook) queries the view instead of the raw tables.</p><p>Dremio recommends a three-layer architecture for views:</p><ul><li><p><strong>Preparation Layer.</strong> One view per source table. Handles type casting, column renaming, and null handling. A direct 1:1 mapping of raw data into clean, standardized form.</p></li><li><p><strong>Business Layer.</strong> Shared business logic. This is where you define &#8220;active customer&#8221; (customers with at least one order in the last 90 days, excluding test accounts), &#8220;revenue&#8221; (<code>order_amount</code> minus refunds, in USD), and every other metric that needs a single definition.</p></li><li><p><strong>Application Layer.</strong> Tailored datasets for specific consumers. A marketing dashboard view joins customer demographics with campaign performance.
An AI agent view exposes the most commonly asked metrics with rich column-level documentation.</p></li></ul><p>Because views are virtual, they do not copy or move data. They execute against the underlying data at query time, using Dremio&#8217;s <a href="https://www.dremio.com/blog/why-agentic-analytics-requires-federation-virtualization-and-the-lakehouse-how-dremio-delivers/">federated query engine</a> to pull from S3, PostgreSQL, Snowflake, MongoDB, or any connected source. Change the underlying data, and the view reflects it immediately.</p><h3><strong>Wikis</strong></h3><p>Wikis are Markdown-formatted documentation attached directly to spaces, sources, folders, tables, views, and columns. They serve two audiences: human analysts browsing the catalog, and AI agents generating SQL.</p><p>A wiki for a view called <code>analytics.customer_health</code> might contain:</p><pre><code><code>## Customer Health Score

Composite metric combining purchase frequency, support ticket volume,
and NPS survey responses over the trailing 90 days.

- **Owner:** Customer Success team
- **Refresh:** Updated daily by the ETL pipeline
- **Filters:** Excludes test accounts (account_type != 'test')
- **Churn definition:** Score below 30 for two consecutive months
</code></code></pre><p>Dremio can also auto-generate wiki content. The platform samples table data, analyzes column distributions, and produces context-rich descriptions using generative AI. This is particularly valuable for large data estates where manually documenting hundreds of tables is impractical.</p><h3><strong>Labels</strong></h3><p>Labels classify and organize data assets. You tag a table or column as <code>PII</code>, <code>Finance</code>, <code>Marketing</code>, <code>Approved</code>, or <code>Draft</code>. Labels serve two purposes: they improve discoverability (semantic search can filter results by label), and they integrate with governance rules (masking policies apply automatically to all <code>PII</code>-labeled columns).</p><p>Like wikis, labels can be AI-suggested. Dremio analyzes column names, data patterns, and content to recommend labels like <code>contains-email</code> or <code>likely-PII</code>.</p><h3><strong>Lineage</strong></h3><p>Dremio automatically tracks the flow of data from source to view to consumer. You can see which raw tables feed into which business views, and which dashboards or AI queries consume those views.</p><p>Lineage is critical for impact analysis. Before changing the schema of a source table, you can trace all downstream dependencies and understand exactly what will break. Without automated lineage, this analysis requires manually reading SQL definitions and hoping you did not miss one.</p><h3><strong>Knowledge Graph</strong></h3><p>The knowledge graph is the newest addition to Dremio&#8217;s semantic layer. It operates at a higher level than individual wikis and labels, building a connected graph of entity relationships, metric definitions, and usage patterns.</p><p>The knowledge graph works in three ways:</p><ol><li><p><strong>Pattern detection.</strong> It analyzes query patterns across your organization to detect implicit definitions.
If 80% of queries that reference &#8220;active customers&#8221; use the same WHERE clause (<code>last_order_date &gt; CURRENT_DATE - INTERVAL '90' DAY AND account_type != 'test'</code>), the knowledge graph surfaces that pattern as a candidate definition.</p></li><li><p><strong>User-defined context.</strong> You can provide business context as structured markdown files. These files define entities, relationships, and business rules that the knowledge graph ingests and makes available to AI agents.</p></li><li><p><strong>Relationship mapping.</strong> The knowledge graph connects related entities (customers are related to orders, orders contain products, products belong to categories) and exposes those relationships to AI agents, enabling more accurate multi-table SQL generation.</p></li></ol><h3><strong>Semantic Search</strong></h3><p>Semantic search lets users and AI agents discover data assets using natural language. Instead of browsing a schema tree looking for a table called <code>dwh_fact_cust_ord_line_item</code>, you search for &#8220;customer orders by product category&#8221; and find the relevant view, complete with its wiki documentation and labels.</p><p>Semantic search indexes wikis, labels, column names, table descriptions, and view definitions. 
It is the entry point for both human exploration and AI agent data discovery.</p><h2><strong>Why the Integrated Approach Changes Everything for AI</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZJAE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZJAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!ZJAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!ZJAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZJAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZJAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;How an AI agent uses the semantic layer to generate accurate SQL from a natural language question&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How an AI agent uses the semantic layer to generate accurate SQL from a natural language question" title="How an AI agent uses the semantic layer to generate accurate SQL from a natural language question" srcset="https://substackcdn.com/image/fetch/$s_!ZJAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!ZJAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!ZJAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZJAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52320f69-9aa6-440d-a311-94b14474d5a4_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The reason the platform-versus-product distinction matters more now than it did five years ago is AI. Specifically, AI agents that generate SQL from natural language questions.</p><p>An AI agent that receives the question &#8220;What was our customer churn rate by region last quarter?&#8221; needs three things to produce an accurate answer:</p><ol><li><p><strong>Context.</strong> What does &#8220;churn rate&#8221; mean in this organization? What table contains the data? Which columns are relevant? What filters should be applied? The semantic layer&#8217;s wikis, labels, views, and knowledge graph provide this context.</p></li><li><p><strong>Access.</strong> Can this user see the churn data?
Are there row-level filters based on their role? Are any columns masked? The governance system enforces these rules.</p></li><li><p><strong>Execution speed.</strong> The user expects an answer in seconds, not minutes. The query engine needs to be fast enough for interactive use.</p></li></ol><p>In a standalone semantic layer architecture, these three capabilities live in three different systems: the semantic layer product provides context, the data catalog or warehouse provides governance, and a separate query engine provides execution. The AI agent must coordinate across all three, and any mismatch between them produces wrong answers, security violations, or slow responses.</p><p>In Dremio&#8217;s architecture, all three are co-located. The <a href="https://www.dremio.com/ai-agent/">AI Agent</a> reads the wikis, labels, and view definitions from the semantic layer, generates SQL that respects governance rules, and executes the query on the built-in MPP engine. The entire loop happens within a single governed platform.</p><p>Dremio&#8217;s <a href="https://docs.dremio.com/current/developer/mcp-server/">MCP Server</a> extends this to external AI tools. ChatGPT, Claude Desktop, or any custom agent that supports the Model Context Protocol can connect to Dremio and query through the same governed semantic layer. 
The external AI agent receives the same business context, respects the same governance rules, and gets the same fast query execution as the built-in AI Agent.</p><p>The semantic layer teaches the AI your business language so it generates the right SQL, not generic SQL.</p><h2><strong>Platform vs Product: A Side-by-Side Comparison</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QNMb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QNMb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 424w, https://substackcdn.com/image/fetch/$s_!QNMb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 848w, https://substackcdn.com/image/fetch/$s_!QNMb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 1272w, https://substackcdn.com/image/fetch/$s_!QNMb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QNMb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png" width="1364" height="1628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1628,&quot;width&quot;:1364,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:440072,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/196111291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QNMb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 424w, https://substackcdn.com/image/fetch/$s_!QNMb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 848w, https://substackcdn.com/image/fetch/$s_!QNMb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 1272w, https://substackcdn.com/image/fetch/$s_!QNMb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7be4b3db-c12f-49f1-96d5-7b14142a59b3_1364x1628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>When a Standalone Product Fits</strong></h3><ul><li><p>You use a single data warehouse (Snowflake, BigQuery) and your semantic layer needs are limited to consistent BI metrics</p></li><li><p>Your team is already deeply invested in dbt and wants metrics alongside transformations</p></li><li><p>You are building customer-facing embedded analytics and need Cube&#8217;s pre-aggregation performance</p></li><li><p>You have heavy Excel/MDX requirements that only AtScale supports</p></li></ul><h3><strong>When the Platform Approach Fits</strong></h3><ul><li><p>Your data lives across multiple sources (S3, PostgreSQL, Snowflake, MongoDB) and you need federated access</p></li><li><p>You want governance rules defined once and enforced everywhere, including for AI agents</p></li><li><p>You are building or planning
AI-driven analytics (AI Agent, MCP, natural language querying)</p></li><li><p>You want to eliminate the operational overhead of managing a separate semantic layer product</p></li><li><p>You need the semantic layer, the catalog, and the query engine to operate as a single governed system</p></li></ul><h2><strong>Building Your Semantic Layer: A Practical Framework</strong></h2><p>If you are starting from scratch or migrating from an ad-hoc metric landscape, here is a practical sequence:</p><p><strong>Step 1: Identify your top 10 metrics.</strong> Not all metrics need to be in the semantic layer on day one. Start with the metrics that cause the most confusion: revenue, churn, active users, cost per acquisition, NPS. These are the metrics where two teams have two different SQL queries and two different numbers.</p><p><strong>Step 2: Build the layered view architecture.</strong> For each metric, create the three-layer view stack in Dremio. Preparation views clean the source data. Business views encode the agreed-upon logic. Application views tailor the output for specific consumers.</p><p><strong>Step 3: Add wikis and labels.</strong> Document each view and its columns. Define what the metric means, who owns it, how it is calculated, and what filters are applied. Tag columns with labels like <code>PII</code>, <code>Finance</code>, or <code>Approved</code>.</p><p><strong>Step 4: Configure governance.</strong> Apply Fine-Grained Access Control: row-level security for multi-tenant data, column masking for sensitive fields, role-based access for views. These rules are enforced at query time for every consumer, including AI agents.</p><p><strong>Step 5: Connect AI interfaces.</strong> Enable the <a href="https://www.dremio.com/blog/5-steps-to-supercharge-your-analytics-with-dremios-ai-agent-and-apache-iceberg/">Dremio AI Agent</a> for your team. Set up the MCP Server for external AI tools. 
The wikis and labels you added in Step 3 become the context that makes AI-generated SQL accurate.</p><p><strong>Step 6: Expand.</strong> Add the next 10 metrics. Build knowledge graph definitions for complex entity relationships. Let autonomous Reflections learn from query patterns and accelerate the most common queries automatically.</p><p>The semantic layer is not a one-time project. It is a living system that grows with your organization&#8217;s data needs. Start small, prove value on the metrics that matter most, and expand from there.</p><p><a href="https://www.dremio.com/get-started">Try Dremio Cloud free for 30 days</a> to build your semantic layer on top of your existing data sources with zero data movement and native AI agent support.</p><h3><strong>Free Resources to Go Deeper</strong></h3><ul><li><p><a href="https://hello.dremio.com/wp-resources-agentic-ai-for-dummies-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Agentic AI for Dummies</a></p></li><li><p><a href="https://hello.dremio.com/wp-resources-agentic-analytics-guide-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Leverage Federation, The Semantic Layer and the Lakehouse for Agentic AI</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Apache Data Lakehouse Weekly: April 23–29, 2026]]></title><description><![CDATA[Three weeks past the Iceberg Summit, the lakehouse projects shifted from in-person alignment back into shipping mode.]]></description><link>https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-29b</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-29b</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Thu, 30 Apr 2026 13:02:35 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!9SRl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9SRl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9SRl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!9SRl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!9SRl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!9SRl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9SRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png" width="1456" height="637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1710779,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195868776?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9SRl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!9SRl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!9SRl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!9SRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a1c374-3159-487d-9c9d-19358d4bac1e_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Three weeks past the Iceberg Summit, the lakehouse projects shifted from in-person alignment back into shipping mode. Polaris cut its 1.4.0 release and immediately followed up with Python CLI 1.4.0, Arrow shipped its 24.0.0 major release and kicked off an arrow-rs 58.2.0 vote, and Parquet&#8217;s design lists stayed dense with proposals on footers, page encoding, and a new Java release discussion. Iceberg&#8217;s dev list was quieter this week as contributors digested summit follow-ups and kept narrowing down V4 design questions in the background.</p><h2><strong>Apache Iceberg</strong></h2><p>The post-summit wave of formal proposals continued translating into design work this week.
The V4 metadata.json optionality direction that has anchored multiple syncs &#8212; treating catalog-managed metadata as a first-class supported mode while keeping static-table portability through explicit opt-in semantics &#8212; is still the defining V4 design conversation, with Anton Okolnychyi, Yufei Gu, Shawn Chang, and Steven Wu continuing to push edge cases on portability and Spark driver behavior. The single-file commits proposal that Russell Spitzer and Amogh Jahagirdar have been advancing remains on track for a formal write-up, with the latency and metadata footprint reductions driving urgency.</p><p>P&#233;ter V&#225;ry&#8217;s <a href="https://www.mail-archive.com/dev@iceberg.apache.org/msg12972.html">efficient column updates proposal</a> for wide tables continued attracting collaboration. The design &#8212; write only the columns that change on each commit, then stitch the result at read time &#8212; is squarely aimed at petabyte-scale feature stores with thousands of embedding and model-score columns, and the I/O savings make it one of the more practically grounded V4 proposals on the list. Anurag Mantripragada and G&#225;bor Kaszab are working alongside P&#233;ter on POC benchmarks to support the formal proposal that should land on the dev list in the coming weeks.</p><p>On the Rust side, the <a href="https://www.mail-archive.com/dev@iceberg.apache.org/msg12986.html">Iceberg Rust 0.9.0 release</a> shipped earlier this development cycle and continues to anchor downstream adoption discussions, with its DataFusion integration making it a serious option for teams that want Iceberg without a JVM dependency. Iceberg Summit 2026 session recordings are also rolling out on the project&#8217;s YouTube channel this week, giving the global community access to the V4 design talks, the vendor panel, and the production case studies from Apple, Bloomberg, Pinterest, and others. 
The AI contribution policy that Holden Karau, Kevin Liu, Steve Loughran, and Sung Yun pushed through March is still expected to land as published guidance covering disclosure requirements and code provenance standards.</p><h2><strong>Apache Polaris</strong></h2><p>Polaris had its biggest release week of the year. Adnan Hemani <a href="https://mail-archive.com/dev@polaris.apache.org/msg04499.html">announced Apache Polaris 1.4.0</a> on April 23, the project&#8217;s first major release as a graduated top-level project. Dmitri Bourlatchkov, Yufei Gu, Xi Wen, and Alexandre Dutra all weighed in with congratulations and follow-up notes on packaging and distribution. Right behind it, Adnan kicked off and shepherded the <a href="https://mail-archive.com/dev@polaris.apache.org/msg04509.html">Apache Polaris Python CLI 1.4.0 RC2 vote</a>, which collected binding +1s from Yufei Gu, Honah J., and Jean-Baptiste Onofr&#233;, with Yong Zheng adding non-binding support. The <a href="https://mail-archive.com/dev@polaris.apache.org/msg04551.html">Python CLI 1.4.0 release</a> shipped on April 28, completing the back-to-back release pair. Jean-Baptiste also confirmed in a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04476.html">HEADS UP note</a> that the project is now back on a monthly release cadence after the graduation transition.</p><p>The release had its share of post-launch fires. Alexandre Dutra opened threads on <a href="https://mail-archive.com/dev@polaris.apache.org/msg04512.html">Helm chart repo inconsistency after the 1.4.0 release</a>, <a href="https://mail-archive.com/dev@polaris.apache.org/msg04513.html">a release workflow failure in step 4</a>, and an <a href="https://mail-archive.com/dev@polaris.apache.org/msg04514.html">Artifact Hub request for official status</a>. 
A <a href="https://mail-archive.com/dev@polaris.apache.org/msg04544.html">GitHub thread on KMS-related errors after bumping to 1.4.0</a> surfaced a real upgrade bug that drew immediate attention. Yufei Gu took the lead on triaging most of these, and the discussions are doing exactly what a healthy post-release cycle should &#8212; surfacing rough edges before they reach more users.</p><p>Design discussions stayed active alongside the release work. EJ Wang&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04485.html">DISCUSS thread on AGENTS.md for Polaris</a> opened a conversation about adding agent-readable repository metadata, picking up engagement from Yufei Gu. Yufei separately started <a href="https://mail-archive.com/dev@polaris.apache.org/msg04486.html">a discussion on narrowing the scope of SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION</a>, which Dmitri Bourlatchkov and Dennis Huo dug into. ITing Lee&#8217;s <a href="https://mail-archive.com/dev@polaris.apache.org/msg04430.html">proposal to add OpenLineage to Polaris</a> continued attracting feedback from Adnan Hemani, Jean-Baptiste Onofr&#233;, Yufei Gu, and Michael Collado. Alexandre Dutra&#8217;s URL path decoding thread and his <a href="https://mail-archive.com/dev@polaris.apache.org/msg04429.html">PolarisPrivilege fields and grant validation</a> discussion both kept multiple contributors engaged through the week, and Selvamohan Neethiraj raised a <a href="https://mail-archive.com/dev@polaris.apache.org/msg04496.html">PolarisPrincipal user attributes server-side bug</a> that Alexandre and Yufei traced through.</p><h2><strong>Apache Arrow</strong></h2><p>Arrow had its own back-to-back release week. Ra&#250;l Cumplido <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34611.html">announced Apache Arrow 24.0.0</a> on April 22, closing out the <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34606.html">24.0.0 RC0 vote</a> that spanned mid-April. 
Matt Topol followed with the <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34613.html">Apache Arrow Go 18.6.0 RC0 vote</a> on April 22 and announced the <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34629.html">release result</a> on April 28, with Pedro Matias, Ian Cook, David Li, and Bryce Mecum carrying the verification work. Andrew Lamb then opened the <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34631.html">arrow-rs 58.2.0 RC1 vote</a> on April 28, with Bryce Mecum, Ed Seidl, Jeffrey Vo, and Ra&#250;l Cumplido moving quickly through verification &#8212; finishing what last week&#8217;s newsletter flagged as the next ship to watch.</p><p>Beyond releases, the design conversations stayed lively. Emil Sadek opened a <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34619.html">DISCUSS thread on an ADBC Logo Proposal</a> with Nic Crane, Julian Hyde, and Rusty Conover weighing in on visual identity for the database connectivity standard. Benjamin Philip kicked off a new <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34628.html">DISCUSS thread on Arrow Erlang&#8217;s grant documents</a>, continuing the project&#8217;s expansion into more language ecosystems. The <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34576.html">pyarrow-stubs donation vote</a> that Rok Mihevc opened on April 14 stayed active, drawing additional support this week with Rok pushing for a final tally. Mandukhai Alimaa&#8217;s earlier <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34604.html">proposal for a canonical BigDecimal extension type</a> and Andrew Lamb&#8217;s <a href="https://www.mail-archive.com/dev@arrow.apache.org/msg34610.html">arrow-rs security policy discussion</a> both continued generating engagement as the project tightens its production posture.</p><h2><strong>Apache Parquet</strong></h2><p>Parquet&#8217;s lists were as dense as any project&#8217;s this week. 
Isma&#235;l Mej&#237;a opened a <a href="https://mail-archive.com/dev@parquet.apache.org/msg27247.html">thread soliciting code reviews for Java performance optimization work</a>, with Steve Loughran picking it up immediately. Manu Zhang&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27212.html">DISCUSS thread on a new parquet-java release</a> drew sustained engagement from Steve Loughran, Aaron Niskode-Dossett, Fokko Driesprong, Julien Le Dem, Gang Wu, and Rahil C &#8212; covering both the timing question and what should ship in the next release. Julien Le Dem&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27227.html">Parquet sync on April 22</a> drew Manu Zhang and Micah Kornfield into the agenda discussion.</p><p>The format-level proposals continued to evolve. Will Edwards&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27142.html">DISCUSS thread on an alternative to the FlatBuffer footer with a lightweight byte-offset index</a> kept pulling in design feedback from Andrew Lamb, Ed Seidl, Jan Finis, Alkis Evlogimenos, Raphael Taylor-Davies, Andrew Bell, and others. Ed Seidl&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27197.html">proposal to make path_in_schema optional</a> attracted commentary from Gang Wu, Steve Loughran, and Micah Kornfield. Andrew Lamb&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27192.html">thread on where VariantJsonParser should live</a> &#8212; touching the boundary between Parquet and Iceberg&#8217;s variant tooling &#8212; continued with Steve Loughran and Gang Wu. Jan Finis&#8217;s question on <a href="https://mail-archive.com/dev@parquet.apache.org/msg27214.html">whether a too-long RLE bitpack at the end of a page is valid</a> drew careful answers from Raphael Taylor-Davies and Micah Kornfield, the kind of spec-edge clarification that matters for cross-implementation interop. 
Milan Stefanovic&#8217;s <a href="https://mail-archive.com/dev@parquet.apache.org/msg27136.html">Geospatial CRS string format clarification</a> continued threading toward closure with Dewey Dunnington and Micah Kornfield.</p><h2><strong>Cross-Project Themes</strong></h2><p>This week&#8217;s clearest pattern is post-graduation Polaris finding its operational rhythm. The 1.4.0 release plus the Python CLI 1.4.0, the return to a monthly cadence, and the visible upgrade-path bugs and Helm packaging issues are all the work of a project growing into its TLP independence. The fact that contributors are surfacing problems publicly and triaging them on the dev list &#8212; rather than routing through a parent project &#8212; is itself the marker of a healthy graduation.</p><p>The release wave across projects also reflects how synchronized the lakehouse stack has become. Arrow 24.0.0, arrow-go 18.6.0, Polaris 1.4.0, and Polaris Python CLI 1.4.0 all shipping within a single week, with arrow-rs 58.2.0 entering its release vote at the same time, is a coordination story. Engines and tools downstream of these libraries &#8212; Spark, Trino, Dremio, DataFusion, DuckDB, Snowflake &#8212; can pick up the new versions in a coherent batch rather than chasing staggered upgrades across half a dozen vendors. The format-level design work in Parquet (footers, optional path_in_schema, variant tooling location) and the V4 design work in Iceberg (metadata.json optionality, single-file commits, efficient column updates) are also starting to rhyme: both communities are picking apart assumptions baked into v1 and v2 spec design and asking what a leaner, AI-workload-aware format looks like.</p><h2><strong>Looking Ahead</strong></h2><p>Watch the arrow-rs 58.2.0 RC vote close out in the coming days. Polaris should publish 1.4.1 or move toward 1.5.0 planning given the monthly cadence commitment, and the AGENTS.md discussion is likely to firm into a concrete proposal. 
The Polaris OpenLineage RFC has the volume of feedback it needs to move toward implementation. On the Iceberg side, the formal V4 single-file commits write-up and the published AI contribution policy remain the next concrete deliverables to track. Iceberg Summit 2026 talk recordings will continue rolling out on YouTube, and the parquet-java release discussion should converge on a target version.</p><div><hr></div><h2><strong>Resources &amp; Further Learning</strong></h2><p><strong>Get Started with Dremio</strong></p><ul><li><p><a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-04-29&amp;utm_content=alexmerced">Try Dremio Free</a> &#8212; Build your lakehouse on Iceberg with a free trial</p></li><li><p><a href="https://www.dremio.com/use-cases/lake-to-iceberg-lakehouse/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-04-29&amp;utm_content=alexmerced">Build a Lakehouse with Iceberg, Parquet, Polaris &amp; Arrow</a> &#8212; Learn how Dremio brings the open lakehouse stack together</p></li></ul><p><strong>Free Downloads</strong></p><ul><li><p><a href="https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html">Apache Iceberg: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li><li><p><a href="https://hello.dremio.com/wp-apache-polaris-guide-reg.html">Apache Polaris: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li></ul><p><strong>Books by Alex Merced</strong></p><ul><li><p><a 
href="https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/ref=sr_1_5?crid=1304S78BQAP6U&amp;dib=eyJ2IjoiMSJ9.7Z17wXFJVWtv1gDIVF5-z5NwgT7B-vj9kEQuLkAKtLh00KncwXYc4bQ6hyydwcMHXbJOlFCSO7-2JmKTC5KCV-q2XEdeq7kBBmicVzI6tlDtqPqAgE6RHJE_XZ_n-zxxAjRHE2THP0J4DEgzDmiXrF9bdkEFyaruSUW28Ryx0zYyI_NuD5vZ4HYqQv3u5hzBVjjOlxyRYSTIsRSeVIoJC2XvjrXdNFvQ9jm4Kr1xFOw.yog4MgCdYecbJT0bAcGXNJJvZbvD4F_TP0lDbPA1xGI&amp;dib_tag=se&amp;keywords=alex+merced&amp;qid=1773236747&amp;sprefix=alex+mer%2Caps%2C570&amp;sr=8-5">Architecting an Apache Iceberg Lakehouse</a></p></li><li><p><a href="https://www.amazon.com/Enabling-Agentic-Analytics-Apache-Iceberg-ebook/dp/B0GQXT6W3N/">Enabling Agentic Analytics with Apache Iceberg and Dremio</a></p></li><li><p><a href="https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands/dp/B0GQNY21TD/">The 2026 Guide to Lakehouses, Apache Iceberg and Agentic AI</a></p></li><li><p><a href="https://www.amazon.com/Book-Using-Apache-Iceberg-Python/dp/B0GNZ454FF/">The Book on Using Apache Iceberg with Python</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[AI Weekly: Google's TPU Split, Cursor's $60B, and MCP at Scale]]></title><description><![CDATA[Week of April 23&#8211;29, 2026]]></description><link>https://amdatalakehouse.substack.com/p/ai-weekly-googles-tpu-split-cursors</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/ai-weekly-googles-tpu-split-cursors</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Wed, 29 Apr 2026 12:57:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!R3_2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!R3_2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R3_2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!R3_2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!R3_2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!R3_2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R3_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png" width="1456" height="637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1980913,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195867411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R3_2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!R3_2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!R3_2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!R3_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f33fae4-5f72-4888-b760-602d0cfc32a5_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>Week of April 23&#8211;29, 2026</strong></p><p>This week, Google split its eighth-generation TPU into two specialized chips. SpaceX disclosed rights to acquire Cursor for $60 billion. Google Cloud Next 2026 framed enterprise software around autonomous agents, and the Model Context Protocol moved deeper into production-grade territory.</p><h2><strong>AI Coding Tools: SpaceX Eyes Cursor at $60B and Google Pushes Agent Platforms</strong></h2><p>SpaceX announced on April 22 that it has rights to buy AI coding tool Cursor for $60 billion later this year, with an alternative $10 billion partnership option. The move positions Elon Musk&#8217;s space and AI properties to compete with Anthropic and OpenAI ahead of a planned Wall Street debut. 
Cursor, made by San Francisco startup Anysphere, has wide distribution to expert software engineers, which is part of what makes it attractive to Musk&#8217;s company. <a href="https://www.usnews.com/news/best-states/california/articles/2026-04-22/spacex-says-it-can-buy-ai-coding-tool-cursor-for-60b-later-this-year">Read the AP report</a>.</p><p>Google Cloud Next 2026 ran April 22&#8211;24 in Las Vegas and made coding agents the centerpiece. Google rebranded its AI platform as the Gemini Enterprise Agent Platform, billed as a one-stop shop for autonomous agents with 200+ foundation models and enterprise governance. The platform supports a new Agents CLI that takes agents from creation to production through a single command-line tool. <a href="https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/next-2026/">See the announcements</a>.</p><p>Cursor 2.0 also gained attention this month for supporting up to eight parallel AI agents working on different sections of a codebase at the same time. Claude Code, meanwhile, now powers GitHub Copilot&#8217;s enterprise tier with multi-agent coordination that splits large tasks into parallel subtasks. The category leaders are converging on the same pattern: agents that read codebases, plan changes across multiple files, write the code, and run the tests.</p><h2><strong>AI Processing: Google Splits Its TPU Into Training and Inference Chips</strong></h2><p>Google Cloud announced on April 22 that its eighth-generation TPU is splitting into two specialized chips. The TPU 8t targets model training and the TPU 8i targets inference. Google reports up to 3x faster AI model training and 80% better performance per dollar over the previous generation, with the ability to link more than 1 million TPUs in a single cluster. 
<a href="https://techcrunch.com/2026/04/22/google-cloud-next-new-tpu-ai-chips-compete-with-nvidia/">Read the TechCrunch coverage</a>.</p><p>Google also confirmed a partnership with Nvidia to extend Falcon, the software-based networking technology Google created and open-sourced in 2023 under the Open Compute Project. The work aims to make Nvidia-based systems perform better inside Google Cloud, a notable detente given Google&#8217;s TPU sales push.</p><p>The market for Nvidia&#8217;s chip rivals is also booming. AI chip startups have raised $8.3 billion globally so far in 2026, according to Dealroom, with Cerebras Systems pulling in $1 billion in February and $500 million rounds going to MatX, Ayar Labs, and Etched. European companies like Axelera and Olix raised rounds north of $200 million. The argument: GPUs were not purpose-designed for AI inference, and novel system architectures bring big savings in energy and cost. <a href="https://www.cnbc.com/2026/04/17/nvidia-ai-chip-rivals-funding-euclyd-fractile.html">See the CNBC report</a>.</p><h2><strong>Standards &amp; Protocols: MCP Hits Production Scale and Agentic Foundations Mature</strong></h2><p>The Model Context Protocol crossed a clear adoption threshold this month. MCP downloads now run at roughly 110 million per month across OpenAI, Google, LangChain, and other frameworks, according to a recent Anthropic keynote on the protocol&#8217;s direction. As of Q2 2026, community-built MCP servers exist for GitHub, Slack, PostgreSQL, Stripe, Figma, Docker, Kubernetes, and over 200 other tools. <a href="https://en.wikipedia.org/wiki/Model_Context_Protocol">See the Wikipedia summary</a>.</p><p>The 2026 MCP roadmap published in March identified four priorities. First is streamable HTTP transport scalability. Second is the Tasks primitive lifecycle, including retry semantics and expiry policies. Third is governance maturation. 
Fourth is enterprise readiness, covering audit trails, SSO-integrated auth, gateway behavior, and configuration portability. Stateful sessions fight with load balancers, and horizontal scaling needs better support, so the working groups are evolving the existing transport rather than adding new transports. <a href="https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/">Read the roadmap</a>.</p><p>Google Cloud Next 2026 also gave standards work a public showcase. A breakout session covered &#8220;Generative UI for any agent, anywhere: A2UI, AG-UI, MCP Apps, and more.&#8221; Interoperability between agent UI standards is now part of mainstream cloud roadmaps.</p><p>The Agentic AI Foundation launched in December 2025 under the Linux Foundation. Founding contributions came from Anthropic&#8217;s MCP, OpenAI&#8217;s AGENTS.md, and Block&#8217;s Goose. AAIF held its first MCP Dev Summit North America in New York earlier this month, drawing about 1,200 attendees, double the prior event. The next AAIF events are AGNTCon + MCPCon Europe on September 17&#8211;18 in Amsterdam and AGNTCon + MCPCon North America on October 22&#8211;23 in San Jose.</p><h2><strong>Resources to Go Further</strong></h2><p>The AI landscape changes fast. Here are tools and resources to help you keep pace.</p><p><strong>Try Dremio Free</strong> &#8212; Experience agentic analytics and an Apache Iceberg-powered lakehouse. <a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=04-29-2026&amp;utm_content=alexmerced">Start your free trial</a></p><p><strong>Learn Agentic AI with Data</strong> &#8212; Dremio&#8217;s agentic analytics features let your AI agents query and act on live data. 
<a href="https://www.dremio.com/use-cases/agentic-ai/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=04-29-2026&amp;utm_content=alexmerced">Explore Dremio Agentic AI</a></p><p><strong>Join the Community</strong> &#8212; Connect with data engineers and AI practitioners building on open standards. <a href="https://developer.dremio.com/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=04-29-2026&amp;utm_content=alexmerced">Join the Dremio Developer Community</a></p><p><strong>Book: The 2026 Guide to AI-Assisted Development</strong> &#8212; Covers prompt engineering, agent workflows, MCP, evaluation, security, and career paths. <a href="https://www.amazon.com/2026-Guide-AI-Assisted-Development-Engineering-ebook/dp/B0GQW7CTML/">Get it on Amazon</a></p><p><strong>Book: Using AI Agents for Data Engineering and Data Analysis</strong> &#8212; A practical guide to Claude Code, Google Antigravity, OpenAI Codex, and more. 
<a href="https://www.amazon.com/Using-Agents-Data-Engineering-Analysis-ebook/dp/B0GR6PYJT9/">Get it on Amazon</a></p>]]></content:encoded></item><item><title><![CDATA[Data Modeling Best Practices: 7 Mistakes to Avoid]]></title><description><![CDATA[A bad data model doesn&#8217;t announce itself.]]></description><link>https://amdatalakehouse.substack.com/p/data-modeling-best-practices-7-mistakes</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/data-modeling-best-practices-7-mistakes</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Mon, 27 Apr 2026 13:01:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ljZA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ljZA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ljZA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!ljZA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!ljZA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ljZA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ljZA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1032957,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/189063164?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ljZA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!ljZA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 848w, 
https://substackcdn.com/image/fetch/$s_!ljZA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!ljZA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab97e09-c87c-4111-a333-35ba8580dbf4_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!l6GL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l6GL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!l6GL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!l6GL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!l6GL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l6GL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Checklist of data modeling quality markers with warning symbols on common 
mistakes&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Checklist of data modeling quality markers with warning symbols on common mistakes" title="Checklist of data modeling quality markers with warning symbols on common mistakes" srcset="https://substackcdn.com/image/fetch/$s_!l6GL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!l6GL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!l6GL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!l6GL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e2b42af-68f8-453b-a38d-f6a19d55cc72_640x640.webp 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 
6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A bad data model doesn&#8217;t announce itself. It hides behind slow dashboards, conflicting numbers, confused analysts, and AI agents that generate wrong SQL. By the time someone identifies the model as the root cause, the team has already built dozens of reports on top of it.</p><p>Here are seven modeling mistakes that create downstream pain &#8212; and how to avoid each one.</p><h2><strong>Mistake 1: No Defined Grain</strong></h2><p>The grain declares what one row in a fact table represents. &#8220;One row per order line item.&#8221; &#8220;One row per daily user session.&#8221; &#8220;One row per monthly account balance.&#8221;</p><p>Without a declared grain, aggregation produces wrong numbers. If some rows represent individual transactions and others represent daily summaries, a SUM query double-counts or under-counts depending on the mix.</p><p><strong>Fix:</strong> Before designing any fact table, write down the grain in one sentence. Share it with your team. 
If you can&#8217;t state the grain clearly, the table isn&#8217;t ready for production.</p><h2><strong>Mistake 2: Cryptic Naming</strong></h2><p>Columns named <code>c1</code>, <code>dt</code>, <code>amt</code>, <code>flg</code>, and <code>cat_cd</code> save keystrokes during development but cost hours during analysis. Every analyst who encounters these names must either read the ETL code, ask the engineer, or guess.</p><p>AI agents have the same problem. An agent asked to calculate &#8220;total revenue&#8221; can&#8217;t identify the right column if it&#8217;s called <code>amt3</code> instead of <code>revenue_usd</code>.</p><p><strong>Fix:</strong> Use descriptive, business-friendly names. <code>customer_name</code>, <code>order_date</code>, <code>revenue_usd</code>, <code>is_active</code>, <code>product_category</code>. Include units where ambiguous (<code>weight_kg</code>, <code>duration_minutes</code>). Use <code>snake_case</code> consistently.</p><h2><strong>Mistake 3: Skipping the Conceptual Model</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XXq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XXq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!XXq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 848w, 
https://substackcdn.com/image/fetch/$s_!XXq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!XXq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XXq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Conceptual model as the foundation layer that business and technical teams align on&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Conceptual model as the foundation layer that business and technical teams align on" title="Conceptual model as the foundation layer that business and technical teams align on" srcset="https://substackcdn.com/image/fetch/$s_!XXq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 424w, 
https://substackcdn.com/image/fetch/$s_!XXq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!XXq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!XXq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f9a4387-b3e2-4f6c-96ed-be04d175602a_640x640.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Going straight from a stakeholder request to <code>CREATE TABLE</code> skips the alignment step. Engineers build what they understand from the request. Stakeholders assume something different. The gap surfaces weeks or months later when reports don&#8217;t match expectations.</p><p><strong>Fix:</strong> For every new business domain, create a conceptual model first. List the entities, name the relationships, and get business stakeholder sign-off before writing any SQL.</p><h2><strong>Mistake 4: Over-Normalizing for Analytics</strong></h2><p>Third Normal Form (3NF) suits transactional systems where writes are frequent and consistency matters. Applied to an analytics workload, it produces queries with 10-15 joins that run slowly and break easily.</p><p><strong>Fix:</strong> Separate your transactional model from your analytical model. Keep the OLTP system in 3NF. Build a denormalized star schema (or a set of wide views) for analytics. Different workloads deserve different models.</p><h2><strong>Mistake 5: Under-Documenting</strong></h2><p>A data model without documentation is a puzzle that only its creator can solve. Even the creator forgets the details after a few months.</p><p>Without documentation, every new team member reverse-engineers the model from scratch. Every AI agent generates SQL based on guesses. Every analyst interprets column meanings differently, leading to metric discrepancies that take weeks to reconcile.</p><p><strong>Fix:</strong> Document at three levels:</p><ul><li><p><strong>Column level:</strong> What does each column mean? Where does it come from?</p></li><li><p><strong>Table level:</strong> What grain does this table use? Who maintains it?</p></li><li><p><strong>Model level:</strong> How do tables connect? 
What business process does this model represent?</p></li></ul><p>Platforms like <a href="https://www.dremio.com/blog/agentic-analytics-semantic-layer/?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Dremio</a> make this practical with built-in Wikis for every dataset and Labels for classification (PII, Certified, Raw, Deprecated). The documentation lives next to the data, not in a separate spreadsheet that goes stale.</p><h2><strong>Mistake 6: One Model for Every Workload</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gVg5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gVg5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!gVg5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!gVg5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!gVg5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gVg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Single model struggling to serve transactions, analytics, and AI simultaneously&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Single model struggling to serve transactions, analytics, and AI simultaneously" title="Single model struggling to serve transactions, analytics, and AI simultaneously" srcset="https://substackcdn.com/image/fetch/$s_!gVg5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!gVg5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!gVg5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!gVg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53a6c572-f390-4e8f-9401-3461a7830334_640x640.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A model designed for a transactional application doesn&#8217;t serve analytics well. A model designed for analytics doesn&#8217;t serve a machine learning feature store well. 
Trying to make one model serve every use case leads to compromises that serve no use case well.</p><p><strong>Fix:</strong> Build purpose-specific models layered on top of shared source data. The Medallion Architecture does this naturally:</p><ul><li><p><strong>Bronze:</strong> Raw data from sources (shared foundation)</p></li><li><p><strong>Silver:</strong> Business logic layer (shared across analytics and ML)</p></li><li><p><strong>Gold:</strong> Purpose-built views (one for dashboards, one for ML features, one for AI agents)</p></li></ul><p>Each Gold view is tailored to its consumer without duplicating the transformation logic in Silver.</p><h2><strong>Mistake 7: Ignoring Governance</strong></h2><p>Data models don&#8217;t exist in a vacuum. They often contain PII, financial data, health records, and other sensitive information. Ignoring governance creates compliance risk and erodes trust.</p><p>Common governance gaps:</p><ul><li><p>No access controls (everyone sees everything)</p></li><li><p>No classification (no one knows which columns contain PII)</p></li><li><p>No ownership (no one knows who to ask about table X)</p></li><li><p>No lineage (no one knows where the data came from)</p></li></ul><p><strong>Fix:</strong> Integrate governance from day one:</p><ul><li><p>Tag columns by sensitivity (PII, financial, public)</p></li><li><p>Assign ownership per table or domain</p></li><li><p>Apply row-level and column-level access policies</p></li><li><p>Document data lineage from source to consumption</p></li></ul><p>In Dremio, Fine-Grained Access Control enforces row-level and column-level policies, Labels classify datasets, and the Open Catalog tracks lineage. 
Governance is part of the platform, not an afterthought.</p><h2><strong>What to Do Next</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q7BF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q7BF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!q7BF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!q7BF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!q7BF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q7BF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp" width="640" height="640" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Iterative data modeling cycle: design, document, measure, improve&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Iterative data modeling cycle: design, document, measure, improve" title="Iterative data modeling cycle: design, document, measure, improve" srcset="https://substackcdn.com/image/fetch/$s_!q7BF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!q7BF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!q7BF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!q7BF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a2255e-5623-415f-bf1c-00bc3d2ff43b_640x640.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Pick one of these seven mistakes. Check whether your current data model has it. Fix it. Then move to the next one. Data modeling is iterative &#8212; no team gets it perfect on the first pass. 
The goal is not perfection but continuous improvement: clearer names, better documentation, tighter governance, and models that match what your consumers actually need.</p><p><a href="https://www.dremio.com/get-started?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Try Dremio Cloud free for 30 days</a></p>]]></content:encoded></item><item><title><![CDATA[The Journey from Scattered Data to an Apache Iceberg Lakehouse with Governed Agentic Analytics]]></title><description><![CDATA[The conventional wisdom for data platform modernization goes like this: pick a target system, build ETL pipelines for every source, migrate everything, validate the data, retrain your users, and then start getting value.]]></description><link>https://amdatalakehouse.substack.com/p/the-journey-from-scattered-data-to</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/the-journey-from-scattered-data-to</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Mon, 27 Apr 2026 13:01:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WgvP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WgvP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!WgvP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!WgvP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!WgvP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!WgvP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WgvP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1371456,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195504652?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WgvP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!WgvP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!WgvP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!WgvP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3c7889d-1926-4df4-acb6-84739e3c5915_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6_c8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6_c8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!6_c8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!6_c8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!6_c8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6_c8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Journey from scattered data to governed agentic analytics through federation, semantic layer, and Iceberg lakehouse&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Journey from scattered data to governed agentic analytics through federation, semantic layer, and Iceberg lakehouse" title="Journey from scattered data to governed agentic analytics through federation, semantic layer, and Iceberg lakehouse" srcset="https://substackcdn.com/image/fetch/$s_!6_c8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!6_c8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!6_c8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!6_c8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e4bae-e3e1-4cfe-a8b8-8809b5f579c7_800x800.webp 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>The conventional wisdom for data platform modernization goes like this: pick a target system, build ETL pipelines for every source, migrate everything, validate the data, retrain your users, and then start getting value. That process takes six to eighteen months. During that time, analysts are waiting and leadership is asking why the investment has not produced results yet.</p><p>There is a better sequence. Instead of making everyone wait for a full migration, you start producing value on day one and migrate to <a href="https://iceberg.apache.org/">Apache Iceberg</a> at your own pace.
The key is treating federation, the semantic layer, AI access, and Iceberg migration as four independent phases, each delivering value on its own, rather than a single all-or-nothing project.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7YJ4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7YJ4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!7YJ4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!7YJ4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!7YJ4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7YJ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Four-phase journey from connecting sources to Iceberg lakehouse showing value at every phase&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Four-phase journey from connecting sources to Iceberg lakehouse showing value at every phase" title="Four-phase journey from connecting sources to Iceberg lakehouse showing value at every phase" srcset="https://substackcdn.com/image/fetch/$s_!7YJ4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!7YJ4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!7YJ4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!7YJ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2108a92-dacf-4e06-ba4a-6756c516b3e5_800x800.webp 1456w" sizes="100vw"></picture></div></a></figure></div>
<h2><strong>Phase 1: Connect Your Data Where It Lives</strong></h2><p>Sign up for <a href="https://www.dremio.com/get-started">Dremio Cloud</a> and you get a lakehouse project with a pre-configured Open Catalog right away. From there, start connecting your existing data sources through Dremio&#8217;s federated query engine: PostgreSQL, MySQL, MongoDB, S3, Snowflake, BigQuery, Redshift, AWS Glue, Unity Catalog, and more.</p><p>No data copying. No ETL pipelines. Dremio queries your data where it already lives, using predicate pushdowns to push filtering work down to each source system.</p><p>The result: by the end of day one, your team has unified SQL access across every connected source.
An analyst can join a PostgreSQL customer table with an S3-based event stream in a single query, without waiting for a data engineer to build a pipeline first.</p><h2><strong>Phase 2: Build a Semantic Layer Over Everything</strong></h2><p>Raw source tables have cryptic column names, inconsistent types, and zero business context. Before anyone, human or AI, can get reliable answers, you need a curated layer on top.</p><p>Dremio&#8217;s <a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vl7mjbliccc61w8okl7q.png">AI Semantic Layer</a> uses SQL views organized in three tiers:</p><ul><li><p><strong>Bronze/Raw views</strong> map to raw sources. They standardize column names, cast data types, and apply basic filters. One Bronze view per source table.</p></li><li><p><strong>Silver/Business views</strong> apply business logic. This is where you define what &#8220;active customer&#8221; means (purchased in the last 90 days, not on a trial), join data across sources, and compute metrics.</p></li><li><p><strong>Gold/Application views</strong> serve specific consumers: a dashboard, a report, or an AI agent. Each Gold view is optimized for its use case.</p></li></ul><p>Dremio&#8217;s AI Agent can help you draft the SQL for these views.</p><h3><strong>Govern Access and Document Everything</strong></h3><p>Grant users access to specific views using Role-Based Access Control (RBAC) at the folder, dataset, and column level. For sensitive data, add Fine-Grained Access Control (FGAC) via UDFs for row-level security and column-level masking.</p><p>Then enrich every dataset with <strong>Wikis</strong> (human-readable documentation explaining what each column means) and <strong>Tags</strong> (categorical labels for discoverability). Dremio can auto-generate Wiki descriptions and suggest Tags by sampling your table data and schema.
You review and refine the output instead of writing everything from scratch.</p><p>This metadata is not just for humans. It is what the AI Agent reads when generating SQL. Better documentation means more accurate answers.</p><h2><strong>Phase 3: Turn On Agentic Analytics</strong></h2><p>With a governed semantic layer in place, you are ready for AI. This is the important part: <strong>you do not need to complete the Iceberg migration first.</strong> Agentic analytics works on federated data from the moment the semantic layer exists.</p><p>Dremio&#8217;s built-in <a href="https://www.dremio.com/ai-agent/">AI Agent</a> lets users type plain-English questions in the console. The agent writes SQL, executes it against your governed views, returns results, generates charts, and suggests follow-up questions. It respects every RBAC and FGAC policy in your catalog. Users can only get answers about data they are authorized to see.</p><p>For teams that want to use external tools, Dremio&#8217;s MCP (Model Context Protocol) server lets ChatGPT, Claude Desktop, or custom agents connect directly to your Dremio environment. 
External tools get the same semantic context and security controls as the built-in agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ntor!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ntor!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 424w, https://substackcdn.com/image/fetch/$s_!Ntor!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 848w, https://substackcdn.com/image/fetch/$s_!Ntor!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 1272w, https://substackcdn.com/image/fetch/$s_!Ntor!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ntor!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png" width="1456" height="572" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:572,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147209,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195504652?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ntor!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 424w, https://substackcdn.com/image/fetch/$s_!Ntor!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 848w, https://substackcdn.com/image/fetch/$s_!Ntor!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 1272w, https://substackcdn.com/image/fetch/$s_!Ntor!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0094d6-9520-451e-9f96-fdb957616cbb_1522x598.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>At this point your organization has unified data access, a governed semantic layer, and AI-powered analytics, and you have not migrated a single table to Iceberg yet.</p><h2><strong>Phase 4: Migrate to Iceberg, One Dataset at a Time</strong></h2><p>Federation gets you access, but a full <a href="https://www.dremio.com/platform/apache-iceberg/">Apache Iceberg</a> lakehouse gets you more: Autonomous Reflections that optimize query performance based on actual usage patterns, end-to-end caching, automated table maintenance (compaction, clustering, vacuuming), and interoperability with every Iceberg-compatible engine (Spark, Flink, Trino).
Your data stays in your storage, in an open format, with no vendor lock-in.</p><p>The migration pattern is deliberately incremental:</p><ol><li><p><strong>Pick one dataset</strong> to migrate (start with the highest-volume or most-queried table)</p></li><li><p><strong>Build an Iceberg pipeline</strong> to land that data in your object storage (S3 or Azure)</p></li><li><p><strong>Update the Bronze view</strong> to point to the new Iceberg table instead of the legacy federated source</p></li><li><p><strong>Silver and Gold views stay unchanged.</strong> They reference the Bronze view, which now reads from Iceberg instead of the old source.</p></li><li><p><strong>Every consumer is unaffected.</strong> Dashboards, reports, and AI agents continue to work exactly as before.</p></li></ol><p>Repeat for the next dataset whenever you are ready. There is no deadline and no big-bang cutover.</p><h2><strong>Why the View Layer Makes Migration Invisible</strong></h2><p>This is the architectural insight that makes the whole journey work. 
The semantic layer acts as a contract between physical data storage and every consumer above it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iqRk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iqRk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!iqRk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!iqRk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!iqRk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iqRk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1095618,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195504652?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iqRk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!iqRk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!iqRk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!iqRk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6f72899-689d-4f72-8da0-42d2157e5007_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>When you swap a Bronze view&#8217;s underlying source from PostgreSQL to an Iceberg table, every Silver view, Gold view, dashboard, report, and AI agent that depends on it continues to work without changes. The view contract (column names, data types, business logic) is preserved. Only the physical source pointer changes.</p><p>This means:</p><ul><li><p>No dashboard rewiring</p></li><li><p>No report migration</p></li><li><p>No API endpoint changes</p></li><li><p>No AI Agent reconfiguration</p></li><li><p>No user communication (beyond governance notifications if your policies require them)</p></li></ul><p>The migration happens underneath the abstraction layer. Everyone above it is oblivious.</p><h2><strong>The Tradeoffs</strong></h2><p>This phased approach is not free of costs.</p><p>Federation introduces network latency.
Queries that join a PostgreSQL table in one region with an S3 bucket in another will be slower than queries against co-located Iceberg tables. Reflections and caching mitigate this for repeated queries, but the first execution of a new query pattern still pays the full latency cost.</p><p>Iceberg migration still requires building ingest pipelines. Dremio does not eliminate that work. What it does is decouple the pipeline work from the analytics timeline. Your analysts and AI agents are productive while engineers build migration pipelines in the background.</p><p>Autonomous Reflections need a 7-day observation window before they start optimizing. Day-one performance on brand-new Iceberg tables relies on baseline optimizations (C3 caching, predicate pushdowns, vectorized execution). The system gets faster as it learns your query patterns.</p><p>And Dremio is an analytical engine, not a transactional database. Your OLTP workloads stay in PostgreSQL, MongoDB, or whatever system runs your application. Dremio queries those systems through federation; it does not replace them.</p><h2><strong>Start Today, Migrate Over Time</strong></h2><p>The traditional approach forces you to choose: spend months migrating, or keep running fragmented analytics on scattered data. Dremio eliminates that choice. Connect your sources, build your semantic layer, enable AI access, and start migrating to Iceberg when you are ready.
Each phase delivers value independently, and the view layer ensures that migration never disrupts the people who are already getting answers.</p><p><a href="https://www.dremio.com/get-started">Try Dremio Cloud free for 30 days</a> and start the journey from wherever your data lives today.</p><h3><strong>Free Resources to Go Deeper</strong></h3><ul><li><p><a href="https://drmevn.fyi/linkpageiceberg">FREE - Apache Iceberg: The Definitive Guide</a></p></li><li><p><a href="https://drmevn.fyi/linkpagepolaris">FREE - Apache Polaris: The Definitive Guide</a></p></li><li><p><a href="https://hello.dremio.com/wp-resources-agentic-ai-for-dummies-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Agentic AI for Dummies</a></p></li><li><p><a href="https://hello.dremio.com/wp-resources-agentic-analytics-guide-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Leverage Federation, The Semantic Layer and the Lakehouse for Agentic AI</a></p></li><li><p><a href="https://forms.gle/xdsun6JiRvFY9rB36">FREE with Survey - Understanding and Getting Hands-on with Apache Iceberg in 100 Pages</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Apache Data Lakehouse Weekly: April 16–22, 2026]]></title><description><![CDATA[Two weeks past the Iceberg Summit, the San Francisco in-person alignments are now translating into formal proposals and code on the dev lists.]]></description><link>https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-cd3</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-cd3</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Fri, 24 Apr 2026 13:01:15 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!sJAo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sJAo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sJAo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!sJAo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!sJAo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!sJAo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sJAo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png" width="1456" height="637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b168331-8bef-4a57-b405-d10f13932962_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:878975,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195065793?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sJAo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!sJAo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!sJAo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!sJAo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b168331-8bef-4a57-b405-d10f13932962_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>Two weeks past the Iceberg Summit, the San Francisco in-person alignments are now translating into formal proposals and code on the dev lists. Iceberg&#8217;s V4 design work continued consolidating, Polaris kept moving toward its 1.4.0 milestone, Parquet&#8217;s Geospatial spec picked up a cleanup commit from a new contributor, and Arrow&#8217;s release engineering and Java modernization discussions stayed active.</p><h2><strong>Apache Iceberg</strong></h2><p>The post-summit V4 design work continued as the defining thread on the Iceberg dev list this week. The <a href="http://www.mail-archive.com/dev@iceberg.apache.org/msg12699.html">V4 metadata.json optionality discussion</a> that Anton Okolnychyi, Yufei Gu, Shawn Chang, and Steven Wu drove through March kept narrowing on practical design questions.
The concrete direction emerging from the summit is to treat catalog-managed metadata as a first-class supported mode while preserving static-table portability through explicit opt-in semantics, rather than the current implicit assumption that the root JSON file is always present.</p><p>Russell Spitzer and Amogh Jahagirdar&#8217;s <a href="http://www.mail-archive.com/dev@iceberg.apache.org/msg12574.html">one-file commits design</a> moved toward a formal spec write-up this week. The approach replaces manifest lists with root manifests and introduces manifest delete vectors, enabling single-file commits that cut metadata write overhead dramatically for high-frequency writers. The in-person sessions at the summit cleared the last design disagreements about inline versus external manifest delete vectors, and the community is now aligning on the implementation plan.</p><p>P&#233;ter V&#225;ry&#8217;s <a href="http://www.mail-archive.com/dev@iceberg.apache.org/msg12958.html">efficient column updates proposal</a> for AI and ML workloads drew steady engagement. The design lets Iceberg write only the columns that change on each write for wide feature tables, then stitch the result at read time. For teams managing petabyte-scale feature stores with embedding vectors and model scores, the I/O savings are meaningful. Anurag Mantripragada and G&#225;bor Herman are working alongside P&#233;ter on POC benchmarks to support the formal proposal.</p><p>The AI contribution policy that Holden Karau, Kevin Liu, Steve Loughran, and Sung Yun pushed through March is moving toward published guidance. The summit provided the in-person alignment that async debate rarely produces, and a working policy covering disclosure requirements and code provenance standards for AI-generated contributions is expected on the dev list in the next couple of weeks. 
Polaris is navigating the same question in parallel, and the two communities are likely to converge on a shared approach given their overlapping contributor base.</p><h2><strong>Apache Polaris</strong></h2><p>The <a href="https://polaris.apache.org/downloads/">Polaris 1.4.0 release</a> is in active scope finalization as the project&#8217;s first release since graduating to top-level status on February 18. Credential vending for Azure and Google Cloud Storage is the headline feature, alongside catalog federation that lets one Polaris instance front multiple catalog backends across clouds. The <a href="https://polaris.apache.org/community/release-guides/semi-automated-release-guide/">schedule-driven release model</a> calls for a release intent email to the dev list about a week before the RC cut, so watch the list for that thread shortly.</p><p>The <a href="http://www.mail-archive.com/dev@ranger.apache.org/msg39491.html">Apache Ranger authorization RFC from Selvamohan Neethiraj</a> remained the most active governance discussion. The plugin lets organizations running Ranger with Hive, Spark, and Trino manage Polaris security within the same policy framework, eliminating the policy duplication that arises when teams bolt separate authorization onto each engine. It is opt-in and backward compatible with Polaris&#8217;s internal authorization layer, which lowers the enterprise adoption barrier considerably.</p><p>On the community side, Polaris&#8217;s blog continued its post-graduation cadence with a <a href="https://polaris.apache.org/blog/">Sunday April 4 post on building a fully integrated, locally-running open data lakehouse in under 30 minutes</a> using k3d, Apache Ozone, Polaris, and Trino. The Polaris PMC also shipped a <a href="https://polaris.apache.org/blog/">March 29 post</a> covering automated entity management for catalogs, principals, and roles. 
With incubator overhead behind it, release velocity has picked up noticeably from the 1.3.0 release on January 16.</p><h2><strong>Apache Arrow</strong></h2><p>Arrow&#8217;s <a href="https://github.com/apache/arrow-rs">release calendar</a> shows arrow-rs 58.2.0 landing this month, following 58.1.0 in March, which shipped with no breaking API changes. The cadence has held at roughly one minor version per month, with 59.0.0 already scheduled for May as a major release that may include breaking changes. The Rust implementation has become one of the most actively maintained segments of the Arrow ecosystem, with a DataFusion integration drawing engines that want Arrow without a JVM dependency.</p><p>Jean-Baptiste Onofr&#233;&#8217;s JDK 17 minimum proposal for Arrow Java 20.0.0 continued drawing input from Micah Kornfield and Antoine Pitrou. The practical rationale is coordination: setting JDK 17 as Arrow&#8217;s Java baseline aligns with Iceberg&#8217;s own upgrade timeline and effectively raises the minimum across the entire lakehouse stack in a single coordinated move. The decision is expected before the 20.0.0 release cycle formally opens.</p><p>Nic Crane&#8217;s thread on using LLMs for Arrow project maintenance continued generating discussion. The framing &#8212; AI as a resource for maintainers, not just contributors &#8212; is distinct from how Iceberg and Polaris are approaching their AI policies. Arrow&#8217;s angle is practical: a lean maintainer group managing a growing issue backlog needs help triaging, and LLMs can do that work without introducing the code-provenance concerns that matter for contributions. Google Summer of Code 2026 student proposals that landed in early April are being sorted this week, with interest concentrated in compute kernels and Go and Swift language bindings.</p><h2><strong>Apache Parquet</strong></h2><p>Parquet&#8217;s week centered on hardening the Geospatial spec that was adopted earlier this year. 
Milan Stefanovic merged <a href="http://www.mail-archive.com/commits@parquet.apache.org/msg04335.html">PR #560 on April 20</a>, clarifying the Geospatial spec wording for coordinate reference systems. The change documents existing CRS usage practice for the default OGC:CRS84 system and removes ambiguity caught during implementation reviews. Small spec-hardening commits like this are how a new type goes from &#8220;shipped&#8221; to &#8220;production-reliable&#8221; across engines.</p><p>The community blog effort continued alongside the spec work. The <a href="https://parquet.apache.org/blog/2026/02/13/native-geospatial-types-in-apache-parquet/">Native Geospatial Types blog</a> that Jia Yu and Dewey Dunnington published on February 13 remains the community&#8217;s reference explainer, and Andrew Lamb has been coordinating with Aihua Xu on the companion Variant blog post. Spotlighting recent additions through the Parquet blog is part of a deliberate push to give the project the same kind of voice that DataFusion and Arrow have built.</p><p>The ALP encoding that cleared its acceptance vote in the prior week moved into implementation discussion. Engine teams across Spark, Trino, Dremio, and DataFusion are comparing notes on how to integrate ALP into their Parquet readers, with compression gains for float-heavy ML feature stores as the immediate benefit. The File logical type proposal for unstructured data (images, PDFs, audio) also kept advancing in community discussion, extending Parquet&#8217;s scope beyond pure analytics.</p><h2><strong>Cross-Project Themes</strong></h2><p>The summit&#8217;s downstream effect is now visible across every dev list. Iceberg&#8217;s V4 work, Polaris&#8217;s 1.4.0 scope, Arrow&#8217;s JDK 17 decision, and Parquet&#8217;s Geospatial cleanup are running in parallel, and the cross-project coordination on shared questions like AI contribution policy and Java baselines has intensified. 
The JDK 17 alignment is the clearest case: moving Arrow Java 20.0.0, Iceberg&#8217;s next major, and downstream engines to the same floor in a single window removes years of compatibility friction.</p><p>The second pattern is the steady expansion of format scope to meet AI workloads. Iceberg&#8217;s efficient column updates, Parquet&#8217;s File logical type, the Geospatial spec hardening, and Polaris&#8217;s multi-cloud federation all respond to the same pressure: the lakehouse stack is being asked to power AI pipelines, not just analytical queries. Each project is making changes that only make sense if you assume the next decade&#8217;s workloads look different from the last.</p><h2><strong>Looking Ahead</strong></h2><p>Watch for the V4 single-file commits formal spec write-up and the metadata optionality vote on the Iceberg dev list, along with a published AI contribution policy. The Polaris 1.4.0 release intent email should land in the coming days. Arrow&#8217;s JDK 17 baseline decision for Java 20.0.0 is close to a vote, and arrow-rs 58.2.0 should ship before the end of the month. 
Iceberg Summit 2026 session recordings are also rolling out on the project&#8217;s YouTube channel.</p><div><hr></div><h2><strong>Resources &amp; Further Learning</strong></h2><p><strong>Get Started with Dremio</strong></p><ul><li><p><a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-04-22&amp;utm_content=alexmerced">Try Dremio Free</a> &#8212; Build your lakehouse on Iceberg with a free trial</p></li><li><p><a href="https://www.dremio.com/use-cases/lake-to-iceberg-lakehouse/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-04-22&amp;utm_content=alexmerced">Build a Lakehouse with Iceberg, Parquet, Polaris &amp; Arrow</a> &#8212; Learn how Dremio brings the open lakehouse stack together</p></li></ul><p><strong>Free Downloads</strong></p><ul><li><p><a href="https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html">Apache Iceberg: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li><li><p><a href="https://hello.dremio.com/wp-apache-polaris-guide-reg.html">Apache Polaris: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li></ul><p><strong>Books by Alex Merced</strong></p><ul><li><p><a href="https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/">Architecting an Apache Iceberg Lakehouse</a></p></li><li><p><a href="https://www.amazon.com/Enabling-Agentic-Analytics-Apache-Iceberg-ebook/dp/B0GQXT6W3N/">Enabling Agentic Analytics with Apache Iceberg and Dremio</a></p></li><li><p><a href="https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands/dp/B0GQNY21TD/">The 2026 Guide to Lakehouses, Apache Iceberg and Agentic AI</a></p></li><li><p><a href="https://www.amazon.com/Book-Using-Apache-Iceberg-Python/dp/B0GNZ454FF/">The Book on Using Apache Iceberg with 
Python</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[AI Weekly: Opus 4.7, Kimi K2.6, and a $25B Amazon Deal, April 16–22, 2026]]></title><description><![CDATA[Three stories defined the past week: Anthropic shipped Claude Opus 4.7, Moonshot open-sourced Kimi K2.6 with 300-agent swarms, and Amazon committed another $25 billion to Anthropic alongside a $100 billion AWS spend.]]></description><link>https://amdatalakehouse.substack.com/p/ai-weekly-opus-47-kimi-k26-and-a</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/ai-weekly-opus-47-kimi-k26-and-a</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Thu, 23 Apr 2026 13:02:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ATqN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ATqN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ATqN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!ATqN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 848w, 
https://substackcdn.com/image/fetch/$s_!ATqN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!ATqN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ATqN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:953707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195063866?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ATqN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 424w, 
https://substackcdn.com/image/fetch/$s_!ATqN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!ATqN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!ATqN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5350ed-5258-4c24-b88f-c21555b2cb52_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>Three stories defined the past week: Anthropic shipped Claude Opus 4.7, Moonshot open-sourced Kimi K2.6 with 300-agent swarms, and Amazon committed another $25 billion to Anthropic alongside a $100 billion AWS spend. Here is what you need to know.</p><h2><strong>AI Coding Tools: Opus 4.7 Ships With a 1M Context Window</strong></h2><p>Anthropic released <a href="https://www.cnbc.com/2026/04/16/anthropic-claude-opus-4-7-model-mythos.html">Claude Opus 4.7 on April 16</a>, a new flagship model focused on agentic coding and long-horizon work. The model scores 87.6% on SWE-bench Verified, up from 80.8% for Opus 4.6, and 64.3% on SWE-bench Pro. It runs with a full 1 million token context window and high-resolution image support for charts and dense documents.</p><p>The model landed across major platforms the same week. <a href="https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-opus-4-7-in-amazon-bedrock-aws-interconnect-ga-and-more-april-20-2026/">Claude Opus 4.7 arrived on Amazon Bedrock</a> on launch day in four regions, with up to 10,000 requests per minute per account. <a href="https://github.blog/changelog/2026-04-16-claude-opus-4-7-is-generally-available/">GitHub Copilot began rolling out Opus 4.7 to Copilot Pro+</a> users with a 7.5x premium request multiplier until April 30. The model is replacing both Opus 4.5 and Opus 4.6 in the Copilot model picker.</p><p>Claude Code shipped Opus 4.7 the same day with new controls. The update added an &#8220;xhigh&#8221; effort level between high and max, a <code>/ultrareview</code> multi-agent code review command, and Auto mode for Max subscribers. <a href="https://releasebot.io/updates/anthropic">Anthropic also launched Claude Design</a>, a new Anthropic Labs product for building prototypes, slides, and one-pagers in collaboration with the model. 
Pricing stays at $5 per million input tokens and $25 per million output tokens.</p><h2><strong>AI Models: Kimi K2.6 Opens the Door to 12-Hour Agent Runs</strong></h2><p><a href="https://siliconangle.com/2026/04/20/moonshot-ai-releases-kimi-k2-6-model-1t-parameters-attention-optimizations/">Moonshot AI released Kimi K2.6 on April 20</a> as an open-source agentic model built for long-horizon coding. The model has 1 trillion total parameters in a Mixture-of-Experts architecture with 32 billion active per forward pass. It supports text, image, and video input, a 256K context window, and thinking and non-thinking modes behind an OpenAI-compatible API.</p><p>The headline claim is stamina. <a href="https://www.marktechpost.com/2026/04/20/moonshot-ai-releases-kimi-k2-6-with-long-horizon-coding-agent-swarm-scaling-to-300-sub-agents-and-4000-coordinated-steps/">Kimi K2.6 targets 12-hour autonomous coding sessions</a> and agent swarms that scale to 300 sub-agents across 4,000 coordinated steps. On benchmarks, Moonshot claims SWE-Bench Pro at 58.6, SWE-bench Multilingual at 76.7, and BrowseComp at 83.2. The model matches or beats GPT-5.4 and Claude Opus 4.6 on several open-source leaderboards.</p><p>K2.6 is available immediately on Kimi.com, the developer API, Kimi Code CLI, Ollama, and Hugging Face. Day-one integrations cover Kilo Code, VS Code and JetBrains extensions, OpenClaw, Tencent CodeBuddy, and Genspark. The MIT-derived license allows commercial use and redistribution, a direct challenge to closed-source frontier labs.</p><h2><strong>AI Infrastructure: AWS Interconnect Reaches GA and Amazon Adds $25B to Anthropic</strong></h2><p><a href="https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-opus-4-7-in-amazon-bedrock-aws-interconnect-ga-and-more-april-20-2026/">AWS Interconnect reached general availability on April 20</a> with two new capabilities. 
AWS Interconnect Multicloud provides Layer 3 private connections between AWS VPCs and other clouds, starting with Google Cloud, with Azure and OCI coming later in 2026. Traffic flows over the AWS global backbone with built-in MACsec encryption, never crossing the public internet. AWS also published the Interconnect specification on GitHub under Apache 2.0, so any cloud provider can become a partner.</p><p><a href="https://www.cnbc.com/2026/04/20/amazon-invest-up-to-25-billion-in-anthropic-part-of-ai-infrastructure.html">Amazon announced a $25 billion investment in Anthropic on April 20</a>, on top of the $8 billion already committed. The deal includes $5 billion immediately, with up to $20 billion tied to commercial milestones. <a href="https://finance.yahoo.com/sectors/technology/articles/amazon-investing-25-billion-more-113801183.html">Anthropic committed to spending more than $100 billion on AWS over 10 years</a>, securing up to 5 gigawatts of Trainium chip capacity. One gigawatt is scheduled to come online this year using Trainium2 and Trainium3.</p><p>The structure mirrors the <a href="https://www.geekwire.com/2026/amazon-doubles-down-on-anthropic-with-25b-investment-mirroring-its-openai-cloud-deal/">$50 billion Amazon-OpenAI deal from February</a>. Anthropic is now valued at $380 billion, with annualized revenue climbing from $9 billion at the end of 2025 to more than $30 billion. Enterprise customers spending at least $1 million annually have doubled since February, crossing 1,000 accounts.</p><h2><strong>Standards and Protocols: Interconnect Spec Goes Open</strong></h2><p>The AWS Interconnect specification going to GitHub under Apache 2.0 is the standards story of the week. The move gives any cloud provider a published path to join the private connectivity mesh without negotiating bilateral deals. 
For AI workloads moving data between model training clusters in one cloud and inference infrastructure in another, the alternative has been either the public internet or expensive dedicated circuits.</p><p>The broader pattern is that hyperscale cloud providers are open-sourcing infrastructure specs to lock in network effects. Trainium chip access is exclusive, but the connectivity layer is open. This is the same playbook the Linux Foundation&#8217;s Agentic AI Foundation uses for MCP and A2A: open standards for the plumbing, proprietary value on top.</p><p>MCP and A2A also saw continued adoption this week. Claude Opus 4.7 ships with both protocols built in, and Kimi K2.6 supports tool calls and OpenAI-compatible APIs that slot into MCP-aware agent stacks. The layered architecture is holding: MCP handles agent-to-tool connections, A2A handles agent-to-agent coordination, and the new open models and frontier releases are all landing with both built in by default.</p><div><hr></div><h2><strong>Resources to Go Further</strong></h2><p>The AI landscape changes fast. Here are tools and resources to help you keep pace.</p><p><strong>Try Dremio Free</strong> &#8212; Experience agentic analytics and an Apache Iceberg-powered lakehouse. <a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=04-22-2026&amp;utm_content=alexmerced">Start your free trial</a></p><p><strong>Learn Agentic AI with Data</strong> &#8212; Dremio&#8217;s agentic analytics features let your AI agents query and act on live data. <a href="https://www.dremio.com/use-cases/agentic-ai/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=04-22-2026&amp;utm_content=alexmerced">Explore Dremio Agentic AI</a></p><p><strong>Join the Community</strong> &#8212; Connect with data engineers and AI practitioners building on open standards. 
<a href="https://developer.dremio.com/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=04-22-2026&amp;utm_content=alexmerced">Join the Dremio Developer Community</a></p><p><strong>Book: The 2026 Guide to AI-Assisted Development</strong> &#8212; Covers prompt engineering, agent workflows, MCP, evaluation, security, and career paths. <a href="https://www.amazon.com/2026-Guide-AI-Assisted-Development-Engineering-ebook/dp/B0GQW7CTML/">Get it on Amazon</a></p><p><strong>Book: Using AI Agents for Data Engineering and Data Analysis</strong> &#8212; A practical guide to Claude Code, Google Antigravity, OpenAI Codex, and more. <a href="https://www.amazon.com/Using-Agents-Data-Engineering-Analysis-ebook/dp/B0GR6PYJT9/">Get it on Amazon</a></p>]]></content:encoded></item><item><title><![CDATA[What "Apache Iceberg Native" Actually Means]]></title><description><![CDATA[Every major data platform now lists Apache Iceberg somewhere on its feature page.]]></description><link>https://amdatalakehouse.substack.com/p/what-apache-iceberg-native-actually</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/what-apache-iceberg-native-actually</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Thu, 23 Apr 2026 06:39:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ep2P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ep2P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Ep2P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Ep2P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Ep2P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Ep2P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ep2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:425877,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/195208728?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ep2P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Ep2P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Ep2P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Ep2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d08be1-be79-49ac-8420-88b327952dff_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>Every major data platform now lists <strong><a href="https://iceberg.apache.org/">Apache Iceberg</a></strong> somewhere on its feature page. Snowflake has Iceberg Tables. Databricks has UniForm. BigQuery has BigLake. This is a genuinely good thing for the ecosystem because it gives users more choice and more portability.</p><p>But &#8220;supports Iceberg&#8221; and &#8220;Iceberg native&#8221; are not the same thing. The distinction matters when you are deciding where your data gravity should live, meaning where your primary analytical data is stored, optimized, and governed. A platform that bolts Iceberg support onto a proprietary core makes different engineering tradeoffs than one built on Iceberg from the ground up.</p><p>Dremio claims to be Iceberg native. Here are the three things that back up that claim.</p><h3><strong>Iceberg by Default, Not by Request</strong></h3><p>There is no &#8220;Dremio Table&#8221; format. When you create a table in Dremio, it is an Apache Iceberg table. There is no proprietary alternative, no internal format that gets better performance, no checkbox to opt into openness. Iceberg is the only option because it is the foundation of the platform.</p><p>This is a bigger deal than it sounds. Engineering priority follows the default format. When a platform&#8217;s default is a proprietary format, that is where the optimization budget goes: faster writes, smarter compaction, deeper query optimizer integration. 
Iceberg support becomes a secondary project staffed by a smaller team, maintained to a &#8220;good enough&#8221; standard.</p><p>Dremio&#8217;s engineers spend their time on one thing: making the <strong><a href="https://www.dremio.com/platform/sql-query-engine/">Dremio query engine</a></strong> faster on Apache Iceberg tables and operations. Every query optimizer improvement, every table management feature, every caching enhancement targets Iceberg directly. There is no internal format competing for attention.</p><p>The practical result: you don&#8217;t face a choice between &#8220;fast but proprietary&#8221; and &#8220;open but slower.&#8221; In Dremio, Iceberg IS the fast format.</p><h3><strong>Iceberg-Native Acceleration</strong></h3><p>Speed is where the &#8220;supports Iceberg&#8221; vs. &#8220;Iceberg native&#8221; distinction gets concrete.</p><p>Most platforms that added Iceberg support did not build their query engines for it. Their engines are tuned for their proprietary format. So when customers ask how to make Iceberg queries faster, the answer is often: create materialized views stored in the platform&#8217;s proprietary format, in the platform&#8217;s managed storage.</p><p>Think about what that means. The whole point of adopting Iceberg was to avoid writing data into proprietary formats. 
But the performance optimization path pushes you right back into one.</p><p>Dremio does the opposite.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VFKn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VFKn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!VFKn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!VFKn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!VFKn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VFKn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png" width="1000" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!VFKn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!VFKn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!VFKn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!VFKn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e021c98-02a2-49f2-8632-d44b59153422_1000x1000.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><h3><strong>Reflections: Iceberg as the Faster Format</strong></h3><p>Dremio&#8217;s <strong><a href="https://www.dremio.com/platform/sql-query-engine/">Reflections</a></strong> are pre-computed, optimized materializations stored as Apache Iceberg tables. When you query a federated source like PostgreSQL or MongoDB through Dremio, the engine can create a Reflection in Iceberg format that serves future queries faster.</p><p>Read that again: Dremio uses Iceberg to speed up non-Iceberg sources. The industry pattern is &#8220;use a proprietary format to speed up Iceberg.&#8221; Dremio flips it: &#8220;use Iceberg to speed up everything.&#8221;</p><p>Autonomous Reflections take this further. Dremio analyzes your query patterns over a 7-day window and automatically creates, manages, and drops Reflections without human intervention. 
Reflections on Iceberg tables support incremental updates, so only changed data gets reprocessed. No full rebuilds.</p><h3><strong>Why the Engine Is Fast on Iceberg Natively</strong></h3><p>Dremio&#8217;s speed on Iceberg is not just about caching tricks. It is architectural:</p><ul><li><p><strong>Apache Arrow as native in-memory format</strong>: Dremio co-created Apache Arrow. Data stays in columnar Arrow format in memory, which eliminates the serialization overhead that traditional engines pay when converting between formats.</p></li><li><p><strong>Vectorized Parquet reader</strong>: Maximizes parallelism when reading Iceberg&#8217;s underlying Parquet files.</p></li><li><p><strong>Columnar Cloud Cache (C3)</strong>: Stores frequently accessed data on local NVMe drives at executor nodes. Cloud storage latency becomes local-disk speed.</p></li><li><p><strong>Results and Query Plan Cache</strong>: Identical queries return instantly.</p></li></ul><p>The combination means Dremio doesn&#8217;t need a proprietary format to deliver fast analytics. Iceberg plus Arrow plus aggressive caching is the performance strategy.</p><h3><strong>Operationalizing Iceberg</strong></h3><p>Apache Iceberg is a table format specification. It is not, by itself, a managed analytics experience. The gap between &#8220;I have Iceberg tables&#8221; and &#8220;I have a production lakehouse&#8221; is filled with operational work that never ends: choosing a catalog, configuring compaction jobs, monitoring small file accumulation, managing schema evolution, vacuuming obsolete snapshots, and building alerting for all of it.</p><p>Dremio closes this gap so that working with Iceberg feels like working with a traditional data warehouse: you onboard datasets, write SQL, use AI agents, and run analytics. 
The open format stays open underneath.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xAjs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xAjs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!xAjs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!xAjs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!xAjs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xAjs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png" width="1000" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!xAjs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!xAjs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!xAjs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!xAjs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb83536-d6e7-4bd1-8154-09cce33da27f_1000x1000.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><h3><strong>Built-In Apache Polaris Catalog</strong></h3><p>Dremio&#8217;s <strong><a href="https://www.dremio.com/platform/enterprise-data-catalog/">Open Catalog</a></strong> is built on Apache Polaris, an open source implementation of the Iceberg REST catalog specification. It tracks and governs your lakehouse assets with Role-Based Access Control (RBAC) at the folder, dataset, and column level, plus Fine-Grained Access Control (FGAC) via UDFs for row-level security and column-level masking.</p><p>Dremio can also connect to your external Iceberg catalogs (AWS Glue, Unity Catalog, and others) alongside the built-in one, so you govern everything from a single namespace.</p><h3><strong>Iceberg V3 Spec Support</strong></h3><p>Dremio tracks the latest <strong><a href="https://www.dremio.com/platform/apache-iceberg/">Apache Iceberg</a></strong> spec features, including the Variant type for semi-structured data and the geospatial types. 
Being Iceberg native means adopting new spec capabilities as they land, not waiting for them to be translated through a compatibility layer.</p><h3><strong>Automated Table Maintenance</strong></h3><p>Background jobs handle the operational work that makes DIY lakehouses &#8220;perpetually a work-in-progress&#8221;:</p><ul><li><p><strong>Compaction</strong>: Consolidates small files into optimally sized files.</p></li><li><p><strong>Clustering</strong>: Sorts data by frequently filtered columns for faster queries.</p></li><li><p><strong>Snapshot management</strong>: Vacuums obsolete snapshots and orphan files to control storage costs.</p></li><li><p><strong>Manifest rewriting</strong>: Optimizes metadata structure for faster query planning.</p></li></ul><p>These jobs run based on actual table usage patterns. No cron jobs. No custom scripts. No on-call rotations for your data lake.</p><h3><strong>The Tradeoffs</strong></h3><p>Being Iceberg native is a deliberate architectural choice, and it comes with boundaries.</p><p>Dremio is an analytical engine. OLTP workloads (high-frequency transactional writes) belong in purpose-built databases like PostgreSQL or MongoDB. You can query those systems through Dremio&#8217;s federation, but Dremio should not replace them.</p><p>Dremio&#8217;s opinionated commitment to Iceberg also means it is not the right fit if your organization&#8217;s strategy centers on a different format. If Delta Lake is your standard and you have no plans to move, a Delta-native platform is the better match, although Dremio can still read UniForm tables in Unity Catalog for data unification use cases.</p><p>And Iceberg itself is still evolving. V3 features like Variant and Geospatial are new, and the tooling ecosystem around them is maturing. Being on the leading edge of the spec means occasional rough edges.</p><h3><strong>Where Your Data Gravity Should Live</strong></h3><p>It is a great thing that so many platforms now support Apache Iceberg. More support means more flexibility for everyone. 
But if your intention is to make Iceberg your primary analytics format, then &#8220;supports Iceberg&#8221; and &#8220;built for Iceberg&#8221; lead to very different outcomes.</p><p>A platform where Iceberg is the default format, where acceleration stays in Iceberg, and where the operational complexity of running a lakehouse is abstracted away is a platform built for that Iceberg-native use case. That is what Dremio is designed to be.</p><p><strong><a href="https://www.dremio.com/get-started">Try Dremio Cloud free for 30 days</a></strong> and see how an Iceberg-native platform handles the workloads you are running today.</p><h3><strong>FREE Books</strong></h3><ul><li><p><strong><a href="https://drmevn.fyi/linkpageiceberg">FREE - Apache Iceberg: The Definitive Guide</a></strong></p></li><li><p><strong><a href="https://drmevn.fyi/linkpagepolaris">FREE - Apache Polaris: The Definitive Guide</a></strong></p></li><li><p><strong><a href="https://hello.dremio.com/wp-resources-agentic-ai-for-dummies-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Agentic AI for Dummies</a></strong></p></li><li><p><strong><a href="https://hello.dremio.com/wp-resources-agentic-analytics-guide-reg.html?utm_source=link_page&amp;utm_medium=influencer&amp;utm_campaign=iceberg&amp;utm_term=qr-link-list-04-07-2026&amp;utm_content=alexmerced">FREE - Leverage Federation, The Semantic Layer and the Lakehouse for Agentic AI</a></strong></p></li><li><p><strong><a href="https://forms.gle/xdsun6JiRvFY9rB36">FREE with Survey - Understanding and Getting Hands-on with Apache Iceberg in 100 Pages</a></strong></p></li></ul>]]></content:encoded></item><item><title><![CDATA[What is Apache Iceberg? 
The Table Format Revolution]]></title><description><![CDATA[Read the complete Open Source and the Lakehouse series:]]></description><link>https://amdatalakehouse.substack.com/p/what-is-apache-iceberg-the-table</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/what-is-apache-iceberg-the-table</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Tue, 21 Apr 2026 15:02:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CvMJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CvMJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CvMJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!CvMJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!CvMJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CvMJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CvMJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:833693,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/194170148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CvMJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!CvMJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 848w, 
https://substackcdn.com/image/fetch/$s_!CvMJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!CvMJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30cc2d56-2fe2-4896-ad82-03293685c2d7_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Read the complete Open Source and the Lakehouse series:</em></p><ul><li><p><a 
href="https://datalakehousehub.com/blog/2026-04-apache-software-foundation/">Part 1: Apache Software Foundation</a></p></li><li><p><a href="https://datalakehousehub.com/blog/2026-04-apache-parquet/">Part 2: What is Apache Parquet?</a></p></li><li><p><a href="https://datalakehousehub.com/blog/2026-04-apache-iceberg/">Part 3: What is Apache Iceberg?</a></p></li><li><p><a href="https://datalakehousehub.com/blog/2026-04-apache-polaris/">Part 4: What is Apache Polaris?</a></p></li><li><p><a href="https://datalakehousehub.com/blog/2026-04-apache-arrow/">Part 5: What is Apache Arrow?</a></p></li><li><p><a href="https://datalakehousehub.com/blog/2026-04-assembling-apache-lakehouse/">Part 6: Assembling the Apache Lakehouse</a></p></li><li><p><a href="https://datalakehousehub.com/blog/2026-04-agentic-analytics/">Part 7: Agentic Analytics on the Apache Lakehouse</a></p></li></ul><p>If you drop ten thousand Parquet files into an S3 bucket, you have a data swamp. You do not have a database. To run SQL queries against those files safely, your engine needs to know exactly which files belong to which table, what the columns are, and which files to ignore. Historically, Apache Hive solved this by tracking directories. Apache Iceberg solves this by tracking files.</p><p>That shift from directory-listing to file-level metadata fundamentally changes how organizations scale analytics. Iceberg brings the reliability of a transactional database to cloud object storage.</p><h2><strong>The Directory Listing Bottleneck</strong></h2><p>Legacy data architectures treated cloud storage like a local hard drive. If an engine like Hive wanted to read a table, it asked the cloud provider to list all the files inside a specific directory.</p><p>Listing millions of files in Amazon S3 or Google Cloud Storage is slow, because object stores return listings in pages (S3 returns at most 1,000 keys per request) and a large table can need thousands of sequential calls. Worse, cloud providers aggressively throttle high-frequency listing requests. 
When concurrent writers update a heavily partitioned Hive table, metadata synchronization operations can cause readers to see inconsistent, partial data. Scaling meant hitting a hard wall.</p><p>Iceberg architects recognized that the file system is the wrong place to store database state. They moved the state into a dedicated metadata tree.</p><h2><strong>The Iceberg Metadata Tree Architecture</strong></h2><p>When an engine queries an Iceberg table, it never asks S3 to list directories. Resolving the current table state is a single pointer lookup, and file discovery becomes a read of metadata files rather than a directory listing. The architecture works through a strict hierarchy of pointers.</p><p>The query begins at the <strong>Catalog</strong>, which holds a single pointer to the current <code>metadata.json</code> file. This ensures atomic commits; whichever engine successfully updates the catalog pointer wins the transaction. The <code>metadata.json</code> tracks the table schema, partition specs, and snapshots, and the current snapshot points to a <strong>Manifest List</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-0i-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-0i-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!-0i-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 848w, 
https://substackcdn.com/image/fetch/$s_!-0i-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!-0i-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-0i-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80b70be6-d535-4106-8119-0e88c7284497_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The Iceberg Metadata Tree showing the path from Catalog down to Data Files&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Iceberg Metadata Tree showing the path from Catalog down to Data Files" title="The Iceberg Metadata Tree showing the path from Catalog down to Data Files" srcset="https://substackcdn.com/image/fetch/$s_!-0i-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 424w, 
https://substackcdn.com/image/fetch/$s_!-0i-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!-0i-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!-0i-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b70be6-d535-4106-8119-0e88c7284497_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Manifest List acts as a table of contents for a specific point in time (a snapshot). It points to multiple <strong>Manifest Files</strong>. Finally, these Manifest Files contain the explicit paths to the individual Parquet data files, along with statistics like minimum and maximum values for every column.</p><p>This strict tree structure means the engine knows exactly which Parquet files it needs to read before touching the raw data.</p><h2><strong>Schema and Partition Evolution</strong></h2><p>Data shapes change. In traditional data lakes, renaming a column or changing a partition strategy required a total table rewrite. Iceberg executes these changes in milliseconds as metadata operations.</p><p>Iceberg achieves Schema Evolution by assigning a unique ID to every column. It tracks schema changes against the ID, not the string name. If you delete a column named <code>user_id</code> and create a new column named <code>user_id</code>, Iceberg knows they are entirely different fields. 
You can add, drop, rename, and reorder columns with zero side effects on existing files.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OTfh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OTfh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!OTfh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!OTfh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!OTfh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OTfh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp" width="800" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Diagram showing Schema Evolution mapping unique column IDs to file structures over time&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing Schema Evolution mapping unique column IDs to file structures over time" title="Diagram showing Schema Evolution mapping unique column IDs to file structures over time" srcset="https://substackcdn.com/image/fetch/$s_!OTfh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!OTfh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!OTfh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!OTfh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7a6f7c-3164-4f8a-a98d-ad8d005fc6a4_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Similarly, Iceberg features &#8220;hidden partitioning&#8221;. Engineers do not have to create physically derived columns just to partition data (e.g., extracting the year from a timestamp). Iceberg tracks the partition logic entirely in metadata. If you decide to change a table from monthly partitioning to daily partitioning, old data remains partitioned by month, and new data partitions by day. 
The engine handles the difference transparently.</p><h2><strong>Time Travel and Atomic Snapshots</strong></h2><p>Because Iceberg uses a tree of files where data is never updated in place, every write operation creates a brand new, immutable snapshot of the table.</p><p>When you run an <code>UPDATE</code> statement, Iceberg writes a new Parquet file containing the updated records, creates a new Manifest pointing to the new data, and generates a new Manifest List. The previous snapshot remains completely intact.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gE2Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gE2Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!gE2Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!gE2Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!gE2Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gE2Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp" width="800" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Diagram showing Time Travel snapshots pointing an overlapping set of underlying Parquet files&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing Time Travel snapshots pointing an overlapping set of underlying Parquet files" title="Diagram showing Time Travel snapshots pointing an overlapping set of underlying Parquet files" srcset="https://substackcdn.com/image/fetch/$s_!gE2Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 424w, https://substackcdn.com/image/fetch/$s_!gE2Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 848w, https://substackcdn.com/image/fetch/$s_!gE2Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!gE2Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85de03b-f28b-4763-bb02-b16e4392fc90_800x800.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This architecture unlocks Time Travel. Analysts can add a time-travel clause to their SQL queries to read previous table states; the exact syntax depends on the engine (for example, <code>FOR SYSTEM_TIME AS OF</code> in some engines and <code>TIMESTAMP AS OF</code> in Spark SQL). If a faulty pipeline writes bad data, you do not need to rebuild the table from backups. You simply roll back the catalog pointer to the previous, healthy snapshot. 
Time travel does not duplicate data; the metadata simply points back to the underlying files that were valid at that exact moment.</p><h2><strong>Scaling the Open Source Lakehouse</strong></h2><p>Apache Iceberg provides the structure necessary to treat raw Parquet files like high-performance relational tables. However, a table format alone is incomplete. You need a centralized catalog mechanism to manage the root pointers, enforce security access, and resolve interoperability between multiple query engines.</p><p>That requirement leads directly to Apache Polaris, the open catalog standard designed to unify the Iceberg ecosystem.</p><p>Dremio executes natively against Iceberg tables, managing the metadata optimization lifecycle automatically. To see Iceberg transactions and time travel in action without building infrastructure, <a href="https://www.dremio.com/get-started">try Dremio Cloud free for 30 days</a>.</p>]]></content:encoded></item><item><title><![CDATA[Data Vault Modeling: Hubs, Links, and Satellites]]></title><description><![CDATA[Dimensional modeling works well when your source systems are stable and your business questions are predictable.]]></description><link>https://amdatalakehouse.substack.com/p/data-vault-modeling-hubs-links-and</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/data-vault-modeling-hubs-links-and</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Mon, 20 Apr 2026 13:59:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-wEz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!-wEz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-wEz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!-wEz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!-wEz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!-wEz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-wEz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png" width="1456" height="637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:928345,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/189062893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-wEz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!-wEz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!-wEz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!-wEz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff02b30-c7fb-4bad-909c-4555b8805e6b_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qe_N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qe_N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 424w, 
https://substackcdn.com/image/fetch/$s_!qe_N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!qe_N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!qe_N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qe_N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Data Vault model showing Hubs, Links, and Satellites as interconnected components&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Data Vault model showing Hubs, Links, and Satellites as interconnected components" title="Data Vault model showing Hubs, Links, and Satellites as interconnected components" 
srcset="https://substackcdn.com/image/fetch/$s_!qe_N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!qe_N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!qe_N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!qe_N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602de556-53d2-4d42-ba37-dbbed22a800a_640x640.webp 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Dimensional modeling works well when your source systems are stable and your business questions are predictable. But what happens when sources change constantly, new systems get added every quarter, and regulatory requirements demand a full audit trail of every attribute change?</p><p>Data Vault modeling was designed for exactly this scenario. Created by Dan Linstedt, it separates data into three distinct table types &#8212; Hubs, Links, and Satellites &#8212; each handling a different concern: identity, relationships, and descriptive context.</p><h2><strong>What Problem Data Vault Solves</strong></h2><p>Traditional dimensional models embed everything about a business entity in one dimension table. A <code>dim_customers</code> table contains the customer ID, name, address, segment, acquisition channel, and lifetime value. When a new source system provides additional customer attributes, you add columns to <code>dim_customers</code>. When business rules change how &#8220;segment&#8221; is calculated, you update the ETL pipeline that populates that table.</p><p>Over time, these dimension tables become fragile. They depend on multiple source systems. A change in one source breaks the ETL. 
Schema changes require coordinated updates across pipelines, tables, and downstream reports.</p><p>Data Vault solves this by decomposing entities into independent components that can evolve separately.</p><h2><strong>The Three Building Blocks</strong></h2><h3><strong>Hubs: Business Identity</strong></h3><p>A Hub stores unique business keys &#8212; the identifiers that define a business entity regardless of which source system provides them.</p><pre><code><code>CREATE TABLE hub_customer (
    customer_hash_key BINARY(32),  -- Hash of the business key
    customer_id VARCHAR(50),        -- Natural business key
    load_date TIMESTAMP,            -- When the key was first loaded
    record_source VARCHAR(100)      -- Source system that supplied the key
);
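-- Illustrative sketch, not from the original post: an insert-only Hub load.
-- Assumes a hypothetical staging table stg_customers that already carries
-- customer_hash_key, customer_id, and record_source.
INSERT INTO hub_customer (customer_hash_key, customer_id, load_date, record_source)
SELECT
    s.customer_hash_key,
    s.customer_id,
    CURRENT_TIMESTAMP,
    MIN(s.record_source)            -- One row per key, even with several sources
FROM stg_customers s
WHERE NOT EXISTS (                  -- Skip keys the Hub already holds
    SELECT 1
    FROM hub_customer h
    WHERE h.customer_hash_key = s.customer_hash_key
)
GROUP BY s.customer_hash_key, s.customer_id;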
</code></code></pre><p>Hubs are immutable. Once a business key is loaded, it never changes. A customer who has <code>customer_id = 'C-1042'</code> always has that key. Hubs answer the question: <em>What business concepts exist?</em></p><h3><strong>Links: Relationships</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t27H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t27H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!t27H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!t27H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!t27H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t27H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp" width="640" height="640" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Hubs connected by Link tables representing relationships between business entities&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hubs connected by Link tables representing relationships between business entities" title="Hubs connected by Link tables representing relationships between business entities" srcset="https://substackcdn.com/image/fetch/$s_!t27H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!t27H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!t27H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!t27H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F390c000f-ead3-46bd-ba67-ea1b51a42cc5_640x640.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A Link stores relationships between Hubs. Every relationship &#8212; customer-to-order, order-to-product, employee-to-department &#8212; gets its own Link table.</p><pre><code><code>CREATE TABLE link_customer_order (
    link_hash_key BINARY(32),      -- Hash of the combined business keys
    customer_hash_key BINARY(32),  -- References hub_customer
    order_hash_key BINARY(32),     -- References the order Hub
    load_date TIMESTAMP,
    record_source VARCHAR(100)
);
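-- Illustrative sketch, not from the original post: listing each customer's
-- orders by joining Hub to Link to Hub. The table hub_order, with columns
-- order_hash_key and order_id, is a hypothetical Hub assumed for this example.
SELECT
    hc.customer_id,
    ho.order_id
FROM hub_customer hc
JOIN link_customer_order l ON l.customer_hash_key = hc.customer_hash_key
JOIN hub_order ho ON ho.order_hash_key = l.order_hash_key;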
</code></code></pre><p>Links are also immutable. Once a relationship is recorded, it stays. Links support many-to-many relationships by default. They answer the question: <em>How are business concepts related?</em></p><h3><strong>Satellites: Descriptive Context</strong></h3><p>Satellites store the descriptive attributes of a Hub or Link, along with their change history.</p><pre><code><code>CREATE TABLE sat_customer_details (
    customer_hash_key BINARY(32),
    effective_date TIMESTAMP,      -- When this version of the attributes took effect
    customer_name VARCHAR(200),
    email VARCHAR(200),
    city VARCHAR(100),
    segment VARCHAR(50),
    load_date TIMESTAMP,
    record_source VARCHAR(100)
);
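-- Illustrative sketch, not from the original post: append a new Satellite row
-- only when the staged attributes differ from the latest stored row.
-- Assumes a hypothetical staging table stg_customers with matching columns
-- and non-NULL attributes; NULLs would need a NULL-safe comparison.
INSERT INTO sat_customer_details (
    customer_hash_key, effective_date, customer_name, email,
    city, segment, load_date, record_source
)
SELECT
    s.customer_hash_key, CURRENT_TIMESTAMP, s.customer_name, s.email,
    s.city, s.segment, CURRENT_TIMESTAMP, s.record_source
FROM stg_customers s
WHERE NOT EXISTS (
    SELECT 1
    FROM sat_customer_details cur
    WHERE cur.customer_hash_key = s.customer_hash_key
      AND cur.effective_date = (    -- Compare against the latest row only
          SELECT MAX(x.effective_date)
          FROM sat_customer_details x
          WHERE x.customer_hash_key = s.customer_hash_key
      )
      AND cur.customer_name = s.customer_name
      AND cur.email = s.email
      AND cur.city = s.city
      AND cur.segment = s.segment
);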
</code></code></pre><p>Every time an attribute changes, a new Satellite row is inserted. This is equivalent to SCD Type 2 &#8212; full history is preserved without modifying existing rows. Different source systems can feed different Satellites for the same Hub, allowing attributes to arrive independently.</p><h2><strong>How a Data Vault Query Works</strong></h2><p>To reconstruct a business entity (like a current customer profile), you join the Hub to its current Satellite rows:</p><pre><code><code>SELECT
    h.customer_id,
    s.customer_name,
    s.email,
    s.city,
    s.segment
FROM hub_customer h
JOIN sat_customer_details s ON h.customer_hash_key = s.customer_hash_key
-- Keep only the most recent Satellite row for each customer
WHERE s.effective_date = (
    SELECT MAX(effective_date)
    FROM sat_customer_details s2
    WHERE s2.customer_hash_key = s.customer_hash_key
);
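-- Illustrative sketch, not from the original post: the same lookup wrapped in
-- a view so BI tools can read it like a dimension table. The view name
-- dim_customers_current is hypothetical, and ROW_NUMBER() assumes an engine
-- with window-function support.
CREATE VIEW dim_customers_current AS
SELECT customer_id, customer_name, email, city, segment
FROM (
    SELECT
        h.customer_id,
        s.customer_name,
        s.email,
        s.city,
        s.segment,
        ROW_NUMBER() OVER (
            PARTITION BY s.customer_hash_key
            ORDER BY s.effective_date DESC
        ) AS rn                     -- 1 marks the newest row per customer
    FROM hub_customer h
    JOIN sat_customer_details s ON h.customer_hash_key = s.customer_hash_key
) latest
WHERE rn = 1;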
</code></code></pre><p>This is more complex than querying <code>dim_customers</code> directly. That complexity is the primary criticism of Data Vault. In practice, teams build a presentation layer &#8212; star schema views on top of the vault &#8212; for business users and BI tools.</p><p>Platforms like <a href="https://www.dremio.com/blog/agentic-analytics-semantic-layer/?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Dremio</a> make this practical. The raw vault tables live in the Bronze layer. Silver-layer views reconstruct business entities by joining Hubs, Links, and Satellites. Gold-layer views present dimensional star schemas for dashboards and AI agents. Users never query the vault tables directly.</p><h2><strong>When Data Vault Fits</strong></h2><p><strong>Multiple source systems that change frequently.</strong> Adding a new source means adding new Satellites &#8212; not redesigning existing tables. The Hub and Link structure remains stable.</p><p><strong>Regulated industries requiring full audit trails.</strong> Financial services, healthcare, and government often need to prove what data looked like at any point in time. Satellites provide that out of the box.</p><p><strong>Large enterprises with parallel development teams.</strong> Hubs, Links, and Satellites can be loaded independently, enabling parallel ETL development without pipeline conflicts.</p><p><strong>Long-term data warehouses with decades of history.</strong> The separation of structure (Hubs, Links) from content (Satellites) makes the vault resilient to business changes over time.</p><h2><strong>When Data Vault Doesn&#8217;t Fit</strong></h2><p><strong>Small teams or simple source environments.</strong> If you have five source tables and one BI tool, Data Vault adds complexity without proportional benefit. 
A star schema is faster to build and easier to maintain.</p><p><strong>Direct BI tool access.</strong> BI tools don&#8217;t speak Data Vault natively. You always need a presentation layer on top, which means building two models instead of one.</p><p><strong>Speed-to-value projects.</strong> When the goal is &#8220;get a dashboard live this sprint,&#8221; Data Vault&#8217;s up-front design work slows you down.</p><table><thead><tr><th>Factor</th><th>Data Vault</th><th>Dimensional Model</th></tr></thead><tbody><tr><td>Source flexibility</td><td>High</td><td>Moderate</td></tr><tr><td>Audit trail</td><td>Built-in</td><td>Optional (SCDs)</td></tr><tr><td>Query simplicity</td><td>Low (needs presentation layer)</td><td>High</td></tr><tr><td>Learning curve</td><td>High</td><td>Moderate</td></tr><tr><td>Adding new sources</td><td>Easy (new satellites)</td><td>Harder (redesign dimensions)</td></tr><tr><td>BI tool compatibility</td><td>Low</td><td>High</td></tr></tbody></table><h2><strong>What to Do Next</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8KCy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8KCy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!8KCy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 848w, https://substackcdn.com/image/fetch/$s_!8KCy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!8KCy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8KCy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Presentation layer of star schema views built on top of a Data Vault foundation&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Presentation layer of star schema views built on top of a Data Vault foundation" title="Presentation layer of star schema views built on top of a Data Vault foundation" srcset="https://substackcdn.com/image/fetch/$s_!8KCy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 424w, https://substackcdn.com/image/fetch/$s_!8KCy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 848w, 
https://substackcdn.com/image/fetch/$s_!8KCy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 1272w, https://substackcdn.com/image/fetch/$s_!8KCy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f687c00-4a7a-41fe-af75-a9f6fe2de8a0_640x640.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If you&#8217;re evaluating Data Vault, start by counting your source systems and estimating how often they change schema. 
If the answer is &#8220;more than five sources&#8221; and &#8220;at least once a quarter,&#8221; Data Vault&#8217;s separation of concerns will likely save you from painful redesign cycles. If your environment is simpler than that, a well-designed dimensional model will get you to production faster.</p><p><a href="https://www.dremio.com/get-started?utm_source=ev_buffer&amp;utm_medium=influencer&amp;utm_campaign=next-gen-dremio&amp;utm_term=blog-021826-02-18-2026&amp;utm_content=alexmerced">Try Dremio Cloud free for 30 days</a></p>]]></content:encoded></item><item><title><![CDATA[Apache Data Lakehouse Weekly: April 9–15, 2026]]></title><description><![CDATA[The Iceberg Summit wrapped in San Francisco, leaving behind a set of in-person alignments that are now surfacing as concrete proposals on the dev lists.]]></description><link>https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-07a</link><guid isPermaLink="false">https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april-07a</guid><dc:creator><![CDATA[Alex Merced]]></dc:creator><pubDate>Thu, 16 Apr 2026 16:01:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dGib!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dGib!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!dGib!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!dGib!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!dGib!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!dGib!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dGib!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png" width="1456" height="637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:795625,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://amdatalakehouse.substack.com/i/194338095?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dGib!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 424w, https://substackcdn.com/image/fetch/$s_!dGib!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 848w, https://substackcdn.com/image/fetch/$s_!dGib!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 1272w, https://substackcdn.com/image/fetch/$s_!dGib!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49766253-7459-44b6-9d0f-2be826b84e57_1536x672.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The Iceberg Summit wrapped in San Francisco, leaving behind a set of in-person alignments that are now surfacing as concrete proposals on the dev lists. Parquet&#8217;s ALP encoding vote closed, Polaris 1.4.0 planning accelerated, and Arrow&#8217;s engineering community tackled two interlinked decisions about its future Java baseline and AI tooling policy. The post-summit week is when talk becomes code.</p><h2><strong>Apache Iceberg</strong></h2><p>The two days in San Francisco established alignment on the discussions that have dominated the dev list all spring. The <a href="http://www.mail-archive.com/dev@iceberg.apache.org/msg12699.html">V4 metadata.json optionality thread</a> drew the largest in-person audience of any design session, with Anton Okolnychyi, Yufei Gu, Shawn Chang, and Steven Wu working through the portability and static-table implications of making the root JSON file optional when a catalog manages metadata state. The direction that emerged favors catalog-managed metadata as a first-class supported mode, with portability guarantees preserved through explicit opt-in semantics rather than the current default assumption.</p><p>The <a href="http://www.mail-archive.com/dev@iceberg.apache.org/msg12574.html">one-file commits design</a> &#8212; the work Russell Spitzer and Amogh Jahagirdar have been advancing through multiple proposals &#8212; is heading toward a formal spec write-up following alignment reached at the summit. 
The approach replaces manifest lists with root manifests and uses manifest delete vectors to enable single-file commits, promising dramatic reductions in commit latency and metadata storage footprint. This is one of the most consequential V4 changes for high-frequency write workloads, and the in-person sessions cleared the remaining design disagreements about inline versus external manifest delete vectors.</p><p>P&#233;ter V&#225;ry&#8217;s <a href="http://www.mail-archive.com/dev@iceberg.apache.org/msg12958.html">efficient column updates proposal</a> for AI and ML workloads drew real interest at the summit. The design targets wide tables where only a subset of columns changes on each write &#8212; embedding vectors, model scores, feature values &#8212; allowing Iceberg to write only the updated columns to separate files and merge at read time. For teams managing petabyte-scale feature stores, the I/O savings are significant. P&#233;ter indicated that a formal proposal with POC benchmarks would land on the dev list in the days following the summit.</p><p>The AI contribution policy that pulled in Holden Karau, Kevin Liu, Steve Loughran, and Sung Yun over the preceding weeks moved toward practical resolution. The summit provided the in-person clarity that async debate rarely does, and a working policy covering disclosure requirements and code provenance standards for AI-generated contributions is expected to be published on the dev list this week.</p><h2><strong>Apache Polaris</strong></h2><p>Polaris is nearly two months past its February 18 graduation as a top-level Apache project, and the governance machinery is running. Jean-Baptiste Onofr&#233;&#8217;s <a href="http://www.mail-archive.com/general@incubator.apache.org/msg86108.html">first board report as a TLP</a> covers the March 26 ASF board meeting, documenting community health, development progress, and strategic direction under Polaris&#8217;s own PMC. 
JB also <a href="https://www.globenewswire.com/news-release/2026/04/06/3268593/0/en/Dremio-Deepens-Apache-Iceberg-Leadership-with-V3-Support-New-Community-Appointments-and-Polaris-Momentum">joined the Apache Software Foundation board itself</a> as a Dremio-nominated director, a governance milestone that deepens the open-source commitment across the entire ecosystem.</p><p>The <a href="http://www.mail-archive.com/dev@ranger.apache.org/msg39491.html">Apache Ranger authorization RFC from Selvamohan Neethiraj</a> remained the most active technical discussion thread. The design allows organizations running Ranger alongside Hive, Spark, and Trino to manage Polaris security within a unified governance framework, eliminating the policy duplication that arises when teams bolt separate authorization systems onto each engine. The plugin is opt-in and backward compatible with Polaris&#8217;s existing internal authorization layer, a design choice that lowers the enterprise adoption barrier considerably.</p><p>The 1.4.0 release &#8212; Polaris&#8217;s first as a graduated project &#8212; is now in active scope finalization. Credential vending for Azure and Google Cloud Storage is the headline feature, alongside catalog federation design that lets Polaris front for multiple catalog backends in multi-cloud deployments. With incubator overhead behind it, release velocity is expected to accelerate. Watch the dev list this week for a 1.4.0 milestone thread and vote timeline.</p><h2><strong>Apache Arrow</strong></h2><p>Jean-Baptiste Onofr&#233;&#8217;s thread proposing JDK 17 as the minimum version for Arrow Java 20.0.0 is approaching decision. 
<a href="https://amdatalakehouse.substack.com/p/apache-data-lakehouse-weekly-april">Contributors including Micah Kornfield and Antoine Pitrou have been weighing in</a>, and the practical rationale is compelling: setting JDK 17 as the floor would align Arrow&#8217;s Java modernization with Iceberg&#8217;s own upgrade timeline, effectively raising the minimum across the entire lakehouse stack in a single coordinated move. The decision is expected to land before the 20.0.0 release cycle formally opens.</p><p>The <a href="https://github.com/apache/arrow-rs">arrow-rs 58.2.0 release</a> was on track for April, following the 58.1.0 shipment in March, which arrived with no breaking API changes. The Rust implementation has become one of the most actively maintained segments of the Arrow ecosystem, with a release cadence that matches growing adoption in query engines that want Arrow&#8217;s columnar format without a JVM dependency.</p><p>Nic Crane&#8217;s thread on using LLMs for Arrow project maintenance continued to generate thoughtful discussion. The framing &#8212; AI as a resource for maintainers rather than just contributors &#8212; is distinct from how Iceberg and Polaris are approaching the same question. Arrow&#8217;s angle is practical: a lean maintainer group managing a growing issue backlog needs help triaging, and LLMs can do that work without introducing the code-provenance concerns that matter for contributions. Google Summer of Code 2026 student proposals arrived this week, with interest concentrated in compute kernels and language bindings for Go and Swift, adding bandwidth to a project that will need it as the 20.0.0 cycle opens.</p><h2><strong>Apache Parquet</strong></h2><p>The <a href="https://mail-archive.com/dev@parquet.apache.org/">ALP (Adaptive Lossless floating-Point) encoding specification</a> vote closed this week, marking one of the most meaningful additions to the Parquet specification in recent memory. 
ALP, as described in the original paper, converts floating-point values that originated as decimals into integers that can be bit-packed, and stores values that do not round-trip exactly as exceptions, delivering significantly better compression ratios for float-heavy columns. The practical beneficiaries are ML feature stores and scientific computing workloads, where columns full of embedding coordinates and model outputs are common. Months of careful spec review paid off.</p><p>The Variant type that shipped in February has been generating follow-on integration discussion across engine teams. Spark, Trino, and Dremio contributors compared notes on their implementation experiences this week, working through edge cases in semi-structured data handling that the spec leaves partially open. Getting these implementations to converge matters: Parquet&#8217;s value as a cross-engine format depends on consistent behavior, and Variant is novel enough that divergence between engines would fragment the ecosystem.</p><p>The <a href="http://www.mail-archive.com/dev@parquet.apache.org/">File logical type proposal</a> &#8212; which would allow Parquet files to natively embed unstructured data including images, PDFs, and audio as columnar records &#8212; continued advancing through community discussion. Alongside Variant, this proposal signals a deliberate effort to evolve Parquet from a purely analytical format into a unified storage layer capable of managing the diverse data shapes that AI and ML pipelines produce. The direction is ambitious and the community engagement is substantive.</p><h2><strong>Cross-Project Themes</strong></h2><p>The post-summit week is when the conversations that happened in person translate back into the formal proposals and vote threads that actually change the projects. Across all four lists, expect the next two weeks to be among the most active of 2026 as in-person alignments hit the dev list in concrete form.</p><p>The second theme connecting all four projects is the deliberate expansion of format scope to meet AI workload demands. 
Parquet&#8217;s ALP acceptance, the File logical type proposal, Iceberg&#8217;s efficient column updates for wide ML tables, Polaris&#8217;s Ranger integration and federation work, and Arrow&#8217;s JDK 17 modernization are all responses to the same underlying pressure: the lakehouse stack is being asked to power AI pipelines, not just analytical queries. The pace of that evolution is accelerating, and the summit put the community&#8217;s roadmap on the same page.</p><h2><strong>Looking Ahead</strong></h2><p>Watch the Iceberg dev list for the V4 metadata optionality formal proposal, the single-file commits spec write-up, and a published AI contribution policy. The Polaris 1.4.0 milestone thread and vote timeline should also land this week. Arrow&#8217;s JDK 17 decision for Java 20.0.0 will likely follow close behind. The summit session recordings will appear on YouTube in the weeks ahead &#8212; an excellent resource for anyone who missed San Francisco.</p><div><hr></div><h2><strong>Resources &amp; Further Learning</strong></h2><p><strong>Get Started with Dremio</strong></p><ul><li><p><a href="https://www.dremio.com/get-started?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-04-15&amp;utm_content=alexmerced">Try Dremio Free</a> &#8212; Build your lakehouse on Iceberg with a free trial</p></li><li><p><a href="https://www.dremio.com/use-cases/lake-to-iceberg-lakehouse/?utm_source=ev_external_blog&amp;utm_medium=influencer&amp;utm_campaign=pag&amp;utm_term=apache-newsletter-2026-04-15&amp;utm_content=alexmerced">Build a Lakehouse with Iceberg, Parquet, Polaris &amp; Arrow</a> &#8212; Learn how Dremio brings the open lakehouse stack together</p></li></ul><p><strong>Free Downloads</strong></p><ul><li><p><a href="https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html">Apache Iceberg: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li><li><p><a 
href="https://hello.dremio.com/wp-apache-polaris-guide-reg.html">Apache Polaris: The Definitive Guide</a> &#8212; O&#8217;Reilly book, free download</p></li></ul><p><strong>Books by Alex Merced</strong></p><ul><li><p><a href="https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/ref=sr_1_5?crid=1304S78BQAP6U&amp;dib=eyJ2IjoiMSJ9.7Z17wXFJVWtv1gDIVF5-z5NwgT7B-vj9kEQuLkAKtLh00KncwXYc4bQ6hyydwcMHXbJOlFCSO7-2JmKTC5KCV-q2XEdeq7kBBmicVzI6tlDtqPqAgE6RHJE_XZ_n-zxxAjRHE2THP0J4DEgzDmiXrF9bdkEFyaruSUW28Ryx0zYyI_NuD5vZ4HYqQv3u5hzBVjjOlxyRYSTIsRSeVIoJC2XvjrXdNFvQ9jm4Kr1xFOw.yog4MgCdYecbJT0bAcGXNJJvZbvD4F_DP0lDbPA1xGI&amp;dib_tag=se&amp;keywords=alex+merced&amp;qid=1773236747&amp;sprefix=alex+mer%2Caps%2C570&amp;sr=8-5">Architecting an Apache Iceberg Lakehouse</a></p></li><li><p><a href="https://www.amazon.com/Enabling-Agentic-Analytics-Apache-Iceberg-ebook/dp/B0GQXT6W3N/ref=sr_1_7?crid=1304S78BQAP6U&amp;dib=eyJ2IjoiMSJ9.7Z17wXFJVWtv1gDIVF5-z5NwgT7B-vj9kEQuLkAKtLh00KncwXYc4bQ6hyydwcMHXbJOlFCSO7-2JmKTC5KCV-q2XEdeq7kBBmicVzI6tlDtqPqAgE6RHJE_XZ_n-zxxAjRHE2THP0J4DEgzDmiXrF9bdkEFyaruSUW28Ryx0zYyI_NuD5vZ4HYqQv3u5hzBVjjOlxyRYSTIsRSeVIoJC2XvjrXdNFvQ9jm4Kr1xFOw.yog4MgCdYecbJT0bAcGXNJJvZbvD4F_DP0lDbPA1xGI&amp;dib_tag=se&amp;keywords=alex+merced&amp;qid=1773236747&amp;sprefix=alex+mer%2Caps%2C570&amp;sr=8-7">Enabling Agentic Analytics with Apache Iceberg and Dremio</a></p></li><li><p><a href="https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands/dp/B0GQNY21TD/ref=sr_1_9?crid=1304S78BQAP6U&amp;dib=eyJ2IjoiMSJ9.7Z17wXFJVWtv1gDIVF5-z5NwgT7B-vj9kEQuLkAKtLh00KncwXYc4bQ6hyydwcMHXbJOlFCSO7-2JmKTC5KCV-q2XEdeq7kBBmicVzI6tlDtqPqAgE6RHJE_XZ_n-zxxAjRHE2THP0J4DEgzDmiXrF9bdkEFyaruSUW28Ryx0zYyI_NuD5vZ4HYqQv3u5hzBVjjOlxyRYSTIsRSeVIoJC2XvjrXdNFvQ9jm4Kr1xFOw.yog4MgCdYecbJT0bAcGXNJJvZbvD4F_DP0lDbPA1xGI&amp;dib_tag=se&amp;keywords=alex+merced&amp;qid=1773236747&amp;sprefix=alex+mer%2Caps%2C570&amp;sr=8-9">The 2026 Guide to Lakehouses, Apache Iceberg and Agentic 
AI</a></p></li><li><p><a href="https://www.amazon.com/Book-Using-Apache-Iceberg-Python/dp/B0GNZ454FF/ref=sr_1_16?crid=1304S78BQAP6U&amp;dib=eyJ2IjoiMSJ9.7Z17wXFJVWtv1gDIVF5-z5NwgT7B-vj9kEQuLkAKtLh00KncwXYc4bQ6hyydwcMHXbJOlFCSO7-2JmKTC5KCV-q2XEdeq7kBBmicVzI6tlDtqPqAgE6RHJE_XZ_n-zxxAjRHE2THP0J4DEgzDmiXrF9bdkEFyaruSUW28Ryx0zYyI_NuD5vZ4HYqQv3u5hzBVjjOlxyRYSTIsRSeVIoJC2XvjrXdNFvQ9jm4Kr1xFOw.yog4MgCdYecbJT0bAcGXNJJvZbvD4F_DP0lDbPA1xGI&amp;dib_tag=se&amp;keywords=alex+merced&amp;qid=1773236747&amp;sprefix=alex+mer%2Caps%2C570&amp;sr=8-16">The Book on Using Apache Iceberg with Python</a></p></li></ul>]]></content:encoded></item></channel></rss>