The Interchange Problem Is Not New
In the late 1990s, the Data Mining Group (DMG) created PMML — Predictive Model Markup Language — an XML-based standard for representing and exchanging predictive models between analytics platforms. Regression models, neural networks, decision trees — PMML let you export from one tool and import into another. Tools like SAS, SPSS, and early AutoML platforms adopted it successfully in specific niches.
But PMML never achieved universal adoption. The standard was complex, vendor neutrality proved difficult to maintain, and implementations remained fragmented across platforms. It succeeded where the scope was narrow — portable model definitions between compatible tools — and struggled everywhere the scope was broad.
Around the same time, the Semantic Web initiative promised something far more ambitious. Tim Berners-Lee's vision, formalized through W3C standards like RDF, OWL, and SPARQL, aimed to make the entire web machine-readable — data carrying meaning, not just structure. In 2009, at a meetup in San Francisco where we were demonstrating KXEN's AutoML capabilities, I asked researchers what would follow Web 2.0. The answer was unanimous: the Semantic Web — Web 3.0¹.
That grand vision never materialized as planned. Adding semantic metadata was too tedious, adoption incentives were too weak, and the approach was too academic for the web's organic growth. Search engines solved many of the same problems pragmatically through statistical methods and knowledge graphs. But the core insight was right: without shared semantics, machines cannot meaningfully interpret data across systems.
Fast-forward to September 2025: Snowflake convenes the industry around the Open Semantic Interchange (OSI), an open standard for sharing semantic metadata across data platforms. Same fundamental promise, different era. This time, there's a forcing function that PMML and the Semantic Web never had: AI needs semantic consistency to work.
Why This Matters Now: The Semantic Layer as AI Foundation
Organizations today run dozens of analytics tools — BI platforms, data warehouses, AI agents, reporting systems — each with its own interpretation of business concepts. "Revenue" means one thing to Marketing, another to Finance, a third to Sales. Dashboards show conflicting numbers in board meetings. Nobody is wrong, but nobody agrees.
A semantic layer solves this by creating a single governed layer where business metrics are defined once in code and consumed consistently everywhere. Define "revenue" once — including its calculation logic, dimensional relationships, time grain, and access controls — and every tool, every analyst, every AI agent uses the same definition. Pre-aggregation caching delivers sub-second performance. Row-level and column-level security enforce governance at the infrastructure level rather than hoping each tool implements it correctly.
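As a preview of the fuller examples later in this article, here is a minimal, illustrative sketch of what "define revenue once" can look like in declarative, MetricFlow-style YAML (the source model, column names, and the revenue rule are hypothetical):

semantic_models:
  - name: finance_orders               # hypothetical wrapper around a fact table
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: order_date   # the agreed time grain
    entities:
      - name: order_id
        type: primary
    measures:
      - name: revenue
        description: "Net revenue, excluding tax and refunds"   # the one shared definition
        agg: sum
        expr: net_amount
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day

Every downstream consumer (dashboard, notebook, or AI agent) then queries revenue by name instead of re-deriving it.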
Leadership can only make confident, unified decisions when it trusts the data. Universal semantic definitions address this directly — they don't just improve consistency, they close the comprehension gaps that stall initiatives and erode confidence in the numbers.
The arrival of AI makes this exponentially worse. Traditional dashboards present conflicting numbers passively — someone eventually notices the discrepancy. AI agents present conflicting numbers *confidently*, at scale, with explanations that sound authoritative. As Gartner analyst Rita Sallam noted in their 2025 predictions, organizations prioritizing semantic modeling will increase AI accuracy by 80% and reduce costs by 60%. Without semantic grounding, you don't get intelligence — you get confident confusion, automated.
This is the context in which OSI emerges — not as an academic exercise, but as an industry response to a problem that AI has made urgent.
OSI: What It Is, Who's Behind It, and Where It Stands
The Specification
OSI uses a declarative YAML format developed in collaboration with dbt Labs' MetricFlow framework. The spec defines semantic models as top-level containers that include datasets, relationships, measures, dimensions, and — crucially for the AI era — contextual metadata.
Figure 1: OSI flow. Define the semantic layer once in YAML; consume it consistently across tools and AI agents.
Datasets represent logical business entities (fact and dimension tables), with fields, primary keys, and source mappings. Measures define quantitative calculations (sums, averages, ratios) that can span multiple datasets. Dimensions are categorical attributes for slicing data. Relationships specify join logic and cardinality between datasets.
What sets OSI apart from previous standardization efforts is the explicit `ai_context` field — a dedicated section in the YAML where you provide natural language instructions for AI agents consuming the model. This is where you tell an AI agent: "Use this semantic model for retail analytics. It supports time-based analysis, customer segmentation, and product performance." This isn't decorative metadata — it's the bridge between structured definitions and the natural language understanding that LLMs need to query data accurately.
The promise: define once, use everywhere — across BI tools, AI agents, and analytics platforms. Licensed under Apache 2.0, vendor-neutral by design.
A simple semantic model looks like this:
semantic_model:
  - name: retail_model
    description: Retail semantic model for sales analytics
    ai_context:
      instructions: "Use this model for retail analytics. Supports time-based analysis and customer segmentation."
    datasets:
      - name: store_sales
        source: schema.store_sales
        primary_key: [item_id, ticket_number]
    measures:
      - name: order_total
        expression: SUM(store_sales.sale_price)
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
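The simple model above covers datasets, measures, and dimensions but omits relationships. As an illustration of the concept described earlier, a join between the sales fact and a customer dimension might be declared roughly like this (these keys are my paraphrase of the spec summary, not verbatim field names from the v1.0 schema):

relationships:
  - name: sales_to_customer
    from: store_sales                  # fact dataset
    to: customer                       # dimension dataset (hypothetical)
    cardinality: many_to_one
    on: store_sales.customer_id = customer.customer_id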
Important caveat: while OSI's specification defines fields like ai_context, dialect, and synonyms, these are not yet supported in MetricFlow. When I tested them in my stack, dbt returned a parsing error: "Additional properties are not allowed ('ai_context', 'dialect', 'synonyms' were unexpected)." The OSI spec is ahead of the tooling — the vision is defined, but implementations haven't caught up yet.
Timeline
Snowflake deserves significant credit for convening the ecosystem around this initiative. The journey from concept to specification moved fast:
- February 2023 — dbt Labs acquires Transform, bringing MetricFlow into the dbt ecosystem.
- September 23, 2025 — Snowflake launches OSI with founding partners: Salesforce (Tableau), dbt Labs, BlackRock, Alation, Atlan, Cube, Hex, Honeydew, Mistral AI, Omni, RelationalAI, Select Star, Sigma, ThoughtSpot, and others.
- October 14, 2025 — dbt Labs open-sources MetricFlow under Apache 2.0 at Coalesce 2025. This licensing shift matters: MetricFlow had previously been under AGPL and then BSL (Business Source License), both of which restrict commercial use and derivative works. Moving to Apache 2.0 means any vendor can build on MetricFlow without legal constraints — a genuine commitment to OSI being community-owned, not dbt-controlled.
- October 17, 2025 — First working group session at Snowflake's offices in Menlo Park.
- November 2025 — Starburst joins.
- December 2025 — Collibra, DataHub, and Strategy (formerly MicroStrategy) join.
- January 27, 2026 — OSI v1.0 specification released on GitHub. New members include Databricks, AtScale, Qlik, JetBrains, Lightdash, Coalesce, and Credible.
- February 3, 2026 — Collate joins.
The January expansion tells a story of growing industry conviction. Databricks, initially absent, joined alongside the spec release. AtScale's path is particularly instructive — they initially took a distant stance, promoting their own Semantic Modeling Language (SML) as a more comprehensive approach. By January, they joined the initiative. This kind of critical engagement followed by participation signals maturity: vendors joining not out of hype, but after evaluating the specification on its merits and recognizing the value of contributing from inside rather than competing from outside.
What's Ahead
The OSI roadmap anticipates three phases. We're currently in Phase 1 (Q4 2025–Q1 2026): specification finalization, reference implementations, and community governance establishment — largely completed with the January 2026 spec release. Phase 2 (Q2–Q4 2026) targets broader adoption with native support in 50+ platforms, domain-specific extensions, and pilot programs with early adopters. Phase 3 (2027 and beyond) envisions OSI as a de facto industry standard with potential international recognition and a marketplace for shared semantic models.
Notable Absences
Microsoft remains conspicuously absent. Given Power BI's dominant market position — Gartner's Magic Quadrant leader — their non-participation creates a meaningful gap. SAP, IBM, and Oracle — all with deep semantic layer heritage in their BI platforms — are also missing. Among major AI players, only Mistral AI participates; OpenAI, Anthropic, and Google Gemini are not involved.
From Spec to Practice: Testing OSI in My Modern Data Stack
Reading specifications is useful. Running them is better. In this open-source modern data stack project (Trino, dbt, Cube.js, Metabase — all orchestrated via Docker Compose), I had defined metrics in Cube.js — revenue calculations, order counts, dimensional breakdowns. Adding MetricFlow meant defining those same business concepts a second time, in a second syntax. OSI promises to eliminate exactly this kind of duplication.
Integrating dbt MetricFlow required specific infrastructure. A time spine table is mandatory — MetricFlow needs a continuous date table to anchor temporal calculations. Entity-prefixed dimensions follow a distinct naming convention (`order_id__customer_name` rather than just `customer_name`). The YAML syntax is more verbose than Cube.js's JavaScript definitions, but it's declarative and version-controllable.
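For reference, the YAML side of that time spine setup looks roughly like this in recent dbt versions (the exact configuration keys have shifted across releases, so treat this as a sketch rather than the literal config from my repository):

models:
  - name: metricflow_time_spine        # dbt model emitting one row per calendar day
    time_spine:
      standard_granularity_column: date_day
    columns:
      - name: date_day
        granularity: day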
Here's what I defined in MetricFlow for my orders semantic model:
semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: order_date
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
    measures:
      - name: order_total
        agg: sum
        expr: revenue
      - name: order_count
        agg: count
        expr: order_id
      - name: total_revenue
        description: "Sum of order amounts"
        agg: sum
        expr: revenue
        agg_time_dimension: order_date
    dimensions:
      - name: order_date
        type: time
        expr: order_date
        type_params:
          time_granularity: day
I then validated these definitions against the same metrics served through Cube.js. Total revenue: $12,629.50 in both tools.
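One detail worth noting: in MetricFlow, a measure is typically exposed for querying through a metric defined on top of it. A minimal sketch of what that looks like for total revenue (the naming mirrors the measure above; the actual metric definitions in my repository may differ slightly):

metrics:
  - name: total_revenue
    label: Total Revenue
    description: "Sum of order amounts, matching the Cube.js definition"
    type: simple
    type_params:
      measure: total_revenue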


The semantic definitions are consistent. The math checks out. OSI would, in theory, let me define this once and have both tools consume the same model. But here's where my test ended — and this is expected, not a criticism. The v1.0 specification was released on January 27, 2026. Phase 2 adoption and native platform support are roadmapped for Q2–Q4 2026. No vendor has shipped import tooling yet because the specification just became available. The foundation is laid; the interchange tooling comes next.
The Road to Success: Where OSI Must Evolve
The 80/20 Problem
The simple metrics in my demo stack — sums, counts, averages broken down by time and category — work naturally in YAML. They represent perhaps 80% of typical analytics. But the remaining 20% is where enterprises live, and where OSI has significant ground to cover.
Consider Stock Coverage — a metric I implemented for a major company that answers a deceptively simple question: how many months of forecast demand can current inventory cover?
The calculation required joining three separate data domains: inventory from Supply Chain, sales forecasts from Financial Consolidation, and product attributes from a master catalog. It used window functions with variable offsets across a 24-month horizon — computing cumulative forecasts period-by-period, then comparing stock against those thresholds through heavy conditional logic. Some dimensions applied only to inventory (not forecasts), requiring precise per-measure dimensionality control. Attributes classified distribution centers ("Corporate Stock" vs. "Local Stock"), security filters restricted visibility by geographic zone, and missing forecast periods risked silent distortions. Crucially, this metric had to work interactively on dashboards — supporting ad-hoc filters, drill-downs, dynamic time slicing.
The final metric relied on cascading conditionals over the horizon, falling through to an average-based fallback when stock exceeded the full forecast period. Expressing that fully and portably in current YAML-based OSI is challenging.
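To make the gap concrete, here is roughly how far declarative YAML gets before the logic has to leave the metric layer (measure and metric names are illustrative, and cumulative-metric syntax varies across MetricFlow versions):

metrics:
  # Cumulative demand over the horizon fits a cumulative metric.
  - name: cumulative_forecast_24m
    type: cumulative
    type_params:
      measure: forecast_quantity       # illustrative measure name
      window: 24 months
  # A derived metric can combine other metrics with simple arithmetic,
  # but the period-by-period cascade ("how many months until cumulative
  # forecast exceeds current stock, else fall back to an average") does
  # not reduce to one expression like this crude approximation.
  - name: stock_coverage_naive
    type: derived
    type_params:
      expr: stock_on_hand / nullif(average_monthly_forecast, 0)
      metrics:
        - name: stock_on_hand
        - name: average_monthly_forecast

Everything beyond that crude ratio currently ends up back in SQL models or platform-specific extensions.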
This isn't contrived — it's routine in enterprise BI. Window functions, conditional aggregation, cross-domain calculations with varying dimensional semantics, dynamic horizon logic. The OSI spec handles the foundation well, but GitHub community discussions (particularly #29 and #19) already highlight critical gaps: no dataset-level measures, no dimension hierarchies, string-based metric expressions that limit cross-platform compatibility, no distinction between additive and non-additive metrics, and limited lineage tracking for derived calculations.
The Semantic Layer Gap in Today's Market
There's a structural challenge that OSI must navigate — and it starts with the most widely deployed BI platform. Microsoft Power BI, a Leader in Gartner's 2025 Magic Quadrant for Analytics and BI Platforms, has a semantic layer. But business logic — DAX measures, calculations, relationships — lives inside individual semantic models, and that logic isn't modularly shareable across them. In enterprise deployments, the common pattern of one dataset per report creates version control issues, duplicate calculations, inconsistent KPIs, and undermines data trust, as documented in analyses of large-scale implementations.
Beyond these internal challenges, there's the cross-platform reality. As SQLBI notes, Power BI defines business logic through DAX — a proprietary language specific to the Microsoft ecosystem. Gartner's 2025 Magic Quadrant flags Power BI as "Limited to the Azure Stack," and independent analyst Aurimas Račas documented that while Power BI functions as a semantic layer internally, it cannot serve as a universal one for other BI tools — its definitions don't travel outside the Microsoft ecosystem.
This isn't a Power BI problem — it's an industry-wide pattern. Most BI platforms evolved their semantic capabilities as embedded features, not as interoperable infrastructure. In large organizations running multiple platforms, this creates duplication, inconsistency, and technical debt. Standalone semantic layer tools like AtScale, Cube, Strategy's Mosaic and others provide solutions for organizations that need a governed layer on top of or alongside platforms that lack universal portability. OSI's success depends partly on whether it can bridge these heterogeneous environments — organizations running three or four BI tools, each with different semantic capabilities, needing a common interchange format precisely because their tools don't agree today.
Learning from the Past
Every standardization effort teaches the same lesson: syntax is the easy part, adoption is everything. PMML succeeded in a narrow niche but never achieved universal adoption — limited by complexity and fragmented implementations. The Semantic Web produced foundational technology (knowledge graphs power Google Search today) but failed as a universal standard because the adoption cost exceeded the perceived benefit for most participants.
OSI has advantages its predecessors lacked: a powerful market forcing function (AI demands semantic consistency), backing from major vendors with commercial incentives, and an existing implementation (MetricFlow) rather than just a specification. But it also faces familiar risks. Community discussions raise valid concerns: the risk of a lowest-common-denominator standard that handles only simple cases, and the political challenge of getting competing vendors to genuinely invest in interoperability rather than pay lip service to an initiative.
The AI Wildcard
There's an interesting counterargument: maybe AI makes manual semantic standards less critical. If LLMs can interpret data context on the fly, do we need humans to write YAML? Perhaps the future involves an orchestration layer — an agent grounded in organizational knowledge (via RAG) that works alongside an OSI-aware agent for structural interoperability. Human-curated semantics and AI-interpreted context complementing each other.
But that's aspirational. Today, AI without semantic grounding hallucinates. The standard matters precisely because AI isn't smart enough yet to infer business context reliably.
Where Do We Go From Here?
OSI is the most credible attempt at semantic interoperability the data industry has produced. Snowflake's initiative in convening this coalition — and the speed at which the ecosystem has grown — signals genuine industry appetite for solving semantic fragmentation. The timing is right, the vendor coalition is broad (and broadening), and the forcing function — AI's need for semantic consistency — is real.
The v1.0 specification marks the transition from announcement to substance. Phase 2 adoption through 2026, with native support expected across 50+ platforms, will be the real test. By 2027, the initiative aims for de facto standard status. That's ambitious but plausible given the momentum.
My position is cautiously optimistic, grounded in pattern recognition. PMML promised model interchange and delivered it partially — lasting value, narrower scope than originally envisioned. The Semantic Web promised machine-readable meaning and produced knowledge graphs — transformative technology, different form than planned. Both created real impact, just not exactly as promised.
OSI will likely follow a similar trajectory: tremendous value in establishing a common vocabulary, real interoperability for the 80% of metrics that are straightforward, and continued platform-specific solutions for the complex 20%. The question is whether the industry sustains commitment through the inevitable trough that follows every standards initiative — and whether the Phase 2 tooling delivers on the Phase 1 specification.
For practitioners, the actionable takeaway is clear: learn dbt MetricFlow now. It's the implementation backbone of OSI. Whether the standard achieves full industry adoption or not, MetricFlow competency translates directly to semantic layer skills that every modern data platform will value. My modern data stack repository includes a working MetricFlow implementation you can run locally — it's a practical starting point.
I'm watching closely for the first vendor to ship genuine OSI import tooling during Phase 2 this year. That will be the signal that moves this from standard to practice. Until then, I'll keep testing, documenting, and building — because the semantic layer, regardless of which standard wins, is the foundation AI needs to actually work.
OSI lays the groundwork for semantic interoperability: define metrics once, reuse them everywhere (BI and AI). The next challenge is tooling and platform adoption throughout 2026. In the meantime, building semantic-layer skills—especially with MetricFlow—is a practical step to improve KPI consistency and AI reliability.
Sources:
- PMML, DMG official archive: https://dmg.org/pmml/pmml-v4-4.html
- Semantic Web: https://en.wikipedia.org/wiki/Semantic_Web
- OSI Official Site: https://open-semantic-interchange.org/
- OSI GitHub Repository: https://github.com/open-semantic-interchange/OSI
- Snowflake OSI Announcement: https://www.snowflake.com/blog/open-semantic-interchange/
- Snowflake OSI Spec Finalized: https://www.snowflake.com/en/blog/open-semantic-interchanges-specs-finalized/
- dbt Labs MetricFlow Open Source: https://www.getdbt.com/blog/metricflow-open-source
- Gartner Magic Quadrant for Analytics and BI Platforms (June 2025): https://www.gartner.com/en/documents/6234651
- Semantic Layer among top trends: https://www.techtarget.com/searchbusinessanalytics/feature/Agents-semantic-layers-among-top-data-analytics-trends
- SQLBI — Power BI is a Model-Based Tool: https://www.sqlbi.com/articles/power-bi-is-a-model-based-tool/
- Aurimas Račas — Metrics Layers and Power BI: https://aurimas.eu/blog/2022/08/metrics-layers-and-power-bi/
- AtScale Joins OSI: https://www.atscale.com/blog/atscale-joins-osi-open-semantic-infrastructure/
- Starburst Joins OSI: https://www.starburst.io/press-releases/starburst-teams-up-with-snowflake-and-industry-leaders/
- Qlik Joins OSI: https://www.qlik.com/blog/qlik-joins-snowflake-led-open-semantic-interchange-to-bring-consistent
- ThoughtSpot on OSI: https://www.thoughtspot.com/blog/the-agentic-semantic-layer-and-OSI
- Strategy Joins OSI: https://www.businesswire.com/news/home/20251202153508/en/Strategy-Joins-Open-Semantic-Interchange
- Collate Joins OSI: https://www.getcollate.io/blog/collate-joins-snowflake-and-industry-leaders-to-advance-open-semantic-interchange-osi-for-data-and-ai-interoperability
- Modern Data Stack Repository: https://github.com/vincevv017/modern-data-stack
Footnotes
¹ Web 3.0 then referred to the Semantic Web; the term "Web3" is now more commonly associated with blockchain and decentralized ownership.

