DEFINING DATA ARCHITECTURE - DATA VAULT AND MEDALLION
by Bill Inmon and Dan Linstedt
In the very earliest days of technology life was simple. You had some punch cards. You had a computer and COBOL. And you had some master files.
Then quickly – very quickly – life got to be more complex. There were disk files in addition to magnetic tape files. There were data bases in addition to master files. There were applications for everything under the sun. There was maintenance to programs. There were end users wanting to analyze the data. User requirements were stated one day and restated the next. Vendors kept coming up with new advancements in technology. IT was a whirlwind.
DATA ARCHITECTURE
Suddenly, last summer (apologies to Tennessee Williams) there arose a need for architecture for information technology. There simply was too much confusion, too much chaos, to operate without a larger picture of where things were and where they were going. Data architecture was one of those needs. Data was scattered all over the countryside. It was no longer enough for the end user to see data; the end user discovered a need to see believable data. And seeing data is quite different from seeing believable data.
Developers, designers, management, and end users suddenly needed a paradigm for the structure and composition of data in order to cope with the mass of data, the diversity of data, and the confusion that was data in the corporation. It was no longer acceptable to build a system that merely managed data. The systems of the future had to provide believable data.
And the focus turned from data inside applications to data across the enterprise. In order to manage the corporation, it was necessary to look at data from an enterprise perspective.
DATA ARCHITECTURE
So what is data architecture? There are several essential components to data architecture.
These components of data architecture are (not in any intended order) –
1 SEPARATION OF THE ARCHITECTURE FROM THE TECHNICAL COMPONENTS THAT COMPRISE IT
An architecture is a recognizable pattern. An architecture can be implemented in many ways. The architecture is always recognizable however it is implemented.
When one looks at data architecture it is obvious that it is not an organization chart, a manuscript, or the blueprint to manufacture a car.
There often is confusion between data architecture and the implementation of data architecture. Many times people see the technology that is used to build the architecture and mistake that technology for the architecture itself. Vendors love to blur this distinction. Technology vendors love to sell their technical product as “the” architecture. In doing so, they hoodwink their customers. The customer buys a product from the vendor, not an architecture. Vendors do not sell architecture; they sell products.
Unfortunately, most IT consumers do not spend any time understanding the distinction between a technology and an architecture. In bypassing this investigation, IT managers become easy prey for unscrupulous vendors.
2 AN ARCHITECTURE HAS RECOGNIZABLE AND DISTINCT CHARACTERISTICS
If you have ever been in New Mexico, you don’t doubt for a second where you are. You are not in Kansas. You are not in New York City. You are not in Hollywood. You are definitely in New Mexico.
And how do you know that? All you have to do is look at the houses and buildings in New Mexico. They are architecturally different from any other place on earth. Recognizable and distinctive.
Data architecture has the same characteristics. When one looks at a data architecture, there is no mistaking the architecture. Data architecture has unique and defining characteristics.
3 THE COMPONENTS OF THE ARCHITECTURE ARE OBVIOUS AND RECOGNIZABLE
In New Mexico you look for several distinctive things when you want to determine the architectural heritage –
Ristras – red chile strands hung out to dry. Originally a very functional thing, now a tradition. The chiles need to dry and turn red before they are beaten into the red chile sauce that goes into many New Mexican dishes. Today, in addition to being functional, they have become a symbol for New Mexico.
Adobe framed windows. In an adobe house you had better know what windows you want when you build the house, because you can’t change your mind after the house is built. Some of the adobe walls are 4 feet thick. Trying to relocate a window means you may have to tear down the whole house.
Vigas. These exposed ceiling beams are designed to hold up the roof. The ends of the beams then emerge on the outside of the house.
Kiva fireplaces. These indoor corner fireplaces came from the pueblos that dotted the landscape. They were efficient to make in an adobe confine and did a good job of heating the room.
And these are but some of the many components that define a New Mexico architecture.
Data architecture has a similar set of characteristics that can be defined by answering questions such as:
Is the data being used in transaction processing?
How much historical data is there?
Is the data needed for up-to-the-second accuracy?
Is the data being accessed and analyzed at the detailed level or at the summary level?
Is the lineage of the data important?
Can the data be updated or is it static?
What granularity does the data need to have?
There are many more data architecture characteristics that must be addressed.
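One way to make these questions concrete is to record the answers in a small structure. The sketch below is illustrative only: the `DataProfile` class, its field names, the sample values, and the `looks_operational` rule of thumb are assumptions made for this example, not part of any standard or product.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    DETAILED = "detailed"
    SUMMARY = "summary"


@dataclass(frozen=True)
class DataProfile:
    """Hypothetical answers to the characteristic questions for one body of data."""
    used_in_transactions: bool      # Is the data used in transaction processing?
    years_of_history: float         # How much historical data is there?
    needs_current_accuracy: bool    # Is up-to-the-second accuracy required?
    access_level: AccessLevel       # Analyzed at the detailed or summary level?
    lineage_required: bool          # Is the lineage of the data important?
    updatable: bool                 # Can the data be updated, or is it static?
    grain: str                      # What granularity does the data need?


def looks_operational(profile: DataProfile) -> bool:
    """Assumed rule of thumb: transactional, current, updatable data is operational."""
    return (
        profile.used_in_transactions
        and profile.needs_current_accuracy
        and profile.updatable
    )


# Two made-up profiles with very different characteristics.
teller_balances = DataProfile(
    True, 0.25, True, AccessLevel.DETAILED, True, True, "one row per account"
)
sales_history = DataProfile(
    False, 10.0, False, AccessLevel.SUMMARY, True, False, "one row per store per month"
)

print(looks_operational(teller_balances))
print(looks_operational(sales_history))
```

Writing the answers down this way does not define an architecture by itself. It simply makes the differences between two bodies of data visible before any technology is chosen.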
4 A BLUEPRINT OF HOW EVERYTHING FITS TOGETHER
You don’t start to build anything of any complexity without a blueprint defining what the different components of the structure are and how the different parts are to work together.
An essential part of an architecture is a rendition of how the components of the structure are to be laid out. The right place to start in a data architecture is with the taxonomy/ontology. Understanding the business in a business context is paramount to everything that follows.
In data storage architecture the blueprint is the data model. A data model becomes the compass of the ship on the sea. Without a compass a ship on the ocean is in real trouble. The same is true with an organization trying to manage its data. The data model becomes the compass for finding one’s way through the data jungle.
To envision the value of the data model imagine this scenario: suppose you want to build a house. If what you want is a one-room log cabin in the mountains, you may need a very simple blueprint or no blueprint at all. But if you want to build a 60-story skyscraper in Manhattan, you must have a blueprint. When you build the skyscraper you must coordinate electricity, gas, water, the elevator shaft, and a thousand other items, each of which is important and requires coordination in the construction effort. Building a skyscraper is a complex and expensive task and requires a blueprint to succeed.
Note that there are different kinds of data models. A classical ERD-based data model is used for classical structured data. An ontology/taxonomy is used for understanding structured, semi-structured, and textual data. A distillation algorithm is defined for analog data.
5 THE SCALE AND SCOPE OF THE PROJECT
An essential part of defining the architecture is a statement of the scale and scope of the project. Are you building a bridge over a small canal? Or are you building the Golden Gate Bridge? Before construction begins, the parameters of the architecture and its intended application must be clearly understood, based on how the architecture is expected to be used.
Stated differently, the scale of the project greatly affects the way construction will be done.
Are you going to have children walking over the bridge? Are you going to have a 4-lane highway? A pedestrian lane? What is it that you are building and how will it be used?
In the final analysis, a child’s bridge is just as much a bridge as the Golden Gate Bridge. They just happen to have very different scales and utility.
The scale of the data involved impacts the data architecture. An approach that works just fine for one volume of data may not be appropriate at all for another volume of data.
6 IDENTIFICATION OF THE BUSINESS VALUE
If a project does not fulfill some business value, it is questionable as to whether it should be built at all. Furthermore, the clearer the business value, the more focused the developer can be.
If the data architecture is to have long term viability, it must have some business value that is serviced by the architecture. Stated differently, if there is no business value behind an architecture, then the long-term viability of the architecture is compromised.
7 LONG TERM/SHORT TERM THREATS TO THE VIABILITY OF THE PROJECT
The architect must be aware of and plan for the long-term threats to the viability of the project. The builders of the Golden Gate Bridge had to plan for saltwater erosion. For the tides and currents that sweep through the bay. For the storms that blow in from the Pacific. And for the unlikely event that the foundations of the bridge are rammed by an errant ship.
The considerations of architecture mandate that threats – long term and short term – be identified and mitigated as part of the design of the project.
Similarly, there are many threats to data that must be accounted for by the architecture. Some of them are –
Data accuracy - Inaccurate data is misleading and can even be dangerous.
Data completeness - If an end user is looking at 50% of the data and assumes that he/she is looking at 100%, all sorts of conclusions can be wrong.
Up-to-the-second accuracy - Some data needs to be accurate up to the second (bank balances, for example). If data is not accurate up to the second, incorrect conclusions can be drawn.
Data lineage - For the analyst to use data successfully, data lineage is required.
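The completeness threat in particular lends itself to a simple check. The following is a minimal sketch, assuming that an expected row count is available from the source system. The function name `completeness_warning` and the sample counts are hypothetical choices for illustration.

```python
from typing import Optional


def completeness_warning(rows_loaded: int, rows_expected: int) -> Optional[str]:
    """Return a warning when fewer rows were loaded than the source reported.

    Both counts are assumed to be supplied by the load process; how they
    are obtained is outside the scope of this sketch.
    """
    if rows_expected <= 0:
        raise ValueError("rows_expected must be positive")
    if rows_loaded < 0:
        raise ValueError("rows_loaded must not be negative")
    ratio = rows_loaded / rows_expected
    if ratio < 1.0:
        return f"Only {ratio:.0%} of the expected data is present"
    return None


print(completeness_warning(500, 1000))
print(completeness_warning(1000, 1000))
```

The point of the sketch is that the end user is told what share of the data is present, rather than being left to assume it is all there.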
And the list of threats to data continues. This short list is just the tip of the iceberg.
8 WHERE DOES THE ARCHITECTURE BELONG?
In almost every case there will be places where a project should be built and places where a project should not be built. A Santa Fe architecture would never work in Seattle. The rain would quickly dissolve the Santa Fe adobe. The Golden Gate Bridge does not belong in Wyoming. There simply are no expanses of the ocean in Wyoming.
Trying to use a data mart to conduct online transaction processing is a mistake. Trying to use a spreadsheet to do bank teller processing is a mistake. Trying to use standard DBMS technology on textual data is a mistake.
9 THE TYPES OF MATERIAL BEING CONSIDERED
There are all sorts of bodies of water. When it comes to bridges and other waterfront structures, the type of water must become a consideration of the architect. There are streams. There are rivers. There are bays, and the open ocean. Bridges need to account for the type of water that they will span.
When it comes to data, there are many different forms of data. There is classical structured data, such as transaction-based data. There is data that is found in text, such as email or internet websites. There is data that is created and managed by a machine, such as drone feedback or telemetry.
Each of the different forms of data requires its own treatment. For example, trying to use structured architectural approaches on textual data is an unmitigated failure. Or trying to treat analog data as if it were structured data is a big mistake.
The architect must account for the type of data being considered.
10 THE REALITY OF CONSTRUCTION
There are many considerations to the reality of project construction. Considerations include:
Cost - Will funds be available for the completion of the project?
Time - How long will it take to build the project?
Materials - Are adequate and affordable materials available?
Workers - Are construction crews available and adequate?
Logistics - Is the project to be built on the moon? Is it even possible to get the materials to the moon that are needed for the project’s construction?
PUBLIC DISCLOSURE
For the outcome of the project to be recognizable, the shape and characteristics of the architecture must be familiar to the public.
ARCHITECTURE IS ARCHITECTURE
The principles that define architecture are universal. They apply equally, regardless of whether the architecture is for data, a house, a writing instrument, or an airplane.
Architecture does not change by domain – architecture is architecture!
Data architecture is therefore governed by the same architectural principles as all other forms of architecture.
So how does data architecture meet reality?
A COMPARISON – MEDALLION VERSUS DATA VAULT
At this point, the phrase “Medallion vs Data Vault” requires correction before the discussion can proceed responsibly. The framing itself implies a choice between two comparable architectures, and that implication is false.
This is why “Medallion vs Data Vault” is the wrong question. This is not a disagreement over tooling, platforms, or implementation preferences. It is a category error. Medallion and Data Vault do not address the same architectural problem, do not operate at the same level of concern, and were never intended to be compared.
The reason this comparison keeps appearing is understandable. Both terms are used in modern analytics conversations, both are widely adopted, and both are associated with successful outcomes.
Proximity in conversation does not imply equivalence in function.
Mistaking classification patterns for architectural systems replaces clarity with convenience, and convenience ultimately produces risk.
What follows is neither a critique of Medallion nor a defense of Data Vault. It is an application of the architectural principles just established – applied consistently, without exception and without regard for popularity. Only by restoring precision to this comparison can the discussion move forward without perpetuating a false choice.
ARCHITECTURAL PRECISION, ACCOUNTABILITY, AND THE MEDALLION MISCONCEPTION
You are accountable for the language your organization uses. Architecture is not a stylistic choice, and it is not defined by popularity. It is a formal engineering discipline with definitions that exist precisely to prevent organizations from repeating the same structural mistakes under new names.
According to ISO/IEC/IEEE 42010, architecture is defined as the fundamental organization of a system, embodied in its components, their relationships to each other and to the environment, and the principles governing its design and evolution. This definition exists so that systems remain explainable, defensible, and survivable when change inevitably arrives.
If this definition feels inconvenient, that does not make it wrong; it explains why architecture was needed in the first place.
WHAT MEDALLION ACTUALLY IS – BY DATABRICKS’ OWN DEFINITION
Let’s be precise. Databricks describes Medallion as a data organization and refinement approach using Bronze, Silver, and Gold to indicate raw, refined, and curated data states. This framing is accurate, useful, and pragmatic. It communicates data readiness for consumption and aligns teams around shared expectations.
Databricks’ own documentation does not claim that Medallion defines enterprise integration strategy, historization, semantic governance, or survivability under change. It is a classification scheme that describes data condition, not an architectural blueprint.
When a classification scheme is promoted to architecture, it does not signal innovation, but rather a loss of architectural discipline.
Reference: https://www.databricks.com/glossary/medallion-architecture
WHY CALLING MEDALLION AN ARCHITECTURE IS A CATEGORY ERROR
Architecture governs structure. Classification labels state. These are not interchangeable.
A data classification scheme does not become an architecture through repetition, marketing, or widespread adoption. Architecture exists only when structure, relationships, and principles of evolution are explicitly defined and enforced.
If you are comfortable calling Medallion an architecture, you are also implicitly accepting the absence of an immutable system of record capable of preserving enterprise evidence over time. That acceptance has consequences whether you intend it or not.
ISO/IEC/IEEE Architecture Reference:
https://standards.ieee.org/standard/42010-2011.html
WHY MEDALLION AND DATA VAULT ARE NOT COMPARABLE
You should not be comparing Medallion and Data Vault as competing architectures. That comparison itself is evidence of conceptual drift.
Medallion classifies data by condition. Data Vault defines how enterprise facts, relationships, and history are structured, integrated, and preserved. One evaluates readiness; the other governs structure. One is descriptive; the other is foundational.
If your team cannot articulate this distinction clearly, that is not a training gap. It is a leadership gap - because whoever is leading the effort failed to define what is architecture and what is not.
WHAT DATA VAULT ACTUALLY IS
Data Vault is not a reporting layer, a BI technique, or a data model. It is a System of Information Management overlay that governs enterprise information above delivery layers, establishing how facts, relationships, and history are captured and preserved as evidence - regardless of tools, platforms, or analytical demands.
It operationalizes the classical data warehouse definition articulated by Bill Inmon:
“A data warehouse is a subject-oriented, integrated (by business key), time-variant and non-volatile collection of data in support of management’s decision-making process, and/or in support of auditability as a system-of-record.”
Data Vault exists so that meaning can evolve without destroying history and so that explanations remain possible long after assumptions change.
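The time-variant and non-volatile properties in this definition can be sketched in a few lines. The example below is an illustration under stated assumptions, not the Data Vault specification: the class names `HistoryRow` and `AppendOnlyHistory`, the field names, and the sample values are all hypothetical. It shows only the idea that a change is recorded as a new row, so earlier states remain available as evidence.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class HistoryRow:
    """One immutable observation about a business key (hypothetical layout)."""
    business_key: str
    loaded_at: datetime
    record_source: str
    status: str


class AppendOnlyHistory:
    """Rows are only ever added; nothing is updated or deleted."""

    def __init__(self) -> None:
        self._rows = []

    def record(self, row: HistoryRow) -> None:
        self._rows.append(row)

    def as_of(self, business_key: str, when: datetime) -> Optional[HistoryRow]:
        """Return the latest row for the key loaded at or before `when`."""
        candidates = [
            r for r in self._rows
            if r.business_key == business_key and r.loaded_at <= when
        ]
        return max(candidates, key=lambda r: r.loaded_at, default=None)


history = AppendOnlyHistory()
history.record(HistoryRow("CUST-001", datetime(2023, 1, 1), "crm", "prospect"))
history.record(HistoryRow("CUST-001", datetime(2024, 6, 1), "crm", "customer"))

# The earlier state is still answerable after the later one was recorded.
print(history.as_of("CUST-001", datetime(2023, 12, 31)).status)
print(history.as_of("CUST-001", datetime(2024, 12, 31)).status)
```

Because the second row does not overwrite the first, a question such as "what did we believe about this customer at the end of 2023?" can still be answered after the status has changed.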
Calling Data Vault “just a data model” is not a harmless simplification. It is a declaration that enterprise evidence is optional.
HOW MEDALLION AND DATA VAULT COEXIST WHEN USED CORRECTLY
Bronze, Silver, and Gold classifications can exist anywhere: in a data lake, in a data warehouse, in a data fabric, or across multiple platforms. They describe data state, not architectural placement.
When used correctly, Medallion does not replace Data Vault; it complements it. Data Vault governs how enterprise information is structured, integrated, and preserved as evidence, while Medallion classifies that information for consumption and analytical use.
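The difference between a label and a structure can be shown with a short sketch. Everything named here (`MedallionState`, `Dataset`, `reclassify`, and the sample values) is a hypothetical illustration, not a Databricks or Data Vault API. The only facts taken from the text are that Bronze, Silver, and Gold indicate raw, refined, and curated states, and that those states can be applied on any platform.

```python
from dataclasses import dataclass, replace
from enum import Enum


class MedallionState(Enum):
    """Classification of data condition, as described in the text."""
    BRONZE = "raw"
    SILVER = "refined"
    GOLD = "curated"


@dataclass(frozen=True)
class Dataset:
    name: str
    platform: str           # where the data lives, e.g. "data lake"
    structure: str          # how the data is structured, e.g. "data vault"
    state: MedallionState   # the classification label only


def reclassify(dataset: Dataset, new_state: MedallionState) -> Dataset:
    """Change the label and nothing else."""
    return replace(dataset, state=new_state)


orders = Dataset("orders", "data lake", "data vault", MedallionState.BRONZE)
promoted = reclassify(orders, MedallionState.GOLD)

print(promoted.state.value)
print(promoted.platform == orders.platform)
print(promoted.structure == orders.structure)
```

Reclassifying the dataset changes its state and leaves its platform and structure untouched, which is the sense in which the label describes condition rather than architecture.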
Confuse these two, and you do not accelerate delivery - you allow classification to masquerade as architecture, and failure simply arrives later and at greater cost.
THE CONSEQUENCES YOU WILL EVENTUALLY OWN
You will not be challenged when systems are stable. You will be challenged when numbers change, when AI-driven outcomes are questioned, and when regulators or boards ask why.
At that moment, explanations stop being narratives and become evidence. Architecture determines whether that evidence exists.
Architectures do not fail because they are challenged. They fail because they were never designed to be challenged.
ULTIMATELY…
You can use Medallion. You can benefit from it. You should - when it is applied for what it is designed to do.
But architecture does not adapt to convenience, and it does not tolerate category confusion. When classification is allowed to replace structure, nothing has been modernized. Control has been surrendered.
This is not innovation. It is repetition.
The industry has made this mistake before. The cost is never immediate, and that is why it is repeated. The reckoning arrives later - when scale increases, when change accelerates, when trust is questioned, and when accountability can no longer be deferred.
History is patient. It waits until your system is stressed. Then it asks a single question: did you understand the difference between categorization and architecture - or did you choose not to?
ABOUT THE AUTHORS
Bill Inmon is the father of the data warehouse. Bill has sold over 1,500,000 books in his life and has been published in nine languages.
Dan Linstedt is the father of Data Vault. Dan is a published author and conducts conferences each year on Data Vault.
BOOKS YOU MAY ENJOY
BUILDING A SCALABLE DATA WAREHOUSE WITH DATA VAULT 2.0, by Dan Linstedt, Morgan Kaufmann
SUPERCHARGE YOUR WAREHOUSE, by Dan Linstedt, CreateSpace Independent Publishing Platform
DATA ARCHITECTURE – BUILDING THE FOUNDATION, by Dave Rapien and Bill Inmon, Technics Publications
STONE TO SILICON – A HISTORY OF TECHNOLOGY AND COMPUTERS, by Dr Roger Whatley and Bill Inmon, Technics Publications
TURNING TEXT INTO GOLD, by Ranjeet Srivastava and Bill Inmon, Technics Publications

Excellent distinction between classification and structure. Too many orgs treat layer patterns as a substitute for true integration architecture. Seen this play out where teams adopt medallion labeling but then struggle when business keys change or source systems merge, because no underlying immutable record exists. The category error point is crucial; naming conventions can't replace historical preservation logic.
"WHY CALLING MEDALLION AN ARCHITECTURE IS A CATEGORY ERROR
Architecture governs structure. Classification labels state. These are not interchangeable."
Amen!!!!!