THE ANATOMY OF AN ONTOLOGY
What the heck does an ontology look like? How is it structured?
By W H Inmon and Jessica Talisman
Once there was only structured data. Transactions ran in banks and airline reservation systems. Databases kept track of inventory. The world of business was running just fine on the transaction-based, structured data that served the environment.
Then one day someone noticed that there was data that was not structured in the corporation. That “foreign” data was not being generated by the transactions of the corporation.
The “foreign” data that had been discovered was text. Text was everywhere in the corporation. On the Internet. On emails. In spreadsheets. In conversations. In print. Everywhere.
Furthermore, there was valuable data wrapped up in the form of text. The problem was that getting to the text, and making sense of the text and the data that it held, was a challenge. Text is random – there is no order to text. People can say or write anything they want. There simply is no rhythm to the creation and collection of text. Text requires context in order to be meaningful. And context is seldom obvious when wrapped in the form of text. But context is an absolute necessity when dealing with text. Text can come in many forms – voice, print, spreadsheets, email and so forth.
In a word, reading text and extracting meaningful data from text was and is a daunting task.
ONTOLOGY
To this end, there is a technology/discipline that greatly eases the challenge of extracting meaningful data from text. That technology/discipline is called an ontology.
So what in the world is an ontology? How could you tell an ontology from your Aunt Alice? From your cocker spaniel? From your favorite football team?
DEFINING CHARACTERISTICS
What are the defining characteristics and structures that are found in an ontology?
A simplified characterization of an ontology is that an ontology is a carefully vetted vocabulary designed to unravel and classify a body of raw text. The ontology contains what can be called a series or collection of taxonomies.
TAXONOMY
And what exactly is a taxonomy? A taxonomy is a vocabulary designed to classify something. The something that can be classified by a taxonomy can be anything – a tangible object, a concept, a discipline, a football team, a method of teaching swimming, a car, and so forth. Anything that can be classified can have a taxonomy built for it. And anything can be classified.
As such there are an infinite number of possibilities for the creation of a taxonomy.
The defining characteristic of a taxonomy is that each word in the taxonomy has a similar relationship to the object that is being classified as all of the other elements contained in the taxonomy. The relationship of each element found in the taxonomy may not be exactly the same as the relationship of every other element found in the taxonomy, but the relationship of the element to its classification is at least very similar.
SOME EXAMPLES
As some examples of taxonomies, consider the following –
A classification for a car can be a Porsche, Ford or Honda. The classification of a sport may be golf, football, or tennis. The classification for food might be a steak, potatoes, or tomatoes.
Each of the elements found in the classification that shapes the taxonomy has the same basic relationship to the classifying object as each other element in a taxonomy.
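One way to picture these examples is as a simple lookup structure. The sketch below is only an illustration in Python; the names `TAXONOMIES` and `classify` are assumptions made for this example and do not come from any particular tool.

```python
# Hypothetical illustration: each taxonomy maps one classifying
# category to the elements that share the same relationship to it.
TAXONOMIES = {
    "car": ["Porsche", "Ford", "Honda"],
    "sport": ["golf", "football", "tennis"],
    "food": ["steak", "potatoes", "tomatoes"],
}


def classify(word):
    """Return every category whose taxonomy contains the word."""
    target = word.lower()
    return [
        category
        for category, elements in TAXONOMIES.items()
        if target in (element.lower() for element in elements)
    ]


print(classify("Honda"))    # ['car']
print(classify("tennis"))   # ['sport']
print(classify("Chablis"))  # []
```

A word that belongs to no taxonomy simply comes back unclassified, which is the expected result for a word that does not fit the classification.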
INTERRELATIONSHIPS
The elements of a taxonomy may or may not have a relationship to other elements in other taxonomies in the same ontology. For example, suppose there were three taxonomies for country, state and city. The states Texas, New Mexico and others all relate to the country USA. And El Paso is found in Texas, Santa Fe is found in New Mexico, and Denver is found in Colorado.
These are examples of a relationship between taxonomies.
However, notice that Atlanta, Miami, and Nashville do not relate to the state taxonomy because the states that those cities belong to are not in the state taxonomy.
And China, Russia, and Brazil have no cities in the city taxonomy.
The relationships between the elements of one taxonomy and the elements of another taxonomy are, then, casual. A relationship between elements in different taxonomies may or may not exist.
And this is a perfectly normal state.
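The country, state and city example can be sketched in Python as three sets plus optional links between them. The membership of each set is inferred from the prose above and is an assumption, as are the names `STATE_TO_COUNTRY`, `CITY_TO_STATE`, `state_of` and `cities_in`.

```python
# Hypothetical illustration of three taxonomies in one ontology.
COUNTRIES = {"USA", "China", "Russia", "Brazil"}
STATES = {"Texas", "New Mexico", "Colorado"}
CITIES = {"El Paso", "Santa Fe", "Denver", "Atlanta", "Miami", "Nashville"}

# Links between elements of different taxonomies. A link is optional:
# an element with no entry simply has no relationship recorded.
STATE_TO_COUNTRY = {"Texas": "USA", "New Mexico": "USA", "Colorado": "USA"}
CITY_TO_STATE = {
    "El Paso": "Texas",
    "Santa Fe": "New Mexico",
    "Denver": "Colorado",
}


def state_of(city):
    """Return the related state, or None when no relationship exists."""
    return CITY_TO_STATE.get(city)


def cities_in(country):
    """Return the cities that reach the country through a state link."""
    return sorted(
        city
        for city, state in CITY_TO_STATE.items()
        if STATE_TO_COUNTRY.get(state) == country
    )


print(state_of("El Paso"))  # Texas
print(state_of("Atlanta"))  # None
print(cities_in("USA"))     # ['Denver', 'El Paso', 'Santa Fe']
print(cities_in("China"))   # []
```

Atlanta is a valid member of the city taxonomy even though it has no state to point to, and China is a valid country even though no city points to it.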
DEFINING A TAXONOMY
So what is not a taxonomy? Consider the vocabulary seen in the following figure. The vocabulary looks like a taxonomy. The heading is country, and England, France and the USA are found in the vocabulary. But there are elements in the vocabulary that are decidedly not countries. Chablis, egg whites and Babe Ruth are also found in the vocabulary. It is unclear if there is any meaning at all to the grouping of the words found in the vocabulary that has been shown.
The vocabulary shown is definitely not a taxonomy.
There are then different types of collections of vocabularies. The following vocabulary is just a list of words.
ACRONYMS
Taxonomies can include more than just words. Taxonomies can include acronyms as well as words. Acronyms are widely used, especially in medicine, legal writings and military writings. However, in addition to those commonly found places, acronyms are used everywhere.
Taxonomies need to include acronyms as well as words in order to be able to make sense of the text that is being examined.
MISSPELLINGS AND SLANG
In addition, taxonomies can include common misspellings of words. When a misspelling is common, the misspelled version of the word can be interpreted as the properly spelled version of the word.
In addition, slang – ain’t, rizz, cheugy, sus, and others – can be placed in a taxonomy.
STEMMING
Another structure that can be placed in a taxonomy is the word family. A word family is a form of what is termed “stemming”. A word stem is the root of a word that can be stated in more than one form, such as “move” for “moves”, “moved” and “moving”.
The use of stemmed words depends on the existence and characteristics of the processor that will do the transformation of raw text into a database. Different processors have different properties and characteristics. In some cases the processor does its own stemming. In other cases, the processor only uses the literal embodiment of the word as found in the taxonomy.
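What this means for a processor that matches words literally can be sketched in a few lines of Python. This is only an illustration: the word family for “move” and the helper `find_stem` are assumptions made for this example, and the `use_families` flag stands in for whether the taxonomy spells out every form of the word.

```python
# Hypothetical word family: the stem and the forms it can take.
WORD_FAMILIES = {
    "move": {"move", "moves", "moved", "moving"},
}


def find_stem(token, use_families):
    """Look a token up the way a literal-matching processor would.

    With use_families=False only the stem itself is in the taxonomy.
    With use_families=True every listed form is in the taxonomy.
    """
    word = token.lower()
    for stem, forms in WORD_FAMILIES.items():
        if word == stem or (use_families and word in forms):
            return stem
    return None


print(find_stem("Moving", use_families=False))  # None
print(find_stem("Moving", use_families=True))   # move
```

When the processor does its own stemming, the stem alone may be enough; when it does not, the taxonomy has to carry the whole word family.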
WORD CASE, LANGUAGES
And there are other considerations in the making of the taxonomy, such as the case of the words (upper case or lower case), different languages, and so forth.
Cross-language interpretation of taxonomies can be very complicated. As words are translated into different languages, shifts in nuance can and often do occur. When a taxonomy is translated, it is a really good idea to have a knowledgeable native speaker review the translation.
MULTIPLE LEVELS OF CLASSIFICATION
Taxonomies always have one unifying category that draws the elements in the taxonomy together. However, taxonomies may also contain multiple levels of categories.
In the following example, sports have the subclassification of team sports and individual sports. Then underneath team sports are found football, basketball and baseball. And in the same manner, food is subclassified by meat, vegetable, and dessert.
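A multi-level taxonomy of this kind can be sketched in Python as a nested structure. The members listed under individual sports, meat and vegetable are assumptions borrowed from earlier examples in this article, and `find_path` is a hypothetical helper.

```python
# Hypothetical nested taxonomy: categories contain subcategories,
# and subcategories contain the classified elements.
TAXONOMY = {
    "sports": {
        "team sports": ["football", "basketball", "baseball"],
        "individual sports": ["golf", "tennis"],  # assumed members
    },
    "food": {
        "meat": ["steak"],          # assumed member
        "vegetable": ["potatoes"],  # assumed member
        "dessert": [],              # no desserts are named in the text
    },
}


def find_path(term, tree=TAXONOMY, trail=()):
    """Return the chain of categories leading to a term, or None."""
    for name, child in tree.items():
        here = trail + (name,)
        if isinstance(child, dict):
            found = find_path(term, child, here)
            if found:
                return found
        elif term in child:
            return here + (term,)
    return None


print(find_path("basketball"))  # ('sports', 'team sports', 'basketball')
print(find_path("steak"))       # ('food', 'meat', 'steak')
print(find_path("chess"))       # None
```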
THE FOCUS OF THE ONTOLOGY
In their most effective form, ontologies have a general focus. For example, an ontology may focus on –
Medicine
Banking
Airlines
Insurance
And so forth.
It is noteworthy that an ontology contains only taxonomies that relate to its focus. For example, you would expect an ontology for airlines to include taxonomies for –
Reservation systems
Baggage handling
Frequent flyer programs
Flight scheduling
And so forth.
Unless there were really unusual circumstances, you would not expect to find airlines needing taxonomies for –
Deep sea exploration
Building construction
Bulldog rodeo riding
And so forth.
The focus of the ontology has an influence on the contents of the taxonomy that is placed inside it.
For example, suppose the ontology was focused on the North American landscape. This would include mountain ranges, deserts, seashore, trees, rivers, and so forth.
Now suppose that the taxonomy for rivers was chosen to be part of the ontology. The taxonomy for rivers would contain only rivers found in North America. The taxonomy would not contain such rivers as the Amazon or the Nile, as they exist well outside of North America.
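The effect of the focus on a taxonomy can be sketched as a simple filter. The river records, the `continent` field and the function `build_river_taxonomy` below are assumptions made for illustration only.

```python
# Hypothetical river records; the continent label is the only
# attribute this sketch relies on.
RIVERS = [
    {"name": "Mississippi", "continent": "North America"},
    {"name": "Colorado", "continent": "North America"},
    {"name": "Yukon", "continent": "North America"},
    {"name": "Amazon", "continent": "South America"},
    {"name": "Nile", "continent": "Africa"},
]


def build_river_taxonomy(focus):
    """Keep only the rivers that fall inside the ontology's focus."""
    return [river["name"] for river in RIVERS if river["continent"] == focus]


print(build_river_taxonomy("North America"))
# ['Mississippi', 'Colorado', 'Yukon']
```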
GENERIC TAXONOMIES
The taxonomies that reside inside the ontology may or may not have relationships to each other. In the example shown for a bank, the taxonomies for credit card, savings and loans have a relationship to each other. But the taxonomies for security and accounting exist independently.
Together the taxonomies collectively describe the banking environment. But portions of the ontology for the bank exist in a very loosely connected fashion.
Because there are independent disciplines that can exist inside an ontology, there are what can be termed “generic” taxonomies. A generic taxonomy is a taxonomy that can be used in many places.
For example, the GAAP rules which constitute much of the accounting taxonomy are universal. The GAAP rules and the text that describes them are the same regardless of whether you are an airline, a bank, an insurance company, or a telecommunications company.
When building the ontology for banking, for example, the analyst can simply apply the accounting taxonomy that has been created for insurance or airlines.
MAINTAINING THE ONTOLOGY
A final and important consideration in building and using an ontology is its maintenance. The ontology and its taxonomies will change over time, as the business changes.
As those basic business changes occur, the changes will need to be reflected in the taxonomies that represent them.
The good news is that most businesses only change marginally over time. It would be very unusual for Ford Motor Company to declare that it was going into the making of ice cream. Therefore, the biggest challenge in building and using a taxonomy lies in building its first iteration.
Once the first iteration is built, subsequent updates to the ontology are usually marginal.
THE WORK BEHIND THE WORDS
The maintenance of an ontology points to something deeper about what makes an ontology work in the first place. An ontology is not simply a collection of words arranged in categories. An ontology represents decisions: thousands of small, careful decisions about how concepts relate to one another, where boundaries should be drawn, and what matters enough to include.
Consider the banking ontology mentioned earlier. Someone had to decide whether “mortgage” belongs in the loans taxonomy or deserves its own taxonomy. Someone had to determine whether “refinancing” is a type of loan or an action performed on a loan. These are not trivial questions. The answers shape how the ontology will perform when it encounters raw text about banking.
This is why building an ontology is a human activity, not a mechanical one. The ontology builder must understand the words of a domain AND the conceptual architecture underneath those words—the way practitioners in that domain actually think about their work.
SCOPE AND GRANULARITY
Every ontology must answer two fundamental questions: How wide should we cast the net? And how fine should we cut the cloth?
The first question concerns scope. An ontology for medical text could, in theory, include every term ever used in medicine. But such an ontology would be unwieldy and expensive to maintain. More importantly, it would likely perform poorly. An ontology designed to extract meaning from radiology reports has different requirements than one designed for clinical trial documentation or insurance claims processing. Clinical trials, radiology and insurance claims are subdomains of medicine, and each has characteristics that make it unique.
The most effective ontologies are purpose-built. They serve a specific need within a specific context. The ontology builder must resist the temptation to capture everything and instead focus on capturing what matters for the task at hand.
The second question concerns granularity. This question pertains to both the taxonomy and ontology. How detailed should each taxonomy be? Should the taxonomy for “medical procedures” include broad categories like “surgery” and “diagnostic testing”? Or should it drill down to specific procedures like “laparoscopic cholecystectomy” and “computed tomography angiography”?
And how about granularity as it pertains to an ontology? How detailed and nuanced must properties and relationships be? It is very easy to paint oneself into a corner with descriptive hyper-granularity. Is it necessary to say that laparoscopicCholecystectomy requiresInstrumentInsertionThroughIncision NotTheSameIncisionAsTheOneUsedForTheCamera IsAdjacentToIt InAMannerDeterminedBy TheSurgeonsDominantHand and some TrocarPort?
Or is it necessary to assert that laparoscopicCholecystectomy isPerformedIn ARoomWhereAtLeastOnePersonPresent HasSecretlyWonderedWhether TheyLeftTheStoveOn value true?
The scope and the granularity of a taxonomy and of an ontology are decisions that must be analyzed and made by humans, because these decisions ultimately shape the profile of the organization, its workflows, and the things its workers care about and need in order to be successful.
The answer to the granularity question depends entirely on what you intend to do with the extracted data. If you need to route documents to the correct department, broad categories may suffice. If you need to identify patterns in treatment outcomes, fine-grained distinctions become essential.
THE ROLE OF HIERARCHY
We have seen that taxonomies can contain multiple levels of classification—sports divided into team sports and individual sports, food divided into meats, vegetables, and desserts. These hierarchical structures exist as organizational vehicles while also carrying semantic weight.
A taxonomy category and its subcategory have a relationship that asserts something meaningful. It says that football is a kind of team sport, that a steak is a kind of meat. This “is-a” relationship allows the taxonomy and its associated ontology to make inferences. If you know that someone ate a steak, you can infer that they ate meat. If you know that someone played basketball, you can infer that they participated in a team sport.
These inferences may seem obvious to a human reader, but they are precisely what makes an ontology useful for processing text at scale. The ontology encodes common-sense relationships that allow a system to understand that a document mentioning “the patient underwent a coronary bypass” is discussing cardiac surgery, even if the phrase “cardiac surgery” never appears.
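The inference described above can be sketched in Python by following “is-a” links upward from each term found in the text. The `IS_A` table and the helper names are assumptions made for this example, and the substring matching is deliberately naive.

```python
# Hypothetical "is-a" links: each term points to its parent category.
IS_A = {
    "steak": "meat",
    "meat": "food",
    "basketball": "team sport",
    "team sport": "sport",
    "coronary bypass": "cardiac surgery",
}


def ancestors(term):
    """Follow is-a links upward and return every broader category."""
    chain = []
    seen = {term}
    while term in IS_A:
        term = IS_A[term]
        if term in seen:  # guard against an accidental cycle
            break
        seen.add(term)
        chain.append(term)
    return chain


def categories_mentioned(text):
    """Return the terms found in the text plus everything they imply."""
    lowered = text.lower()
    found = set()
    for term in IS_A:
        if term in lowered:  # naive substring match, for illustration
            found.add(term)
            found.update(ancestors(term))
    return found


print(ancestors("steak"))  # ['meat', 'food']
print("cardiac surgery" in categories_mentioned(
    "The patient underwent a coronary bypass."))  # True
```

The phrase “cardiac surgery” never appears in the sample sentence, yet the document is still classified under it because of the link from coronary bypass.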
SYNONYMS AND VARIANTS
Real text is messy. The same concept can be expressed in dozens of different ways. A document might refer to a “heart attack,” a “myocardial infarction,” an “MI,” or simply state that “the patient’s heart stopped receiving adequate blood flow.”
A well-constructed ontology accounts for this variation. Each concept in the taxonomy can carry with it a collection of synonyms, abbreviations, and alternative phrasings—all mapped to a single canonical term. When the ontology encounters any of these variants in raw text, it recognizes them as the same underlying concept.
This normalization is one of the most valuable functions an ontology performs. It transforms the chaos of natural language—where the same thing can be said in countless ways—into consistent, structured data that can be analyzed and compared.
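A minimal sketch of this normalization in Python follows. The choice of “myocardial infarction” as the canonical term, the variant list and the function `normalize` are assumptions made for illustration. Acronyms written in capitals are matched case-sensitively so that a lowercase “mi” is not picked up by accident.

```python
import re

# Hypothetical mapping from one canonical term to its variants.
CANONICAL = {
    "myocardial infarction": ["heart attack", "myocardial infarction", "MI"],
}


def normalize(text):
    """Return the canonical terms whose variants appear in the text."""
    found = []
    for canonical, variants in CANONICAL.items():
        for variant in variants:
            # All-capital acronyms keep their case; phrases ignore case.
            flags = 0 if variant.isupper() else re.IGNORECASE
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, flags):
                found.append(canonical)
                break
    return found


print(normalize("Patient admitted after a heart attack."))
# ['myocardial infarction']
print(normalize("History of MI in 2019."))
# ['myocardial infarction']
print(normalize("Drove 5 mi to the clinic."))
# []
```

A variant list like this handles fixed phrasings only. A free-form description such as “the patient’s heart stopped receiving adequate blood flow” would need more than a lookup to be recognized.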
RELATIONSHIPS BEYOND HIERARCHY
Not all relationships between concepts fit neatly into a hierarchical structure. Some relationships run sideways rather than up and down.
Consider the relationship between a disease and its treatment. Diabetes is not a “kind of” insulin therapy, nor is insulin therapy a “kind of” diabetes. Yet these two concepts are meaningfully connected. An ontology can capture this by defining relationship types beyond the simple parent-child hierarchy. This is where ontologies do their work.
A disease “is treated by” a therapy. A drug “interacts with” another drug. A symptom “indicates” a condition. A procedure “requires” equipment. These associative relationships allow the ontology to represent the rich web of connections that characterize any complex domain. These are not taxonomy relationships — we model these descriptive relationships using ontologies.
When such relationships are explicitly defined, the ontology becomes capable of answering questions that go beyond simple classification. It can tell you not only that a document mentions diabetes, but also that the document discusses a treatment approach for diabetes using a particular class of medications.
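One common way to hold such associative relationships is as subject, predicate, object statements. The sketch below is a minimal Python illustration under that assumption; the `Relation` type, the sample statements and both helper functions are hypothetical.

```python
from collections import namedtuple

# Hypothetical statement type: subject, relationship type, object.
Relation = namedtuple("Relation", "subject predicate object")

RELATIONS = [
    Relation("diabetes", "is treated by", "insulin therapy"),
    Relation("laparoscopic cholecystectomy", "requires", "trocar"),
]


def related(subject, predicate):
    """Return everything linked to a subject by a given relationship."""
    return [
        r.object
        for r in RELATIONS
        if r.subject == subject and r.predicate == predicate
    ]


def treatments_discussed(text):
    """Find disease and therapy pairs that both appear in the text."""
    lowered = text.lower()
    return [
        (r.subject, r.object)
        for r in RELATIONS
        if r.predicate == "is treated by"
        and r.subject in lowered
        and r.object in lowered
    ]


print(related("diabetes", "is treated by"))
# ['insulin therapy']
print(treatments_discussed("Type 2 diabetes managed with insulin therapy."))
# [('diabetes', 'insulin therapy')]
print(treatments_discussed("The patient has diabetes."))
# []
```

The second sample document mentions diabetes but no treatment, so classification alone would find it while the relationship query would not.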
THE LIVING DOCUMENT
An ontology is never truly finished. Language evolves. Domains change. New concepts emerge while old ones fade from use.
In medicine, new diseases are identified, new treatments are developed, and new terminology enters the vocabulary of practitioners. An ontology that was comprehensive five years ago may now be missing critical concepts. An ontology that was carefully balanced may now be weighted toward obsolete distinctions.
This is why the organizations that maintain major ontologies treat them as living documents, subject to continuous review and revision. New terms and concepts are proposed, evaluated, and either incorporated or rejected. Existing terms and concepts are periodically assessed for continued relevance. Relationships are refined as understanding of the domain deepens.
For an organization building its own ontology, this ongoing maintenance is a core commitment. The ontology must have a steward, someone responsible for keeping it aligned with the reality it represents. And a business must invest in these knowledge assets and cherish them for the insights and value they deliver.
THE BRIDGE BETWEEN WORLDS
We began by observing that text presents a fundamental challenge: it is unstructured, context-dependent, and infinitely variable. The ontology serves as a bridge between this world of raw text and the world of structured data where analysis and computation become possible.
But the ontology does something more than simply extract data points from text. It imposes a conceptual framework—a way of seeing and organizing the domain. When you build an ontology for banking, you are articulating a model of how banking works, what entities matter, and how those entities relate to one another.
This is the power and the responsibility of ontology work. The ontology shapes what can be seen and what remains invisible. Concepts that exist in the ontology can be found, counted and analyzed. Concepts that are absent from the ontology will pass through unrecognized, no matter how prominently they appear in the text.
The ontologist is not a technician cataloging words. The ontology builder is making choices about what matters: choices that will ripple through every analysis that depends on the ontology, upstream, downstream and in between.
ABOUT THE AUTHORS
W.H. Inmon is widely recognized as the father of the data warehouse and has spent decades helping organizations structure and leverage their data assets for analytics and decision-making. Bill has sold over 1,500,000 books in his life.
Jessica Talisman is a semantic infrastructure consultant and knowledge organization expert specializing in ontology development, controlled vocabularies, and enterprise semantic architectures that bridge library science principles with modern AI and knowledge management systems. Jessica created the Ontology Pipeline, a framework for building semantic knowledge infrastructures and ontologies.
Some of Bill’s latest books include –
DATA ARCHITECTURE – BUILDING THE FOUNDATION, with Dave Rapien, Technics Publications.
STONE TO SILICON – A HISTORY OF TECHNOLOGY AND COMPUTERS, with Dr Roger Whatley, Technics Publications.
Articles and presentations by Jessica can be found on LinkedIn, Substack and YouTube.

Appreciated the way you separate taxonomy work (controlled vocab, hierarchy, synonyms) from ontology work (relationship types that let you reason across concepts). The two failure modes I see in the field line up with your scope and granularity section: (1) trying to model the whole domain up front, and (2) over-specifying relationships that no downstream workflow actually uses. Both end with an ontology that is expensive to maintain and quietly ignored.
A pragmatic pattern that holds up: start from a concrete use case and real text, build a thin set of canonical concepts with strong synonym/acronym coverage, then add relationships only when they drive a decision (routing, retrieval, analytics). And treat change management as first-class: versioning, review, and feedback from how practitioners actually write. That is what keeps the ontology a living asset, not a one-off taxonomy dump.
Thanks for writing this, it really clarifies a lot. It highlights how crucial and difficult making sense of 'raw' text data remains, even with modern NLP and AI models.