Chapter 5: Standard Ontologies in the Wild
Chapter Introduction
Almost every domain you will ever model has a published ontology: an industry standard maintained by a standards body, documented, free or paid, and frequently updated. The first reflex of any serious modeller is to ask: “is there already an ontology for this domain, and what would I lose by adopting it?” Inventing your own ontology when a published one exists is one of the most expensive mistakes a team can make. You will spend years rebuilding what experts have already debated, and you will isolate your data from every external partner, vendor, and regulator who speaks the standard.
This chapter is a tour of the four published ontology families that matter most for the program’s domain focus: finance / quantitative trading (FIBO, GICS, NAICS, ISIC), healthcare (FHIR, SNOMED CT, ICD-10, RxNorm, LOINC, CPT), macroeconomics (SDMX, COFOG, BPM6, SNA), and energy (IEC CIM, NAESB, NERC, OPC UA). For each, the chapter answers four practical questions:
- What does the ontology cover, and what does it deliberately not cover?
- How is it structured, and what does its native serialisation look like?
- How do you adopt it without paying the cost of the parts you don’t need?
- When is extending the ontology the right move, and when is wrapping it (creating your own ontology that points to the standard) better?
A note on register: most published ontologies are dense, occasionally bureaucratic, and never intended to be read end-to-end. Practitioners use them as reference dictionaries — look up exactly what you need, ignore the rest. This chapter teaches that look-up discipline rather than reproducing the standards in full; the official documentation is the source of truth.
Table of Contents
- The Adoption Decision — Build, Extend, or Wrap
- FIBO and the Financial-Industry Standard Stack
- GICS / NAICS / ISIC — Industry Classification
- FHIR, SNOMED CT, ICD-10, RxNorm — the Healthcare Stack
- SDMX, COFOG, BPM6, SNA — Macroeconomic Standards
- IEC CIM, NAESB, NERC — the Energy Stack
- Multi-Standard Cross-Walks
The Adoption Decision — Build, Extend, or Wrap
Three patterns for incorporating a published ontology, ordered from least to most invasive:
Wrap — keep your internal ontology, link each of your object types to the corresponding concept in the public ontology by ID. Your code, queries, and APIs use your own names; the link is used for external interchange. This is the default for most enterprises. Cost is low; you can adopt selectively.
Extend — adopt the public ontology as a parent and add your own subclasses, properties, or links for what the standard doesn’t cover. Used when you operate at the edge of the standard (e.g., insurance: ACORD plus your own product-line extensions; healthcare: FHIR plus your local extensions).
Build — reinvent. Justified only when no public ontology exists or every published one is hopelessly mismatched to your domain. Rare in mature industries.
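The wrap pattern can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the class name `InternalType`, the helper `export_name`, and the URI under `standard.example.org` are all hypothetical. A real link would use the concept's published identifier in the chosen standard.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical URI for illustration only; substitute the published
# identifier of the matching concept in the standard you adopt.
STANDARD_BOND_URI = "https://standard.example.org/ontology/Bond"


@dataclass(frozen=True)
class InternalType:
    """An object type in the team's own ontology."""
    name: str                     # name used by internal code and queries
    standard_uri: Optional[str]   # wrap: link to the public concept, if any


# Wrapped type: the internal name stays, the link carries external meaning.
fixed_income = InternalType("FixedIncomeInstrument", STANDARD_BOND_URI)

# Internal-only type with no counterpart in the standard.
desk_basket = InternalType("DeskBasket", None)


def export_name(t: InternalType) -> str:
    """Identifier to use in external interchange."""
    if t.standard_uri is not None:
        return t.standard_uri
    return f"internal:{t.name}"


print(export_name(fixed_income))
print(export_name(desk_basket))
```

Internal code keeps using `FixedIncomeInstrument`; only the export boundary consults the link, which is why the pattern can be adopted one type at a time.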
The decision matrix:
| You have… | Use pattern… |
|---|---|
| External partners / regulators / vendors who speak the standard | Wrap at minimum |
| Need to publish data publicly | Extend (compatible with the standard) |
| A genuinely novel domain (cutting-edge research, brand-new market) | Build, but plan to align with whatever standard emerges |
| Tight engineering budget, narrow use case | Wrap the slice you need, ignore the rest |
FIBO and the Financial-Industry Standard Stack
The Financial Industry Business Ontology (FIBO) is the OMG / EDM Council’s formal ontology for financial instruments, legal entities, transactions, and markets. Published in OWL; over 17,000 classes spread across modules (Business Entities, Loans, Securities, Derivatives, Indices, Corporate Actions, Markets, Statistics).
Real-world adopters: JP Morgan Chase, Wells Fargo, Citi, ICE, Bloomberg’s data taxonomy, the European Central Bank, the Office of the Comptroller of the Currency, the European Securities and Markets Authority (ESMA), the Securities and Exchange Commission. FIBO is the closest thing the financial industry has to a Rosetta stone.
Three core FIBO modules a working quant team interacts with most:
- FIBO Business Entities (BE) — legal entities, corporate hierarchies, controlling interests. Anchored on the LEI from Chapter 4.
- FIBO Securities (SEC) — equity, debt, derivatives, structured products. Each instrument has typed properties (issuer, maturity, coupon, currency, exchange).
- FIBO Indices and Indicators (IND) — index methodology, constituents, weighting rules.
The FIBO advantage: when your fund’s internal Security object inherits from fibo:Security, you can publish a regulatory filing using the standard property names with no translation layer. When the regulator updates a definition, you can adopt the new version with a versioned schema change rather than a full rewrite.
The FIBO disadvantage: it is large. A typical quant team uses only a few hundred FIBO classes; the other ~16,500 are noise. The wrap pattern is almost always the right choice: adopt the small slice that matches your products, link to it by URI, and ignore the rest.
LEI and the GLEIF reference data
The Legal Entity Identifier (LEI) is the practical anchor for FIBO Business Entities. The Global Legal Entity Identifier Foundation (GLEIF) maintains a free, machine-readable registry of every entity that has been issued an LEI (~2.5 million as of 2026), with corporate-hierarchy (“level 2”) data showing parent–subsidiary relationships. Downloading the GLEIF concatenated file once a week is a routine ingest for bank reference-data teams.
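A routine ingest usually validates identifiers before loading them. The sketch below checks the shape of an LEI (20 alphanumeric characters, the last two being check digits) and its ISO 7064 MOD 97-10 checksum. The function name `is_valid_lei` is an illustrative choice, and the sample string is used only to exercise the checksum arithmetic.

```python
def is_valid_lei(lei: str) -> bool:
    """Check LEI length, character set, and MOD 97-10 check digits."""
    lei = lei.strip().upper()
    if len(lei) != 20 or not lei.isascii() or not lei.isalnum():
        return False
    if not lei[18:].isdigit():          # the last two characters are check digits
        return False
    # Map each character to a number: digits stay as-is, A=10 ... Z=35.
    numeric = "".join(str(int(ch, 36)) for ch in lei)
    # A valid LEI leaves remainder 1 when the full number is divided by 97.
    return int(numeric) % 97 == 1


print(is_valid_lei("HWUPKR0MPOU8FGXBT394"))   # checksum holds
print(is_valid_lei("HWUPKR0MPOU8FGXBT395"))   # last digit altered
```

A checksum pass only shows the string is well formed; whether the LEI is actually registered and current still has to be confirmed against the GLEIF data.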
GICS / NAICS / ISIC — Industry Classification
Three coexisting industry-classification systems that an analyst must distinguish:
- GICS (Global Industry Classification Standard) — MSCI/S&P joint standard. 11 sectors → 25 industry groups → 74 industries → 163 sub-industries. The default for equity portfolio construction at every major asset manager.
- NAICS (North American Industry Classification System) — US, Canada, and Mexico. 20 sectors → ~1,000 industries. Statistical-agency-grade; used by every US government data source.
- ISIC (International Standard Industrial Classification) — UN-maintained. 21 sections → 88 divisions → 238 groups → 419 classes (Rev. 4). The global statistical standard.
These three do not align cleanly with each other. A “Software” company is in GICS sector 45 Information Technology, NAICS code 5112 (in the 2017 revision), and ISIC Rev. 4 class 5820: three different identifiers, three different hierarchies, occasionally three different industry-level peer groups. The codes also shift between revisions of each scheme, so a cross-walk must record which revision it refers to. The cross-walk is essential and is maintained by Eurostat, the US Census Bureau, and several commercial vendors.
The practical rule: pick one classification as your internal default (almost always GICS for an asset manager, NAICS for US-domestic work, ISIC for cross-border), and store the cross-walks to the others. Treat each classification as an external ontology that you wrap, not extend.
FHIR, SNOMED CT, ICD-10, RxNorm — the Healthcare Stack
Healthcare has the most-developed, most-formally-maintained published ontology stack of any industry. Six standards a working healthcare analyst will encounter on day one:
- HL7 FHIR (Fast Healthcare Interoperability Resources) — the data-exchange standard. Defines ~150 resources (Patient, Encounter, Observation, Condition, MedicationRequest, Procedure, Practitioner, Organization, Claim, Coverage, …) with strongly-typed JSON/XML serialisations. Mandated by US federal regulation (the CMS Interoperability and Patient Access final rule and the ONC final rule implementing the 21st Century Cures Act). Most modern EHRs expose a FHIR API.
- SNOMED CT — the clinical-terminology ontology. ~350,000 medical concepts arranged in a strict subsumption hierarchy. The truly semantic layer: it lets a system reason that “metformin” is a “biguanide” is a “hypoglycaemic agent” is a “drug.”
- ICD-10 / ICD-11 — billing-grade diagnosis codes. ~70,000 codes; coarser than SNOMED CT but universally adopted for claims and reimbursement.
- RxNorm — US National Library of Medicine drug-name ontology. The standard for medication interop.
- LOINC — Logical Observation Identifiers Names and Codes. The standard for laboratory results and clinical observations.
- CPT — Current Procedural Terminology (AMA). The procedure-code standard for billing in the US.
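The subsumption reasoning described for SNOMED CT can be illustrated with a toy is-a graph. This sketch uses plain labels from the metformin example rather than real SNOMED CT concept identifiers, and the `IS_A` table and function names are hypothetical. A production system would query a terminology server instead.

```python
# Toy is-a edges: child -> list of direct parents.
# Labels only; these are not SNOMED CT concept identifiers.
IS_A = {
    "metformin": ["biguanide"],
    "biguanide": ["hypoglycaemic agent"],
    "hypoglycaemic agent": ["drug"],
}


def ancestors(concept: str) -> set[str]:
    """Return every concept reachable by following is-a edges upward."""
    seen: set[str] = set()
    stack = list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def is_subsumed_by(child: str, parent: str) -> bool:
    """True if child is the same concept as parent or a descendant of it."""
    return child == parent or parent in ancestors(child)


print(is_subsumed_by("metformin", "drug"))
print(is_subsumed_by("drug", "metformin"))
```

Because parents are stored as lists, the same traversal works when a concept has more than one parent.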
Real-world deployments: every hospital in the US (mandated), every EHR vendor (Epic, Cerner, Meditech), every payer (UnitedHealth, Anthem, Aetna), national health systems (UK NHS, Australia, Canada, Singapore), and every modern pharmacovigilance / clinical-trials platform.
A FHIR Patient resource
The design of a Patient resource is worth noting: identifier is a list of typed identifiers (Chapter 4’s cross-walk built into the resource). name and address are also lists (a patient may have several names, and multiple addresses across time). managingOrganization is a typed reference to another FHIR resource, which is the foundation of the FHIR graph.
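To make those design points concrete, here is a small Patient resource built as a Python dictionary and serialised to JSON. The element names follow the FHIR Patient structure, but every value (ids, identifier systems, names, addresses, the organisation reference) is made up for illustration.

```python
import json

# All values below are hypothetical; only the element names follow FHIR.
patient = {
    "resourceType": "Patient",
    "id": "example-patient",
    "identifier": [  # a list of typed identifiers
        {"system": "http://hospital.example.org/mrn", "value": "MRN-000123"},
        {"system": "http://payer.example.org/member-id", "value": "M-98765"},
    ],
    "name": [  # a list: legal name, maiden name, and so on
        {"use": "official", "family": "Example", "given": ["Alex"]},
    ],
    "address": [  # a list: addresses change over time
        {"use": "home", "city": "Oldtown",
         "period": {"start": "2015-01-01", "end": "2020-06-30"}},
        {"use": "home", "city": "Newtown",
         "period": {"start": "2020-07-01"}},
    ],
    # A typed reference to another resource: an edge in the FHIR graph.
    "managingOrganization": {"reference": "Organization/example-hospital"},
}

print(json.dumps(patient, indent=2))
```

The reference string has the form `ResourceType/id`, so a consumer can tell which kind of resource to fetch without inspecting it first.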
A Condition resource pointing into SNOMED CT
The code.coding array carries the same clinical concept in multiple standards: SNOMED CT (semantically rich), ICD-10 (billing). The hospital’s analytics team queries one or the other depending on the use case; the standards remain interoperable because both codings travel with every resource.
Adoption strategy for FHIR
Wrap. Do not extend FHIR lightly — the standard already covers nearly everything a healthcare team needs, extensions are formalised through FHIR’s Extension mechanism, and the regulator-approved extensions are well documented. Where FHIR feels under-specified (genomic data, social-determinants-of-health, behavioral-health), there is almost certainly an active HL7 working group with a draft implementation guide. Search before extending.
SDMX, COFOG, BPM6, SNA — Macroeconomic Standards
Macroeconomic statistics are produced by national agencies, central banks, and supranational organisations (IMF, World Bank, BIS, OECD, UN, Eurostat). The standards that unify them are less famous than FIBO or FHIR but no less important.
- SDMX (Statistical Data and Metadata Exchange) — ISO 17369. The data-and-metadata-exchange standard maintained by a consortium of seven international organisations (BIS, ECB, Eurostat, IMF, OECD, UN, World Bank). Defines code lists, dataflows, structures, and a query API. Every major macro data API is SDMX-compatible.
- BPM6 (Balance of Payments and International Investment Position Manual, 6th edition) — IMF. Defines balance-of-payments concepts and categories. The basis of every cross-border flow statistic.
- SNA 2008 (System of National Accounts) — UN/IMF/Eurostat/OECD/World Bank joint standard. Defines GDP, the institutional sector classification, and every macro aggregate.
- COFOG (Classification of the Functions of Government) — UN. Classifies government expenditure by function (general public services, defence, education, health, social protection). The grain of every fiscal-policy analysis.
- NACE / ISIC — industry classifications used for sector-level value-added statistics.
A typical SDMX dataflow
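SDMX organises every statistic along several dimensions, each with its own controlled vocabulary (code list): typically frequency, reference area, indicator, and time period, alongside the observed value and attributes such as the unit of measure.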
The SDMX power is that every macro series fits this structure. A single canonical table can hold “every observation the IMF publishes” — billions of rows, but consistent shape. The trade-off is that you need a separate metadata catalog describing what each dimension code means; this catalog is also SDMX-formatted.
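The "one consistent shape" idea can be sketched with a fixed dimension order and a dot-separated series key, which is how SDMX queries conventionally address a series. The dimension names below are common SDMX concept names, but their order, the code values, and the observations are all invented for illustration. In a real dataflow the order comes from its data structure definition.

```python
# Assumed dimension order for a hypothetical dataflow.
DIMENSIONS = ("FREQ", "REF_AREA", "INDICATOR")

# Every observation has the same shape: dimensions + time period + value.
# Codes and values are made up.
observations = [
    {"FREQ": "A", "REF_AREA": "AU", "INDICATOR": "GDP_REAL",
     "TIME_PERIOD": "2022", "OBS_VALUE": 100.0},
    {"FREQ": "A", "REF_AREA": "AU", "INDICATOR": "GDP_REAL",
     "TIME_PERIOD": "2023", "OBS_VALUE": 102.0},
    {"FREQ": "A", "REF_AREA": "NZ", "INDICATOR": "GDP_REAL",
     "TIME_PERIOD": "2023", "OBS_VALUE": 50.0},
]


def series_key(row: dict) -> str:
    """Join the dimension codes, in order, into a dot-separated key."""
    return ".".join(row[d] for d in DIMENSIONS)


def select(rows: list[dict], key: str) -> list[dict]:
    """Return the observations that belong to one series."""
    return [r for r in rows if series_key(r) == key]


for row in select(observations, "A.AU.GDP_REAL"):
    print(row["TIME_PERIOD"], row["OBS_VALUE"])
```

The codes themselves carry no meaning without the metadata catalog, which is why the catalog has to be ingested alongside the data.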
COFOG-classified government expenditure
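As an illustration of COFOG-classified expenditure, the sketch below rolls line items up to the ten top-level COFOG divisions. The division names are the published ones; the line-item codes are used only for their two-digit division prefix, and the amounts are invented. The function name `by_division` is hypothetical.

```python
from collections import defaultdict

# The ten top-level COFOG divisions.
COFOG_DIVISIONS = {
    "01": "General public services",
    "02": "Defence",
    "03": "Public order and safety",
    "04": "Economic affairs",
    "05": "Environmental protection",
    "06": "Housing and community amenities",
    "07": "Health",
    "08": "Recreation, culture and religion",
    "09": "Education",
    "10": "Social protection",
}

# (COFOG code, amount) pairs; the amounts are made up.
line_items = [("07.1", 50), ("07.3", 100), ("09.1", 80), ("10.2", 120)]


def by_division(items: list[tuple[str, int]]) -> dict[str, int]:
    """Sum amounts by the two-digit division prefix of each code."""
    totals: dict[str, int] = defaultdict(int)
    for code, amount in items:
        division = code.split(".")[0]
        if division not in COFOG_DIVISIONS:
            raise ValueError(f"unknown COFOG division in code {code!r}")
        totals[division] += amount
    return dict(totals)


for division, total in sorted(by_division(line_items).items()):
    print(division, COFOG_DIVISIONS[division], total)
```

Because the hierarchy is encoded in the code itself, aggregating to a coarser level needs no lookup table beyond the division names.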
IEC CIM, NAESB, NERC — the Energy Stack
The electricity sector has its own standards body and its own ontology. The dominant standards:
- IEC 61970 / 61968 — Common Information Model (CIM) — the canonical ontology for electric power systems. UML-based. Defines generators, lines, substations, breakers, transformers, loads, and the topology graph that connects them. Used by every grid operator and EMS (Energy Management System) vendor.
- NAESB (North American Energy Standards Board) — business-process and data-exchange standards for wholesale power, gas markets, and renewables.
- NERC GADS (Generating Availability Data System) — reliability and availability reporting for generating units.
- OPC UA — for SCADA / industrial-automation interop with field devices.
- EIA Form 860 / 923 — US DOE plant-level filings (mentioned in Chapter 4).
- OASIS / CDS — the FERC capacity-market data feeds.
CIM is the most consequential of these for a working analyst because it provides the graph of the power system, not just a list of assets.
The CIM advantage: every analytic that depends on grid topology (loss calculation, congestion pricing, contingency analysis, renewable-integration studies) operates on the same graph structure. An algorithm that finds the shortest electrical path from Bayswater to Sydney runs identically on every CIM-conformant grid.
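The shortest-electrical-path idea can be sketched with Dijkstra's algorithm over a toy topology. This is not CIM itself: the graph is a plain adjacency map, the intermediate substations and the edge weights (standing in for some impedance-like cost) are invented, and only the endpoint names come from the text. Once a grid is expressed as nodes and weighted branches, the same function runs on any network.

```python
import heapq
from math import inf

# Undirected toy network: (node, node, cost). Values are made up.
BRANCHES = [
    ("Bayswater", "SubA", 4),
    ("Bayswater", "SubB", 2),
    ("SubB", "SubA", 1),
    ("SubA", "Sydney", 3),
    ("SubB", "Sydney", 7),
]


def build_graph(branches):
    """Turn a branch list into a symmetric adjacency map."""
    graph = {}
    for a, b, cost in branches:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, source, target):
    """Dijkstra: return (path, total cost), or None if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, inf):
            continue                      # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


print(shortest_path(build_graph(BRANCHES), "Bayswater", "Sydney"))
```

Real studies use power-flow models rather than a single additive weight, so treat this as a picture of the graph structure, not of the physics.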
Adoption strategy in the energy sector: wrap the EIA/FERC/ISO identifiers (Chapter 4 cross-walks) onto a CIM-compliant internal graph; populate from public data and internal SCADA; serve the graph through a property-graph database (Chapter 6).
Multi-Standard Cross-Walks
In every domain you will end up bridging more than one standard. Three patterns that work:
Pattern 1 — Canonical identifier + alias table. Pick one standard as the canonical ID (LEI in finance, FHIR id in healthcare, EIA Plant Code in US energy, SDMX dataflow URN in macro). Store every other identifier as an alias. Use one query against the alias table to translate.
Pattern 2 — Multi-coded resource. Carry all the standard codes inside the resource itself (FHIR’s Condition.code.coding list). Slightly more storage per record; trivial multi-standard query at runtime.
Pattern 3 — Mapping ontology. Build a small ontology of the mappings: each row says “concept X in standard A == concept Y in standard B with confidence C, valid from date D.” Used at agencies that maintain official cross-walks (Eurostat, US Census, BIS). Heavy investment; the operational gold standard.
A surprising amount of the value of a master-data system comes from this single small table. Build it once, expose it through an API or a graph database (Chapter 6 / 7), and every downstream query becomes a one-line lookup.
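A row of the Pattern 3 mapping table, with its confidence and validity date, might look like the sketch below. The `Mapping` class and `translate` function are hypothetical names. The first row reuses the NAICS-to-ISIC software example from earlier in the chapter with an invented confidence and date, and the second row is entirely fictional, present only to show how a later mapping supersedes an earlier one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mapping:
    source_scheme: str
    source_code: str
    target_scheme: str
    target_code: str
    confidence: float    # 0.0 to 1.0, assigned by whoever curates the mapping
    valid_from: date     # the mapping applies on and after this date


# Confidences and dates are invented; the second row is fictional.
MAPPINGS = [
    Mapping("NAICS", "5112", "ISIC", "5820", 0.9, date(2010, 1, 1)),
    Mapping("NAICS", "5112", "ISIC", "FICTIONAL", 0.5, date(2030, 1, 1)),
]


def translate(mappings, source_scheme, source_code, target_scheme,
              as_of, min_confidence=0.0):
    """Return the most recent qualifying target code, or None."""
    candidates = [
        m for m in mappings
        if m.source_scheme == source_scheme
        and m.source_code == source_code
        and m.target_scheme == target_scheme
        and m.valid_from <= as_of
        and m.confidence >= min_confidence
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.valid_from).target_code


print(translate(MAPPINGS, "NAICS", "5112", "ISIC", date(2024, 6, 30)))
```

Passing `as_of` explicitly keeps historical queries reproducible: a report run for a past date resolves codes the way they were mapped at that time.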
Worked example: a hospital needs to feed diagnoses and procedures into a payer’s pipeline. The right pattern is wrap, not extend. Use FHIR resources as the canonical representation on the hospital side. Each Condition and Procedure already carries both SNOMED CT and ICD-10/CPT codings in the code.coding list. The payer pipeline can read the ICD-10/CPT coding directly from each resource, so no new ontology is needed. If the payer wants to expose SNOMED-grade reasoning (e.g., “all diabetes-related conditions”), use a SNOMED CT reference set rather than re-coding everything to a new standard.
Chapter Wrap-up
Almost every domain you will model in production already has a published ontology. The four families covered above, financial (FIBO/LEI/GICS), healthcare (FHIR/SNOMED CT/ICD-10/RxNorm/LOINC/CPT), macroeconomic (SDMX/COFOG/BPM6/SNA), and energy (CIM/NAESB/NERC), cover the program’s core domain emphasis and the vast majority of practitioner work in those verticals.
The default adoption pattern is wrap: keep your internal ontology, link by URI to the standard, retain freedom to evolve internally. Extend only when you genuinely operate at the edge of the standard. Build only when no standard exists or every published one is wrong for your domain.
The cross-walks between standards are the highest-leverage data asset most teams underinvest in. Build them once, govern them like any other reference dataset, expose them through an API, and they pay back forever.
Chapter 6 moves from which ontology to how to build it in Python — NetworkX for in-memory work, rdflib for RDF/OWL, Neo4j and Apache AGE for property graphs at scale, pgvector for hybrid relational-plus-vector storage.