Chapter 9: AI on Top of the Ontology
Chapter Introduction
A working ontology is structured data with semantics. That makes it the ideal substrate for modern AI techniques that struggle on raw text and tables alone: they get the precision of structured data, the reasoning of a domain model, and the breadth of unstructured language all at once. This chapter teaches three AI patterns built directly on top of the ontology.
- Retrieval-Augmented Generation (RAG) on the Knowledge Graph — combine semantic search over the ontology with an LLM that answers questions grounded in retrieved facts. The user gets natural-language answers; the system gets citation-grade auditability because every claim points to a node in the ontology.
- Graph Neural Networks (GNNs) — learn predictive representations that exploit the ontology’s link structure. The model knows that two patients are linked through a shared provider, or that two stocks are linked through a shared institutional holder, and uses that structure for prediction.
- Knowledge-Graph Embeddings — turn the ontology’s (subject, predicate, object) triples into vectors in \(\mathbb{R}^d\) such that arithmetic on vectors corresponds to ontology operations. Used for link prediction, anomaly detection, and downstream features in classical regression.
Real-world deployments span the four domain verticals. Bloomberg’s internal news pipeline runs RAG against an entity-and-event knowledge graph linking companies, executives, deals, and macro events. Mayo Clinic’s disease-progression team uses GNNs over a SNOMED-CT-anchored patient graph. Bridgewater’s “All-Weather” research uses GNN-derived factor embeddings across global macro indicators. PJM (the US Mid-Atlantic ISO) uses GNNs over the CIM grid topology for congestion forecasting.
The chapter is the deepest of the book in technical content but the friendliest for browser-running code — NetworkX-based GNN message-passing and the simpler KG-embedding methods are all small enough to teach in pure NumPy.
Table of Contents
- GraphRAG — Retrieval Grounded in the Ontology
- GNNs on Ontology Graphs
- Knowledge-Graph Embeddings — TransE, DistMult, RotatE
- Link Prediction — Finding Missing Edges
- Case Study: Quant — Macro Knowledge Graph for Factor Discovery
- Case Study: Healthcare — Disease Progression Modelling
- Case Study: Macro — Cross-Country Indicator Embeddings
- Case Study: Energy — Grid-Topology GNN for Congestion
GraphRAG — Retrieval Grounded in the Ontology
Vanilla RAG (Volume I Chapter 8) retrieves text passages by vector similarity and hands them to an LLM. GraphRAG (popularised by Microsoft Research in 2024) retrieves subgraphs instead — the user’s question is answered by extracting a connected piece of the ontology, summarising it, and grounding the LLM’s answer in the retrieved structure.
Why GraphRAG beats plain RAG on structured domains:
- Questions like “what relationships exist between X and Y” cannot be answered by individual passages — they require connectivity.
- Citations are precise: the answer cites nodes and edges, not paragraphs.
- The same retrieval handles questions about aggregates (e.g., “summarise the relationships of all subsidiaries of LegalEntity X”) that paragraph-RAG struggles with.
The standard GraphRAG pipeline:
- Entity-anchor extraction. Parse the user’s question; identify the ontology entities it mentions (entity linking — Chapter 4).
- Subgraph extraction. Pull the \(k\)-hop neighbourhood around each anchor; trim with relevance scoring.
- Summarisation. Convert the subgraph to a structured textual form (a list of typed triples or a brief narrative).
- LLM generation. Pass the summary + the question to the LLM; require citations.
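A minimal sketch of steps 2 and 3, assuming a toy ontology held as a NetworkX MultiDiGraph whose edges carry a `relation` attribute (the graph, entity names, and helper functions are illustrative, not a production API):

```python
import networkx as nx

# Toy ontology: hypothetical entities, typed edges.
G = nx.MultiDiGraph()
G.add_edge("AcmeCorp", "AcmeEurope", relation="hasSubsidiary")
G.add_edge("AcmeEurope", "EU-Market", relation="operatesIn")
G.add_edge("AcmeCorp", "J. Doe", relation="hasCEO")
G.add_edge("AcmeEurope", "RegulatorX", relation="supervisedBy")

def khop_subgraph(graph, anchors, k=2):
    """Step 2: union of the k-hop neighbourhoods around each anchor."""
    nodes, frontier = set(anchors), set(anchors)
    for _ in range(k):
        nxt = set()
        for n in frontier:
            nxt |= set(graph.successors(n)) | set(graph.predecessors(n))
        nodes |= nxt
        frontier = nxt
    return graph.subgraph(nodes)

def to_triples(subgraph):
    """Step 3: serialise the subgraph as typed triples for the prompt."""
    return [f"({h}, {d['relation']}, {t})"
            for h, t, d in subgraph.edges(data=True)]

anchors = ["AcmeCorp"]                  # step 1 (entity linking) assumed done
context = "\n".join(to_triples(khop_subgraph(G, anchors)))
print(context)                          # the grounded context handed to the LLM
```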
Production implementations: Microsoft’s open-source graphrag package; Neo4j’s neo4j-graphrag library; LlamaIndex’s KnowledgeGraphIndex; LangChain’s GraphRetriever.
The citations point to triples in the ontology — a regulator can verify each claim by querying the graph. This is the single largest advantage GraphRAG has over plain text RAG.
GNNs on Ontology Graphs
Chapter 8 of Volume I introduced GNNs as a method on relational data. Here we apply them specifically to the ontology graph and to the four domains.
The basic message-passing layer is the same:
\[ h_v^{(\ell+1)} = \sigma\Big(W^{(\ell)} h_v^{(\ell)} + U^{(\ell)} \cdot \text{aggregate}_{u \in \mathcal{N}(v)}(h_u^{(\ell)})\Big) \]
What makes a GNN on an ontology special is that the aggregation is typed — different relation types contribute differently. Relational-GCN (R-GCN, Schlichtkrull et al. 2018) uses one weight matrix per relation type; the heterogeneous graph attention network (HAN, Wang et al. 2019) does the same with attention weights. Production GNN libraries (PyTorch Geometric, DGL) both expose these directly.
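To make the typed aggregation concrete, here is a pure-NumPy sketch of one R-GCN-style layer (one weight matrix per relation type, mean aggregation per relation, ReLU), run twice on a toy graph with random, untrained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d_in, d_out = 4, 8, 8
H = rng.normal(size=(n_nodes, d_in))              # input node features

# Typed edges: relation name -> list of (src, dst) pairs (toy example)
edges = {
    "treats":    [(0, 1), (0, 2)],
    "interacts": [(1, 2), (2, 3)],
}
W_self = rng.normal(size=(d_in, d_out)) * 0.1     # self-loop weight matrix
W_rel = {r: rng.normal(size=(d_in, d_out)) * 0.1 for r in edges}  # per relation

def rgcn_layer(H, edges, W_self, W_rel):
    out = H @ W_self
    for rel, pairs in edges.items():
        msg = np.zeros_like(out)
        count = np.zeros((H.shape[0], 1))
        for src, dst in pairs:
            msg[dst] += H[src] @ W_rel[rel]       # typed message
            count[dst] += 1
        out += msg / np.maximum(count, 1)         # mean aggregation per relation
    return np.maximum(out, 0)                     # ReLU

H1 = rgcn_layer(H, edges, W_self, W_rel)          # one hop of typed structure
H2 = rgcn_layer(H1, edges, W_self, W_rel)         # two hops
```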
The two stacked layers in the sketch propagate typed structure, but their weights are random; trained weights do dramatically better, and in production the weights are trained against a supervised label (e.g., “this patient will be readmitted within 30 days”). The GNN’s representation becomes the feature for the downstream classifier.
Knowledge-Graph Embeddings — TransE, DistMult, RotatE
A knowledge graph’s triples \((h, r, t)\) — “head, relation, tail” — can be embedded into \(\mathbb{R}^d\) such that the geometry of the embedding encodes the ontology’s logical structure. Three canonical models:
- TransE (Bordes et al., 2013) — relations are translations: \(\mathbf{h} + \mathbf{r} \approx \mathbf{t}\). Simple, fast, weak on 1-to-many relations.
- DistMult (Yang et al., 2015) — relations are diagonal matrices: \(\text{score}(h, r, t) = \mathbf{h}^\top \text{diag}(\mathbf{r}) \mathbf{t}\). Handles one-to-many relations better than TransE, but the score is symmetric in head and tail, so it cannot model antisymmetric relations.
- RotatE (Sun et al., 2019) — relations are rotations in complex space. Handles symmetry, antisymmetry, inversion, composition. State-of-the-art for many years; widely deployed.
- Newer models (ComplEx, TuckER, NodePiece) extend the family with various trade-offs.
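Side by side, the three score functions are a few lines of NumPy each (the convention that higher scores mean more plausible triples, and the exact vector shapes, are our own choices here):

```python
import numpy as np

def score_transe(h, r, t):
    # Translation: h + r should land near t (negate distance so higher = better)
    return -np.linalg.norm(h + r - t)

def score_distmult(h, r, t):
    # Bilinear with a diagonal relation matrix: h^T diag(r) t
    return np.sum(h * r * t)

def score_rotate(h, r, t):
    # h, t are complex vectors; r holds phase angles, so e^{ir} is a
    # unit-modulus elementwise rotation in the complex plane
    return -np.linalg.norm(h * np.exp(1j * r) - t)
```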
Trained on the existing triples of an ontology, embeddings are used for:
- Link prediction — given a head and a relation, predict the most-likely tail. “Apple’s CEO is…?” The model emits a ranked list of candidate entities, including for (head, relation) combinations never observed together in a training triple.
- Anomaly detection — score every existing triple; the ones with anomalously low scores are candidates for being incorrect.
- Downstream features — the embedding of an entity is a fixed-length vector that summarises everything the ontology says about it; feed it into a classical regression.
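The following toy TransE, trained with plain SGD on a margin loss over a handful of illustrative triples, is enough to see the ranking behaviour discussed next (the entities, triples, and hyper-parameters are all assumptions made for the demo):

```python
import numpy as np

rng = np.random.default_rng(42)
entities = ["Apple", "Google", "Microsoft", "Cupertino", "TimCook"]
relations = ["competitor_of", "headquartered_in", "has_ceo"]
triples = [("Apple", "competitor_of", "Google"),
           ("Apple", "competitor_of", "Microsoft"),
           ("Apple", "headquartered_in", "Cupertino"),
           ("Apple", "has_ceo", "TimCook")]
e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: i for i, r in enumerate(relations)}

d, margin, lr = 16, 1.0, 0.05
E = rng.normal(scale=0.1, size=(len(entities), d))    # entity embeddings
R = rng.normal(scale=0.1, size=(len(relations), d))   # relation embeddings

def dist(h, r, t):
    """Squared L2 distance ||h + r - t||^2 (lower = more plausible)."""
    return np.sum((E[h] + R[r] - E[t]) ** 2)

for epoch in range(500):
    for hs, rs, ts in triples:
        h, r, t = e_idx[hs], r_idx[rs], e_idx[ts]
        t_neg = int(rng.integers(len(entities)))      # corrupt the tail
        if t_neg == t or dist(h, r, t) + margin <= dist(h, r, t_neg):
            continue                                  # margin already satisfied
        g_pos = E[h] + R[r] - E[t]                    # gradient of positive term
        g_neg = E[h] + R[r] - E[t_neg]                # gradient of negative term
        E[h] -= lr * (g_pos - g_neg)
        R[r] -= lr * (g_pos - g_neg)
        E[t] += lr * g_pos                            # pull true tail closer
        E[t_neg] -= lr * g_neg                        # push corrupted tail away

h, r = e_idx["Apple"], r_idx["competitor_of"]
ranked = sorted(entities, key=lambda e: dist(h, r, e_idx[e]))
print(ranked)   # Google and Microsoft should rank ahead of the rest
```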
The toy embedding above correctly ranks Google and Microsoft as Apple’s competitors (the triples it was trained on) and the other entities lower. With more data, more dimensions, and proper training (Adam optimiser, early stopping against a held-out validation split), TransE-class models scale to tens-of-millions of entities and hundreds-of-millions of triples — Bloomberg’s internal company KG embedding, Microsoft’s Bing Knowledge Graph, and Google’s KG all run this family of models at planetary scale.
Link Prediction — Finding Missing Edges
A trained KG embedding lets you score every possible triple. The ones with high scores that aren’t in the graph yet are the most-promising candidate edges. This is the operational workflow for:
- Drug-target prediction in pharmacology (predict missing edges in the drug-disease-gene KG).
- Recommender systems (predict missing user-item edges).
- Fraud detection (predict suspicious linkages between accounts and counterparties).
- Macroeconomic network discovery (predict missing transmission channels between policy variables and downstream indicators).
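The scoring loop and the two standard metrics fit in a few lines. The embeddings below are random placeholders, so the concrete ranking is arbitrary; only the shape of the computation matters:

```python
import numpy as np

rng = np.random.default_rng(7)
n_entities, n_relations, d = 100, 10, 32
E = rng.normal(size=(n_entities, d))      # entity embeddings (untrained here)
R = rng.normal(size=(n_relations, d))     # relation embeddings (untrained here)

def rank_tails(h, r):
    """TransE distance of every entity as the tail of (h, r, ?)."""
    dists = np.linalg.norm(E[h] + R[r] - E, axis=1)   # broadcast over all tails
    return np.argsort(dists)              # best candidates first

held_out = [(0, 3, 42), (5, 1, 17)]       # hypothetical test triples (h, r, t)
ranks = []
for h, r, t in held_out:
    order = rank_tails(h, r)
    ranks.append(int(np.where(order == t)[0][0]) + 1)  # 1-indexed rank of t

mrr = np.mean([1.0 / rk for rk in ranks])              # mean reciprocal rank
hits_at_10 = np.mean([rk <= 10 for rk in ranks])
print(f"MRR={mrr:.3f}  hits@10={hits_at_10:.2f}")
```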
The exact ranking in the sketch above depends on the small random sample of embeddings, but the structure of the operation is universal: rank candidate tails by their embedding score; the top ones are the predictions. Production link-prediction systems evaluate this with hits@K and mean reciprocal rank (MRR) on a held-out test set, and run nightly to surface new candidate edges for human review.
Case Study: Quant — Macro Knowledge Graph for Factor Discovery
A quant fund’s macro research team maintains a knowledge graph of countries, policy variables, central banks, indicators, and the relationships among them (which central bank publishes which indicator; which indicator influences which other indicator; which country is a member of which monetary union). Embedded into a low-dimensional space, the graph yields:
- Country embeddings that cluster by underlying economic structure (G7, BRICS, oil exporters, financial centres) — useful as features in cross-country regressions.
- Indicator embeddings that reveal substitutes (e.g., headline CPI ≈ core CPI ≈ PCE deflator are near each other) — useful for missing-data imputation.
- Link prediction between policy announcements and downstream indicators (“if the BOJ raises rates by 25bps, which Asian-currency pairs move first?”) — used as a screening prior for the actual time-series regression.
Bridgewater and PIMCO have published research-level descriptions of similar systems; the operational implementation is a Postgres + Apache AGE deployment with a nightly KG-embedding refresh.
Case Study: Healthcare — Disease Progression Modelling
A hospital research team builds a KG of patients, diagnoses (SNOMED CT), medications (RxNorm), procedures (CPT), and outcomes (mortality, readmission, complication). The GNN trained on this graph predicts:
- 30-day readmission risk at discharge — feature input to a clinical decision-support alert.
- Disease progression for chronic conditions (CKD → ESRD timing; T2DM → cardiovascular event timing).
- Adverse-event likelihood for specific drug combinations.
The Mayo Clinic, MIT, and Stanford have all published on GNN-based clinical-pathway analytics; in production they typically feed a downstream regression rather than directly drive a clinical decision (regulatory and interpretability constraints — Volume I Chapter 10’s discipline applies in full).
Prefer a 2-layer GNN paired with a downstream linear or shallow tree model whose inputs include the GNN-derived embeddings. Two reasons. (1) Interpretability: a 4-layer GNN’s predictions depend on 4-hop neighbourhoods that no physician can reason about; 2 hops keep the explanation local to the patient and their immediate clinical contacts. (2) Regulatory: SHAP values on a linear/tree model are tractable; SHAP on a deep GNN is far less so. The GNN provides features; the downstream model provides the decision and the explanation. This is the standard pattern at every regulated healthcare-AI team.
A characteristic finding from this literature: a small GNN with two message-passing layers on a SNOMED-anchored patient graph systematically outperforms the same-features-flattened gradient-boosted classifier for readmission prediction by 3–7 ROC-AUC points. The graph structure is real signal; throwing it away in a flat feature matrix is real cost.
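A sketch of that division of labour, on entirely synthetic data: GNN embeddings concatenated with tabular features, then a plain logistic regression fitted by gradient descent in NumPy (the feature names and label are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_gnn, d_tab = 200, 8, 5
gnn_emb = rng.normal(size=(n, d_gnn))   # per-patient GNN embeddings (synthetic)
tabular = rng.normal(size=(n, d_tab))   # age, labs, prior admissions, ...
X = np.hstack([gnn_emb, tabular])
y = rng.integers(0, 2, size=n).astype(float)  # synthetic 30-day readmission label

w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):                          # full-batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # predicted readmission risk
    w -= 0.1 * (X.T @ (p - y) / n)             # logistic-loss gradient
    b -= 0.1 * float(np.mean(p - y))

# w has one inspectable coefficient per embedding dimension / tabular feature,
# which is what makes the downstream model explainable to a reviewer.
```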
Case Study: Macro — Cross-Country Indicator Embeddings
Embed every (country, indicator, period) triple from the IMF International Financial Statistics; learn embeddings such that countries with similar economic structure cluster, indicators with similar economic role cluster, and the embedding can answer “what is the analogue of Brazil’s GDP-growth-vs-inflation profile in Asia?” (a nearest-neighbour query in the country embedding subspace).
The downstream uses for a macro researcher:
- Analogue countries for stress-testing — find five countries historically similar to today’s target.
- Indicator substitution when a specific country’s data series is unavailable or unreliable — use the nearest-neighbour indicator from a country with parallel structure.
- Regime classification — clusters of country-time embeddings often correspond to recognisable macro regimes (expansion, stagflation, currency crisis).
Implementation: train a TransE-style embedding nightly on the SDMX-conformant observation store; expose the embeddings via a small API; the analyst notebook does the nearest-neighbour queries.
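The notebook end of that pipeline is a cosine nearest-neighbour query in the country-embedding subspace; a sketch with placeholder embeddings and country codes:

```python
import numpy as np

rng = np.random.default_rng(3)
countries = ["BRA", "MEX", "IDN", "IND", "TUR", "ZAF", "THA", "MYS"]
C = rng.normal(size=(len(countries), 16))   # country embeddings (placeholders)

def analogues(target, k=3):
    """k nearest analogue countries by cosine similarity."""
    i = countries.index(target)
    sims = C @ C[i] / (np.linalg.norm(C, axis=1) * np.linalg.norm(C[i]))
    order = np.argsort(-sims)               # most similar first
    return [countries[j] for j in order if j != i][:k]

print(analogues("BRA"))   # nearest analogue countries to Brazil
```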
Case Study: Energy — Grid-Topology GNN for Congestion
A CIM-conformant grid topology has thousands of generators, substations, transmission lines, and loads. A GNN trained on the topology + historical load and weather features predicts locational marginal price (LMP) congestion several hours ahead.
The model exploits the graph structure: a substation’s LMP is heavily influenced by the LMPs of its neighbours through transmission constraints; a generator’s marginal cost interacts with the binding constraint on the line that exports its power. A flat regression on per-substation features misses these interactions; a GNN captures them automatically.
Real deployments: PJM Interconnection, ERCOT, and CAISO all have research-stage GNN congestion forecasters in 2024–2026 papers; production use lags research by 2–3 years in regulated markets due to model-risk-management requirements.
Chapter Wrap-up
Three AI techniques that operate directly on the ontology: GraphRAG for question answering with citations; GNNs for ontology-aware predictive features; KG embeddings for link prediction, analogue retrieval, and downstream features.
The four domain case studies share the same structural pattern: build the ontology (Chapters 4–6), populate it with high-quality data (Chapter 3), query it (Chapter 7), wire actions and events around it (Chapter 8), and then layer AI on top to produce decisions. None of the AI techniques replaces the statistical methods of Volume I — they feed them, providing features that classical methods could not extract from raw data alone.
Chapter 10 closes the book with the discipline that determines whether any of this AI is actually deployable: trust, lineage, and governance. Every AI prediction, every action’s outcome, every workflow’s audit trail must be traceable to the ontology that produced it, and the entire stack must satisfy the regulators and stakeholders who will scrutinise it.