Contents

Tap any chapter to start reading.

Chapter 1 Why Domain Modelling — From Schemas to Ontologies

The conceptual shift from “tables of data” to “objects in the world.” Schemas vs. data models vs. ontologies. The Palantir Foundry worldview. The open-source alternative stack. When you don’t need an ontology.

Chapter 2 Object Types, Properties, Links, Actions

The four primitives of every ontology. Worked examples in healthcare, lending, and operations. The ontology canvas — a template you’ll use in every project.

Chapter 3 Grain, Time, and Point-in-Time Correctness

Choosing the unit of analysis. Slowly-changing dimensions (SCD1, SCD2, SCD6). The silent killer: point-in-time leakage. As-of joins. Event sourcing versus snapshots.

Chapter 4 Entity Resolution and Master Data

Deterministic matching, fuzzy string methods, the Fellegi–Sunter probabilistic model, ML-based entity resolution, and master-data management. Four worked case studies: trading (CUSIP/ISIN/FIGI/LEI), healthcare (patient matching), macroeconomics (country codes), and energy (generator and substation identifiers).

Chapter 5 Standard Ontologies in the Wild

FIBO (finance), FHIR / SNOMED CT / ICD-10 / RxNorm (healthcare), SDMX / COFOG / BPM6 / SNA (macroeconomic), IEC CIM / NAESB / NERC (energy). When to wrap, extend, or build your own.

Chapter 6 Building the Ontology in Python

NetworkX for in-memory work, rdflib for the formal semantic stack, Neo4j and Apache AGE for property graphs at scale, PostgreSQL + JSONB + pgvector for the pragmatic hybrid. Layered architectures combining all four.

Chapter 7 Querying the Ontology

SQL for the relational projection, SPARQL for RDF/OWL stores, Cypher / GQL for property graphs, GraphQL for the API layer. The same business question expressed in all four idioms. Performance and anti-patterns.

Chapter 8 Actions, Events, and Workflows

The anatomy of an action (pre-conditions, parameters, effects, audit). Event sourcing, correlation IDs, idempotency, replay. Workflow orchestrators (Dagster, Prefect, Apache Airflow) and the Foundry actions framework. Four domain case studies: quant order management, healthcare prior authorisation, macro data release, energy market clearing.

Chapter 9 AI on Top of the Ontology

GraphRAG for retrieval grounded in the ontology, graph neural networks for ontology-aware prediction, knowledge-graph embeddings (TransE, DistMult, RotatE) for link prediction and downstream features. Domain case studies in quant, healthcare, macro, and energy.

Chapter 10 Trust, Lineage, and Governance

Lineage (OpenLineage, Marquez), audit trails (append-only, hash-chained, externally attested), access control (RLS, ALS, ABAC), the EU AI Act and Annex IV documentation, and the regulatory frameworks for each of the four domains. The architecture that makes the previous nine chapters defensible in production.


How to read this book

This is the practitioner’s companion to Learning Statistics in Python. Where the methods book teaches what statistical operations to use, this book teaches what to operate on — the domain model that turns raw data into a structure where statistical reasoning is well-defined.

Every Python code block in this book runs live in your browser via Pyodide. NetworkX, rdflib, DuckDB, pandas, and scikit-learn run with no installation. Click into any cell, edit, press Run, see the output.
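
As a rough illustration of the kind of cell this refers to, here is a minimal sketch that uses NetworkX to hold two object types and one link between them. The node identifiers, the `object_type` attribute, and the `HAS_ENCOUNTER` link name are hypothetical, chosen only for this example; they are not taken from the book's own code.

```python
import networkx as nx

# Hypothetical toy ontology: all identifiers and attribute names are illustrative.
g = nx.MultiDiGraph()

# Two objects, each tagged with an object type and one property.
g.add_node("patient:1", object_type="Patient", name="A. Example")
g.add_node("encounter:7", object_type="Encounter", date="2024-01-15")

# One typed link between them; the edge key carries the link type.
g.add_edge("patient:1", "encounter:7", key="HAS_ENCOUNTER")

# Summarise every link as (source object type, link type, target object type).
links = [
    (g.nodes[u]["object_type"], k, g.nodes[v]["object_type"])
    for u, v, k in g.edges(keys=True)
]
print(links)
```

Editing a node's properties or adding a second edge and re-running the cell changes the printed list, which is the edit-and-run loop described above.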

The two-book pair

  • Volume I, Learning Statistics in Python, teaches the methods. Read it for self-study; the chapters are designed to be read in any order, driven by what you need.
  • Volume II, Domain Modelling in Python (this book), teaches the discipline of structuring the world before you reach for a method. Class time is built around the case studies in this book; the methods book is the prerequisite reading.

Domain focus

The case studies recur throughout the book in four verticals: quantitative trading, healthcare, macroeconomics, and energy. Each vertical has its own published ontology stack (FIBO, FHIR/SNOMED, SDMX, IEC CIM), its own regulatory regime, and its own characteristic modelling idioms. The chapters teach the common discipline; the case studies show its expression in each domain.

What this book is not

This is not a Palantir Foundry user manual, an SQL textbook, or a graph-database tutorial. Those are platforms and languages; this book is about the conceptual discipline you apply on whichever platform you have. Foundry, Microsoft Fabric, Databricks Unity Catalog, and an open-source PostgreSQL + pgvector + NetworkX stack are all valid implementations of the same ideas.


Prof. Xuhu Wan · HKUST ISOM · Domain Modelling in Python