Chapter 7: Querying the Ontology

Chapter Introduction

An ontology is only as useful as the queries you can run against it. This chapter is a guided tour of the four query languages a working practitioner will meet — SQL (for the relational projection of the ontology), SPARQL (for RDF/OWL stores), Cypher (for property graphs), and GraphQL (for the API layer that downstream applications and dashboards consume). The conceptual mapping between them is mostly direct; once the patterns click, you can read and write any of the four.

The chapter is organised around five recurring query patterns that show up in every domain:

Lookup — retrieve an object by its identifier.
Pattern match — find objects connected by a typed path of links.
Aggregation — compute roll-ups across a hierarchy.
Multi-hop traversal — explore the n-hop neighbourhood of an object.
Subgraph extraction — return a connected piece of the ontology for downstream analysis.

For each pattern we show the same query in all four idioms, on the four domain case studies (quant trading, healthcare, macro, energy). The browser cells use a NetworkX stand-in for the graph store; the SQL/SPARQL/Cypher/GraphQL syntax is shown alongside as production reference.

The Five Universal Query Patterns
SQL — Querying the Relational Projection
SPARQL — Querying the Triple Store
Cypher — Querying the Property Graph
GraphQL — the API Layer
The Same Query in Four Idioms — Worked Comparisons
Performance and Anti-Patterns

The Five Universal Query Patterns

Every ontology query is one of (or a composition of) five patterns. Mastering them in any one language lets you read and write the others quickly.

Lookup — Get Patient PT-001 (give me one object by ID).
Pattern match — Find Claim → Patient → Policy where ... (traverse a typed path).
Aggregation — Sum claim amounts per Patient (roll up over a hierarchy or property).
Multi-hop traversal — Find every Provider within 2 hops of this Provider in the referral graph (variable-length paths).
Subgraph extraction — Return the subgraph induced by Patient PT-001 and its claims, providers, and diagnoses (connected component).

The same five patterns appear whether you are querying a hospital’s FHIR server, a quant fund’s trade-and-position store, the IMF’s SDMX endpoint, or an ISO’s CIM topology.
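As a rough, language-neutral illustration, the five patterns can be sketched over a plain in-memory graph. Everything here (the `nodes` and `links` structures, the helper names, and the sample identifiers) is a hypothetical stand-in chosen for illustration, not the chapter's actual data model.

```python
from collections import deque

# Hypothetical toy ontology: nodes keyed by id, typed directed links as triples.
nodes = {
    "PT-001": {"type": "Patient", "name": "Mary Wong"},
    "CL-1": {"type": "Claim", "amount": 340.0},
    "CL-2": {"type": "Claim", "amount": 425.0},
    "PR-1": {"type": "Procedure", "label": "Office visit 15 min"},
}
links = [
    ("PT-001", "HAS_CLAIM", "CL-1"),
    ("PT-001", "HAS_CLAIM", "CL-2"),
    ("CL-1", "OF_PROCEDURE", "PR-1"),
]

def lookup(node_id):
    """Pattern 1: fetch one object by identifier."""
    return nodes[node_id]

def out(node_id, link_type):
    """Targets of the outgoing links of one type from a node."""
    return [dst for src, typ, dst in links if src == node_id and typ == link_type]

def pattern_match(patient_id):
    """Pattern 2: Patient -[HAS_CLAIM]-> Claim -[OF_PROCEDURE]-> Procedure."""
    return [(c, p) for c in out(patient_id, "HAS_CLAIM") for p in out(c, "OF_PROCEDURE")]

def total_claims(patient_id):
    """Pattern 3: roll up a property over linked objects."""
    return sum(nodes[c]["amount"] for c in out(patient_id, "HAS_CLAIM"))

def within_hops(start, max_hops):
    """Pattern 4: breadth-first search along link direction, bounded by max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if seen[cur] == max_hops:
            continue
        for src, _, dst in links:
            if src == cur and dst not in seen:
                seen[dst] = seen[cur] + 1
                queue.append(dst)
    return seen

def subgraph(start, max_hops):
    """Pattern 5: the reachable nodes plus every link between them."""
    keep = set(within_hops(start, max_hops))
    return keep, [l for l in links if l[0] in keep and l[2] in keep]
```

Each of the four query languages in this chapter expresses these same operations declaratively; the sketch only makes explicit what the engine does on your behalf.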

SQL — Querying the Relational Projection

Every ontology has a relational projection — a set of tables (one per object type) joined on foreign keys (the typed links). SQL queries this projection. It is the language every analyst already speaks; for the ~70% of ontology queries that don’t require deep graph traversal, SQL is the right answer.

The lookup pattern:

-- Find a patient by FHIR id
SELECT *
FROM   patients
WHERE  patient_id = 'PT-001';

The pattern-match pattern (FHIR-style: patient → claim → procedure):

SELECT pt.name, c.service_date, p.cpt_code, p.label
FROM   patients   pt
JOIN   claims     c   ON c.patient_id   = pt.patient_id
JOIN   procedures p   ON p.procedure_id = c.procedure_id
WHERE  pt.patient_id  = 'PT-001'
  AND  c.service_date >= DATE '2025-01-01';
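To try the lookup and pattern-match queries without a database server, they can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the table layout and sample rows are invented for illustration, and because SQLite has no DATE literal the dates are stored as ISO-8601 text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients   (patient_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE procedures (procedure_id TEXT PRIMARY KEY, cpt_code TEXT, label TEXT);
CREATE TABLE claims     (claim_id TEXT PRIMARY KEY,
                         patient_id TEXT REFERENCES patients(patient_id),
                         procedure_id TEXT REFERENCES procedures(procedure_id),
                         service_date TEXT);
INSERT INTO patients   VALUES ('PT-001', 'Mary Wong');
INSERT INTO procedures VALUES ('PR-1', 'CPT-99213', 'Office visit 15 min'),
                              ('PR-2', 'CPT-93000', 'ECG complete');
INSERT INTO claims     VALUES ('CL-0', 'PT-001', 'PR-1', '2024-11-20'),
                              ('CL-1', 'PT-001', 'PR-1', '2025-03-04'),
                              ('CL-2', 'PT-001', 'PR-2', '2025-06-18');
""")

# Lookup: one object by identifier (parameterised rather than string-built).
patient = conn.execute(
    "SELECT patient_id, name FROM patients WHERE patient_id = ?", ("PT-001",)
).fetchone()

# Pattern match: patient -> claim -> procedure, filtered on service date.
rows = conn.execute("""
    SELECT pt.name, c.service_date, p.cpt_code, p.label
    FROM   patients   pt
    JOIN   claims     c ON c.patient_id   = pt.patient_id
    JOIN   procedures p ON p.procedure_id = c.procedure_id
    WHERE  pt.patient_id = ? AND c.service_date >= ?
    ORDER  BY c.service_date
""", ("PT-001", "2025-01-01")).fetchall()
```

The 2024 claim is excluded by the date filter, so `rows` holds the two 2025 claims with their procedure codes and labels.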

The aggregation pattern (industry-relative ROE, finance):

WITH sector_mean AS (
  SELECT sector_id, AVG(roe) AS mean_roe
  FROM   securities
  GROUP  BY sector_id
)
SELECT s.ticker, s.roe, s.roe - sm.mean_roe AS roe_vs_sector
FROM   securities s
JOIN   sector_mean sm USING (sector_id)
ORDER  BY roe_vs_sector DESC
LIMIT  10;
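The sector-relative calculation can be checked on a handful of rows using SQLite from Python. The tickers and ROE figures below are made up (ROE is given in percent so the arithmetic stays exact), and only the columns the query needs are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE securities (ticker TEXT, sector_id TEXT, roe REAL)")
conn.executemany(
    "INSERT INTO securities VALUES (?, ?, ?)",
    [("AAA", "tech", 30.0), ("BBB", "tech", 10.0),
     ("CCC", "util", 8.0), ("DDD", "util", 12.0)],  # hypothetical data
)

# The CTE computes one mean per sector; the outer query subtracts it per security.
ranked = conn.execute("""
    WITH sector_mean AS (
      SELECT sector_id, AVG(roe) AS mean_roe
      FROM   securities
      GROUP  BY sector_id
    )
    SELECT s.ticker, s.roe - sm.mean_roe AS roe_vs_sector
    FROM   securities s
    JOIN   sector_mean sm USING (sector_id)
    ORDER  BY roe_vs_sector DESC
    LIMIT  10
""").fetchall()
```

With sector means of 20 and 10, the ranking puts AAA first (10 points above its sector) and BBB last (10 points below), even though DDD has a lower raw ROE than BBB's sector mean.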

The multi-hop pattern requires a recursive CTE in SQL. Postgres, SQL Server, Oracle, and BigQuery all support one (the exact syntax varies; SQL Server, for example, omits the RECURSIVE keyword), but it is historically the SQL feature that beginners trip on:

-- Find every physician within 3 referral hops of DrA
WITH RECURSIVE referral_graph AS (
  SELECT 'DrA' AS provider_id, 0 AS hops
  UNION ALL
  SELECT r.referee_id, rg.hops + 1
  FROM   referrals r
  JOIN   referral_graph rg ON rg.provider_id = r.referrer_id
  WHERE  rg.hops < 3
)
SELECT provider_id, MIN(hops) AS shortest_path   -- GROUP BY already yields one row per provider
FROM   referral_graph
GROUP  BY provider_id
ORDER  BY shortest_path;

The recursive CTE works but quickly becomes awkward past ~4 hops or when the graph is dense. That is exactly the inflection point where Cypher and SPARQL win.
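The behaviour of the hop bound is easiest to see on a small example. The sketch below runs the same recursive CTE in SQLite from Python on an invented referral table that includes a cycle (DrC refers back to DrA) and one provider, DrF, that sits four hops away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE referrals (referrer_id TEXT, referee_id TEXT)")
conn.executemany("INSERT INTO referrals VALUES (?, ?)", [
    ("DrA", "DrB"), ("DrA", "DrC"), ("DrB", "DrC"),
    ("DrC", "DrD"), ("DrD", "DrE"), ("DrE", "DrF"),
    ("DrC", "DrA"),  # cycle: the hop bound is what stops the recursion
])

reachable = conn.execute("""
    WITH RECURSIVE referral_graph AS (
      SELECT 'DrA' AS provider_id, 0 AS hops
      UNION ALL
      SELECT r.referee_id, rg.hops + 1
      FROM   referrals r
      JOIN   referral_graph rg ON rg.provider_id = r.referrer_id
      WHERE  rg.hops < 3
    )
    SELECT provider_id, MIN(hops) AS shortest_path
    FROM   referral_graph
    GROUP  BY provider_id
    ORDER  BY shortest_path, provider_id
""").fetchall()
```

Because the recursive step uses UNION ALL, the same provider can be produced several times at different depths; the outer MIN and GROUP BY collapse those duplicates. Without the `rg.hops < 3` bound the cycle would make this query run indefinitely.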

SPARQL — Querying the Triple Store

SPARQL is the SQL of the RDF world. Queries describe a graph pattern — a set of triples with variables — and the engine finds all bindings of variables that match.

The lookup:

PREFIX fibo:  <https://spec.edmcouncil.org/fibo/ontology/>
PREFIX :      <https://example.org/securities/>

SELECT ?ticker ?isin ?lei
WHERE {
  :AAPL_common  fibo:hasTicker  ?ticker ;
                fibo:hasISIN    ?isin ;
                fibo:hasIssuer  ?issuer .
  ?issuer       fibo:hasLEI     ?lei .
}

Read this as: find every value of ?ticker, ?isin, ?lei such that the four triples in the WHERE clause hold simultaneously. The ; separator means “same subject”; the . ends a triple.
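What "finding all bindings" means can be made concrete with a deliberately naive matcher. This is only a sketch of the idea, not how a real SPARQL engine is implemented; the triples use placeholder ISIN and LEI values, and the function names are hypothetical.

```python
# Toy triple store with placeholder identifier values.
triples = [
    (":AAPL_common", "fibo:hasTicker", "AAPL"),
    (":AAPL_common", "fibo:hasISIN", "ISIN-EXAMPLE-0001"),
    (":AAPL_common", "fibo:hasIssuer", ":Apple_Inc"),
    (":Apple_Inc", "fibo:hasLEI", "LEI-EXAMPLE-0001"),
    (":MSFT_common", "fibo:hasTicker", "MSFT"),
]

def match(pattern, triple, binding):
    """Extend `binding` so `pattern` equals `triple`; return None on a conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # variable: bind it, or check the earlier binding
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                   # constant: must match exactly
            return None
    return new

def solve(patterns, binding=None):
    """Yield every binding under which all patterns hold simultaneously."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solve(patterns[1:], extended)

query = [
    (":AAPL_common", "fibo:hasTicker", "?ticker"),
    (":AAPL_common", "fibo:hasISIN", "?isin"),
    (":AAPL_common", "fibo:hasIssuer", "?issuer"),
    ("?issuer", "fibo:hasLEI", "?lei"),
]
results = list(solve(query))
```

The shared variable `?issuer` is what joins the third and fourth patterns: a candidate triple is accepted only if it agrees with the value bound earlier.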

The pattern-match (macroeconomic — find every IMF dataflow about a country):

PREFIX sdmx: <http://purl.org/linked-data/sdmx/2009/concept#>

SELECT ?dataflow ?indicator ?value
WHERE {
  ?obs  sdmx:refArea     <http://www.imf.org/codes/country/USA> ;
        sdmx:refIndicator ?indicator ;
        sdmx:obsValue     ?value ;
        sdmx:dataflow     ?dataflow .
  FILTER (?value > 0)
}
LIMIT 50

The variable-length traversal (a SPARQL 1.1 property path, which SQL can only emulate with a recursive CTE):

PREFIX :  <https://example.org/grid/>

# Find every substation reachable from Bayswater-1 in any number of hops
SELECT DISTINCT ?sub
WHERE {
  :Bayswater_1 (:FEEDS | :CONNECTED_VIA | :CONNECTED_TO)+ ?sub .
  ?sub a :Substation .
}

The + operator means “one or more hops along any of the listed predicates.” This is the analogue of Cypher’s * and is what makes SPARQL natural for ontology traversals.
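Operationally, a `+` path over an alternation of predicates amounts to a reachability search restricted to those predicates. The following sketch assumes a small invented grid (the node names, the `OWNED_BY` predicate, and the `types` map are hypothetical) and ignores everything a real engine does for optimisation.

```python
from collections import deque

edges = [
    ("Bayswater_1", "FEEDS", "Sub_A"),
    ("Sub_A", "CONNECTED_VIA", "Line_1"),
    ("Line_1", "CONNECTED_TO", "Sub_B"),
    ("Sub_B", "OWNED_BY", "Utility_X"),   # not in the path expression, never followed
]
types = {"Sub_A": "Substation", "Sub_B": "Substation", "Line_1": "Line"}

def one_or_more(start, predicates):
    """Nodes reachable from `start` in one or more hops along the given predicates."""
    reached, queue = set(), deque([start])
    while queue:
        cur = queue.popleft()
        for subj, pred, obj in edges:
            if subj == cur and pred in predicates and obj not in reached:
                reached.add(obj)
                queue.append(obj)
    return reached

reached = one_or_more("Bayswater_1", {"FEEDS", "CONNECTED_VIA", "CONNECTED_TO"})
# Equivalent of the `?sub a :Substation` triple: keep only nodes of that type.
substations = sorted(n for n in reached if types.get(n) == "Substation")
```

The start node is not part of the result unless a cycle leads back to it, which matches the "one or more" reading of `+` (as opposed to `*`, which also admits zero hops).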

The aggregation:

PREFIX fibo:  <https://spec.edmcouncil.org/fibo/ontology/>

SELECT ?sector (AVG(?roe) AS ?mean_roe)
WHERE {
  ?security a fibo:Equity ;
            fibo:roe ?roe ;
            fibo:gicsSector ?sector .
}
GROUP BY ?sector
ORDER BY DESC(?mean_roe)

SPARQL queries can also federate: the SERVICE keyword fetches part of the result from one endpoint and part from another. This is what makes the W3C Linked Data ecosystem work: an analyst querying one SPARQL endpoint (say, IMF data) can join its results against a second endpoint (say, BIS data) with a single SERVICE clause, provided both publishers expose SPARQL endpoints.

Cypher — Querying the Property Graph

Cypher’s design is intentionally pictorial: you draw the pattern. The same patterns:

// Lookup
MATCH (s:Security {ticker: 'AAPL'})
RETURN s

// Pattern match — patient → claim → procedure
MATCH (p:Patient {id: 'PT-001'})-[:HAS_CLAIM]->(c:Claim)-[:OF_PROCEDURE]->(proc:Procedure)
WHERE c.service_date >= date('2025-01-01')
RETURN p.name, c.service_date, proc.cpt_code, proc.label

// Aggregation
MATCH (s:Security)
WHERE s.sector_id IS NOT NULL
WITH s.sector_id AS sector, avg(s.roe) AS mean_roe
RETURN sector, mean_roe
ORDER BY mean_roe DESC

// Multi-hop — variable-length path
MATCH (g:Generator {name: 'Bayswater-1'})-[:FEEDS|CONNECTED_VIA|CONNECTED_TO*1..]->(s:Substation)
RETURN DISTINCT s.name

// Subgraph extraction
MATCH (p:Patient {id: 'PT-001'})-[r]-(connected)
RETURN p, r, connected

Three Cypher features worth knowing:

Variable-length paths — [:REL_TYPE*1..3] matches 1 to 3 hops; *1.. is unlimited.
OPTIONAL MATCH — like SQL’s LEFT JOIN; matches the pattern if present, returns NULL if not.
WITH — passes intermediate results to the next stage of the query (Cypher’s pipeline operator).
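Two of these features have simple procedural readings, sketched below on invented data. One caveat is stated up front: the hop function counts walks level by level, whereas Cypher forbids reusing a relationship within a single path, so the two can disagree on cyclic graphs; on the acyclic example here they coincide.

```python
def hops_between(adj, start, min_hops, max_hops):
    """Nodes reachable in min_hops..max_hops steps, in the spirit of [:REL*min..max]."""
    frontier, found = {start}, set()
    for depth in range(1, max_hops + 1):
        # Advance every node in the frontier by exactly one relationship.
        frontier = {nxt for cur in frontier for nxt in adj.get(cur, [])}
        if depth >= min_hops:
            found |= frontier
    return found

def optional_match(patients, claims_by_patient):
    """Like OPTIONAL MATCH: keep every patient, pairing with None when no claim exists."""
    rows = []
    for patient in patients:
        matched = claims_by_patient.get(patient, [])
        if matched:
            rows.extend((patient, claim) for claim in matched)
        else:
            rows.append((patient, None))
    return rows

# Hypothetical referral chain: DrA -> DrB -> DrC -> DrD -> DrE
referrals = {"DrA": ["DrB"], "DrB": ["DrC"], "DrC": ["DrD"], "DrD": ["DrE"]}
one_to_three = hops_between(referrals, "DrA", 1, 3)
exactly_two = hops_between(referrals, "DrA", 2, 2)
rows = optional_match(["PT-001", "PT-002"], {"PT-001": ["CL-1", "CL-2"]})
```

A plain MATCH would drop PT-002 entirely because it has no claims; the optional form keeps it with a null in the claim column, exactly as a LEFT JOIN would.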

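Cypher is the main ancestor of GQL, which was published as an ISO standard in 2024 (ISO/IEC 39075). Cypher or its open variant openCypher is supported by Neo4j, Apache AGE, Memgraph, and Amazon Neptune, so much of the syntax carries across engines. Dialects still differ in places, and TigerGraph’s native language is GSQL rather than Cypher, so test a query on each target engine before assuming it ports unchanged.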

GraphQL — the API Layer

GraphQL is a query language for APIs, not a database query language. It sits in front of the database (often Postgres + AGE, or Neo4j, or a federated set of backends) and gives downstream applications a typed, hierarchical interface.

A typical GraphQL schema for the patient example:

scalar Date   # custom scalar; Date is not one of GraphQL's built-in types

type Patient {
  id: ID!
  name: String!
  dob: Date!
  claims: [Claim!]!     # field returns a list, no nulls
}

type Claim {
  id: ID!
  serviceDate: Date!
  billedAmount: Float!
  procedure: Procedure!
}

type Procedure {
  cptCode: ID!
  label: String!
  category: String
}

type Query {
  patient(id: ID!): Patient
  patients(name: String): [Patient!]!
}

A client query is hierarchical and self-describing:

query {
  patient(id: "PT-001") {
    name
    dob
    claims {
      serviceDate
      billedAmount
      procedure {
        cptCode
        label
      }
    }
  }
}

The response is JSON containing exactly the requested fields and nothing more, wrapped in a top-level data key:

{
  "data": {
    "patient": {
      "name": "Mary Wong",
      "dob": "1978-11-02",
      "claims": [
        {"serviceDate": "2025-03-04", "billedAmount": 340.0,
         "procedure": {"cptCode": "CPT-99213", "label": "Office visit 15 min"}},
        {"serviceDate": "2025-06-18", "billedAmount": 425.0,
         "procedure": {"cptCode": "CPT-93000", "label": "ECG complete"}}
      ]
    }
  }
}
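The "only the requested fields" behaviour can be imitated in a few lines. This is not a GraphQL implementation (there is no parser, schema, or type checking); it assumes the client's selection has already been turned into a nested dict, and the `select` helper and stored record are hypothetical.

```python
def select(value, selection):
    """Keep only the requested fields, recursing into nested objects and lists."""
    if isinstance(value, list):
        return [select(item, selection) for item in value]
    return {
        field: select(value[field], sub) if sub else value[field]
        for field, sub in selection.items()
    }

# What the backend holds: more fields than the client asked for.
record = {
    "id": "PT-001", "name": "Mary Wong", "dob": "1978-11-02",
    "claims": [{
        "id": "CL-1", "serviceDate": "2025-03-04", "billedAmount": 340.0,
        "procedure": {"cptCode": "CPT-99213", "label": "Office visit 15 min",
                      "category": "Evaluation"},
    }],
}

# The client's selection set: None marks a leaf field, a dict marks a nested selection.
selection = {
    "name": None, "dob": None,
    "claims": {"serviceDate": None, "billedAmount": None,
               "procedure": {"cptCode": None, "label": None}},
}
shaped = select(record, selection)
```

The identifiers and the procedure category exist in the backend record but never reach the client, because the selection did not name them.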

Why GraphQL has become a common choice for the API layer:

One round-trip — the client gets exactly the data needed for the screen.
Strongly typed — the schema is the contract; clients get autocomplete and type checking.
Federation — multiple backend services compose into one schema (Apollo Federation, Hot Chocolate, GraphQL Hive).
Versionless — fields are added without breaking existing clients.

GraphQL doesn’t replace SQL/SPARQL/Cypher; it wraps one of them. The resolver behind each field issues the actual database query. A pattern: GraphQL schema → resolvers → Cypher or SPARQL → graph database. Clients never see the underlying query language.

The Same Query in Four Idioms — Worked Comparisons

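The same business question (“for patient PT-001, return all claims since 2025-01-01 along with their procedures”) can be asked in all four idioms. The SQL join and the Cypher pattern match for it appear in their sections above; the GraphQL client query above returns the same shape, though it would need a date argument on claims to apply the filter; and in SPARQL it is the same graph pattern written as triples with a FILTER on the service date.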

Four different surfaces, same underlying intent. Knowing the mapping between them is the single most useful skill in a practitioner’s portfolio — every team uses at least one, most teams use two or three.
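One way to convince yourself that the surfaces share an intent is to answer the question twice over the same data, once relationally and once by walking links, and compare. The sketch below does that with SQLite and a plain adjacency map; the rows and the flattened claim layout are invented for illustration.

```python
import sqlite3

claims = [  # (claim_id, patient_id, service_date, cpt_code), hypothetical rows
    ("CL-0", "PT-001", "2024-11-20", "CPT-99213"),
    ("CL-1", "PT-001", "2025-03-04", "CPT-99213"),
    ("CL-2", "PT-001", "2025-06-18", "CPT-93000"),
    ("CL-3", "PT-002", "2025-02-01", "CPT-93000"),
]

# Relational surface: filter rows by patient and date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claim_id TEXT, patient_id TEXT, service_date TEXT, cpt_code TEXT)"
)
conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", claims)
sql_rows = conn.execute(
    "SELECT service_date, cpt_code FROM claims "
    "WHERE patient_id = ? AND service_date >= ? ORDER BY service_date",
    ("PT-001", "2025-01-01"),
).fetchall()

# Graph surface: adjacency keyed by (node, link type), properties kept per node.
graph, props = {}, {}
for claim_id, patient_id, service_date, cpt_code in claims:
    graph.setdefault((patient_id, "HAS_CLAIM"), []).append(claim_id)
    graph[(claim_id, "OF_PROCEDURE")] = [cpt_code]
    props[claim_id] = {"service_date": service_date}

graph_rows = sorted(
    (props[c]["service_date"], proc)
    for c in graph[("PT-001", "HAS_CLAIM")]
    if props[c]["service_date"] >= "2025-01-01"
    for proc in graph[(c, "OF_PROCEDURE")]
)
```

The join condition in SQL and the link traversal in the graph are two encodings of the same typed relationship, which is why the two result sets agree.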

Performance and Anti-Patterns

Cross-language performance rules:

Filter early. In SQL, push the WHERE into the most selective subquery; in Cypher and SPARQL, put the condition inside the MATCH or graph pattern rather than applying it afterwards.
Index the join keys. In Postgres, create indexes on foreign-key columns (they are not indexed automatically); in Neo4j, index the lookup properties; in a SPARQL store, check that its index layout covers the predicates you query most.
Variable-length paths cost. Cypher’s [:REL*1..5] and SPARQL’s + traverse exponentially in the worst case. Bound the depth.
Avoid Cartesian explosions. A MATCH (a), (b) in Cypher without a relationship between a and b produces the full Cartesian product. Always provide a connecting relationship.
Use EXPLAIN / PROFILE. Most engines have a planner-introspection command: EXPLAIN in SQL, EXPLAIN and PROFILE in Cypher, and engine-specific equivalents in SPARQL stores. Use it before declaring a query slow.
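The indexing and plan-inspection rules can be tried together in SQLite, whose introspection command is EXPLAIN QUERY PLAN. The table, the index name, and the generated rows below are assumptions for illustration; the exact wording of the plan text differs between SQLite versions, so the sketch only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claim_id TEXT PRIMARY KEY, patient_id TEXT, billed_amount REAL)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(f"CL-{i}", f"PT-{i % 50:03d}", 100.0 + i) for i in range(1000)],
)

def plan(sql, params=()):
    """Return the planner's description; the last column of each row is the detail text."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

query = "SELECT claim_id, billed_amount FROM claims WHERE patient_id = ?"

before = plan(query, ("PT-001",))   # no index on patient_id yet: a full scan
conn.execute("CREATE INDEX idx_claims_patient ON claims (patient_id)")
after = plan(query, ("PT-001",))    # the planner can now search via the new index

uses_index_before = "idx_claims_patient" in before
uses_index_after = "idx_claims_patient" in after
```

Printing `before` and `after` side by side is the habit worth building: the query text is identical, and only the plan reveals whether the engine is scanning or seeking.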

Common anti-patterns:

Storing the entire ontology in a single JSONB blob. Defeats every index; queries become full table scans.
Putting graph traversals in SQL when the graph is large. Recursive CTEs are correct but slow. If you find yourself writing many of them, you have a property graph in disguise — adopt one.
Hand-writing Cypher for an API client. The API should be GraphQL or REST; the Cypher is the resolver’s job.
Federated SPARQL across many endpoints in real-time user paths. Federated SPARQL is great for batch and reporting; latency is unpredictable.

N+1 query problem. A naive GraphQL resolver that fetches a Patient issues one Cypher query for the patient, then one more query for each claim, then one more for each procedure. For a patient with 100 claims, that is 1 + 100 + 100 = 201 database round trips. Fix: rewrite the resolver to issue a single Cypher query that retrieves the patient, its claims, and the linked procedures in one traversal (a “fragment query”), and use a DataLoader-style batching layer to combine concurrent requests. Hot Chocolate, Strawberry, and the Apollo ecosystem all provide or integrate with such a batching primitive.
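The difference in round trips is easy to count with a stand-in store. The `CountingStore` class and both resolver functions are hypothetical; the store simply increments a counter each time it is asked for something, which is enough to compare the two resolver shapes.

```python
class CountingStore:
    """Hypothetical stand-in for a graph database that counts round trips."""

    def __init__(self, n_claims):
        # claim id -> procedure id for a single patient
        self.claims = {f"CL-{i}": f"PR-{i % 5}" for i in range(n_claims)}
        self.round_trips = 0

    def query(self, result):
        """Pretend to run one database query and hand back its result."""
        self.round_trips += 1
        return result

def naive_resolver(store):
    """One query for the patient, then one per claim and one per procedure."""
    claim_ids = store.query(list(store.claims))
    rows = []
    for claim_id in claim_ids:
        claim = store.query({"id": claim_id})
        procedure = store.query(store.claims[claim_id])
        rows.append((claim["id"], procedure))
    return rows

def batched_resolver(store):
    """A single traversal returning patient, claims, and procedures together."""
    return store.query(list(store.claims.items()))

naive_store, batched_store = CountingStore(100), CountingStore(100)
naive_rows = naive_resolver(naive_store)
batched_rows = batched_resolver(batched_store)
```

Both resolvers return the same rows; only the number of round trips differs, and that number grows with the size of the patient's history in the naive version while staying constant in the batched one.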

Chapter Wrap-up

Four query languages, five universal patterns, one underlying ontology. The right idiom depends on where in the stack you are:

SQL at the relational projection layer.
SPARQL at the formal-semantics layer.
Cypher / GQL at the property-graph layer.
GraphQL at the API layer that downstream applications consume.

Production stacks chain them: GraphQL resolver → Cypher → graph database, or SPARQL → SQL via a SPARQL-to-SQL adapter, or REST → GraphQL → SQL. Fluency in all four is rare; even fluency in two puts you ahead of the median practitioner.

Chapter 8 turns from query to change: actions, events, and workflows — the operational layer that mutates the ontology and produces the auditable trail of what was done, by whom, and why.
