Chapter 2: Data Models and Query Languages

“The limits of my language mean the limits of my world.” — Ludwig Wittgenstein

The Real Backbone of Every System

If Chapter 1 was about how systems behave, Chapter 2 is about something more dangerous:
how you think about your data.

Because once you pick a data model…
you’re kind of stuck with it.

Why Data Models Matter More Than You Think

Most engineers treat databases like a tool:

“Just pick Postgres… or Mongo… or whatever works.”

But Martin Kleppmann, in Designing Data-Intensive Applications, makes a subtle point:

👉 Data models shape your thinking, not just your storage.

Every system is layered:

Real-world concepts (users, payments, events)
Application objects (classes, structs, JSON)
Database representation (tables, documents, graphs)
Storage (bytes, disk, memory)

Each layer hides the complexity of the one below it by exposing a cleaner data model.

And here’s the catch:

Every abstraction makes some things easy… and others painful.

The Three Core Data Models

Let’s strip it down. Almost everything you’ll use falls into three camps:

1. Relational Model (SQL)

The OG. Still undefeated in many ways.

Core idea:

Data lives in tables (relations)
Rows = records
Columns = attributes
Relationships handled via joins

Strengths:

Handles complex relationships (many-to-many) extremely well
Mature ecosystem
Query optimizers do the hard work for you

Weaknesses:

Rigid schema (schema-on-write)
Mapping objects → tables can feel awkward (the famous ORM headache)
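To make "relationships handled via joins" concrete, here's a minimal sketch of a many-to-many relationship: users, skills, and a join table linking them. It uses Python's built-in `sqlite3` as a stand-in for Postgres or any other SQL database (an assumption made purely so it runs anywhere), and the table names are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Join table: one row per (user, skill) pair models many-to-many.
    CREATE TABLE user_skills (
        user_id  INTEGER REFERENCES users(id),
        skill_id INTEGER REFERENCES skills(id),
        PRIMARY KEY (user_id, skill_id)
    );
    INSERT INTO users  VALUES (1, 'Frank'), (2, 'Ada');
    INSERT INTO skills VALUES (1, 'React'), (2, 'AI');
    INSERT INTO user_skills VALUES (1, 1), (1, 2), (2, 2);
""")

# Who knows AI? We declare the join; the query planner decides how to run it.
rows = conn.execute("""
    SELECT u.name
    FROM users u
    JOIN user_skills us ON us.user_id = u.id
    JOIN skills s       ON s.id = us.skill_id
    WHERE s.name = 'AI'
    ORDER BY u.name
""").fetchall()
print(rows)  # [('Ada',), ('Frank',)]
```

Adding a new skill or a new user never touches the other table's rows. That's why many-to-many is where relational shines.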

2. Document Model (NoSQL / JSON)

Think MongoDB, Firestore, etc.

Core idea:

Store data as self-contained documents (usually JSON)

{
  "user": "Frank",
  "skills": ["React", "AI"],
  "experience": {
    "years": 8
  }
}

Strengths:

Flexible schema (schema-on-read)
Maps naturally to application code
Great for one-to-many / tree-like data
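Here's what "maps naturally to application code" looks like in practice. This minimal sketch uses Python's `json` module as a stand-in for whatever your document database's driver returns (an assumption; real drivers have their own types). One parse gives you the whole tree.

```python
import json

# The document from above, as it might come back from a document store.
raw = """
{
  "user": "Frank",
  "skills": ["React", "AI"],
  "experience": {"years": 8}
}
"""

# One parse yields the whole one-to-many tree: no joins, no mapping layer.
doc = json.loads(raw)
print(doc["skills"])               # ['React', 'AI']
print(doc["experience"]["years"])  # 8
```

All of Frank's data sits together (locality), so reading the full profile is one lookup instead of several.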

Weaknesses:

Weak support for joins
Many-to-many relationships get messy fast
You might push complexity into your app instead of the DB

👉 Translation:
You avoided SQL… but now you are the query engine. Congrats.
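What does "you are the query engine" mean? Suppose skills live in their own collection and users reference them by ID. That's the usual way to model many-to-many with documents. The join then happens in your code. This is a minimal sketch with hypothetical in-memory lists standing in for two collections.

```python
# Hypothetical documents: users reference skills by ID.
users = [
    {"_id": "u1", "name": "Frank", "skill_ids": ["s1", "s2"]},
    {"_id": "u2", "name": "Ada", "skill_ids": ["s2"]},
]
skills = [
    {"_id": "s1", "name": "React"},
    {"_id": "s2", "name": "AI"},
]

def users_with_skill(skill_name):
    """Resolve references by hand: this is the join the database didn't do."""
    # Step 1: look up the skill IDs (a separate query in a real system).
    wanted = {s["_id"] for s in skills if s["name"] == skill_name}
    # Step 2: keep users whose reference list overlaps those IDs.
    return sorted(u["name"] for u in users if wanted.intersection(u["skill_ids"]))

print(users_with_skill("AI"))  # ['Ada', 'Frank']
```

Every extra query, every index decision, and every consistency check in that function is now your responsibility.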

3. Graph Model

The underrated weapon.

Core idea:

Data = nodes (entities) + edges (relationships)

Perfect for:

Social networks
Recommendation systems
Fraud detection
Anything with deep relationships

Strengths:

Relationships are first-class citizens
Traversals are natural and efficient
Extremely flexible schema
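"Traversals are natural" is easiest to see with a toy graph. This minimal sketch stores nodes and edges as a plain adjacency list (hypothetical data, no graph database involved). It runs a breadth-first traversal to find everyone within N hops, which is the classic "friends of friends" query.

```python
from collections import deque

# Nodes are people; each edge is a "follows" relationship (hypothetical data).
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}

def within_hops(start, max_hops):
    """Breadth-first traversal: everyone reachable from `start` in <= max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen

print(within_hops("alice", 2))  # {'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2}
```

The traversal just follows edges. It never has to know how many hops deep the answer lies, which is exactly what gets awkward in SQL.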

Weaknesses:

Not ideal for simple CRUD apps
Requires a mindset shift

Graphs shine when:

“Everything is connected to everything.”

The Real Trade-Off (No BS Version)

Let’s simplify the decision-making:

Use case → Best model
Structured data, transactions → Relational
Nested data, fast iteration → Document
Highly connected data → Graph

And here’s the truth Kleppmann hints at:

There is no “best” model, only the one that fits your use case.

Schema: Strict vs Flexible

This is where developers start fights on Twitter.

Schema-on-Write (Relational)

Define structure before storing
Enforced by the database

✔ Safe
❌ Less flexible
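"Enforced by the database" in action: a minimal sketch using `sqlite3` (an assumption; any SQL database works similarly). SQLite is lenient about column types, so the enforcement here comes from `NOT NULL` and `CHECK` constraints. The table is hypothetical.

```python
import sqlite3

# The schema is declared up front; the database rejects writes that break it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        years INTEGER CHECK (years >= 0)
    )
""")
conn.execute("INSERT INTO users (name, years) VALUES ('Frank', 8)")  # fits the schema

def try_insert(name, years):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO users (name, years) VALUES (?, ?)", (name, years))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(None, 3))    # False: name is NOT NULL
print(try_insert("Ada", -1))  # False: CHECK constraint fails
print(try_insert("Ada", 5))   # True
```

Bad data never lands on disk. The price is that changing the shape means a migration.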

Schema-on-Read (Document)

Store anything
Interpret later

✔ Flexible
❌ Easy to shoot yourself in the foot

Kleppmann puts it nicely:

It’s not “schemaless”; the schema is implicit, and it lives somewhere else (usually in the code that reads the data).
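Here's where that implicit schema ends up. This minimal sketch follows the spirit of the name-splitting example in the book: older records stored a single `name`, newer ones store `first_name`. The reader has to cope with both. The documents and field names are hypothetical.

```python
# Documents written at different times have different shapes (hypothetical data).
stored = [
    {"name": "Frank Smith"},                         # written before the change
    {"first_name": "Ada", "last_name": "Lovelace"},  # written after
]

def read_first_name(doc):
    """The implicit schema lives here: the reader handles every historical shape."""
    if "first_name" in doc:
        return doc["first_name"]
    if "name" in doc:
        return doc["name"].split(" ")[0]  # legacy format, converted on read
    raise ValueError(f"unrecognized user document: {doc!r}")

print([read_first_name(d) for d in stored])  # ['Frank', 'Ada']
```

Nothing stopped anyone from writing a third shape. That's the foot-gun mentioned above.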

Query Languages: Declarative Wins

Another underrated concept.

Imperative (how to do it)

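// Imperative: spell out how to iterate, test, and collect.
function getSharks(animals) {
  const sharks = [];
  for (const animal of animals) {
    if (animal.family === "Sharks") {
      sharks.push(animal);
    }
  }
  return sharks;
}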

Declarative (what you want)

SELECT * FROM animals WHERE family = 'Sharks';
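Side by side, both styles return the same sharks. This minimal sketch uses Python for the loop and `sqlite3` for the SQL, both chosen only so the comparison runs anywhere. The rows are hypothetical.

```python
import sqlite3

animals = [
    ("Great white", "Sharks"),
    ("Orca", "Dolphins"),
    ("Hammerhead", "Sharks"),
]

# Imperative: we spell out the iteration, the test, and the collection.
sharks_imperative = []
for name, family in animals:
    if family == "Sharks":
        sharks_imperative.append(name)

# Declarative: we state the result we want and let the database decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)", animals)
sharks_declarative = [
    row[0] for row in conn.execute("SELECT name FROM animals WHERE family = 'Sharks'")
]  # without ORDER BY, SQL makes no promise about row order

print(sharks_imperative)  # ['Great white', 'Hammerhead']
```

Note the last comment. The declarative version doesn't even commit to an order unless you ask for one, which is part of what gives the database room to optimize.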

Declarative queries:

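Let the database choose how to execute them (which index, which join order)
Keep working unchanged when the database improves its internals
Lend themselves to parallel execution, since they don't prescribe an order of steps
Reduce application complexity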
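The "let the database choose" point in a minimal sketch, again using `sqlite3` as a stand-in. The query text stays identical. Adding an index changes the plan SQLite picks. The exact wording of the plan varies between SQLite versions, and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("Great white", "Sharks"), ("Orca", "Dolphins"), ("Hammerhead", "Sharks")],
)

QUERY = "SELECT * FROM animals WHERE family = 'Sharks'"

def plan(sql):
    """Return SQLite's description of how it intends to execute `sql`."""
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(QUERY)  # typically a full table scan
conn.execute("CREATE INDEX idx_animals_family ON animals (family)")
after = plan(QUERY)   # typically a search using the new index

print(before)
print(after)
```

An imperative loop would have to be rewritten by hand to take advantage of the index. The SQL didn't change at all.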

That’s why SQL is still everywhere.

Beyond SQL: Modern Query Approaches

Chapter 2 also explores different query styles:

SQL → relational
MongoDB aggregation → document pipelines
Cypher → property-graph queries (Neo4j)
SPARQL → RDF graph queries
Datalog → rule-based logic (very powerful, slightly mind-bending)
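Why do dedicated graph languages exist at all? "Follow any number of edges" is awkward in plain SQL. This minimal sketch uses a recursive CTE in `sqlite3` (an assumption; Postgres supports the same `WITH RECURSIVE` syntax) over a tiny, hypothetical location hierarchy. Cypher's variable-length path patterns express the same traversal in a single short pattern.

```python
import sqlite3

# Hypothetical vertices and "located in" edges: a tiny location hierarchy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE located_in (child TEXT, parent TEXT);
    INSERT INTO located_in VALUES
        ('London', 'England'),
        ('England', 'United Kingdom'),
        ('United Kingdom', 'Europe'),
        ('Idaho', 'United States'),
        ('United States', 'North America');
""")

# Start at Europe and repeatedly walk edges downward until nothing new appears.
rows = conn.execute("""
    WITH RECURSIVE in_europe(name) AS (
        SELECT 'Europe'
        UNION
        SELECT l.child FROM located_in l JOIN in_europe e ON l.parent = e.name
    )
    SELECT name FROM in_europe WHERE name <> 'Europe' ORDER BY name
""").fetchall()

print([r[0] for r in rows])  # ['England', 'London', 'United Kingdom']
```

It works, but the intent is buried in boilerplate. That gap is what Cypher and SPARQL close.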

Datalog in particular is interesting:

You define rules, not just queries, and rules can be reused and combined into bigger rules.
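To get a feel for how rules compose, here's a toy sketch in Python. It is not a Datalog engine, just naive fixpoint evaluation. The recursive `within_recursive` rule mirrors the style of the book's example, and the facts are hypothetical. Apply the rules until no new facts appear.

```python
# Base facts, Datalog-style: within(child, parent). Hypothetical data.
within = {
    ("London", "England"),
    ("England", "United Kingdom"),
    ("United Kingdom", "Europe"),
}

def within_recursive(facts):
    """Naive fixpoint evaluation of two rules:
         within_recursive(X, Y) :- within(X, Y).
         within_recursive(X, Z) :- within(X, Y), within_recursive(Y, Z).
    """
    derived = set(facts)  # rule 1: every base fact holds
    while True:
        # Rule 2: chain a base edge with an already-derived fact.
        new = {(x, z) for (x, y) in facts for (y2, z) in derived if y == y2} - derived
        if not new:
            return derived  # fixpoint reached: nothing new can be derived
        derived |= new

# The derived relation can be reused like any other table (the composability point).
closure = within_recursive(within)
print(sorted(z for (x, z) in closure if x == "London"))
# ['England', 'Europe', 'United Kingdom']
```

A real Datalog system evaluates this far more cleverly, but the idea is the same. You state the rule once, and any query can build on it.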

The Object-Relational Mismatch (a.k.a. Why ORMs Exist)

Your app uses objects.
Your DB uses tables.

That mismatch (Kleppmann calls it an impedance mismatch) creates:

Boilerplate
Complexity
Performance trade-offs

This is why:

ORMs exist
And also why people complain about ORMs 😅
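Here is the boilerplate an ORM generates for you, written by hand in a minimal sketch. The `User` class, the tables, and the `save`/`load` helpers are all hypothetical, and `sqlite3` stands in for a real database. One object with a nested list becomes rows in two tables, and it has to be stitched back together on the way out.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class User:
    """What the application wants: one object with a nested list."""
    name: str
    skills: list = field(default_factory=list)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_skills (user_id INTEGER REFERENCES users(id), skill TEXT);
""")

def save(user):
    """Object -> rows: one object becomes 1 + len(skills) rows in two tables."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (user.name,))
    conn.executemany(
        "INSERT INTO user_skills VALUES (?, ?)",
        [(cur.lastrowid, s) for s in user.skills],
    )
    return cur.lastrowid

def load(user_id):
    """Rows -> object: query both tables and reassemble the object."""
    (name,) = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    skills = [s for (s,) in conn.execute(
        "SELECT skill FROM user_skills WHERE user_id = ? ORDER BY rowid", (user_id,))]
    return User(name, skills)

uid = save(User("Frank", ["React", "AI"]))
print(load(uid))  # User(name='Frank', skills=['React', 'AI'])
```

Multiply this by every class in your app and ORMs start to look reasonable. Then look at the two queries inside `load` and you'll see where the performance complaints come from.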

The Convergence Trend

Here’s the plot twist:

Databases are starting to look… the same.

Relational DBs now support JSON
Document DBs are adding joins
Hybrid systems are emerging
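"Relational DBs now support JSON" takes only a few lines to show. This minimal sketch uses SQLite's JSON functions because SQLite ships with Python. PostgreSQL's `jsonb` is the more common production example. The sketch assumes your SQLite build includes the JSON functions (they are built in by default in modern versions), and the table is hypothetical.

```python
import json
import sqlite3

# A relational table with a JSON document column (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO profiles (doc) VALUES (?)",
    (json.dumps({"user": "Frank", "skills": ["React", "AI"], "experience": {"years": 8}}),),
)

# Reach inside the document with plain SQL.
row = conn.execute("""
    SELECT json_extract(doc, '$.user'),
           json_extract(doc, '$.experience.years'),
           json_array_length(doc, '$.skills')
    FROM profiles
""").fetchone()
print(row)  # ('Frank', 8, 2)
```

A document, stored in a table, queried with SQL. That is the convergence in one snippet.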

Kleppmann calls this out clearly:

👉 The future is likely a mix of models, not one winner

Practical Takeaways (No Fluff)

If you remember nothing else, remember this:

1. Your data model is a long-term decision

Changing it later = pain.

2. Model your relationships first

Few relationships → Document
Many relationships → Relational or Graph

3. Don’t blindly follow trends

“Use Mongo” or “Use Postgres” is not architecture.

4. Complexity moves somewhere

Not in the DB? → It’s in your app
Not in your app? → It’s in the DB

Pick your poison wisely.

Final Thoughts

Chapter 2 is basically Kleppmann saying:

“Your database choice is not a tooling decision.
It’s a thinking framework.”

And once you see it that way, you stop asking:

“What database should I use?”

And start asking:

“What shape does my data actually have?”
