Chapter 4: Encoding and Evolution

Frank Mendez

Your system doesn’t break when you deploy it. It breaks when old data meets new code. That’s the uncomfortable reality of software: data outlives everything. You can rewrite your frontend. You can refactor your backend. You can even replace your database. But your data? It sticks around, quietly waiting to expose every bad decision you made six months ago.

Why Your Data Format Will Betray You (If You Let It)

Most engineers obsess over:

  • frameworks

  • databases

  • performance

But quietly running underneath everything is something way more fragile:

How your data is encoded.

And worse…

How it changes over time.

Because your system won’t break on day one.
It’ll break six months later—when version 3 of your API meets version 1 of your data.


The Real Problem: Data Outlives Code

Here’s the uncomfortable truth:

  • Code is redeployed all the time

  • Data sticks around forever

That means:

👉 Old and new versions must coexist

This is where most systems start to rot.


Encoding: Turning Objects Into Bytes

Before data is stored or sent over the network, it must be encoded.

Think:

const user = {
  name: "Frank",
  age: 30
};

Becomes:

  • JSON

  • XML

  • Protocol Buffers

  • Avro

  • etc.
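In JavaScript, the round trip looks something like this (a minimal sketch using the built-in JSON API; the same idea applies to any format):

```javascript
const user = {
  name: "Frank",
  age: 30
};

// Encode: in-memory object → a serialized representation
// (here, a JSON string you could write to disk or a socket)
const encoded = JSON.stringify(user);
// '{"name":"Frank","age":30}'

// Decode: serialized bytes → back to an in-memory object
const decoded = JSON.parse(encoded);
// { name: "Frank", age: 30 }
```

Every format in the list above is just a different answer to what those bytes should look like.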


Why Encoding Matters

Because different formats affect:

  • Performance

  • Size

  • Compatibility

  • Developer sanity


Human-Readable vs Binary Formats

Let’s break it down.


JSON / XML (Human-Readable)

Pros:

  • Easy to debug

  • Widely supported

  • Flexible

Cons:

  • Bigger size

  • Slower parsing

  • No strict schema

👉 Great for APIs, not always for internal systems.


Binary Formats (Protobuf, Avro, Thrift)

Pros:

  • Compact

  • Faster

  • Strong schema support

Cons:

  • Harder to debug

  • Requires tooling

👉 Great for:

  • Microservices

  • High-performance systems

  • Data pipelines


The Real Boss Fight: Schema Evolution

This is the heart of Chapter 4.

How do you change your data structure… without breaking everything?


Two Key Compatibility Types

1. Backward Compatibility

New code can read old data

✔ Required when:

  • You deploy new services

  • Old data still exists


2. Forward Compatibility

Old code can read new data

✔ Required when:

  • Rolling deployments

  • Multiple versions running at once


👉 In real systems, you need both.
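Both directions can be sketched in a few lines. Assume a hypothetical `email` field was added in version 2 of the record:

```javascript
// Old data (written by v1) and new data (written by v2)
const v1Data = { name: "Frank" };
const v2Data = { name: "Frank", email: "frank@example.com" };

// Backward compatibility: NEW code reads OLD data.
// The new reader must tolerate the missing field with a default.
function readUserV2(data) {
  return { name: data.name, email: data.email ?? null };
}

// Forward compatibility: OLD code reads NEW data.
// The old reader simply ignores fields it doesn't know about.
function readUserV1(data) {
  return { name: data.name }; // email is silently dropped, not an error
}

readUserV2(v1Data); // { name: "Frank", email: null }
readUserV1(v2Data); // { name: "Frank" }
```

Schema-based formats bake these rules into the format itself; with raw JSON, the discipline is entirely on you.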


JSON: Flexible… but Dangerous

JSON doesn’t enforce schemas.

Sounds great… until:

  • Fields are missing

  • Types change

  • Naming becomes inconsistent

Example:

// Version 1
{ "name": "Frank" }

// Version 2
{ "fullName": "Frank Mendez" }

Boom. Silent bugs.
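And "silent" is the key word. Nothing throws; the code just quietly produces garbage, as this sketch shows:

```javascript
// A consumer written against version 1 of the data
function greet(user) {
  return `Hello, ${user.name}!`;
}

greet({ name: "Frank" });            // "Hello, Frank!"
greet({ fullName: "Frank Mendez" }); // "Hello, undefined!" — no error, just wrong
```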


The Hidden Truth

JSON is not schemaless.
The schema just lives in your code.

And your code is… inconsistent.


Binary Formats: Built for Evolution

Formats like Avro and Protobuf solve this properly.


How They Do It

They use schemas with versioning rules:

  • Fields have IDs

  • Unknown fields are ignored

  • Defaults can be defined


Example (Protobuf-style thinking)

message User {
  string name = 1;
  int32 age = 2;
}

If you later add:

string email = 3;

Old systems:

  • Ignore email

New systems:

  • Handle it properly

👉 No drama. No production fire.
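The intuition behind why field IDs make this safe: a decoder walks tag/value pairs and skips any ID it doesn’t recognize. A toy sketch (this is the idea, not Protobuf’s actual binary wire format):

```javascript
// Toy tagged encoding: each field is an [id, value] pair.
// This "old" decoder only knows fields 1 and 2.
const KNOWN_FIELDS = { 1: "name", 2: "age" };

function decode(pairs) {
  const out = {};
  for (const [id, value] of pairs) {
    const field = KNOWN_FIELDS[id];
    if (field === undefined) continue; // unknown field → skip it, don't crash
    out[field] = value;
  }
  return out;
}

// A "new" message that includes field 3 (email), which old code has never seen
decode([[1, "Frank"], [2, 30], [3, "frank@example.com"]]);
// → { name: "Frank", age: 30 }
```

Because fields are addressed by number rather than by position, adding field 3 never shifts the meaning of fields 1 and 2.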


Avro’s Big Advantage

Avro stores schema separately from data.

That means:

  • Data is smaller

  • Schema can evolve independently

👉 Perfect for:

  • Kafka

  • Data pipelines

  • Event-driven systems
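An Avro schema is itself just JSON, kept in a schema registry or file header rather than repeated inside every record. A minimal sketch (the `email` field and its default are illustrative):

```javascript
// Avro schema for a User record, stored separately from the data.
// Records on the wire carry only the field values, not the field names.
const userSchemaV2 = {
  type: "record",
  name: "User",
  fields: [
    { name: "name", type: "string" },
    // Added in v2, with a default — so data written with the old
    // schema can still be read with this one, and vice versa
    { name: "email", type: ["null", "string"], default: null }
  ]
};
```

Because the reader resolves its schema against the writer’s schema, old and new versions can interoperate as long as new fields carry defaults.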


Data Evolution in the Real World

This is where things get spicy.


Scenario: Microservices

  • Service A sends data

  • Service B consumes it

  • They deploy independently

If your encoding is fragile:

💥 You break production


Scenario: Databases

You change schema:

ALTER TABLE users ADD COLUMN email text;

Now:

  • Old code doesn’t know about it

  • New code depends on it

Welcome to migration hell.


Strategies That Actually Work


1. Add Fields, Don’t Remove

Safe:

{ "name": "Frank", "email": "..." }

Dangerous:

{ "fullName": "Frank" } // renamed field

2. Use Defaults

Always assume:

  • Field might be missing


3. Never Reuse Field Meaning

If status = 1 meant “active” before…

Don’t suddenly make it mean “pending”.

That’s how bugs become legends.


4. Version Your APIs (But Don’t Abuse It)

Versioning helps, but:

If you rely on versioning too much, your system becomes a museum.


RPC vs REST: Encoding in Communication

Chapter 4 also touches on service communication.


REST (HTTP + JSON)

✔ Simple
✔ Debuggable
❌ Loose contracts


RPC (gRPC, Thrift)

✔ Strong contracts
✔ Efficient
❌ Tighter coupling


👉 Trend today:

  • External APIs → REST

  • Internal services → gRPC / binary


Message Passing (Async Systems)

Instead of direct calls:

  • Services communicate via messages (Kafka, queues)

This makes encoding even more critical:

Messages might be consumed days… or weeks later.

So compatibility isn’t optional—it’s survival.


The Big Idea

Chapter 4 boils down to this:

Data systems must evolve without breaking.

And that requires:

  • Careful encoding choices

  • Strong compatibility guarantees

  • Discipline (yes, the boring part)


Practical Takeaways (Engineer Survival Kit)

1. Data format is a long-term decision

Changing it later is painful.


2. JSON is fine… until it isn’t

Use it for APIs, not everything.


3. Prefer schema-based formats internally

Avro / Protobuf > raw JSON for systems


4. Design for evolution from day one

Assume:

  • Fields will change

  • Services will version

  • Data will live forever


5. Backward compatibility is non-negotiable

Break it once → regret it forever


Final Thoughts

Chapter 4 is where you realize:

Building systems isn’t just about writing code.
It’s about designing change.

Because in real life:

  • Requirements change

  • Teams change

  • Systems evolve

And your data has to survive all of it.