This is part of an ongoing series: Building a Personal Data Lab.

Why I Built a Data Lab

I wanted a place where I could build and break real systems.

Most of my work with data platforms and AI happens in environments where iteration is slow — large systems, high stakes, and long feedback loops. That’s necessary, but it makes it difficult to test ideas quickly.

I wanted a place where I could move faster.


The limits of theory

It’s easy to understand modern data architecture at a conceptual level:

  • lakehouse patterns
  • streaming pipelines
  • orchestration layers
  • graph databases
  • AI systems

But those ideas only really click when you build them.

After pushing an old gaming PC to its limit, I ran into a different problem — I could stand up infrastructure, or I could process data, but not both. There wasn’t enough headroom to actually learn through doing.

At the same time, the motherboard (from a machine I bought in 2007) started failing and wouldn’t even recognize the RAM.

That was a pretty clear signal.


Why not just use the cloud?

I looked at the obvious option — cloud providers.

They all offer some version of a free tier, but in practice:

  • resources are heavily constrained
  • experimentation is limited
  • mistakes can be expensive

More importantly, cloud environments abstract away many of the constraints I actually want to understand.

I didn’t need:

  • 11 nines of durability
  • the latest hardware
  • production-grade reliability

What I needed was:

a place where I could break things, iterate quickly, and not worry about the bill.


Building a lab instead

So I went to eBay.

If there’s one thing millennials who grew up in big families tend to accumulate, it’s old hardware. This felt like a natural extension of that instinct — just more intentional.

The goal wasn’t to build something cutting-edge.
It was to build something good enough to learn on.

The result:

  • Supermicro X10SRi-F
  • 16-core 2.6GHz processor
  • 256GB RAM
  • repurposed SSDs and HDDs for storage
  • NVIDIA Titan (24GB) for AI workloads

By modern standards, this is already aging hardware.

But compared to where I started, it’s a massive step up.

More importantly, it gives me:

  • enough capacity to run a full data platform
  • room to experiment without constant constraint
  • a safe environment to fail

What this enables

This lab isn’t about building a perfect system.

It’s about building real systems:

  • standing up a full lakehouse stack
  • integrating streaming and batch pipelines
  • connecting structured data with graph models
  • experimenting with AI on top of it

Not in isolation, but as a system — where failures, bottlenecks, and tradeoffs become visible.


What comes next

The plan is simple:

  1. Get the machine up and running (Ubuntu, clean slate)
  2. Stand up a lightweight Kubernetes environment (k3d)
  3. Deploy a base data platform
  4. Start working with real data
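As a rough sketch of steps 2 and 3 above, the k3d bootstrap might look something like this. The cluster name, node counts, and port mapping are assumptions on my part, not a finalized setup:

```shell
#!/usr/bin/env sh
set -eu

# Create a lightweight Kubernetes cluster with k3d:
# one server node, two agents, and the load balancer's
# port 80 mapped to localhost:8080 for platform services.
# (Cluster name "datalab" and sizing are illustrative.)
k3d cluster create datalab \
  --servers 1 \
  --agents 2 \
  --port "8080:80@loadbalancer"

# Sanity-check that the cluster is up and nodes are ready
# before deploying any data platform components onto it.
kubectl cluster-info
kubectl get nodes
```

From there, the base data platform components would be deployed onto this cluster, which is the subject of the next post.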

I’m planning to start with datasets like:

  • weather
  • financial markets
  • flight data

Big enough to matter, small enough to manage locally.


Why this matters

Most systems look clean in architecture diagrams.
Very few remain clean once they’re running.

The only way to understand the difference is to build them — to see where things break, where complexity accumulates, and what actually matters.

This lab is a space to do exactly that.


A final note

There’s also something deeply satisfying about working closer to the metal.

Spinning up cloud resources is convenient.
But installing hardware, configuring a system from scratch, and bringing it to life — that’s a different kind of feedback loop.

For me, it’s not just practical.
It’s therapeutic.


Series: Building a Personal Data Lab

  • Part 1: Why I Built a Data Lab (this post)
  • Part 2: Designing the Data Lab Architecture (coming next)

→ Next: Designing the Data Lab Architecture