This is part of an ongoing series: Building a Personal Data Lab.
Why I Built a Data Lab
I wanted a place where I could build and break real systems.
Most of my work with data platforms and AI happens in environments where iteration is slow — large systems, high stakes, and long feedback loops. That’s necessary, but it makes it difficult to test ideas quickly.
I wanted a place where I could move faster.
The limits of theory
It’s easy to understand modern data architecture at a conceptual level:
- lakehouse patterns
- streaming pipelines
- orchestration layers
- graph databases
- AI systems
But those ideas only really click when you build them.
After pushing an old gaming PC to its limits, I ran into a different problem — I could stand up infrastructure, or I could process data, but not both. There wasn’t enough headroom to actually learn through doing.
At the same time, the motherboard (from a machine I bought in 2007) started failing and wouldn’t even recognize the RAM.
That was a pretty clear signal.
Why not just use the cloud?
I looked at the obvious option — cloud providers.
They all offer some version of a free tier, but in practice:
- resources are heavily constrained
- experimentation is limited
- mistakes can be expensive
More importantly, cloud environments abstract away many of the constraints I actually want to understand.
I didn’t need:
- 11 nines of durability
- the latest hardware
- production-grade reliability
What I needed was:
a place where I could break things, iterate quickly, and not worry about the bill.
Building a lab instead
So I went to eBay.
If there’s one thing millennials who grew up in big families tend to accumulate, it’s old hardware. This felt like a natural extension of that instinct — just more intentional.
The goal wasn’t to build something cutting-edge.
It was to build something good enough to learn on.
The result:
- Supermicro X10SRi-F motherboard
- 16-core, 2.6 GHz processor
- 256 GB RAM
- repurposed SSDs and HDDs for storage
- NVIDIA Titan (24 GB) for AI workloads
By modern standards, this is already aging hardware.
But compared to where I started, it’s a massive step up.
More importantly, it gives me:
- enough capacity to run a full data platform
- room to experiment without constantly running into resource limits
- a safe environment to fail
What this enables
This lab isn’t about building a perfect system.
It’s about building real systems:
- standing up a full lakehouse stack
- integrating streaming and batch pipelines
- connecting structured data with graph models
- experimenting with AI on top of it
Not in isolation, but as a system — where failures, bottlenecks, and tradeoffs become visible.
What comes next
The plan is simple:
- Get the machine up and running (Ubuntu, clean slate)
- Stand up a lightweight Kubernetes environment (k3d)
- Deploy a base data platform
- Start working with real data
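The k3d step above can be sketched as a cluster config file. This is a minimal example under assumptions of my own — the cluster name, node counts, and port mapping are illustrative placeholders, not the final setup:

```yaml
# Minimal k3d config (Simple schema, apiVersion k3d.io/v1alpha5)
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: datalab        # hypothetical cluster name
servers: 1             # one control-plane node
agents: 2              # two worker nodes for data workloads
ports:
  - port: 8080:80      # map localhost:8080 to the cluster ingress
    nodeFilters:
      - loadbalancer
```

Saved as `k3d.yaml`, this would be applied with `k3d cluster create --config k3d.yaml`.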
I’m planning to start with datasets like:
- weather
- financial markets
- flight data
Big enough to matter, small enough to manage locally.
Why this matters
Most systems look clean in architecture diagrams.
Very few remain clean once they’re running.
The only way to understand the difference is to build them — to see where things break, where complexity accumulates, and what actually matters.
This lab is a space to do exactly that.
A final note
There’s also something deeply satisfying about working closer to the metal.
Spinning up cloud resources is convenient.
But installing hardware, configuring a system from scratch, and bringing it to life — that’s a different kind of feedback loop.
For me, it’s not just practical.
It’s therapeutic.
Series: Building a Personal Data Lab
- Part 1: Why I Built a Data Lab (this post)
- Part 2: Designing the Data Lab Architecture (coming next)