How Perfection Prevents Progress: A Lesson From My Thesis

How I fell into a classic developer's trap, and how you can avoid making the same mistake

Ever spent weeks building a system when the problem could have been solved in a few hours?

That's exactly the trap I fell into while working on my machine learning thesis at Tetra Pak.

The Over-Engineering Trap

My task seemed straightforward: clean messy data to build a prediction model.

But instead of rolling up my sleeves and getting started, I spent weeks trying to architect the perfect system:

  • Advanced visualizations
  • Complex data classes
  • The ultimate data pipeline that could process anything

The result? Almost nothing to show for it.

The One Simple Truth I Discovered

"You can't automate what you haven't first done manually."

As developers, we're notorious for spending 2 hours automating a 5-minute task. Without realizing it, I'd spent 3 weeks trying to automate something that I could do manually in 3 hours.

The Breakthrough Moment

Everything changed when I shifted my goal from:

"Create the perfect data cleaning pipeline that works for every dataset"

to:

"Clean just one dataset to 80% of what I would want it to be"

The result? Task completed in one morning. I was stunned.

This simple, focused approach was so easy to iterate on that I could quickly scale it to work on the majority of my data.

The 80/20 Rule for Data Scientists

If you're stuck in data cleaning hell, remember that you don't need:

  • A perfect data pipeline
  • Classes handling all possible data formats
  • Perfectly cooperating functions

All you need is one notebook file that takes one data file from unworkable to acceptable.

It's faster, it actually works, and most importantly, you'll make real progress instead of chasing perfection.
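
To make that concrete, here is a minimal sketch of what a single "good enough" cleaning step could look like. It is only an illustration: the column names, the sample rows, and the helper names (to_float, clean_rows) are hypothetical. It uses only Python's standard library, and it simply drops any row with a missing or unparseable value rather than trying to repair it.

```python
import csv
import io

# Hypothetical sample standing in for one messy data file.
# In a real notebook you would read your own file instead.
RAW = """timestamp,temperature,pressure
2024-01-01 00:00,21.5,1.01
2024-01-01 00:01,,1.02
2024-01-01 00:02,n/a,1.00
2024-01-01 00:03,22.1,
2024-01-01 00:04,21.9,0.99
"""


def to_float(value):
    """Return value as a float, or None if it is missing or unparseable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_rows(text):
    """Keep rows where both numeric columns parse; count the rows dropped."""
    kept, dropped = [], 0
    for row in csv.DictReader(io.StringIO(text)):
        temperature = to_float(row["temperature"])
        pressure = to_float(row["pressure"])
        if temperature is None or pressure is None:
            dropped += 1  # good enough for now: drop it, don't repair it
            continue
        kept.append(
            {
                "timestamp": row["timestamp"],
                "temperature": temperature,
                "pressure": pressure,
            }
        )
    return kept, dropped


rows, dropped = clean_rows(RAW)
print(f"kept {len(rows)} rows, dropped {dropped}")
```

No classes, no pipeline, no support for other formats. Once a step like this works on one file, it becomes obvious which parts are worth generalizing and which are not.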

