It was 2 AM. My phone buzzed. Our monitoring system was screaming. I had just run a migration script that was supposed to clean up old records. Instead, it deleted about 40% of our user table.
Here is what happened and what I learned.
The Setup
Our users table had accumulated a lot of inactive accounts. The business wanted to remove any account that hadn't logged in for more than two years. Simple enough, right?
I wrote a migration script that looked something like this:
DELETE FROM users WHERE last_login < '2022-01-01';
Seemed harmless. It was just old data we didn't need anymore.
What Went Wrong
What I didn't know was that another team had made the last_login column nullable, with NULL as the default for new signups who hadn't logged in yet.
So my query deleted:
- All inactive users (intended)
- All users who signed up but never logged in (not intended)
- About 40% of our total user base
The query ran. The logs went quiet. Then came the alerts.
The Recovery
Thank god we had point-in-time recovery enabled. We restored the database to a point 15 minutes before the migration. But those 15 minutes of new signups? Gone.
We had to manually reach out to about 200 affected users and ask them to sign up again. Not a great look.
What I Learned
1. Always filter with explicit conditions
Instead of leaving the treatment of missing values implicit, I now state the NULL handling explicitly in the filter:
DELETE FROM users WHERE last_login IS NOT NULL AND last_login < '2022-01-01';
2. Run with SELECT first
Before any delete or update, run the same WHERE clause with SELECT to see what you're actually matching:
SELECT COUNT(*) FROM users WHERE last_login IS NOT NULL AND last_login < '2022-01-01';
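Here is a minimal sketch of that check in Python, using an in-memory SQLite database as a stand-in for the real one. The rows and the preview_delete helper are made up for illustration; the point is only to compare the matched count against the table total before anything is deleted.

```python
import sqlite3

# Hypothetical filter, mirroring the explicit WHERE clause above.
WHERE_CLAUSE = "last_login IS NOT NULL AND last_login < ?"


def preview_delete(conn, cutoff):
    """Return (matched, total) without modifying any rows."""
    matched = conn.execute(
        f"SELECT COUNT(*) FROM users WHERE {WHERE_CLAUSE}", (cutoff,)
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    return matched, total


# Tiny in-memory table standing in for the real one (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_login TEXT)")
conn.executemany(
    "INSERT INTO users (last_login) VALUES (?)",
    [("2020-05-01",), ("2021-11-30",), ("2023-03-15",), (None,), (None,)],
)

matched, total = preview_delete(conn, "2022-01-01")
print(f"{matched} of {total} rows would be deleted")
```

If the matched count is a far larger share of the table than you expected, that is the moment to stop and look at the data, not after the DELETE.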
3. Use transactions with rollback
Wrap destructive operations in a transaction, check the affected row count, and commit only if it matches what you expect. Until you commit, you can still roll back.
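One way to make that concrete is to run the delete inside a transaction and refuse to commit if it touches more rows than expected. The sketch below assumes Python's sqlite3 module with its default transaction handling, which opens a transaction implicitly before a DELETE. The guarded_delete helper, the max_rows limit, and the sample rows are hypothetical.

```python
import sqlite3


def guarded_delete(conn, cutoff, max_rows):
    """Delete old logins, rolling back if more than max_rows rows are hit."""
    cur = conn.execute(
        "DELETE FROM users WHERE last_login IS NOT NULL AND last_login < ?",
        (cutoff,),
    )
    if cur.rowcount > max_rows:
        conn.rollback()  # nothing committed yet, so the delete is undone
        return False
    conn.commit()
    return True


def count_users(conn):
    """Current number of rows in the users table."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


# In-memory stand-in with made-up rows: three old logins, one recent, one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_login TEXT)")
conn.executemany(
    "INSERT INTO users (last_login) VALUES (?)",
    [("2019-04-02",), ("2020-05-01",), ("2021-11-30",), ("2023-03-15",), (None,)],
)
conn.commit()  # persist the setup so a later rollback only undoes the delete

blocked = guarded_delete(conn, "2022-01-01", max_rows=1)   # 3 rows > 1
after_blocked = count_users(conn)
allowed = guarded_delete(conn, "2022-01-01", max_rows=10)  # 3 rows <= 10
after_allowed = count_users(conn)
print(blocked, after_blocked, allowed, after_allowed)  # False 5 True 2
```

The limit would normally come from the SELECT COUNT(*) preview, plus a little slack for rows that change between the preview and the delete.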
4. Test on staging first
Staging's data did not look like production's. I should have loaded production-like data there, or at least counted the rows the script matched on staging and checked that the number made sense.
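A cheap way to spot that kind of mismatch is to compare a simple column profile between environments before trusting a staging run. The sketch below uses two in-memory SQLite databases as stand-ins for staging and production. The make_db and null_share helpers, the sample rows, and the 10% threshold are all assumptions for illustration.

```python
import sqlite3


def make_db(logins):
    """Build an in-memory users table from a list of last_login values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_login TEXT)")
    conn.executemany(
        "INSERT INTO users (last_login) VALUES (?)", [(v,) for v in logins]
    )
    conn.commit()
    return conn


def null_share(conn):
    """Fraction of users whose last_login is NULL (0.0 for an empty table)."""
    # COUNT(column) skips NULLs, so the difference is the NULL count.
    total, non_null = conn.execute(
        "SELECT COUNT(*), COUNT(last_login) FROM users"
    ).fetchone()
    return (total - non_null) / total if total else 0.0


# Made-up data: staging has no NULL logins, production has several.
staging = make_db(["2020-01-01", "2021-06-01", "2023-02-01", "2023-08-01"])
production = make_db(["2020-01-01", None, None, "2023-02-01", None])

gap = abs(null_share(production) - null_share(staging))
if gap > 0.10:  # arbitrary threshold, tune for your data
    print(f"NULL share differs by {gap:.0%}; staging is not representative")
```

A large gap does not say which environment is wrong, only that a clean run on staging tells you little about what the same script will do in production.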
Bottom Line
Simple migrations bite hard. The queries that look the most harmless are often the most dangerous. Now I triple-check every delete, update, or migration script before running it on anything beyond my local machine.
And yes, I set up a separate alert for any table deletes going forward.
