Navid's Blog

Ideas, Experiments, and Lessons Learned

I Accidentally Crashed Our Production Server by Opening Too Many Database Connections

Posted on March 25, 2026 by Navid

It was 2 AM when my phone started buzzing. Production server down. Database connections maxed out. Great.

What Happened

We had a simple feature: fetch user data from an external API and save it to our database. Sounds easy, right?

The code looked something like this:

for user in users:
    external_data = fetch_from_api(user.id)
    save_to_db(external_data)

Looks fine. But we had 50,000 users. The loop ran sequentially, each iteration waiting for the external API to respond. Some requests took 5 seconds. Some took 30.
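To put numbers on that, here is a quick back-of-the-envelope calculation. It assumes, purely for illustration, that every request takes the same time (the real mix of 5-second and 30-second responses isn't recorded here), so treat the two results as rough lower and upper bounds on how long the loop would need.

```python
USERS = 50_000  # user count from the incident


def total_hours(seconds_per_request, users=USERS):
    """Wall-clock hours for a strictly sequential loop, one request per user."""
    return users * seconds_per_request / 3600


# Hypothetical uniform latencies, used only to bound the estimate.
print(f"all requests at 5 s:  {total_hours(5):.1f} hours")
print(f"all requests at 30 s: {total_hours(30):.1f} hours")
```

Even the optimistic case is almost three days of a single loop holding on to whatever resources it opens.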

The Real Problem

Here’s what I missed: every call to the external API created a new HTTP connection that was never closed properly. On top of that, our database connection pool was capped at 100 connections, and the app was trying to open far more than that.

Within minutes, we hit the connection limit. New requests started queuing up. The server ran out of memory. Everything stopped.

What I Learned

  • Set connection pool limits: Don’t just use the defaults. Know what your database and server can handle, and configure the pool to stay inside that.
  • Use context managers: Always use ‘with’ statements or try/finally to close connections. Every single time.
  • Add timeouts: Every external API call should have a timeout. Waiting forever is not a strategy.
  • Monitor connections: Set up alerts that fire when connection pool usage reaches a threshold such as 80%, well before the pool is exhausted.
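Here is a minimal sketch of how those four lessons can fit together, using only the standard library. Everything in it is illustrative: `BoundedPool`, `should_alert`, and the in-memory `sqlite3` database are stand-ins I made up for this example, not the pool or database from the incident. In a real app you would normally use your driver's or ORM's built-in pool rather than writing one.

```python
import queue
import sqlite3
from contextlib import contextmanager


class BoundedPool:
    """Toy pool (hypothetical): exactly `size` connections exist, never more."""

    def __init__(self, size, acquire_timeout=5.0):
        self.size = size
        self.acquire_timeout = acquire_timeout
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # In-memory sqlite3 stands in for the real database.
            self._idle.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def connection(self):
        # Raises queue.Empty after the timeout instead of waiting forever.
        conn = self._idle.get(timeout=self.acquire_timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)

    def usage(self):
        """Fraction of connections currently checked out (0.0 to 1.0)."""
        return (self.size - self._idle.qsize()) / self.size


def should_alert(pool, threshold=0.8):
    """True once pool usage reaches the alert threshold."""
    return pool.usage() >= threshold


pool = BoundedPool(size=5, acquire_timeout=0.5)
with pool.connection() as conn:
    conn.execute("SELECT 1")
    print(f"usage while in use: {pool.usage():.0%}")
print(f"usage after the with block: {pool.usage():.0%}")
```

The point of the `with` block is that the connection goes back to the pool no matter how the body exits, and the acquire timeout turns "pool exhausted" into a fast, visible error instead of a silent pile-up.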

The Fix

We added connection pooling for the HTTP calls (using requests.Session), set a reasonable pool size, added timeouts, and wrapped everything in proper context managers. The whole thing took about 30 minutes to fix once we understood the problem.
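A rough sketch of the shape of that fix, not the exact production code. The endpoint URL, the pool size of 10, the timeout values, and the `make_session` and `sync_users` helpers are all assumptions made up for illustration; `save_to_db` is passed in as a callable because the database layer isn't shown in this post.

```python
import requests
from requests.adapters import HTTPAdapter

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/users/{user_id}"


def make_session(pool_size=10):
    """Build a Session whose adapter caps the number of pooled HTTP connections."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def sync_users(user_ids, save_to_db, session, timeout=(3.05, 10)):
    """Fetch and save each user; return the IDs that failed instead of hanging."""
    failed = []
    for user_id in user_ids:
        try:
            # timeout is (connect seconds, read seconds); both values are assumed.
            response = session.get(API_URL.format(user_id=user_id), timeout=timeout)
            response.raise_for_status()
            save_to_db(response.json())
        except requests.RequestException:
            # Timeouts and HTTP errors land here; record the ID and move on.
            failed.append(user_id)
    return failed
```

It would be used inside a `with` block, for example `with make_session() as session: failed = sync_users(ids, save_to_db, session)`, so the session and its pooled connections are closed when the job finishes. Failed IDs come back as a list that can be retried later rather than blocking the whole run.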

The bug taught me something important: it’s not about writing code that works. It’s about writing code that fails gracefully when things go wrong. Because in production, things always go wrong.
