
🧞‍♂️ New to Exponential Scale? Each week, I share tools, tips, and tricks for tiny teams with big ambitions to scale big. For more: Exponential Scale Podcast | Scalebrate | Scalebrate Hub
Founding Supporters: Support the following people and companies because they supported us from the beginning: DataEI | Dr. Bob Schatz | .Tech Domains | Fairman Studios | Jean-Philippe Martin | RocketSmart AI | UMBC
In today's newsletter:
Latest Podcasts: What You Missed
The Chill Work Manifesto: Interview with Rand Fishkin, Co-founder & CEO of SparkToro - Rand Fishkin raised $29M in VC at Moz, watched it grow to 200+ employees — and then deliberately built his next company to run on 2.5 people with tens of thousands of customers.
The Book That Replaced the Sales Team: Interview with Gia Laudi, Customer-Led - A 4-person team with no sales department counts Bitly, Sprout Social, and dbt Labs as clients, because a book replaced the sales motion entirely.
No-Code as Leverage - Interview with Emmanuel Straschnov, CEO & Co-Founder of Bubble. - Emmanuel bootstrapped Bubble for seven years without a dollar of outside capital. Today, companies built on Bubble have generated over $1 billion in revenue in 2025 alone.
Stop Broadcasting, Start Focusing - Interview with Brennan Dunn, CEO & Founder, RightMessage - Brennan Dunn built something different: a behavioral system that segments, routes, and converts without a marketing team behind it. ~$1M ARR business with just himself and some contract help.
Community as the Unfair Advantage: Interview with Gina Bianchini, Founder & CEO of Mighty Networks - Learn why community works as a retention engine, what founders get wrong about building one, and how Mighty Networks powers over $500M in revenue for its customers.
Standardize the "First 5 Minutes"
The server goes down. A customer is furious. A critical bug breaks checkout.
What happens next?
In most companies: chaos.
People panic. Someone Slacks "URGENT!!!" Someone else starts debugging blindly. A third person calls a client to apologize—but has no information.
15 minutes later, you realize no one actually diagnosed the problem. You just reacted to it.
This is the incident chaos trap. And it wastes hours—sometimes days—on problems that could've been solved in 30 minutes with a clear protocol.
The fix? Standardize the first 5 minutes.
When something breaks, everyone knows exactly what to do—in order, with no panic.
The Outage That Should've Been 30 Minutes
Let me tell you about Lisa, founder of a 7-person SaaS company.
One Friday afternoon, their app went down. Completely offline.
Here's what happened (from Lisa's post-mortem):
2:15pm: Customer reports "App isn't loading."
2:17pm: Support forwards to engineering. Engineer A starts investigating.
2:22pm: Engineer B also starts investigating (didn't know A was already on it).
2:30pm: Engineer A suspects database issue. Starts checking database logs.
2:35pm: Engineer B suspects CDN issue. Starts checking CDN settings.
2:45pm: Lisa joins Slack thread. Asks "What's happening?"
2:50pm: Both engineers realize they're investigating different things. They haven't diagnosed the root cause—just guessing.
3:00pm: Lisa asks, "Has anyone contacted customers?"
3:05pm: Support scrambles to draft an email. But they don't know what to say because engineering hasn't diagnosed the issue.
3:30pm: Engineering finally identifies the problem: Database ran out of storage (disk full).
3:45pm: Fix applied. App back online.
4:00pm: Customers notified.
Total downtime: 1 hour 30 minutes.
But here's the kicker: The actual fix took 15 minutes.
The other 75 minutes were wasted on:
Duplicate work (2 engineers investigating independently)
No clear owner (who's in charge?)
No diagnosis protocol (jumping to solutions before understanding the problem)
No customer communication plan (support waited for engineering)
Lisa implemented a "First 5 Minutes" protocol.
Now, when an incident happens:
Minute 1: Incident declared in #incidents Slack channel. Someone says "I'm Incident Commander."
Minute 2: Incident Commander assigns roles:
Investigator: Diagnose the issue
Communicator: Update customers
Documenter: Log everything
Minute 3-5: Investigator runs diagnostic checklist:
Is the server up? (Check monitoring dashboard)
Is the database responsive? (Run health check)
Is the CDN working? (Check status page)
Are there errors in logs? (Check error monitoring)
Minute 5: Communicator sends holding message to customers: "We're aware of an issue and investigating. ETA for update: 15 minutes."
Next incident:
2:15pm: App goes down.
2:16pm: Engineer declares incident. Becomes Incident Commander.
2:17pm: Roles assigned (Investigator, Communicator, Documenter).
2:18-2:20pm: Diagnostic checklist run. Root cause identified: Database connection pool exhausted.
2:21pm: Fix identified and applied.
2:25pm: App back online.
2:26pm: Customers notified: "Issue resolved. Root cause: database connection limit. We've increased capacity."
Total downtime: 10 minutes.
As Lisa put it: "The First 5 Minutes protocol turned chaos into a system. Now we solve incidents 10x faster."
Why the First 5 Minutes Matter
Most incidents aren't hard to fix. They're hard to diagnose—because there's no process.
Think of it like a fire drill.
When a fire alarm goes off, people don't stand around asking "What should we do?"
They follow the protocol:
Evacuate
Gather at the designated spot
Account for everyone
No panic. No confusion. Just a clear process.
The same applies to incidents.
Without a protocol:
People panic
Multiple people start working independently (duplicating effort)
No one owns the problem
Diagnosis is slow or skipped entirely
Customers are left in the dark
With a protocol:
Clear roles (who's doing what)
Systematic diagnosis (not random guessing)
Fast communication (customers get updates immediately)
Root cause identified quickly
The First 5 Minutes sets the tone for everything that follows.
Why This Matters for Microteams
Big companies have incident response teams, on-call engineers, and documented runbooks.
You? You have 5-7 people wearing multiple hats, and when something breaks, everyone scrambles.
Here's why a First 5 Minutes protocol is critical:
You don't have redundancy. One person panicking = 20% of your team out of commission.
Downtime is expensive. Every minute offline = lost revenue, angry customers, damaged trust.
Chaos compounds. Unclear ownership leads to duplicate work, slow diagnosis, and delayed fixes.
Customers expect communication. Silence during an outage is worse than the outage itself.
The best microteams don't panic when things break. They follow a protocol.
The First 5 Minutes Protocol Framework
Here's how to standardize incident response so your team moves fast, not chaotically.
Step 1: Define What Counts as an "Incident"
Not every issue is an incident.
Incident = Something that significantly impacts customers or business operations.
Examples of incidents:
App is down
Critical feature broken (e.g., checkout, login)
Data loss or corruption
Security breach
Major performance degradation (app unusably slow)
Not incidents:
Minor bugs (e.g., typo on a page)
Feature requests
Internal tools broken (unless they block critical work)
Rule: If customers are impacted or revenue is at risk, it's an incident.
Step 2: Create an #incidents Channel (or Equivalent)
Centralize all incident communication in one place.
Platform options:
Slack/Discord: Create #incidents channel
Microsoft Teams: Create Incidents team
Email: Create an incidents@ distribution list (not ideal—too slow)
Rules for #incidents:
Only incidents go here (no chit-chat)
First person to declare incident becomes Incident Commander (unless they delegate)
All updates happen in this channel (keeps everyone aligned)
Step 3: Assign Roles in Minute 1-2
Every incident needs clear ownership.
Three core roles:
1. Incident Commander (IC)
Owns the incident end-to-end
Assigns other roles
Makes final decisions
Ensures protocol is followed
2. Investigator(s)
Diagnoses the root cause
Implements the fix
Reports status to IC
3. Communicator
Updates customers (email, status page, Twitter, etc.)
Updates stakeholders (leadership, investors if needed)
Optional role:
4. Documenter
Logs timeline of events
Records decisions made
Creates post-mortem doc
How to assign:
Incident Commander posts in #incidents:
"Incident declared: App is down. I'm IC.
Investigator: @Engineer1
Communicator: @Support1
Documenter: @PM1
Let's go."
This takes 60 seconds. But it eliminates confusion.
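If you want that declaration to be one keystroke instead of a composed message, it's easy to script. A minimal sketch in Python; the function name and fields are illustrative, and you'd paste the output into #incidents (or pipe it through your chat tool's webhook):

```python
# Hypothetical helper for the minute 1-2 declaration post.
# Role names mirror the protocol; everything else is illustrative.

def format_declaration(summary: str, investigator: str,
                       communicator: str, documenter: str) -> str:
    """Build the #incidents declaration the Incident Commander posts."""
    return (
        f"Incident declared: {summary}. I'm IC.\n"
        f"Investigator: @{investigator}\n"
        f"Communicator: @{communicator}\n"
        f"Documenter: @{documenter}\n"
        "Let's go."
    )

message = format_declaration(
    summary="App is down",
    investigator="Engineer1",
    communicator="Support1",
    documenter="PM1",
)
print(message)
```

The point of scripting it is the same as the protocol itself: under pressure, nobody should have to remember what a complete declaration looks like.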
Step 4: Run the Diagnostic Checklist (Minute 3-5)
Don't jump to solutions. Diagnose first.
Diagnostic checklist (customize to your stack):
1. Is the app up?
Check monitoring (Pingdom, UptimeRobot, Datadog)
Try accessing from browser
2. Is the server responding?
SSH into server or check cloud dashboard (AWS, GCP, etc.)
3. Is the database healthy?
Check database connection
Check disk space, CPU, memory
4. Are there errors in logs?
Check error monitoring (Sentry, Rollbar, CloudWatch)
Look for spikes in errors
5. Is the CDN working?
Check CDN status page (Cloudflare, Fastly)
6. Are third-party services down?
Check status pages (Stripe, Twilio, AWS, etc.)
7. Was there a recent deploy?
Check deployment history
If yes, rollback immediately
Run through this checklist systematically. Don't skip steps.
Goal: Identify root cause in 5 minutes or less.
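The checklist above can also be encoded as an ordered runner that stops at the first failing probe, which enforces "don't skip steps" by construction. A sketch, assuming each check is wrapped as a zero-argument callable; the stubbed lambdas stand in for real HTTP, database, and CDN probes against your stack:

```python
# Sketch of the Step 4 checklist as an ordered runner. Each check is a
# (name, callable) pair returning True when healthy; names are placeholders.
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]

def run_checklist(checks: List[Check]) -> Optional[str]:
    """Run checks in order; return the first failing check's name, or None."""
    for name, check in checks:
        try:
            healthy = check()
        except Exception:
            healthy = False  # a crashing probe counts as a failure
        if not healthy:
            return name
    return None

# Example with stubbed probes (swap in real monitoring/health-check calls):
checks: List[Check] = [
    ("app_up", lambda: True),
    ("server_responding", lambda: True),
    ("database_healthy", lambda: False),  # simulated failure
    ("cdn_working", lambda: True),
]
print(run_checklist(checks))  # -> database_healthy
```

Running the probes in the checklist's order means the output points at the earliest layer that broke, which is usually where the root cause lives.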
Step 5: Communicate Immediately (Minute 5)
Even if you don't have a fix yet, communicate.
Communicator sends holding message:
Email to customers:
"We're currently experiencing an issue with [brief description]. Our team is investigating and we'll have an update within [15 minutes / 30 minutes / 1 hour].
We apologize for the inconvenience."
Status page update:
"Investigating: [Brief description of issue]. Updates to follow."
Social media (if relevant):
"We're aware of an issue affecting [X]. Investigating now. Updates soon."
Why this matters:
Customers feel acknowledged (not ignored)
Sets expectations (they know you're working on it)
Reduces support load (fewer "Is it just me?" emails)
Update every 15-30 minutes until resolved.
Step 6: Fix and Verify (Minute 5+)
Once root cause is identified, implement fix.
Investigator:
Implements fix
Verifies fix works (tests in production or staging)
Reports to IC: "Fix applied and verified."
IC confirms with Communicator: "Incident resolved. Send all-clear message."
Communicator sends resolution message:
"The issue has been resolved. [Brief description of what happened and what we did to fix it.] We apologize for the disruption. If you continue to experience issues, please contact support."
Step 7: Document and Debrief (Post-Incident)
After the incident is resolved, create a post-mortem.
Post-mortem template:
## Incident Post-Mortem: [Date]
**Incident:** [Brief description]
**Duration:** [Start time - End time]
**Impact:** [Who was affected, how many customers]
**Root Cause:** [What caused it]
**Timeline:**
- 2:15pm: Issue detected
- 2:16pm: Incident declared
- 2:20pm: Root cause identified
- 2:25pm: Fix applied
- 2:26pm: Verified resolved
**What Went Well:**
- Fast diagnosis (5 min)
- Clear communication to customers
**What Went Wrong:**
- Monitoring didn't catch issue before customer reported
- Fix took longer than expected due to [reason]
**Action Items:**
1. [Preventive measure to avoid recurrence]
2. [Improvement to incident response]
3. [Update to monitoring/alerting]
**Owner:** [Name]
**Due:** [Date]
Hold a 15-minute debrief within 24 hours.
Goal: Learn and improve, not blame.
The First 5 Minutes Checklist (Print and Post)
When an incident occurs:
Minute 1:
[ ] Declare incident in #incidents channel
[ ] Assign Incident Commander
Minute 2:
[ ] IC assigns roles (Investigator, Communicator, Documenter)
Minute 3-5:
[ ] Investigator runs diagnostic checklist
[ ] Identify root cause
Minute 5:
[ ] Communicator sends holding message to customers
Minute 5+:
[ ] Investigator implements fix
[ ] IC verifies resolution
[ ] Communicator sends all-clear message
Post-incident:
[ ] Document timeline
[ ] Create post-mortem
[ ] Hold debrief
[ ] Implement action items
Common Mistakes (and How to Avoid Them)
Mistake 1: No clear Incident Commander
Everyone assumes someone else is handling it
Fix: First person to declare incident is IC (or explicitly delegates)
Mistake 2: Jumping to solutions before diagnosing
Wasting time fixing the wrong thing
Fix: Force yourself to run the diagnostic checklist first
Mistake 3: No customer communication
Customers assume you're ignoring the problem
Fix: Communicator sends holding message within 5 minutes, even if no fix yet
Mistake 4: Multiple people working independently
Duplicate effort, no coordination
Fix: IC assigns specific roles—only Investigator diagnoses
Mistake 5: No post-mortem
Same incident happens again because root cause wasn't addressed
Fix: Always create a post-mortem, even for small incidents
Tools to Support the Protocol
Incident management:
PagerDuty — Alerts, on-call rotations, incident tracking
Opsgenie — Similar to PagerDuty, integrates with monitoring tools
Incident.io (Slack app) — Manages incidents directly in Slack
Status pages:
Statuspage (by Atlassian) — Hosted status page for customer updates
StatusCast — Cheaper alternative
Monitoring:
Datadog, New Relic — Full-stack monitoring
UptimeRobot, Pingdom — Uptime monitoring
Sentry, Rollbar — Error tracking
Communication:
Slack / Discord — #incidents channel
Email templates — Pre-written holding messages and resolution messages
Today's 10-Minute Action Plan
You don't need to build a full incident protocol today. Just draft the basics.
Here's what to do in the next 10 minutes:
Create #incidents channel in Slack (or equivalent)
Pin the First 5 Minutes checklist to the channel
Write a 3-step diagnostic checklist for your most common incident (e.g., "App down")
Draft a holding message template for customer communication
Share with the team: "Next incident, we follow this protocol."
That's it. One protocol drafted, 10 minutes.
Next time an incident happens, run the protocol. After, refine it based on what worked and what didn't.
A Final Thought
Incidents happen. Systems break. Bugs ship.
The question isn't "Will something go wrong?"
The question is: "When something goes wrong, do we have a system to handle it—or do we panic?"
Most teams panic. Because they have no protocol.
They waste time on duplicate work, slow diagnosis, and poor communication.
The First 5 Minutes protocol turns chaos into clarity.
It gives your team a playbook. A clear set of steps. No guessing. No confusion.
Just a system that works—even under pressure.
So stop winging it when things break.
Standardize the first 5 minutes.
Because the best teams aren't the ones that never have incidents.
They're the ones that handle incidents like pros.
Refer Folks, Get Free Access
What This Is
A printable, one-page crisis response card that tells you exactly what to do in the first 5 minutes of any incident—before panic sets in and you make things worse.
Why You Need This
When something breaks, your brain doesn't work right.
Your customer is screaming. Your website is down. Your payment processor failed. Your best client just threatened to leave. In that moment, you're not thinking clearly—you're in fight-or-flight mode.
That's when bad decisions happen:
You fix the symptom, not the problem (and it breaks again in an hour)
You forget to notify stakeholders (and they find out from Twitter)
You skip documentation (and the same issue happens next week)
You panic-post a half-baked response (and make the PR crisis worse)
The first 5 minutes determine whether an incident stays small or explodes into a disaster.
A crisis card is your "break glass in case of emergency" protocol. It's a forcing function that overrides panic with process. You don't think—you follow the card.
How to Use This Crisis Card
Print It and Pin It: Put it somewhere visible (above your desk, in your Notion dashboard, on your phone lock screen)
Customize the Template: Add your specific notification channels, tools, and contacts
Walk Your Team Through It: Make sure everyone knows the protocol
Practice It: Run a fire drill—simulate an incident and follow the card
Update After Every Incident: Refine the protocol based on what you learn
The Template
The Crisis Card (Printable One-Pager)
🚨 CRISIS PROTOCOL: FIRST 5 MINUTES
When something breaks, STOP and follow this protocol. Do not skip steps.
⏱️ MINUTE 1: STOP & ASSESS
Don't fix anything yet. Don't notify anyone yet. Just assess.
Ask Yourself:
What broke? [System / Customer issue / Financial / PR / Security / Other]
How bad is it?
🟢 Minor: Annoying but not breaking core functionality
🟡 Moderate: Impacting some customers or revenue
🔴 Critical: Total outage, major revenue loss, security breach, or reputational crisis
Who is affected?
[ ] All customers
[ ] Subset of customers
[ ] Internal team only
[ ] Public/brand reputation
Write it down: [Incident log link or Slack channel]
⏱️ MINUTE 2: NOTIFY THE RIGHT PEOPLE
Who needs to know RIGHT NOW?
Use this decision tree:
| Severity | Notify Immediately | Notify Within 30 Min | Can Wait |
|---|---|---|---|
| 🟢 Minor | Nobody (fix quietly) | Team lead | Everyone else |
| 🟡 Moderate | Team lead, affected customers | Full team, leadership | Public |
| 🔴 Critical | Everyone: team, leadership, affected customers | Backup team, legal (if needed) | Nothing |
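The decision tree reduces to a lookup table, which is handy if you ever script your notifications. A sketch; the severity labels follow the card, but the group names are placeholders for your own org chart:

```python
# The severity decision tree as a lookup table (groups are illustrative).
NOTIFY = {
    "minor":    {"now": [], "within_30": ["team lead"]},
    "moderate": {"now": ["team lead", "affected customers"],
                 "within_30": ["full team", "leadership"]},
    "critical": {"now": ["team", "leadership", "affected customers"],
                 "within_30": ["backup team", "legal (if needed)"]},
}

def who_to_notify_now(severity: str) -> list:
    """Return who must be pinged immediately for a given severity."""
    return NOTIFY[severity.lower()]["now"]

print(who_to_notify_now("Moderate"))
```

Encoding the tree this way keeps the "who gets pinged" decision out of the heat of the moment: the caller supplies a severity, the table supplies the answer.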
Templates:
Internal Notification (Slack/Email):
🚨 INCIDENT ALERT
Severity: [Minor / Moderate / Critical]
What: [One-sentence description]
Impact: [Who/what is affected]
Status: Investigating
Owner: [Your name]
Next update: [15 min / 30 min / 1 hour]
Customer Notification (Email/In-App):
We're aware of an issue affecting [specific feature/service].
Our team is actively working on it.
Estimated resolution: [timeframe or "investigating"]
We'll update you within [30 min / 1 hour].
DO NOT:
Over-promise a fix timeline you can't meet
Blame third parties (even if it's their fault)
Provide technical details customers don't care about
⏱️ MINUTE 3: LOG THE INCIDENT
Open the incident log and capture:
| Field | What to Write |
|---|---|
| Incident ID | [Auto-increment or timestamp: INC-2025-0047] |
| Severity | Minor / Moderate / Critical |
| Description | What broke, in plain language |
| Start Time | [Exact time issue was detected] |
| Reporter | [Who reported it] |
| Owner | [Who's leading the fix] |
| Affected Systems | [List tools, features, or services] |
| Customer Impact | [# of customers affected, revenue at risk] |
| Status | Investigating / In Progress / Resolved / Monitoring |
Incident Log Location: [Link to your incident tracker: Notion, Airtable, Jira, Spreadsheet]
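If your incident log lives in code rather than a spreadsheet, the fields above map to a small record type. A sketch assuming the INC-[year]-[sequence] ID format from the table's example; the field names and types are otherwise illustrative:

```python
# The incident-log fields as a dataclass (assumption: IDs follow the
# INC-<year>-<sequence> pattern shown in the table's example).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IncidentRecord:
    seq: int                   # sequence number for the incident ID
    severity: str              # Minor / Moderate / Critical
    description: str           # what broke, in plain language
    reporter: str              # who reported it
    owner: str                 # who's leading the fix
    affected_systems: list     # tools, features, or services
    customer_impact: str       # customers affected, revenue at risk
    status: str = "Investigating"
    start_time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    @property
    def incident_id(self) -> str:
        year = datetime.now(timezone.utc).year
        return f"INC-{year}-{self.seq:04d}"

rec = IncidentRecord(
    seq=47, severity="Critical", description="Checkout returns 500s",
    reporter="Support1", owner="Engineer1",
    affected_systems=["checkout", "payments"],
    customer_impact="All customers; revenue at risk",
)
print(rec.incident_id)
```

A structured record like this also makes the post-incident steps cheaper: the timeline and post-mortem can be generated from the same fields you logged in minute 3.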
⏱️ MINUTE 4: STABILIZE (DON'T FIX YET)
Your goal is containment, not root-cause resolution.
Stabilization Checklist:
[ ] Stop the bleeding: Disable broken feature, rollback last deploy, switch to backup system
[ ] Preserve evidence: Take screenshots, export logs, save error messages
[ ] Prevent escalation: Block access to broken systems, pause related processes
[ ] Set a timer: You have 15-30 minutes to stabilize. If you can't, escalate.
Common Stabilization Actions:
| Issue Type | Stabilization Action |
|---|---|
| Website down | Switch to maintenance page, check hosting status |
| Payment failure | Notify customers, manually process if possible, check processor status |
| Data breach | Lock accounts, disable API access, preserve logs |
| Customer complaint | Acknowledge receipt, assign owner, pause automated responses |
| PR crisis | Draft holding statement, pause social posts, notify leadership |
⏱️ MINUTE 5: DEFINE NEXT STEPS & COMMUNICATE
You've bought yourself time. Now plan the fix.
Next Steps Template:
Immediate Actions (Next 30 Minutes):
[Action 1 with owner]
[Action 2 with owner]
[Action 3 with owner]
Next Update: [Specific time, e.g., "11:30 AM"]
Escalation Plan:
If not resolved by [time], escalate to [person/team]
If customer impact increases, notify [stakeholders]
Post-Incident:
[ ] Schedule post-mortem within 48 hours
[ ] Update runbook/SOP
[ ] Notify customers of resolution + what we're doing to prevent recurrence
REMINDER: Update the incident log and stakeholders every 30 minutes until resolved.
🛠️ Quick Reference: Key Contacts & Tools
| Role/Tool | Contact/Link |
|---|---|
| Incident Log | [Link to your tracker] |
| Internal Notification Channel | #incidents (Slack) |
| Customer Support Tool | [Zendesk / Intercom / Email] |
| System Status Page | [If you have one] |
| Leadership Contact | [Name, phone, email] |
| Technical Lead | [Name, phone, email] |
| Hosting/Infrastructure | [AWS / Heroku / Vercel dashboard link] |
| Payment Processor Support | [Stripe / PayPal support link] |
🧯 After the Fire: Post-Incident Checklist
Within 24-48 hours of resolution:
[ ] Update customers: Send "all clear" notification with brief explanation and prevention steps
[ ] Run a post-mortem: Use the 5 Whys to find root cause
[ ] Update the runbook: Add this incident to your troubleshooting guide
[ ] Fix the process: What early warning signs did we miss? How do we prevent this?
[ ] Close the incident log: Mark as resolved, add resolution summary
Print this card. Keep it visible. Follow it every time.
Pro Tips
Tip 1: Practice With Fire Drills
Run a fake incident once a quarter. Simulate a crisis and follow the card. Time yourself. Identify gaps. Update the card.
Tip 2: Keep a "Lessons Learned" Log
After every incident, add one entry to a shared doc:
What happened
What we did right
What we missed
What we're changing
Review it monthly. Your crisis card should evolve based on real incidents.
Tip 3: Build Incident Notification Templates in Advance
Pre-write your customer notification emails and Slack messages. When the crisis hits, you just fill in the blanks—no thinking required.
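Pre-written templates can be as lightweight as stdlib string templates with named blanks you fill in during the incident. A sketch using the holding-message wording from earlier in this issue; the placeholder names are illustrative:

```python
# Pre-written holding message with named blanks, using stdlib string.Template.
# Wording mirrors the holding message in Step 5; placeholder names are ours.
from string import Template

HOLDING = Template(
    "We're currently experiencing an issue with $issue. "
    "Our team is investigating and we'll have an update within $eta. "
    "We apologize for the inconvenience."
)

msg = HOLDING.substitute(issue="checkout", eta="30 minutes")
print(msg)
```

`Template.substitute` raises a `KeyError` if a blank is left unfilled, which is exactly what you want during a crisis: a template that refuses to send half-finished.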
That’s it for this issue.
Think Big. Stay Lean. Scale Smarter.
— Scalebrate