# Postmortem: INC-20260228-a3f7

## Summary

| Field | Value |
|---|---|
| **Incident ID** | INC-20260228-a3f7 |
| **Title** | Ollama inference latency spike on Jasper |
| **Severity** | SEV2 |
| **Trigger** | slo_high_burn (3.2x in rolling_1h) |
| **Duration** | 13 minutes |
| **Opened** | 2026-02-28T14:32:00Z |
| **Resolved** | 2026-02-28T14:45:00Z |
| **SLO** | ollama_availability (99.5% target) |

## Impact

Inference latency for Qwen 2.5 32B exceeded 2 seconds (normal: <500ms). All OpenClaw agents routing through Jasper experienced degraded response times. No data loss. No downstream incidents triggered.

- **Blast radius:** Jasper GPU inference; all agents using Ollama on Jasper
- **Single point of failure:** one GPU (RTX 4090) handles all inference workloads

## Root Cause

VS Code Remote Tunnel was holding ~3GB of VRAM on the RTX 4090, leaving insufficient headroom for Ollama to serve the 32B-parameter model efficiently. The model requires ~20GB of VRAM; with only ~21GB free (24GB total minus the ~3GB held by the tunnel), VRAM utilization sat at ~98% and the GPU was thrashing.

## Resolution

1. Identified the competing process via `nvidia-smi`
2. Killed the VS Code Remote Tunnel process
3. Available VRAM recovered to ~22GB
4. Inference latency returned to <500ms within 2 minutes
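The identification step above can be scripted against `nvidia-smi`'s CSV query mode. A minimal sketch, assuming a standard NVIDIA driver install; the helper names (`parse_compute_apps`, `competing_processes`) are ours for illustration, not part of any tooling named in this report:

```python
import subprocess

def query_compute_apps() -> str:
    # Requires an NVIDIA driver; raises FileNotFoundError where nvidia-smi is absent.
    return subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        text=True,
    )

def parse_compute_apps(csv_text: str) -> list[dict]:
    """Parse one row per GPU process: pid, process name, VRAM used (MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        pid, name, mem = [field.strip() for field in line.split(",")]
        rows.append({"pid": int(pid), "name": name, "used_mib": int(mem)})
    return rows

def competing_processes(rows: list[dict], expected: str = "ollama") -> list[dict]:
    """Everything holding VRAM that is not the expected inference server."""
    return [r for r in rows if expected not in r["name"].lower()]
```

During the incident this would have flagged the tunnel process in one pass instead of requiring a manual read of the `nvidia-smi` table.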

## Lessons Learned

### What went well
- SLO burn rate alert fired within 2 minutes of degradation onset
- Incident auto-created with correct severity and SLO linkage
- Resolution was straightforward once root cause identified

### What didn't go well
- No VRAM reservation policy existed for Ollama
- Manual investigation was required; there was no automated GPU contention detection

### Where we got lucky
- The competing process was non-critical and could be killed immediately

## Action Items

| # | Action | Owner | Due | Status |
|---|---|---|---|---|
| 1 | Create VRAM reservation policy for Ollama (minimum 20GB) | Micheal | 2026-03-07 | ✅ Done |
| 2 | Add GPU utilization metric to SLO dashboard | Micheal | 2026-03-14 | Planned |
| 3 | Investigate running dev tools on a separate GPU or node | Micheal | 2026-03-21 | Planned |
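For action item 2, one possible shape of the alerting rule, assuming Jasper's GPU is scraped with NVIDIA's dcgm-exporter (metric names differ under other exporters, and the threshold mirrors the ~20GB model footprint plus margin):

```yaml
groups:
  - name: gpu-contention
    rules:
      - alert: OllamaVramHeadroomLow
        # Free framebuffer memory (MiB) below model footprint + safety margin
        expr: DCGM_FI_DEV_FB_FREE{instance="jasper"} < 22528
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "VRAM headroom on Jasper too low for Ollama (ref: INC-20260228-a3f7)"
```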

---
*Auto-generated by Incident Commander — reviewed and annotated by operator.*
