System Design
Architecture Overview
From raw observability signals to structured AI triage — a production-ready data flow built for extensibility.
01 · Inputs
🐕
Datadog
APM traces, metrics, monitor alerts
📈
Grafana
Alertmanager webhooks, dashboards
🔎
Splunk
Log search & aggregated events
📟
PagerDuty
Incident event streams
🛡️
Sentry
Error tracking & stack traces
☁️
CloudWatch
AWS metrics & log groups
↓
02 · Processing Pipeline
⚙️ Input Parser
Zod validated
Normalizes raw signals into a structured incident-context envelope. Handles JSON, plaintext, and multiline log formats.
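Here is a minimal, dependency-free sketch of the normalization step. The `IncidentEnvelope` shape, its field names, and the `normalizeSignal` function are hypothetical. So is the rule that an indented line, or one starting with `at `, continues the previous entry (a common stack-trace layout). The card says this step is Zod-validated, but that schema is not reproduced here.

```typescript
// Hypothetical envelope shape; the real schema is not specified in this document.
type InputFormat = "json" | "plaintext" | "multiline";

interface IncidentEnvelope {
  source: string;       // e.g. "datadog", "sentry" (free-form in this sketch)
  format: InputFormat;  // how the raw payload was interpreted
  entries: string[];    // one logical event per entry
  raw: string;          // original payload, kept for auditing
}

// Assumed heuristic: indented lines, or lines starting with "at " as in
// JS/Java stack traces, belong to the previous log entry.
function isContinuation(line: string): boolean {
  return /^\s+/.test(line) || line.startsWith("at ");
}

function normalizeSignal(source: string, raw: string): IncidentEnvelope {
  const trimmed = raw.trim();

  // 1. Structured payloads: a JSON object or an array of objects.
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      const parsed: unknown = JSON.parse(trimmed);
      const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
      return {
        source,
        format: "json",
        entries: items.map((item) => JSON.stringify(item)),
        raw,
      };
    } catch {
      // Not valid JSON; fall through to line-based parsing.
    }
  }

  // 2. Line-based payloads: fold continuation lines into their parent entry.
  const entries: string[] = [];
  let sawContinuation = false;
  for (const line of raw.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    if (isContinuation(line) && entries.length > 0) {
      entries[entries.length - 1] += "\n" + line;
      sawContinuation = true;
    } else {
      entries.push(line);
    }
  }

  return {
    source,
    format: sawContinuation ? "multiline" : "plaintext",
    entries,
    raw,
  };
}
```

In this sketch, a Sentry-style payload of `Error: boom` followed by two indented `at ...` frames collapses into a single `multiline` entry. A plain two-line Splunk excerpt stays as two `plaintext` entries.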
🔎 Vector Search
pgvector / Pinecone
Embeds the incident context and performs a nearest-neighbor search against a library of historical incidents for pattern matching.
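Embedding search of this kind typically ranks candidates by cosine similarity. The sketch below is an exact, brute-force version of that ranking. pgvector (for example, its `<=>` cosine-distance operator) and Pinecone perform the same kind of query with indexes, often approximately. The `HistoricalIncident` type and `topKSimilar` function are hypothetical. Real embeddings come from an embedding model and have far more dimensions than the toy vectors used to exercise this code.

```typescript
// Hypothetical record for a past incident and its precomputed embedding.
interface HistoricalIncident {
  id: string;
  summary: string;
  embedding: number[];
}

interface Match {
  incident: HistoricalIncident;
  similarity: number; // cosine similarity in [-1, 1]; higher means closer
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact k-nearest-neighbor search: score every candidate,
// sort by similarity (descending), and keep the top k.
function topKSimilar(query: number[], library: HistoricalIncident[], k: number): Match[] {
  return library
    .map((incident) => ({ incident, similarity: cosineSimilarity(query, incident.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

Brute force costs O(n·d) per query. That is fine for a small incident library, and it is the point at which an index such as pgvector's becomes worthwhile.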
🧠 Incident Memory
Supabase (planned)
Stores enriched incident context across sessions, enabling follow-up queries and post-incident report generation.
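Because the Supabase store is still planned, one way to stay compatible with it is to code against a small storage contract now. The `IncidentMemory` interface, the record shapes, and the in-process `InMemoryIncidentMemory` class below are all hypothetical. A Supabase-backed class could later implement the same interface.

```typescript
// Hypothetical shapes; the planned Supabase schema is not described here.
interface FollowUp {
  question: string;
  answer: string;
  askedAt: string; // ISO-8601 timestamp
}

interface IncidentRecord {
  id: string;
  summary: string;
  followUps: FollowUp[];
}

// Storage contract. Methods are async because a database-backed
// implementation would be; callers do not change when the backend does.
interface IncidentMemory {
  save(record: IncidentRecord): Promise<void>;
  get(id: string): Promise<IncidentRecord | undefined>;
  appendFollowUp(id: string, question: string, answer: string): Promise<void>;
}

// In-process stand-in: data lives only as long as the process does.
class InMemoryIncidentMemory implements IncidentMemory {
  private records: Record<string, IncidentRecord> = {};

  async save(record: IncidentRecord): Promise<void> {
    // Copy so later mutation by the caller does not leak into the store.
    this.records[record.id] = { ...record, followUps: [...record.followUps] };
  }

  async get(id: string): Promise<IncidentRecord | undefined> {
    return this.records[id];
  }

  async appendFollowUp(id: string, question: string, answer: string): Promise<void> {
    const record = this.records[id];
    if (!record) throw new Error(`unknown incident: ${id}`);
    record.followUps.push({ question, answer, askedAt: new Date().toISOString() });
  }
}
```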
⚡ LLM Engine
OpenAI / Anthropic
Sends the structured context, together with the retrieved historical patterns, to GPT-4o or Claude under an SRE-domain system prompt.
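The sketch below shows how the engine might assemble a request from the envelope and the retrieved matches. It assumes the common chat format with `system` and `user` roles. OpenAI's chat API accepts that format directly. Anthropic's Messages API takes the system prompt as a separate top-level `system` field, so an adapter would lift it out. The prompt text, `buildTriageMessages`, and the `TriageInput` fields are hypothetical; the project's actual system prompt is not published here.

```typescript
type Role = "system" | "user";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical inputs: normalized signal entries plus vector-search matches.
interface TriageInput {
  source: string;
  entries: string[];
  similarIncidents: { id: string; summary: string; similarity: number }[];
}

// Placeholder prompt, not the project's real one.
const SYSTEM_PROMPT = [
  "You are an SRE assistant performing incident triage.",
  "Return JSON with rootCause, impactedServices, severity (P1-P4),",
  "runbookSteps, stakeholderUpdate and confidence (0-100).",
  "Only cite evidence that appears in the provided signals.",
].join(" ");

function buildTriageMessages(input: TriageInput): ChatMessage[] {
  // Number each signal so the model can cite evidence by index.
  const signals = input.entries.map((e, i) => `[${i + 1}] ${e}`).join("\n");
  const patterns = input.similarIncidents.length
    ? input.similarIncidents
        .map((m) => `- ${m.id} (similarity ${m.similarity.toFixed(2)}): ${m.summary}`)
        .join("\n")
    : "- none found";

  return [
    { role: "system", content: SYSTEM_PROMPT },
    {
      role: "user",
      content: `Source: ${input.source}\n\nSignals:\n${signals}\n\nSimilar past incidents:\n${patterns}`,
    },
  ];
}
```

Keeping prompt assembly as a pure function makes it easy to snapshot-test without calling either provider.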
↓
03 · Outputs
🔍
Root Cause Analysis
Probable root cause with supporting evidence from the input signals.
🗺️
Service Impact Map
Blast radius — impacted services, upstream and downstream dependencies.
🚨
Severity Scoring
Automated P1–P4 classification, which the on-call engineer can override.
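The document does not say how the automated P1–P4 classification is computed, so the sketch below covers only the override rule. It keeps the model's suggestion and the human decision side by side and derives the effective severity from them. The `SeverityAssessment` shape and both helpers are hypothetical.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4"; // P1 is the most severe

interface SeverityAssessment {
  suggested: Severity; // produced by the automated classifier
  override?: {
    severity: Severity;
    by: string;     // on-call engineer who changed it
    reason: string; // worth keeping in an incident audit log
  };
}

// A human decision, when present, always wins over the automated one.
function effectiveSeverity(a: SeverityAssessment): Severity {
  return a.override ? a.override.severity : a.suggested;
}

// Returns a new assessment instead of mutating, so the original
// suggestion remains available for post-incident review.
function overrideSeverity(
  a: SeverityAssessment,
  severity: Severity,
  by: string,
  reason: string
): SeverityAssessment {
  return { suggested: a.suggested, override: { severity, by, reason } };
}
```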
🧭
Debugging Runbooks
Step-by-step recovery actions and links to relevant runbook pages.
💬
Stakeholder Update
Pre-written Slack message ready to send to engineering and business stakeholders.
📊
Confidence Score
0–100 transparency score so engineers know how much to trust the AI output.
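Taken together, the six outputs suggest a single structured result. The `TriageResult` field names below are hypothetical. The hand-written guard rejects model output whose severity is not P1–P4 or whose confidence falls outside 0–100. A Zod schema could express the same rules declaratively; this version avoids the dependency.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Hypothetical shape mirroring the six outputs listed above.
interface TriageResult {
  rootCause: string;
  evidence: string[];
  impactedServices: string[];
  severity: Severity;
  runbookSteps: string[];
  stakeholderUpdate: string; // Slack-ready message
  confidence: number;        // 0-100
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Collects every problem instead of stopping at the first.
// Returns an empty array when the value is a valid TriageResult.
function validateTriageResult(data: unknown): string[] {
  if (typeof data !== "object" || data === null) return ["result must be an object"];
  const r = data as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof r.rootCause !== "string") errors.push("rootCause must be a string");
  for (const key of ["evidence", "impactedServices", "runbookSteps"]) {
    if (!isStringArray(r[key])) errors.push(`${key} must be an array of strings`);
  }
  const sev = r.severity;
  if (typeof sev !== "string" || ["P1", "P2", "P3", "P4"].indexOf(sev) < 0) {
    errors.push("severity must be one of P1-P4");
  }
  if (typeof r.stakeholderUpdate !== "string") errors.push("stakeholderUpdate must be a string");
  const c = r.confidence;
  if (typeof c !== "number" || !Number.isFinite(c) || c < 0 || c > 100) {
    errors.push("confidence must be a number from 0 to 100");
  }
  return errors;
}

// Throws with all problems listed; otherwise returns the value typed.
function parseTriageResult(data: unknown): TriageResult {
  const errors = validateTriageResult(data);
  if (errors.length > 0) throw new Error(errors.join("; "));
  return data as TriageResult;
}
```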
Tech Stack
Frontend
- Next.js 15 App Router
- TypeScript (strict mode)
- Tailwind CSS
- Framer Motion
Validation
- Zod schemas
- Input sanitization
- Type-safe API boundaries
AI Layer
- OpenAI GPT-4o (planned)
- Anthropic Claude fallback
- Mock service (current)
- Streaming responses
Storage
- Supabase (planned)
- pgvector embeddings
- Incident audit log
- User sessions
Deployment
- Vercel (Edge Functions)
- GitHub CI/CD
- Environment-safe secrets
- Preview deployments
Design Principles
Replaceable AI backend
The mock service implements the same interface as the real OpenAI call, so switching backends means swapping one function and leaving everything else untouched.
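Here is a sketch of the swap-one-function idea, assuming a hypothetical `TriageBackend` function type. No real OpenAI or Anthropic SDK calls are shown. The names `mockBackend`, `withFallback`, and `selectBackend` are illustrative, not the project's. `withFallback` shows one way the Claude fallback listed under the AI layer could be wired in.

```typescript
// Every backend, mock or real, has the same signature, so callers never
// need to know which one they are talking to.
type TriageBackend = (prompt: string) => Promise<string>;

// Deterministic stand-in used while no real model is wired up.
const mockBackend: TriageBackend = async (prompt) =>
  JSON.stringify({ rootCause: "mock analysis", promptLength: prompt.length });

// If the primary call rejects (timeout, rate limit, outage),
// the same prompt is sent to the fallback backend.
function withFallback(primary: TriageBackend, fallback: TriageBackend): TriageBackend {
  return async (prompt) => {
    try {
      return await primary(prompt);
    } catch {
      return fallback(prompt);
    }
  };
}

// The single swap point: pass a real backend when one exists,
// otherwise the mock is used.
function selectBackend(real: TriageBackend | undefined): TriageBackend {
  return real ?? mockBackend;
}
```

The `await` inside the `try` matters. Without it, a rejected promise would be returned unhandled and the fallback would never run.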
Feature-based folder structure
Each feature owns its UI, logic, and types. No monolithic files.
Type-safe boundaries
Zod validates all inputs at API boundaries. TypeScript strict mode everywhere.
Accessible by default
Semantic HTML, keyboard navigation, and proper color contrast ratios throughout.
Edge-ready deployment
Server Components + Vercel Edge Functions for fast global response times.
Observable architecture
Every layer is designed to integrate with real monitoring tools as the system scales.