System Design

Architecture Overview

From raw observability signals to structured AI triage — a production-ready data flow built for extensibility.

01 · Inputs
🐕
Datadog
APM traces, metrics, monitor alerts
📈
Grafana
Alertmanager webhooks, dashboards
🔎
Splunk
Log search & aggregated events
📟
PagerDuty
Incident event streams
🛡️
Sentry
Error tracking & stack traces
☁️
CloudWatch
AWS metrics & log groups
02 · Processing Pipeline
⚙️ Input Parser
Zod validated

Normalizes raw signals into a structured incident context envelope. Handles JSON, plaintext, and multiline log formats.
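
As a rough illustration of the normalization step, the sketch below turns JSON, plaintext, and multiline log input into one envelope. The `IncidentContext` shape, its field names, and `parseSignal` are hypothetical, and the Zod validation the parser applies is left out so the example stays dependency-free.

```typescript
// Hypothetical envelope shape; the field names are assumptions, not the project's schema.
type SignalFormat = "json" | "plaintext" | "multiline";

interface IncidentContext {
  format: SignalFormat;
  lines: string[];                 // trimmed, non-empty lines of the raw signal
  fields: Record<string, unknown>; // structured data, populated for JSON input only
}

function parseSignal(raw: string): IncidentContext {
  const text = raw.trim();
  if (text.length === 0) {
    throw new Error("empty signal");
  }

  // Try JSON first; only plain objects count as structured input.
  if (text.startsWith("{")) {
    try {
      const parsed: unknown = JSON.parse(text);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        return { format: "json", lines: [text], fields: parsed as Record<string, unknown> };
      }
    } catch {
      // Not valid JSON; fall through and treat the input as log text.
    }
  }

  // Split on newlines and drop blank lines; more than one line means a multiline log.
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return { format: lines.length > 1 ? "multiline" : "plaintext", lines, fields: {} };
}
```

In the real parser, a Zod schema would presumably validate the envelope before it moves further down the pipeline.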

🔎 Vector Search
pgvector / Pinecone

Embeds incident context and performs nearest-neighbor search against a library of historical incidents for pattern matching.
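
The nearest-neighbor step can be illustrated with an in-memory cosine-similarity search. This is only a sketch standing in for pgvector or Pinecone, which would do the ranking in the database or service. `PastIncident` and `nearestIncidents` are hypothetical names, and the embeddings are assumed to have been computed already.

```typescript
// Hypothetical record for a historical incident with a precomputed embedding.
interface PastIncident {
  id: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored incident against the query and keep the k best matches.
function nearestIncidents(
  query: number[],
  library: PastIncident[],
  k: number,
): { id: string; score: number }[] {
  return library
    .map((incident) => ({ id: incident.id, score: cosineSimilarity(query, incident.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k);
}
```

A brute-force scan like this is linear in the size of the library; a vector index is what makes the same query practical at scale.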

🧠 Incident Memory
Supabase (planned)

Stores enriched incident context across sessions. Enables follow-up queries and post-incident report generation.

LLM Engine
OpenAI / Anthropic

Sends the structured context and retrieved patterns to GPT-4o or Claude, guided by an SRE-domain system prompt.
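
One way to picture the hand-off to the model is a small function that assembles the request messages. The sketch below is provider-neutral and makes no SDK calls. `ChatMessage`, `RetrievedPattern`, `buildMessages`, and the placeholder `SYSTEM_PROMPT` are all assumptions; the project's actual system prompt is not shown here.

```typescript
// Provider-neutral message shape; each SDK would need its own mapping from this.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical summary of a historical incident returned by vector search.
interface RetrievedPattern {
  id: string;
  summary: string;
}

// Placeholder text only; not the project's real SRE-domain prompt.
const SYSTEM_PROMPT =
  "You are an SRE assistant. Triage the incident using only the evidence provided.";

function buildMessages(context: string, patterns: RetrievedPattern[]): ChatMessage[] {
  // Number the retrieved incidents so the model can cite them by position or id.
  const patternText =
    patterns.length === 0
      ? "(no similar incidents found)"
      : patterns.map((p, i) => `${i + 1}. [${p.id}] ${p.summary}`).join("\n");

  return [
    { role: "system", content: SYSTEM_PROMPT },
    {
      role: "user",
      content: `Incident context:\n${context}\n\nSimilar past incidents:\n${patternText}`,
    },
  ];
}
```

Keeping prompt assembly separate from the network call means the same messages can be sent to either provider, or inspected in tests without calling a model.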

03 · Outputs
🔍
Root Cause Analysis
Probable root cause with supporting evidence from the input signals.
🗺️
Service Impact Map
Blast radius — impacted services, upstream and downstream dependencies.
🚨
Severity Scoring
Automated P1–P4 classification that the on-call engineer can override.
🧭
Debugging Runbooks
Step-by-step recovery actions and links to relevant runbook pages.
💬
Stakeholder Update
Pre-written Slack message ready to send to engineering and business stakeholders.
📊
Confidence Score
0–100 transparency score so engineers know how much to trust the AI output.
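
The six outputs above could travel together as a single typed result. The sketch below assumes a hypothetical `TriageResult` shape, with field names chosen for illustration, plus two small helpers: one keeps the confidence score inside 0–100, and one records a severity override by the on-call engineer.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Hypothetical result shape; one field per output card.
interface TriageResult {
  rootCause: string;
  impactedServices: string[];
  severity: Severity;
  severitySource: "ai" | "on-call"; // who set the current severity
  runbookSteps: string[];
  stakeholderUpdate: string;
  confidence: number; // integer in the range 0-100
}

// Round to an integer and clamp into 0-100; NaN is treated as no confidence.
function clampConfidence(value: number): number {
  if (Number.isNaN(value)) {
    return 0;
  }
  return Math.min(100, Math.max(0, Math.round(value)));
}

// Return a copy with the engineer's severity; the original result is left untouched.
function overrideSeverity(result: TriageResult, severity: Severity): TriageResult {
  return { ...result, severity, severitySource: "on-call" };
}
```

Tracking `severitySource` separately keeps it visible whether a given P-level came from the model or from a person.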

Tech Stack

Frontend
  • Next.js 15 App Router
  • TypeScript strict
  • Tailwind CSS
  • Framer Motion
Validation
  • Zod schemas
  • Input sanitization
  • Type-safe API boundaries
AI Layer
  • OpenAI GPT-4o (planned)
  • Anthropic Claude fallback
  • Mock service (current)
  • Streaming responses
Storage
  • Supabase (planned)
  • pgvector embeddings
  • Incident audit log
  • User sessions
Deployment
  • Vercel (Edge Functions)
  • GitHub CI/CD
  • Environment-safe secrets
  • Preview deployments

Design Principles

Replaceable AI backend
The mock service exposes the same interface as a real OpenAI call, so moving to a live model means swapping one function and leaving everything else unchanged.
Feature-based folder structure
Each feature owns its UI, logic, and types. No monolithic files.
Type-safe boundaries
Zod validates all inputs at API boundaries. TypeScript strict mode everywhere.
Accessible by default
Semantic HTML, keyboard navigation, and proper color contrast ratios throughout.
Edge-ready deployment
Server Components + Vercel Edge Functions for fast global response times.
Observable architecture
Every layer is designed to integrate with real monitoring tools as the system scales.
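
The replaceable-backend principle can be sketched as a single function type that both the mock and any real provider call would satisfy. `TriageBackend`, `TriageSummary`, `mockBackend`, and `runTriage` are hypothetical names used for illustration; the project's actual interface is not shown here.

```typescript
// Hypothetical, trimmed-down result; the real output carries more fields.
interface TriageSummary {
  rootCause: string;
  confidence: number;
}

// The one contract every backend must satisfy, mock or real.
type TriageBackend = (context: string) => Promise<TriageSummary>;

// Stand-in backend: returns a canned answer without any network call.
const mockBackend: TriageBackend = async (context) => ({
  rootCause: `Mock analysis of ${context.length} characters of context`,
  confidence: 50,
});

// Callers depend only on the TriageBackend type, never on a concrete provider.
async function runTriage(
  context: string,
  backend: TriageBackend = mockBackend,
): Promise<TriageSummary> {
  return backend(context);
}
```

Under this assumption, a live OpenAI or Anthropic integration would be one more function of type `TriageBackend`, passed to `runTriage` in place of the mock.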