System Design

Architecture Overview

From raw observability signals to structured AI triage — a production-ready data flow built for extensibility.

01 · Inputs

🐕

Datadog

APM traces, metrics, monitor alerts

📈

Grafana

Alertmanager webhooks, dashboards

🔎

Splunk

Log search & aggregated events

📟

PagerDuty

Incident event streams

🛡️

Sentry

Error tracking & stack traces

☁️

CloudWatch

AWS metrics & log groups

↓

02 · Processing Pipeline

⚙️Input Parser

Zod-validated

Normalizes raw signals into a structured incident context envelope. Handles JSON, plaintext, and multiline log formats.
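
As a rough illustration of the normalization step, the sketch below detects the input format and wraps the signal in an envelope. The `IncidentContext` shape and the `parseSignal` name are assumptions made for this example, not the project's actual schema. In the real parser a Zod schema would validate the result; it is left out here so the snippet has no dependencies.

```typescript
// Hypothetical envelope; field names are illustrative only.
type SignalFormat = "json" | "plaintext" | "multiline";

interface IncidentContext {
  format: SignalFormat;
  lines: string[];                 // text lines (for JSON: the raw payload as one entry)
  fields: Record<string, unknown>; // parsed keys when the input is a JSON object
}

function parseSignal(raw: string): IncidentContext {
  const trimmed = raw.trim();
  if (trimmed.length === 0) {
    throw new Error("empty signal");
  }

  // Try JSON first; only plain objects count as structured input.
  if (trimmed.charAt(0) === "{") {
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        return { format: "json", lines: [trimmed], fields: parsed as Record<string, unknown> };
      }
    } catch {
      // Not valid JSON: fall through and treat the input as text.
    }
  }

  // Split on newlines, trim each line, and drop blank ones.
  const lines = trimmed
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  return { format: lines.length > 1 ? "multiline" : "plaintext", lines, fields: {} };
}
```

Trying JSON before falling back to text means a malformed payload still reaches the pipeline as plain log lines instead of being rejected outright.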

🔎Vector Search

pgvector / Pinecone

Embeds incident context and performs nearest-neighbor search against a library of historical incidents for pattern matching.
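
The nearest-neighbor step can be pictured with a small in-memory sketch. It uses brute-force cosine similarity as a stand-in for what pgvector or Pinecone would do with an index; `StoredIncident` and `nearestIncidents` are hypothetical names, and the embeddings are assumed to be precomputed.

```typescript
// Hypothetical record: an incident id plus its precomputed embedding.
interface StoredIncident {
  id: string;
  embedding: number[];
}

// Cosine similarity of two vectors; returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Brute-force top-k: score every stored incident, sort descending, keep k.
function nearestIncidents(
  query: number[],
  library: StoredIncident[],
  k: number
): { id: string; score: number }[] {
  return library
    .map((incident) => ({ id: incident.id, score: cosineSimilarity(query, incident.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this is only reasonable for a small library; a vector database replaces it with an indexed search once the incident history grows.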

🧠Incident Memory

Supabase (planned)

Stores enriched incident context across sessions. Enables follow-up queries and post-incident report generation.

⚡LLM Engine

OpenAI / Anthropic

Sends the structured context and the retrieved patterns to GPT-4o or Claude, together with an SRE-domain system prompt.
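
One way to assemble that request is sketched below. The payload is provider-neutral (a system string and a user string) and nothing here calls a real SDK. `buildPrompt`, `RetrievedPattern`, and the placeholder system prompt text are all assumptions for illustration.

```typescript
// Hypothetical shape of a pattern returned by the vector search step.
interface RetrievedPattern {
  incidentId: string;
  summary: string;
  similarity: number;
}

// Provider-neutral payload; a real adapter would map this onto the vendor's request format.
interface PromptPayload {
  system: string;
  user: string;
}

// Placeholder text, not the project's actual system prompt.
const SYSTEM_PROMPT =
  "You are an SRE triage assistant. Identify the probable root cause and cite evidence from the input.";

function buildPrompt(
  context: Record<string, unknown>,
  patterns: RetrievedPattern[]
): PromptPayload {
  // Render each retrieved pattern as a numbered line with its similarity score.
  const patternLines = patterns.map(
    (p, i) => `${i + 1}. [${p.incidentId}] ${p.summary} (similarity ${p.similarity.toFixed(2)})`
  );

  const user = [
    "## Incident context",
    JSON.stringify(context, null, 2),
    "## Similar past incidents",
    patternLines.length > 0 ? patternLines.join("\n") : "none found",
  ].join("\n");

  return { system: SYSTEM_PROMPT, user };
}
```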

↓

03 · Outputs

🔍

Root Cause Analysis

Probable root cause with supporting evidence from the input signals.

🗺️

Service Impact Map

Blast radius — impacted services, upstream and downstream dependencies.
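
A blast radius of this kind can be computed by walking a service dependency graph in both directions. The sketch below assumes a hypothetical `DependencyGraph` that maps each service to the services it calls, and it treats "upstream" as callers and "downstream" as callees; both the data shape and that naming convention are assumptions.

```typescript
// service -> services it calls directly (hypothetical input shape)
type DependencyGraph = Record<string, string[]>;

// Breadth-first walk: every service reachable from `start`, excluding `start` itself.
function reachable(graph: DependencyGraph, start: string): string[] {
  const seen = Object.create(null) as Record<string, boolean>;
  seen[start] = true;
  const queue: string[] = [start];
  const found: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const next of graph[current] ?? []) {
      if (!seen[next]) {
        seen[next] = true;
        found.push(next);
        queue.push(next);
      }
    }
  }
  return found;
}

// Reverse every edge so that callers can be found with the same walk.
function invert(graph: DependencyGraph): DependencyGraph {
  const inverted = Object.create(null) as DependencyGraph;
  for (const caller of Object.keys(graph)) {
    for (const callee of graph[caller] ?? []) {
      const callers = inverted[callee] ?? [];
      callers.push(caller);
      inverted[callee] = callers;
    }
  }
  return inverted;
}

function blastRadius(
  graph: DependencyGraph,
  failing: string
): { upstream: string[]; downstream: string[] } {
  return {
    upstream: reachable(invert(graph), failing), // services that depend on the failing one
    downstream: reachable(graph, failing),       // services the failing one depends on
  };
}
```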

🚨

Severity Scoring

Automated P1–P4 classification. The on-call engineer can override it.
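
A minimal sketch of rule-based scoring with a manual override follows. The input signals and thresholds are invented for illustration; the page does not specify how severity is derived, and in the full system the LLM output may drive it instead.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Hypothetical inputs; the real system may use different signals.
interface SeveritySignals {
  customerFacing: boolean;
  errorRatePct: number;     // 0-100
  impactedServices: number;
}

interface SeverityDecision {
  severity: Severity;
  source: "auto" | "override";
}

// Illustrative thresholds only.
function classify(signals: SeveritySignals): Severity {
  if (signals.customerFacing && signals.errorRatePct >= 25) return "P1";
  if (signals.customerFacing || signals.errorRatePct >= 25) return "P2";
  if (signals.errorRatePct >= 5 || signals.impactedServices > 1) return "P3";
  return "P4";
}

// An explicit override from the on-call engineer always wins over the automatic score.
function decideSeverity(signals: SeveritySignals, override?: Severity): SeverityDecision {
  if (override !== undefined) {
    return { severity: override, source: "override" };
  }
  return { severity: classify(signals), source: "auto" };
}
```

Recording the `source` alongside the severity keeps the audit trail clear about whether a rating came from automation or from a person.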

🧭

Debugging Runbooks

Step-by-step recovery actions and links to relevant runbook pages.

💬

Stakeholder Update

Pre-written Slack message ready to send to engineering and business stakeholders.

📊

Confidence Score

0–100 transparency score so engineers know how much to trust the AI output.

Tech Stack

Frontend

Next.js 15 App Router
TypeScript strict
Tailwind CSS
Framer Motion

Validation

Zod schemas
Input sanitization
Type-safe API boundaries

AI Layer

OpenAI GPT-4o (planned)
Anthropic Claude fallback
Mock service (current)
Streaming responses

Storage

Supabase (planned)
pgvector embeddings
Incident audit log
User sessions

Deployment

Vercel (Edge Functions)
GitHub CI/CD
Secrets kept in environment variables
Preview deployments

Design Principles

Replaceable AI backend

The mock service implements the same interface that the real OpenAI call will use. Swap one function, keep everything else.
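
The idea can be sketched as a shared interface with interchangeable implementations. `TriageEngine`, `TriageResult`, `mockEngine`, and `getEngine` are hypothetical names, and the OpenAI engine is a stub that makes no SDK call, since that integration is still planned.

```typescript
// Hypothetical result shape shared by every backend.
interface TriageResult {
  rootCause: string;
  severity: "P1" | "P2" | "P3" | "P4";
  confidence: number; // 0-100
}

// The one contract the rest of the app depends on.
interface TriageEngine {
  readonly name: string;
  analyze(context: string): Promise<TriageResult>;
}

// Mock backend: returns a canned answer without any network call.
const mockEngine: TriageEngine = {
  name: "mock",
  analyze(context: string): Promise<TriageResult> {
    const result: TriageResult = {
      rootCause: `Mock analysis of ${context.length} characters of context`,
      severity: "P3",
      confidence: 50,
    };
    return Promise.resolve(result);
  },
};

// Stub for the planned real backend; it rejects until an SDK call is wired in.
const openAiEngine: TriageEngine = {
  name: "openai",
  analyze(): Promise<TriageResult> {
    return Promise.reject(new Error("OpenAI engine is not wired up yet"));
  },
};

// The single place where the backend is chosen.
function getEngine(useMock: boolean): TriageEngine {
  return useMock ? mockEngine : openAiEngine;
}
```

Because callers only see `TriageEngine`, moving from the mock to a real provider changes `getEngine` and nothing else.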

Feature-based folder structure

Each feature owns its UI, logic, and types. No monolithic files.

Type-safe boundaries

Zod validates all inputs at API boundaries. TypeScript strict mode everywhere.

Accessible by default

Semantic HTML, keyboard navigation, and proper color contrast ratios throughout.

Edge-ready deployment

Server Components + Vercel Edge Functions for fast global response times.

Observable architecture

Every layer is designed to integrate with real monitoring tools as the system scales.