Senior SRE & Platform Engineer — Kubernetes Observability, AI-assisted Tooling & Cloud Operations
Abingdon-on-Thames, United Kingdom
5 years in site reliability and platform engineering, working on the platform that runs seismic interpretation and subsurface planning tools for energy companies in production across Europe, the Americas, and Asia-Pacific.
I build tooling that makes Kubernetes observable and incidents shorter. My current focus is AI-assisted SRE tooling — deterministic CLI tools paired with Claude Code reasoning skills.
| Project | Description | Status |
|---|---|---|
| kubectl-sentinel | 10-section Kubernetes health checker. Nodes, pods, workloads, probes, services, HPAs, PVCs. Structured JSON + HTML output. Works in CI. | |
| incident-triage | PagerDuty alert → root cause → causation chain → fix plan. Deterministic correlation engine. Python stdlib only. | |
Kubernetes Observability ██████████████████░░ Prometheus, Thanos, Grafana, Kiali, Istio
Incident Response █████████████████░░░ PagerDuty, causation chains, root cause classification
Platform Automation ████████████████░░░░ Helm, ArgoCD, CI/CD pipelines, tenant provisioning
Cloud Operations ██████████████░░░░░░ GCP, Azure, multi-region, cost optimisation
Security & Compliance ████████████░░░░░░░░ RBAC, TLS, pod security, secrets hygiene
AI-assisted Tooling ████████████░░░░░░░░ Dual-layer CLI + LLM reasoning pattern (building)
- The dual-layer pattern — a deterministic CLI layer that works at 3am in CI, paired with an AI reasoning layer that explains why. Separating them gives you both portability and intelligence.
- Severity as an exit code — `0/1/2` makes health checkers CI-composable. Think of them as HTTP status codes for shell tools.
- Classify before you recommend — OOMKill does not mean "raise the limit". The root cause determines the fix.
- Building for the 3am reader — every design decision in an ops tool should be made as if the reader has been awake for 3 hours and needs to act in 5 minutes.
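
The exit-code and classify-first ideas above can be sketched in a few lines of stdlib Python. This is a hypothetical illustration, not code from kubectl-sentinel or incident-triage: the names `Severity`, `exit_code`, `classify_oomkill`, `RECOMMENDATION`, and `triage`, and the thresholds (2x baseline for a spike, 1.5x growth for a leak, 80% of the limit for an undersized limit), are assumptions chosen for the example.

```python
import statistics
from enum import IntEnum


class Severity(IntEnum):
    """Assumed exit-code contract: 0 = healthy, 1 = warning, 2 = critical."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def exit_code(severities):
    """The worst finding decides the exit code; no findings means OK."""
    return int(max(severities, default=Severity.OK))


def classify_oomkill(samples_mib, limit_mib):
    """Classify an OOMKill from memory samples (MiB) taken before the kill.

    Illustrative heuristic only; the thresholds are assumptions.
    """
    if len(samples_mib) < 3:
        return "unknown"  # too little data to say anything useful
    baseline = statistics.median(samples_mib[:-1])
    if samples_mib[-1] >= 2 * baseline:
        return "spike"  # last sample jumped far above the earlier baseline
    rising = all(b > a for a, b in zip(samples_mib, samples_mib[1:]))
    if rising and samples_mib[-1] >= 1.5 * samples_mib[0]:
        return "leak"  # steady growth with no plateau
    if min(samples_mib) >= 0.8 * limit_mib:
        return "undersized_limit"  # flat usage sitting close to the limit
    return "unknown"


# The recommendation follows from the classification, never from the symptom.
RECOMMENDATION = {
    "spike": "Find the request or job that caused the burst before resizing.",
    "leak": "Profile the workload; a higher limit only delays the next kill.",
    "undersized_limit": "Steady usage is near the limit; raising it is reasonable.",
    "unknown": "Collect more memory samples before changing anything.",
}


def triage(samples_mib, limit_mib):
    """Return (root_cause, recommendation) for one OOMKilled container."""
    cause = classify_oomkill(samples_mib, limit_mib)
    return cause, RECOMMENDATION[cause]
```

A CLI wrapper could end with `sys.exit(exit_code(severities))`, so a pipeline can branch on `$?` and, for instance, tolerate 1 while blocking on 2. Only the `undersized_limit` branch ends in "raise the limit"; the other causes lead to different fixes.
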
→ GreenRanger-IT/learning-journal — architecture decisions and engineering thinking from building production SRE tooling
Working toward CKA → CKS → CNCF contributor. Open to collaborating on Kubernetes observability and platform engineering tooling.
