Caskey Engineering

About Eric

Safety-Critical Systems & AI-Augmented Engineering
Executive Summary

Eric is a Senior Software Engineer at Amazon within a platform engineering org whose mandate is to be a force multiplier for service teams. He owns the monitoring platform that standardizes infrastructure monitoring across thousands of services, and he built a workflow orchestration control plane serving development teams across multiple global regions. That work centers on safety-adherent workflows that meet Amazon reliability standards.

At Amazon, he designed safety guardrail validations for a pre-execution engine that runs multiple concurrent checks before every automated operation: change control, regional isolation, monitoring readiness, and pipeline health. He built a Just-In-Time composite monitoring system that aggregates multiple alarm sources into ephemeral execution-scoped monitors, so each health check reads one monitor instead of polling every alarm source, turning O(N) health polling into O(1). He implemented a non-opt-outable environment isolation guardrail and contributed to a tiered monitoring response model that distinguishes between rollback-triggering and log-only conditions.
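As an illustration of the pre-execution pattern described above, here is a minimal Python sketch of a gate that runs several guardrail checks concurrently and blocks the operation if any fail. It is not the production engine: the check bodies, the `op` dictionary fields (`change_freeze`, `regions`, `alarms`, `pipeline_status`), and the reading of "regional isolation" as "one region per operation" are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    reason: str = ""


# Hypothetical checks: real ones would call change-control, region,
# monitoring, and pipeline systems instead of reading a dict.
async def change_control(op: dict) -> CheckResult:
    frozen = op.get("change_freeze", False)
    return CheckResult("change_control", not frozen,
                       "change freeze active" if frozen else "")


async def regional_isolation(op: dict) -> CheckResult:
    # Assumption: an operation may target at most one region.
    ok = len(set(op.get("regions", []))) <= 1
    return CheckResult("regional_isolation", ok,
                       "" if ok else "operation spans multiple regions")


async def monitoring_readiness(op: dict) -> CheckResult:
    ok = bool(op.get("alarms"))
    return CheckResult("monitoring_readiness", ok,
                       "" if ok else "no alarms configured")


async def pipeline_health(op: dict) -> CheckResult:
    ok = op.get("pipeline_status") == "healthy"
    return CheckResult("pipeline_health", ok,
                       "" if ok else "pipeline is not healthy")


# The list is fixed in code rather than supplied by the caller,
# so a caller cannot opt out of a check.
CHECKS = [change_control, regional_isolation,
          monitoring_readiness, pipeline_health]


async def pre_execution_gate(op: dict) -> List[CheckResult]:
    """Run every check concurrently and return only the failures."""
    results = await asyncio.gather(*(check(op) for check in CHECKS))
    return [r for r in results if not r.passed]


def validate(op: dict) -> List[CheckResult]:
    """Synchronous entry point; an empty list means the operation may run."""
    return asyncio.run(pre_execution_gate(op))
```

A caller would run `validate(op)` before starting the operation and refuse to proceed unless the returned list is empty. Because `asyncio.gather` preserves input order, failures come back in the same order as `CHECKS`.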

He created a large-scale specification-as-code system as the single source of truth for the platform, and a hub-and-spoke multi-agent setup with specialist coding agents scoped by architecture layer. The bet: context architecture beats documentation dumps — each task loads a curated spec slice, not the whole corpus.
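The idea of loading a curated spec slice per task can be sketched in a few lines of Python. This is only an illustration under assumed names: the `SPEC_CORPUS` keys, the `shared/` prefix, and the `Agent` type are hypothetical and do not describe the actual specification system.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical spec corpus keyed by "<layer>/<topic>".
SPEC_CORPUS: Dict[str, str] = {
    "shared/glossary": "Terms every agent needs.",
    "control_plane/api": "Control plane API contracts.",
    "control_plane/safety": "Guardrail requirements.",
    "data_plane/executor": "Executor behavior.",
}


@dataclass(frozen=True)
class Agent:
    name: str
    layer: str  # the architecture layer this specialist is scoped to


def context_slice(agent: Agent, corpus: Dict[str, str]) -> Dict[str, str]:
    """Return the shared specs plus only the specs for the agent's layer."""
    prefixes = ("shared/", agent.layer + "/")
    return {key: text for key, text in corpus.items()
            if key.startswith(prefixes)}
```

In this sketch the hub owns the full corpus and hands each spoke agent only its slice, so a control plane agent never sees data plane specs it does not need.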

With 15+ years of experience spanning Amazon and Prudential Financial, Eric builds platforms that balance safety enforcement with engineering velocity — from enterprise monitoring standardization and real-time VPN security operations to distributed workflow orchestration and AI-augmented development.

Enterprise Platform Leadership

His career spans two chapters of platform work:

At Amazon (Senior Software Engineer)
  • Pre-execution safety engine with multiple concurrent guardrail checks across a multi-service orchestration platform
  • JIT composite monitoring — O(1) health checks by aggregating multiple alarm sources into ephemeral execution-scoped monitors
  • Automated rollback monitoring that discovers deployment health signals from environment configuration
  • Non-opt-outable environment stage isolation guardrail — defense-in-depth at the application layer
  • Large-scale specification-as-code system powering specialist AI agents via hub-and-spoke context architecture
At Prudential Financial (SRE / Cloud Engineering)
  • Enterprise monitoring standardization — 400,000+ monitors, migration from legacy stacks
  • Real-time VPN security tool scanning every global endpoint for high-risk sessions — used by ICC, help desk, and operations teams for on-demand session termination
  • QR Code MFA enrollment portal — eliminated help-desk-mediated provisioning across all enterprise help desks
  • RSA Self Service Portal + Mobile App — full platform ownership
  • Sole technical owner of all financial reporting and accounting delivery during COVID-19
Org-Wide Impact

Design safety guardrails that enforce monitoring, rollback, and change control before every automated infrastructure operation

Architect specification-driven AI systems where specialist agents inherit curated context from a central hub

Build composite monitoring systems that convert O(N) health polling to O(1) for safety-critical execution paths

Drive platform safety standards across a multi-service platform spanning Control Plane, Data Plane, and cross-cutting layers

Pioneer context architecture patterns that partition what AI agents see — preventing overload while maintaining full platform understanding
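To make the composite monitoring item above concrete, here is a minimal in-memory Python sketch of an execution-scoped monitor with a tiered response. The source does not say how the aggregation is implemented, so everything here is an assumption: the `CompositeMonitor` class, the push-style `report` method, and the `Tier` and `Action` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Set


class Tier(Enum):
    ROLLBACK = "rollback"   # a firing alarm should trigger rollback
    LOG_ONLY = "log_only"   # a firing alarm is recorded but not acted on


class Action(Enum):
    CONTINUE = "continue"
    LOG = "log"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class AlarmSource:
    name: str
    tier: Tier


class CompositeMonitor:
    """Ephemeral monitor created for one execution and discarded after it.

    Alarm sources push state changes in; the orchestrator reads a single
    aggregate answer instead of polling each source.
    """

    def __init__(self, execution_id: str, sources: Iterable[AlarmSource]):
        self.execution_id = execution_id
        self._tiers: Dict[str, Tier] = {s.name: s.tier for s in sources}
        self._firing: Dict[Tier, Set[str]] = {tier: set() for tier in Tier}

    def report(self, name: str, in_alarm: bool) -> None:
        """Record a state change for one known alarm source."""
        firing = self._firing[self._tiers[name]]  # KeyError if unknown
        if in_alarm:
            firing.add(name)
        else:
            firing.discard(name)

    def action(self) -> Action:
        """Constant-time read: two emptiness checks, however many sources."""
        if self._firing[Tier.ROLLBACK]:
            return Action.ROLLBACK
        if self._firing[Tier.LOG_ONLY]:
            return Action.LOG
        return Action.CONTINUE
```

The cost of aggregation moves to the moment an alarm changes state, which is why the orchestrator's health check no longer grows with the number of sources.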

Mission

To empower organizations and individuals to achieve more through technology by building resilient systems, fostering innovation, and leading with autonomy, trust, and shared context. Outside of engineering, Eric is a dad, a marathoner, a constant reader on AI and distributed systems, and a hands-on team builder.

Core Values
Integrity
Innovation
Empowerment
Service
Continuous Learning
Key Skills
Distributed Systems
Workflow Orchestration
Control Plane Safety
Safety-Critical Systems
Automated Rollback & Monitoring
Python
Java
TypeScript / CDK
AWS (CloudWatch, DynamoDB, Lambda, ECS, S3)
Infrastructure as Code
AI-Augmented Development
Specification-Driven Development
Multi-Agent AI Systems
System Design
Platform Engineering
Reliability Engineering
Mentorship & Leadership
Explore
Career Journey

Enterprise platform work at Amazon, Prudential, and beyond.

Read
Technical Blog

Articles on distributed systems, platform architecture, and engineering leadership.