Infrastructure · Automation · Reliability Engineering · Fairfield, CA

Micheal Breedlove

Building resilient infrastructure, distributed automation systems, and AI-assisted operational platforms.

Evidence: Cluster Architecture · Live Status · Proof Pack · GitHub

Infrastructure & automation engineer.
Designed a self-healing AI cluster orchestration platform with adaptive routing, shared memory, and automated disaster recovery.
Seeking: Security Engineer / DevSecOps / Platform / SRE roles.

About

Infrastructure engineer building reliable, automated, recovery-first systems.

I design and operate distributed infrastructure with an emphasis on automation, resilience, and operational discipline. My background includes military intelligence (former 35N SIGINT), operations leadership in high-pressure environments, and hands-on systems engineering. Currently completing a B.S. in Cybersecurity at WGU.

Self-healing AI cluster orchestration platform with adaptive routing and shared memory
Distributed task queue with health-aware dispatch and automated failure recovery
Proxmox virtualization, OPNsense firewall, ZFS-backed storage with snapshot policies
SRE automation: SLO tracking, incident management, automated knowledge curation, and DR drills
CompTIA Tech+ certified, targeting Security+ · B.S. Cybersecurity at WGU (in progress)

Featured Project

AI Cluster Orchestration Platform

A self-healing control plane that coordinates distributed worker nodes, manages shared operational knowledge, and automates recovery, routing, and validation workflows across a 4-node homelab cluster.

Adaptive Routing · Shared Memory · Self-Healing · Recovery-First

Jasper orchestrates health-aware job dispatch across Nova, Mira, and Orin based on role specialization, real-time health, and observed task performance. A shared Markdown memory corpus on ZFS-backed storage provides durable operational knowledge, while local semantic indexes keep each node fast and independent.

Jasper: Orchestrator · Inference
Nova: Automation · Monitoring
Mira: Network · Auditing
Orin: Analysis · Validation
Shared storage: NAS · ZFS · Snapshots

Adaptive Routing

Tasks assigned based on node health, role specialization, and observed success rates. Routing improves automatically as execution history builds.
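A minimal sketch of health-aware routing, assuming a simple score made of a smoothed success rate plus a fixed bonus when a node's role matches the task. The `NodeStats` record, the `pick_node` helper, and the `role_bonus` weight are hypothetical illustrations, not the platform's actual code. Only the node names and roles come from the cluster described here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node routing record (illustrative, not the real schema)."""
    name: str
    roles: set[str]
    healthy: bool = True
    successes: int = 0
    attempts: int = 0

    def success_rate(self) -> float:
        # Laplace smoothing: a node with no history starts at 0.5 instead of 0 or 1.
        return (self.successes + 1) / (self.attempts + 2)

    def record(self, ok: bool) -> None:
        # Each finished task feeds back into future routing decisions.
        self.attempts += 1
        if ok:
            self.successes += 1


def pick_node(nodes: list[NodeStats], task_role: str,
              role_bonus: float = 0.5) -> NodeStats | None:
    """Return the healthy node with the highest score, or None if none are healthy."""
    candidates = [n for n in nodes if n.healthy]  # health gate comes first
    if not candidates:
        return None

    def score(n: NodeStats) -> float:
        return n.success_rate() + (role_bonus if task_role in n.roles else 0.0)

    return max(candidates, key=score)
```

With no history, an `auditing` task goes to Mira because of the role bonus. If Mira is marked unhealthy, the task falls through to the remaining nodes, and a node that keeps succeeding gradually wins tasks no node specializes in.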

Shared Knowledge

Shared Markdown corpus on NAS with per-node daily notes. Nightly curation promotes durable facts and archives old observations automatically.

Self-Healing

Runtime watchdog detects degradation, quarantines bad state, and restores from verified backups. Preflight guards block startup on corrupted state.
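One way a preflight guard could work, sketched under assumptions: state lives in a JSON file with a SHA-256 checksum in a sidecar file, and a separately verified backup copy exists. The file layout, the `.sha256` sidecar convention, and the `preflight` function are hypothetical. The description above does not specify how state is verified.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sidecar(path: Path) -> Path:
    # Hypothetical convention: state.json -> state.json.sha256
    return path.with_name(path.name + ".sha256")


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def state_is_valid(path: Path) -> bool:
    """True if the file exists, matches its recorded checksum, and parses as JSON."""
    if not path.exists() or not sidecar(path).exists():
        return False
    if sha256_of(path) != sidecar(path).read_text().strip():
        return False
    try:
        json.loads(path.read_text())
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return True


def preflight(state: Path, backup: Path, quarantine: Path) -> bool:
    """Return True if startup may proceed; quarantine and restore bad state first."""
    if state_is_valid(state):
        return True
    quarantine.mkdir(parents=True, exist_ok=True)
    if state.exists():
        # Keep the bad copy for later inspection instead of deleting it.
        shutil.move(str(state), str(quarantine / state.name))
    if not state_is_valid(backup):
        return False  # no verified backup: block startup rather than guess
    shutil.copy2(backup, state)
    shutil.copy2(sidecar(backup), sidecar(state))
    return state_is_valid(state)
```

Checking the checksum before parsing catches silent corruption, such as a flipped digit, even when the damaged file still parses as valid JSON.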

Disaster Recovery

Portable recovery bundle rebuilds the entire orchestrator. Monthly DR drills validate the bundle in a sandboxed environment without touching live state.

Autonomous Operations

The orchestrator generates maintenance and remediation tasks automatically. Risky actions require approval; safe operations execute without intervention.

Durable Job Queue

File-based queue on shared storage with dispatch, tracking, retry logic, and stale-job detection. No external database dependencies.
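A minimal sketch of a file-based queue. It assumes one directory per job state (`pending/`, `running/`, `done/`, `failed/`) on the shared mount, and it relies on `os.rename` being atomic within a single filesystem. The directory names, the JSON job format, and the function names are hypothetical. Rename semantics on a network mount such as NFS are worth verifying before relying on them for claims.

```python
from __future__ import annotations

import json
import os
import time
import uuid
from pathlib import Path

STATES = ("pending", "running", "done", "failed")


def init_queue(root: Path) -> None:
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def enqueue(root: Path, payload: dict) -> str:
    job_id = uuid.uuid4().hex
    tmp = root / "pending" / f".{job_id}.tmp"
    tmp.write_text(json.dumps({"id": job_id, "attempts": 0, "payload": payload}))
    # Publish with a rename so workers never see a half-written job file.
    os.rename(tmp, root / "pending" / f"{job_id}.json")
    return job_id


def claim(root: Path) -> dict | None:
    for path in sorted((root / "pending").glob("*.json")):
        target = root / "running" / path.name
        try:
            os.rename(path, target)  # only one worker's rename can succeed
        except FileNotFoundError:
            continue  # another worker claimed this job first
        os.utime(target)  # start the stale-job clock at claim time
        return json.loads(target.read_text())
    return None


def _retry_or_fail(root: Path, path: Path, max_attempts: int) -> None:
    job = json.loads(path.read_text())
    job["attempts"] += 1
    path.write_text(json.dumps(job))
    dest = "pending" if job["attempts"] < max_attempts else "failed"
    os.rename(path, root / dest / path.name)


def finish(root: Path, job_id: str, ok: bool, max_attempts: int = 3) -> None:
    path = root / "running" / f"{job_id}.json"
    if ok:
        os.rename(path, root / "done" / path.name)
    else:
        _retry_or_fail(root, path, max_attempts)


def requeue_stale(root: Path, timeout_s: float, max_attempts: int = 3,
                  now: float | None = None) -> int:
    """Return stale running jobs to pending (or failed) and report how many moved."""
    now = time.time() if now is None else now
    moved = 0
    for path in list((root / "running").glob("*.json")):
        if now - path.stat().st_mtime >= timeout_s:
            _retry_or_fail(root, path, max_attempts)
            moved += 1
    return moved
```

Because a job's state is simply which directory holds its file, an operator can inspect or repair the queue with `ls` and `mv`. This is one practical upside of avoiding an external database.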

Selected Systems

Supporting projects with documentation, code, and case studies.

SRE Reliability Pipeline

SLO tracking with burn-rate alerting, automated incident management, postmortem generation, and safety-gated actions.
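Burn rate measures how fast an error budget is being spent. It is the observed error ratio divided by the budget `1 - SLO`, so a burn rate of 1 uses up the budget exactly at the end of the SLO window. The sketch below follows the multiwindow pattern from the Google SRE Workbook, which pages only when both a long and a short window exceed a threshold. The 14.4 default (2% of a 30-day budget spent in one hour) is taken from that workbook and is an assumption about this pipeline, not its documented configuration.

```python
from __future__ import annotations


def burn_rate(errors: int, total: int, slo: float) -> float:
    """Observed error ratio divided by the error budget (1 - slo)."""
    if total == 0:
        return 0.0  # no traffic in the window, so no budget consumed
    return (errors / total) / (1.0 - slo)


def should_page(long_window: tuple[int, int], short_window: tuple[int, int],
                slo: float, threshold: float = 14.4) -> bool:
    """Page only if both windows are burning fast.

    Each window is (errors, total). The long window (e.g. 1h) shows the burn is
    significant; the short window (e.g. 5m) shows it is still happening.
    """
    return (burn_rate(*long_window, slo) >= threshold
            and burn_rate(*short_window, slo) >= threshold)
```

Requiring the short window as well lets the alert clear soon after an incident is mitigated, instead of firing for the full length of the long window.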

Read Case Study →

GitOps Backup System

Automated daily backups with CI-enforced secret scanning. Sanitized state committed to version control with zero credential exposure.

Read Case Study →

Infrastructure Status

Live operational dashboard with node health, SLO metrics, queue activity, and cluster orchestration status — updated automatically.

View Status →

Engineering Capabilities

Core strengths demonstrated through working systems, not just theory.

Infrastructure Automation

Ansible, Proxmox, systemd, scheduled tasks, and infrastructure-as-code with CI/CD pipelines and GitOps workflows.

Reliability Engineering

SLO tracking, burn-rate alerting, safety gates, incident management, and recovery-first system design.

Distributed Systems

Multi-node orchestration, a shared memory corpus with per-node local indexes, file-based job queues, and health-aware task dispatch.

AI-Assisted Operations

Local LLM inference, autonomous task generation with approval gates, and AI-orchestrated cluster management.

Recovery & Resilience

ZFS snapshots, self-healing watchdogs, portable recovery bundles, monthly DR drills, and conservative safety policies.

Documentation Discipline

Automated knowledge curation, architecture documentation, operational runbooks, and clear rollback procedures.

What This Demonstrates

These projects show how I approach real infrastructure problems: design for failure, automate recovery, observe everything, and document decisions. I build systems that heal themselves, route work intelligently, and produce clear operational records — the same principles that matter at production scale.
