Projects
Content Agent: AI Research & Publishing System
A production LangGraph system that researches, fact-checks, and publishes technical articles through grounded retrieval, claim verification, human review, and safe publishing workflows. 100% recall@3 / 96.7% recall@1 on a 30-query golden eval set. Live demo + case study.
Code-Fix Agent: Self-Correcting AI System
A goal-directed AI agent that takes broken Python scripts, executes them, diagnoses the failure, patches the code, and retries on its own. Built with a LangGraph state machine, human-in-the-loop approval, and LangSmith tracing. Achieves a 95% fix rate across 20 diverse error types.
Knowledge Agent: Persistent RAG with Hybrid Search
Local memory-augmented agent with hybrid retrieval (BM25 + dense + cross-encoder reranking), claim verification, and persistent session memory. 92% accuracy, 100% tool-routing accuracy on adversarial test cases.
CLI Research Agent: Raw Agent Loop + LangGraph Rebuild
Terminal-based research agent built from scratch on OpenAI-compatible tool calling, with no frameworks involved. It takes a question, searches the web, reads source pages, and writes a structured Markdown report. Later rebuilt with LangGraph to compare raw-loop and state-machine execution.
Bosch Production Line: Predictive Quality Control
End-to-end ML pipeline predicting manufacturing failures on 1.18M rows with 171:1 class imbalance. Engineered path features that reveal a 72× failure-rate signal: certain station paths fail at 41.7% versus a 0.58% global mean. Chunk-aware CV and a phased feature roadmap moved MCC from 0.19 to 0.33, with a target of ≥ 0.52.
Silent Recalls: Live Vehicle Safety Monitoring
Production-grade ETL pipeline monitoring NHTSA complaints with live risk tracking. Automated detection of vehicles with dangerous complaint-to-recall ratios. GMC Sierra 1500: 445 complaints, zero recalls. Weekly automated runs with hash-based alerting.
Bearing Failure Prediction: 2.88h Accuracy
Production-grade ML system predicting bearing remaining useful life (RUL) with 2.88-hour accuracy in critical zones. 10× improvement through weighted loss optimization. Modeled outcomes: ~$300K annual savings and 98.5% failure prevention in critical zones.
About
I build agentic LLM systems and the evaluation infrastructure that makes them reliable: claim-level grounding verification, hybrid retrieval, reflection loops, and human-in-the-loop control, with observability and cost/latency tracking on every run.
My flagship, Content Agent, runs a full research-to-publish pipeline: it retrieves sources, drafts, verifies every factual claim against those sources, self-critiques, routes through human approval, and publishes live. Every run is measured against a golden evaluation set.
Underneath the agent work is a deeper foundation in production ML (predictive maintenance, manufacturing defect modeling, and automotive safety analytics), built across years in mechanical engineering and at Honda R&D Americas. That domain background is why I treat agent design as a reliability problem first: anticipate the failure modes, measure them, engineer around them.
I'm interested in systems that anticipate failures, learn from context, and operate reliably in production.
Skills
Agentic AI & LLMs
- LLM Tool Calling & Agent Loops
- RAG: Hybrid Retrieval & Reranking
- LangGraph State Machines
- Claim Verification & Grounding
Evaluation & Observability
- Evaluation Harnesses & Golden Datasets
- LangSmith Tracing
- Cost & Latency Tracking
- Failure-Mode & Regression Testing
Machine Learning & Data
- Predictive Modeling & Risk Systems
- Time Series Analysis
- Feature Engineering
- Statistical Analysis
Engineering & Infrastructure
- Python, SQL, PostgreSQL
- FastAPI, Docker, AWS
- ETL Pipeline Design
- GitHub Actions, CI & Netlify
Learning Log
Supervised Learning Models: Technical Notes
A breakdown of the model families you reach for most in production: linear models, tree-based ensembles, and gradient boosting, with a deep dive on each.
LightGBM: A Practitioner's Guide
How LightGBM actually works: leaf-wise growth, histogram binning, and why it often beats XGBoost on tabular data. Includes practical tuning notes and common pitfalls.
Data Preprocessing: Technical Notes
A systematic series on preparing data for machine learning: feature scaling, encoding, missing data, and outlier treatment.
Data Preprocessing: Feature Scaling Deep Dive
Why scaling matters, when it doesn't, and how to pick the right method for your model. A practical guide covering StandardScaler, MinMaxScaler, RobustScaler, and the assumptions each one makes about your data.
Contact
Open to AI Engineer roles: full-time, contract, and remote (US / Global).