The Machinist

AI Engineer: Agentic Systems & LLM Applications • Tool Calling • RAG • LangGraph • ML Systems

Projects

Content Agent: AI Research & Publishing System

LangGraph • DeepSeek • Qdrant • FastAPI • Docker

A production LangGraph system that researches, fact-checks, and publishes technical articles through grounded retrieval, claim verification, human review, and safe publishing workflows. Retrieval reaches 100% recall@3 and 96.7% recall@1 on a 30-query golden eval set. Includes a live demo and case study.
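As a rough illustration of how recall@k figures like these are typically computed, here is a minimal sketch. It assumes one expected source document per query, and the query IDs, document IDs, and the `recall_at_k` helper are all hypothetical, not taken from the project's eval harness.

```python
from typing import Dict, List


def recall_at_k(retrieved: Dict[str, List[str]], relevant: Dict[str, str], k: int) -> float:
    """Fraction of queries whose expected document appears in the top-k results."""
    hits = sum(
        1 for query, doc_id in relevant.items()
        if doc_id in retrieved.get(query, [])[:k]
    )
    return hits / len(relevant)


# Hypothetical 30-query golden set: one expected document per query.
relevant = {f"q{i}": f"doc{i}" for i in range(30)}

# Hypothetical retrieval results: 29 queries rank the expected document first,
# and one query (q0) ranks it third.
retrieved = {q: [doc, "other-a", "other-b"] for q, doc in relevant.items()}
retrieved["q0"] = ["other-a", "other-b", "doc0"]

recall_1 = recall_at_k(retrieved, relevant, k=1)
recall_3 = recall_at_k(retrieved, relevant, k=3)
print(f"recall@1 = {recall_1:.1%}, recall@3 = {recall_3:.1%}")
```

With 30 queries, a single miss at rank 1 gives 29/30, which rounds to 96.7%. That is one arrangement consistent with the reported numbers, though the actual per-query results are not stated here.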

Code-Fix Agent: Self-Correcting AI System

Python • LangGraph • DeepSeek API • LangSmith • subprocess sandbox

A goal-directed AI agent that takes broken Python scripts, executes them, diagnoses the failure, patches the code, and retries on its own. Built with a LangGraph state machine, human-in-the-loop approval, and LangSmith tracing. Achieves a 95% fix rate across 20 diverse error types.

Knowledge Agent: Persistent RAG with Hybrid Search

Python • LangGraph • ChromaDB • BM25 • Sentence Transformers

Local memory-augmented agent with hybrid retrieval (BM25 + dense + cross-encoder reranking), claim verification, and persistent session memory. Scores 92% accuracy and 100% tool-routing accuracy on adversarial test cases.

CLI Research Agent: Raw Agent Loop + LangGraph Rebuild

Python • DeepSeek API • Tavily • httpx • BeautifulSoup • LangGraph

Terminal-based research agent built from scratch on OpenAI-compatible tool calling, with no frameworks involved. Takes a question, searches the web, reads source pages, and writes a structured markdown report. Later rebuilt with LangGraph to compare raw-loop and state-machine execution.

Bosch Production Line: Predictive Quality Control

Python • LightGBM • Optuna • Feature Engineering • Streamlit

End-to-end ML pipeline predicting manufacturing failures on 1.18M rows with 171:1 class imbalance. Engineered path features that reveal a 72× failure-rate signal: certain station paths fail at 41.7% versus a 0.58% global mean. Chunk-aware CV and a phased feature roadmap have moved MCC from 0.19 to 0.33, with a target of ≥ 0.52.
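The figures above are mutually consistent, and a short sketch shows both the arithmetic and why MCC is a sensible metric at this imbalance. The `mcc` helper and the confusion-matrix counts are illustrative assumptions, not the project's code or results.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# A 171:1 negative-to-positive ratio implies the global failure rate.
base_rate = 1 / (171 + 1)
print(f"global failure rate: {base_rate:.2%}")

# Lift of the high-risk station paths over the global mean.
lift = 0.417 / 0.0058
print(f"lift: {lift:.1f}x")

# Illustrative counts only: a model that always predicts "no failure" on
# 172 rows (171 negatives, 1 positive) looks accurate but has MCC = 0.
accuracy = 171 / 172
print(f"always-negative accuracy: {accuracy:.1%}, MCC: {mcc(0, 0, 1, 171)}")
```

The always-negative baseline is the reason accuracy is uninformative here: it scores above 99% while detecting no failures, whereas MCC stays at zero until the model separates the classes.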

Silent Recalls: Live Vehicle Safety Monitoring

Python • PostgreSQL • ETL Pipeline • Streamlit • GitHub Actions

Production-grade ETL pipeline monitoring NHTSA complaints with live risk tracking. Automated detection of vehicles with dangerous complaint-to-recall ratios. GMC Sierra 1500: 445 complaints, zero recalls. Weekly automated runs with hash-based alerting.
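One common way to implement hash-based alerting is to fingerprint the set of flagged vehicles on each run and alert only when the fingerprint changes. The sketch below assumes that interpretation. The `min_complaints` threshold, the function names, and every vehicle other than the GMC Sierra 1500 are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def flag_vehicles(counts: Dict[str, Tuple[int, int]], min_complaints: int = 100) -> List[str]:
    """Flag vehicles with many complaints and zero recalls (illustrative rule)."""
    return sorted(
        name for name, (complaints, recalls) in counts.items()
        if complaints >= min_complaints and recalls == 0
    )


def digest(flagged: List[str]) -> str:
    """Stable SHA-256 fingerprint of the flagged list."""
    payload = json.dumps(flagged, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def should_alert(previous: Optional[str], flagged: List[str]) -> Tuple[bool, str]:
    """Alert only when the flagged set differs from the previous run."""
    current = digest(flagged)
    return current != previous, current


# (complaints, recalls) per vehicle; all but the first entry are made up.
week1 = {
    "GMC Sierra 1500": (445, 0),
    "Hypothetical Model A": (30, 0),
    "Hypothetical Model B": (500, 3),
}
week3 = dict(week1, **{"Hypothetical Model A": (120, 0)})

alert1, state = should_alert(None, flag_vehicles(week1))   # first run
alert2, state = should_alert(state, flag_vehicles(week1))  # unchanged data
alert3, state = should_alert(state, flag_vehicles(week3))  # new vehicle flagged
print(alert1, alert2, alert3)
```

Storing only the digest between weekly runs keeps the job stateless apart from one short string, and it avoids repeat alerts when nothing has changed.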

Bearing Failure Prediction: 2.88h Accuracy

Python • LightGBM • PostgreSQL • Optuna • Streamlit

Production-grade ML system predicting bearing remaining useful life (RUL) with 2.88-hour accuracy in critical zones. 10× improvement through weighted loss optimization. Modeled ~$300K in annual savings and 98.5% failure prevention in critical zones.

About

I build agentic LLM systems and the evaluation infrastructure that makes them reliable: claim-level grounding verification, hybrid retrieval, reflection loops, and human-in-the-loop control, with observability and cost/latency tracking on every run.

My flagship, Content Agent, runs a full research-to-publish pipeline: it retrieves sources, drafts, verifies every factual claim against those sources, self-critiques, routes through human approval, and publishes live. Every run is measured against a golden evaluation set.

Underneath the agent work is a deeper foundation in production ML (predictive maintenance, manufacturing defect modeling, and automotive safety analytics), built across years in mechanical engineering and at Honda R&D Americas. That domain background is why I treat agent design as a reliability problem first: anticipate the failure modes, measure them, engineer around them.

I'm interested in systems that anticipate failures, learn from context, and operate reliably in production.

Skills

Agentic AI & LLMs

  • LLM Tool Calling & Agent Loops
  • RAG: Hybrid Retrieval & Reranking
  • LangGraph State Machines
  • Claim Verification & Grounding

Evaluation & Observability

  • Evaluation Harnesses & Golden Datasets
  • LangSmith Tracing
  • Cost & Latency Tracking
  • Failure-Mode & Regression Testing

Machine Learning & Data

  • Predictive Modeling & Risk Systems
  • Time Series Analysis
  • Feature Engineering
  • Statistical Analysis

Engineering & Infrastructure

  • Python, SQL, PostgreSQL
  • FastAPI, Docker, AWS
  • ETL Pipeline Design
  • GitHub Actions, CI & Netlify

Contact

Open to AI Engineer roles: full-time, contract, and remote (US / Global).