tl;dr: I design and ship LLM-powered systems — agents, retrieval pipelines, and automations that hold up in production. PhD in RL; day-to-day in tool-use, evals, and the plumbing that makes models actually useful.
I'm an AI engineer with a PhD in Computer Science from the University of Essex. I build LLM-powered systems — agentic workflows, retrieval pipelines, evaluation harnesses, and the unglamorous plumbing that makes a model actually useful in production.
My research background is in reinforcement learning and memory-augmented agents, which turns out to be excellent preparation for modern LLM work: tool-use, planning, context management, and reward-shaped evaluation are the same problems in new clothes. I care about systems that behave reliably on the long tail, not demos that only work on the happy path.
Now: Senior AI Engineer. Shipping LLM agents, automations, and RAG systems. Open to senior / staff / applied-research roles.
Four overlapping threads. Most of my projects touch two or more of these.
Multi-step agents that plan, call tools, recover from failure, and produce auditable traces. Orchestration with LangGraph and hand-rolled state machines.
Production-grade retrieval pipelines — hybrid search, rerankers, query rewriting, chunk-level eval. Making models answer from sources they can cite.
Replacing brittle rule-based workflows with LLM pipelines that extract, classify, summarise, and act — with guardrails and human-in-the-loop where it matters.
How do you know it works? Offline evals, LLM-as-judge, red-teaming, drift monitoring. My RL background applied to a new kind of reward signal.
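Hybrid search in practice comes down to fusing rankings from different retrievers (say, BM25 and a dense index) before reranking. A minimal sketch of that fusion step using reciprocal rank fusion, with illustrative doc ids rather than anything from a real pipeline:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one.

    Each document scores sum(1 / (k + rank)) across the rankings it
    appears in; k=60 is the commonly used damping constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: lexical and dense retrieval disagree on order.
bm25 = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25, dense])  # → ['d1', 'd3', 'd2', 'd4']
```

The appeal of RRF is that it needs no score calibration between retrievers, only ranks, which is why it makes a robust default fusion step.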
Tap a paper to read the abstract.
A simple memory augmentation technique that lets tabular Q-learning solve binary cell-structured mazes with randomly placed exits — without Recurrent Neural Networks. We expand the state with a cell-visit history, letting a frozen policy generalise to newly-generated exits and, across five maze problems of varying complexity, outperform standard deep-learning baselines.
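The core trick above, expanding the state with a cell-visit history so the same cell becomes a different Q-table key on a revisit, can be sketched in a few lines. This is an illustrative encoding, not the paper's exact implementation:

```python
def augment(cell, visited):
    """Expand a raw grid observation with a visit-history memory.

    `cell` is the agent's current position; `visited` is the set of
    cells seen so far. Keying the Q-table on both lets a tabular
    policy distinguish 'first time here' from 'back again', which a
    position-only state cannot.
    """
    return (cell, frozenset(visited))

# The same cell yields distinct augmented states once the history
# differs, so their Q-values are free to diverge.
q_table = {}
s1 = augment((0, 0), set())        # first arrival at (0, 0)
s2 = augment((0, 0), {(0, 1)})     # back at (0, 0) after visiting (0, 1)
q_table[s1] = 0.0
q_table[s2] = 0.0
```

Standard tabular Q-learning then runs unchanged over these augmented keys; only the state representation, not the update rule, carries the memory.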
A simulated organism must explore an environment containing a food pile — observing, remembering, planning, and navigating to regions of strongest food density. We compare RL algorithms to adaptive dynamic programming and show backpropagation through time convincingly solves this recurrent-network challenge, mimicking the fundamental objectives of a minimal sentient organism.
This thesis examines ADP agents using Backpropagation Through Time in continuous spaces. A Memory-Based BPTT (MBPTT) refines decision-making in partially observable environments; LSTM and GRU memory models simulate a "functionally sentient" organism. Two main contributions: simple memory-augmented agents outperform existing techniques in maze navigation; and BPTT combined with memory functions handles bicycle navigation and food-search tasks efficiently.
Selected from my GitHub. Each one started as a question I couldn't stop thinking about.
A tracing + eval dashboard for multi-step LLM agents. Tracks tool calls, token spend, failure modes, and replays runs deterministically. LangGraph + OpenTelemetry under the hood.
A retrieval-augmented QA system over medical literature. Hybrid search + cross-encoder rerank, citations on every claim, hallucination eval on a curated gold set.
An LLM-driven email triage agent. Classifies, drafts replies, schedules follow-ups, and escalates edge cases. Function-calling against Gmail + calendar + CRM APIs.
An RL environment where an agent designs RNA sequences that fold into target secondary structures. Free-energy-shaped reward, curriculum-trained policy.
A lightweight framework for running offline evals on LLM outputs. Pairwise judges, rubric scoring, regression alerts when a new model or prompt drops quality.
The food-seeking organism from the Adaptive Behavior paper, as an interactive simulation. Swap memory models and watch policies diverge.
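The regression-alert idea from the eval framework above is simple at its core: compare a candidate run's scores against a baseline and flag drops beyond a tolerance. A sketch with hypothetical rubric scores (all names and numbers are illustrative):

```python
from statistics import mean

def regression_alert(baseline_scores, candidate_scores, tolerance=0.02):
    """Return True when the candidate's mean eval score falls more
    than `tolerance` below the baseline's mean."""
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop > tolerance

# Hypothetical rubric scores in [0, 1] for two prompt versions.
baseline = [0.82, 0.79, 0.85, 0.80]
candidate = [0.74, 0.71, 0.78, 0.73]
regression_alert(baseline, candidate)  # True: mean dropped ~0.075
```

In a real harness you would gate on a statistical test rather than a raw mean difference, but the shape of the check — baseline in, candidate in, boolean alert out — stays the same.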
All at Essex. Stayed because the research group was too good to leave.
Thesis: "Minimalistic Adaptive Dynamic-Programming Agents for Memory-Driven Exploration."
Distinction. Thesis: "Investigating Immersion of Environments affecting the performance of BCI."
First Class Honours. Final project: a tile-based RPG engine built from scratch in C++.
I write when I learn something worth writing down.
Three patterns for making tool-use recoverable instead of catastrophic.
Why retrieval metrics and answer quality drift apart — and how to catch it.
What RL reward design taught me about judging an LLM's output.