available for senior / staff roles · 2026

Dr. Mahrad Pisheh Var.

currently Senior AI Engineer

tl;dr: I design and ship LLM-powered systems (agents, retrieval pipelines, and automations) that hold up in production. PhD in RL; day-to-day in tool-use, evals, and the plumbing that makes models actually useful.

3 degrees · 3+ publications · 6y ML research · ∞ curiosity
fig. 01 memory-driven agent
§01·About

The short version.

I'm an AI engineer with a PhD in Computer Science from the University of Essex. I build LLM-powered systems — agentic workflows, retrieval pipelines, evaluation harnesses, and the unglamorous plumbing that makes a model actually useful in production.

My research background is in reinforcement learning and memory-augmented agents, which turns out to be excellent preparation for modern LLM work: tool-use, planning, context management, and reward-shaped evaluation are the same problems in new clothes. I care about systems that behave reliably on the long tail, not demos that only work on the happy path.

now: Senior AI Engineer. Shipping LLM agents, automations, and RAG systems. Open to senior / staff / applied-research roles.

// stack
  • Python · async
  • LangGraph / LangChain
  • OpenAI / Anthropic APIs
  • Vector DBs · pgvector
  • PyTorch · fine-tuning
  • Evals · tracing · telemetry
// locations
📍 UK · remote-friendly
🛫 Open to relocation
§02·Research interests

What I think about.

Four overlapping threads. Most of my projects touch two or more of these.

01

LLM agents & tool use

Multi-step agents that plan, call tools, recover from failure, and produce auditable traces. Orchestration with LangGraph and hand-rolled state machines.

LangGraph · tool-use · planning
02

Retrieval & RAG systems

Production-grade retrieval pipelines — hybrid search, rerankers, query rewriting, chunk-level eval. Making models answer from sources they can cite.

RAG · pgvector · rerank
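
One common way to combine keyword and vector results in a hybrid search is reciprocal rank fusion. The sketch below is an illustration only: the retrieval thread above does not say which fusion method is used, and `reciprocal_rank_fusion`, the document ids, and `k=60` are assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that rank well in both lists rise to the top, which is why a reranker is usually applied only to the head of the fused list.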
03

Automation with LLMs

Replacing brittle rule-based workflows with LLM pipelines that extract, classify, summarise, and act — with guardrails and human-in-the-loop where it matters.

workflows · structured-output · guardrails
04

Evals & alignment

How do you know it works? Offline evals, LLM-as-judge, red-teaming, drift monitoring. My RL background applied to a new kind of reward signal.

evals · RLHF · monitoring
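
A drift or regression check of the kind mentioned in the evals thread above can be as small as comparing mean scores between two runs. This is a sketch under stated assumptions: scores are floats in [0, 1] produced by some judge or rubric, and `regression_alert` and `max_drop` are hypothetical names, not taken from any project listed here.

```python
from statistics import mean


def regression_alert(baseline_scores, candidate_scores, max_drop=0.05):
    """Return True when the candidate's mean score falls more than
    max_drop below the baseline's mean score."""
    if not baseline_scores or not candidate_scores:
        raise ValueError("both score lists must be non-empty")
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop > max_drop


# Hypothetical per-example scores for an old and a new prompt.
baseline = [0.9, 0.8, 0.85]
candidate = [0.7, 0.6, 0.65]
alert = regression_alert(baseline, candidate)
```

A mean comparison hides per-example regressions, so a fuller harness would likely also report which individual cases got worse.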
§03·Publications

Peer-reviewed work.

  1. A simple memory augmentation technique that lets tabular Q-learning solve binary cell-structured mazes with randomly placed exits — without Recurrent Neural Networks. We expand the state with a cell-visit history, letting a frozen policy generalise to newly-generated exits and, across five maze problems of varying complexity, outperform standard deep-learning baselines.

    Read paper →
  2. A simulated organism must explore an environment containing a food pile — observing, remembering, planning, and navigating to regions of strongest food density. We compare RL algorithms to adaptive dynamic programming and show backpropagation through time convincingly solves this recurrent-network challenge, mimicking the fundamental objectives of a minimal sentient organism.

    Read paper →
  3. This thesis examines ADP agents using Backpropagation Through Time in continuous spaces. A Memory-Based BPTT (MBPTT) refines decision-making in partially observable environments; LSTM and GRU memory models simulate a "functionally sentient" organism. Two main contributions: simple memory-augmented agents outperform existing techniques in maze navigation; and BPTT combined with memory functions handles bicycle navigation and food-search tasks efficiently.

    Read paper →
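
The state expansion described in the first abstract above can be sketched roughly as follows. This assumes the visit history is stored as a set of visited cells, which is a guess for illustration; the abstract does not give the paper's encoding. `augment_state`, `q_update`, the action names, and the hyperparameter values are all hypothetical.

```python
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")


def augment_state(position, visited):
    """Pair the agent's current cell with the cells visited so far.

    frozenset makes the history hashable, so the pair can key a Q-table.
    """
    return (position, frozenset(visited))


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one standard tabular Q-learning update on augmented states."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Unseen (state, action) pairs default to a value of 0.0.
q_table = defaultdict(float)
visited = {(0, 0)}
s = augment_state((0, 0), visited)
visited.add((0, 1))
s_next = augment_state((0, 1), visited)
new_value = q_update(q_table, s, "right", reward=1.0, next_state=s_next)
```

Because the history is part of the state, the same cell reached by different routes maps to different table entries. That lets a plain Q-table tell those situations apart without a recurrent network, at the cost of a larger state space.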
§04·Projects

Things I've built.

Selected from my GitHub. Each one started as a question I couldn't stop thinking about.

fig. 02 (diagram labels: LLM, search, db, api, code)
LLM agents · observability

Agent Ops Console

A tracing + eval dashboard for multi-step LLM agents. Tracks tool calls, token spend, failure modes, and replays runs deterministically. LangGraph + OpenTelemetry under the hood.

"If you can't trace it, you can't trust it." · github ↗
fig. 03
RAG · clinical NLP

Clinical Query Assistant

A retrieval-augmented QA system over medical literature. Hybrid search + cross-encoder rerank, citations on every claim, hallucination eval on a curated gold set.

"Accuracy matters more than fluency here." · github ↗
fig. 04
workflow automation · function-calling

Inbox Autopilot

An LLM-driven email triage agent. Classifies, drafts replies, schedules follow-ups, and escalates edge cases. Function-calling against Gmail + calendar + CRM APIs.

"Automated the boring 80% so I could focus on the 20% that matters." · github ↗
fig. 05
reinforcement learning · bio

Neurofolding RNA Design

An RL environment where an agent designs RNA sequences that fold into target secondary structures. Free-energy-shaped reward, curriculum-trained policy.

"Chemistry with a policy gradient." · github ↗
fig. 06
evals · LLM-as-judge

Eval Harness

A lightweight framework for running offline evals on LLM outputs. Pairwise judges, rubric scoring, regression alerts when a new model or prompt drops quality.

"Vibes are not a metric." · github ↗
fig. 07
biological AI · RL

E-coli Behaviour

The food-seeking organism from the Adaptive Behavior paper, as an interactive simulation. Swap memory models and watch policies diverge.

"Companion code for the publication." · github ↗
§05·Education

Three degrees, one campus.

All at Essex. Stayed because the research group was too good to leave.

  1. 2020–2024

    Ph.D. in Computer Science

    University of Essex

    Thesis: "Minimalistic Adaptive Dynamic-Programming Agents for Memory-Driven Exploration."

  2. 2018–2019

    M.Sc. in Computer Games

    University of Essex

    Distinction. Thesis: "Investigating Immersion of Environments affecting the performance of BCI."

  3. 2015–2018

    B.Sc. in Computer Science

    University of Essex

    First Class Honours. Final project: a tile-based RPG engine built from scratch in C++.

§06·Selected writing

Occasional notes.

I write when I learn something worth writing down.

  • 2025·03

    Designing LLM agents that fail loudly

    Three patterns for making tool-use recoverable instead of catastrophic.

    7 min
  • 2025·01

    Your RAG system is lying to you

    Why retrieval metrics and answer quality drift apart — and how to catch it.

    9 min
  • 2024·11

    Evals are the new unit tests

    What RL reward design taught me about judging an LLM's output.

    11 min
→ more soon.
§07·Contact

Hiring, collaborating,
or just curious?
Let's talk.

email: [email protected] ↗ · github: @Kodaks94 ↗ · linkedin: mahrad-pisheh-var ↗
M.P.V. · est. 2024 · built with caffeine