
PKGPT: Expert-Orchestrated Recursive LLM Agent for Automated NONMEM PopPK Modeling with Human Benchmarking

  • Hoyoung Kwack
  • Hyunseung Kong
  • Jiwoo Lim
  • Byoung Tak Zhang
  • Jongsung Hahn*
  • Min Jung Chang*
  • *Corresponding author for this work
  • Yonsei University
  • Seoul National University

Research output: Contribution to journal › Journal article › peer-review

Abstract

Background/Objectives: Population pharmacokinetic (PopPK) modeling in NONMEM requires iterative, expertise-dependent workflows. Naïve zero-shot prompting of general-purpose large language models (LLMs) typically produces NONMEM code that fails to execute. This study introduces PKGPT, a recursive agentic LLM system designed to automate NONMEM-based PopPK model development, and benchmarks its performance against human expert models.

Methods: PKGPT, powered by Google’s Gemini 3.0 Flash, embeds pharmacometrics expertise into phase-specific expert-agent prompts orchestrated across five sequential phases: base model establishment, structural diagnostics, overfitting reduction, random-effects optimization, and covariate analysis. The system recursively executes NONMEM, parses the outputs, and iteratively refines the control streams. PKGPT was evaluated on three public datasets (warfarin, theophylline, and tobramycin) and benchmarked against independently developed human expert models.

Results: PKGPT consistently produced executable, converging NONMEM models across all three datasets. In warfarin, both PKGPT and the human expert selected a one-compartment oral structure (ADVAN2), but the expert achieved a lower OFV (294.41 vs. 484.43) via covariate scaling. In theophylline, PKGPT produced parameter estimates close to the expert solution (Ka = 1.59 vs. 1.46 h⁻¹; CL = 0.0399 vs. 0.0404 L/h/kg). In tobramycin, PKGPT correctly identified a two-compartment structure but produced physiologically implausible peripheral volume estimates (V2 = 149 L vs. 13.2 L for the expert). Across datasets, PKGPT did not identify clinically established covariates, and run-to-run reproducibility was variable.

Conclusions: Compared with naïve zero-shot prompting, PKGPT substantially improves the robustness and usability of LLM-generated NONMEM code and accelerates model drafting and iterative refinement, but physiological plausibility and clinical interpretability still require human-in-the-loop oversight.
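The Methods describe a recursive execute-parse-refine loop run over five sequential phases. The Python sketch below illustrates only that control flow, under stated assumptions; it is not PKGPT's implementation. `fake_nonmem` and `fake_agent` are hypothetical stand-ins for a real NONMEM execution and the Gemini-backed expert agents, and `parse_lst` assumes the NONMEM 7 listing conventions of an `#OBJV:` line carrying the objective function value and the phrase `MINIMIZATION SUCCESSFUL` on convergence. Accepting only converged control streams, and falling back to the last converged model when a phase fails, is likewise an assumed policy.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional

# The five sequential phases named in the abstract.
PHASES = [
    "base model establishment",
    "structural diagnostics",
    "overfitting reduction",
    "random-effects optimization",
    "covariate analysis",
]


@dataclass
class RunResult:
    ofv: Optional[float]  # objective function value, None if absent
    minimized: bool       # True if the listing reports successful minimization


def parse_lst(listing: str) -> RunResult:
    """Parse NONMEM listing text (assumes the NONMEM 7 '#OBJV:' convention)."""
    values = re.findall(r"#OBJV:\**\s*(-?\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?)", listing)
    ofv = float(values[-1]) if values else None  # last estimation step wins
    return RunResult(ofv=ofv, minimized="MINIMIZATION SUCCESSFUL" in listing)


# Hypothetical agent interface: agent(phase, control_stream, feedback) -> new stream.
# feedback is None for a fresh phase proposal, or the RunResult of a failed run.
Agent = Callable[[str, str, Optional[RunResult]], str]


def refine_until_converged(stream: str, phase: str, run_nonmem: Callable[[str], str],
                           agent: Agent, retries_left: int) -> tuple[str, RunResult]:
    """Execute, parse, and recursively ask the agent to repair a failed run."""
    result = parse_lst(run_nonmem(stream))
    if result.minimized or retries_left == 0:
        return stream, result
    revised = agent(phase, stream, result)
    return refine_until_converged(revised, phase, run_nonmem, agent, retries_left - 1)


def orchestrate(initial_stream: str, run_nonmem: Callable[[str], str],
                agent: Agent, max_retries: int = 3):
    """Run the phases in order, keeping only converged control streams."""
    stream, history = initial_stream, []
    for phase in PHASES:
        candidate = agent(phase, stream, None)  # phase-specific proposal
        candidate, result = refine_until_converged(candidate, phase, run_nonmem,
                                                   agent, max_retries)
        if result.minimized:
            stream = candidate  # accept; otherwise keep the last converged model
        history.append((phase, result))
    return stream, history


# --- Hypothetical stand-ins for a NONMEM call and an LLM agent ---
def fake_nonmem(stream: str) -> str:
    # "Converges" only after the agent has added a placeholder fix comment.
    if "; FIX" in stream:
        return "#OBJV:****      123.456  ****\n MINIMIZATION SUCCESSFUL\n"
    return " MINIMIZATION TERMINATED\n"


def fake_agent(phase: str, stream: str, feedback: Optional[RunResult]) -> str:
    if feedback is None:
        return stream + f"\n; {phase}"  # placeholder phase proposal
    return stream + "\n; FIX"          # placeholder repair after a failed run


if __name__ == "__main__":
    final, history = orchestrate("$PROBLEM demo", fake_nonmem, fake_agent)
    for phase, result in history:
        print(f"{phase}: minimized={result.minimized}, OFV={result.ofv}")
```

With these stubs, the first phase fails once, receives a repair, and converges; the later phases then converge on the first attempt. A real system would replace the stubs with actual NONMEM runs and phase-specific LLM prompts, and would feed richer diagnostics than the OFV and minimization status back to the agent.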

Original language: English
Article number: 501
Journal: Pharmaceutics
Volume: 18
Issue number: 4
State: Published, April 2026

Keywords

  • agentic AI
  • human-in-the-loop
  • large language model
  • model automation
  • model-informed drug development
  • NONMEM
  • population pharmacokinetics
  • recursive self-correction

