Abstract
Recent large language models (LLMs) have shown strong performance in medical question answering (QA), but their use remains limited by challenges such as training and inference costs, prompt sensitivity in the medical domain, and a lack of evaluation frameworks. To address these challenges, a system has been built for fine-tuning and evaluating medical LLMs. It uses QLoRA for low-memory fine-tuning and integrates the Hugging Face Accelerate framework for multi-GPU distributed training. The lm-eval-harness provides robust automated evaluation. The validity of the system is demonstrated using the MedGemma 2B model and KorMedMCQA benchmarks. The experimental results show that sLLM can achieve 78.87% accuracy on MedQA while maintaining training efficiency. This suggests that prompt engineering can outperform meticulously calibrated models, offering a cost-effective way to implement medical LLMs. This work presents a scalable, efficient, and reproducible approach for developing high-performance LLMs, laying the foundation for future clinical integration using transparent systems.
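To make the low-memory claim for QLoRA concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It compares storing base weights at 16-bit versus 4-bit precision and counts the trainable parameters that low-rank adapters add. The model size, hidden width, layer count, number of adapted matrices, and rank are all hypothetical values chosen for illustration, and the estimate ignores quantization constants, activations, and optimizer state.

```python
# Back-of-the-envelope memory arithmetic for QLoRA-style fine-tuning.
# Every size below is an illustrative assumption, not a value from the paper.

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    The adapter factors the weight update into A (rank x d_in) and
    B (d_out x rank), so it adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Memory needed to store num_params weights at a given precision, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Hypothetical model shape (assumed for illustration only).
BASE_PARAMS = 2_000_000_000      # roughly a 2B-parameter model
HIDDEN = 2048                    # assumed square projection matrices
LAYERS = 18
ADAPTED_MATRICES_PER_LAYER = 4   # e.g. the four attention projections
RANK = 16

adapter_params = (
    LAYERS * ADAPTED_MATRICES_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, RANK)
)
trainable_fraction = adapter_params / BASE_PARAMS

fp16_gib = weight_memory_gib(BASE_PARAMS, 16)  # frozen base weights in 16-bit
nf4_gib = weight_memory_gib(BASE_PARAMS, 4)    # frozen base weights in 4-bit

if __name__ == "__main__":
    print(f"adapter parameters : {adapter_params:,}")
    print(f"trainable fraction : {trainable_fraction:.4%}")
    print(f"base weights, 16-bit: {fp16_gib:.2f} GiB")
    print(f"base weights, 4-bit : {nf4_gib:.2f} GiB")
```

Under these assumptions the adapters account for well under one percent of the parameters, and the frozen base weights take a quarter of their 16-bit footprint, which is the basic reason this style of fine-tuning fits on smaller GPUs.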
| Original language | English |
|---|---|
| Title of host publication | 2025 International Conference on Platform Technology and Service, PlatCon 2025 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 6-8 |
| Number of pages | 3 |
| Edition | 2025 |
| ISBN (Electronic) | 9798331576226 |
| State | Published - 2025 |
| Event | 2025 International Conference on Platform Technology and Service, PlatCon 2025 - Jeju, Korea, Republic of; Duration: 2025.08.25 → 2025.08.27 |
Conference
| Conference | 2025 International Conference on Platform Technology and Service, PlatCon 2025 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Jeju |
| Period | 2025.08.25 → 2025.08.27 |
Keywords
- Distributed Training (FSDP)
- Medical Large Language Models (Medical LLMs)
- Medical Question Answering (Medical QA)
- Parameter-Efficient Fine-Tuning (PEFT)
- QLoRA
Title
SLLM: A Memory-Efficient Fine-Tuning and Evaluation Pipeline for Medical Large Language Models