Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI
TL;DR
In this post, you learn how to use Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) together to improve the tool-calling accuracy of a small language model (SLM). The example uses Amazon SageMaker AI training jobs, so you can focus on training code instead of managing your own training infrastructure. You also learn how to evaluate tool-calling accuracy and compare a base model to several fine-tuned variants, so you can make data-driven decisions about model quality.
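As a rough illustration of what evaluating tool-calling accuracy can look like, the sketch below scores model outputs against reference tool calls. It is an assumption for illustration, not the post's evaluation code: the JSON output format, the two metrics (tool-name accuracy and exact match on name plus arguments), and the function names `parse_tool_call` and `tool_call_accuracy` are all hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def parse_tool_call(raw: str) -> Optional[Dict[str, Any]]:
    """Parse a model output as a JSON tool call; return None if it is malformed.

    Assumes (hypothetically) that the model emits {"name": ..., "arguments": {...}}.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "name" not in call:
        return None
    return {"name": call["name"], "arguments": call.get("arguments", {})}


def tool_call_accuracy(
    predictions: List[str], references: List[Dict[str, Any]]
) -> Dict[str, float]:
    """Score raw model outputs against reference tool calls.

    name_accuracy: fraction of examples where the correct tool was chosen.
    exact_match:   fraction where the tool name and all arguments match.
    Malformed outputs count as misses for both metrics.
    """
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")

    name_hits = 0
    exact_hits = 0
    for raw, ref in zip(predictions, references):
        call = parse_tool_call(raw)
        if call is None or call["name"] != ref["name"]:
            continue
        name_hits += 1
        if call["arguments"] == ref.get("arguments", {}):
            exact_hits += 1

    total = len(references)
    return {"name_accuracy": name_hits / total, "exact_match": exact_hits / total}


# Made-up reference calls and outputs from two hypothetical model variants.
references = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "search", "arguments": {"q": "x"}},
    {"name": "get_weather", "arguments": {"city": "Rome"}},
    {"name": "search", "arguments": {"q": "y"}},
]
base_outputs = [
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',
    "not json",
    '{"name": "get_weather", "arguments": {"city": "Milan"}}',
    '{"name": "lookup", "arguments": {"q": "y"}}',
]
tuned_outputs = [
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',
    '{"name": "search", "arguments": {"q": "x"}}',
    '{"name": "get_weather", "arguments": {"city": "Milan"}}',
    '{"name": "search", "arguments": {"q": "y"}}',
]

base_scores = tool_call_accuracy(base_outputs, references)
tuned_scores = tool_call_accuracy(tuned_outputs, references)
```

Running the same scorer over the base model and each fine-tuned variant on a fixed held-out set gives directly comparable numbers. Reporting name accuracy separately from exact match helps show whether errors come from choosing the wrong tool or from filling in the wrong arguments.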
Nauti's Take
The upside: tuning small models with SFT and DPO so they call tools reliably can cut cost and latency compared with larger models. The catch: building the training data, evaluation, and pipelines takes real effort, and without a clean dataset fine-tuning yields little.
In practice, the approach pays off mostly for teams with well-defined tool workflows that want measurable accuracy gains.