tech-pub

Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI

June 3, 2026 at 03:56 PM · Updated: Jun 4 · 1 source

TL;DR

In this post, you learn how to use Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) together to improve the tool-calling accuracy of a small language model (SLM). The example uses Amazon SageMaker AI training jobs, so you can focus on training code instead of managing your own training infrastructure. You also learn how to evaluate tool-calling accuracy and compare a base model to several fine-tuned variants, so you can make data-driven decisions about model quality.
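To make "tool-calling accuracy" concrete, here is a minimal sketch of how outputs from a base model and a fine-tuned variant could be scored against gold tool calls. It is not the evaluation code from the original post. The JSON shape `{"name": ..., "arguments": {...}}` is an assumption, and `ToolCall`, `parse_tool_call` and `tool_call_metrics` are hypothetical helpers. The sketch reports three rates:

- the share of outputs that parse at all;
- the share that pick the right function;
- the share that match the function and every argument exactly.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    """A single tool invocation: function name plus JSON arguments (assumed schema)."""
    name: str
    arguments: dict


def parse_tool_call(raw: str) -> ToolCall | None:
    """Parse output assumed to look like {"name": "...", "arguments": {...}}.

    Returns None when the output is not valid JSON or lacks the expected fields.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
        return None
    args = obj.get("arguments", {})
    if not isinstance(args, dict):
        return None
    return ToolCall(obj["name"], args)


def tool_call_metrics(predictions: list[str], references: list[ToolCall]) -> dict[str, float]:
    """Score raw model outputs against gold tool calls.

    valid_json: share of outputs that parse as a tool call
    name_acc:   share where the function name matches
    exact_acc:  share where the name and all arguments match exactly
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    n = len(references)
    if n == 0:
        raise ValueError("need at least one example to score")
    valid = name_ok = exact_ok = 0
    for raw, gold in zip(predictions, references):
        pred = parse_tool_call(raw)
        if pred is None:
            continue  # malformed output counts as a miss on every metric
        valid += 1
        if pred.name == gold.name:
            name_ok += 1
            if pred.arguments == gold.arguments:
                exact_ok += 1
    return {"valid_json": valid / n, "name_acc": name_ok / n, "exact_acc": exact_ok / n}


# Usage: compare a (hypothetical) base model against a fine-tuned variant.
gold = [
    ToolCall("get_weather", {"city": "Berlin"}),
    ToolCall("get_time", {"tz": "UTC"}),
]
base_outputs = ['{"name": "get_weather", "arguments": {"city": "berlin"}}', "not json"]
tuned_outputs = [
    '{"name": "get_weather", "arguments": {"city": "Berlin"}}',
    '{"name": "get_time", "arguments": {"tz": "UTC"}}',
]
print(tool_call_metrics(base_outputs, gold))   # {'valid_json': 0.5, 'name_acc': 0.5, 'exact_acc': 0.0}
print(tool_call_metrics(tuned_outputs, gold))  # {'valid_json': 1.0, 'name_acc': 1.0, 'exact_acc': 1.0}
```

Exact-match scoring is deliberately strict: a lowercase "berlin" counts as wrong. Real evaluations often add normalization or per-argument partial credit, which is a design choice this sketch leaves out.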

Nauti's Take

The upside: tuning small models with SFT and DPO so they call tools reliably can cut cost and latency compared with large LLMs. The catch: building the training data, evaluation, and pipelines takes real effort, and without a clean dataset the fine-tuning yields little.

In practice, the approach pays off mainly for teams that have well-defined tool workflows and want measurable accuracy gains.
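What a "clean dataset" means for DPO can be illustrated with a small filter over preference pairs. This is a minimal sketch under stated assumptions, not part of the original workflow. It uses the common `prompt` / `chosen` / `rejected` record layout (the layout expected by libraries such as Hugging Face TRL) and a hypothetical JSON tool-call schema. `is_valid_tool_call` and `clean_preference_pairs` are illustrative names. The filter drops a record in four cases:

- a field is missing;
- the preferred answer is not a well-formed tool call;
- both answers are identical, so there is no preference signal;
- the record is an exact duplicate.

```python
from __future__ import annotations

import json


def is_valid_tool_call(raw: str) -> bool:
    """True if raw is a JSON object with a string "name" and a dict "arguments".

    The {"name": ..., "arguments": {...}} schema is an assumption for this sketch.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and isinstance(obj.get("name"), str)
        and isinstance(obj.get("arguments"), dict)
    )


def clean_preference_pairs(records: list[dict]) -> tuple[list[dict], dict[str, int]]:
    """Filter DPO-style records with "prompt", "chosen" and "rejected" string fields.

    Returns the kept records plus a count of dropped records per reason.
    """
    kept: list[dict] = []
    seen: set[tuple[str, str, str]] = set()
    dropped = {"missing_field": 0, "invalid_chosen": 0, "no_preference": 0, "duplicate": 0}
    for rec in records:
        if not all(isinstance(rec.get(k), str) for k in ("prompt", "chosen", "rejected")):
            dropped["missing_field"] += 1
            continue
        if not is_valid_tool_call(rec["chosen"]):
            dropped["invalid_chosen"] += 1  # the "good" answer must itself be a valid call
            continue
        if rec["chosen"] == rec["rejected"]:
            dropped["no_preference"] += 1  # identical pair teaches the model nothing
            continue
        key = (rec["prompt"], rec["chosen"], rec["rejected"])
        if key in seen:
            dropped["duplicate"] += 1
            continue
        seen.add(key)
        kept.append(rec)
    return kept, dropped


# Usage with hypothetical records, one per failure mode.
good = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
bad = '{"name": "get_weather", "arguments": {}}'
records = [
    {"prompt": "Weather in Berlin?", "chosen": good, "rejected": bad},
    {"prompt": "Weather in Berlin?", "chosen": good, "rejected": bad},
    {"prompt": "Weather in Berlin?", "chosen": good, "rejected": good},
    {"prompt": "Weather in Berlin?", "chosen": "get_weather(Berlin)", "rejected": bad},
    {"prompt": "Weather in Berlin?", "chosen": good},
]
kept, dropped = clean_preference_pairs(records)
print(len(kept), dropped)
# 1 {'missing_field': 1, 'invalid_chosen': 1, 'no_preference': 1, 'duplicate': 1}
```

The identical-pair check compares raw strings, so two answers that differ only in whitespace still pass. A stricter pipeline might compare parsed JSON instead.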
