GPT-5 Fine-Tuning Opens to All Paid API Developers
OpenAI has made GPT-5 fine-tuning available to all paid API developers, supporting custom training runs with as few as 10 examples and introducing a supervised fine-tuning dashboard with real-time loss curve visualization.
Original sourceOpenAI has expanded access to GPT-5 fine-tuning beyond its earlier limited beta, making the capability available to all developers on paid API tiers. The update lowers the floor for custom model training significantly — developers can now initiate a fine-tuning job with as few as 10 labeled examples, compared to the hundreds typically required by prior GPT-3.5 and GPT-4 workflows.
Alongside the access expansion, OpenAI introduced a supervised fine-tuning dashboard that surfaces real-time loss curve visualization during training runs. This lets developers monitor convergence without pulling metrics programmatically or waiting for job completion emails, addressing one of the more friction-heavy parts of the previous fine-tuning workflow.
GPT-5 fine-tuning follows the same API structure as previous generations, meaning existing fine-tuning integrations require minimal changes to adopt the new model. Pricing for fine-tuning GPT-5 follows a per-token training cost model, with inference on the resulting custom model billed at a premium above standard GPT-5 API rates — a structure consistent with how OpenAI has handled fine-tuned model tiers historically.
The move positions OpenAI more directly against managed fine-tuning offerings from Anthropic, Google, and open-weight alternatives like Llama and Mistral, where fine-tuning has been available and in some cases cheaper. The 10-example minimum is a notable accessibility claim, though the practical quality of outputs at that data volume will depend heavily on task specificity and how well GPT-5's base capabilities align with the target domain.
Panel Takes
The Builder
Developer Perspective
“The primitive here is straightforward: supervised fine-tuning over GPT-5 with a training API that mirrors prior generations, plus a dashboard that shows loss curves in real time instead of making you poll a job status endpoint like it's 2019. The DX bet is backwards-compatibility — if your fine-tuning pipeline already targets GPT-4, swapping the model string should mostly work, and that's the right call. The 10-example minimum is interesting but I'd want to see what the loss curves actually look like at that data volume before trusting it on anything production-critical.”
The Skeptic
Reality Check
“The 10-example claim is doing a lot of work in this announcement and nobody should ship a production fine-tune on 10 examples without serious evaluation — that number gets you a proof-of-concept, not a reliable custom model. The real competition here isn't Anthropic's fine-tuning, it's open-weight models where you own the weights and the inference cost doesn't compound every time you call the endpoint. What kills this in 12 months: OpenAI's inference pricing on fine-tuned models makes the math painful at scale, and the moment Llama-class models hit GPT-5-level base quality, the lock-in argument evaporates entirely.”
The Founder
Business & Market
“The buyer here is any engineering team that was already paying for GPT-5 API access and wants task-specific performance without switching vendors — it's an upsell baked into existing budget, which is clean distribution. The moat concern is real though: fine-tuned weights still live in OpenAI's infrastructure, so the switching cost is your training data and job history, not the model itself. The pricing architecture — premium inference rates on top of training costs — means this scales linearly against the customer's success, which is fine until a competitor offers hosted fine-tuning on a comparable model at commodity inference rates.”
The PM
Product Strategy
“The job-to-be-done is sharp: adapt GPT-5 to a specific task or domain without managing your own infrastructure, and know whether it's working before the job finishes. The real-time loss curve dashboard is the product decision that makes this more than an API endpoint — it turns a black-box training job into something a developer can actually intervene on, which is the difference between a tool and a workflow. The open question is completeness: without tooling for dataset management, versioning fine-tuned models against eval sets, and A/B testing base vs. fine-tuned in production, developers will still be stitching together half the workflow themselves.”