
New T2 Scaling Laws: Optimize AI with Smaller, Data-Rich Models

Researchers from the University of Wisconsin-Madison and Stanford University have introduced "Train-to-Test" (T2) scaling laws, a framework for optimizing large language models (LLMs) across both training and inference.

By Redacción Tricuatro | 17 April, 2026 | 2 min read

Researchers from the University of Wisconsin-Madison and Stanford University have introduced the "Train-to-Test" (T2) scaling laws. The framework allocates a model's total compute budget across training and inference, so developers can maximize the performance of large language models (LLMs) by accounting for inference costs, not just training expenses. That matters for real-world applications that need both efficiency and accuracy.

Until now, standard guidelines for building LLMs primarily optimized for training costs. This presented a significant challenge for practical applications. Many of these applications use inference-time scaling techniques, such as drawing multiple reasoning samples, to increase the accuracy of model responses.
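To see why drawing multiple samples can raise accuracy, here is a minimal sketch. It assumes each sample is independently correct with probability p and that a checker can recognize a correct answer when one appears. Both are simplifying assumptions for illustration, and the function name is hypothetical rather than taken from the paper.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent samples is correct.

    Assumes every sample succeeds with the same probability p and that
    samples are independent (an idealization, not a claim from the paper).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    # All k samples fail with probability (1 - p) ** k.
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    for k in (1, 5, 10, 50):
        print(f"k={k:>2}: {pass_at_k(0.2, k):.3f}")
```

Under these assumptions, a model that is right only 20% of the time on a single attempt succeeds on roughly 89% of problems when given 10 attempts, which is why the cost of each attempt becomes the limiting factor.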

The T2 scaling laws bridge this gap by jointly optimizing three crucial factors. They consider a model's parameter size, its training data volume, and the number of test-time inference samples. This comprehensive approach redefines how we think about AI efficiency.


In practice, the research points to a surprising finding: it is compute-optimal to train substantially smaller models on vastly more data than traditional rules prescribe. The compute saved by using a smaller model is then spent on generating multiple repeated samples at inference.

For enterprise AI application developers who are training their own models, this research provides a proven blueprint. It helps them maximize their return on investment. It shows that AI reasoning does not necessarily require spending huge amounts on frontier models.

Instead, smaller models can yield stronger performance on complex tasks. They also keep per-query inference costs manageable within real-world deployment budgets. This democratizes access to powerful, efficient AI.

Scaling laws are an important part of developing large language models. Pretraining scaling laws dictate the best way to allocate compute during a model's creation. Test-time scaling laws, on the other hand, guide how to allocate compute during deployment. This includes letting the model "think longer" or generating multiple reasoning samples to solve complex problems.

The problem is that these scaling laws have been developed completely independently. Yet, they are fundamentally intertwined. A model's parameter size and training duration directly dictate both the quality and the per-query cost of its inference samples.
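The link between parameter count and per-query cost can be sketched with a common rule of thumb: a dense transformer spends roughly 2 FLOPs per parameter for each generated token. That constant, the function names, and the model sizes below are assumptions for illustration, not figures from the paper.

```python
FLOPS_PER_PARAM_PER_TOKEN = 2  # assumed rule of thumb for a forward pass


def inference_flops(n_params: int, tokens_per_sample: int, num_samples: int) -> int:
    """Approximate FLOPs to answer one query with num_samples samples."""
    if min(n_params, tokens_per_sample) <= 0 or num_samples < 0:
        raise ValueError("sizes must be positive and num_samples non-negative")
    return FLOPS_PER_PARAM_PER_TOKEN * n_params * tokens_per_sample * num_samples


def samples_within_budget(budget_flops: int, n_params: int, tokens_per_sample: int) -> int:
    """How many whole samples fit in a fixed per-query FLOP budget."""
    per_sample = inference_flops(n_params, tokens_per_sample, 1)
    return budget_flops // per_sample


if __name__ == "__main__":
    large, small = 70_000_000_000, 7_000_000_000  # hypothetical model sizes
    budget = inference_flops(large, 1_000, 4)     # what 4 samples cost on the large model
    print(samples_within_budget(budget, large, 1_000))
    print(samples_within_budget(budget, small, 1_000))
```

With these assumed numbers, the budget that buys 4 samples from the larger model buys 40 from the model one tenth its size. Whether those extra samples are worth it depends on how good each one is, which is set during training.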

Currently, the industry gold standard for pretraining is the Chinchilla rule. This suggests a compute-optimal ratio of roughly 20 training tokens for every model parameter. However, creators of modern AI model families, such as Llama, Gemma, and Qwen, regularly break this rule. They intentionally overtrain their smaller models on massive amounts of data.
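As a rough worked example of the 20-tokens-per-parameter rule, the sketch below uses the widely cited approximation that training costs about 6 FLOPs per parameter per token. That constant, the example budget, and the "overtrained" ratio of 200 are assumptions chosen for illustration and do not come from the paper.

```python
import math

TRAIN_FLOPS_PER_PARAM_PER_TOKEN = 6  # assumed: training FLOPs ~ 6 * params * tokens


def size_for_budget(train_flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that spend train_flops at a given tokens-per-parameter ratio."""
    if train_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("train_flops and tokens_per_param must be positive")
    # Solve 6 * N * (ratio * N) = train_flops for N.
    params = math.sqrt(
        train_flops / (TRAIN_FLOPS_PER_PARAM_PER_TOKEN * tokens_per_param)
    )
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    budget = 1e21  # hypothetical training budget in FLOPs
    for ratio in (20, 200):  # 20 = Chinchilla-style, 200 = hypothetical overtraining
        n, d = size_for_budget(budget, ratio)
        print(f"ratio={ratio}: {n:.3e} params, {d:.3e} tokens")
```

At a fixed training budget, raising the ratio tenfold shrinks the model by a factor of the square root of 10 (about 3.2) and increases the token count by the same factor. The smaller model is then cheaper to sample from at inference.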

Nicholas Roberts, co-author of the paper, told VentureBeat that the traditional approach falters when building complex agentic workflows. "In my view, the inference stack breaks down when each individual inference call is expensive," he stated. "This is the case when the models are large and you need to do a lot of repeated sampling." Instead of relying on massive models, developers can use overtrained compact models, which lets them run this repeated sampling at a fraction of the cost.

Because training and test-time scaling laws were examined in isolation, there was no rigorous framework for calculating how much a model should be overtrained given the number of reasoning samples it will need to generate during deployment. The T2 laws are designed to fill that gap.
