3 Sources
[1]
Thinking Machines Lab wants to make AI models more consistent
There's been great interest in what Mira Murati's Thinking Machines Lab is building with its $2 billion in seed funding and the all-star team of former OpenAI researchers who have joined the lab. In a blog post published on Wednesday, Murati's research lab gave the world its first look into one of its projects: creating AI models with reproducible responses.

The research blog post, titled "Defeating Nondeterminism in LLM Inference," tries to unpack the root cause of what introduces randomness in AI model responses. For example, ask ChatGPT the same question a few times over, and you're likely to get a wide range of answers. This has largely been accepted in the AI community as a fact -- today's AI models are considered to be non-deterministic systems -- but Thinking Machines Lab sees this as a solvable problem.

The post, authored by Thinking Machines Lab researcher Horace He, argues that the root cause of AI models' randomness is the way GPU kernels -- the small programs that run inside of Nvidia's computer chips -- are stitched together in inference processing (everything that happens after you press enter in ChatGPT). He suggests that by carefully controlling this layer of orchestration, it's possible to make AI models more deterministic.

Beyond creating more reliable responses for enterprises and scientists, He notes that getting AI models to generate reproducible responses could also improve reinforcement learning (RL) training. RL is the process of rewarding AI models for correct answers, but if the answers are all slightly different, then the data gets a bit noisy. Creating more consistent AI model responses could make the whole RL process "smoother," according to He. Thinking Machines Lab has told investors that it plans to use RL to customize AI models for businesses, The Information previously reported.
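The mechanism He describes rests on a basic property of computer arithmetic: floating-point addition is not associative, so when parallel GPU kernels combine partial results in different orders, the same inputs can yield slightly different outputs. A minimal Python sketch (an illustration of the general phenomenon, not code from the post) makes the point:

```python
import numpy as np

# Floating-point addition is not associative: the same three numbers,
# summed in two different orders, give two different answers.
vals = np.array([1e8, 1.0, -1e8], dtype=np.float32)

left_to_right = (vals[0] + vals[1]) + vals[2]  # the 1.0 is rounded away first
cancel_first = (vals[0] + vals[2]) + vals[1]   # the large terms cancel first

print(left_to_right)  # 0.0
print(cancel_first)   # 1.0
```

In a float32 sum, 1e8 + 1.0 rounds back to 1e8 because the gap between adjacent float32 values near 1e8 is 8; change the combination order and the 1.0 survives. Scale this up to the millions of parallel additions in a transformer forward pass and tiny ordering differences can flip which token the model samples next.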
Murati, OpenAI's former chief technology officer, said in July that Thinking Machines Lab's first product will be unveiled in the coming months, and that it will be "useful for researchers and startups developing custom models." It's still unclear what that product is, or whether it will use techniques from this research to generate more reproducible responses.

Thinking Machines Lab has also said that it plans to frequently publish blog posts, code, and other information about its research in an effort to "benefit the public, but also improve our own research culture." This post, the first in the company's new blog series called "Connectionism," seems to be part of that effort. OpenAI also made a commitment to open research when it was founded, but the company has become more closed off as it has grown larger. We'll see if Murati's research lab stays true to this claim.

The research blog offers a rare glimpse inside one of Silicon Valley's most secretive AI startups. While it doesn't exactly reveal where the technology is going, it indicates that Thinking Machines Lab is tackling some of the largest questions on the frontier of AI research. The real test is whether Thinking Machines Lab can solve these problems and build products around its research that justify its $12 billion valuation.
[2]
Mira Murati's Thinking Machines Cracks the Code on LLM Nondeterminism
Thinking Machines argues that the real culprit is the lack of batch invariance in widely used inference kernels.

Large language models (LLMs) often behave unpredictably during inference, producing different outputs even when given the same prompt. Thinking Machines, an AI company founded by former OpenAI CTO Mira Murati, says it has identified the root cause of this nondeterminism and developed a solution that could make inference reproducible and reliable. In a blog post titled "Defeating Nondeterminism in LLM Inference", the company explained that the problem goes beyond the well-known issue of floating-point arithmetic and GPU concurrency. While rounding errors from parallel computations do play a role, Thinking Machines argues that the real culprit is the lack of batch invariance in widely used inference kernels. Batch invariance means that a model's output for a given input should be identical regardless of the size or composition of the batch it is processed with.
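The batch-invariance idea can be pictured with a toy reduction: if a kernel changes its summation strategy based on batch size, the same row can round differently depending on which other requests share the batch. Fixing the reduction tree per row removes that dependence. The sketch below is a hedged illustration of the property, not the company's implementation; `chunked_row_sum` is a hypothetical name:

```python
import numpy as np

def chunked_row_sum(row, chunk=4):
    """Sum one row using a fixed chunk size, so the reduction tree --
    and therefore the rounding -- never depends on how many other
    rows happen to be in the batch (a batch-invariant reduction)."""
    acc = np.float32(0.0)
    for i in range(0, len(row), chunk):
        acc = np.float32(acc + np.sum(row[i:i + chunk], dtype=np.float32))
    return acc

rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 1024)).astype(np.float32)

alone = chunked_row_sum(batch[0])                  # row processed by itself
in_batch = [chunked_row_sum(r) for r in batch][0]  # same row inside a batch of 8
assert alone == in_batch  # bitwise identical either way
```

Real inference kernels break this property for performance reasons: they pick different tiling and parallelization strategies as batch size changes, which reorders the additions inside each row's reduction.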
[3]
Thinking Machines Lab reveals research on eliminating randomness in AI model responses
Thinking Machines Lab, backed by $2 billion in seed funding and staffed with former OpenAI researchers, has shared its first detailed research insights. The lab released a blog post Wednesday examining how to create AI models that produce more consistent and reproducible responses, addressing a fundamental challenge in artificial intelligence development.

The blog post, titled "Defeating Nondeterminism in LLM Inference," investigates why AI models often generate varied answers to identical questions. While this variability has been accepted as an inherent characteristic of large language models, Thinking Machines Lab views this nondeterminism as a solvable problem rather than an unavoidable limitation.

Researcher Horace He authored the post, arguing that randomness in AI models stems from how GPU kernels are orchestrated during inference processing. Inference processing refers to the computational steps that occur after users submit queries, such as pressing enter in ChatGPT. GPU kernels are specialized programs running on Nvidia computer chips. He believes careful management of this orchestration layer can enable AI models to generate more predictable and consistent outputs.

Beyond enhancing reliability for enterprise and scientific applications, He suggests reproducible responses can streamline reinforcement learning (RL) training. Reinforcement learning rewards AI models for correct answers, but inconsistent responses introduce noise into training data. More consistent responses could improve the RL process, which aligns with The Information's previous reporting that Thinking Machines Lab plans to use RL for tailoring AI models to specific business needs.

Former OpenAI Chief Technology Officer Mira Murati announced in July that Thinking Machines Lab will release its first product soon.
She indicated the product will be "useful for researchers and startups developing custom models," though specific details and whether it incorporates the reproducibility techniques remain undisclosed. Thinking Machines Lab announced plans to regularly publish blog posts, code, and research outputs to "benefit the public, but also improve our own research culture." The recent post launches a new series called "Connectionism," reflecting this transparency commitment. This approach mirrors OpenAI's early open research pledge, though OpenAI became less transparent as it grew. The research blog provides rare insight into Thinking Machines Lab's operations and indicates the company is tackling significant AI research challenges while working toward products that justify its $12 billion valuation.
Mira Murati's Thinking Machines Lab unveils research on eliminating randomness in AI model responses, potentially revolutionizing the field of large language models and their applications.
Thinking Machines Lab, a $2 billion seed-funded AI research company founded by former OpenAI CTO Mira Murati, has released its first major research insights, focusing on a fundamental challenge in AI development: the inconsistency of large language model (LLM) responses [1].

In a blog post titled "Defeating Nondeterminism in LLM Inference," researcher Horace He argues that the root cause of AI models' randomness lies in the orchestration of GPU kernels during inference processing [1]. This revelation challenges the widely accepted notion that AI models are inherently non-deterministic systems.

Currently, when users ask AI models like ChatGPT the same question multiple times, they often receive varying responses. This inconsistency has been largely accepted as an inherent characteristic of LLMs [2]. However, Thinking Machines Lab posits that this is a solvable problem rather than an unavoidable limitation.

The research suggests that the lack of batch invariance in widely used inference kernels is the primary culprit behind LLM nondeterminism [2]. By carefully controlling the layer of orchestration for GPU kernels – small programs running on Nvidia's computer chips – it may be possible to achieve more deterministic AI model outputs [1].
The ability to generate reproducible responses could have far-reaching implications for AI development and applications, from more reliable results for enterprises and scientists to less noisy reinforcement learning training [1][3].
While details about Thinking Machines Lab's first product remain undisclosed, Murati has stated that it will be "useful for researchers and startups developing custom models" and is set to launch in the coming months [3]. The company has also committed to regularly publishing research findings and code, aiming to benefit the public and improve its own research culture [1].

This research offers a rare glimpse into one of Silicon Valley's most secretive AI startups. By tackling fundamental questions in AI research, Thinking Machines Lab is positioning itself at the forefront of the field. The true test will be whether the company can translate this research into practical products that justify its $12 billion valuation [1].