2 Sources
[1]
Guide Labs debuts a new kind of interpretable LLM | TechCrunch
The challenge of wrangling a deep learning model is often understanding why it does what it does: whether it's xAI's repeated struggle sessions to fine-tune Grok's odd politics, ChatGPT's struggles with sycophancy, or run-of-the-mill hallucinations, plumbing through a neural network with billions of parameters isn't easy. Guide Labs, a San Francisco startup founded by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, is offering an answer to that problem: on Monday, the company open-sourced an 8-billion-parameter LLM, Steerling-8B, trained with a new architecture designed to make its actions easily interpretable. Every token produced by the model can be traced back to its origins in the LLM's training data. That can be as simple as determining the reference materials for facts cited by the model, or as complex as probing how the model encodes humor or gender. "If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off," Adebayo told TechCrunch. "You can do it with current models, but it's very fragile ... It's sort of one of the holy grail questions." Adebayo began this work while earning his PhD at MIT, co-authoring a widely cited 2018 paper that showed existing methods of understanding deep learning models were not reliable. That work ultimately led to a new way of building LLMs: developers insert a concept layer into the model that buckets data into traceable categories. This requires more up-front data annotation, but by using other AI models to help, the team was able to train this model as its largest proof of concept yet. "The kind of interpretability people do is...neuroscience on a model, and we flip that," Adebayo said.
"What we do is actually engineer the model from the ground up so that you don't need to do neuroscience." One concern with this approach is that it might eliminate some of the emergent behaviors that make LLMs so intriguing: their ability to generalize in new ways about things they haven't been trained on. Adebayo says that still happens in his company's model: his team tracks what it calls "discovered concepts," capabilities the model picked up on its own, like quantum computing. Adebayo argues this interpretable architecture will be something everyone needs. For consumer-facing LLMs, these techniques should allow model builders to block the use of copyrighted materials, or better control outputs around subjects like violence or drug abuse. Regulated industries will require more controllable LLMs; in finance, for example, a model evaluating loan applicants needs to consider financial records but not race. There's also a need for interpretability in scientific work, another area where Guide Labs has developed technology. Protein folding has been a big success for deep learning models, but scientists need more insight into why their software figured out successful combinations. "What this model demonstrates is that training interpretable models is no longer a sort of science; it's now an engineering problem," Adebayo said. "We figured out the science and we can scale them, and there is no reason why this kind of model wouldn't match the performance of the frontier-level models," which have many more parameters. Guide Labs says that Steerling-8B achieves 90% of the capability of existing models while using less training data, thanks to its novel architecture. The next step for the company, which emerged from Y Combinator and raised a $9 million seed round from Initialized Capital in November 2024, is to build a larger model and begin offering API and agentic access to users.
"The way we're currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for us as a human race," Adebayo told TechCrunch. "As we're going after these models that are going to be super intelligent, you don't want something to be making decisions on your behalf that's sort of mysterious to you."
[2]
New Steerling-8B model can trace every single word back to its training source
Guide Labs, a San Francisco-based startup, announced the open sourcing of Steerling-8B, an 8-billion-parameter large language model. Co-founded by CEO Julius Adebayo and Chief Science Officer Aya Abdelsalam Ismail, the company introduced the model on Monday. The architecture enables full traceability of generated tokens back to their specific origins within the training data, addressing the opacity common in deep learning systems. This capability allows users to verify cited facts and analyze how the model encodes abstract concepts. The architecture of Steerling-8B fundamentally alters the standard transformer structure by inserting a "concept layer." This layer functions by categorizing data into traceable buckets during the training process. Unlike traditional models that treat interpretability as a post-hoc analysis task, Guide Labs engineers interpretability directly into the model's foundation. Adebayo refers to the alternative method of analyzing neural networks as "neuroscience on a model," contrasting it with his team's approach of building the model from the ground up for transparency. This structural change requires significant up-front data annotation, a process the startup facilitated by utilizing other AI models to assist in labeling. Julius Adebayo originated the research underlying Steerling-8B during his doctoral studies at the Massachusetts Institute of Technology. He co-authored a widely cited paper in 2018 that demonstrated the unreliability of existing methods for understanding deep learning models. That research identified critical gaps in how developers could probe and verify the behavior of neural networks. The findings laid the groundwork for the architecture used in Steerling-8B, shifting the focus from interpreting black-box models to engineering systems where internal states are accessible by design. Despite the structural constraints imposed by the concept layer, Steerling-8B retains the ability to exhibit emergent behaviors. 
The development team tracks what they define as "discovered concepts," which are capabilities the model generates autonomously without explicit training. One specific example identified by the team involves the model's understanding of quantum computing. This suggests that the architecture does not prevent the model from generalizing to new domains, a key characteristic of advanced large language models. Steerling-8B achieves approximately 90% of the capability of existing frontier models while utilizing less training data. The efficiency gains are attributed to the novel architecture, which reduces the data requirements typically associated with training high-performing LLMs. This performance ratio positions the model competitively against larger, more resource-intensive counterparts currently available in the market. Guide Labs positions the architecture as a solution for high-stakes applications requiring strict control over model outputs. Julius Adebayo outlined several use cases where traceability is essential. In consumer applications, the technology can prevent the use of copyrighted materials and control outputs regarding sensitive subjects such as violence or drug abuse. For regulated industries, specifically finance, the model can evaluate loan applicants based on financial records while explicitly excluding protected attributes like race. In scientific research, such as protein folding, the model provides insight into the reasoning behind specific protein structure predictions, addressing the "black box" problem in computational biology. Adebayo argues that training interpretable models has transitioned from a scientific challenge to an engineering discipline. He stated that the team has resolved the foundational science required for transparency and is now focused on scalability. The goal is to match the performance of frontier models with significantly higher parameter counts while maintaining the interpretability benefits of Steerling-8B. 
The company asserts that democratizing this level of transparency is a long-term necessity as AI systems become more autonomous. Guide Labs emerged from the Y Combinator accelerator and secured $9 million in a seed funding round. The investment was led by Initialized Capital and closed in November 2024. The capital is intended to support the expansion of the company's research and development efforts. The company's immediate roadmap includes building a larger model based on the Steerling architecture. Future plans also involve offering API access and agentic capabilities to external users. These steps aim to transition the technology from a research prototype to a widely accessible tool for developers and enterprises.
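The concept-layer design described above can be illustrated with a toy sketch. This is a hypothetical miniature, not Guide Labs' actual architecture: the concept names, dimensions, weights, and function names below are all invented for illustration. The idea is that the model's opaque hidden state is forced through a small set of named concepts, so the output can be attributed to human-readable activations rather than billions of raw weights.

```python
import math

# Invented concept vocabulary -- stand-ins for the "traceable buckets"
# the article describes; not Guide Labs' real concept set.
CONCEPTS = ["finance", "humor", "gender", "quantum computing"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def concept_layer(hidden, weights):
    """Project an opaque hidden vector onto named concept activations."""
    scores = []
    for row in weights:  # one weight row per named concept
        s = sum(h * w for h, w in zip(hidden, row))
        scores.append(sigmoid(s))
    return dict(zip(CONCEPTS, scores))

def predict(hidden, weights, head):
    """The output depends ONLY on concept activations, so it is traceable."""
    acts = concept_layer(hidden, weights)
    logit = sum(acts[c] * head[c] for c in CONCEPTS)
    return logit, acts

# Toy numbers: a 3-dim hidden state, 4 concepts, a linear output head.
hidden = [0.5, -1.2, 0.3]
weights = [[0.9, 0.1, 0.0], [0.0, -0.5, 0.2],
           [0.4, 0.4, -0.1], [0.0, 0.0, 1.0]]
head = {"finance": 2.0, "humor": 0.1, "gender": 0.0, "quantum computing": 0.5}

logit, acts = predict(hidden, weights, head)
# Attribution is direct: rank concepts by their contribution to the output.
top = max(acts, key=lambda c: acts[c] * abs(head[c]))
print(top)
```

Because the output head sees only named concept activations, attribution reduces to reading off which concepts fired, rather than doing post-hoc "neuroscience" on opaque weights.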
San Francisco startup Guide Labs has open-sourced Steerling-8B, an 8-billion-parameter language model built with a novel architecture that makes every generated token traceable to its training data origins. Founded by Julius Adebayo and Aya Abdelsalam Ismail, the company aims to solve AI's black box problem by engineering interpretability directly into the model rather than analyzing it post-hoc.
Guide Labs, a San Francisco startup founded by CEO Julius Adebayo and Chief Science Officer Aya Abdelsalam Ismail, has open-sourced Steerling-8B, an interpretable LLM that fundamentally changes how developers understand what their AI systems are doing [1]. The 8-billion-parameter model was released on Monday with a novel architecture designed to address one of AI's most persistent challenges: understanding why deep learning models make the decisions they do. From xAI's struggles to fine-tune Grok's political leanings to ChatGPT's issues with hallucinations, the black box problem has plagued AI developers for years [1]. The breakthrough lies in Steerling-8B's ability to trace every token produced by the model back to its origins in the training data [1]. This capability ranges from simple tasks like verifying reference materials for cited facts to complex analyses of how the model encodes abstract concepts like humor or gender [1][2].
Source: TechCrunch
Adebayo explained the complexity: "If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off" [1]. While this is technically possible with current models, the process remains fragile and unreliable. The architecture fundamentally alters the standard transformer structure by inserting a concept layer that categorizes data into traceable buckets during training [2]. "The kind of interpretability people do is...neuroscience on a model, and we flip that," Adebayo told TechCrunch. "What we do is actually engineer the model from the ground up so that you don't need to do neuroscience" [1]. This approach requires more upfront data annotation, but Guide Labs used other AI systems to assist in the labeling process [1][2]. Adebayo began this work during his doctoral studies at MIT, co-authoring a widely cited 2018 paper that demonstrated the unreliability of existing methods for understanding deep learning models [1][2]. One concern with engineering interpretability is whether it might eliminate emergent behaviors that make language models valuable. Adebayo says Steerling-8B still exhibits these capabilities, with his team tracking "discovered concepts" that the model generates autonomously without explicit training, such as quantum computing [1][2].
Guide Labs positions the technology as essential for high-stakes applications requiring strict control over LLM outputs [2]. In consumer-facing AI systems, the architecture enables developers to block copyright infringement and better manage outputs around sensitive subjects like violence or drug abuse [1][2]. For regulated industries, particularly finance, the model can evaluate loan applicants based on financial records while explicitly excluding protected attributes like race, addressing regulatory requirements [1][2]. In scientific research, including protein folding, the model provides insight into why specific predictions succeed, addressing a critical gap in computational biology [1][2]. Steerling-8B achieves approximately 90% of the capability of existing frontier models while using less training data, thanks to its novel architecture [1][2]. Adebayo argues that "training interpretable models is no longer a sort of science; it's now an engineering problem," suggesting the approach can scale to match frontier models with significantly higher parameter counts [1][2]. The company emerged from Y Combinator and raised $9 million in seed funding led by Initialized Capital in November 2024 [1][2]. Next steps include building a larger model and offering API access and agentic capabilities to users [1][2]. "As we're going after these models that are going to be super intelligent, you don't want something to be making decisions on your behalf that's sort of mysterious to you," Adebayo said [1].

Summarized by Navi
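The loan-evaluation use case amounts to masking a concept at inference time: because a protected attribute surfaces as one named activation, it can be zeroed before the output head runs. The sketch below is a hypothetical illustration of that idea, assuming a concept-bottleneck head like the one the articles describe; all concept names, weights, and the masking API are invented.

```python
# Invented concept set for a toy loan-scoring head; not a real model.
CONCEPTS = ["income_stability", "credit_history", "race", "debt_ratio"]

def score_applicant(activations, head, masked=()):
    """Linear score over concept activations, with masked concepts zeroed."""
    return sum(0.0 if c in masked else activations[c] * head[c]
               for c in CONCEPTS)

# Toy activations (from a hypothetical concept layer) and head weights.
activations = {"income_stability": 0.8, "credit_history": 0.6,
               "race": 0.9, "debt_ratio": 0.3}
head = {"income_stability": 1.5, "credit_history": 2.0,
        "race": 0.7, "debt_ratio": -1.0}

raw = score_applicant(activations, head)
fair = score_applicant(activations, head, masked={"race"})
# The difference is exactly the masked concept's contribution (0.9 * 0.7).
print(raw - fair)
```

Masking is reliable here precisely because "race" exists as a single named activation rather than being diffusely encoded across a billion parameters, which is the fragility Adebayo describes in current models.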