Guide Labs open-sources Steerling-8B, a new interpretable LLM that traces every token to source

Reviewed by Nidhi Govil


San Francisco startup Guide Labs has open-sourced Steerling-8B, an 8-billion-parameter language model built with a novel architecture that makes every generated token traceable to its training data origins. Founded by Julius Adebayo and Aya Abdelsalam Ismail, the company aims to solve AI's black box problem by engineering interpretability directly into the model rather than analyzing it post-hoc.

Guide Labs Tackles AI's Black Box Problem with New Architecture

Guide Labs, a San Francisco startup founded by CEO Julius Adebayo and Chief Science Officer Aya Abdelsalam Ismail, has open-sourced Steerling-8B, an interpretable LLM that fundamentally changes how developers understand what their AI systems are doing [1]. The 8-billion-parameter model was released on Monday with a novel architecture designed to address one of AI's most persistent challenges: understanding why deep learning models make the decisions they do. From xAI's struggles to fine-tune Grok's political leanings to ChatGPT's issues with hallucinations, the black box problem has plagued AI developers for years [1].

How Traceability of Generated Tokens Works

The breakthrough lies in Steerling-8B's ability to trace every token produced by the model back to its origins in the training data [1]. This capability ranges from simple tasks like verifying reference materials for cited facts to complex analyses of how the model encodes abstract concepts like humor or gender [1][2].

Source: TechCrunch


Adebayo explained the complexity: "If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off" [1]. While this is technically possible with current models, the process remains fragile and unreliable.

Engineering Interpretability Through a Concept Layer

The architecture fundamentally alters the standard transformer structure by inserting a concept layer that categorizes data into traceable buckets during training [2]. "The kind of interpretability people do is...neuroscience on a model, and we flip that," Adebayo told TechCrunch. "What we do is actually engineer the model from the ground up so that you don't need to do neuroscience" [1]. This approach requires more upfront data annotation, but Guide Labs used other AI systems to assist in the labeling process [1][2].
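The article describes the concept layer only at a high level, but the general shape of such a bottleneck can be sketched: hidden activations are projected onto a fixed set of named concept directions, and only those labeled scores flow onward, so every downstream decision is expressible as "which concepts fired." The concept names, dimensions, and projection below are invented for illustration and do not reflect Steerling-8B's real design:

```python
# Toy sketch of a concept-bottleneck layer (hypothetical; Steerling-8B's
# actual architecture is unpublished). The hidden state is projected onto
# named concept directions, so the model's intermediate state is readable
# as labeled (concept, activation) pairs rather than anonymous neurons.

CONCEPTS = {
    # concept name -> direction in a toy 3-d hidden space (illustrative)
    "humor":    [1.0, 0.0, 0.0],
    "finance":  [0.0, 1.0, 0.0],
    "medicine": [0.0, 0.0, 1.0],
}

def concept_layer(hidden: list[float]) -> dict[str, float]:
    """Project a hidden state onto each labeled concept direction."""
    return {
        name: sum(h * d for h, d in zip(hidden, direction))
        for name, direction in CONCEPTS.items()
    }

scores = concept_layer([0.2, 0.9, 0.1])
# The output head sees only these named buckets, so each generated token
# can be attributed to the concepts that drove it.
print(max(scores, key=scores.get))  # finance
```

This is why the approach demands extra data annotation up front: the concept directions must be tied to human-labeled categories during training rather than discovered by post-hoc "neuroscience."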

Research Roots at MIT and Emergent Behaviors

Adebayo began this work during his doctoral studies at MIT, co-authoring a widely cited 2018 paper that demonstrated the unreliability of existing methods for understanding deep learning models [1][2]. One concern with engineering interpretability is whether it might eliminate emergent behaviors that make language models valuable. Adebayo says Steerling-8B still exhibits these capabilities, with his team tracking "discovered concepts" that the model generates autonomously without explicit training, such as quantum computing [1][2].

High-Stakes Applications and Control Over LLM Outputs

Guide Labs positions the technology as essential for high-stakes applications requiring strict control over LLM outputs [2]. In consumer-facing AI systems, the architecture enables developers to block copyright infringement and better manage outputs around sensitive subjects like violence or drug abuse [1][2]. For regulated industries, particularly finance, the model can evaluate loan applicants based on financial records while explicitly excluding protected attributes like race, addressing regulatory requirements [1][2]. In scientific research, including protein folding, the model provides insight into why specific predictions succeed, addressing a critical gap in computational biology [1][2].
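The loan-evaluation use case comes down to excluding protected attributes structurally rather than auditing for them afterward. The sketch below is a hypothetical illustration of that idea — the attribute names, weights, and `score_applicant` function are invented, not Guide Labs' API:

```python
# Toy sketch of structural attribute exclusion for regulated scoring
# (hypothetical; illustrates the article's loan example, not a real
# Guide Labs interface). Protected attributes are dropped before any
# score is computed, so they cannot influence the outcome.

PROTECTED = {"race", "gender", "age"}

# Illustrative weights over permitted financial features only.
WEIGHTS = {"income": 0.5, "debt": -0.3, "on_time_payments": 0.4}

def score_applicant(record: dict[str, float]) -> float:
    """Score a loan applicant using only non-protected attributes."""
    allowed = {k: v for k, v in record.items() if k not in PROTECTED}
    return sum(WEIGHTS.get(k, 0.0) * v for k, v in allowed.items())

applicant = {"income": 80.0, "debt": 20.0, "on_time_payments": 12.0, "age": 41}
print(round(score_applicant(applicant), 1))  # 38.8 -- "age" never enters
```

An interpretable model goes further than this filter: because concepts are traceable, a regulator can verify that a protected attribute was not encoded indirectly through correlated features, which a simple input filter cannot guarantee.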

Performance Metrics and Future Plans

Steerling-8B achieves approximately 90% of the capability of existing frontier models while using less training data, thanks to its novel architecture [1][2]. Adebayo argues that "training interpretable models is no longer a sort of science; it's now an engineering problem," suggesting the approach can scale to match frontier models with significantly higher parameter counts [1][2]. The company emerged from Y Combinator and raised $9 million in seed funding led by Initialized Capital in November 2024 [1][2]. Next steps include building a larger model and offering API access and agentic capabilities to users [1][2]. "As we're going after these models that are going to be super intelligent, you don't want something to be making decisions on your behalf that's sort of mysterious to you," Adebayo said [1].
