2 Sources
[1]
Guide Labs debuts a new kind of interpretable LLM | TechCrunch
The challenge of wrangling a deep learning model is often understanding why it does what it does: whether it's xAI's repeated struggle sessions to fine-tune Grok's odd politics, ChatGPT's struggles with sycophancy, or run-of-the-mill hallucinations, plumbing through a neural network with billions of parameters isn't easy. Guide Labs, a San Francisco startup founded by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, is offering an answer to that problem: on Monday, the company open-sourced an 8-billion-parameter LLM, Steerling-8B, trained with a new architecture designed to make its actions easily interpretable. Every token produced by the model can be traced back to its origins in the LLM's training data. That can be as simple as determining the reference materials for facts cited by the model, or as complex as probing how the model encodes humor or gender. "If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off," Adebayo told TechCrunch. "You can do it with current models, but it's very fragile ... It's sort of one of the holy grail questions." Adebayo began this work while earning his PhD at MIT, co-authoring a widely cited 2018 paper that showed existing methods of understanding deep learning models were not reliable. That work ultimately led to a new way of building LLMs: developers insert a concept layer into the model that buckets data into traceable categories. This requires more up-front data annotation, but by using other AI models to help, the team was able to train this model as its largest proof of concept yet. "The kind of interpretability people do is...neuroscience on a model, and we flip that," Adebayo said.
"What we do is actually engineer the model from the ground up so that you don't need to do neuroscience." One concern with this approach is that it might eliminate some of the emergent behaviors that make LLMs so intriguing: their ability to generalize in new ways about things they haven't been trained on. Adebayo says that still happens in his company's model: his team tracks what it calls "discovered concepts," capabilities the model picked up on its own, like quantum computing. Adebayo argues this interpretable architecture will be something everyone needs. For consumer-facing LLMs, these techniques should allow model builders to block the use of copyrighted materials, or better control outputs around subjects like violence or drug abuse. Regulated industries will require more controllable LLMs; in finance, for example, a model evaluating loan applicants needs to consider financial records but not race. There's also a need for interpretability in scientific work, another area where Guide Labs has developed technology. Protein folding has been a big success for deep learning models, but scientists need more insight into why their software figured out successful combinations. "What this model demonstrates is that training interpretable models is no longer a sort of science; it's now an engineering problem," Adebayo said. "We figured out the science and we can scale them, and there is no reason why this kind of model wouldn't match the performance of the frontier-level models," which have many more parameters. Guide Labs says that Steerling-8B achieves 90% of the capability of existing models while using less training data, thanks to its novel architecture. The next step for the company, which emerged from Y Combinator and raised a $9 million seed round from Initialized Capital in November 2024, is to build a larger model and begin offering API and agentic access to users.
"The way we're currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for us as a human race," Adebayo told TechCrunch. "As we're going after these models that are going to be super intelligent, you don't want something to be making decisions on your behalf that's sort of mysterious to you."
[2]
New Steerling-8B model can trace every single word back to its training source
Guide Labs, a San Francisco-based startup, announced the open sourcing of Steerling-8B, an 8-billion-parameter large language model. Co-founded by CEO Julius Adebayo and Chief Science Officer Aya Abdelsalam Ismail, the company introduced the model on Monday. The architecture enables full traceability of generated tokens back to their specific origins within the training data, addressing the opacity common in deep learning systems. This capability allows users to verify cited facts and analyze how the model encodes abstract concepts. The architecture of Steerling-8B fundamentally alters the standard transformer structure by inserting a "concept layer." This layer functions by categorizing data into traceable buckets during the training process. Unlike traditional models that treat interpretability as a post-hoc analysis task, Guide Labs engineers interpretability directly into the model's foundation. Adebayo refers to the alternative method of analyzing neural networks as "neuroscience on a model," contrasting it with his team's approach of building the model from the ground up for transparency. This structural change requires significant up-front data annotation, a process the startup facilitated by utilizing other AI models to assist in labeling. Julius Adebayo originated the research underlying Steerling-8B during his doctoral studies at the Massachusetts Institute of Technology. He co-authored a widely cited paper in 2018 that demonstrated the unreliability of existing methods for understanding deep learning models. That research identified critical gaps in how developers could probe and verify the behavior of neural networks. The findings laid the groundwork for the architecture used in Steerling-8B, shifting the focus from interpreting black-box models to engineering systems where internal states are accessible by design. Despite the structural constraints imposed by the concept layer, Steerling-8B retains the ability to exhibit emergent behaviors. 
The development team tracks what they define as "discovered concepts," which are capabilities the model generates autonomously without explicit training. One specific example identified by the team involves the model's understanding of quantum computing. This suggests that the architecture does not prevent the model from generalizing to new domains, a key characteristic of advanced large language models. Steerling-8B achieves approximately 90% of the capability of existing frontier models while utilizing less training data. The efficiency gains are attributed to the novel architecture, which reduces the data requirements typically associated with training high-performing LLMs. This performance ratio positions the model competitively against larger, more resource-intensive counterparts currently available in the market. Guide Labs positions the architecture as a solution for high-stakes applications requiring strict control over model outputs. Julius Adebayo outlined several use cases where traceability is essential. In consumer applications, the technology can prevent the use of copyrighted materials and control outputs regarding sensitive subjects such as violence or drug abuse. For regulated industries, specifically finance, the model can evaluate loan applicants based on financial records while explicitly excluding protected attributes like race. In scientific research, such as protein folding, the model provides insight into the reasoning behind specific protein structure predictions, addressing the "black box" problem in computational biology. Adebayo argues that training interpretable models has transitioned from a scientific challenge to an engineering discipline. He stated that the team has resolved the foundational science required for transparency and is now focused on scalability. The goal is to match the performance of frontier models with significantly higher parameter counts while maintaining the interpretability benefits of Steerling-8B. 
The company asserts that democratizing this level of transparency is a long-term necessity as AI systems become more autonomous. Guide Labs emerged from the Y Combinator accelerator and secured $9 million in a seed funding round. The investment was led by Initialized Capital and closed in November 2024. The capital is intended to support the expansion of the company's research and development efforts. The company's immediate roadmap includes building a larger model based on the Steerling architecture. Future plans also involve offering API access and agentic capabilities to external users. These steps aim to transition the technology from a research prototype to a widely accessible tool for developers and enterprises.
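The concept-layer design described above can be illustrated with a toy sketch. This is a hypothetical miniature, not Guide Labs' actual architecture: the concept names, dimensions, weights, and function names below are all invented for illustration. The idea is that the model's opaque hidden state is forced through a small set of named concepts, so the output can be attributed to human-readable activations rather than billions of raw weights.

```python
import math

# Invented concept vocabulary -- stand-ins for the "traceable buckets"
# the article describes; not Guide Labs' real concept set.
CONCEPTS = ["finance", "humor", "gender", "quantum computing"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def concept_layer(hidden, weights):
    """Project an opaque hidden vector onto named concept activations."""
    scores = []
    for row in weights:  # one weight row per named concept
        s = sum(h * w for h, w in zip(hidden, row))
        scores.append(sigmoid(s))
    return dict(zip(CONCEPTS, scores))

def predict(hidden, weights, head):
    """The output depends ONLY on concept activations, so it is traceable."""
    acts = concept_layer(hidden, weights)
    logit = sum(acts[c] * head[c] for c in CONCEPTS)
    return logit, acts

# Toy numbers: a 3-dim hidden state, 4 concepts, a linear output head.
hidden = [0.5, -1.2, 0.3]
weights = [[0.9, 0.1, 0.0], [0.0, -0.5, 0.2],
           [0.4, 0.4, -0.1], [0.0, 0.0, 1.0]]
head = {"finance": 2.0, "humor": 0.1, "gender": 0.0, "quantum computing": 0.5}

logit, acts = predict(hidden, weights, head)
# Attribution is direct: rank concepts by their contribution to the output.
top = max(acts, key=lambda c: acts[c] * abs(head[c]))
print(top)
```

Because the output head sees only named concept activations, attribution reduces to reading off which concepts fired, rather than doing post-hoc "neuroscience" on opaque weights.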
San Francisco startup Guide Labs has open-sourced Steerling-8B, an 8-billion-parameter language model built with a novel architecture that makes every generated token traceable to its training data origins. Founded by Julius Adebayo and Aya Abdelsalam Ismail, the company aims to solve AI's black box problem by engineering interpretability directly into the model rather than analyzing it post-hoc.
Guide Labs, a San Francisco startup founded by CEO Julius Adebayo and Chief Science Officer Aya Abdelsalam Ismail, has open-sourced Steerling-8B, an interpretable LLM that fundamentally changes how developers understand what their AI systems are doing [1]. The 8-billion-parameter model was released on Monday with a novel architecture designed to address one of AI's most persistent challenges: understanding why deep learning models make the decisions they do. From xAI's struggles to fine-tune Grok's political leanings to ChatGPT's issues with hallucinations, the black box problem has plagued AI developers for years [1]. The breakthrough lies in Steerling-8B's ability to trace every token produced by the model back to its origins in the training data [1]. This capability ranges from simple tasks like verifying reference materials for cited facts to complex analyses of how the model encodes abstract concepts like humor or gender [1][2].
Source: TechCrunch
Adebayo explained the complexity: "If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off" [1]. While this is technically possible with current models, the process remains fragile and unreliable. The architecture fundamentally alters the standard transformer structure by inserting a concept layer that categorizes data into traceable buckets during training [2]. "The kind of interpretability people do is...neuroscience on a model, and we flip that," Adebayo told TechCrunch. "What we do is actually engineer the model from the ground up so that you don't need to do neuroscience" [1]. This approach requires more upfront data annotation, but Guide Labs used other AI systems to assist in the labeling process [1][2]. Adebayo began this work during his doctoral studies at MIT, co-authoring a widely cited 2018 paper that demonstrated the unreliability of existing methods for understanding deep learning models [1][2]. One concern with engineering interpretability is whether it might eliminate emergent behaviors that make language models valuable. Adebayo says Steerling-8B still exhibits these capabilities, with his team tracking "discovered concepts" that the model generates autonomously without explicit training, such as quantum computing [1][2].
Guide Labs positions the technology as essential for high-stakes applications requiring strict control over LLM outputs [2]. In consumer-facing AI systems, the architecture enables developers to block copyright infringement and better manage outputs around sensitive subjects like violence or drug abuse [1][2]. For regulated industries, particularly finance, the model can evaluate loan applicants based on financial records while explicitly excluding protected attributes like race, addressing regulatory requirements [1][2]. In scientific research, including protein folding, the model provides insight into why specific predictions succeed, addressing a critical gap in computational biology [1][2]. Steerling-8B achieves approximately 90% of the capability of existing frontier models while using less training data, thanks to its novel architecture [1][2]. Adebayo argues that "training interpretable models is no longer a sort of science; it's now an engineering problem," suggesting the approach can scale to match frontier models with significantly higher parameter counts [1][2]. The company emerged from Y Combinator and raised $9 million in seed funding led by Initialized Capital in November 2024 [1][2]. Next steps include building a larger model and offering API access and agentic capabilities to users [1][2]. "As we're going after these models that are going to be super intelligent, you don't want something to be making decisions on your behalf that's sort of mysterious to you," Adebayo said [1].

Summarized by Navi
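The loan-evaluation use case amounts to masking a concept at inference time: because a protected attribute surfaces as one named activation, it can be zeroed before the output head runs. The sketch below is a hypothetical illustration of that idea, assuming a concept-bottleneck head like the one the articles describe; all concept names, weights, and the masking API are invented.

```python
# Invented concept set for a toy loan-scoring head; not a real model.
CONCEPTS = ["income_stability", "credit_history", "race", "debt_ratio"]

def score_applicant(activations, head, masked=()):
    """Linear score over concept activations, with masked concepts zeroed."""
    return sum(0.0 if c in masked else activations[c] * head[c]
               for c in CONCEPTS)

# Toy activations (from a hypothetical concept layer) and head weights.
activations = {"income_stability": 0.8, "credit_history": 0.6,
               "race": 0.9, "debt_ratio": 0.3}
head = {"income_stability": 1.5, "credit_history": 2.0,
        "race": 0.7, "debt_ratio": -1.0}

raw = score_applicant(activations, head)
fair = score_applicant(activations, head, masked={"race"})
# The difference is exactly the masked concept's contribution (0.9 * 0.7).
print(raw - fair)
```

Masking is reliable here precisely because "race" exists as a single named activation rather than being diffusely encoded across a billion parameters, which is the fragility Adebayo describes in current models.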