4 Sources
4 Sources
[1]
AI Agents Are Getting Better. Their Safety Disclosures Aren't
Why? Well, they can plan, write code, browse the web and execute multistep tasks with little to no supervision. Some even promise to manage your workflow. Others coordinate with tools and systems across your desktop. The appeal is obvious. These systems do not just respond. They act -- for you and on your behalf. But when researchers behind the MIT AI Agent Index cataloged 67 deployed agentic systems, they found something unsettling. Developers are eager to describe what their agents can do. They are far less eager to describe whether these agents are safe. "Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement," the researchers wrote in the paper. "However, there is currently no structured framework for documenting ... safety features of agentic systems." That gap shows up clearly in the numbers: Around 70% of the indexed agents provide documentation, and nearly half publish code. But only about 19% disclose a formal safety policy, and fewer than 10% report external safety evaluations. The research underscores that while developers are quick to tout the capabilities and practical application of agentic systems, they are also quick to provide limited information regarding safety and risk. The result is a lopsided kind of transparency. The researchers were deliberate about what made the cut, and not every chatbot qualifies. To be included, a system had to operate with underspecified objectives and pursue goals over time. It also had to take actions that affect an environment with limited human mediation. These are systems that decide on intermediate steps for themselves. They can break a broad instruction into subtasks, use tools, plan, complete and iterate. That autonomy is what makes them powerful. It's also what raises the stakes. When a model simply generates text, its failures are usually contained to that one output. When an AI agent can access files, send emails, make purchases or modify documents, mistakes and exploits can be damaging and propagate across steps. Yet the researchers found that most developers do not publicly detail how they test for those scenarios. The most striking pattern in the study is not hidden deep in a table -- it is repeated throughout the paper. Developers are comfortable sharing demos, benchmarks and the usability of these AI agents, but they are far less consistent about sharing safety evaluations, internal testing procedures or third-party risk audits. That imbalance matters more as agents move from prototypes to digital actors integrated into real workflows. Many of the indexed systems operate in domains like software engineering and computer use -- environments that often involve sensitive data and meaningful control. The MIT AI Agent Index does not claim that agentic AI is unsafe in totality, but it shows that as autonomy increases, structured transparency about safety has not kept pace. The technology is accelerating. The guardrails, at least publicly, remain harder to see.
[2]
AI agents are fast, loose and out of control, MIT study finds
Agentic technology is moving fully into the mainstream of artificial intelligence with the announcement this week that OpenAI has hired Peter Steinberg, the creator of the open-source software framework OpenClaw. The OpenClaw software attracted heavy attention last month not only for its enabling of wild capabilities -- agents that can, for example, send and receive email on your behalf -- but also for its dramatic security flaws, including the ability to completely hijack your personal computer. Also: From Clawdbot to OpenClaw: This viral AI agent is evolving fast - and it's nightmare fuel for security pros Given the fascination with agents and how little is still understood about their pros and cons, it's important that researchers at MIT and collaborating institutions have just published a massive survey of 30 of the most common agentic AI systems. The results make clear that agentic AI is something of a security nightmare at the moment, a discipline marked by lack of disclosure, lack of transparency, and a striking lack of basic protocols about how agents should operate. Also: OpenClaw is a security nightmare - 5 red flags you shouldn't ignore (before it's too late) The biggest revelation of the report is just how hard it is to identify all the things that could go wrong with agentic AI. That is principally the result of a lack of disclosure by developers. "We identify persistent limitations in reporting around ecosystemic and safety-related features of agentic systems," wrote lead author Leon Staufer of the University of Cambridge and collaborators at MIT, University of Washington, Harvard University, Stanford University, University of Pennsylvania, and The Hebrew University of Jerusalem. Across eight different categories of disclosure, the authors pointed out that most agent systems offer no information whatsoever for most categories. The omissions range from a lack of disclosure about potential risks to a lack of disclosure about third-party testing, if any. The 39-page report, "The 2025 AI Index: Documenting Sociotechnical Features of Deployed Agentic AI Systems," which can be downloaded here, is filled with gems about just how little can be tracked, traced, monitored, and controlled in today's agentic AI technology. For example, "For many enterprise agents, it is unclear from information publicly available whether monitoring for individual execution traces exists," meaning there is no clear ability to track exactly what an agentic AI program is doing. Also: AI agents are already causing disasters - and this hidden threat could derail your safe rollout "Twelve out of thirty agents provide no usage monitoring or only notices once users reach the rate limit," the authors noted. That means you can't even keep track of how much agentic AI is consuming of a given compute resource -- a key concern for enterprises that have to budget for this stuff. Most of these agents also do not signal to the real world that they are AI, so there's no way to know if you are dealing with a human or a bot. "Most agents do not disclose their AI nature to end users or third parties by default," they noted. Disclosure, in this case, would include things such as watermarking a generated image file so that it's clear when an image was made via AI, or responding to a website's "robots dot txt" file to identify the agent to the site as an automation rather than a human visitor. Some of these software tools offer no way to stop a given agent from running. Alibaba's MobileAgent, HubSpot's Breeze, IBM's watsonx, and the automations created by Berlin, Germany-based software maker n8n, "lack documented stop options despite autonomous execution," said Staufer and team. "For enterprise platforms, there is sometimes only the option to stop all agents or retract deployment." Finding out that you can't stop something that is doing the wrong thing has got to be one of the worst possible scenarios for a large organization where harmful results outweigh the benefits of automation. The authors expect these issues, issues of transparency and control, to persist with agents and even become more prominent. "The governance challenges documented here (ecosystem fragmentation, web conduct tensions, absence of agent-specific evaluations) will gain importance as agentic capabilities increase," they wrote. Staufer and team also said that they attempted to get feedback from the companies whose software was covered over four weeks. About a quarter of those contacted responded, "but only 3/30 with substantive comments." Those comments were incorporated into the report, the authors wrote. They also have a form provided to the companies for ongoing corrections. Agentic artificial intelligence is a branch of machine learning that has emerged in the past three years to enhance the capabilities of large language models and chatbots. Rather than simply being assigned a single task dictated by a text prompt, agents are AI programs that have been plugged into external resources, such as databases, and that have been granted a measure of "autonomy" to pursue goals beyond the scope of a text-based dialogue. Also: True agentic AI is years away - here's why and how we get there That autonomy can include carrying out several steps in a corporate workflow, such as receiving a purchase order in email, entering it into a database, and consulting an inventory system for availability. Agents have also been used to automate several turns of a customer service interaction in order to replace some of the basic phone or email, or text inquiries a human customer rep would traditionally have handled. The authors selected agentic AI in three categories: chatbots that have extra capabilities, such as Anthropic's Claude Code tool; web browser extensions or dedicated AI browsers, such as OpenAI's Atlas browser; and enterprise software offerings such as Microsoft's Office 365 Copilot. That's just a taste: other studies, they noted, have covered hundreds of agentic technology offerings. (Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) Most agents, however, "rely on a small set of closed-source frontier models," Staufer and team said. OpenAI's GPT, Anthropic's Claude, and Google's Gemini are what most of these agents are built on. The study is not based on testing the agentic tools directly; it is based on "annotating" the documentation provided by developers and vendors. That includes "only public information from documentation, websites, demos, published papers, and governance documents," they said. They did, however, establish user accounts with some of the agentic systems to double-check the actual functioning of the software. The authors offered three anecdotal examples that go into greater depth. A positive example, they wrote, is OpenAI's ChatGPT Agent, which can interface with websites when a user asks in the prompt for it to carry out a web-based task. Agent is positively distinguished as the only one of the agent systems they looked at that provides a means of tracking behavior by "cryptographically signing" the browser requests it makes. By contrast, Perplexity's Comet web browser sounds like a security disaster. The program, Staufer and team found, has "no agent-specific safety evaluations, third-party testing, or benchmark performance disclosures," and, "Perplexity [...] has not documented safety evaluation methodology or results for Comet," adding, "No sandboxing or containment approaches beyond prompt-injection mitigations were documented." Also: Gartner urges businesses to 'block all AI browsers' - what's behind the dire warning The authors noted that Amazon has sued Perplexity, saying that the Comet browser wrongly presents its actions to a server as if it were a human rather than a bot, an example of the lack of identification they discuss. The third example is the Breeze set of agents from enterprise software vendor HubSpot. Those are automations that can interact with systems of record, such as "customer relationship management." The Breeze tools are a mix of good and bad, they found. On the one hand, they are certified for lots of corporate compliance measures, such as SOC2, GDPR, and HIPAA compliance. On the other hand, HubSpot offers nothing when it comes to security testing. It states the Breeze agents were evaluated by third-party security firm PacketLabs, "but provides no methodology, results, or testing entity details." The practice of demonstrating compliance approval but not disclosing real security evaluations is "typical of enterprise platforms," Staufer and team noted. What the report doesn't examine are incidents in the wild, cases where agentic technology actually produced unexpected or undesired behavior that resulted in undesirable outcomes. That means we don't yet know the full impact of the shortcomings the authors identified. One thing is absolutely clear: Agentic AI is a product of development teams making specific choices. These agents are tools created and distributed by humans. As such, the responsibility for documenting the software, for auditing programs for safety concerns, and for providing control measures rests squarely with OpenAI, Anthropic, Google, Perplexity, and other organizations. It's up to them to take the steps to remedy the serious gaps identified or else face regulation down the road.
[3]
AI agents abound, unbound by rules or safety disclosures
MIT CSAIL's 2025 AI Agent Index puts opaque automated systems under the microscope AI agents are becoming more common and more capable, without consensus or standards on how they should behave, say academic researchers. So says MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL), which analyzed 30 AI agents for its 2025 AI Agent Index, which assesses machine learning models that can take action online through their access to software services. AI agents may take the form of chat applications with tools (Manus AI, ChatGPT Agent, Claude Code), browser-based agents (Perplexity Comet, ChatGPT Atlas, ByteDance Agent TARS), or enterprise workflow agents (Microsoft Copilot Studio, ServiceNow Agent). The paper accompanying the AI Agent Index observes that despite growing interest and investment in AI agents, "key aspects of their real-world development and deployment remain opaque, with little information made publicly available to researchers or policymakers." The AI community frenzy around open source agent platform OpenClaw, and its accompanying agent interaction network Moltbook - plus ongoing frustration with AI-generated code submissions to open source projects - underscores the consequences of letting agents loose without behavioral rules. In the paper, the authors note that the tendency of AI agents to ignore the Robot Exclusion Protocol - which uses robots.txt files to signal no consent to scraping websites - suggests that established web protocols may no longer be sufficient to stop agents. It's a timely topic. Anthropic, one of the main providers of AI agents, on Wednesday published its own analysis of AI agent autonomy, focused more on how agents are used than the consequences of their use. "AI agents are here, and already they're being deployed across contexts that vary widely in consequence, from email triage to cyber espionage," the company said. "Understanding this spectrum is critical for deploying AI safely, yet we know surprisingly little about how people actually use agents in the real world." According to consultancy McKinsey, AI agents have the potential to add $2.9 trillion to the US economy by 2030 - assuming the vast capital expenditures by OpenAI and other tech firms haven't derailed the hype train. We note that enterprises aren't yet seeing much of a return on their AI investments. And researchers last year found AI agents could only complete about a third of multi-step office tasks. But AI models have improved since then. MIT CSAIL's 2025 AI Agent Index covers 30 AI agents. It is smaller than its 2024 predecessor, which looked at 67 agentic systems. The authors say the 2025 edition goes into greater depth, analyzing agents across six categories: legal, technical capabilities, autonomy & control, ecosystem interaction, evaluation, and safety. The AI Agent Index site makes this information available for every listed agent, each with 45 annotation fields. According to the researchers, 24 of the 30 agents studied were released or received major feature updates during the 2024-2025 period. But the developers of agents talk more about product features than about safety practices. "Of the 13 agents exhibiting frontier levels of autonomy, only four disclose any agentic safety evaluations (ChatGPT Agent, OpenAI Codex, Claude Code, Gemini 2.5 Computer Use)," according to the researchers. Developers of 25 of the 30 agents covered provide no details about safety testing and 23 offer no third-party testing data. To complicate matters, most agents rely on a handful of foundation models - the majority are harnesses or wrappers for models made by Anthropic, Google, and OpenAI, supported by scaffolding and orchestration layers. The result is a series of dependencies that are difficult to evaluate because no single entity is responsible, the MIT boffins say. Delaware-incorporated companies created 13 of the agents evaluated by the authors. Five come from China-incorporated organizations, and four come have non-US, non-China origins: specifically Germany (SAP, n8n), Norway (Opera), and Cayman Islands (Manus). Among the five Chinese-incorporated agent makers, one has a published safety framework and one has a compliance standard. For agents originating outside of China, 15 point to safety frameworks like Anthropic's Responsible Scaling Policy, OpenAI's Preparedness Framework, or Microsoft's Responsible AI Standard. The other ten lack safety framework documentation. Enterprise assurance standards are more common, with only five of 30 agents having no compliance standards documented. Twenty-three of the evaluated agents are closed-source. Developers of seven agents open-sourced their agent framework or harness - Alibaba MobileAgent, Browser Use, ByteDance Agent TARS, Google Gemini CLI, n8n Agents, OpenAI Codex, and WRITER. All told, the Index found agent makers reveal too little safety information, and that a handful of companies dominate the market. Other major findings include the difficulty of analyzing agents given their layers of dependencies, and that agents aren't necessarily welcome at every website. The paper lists the following authors: Leon Staufer (University of Cambridge), Kevin Feng (University of Washington), Kevin Wei (Harvard Law School), Luke Bailey (Stanford University), Yawen Duan (Concordia AI), Mick Yang (University of Pennsylvania), A. Pinar Ozisik (MIT), Stephen Casper (MIT), and Noam Kolt (Hebrew University of Jerusalem). ®
[4]
New Research Shows AI Agents Are Running Wild Online, With Few Guardrails in Place
In the last year, AI agents have become all the rage. OpenAI, Google, and Anthropic all launched public-facing agents designed to take on multi-step tasks handed to them by humans. In the last month, an open-source AI agent called OpenClaw took the web by storm thanks to its impressive autonomous capabilities (and major security concerns). But we don't really have a sense of the scale of AI agent operations, and whether all the talk is matched by actual deployment. The MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) set out to fix that with its recently published 2025 AI Agent Index, which provides our first real look at the scale and operations of AI agents in the wild. Researchers found that interest in AI agents has undoubtedly skyrocketed in the last year or so. Research papers mentioning "AI Agent" or "Agentic AI" in 2025 more than doubled the total from 2020 to 2024 combined, and a McKinsey survey found that 62% of companies reported that their organizations were at least experimenting with AI agents. With all that interest, the researchers focused on 30 prominent AI agents across three separate categories: chat-based options like ChatGPT Agent and Claude Code; browser-based bots like Perplexity Comet and ChatGPT Atlas; and enterprise options like Microsoft 365 Copilot and ServiceNow Agent. While the researchers didn't provide exact figures on just how many AI agents are deployed across the web, they did offer a considerable amount of insight into how they are operating, which is largely without a safety net. Just half of the 30 AI agents that got put under the magnifying glass by MIT CSAIL include published safety or trust frameworks, like Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, or Microsoft’s Responsible AI Standard. One in three agents has no safety framework documentation whatsoever, and five out of 30 have no compliance standards. That is troubling when you consider that 13 of 30 systems reviewed exhibit frontier levels of agency, meaning they can operate largely without human oversight across extended task sequences. Browser agents in particular tend to operate with significantly higher autonomy. This would include things like Google's recently launched AI "Autobrowse," which can complete multi-step tasks by navigating different websites and making use of user information to do things like log into sites on your behalf. One of the troubles with letting agents browse freely and with few guardrails is that their activity is nearly indistinguishable from human behavior, and they do little to dispel any confusion that might occur. The researchers found that 21 out of the 30 agents provide no disclosure to end users or third parties that they are AI agents and not human users. This results in most AI agent activity being mistaken for human traffic. MIT found that just seven agents published stable User-Agent (UA) strings and IP address ranges for verification. Nearly as many explicitly use Chrome-like UA strings and residential/local IP contexts to make their traffic requests appear more human, making it next to impossible for a website to distinguish between authentic traffic and bot behavior. For some AI agents, that's actually a marketable feature. The researchers found that BrowserUse, an open-source AI agent, sells itself to users by claiming to bypass anti-bot systems to browse "like a human." More than half of all the bots tested provide no specific documentation about how they handle robots.txt files (text files that are placed in a website's root directory to instruct web crawlers on how they can interact with the site), CAPTCHAs that are meant to authenticate human traffic, or site APIs. Perplexity has even made the case that agents acting on behalf of users shouldn't be subject to scraping restrictions since they function "just like a human assistant." The fact that these agents are out in the wild without much protection in place means there is a real threat of exploits. There is a lack of standardization for safety evaluations and disclosures, leaving many agents potentially vulnerable to attacks like prompt injections, in which an AI agent picks up on a hidden malicious prompt that can make it break its safety protocols. Per MIT, nine of 30 agents have no documentation of guardrails against potentially harmful actions. Nearly all of the agents fail to disclose internal safety testing results, and 23 of the 30 offer no third-party testing information on safety. Just four agentsâ€"ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5â€"provided agent-specific system cards, meaning the safety evaluations were tailored to how the agent actually operates, not just the underlying model. But frontier labs like OpenAI and Google offer more documentation on "existential and behavioral alignment risks," they lack details on the type of security vulnerabilities that may arise during day-to-day activitiesâ€"a habit that the researchers refer to as "safety washing," which they describe as publishing high-level safety and ethics frameworks while only selectively disclosing the empirical evidence required to rigorously assess risk. There has at least been some momentum toward addressing the concerns raised by MIT's researchers. Back in December, OpenAI and Anthropic (among others) joined forces, announcing a foundation to create a development standard for AI agents. But the AI Agent Index shows just how wide the transparency gap is when it comes to agentic AI operation. AI agents are flooding the web and workplace, functioning with a shocking amount of autonomy and minimal oversight. There's little to indicate at the moment that safety will catch up to scale any time soon.
Share
Share
Copy Link
MIT's 2025 AI Agent Index analyzed 30 prominent AI agents and found alarming gaps in safety documentation. While 70% provide technical specs, only 19% disclose formal safety policies and fewer than 10% report external safety evaluations. The study reveals that as AI agents gain autonomy to browse the web, send emails, and execute complex tasks, developers remain reluctant to detail how they test for risks and exploits.
AI agents have rapidly evolved from simple chatbots into autonomous systems capable of planning, executing multistep tasks, and acting on behalf of users with minimal human oversight. Yet according to MIT's Computer Science & Artificial Intelligence Laboratory (MIT CSAIL), the infrastructure surrounding AI safety has failed to keep pace with this technological acceleration. The 2025 AI Agent Index, which analyzed 30 prominent agentic AI systems, reveals a troubling pattern: developers eagerly showcase capabilities while providing limited information about safety protocols and risk management
1
.
Source: CNET
The research examined systems across three categories: chat-based agents like ChatGPT Agent and Claude Code, browser-based agents including Perplexity Comet and ChatGPT Atlas, and enterprise workflow agents such as Microsoft 365 Copilot and ServiceNow Agent
4
. What researchers discovered was a striking imbalance. Around 70% of indexed agents provide documentation about their technical capabilities, and nearly half publish code. However, only approximately 19% disclose a formal safety policy, and fewer than 10% report external safety evaluations1
. This lack of transparency creates significant governance and security challenges as these systems integrate into real-world workflows.The defining characteristic of AI agents is their autonomy. Unlike traditional models that simply generate text responses, these systems can access files, send emails, make purchases, modify documents, and break broad instructions into subtasks without constant human oversight
1
. Of the 30 agents studied, 13 exhibit frontier levels of autonomy, meaning they can operate largely without human intervention across extended task sequences4
. Browser agents in particular demonstrate significantly higher autonomy, with capabilities like Google's recently launched AI "Autobrowse" completing multistep tasks by navigating different websites and using user information to log into sites4
.
Source: Gizmodo
This operational freedom amplifies potential consequences. When mistakes or exploits occur, they can propagate across multiple steps and systems. Yet the MIT AI Agent Index found that 25 of the 30 agents covered provide no details about safety testing, and 23 offer no third-party testing data
3
. Nine agents have no documentation of guardrails against potentially harmful actions4
. Some systems, including Alibaba's MobileAgent, HubSpot's Breeze, IBM's watsonx, and n8n automations, "lack documented stop options despite autonomous execution," meaning organizations may be unable to halt agents performing harmful actions2
.The research reveals persistent limitations in how developers communicate about their agentic AI systems. Lead author Leon Staufer of the University of Cambridge and collaborators from MIT, University of Washington, Harvard University, Stanford University, University of Pennsylvania, and The Hebrew University of Jerusalem identified gaps across eight different categories of disclosure
2
. The omissions range from lack of disclosure about potential risks to absence of information about third-party testing and risk audits.Just four agents—ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5—provided agent-specific system cards with safety evaluations tailored to how the agent actually operates, not just the underlying foundation models
4
. Half of the 30 AI agents include published safety frameworks like Anthropic's Responsible Scaling Policy, OpenAI's Preparedness Framework, or Microsoft's Responsible AI Standard, but one in three agents has no safety framework documentation whatsoever4
. Five out of 30 have no compliance standards documented3
.The opacity extends to operational monitoring. "For many enterprise agents, it is unclear from information publicly available whether monitoring for individual execution traces exists," the researchers noted
2
. Twelve out of 30 agents provide no usage monitoring or only notices once users reach rate limits, making it impossible to track resource consumption—a critical concern for enterprises managing budgets2
.Another dimension of the lack of oversight involves how AI agents present themselves online. The MIT AI Agent Index found that 21 out of 30 agents provide no disclosure to end users or third parties that they are AI agents rather than human users
4
. Most AI agent activity is mistaken for human traffic, with just seven agents publishing stable User-Agent strings and IP address ranges for verification4
. Nearly as many explicitly use Chrome-like User-Agent strings and residential or local IP contexts to make their traffic requests appear more human, making it nearly impossible for websites to distinguish between authentic traffic and bot behavior4
.
Source: The Register
For some developers, this invisibility is a feature rather than a bug. BrowserUse, an open-source AI agent, markets itself by claiming to bypass anti-bot systems to browse "like a human"
4
. More than half of all agents tested provide no specific documentation about how they handle robots.txt files, CAPTCHAs meant to authenticate human traffic, or site APIs4
. The tendency of AI agents to ignore the Robot Exclusion Protocol suggests established web protocols may no longer suffice to control agent behavior3
.Related Stories
The absence of standardized safety evaluations leaves many agents vulnerable to exploits like prompt injections, where hidden malicious prompts cause agents to break safety protocols
4
. The security concerns gained widespread attention when OpenClaw, an open-source agent framework, attracted notice not only for enabling agents to send and receive email autonomously but also for dramatic security flaws including the ability to completely hijack personal computers2
. OpenAI's subsequent hiring of OpenClaw creator Peter Steinberg highlighted how agentic technology is moving into the mainstream despite unresolved vulnerabilities2
.While frontier labs like OpenAI and Google offer more documentation on existential and behavioral alignment risks, they lack details on security vulnerabilities that may arise during day-to-day activities
4
. Nearly all agents fail to disclose internal safety testing results and public safety evaluations remain rare4
. This gap becomes more consequential as agents operate in domains involving sensitive data and meaningful control, particularly in software engineering and computer use environments1
.The research also reveals that most agents function as harnesses or wrappers for foundation models made by Anthropic, Google, and OpenAI, supported by scaffolding and orchestration layers
3
. This creates complex dependencies difficult to evaluate because no single entity bears full responsibility3
. Delaware-incorporated companies created 13 of the evaluated agents, five come from China-incorporated organizations, and four have non-US, non-China origins including Germany, Norway, and Cayman Islands3
. Twenty-three of the evaluated agents are closed-source, while seven open-sourced their agent framework or harness3
.Research papers mentioning "AI Agent" or "Agentic AI" in 2025 more than doubled the total from 2020 to 2024 combined, and a McKinsey survey found 62% of companies reported their organizations were at least experimenting with AI agents
4
. According to consultancy McKinsey, AI agents have the potential to add $2.9 trillion to the US economy by 20303
. Yet researchers expect governance challenges documented in the index—ecosystem fragmentation, web conduct tensions, and absence of agent-specific evaluations—will gain importance as agentic capabilities increase2
. The MIT researchers attempted to get feedback from companies whose software was covered over four weeks, with about a quarter responding but only three out of 30 providing substantive comments2
. The technology accelerates while regulations and structured transparency about AI safety remain harder to see1
.Summarized by
Navi
[3]
12 Feb 2026•Technology

28 Aug 2025•Technology

11 Nov 2025•Technology

1
Policy and Regulation

2
Technology

3
Technology
