3 Sources
[1]
AI companies keep publishing private API keys to GitHub
Security biz Wiz says 65% of top AI businesses leak keys and tokens

Leading AI companies turn out to be no better at keeping secrets than anyone else writing code. Cloud security firm Wiz has found that 65 percent of the Forbes AI 50 "had leaked verified secrets on GitHub," minus a few with no presence on the code sharing site.

"Some of these leaks could have exposed organizational structures, training data, or even private models," said Wiz threat researchers Shay Berkovich and Rami McCarthy in a blog post.

The secrets consist of API keys, tokens, and other digital credentials that are supposed to be kept out of code commits to git repos. But as the security biz noted last month, developers of VS Code extensions keep making their secrets known, a problem that McCarthy has attributed in part to vibe coding.

Secret leakage is a longstanding problem. Back in 2017, security researcher Dylan Ayrey published a tool called TruffleHog to find secrets inadvertently uploaded into code repos. But awareness of the problem has failed to eliminate it. In 2020, as we noted, AWS keys kept leaking due to configuration errors. In 2023, the Python Package Index (PyPI) was found to contain many packages with AWS API keys. There are many other examples. A recent source of API keys has been LLMs - they can capture exposed API keys in training data and can be convinced to disgorge those keys with the right coaxing.

Wiz, which sells secret scanning as a service, claims that its approach covers more ground than traditional repo scanning tools. "Our deep scan includes full commit history, commit history on forks, deleted forks, workflow logs and gists (which can also have forks!)," explained Berkovich and McCarthy. Self-serving though that may be, Google has agreed to buy Wiz for $32 billion in cash, so perhaps there's something there.

"Exposed secrets are usually a symptom of broader challenges, like limited visibility, fragmented ownership, or missing automated checks in the development pipeline," said Berkovich in an email to The Register. "In the cloud, everything moves fast and without strong guardrails, even mature teams can miss high-impact risks."

The most common sources of secret leakage, when Wiz initially looked at this issue, were Jupyter Notebook files (.ipynb), Python files (.py), and environment files (.env). These consisted mainly of keys and tokens from Hugging Face, Azure OpenAI, and Weights & Biases.

"Hugging Face tokens are notorious for allowing access to private AI models," said Berkovich. "The leaked Hugging Face token belonging to an AI 50 company could have exposed access to ~1,000 private models, allowing an attacker to download or inspect proprietary IP."

Berkovich added that the Weights & Biases API keys belonged to the same company and could have granted access to sensitive training data behind private models, such as confidential business data.

Wiz has chosen not to name and shame the firms spilling their sensitive keys across GitHub, other than ElevenLabs and LangChain. The ElevenLabs API key was spotted in a plaintext mcp.json file, which Berkovich and McCarthy say "speaks to the relationship between vibe coding and secrets leakage" that they noted previously.

"Advances in AI development result in new use cases and possibilities of secret leaks (ipynb files, vibe coding, gaps in coverage of new AI-specific secret types)," said Berkovich. "That's why our working hypothesis was that any AI company with a big enough GitHub footprint has exposed secrets. This was confirmed by the high proportion (65 percent) of AI innovators with exposed secrets."

According to Wiz, ElevenLabs and LangChain responded promptly when alerted to the exposed secrets. But almost half of the security disclosures either couldn't be delivered or received no response. The first step toward solving your secret exposure problem is admitting that you have a problem. ®
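To make the deep-scan idea concrete, here is a minimal sketch of the kind of pattern matching secret scanners such as TruffleHog perform: walking every blob in a repository's full commit history and flagging strings that match known credential formats. The key prefixes shown are common public formats, but the regexes, helper names, and single-repository scope are simplifying assumptions for illustration; Wiz's deep scan additionally covers forks, deleted forks, workflow logs, and gists.

```python
# secret_scan_sketch.py - illustrative only; real scanners (TruffleHog, Wiz, etc.)
# use far richer rules, entropy checks, and live verification of candidate keys.
import re
import subprocess

# Simplified patterns for a few well-known key formats (assumptions, not exhaustive).
PATTERNS = {
    "Hugging Face token": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Generic key assignment": re.compile(r"(?i)\b(api[_-]?key|token)\s*[=:]\s*['\"][^'\"]{16,}['\"]"),
}

def all_blobs(repo="."):
    """Yield (commit, path, blob_sha) for every file in every commit reachable from any ref."""
    commits = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    for commit in commits:
        tree = subprocess.run(
            ["git", "-C", repo, "ls-tree", "-r", commit],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in tree.splitlines():
            meta, path = line.split("\t", 1)
            yield commit, path, meta.split()[2]

def scan_repo(repo="."):
    seen = set()
    for commit, path, blob_sha in all_blobs(repo):
        if blob_sha in seen:  # each blob only needs scanning once
            continue
        seen.add(blob_sha)
        content = subprocess.run(
            ["git", "-C", repo, "cat-file", "blob", blob_sha],
            capture_output=True, text=True, errors="replace",
        ).stdout
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(content):
                print(f"{name} in {path} (commit {commit[:10]}): {match.group(0)[:12]}…")

if __name__ == "__main__":
    scan_repo(".")
```

Because the walk covers every commit reachable from any ref, a key that was "removed" in a later commit still shows up, which is exactly why rotation, not deletion, is the fix once a secret has been pushed.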
[2]
Leading AI companies keep leaking their own information on GitHub
Wiz used a 'Depth, Perimeter, and Coverage' approach to spot leaks

AI companies have had a pretty rocky history with cybersecurity and data privacy, and new research from Wiz shows this still hasn't improved. Looking at the Forbes top 50 leading AI companies as a benchmark, the researchers found that nearly two-thirds (65%) of these firms were leaking verified secrets on GitHub. These tokens, sensitive credentials, and API keys were found buried deep in places most researchers and scanners would never encounter, like deleted forks, developer repos, and gists.

Wiz says it used a 'Depth, Perimeter, and Coverage' framework to approach these GitHub repositories, enabling it to search new sources and go further than the 'secrets on the surface' for a deep scan that uncovers more than traditional searches. The 'Perimeter' aspect of the research entailed expanding discovery to contributors and organization members, who can often 'inadvertently check company-related secrets into their own public repositories and gists.' 'Coverage' relates to new secret types often missed by traditional scanners, like Tavily, LangChain, Cohere, or Pinecone.

Interestingly, when the researchers disclosed these leaks to the affected companies, almost half of the notifications either failed to reach them, received no response due to the lack of an official notification channel, or went unanswered and unresolved.

The researchers recommend deploying secret scanning immediately as a non-negotiable defense, no matter what size your organization is. They also recommend prioritizing detection for your own secret types: 'too many shops leak their own API keys while "eating their dogfood." If your secret format is new, proactively engage vendors and the open source community to add support.' Finally, they advise that companies prepare a dedicated channel for disclosure. A disclosure protocol is an essential security measure that can give your company a head start on any vulnerabilities or leaks, so these channels can be a vital information-sharing source.
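The "detect your own secret types" advice can be enforced locally before anything reaches GitHub, for example with a pre-commit hook. The sketch below is illustrative only: the acme_live_ prefix is a hypothetical key format standing in for whatever format your own tokens use, and the blanket block on staged .env files mirrors the leak sources highlighted in the research.

```python
#!/usr/bin/env python3
# pre_commit_secret_check.py - illustrative sketch of a repository-local guard
# against committing your own secret format. "acme_live_" is a hypothetical
# prefix; substitute the real format your tokens use.
import re
import subprocess
import sys

OWN_KEY = re.compile(r"\bacme_live_[0-9a-f]{32}\b")  # hypothetical in-house key format

def staged_files():
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p]

def main():
    findings = []
    for path in staged_files():
        if path.endswith(".env"):
            findings.append(f"{path}: .env files should not be committed at all")
            continue
        staged = subprocess.run(           # read the staged version, not the working copy
            ["git", "show", f":{path}"],
            capture_output=True, text=True, errors="replace",
        ).stdout
        if OWN_KEY.search(staged):
            findings.append(f"{path}: string matching our own API key format")
    if findings:
        print("Commit blocked, possible secrets staged:")
        for finding in findings:
            print("  -", finding)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Saved as .git/hooks/pre-commit (or wired in through a hook manager) and made executable, the check fails the commit locally whenever a staged file matches, which is cheaper than cleaning git history after the fact.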
[3]
Perplexity, Anthropic and Others Might Have Leaked AI Secrets on GitHub
The report found leaked secrets from a company with zero public repositories

Perplexity, Anthropic, and other leading artificial intelligence (AI) companies might have exposed sensitive data on GitHub, claims a cloud security firm. As per the firm's report, at least 65 percent of the leading AI companies have exposure risk around their proprietary AI models, datasets, and training processes. Some of the exposed data includes application programming interface (API) keys, tokens, and sensitive credentials, the report claimed. The researchers also highlighted the need for AI companies to use more advanced scanners that can alert them to such exposure.

GitHub Contains AI Secrets of Major AI Firms, Claims Research

According to the cloud security platform Wiz, 65 percent of the AI companies on Forbes' AI 50 list have their AI secrets exposed on GitHub. The list includes companies such as Anthropic, Mistral, Cohere, Midjourney, Perplexity, Suno, and World Labs, although the researchers did not attribute any leak to a particular company. The sensitive data leaks onto GitHub because these companies' developers use the platform to write code and create repositories, which can inadvertently contain API keys, dataset details, and other information that reveals critical details about proprietary AI models. The risk increases with a larger GitHub footprint, although the researchers found one instance where data was leaked by a company with no public repositories at all.

To test whether these AI companies have any exposure risk, Wiz's team first identified company employees by scanning the followers of an organisation on LinkedIn, accounts referencing the organisation's name in their GitHub metadata, and code contributors, and by correlating that information across Hugging Face and other platforms. After identifying the accounts, the researchers performed an extensive scan across three parameters: depth, coverage, and perimeter. Depth, or searching for new sources, lets the researchers scan the accounts' full commit history, commit history on forks, deleted forks, workflow logs, and gists. The researchers also found that employees sometimes add sensitive data to their own public repositories and gists. Some of the leaked data surfaced by the team includes Weights & Biases and Google API keys, Hugging Face and ElevenLabs credentials, and more.
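As a rough illustration of the "perimeter" step described above, the public GitHub REST API is enough to enumerate an organisation's publicly visible members and then list each member's repositories and gists for downstream scanning. This is a simplified sketch: example-org and the GITHUB_TOKEN environment variable are placeholders, only publicly listed membership is visible this way, and Wiz's actual correlation across LinkedIn, GitHub metadata, and Hugging Face is far broader.

```python
# perimeter_sketch.py - illustrative enumeration of an organisation's public
# GitHub perimeter (members, their repos and gists) for later secret scanning.
# "example-org" and the token are placeholders; only public data is returned.
import os
import requests

API = "https://api.github.com"
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",  # read from env, never hard-coded
}

def paginate(url):
    """Follow GitHub's Link-header pagination and yield every item."""
    while url:
        resp = requests.get(url, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        yield from resp.json()
        url = resp.links.get("next", {}).get("url")

def public_perimeter(org):
    """Return (kind, identifier) pairs for every public repo and gist of public org members."""
    targets = []
    for member in paginate(f"{API}/orgs/{org}/members"):
        user = member["login"]
        for repo in paginate(f"{API}/users/{user}/repos"):
            targets.append(("repo", repo["full_name"]))
        for gist in paginate(f"{API}/users/{user}/gists"):
            targets.append(("gist", gist["id"]))
    return targets

if __name__ == "__main__":
    for kind, name in public_perimeter("example-org"):  # placeholder organisation
        print(kind, name)
```

Each repository or gist found this way would then be cloned and run through a history-wide secret scan like the one sketched earlier.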
Cloud security firm Wiz finds that 65% of Forbes AI 50 companies have exposed API keys, tokens, and credentials on GitHub, potentially compromising proprietary models and training data. The leaks highlight persistent security challenges in AI development.
A comprehensive security analysis by cloud security firm Wiz has revealed that 65% of the world's leading AI companies listed in Forbes' AI 50 have inadvertently exposed sensitive credentials and secrets on GitHub [1]. The findings highlight a persistent and concerning pattern of security lapses among companies at the forefront of artificial intelligence development.
The exposed materials include API keys, authentication tokens, and other digital credentials that could potentially grant unauthorized access to proprietary AI models, training datasets, and organizational infrastructure [2]. According to Wiz threat researchers Shay Berkovich and Rami McCarthy, "some of these leaks could have exposed organizational structures, training data, or even private models" [1].

Wiz employed what it terms a "Depth, Perimeter, and Coverage" approach to uncover these security breaches, going far beyond traditional repository scanning methods [2]. Its analysis included examining full commit histories, deleted forks, workflow logs, and gists - areas that conventional security tools often overlook. The research methodology involved identifying company employees through platforms such as LinkedIn and GitHub metadata, and correlating that information across services like Hugging Face [3]. Notably, the researchers discovered instances where sensitive data was leaked even from companies with zero public repositories, demonstrating that the risk extends beyond obvious sources.
The most frequent sources of secret leakage were found in Jupyter Notebook files (.ipynb), Python files (.py), and environment configuration files (.env) [1]. The exposed credentials primarily consisted of keys and tokens from major AI platforms including Hugging Face, Azure OpenAI, and Weights & Biases.

Particularly concerning were the Hugging Face token exposures, with Berkovich noting that "Hugging Face tokens are notorious for allowing access to private AI models" [1]. One leaked token belonging to an AI 50 company could have provided access to approximately 1,000 private models, potentially allowing attackers to download or inspect proprietary intellectual property.
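For a team responding to such a leak, the practical first question is what the exposed token can actually reach. The sketch below is a hypothetical triage step using the huggingface_hub client library; the token value and organization name are placeholders, and the real remediation is to revoke and rotate the token and then scrub it from history.

```python
# token_triage_sketch.py - illustrative incident-response triage for a leaked
# Hugging Face token: check what it can see, then revoke and rotate it.
# The token value and organization name are placeholders, not real credentials.
from huggingface_hub import HfApi

LEAKED_TOKEN = "hf_xxx_placeholder"   # the exposed token under investigation
ORG = "example-org"                   # placeholder organization name

api = HfApi(token=LEAKED_TOKEN)

identity = api.whoami()               # which account or org the token maps to
print("token resolves to:", identity.get("name"))

# List the organization's models visible with this token and count the private ones.
private_models = [
    m.id for m in api.list_models(author=ORG)
    if getattr(m, "private", False)
]
print(f"{len(private_models)} private models visible to this token")

# After triage: revoke the token in the Hugging Face settings, rotate any copies
# in CI and .env files, and remember it remains retrievable from old commits,
# forks, and gists until the history itself is cleaned.
```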
When Wiz attempted to notify the affected companies about their security exposures, the response was mixed at best. While some companies like ElevenLabs and LangChain responded promptly to security disclosures, nearly half of the notifications either failed to reach their intended recipients or received no response [1]. This communication gap highlights another layer of the security challenge facing the AI industry.

The problem of secret leakage is not new to the technology sector. Security researcher Dylan Ayrey published TruffleHog, a tool designed to find inadvertently uploaded secrets, as early as 2017 [1]. Despite years of awareness campaigns and security tools, the issue persists across the industry, with AWS keys continuing to leak due to configuration errors and the Python Package Index containing numerous packages with exposed API keys.

Berkovich attributes the ongoing problem to "broader challenges, like limited visibility, fragmented ownership, or missing automated checks in the development pipeline" [1]. The fast-paced nature of cloud development, combined with insufficient guardrails, creates an environment where even experienced teams can overlook high-impact security risks.