2 Sources
[1]
GitHub: We going to train on your data after all
Microsoft's GitHub next month plans to begin using customer interaction data - "specifically inputs, outputs, code snippets, and associated context" - to train its AI models. The code locker's revised policy applies to Copilot Free, Pro, and Pro+ customers as of April 24. Copilot Business and Copilot Enterprise users are exempt thanks to the terms of their contracts. Students and teachers who access Copilot will also be spared.

Those affected have the option to opt out in accordance with "established industry practices" - meaning US norms, as opposed to European norms, where opt-in is commonly required. To opt out, GitHub users should visit /settings/copilot/features and disable "Allow GitHub to use my data for AI model training" under the Privacy heading.

Mario Rodriguez, GitHub's chief product officer, would rather you didn't. "By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production," he wrote in a blog post.

To excuse its covetous behavior, GitHub in its FAQs notes that Anthropic, JetBrains, and corporate parent Microsoft operate similar opt-out data use policies. The rationale for the change, according to Rodriguez, is that interaction data makes the company's AI models perform better. Adding interaction data from Microsoft employees has led to meaningful improvements, he claims, such as an increased acceptance rate for AI model suggestions.

The data GitHub wants includes inputs and outputs, code snippets, comments and documentation, file names, repository structure, and other associated context.

The policy shift does somewhat change the meaning of GitHub private repositories, which are notionally "only accessible to you, people you explicitly share access with, and, for organization repositories, certain organization members." These might be more accurately described as "GitHub private* repositories," with the asterisk to denote the limits of GitHub's definition of the word "private."
As the FAQs explain: "If a Copilot user has their settings set to enable model training on their interaction data, code snippets from private repositories can be collected and used for model training while the user is actively engaged with Copilot while working in that repository."

Recent banter in the GitHub community doesn't include much enthusiasm for the plan. To judge by emoji votes alone, users have offered 59 thumbs-down votes and just three rocket ships, which we understand signal some measure of excitement. But among the 39 posts commenting on the change at the time this article was filed, no one other than Martin Woodward, GitHub VP of developer relations, has really endorsed the idea.

User indignation might be somewhat mitigated if GitHub users recognized that OpenAI's Codex - used in GitHub Copilot - is "a GPT language model fine-tuned on publicly available code from GitHub." That verbiage shows the data-gorged AI horse is already out of the barn, so to speak. Shutting the doors at this point won't change the fact that the AI industry is built on data gathered without asking for a strong indicator of enthusiastic consent. ®
[2]
GitHub's Copilot will use you as AI training data, but you can opt out
By Corbin Davenport

The generative AI models powering ChatGPT, Copilot, Gemini, and other assistants were created with mountains of training data. Now, Microsoft will start using interactions with GitHub Copilot as another source of that information, unless you specifically opt out of the collection.

GitHub, the popular coding platform owned by Microsoft, announced today that interactions with GitHub Copilot will be used to "train and improve our AI models." GitHub Copilot is the AI code assistance tool integrated into Visual Studio Code, the GitHub website, the Copilot CLI tool (which competes with Claude Code), and other services. The data collected includes any input or output data, code snippets, comments and documentation, file names, repository structure, and other information.

If you have never used GitHub Copilot in the first place, this won't change anything. However, if you've used code completion in Visual Studio Code, asked Copilot a question on the GitHub website, or used another related AI feature, your interactions and code snippets could be harvested. Importantly, the automatic data collection applies to both free and paid accounts: it covers Copilot Free, Copilot Pro, and Copilot Pro+ users, but not Copilot Business and Copilot Enterprise accounts.
The blog post explained that the initial AI models for GitHub Copilot were "built using a mix of publicly available data and hand-crafted code samples" (which didn't go over well with everyone), and the company has seen positive improvements by incorporating data from Microsoft employees. Now, GitHub is hoping that the service will become even better with more interactions used as training data. GitHub said in the announcement, "This approach aligns with established industry practices and will improve model performance for all users. By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production."

How to opt out

You can pause the data collection from the Copilot features page in your GitHub account settings. Once you are logged into your account, there's an "Allow GitHub to use my data for AI model training" setting in the Privacy section. You just need to set that dropdown menu to "Disabled," and that's it. If you have multiple GitHub accounts, be sure to do that for each of them.

Source: GitHub Blog
Microsoft's GitHub will begin using customer interaction data from Copilot Free, Pro, and Pro+ users to train AI models starting April 24. The policy change affects code snippets, inputs, and outputs from private repositories. Users can opt out through GitHub account settings, though the company encourages participation to improve code suggestions and bug detection.
Microsoft's GitHub announced a significant shift in its policy that will allow the company to train AI models using customer interaction data from millions of developers. Starting April 24, GitHub Copilot will collect inputs, outputs, code snippets, and associated context from users of the Copilot Free, Pro, and Pro+ tiers [1]. The policy change represents a departure from previous practices, though Copilot Business and Copilot Enterprise customers remain exempt due to their contractual terms [2]. Students and teachers accessing the AI coding assistant will also be spared from this data collection initiative.

Source: How-To Geek
Mario Rodriguez, GitHub's chief product officer, defended the decision by pointing to similar opt-out policies at Anthropic, JetBrains, and corporate parent Microsoft [1]. He claims that adding interaction data from Microsoft employees has already led to meaningful improvements, including increased acceptance rates for AI model suggestions. According to Rodriguez, "By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production" [1].

The policy shift raises significant data privacy concerns, particularly regarding private repositories. GitHub's own FAQs acknowledge that if a user has enabled model training in their settings, code snippets from private repositories can be collected while the user is actively engaged with GitHub Copilot [1]. This fundamentally changes what "private" means in the context of GitHub private repositories, which were previously described as "only accessible to you, people you explicitly share access with, and, for organization repositories, certain organization members."
Source: The Register
The scope of AI training data collection extends beyond simple code snippets. GitHub will harvest input and output data, comments and documentation, file names, repository structure, and other contextual information [2]. This comprehensive approach to customer interaction data collection affects anyone who has used code completion in Visual Studio Code, asked Copilot a question on the GitHub website, or utilized other related AI features.

Users concerned about developer privacy can opt out of data collection by visiting their GitHub account settings. To disable the feature, navigate to /settings/copilot/features and set "Allow GitHub to use my data for AI model training" to "Disabled" under the Privacy heading [1]. Users with multiple accounts must repeat this process for each account [2].

The opt-out policy follows "established industry practices," meaning US norms rather than European regulations, where opt-in is commonly required [1]. This distinction matters for developers worldwide who may expect different privacy protections based on their location.
The GitHub community response has been overwhelmingly negative. In the GitHub community discussion, users offered 59 thumbs-down votes compared to just three rocket-ship emojis signaling excitement [1]. Among 39 posts commenting on the change, only Martin Woodward, GitHub VP of developer relations, endorsed the idea. This backlash reflects broader concerns about how AI companies collect and use developer code without explicit consent.

The controversy isn't entirely new. OpenAI's Codex, which powers GitHub Copilot, was already "fine-tuned on publicly available code from GitHub" [1]. However, extending this practice to private repositories and paid user interactions represents a notable escalation. The AI industry's foundation on data gathered without enthusiastic consent continues to fuel debate about ethical AI development and the balance between innovation and privacy rights.

Summarized by Navi