4 Sources
[1]
GitHub: We going to train on your data after all
Microsoft's GitHub next month plans to begin using customer interaction data - "specifically inputs, outputs, code snippets, and associated context" - to train its AI models. The code locker's revised policy applies to Copilot Free, Pro, and Pro+ customers as of April 24. Copilot Business and Copilot Enterprise users are exempt thanks to the terms of their contracts. Students and teachers who access Copilot will also be spared. Those affected have the option to opt out in accordance with "established industry practices" - meaning according to US norms, as opposed to European norms, where opt-in is commonly required. To opt out, GitHub users should visit /settings/copilot/features and disable "Allow GitHub to use my data for AI model training" under the Privacy heading. Mario Rodriguez, GitHub's chief product officer, would rather you didn't. "By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production," he wrote in a blog post. To excuse its covetous behavior, GitHub in its FAQs notes that Anthropic, JetBrains, and corporate parent Microsoft operate similar opt-out data use policies. The rationale for the change, according to Rodriguez, is that interaction data makes company AI models perform better. Adding interaction data from Microsoft employees has led to meaningful improvements, he claims, such as an increased acceptance rate for AI model suggestions. The policy shift does somewhat change the meaning of GitHub private repositories, which are notionally "only accessible to you, people you explicitly share access with, and, for organization repositories, certain organization members." These might be more accurately described as "GitHub private* repositories," with the asterisk to denote the limits of GitHub's definition of the word "private."
As the FAQs explain: "If a Copilot user has their settings set to enable model training on their interaction data, code snippets from private repositories can be collected and used for model training while the user is actively engaged with Copilot while working in that repository." Recent banter in the GitHub community doesn't include much enthusiasm for the plan. To judge by emoji votes alone, users have offered 59 thumbs-down votes and just three rocket ships, which we understand signal some measure of excitement. But among the 39 posts commenting on the change at the time this article was filed, no one other than Martin Woodward, GitHub VP of developer relations, has really endorsed the idea. User indignation might be somewhat mitigated if GitHub users recognized that OpenAI's Codex - used in GitHub Copilot - is "a GPT language model fine-tuned on publicly available code from GitHub." That verbiage shows the data-gorged AI horse is already out of the barn, so to speak. Shutting the doors at this point won't change the fact that the AI industry is built on data gathered without asking for a strong indicator of enthusiastic consent. ®
[2]
GitHub Copilot will use your data for AI training by default, but you can opt out
A hot potato: GitHub has announced that starting April 24, the company will begin using interaction data from Copilot Free, Pro, and Pro+ users to train and improve its AI models unless they opt out. Copilot Business and Copilot Enterprise accounts are excluded, but for individual subscribers, the new setup is enabled by default, which is causing plenty of irritation. GitHub describes this training data as inputs, outputs, code snippets, and associated context, but the fine print goes into more detail. According to the company, it can also include code surrounding the cursor, comments and documentation, file names, repository structure, navigation patterns, chats with Copilot features, and even thumbs-up or thumbs-down feedback on suggestions. GitHub says it has already seen "meaningful improvements" after training on interaction data from Microsoft employees, including higher acceptance rates across multiple languages, and now wants to scale that approach to paying users. GitHub says it still won't use private repository content at rest to train AI models, meaning code simply stored on GitHub remains off-limits. But if you are actively using Copilot while working inside a private repo, the prompts, suggestions, generated snippets, and surrounding context from that session may still be collected for training - unless you switch the setting off. That's technically not the same as training on your stored private repo, though many developers probably won't find the distinction especially comforting. If you do want to opt out, and it's likely that most people will, head to Copilot settings, find the Privacy section, and set "Allow GitHub to use my data for AI model training" to Disabled. GitHub says anyone who previously opted out of data collection for product improvements will keep that preference, so they won't suddenly be volunteered into training next month.
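As a rough illustration of the rules described above (our own sketch, not GitHub's implementation - the function name and parameters are hypothetical), the stated policy boils down to roughly this logic:

```python
def training_data_collected(plan: str, opted_out: bool,
                            in_active_copilot_session: bool) -> bool:
    """Hypothetical sketch of the stated collection policy."""
    # Business and Enterprise accounts are contractually exempt.
    if plan in {"business", "enterprise"}:
        return False
    # Free/Pro/Pro+ users can flip the privacy setting to Disabled.
    if opted_out:
        return False
    # Private-repo content at rest is off-limits; only data from an
    # active Copilot session may be collected.
    return in_active_copilot_session
```

In other words, the only users whose data is collected are individual subscribers who leave the default setting on and are actively using Copilot at the time.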
GitHub also says data shared under the new policy may be used by affiliates, including Microsoft, though not by third-party AI model providers for their own separate training. Unsurprisingly, the response to the update, especially the fact that users must opt out, has not been positive. A GitHub community post announcing the move has drawn 117 thumbs-down votes and a slew of angry comments.
[3]
GitHub's Copilot will use you as AI training data, but you can opt out
Corbin Davenport is the News Editor at How-To Geek and an independent software developer. He also runs Tech Tales, a technology history podcast. Send him an email at [email protected]! Corbin previously worked at Android Police, PC Gamer, and XDA before joining How-To Geek. He has over a decade of experience writing about tech, and has worked on several web apps and browser extensions. The generative AI models powering ChatGPT, Copilot, Gemini, and other assistants were created with mountains of training data. Now, Microsoft will start using interactions with GitHub Copilot as another source of that information, unless you specifically opt out of the collection. GitHub, the popular coding platform owned by Microsoft, announced today that interactions with GitHub Copilot will be used to "train and improve our AI models." GitHub Copilot is the AI code assistance tool integrated in Visual Studio Code, the GitHub website, the Copilot CLI tool (which competes with Claude Code), and other services. That includes any input or output data, code snippets, comments and documentation, file names, repository structure, and other information. If you have never used GitHub Copilot in the first place, this won't change anything. However, if you've used the code completion in Visual Studio Code, asked Copilot a question on the GitHub website, or used another related AI feature, your interactions and code snippets could be harvested. Importantly, the automatic data collection applies to both free and paid accounts. That includes Copilot Free, Copilot Pro, and Copilot Pro+ users, but not Copilot Business and Copilot Enterprise accounts.
The blog post explained that the initial AI models for GitHub Copilot were "built using a mix of publicly available data and hand-crafted code samples" (which didn't go over well with everyone), and the company has seen positive improvements by incorporating data from Microsoft employees. Now, GitHub is hoping that the service will become even better with more interactions used as training data. GitHub said in the announcement, "This approach aligns with established industry practices and will improve model performance for all users. By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production."
How to opt out
You can pause the data collection from the Copilot features page in your GitHub account settings. After you are logged into your account, there's an "Allow GitHub to use my data for AI model training" setting in the Privacy section. You just need to set that dropdown menu to "Disabled," and that's it. If you have multiple GitHub accounts, be sure to do that for each of your accounts. Source: GitHub Blog
[4]
Bad news skeptics - GitHub says it will employ user data to train its AI after all
* GitHub rolls out on-by-default AI user data training, with optional opt-out
* Business, Enterprise, and some other account types are excluded from the change
* The company explains that users' real-time, live data is crucial for good training

GitHub Chief Product Officer Mario Rodriguez has announced that the platform will be using user data to train its AI models, operating on an opt-out basis that automatically enrolls users in the data collection system. The change won't just affect Free users, but also Pro and Pro+ - Copilot Business, Enterprise, student accounts, and teacher accounts will be exempt from the new user data training change. The company blog post adds that AI-generated content, as well as user feedback and interactions, will all go into training the AI models.
GitHub will use your data to train its AI models, it confirms
Some of the elements that will go into training GitHub's AI include: inputs, like prompts and snippets of code; outputs, including accepted content and edited suggestions; code context; comments and documentation; file names and repo structures; Copilot interactions; and even feedback like thumbs up/down. As well as the account types mentioned above and those who opt out, there is a third and final category that is exempt from the training change: "Content from your issues, discussions, or private repositories at rest," Rodriguez writes, carefully pointing out that even private repos can be used if a user is actively using Copilot. The company is keen to point out that real-world interaction data vastly improves model training, thanking users who choose to share their data. "We believe the future of AI-assisted development depends on real-world interaction data from developers like you," the CPO added. GitHub publicly stating its position on user data training is an important step, but while users are given the option to opt out, many are still unhappy about the on-by-default setting.
Microsoft's GitHub announced it will begin using customer interaction data from Copilot Free, Pro, and Pro+ users to train its AI models starting April 24. The opt-out system collects inputs, outputs, code snippets, and context from user sessions. While Copilot Business and Enterprise customers remain exempt, the developer community has responded with sharp criticism over data privacy concerns and the default enrollment approach.
Microsoft's GitHub announced a significant policy shift that will begin using customer interaction data to train its AI models starting April 24 [1]. The change affects GitHub Copilot Free, Pro, and Pro+ users, who will be automatically enrolled in an opt-out system unless they manually disable the feature [2]. Mario Rodriguez, GitHub's chief product officer, explained that the company will collect "specifically inputs, outputs, code snippets, and associated context" to improve AI model performance [1].
Copilot Business and Enterprise customers remain exempt from this data collection due to their contract terms [3]. Students and teachers who access GitHub Copilot will also be spared from the new policy [1]. Users who want to protect their data can opt out by visiting /settings/copilot/features and disabling "Allow GitHub to use my data for AI model training" under the Privacy heading [1].
The scope of data collection extends far beyond simple code snippets. GitHub will gather inputs like prompts and code fragments, outputs including accepted content and edited suggestions, code context surrounding the cursor, comments and documentation, file names, repository structure, navigation patterns, chats with Copilot features, and even user feedback such as thumbs-up or thumbs-down reactions on suggestions [4]. This comprehensive approach to data collection represents a substantial expansion of what GitHub considers fair game for AI training.
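As a rough illustration of the categories listed above (our own hypothetical model; the class and field names are ours, not GitHub's schema), a single interaction record might look like:

```python
from dataclasses import dataclass, field

@dataclass
class CopilotInteractionRecord:
    """Hypothetical sketch of the data categories GitHub describes."""
    prompt: str                 # inputs: prompts and code fragments
    suggestion: str             # outputs: accepted or edited suggestions
    cursor_context: str         # code surrounding the cursor
    comments_and_docs: str      # comments and documentation
    file_name: str              # file names
    repo_structure: list        # repository structure
    navigation_events: list = field(default_factory=list)  # navigation patterns
    chat_messages: list = field(default_factory=list)      # chats with Copilot
    feedback: str = ""          # thumbs-up / thumbs-down reactions
```

The point of the sketch is that a single record ties together far more context than the suggestion itself: surrounding code, file and repository metadata, and behavioral signals all travel with it.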
The policy shift raises questions about private repositories, which are supposedly "only accessible to you, people you explicitly share access with, and, for organization repositories, certain organization members" [1]. While GitHub maintains it won't use private repository content at rest to train its AI models, the company acknowledges that if a user is actively using Copilot while working inside a private repo, the prompts, suggestions, generated snippets, and surrounding context from that session may still be collected for training [2]. Many developers find this distinction between stored code and active session data less than comforting.
Rodriguez defended the decision by claiming that adding interaction data from Microsoft employees has led to meaningful improvements, including increased acceptance rates for AI model suggestions across multiple languages [2]. "By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production," he wrote [1].
To justify this approach, GitHub noted in its FAQs that Anthropic, JetBrains, and corporate parent Microsoft operate similar opt-out data use policies [1]. The company emphasized that "this approach aligns with established industry practices" according to US norms, where opt-out is standard, as opposed to European regulations, where opt-in is commonly required [1]. GitHub also confirmed that data shared under the new policy may be used by affiliates, including Microsoft, though not by third-party AI model providers for their own separate training [2].
The developer community has responded with overwhelming negativity to the announcement. A GitHub community post discussing the change received 117 thumbs-down votes compared to just three rocket ship emojis, which typically signal excitement [2]. Among 39 posts commenting on the change, no one other than Martin Woodward, GitHub VP of developer relations, endorsed the idea [1].
The backlash centers on data privacy concerns and the default enrollment approach, which requires active steps to protect user information. Critics argue that paying customers, particularly those on Pro and Pro+ tiers, should not be automatically enrolled in data collection schemes. The fact that OpenAI's Codex, used in GitHub Copilot, was already "fine-tuned on publicly available code from GitHub" suggests the AI industry is built on data gathered without enthusiastic consent [1]. This latest move reinforces concerns that data-gathering practices continue to expand despite user objections.