GitHub will use Copilot user data to train its AI models starting April 24, sparking backlash

Microsoft's GitHub announced it will begin using customer interaction data from Copilot Free, Pro, and Pro+ users to train its AI models starting April 24. The opt-out system collects inputs, outputs, code snippets, and context from user sessions. While Copilot Business and Enterprise customers remain exempt, the developer community has responded with sharp criticism over data privacy concerns and the default enrollment approach.

GitHub Shifts to Opt-Out AI Training Policy

Microsoft's GitHub announced a significant policy shift: starting April 24, it will use customer interaction data to train its AI models [1]. The change affects GitHub Copilot Free, Pro, and Pro+ users, who will be enrolled automatically unless they manually disable the feature [2]. Mario Rodriguez, GitHub's chief product officer, explained that the company will collect "specifically inputs, outputs, code snippets, and associated context" to improve AI model performance [1].

Source: How-To Geek

Copilot Business and Enterprise customers remain exempt from this data collection due to their contract terms [3]. Students and teachers who access GitHub Copilot are also exempt from the new policy [1]. Users who want to protect their data can opt out by visiting /settings/copilot/features and disabling "Allow GitHub to use my data for AI model training" under the Privacy heading [1].
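The enrollment rules described above can be summarized as a small decision function. This is only an illustrative sketch of the policy as the article reports it, not GitHub's implementation: the `CopilotUser` class, its field names, and the lowercase plan labels are all hypothetical.

```python
from dataclasses import dataclass

# Plans the article says are covered by the new default (labels are hypothetical).
AFFECTED_PLANS = {"free", "pro", "pro+"}
# Plans the article says are exempt under their contract terms.
EXEMPT_PLANS = {"business", "enterprise"}


@dataclass
class CopilotUser:
    plan: str                            # hypothetical label, e.g. "pro"
    is_student_or_teacher: bool = False  # reported as exempt from the policy
    opted_out: bool = False              # True once the training setting is disabled


def interaction_data_used_for_training(user: CopilotUser) -> bool:
    """Return True if, per the article's description, this user's Copilot
    interaction data would be eligible for model training."""
    plan = user.plan.strip().lower()
    if plan in EXEMPT_PLANS or user.is_student_or_teacher:
        return False
    if plan not in AFFECTED_PLANS:
        raise ValueError(f"unknown plan: {user.plan!r}")
    # Opt-out model: eligible by default unless the user disabled the setting.
    return not user.opted_out
```

Under these assumptions, a Pro user who never touches the setting is included by default, while the same user with `opted_out=True` is not.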

What Data GitHub Will Collect

The scope of data collection extends far beyond simple code snippets. GitHub will gather inputs like prompts and code fragments, outputs including accepted content and edited suggestions, code context surrounding the cursor, comments and documentation, file names, repository structure, navigation patterns, chats with Copilot features, and even user feedback such as thumbs-up or thumbs-down reactions on suggestions [4]. This comprehensive approach to data collection represents a substantial expansion of what GitHub considers fair game for AI training.

Source: TechRadar

The policy shift raises questions about private repositories, which are supposedly "only accessible to you, people you explicitly share access with, and, for organization repositories, certain organization members" [1]. While GitHub maintains it won't use private repository content at rest to train its AI models, the company acknowledges that if a user is actively using Copilot while working inside a private repo, the prompts, suggestions, generated snippets, and surrounding context from that session may still be collected for training [2]. Many developers find this distinction between stored code and active session data less than comforting.

Justification and Industry Context

Rodriguez defended the decision by claiming that adding interaction data from Microsoft employees has led to meaningful improvements, including increased acceptance rates for AI model suggestions across multiple languages [2]. "By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production," he wrote [1].

To justify this approach, GitHub noted in its FAQs that Anthropic, JetBrains, and corporate parent Microsoft operate similar opt-out data use policies [1]. The company emphasized that "this approach aligns with established industry practices," a claim that reflects US norms, where opt-out is standard, rather than European regulations, where opt-in is commonly required [1]. GitHub also confirmed that data shared under the new policy may be used by affiliates, including Microsoft, though not by third-party AI model providers for their own separate training [2].

Developer Community Backlash

The developer community has responded to the announcement with overwhelming negativity. A GitHub community post discussing the change received 117 thumbs-down votes compared to just three rocket ship emojis, which typically signal excitement. Among 39 posts commenting on the change, no one other than Martin Woodward, GitHub VP of developer relations, endorsed the idea [1].

Source: TechSpot

The backlash centers on data privacy concerns and on a default enrollment approach that requires users to take active steps to protect their information. Critics argue that paying customers, particularly those on Pro and Pro+ tiers, should not be automatically enrolled in data collection schemes. The fact that OpenAI's Codex, used in GitHub Copilot, was already "fine-tuned on publicly available code from GitHub" suggests the AI industry is built on data gathered without enthusiastic consent [1]. This latest move reinforces concerns that data-gathering practices continue to expand despite user objections.
