GitHub to train AI models on Copilot user data unless you opt out by April 24

2 Sources

Microsoft's GitHub will begin using customer interaction data from Copilot Free, Pro, and Pro+ users to train AI models starting April 24. The policy change affects code snippets, inputs, and outputs from private repositories. Users can opt out through GitHub account settings, though the company encourages participation to improve code suggestions and bug detection.

GitHub Policy Enables AI Training on User Interactions

Microsoft's GitHub announced a significant policy shift that will allow the company to train AI models using customer interaction data from millions of developers. Starting April 24, GitHub Copilot will collect inputs, outputs, code snippets, and associated context from users of the Copilot Free, Pro, and Pro+ tiers [1]. The change is a departure from previous practice, though Copilot Business and Copilot Enterprise customers remain exempt under their contractual terms [2]. Students and teachers who use the AI coding assistant are also exempt from this data collection.

Source: How-To Geek

Mario Rodriguez, GitHub's chief product officer, defended the decision by pointing to similar opt-out policies at Anthropic, JetBrains, and corporate parent Microsoft [1]. He claims that adding interaction data from Microsoft employees has already led to meaningful improvements, including increased acceptance rates for AI model suggestions. According to Rodriguez, "By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production" [1].

Private Repositories No Longer Fully Private

The policy shift raises significant data privacy concerns, particularly regarding private repositories. GitHub's own FAQs acknowledge that if a user has enabled model training in their settings, code snippets from private repositories can be collected while that user is actively engaged with GitHub Copilot [1]. This changes what "private" means for GitHub private repositories, which were previously described as "only accessible to you, people you explicitly share access with, and, for organization repositories, certain organization members."

Source: The Register

The scope of AI training data collection extends beyond simple code snippets. GitHub will harvest input and output data, comments and documentation, file names, repository structure, and other contextual information [2]. This comprehensive approach to customer interaction data collection affects anyone who has used code completion in Visual Studio Code, asked Copilot a question on the GitHub website, or used other related AI features.

How to Opt Out of Data Collection

Users concerned about developer privacy can opt out of data collection by visiting their GitHub account settings. To disable the feature, navigate to /settings/copilot/features and set "Allow GitHub to use my data for AI model training" to "Disabled" under the Privacy heading [1]. Users with multiple accounts must repeat this process for each account [2].
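For readers who manage several accounts, the steps above can be sketched as a small reminder script. This is only an illustration: the settings path and the toggle label are taken from the article, while the github.com host, the function names, and the example account names are assumptions added here. The script does not change any setting; it only prints where to go for each account.

```python
from urllib.parse import urljoin

# Path and toggle label as quoted in the article.
OPT_OUT_PATH = "/settings/copilot/features"
TOGGLE_LABEL = "Allow GitHub to use my data for AI model training"


def opt_out_url(host: str = "https://github.com") -> str:
    """Build the settings page URL (the host is an assumption)."""
    return urljoin(host, OPT_OUT_PATH)


def checklist(accounts: list[str]) -> list[str]:
    """Return one manual reminder line per account; nothing is changed remotely."""
    url = opt_out_url()
    return [
        f'{name}: sign in, open {url}, set "{TOGGLE_LABEL}" to "Disabled"'
        for name in accounts
    ]


if __name__ == "__main__":
    # Hypothetical account names, for illustration only.
    for line in checklist(["personal-account", "side-project-account"]):
        print(line)
```

The toggle itself still has to be set by hand in the browser while signed in to each account.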

The opt-out policy follows "established industry practices," meaning US norms rather than European regulations where opt-in is commonly required [1]. This distinction matters for developers worldwide who may expect different privacy protections based on their location.

Developer Community Pushback

The GitHub community response has been overwhelmingly negative. In the GitHub community discussion, users offered 59 thumbs-down votes compared to just three rocket ship emojis signaling excitement [1]. Among 39 posts commenting on the change, only Martin Woodward, GitHub VP of developer relations, endorsed the idea. This backlash reflects broader concerns about how AI companies collect and use developer code without explicit consent.

The controversy isn't entirely new. OpenAI's Codex, which powers GitHub Copilot, was already "fine-tuned on publicly available code from GitHub" [1]. However, extending this practice to private repositories and paid user interactions represents a notable escalation. The AI industry's foundation on data gathered without enthusiastic consent continues to fuel debate about ethical AI development and the balance between innovation and privacy rights.

TheOutpost.ai

© 2026 Triveous Technologies Private Limited