[1]
DPDP rules will raise compliance bar for AI firms
India's newly notified Digital Personal Data Protection (DPDP) Rules are set to reshape how artificial intelligence companies collect, process and secure personal data, significantly raising the compliance threshold for AI-driven businesses. Legal and AI-industry experts said the rules will spur more rigorous governance across AI pipelines, while creating opportunities for responsible innovation.

IndiaAI Mission chief executive Abhishek Singh told ET that when developers use personal data for training, they will need to anonymise and implement privacy-preserving processes, in line with the Act's requirements. He sees this as a net positive for responsible AI development and data privacy. "If anyone is having any data for training AI models, if there are personal data attributes there, then they have to do anonymisation, they have to do privacy preservation and then only use it for AI training," he said.

Supratim Chakraborty, partner at law firm Khaitan & Co, said the rules mark a major shift for companies integrating AI into core products and workflows. "The release of the DPDP Rules sharply raises the compliance bar," he said. "With AI now embedded in core systems, firms must rigorously audit how personal data is sourced, labelled, and used across model training and inference. Models that cannot evidence compliant data handling will not be viable in India's regulatory environment," he added.

Shashi Shekhar Vempati, independent director on the board of BharatGen, which is developing large-scale Indian language AI models under the IndiaAI Mission, said the framework is "a step in the right direction" to curb data misuse. For developers building domestic AI models, the rules may serve as a strategic shield against indiscriminate foreign data use, he said. "It is critical that personal data is collected with clear consent and for specified purposes, and that such data remains within Indian jurisdiction," Vempati said. "Personal data of a billion Indians should not be handed on a platter for foreign models using India as a petri dish for market capture."

According to him, the challenge now lies in governing how AI systems reuse data, particularly when AI is used to train newer or derivative models. "Responsible homegrown developers must be supported," he said. Privacy-preserving techniques such as encryption-based data pipelines can turn regulatory requirements into a platform for innovation, he said.

Vaibhav Velhankar, cofounder and chief technology officer at Segumento, an AI-based data intelligence platform, said the DPDP Rules mark a turning point in how AI models are built and validated. "The DPDP Rules are a clear signal that the future of AI in India isn't just about what models can do, but whether they are built on data foundations that are lawful, transparent, and governed with intent," he said. "As AI becomes embedded into core business processes, the movement and transformation of data through training pipelines will directly determine not only performance, but credibility and trust," he said.

Velhankar said the framework introduces a necessary cultural shift within the AI ecosystem. "Every dataset used in training must now have demonstrable consent, clear purpose limitation, proper labelling, traceability and secure handling," he added.
"This is not merely a compliance exercise; it demands a structural mindset change. Models can no longer be trained on ambiguous datasets or undocumented workflows -- they must be built on transparent, auditable processes where every training decision can be justified and every data point traced back to a lawful purpose," he said. Prajish Prasad, faculty of computer science at Pune's FLAME University, pointed out that the DPDP Act's provisions must also be understood in the context of AI's technical processes. "Most businesses rely on machine learning algorithms trained on user data," he said. "This data is mathematically transformed into model weights, which makes it technically infeasible to isolate and delete one user's contribution without retraining the model or using specialised techniques," he added. Also Read | ET Graphics: Decoding India's new data protection rules Such limitations do not inherently conflict with the Act, he said. "The key thing to remember is that these rights apply to personal data, not to the mathematical state of a model." If businesses delete the stored personal data, stop using it for future training, and cease further processing linked to that user, they remain compliant, he said. Best practices such as de-identification and clear user communication can help firms strengthen trust while aligning AI operations with the law. Also Read: Data law to unlock a Rs 10,000 crore space as firms boost compliance spends Srinivas Padmanabhuni, chief technology officer at AI Ensured, a startup focused on AI governance and responsible AI, said the DPDP Act represents not just a compliance challenge but a chance to build long-term credibility and trust. "Companies should treat compliance as a core business strategy," he said. "Embedding privacy-by-design, automating consent management and investing in explainable AI (XAI) can help ensure transparency and accountability," he said. Using regulatory technology tools can automate compliance monitoring and reduce human error, he said. "By doing so, AI companies can both meet regulatory expectations and foster sustainable innovation in India's fast-evolving digital ecosystem." Also Read | ETtech Explainer: Understanding India's new data protection law & its implications
[2]
How Will DPDP Rules Affect AI Models Collecting, Retaining Data?
India's new data protection regime is going to reshape how AI developers collect, train on, and retain data, as the Digital Personal Data Protection (DPDP) Act, 2023, together with the DPDP Rules, 2025, introduces a consent-centric framework that applies to all digital personal data processed in India. It requires Data Fiduciaries to obtain free, specific, and informed consent for each specified purpose, and to present notices that clearly itemise the personal data collected and the exact purpose behind its use. Consequently, AI companies building training datasets must explain why they are collecting each data field, how they will process it, and how users can withdraw consent.

Moreover, the Act mandates that data be erased once consent is withdrawn or when the specified purpose is no longer served. This requirement could force developers to design systems capable of selectively removing data from training pipelines and logs, especially for models updated continuously.

However, the law also draws important boundaries. It expressly excludes personal data made publicly available by the individual or under a legal obligation, offering AI firms some flexibility in sourcing publicly posted information. Additionally, the DPDP Rules introduce a research exemption, which covers processing necessary for research, archiving, or statistical purposes, provided that such work complies with standards in the Second Schedule. Importantly, this carve-out may ease constraints on non-commercial and academic AI work.

India's DPDP regime forces AI firms to rethink basic personal data collection flows, with consent and purposes clearly specified to users. Dhruv Garg, a technology lawyer with the Indian Governance And Policy Project research group, framed the law's core aim: "The idea of this Act is to create a regime where the users know what all they have said 'yes' to in terms of processing of their personal data and access, and what they have said 'no' to, and what their rights are and how they can access those rights." Consequently, companies must map services to purposes, and then map purposes to the precise data they collect and generate. As Garg explained, firms will "map all their services, the purposes for those services, what data they collect for those services, and what data they generate through those services".

Meanwhile, Nikhil Jhanji, Senior Product Manager at IDfy, pushed the operational implications further, arguing that "traceability and explainability is now non-negotiable". He recommended that teams "embed logging directly into the data pipeline so every ingestion, preprocessing, or training event leaves a verifiable trail". Additionally, Jhanji noted that robust provenance "builds trust with regulators without exposing proprietary model design and allows for clear, transparent, and accessible communication to Data Principals". He also argued that meaningful purpose specification sits at the centre of consent, noting that "purpose need not mean narrow" and that "purpose statements will become the privacy policies of the AI era".
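As a rough illustration of the services-to-purposes-to-data mapping Garg describes and the pipeline logging Jhanji recommends, a minimal Python sketch might look like the following; the purpose map, field names and log format are illustrative assumptions, not anything prescribed by the Rules or by either expert.

```python
import json
import time

# Hypothetical map from a service to its declared purposes and the exact fields
# collected for each purpose; names are illustrative, not taken from any company.
PURPOSE_MAP = {
    "chat_assistant": {
        "answer_user_queries": ["query_text"],
        "improve_models": ["query_text", "feedback_rating"],
    },
}

def consented_fields(service: str, user_consents: dict) -> set:
    """Return only the fields whose purposes this user has said 'yes' to."""
    allowed = set()
    for purpose, fields in PURPOSE_MAP[service].items():
        if user_consents.get(purpose, False):
            allowed.update(fields)
    return allowed

def log_event(log_path: str, stage: str, record_id: str, purpose: str) -> None:
    """Append-only audit log: every ingestion, preprocessing, or training event
    leaves a verifiable trail without exposing the model itself."""
    entry = {"ts": time.time(), "stage": stage, "record_id": record_id, "purpose": purpose}
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")

user_consents = {"answer_user_queries": True, "improve_models": False}
print(consented_fields("chat_assistant", user_consents))   # {'query_text'}
log_event("pipeline_audit.jsonl", "ingestion", "rec-001", "answer_user_queries")
```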
Several major companies offering consumer-facing chatbots, such as OpenAI, Anthropic, and Google, offer mechanisms that let users opt out of having their personal data or content used in future model training, and exclude past conversations from influencing chatbot outputs. These opt-outs gain legal force under the DPDP framework, especially once erasure becomes a statutory right tied directly to consent withdrawal.

However, Garg explained that erasure sits within a broader lifecycle of consent, retention, and use. He pointed out that privacy centres will need reworking under the DPDP framework to give users easy-to-understand information and choices about data collection from obtaining "consent till the retention" stage. Garg also remarked that users must be able to decide whether certain interactions, including sensitive or emotional disclosures, are stored at all; a user who does not want a particular interaction kept "should have the right" to ensure it is not retained. He also highlighted that some categories of data cannot be erased despite a request, pointing to mandatory log-retention rules under which a person cannot delete data because the law requires its preservation.

Jhanji framed the erasure challenge specifically for AI systems. He stated that "erasing influence from trained models is technically impossible, but compliance is not", arguing that companies must stop any future use of the data, delete all identifiable inputs, and retrain where possible from earlier checkpoints. He added that clear documentation and logging of retention periods, as well as communication with third parties, would help to "demonstrate transparency" and define a standard for responsible AI governance. Notably, Jhanji also stressed post-release limits, remarking that "once a model is public, full deletion is unrealistic, but accountability is not." He noted that the fair remedy is to ensure that the data "never enters future training cycles" and that the process is logged.
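A minimal sketch of the erasure handling Jhanji outlines, deleting identifiable inputs, blocking the record from every future training cycle, and logging the action, might look like this in Python. The in-memory datastore, record identifiers and exclusion-list mechanics are assumptions for illustration, and removing a record's influence from an already-trained model would additionally require retraining from an earlier checkpoint, which the sketch does not attempt.

```python
import time

# In-memory stand-ins for a real datastore, exclusion list, and audit log; the
# names and record ids are illustrative only.
stored_inputs = {"rec-001": {"user_pseudonym": "9f2c41d0", "query_text": "example query"}}
do_not_train = set()     # record ids that must never enter any future training cycle
erasure_log = []

def withdraw_consent(record_id: str) -> None:
    """Handle a consent withdrawal: delete identifiable inputs, block all future
    use of the record, and log the action."""
    stored_inputs.pop(record_id, None)
    do_not_train.add(record_id)
    erasure_log.append({"ts": time.time(), "record_id": record_id, "action": "erased"})

def build_training_batch(candidate_ids: list) -> list:
    """Training-time filter: only records that still exist and were never withdrawn."""
    return [stored_inputs[rid] for rid in candidate_ids
            if rid in stored_inputs and rid not in do_not_train]

withdraw_consent("rec-001")
print(build_training_batch(["rec-001"]))   # [] -- the withdrawn record never re-enters training
print(erasure_log)                         # documented trail of the erasure action
```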
While India's DPDP framework offers a research exemption, AI companies hoping to use it as a route to large-scale commercial training will find it a narrow path. To begin with, the exemption applies only in tightly defined circumstances, and it does not dilute the broader duties around purpose, transparency, or safeguards. Garg explained that exemptions in the Act "are given based on purpose", and that research is only one of the limited categories. He emphasised that the DPDP is "not saying you cannot process this data", only that processing must be anchored in clarity, except in the few places where there's "an absolute no-no". He added that users must still understand "what all they have said 'yes' to", even if an exemption is involved. Therefore, companies invoking the research category must still respect transparency and meet the standards set out in the Second Schedule.

Meanwhile, Jhanji warned that "research exemptions are not a loophole". He remarked that such exemptions apply only when training "serves public-interest innovation", uses anonymised data, and avoids "any form of profiling or targeting". Additionally, he noted that companies must "prove intent" and structure their work like genuine open research rather than "disguised commercial ventures". Merely labelling a training project as research is insufficient.

AI firms will increasingly need to prove the origin of each data point and the legal basis for collecting it, which is why Jhanji argued that a future-proof approach must move beyond vague provenance logs and toward cryptographic verification. He noted that "a provenance layer built on verifiable consent tokens would give AI builders the missing piece - a way to prove lawful collection at scale". This model would allow each data point to carry its own audit trail, since "each data point could carry a cryptographic stamp of its origin and consent state". In effect, provenance becomes a machine-verifiable system rather than a set of internal claims: regulators could demand concrete proof of lawful collection, and companies could demonstrate compliance without revealing proprietary model architecture.

Jhanji also warned that provenance must account for intermediate representations. He explained that "embeddings sit in a grey zone between anonymity and identification" and should therefore be treated as pseudonymised rather than anonymised data. Even transformed features, in other words, require responsible handling.

India's DPDP framework arrives at a moment when AI firms are expanding rapidly but often rely on opaque data pipelines, unclear provenance, and retention practices that users rarely see. In this context, India's data protection law requires companies to disclose what they collect, why they collect it, and how long they keep it. In doing so, it brings a degree of structural clarity, something that has been absent from most commercial model-training pipelines, to AI model development. Moreover, the Act makes consent reversible and meaningful, pushing companies to design systems that stop using data when users withdraw permission.

At the same time, the DPDP framework introduces mandatory log-retention and purpose-mapping obligations that require firms to justify every step of the data journey. This directly affects how AI companies ingest, store, fine-tune, and reuse information, and limits reliance on casual scraping or undocumented datasets. The research exemption and public-data carve-out provide narrow but important pathways for innovation, yet these routes demand evidence of intent, strong safeguards, and traceable provenance. Ultimately, the DPDP framework matters because it compels AI companies to operate with a level of transparency, accountability, and restraint that will shape the future of model development in India.
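As a closing illustration of the cryptographic provenance idea Jhanji raises, a minimal Python sketch of an HMAC-signed consent token is shown below; the signing-key handling, claim fields and token format are assumptions made for illustration, not a defined standard or anything the DPDP Rules prescribe.

```python
import hashlib
import hmac
import json
import time

# Illustrative signing key; in practice key management (KMS/HSM, rotation) would
# sit outside the training pipeline entirely.
SIGNING_KEY = b"example-provenance-signing-key"

def stamp(data_point_id: str, origin: str, consent_state: str) -> dict:
    """Attach a machine-verifiable provenance stamp to a data point: what it is,
    where it came from, and what the user consented to."""
    claim = {"id": data_point_id, "origin": origin,
             "consent_state": consent_state, "ts": int(time.time())}
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    claim["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return claim

def verify(stamped: dict) -> bool:
    """Recompute the signature so an auditor can check the claim without seeing
    anything about the model's architecture or weights."""
    claim = {key: value for key, value in stamped.items() if key != "sig"}
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, stamped["sig"])

token = stamp("rec-001", origin="consented_app_upload", consent_state="improve_models:yes")
print(verify(token))   # True; any tampering with origin or consent state fails verification
```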
India's Digital Personal Data Protection Rules introduce stringent consent-based frameworks that will significantly reshape how AI companies collect, process, and retain personal data, raising compliance standards across the industry.
India's newly notified Digital Personal Data Protection (DPDP) Rules are fundamentally transforming how artificial intelligence companies handle personal data, establishing stringent compliance requirements that will reshape the entire AI ecosystem. The rules, which work in conjunction with the DPDP Act, 2023, introduce a consent-centric framework that applies to all digital personal data processed within India's jurisdiction.[1]
Source: Economic Times
Legal and AI industry experts unanimously agree that these regulations will spur more rigorous governance across AI pipelines while creating new opportunities for responsible innovation. The framework requires companies to obtain free, specific, and informed consent for each specified purpose, fundamentally changing how AI training datasets are assembled and maintained.[2]

IndiaAI Mission chief executive Abhishek Singh emphasized that developers using personal data for training must now implement anonymization and privacy-preserving processes in line with the Act's requirements. "If anyone is having any data for training AI models, if there are personal data attributes there, then they have to do anonymisation, they have to do privacy preservation and then only use it for AI training," Singh explained.[1]
Source: MediaNama
Supratim Chakraborty, partner at law firm Khaitan & Co, described the rules as marking a major shift for companies integrating AI into core products and workflows. "With AI now embedded in core systems, firms must rigorously audit how personal data is sourced, labelled, and used across model training and inference. Models that cannot evidence compliant data handling will not be viable in India's regulatory environment," he stated.[1]
The new framework introduces significant technical challenges, particularly around data erasure and consent management. Companies must now design systems capable of selectively removing data from training pipelines when users withdraw consent, a requirement that could force fundamental changes in how AI models are developed and maintained.[2]

Nikhil Jhanji, Senior Product Manager at IDfy, emphasized that "traceability and explainability is now non-negotiable", recommending that teams "embed logging directly into the data pipeline so every ingestion, preprocessing, or training event leaves a verifiable trail".[2]

Vaibhav Velhankar, cofounder and CTO at Segumento, highlighted the cultural shift required within the AI ecosystem. "Every dataset used in training must now have demonstrable consent, clear purpose limitation, proper labelling, traceability and secure handling. Models can no longer be trained on ambiguous datasets or undocumented workflows," he explained.[1]