Curated by THEOUTPOST
On Tue, 24 Dec, 4:02 PM UTC
[1]
Event Report: Governing the AI Ecosystem, December 11, 2024
As AI continues to shape our digital landscape, we delved into key questions surrounding India's AI mission at MediaNama's roundtable discussion in Bangalore on facilitative regulations for the AI ecosystem in India. We explored how to effectively allocate resources like compute capacity while ensuring balanced investment in crucial areas such as datasets, AI research, and skilling. The discussion also focused on the need to create culturally relevant, high-quality datasets to improve AI model development, particularly for regional languages and local trends. We explored frameworks for AI regulation that balance fostering innovation with mitigating risks, particularly in sectors like healthcare and defense. Additionally, privacy concerns and data access issues highlighted the need for better government data consolidation, anonymisation, and transparency. Lastly, the conversation examined strategies to nurture India's AI industry, emphasising talent development, foundational model creation, and incentivising research and development through government support and tax benefits.

Recently, the Ministry of Electronics and Information Technology (MeitY) disclosed a detailed breakdown of the Rs 10,372 crore financial outlay for India's AI mission. MediaNama's discussion began with a debate on the 44% allocation (Rs 4,563.36 crore) for compute capacity, with some arguing that this emphasis on computation could overshadow other crucial areas, such as datasets, AI research, and skilling. While compute was deemed necessary for AI model training and real-time inference, many believed that datasets, especially diverse, high-quality ones, are more essential for developing effective models. However, others contended that the growing demand for compute, particularly for inference (when users interact with the model in real time), makes the 44% allocation the bare minimum. The debate also highlighted concerns over the government's role in efficiently allocating compute resources and ensuring the quality of datasets for AI development.

The discussion on India's AI development highlighted key issues related to indigenous datasets and their role in creating culturally relevant AI tools. Some argued that popular AI models like ChatGPT, which are trained on predominantly English data, fail to capture cultural nuances in regional languages. Some speakers felt there was a growing need for datasets that reflect contemporary culture, including local dialects and emerging trends. The conversation also touched on the importance of compensating dataset creators and ensuring ethical data use. Some emphasised the need for India to be self-sufficient in AI and cautioned against overreliance on foreign technology, while others stressed the importance of trusted external partnerships. The debate underscored the balance between developing homegrown solutions and leveraging global expertise to ensure resilient, culturally relevant AI.

Furthermore, MediaNama's discussion on AI regulation emphasised the need for a balanced approach that mitigates harm without stifling innovation. The conversation focused on challenges in attributing liability for AI decisions, given their probabilistic nature. Experts also debated contradictions in AI data licensing, proposing solutions like statutory licensing models to ensure creators receive royalties for AI training data. Regulatory sandboxes were discussed as a tool to test innovation safely, although concerns about rights and intellectual property violations remained.
Speakers also highlighted the importance of clear, proactive legal frameworks for AI, with a focus on mitigating harm in high-risk sectors like healthcare and defense, while allowing flexibility in low-risk applications. Early regulation was seen as essential to prevent monopolistic markets and promote fair competition, without hampering innovation.

Moreover, the discussion on privacy and data access with respect to AI in India highlighted the need for better consolidation and accessibility of government-held datasets. Speakers stated that despite India's vast data resources, including healthcare and legislative records, access remained restricted due to bureaucratic barriers and poor data quality. Experts emphasised the importance of anonymising data, such as health sector datasets, for use in AI models while addressing concerns about the effectiveness of anonymisation. Conflicting views on privacy protection emerged, especially regarding India's Data Protection Act and the treatment of publicly available personal data. Proposed solutions included blending government-led privacy protections with private-sector innovations like homomorphic encryption. The need for greater transparency in data sharing and licensing, as well as a clear taxonomy for data classification, was also discussed as essential for responsible AI development and preventing misuse.

Last but not least, the discussion on the growing AI industry in India emphasised the need to develop a skilled workforce and establish foundational models. While it was acknowledged that India lags behind the USA and China, it was argued that it is not too late for the country to build its own AI models, as seen in India's success with nuclear power. Government funding for AI startups was debated, with concerns about the fairness of using taxpayer money, but suggestions included incentivising research and development through tax benefits and regulatory exemptions. It was also noted that the government should invest in areas underfunded by the private sector, while prioritising talent retention and fostering innovation. Given the strength of India's venture capital ecosystem and the availability of capital, participants felt that India's focus should be on long-term investments that contribute to AI advancement.
[2]
On AI regulation to mitigate harm without stifling innovation #Nama
During the Question Hour in the winter session of Parliament, the Ministry of Electronics and Information Technology (MeitY) responded to Lok Sabha queries on 'AI Governance,' stating that it is open to the possibility of introducing legislation for AI regulation. This marked a shift from the government's earlier position, which emphasised a self-regulatory approach. "Government's aim is to create a supportive environment that encourages organizations to follow good practices voluntarily," it had said.

At MediaNama's recent roundtable discussion on facilitative regulations for the AI ecosystem in India, Nikhil Pahwa, founder of MediaNama, asked a crucial question: "When a human makes a decision, it's theirs. When a machine makes a decision, whose is it?" The question highlights the challenge of attributing liability for AI outcomes, which are probabilistic rather than deterministic. In traditional liability frameworks, a human is held accountable for the decisions they make, but with AI, it is unclear who is responsible when a machine makes a decision. This raises fundamental questions about how the law should evolve to address this new reality. "If you look at just liability as a concept, liability often depends on attribution to deterministic outcomes. But AI outcomes are probabilistic in nature. So how do you then attribute liability?" he asked. Furthermore, Pahwa stated, "As things change, regulation will come in. How much do we regulate in a manner that innovation isn't stifled, and how do we innovate in a manner that regulation isn't needed?", emphasising the need for regulation that encourages innovation while also curbing unchecked development.

C. Chaitanya, Co-Founder and CTO of Ozonetel Communications, discussed the difficulties surrounding the licensing of AI models and data, particularly when AI companies have contradictory rules about data usage. He questioned the fairness of AI companies barring AI-generated data from being used to train other models while freely scraping data themselves, highlighting the growing tension between intellectual property rights, open-source philosophy, and AI development. "One of the first licenses that OpenAI released stated that any data generated using OpenAI shouldn't be used for training. Now, I don't like that. You can scrape my data and use it for training, but then I generate some data and am not allowed to train it? That doesn't make sense. What I don't like is the hypocrisy," he stated.

Ajay Kumar of Triumvir Law proposed a statutory licensing model for copyrighted AI training content, under which creators would receive royalties when their data is used to train AI, a possible solution to the current challenge of regulating AI and copyright infringement. "The Copyright Act includes a provision for statutory licensing of copyrighted works. This means that if someone's work is not publicly accessible, they can apply for a government license and pay a fee. To enable an AI framework, we need to implement a similar statutory licensing mechanism that allows models to be trained on copyrighted content generated in India, while also ensuring that creators receive royalties. The key solution here is that if someone uses my copyrighted work to train their AI, I should be entitled to a royalty. This approach should be the solution," he said.
Furthermore, Pahwa and other attendees suggested that the licensing of AI could evolve similarly to Creative Commons licenses, where different usage rights are assigned to different parts of a model, such as its architecture, data, or outputs. A Creative Commons license is a free, standardised way for copyright holders to allow others to use their work under certain conditions. "I think what we'll probably end up with is that, just like you have different types of Creative Commons licenses, the availability of usage will likely be broken down into different parts, and everyone can adjust it according to how they want their models to be used," added Pahwa.

Kaustubha Kalidindi, Legal Counsel at Tattle, added that the Open Source Initiative is currently debating the definition of open-source AI, particularly regarding whether it involves sharing data, source code, and model weights. Once this definition is finalised, it will help determine which licenses are applicable. Different licenses are already emerging, such as Llama's, which imposes usage restrictions: users who modify or adapt the model must adhere to Llama's license terms. Another type of license being discussed is the Responsible AI License (RAIL), which similarly places restrictions on how models can be used, leaving it to the user to comply with those conditions.

The concept of regulatory sandboxes was debated, especially in terms of whether they allow too much flexibility that could lead to rights violations. Some argued that sandboxes provide a controlled environment to test innovations without exposing consumers to risks, while others worried about intellectual property violations. Pahwa raised a concern, asking, "Why should someone's rights be compromised in an uncertain sandbox, even if it's a controlled environment?" Sourabh Roy, Research Fellow at the Vidhi Centre for Legal Policy, said, "The whole idea of having a regulatory sandbox is to have a controlled testing environment where we can test innovation in real time without exposing consumers to risks."

Sneha Priya Yanappa, Team Lead at the Vidhi Centre for Legal Policy, said, "I view the regulatory sandbox as a tool to regulate, not restrict." She added, "We've seen this approach work in fintech and financial services, providing a relaxed environment that benefits both innovators and regulators. It allows regulators to understand the pros and cons of innovations. While the system is still a work in progress, I believe it offers more benefits than risks, although concerns about arbitrage exist, especially when trying to attract more innovation."

"You can actually design sandboxes. The key is to design it properly, and it has to be very specific. It will require collaboration between legal, government, and technical experts who want to avail of the sandbox, in order to define the exemptions. It's not a blanket, go-pass, or free card for an entire sector. There should be time limitations, financial caps, and a very detailed plan from anyone wanting to access the sandbox," said Sohan, an attendee. In response, Ajay Kumar said, "There will be political pushback. There will be struggles. We need to decide where to build the sandbox. If we allow sandboxes, the average person in this country shouldn't be severely affected if, for example, that service disappears tomorrow or if the company fails to comply with regulations," emphasising the importance of carefully considering the impact of AI sandboxes on the general population.
Kalidindi highlighted that if we had a strong trust and safety ecosystem, along with incentives for organisations to develop secure tools, the risks from sandboxes might be less concerning. "Would we be as concerned about risks from sandboxes if, say, we had a strong trust and safety tooling, or trust and safety ecosystem?" asked Kalidindi. Notably, under the IndiaAI Mission, only Rs 20.46 crore (0.2% of the total Rs 10,371.92 crore) has been allocated to Safe & Trusted AI.

Sreeja Sen, Research Manager at Digital Futures Lab, said that there is a need for a clear, proactive legal framework to prevent uncertainty, especially concerning data protection and generative AI use. "Things should not be reactive; they should be proactive and functional," she added. However, the tension between innovation and regulation becomes evident as existing AI models have already leveraged vast amounts of personal and publicly available data for training. Introducing regulations at this stage raises the question: how do we reconcile the advantages gained by early adopters with the need for responsible governance moving forward? In response, Sen said that innovation should always be grounded in rights frameworks. She noted that, while innovation is important, the responsible use of data and AI must be regulated, even if it means challenging major companies. "Innovation without rights does not exist, should not exist, in my opinion," she said. Using the example of Meta, she said that India, as a significant market, can demand stricter adherence to rights and data protection from corporations without fearing they will abandon the market. "If we decide to pause Meta to amend their agreements and ensure our data is handled appropriately, we can attempt it. There might be conflicts, but it's possible. Do you really think Meta would walk away from a market of 1.3 billion people?" she stated.

Adarsh Lathika, Project Anchor at PeoplePlusAI, asked whether it is too early to regulate AI or whether the government should wait for markets to develop. "When is the right time to frame regulation? Are we acting too early, or is this the right moment? Should we let markets naturally develop the model, see how it plays out, and then set guidelines? Or are we jumping the gun?" he asked. Sen countered that delaying regulation can lead to monopolistic markets, and suggested that early regulation can foster fair competition while promoting innovation. "The platform economy has shown us that delaying regulation leads to an oligopolistic market dominated by a few global powers, primarily based in the US. Do we want to continue promoting that by not regulating early? And let me emphasise, regulation does not mean stifling innovation," she stated.

Vivek Abraham from Salesforce sought to distinguish between high-risk AI applications (e.g., healthcare, defense) that require regulation and low-risk applications (e.g., Netflix recommendations) that do not. He argued that regulation should focus on mitigating harm rather than regulating for potential risks. Attendees pointed out that self-regulation has proven ineffective, as evidenced by its failure across various domains, and that the regulatory approach for artificial intelligence is expected to be sectoral, tailored to the specific industries and use cases where AI is deployed. Kalidindi advocated for government regulation to prevent harm. Yanappa noted that self-regulation often prioritises public image management over fostering genuine ethical commitments.
An attendee concluded that when considering regulation versus self-regulation, it is essential to evaluate the underlying infrastructure, particularly the effectiveness of the legal system. He suggested that in a country with a well-functioning judiciary, where courts can swiftly address issues like deepfakes, there might be no need for formal regulation. In this view, the legal system would be sufficient to handle potential harms or disputes, making strict regulations unnecessary as long as the courts can ensure accountability and justice in a timely manner. "I'm happy to have a no-regulation country, if you can tell me the courts will work," he stated.
[3]
Skills, funds and more: Speakers on ways to grow AI in India
Nikhil Pahwa, Editor of MediaNama, pointed out that engineers in Silicon Valley had the skill set to build frontier AI models. He questioned whether India possesses the necessary skill set beyond a handful of experts and asked how to cultivate and retain such talent. "Do we have that skill set? If not, what do we do to develop it?" he asked, wondering whether the 8.5% budget allocation for skilling in the IndiaAI Mission was sufficient to solve the problem. He also asked for strategies to attract and retain talent within India, ensuring it does not migrate to markets like San Francisco, where opportunities might seem more lucrative.

C. Chaitanya, Co-Founder and CTO of Ozonetel Communications, argued that attracting talent to AI required demonstrating a local market. "The market for AI will be local, and that's an advantage. If you want a Telugu song system, it's local. You need to be here. You need to understand the culture here. You need to work with the creators here and then build those models up," he said. Regarding skilling, Chaitanya shared his experience as an assistant professor, observing that the challenge lies not in the lack of resources but in the willingness to learn. "Everything is available on the internet. Anybody wants to get skilled, they can get skilled. Nothing is stopping them right now. It's free. They don't need to pay anything. But nobody wants to get skilled," he said. He outlined a pilot skilling programme supported by the Telangana government, targeting 100,000 students through a four-week course starting in December.

Chaitanya expressed the view that India should pursue both foundational model development and alternative approaches. "We should build foundational models. Even if you fail, you learn something," he said. However, he pointed out the high cost of current models and proposed incentivising innovation: "I would announce a $10 million prize for someone to develop a way to predict the next word without GPUs. Do we even need transformer models? There are multiple other ways to achieve this, but we're not focusing on them," he added.

However, Nikhil Pahwa pointed out that the USA and China had a significant head start in the AI development race. "Is it too late?" he asked. Umang Jaipuria, an engineer based in San Francisco, argued that it is not too late for India to develop its own frontier models, drawing a parallel with India's development of nuclear power in the 1990s despite the technology existing for decades. He emphasised the importance of India having its own models, even in an interdependent world. "India having its own frontier model doesn't mean we don't use OpenAI's models or Llama. It has to be everything, but there must be a backstop," he said. He stressed the need to develop domestic skills and infrastructure to ensure that India can function independently if global resources become inaccessible.

"Do we have the skills?" asked Nikhil Pahwa. C. Chaitanya said he believed so. "We have a zero-to-one problem, but once someone shows it can be done, we'll do it," he said, jokingly suggesting that foundation models from India could emerge soon. He felt that the process is straightforward, relying on existing technologies like transformers and GPUs, and datasets such as Web 2 and the Pile. "You just need to run it for two months, and you'll get a foundation model," he claimed. However, he argued that India should not replicate existing methods, but should either build models differently or leverage open-source models like Llama to create value on top of them.
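Chaitanya's hypothetical prize is easier to appreciate with a concrete picture of the task. As a purely illustrative sketch (an editorial addition, not something presented at the event), the count-based bigram model below predicts the next word from word-pair frequencies and runs on any CPU, with no GPUs involved; the trade-off is that it sees only one word of context, which is precisely the gap transformer models closed:

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count how often each word follows another; the 'model' is just a frequency table."""
    words = corpus.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model

def predict_next(model, word: str):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

# Toy corpus; a real model would be trained on billions of words.
corpus = "the model predicts the next word and the model learns from text"
model = train_bigram(corpus)
print(predict_next(model, "the"))   # 'model' ('model' follows 'the' twice, 'next' once)
print(predict_next(model, "next"))  # 'word'
```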
Vivek Abraham, VP at Salesforce, argued that AI could render India's tech industry obsolete, necessitating investment in foundational models, or at least in the skills needed to build them. "Most of our technology industry is invested in this body shopping labour arbitrage model, which is the prime target for any company, including us, looking to overshadow them. So, four or five years from now, every engineer who is aiming to go into TCS or an Infosys or Wipro, that job is no longer going to be there. You're getting better quality code, faster code, more advanced code, secure code from AI," he explained.

Nikhil Pahwa stated that 18.37% of the IndiaAI Mission's budget was allocated to startup financing. "How can we better enable startups in the country?" he asked. Deb from OurselfStudy, an audience member, sought government support during product rollout. Drawing a parallel with UPI's success, he pointed out that its widespread adoption resulted from the efforts of platforms like Google Pay, PhonePe, and Paytm. He highlighted the challenges startups face due to limited resources and called for external support to enable them to achieve similar success. "We don't have the deep pockets to roll out at that scale, but with support, we can create another Paytm or something similar," he concluded.

Sourabh Roy, Research Fellow at the Vidhi Centre for Legal Policy, suggested a few principles to keep in mind while setting policies for AI growth in India: equal treatment of domestic and foreign entities, avoiding geographical and national preferences, and avoiding incentives tied to export performance or requirements that foreign entities use only domestically produced components, as these would violate WTO subsidy rules. "Let's respect the regulations while making these incentives so that other countries don't come and start disputing as to what kind of incentives we are giving," he stated. He also suggested structuring these incentives around national objectives and public goods like healthcare or smart cities.

Ajay Kumar, partner at Triumvir Law, argued against direct government financing for startups, questioning the fairness of using taxpayer money to subsidise potential billion-dollar companies. "Why should a taxpayer in a tier-three city subsidise the next billion-dollar company? It does not make sense," he asked. Kumar suggested that if the government were to invest, it should acquire equity in exchange, ensuring a stake in the startup's future success. "The government should not rely on direct outlay for start-up financing," he concluded. He raised ethical concerns about the government using taxpayer money to finance a private actor that takes risks and draws a profit.

Kumar also highlighted barriers to individual participation in startup financing in India. He noted that single investors must meet multiple thresholds to qualify for early-stage institutional rounds, which restricts access to investing in startups. He contrasted this with the ease of buying shares in listed companies and questioned who benefits from the current system, arguing that it primarily favours those with existing capital. "We're not really letting the next generation in India participate," he commented. Adarsh Lathika, Project Anchor of the Policy Working Group at PeoplePlusAI, pointed out that the IndiaAI Mission was already bearing the majority of the expenses startups would accrue by providing compute capacity and datasets. "What extra money do they need, other than people cost?" he asked.
On the other hand, Kesava Reddy, Chief Revenue Officer at E2E Networks, disagreed. He explained that startups receive funding from a "fund of funds" set up by the government that channels money through various venture capital firms. Startups also give back to the fund whenever they get a return, he claimed.

Ajay Kumar argued that incentivising growth did not require direct funding, citing examples of tax incentives, land-based incentives, and infrastructural support used across industries. He also questioned the practicality of a sovereign wealth fund for India, noting that such funds are typically created by nations with surplus wealth to preserve for future generations. He pointed out that India, which he said runs growing deficits and relies on borrowed money, is not in the same position.

When discussing early-stage AI research versus funding commercially viable products, Ajay Kumar highlighted that commercial viability often emerges from non-viable research. He referenced how large language models in the US originated from academic research that later became commercially successful. He suggested awarding fellowships without restricting them to national talent. "The universities in the West are great not because they have smart people living in their countries. It's because smart people living in this country go to that country and do their research," he said.

Chaitanya added that the government need not fund startup products, given the strength of India's venture capital and angel investor ecosystem. However, he stressed the importance of government support for research, as venture capitalists and angel investors in India typically do not fund research. Vivek Abraham supported the idea of government investment in areas where others are not investing, suggesting that the state should not expect returns from such funding. He cited the original fund-of-funds scheme, which offered to chip in one-fourth of the investment sought by firms. He argued that the government should focus on sectors where private investment is absent, rather than duplicating efforts in areas already served by private markets. Sameer Krishnamurthy from Element Technologies noted that funding for large language models depends on monetisation. He argued that once a model demonstrates profitability, funding will follow, emphasising that India is not short of high-net-worth individuals or capital.

Ajay Kumar argued in support of tax incentives for AI startups, stating that they make sense because the potential revenue from businesses benefiting from these incentives has not yet been accounted for. He stated that greenfield tax incentives for AI companies would not significantly impact the budget over the next five to ten years. Kumar outlined specific tax incentives that could support AI development: GST exemptions for private providers certified to train individuals in AI skills, customs duty concessions for importing computer chips, income tax holidays to allow reinvestment into businesses, and tailored amortisation schedules for chip investments. He highlighted Bangalore's success as an IT hub as evidence that tax and regulatory exemptions can effectively incentivise growth.
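To make the last of those suggestions concrete: an amortisation schedule determines how quickly an investment can be written off against taxable income. The sketch below (with hypothetical figures, not taken from the discussion) contrasts a straight-line schedule with an accelerated double-declining-balance schedule, the kind of tailoring that would give chip buyers larger early deductions and more cash to reinvest:

```python
def straight_line(cost: float, life: int) -> list[float]:
    """Equal write-off each year: cost / life."""
    return [cost / life] * life

def double_declining(cost: float, life: int) -> list[float]:
    """Accelerated write-off: 2/life of the remaining book value each year."""
    rate, book, schedule = 2 / life, cost, []
    for _ in range(life):
        deduction = book * rate
        schedule.append(deduction)
        book -= deduction
    return schedule

cost, life = 100.0, 5  # e.g. Rs 100 crore of accelerators over a 5-year life (hypothetical)
print(straight_line(cost, life))  # [20.0, 20.0, 20.0, 20.0, 20.0]
print([round(d, 1) for d in double_declining(cost, life)])
# [40.0, 24.0, 14.4, 8.6, 5.2] -- larger early deductions defer tax and free up cash
```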
[4]
Experts Debate AI Privacy vs. Dataset Access #Nama
At MediaNama's roundtable discussion on 'Governing The AI Ecosystem', experts debated the importance of easily accessible datasets that developers could use to train AI models, and the privacy risks inherent in them. C. Chaitanya, Co-Founder and CTO at Ozonetel Communications, stated that modern AI systems require tremendous amounts of data. He also highlighted challenges in accessing existing government datasets, stating that he had already asked the Telangana government for access to data. "They won't give it. It's as simple as that," he stated. Nikhil Pahwa, Editor of MediaNama, also brought up key questions about the accessibility and control of government data: what framework should govern the release of government data, who should have access to it, and under what conditions? "If the government releases its data, who should it go to, how should it go, and what should the licensing framework be?" he asked.

Kesava Reddy, Chief Revenue Officer at E2E Networks, stated that many quality datasets were available across various sectors, citing healthcare as an example. He noted that states already provide free healthcare services and possess a wealth of data, such as DICOM images of X-rays and radiologists' reports, awaiting the government's willingness to share it with researchers. He also mentioned the vast amounts of geospatial data available in India, stating, "Each and every geolocation in India has 200 layers of data." However, he stated that the government needed to consolidate the data and make it accessible for research and model development.

Ajay Kumar, a partner at Triumvir Law, pointed out that governments in India hold vast amounts of data, including extensive records from legislative assemblies, courts, and archives. However, he criticised the inaccessibility of this data, citing barriers like Captcha restrictions on Supreme Court judgments and the lack of one-click PDF downloads. "The government sits on the largest dataset in this country," he said, urging easier access to public data.

Adarsh Lathika, Project Anchor for the Policy Working Group at PeoplePlusAI, raised concerns about the Government of India's ability to aggregate data effectively across states, districts, and other subdivisions. He noted that while data might exist in isolation within various bureaucratic layers, consolidating it into a unified dataset remains a challenge. He also criticised the quality of the existing datasets. "In the last 15 years, I have probably gone through statistical websites of close to 80 to 100 countries in the world, and I can very confidently say that the Indian government has the poorest quality of data at this point of time," he claimed.

However, Sneha Priya Yanappa, Team Lead at the Vidhi Centre for Legal Policy, brought up the government's ongoing efforts to bridge data silos within government departments. She pointed out that some municipal corporations, particularly in Karnataka, were adopting open data policies and facilitating conversations between departments to better utilise available data. However, she raised concerns about the legitimacy of data scraping as a practice and its implications for privacy. Sourabh Roy, a Research Fellow at the Vidhi Centre for Legal Policy, emphasised the potential of government data collection efforts, highlighting initiatives like Haryana's Parivar Pehchan Act, which collects detailed family-level data to enable targeted service delivery.
"Just imagine the impact that can have in improving our AI models with this kind of dataset," he said. Vasanthika Srinath, a partner at Kosmos Partners, referred to the efforts of states like Telangana and Karnataka in developing significant projects to ensure the usefulness of datasets. Many states possess data, but departments scatter it across their systems, and they have a limited understanding of its potential uses. She also pointed to Karnataka's extensive data lake project, which integrates data from various departments and could serve as a valuable resource for broader applications. Paresh Ashara, VP at Quinte Financial Technologies, emphasised the importance of using government-held data while ensuring privacy through anonymisation. "I would want that data to be anonymized to an extent where the privately identifiable information is not made public," he stated. However, he supported making aggregated datasets available, such as health sector data, to train models for predictive analysis. Referring to examples like X-rays and MRI scans mentioned earlier, he stated that the information was of a very high quality and could be used at an aggregate level to train models that predict health information. Nikhil Pahwa, Editor of MediaNama, brought up a key distinction in India's data protection framework. He noted that the Data Protection Act does not protect publicly available personal data, such as photos or information shared on social media. However, the Joint Parliamentary Committee had recommended privacy protections for anonymised and non-personal data, arguing that risks of re-identification exist when someone layers such data on personal data. "Completely contrasting views exist on how even anonymized private data can be used," he remarked. He also highlighted the limitations of differential privacy, noting that even in its early days, instances of re-identification demonstrated its imperfections. "For everything that you do, there is a counter. It's essentially an arms race," explained Pahwa. Addressing privacy concerns, Vasanthika Srinath noted that these data lake programs often utilize anonymized data, not just pseudonymized data. "They pull out all the metadata, identify various identifiers, and ensure the data is fully anonymized," she explained, adding that this approach removes privacy constraints and makes the data more usable. She emphasized the importance of all states investing in similar programs to create a unified system that benefits the entire country. "All states need to perhaps invest and get there to make it useful for the entire country," she concluded. Umang Jaipuria, a San Francisco-based engineer, argued that neither governments nor private companies handle privacy concerns perfectly. He stated that private companies prioritize their incentives over individual privacy, while government measures, such as GDPR, often lead to excessive user friction, like cookie banners. Jaipuria suggested that India should aim for a balance by combining government-led privacy protections with advancements in private-sector technologies. He highlighted homomorphic encryption as a promising solution, which allows computation on encrypted data without decrypting it. According to Jaipuria, privacy concerns shouldn't stop us from using data effectively, but leaving unencrypted data sitting anywhere was a massive liability. Another speaker referenced the UK's NHS initiative OpenSAFELY as an example of a secure approach to data use. 
Another speaker referenced the UK's NHS initiative OpenSAFELY as an example of a secure approach to data use. He explained that this programme allows researchers to analyse data for the public good without accessing the raw data directly, significantly reducing re-identification risks. "That is a very high bar in terms of, you know, making sure privacy concerns are addressed and you don't have to worry about leaking re-identification and so on," he said. He also noted cultural differences in the private sector, where companies like Zomato were sharing certain datasets, including weather and climate data.

Kesava Reddy expressed the view that private data should not be part of AI training datasets. However, he felt that private citizens should be able to share their data voluntarily for public-good purposes. C. Chaitanya shared an example from his own experience with Chandamama stories, a property that has been owned by another publisher since 2013. To avoid unauthorised scraping, his team approached both the original creators and the new rights holders for permission. However, he noted the ongoing lack of clarity in navigating such licensing and data-sharing issues. "We don't want to be OpenAI; we don't want to scrape data without telling them," he said.

Sourabh Roy pointed to California's AB 2013 as an example, which requires disclosing detailed information about training datasets, including their categories, characteristics, the number of data points, their intended purpose, and descriptions. "I think more transparency and clarity in datasets is going to help because that's the building block after all," he stated.

Hari Bhardwaj, an independent lawyer, suggested that the starting point for discussions about data should be developing a credible taxonomy. He proposed classifying data based on common characteristics, such as whether it is private or public, anonymous or non-anonymous, for commercial or non-commercial use, and verified or non-verified. "You would need to think about a whole bunch of things before arriving at a taxonomy," he said, but once established, it could inform how data is licensed and used (a minimal sketch of such a taxonomy appears below).

Sameer Krishnamurthy from Element Technologies emphasised the challenge of monetising large and small language models, warning that without viable monetisation strategies, only the compute providers would profit. He shared an example from his work, where his team developed a model capable of detecting brain tumours with high accuracy using 30,000 brain scans. Despite this achievement, they struggled to find buyers, even among large hospital networks, which had their own AI systems. He suggested that the issue might stem from supply outpacing demand.
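As a rough sketch of what Bhardwaj's proposed taxonomy might look like in practice (the four axes are his; the type names and the licensing rule are hypothetical illustrations), each dataset takes a position on every axis, and licensing or usage decisions become simple predicates over those positions:

```python
from dataclasses import dataclass
from enum import Enum

class Exposure(Enum):
    PRIVATE = "private"
    PUBLIC = "public"

class Anonymity(Enum):
    ANONYMOUS = "anonymous"
    NON_ANONYMOUS = "non-anonymous"

class Use(Enum):
    COMMERCIAL = "commercial"
    NON_COMMERCIAL = "non-commercial"

class Verification(Enum):
    VERIFIED = "verified"
    NON_VERIFIED = "non-verified"

@dataclass(frozen=True)
class DatasetClass:
    exposure: Exposure
    anonymity: Anonymity
    use: Use
    verification: Verification

    def may_train_commercial_model(self) -> bool:
        """Illustrative rule, not a real legal standard: commercial training
        only on data cleared for commercial use that is public or anonymised."""
        cleared = self.use is Use.COMMERCIAL
        safe = (self.exposure is Exposure.PUBLIC
                or self.anonymity is Anonymity.ANONYMOUS)
        return cleared and safe

# Anonymised hospital scans cleared for commercial use would pass this rule.
scans = DatasetClass(Exposure.PRIVATE, Anonymity.ANONYMOUS,
                     Use.COMMERCIAL, Verification.VERIFIED)
print(scans.may_train_commercial_model())  # True
```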
A comprehensive look at India's efforts to develop its AI ecosystem, covering regulatory challenges, data access issues, and strategies for fostering innovation while addressing privacy concerns.
India's ambitious AI mission, with a financial outlay of Rs 10,372 crore, has sparked debates on resource allocation and development priorities. The Ministry of Electronics and Information Technology (MeitY) has allocated 44% (Rs 4,563.36 crore) for compute capacity, raising concerns that other crucial areas, such as datasets, AI research, and skilling, could be overshadowed [1]. While some argue that this emphasis on computation is necessary for AI model training and real-time inference, others contend that diverse, high-quality datasets are more essential for developing effective models.
The development of culturally relevant AI tools has emerged as a key focus area. Popular AI models like ChatGPT, trained predominantly on English data, often fail to capture cultural nuances in regional languages [1]. This has led to calls for creating datasets that reflect contemporary Indian culture, including local dialects and emerging trends. The debate underscores the need to balance developing homegrown solutions with leveraging global expertise to ensure resilient, culturally relevant AI.
As India considers introducing legislation for AI regulation, the challenge of balancing innovation with risk mitigation has come to the forefront. The discussion at MediaNama's roundtable highlighted the complexities of attributing liability for AI decisions, given their probabilistic nature [2]. Experts proposed solutions such as statutory licensing models for AI training data to ensure creators receive royalties. The concept of regulatory sandboxes was debated as a potential tool for testing innovations safely, although concerns about rights violations persist.
The accessibility and control of government-held datasets have emerged as critical issues in India's AI development. Despite the country's vast data resources, including healthcare and legislative records, access remains restricted due to bureaucratic barriers and poor data quality [4]. Experts emphasised the need for better consolidation and anonymisation of data, particularly in sectors like healthcare. The discussion also highlighted conflicting views on privacy protection, especially regarding India's Data Protection Act and the treatment of publicly available personal data.
Developing a skilled workforce and establishing foundational models are seen as key priorities for growing India's AI industry. While acknowledging that India lags behind the USA and China, experts argued that it is not too late to build indigenous AI models [3]. Suggestions for nurturing the AI ecosystem included incentivising research and development through tax benefits, regulatory exemptions, and government support for areas underfunded by the private sector.
Experts highlighted the vast potential of government-held data in powering AI models. However, they also pointed out significant challenges in data consolidation and accessibility across government departments and states [4]. Initiatives like Karnataka's data lake project, which integrates data from various departments, were cited as potential models for broader applications. The debate also touched on the need for effective anonymisation techniques to balance data utility with privacy concerns.
As India navigates its path in AI development, the country faces multifaceted challenges in balancing innovation, regulation, and data access. The discussions at MediaNama's roundtable underscore the need for a comprehensive approach that addresses resource allocation, cultural relevance, regulatory frameworks, privacy concerns, and talent development. As the government and private sector collaborate to foster the AI ecosystem, the focus remains on leveraging India's unique strengths while addressing its specific challenges in the global AI landscape.