4 Sources
[1]
As AI giants duel, the Global South builds its own brainpower
In a high-stakes artificial intelligence race between the United States and China, an equally transformative movement is taking shape elsewhere. From Cape Town to Bangalore, from Cairo to Riyadh, researchers, engineers and public institutions are building homegrown AI systems: models that speak not just in local languages, but with regional insight and cultural depth.

The dominant narrative in AI, particularly since the early 2020s, has focused on a handful of US-based companies: OpenAI with GPT, Google with Gemini, Meta with Llama, and Anthropic with Claude, all vying to build ever larger and more capable models. Earlier in 2025, China's DeepSeek, a Hangzhou-based startup, added a new twist by releasing large language models (LLMs) that rival their American counterparts at a fraction of the computational demand.

But increasingly, researchers across the Global South are challenging the notion that technological leadership in AI is the exclusive domain of these two superpowers. Scientists and institutions in countries like India, South Africa, Egypt and Saudi Arabia are rethinking the very premise of generative AI. Their focus is not on scaling up but on scaling right: building models that work for local users, in their languages, and within their social and economic realities.

"How do we make sure that the entire planet benefits from AI?" asks Benjamin Rosman, a professor at the University of the Witwatersrand and a lead developer of InkubaLM, a generative model trained on five African languages. "I want more and more voices to be in the conversation."
[2]
How India can build inclusive, culturally relevant language models
While rapid strides in foundational AI models have greatly advanced language understanding and reasoning, these models often fall short in representing and serving the Global South. In India, where linguistic and cultural diversity is vast, the current generation of large language models (LLMs) has often failed to capture the country's unique context. That is beginning to change through local innovation, policy direction, and community-driven data efforts.

Most LLMs are trained predominantly on English online content, which inherently biases them toward Western linguistic and cultural norms. In India, where much of the linguistic and cultural knowledge remains undigitized -- or even undocumented -- this results in poor model performance in local languages and a lack of representation of regional customs, folklore and history. Worse still, it risks leaving the local population, especially younger generations growing up in a digital-first world, disconnected from its own history and culture.

Often, the areas where AI could make the biggest difference in people's lives are remote, without robust internet connectivity. Yet the models are not tailored to run on small edge devices, such as smartphones and tablets, for critical use cases. Nor do they account for the language and technology barriers to using these systems: unfamiliarity with digital tools, not just a lack of access to them. Moreover, many cultural artifacts exist only in local languages, and nuance is lost when they are translated to train AI models. There is a clear need for AI focused on the Global South.

India now has a pivotal opportunity to lead in building inclusive, localized AI, and solutions are already taking shape. Initiatives like AI4Bharat are building rich datasets in Indian languages, paving the way for training models that understand Indian linguistic nuances. Government-backed platforms like AI Kosha are stepping in to support these data efforts at scale.
With many parts of India still lacking reliable internet connectivity, lightweight models that can run on edge devices are crucial, particularly for healthcare, education and agriculture use cases in remote regions. Sustainable, responsible AI should be built in collaboration with domain experts, social scientists, and the people who will actually use it. Financially viable open-source models -- or heavily subsidized alternatives -- could help democratize access to high-quality AI tools.

There are still major hurdles to cross. A significant portion of India's linguistic and cultural wealth is not yet digitized, and skilled professionals capable of training and deploying these models efficiently are in short supply. While fine-tuning models is feasible with modest infrastructure, training from scratch on large datasets remains resource-intensive.

But solutions are within reach. Government missions like IndiaAI are laying the groundwork by funding computing infrastructure, fostering innovation, and promoting foundational model research. Meanwhile, India's thriving tech and startup ecosystem is responding, leveraging domain-specific data and expertise to build nimble, culturally aligned AI solutions.
[3]
Localizing AI in the global south - Nature Machine Intelligence
Much attention is directed at the race to develop large language models (LLMs) such as GPT-4, Gemini, Claude and DeepSeek, which compete to outperform each other in language generation and content creation. However, these models, trained mainly on English and Western culture-centric data, perform poorly in non-Western contexts and languages. This is an unfortunate development, as artificial intelligence (AI) technology, with the right focus, has the potential to address pressing societal challenges.

For example, in a recent Correspondence in this journal, Chakraborty et al. highlight the potential of generative AI to support public mental health care in India, particularly in suicide prevention. LLM-based systems could alleviate challenges posed by the country's vast population, cultural diversity, shortage of trained professionals, and the stigma that surrounds mental health. However, although current systems perform well on specialized tasks in English or basic tasks across many languages, they remain inadequate for specialized, culturally sensitive applications. Chakraborty et al. call for the development of cost-effective models tailored to the Indian context, trained on native languages and capable of operating with minimal user IT infrastructure.
[4]
Open Source LLMs Pave the Way for Responsible AI in India | AIM
Open-source large language models are emerging as powerful tools in India's quest for responsible AI. By allowing developers to fine-tune models on locally relevant datasets, organisations are building solutions that reflect the country's diversity. In a recent conversation with AIM, powered by Meta, Alpan Raval, chief AI/ML scientist at Wadhwani AI, and Saurabh Banerjee, CTO and co-founder of United We Care, explained how this approach is making AI both more ethical and more effective. "We are doing projects in healthcare, in agriculture, and in primary education that leverage LLMs, some of which are supported by Meta," said Raval. He further added that open source models offer a lot of freedom in terms of fine-tuning them, adding extra layers on top of them, and then retraining from scratch. Alpan shared another example where they have developed an oral reading fluency assessment using AI, currently deployed in public schools across Gujarat, India. This initiative leveraged AI4Bharat's open-source models. Raval stated that they collected student data from across the state and trained more advanced models by utilising both this student data and synthetic data generated through pseudo-labelling children's voices with base models. He emphasised that this achievement would not have been feasible without the open-sourcing of the base models. Adding on to the conversation, Banerjee said that if any company is going for a vertical use case, the best approach would be to pick an open-source model and do the post-training on that. "We should focus on post-training on the existing pre-trained models, and work with the use case," he said. Alpan said that open source, by itself, magically removes bias. "It depends on the methodology, the kind of data the model was trained on, and so on," he said. He explained that many open-source models are trained on datasets that differ significantly from the data observed in rural and underserved communities. 
"It's almost imperative for us in order to prevent bias that we have to fine-tune those data sets." Discussing hallucinations, Banerjee said that LLMs won't stop hallucinating, and we have to live with that. However, he believes it is sensible to put weights and biases, training methodology, in the public domain. He explained that this transparency allows for public scrutiny and helps identify inherent errors. "Put it in the public domain for public scrutiny. Let people decide what they are getting into, rather than a closed, boxed approach." He also offered a nuanced perspective on bias, suggesting that it's not always inherently negative. He provided examples of common AI limitations, such as generating an image of an analogue clock at 6:25 or a left-handed person writing. Banerjee explained that these limitations stem from training data being biased towards certain representations. To improve model accuracy, he said it may be necessary to introduce a different kind of bias, which he calls positive bias. He gave the example of healthcare, where accuracy matters more than being completely neutral. In such cases, adding a positive bias can help make the system more accurate, even if it means making a trade-off. For organisations in the social sector, the security of Personally Identifiable Information (PII) remains a top concern. Alpan said, "We have a rule -- more or less -- that we don't ingest PII into the organisation at all, except in certain cases where we have no choice." Regarding ethical guardrails and governance, Alpan said that there's no "one size fits all" solution. The ethical use of open-source models depends on their intended application. On the other hand, Banerjee said there is a need for an "inter-governmental initiative" for AI safety, similar to aviation safety, due to the decentralised nature of AI processing and training. 
He added that clear guidelines on "what is acceptable in a domain and what is not" are needed, particularly in human-machine interaction. Rather than looking to the West, Banerjee said, India should be proud of its work on responsible AI, and he lauded NASSCOM's developer guidelines as highly actionable: a resource for both individuals and organisations to understand their responsibilities when using, building or fine-tuning foundation models.

Raval said that India's leadership in using AI for social good is supported by strong government collaboration. "India has been the number one country in the world to emphasise AI for social good -- and it's not just in letter but also in spirit," he added. Open-source AI, he said, is being used to solve pressing challenges in fields ranging from healthcare and agriculture to education and climate. "Nandan Nilekani has said many times that India is going to be the use case capital of the world, and that applies to AI as well," he concluded.
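The pseudo-labelling step Raval describes follows a standard pattern: run a base model over unlabelled data, keep only the predictions it is confident about, and treat those as synthetic training pairs. The sketch below shows that general pattern only; `toy_model` and the confidence threshold are stand-ins, not Wadhwani AI's actual pipeline, which would use a real speech model such as those from AI4Bharat.

```python
# Minimal sketch of confidence-thresholded pseudo-labelling.
# The "base model" is a stand-in; a real pipeline would transcribe
# audio clips with an open-source speech model.

def pseudo_label(unlabelled, base_model, min_confidence=0.9):
    """Label unlabelled examples with the base model, keeping only
    confident predictions as synthetic (example, label) training pairs."""
    synthetic = []
    for example in unlabelled:
        label, confidence = base_model(example)
        if confidence >= min_confidence:
            synthetic.append((example, label))
    return synthetic

# Stand-in model: pretend confidence tracks clip length.
def toy_model(clip):
    return ("transcript_of_" + clip, 0.95 if len(clip) > 6 else 0.5)

clips = ["voice_a", "voice_b", "clip"]
training_pairs = pseudo_label(clips, toy_model)
# Only the clips that clear the 0.9 confidence bar become training data;
# the low-confidence one is dropped rather than risk a noisy label.
```

The resulting synthetic pairs are then mixed with human-labelled data to train a stronger model, which is the role the pseudo-labelled children's voices played in the Gujarat deployment.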
Researchers and institutions in the Global South are challenging the AI dominance of the US and China by developing localized, culturally relevant AI models that address unique regional needs and linguistic diversity.
As the artificial intelligence race between the United States and China intensifies, a transformative movement is emerging in the Global South. Researchers, engineers, and public institutions from Cape Town to Bangalore are developing homegrown AI systems that not only speak local languages but also incorporate regional insights and cultural depth [1].
While US-based companies like OpenAI, Google, and Meta dominate the AI narrative, and China's DeepSeek makes strides with efficient large language models (LLMs), countries in the Global South are rethinking the premise of generative AI. Their focus is on "scaling right" rather than just scaling up, creating models that work for local users within their unique social and economic contexts [1].
India, with its vast linguistic and cultural diversity, is at the forefront of this movement. Initiatives like AI4Bharat are building rich datasets in Indian languages, while government-backed platforms like AI Kosha support data efforts at scale. The country is developing lightweight models that can run on edge devices, crucial for applications in healthcare, education, and agriculture in remote regions [2].
Open-source LLMs are emerging as powerful tools in India's quest for responsible AI. They allow developers to fine-tune models on locally relevant datasets, reflecting the country's diversity. Organizations like Wadhwani AI are leveraging these models for projects in healthcare, agriculture, and primary education [4].
Current LLMs, trained mainly on English and Western culture-centric data, perform poorly in non-Western contexts and languages. This limitation is particularly evident in specialized, culturally sensitive applications such as mental health care in India. Researchers are calling for the development of cost-effective models tailored to specific cultural contexts [3].
Despite progress, significant hurdles remain. Much of India's linguistic and cultural wealth is not yet digitized, and there's a shortage of skilled professionals capable of efficiently training and deploying these models. However, government missions like IndiaAI are addressing these challenges by funding computing infrastructure, fostering innovation, and promoting foundational model research [2].
As the development of localized AI models progresses, ethical considerations remain paramount. Organizations are implementing strict rules to protect personally identifiable information (PII). There's a growing call for inter-governmental initiatives for AI safety, similar to aviation safety standards, due to the decentralized nature of AI processing and training [4].
India is positioning itself as a leader in using AI for social good, with strong government collaboration supporting these efforts. The country is becoming the "use case capital of the world" for AI, applying open-source AI to solve pressing challenges in various fields [4].
As the Global South continues to develop culturally relevant AI models, it not only challenges the dominance of AI superpowers but also paves the way for more inclusive and diverse technological advancements that could benefit a broader spectrum of the global population.
Summarized by Navi