2 Sources
[1]
LLMs are lousy at reading Asian languages, finds Grab
Superapp company that chased Uber away built its own model to do the job right

Proprietary large language models are bad at interpreting Asian languages, according to Singaporean super-app company Grab, which has built its own model instead.

Grab's superapp offers ride-sharing, food delivery, shopping, and even some financial services. The company is so prominent and dominant in some Asian countries that Uber sold itself to Grab and took a stake in the Singaporean company rather than compete directly. Today, Grab is a major player in Singapore, Malaysia, Indonesia, the Philippines, Vietnam, Thailand, Cambodia, and Myanmar, all of which use scripts that employ alphabets other than the Latin script used by English.

In a Tuesday post on its Engineering blog, four Grab staffers explained that the company needs to accurately extract information from ID cards, driver's licenses, and registration certificates for compliance chores like know-your-customer checks. Grab tried Optical Character Recognition (OCR) systems, but its chosen tech "struggled with the variety of document templates it had to process."

It's 2025, so the org investigated whether large language models could solve its problem. "While powerful proprietary Large Language Models (LLMs) were an option, they often fell short in understanding [South East Asian] SEA languages, produced errors, hallucinations, and had high latency," the post reveals. "On the other hand, open-sourced Vision LLMs were more efficient but not accurate enough for production."

The company decided building its own Vision LLM - a model that vectorizes images so a large language model can extract text - was its best option.
"We evaluated a range of LLMs capable of performing OCR and Key Information Extraction (KIE)," the post states, and chose Alibaba Cloud's Qwen2-VL 2B as its base.

To build its model, Grab extracted SEA language content from the Common Crawl, an open collection of data scraped from the web, then built what the authors describe as "an in-house synthetic data pipeline to generate text images by rendering SEA text contents in various fonts, backgrounds and augmentations."

The team next tried to fine-tune a Vision LLM using Qwen2VL and Low-Rank Adaptation (LoRA), a technique they found "efficient because it allows lightweight updates to the model's parameters, minimizing the need for extensive computational resources."

"We trained the model on our curated document data, which included various document templates in multiple languages. The performance was promising for documents with Latin scripts. Our experiment of LoRA fine-tuned Qwen2VL-2B achieved high field-level of accuracy for Indonesian documents."

Thai and Vietnamese remained hard to recognize, as did documents with unstructured layouts and small, dense text. Further experiments showed that existing vision LLMs "lack visual text in SEA languages during vision encoder and joint training." Grab's team therefore decided to perform full-parameter fine-tuning of its model.

"We first trained the vision components of the model using synthetic OCR datasets that we created for Bahasa Indonesia, Thai, Vietnamese, and English. This helps the model to learn the unique visual patterns of SEA scripts," the team wrote. Next came full-parameter fine-tuning to refine all components of the model with task-specific document data.

Grab rated the resulting model a success but admitted the fine-tuning process "pushed the limits of GPUs." "To optimize resources used and to create a model perfectly tailored to our needs, we decided to build a lightweight Vision LLM (~1B parameters) from scratch."
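As a rough illustration of the kind of synthetic data pipeline the post describes, the sketch below renders label text onto randomized backgrounds and applies small rotation and blur augmentations. It uses Pillow; the function names, augmentation choices, and parameters are assumptions for illustration, not Grab's actual pipeline, and a production version would need fonts that cover Thai and Vietnamese glyphs rather than Pillow's default font.

```python
import random
from PIL import Image, ImageDraw, ImageFont, ImageFilter

def render_text_image(text, size=(320, 48), bg=(255, 255, 255), fg=(0, 0, 0),
                      rotate_deg=0.0, blur_radius=0.0, font=None):
    """Render a text string onto a background, then apply simple augmentations."""
    img = Image.new("RGB", size, bg)
    draw = ImageDraw.Draw(img)
    # NOTE: the default font only covers basic Latin; a real SEA pipeline
    # must load TrueType fonts with Thai/Vietnamese glyph coverage.
    draw.text((8, 12), text, fill=fg, font=font or ImageFont.load_default())
    if rotate_deg:
        img = img.rotate(rotate_deg, expand=False, fillcolor=bg)
    if blur_radius:
        img = img.filter(ImageFilter.GaussianBlur(blur_radius))
    return img

def synth_batch(texts, n_variants=3, seed=0):
    """Produce (image, ground-truth label) pairs with randomized augmentations."""
    rng = random.Random(seed)
    samples = []
    for text in texts:
        for _ in range(n_variants):
            img = render_text_image(
                text,
                bg=tuple(rng.randint(200, 255) for _ in range(3)),
                rotate_deg=rng.uniform(-3.0, 3.0),
                blur_radius=rng.uniform(0.0, 0.8),
            )
            samples.append((img, text))
    return samples
```

Because each image carries its source string as the label, a pipeline like this yields unlimited OCR training pairs without manual annotation, which is the point of rendering rather than scraping images.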
Grab's post explains the process it used to create its model, and the results - performance better than OCR tools, Qwen2, ChatGPT, and Google's Gemini. The company concluded that "strategic training with high-quality data enables smaller, specialized models to achieve remarkable efficiency and effectiveness." Grab now plans more of its own models. "We're developing Chain of Thought-based OCR and Key Information Extraction (KIE) models to strengthen generalisation capabilities and tackle even more diverse document scenarios," the post states, and will also extend its advanced document processing tech "to Myanmar, Cambodia, and beyond." Grab's experience aligns with predictions this Vulture often hears about the future of AI in the enterprise, namely that many organizations will develop their own models to handle specialized tasks that general-purpose models weren't built to address. ®
[2]
AI Models are Bad at Understanding Asian Languages, Says Singapore's Grab
Grab used both online and synthetic datasets to train the model

Grab, the Singapore-based superapp company, highlighted on Monday that it was forced to develop an in-house artificial intelligence (AI) model for internal use. It is a lightweight vision large language model (LLM) that can scan documents and extract information from them. The company said the decision to develop the model was made because both proprietary and open-source models were poor at understanding Southeast Asian languages. The company's statement has raised fresh concerns around the accessibility of frontier models from Google, OpenAI, and Anthropic.

AI Models' Struggle With Non-English Languages

In a blog post detailing the architecture and training process of its in-house vision model, Grab highlighted the shortcomings it experienced when it tried to outsource the technology. "While powerful proprietary Large Language Models (LLMs) were an option, they often fell short in understanding SEA languages, produced errors, hallucinations, and had high latency. On the other hand, open-sourced Vision LLMs were more efficient but not accurate enough for production," the post mentioned.

AI models' struggle with non-English languages is not a new finding. For years, researchers have pointed it out, and AI players have tried to fix the issue. However, despite gaining basic competence in popular foreign languages such as Hindi, Japanese, Spanish (Latin America and Spain), and Chinese, the models have yet to grasp the lexicon well enough to differentiate between nuances. So, they might be useful in general conversations, but for enterprise or research needs, their applicability falls short. For instance, a paper published earlier this year found that AI models developed by Chinese companies perform as poorly on Chinese minority languages as Western models do. And the issue persists both in proprietary models from Google, OpenAI, Meta, and Anthropic and in open-source models.
The reason behind this struggle is the lack of readily available, adequate datasets to train models on these languages. This is one of the reasons major AI companies are partnering with Indian companies and institutions to collect more Indic-language datasets. In July, Google teamed up with IIT Bombay to develop Indic-language AI speech models. Meta is reportedly paying contractors $55 an hour to train its models in Hindi, and OpenAI has announced a research collaboration with IIT Madras, backed by $500,000 from the ChatGPT maker. While collecting data this way is expensive, it is still possible to eventually build large enough datasets in prominent Asian and other languages. Minority languages, however, such as India's non-scheduled languages, will remain difficult for these models to gain competence in. And unless the models can learn these languages, their accessibility and functionality will always be limited.
Singapore's superapp giant Grab developed its own lightweight vision language model after discovering that major proprietary and open-source AI models perform poorly on Southeast Asian languages, highlighting broader accessibility challenges for non-English AI applications.
Singapore-based superapp giant Grab has revealed that major artificial intelligence models struggle significantly with Southeast Asian languages, prompting the company to develop its own custom vision language model. The revelation highlights broader challenges facing AI accessibility for non-English speaking populations worldwide [1].

Grab, which dominates ride-sharing, food delivery, and financial services across eight Southeast Asian countries, requires accurate document processing for compliance tasks including know-your-customer checks. The company processes ID cards, driver's licenses, and registration certificates written in scripts that don't use Latin alphabets [1].
Source: The Register
According to Grab's engineering team, while "powerful proprietary Large Language Models (LLMs) were an option, they often fell short in understanding SEA languages, produced errors, hallucinations, and had high latency." Open-source vision LLMs proved "more efficient but not accurate enough for production" [1].

The company's experience reflects a broader industry challenge. Research has consistently shown that AI models developed by both Western and Chinese companies struggle with minority languages, even within their own linguistic regions. A recent study found that Chinese AI models perform as poorly on Chinese minority languages as Western models do [2].

Faced with these limitations, Grab decided to build its own Vision LLM. The team started by evaluating existing models and selected Alibaba Cloud's Qwen2-VL 2B as their foundation. They extracted Southeast Asian language content from Common Crawl and created "an in-house synthetic data pipeline to generate text images by rendering SEA text contents in various fonts, backgrounds and augmentations" [1].
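Pulling Southeast Asian text out of a web-scale crawl requires some form of language or script filtering. Grab's post doesn't describe its filtering method, so the sketch below is only one plausible approach: a crude script detector based on Unicode block ranges, with illustrative thresholds.

```python
def script_profile(text):
    """Rough per-script character counts, using Unicode block ranges.
    The ranges are a simplification: Thai is U+0E00-U+0E7F, and Vietnamese
    diacritics mostly fall in Latin Extended Additional (U+1EA0-U+1EFF)
    plus a few Latin Extended-A/B code points."""
    counts = {"thai": 0, "viet_diacritic": 0, "latin": 0, "other": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0E00 <= cp <= 0x0E7F:
            counts["thai"] += 1
        elif 0x1EA0 <= cp <= 0x1EFF or 0x0102 <= cp <= 0x01B0:
            counts["viet_diacritic"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
        elif not ch.isspace():
            counts["other"] += 1
    return counts

def looks_thai(text, threshold=0.5):
    """Keep a crawl record if at least `threshold` of its non-space
    characters are in the Thai block."""
    counts = script_profile(text)
    total = sum(counts.values()) or 1
    return counts["thai"] / total >= threshold
```

A production crawler would more likely use a trained language identifier (fastText-style) rather than raw block counting, but the block heuristic shows why script-based filtering is cheap to run at crawl scale.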
Initial experiments using Low-Rank Adaptation (LoRA) fine-tuning showed promise for Latin-script documents, particularly achieving high accuracy on Indonesian documents. However, Thai and Vietnamese remained challenging, as did documents with unstructured layouts and dense text [1].
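LoRA is cheap because the pretrained weight matrix stays frozen and only a pair of small low-rank factors is trained. A minimal NumPy sketch of the arithmetic, with illustrative dimensions (not tied to Qwen2-VL's actual layer sizes):

```python
import numpy as np

def lora_delta(A, B, alpha):
    """Low-rank weight update: dW = (alpha / r) * B @ A, where r is the rank."""
    r = A.shape[0]
    return (alpha / r) * (B @ A)

d_out, d_in, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init
                                           # so dW = 0 at the start of training
W_adapted = W + lora_delta(A, B, alpha=16)

full_params = W.size            # what full fine-tuning would update
lora_params = A.size + B.size   # what LoRA actually updates
```

With these sizes LoRA trains 8,192 parameters against 262,144 for the full matrix, about 3%, which is the "lightweight updates" trade-off the blog mentions; the flip side, as Grab found, is that a low-rank update cannot teach the vision encoder script features it never learned in pretraining.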
Grab's team discovered that existing vision LLMs "lack visual text in SEA languages during vision encoder and joint training." This insight led them to perform full-parameter fine-tuning, first training vision components using synthetic OCR datasets for Bahasa Indonesia, Thai, Vietnamese, and English [1].
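The two-stage schedule can be thought of as a freeze plan over the model's components: train the vision side first so it learns SEA script shapes, then unfreeze everything. A toy sketch, with hypothetical module names and parameter counts (Grab has not published its architecture breakdown):

```python
# Hypothetical module breakdown for a small vision LLM; the names and
# sizes below are illustrative assumptions, not Grab's actual model.
PARAM_COUNTS = {
    "vision_encoder":   300_000_000,
    "vision_projector":  20_000_000,
    "language_model":   680_000_000,
}

def trainable_params(stage):
    """Stage 1: only the vision modules train on synthetic OCR data while
    the language model stays frozen. Stage 2: full-parameter fine-tuning
    unfreezes every module for task-specific document data."""
    if stage == 1:
        frozen = {"language_model"}
    elif stage == 2:
        frozen = set()
    else:
        raise ValueError(f"unknown stage: {stage}")
    return sum(n for name, n in PARAM_COUNTS.items() if name not in frozen)
```

Stage 2 is where the GPU cost concentrates: every parameter needs gradients and optimizer state, which is consistent with the post's remark that the process "pushed the limits of GPUs."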
The intensive training process "pushed the limits of GPUs," ultimately leading Grab to build a lightweight Vision LLM with approximately 1 billion parameters from scratch. The resulting model outperformed existing OCR tools, Qwen2, ChatGPT, and Google's Gemini on Grab's specific tasks [1].
Grab's experience illuminates a significant challenge facing the AI industry. The lack of adequate datasets for training models on diverse languages creates barriers to accessibility and functionality. Major AI companies are investing heavily to address this gap: Google partnered with IIT Bombay for Indic-language speech models, Meta reportedly pays $55 per hour for Hindi-language training contractors, and OpenAI announced a $500,000 research collaboration with IIT Madras [2].
Source: Gadgets 360
However, while collecting data for prominent Asian languages remains expensive but feasible, minority languages face even greater challenges. These languages may never achieve adequate representation in major AI models, creating persistent accessibility limitations [2].

Grab plans to expand its AI capabilities, developing "Chain of Thought-based OCR and Key Information Extraction models to strengthen generalisation capabilities." The company also intends to extend its document processing technology to Myanmar, Cambodia, and other markets [1].
Summarized by Navi