Curated by THEOUTPOST
On Wed, 5 Feb, 4:02 PM UTC
2 Sources
[1]
What Is Openeuro LLM: Europe's Multilingual AI Explained
It faces data availability and computing power challenges but works to solve them through EuroHPC and better datasets.

In the global AI race, the focus is often on the U.S. and China, especially with cases like Gemini, DeepSeek, and OpenAI, but Europe is not falling behind. OpenEuro LLM proves that the continent is serious about shaping the future of artificial intelligence on its own terms. Funded by the Digital Europe Programme, the project develops open-source, multilingual AI models that aim to align with European values and regulatory frameworks. The model covers all 24 official EU languages and more, ensuring AI remains open, accessible, and culturally inclusive. By prioritizing transparency and linguistic diversity, OpenEuro LLM helps Europe stay competitive while maintaining digital sovereignty.

The European Commission recognized the project's strategic importance by awarding it the Strategic Technologies for Europe Platform (STEP) Seal, making it the first Digital Europe Programme initiative to receive this mark of excellence. OpenEuro LLM secured €20.6 million in funding, bringing its total budget to €37.4 million.

But challenges remain. Europe must balance rapid AI development with ethical and transparent standards. To provide deeper insight, CCN contacted Jan Hajič, OpenEuro LLM project coordinator at the Institute of Formal and Applied Linguistics, Charles University, Prague. He co-leads the project with Peter Sarlin, co-founder of Silo AI, a leading European AI lab specializing in enterprise and research-driven AI solutions. Hajič's insights help explain OpenEuro LLM's goals, real-world applications, and the challenges ahead, all covered here.

What Is OpenEuro LLM

OpenEuro LLM is a European initiative to develop high-performance, multimodal, open-source large language models (LLMs) for text, speech, and structured data. It serves industry, public services, and research, strengthening Europe's position in the AI race.
The role of EuroLLM in research is particularly important: it provides fully open models that allow academic institutions to experiment, analyze, and innovate without restrictions. According to Jan Hajič, "for academic and research groups, it is important to work with fully open models, which is exactly what OpenEuroLLM is aiming at." One of the main goals is to push innovation forward without losing control over data, security, and regulatory compliance.

The OpenEuro LLM project connects universities, research centers, companies, and EuroHPC institutions from across Europe. It aims to support the region's digital sovereignty and ethical AI development.

A key feature is language coverage. EuroLLM's models are designed to support 35 languages: all official EU languages plus additional ones, including Arabic, Chinese, Hindi, Japanese, Korean, Russian, and Turkish. This broad linguistic reach ensures accessibility across diverse communities.

Handling morphologically rich languages like Czech presents unique challenges in natural language processing (NLP). Unlike English, these languages have complex word structures and grammatical variations, making them more difficult to process. However, according to Jan Hajič, "while this has been the problem in the past, recent advances in technology, such as proper organization, are able to minimize the loss for such languages. In any case, these languages are very often low-resource languages, so there are still problems stemming from data sparsity." EuroLLM addresses these issues by leveraging structured data and multilingual training. This improves model performance across low-resource languages while ensuring more accurate and context-aware outputs.

EuroLLM's Language Support and Model Advancements

The EuroLLM-1.7B model is the base version, trained on 4 trillion tokens for general tasks. It provides fast processing and serves as the foundation for more specialized versions.
The EuroLLM-1.7B-Instruct model builds upon this by incorporating EuroBlocks fine-tuning, enhancing its ability to handle machine translation and instruction-following tasks with improved efficiency. Meanwhile, EuroLLM-9B is the most advanced version so far, featuring 9 billion parameters and trained on 4 trillion tokens from web data, parallel translations, and high-quality datasets. It also has an instruction-tuned variant, EuroLLM-9B-Instruct, which fine-tunes the model on the EuroBlocks dataset to enhance instruction-following and machine translation capabilities. It is designed for more complex language processing tasks, making it suitable for research, industry, and public services. All models operate under the Apache 2.0 license, ensuring free and open access to developers and researchers.

How Do EuroLLM Models Compare to Gemini, DeepSeek, and ChatGPT?

EuroLLM models, including EuroLLM-1.7B and EuroLLM-9B, provide open-source AI solutions for diverse applications. They offer transparent and adaptable alternatives for research, industry, and public services. GPT-4 by OpenAI delivers advanced language processing, generating highly coherent and contextually accurate text. Google's Gemini specializes in conversational AI, leveraging Google's vast search infrastructure to enhance responses. DeepSeek aims for efficiency and strong performance despite having fewer resources. However, recent reports indicated that its accuracy was 17% "in delivering news and information," falling behind ChatGPT and Gemini.

Direct benchmark comparisons remain limited, but EuroLLM's open-source model prioritizes multilingual accessibility, while GPT-4 and Gemini focus on proprietary advancements. According to Jan Hajič, EuroLLM aims to provide comparable quality across all supported languages, ensuring equal performance across official and future EU languages. "Multilinguality, language equality, and language transparency are really important for Europe to operate as a truly Single Market with no language barriers." This focus on linguistic inclusivity sets EuroLLM apart from general-purpose LLMs like GPT-4 and LLaMA, Meta's open-weight AI model designed for research and broad applications, which often prioritize higher-resource languages over low-resource European languages.

Challenges and Considerations

Developing a high-performing, multilingual AI model like OpenEuro LLM presents several challenges, particularly in competition, data availability, and ethical concerns. Hajič highlights two key obstacles: compute and data. "One of the main goals of the project is to overcome both: to get enough compute from the EuroHPC centers, and to get enough data especially for the low-resource languages to get equal or close to equal quality and cultural adequacy for those languages." Addressing data limitations will be essential for achieving linguistic fairness.

He also points to persistent challenges in NLP, stating, "Most of NLP (even though not all) is implemented today through LLMs, so the well-known problems apply: hallucinations, inaccuracies, cultural unawareness, inconsistencies over time, etc." OpenEuro LLM must work toward reducing these issues to ensure reliable and unbiased AI outputs.

The Future of OpenEuro LLM

The OpenEuroLLM project will develop the next generation of large language models designed to set new standards in multilingual AI, transparency, and regulatory alignment. The project will create advanced, open-source foundation models that surpass existing capabilities by leveraging cutting-edge research, high-quality datasets, and Europe's strongest AI expertise. These models will drive innovation across industry, academia, and public services, ensuring Europe remains at the forefront of AI development while upholding ethical, fair, and privacy-focused AI practices.
Conclusion

OpenEuro LLM puts Europe at the forefront of AI development, offering an open-source, multilingual alternative to proprietary models. Supporting finance, research, industry, and public services ensures AI remains accessible and aligned with European regulations. The project faces challenges in data availability, computing power, and ethical AI, but its focus on transparency and linguistic diversity gives it a strong foundation. With continued research, collaboration, and investment, OpenEuro LLM could help Europe build AI that serves its people, strengthens digital sovereignty, and competes globally.
[2]
OpenEuroLLM Project to Develop Open-Source Multilingual AI Models
They will be available for commercial, industrial and public services

OpenEuroLLM Project, a European alliance tasked with developing open-source artificial intelligence (AI) models, was announced on Monday. The project is backed by the European Commission, which has also awarded it the Strategic Technologies for Europe Platform (STEP) Seal, recognising it as a critical technology project. The group aims to develop a family of multilingual large language models (LLMs) that are proficient in all the languages of the European Union (EU). These models will also be developed transparently and adhere to the regulatory framework of the bloc.

In a post on X (formerly known as Twitter), the official handle of the European Commission announced that the OpenEuroLLM Project has been awarded the first STEP Seal of the year. Notably, the STEP Seal is a quality label awarded to high-quality projects meeting the minimum quality requirements under the Digital Europe Programme. The selected projects are given visibility on the STEP platform and are promoted by the EU to help them attract investors.

In a press release, the OpenEuroLLM Project stated that it started the work to develop AI models on February 1. The project comprises a consortium of 20 European research institutions, companies, and EuroHPC centres coordinated by Jan Hajič from Charles University in Czechia and is co-led by Peter Sarlin, the Co-Founder and CVP at AMD Silo AI.

The OpenEuroLLM Project plans to build a family of high-performance multilingual LLMs that will be released to the open-source community for commercial, industrial, and public services. The project confirmed that it will adhere to the strict regulatory policies of the EU and will be transparent about data procurement. Once the AI models have been made available, the project will also publicly release the documentation, training and testing code, and evaluation metrics of the AI models.
This will be done to ensure that the LLMs can be fine-tuned and instruction-tuned for specific industry and public sector needs. "The transparent and compliant open-source models will democratise access to high-quality AI technologies and strengthen the ability of European companies to compete on a global market and public organizations to produce impactful public services," added the press release. Notably, the European Commission has already funded the OpenEuroLLM project under the Digital Europe Programme, and it is expected to gain more investors in the coming weeks. So far, there is no roadmap on when these models will be released, or what would be the focus area of these models.
The European Commission backs OpenEuroLLM, an open-source project developing multilingual AI models to compete with global leaders while adhering to EU values and regulations.
In a bold move to assert its position in the global AI race, Europe has launched the OpenEuroLLM project, a significant initiative aimed at developing open-source, multilingual AI models. Funded by the Digital Europe Programme, this project has received the prestigious Strategic Technologies for Europe Platform (STEP) Seal from the European Commission, marking it as a critical technology project [1][2].
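OpenEuroLLM is designed to create high-performance, multimodal large language models (LLMs) for text, speech, and structured data. Its stated goals are to serve industry, public services, and research; to push innovation forward without losing control over data, security, and regulatory compliance; and to support Europe's digital sovereignty and ethical AI development [1].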
Jan Hajič, the project coordinator from Charles University in Prague, emphasizes the importance of fully open models for academic and research purposes [1].
OpenEuroLLM has secured €20.6 million in funding, bringing its total budget to €37.4 million [1]. The project comprises a consortium of 20 European research institutions, companies, and EuroHPC centers, coordinated by Jan Hajič and co-led by Peter Sarlin of Silo AI [2].
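Source [1] describes a family of models with varying capabilities: EuroLLM-1.7B, a base model trained on 4 trillion tokens for general tasks; EuroLLM-1.7B-Instruct, which adds EuroBlocks fine-tuning for machine translation and instruction following; and EuroLLM-9B, a 9-billion-parameter model also trained on 4 trillion tokens, together with its instruction-tuned variant, EuroLLM-9B-Instruct.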
All models will operate under the Apache 2.0 license, ensuring free and open access [1].
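To put the model sizes reported in source [1] side by side, the following is a minimal sketch that computes how many training tokens each base model saw per parameter. The `ModelSpec` class and `summarize` function are hypothetical names introduced only for this illustration, and reading "1.7B" as 1.7 billion parameters is an assumption based on the model name; the 9 billion parameter and 4 trillion token figures come from source [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record for one base model described in source [1]."""
    name: str
    parameters: int        # number of model parameters
    training_tokens: int   # tokens seen during pre-training

    @property
    def tokens_per_parameter(self) -> float:
        # Ratio of pre-training tokens to parameters.
        return self.training_tokens / self.parameters


# Figures as reported in source [1]; the 1.7B parameter count is
# an assumption inferred from the model name.
MODELS = [
    ModelSpec("EuroLLM-1.7B", 1_700_000_000, 4_000_000_000_000),
    ModelSpec("EuroLLM-9B", 9_000_000_000, 4_000_000_000_000),
]


def summarize(models):
    """Map each model name to its rounded tokens-per-parameter ratio."""
    return {m.name: round(m.tokens_per_parameter) for m in models}


if __name__ == "__main__":
    for name, ratio in summarize(MODELS).items():
        print(f"{name}: ~{ratio} training tokens per parameter")
```

Because both base models are reported with the same 4 trillion token budget, the smaller model sees roughly five times as many tokens per parameter as the larger one.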
The project faces unique challenges in handling morphologically rich languages. Hajič notes that "recent advances in technology, such as proper organization, are able to minimize the loss for such languages" [1]. OpenEuroLLM addresses these issues through structured data and multilingual training.
While direct benchmark comparisons are limited, OpenEuroLLM aims to provide comparable quality across all supported languages, differentiating itself from proprietary models like GPT-4 and Google's Gemini. The project's open-source nature and focus on multilingual accessibility set it apart in the competitive AI landscape [1].
OpenEuroLLM is committed to transparency and adherence to EU regulations. The project will publicly release documentation, training and testing code, and evaluation metrics, allowing for fine-tuning and instruction-tuning for specific industry and public sector needs [2].
As the project progresses, it is expected to attract more investors and potentially reshape the AI landscape in Europe. While no specific roadmap for model release has been announced, the initiative represents a significant step towards Europe's digital sovereignty and ethical AI development [2].
The OpenEuroLLM project stands as a testament to Europe's commitment to shaping the future of AI on its own terms, balancing rapid development with ethical standards and linguistic diversity.
© 2025 TheOutpost.AI All rights reserved