2 Sources
[1]
Swiss boffins tease 'fully open' LLM trained on Alps super
Source code and weights coming later this summer with an Apache 2.0 bow on top

Supercomputers are usually associated with scientific exploration, research and development, and ensuring our nuclear stockpiles actually work. Typically, these workloads rely on highly precise calculations, with 64-bit floating-point mathematics being the gold standard. But as support for lower-precision datatypes continues to find its way into the chips used to build these systems, supercomputers are increasingly being used to train AI models.

This is exactly what the boffins at ETH Zürich and the Swiss Federal Institute of Technology in Lausanne (EPFL) have done. At the International Open-Source LLM Builders Summit in Geneva this week, researchers teased a pair of open large language models (LLMs) trained using the nation's Alps supercomputer.

As supercomputers go, Alps is better suited than most for running AI workloads alongside more traditional high-performance computing (HPC) applications. The system is currently the third most powerful supercomputer in Europe, and eighth worldwide, in the twice-yearly Top500 ranking. It's also among the first large-scale supercomputers based around Nvidia's Grace-Hopper Superchips.

Each of these GH200 Superchips features a custom Grace CPU with 72 Arm Neoverse V2 cores, connected via a 900GB/s NVLink-C2C fabric to a 96GB H100 GPU. Those GPUs account for the lion's share of Alps' total compute capacity, with up to 34 teraFLOPS of FP64 vector performance apiece. However, if you're willing to turn down the resolution a bit to, say, FP8, the performance jumps to nearly four petaFLOPS of sparse compute.

Built by HPE's Cray division, Alps features a little over 10,000 of these chips across 2,688 compute blades, stitched together using the OEM's custom Slingshot-11 interconnects. Combined, the system boasts 42 exaFLOPS of sparse FP8 performance, or roughly half that when using the more precise BF16 datatype.

While Nvidia's H100 accelerators have been widely employed for AI training for years now, the overwhelming majority of those Hopper clusters have used Nvidia's eight-GPU HGX form factor rather than its Superchips. That said, Alps isn't the only supercomputer to use them: the Jupiter supercomputer in Germany and the UK's Isambard AI, both of which came online this spring, are also built on Nvidia's GH200 Superchips.

"Training this model is only possible because of our strategic investment in 'Alps', a supercomputer purpose-built for AI," Thomas Schulthess, director of the Swiss National Supercomputing Centre (CSCS) and professor at ETH Zurich, said in a blog post.

The researchers have yet to name the models, but we do know they'll be offered in eight-billion and 70-billion-parameter sizes and have been trained on 15 trillion tokens of data. They're also expected to be fluent in more than 1,000 languages, with roughly 40 percent of the training data in languages other than English.

More importantly, the researchers say, the models will be fully open. Instead of simply releasing the models and weights for the public to scrutinize and tweak, as we've seen with models from Microsoft, Google, Meta, and others, the researchers at ETH Zürich also intend to release the source code used to train the model, and claim that the "training data will be transparent and reproducible."
"By embracing full openness -- unlike commercial models that are developed behind closed doors -- we hope that our approach will drive innovation in Switzerland, across Europe, and through multinational collaborations," EPFL professor Martin Jaggi said in the post. According to Imanol Schlag, a research scientist at the ETH AI Center, this transparency is essential to building high-trust applications and advancing research in AI risks and opportunities." What's more, researchers contend that for most tasks and general knowledge questions, circumventing web crawling protections wasn't necessary, and complying with these opt-outs showed no sign of performance degradation. The LLMs are expected to make their way into public hands later this summer under a highly permissive Apache 2.0 license. ®
[2]
New open-source language model offers multilingual support and public transparency
This summer, EPFL and ETH Zurich will release a large language model (LLM) developed on public infrastructure. Trained on the Alps supercomputer at the Swiss National Supercomputing Center (CSCS), the new LLM marks a milestone in open-source AI and multilingual excellence.

Earlier this week in Geneva, about 50 leading global initiatives and organizations dedicated to open-source LLMs and trustworthy AI convened at the International Open-Source LLM Builders Summit. Hosted by the AI centers of EPFL and ETH Zurich, the event marked a significant step in building a vibrant and collaborative international ecosystem for open foundation models. Open LLMs are increasingly viewed as credible alternatives to commercial systems, most of which are developed behind closed doors in the United States or China.

Participants of the summit previewed the forthcoming release of a fully open, publicly developed LLM, co-created by researchers at EPFL, ETH Zurich, and other Swiss universities in close collaboration with engineers at CSCS. Currently in final testing, the model will be downloadable under an open license, with a focus on transparency, multilingual performance, and broad accessibility.

The model will be fully open: source code and weights will be publicly available, and the training data will be transparent and reproducible, supporting adoption across science, government, education, and the private sector. This approach is designed to foster both innovation and accountability.

"Fully open models enable high-trust applications and are necessary for advancing research about the risks and opportunities of AI. Transparent processes also enable regulatory compliance," says Imanol Schlag, research scientist at the ETH AI Center, who is leading the effort alongside EPFL AI Center faculty members and professors Antoine Bosselut and Martin Jaggi.

Multilingual by design

A defining characteristic of the LLM is its fluency in more than 1,000 languages. "We have emphasized making the models massively multilingual from the start," says Bosselut. Training of the base model was done on a large text dataset spanning more than 1,500 languages -- approximately 60% English and 40% non-English -- as well as code and mathematics data. Because content from so many languages and cultures is represented, the resulting model maintains broad global applicability.

The model will be released in two sizes -- 8 billion and 70 billion parameters -- meeting a broad range of users' needs. The 70B version will rank among the most powerful fully open models worldwide. The number of parameters reflects a model's capacity to learn and generate complex responses. High reliability is achieved through training on more than 15 trillion high-quality tokens (units representing a word or part of a word), enabling robust language understanding and versatile use cases.

The LLM is being developed with due consideration for Swiss data protection law, Swiss copyright law, and the transparency obligations of the EU AI Act. In a recent study posted to the arXiv preprint server, the project leaders demonstrated that, for most everyday tasks and general knowledge acquisition, respecting web-crawling opt-outs during data acquisition produces virtually no performance degradation.

Supercomputer as an enabler of sovereign AI

The model is trained on the Alps supercomputer at CSCS in Lugano, one of the world's most advanced AI platforms, equipped with more than 10,000 NVIDIA Grace Hopper Superchips.
The system's scale and architecture made it possible to train the model efficiently using 100% carbon-neutral electricity. The successful realization of Alps was significantly facilitated by a long-standing collaboration, spanning more than 15 years, with NVIDIA and HPE/Cray. This partnership has been pivotal in shaping the capabilities of Alps, ensuring it meets the demanding requirements of large-scale AI workloads, including the pre-training of complex LLMs.

"Training this model is only possible because of our strategic investment in Alps, a supercomputer purpose-built for AI," says Thomas Schulthess, Director of CSCS and professor at ETH Zurich. "Our enduring collaboration with NVIDIA and HPE exemplifies how joint efforts between public research institutions and industry leaders can drive sovereign infrastructure, fostering open innovation -- not just for Switzerland, but for science and society worldwide."

Public access and global reuse

In late summer, the LLM will be released under the Apache 2.0 License. Accompanying documentation will detail the model architecture, training methods, and usage guidelines to enable transparent reuse and further development.

"As scientists from public institutions, we aim to advance open models and enable organizations to build on them for their own applications," says Bosselut.

"By embracing full openness -- unlike commercial models that are developed behind closed doors -- we hope that our approach will drive innovation in Switzerland, across Europe, and through multinational collaborations. Furthermore, it is a key factor in attracting and nurturing top talent," says EPFL professor Jaggi.
Swiss researchers from ETH Zürich and EPFL are set to release a fully open, multilingual large language model trained on the Alps supercomputer, promising transparency and broad accessibility.
In a significant development for the AI community, researchers from ETH Zürich and the Swiss Federal Institute of Technology in Lausanne (EPFL) have unveiled plans to release a fully open large language model (LLM) trained on Switzerland's Alps supercomputer. The announcement was made at the International Open-Source LLM Builders Summit in Geneva, marking a pivotal moment in the pursuit of transparent and accessible AI technology [1][2].
The forthcoming LLM will be available in two sizes: 8 billion and 70 billion parameters. Trained on an impressive 15 trillion tokens of data, the model is designed to be fluent in over 1,000 languages, with approximately 40% of the training data in languages other than English [1]. This multilingual approach aims to maintain high global applicability and serve a diverse range of users and applications [2].
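A quick back-of-envelope calculation puts those numbers in perspective. The sketch below assumes BF16 weights at 2 bytes per parameter; that precision is our assumption, not a detail from the announcement:

```python
# Rough scale of the announced models, assuming (not stated in the
# announcement) BF16 weights at 2 bytes per parameter.
TRAINING_TOKENS = 15e12   # 15 trillion tokens, per the announcement
BYTES_PER_PARAM = 2       # BF16 assumption

for params in (8e9, 70e9):  # the two announced model sizes
    weight_gb = params * BYTES_PER_PARAM / 1e9
    tokens_per_param = TRAINING_TOKENS / params
    print(f"{params / 1e9:.0f}B params: ~{weight_gb:.0f} GB of raw weights, "
          f"~{tokens_per_param:.0f} training tokens per parameter")

# 8B  -> ~16 GB of weights,  ~1875 tokens per parameter
# 70B -> ~140 GB of weights, ~214 tokens per parameter
```

Both ratios sit well above the roughly 20 tokens per parameter suggested by compute-optimal scaling studies, a common choice when the goal is a model that is cheap to run rather than merely cheap to train.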
What sets this model apart is its commitment to full transparency. Unlike many commercial models developed behind closed doors, the Swiss researchers intend to release not only the model and weights but also the source code used for training. Additionally, they promise that the training data will be transparent and reproducible [1]. This level of openness is expected to foster innovation and accountability in AI development [2].
The model's training was made possible by the Alps supercomputer, currently ranked as the third most powerful in Europe and eighth worldwide. Alps features over 10,000 Nvidia Grace-Hopper Superchips, each combining a 72-core Arm-based CPU with a 96GB H100 GPU. In aggregate, this architecture delivers up to 42 exaFLOPS of sparse FP8 performance, making the system particularly well suited to AI workloads [1].
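Those aggregate figures are straightforward to sanity-check. The sketch below assumes four GH200 Superchips per compute blade and Nvidia's published per-GPU sparse peaks for the H100; both are taken from public spec sheets rather than from the articles above:

```python
# Sanity check of the 42-exaFLOPS sparse FP8 figure quoted for Alps.
# Per-GPU peaks are Nvidia's published H100 numbers with 2:4 sparsity.
FP8_SPARSE_TFLOPS = 3958    # "nearly four petaFLOPS" per chip
BF16_SPARSE_TFLOPS = 1979   # roughly half the FP8 rate

blades = 2688               # blade count from The Register's report
chips = blades * 4          # assumed 4 Superchips per blade: 10,752 total

fp8_exa = chips * FP8_SPARSE_TFLOPS / 1e6    # 1 exaFLOPS = 1e6 teraFLOPS
bf16_exa = chips * BF16_SPARSE_TFLOPS / 1e6
print(f"{chips} chips: ~{fp8_exa:.1f} EFLOPS FP8, ~{bf16_exa:.1f} EFLOPS BF16")

# ~42.6 EFLOPS sparse FP8 and ~21.3 EFLOPS BF16, consistent with the article.
```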
The researchers have emphasized their commitment to ethical AI development. They claim that for most tasks and general knowledge questions, circumventing web-crawling protections wasn't necessary, and that complying with these opt-outs showed no significant performance degradation [1]. The model is being developed with consideration for Swiss data protection laws, copyright laws, and the transparency obligations under the EU AI Act [2].
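The opt-outs in question are the standard web-crawling controls, such as a site's robots.txt file. Here is a minimal sketch of honoring them with Python's standard library; the site URL and crawler name are illustrative placeholders, not details of the project's actual data pipeline:

```python
# Honoring robots.txt opt-outs during data collection: an illustrative
# sketch using the standard library, not the project's actual crawler.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")  # placeholder site
rp.read()  # fetch and parse the site's crawl rules

url = "https://example.org/articles/page.html"
if rp.can_fetch("ExampleResearchBot", url):   # placeholder user agent
    print("allowed: page may enter the training corpus")
else:
    print("opted out: skip this page entirely")
```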
The LLM is scheduled for release later this summer under the Apache 2.0 license, a highly permissive open-source license. This move is expected to support adoption across various sectors, including science, government, education, and private industry [1][2]. Accompanying documentation will provide details on the model architecture, training methods, and usage guidelines to facilitate transparent reuse and further development [2].
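Because the weights will ship under Apache 2.0, anyone should be able to download and run them with standard open-source tooling once they appear. The sketch below uses the Hugging Face transformers API; the repository ID is a placeholder, since the model had not been named at the time of writing:

```python
# Hypothetical usage once the weights are published. The repository ID
# below is a placeholder; the model is still unnamed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "swiss-ai/open-llm-8b"  # placeholder ID, not a real release
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

prompt = "Grüezi! Explain what makes a language model 'fully open'."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The Apache 2.0 license permits commercial use, modification, and redistribution, so downstream fine-tunes and derivative models would be straightforward to publish.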
This initiative represents a significant step towards democratizing AI technology and fostering a collaborative international ecosystem for open foundation models. By making the entire process transparent and accessible, the researchers aim to drive innovation not only in Switzerland but across Europe and through multinational collaborations [2]. This approach could potentially shift the landscape of AI development, currently dominated by closed-source models from major tech companies in the United States and China.
Summarized by Navi