2 Sources
[1]
High-performance computing, with much less code
Caption: A new programming language called "Exo 2" could enable high-performance coding that can compete with state-of-the-art libraries with a few hundred lines of code, instead of tens or hundreds of thousands.

Many companies invest heavily in hiring talent to create the high-performance library code that underpins modern artificial intelligence systems. NVIDIA, for instance, developed some of the most advanced high-performance computing (HPC) libraries, creating a competitive moat that has proven difficult for others to breach.

But what if a couple of students, within a few months, could compete with state-of-the-art HPC libraries with a few hundred lines of code, instead of tens or hundreds of thousands? That's what researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown with a new programming language called Exo 2.

Exo 2 belongs to a new category of programming languages that MIT Professor Jonathan Ragan-Kelley calls "user-schedulable languages" (USLs). Instead of hoping that an opaque compiler will auto-generate the fastest possible code, USLs put programmers in the driver's seat, allowing them to write "schedules" that explicitly control how the compiler generates code. This enables performance engineers to transform simple programs that specify what they want to compute into complex programs that do the same thing as the original specification, but much, much faster.

One of the limitations of existing USLs (like the original Exo) is their relatively fixed set of scheduling operations, which makes it difficult to reuse scheduling code across different "kernels" (the individual components in a high-performance library). In contrast, Exo 2 enables users to define new scheduling operations externally to the compiler, facilitating the creation of reusable scheduling libraries.
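The specification-versus-schedule split at the heart of a USL can be illustrated with a toy example. The sketch below is plain Python/NumPy, not Exo 2 syntax, and the function names and tiling factor are invented for illustration: the "specification" states only what to compute, while the "scheduled" version restructures the loop nest (tiling plus a vectorized micro-kernel) and must produce exactly the same result.

```python
import numpy as np

def matmul_spec(A, B):
    """Specification: *what* to compute (naive triple loop)."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

def matmul_scheduled(A, B, tile=4):
    """Same computation after "scheduling": loops tiled and the
    tile-level work vectorized. Output must match the spec exactly."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                # Vectorized micro-kernel over one tile.
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, k0:k0 + tile] @ B[k0:k0 + tile, j0:j0 + tile]
                )
    return C
```

In a USL, the transformation from the first form to the second is not hand-written but produced by applying scheduling directives to the specification, with the compiler checking that the rewritten program remains equivalent.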
Lead author Yuka Ikarashi, an MIT PhD student in electrical engineering and computer science and CSAIL affiliate, says that Exo 2 can reduce total schedule code by a factor of 100 and deliver performance competitive with state-of-the-art implementations on multiple platforms, including Basic Linear Algebra Subprograms (BLAS) that power many machine learning applications. This makes it an attractive option for HPC engineers focused on optimizing kernels across different operations, data types, and target architectures.

"It's a bottom-up approach to automation, rather than doing an ML/AI search over high-performance code," says Ikarashi. "What that means is that performance engineers and hardware implementers can write their own scheduling library, which is a set of optimization techniques to apply on their hardware to reach the peak performance."

One major advantage of Exo 2 is that it reduces the amount of coding effort needed at any one time by reusing scheduling code across applications and hardware targets. The researchers implemented a scheduling library of roughly 2,000 lines of code in Exo 2, encapsulating reusable optimizations that are linear-algebra-specific and target-specific (AVX512, AVX2, Neon, and Gemmini hardware accelerators). This library consolidates scheduling efforts across more than 80 high-performance kernels of up to a dozen lines of code each, delivering performance comparable to, or better than, MKL, OpenBLAS, BLIS, and Halide.

Exo 2 includes a novel mechanism called "Cursors" that provides what the researchers call a "stable reference" for pointing at the object code throughout the scheduling process. Ikarashi says that a stable reference is essential for users to encapsulate schedules within a library function, as it renders the scheduling code independent of object-code transformations.

"We believe that USLs should be designed to be user-extensible, rather than having a fixed set of operations," says Ikarashi.
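Exo 2's actual Cursors and scheduling primitives are considerably richer, but the core idea — a stable reference into the object code that survives rewrites, so that scheduling operations can be packaged into reusable library functions — can be sketched in a few lines of plain Python. All names below (`Loop`, `split`, `tile`) are invented for illustration and are not the Exo 2 API.

```python
from dataclasses import dataclass, replace

# Toy loop-nest IR: each loop carries a stable name, so a "cursor"
# (here, simply the name) keeps resolving to the right loop even
# after scheduling rewrites restructure the nest.

@dataclass(frozen=True)
class Loop:
    name: str
    extent: int
    body: tuple = ()  # nested Loops, or leaf statement strings

def find(loop, cursor):
    """Resolve a cursor (a stable loop name) to the loop it points at."""
    if loop.name == cursor:
        return loop
    for child in loop.body:
        if isinstance(child, Loop):
            hit = find(child, cursor)
            if hit is not None:
                return hit
    return None

def split(loop, cursor, factor):
    """Scheduling op: split the named loop into an outer/inner pair.
    Returns a rewritten nest; the new names are themselves stable cursors."""
    if loop.name == cursor:
        assert loop.extent % factor == 0
        inner = Loop(cursor + "_in", factor, loop.body)
        return Loop(cursor + "_out", loop.extent // factor, (inner,))
    new_body = tuple(
        split(c, cursor, factor) if isinstance(c, Loop) else c
        for c in loop.body
    )
    return replace(loop, body=new_body)

# A reusable "scheduling library" entry built from the primitive:
# tile two loops at once, on any kernel whose loops carry these names.
def tile(nest, i, j, factor):
    return split(split(nest, i, factor), j, factor)

nest = Loop("i", 8, (Loop("j", 8, ("C[i,j] += A[i,k]*B[k,j]",)),))
tiled = tile(nest, "i", "j", 4)
```

Because `tile` addresses loops by stable name rather than by position in the nest, the same library function applies unchanged across many kernels — the property the article attributes to Cursors.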
"In this way, a language can grow to support large projects through the implementation of libraries that accommodate diverse optimization requirements and application domains."

Exo 2's design allows performance engineers to focus on high-level optimization strategies while ensuring, through safe primitives, that the underlying object code remains functionally equivalent.

In the future, the team hopes to expand Exo 2's support for different types of hardware accelerators, such as GPUs. Several ongoing projects aim to improve the compiler analysis itself in terms of correctness, compilation time, and expressivity.

Ikarashi and Ragan-Kelley co-authored the paper with graduate students Kevin Qian and Samir Droubi, Alex Reinking of Adobe, and former CSAIL postdoc Gilbert Bernstein, now a professor at the University of Washington. This research was funded, in part, by the U.S. Defense Advanced Research Projects Agency (DARPA) and the U.S. National Science Foundation, while the first author was also supported by the Masason, Funai, and Quad Fellowships.
[2]
Exo 2: A new programming language for high-performance computing, with much less code
by Adam Conner-Simons, Massachusetts Institute of Technology

The study is published on the arXiv preprint server.
MIT's CSAIL team introduces Exo 2, a new programming language that enables high-performance computing with significantly less code, potentially revolutionizing AI and machine learning development.
Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a groundbreaking programming language called Exo 2, which promises to revolutionize high-performance computing (HPC) and potentially disrupt the competitive landscape in artificial intelligence development [1].
Exo 2 belongs to a new category of programming languages termed "user-schedulable languages" (USLs) by MIT Professor Jonathan Ragan-Kelley. Unlike traditional compilers that automatically generate code, USLs empower programmers to write "schedules" that explicitly control the compiler's code generation process [2].
Lead author Yuka Ikarashi, an MIT Ph.D. student, highlights that Exo 2 can reduce total schedule code by a factor of 100 while delivering performance competitive with state-of-the-art implementations. This efficiency is achieved across multiple platforms, including Basic Linear Algebra Subprograms (BLAS) that power many machine learning applications [1].
One of Exo 2's key innovations is its ability to enable users to define new scheduling operations externally to the compiler. This feature facilitates the creation of reusable scheduling libraries, addressing a significant limitation of existing USLs [2].
Exo 2 introduces a novel mechanism called "Cursors," which provides a stable reference for pointing at the object code throughout the scheduling process. This innovation is crucial for encapsulating schedules within library functions, making the scheduling code independent of object-code transformations [1].
The researchers implemented a scheduling library with approximately 2,000 lines of code in Exo 2, encapsulating reusable optimizations for various hardware targets. This library consolidates scheduling efforts across more than 80 high-performance kernels, each requiring only up to a dozen lines of code [2].
The development of Exo 2 could potentially disrupt the competitive landscape in AI development. Currently, companies like NVIDIA invest heavily in creating advanced HPC libraries, which have been difficult for others to match. Exo 2's ability to compete with state-of-the-art HPC libraries using significantly less code could level the playing field [1].
The CSAIL team aims to expand Exo 2's support for different types of hardware accelerators, including GPUs. Ongoing projects are focused on improving compiler analysis in terms of correctness, compilation time, and expressivity [2].
This research, funded in part by DARPA and the National Science Foundation, represents a significant step forward in high-performance computing and could have far-reaching implications for the development of AI systems and other computationally intensive applications.
Summarized by Navi