2 Sources
[1]
Atlas Cloud optimizes AI inference service to boost GPU throughput - SiliconANGLE
Cloud infrastructure startup Atlas Cloud today launched a highly optimized artificial intelligence inference service that it says dramatically reduces the computational requirements of even the most demanding AI workloads. The new service, called Atlas Inference, is designed to give companies a more cost-effective and simpler environment in which to deploy and run their large language models.

Atlas Cloud is the creator of a cloud-based infrastructure platform geared especially toward AI workloads. It provides low-cost, on-demand access to clusters of up to 5,000 graphics processing units for both AI training and inference workloads. Customers can choose from a selection of GPU types, and the platform is serverless, so they don't have to configure their clusters or carry out maintenance work.

The new Atlas Inference service is based on the open-source SGLang inference engine. The company says it maximizes GPU efficiency by processing more tokens with fewer computational resources, and claims it can deliver 2.1 times greater throughput for AI workloads compared with equivalent AI inference services offered by the likes of Amazon Web Services Inc. and Nvidia Corp.

When running heavyweight, tensor-parallel AI systems, Atlas Inference can deliver equal or superior throughput while using 50% fewer GPUs. It features real-time load balancing that evenly distributes tokens across nodes and reduces latency spikes on overloaded ones, which the company says ensures stable performance under load. In tests, the company says, the service maintained sub-five-second first-token latency and 100-millisecond inter-token latency across more than 10,000 concurrent sessions.

The company adds that a 12-node Atlas Inference cluster outperformed DeepSeek Ltd.'s reference implementation for the DeepSeek V3 model while using only two-thirds as many servers. At the same time, operational expenses were reduced by 80%.

Atlas Cloud says this was made possible by four separate innovations. They include a "prefill/decode disaggregation" technique that separates compute-intensive operations from memory-bound processes to boost efficiency, and "DeepExpert Parallelism," which uses load balancing to increase GPU utilization across the entire cluster. The others are Atlas Cloud's proprietary two-batch overlap technology, which boosts throughput by enabling larger token batches, and "DisposableTensor memory models," which help prevent system crashes.

Another advantage of Atlas Inference is its linear scaling behavior across nodes, which automates the expansion and contraction of GPU clusters in real time to help optimize infrastructure costs.

Atlas Cloud Chief Executive Jerry Tang said the company wants to change the economics of AI deployment to make it more profitable for enterprises. Many companies can barely break even at the moment, he explained, while others run AI applications and services at a loss because of sky-high computational costs.

"Our platform's ability to process 54,500 input tokens and 22,500 output tokens per second per node means businesses can finally make high-volume LLM services profitable," Tang said. "I believe this will have a significant ripple effect throughout the industry. We're surpassing industry standards set by hyperscalers by delivering superior throughput with fewer resources."
The startup says Atlas Inference is compatible with any type of GPU hardware and supports any kind of AI model. It's available starting today via the company's cloud-based servers, and can also be run on customers' on-premises servers.
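The "prefill/decode disaggregation" technique described above rests on a general property of LLM serving: the prompt (prefill) phase is compute-bound, while token generation (decode) is memory-bandwidth-bound, so the two phases benefit from separately sized worker pools. The sketch below is a minimal conceptual illustration of that split, not Atlas Cloud's implementation; all names and pool sizes are invented for the example.

```python
# Conceptual sketch of prefill/decode disaggregation (illustrative only).
# Prefill workers do the compute-bound pass over the full prompt; decode
# workers do the memory-bound token-by-token generation. A queue hands
# requests from one pool to the other, and each pool is sized independently.
import queue
import threading

prefill_queue: "queue.Queue[dict]" = queue.Queue()
decode_queue: "queue.Queue[dict]" = queue.Queue()

def prefill_worker() -> None:
    """Compute-bound phase: build the KV cache for the whole prompt."""
    while True:
        request = prefill_queue.get()
        request["kv_cache"] = f"kv({request['prompt']})"  # stand-in for a forward pass
        decode_queue.put(request)  # hand off to the memory-bound pool

def decode_worker() -> None:
    """Memory-bound phase: generate tokens one step at a time."""
    while True:
        request = decode_queue.get()
        tokens = [f"tok{i}" for i in range(request["max_new_tokens"])]
        request["on_done"](tokens)

# Different pool sizes reflect the different bottlenecks of each phase.
for _ in range(2):
    threading.Thread(target=prefill_worker, daemon=True).start()
for _ in range(6):
    threading.Thread(target=decode_worker, daemon=True).start()

done = threading.Event()
prefill_queue.put({"prompt": "Hello", "max_new_tokens": 4,
                   "on_done": lambda toks: (print(toks), done.set())})
done.wait()
```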
[2]
Atlas Cloud Launches High-Efficiency AI Inference Platform, Outperforming DeepSeek
Developed with SGLang, Atlas Inference surpasses leading AI companies in throughput and cost, running DeepSeek V3 and R1 faster than DeepSeek itself.

NEW YORK, May 28, 2025 (Newswire.com) - Atlas Cloud, the all-in-one AI competency center for training and deploying AI models, today announced the launch of Atlas Inference, an AI inference platform that dramatically reduces GPU and server requirements, enabling faster, more cost-effective deployment of large language models (LLMs).

Atlas Inference, co-developed with SGLang, an AI inference engine, maximizes GPU efficiency by processing more tokens faster and with less hardware. Measured against DeepSeek's published performance results, Atlas Inference's 12-node H100 cluster outperformed DeepSeek's reference implementation of their DeepSeek-V3 model while using two-thirds of the servers. Atlas' platform reduces infrastructure requirements and operational costs while addressing hardware costs, which represent up to 80% of AI operational expenses.

"We built Atlas Inference to fundamentally break down the economics of AI deployment," said Jerry Tang, Atlas CEO. "Our platform's ability to process 54,500 input tokens and 22,500 output tokens per second per node means businesses can finally make high-volume LLM services profitable instead of merely break-even. I believe this will have a significant ripple effect throughout the industry. Simply put, we're surpassing industry standards set by hyperscalers by delivering superior throughput with fewer resources."

Atlas Inference's performance also exceeds major players like Amazon, NVIDIA and Microsoft, delivering up to 2.1 times greater throughput using 12 nodes compared to competitors' larger setups. It maintains sub-5-second first-token latency and 100-millisecond inter-token latency with more than 10,000 concurrent sessions, ensuring a scaled, superior experience.

The platform's performance is driven by four key innovations:

- Prefill/decode disaggregation, which separates compute-intensive prompt processing from memory-bound token generation
- DeepExpert Parallelism, which uses load balancing to increase GPU utilization across the entire cluster
- Proprietary two-batch overlap technology, which boosts throughput by enabling larger token batches
- DisposableTensor memory models, which help prevent system crashes

"This platform represents a significant leap forward for AI inference," said Yineng Zhang, Core Developer at SGLang. "What we built here may become the new standard for GPU utilization and latency management. We believe this will unlock capabilities previously out of reach for the majority of the industry regarding throughput and efficiency."

Combined with a lower cost per token, linear scaling behavior, and reduced emissions compared with leading vendors, Atlas Inference provides cost-efficient, scalable AI deployment. It works with standard hardware and supports custom models, giving customers complete flexibility. Teams can upload fine-tuned models and keep them isolated on dedicated GPUs, making the platform ideal for organizations requiring brand-specific voice or domain expertise. The platform is available immediately for enterprise customers and early-stage startups.

About Atlas Cloud

Atlas Cloud is your all-in-one AI competency center, powering leading AI teams with safe, simple, and scalable infrastructure for training and deploying models. Atlas Cloud also offers an on-demand GPU platform that delivers fast, serverless compute. Backed by Dell, HPE, and Supermicro, Atlas delivers near-instant access to up to 5,000 GPUs across a global SuperCloud fabric with 99% uptime and baked-in compliance. Learn more at atlascloud.ai.
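Both announcements describe the load balancing behind DeepExpert Parallelism only as evenly distributing tokens to avoid latency spikes on overloaded nodes. As a rough illustration of what token-level balancing means in general (a hypothetical toy, not Atlas Cloud's or SGLang's actual scheduler), a router can track outstanding token work per node and always send the next request to the least-loaded node:

```python
# Toy sketch of token-aware load balancing (illustrative only): route each
# request to the node with the least outstanding token work, so no single
# node accumulates the long queue that causes latency spikes.
import heapq

class TokenBalancer:
    def __init__(self, num_nodes: int) -> None:
        # Min-heap of (outstanding_tokens, node_id) pairs.
        self.heap = [(0, node) for node in range(num_nodes)]
        heapq.heapify(self.heap)

    def route(self, request_tokens: int) -> int:
        """Pick the least-loaded node and charge it this request's tokens."""
        load, node = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + request_tokens, node))
        return node

balancer = TokenBalancer(num_nodes=12)
for tokens in (1200, 300, 4500, 800, 2200):
    print(f"{tokens:>5} tokens -> node {balancer.route(tokens)}")
```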
Atlas Cloud introduces Atlas Inference, a highly optimized AI inference service that significantly boosts GPU throughput and reduces computational requirements for AI workloads.
Atlas Cloud, a cloud infrastructure startup specializing in AI workloads, has launched a highly optimized artificial intelligence inference service called Atlas Inference. This new offering promises to dramatically reduce the computational requirements of even the most demanding AI workloads, potentially revolutionizing the economics of AI deployment [1].
Atlas Inference, co-developed with SGLang, an AI inference engine, claims to deliver 2.1 times greater throughput for AI workloads compared to equivalent services offered by industry giants such as Amazon Web Services and Nvidia [1]. The platform's ability to process 54,500 input tokens and 22,500 output tokens per second per node significantly outperforms current industry standards [2].
In a notable achievement, Atlas Inference's 12-node cluster outperformed DeepSeek Ltd.'s reference implementation of the DeepSeek V3 model while using only two-thirds as many servers. This feat was accompanied by an 80% reduction in operational expenses [1].
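These headline figures can be put in context with simple arithmetic. The sketch below assumes the vendor's claimed linear scaling holds across all 12 nodes and that the 10,000-session figure is cluster-wide; both are readings of the announcement, not additional published numbers.

```python
# Back-of-the-envelope check of the published per-node figures.
INPUT_TOKS_PER_NODE = 54_500   # input tokens/s per node (vendor figure)
OUTPUT_TOKS_PER_NODE = 22_500  # output tokens/s per node (vendor figure)
NODES = 12                     # size of the benchmarked cluster
SESSIONS = 10_000              # concurrent sessions (assumed cluster-wide)

cluster_in = INPUT_TOKS_PER_NODE * NODES    # 654,000 input tokens/s
cluster_out = OUTPUT_TOKS_PER_NODE * NODES  # 270,000 output tokens/s
per_session_out = cluster_out / SESSIONS    # 27 output tokens/s per session

print(f"cluster input : {cluster_in:,} tokens/s")
print(f"cluster output: {cluster_out:,} tokens/s")
# 27 tokens/s per stream sits comfortably above the 10 tokens/s floor
# implied by the quoted 100 ms inter-token latency.
print(f"per-session output: {per_session_out:.0f} tokens/s")
```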
The exceptional performance of Atlas Inference is attributed to four key innovations [1][2]:

- Prefill/decode disaggregation, which separates compute-intensive prompt processing from memory-bound token generation
- DeepExpert Parallelism, which uses load balancing to increase GPU utilization across the entire cluster
- Proprietary two-batch overlap technology, which boosts throughput by enabling larger token batches
- DisposableTensor memory models, which help prevent system crashes
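Of these, two-batch overlap lends itself to a small illustration. The general pattern splits a batch into two micro-batches so that while one is in a communication phase (for example, exchanging tokens across nodes), the other keeps the GPU busy with compute. The sketch below shows that overlap pattern generically; Atlas Cloud's proprietary implementation is not public, and the timings here are stand-ins.

```python
# Generic sketch of two-batch overlap (not Atlas Cloud's implementation):
# micro-batch A computes while micro-batch B communicates, hiding B's
# network time behind A's compute.
import time
from concurrent.futures import ThreadPoolExecutor

def compute(micro_batch: str) -> str:
    time.sleep(0.10)  # stand-in for a GPU forward pass
    return f"{micro_batch}:computed"

def communicate(micro_batch: str) -> str:
    time.sleep(0.10)  # stand-in for cross-node token exchange
    return f"{micro_batch}:exchanged"

def overlapped_step(a: str, b: str) -> None:
    with ThreadPoolExecutor(max_workers=2) as pool:
        a_done = pool.submit(compute, a)      # A computes...
        b_done = pool.submit(communicate, b)  # ...while B communicates
        print(a_done.result(), b_done.result())

start = time.perf_counter()
overlapped_step("batch-A", "batch-B")
print(f"elapsed ~{time.perf_counter() - start:.2f}s (vs ~0.20s if serialized)")
```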
Atlas Inference also boasts linear scaling behavior across nodes, which automates the expansion and contraction of GPU clusters in real time. This automation helps optimize infrastructure costs for businesses deploying AI models [1].
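Linear scaling makes capacity planning essentially arithmetic: the right cluster size is demand divided by per-node throughput. The snippet below sketches a hypothetical autoscaling policy built on that property; the node-count formula, headroom parameter, and bounds are illustrative assumptions, not Atlas Cloud's published policy.

```python
# Hypothetical autoscaling policy assuming linear scaling (illustrative only).
import math

OUTPUT_TOKS_PER_NODE = 22_500  # vendor's per-node output throughput
MIN_NODES, MAX_NODES = 1, 12   # assumed cluster bounds for the example

def nodes_needed(demand_toks_per_s: float, headroom: float = 0.2) -> int:
    """Node count for the current demand plus a 20% safety margin."""
    raw = demand_toks_per_s * (1 + headroom) / OUTPUT_TOKS_PER_NODE
    return max(MIN_NODES, min(MAX_NODES, math.ceil(raw)))

for demand in (5_000, 60_000, 180_000):
    print(f"{demand:>7,} tokens/s -> {nodes_needed(demand)} nodes")
```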
Jerry Tang, CEO of Atlas Cloud, emphasized the platform's potential to change the economics of AI deployment: "Our platform's ability to process 54,500 input tokens and 22,500 output tokens per second per node means businesses can finally make high-volume LLM services profitable" [2].
Atlas Inference is designed to work with standard hardware and supports custom models, offering customers complete flexibility. Organizations can upload fine-tuned models and keep them isolated on dedicated GPUs, making the platform ideal for those requiring brand-specific voice or domain expertise [2].
Yineng Zhang, Core Developer at SGLang, believes Atlas Inference represents a significant leap forward for AI inference: "What we built here may become the new standard for GPU utilization and latency management. We believe this will unlock capabilities previously out of reach for the majority of the industry regarding throughput and efficiency" [2].
The launch of Atlas Inference could have far-reaching implications for the AI industry, potentially enabling more businesses to profitably deploy and run large language models. As AI continues to play an increasingly crucial role in various sectors, innovations like Atlas Inference may accelerate the adoption and implementation of AI technologies across industries.