2 Sources
[1]
Fastino launches with $7M to release high-performance task-optimized AI models that run on CPUs - SiliconANGLE
Fastino, a new artificial intelligence foundation model developer, launched today to provide a family of task-optimized language models designed to maintain high performance and accuracy without the need to run on high-end graphics processing units. The company also announced it raised $7 million in a pre-seed funding round led by Insight Partners and Microsoft Corp.'s M12 venture arm, with participation from NEA, Valor, GitHub Inc. Chief Executive Thomas Dohmke and others.

"Fastino aims to bring the world more performant AI with task-specific capabilities," said Ash Lewis, co-founder and chief executive of Fastino. "Whereas traditional LLMs often require thousands of GPUs, making them costly and resource-intensive, our unique architecture requires only central processing units or neural processing units. This approach enhances accuracy and speed while lowering energy consumption compared to other large language models."

The company said that its models are developed on a fit-for-purpose architecture for critical enterprise use cases and optimized for specific tasks, making them performant enough that they do not need to rely on heavyweight high-end GPUs. These use cases include structuring textual data, text summarization and task planning.

"This task-level approach allows us to focus on delivering exceptional performance for distinct use cases relative to generalized models," Lewis told SiliconANGLE. "We achieve this by making architectural adjustments tailored to each task, which enables models that are not only highly performant but also faster and smaller than traditional generalized LLMs."

According to Fastino, the company's novel AI architecture can operate up to 1,000 times faster than traditional LLMs, allowing for flexible deployment across CPUs. Task optimization also allows for distributed AI systems, which are less vulnerable to adversarial attacks and privacy breaches. A limiting challenge many enterprises face when deploying LLMs is the significant energy usage of hundreds or thousands of GPUs; a model that needs only CPUs or NPUs for task-optimized use cases would greatly reduce that energy demand.

The difference between a task-optimized language model and an LLM is that traditional LLMs are generalized and not optimized for any particular capability. An LLM would be equally capable of question answering, text generation, summarization, task planning, document analysis and more, making it a very large, complex piece of software that requires a significant amount of computation. Task-specific optimization makes a particular language model very good at particular tasks, allowing it to be highly performant, accurate and fast for those activities.
[2]
Microsoft-backed startup debuts task optimized enterprise AI models that run on CPUs
A new enterprise AI-focused startup is emerging from stealth today with the promise of providing what it calls "task-optimized" models that deliver better performance at lower cost. Fastino, based in San Francisco, is also revealing that it has raised $7 million in a pre-seed funding round from Insight Partners and M12, Microsoft's venture fund, with participation from GitHub CEO Thomas Dohmke.

Fastino is building its own family of enterprise AI models as well as developer tooling. The models are new and are not based on any existing large language models (LLMs). Like most generative AI vendors, Fastino uses a transformer architecture, though with some innovative techniques designed to improve accuracy and enterprise utility. Unlike most other LLM providers, Fastino's models run well on general-purpose CPUs and do not require high-cost GPUs.

The idea for Fastino was born out of the founders' own experiences in the industry and real-world challenges in deploying AI at scale. Ash Lewis, CEO and co-founder of the company, had been building a developer agent technology known as DevGPT. His co-founder, George Hurn-Maloney, was previously the founder of Waterway DevOps, which was acquired by JFrog in 2023. Lewis explained that his prior company's developer agent was using OpenAI in the background, which led to some issues. "We were spending close to a million dollars a year on the API," Lewis said. "We didn't feel like we had any real control over that."

Fastino's approach represents a departure from traditional large language models. Rather than creating general-purpose AI models, the company has developed task-optimized models that excel at specific enterprise functions. "The whole idea is that if you narrow the scope of these models, make them less generalist so that they're more optimized for your task, they can only respond within scope," Lewis explained.

How the task-optimized model approach could bring more efficiency to enterprise AI

The concept of using a smaller model optimized for a specific use case isn't an entirely new idea. Small language models (SLMs), such as Microsoft's Phi-2, and vendors like Arcee AI have been advocating the approach for a while. Hurn-Maloney said that Fastino is calling its models task-optimized rather than SLMs for a number of reasons. For one, in his view, the term "small" has often carried the connotation of being less accurate, which is not the case for Fastino. Lewis said that the goal is to create a new model category that is neither a large nor a small generalist model by parameter count, but one that is less broad in scope and more specialized for specific enterprise tasks. By focusing on specific tasks, Fastino claims that its models achieve higher accuracy and reliability compared to generalist language models, particularly for tasks such as structuring textual data, text summarization and task planning.

Optimized models mean no GPU is required, lowering enterprise AI costs

A key differentiator for the Fastino models is that they can run on CPUs and do not require GPU AI accelerator technology. Fastino enables fast inference on CPUs using a number of different techniques. "If we're just talking absolutely simple terms, you just need to do less multiplication," Lewis said. "A lot of our techniques in the architecture just focus on doing less tasks that require matrix multiplication." He added that the models deliver responses in milliseconds rather than seconds. This efficiency extends to edge devices, with successful deployments demonstrated on hardware as modest as a Raspberry Pi.

"I think a lot of enterprises are looking at TCO [total cost of ownership] for embedding AI in their application," Hurn-Maloney added. "So the ability to remove expensive GPUs from the equation, I think, is obviously helpful, too."

Fastino's models are not yet generally available. That said, the company is already working with industry leaders in consumer devices, financial services and e-commerce, including a major North American device manufacturer for home and automotive applications. "Our ability to run on-prem is really good for industries that are pretty sensitive about their data," Hurn-Maloney explained. "The ability to run these models on-prem and on existing CPUs is quite enticing to financial services, healthcare and more data-sensitive industries."
Fastino, a new AI startup, emerges with $7 million in funding to develop task-optimized AI models that run efficiently on CPUs, promising high performance and lower costs for enterprises.
Fastino, a San Francisco-based artificial intelligence startup, has launched with a $7 million pre-seed funding round led by Insight Partners and Microsoft's M12 venture arm [1][2]. The company aims to revolutionize enterprise AI by developing task-optimized language models that can run efficiently on central processing units (CPUs) without the need for expensive graphics processing units (GPUs).
Fastino's approach diverges from traditional large language models (LLMs) by focusing on task-specific optimization. According to CEO and co-founder Ash Lewis, "Whereas traditional LLMs often require thousands of GPUs, making them costly and resource-intensive, our unique architecture requires only central processing units or neural processing units" [1]. Narrowing each model's scope in this way is what allows higher accuracy and speed at lower energy consumption, and it confines a model's responses to its assigned task [1][2].
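Neither source explains how that scope restriction is enforced. One plausible reading is that a task-optimized model exposes a closed output space instead of open-ended generation; the sketch below is purely illustrative of that idea, with invented labels, dimensions, and weights rather than anything from Fastino's interface:

```python
import numpy as np

# Hypothetical sketch: a task-specific head scores a fixed set of labels,
# so the model can literally "only respond within scope"; it has no way
# to emit text outside the closed label set.

LABELS = ["invoice", "contract", "support_ticket", "other"]  # invented task

rng = np.random.default_rng(0)
W = rng.standard_normal((len(LABELS), 64))  # stand-in for trained head weights

def classify(embedding: np.ndarray) -> str:
    """Map a document embedding to one of the allowed labels."""
    logits = W @ embedding
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return LABELS[int(np.argmax(probs))]

doc_embedding = rng.standard_normal(64)  # stand-in for an encoder's output
print(classify(doc_embedding))  # always one of LABELS, never free-form text
```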
The company's models excel in specific enterprise functions such as structuring textual data, text summarization, and task planning [1][2].
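To make the first of those tasks concrete, "structuring textual data" means turning free text into a machine-readable record. The toy extractor below only illustrates the input/output contract of such a task; the regex stub stands in for a task-optimized model, since Fastino's actual interface is not public:

```python
import json
import re

# Toy illustration of the "structuring textual data" task: free text in,
# structured record out. The regexes below are a stand-in for a model.

def structure_invoice(text: str) -> dict:
    """Pull an invoice number and total out of free-form text."""
    number = re.search(r"invoice\s+#?(\w+)", text, re.IGNORECASE)
    total = re.search(r"total[:\s]+\$?([\d,.]+)", text, re.IGNORECASE)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }

print(json.dumps(structure_invoice(
    "Please pay Invoice #A1234 by Friday. Total: $1,250.00"
)))
# {"invoice_number": "A1234", "total": 1250.0}
```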
Fastino asserts that its novel AI architecture can operate up to 1,000 times faster than traditional LLMs [1]. The models are designed to deliver responses in milliseconds rather than seconds, with successful deployments demonstrated on hardware as modest as a Raspberry Pi [2].
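Those latency figures are the company's claims, but the arithmetic behind a millisecond budget is easy to check yourself. Here is a rough, hardware-dependent sketch of what one dense layer costs on a CPU, with all sizes invented for illustration:

```python
import time
import numpy as np

# Illustrative only: time a single 1024x1024 dense layer on CPU to see how
# millisecond-scale budgets constrain model size. Real latency depends on
# architecture, quantization, and hardware.

rng = np.random.default_rng(0)
d_in, d_out = 1024, 1024
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
x = rng.standard_normal(d_in).astype(np.float32)

for _ in range(10):        # warm up BLAS before timing
    W @ x

n = 1000
start = time.perf_counter()
for _ in range(n):
    W @ x
elapsed_ms = (time.perf_counter() - start) / n * 1e3
print(f"~{elapsed_ms:.3f} ms per 1024x1024 matvec on this CPU")
```

On a modern CPU core this typically prints a fraction of a millisecond per matrix-vector product, which is why small, task-sized layers can meet a milliseconds-per-response budget while multi-billion-parameter generalist models cannot.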
While the exact details of Fastino's technology remain proprietary, the company has hinted at some of its techniques: its models use a transformer architecture with task-specific adjustments that reduce the amount of matrix multiplication required [1][2].
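Since the architecture is unpublished, any concrete example is speculative. One standard, generic way to "do less matrix multiplication" is to replace a dense d x d weight matrix with a low-rank pair, cutting the multiplies per matrix-vector product from d*d to 2*d*r. The sketch below demonstrates that trade-off on an invented, nearly low-rank matrix; it is not Fastino's method:

```python
import numpy as np

# Generic illustration (not Fastino's method): replacing a dense weight
# matrix W (d x d) with a low-rank pair A (d x r) @ B (r x d) cuts the
# multiply count per matrix-vector product from d*d to 2*d*r.

d, r = 1024, 64
rng = np.random.default_rng(0)

# Invent a nearly low-rank matrix; trained weights are often compressible.
W = (rng.standard_normal((d, r)) @ rng.standard_normal((r, d))) / np.sqrt(r)
W += 0.01 * rng.standard_normal((d, d))

# Best rank-r approximation of W via truncated SVD.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]   # d x r
B = Vt[:r, :]          # r x d

x = rng.standard_normal(d)
y_full = W @ x         # ~1.05M multiplies
y_low = A @ (B @ x)    # ~131K multiplies, an 8x reduction

print("multiplies:", d * d, "vs", 2 * d * r)
print("relative error:", np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))
```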
Fastino is positioning itself to address key challenges in enterprise AI adoption, particularly for industries sensitive about data privacy and cost-efficiency. The ability to run models on-premises using existing CPU infrastructure is especially appealing to financial services, healthcare, and other data-sensitive sectors [2].
The company is already working with industry leaders, including a major North American device manufacturer for home and automotive applications [2].
Fastino's approach could significantly reduce the total cost of ownership for embedding AI in enterprise applications. By eliminating the need for expensive GPUs and lowering energy consumption, the company addresses two major barriers to widespread AI adoption in business settings [1][2].
As the AI industry continues to evolve, Fastino's task-optimized models represent a potential shift in how enterprises approach AI implementation, balancing performance with resource efficiency and cost-effectiveness.