2 Sources
[1]
TwelveLabs raises $100M to bring superintelligence to AI video models
TwelveLabs Inc., the developer of generative artificial intelligence foundation models that can understand videos like humans, today announced it has raised $100 million in early funding to expand beyond simple understanding to achieve holistic intelligence. The Series B round was co-led by NEA and NAVER Ventures. Amazon, Radical Ventures, Korea Investment Partners, Index Ventures, Quadrille Capital and Red Bull Ventures also participated in the round. Today's capital infusion brings the company's total raised to over $207 million. "Five years ago, we made a contrarian bet: the substrate of machine intelligence is recorded reality in motion, not language," said Chief Executive and co-founder Jae Lee. "Language is downstream of understanding. Video is the data understanding has to answer to." TwelveLabs is bringing genuine AI power to the field with its technology, having built frontier foundation models capable of video understanding. The company worked from the ground up to construct multimodal models that weren't simply large language models that process video, but models that natively understand video. The company's flagship products include the Marengo model family, with the 3.0 version released late last year, and Pegasus 1.5. Marengo enables real-world AI embedding for videos, audio, text and composition. This means that it can parse numerous types of content and add it to machine-readable data structures, such as vector databases, so that AI models can understand and search the information at scale. Pegasus works alongside the previous model to turn video into structured data. It understands scene boundaries, entities, time segments and what's happening, enabling LLMs to reason across visual information. It operates similarly to how large language models process large documents and images, summarizing them in markup languages to make them easier to understand.
Today's LLMs can't consume video all at once. They need to segment it up into flashes - a series of screenshots - and then use that to reason across. TwelveLabs said it built a reasoning capacity that natively understands trends over time by maintaining a memory that persists between queries, rather than evaporating after each one. An intelligence that compounds with each video, allowing the model to become more knowledgeable. The company said that, using these models, it intends to build a new paradigm of video perception, creating a system that allows machines to analyze, search and operationalize footage. Use cases for this type of technology span numerous workflows across industries such as security, advertising, sports and automotive, where tremendous amounts of information reside in video. As part of the funding, TwelveLabs is deepening its relationship with Amazon Web Services. The company's service has been available on AWS Marketplace since at least 2025, alongside managed access to foundation models via Amazon Bedrock. AWS is already TwelveLabs' preferred cloud provider, and with today's investment, the company has signed a multi-year commitment to optimize its video inference workloads for AWS Trainium chips. In addition, new frontier models will launch on AWS first.
[2]
Twelve Labs Raises $100 Million to Fund Bet on Video AI | PYMNTS.com
Now, the artificial intelligence (AI) startup has $100 million in new funding to explore that idea, according to an announcement on its blog Wednesday (July 1). "Five years ago, we began with a simple observation: The world does not happen in text. It happens in motion," Co-Founder and CEO Jae Lee wrote. In an interview with Bloomberg News, Lee expanded on that idea, saying video "is the most similar signal data that we receive as humans to learn about the world." "That's different from even the latest frontier models such as Fable 5 and Mythos which are still language models," he added. While the last decade of AI "made text programmable," the blog post continued, video has yet to enjoy a similar moment. "The world's video is still mostly dark matter to machines," Lee said, noting that it sits in places like "archives ... drones, and satellites," mostly still accessed "through filenames, folders, captions, transcripts, and human memory." He added, "The richest record of reality is still largely outside the semantic layer that modern AI systems use. We are changing that. Our goal is to make every second of video addressable, searchable, and usable by agents." Twelve Labs' Series B round was led by NEA and NAVER Ventures, with participation from Amazon, alongside Radical Ventures, Korea Investment Partners, Index Ventures, Quadrille Capital and Red Bull Ventures. PYMNTS wrote earlier this year that video generators were part of a new generation of consumer software categories forming around AI. "From AI companions and conversational search to prompt-based coding tools and video generators, products that barely appeared on app roadmaps two years ago are now attracting millions of users and building their own subscription economies," that report said. The report cites the example of AI companion app Character.AI, video editor CapCut, and Canva's Magic Suite of AI products. "These are not startups. 
They are established products that have been effectively rebuilt around AI capabilities. But running alongside them in the rankings is a separate tier: tools that could not exist without generative AI as their foundation," the report added. "AI-native search products like Perplexity. Video generation platforms. Coding assistants. Companion apps. Each represents a category that barely registered on product roadmaps in 2023 and now commands its own user communities, retention dynamics and, in many cases, subscription revenue."
TwelveLabs secured $100 million in Series B funding to advance AI video models that natively understand video like humans. Led by NEA and NAVER Ventures with Amazon participating, the round brings total funding to over $207 million as the company aims to make every second of video addressable and usable by AI agents.
TwelveLabs has raised $100 million in Series B funding to expand its generative AI foundation models beyond simple video understanding toward holistic intelligence [1]. The round was co-led by NEA and NAVER Ventures, with participation from Amazon, Radical Ventures, Korea Investment Partners, Index Ventures, Quadrille Capital and Red Bull Ventures. This capital infusion brings the company's total raised to over $207 million [1].
"Five years ago, we made a contrarian bet: the substrate of machine intelligence is recorded reality in motion, not language," said Chief Executive and co-founder Jae Lee [1]. In an interview with Bloomberg News, Lee explained that video "is the most similar signal data that we receive as humans to learn about the world," distinguishing TwelveLabs' approach from the latest frontier models, which remain language-based [2]. The company worked from the ground up to construct multimodal models that weren't simply large language models processing video, but models that natively understand video [1].
Source: PYMNTS
The company's flagship products include the Marengo model family, with version 3.0 released late last year, and Pegasus 1.5 [1]. Marengo 3.0 enables real-world AI embedding for videos, audio, text and composition, parsing numerous content types and adding them to machine-readable data structures such as vector databases so that AI models can understand and search the information at scale [1]. Pegasus 1.5 works alongside Marengo to turn video into structured data: it understands scene boundaries, entities, time segments and events, which lets large language models reason across visual information [1].
While the last decade of AI "made text programmable," video has yet to enjoy a similar moment [2]. "The world's video is still mostly dark matter to machines," Lee noted, adding that it sits in places like archives, drones and satellites and is mostly still accessed "through filenames, folders, captions, transcripts, and human memory" [2]. TwelveLabs said it built a reasoning capacity that natively understands trends over time by maintaining memory that persists between queries, rather than evaporating after each one, creating an intelligence that compounds with each video [1]. The company's goal is to make every second of video addressable, searchable, and usable by agents [2].
As part of the funding, TwelveLabs is strengthening its relationship with Amazon Web Services. The company's service has been available on AWS Marketplace since at least 2025, alongside managed access to foundation models via Amazon Bedrock [1]. With today's investment, the company has signed a multi-year commitment to optimize its video inference workloads for AWS Trainium chips, with new frontier models launching on AWS first [1].
TwelveLabs is part of a new generation of tools built around AI capabilities [2]. Use cases for this technology span numerous workflows across industries such as security, advertising, sports and automotive, where tremendous amounts of information reside in video [1]. The company is working to build a new paradigm of video perception, creating systems that allow machines to analyze, search and operationalize footage, in pursuit of superintelligence in video understanding [1].
Summarized by Navi