Databricks Solves Data Pipeline Problem for AI Agents

Databricks tackles the data pipeline bottleneck at Data + AI Summit

At the Data + AI Summit on Tuesday, Databricks announced a fundamental shift in how enterprises handle operational and analytical data, introducing two products designed to eliminate infrastructure that has slowed AI agents [1]. The company unveiled Lake Transactional/Analytical Processing (LTAP) and Lakehouse//RT, technologies that promise to collapse the decades-old separation between transactional databases and analytical systems [2].

Reynold Xin, co-founder of Databricks, described a simpler data stack as "the holy grail for agents," arguing that as users generate more applications, AI agents reasoning analytically need the underlying infrastructure out of the way to move fast [1]. The challenge is structural: a system that reasons continuously and acts on live data cannot tolerate a pipeline between itself and the information it needs to act on.

LTAP delivers a unified platform for operational and analytical data without ETL pipelines

LTAP stores PostgreSQL-native transactional data in Delta Lake and Apache Iceberg format from the point of write, eliminating the ETL pipelines that have connected operational and analytical systems for decades [1]. The architecture builds on Lakebase, Databricks' serverless, cloud-based PostgreSQL database service, which became generally available in February and is built on technology from the Neon acquisition [3].

Shanku Niyogi, Databricks' vice president of product management, calls Change Data Capture (CDC) "continuous data corruption," reflecting widespread frustration with pipeline reliability. "CDC was slow, and it was buggy, and it was expensive. Pipelines break down. Schemas change," Niyogi said during an interview at the summit [3]. He cited a large banking customer that maintains hundreds of thousands of Postgres databases, each requiring CDC pipelines to bring its data back to the lake [2].
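To make the failure mode Niyogi describes concrete, here is a minimal, hypothetical sketch of a CDC consumer that replays change events from an operational database into an analytical copy. `ChangeEvent` and `AnalyticalReplica` are illustrative names invented for this example, not a Databricks, Postgres or Debezium API. It shows two properties of the pipeline approach: the analytical copy is only as fresh as the last event replayed, and an upstream schema change the consumer was not updated for halts the pipeline. The banking customer above would run one such consumer per database.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ChangeEvent:
    """A hypothetical CDC record emitted by an operational database."""
    op: str                      # "insert", "update" or "delete"
    key: int                     # primary key of the affected row
    row: dict[str, Any] | None   # new row image; None for deletes


class AnalyticalReplica:
    """Toy downstream copy kept in sync by replaying CDC events in order."""

    def __init__(self, columns: set[str]) -> None:
        self.columns = set(columns)  # schema the pipeline was written against
        self.rows: dict[int, dict[str, Any]] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "delete":
            self.rows.pop(event.key, None)
            return
        if event.op not in ("insert", "update") or event.row is None:
            raise ValueError(f"malformed event: {event!r}")
        unknown = set(event.row) - self.columns
        if unknown:
            # An upstream schema change the consumer was never updated for:
            # the pipeline halts and the analytical copy goes stale.
            raise ValueError(f"unexpected columns: {sorted(unknown)}")
        self.rows[event.key] = dict(event.row)  # inserts and updates both upsert


replica = AnalyticalReplica({"id", "balance"})
replica.apply(ChangeEvent("insert", 1, {"id": 1, "balance": 100}))
replica.apply(ChangeEvent("update", 1, {"id": 1, "balance": 80}))
```

Adding a `currency` column upstream, for example, makes the next `apply` call raise until someone updates and redeploys the consumer.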

The LTAP approach unifies data at the storage layer rather than at the engine level, distinguishing it from earlier HTAP (Hybrid Transactional/Analytical Processing) attempts. "HTAP to us is kind of more of a failure of the industry rather than a success," Xin noted [1]. Instead of converging engines, LTAP maintains PostgreSQL compatibility for transactional workloads while simultaneously writing data in open table formats such as Delta Lake and Apache Iceberg, which typically store data in columnar Parquet files that analytical engines can read directly.
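The idea of unifying at the storage layer can be illustrated with a toy, in-memory sketch in which a single write call maintains both a row layout (for transactional point lookups) and a columnar layout (for analytical scans), so no ETL job has to copy data between them. This is a hypothetical illustration, not how LTAP is implemented: `UnifiedTable` and its methods are invented names, nothing is written to Delta Lake or Iceberg files, and the real system, as described below, performs the row-to-column conversion in a caching layer rather than synchronously.

```python
from __future__ import annotations

from typing import Any


class UnifiedTable:
    """Hypothetical table whose single write path maintains two layouts.

    Rows are indexed by primary key for transactional point lookups, and each
    column is appended to its own list for analytical scans. No separate ETL
    job copies data from one layout to the other.
    """

    def __init__(self, columns: list[str], key: str) -> None:
        if key not in columns:
            raise ValueError("key must be one of the columns")
        self.columns = list(columns)
        self.key = key
        self.rows: dict[Any, dict[str, Any]] = {}                          # row layout
        self.column_data: dict[str, list[Any]] = {c: [] for c in columns}  # columnar layout

    def insert(self, row: dict[str, Any]) -> None:
        """Write once; both layouts are updated in the same call."""
        k = row[self.key]
        if k in self.rows:
            raise KeyError(f"duplicate key {k!r}")
        record = {c: row[c] for c in self.columns}  # KeyError if a column is missing
        self.rows[k] = record
        for c in self.columns:
            self.column_data[c].append(record[c])

    def get(self, k: Any) -> dict[str, Any] | None:
        """Transactional-style point lookup served from the row layout."""
        return self.rows.get(k)

    def column_sum(self, column: str) -> float:
        """Analytical-style aggregate that scans a single column."""
        return sum(self.column_data[column])


orders = UnifiedTable(["id", "customer", "amount"], key="id")
orders.insert({"id": 1, "customer": "acme", "amount": 120})
orders.insert({"id": 2, "customer": "globex", "amount": 75})
orders.insert({"id": 3, "customer": "acme", "amount": 30})
```

Here `orders.get(2)` reads one row, while `orders.column_sum("amount")` touches only the `amount` column; both read data produced by the same write.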

Lakehouse//RT enables millisecond query latency without separate serving infrastructure

Lakehouse//RT delivers sub-100ms latency at 12,000 queries per second, with response times as low as 10ms on smaller datasets, and Databricks claims up to 16 times better performance than existing dedicated serving stacks [1]. The product is powered by a new execution engine called Reyden, built specifically for high-concurrency, low-latency serving; it queries Delta Lake and Apache Iceberg tables directly without moving data out of the lakehouse.
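One way to read the throughput and latency figures together is Little's Law, a standard queueing identity (average requests in the system L equals arrival rate λ times average time in the system W). It is not part of the Databricks announcement; applying it here assumes a steady state and treats the sub-100ms figure as an upper bound on mean latency, so the result is a back-of-envelope bound rather than a published number.

```python
def mean_in_flight(queries_per_second: float, mean_latency_s: float) -> float:
    """Little's Law: average number of requests in the system, L = lambda * W."""
    return queries_per_second * mean_latency_s


# Figures quoted for Lakehouse//RT: 12,000 queries per second, latency under 100 ms.
# Treating 100 ms as an upper bound on mean latency bounds average concurrency.
bound = mean_in_flight(12_000, 0.100)
print(f"at most about {bound:,.0f} queries in flight on average")
# prints: at most about 1,200 queries in flight on average
```

Under these assumptions, the serving engine would be handling on the order of a thousand concurrent queries at the quoted load.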

Niyogi described Lakehouse//RT as "the biggest innovation we've had since we started the lakehouse" in 2020, noting that it removes the need for separate serving infrastructure while delivering real-time data access [3]. Every query runs within Unity Catalog's governance framework, with no separate permissions layer, no data copies and no ingestion pipelines [1].

Why eliminating data pipelines matters for AI agents at scale

The urgency stems from explosive growth in code generation. "This year, the amount of code being written in the world has gone up 50x. We think in the next 12 months, more code will be written than in the history of coding," Niyogi said [3]. The resulting applications, increasingly powered by AI agents, need to read, analyze and act on data in near real time, making traditional architectures with separate transactional systems, analytical systems and serving layers inadequate [2].

"Agents need the best data," Niyogi explained. "If they're getting stale or wrong data, they act poorly" [2].

The central engineering challenge is latency: object storage has response times in the range of seconds, far too slow for OLTP workloads that require sub-millisecond performance. Lakebase addresses this with a caching layer between Postgres compute instances and object storage, using idle CPU capacity to convert rows to columns before data lands in object storage. Converting data from row to column format typically compresses it more than 10 times, substantially reducing network costs [1].
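Why row-to-column conversion helps compression can be shown with a small sketch: pivoting records so that values of the same column sit next to each other gives a general-purpose compressor long runs of similar data. This is an illustration under stated assumptions, not Lakebase's implementation: JSON plus `zlib` stands in for Parquet's columnar encodings, the caching layer is not modeled, the sample rows are synthetic, and the helper names are hypothetical. The ratios it prints depend entirely on the data and are not a reproduction of the 10x figure above.

```python
from __future__ import annotations

import json
import zlib


def rows_to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into one list of values per column."""
    if not rows:
        return {}
    return {name: [r[name] for r in rows] for name in rows[0]}


def columns_to_rows(columns: dict[str, list]) -> list[dict]:
    """Inverse pivot, used here to check that the conversion is lossless."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def compressed_size(obj) -> int:
    """Bytes after JSON serialization plus zlib (a stand-in for Parquet encodings)."""
    return len(zlib.compress(json.dumps(obj).encode("utf-8"), 9))


# Synthetic, deliberately repetitive sample data (not a real workload).
rows = [
    {"id": i,
     "region": ("us-east", "us-west", "eu")[i % 3],
     "status": "active" if i % 10 else "closed",
     "amount": (i * 37) % 500}
    for i in range(1_000)
]
columns = rows_to_columns(rows)

raw = len(json.dumps(rows).encode("utf-8"))
for label, size in [("rows, zlib", compressed_size(rows)),
                    ("columns, zlib", compressed_size(columns))]:
    print(f"{label}: {size} bytes ({raw / size:.1f}x smaller than raw JSON rows)")
```

Real columnar formats go further than this sketch, applying per-column encodings such as dictionary and run-length encoding before general-purpose compression.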

Databricks plans to open-source technology that enables PostgreSQL data to be stored in Apache Parquet format while preserving compatibility, reinforcing its commitment to open formats [2]. As enterprises grapple with scaling AI agents, the ability to eliminate pipeline complexity while maintaining governance and performance will determine which organizations can deploy autonomous systems effectively.

References

1. Databricks says it solved the decades-old data pipeline problem that's been slowing AI agents

2. Databricks declares the end of pipelines with a unified platform for operational and analytical data

3. Why Databricks calls CDC 'continuous data corruption' - and what it built instead