2 Sources
[1]
Chronosphere takes on Datadog with AI that explains itself, not just outages
Chronosphere, a New York-based observability startup valued at $1.6 billion, announced Monday it will launch AI-Guided Troubleshooting capabilities designed to help engineers diagnose and fix production software failures -- a problem that has intensified as artificial intelligence tools accelerate code creation while making systems harder to debug.

The new features combine AI-driven analysis with what Chronosphere calls a Temporal Knowledge Graph, a continuously updated map of an organization's services, infrastructure dependencies, and system changes over time. The technology aims to address a mounting challenge in enterprise software: developers are writing code faster than ever with AI assistance, but troubleshooting remains largely manual, creating bottlenecks when applications fail.

"For AI to be effective in observability, it needs more than pattern recognition and summarization," said Martin Mao, Chronosphere's CEO and co-founder, in an exclusive interview with VentureBeat. "Chronosphere has spent years building the data foundation and analytical depth needed for AI to actually help engineers. With our Temporal Knowledge Graph and advanced analytics capabilities, we're giving AI the understanding it needs to make observability truly intelligent -- and giving engineers the confidence to trust its guidance."

The announcement comes as the observability market -- software that monitors complex cloud applications -- faces mounting pressure to justify escalating costs. Enterprise log data volumes have grown 250% year-over-year, according to Chronosphere's own research, while a study from MIT and the University of Pennsylvania found that generative AI has spurred a 13.5% increase in weekly code commits, signifying faster development velocity but also greater system complexity.

AI writes code 13% faster, but debugging stays stubbornly manual

Despite advances in automated code generation, debugging production failures remains stubbornly manual.
When a major e-commerce site slows during checkout or a banking app fails to process transactions, engineers must sift through millions of data points -- server logs, application traces, infrastructure metrics, recent code deployments -- to identify root causes.

Chronosphere's answer is what it calls AI-Guided Troubleshooting, built on four core capabilities: automated "Suggestions" that propose investigation paths backed by data; the Temporal Knowledge Graph that maps system relationships and changes; Investigation Notebooks that document each troubleshooting step for future reference; and natural language query building.

Mao explained the Temporal Knowledge Graph in practical terms: "It's a living, time-aware model of your system. It stitches together telemetry -- metrics, traces, logs -- infrastructure context, change events like deploys and feature flags, and even human input like notes and runbooks into a single, queryable map that updates as your system evolves."

This differs fundamentally from the service dependency maps offered by competitors like Datadog, Dynatrace, and Splunk, Mao argued. "It adds time, not just topology," he said. "It tracks how services and dependencies change over time and connects those changes to incidents -- what changed and why. Many tools rely on standardized integrations; our graph goes a step further to normalize custom, non-standard telemetry so application-specific signals aren't a blind spot."

Why Chronosphere shows its work instead of making automatic decisions

Unlike purely automated systems, Chronosphere designed its AI features to keep engineers in the driver's seat -- a deliberate choice meant to address what Mao calls the "confident-but-wrong guidance" problem plaguing early AI observability tools.

"'Keeping engineers in control' means the AI shows its work, proposes next steps, and lets engineers verify or override -- never auto-deciding behind the scenes," Mao explained. "Every Suggestion includes the evidence -- timing, dependencies, error patterns -- and a 'Why was this suggested?' view, so they can inspect what was checked and ruled out before acting."

He walked through a concrete example: "An SLO [service level objective] alert fires on Checkout. Chronosphere immediately surfaces a ranked Suggestion: errors appear to have started in the dependent Payment service. An engineer can click Investigate to see the charts and reasoning and, if it holds up, choose to dig deeper. As they steer into Payment, the system adapts with new Suggestions scoped to that service -- all from one view, no tab-hopping."

In this scenario, the engineer asks "what changed?" and the system pulls in change events. "Our Notebook capability makes the causal chain plain: a feature-flag update preceded pod memory exhaustion in Payment; Checkout's spike is a downstream symptom," Mao said. "They can decide to roll back the flag. That whole path -- suggestions followed, evidence viewed, conclusions -- is captured automatically in an Investigation Notebook, and the outcome feeds the Temporal Knowledge Graph so similar future incidents are faster to resolve."

How a $1.6 billion startup takes on Datadog, Dynatrace, and Splunk

Chronosphere enters an increasingly crowded field. Datadog, the publicly traded observability leader valued at over $40 billion, has introduced its own AI-powered troubleshooting features. So have Dynatrace and Splunk. All three offer comprehensive "all-in-one" platforms that promise single-pane-of-glass visibility.

Mao distinguished Chronosphere's approach on technical grounds. "Early 'AI for observability' leaned heavily on pattern-spotting and summarization, which tends to break down during real incidents," he said. "These approaches often stop at correlating anomalies or producing fluent explanations without the deeper analysis and causal reasoning observability leaders need.
They can feel impressive in demos but disappoint in production -- they summarize signals rather than explain cause and effect."

A specific technical gap, he argued, involves custom application telemetry. "Most platforms reason over standardized integrations -- Kubernetes, common cloud services, popular databases -- ignoring the most telling clues that live in custom app telemetry," Mao said. "With an incomplete picture, large language models will 'fill in the gaps,' producing confident-but-wrong guidance that sends teams down dead ends."

Chronosphere's competitive positioning received validation in July when Gartner named it a Leader in the 2025 Magic Quadrant for Observability Platforms for the second consecutive year. The firm was recognized based on both "Completeness of Vision" and "Ability to Execute." In December 2024, Chronosphere also tied for the highest overall rating among recognized vendors in Gartner Peer Insights' "Voice of the Customer" report, scoring 4.7 out of 5 based on 70 reviews.

Yet the company faces intensifying competition for high-profile customers. UBS analysts noted in July that OpenAI now runs both Datadog and Chronosphere side-by-side to monitor GPU workloads, suggesting the AI leader is evaluating alternatives. While UBS maintained its buy rating on Datadog, the analysts warned that growing Chronosphere usage could pressure Datadog's pricing power.

Inside the 84% cost reduction claims -- and what CIOs should actually measure

Beyond technical capabilities, Chronosphere has built its market position on cost control -- a critical factor as observability spending spirals. The company claims its platform reduces data volumes and associated costs by 84% on average while cutting critical incidents by up to 75%.

When pressed for specific customer examples with real numbers, Mao pointed to several case studies. "Robinhood has seen a 5x improvement in reliability and a 4x improvement in Mean Time to Detection," he said. "DoorDash used Chronosphere to improve governance and standardize monitoring practices. Astronomer achieved over 85% cost reduction by shaping data on ingest, and Affirm scaled their load 10x during a Black Friday event with no issues, highlighting the platform's reliability under extreme conditions."

The cost argument matters because, as Paul Nashawaty, principal analyst at CUBE Research, noted when Chronosphere launched its Logs 2.0 product in June: "Organizations are drowning in telemetry data, with over 70% of observability spend going toward storing logs that are never queried."

For CIOs fatigued by "AI-powered" announcements, Mao acknowledged skepticism is warranted. "The way to cut through it is to test whether the AI shortens incidents, reduces toil, and builds reusable knowledge in your own environment, not in a demo," he advised. He recommended CIOs evaluate three factors: transparency and control (does the system show its reasoning?), coverage of custom telemetry (can it handle non-standardized data?), and manual toil avoided (how many ad-hoc queries and tool-switches are eliminated?).

Why Chronosphere partners with five vendors instead of building everything itself

Alongside the AI troubleshooting announcement, Chronosphere revealed a new Partner Program integrating five specialized vendors to fill gaps in its platform: Arize for large language model monitoring, Embrace for real user monitoring, Polar Signals for continuous profiling, Checkly for synthetic monitoring, and Rootly for incident management.

The strategy represents a deliberate bet against the all-in-one platforms dominating the market. "While an all-in-one platform may be sufficient for smaller organizations, global enterprises demand best-in-class depth across each domain," Mao said. "This is what drove us to build our Partner Program and invest in seamless integrations with leading providers -- so our customers can operate with confidence and clarity at every layer of observability."
Noah Smolen, head of partnerships at Arize, said the collaboration addresses a specific enterprise need. "With a wide array of Fortune 500 customers, we understand the high bar needed to ensure AI agent systems are ready to deploy and stay incident-free, especially given the pace of AI adoption in the enterprise," Smolen said. "Our partnership with Chronosphere comes at a time when an integrated purpose-built cloud-native and AI-observability suite solves a huge pain point for forward-thinking C-suite leaders who demand the very best across their entire observability stack."

Similarly, JJ Tang, CEO and founder of Rootly, emphasized the incident resolution benefits. "Incidents hinder innovation and revenue, and the challenge lies in sifting through vast amounts of observability data, mobilizing teams, and resolving issues quickly," Tang said. "Integrating Chronosphere with Rootly allows engineers to collaborate with context and resolve issues faster within their existing communication channels, drastically reducing time to resolution and ultimately improving reliability -- 78% plus decreases in repeat Sev0 and Sev1 incidents."

When asked how total costs compare when customers use multiple partner contracts versus a single platform, Mao acknowledged the current complexity. "At present, mutual customers typically maintain separate contracts unless they engage through a services partner or system integrator," he said. However, he argued the economics still favor the composable approach: "Our combined technologies deliver exceptional value -- in most circumstances at just a fraction of the price of a single-platform solution. Beyond the savings, customers gain a richer, more unified observability experience that unlocks deeper insights and greater efficiency, especially for large-scale environments."

The company plans to streamline this over time. "As the ISV program matures, we're focused on delivering a more streamlined experience by transitioning to a single, unified contract that simplifies procurement and accelerates time to value," Mao said.

How two Uber engineers turned Halloween outages into a billion-dollar startup

Chronosphere's origins trace to 2019, when Mao and co-founder Rob Skillington left Uber after building the ride-hailing giant's internal observability platform. At Uber, Mao's team had faced a crisis: the company's in-house tools would fail on its two busiest nights -- Halloween and New Year's Eve -- cutting off visibility into whether customers could request rides or drivers could locate passengers.

The solution they built at Uber used open-source software and ultimately allowed the company to operate without outages, even during high-volume events. But the broader market insight came at an industry conference in December 2018, when major cloud providers threw their weight behind Kubernetes, Google's container orchestration technology.

"This meant that most technology architectures were eventually going to look like Uber's," Mao recalled in an August 2024 profile by Greylock Partners, Chronosphere's lead investor. "And that meant every company, not just a few big tech companies and the Walmarts of the world, would have the exact same problem we had solved at Uber."

Chronosphere has since raised more than $343 million in funding across multiple rounds led by Greylock, Lux Capital, General Atlantic, Addition, and Founders Fund. The company operates as a remote-first organization with offices in New York, Austin, Boston, San Francisco, and Seattle, employing approximately 299 people according to LinkedIn data.

The company's customer base includes DoorDash, Zillow, Snap, Robinhood, and Affirm -- predominantly high-growth technology companies operating cloud-native, Kubernetes-based infrastructures at massive scale.
What's available now -- and what enterprises can expect in 2026

Chronosphere's AI-Guided Troubleshooting capabilities, including Suggestions and Investigation Notebooks, entered limited availability Monday with select customers. The company plans full general availability in 2026. The Model Context Protocol (MCP) Server, which enables engineers to integrate Chronosphere directly into internal AI workflows and query observability data through AI-enabled development environments, is available immediately for all Chronosphere customers.

The phased rollout reflects the company's cautious approach to deploying AI in production environments where mistakes carry real costs. By gathering feedback from early adopters before broad release, Chronosphere aims to refine its guidance algorithms and validate that its suggestions genuinely accelerate troubleshooting rather than simply generating impressive demonstrations.

The longer game, however, extends beyond individual product features. Chronosphere's dual bet -- on transparent AI that shows its reasoning and on a partner ecosystem rather than all-in-one integration -- amounts to a fundamental thesis about how enterprise observability will evolve as systems grow more complex.

If that thesis proves correct, the company that solves observability for the AI age won't be the one with the most automated black box. It will be the one that earns engineers' trust by explaining what it knows, admitting what it doesn't, and letting humans make the final call. In an industry drowning in data and promised silver bullets, Chronosphere is wagering that showing your work still matters -- even when AI is doing the math.
[2]
Chronosphere launches AI-Guided Troubleshooting to accelerate root-cause analysis - SiliconANGLE
Cloud-native observability company Chronosphere Inc. today announced the launch of AI-Guided Troubleshooting capabilities, an advancement that helps engineering teams investigate and resolve production incidents. The new capabilities combine artificial intelligence-driven insights with deep environmental context via a Temporal Knowledge Graph to deliver highly accurate root-cause insights that enable engineers to resolve issues faster and with greater confidence.

The new capabilities respond to a shift in how software is built: generative AI has spurred a 13.5% increase in weekly code commits, signifying a surge in code velocity and change volume. Chronosphere argues that despite these advancements in software development, troubleshooting remains primarily manual and relies heavily on intuition, resulting in slower mean time to resolution and greater on-call stress.

Chronosphere's AI-Guided Troubleshooting capabilities close the gap by combining AI reasoning with a Temporal Knowledge Graph, a living, queryable map of an organization's services, infrastructure and their relationships. The capabilities account for system changes and even human input, and support custom application telemetry to provide deep context for effective root-cause analysis.

The system applies analytics to surface the most meaningful next steps in an investigation. Each stage includes explanations of what's been analyzed or ruled out to allow engineers to stay in control while AI accelerates every phase of the troubleshooting process. As engineers zero in on a root cause, investigations are fed into the Temporal Knowledge Graph so future suggestions get smarter.

"For AI to be effective in observability, it needs more than pattern recognition and summarization," said co-founder and Chief Executive Martin Mao. "Chronosphere has spent years building the data foundation and analytical depth needed for AI to actually help engineers. With our Temporal Knowledge Graph and advanced analytics capabilities, we're giving AI the understanding it needs to make observability truly intelligent and giving engineers the confidence to trust its guidance."

Chronosphere's AI-Guided Troubleshooting introduces four core capabilities. Suggestions offer proactive, plain-language insights that guide investigations toward likely causes backed by data, not guesswork. The Temporal Knowledge Graph gives a continuously updated map of services, dependencies and custom telemetry. Investigation Notebooks provide persistent workspaces that document every step, piece of evidence and conclusion, turning investigations into reusable institutional knowledge. And Natural Language Assistance allows engineers to build queries and dashboards using natural language, accelerating data exploration.

In addition to AI-Guided Troubleshooting, Chronosphere also today announced the general availability of its Model Context Protocol Server, which allows engineers and developers to integrate Chronosphere directly into internal AI workflows.

Chronosphere CEO Mao spoke with theCUBE, SiliconANGLE Media's livestreaming studio, in June, when he discussed how Chronosphere is redefining cloud-native observability with Logs 2.0 and real-time data control.
New York-based observability startup Chronosphere introduces AI-powered troubleshooting capabilities that combine artificial intelligence with a Temporal Knowledge Graph to help engineers diagnose production failures faster, addressing the growing complexity of AI-accelerated software development.

Chronosphere, a New York-based observability startup valued at $1.6 billion, announced Monday the launch of AI-Guided Troubleshooting capabilities designed to help engineers diagnose and fix production software failures [1]. The new features combine AI-driven analysis with what the company calls a Temporal Knowledge Graph, a continuously updated map of an organization's services, infrastructure dependencies, and system changes over time [2].

The announcement addresses a mounting challenge in enterprise software development: while artificial intelligence tools are accelerating code creation, troubleshooting remains largely manual, creating bottlenecks when applications fail [1]. According to research from MIT and the University of Pennsylvania, generative AI has spurred a 13.5% increase in weekly code commits, signifying faster development velocity but also greater system complexity.

Enterprise log data volumes have grown 250% year-over-year, according to Chronosphere's research, while the observability market faces mounting pressure to justify escalating costs [1]. When major applications fail, engineers must sift through millions of data points including server logs, application traces, infrastructure metrics, and recent code deployments to identify root causes.

Chronosphere's AI-Guided Troubleshooting is built on four core capabilities: automated "Suggestions" that propose investigation paths backed by data; the Temporal Knowledge Graph that maps system relationships and changes; Investigation Notebooks that document each troubleshooting step for future reference; and natural language query building [1].

CEO and co-founder Martin Mao explained the Temporal Knowledge Graph as "a living, time-aware model of your system" that stitches together telemetry, infrastructure context, change events, and human input into a single, queryable map that updates as systems evolve [1]. This differs from service dependency maps offered by competitors like Datadog, Dynatrace, and Splunk by adding temporal context and tracking how services change over time.
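
The idea of a dependency map that "adds time, not just topology" can be illustrated with a small sketch. The Python below is not Chronosphere's implementation or API; every class, field, and service name in it is hypothetical. It assumes only that dependency edges carry a validity interval and that change events (deploys, feature-flag updates) are timestamped, so a "what changed?" query can be scoped to the services an alerting service depended on at the moment of the incident.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Dependency:
    """Hypothetical edge: `caller` depends on `callee` during a time interval."""
    caller: str
    callee: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the edge still exists

    def active_at(self, t: datetime) -> bool:
        return self.valid_from <= t and (self.valid_to is None or t < self.valid_to)


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record, e.g. a deploy or a feature-flag update."""
    service: str
    kind: str
    at: datetime
    note: str = ""


@dataclass
class TemporalGraph:
    dependencies: list[Dependency] = field(default_factory=list)
    changes: list[ChangeEvent] = field(default_factory=list)

    def downstream_of(self, service: str, t: datetime) -> set[str]:
        """Services that `service` depended on, directly or transitively, at time t."""
        seen: set[str] = set()
        stack = [service]
        while stack:
            current = stack.pop()
            for dep in self.dependencies:
                if dep.caller == current and dep.active_at(t) and dep.callee not in seen:
                    seen.add(dep.callee)
                    stack.append(dep.callee)
        return seen

    def what_changed(self, service: str, incident_at: datetime,
                     window: timedelta) -> list[ChangeEvent]:
        """Change events in the service or its dependencies shortly before the incident."""
        scope = {service} | self.downstream_of(service, incident_at)
        start = incident_at - window
        hits = [c for c in self.changes
                if c.service in scope and start <= c.at <= incident_at]
        return sorted(hits, key=lambda c: c.at, reverse=True)  # most recent first


# Illustrative data loosely modeled on the Checkout/Payment example in source [1].
t0 = datetime(2025, 1, 1, 12, 0)
graph = TemporalGraph(
    dependencies=[
        Dependency("checkout", "payment", valid_from=t0 - timedelta(days=30)),
        # An older edge that was retired before the incident.
        Dependency("checkout", "legacy-billing",
                   valid_from=t0 - timedelta(days=90),
                   valid_to=t0 - timedelta(days=30)),
    ],
    changes=[
        ChangeEvent("payment", "feature_flag", t0 - timedelta(minutes=20), "flag updated"),
        ChangeEvent("legacy-billing", "deploy", t0 - timedelta(minutes=10)),
        ChangeEvent("payment", "deploy", t0 - timedelta(days=2)),
    ],
)
recent = graph.what_changed("checkout", incident_at=t0, window=timedelta(hours=1))
for change in recent:
    print(change.service, change.kind, change.at.isoformat())
```

Because the `legacy-billing` edge expired before the incident, its recent deploy is excluded from the result, while the feature-flag update in `payment` is returned. A static topology map with no validity intervals could not make that distinction.
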
Unlike purely automated systems, Chronosphere designed its AI features to keep engineers in control, addressing what Mao calls the "confident-but-wrong guidance" problem plaguing early AI observability tools [1]. Every suggestion includes evidence such as timing, dependencies, and error patterns, along with a "Why was this suggested?" view that allows engineers to inspect what was checked and ruled out before taking action.

The system applies analytics to surface meaningful next steps while providing explanations at each stage, allowing engineers to maintain control while AI accelerates the troubleshooting process [2]. As engineers investigate root causes, their findings feed back into the Temporal Knowledge Graph, making future suggestions more intelligent.

Summarized by Navi