Curated by THEOUTPOST
On Wed, 24 Jul, 4:03 PM UTC
4 Sources
[1]
CrowdStrike: Global IT outage was due to an 'undetected error' in update
July 24 (UPI) -- CrowdStrike said Wednesday the worldwide IT outage impacting air travel, 911 services, television and public infrastructure last week was caused by an undetected error in a Rapid Response Content update for its Falcon sensor. Rapid Response Content is "designed to respond to the changing threat landscape at operational speed," according to CrowdStrike.
"Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data," CrowdStrike said in a statement. The problematic content then caused a Windows operating system crash. It was part of the Falcon content update for Windows hosts that took 8.5 million computers offline.
Emergency 911 services were affected by the July 19 incident in several U.S. states while air travel came to a standstill. Some TV networks were unable to broadcast and banks, hospitals and stock exchanges were also impacted. But the event was not a cyberattack, according to CrowdStrike. The Federal Communications Commission investigated the outage.
CrowdStrike said the event that triggered the outage July 19 was a content configuration update for "the Windows sensor to gather telemetry on possible novel threat techniques." Sensor Content, the company's other type of update, is not dynamically updated from the cloud. Instead, it ships with each sensor release and comprises code, including on-sensor AI and machine learning models, written for CrowdStrike's threat detection engineers.
"Rapid Response Content provides visibility and detections on the sensor without requiring sensor code changes," CrowdStrike's statement said. "This capability is used by threat detection engineers to gather telemetry, identify indicators of adversary behavior and perform detections and preventions. Rapid Response Content is behavioral heuristics, separate and distinct from CrowdStrike's on-sensor AI prevention and detection capabilities."
To prevent the global IT crash from happening again, CrowdStrike said it is improving Rapid Response Content updates with greater resiliency and testing. That will include local developer testing as well as enhanced error handling. Additional validation checks are also being added "to guard against this type of problematic content from being deployed in the future," according to CrowdStrike. A staggered deployment strategy for Rapid Response Content will also be used to gradually deploy the updates.
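The staggered deployment strategy mentioned in this report could be sketched roughly as follows. This is a minimal illustration, not CrowdStrike's implementation: the stage names and percentages, and the `deploy` and `is_healthy` callbacks, are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    percent: int  # cumulative percentage of the fleet covered once this stage completes


# Hypothetical stage plan: the reporting says only that updates will be
# deployed gradually, so these names and numbers are illustrative.
STAGES = (
    Stage("canary", 1),
    Stage("early", 10),
    Stage("broad", 50),
    Stage("full", 100),
)


def staged_rollout(hosts, deploy, is_healthy, stages=STAGES):
    """Deploy to growing slices of `hosts`, halting at the first unhealthy stage.

    Returns (hosts_updated, failed_stage_name), where failed_stage_name is
    None if every stage completed.
    """
    updated = []
    for stage in stages:
        # Cumulative number of hosts that should be updated after this stage.
        target = max(1, len(hosts) * stage.percent // 100) if hosts else 0
        batch = hosts[len(updated):target]
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # Gate the next, larger stage on the health of the hosts just updated.
        if not all(is_healthy(host) for host in batch):
            return updated, stage.name
    return updated, None
```

With a fleet of 200 hosts under this plan, the canary stage touches 2 hosts. If either reports unhealthy, the rollout stops and the remaining 198 never receive the update.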
[2]
CrowdStrike: Global IT outage was due to an 'undetected error' in update
Copyright 2024 United Press International, Inc. (UPI). Any reproduction, republication, redistribution and/or modification of any UPI content is expressly prohibited without UPI's prior written consent.
[3]
CrowdStrike Explains Friday Incident Crashing Millions of Windows Devices
Cybersecurity firm CrowdStrike on Wednesday blamed an issue in its validation system for causing millions of Windows devices to crash as part of a widespread outage late last week.
"On Friday, July 19, 2024 at 04:09 UTC, as part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques," the company said in its Preliminary Post Incident Review (PIR). "These updates are a regular part of the dynamic protection mechanisms of the Falcon platform. The problematic Rapid Response Content configuration update resulted in a Windows system crash."
The incident impacted Windows hosts running sensor version 7.11 and above that were online between July 19, 2024, 04:09 UTC and 05:27 UTC and received the update. Apple macOS and Linux systems were not affected.
CrowdStrike said it delivers security content configuration updates in two ways: Sensor Content, which ships with the Falcon sensor, and Rapid Response Content, which allows it to flag novel threats using various behavioral pattern-matching techniques. The crash is said to have been the result of a Rapid Response Content update containing a previously undetected error.
Such updates are delivered as Template Instances, which are instantiations of specific Template Types and map to specific behaviors for the sensor to observe, detect or prevent. The Template Instances, in turn, are created using a Content Configuration System, after which they are deployed to the sensor from the cloud through a mechanism dubbed Channel Files, which are ultimately written to disk on the Windows machine. The system also encompasses a Content Validator component that carries out validation checks on the content before it is published.
"Rapid Response Content provides visibility and detections on the sensor without requiring sensor code changes," it explained. "This capability is used by threat detection engineers to gather telemetry, identify indicators of adversary behavior and perform detections and preventions. Rapid Response Content is behavioral heuristics, separate and distinct from CrowdStrike's on-sensor AI prevention and detection capabilities."
These updates are then parsed by the Falcon sensor's Content Interpreter, which enables the Sensor Detection Engine to detect or prevent malicious activity.
While each new Template Type is stress tested for different parameters like resource utilization and performance impact, the root cause of the problem, per CrowdStrike, could be traced back to the rollout of the Interprocess Communication (IPC) Template Type on February 28, 2024, which was introduced to flag attacks that abuse named pipes.
The timeline of events, per the PIR, is as follows. On February 28, 2024, sensor 7.11 was made generally available with the new IPC Template Type. On March 5, 2024, the Template Type passed a stress test in CrowdStrike's staging environment, and a first IPC Template Instance was released to production. Three more IPC Template Instances were deployed between April 8 and April 24, 2024, and performed as expected. On July 19, 2024, two additional IPC Template Instances were deployed, one of which passed validation despite containing problematic content data because of a bug in the Content Validator.
"Based on the testing performed before the initial deployment of the Template Type (on March 05, 2024), trust in the checks performed in the Content Validator, and previous successful IPC Template Instance deployments, these instances were deployed into production," CrowdStrike said. "When received by the sensor and loaded into the Content Interpreter, problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSoD)."
To address the sweeping disruptions caused by the crash and prevent a recurrence, the Texas-based company said it has improved its testing processes and enhanced its error handling mechanism in the Content Interpreter. It's also planning to implement a staggered deployment strategy for Rapid Response Content.
[4]
Microsoft outage: Read CrowdStrike's memo explaining how its update broke the computers worldwide - Times of India
A faulty software update from cybersecurity company CrowdStrike triggered a worldwide IT meltdown last Friday (July 19), bringing airlines, banks, and retailers to a standstill. The update, designed to protect against cyber threats, contained a critical error that caused widespread disruption. Thousands of flights were grounded, emergency services faced challenges, stores were forced to close or operate in cash-only mode, and hospitals experienced delays in procedures.
While CrowdStrike quickly released a fix, it took several hours for systems to fully recover, as some required manual intervention. The company has acknowledged the error and is providing detailed information about the incident in a memo released on Wednesday, July 24.
Here's the memo on what happened on July 19, 2024:
Preliminary Post Incident Review (PIR): Content Configuration Update Impacting the Falcon Sensor and the Windows Operating System (BSOD)
This is CrowdStrike's preliminary Post Incident Review (PIR). We will be detailing our full investigation in the forthcoming Root Cause Analysis that will be released publicly. Throughout this PIR, we have used generalized terminology to describe the Falcon platform for improved readability. Terminology in other documentation may be more specific and technical.
What Happened?
On Friday, July 19, 2024 at 04:09 UTC, as part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques. These updates are a regular part of the dynamic protection mechanisms of the Falcon platform. The problematic Rapid Response Content configuration update resulted in a Windows system crash.
Systems in scope include Windows hosts running sensor version 7.11 and above that were online between Friday, July 19, 2024 04:09 UTC and Friday, July 19, 2024 05:27 UTC and received the update. Mac and Linux hosts were not impacted.
The defect in the content update was reverted on Friday, July 19, 2024 at 05:27 UTC. Systems coming online after this time, or that did not connect during the window, were not impacted.
What Went Wrong and Why?
CrowdStrike delivers security content configuration updates to our sensors in two ways: Sensor Content that is shipped with our sensor directly, and Rapid Response Content that is designed to respond to the changing threat landscape at operational speed. The issue on Friday involved a Rapid Response Content update with an undetected error.
Sensor Content
Sensor Content provides a wide range of capabilities to assist in adversary response. It is always part of a sensor release and not dynamically updated from the cloud. Sensor Content includes on-sensor AI and machine learning models, and comprises code written expressly to deliver longer-term, reusable capabilities for CrowdStrike's threat detection engineers.
These capabilities include Template Types, which have pre-defined fields for threat detection engineers to leverage in Rapid Response Content. Template Types are expressed in code. All Sensor Content, including Template Types, go through an extensive QA process, which includes automated testing, manual testing, validation and rollout steps.
The sensor release process begins with automated testing, both prior to and after merging into our code base. This includes unit testing, integration testing, performance testing and stress testing. This culminates in a staged sensor rollout process that starts with dogfooding internally at CrowdStrike, followed by early adopters. It is then made generally available to customers. Customers then have the option of selecting which parts of their fleet should install the latest sensor release ('N'), or one version older ('N-1') or two versions older ('N-2') through Sensor Update Policies.
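The 'N', 'N-1' and 'N-2' options described above amount to picking a release by its offset from the newest generally available version. A minimal sketch, assuming a hypothetical helper `select_sensor_version` and an ordered list of release identifiers (only 7.11 appears in the memo; the other version strings are placeholders):

```python
# Offsets from the newest generally available release, as named in the memo.
POLICY_OFFSETS = {"N": 0, "N-1": 1, "N-2": 2}


def select_sensor_version(releases, policy):
    """Return the release a host group should run under the given policy.

    `releases` lists generally available sensor versions, oldest first.
    `policy` is one of "N", "N-1" or "N-2".
    """
    if policy not in POLICY_OFFSETS:
        raise ValueError(f"unknown policy: {policy!r}")
    index = len(releases) - 1 - POLICY_OFFSETS[policy]
    if index < 0:
        raise ValueError(f"not enough releases to satisfy policy {policy!r}")
    return releases[index]


# Placeholder release history; only "7.11" is taken from the memo.
releases = ["7.09", "7.10", "7.11"]
pinned = {policy: select_sensor_version(releases, policy) for policy in POLICY_OFFSETS}
```

A policy like this governs sensor releases only. It would not have held back the July 19 update, which the memo says was Rapid Response Content rather than Sensor Content.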
The event of Friday, July 19, 2024 was not triggered by Sensor Content, which is only delivered with the release of an updated Falcon sensor. Customers have complete control over the deployment of the sensor -- which includes Sensor Content and Template Types.
Rapid Response Content
Rapid Response Content is used to perform a variety of behavioral pattern-matching operations on the sensor using a highly optimized engine. Rapid Response Content is a representation of fields and values, with associated filtering. This Rapid Response Content is stored in a proprietary binary file that contains configuration data. It is not code or a kernel driver.
Rapid Response Content is delivered as "Template Instances," which are instantiations of a given Template Type. Each Template Instance maps to specific behaviors for the sensor to observe, detect or prevent. Template Instances have a set of fields that can be configured to match the desired behavior.
In other words, Template Types represent a sensor capability that enables new telemetry and detection, and their runtime behavior is configured dynamically by the Template Instance (i.e., Rapid Response Content).
Rapid Response Content provides visibility and detections on the sensor without requiring sensor code changes. This capability is used by threat detection engineers to gather telemetry, identify indicators of adversary behavior and perform detections and preventions. Rapid Response Content is behavioral heuristics, separate and distinct from CrowdStrike's on-sensor AI prevention and detection capabilities.
Rapid Response Content Testing and Deployment
Rapid Response Content is delivered as content configuration updates to the Falcon sensor. There are three primary systems: the Content Configuration System, the Content Interpreter and the Sensor Detection Engine.
The Content Configuration System is part of the Falcon platform in the cloud, while the Content Interpreter and Sensor Detection Engine are components of the Falcon sensor. The Content Configuration System is used to create Template Instances, which are validated and deployed to the sensor through a mechanism called Channel Files. The sensor stores and updates its content configuration data through Channel Files, which are written to disk on the host.
The Content Interpreter on the sensor reads the Channel File and interprets the Rapid Response Content, enabling the Sensor Detection Engine to observe, detect or prevent malicious activity, depending on the customer's policy configuration. The Content Interpreter is designed to gracefully handle exceptions from potentially problematic content.
Newly released Template Types are stress tested across many aspects, such as resource utilization, system performance impact and event volume. For each Template Type, a specific Template Instance is used to stress test the Template Type by matching against any possible value of the associated data fields to identify adverse system interactions.
Template Instances are created and configured through the use of the Content Configuration System, which includes the Content Validator that performs validation checks on the content before it is published.
Timeline of Events: Testing and Rollout of the InterProcessCommunication (IPC) Template Type
Sensor Content Release: On February 28, 2024, sensor 7.11 was made generally available to customers, introducing a new IPC Template Type to detect novel attack techniques that abuse Named Pipes. This release followed all Sensor Content testing procedures outlined above in the Sensor Content section.
Template Type Stress Testing: On March 05, 2024, a stress test of the IPC Template Type was executed in our staging environment, which consists of a variety of operating systems and workloads. The IPC Template Type passed the stress test and was validated for use.
Template Instance Release via Channel File 291: On March 05, 2024, following the successful stress test, an IPC Template Instance was released to production as part of a content configuration update. Subsequently, three additional IPC Template Instances were deployed between April 8, 2024 and April 24, 2024. These Template Instances performed as expected in production.
What Happened on July 19, 2024?
On July 19, 2024, two additional IPC Template Instances were deployed. Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data.
Based on the testing performed before the initial deployment of the Template Type (on March 05, 2024), trust in the checks performed in the Content Validator, and previous successful IPC Template Instance deployments, these instances were deployed into production.
When received by the sensor and loaded into the Content Interpreter, problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD).
How Do We Prevent This From Happening Again?
Software Resiliency and Testing
- Improve Rapid Response Content testing by using testing types such as:
  - Local developer testing
  - Content update and rollback testing
  - Stress testing, fuzzing and fault injection
  - Stability testing
  - Content interface testing
- Add additional validation checks to the Content Validator for Rapid Response Content. A new check is in process to guard against this type of problematic content from being deployed in the future.
- Enhance existing error handling in the Content Interpreter.
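Two of the safeguards listed above, a validator check before publication and graceful error handling in the interpreter, can be illustrated with a small sketch. The memo does not describe the Channel File format or what made the content problematic, so everything here is an assumption for illustration: the `TemplateInstance` layout, the idea that an instance refers to input fields by index, and the function names.

```python
from dataclasses import dataclass


class ContentError(Exception):
    """Raised when content refers to input data the interpreter was not given."""


@dataclass(frozen=True)
class TemplateInstance:
    name: str
    field_indexes: tuple  # hypothetical: positions of the input fields this instance matches on


def validate(instance, input_count):
    """Validator-side check: every referenced index must exist among the inputs."""
    return all(0 <= i < input_count for i in instance.field_indexes)


def read_fields(instance, inputs):
    """Interpreter-side read with an explicit bounds check on every access."""
    values = []
    for i in instance.field_indexes:
        if not 0 <= i < len(inputs):
            raise ContentError(
                f"{instance.name}: field index {i} is outside {len(inputs)} inputs"
            )
        values.append(inputs[i])
    return values


def evaluate(instances, inputs):
    """Evaluate each instance, rejecting bad content instead of letting the error escape."""
    results, rejected = {}, []
    for instance in instances:
        try:
            results[instance.name] = read_fields(instance, inputs)
        except ContentError:
            rejected.append(instance.name)
    return results, rejected
```

The point of checking in both places is defense in depth: if the validator has a bug and lets bad content through, the interpreter still refuses the out-of-range read and drops that one instance rather than taking the host down.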
Rapid Response Content Deployment
- Implement a staggered deployment strategy for Rapid Response Content in which updates are gradually deployed to larger portions of the sensor base, starting with a canary deployment.
- Improve monitoring for both sensor and system performance, collecting feedback during Rapid Response Content deployment to guide a phased rollout.
- Provide customers with greater control over the delivery of Rapid Response Content updates by allowing granular selection of when and where these updates are deployed.
- Provide content update details via release notes, which customers can subscribe to.
In addition to this preliminary Post Incident Review, CrowdStrike is committed to publicly releasing the full Root Cause Analysis once the investigation is complete.
The TOI Tech Desk is a dedicated team of journalists committed to delivering the latest and most relevant news from the world of technology to readers of The Times of India. TOI Tech Desk's news coverage spans a wide spectrum across gadget launches, gadget reviews, trends, in-depth analysis, exclusive reports and breaking stories that impact technology and the digital universe. Be it how-tos or the latest happenings in AI, cybersecurity, personal gadgets, platforms like WhatsApp, Instagram, Facebook and more; TOI Tech Desk brings the news with accuracy and authenticity.
CrowdStrike, a major cybersecurity firm, inadvertently caused a widespread IT outage affecting Windows systems globally. The incident, which occurred on Friday, July 19, 2024, was due to an undetected error in a Rapid Response Content update for the company's Falcon sensor. It disrupted numerous organizations and prompted the company to publish a preliminary Post Incident Review.
On Friday, July 19, 2024, a significant IT outage swept across the globe, taking roughly 8.5 million Windows computers offline and disrupting air travel, 911 services, broadcasters, banks and hospitals. The root cause was traced back to an update released by CrowdStrike, a prominent cybersecurity company [1].
CrowdStrike confirmed that an undetected error in a Rapid Response Content configuration update was responsible for the widespread outage. In a statement, the company said that a bug in its Content Validator allowed one of two new Template Instances to pass validation despite containing problematic content data, which crashed Windows hosts running the Falcon sensor [2].
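The problematic content was delivered in Channel File 291. When the Falcon sensor's Content Interpreter loaded it, the content triggered an out-of-bounds memory read and an exception that could not be handled gracefully, resulting in a Windows operating system crash (BSOD). Only Windows hosts running sensor version 7.11 and above that were online and received the update were affected; macOS and Linux systems were not [3].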
CrowdStrike released the update at 04:09 UTC and reverted it at 05:27 UTC, 78 minutes later. Systems that came online after the revert, or that did not connect during that window, were not affected. Full recovery nonetheless took several hours, because some crashed systems required manual intervention [4].
The outage affected a wide range of organizations. Air travel came to a standstill, emergency 911 services were disrupted in several U.S. states, some TV networks were unable to broadcast, and banks, hospitals and stock exchanges were also impacted [1].
In response to the incident, CrowdStrike has pledged to strengthen its testing and release processes. The planned measures include local developer testing, additional validation checks to guard against this type of problematic content, enhanced error handling, and a staggered deployment strategy that rolls out Rapid Response Content updates gradually [2].
This incident highlights the critical role that cybersecurity firms play in maintaining global IT infrastructure. It also underscores the potentially far-reaching consequences of a faulty update, especially for widely used security platforms whose components can crash the operating system when they fail. The event has prompted discussions within the tech industry about the need for more robust testing procedures and failsafe mechanisms in critical software deployments [3].
© 2025 TheOutpost.AI All rights reserved