On July 19, unprecedented computer chaos broke out when a Windows boot error hit a significant number of computers. At first glance the figures might not seem so alarming: an estimated 1% of the computers running this operating system worldwide were affected.
But the blue screen paralyzed critical public and private systems across an interconnected planet, and within minutes the problem had become systemic.
As soon became known, a point Microsoft must have been keen to make clear, the error originated in an update to the security service of CrowdStrike, a cybersecurity provider whose customers include the technology giant, among other companies. The purpose of this tool is to block malware from reaching systems and computers and, therefore, from infecting them.
The company itself, through its co-founder and CEO George Kurtz, quickly ruled out a cyberattack or security breach as the cause. As became known over the course of the day, the problem lay in the service's Falcon sensor, specifically in an error in a new version deployed early that morning.
Concurrence of causes
At this point we can speak of a concurrence of causes.
The direct cause was a problem in the sensor code, i.e. one with a significant component of human error. This was compounded by the inability of the test software to detect the issue before validation and release to production and, indirectly, by the ineffectiveness of the quality and testing controls and procedures that form part of the preventive strategy.
The result was an unstable sensor in a security solution from one of the dominant companies in the field, which affected computers connected to Microsoft cloud services such as Microsoft 365, Microsoft Azure and Microsoft Teams, among others.
The nature of the error, a blue screen followed by a continuous reboot loop, also made the incident harder to fix, because technicians could not access the affected machines remotely.
Companies from all sectors affected
The consequences hit virtually every industry worldwide, to the point of causing what Troy Hunt, head of the cybersecurity portal HaveIBeenPwned, called “the biggest computer outage in history.”
Air traffic had a very difficult day all over the world, with problems in the systems of airport operators and airlines.
In Spain, AENA confirmed that on the day of the incident more than 400 flights were cancelled and thousands were delayed, all during one of the busiest periods for air travel.
Also in Spain, the healthcare systems of several autonomous communities and private hospital networks suffered significant incidents, from the “blackout” of patient records to the paralysis of tools that manage ICU indicators in several centers. This forced the adoption of alternative solutions and, in many cases, a return to pen and paper.
Throughout the world, hundreds of companies reported that their activity had been affected.
In Spain alone, names such as VISA, Unicaja, Movistar, Google, Santander España, Correos, Vueling and Repsol are just a few. They give an approximate idea of the magnitude of the crash, the difficulty of preventing it and restoring normality, the deep interconnection in which these companies operate day to day, and the economic losses that a systemic error can cause in any sector of activity.