I think some CrowdStrike customers recovered rather quickly. As the news tends to focus on disaster rather than a job well done, we mostly hear about Delta Air Lines. In some cases the backup server and backups were carefully protected by the same software that damaged production. It seems that the industry in general never learns the lessons about backup -- backup -- backup. There is perhaps a lesson here about backup regimes and the wisdom of keeping data on PCs rather than using them more as thin clients, with local apps that access remote data.
It should be possible to reinstate a past snapshot image of a disk in mere seconds, plus whatever time it takes to stop CrowdStrike from running and invoking the 'Brick My PC' update.
It appears the cost of this debacle is estimated at over $5 billion, and I'm not sure that includes incidental costs to the public affected by it.
That aside, there are two other lessons for large-scale, mission-critical systems.
- Never apply system updates to live systems without testing them offline. If this had been done, the CrowdStrike damage would not have struck the crowd.
- Another single point of failure has been identified. Much as it is nice to restrict systems to one architecture, at this level the software system should always be implemented on two different hardware platforms, with two different system software regimes. That way, at the cost of extra complexity and higher support costs, the overall system can be built so that it does not rely on any one software component. If the Windows servers go down, the Linux ones may not be affected. The overall system response and throughput will be degraded, but it will not be struck out for the crowd's count.
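To make the first lesson concrete, here is a minimal Python sketch of an update gate that only touches live machines after every offline test machine has passed a health check. Everything in it is hypothetical and for illustration only: the `Host` class, the `staged_rollout` function and the `health_check` callback are made-up names, not part of CrowdStrike or of any real deployment tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Host:
    """Hypothetical record for one machine in the fleet."""
    name: str
    version: str
    live: bool  # False means an offline test machine


def staged_rollout(
    hosts: Iterable[Host],
    new_version: str,
    health_check: Callable[[Host], bool],
) -> bool:
    """Update offline test hosts first; update live hosts only if all of them pass."""
    hosts = list(hosts)
    test_hosts = [h for h in hosts if not h.live]
    if not test_hosts:
        # With nothing to test on, refuse rather than gamble on production.
        raise ValueError("no offline test hosts; refusing to update live systems")

    # Stage 1: apply the update to the offline test machines only.
    for h in test_hosts:
        h.version = new_version

    # Stage 2: if any test machine is unhealthy, stop here.
    # Live hosts have not been touched at this point.
    if not all(health_check(h) for h in test_hosts):
        return False

    # Stage 3: every test machine passed, so promote to the live machines.
    for h in hosts:
        if h.live:
            h.version = new_version
    return True
```

With a health check that rejects a bad update, the call returns `False` and the live machines keep their old version; only the test machines are left holding the broken one, where it can be examined at leisure.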
Fido recommends expanding the product lineup to include CrowdStrike Falcon for IBM z/OS to help regain market share recently lost. After objecting I pointed out that financial transactions are done by mainframes because it would be even more costly if the banking system went down. Fido growled something about blockchain and then suggested CrowdStrike for Raspberry Pi OS.
What media does one use for offline backups?
Statistics: Posted by ejolson — Sat Jul 27, 2024 2:54 pm