Why the CrowdStrike Bug Hit Banks Hard | PolyMarkets Investment Strategies

The July 2024 Outage

On July 19th, 2024, a firm most people have sensibly never heard of knocked out a large portion of routine operations at many institutions worldwide. This hit the banking sector particularly hard. Several of the largest U.S. banks were publicly reported to have been affected by the outage. One major institution reportedly idled tellers and bankers nationwide for the duration. The issue affected institutions across the size spectrum, including large regionals and community banks.

Understanding why this happened and how it was even possible requires examining the intersection of banking regulation, endpoint security, and software architecture decisions.

Additionally, observing how financial systems can be reconstituted from less formal sources of credit when primary infrastructure fails provides valuable lessons about systemic resilience.

Technical Background: Kernelspace vs. Userspace

Many operating systems distinguish between the "kernel" supplied by the operating system manufacturer and all other software running on the computer system. The area where almost everything executes is called "userspace."

In modern software design, programs running in userspace (almost all programs) are relatively limited in what they can do. Programs running in kernelspace, conversely, get direct access to the hardware beneath the operating system. As a consequence, certain bugs in kernel code can bring down everything running on the computer, not just the program that contains them.
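The userspace half of that distinction can be seen from any ordinary program. The following is a minimal sketch, not anything specific to Windows or Falcon: it starts a second userspace process that deliberately aborts, and the parent simply observes the failure and carries on. The function name run_crashing_child is invented for this illustration. A comparable fault inside the kernel has no surviving parent to report to, which is why it halts the whole machine.

```python
import subprocess
import sys


def run_crashing_child() -> int:
    """Start a separate userspace process that aborts; return its exit status."""
    # os.abort() terminates the child abruptly, standing in for a fatal bug.
    result = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    # The operating system contained the crash to the child process.
    print(f"child died with status {status}; parent is still running")
```

The exact status value differs between operating systems, but it is non-zero in every case, and the parent keeps running.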

What is Endpoint Monitoring?

CrowdStrike Falcon is endpoint monitoring software. Endpoint monitoring is a service sold to enterprises with tens or hundreds of thousands of devices ("endpoints"). Those devices are illegible to the organizations that own them due to sheer scale - no single person or group of people understands what is happening on them. As a result, how compromised any given device might be at exactly this moment varies widely and is largely unknown.

The pitch for endpoint monitoring is that it gives teams the ability to make those systems legible again while benefitting from economies of scale, with continuously updated threat feeds from providers.

One way an endpoint might be compromised is physical theft. Another is joining a botnet orchestrated by a geopolitical adversary after an employee decides to install unauthorized software.

In theory, organizations perform ongoing monitoring of all computers. Security teams then respond to alerts generated by endpoint monitoring solutions. Some alerts merit further investigation and some call for immediate remedial work. The resulting conversations range from discussions about unauthorized software installation to serious incidents involving novel viruses compromising multiple computers, requiring subnet isolation and incident response to assess potential data exfiltration.

The Configuration Bug

Falcon shipped a configuration bug. Rather than writing new software (which, in modern development practice, hopefully goes through extensive testing and release procedures), CrowdStrike sent a bit of data to systems with Falcon installed. That data was intended to simply update the set of conditions that Falcon scanned for. However, due to an error at CrowdStrike, it actually caused existing already-reviewed Falcon software to fail catastrophically.
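To make the "data, not code" failure mode concrete, here is a hypothetical sketch. It does not reproduce Falcon's actual format or defect, which this article does not describe. Every name in it (EXPECTED_FIELDS, apply_rule_unchecked, split_valid_rules, scan, UPDATE) is invented for illustration. It shows reviewed code that trusts the shape of the data it is sent, a data update that violates that shape, and a validation step that rejects the bad entry before the trusting code ever sees it.

```python
from typing import List, Optional, Tuple

# Hypothetical rule layout: [name, pattern, action].
EXPECTED_FIELDS = 3


def apply_rule_unchecked(rule: List[str], event: str) -> Optional[str]:
    """Already-reviewed code that assumes every rule has all three fields."""
    pattern, action = rule[1], rule[2]  # fails if the rule is too short
    return action if pattern in event else None


def split_valid_rules(
    rules: List[List[str]],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Separate well-formed rules from malformed ones before use."""
    valid = [r for r in rules if len(r) == EXPECTED_FIELDS]
    rejected = [r for r in rules if len(r) != EXPECTED_FIELDS]
    return valid, rejected


def scan(rules: List[List[str]], event: str) -> List[str]:
    """Apply only validated rules, so bad data cannot reach the unchecked code."""
    valid, _rejected = split_valid_rules(rules)
    actions = (apply_rule_unchecked(rule, event) for rule in valid)
    return [action for action in actions if action is not None]


# A hypothetical data-only update containing one malformed entry.
UPDATE = [
    ["known-bad-binary", "evil.exe", "alert"],
    ["truncated-rule", "cmd.exe"],  # missing its action field
]

if __name__ == "__main__":
    print(scan(UPDATE, "process started: evil.exe"))
```

In Python the unchecked access merely raises an IndexError that a caller could catch. The rough kernelspace analogue is an invalid memory access, which the operating system cannot recover from.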

Since that failure happened in kernelspace at a particularly vulnerable time, this resulted in Windows systems experiencing total failure beginning at boot. The user-visible symptom is the Blue Screen of Death.

Configuration Bugs in Context

Configuration bugs represent a disturbingly large portion of engineering decisions causing outages. However, because this particular configuration bug hit very widely distributed software running in kernelspace almost universally across machines used by the workforce of lynchpin institutions throughout society (most relevantly banks, but also airlines and others), it had a blast radius much larger than typical configuration bugs.

"Blast radius" means "given a fault or failure in system X, how far afield from X will negative user impact be seen." The Falcon misconfiguration ranks among bugs with the broadest direct blast radius in recent history.
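One way to reason about that definition is as a graph walk: start at the failed component and follow "is depended on by" edges until nothing new turns up. The sketch below does this with a breadth-first search over an invented dependency map. The component names are illustrative assumptions loosely inspired by this article, not a description of any real bank's systems.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map: each key is depended on by the systems listed for it.
DEPENDENTS: Dict[str, List[str]] = {
    "endpoint-agent": ["teller-workstation", "fraud-alert-server"],
    "teller-workstation": ["cash-withdrawal"],
    "fraud-alert-server": ["atm-limit-override"],
    "atm-limit-override": ["cash-withdrawal"],
    "mobile-app": ["card-payments"],
}


def blast_radius(failed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return every system reachable downstream of the failed component."""
    seen: Set[str] = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    # Report only the collateral damage, not the failed component itself.
    return seen - {failed}


if __name__ == "__main__":
    print(sorted(blast_radius("endpoint-agent", DEPENDENTS)))
    print(sorted(blast_radius("mobile-app", DEPENDENTS)))
```

In this toy map, a fault in the widely depended-on agent reaches four downstream systems, while a fault in the mobile app reaches one. That asymmetry is what "broad blast radius" means here.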

Once the misconfiguration was rolled out, fixing it was complicated by the fact that many of the people needed to fix it couldn't access their work systems, because their own machines were stuck at the Blue Screen of Death.

Why Was Coverage Universal?

The vulnerable software was installed on essentially all machines in particular institutions. Organizations want to protect all devices - that's the point of endpoint monitoring. Someone's job is literally figuring out where devices that aren't endpoint monitored exist and bringing them into compliance.

Why optimize for endpoint monitoring coverage? Partly for genuinely good security reasons. But a major part is that small-c compliance is necessary for large-C Compliance. Regulators effectively demand it.

Why Kernelspace?

Falcon runs in kernelspace rather than userspace in part because the most straightforward way to monitor other programs' business is simply to ignore the security guarantees that operating systems give programs running in userspace. Monitoring another program's memory is generally considered somewhere between rude and forbidden-by-substantial-engineering-work. However, endpoint monitoring software assumes that other software running on a device may be there at the adversary's direction. It therefore treats that software's comfort level with intrusion as a distant secondary consideration.

The Microsoft-EU Situation

Another reason Falcon ran in kernelspace: Microsoft was forbidden by an understanding with the European Commission from firmly demoting other security software developers down to userspace. This was because Microsoft both a) wrote security software and b) necessarily always had the option of writing it in kernelspace, because Microsoft controls Windows. The European Commission has pushed back against this characterization, noting regulatory complexity around the issue.

Banking Regulation and Software Purchases

It would be an overstatement to say the United States federal government commanded U.S. financial institutions to install CrowdStrike Falcon and thereby embed a vulnerability into the kernels of all their employees' computers. That's not how banking regulation works.

Life is more subtle.

The Regulatory Framework

The United States has many different banking regulators. Those regulators have desires that rhyme heavily, so they've banded into clubs to share resources. This lets them spend limited resources on things banking regulators have more individualized opinions on than simple, common banking regulatory infrastructure.

One such club is the Federal Financial Institutions Examination Council (FFIEC). They wrote the FFIEC Information Technology Examination Handbook (ITEH), which includes the Information Security Booklet.

The modal consumer of this document is probably not a Linux kernel programmer with a highly developed mental model of kernelspace versus userspace. That would be an unreasonable expectation for a banking supervisor. They work for a banking regulator, not a software company, doing important supervisory work, not merely implementation.

The Risk Analysis Process

The ITEH isn't super prescriptive about exactly what controls financial institutions must have. This is common in many regulatory environments. To facilitate conversations with examiners, institutions conduct risk analyses. More likely, they pay consulting firms to conduct risk analyses. In the production function that is scaled consultancies, this means junior employees will open template documents and add important client-specific context like names and logos.

Those documents will heavily reference the ITEH, because they exist to guide conversations with examiners toward areas of maximum mutual interest.

Consultants, when conducting mandatory risk analyses, provide shopping lists. Endpoint monitoring is one item on those lists. Why? Ask consultants and they'll bill for the answer, but the likely driver is Section II.C.12 Malware Mitigation.

Not Hugely Prescriptive, But...

Does the FFIEC have a hugely prescriptive view of what should be done for malware monitoring? Well, no:

"Management should implement defense-in-depth to protect, detect, and respond to malware. The institution can use many tools to block malware before it enters the environment and to detect it and respond if it is not blocked. Methods or systems that management should consider include..." followed by 12 bullet points varying in specificity from whitelisting allowed programs to port monitoring to user education.

But consultants will advise that institutions want a very responsive answer to II.C.12 in their reports and that, since institutions probably don't have Google's ability to fill floors with people doing industry-leading security research, they should just buy something which says "Yeah We Do That."

CrowdStrike's Market Position

CrowdStrike's sales representatives will happily tell institutions they do that. Their web pages are the output of a deterministic process, co-owned by the Marketing and Sales departments at B2B software companies, for creating industry-specific "sales enablement" collateral. They will even send documents that align closely with risk assessment requirements, specifying which objectives and controls purchasing this product will solve for.

CrowdStrike isn't strictly speaking the only vendor that could have been installed on every computer to make regulators happy. But, due to vagaries of how enterprise software sales teams work, they secured significant market share in government-adjacent industries. This was in part because they aggressively pursued writing the sort of documents needed when people reading project plans have national security briefs.

Money as Critical Infrastructure

Money is core societal infrastructure, like the power grid and transportation systems are. It would be extremely damaging if hackers working for a foreign government could just turn off money. That would be more damaging than a conventional missile being fired at random into a major city, and responses might be more constrained.

And so, the situation arose where an advanced persistent threat was effectively invited into kernelspace.

Security Tools as Vulnerabilities

Security professionals understand security tools themselves introduce security vulnerabilities. Partly, the worry is that monocultures could have particular weaknesses exploitable in particular ways. Partly, security tools (and security personnel) frequently have more privileges than is typical, and therefore they can be directly compromised by adversaries. This observation is fractal in systems engineering: at every level of abstraction, if the control plane gets compromised, the battle is lost.

CrowdStrike maintains they do not understand it to be the case that a bad actor intentionally tried to bring down global financial infrastructure and airlines by using them as a weapon. No, CrowdStrike did that themselves, accidentally, of their own volition. But this demonstrates the problem clearly: if a junior employee tripping over a power cord at a company brings down computers worldwide, adversaries have a variety of options for achieving directionally similar aims by attacking directionally similar power cords.
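The monoculture worry raised in this section can be put in rough numbers. The sketch below is a back-of-the-envelope model with entirely made-up figures: it asks what fraction of institutions a fault at a single vendor could take down at once, given how many institutions each vendor serves. The function name and vendor labels are hypothetical, and the model ignores everything except market share.

```python
from typing import Dict


def worst_case_outage_share(institutions_by_vendor: Dict[str, int]) -> float:
    """Largest fraction of institutions that one vendor's fault can take down."""
    total = sum(institutions_by_vendor.values())
    return max(institutions_by_vendor.values()) / total


# Hypothetical sector of 100 institutions under two procurement patterns.
MONOCULTURE = {"vendor-a": 100}
DIVERSIFIED = {"vendor-a": 25, "vendor-b": 25, "vendor-c": 25, "vendor-d": 25}

if __name__ == "__main__":
    print(worst_case_outage_share(MONOCULTURE))
    print(worst_case_outage_share(DIVERSIFIED))
```

Under these assumed numbers, a single vendor fault reaches every institution in the monoculture and a quarter of them in the diversified sector. Each institution's individual risk is the same in both cases. Only the sector-wide exposure changes.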

When Money Stops Working

Reports of the CrowdStrike outage emerged through social media. Bank branches cited "the Microsoft systems issue" when customers attempted to withdraw cash from teller windows.

Cash-Dependent Populations

Certain economic activities still require cash payments. For complex social and economic reasons, engaging with various contractors and service providers often requires frequent, sizable cash payments.

This created emergencies for various individuals and businesses. Many contractors are small businesses. Many small businesses are thinly capitalized. Many employees of small businesses are extremely dependent on receiving compensation exactly on payday and not after it. While many people were basically unaffected because their money kept working (on mobile apps, via Venmo/Cash App, via credit cards), cash-dependent people got enormous wrenches thrown into their plans.

Infrastructure Failure Impacts

Reports indicated that attempts to withdraw cash at three financial institutions in different weight classes proved absolutely impossible at all of them, owing to the Falcon issue.

At one institution, tellers were unavailable but ATMs functioned. Unfortunately, many customers attempted to take out more cash from ATMs than they ever had before, which is the kind of unusual activity that gets flagged. Fortunately, systems that flag potentially fraudulent behavior can let customers unflag themselves by responding to instant communications from banks. Unfortunately, the subdomain those communications direct them to ran on servers apparently protected by CrowdStrike Falcon.

Withdrawing cash wasn't impossible at all financial institutions. Some of those still operating ran out of physical cash on hand at some branches, because they were servicing a Friday's entire demand for cash rather than sharing it with "all of the financial institutions."

Shadow Information Networks

As always happens during widespread infrastructure disturbances, shadow economies of information trading quickly arise, redirecting relatively sophisticated people to places capable of servicing them. This has happened through offline social networks since time immemorial and through online social networks since those were invented. The first is probably more impactful but the second is more legible, so banking regulators sometimes focus disproportionately on technology aspects of these phenomena.

Historical Precedent: Ireland 1970

There's historical precedent for comprehensive failures of financial infrastructure.

Back in 1970, there was a widespread and sustained (six months) strike in the Irish banking sector. Workers were unable to cash paychecks because tellers refused to work. As an accommodation for customers, pub operators would cash checks from the till, trusting that eventually checks drawn on accounts of local employers would be good funds again.

Some publicans even cashed personal checks, backed by the swift and terrible justice of the credit reporting bureau "We Control Whether You Can Ever Enjoy A Pint With Your Friends Again." This kept physical notes circulating in the economy.

Alternative Financial Networks

During the CrowdStrike outage, similar informal networks emerged. Churches, much like bars, have much of their weekly income come through electronic payments but still do substantial cash management through the workweek heading into the weekend. When attempting to work around financial infrastructure bugs to get workers their wages, organizations with established trust relationships and moral imperatives around fair payment proved valuable.

Financial infrastructure normally functions to abstract away personal ties and replace favor-swapping with legibly-priced broadly-offered services. However, when that infrastructure fails, informal networks based on reputation and mutual obligation can provide backup systems.

Thankfully, while this outage was surprisingly deep and broad, banks were mostly back to normal on the following Monday.

Conclusion

The CrowdStrike incident reveals troubling dynamics at the intersection of banking regulation, enterprise software sales, and critical infrastructure security. Well-intentioned regulatory frameworks designed to protect financial systems inadvertently created monocultures vulnerable to single points of failure.

The incident demonstrates how regulatory compliance processes can drive adoption of specific technology solutions across entire industries. When regulators effectively require endpoint monitoring and vendors provide turnkey solutions that check compliance boxes, rational actors adopt those solutions broadly. This creates systemic risk: what helps individual institutions satisfy regulators may simultaneously increase sector-wide vulnerability.

Security tools themselves introduce security vulnerabilities, particularly when they operate in privileged contexts like kernelspace and enjoy widespread adoption. The same characteristics that make endpoint monitoring effective - broad deployment, deep system access, continuous operation - make configuration errors catastrophic in scope.

The resilience of informal financial networks during the outage offers interesting contrast. While modern infrastructure normally abstracts away personal relationships, replacing them with scalable formal systems, those informal networks proved valuable when formal systems failed. The ability to reconstitute financial functions through reputation-based trust networks, whether at pubs in 1970s Ireland or through community institutions during the CrowdStrike outage, provides backup systems worth understanding.

Going forward, regulatory frameworks should consider not just individual institution security but also sector-wide resilience. Avoiding monocultures, maintaining heterogeneous security solutions, and preserving backup systems (even informal ones) all contribute to systemic robustness. The goal should be compliance requirements that improve security without inadvertently creating new single points of failure affecting entire industries simultaneously.

Key Takeaways

A single configuration bug in CrowdStrike Falcon, running in kernelspace, disrupted routine banking operations at many institutions worldwide
Banking regulations effectively drove widespread adoption of endpoint monitoring solutions like Falcon
Security tools operating in privileged contexts introduce their own security vulnerabilities
Monocultures in security solutions create systemic risk - what helps individual firms may harm the sector
Informal financial networks based on reputation and trust provided backup systems during the outage
Regulatory frameworks should consider sector-wide resilience, not just individual institution security