this post was submitted on 20 Jul 2024
183 points (98.4% liked)

Fault in CrowdStrike caused airports, businesses and healthcare services to languish in ‘largest outage in history’

Services began to come back online on Friday evening after an IT failure that wreaked havoc worldwide. But full recovery could take weeks, experts have said, after airports, healthcare services and businesses were hit by the “largest outage in history”.

Flights and hospital appointments were cancelled, payroll systems seized up and TV channels went off air after a botched software upgrade hit Microsoft’s Windows operating system.

It came from the US cybersecurity company CrowdStrike, and left workers facing a “blue screen of death” as their computers failed to start. Experts said every affected PC may have to be fixed manually, but as of Friday night some services started to recover.

As recovery continues, experts say the outage underscored concerns that many organizations are not well prepared to implement contingency plans when a single point of failure such as an IT system, or a piece of software within it, goes down. But these outages will happen again, experts say, until more contingencies are built into networks and organizations introduce better back-ups.

[–] TheDemonBuer@lemmy.world 46 points 4 months ago (3 children)

Here's an idea: don't give one company kernel level access to the OS of millions of PCs that are necessary to keep whole industries functioning.

[–] ansiz@lemmy.world 28 points 4 months ago (4 children)

I mean, Microsoft themselves regularly shits the bed with updates, even with Defender updates. It's the nature of security: they have to have that kind of access to stop legit malware. That's why these kinds of outages happen every few years. This one just got too much coverage from the banking and airline issues. And I'm sure future outages will continue to get similar coverage.

But the Crowdstrike CEO was also at McAfee in 2010 when they shit the bed and shut down millions of XP machines so it seems like he needs a different career...

[–] SkyNTP@lemmy.ml 10 points 4 months ago (2 children)

The problem is the monoculture. We are fucking addicted to convenience and efficiency at all costs.

A diverse ecosystem, if a bit more work to manage, is much more resilient, and wouldn't have been this catastrophe.

Our technology is great, but our processes suck. Standardization. Just in time. These ideas create incredibly fragile organizations. Humanity is so short sighted. We are screwed.

[–] krashmo@lemmy.world 5 points 4 months ago

That seems like a pretty hardcore doomer view for an event that didn't really do much in the grand scheme of things. I wouldn't have even known it happened if it wasn't all over the internet, and I work in tech to boot.

Time is money. Training all of the staff needed to manage not just one system in multiple areas, but multiple systems in multiple areas, is a horrible idea. Sure, for a one-off issue like this it would save your bacon. But how often does this really happen?

[–] JackbyDev@programming.dev 1 points 3 months ago* (last edited 3 months ago)

This happened to me in December 2022/January 2023. Pretty similar problem. Just a regular Windows update caused it. Weirdly it didn't affect everyone (and I'm not on any sort of beta channels). Installing KB5021233 keeps causing BSOD 0xc000021a.

After installing KB5021233, there might be a mismatch between the file versions of hidparse.sys in c:/windows/system32 and c:/windows/system32/drivers (assuming Windows is installed to your C: drive), which might cause signature validation to fail when cleanup occurs.
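If anyone wants to check for that kind of mismatch, here is a rough sketch in Python. It is an assumption on my part that comparing file contents by SHA-256 is a good enough stand-in for comparing file versions; it does not read the version resource that Windows itself uses, and the helper names (`sha256_of`, `copies_match`) are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(first: Path, second: Path) -> bool:
    """True only when both files exist and have identical contents."""
    if not (first.is_file() and second.is_file()):
        return False
    return sha256_of(first) == sha256_of(second)


if __name__ == "__main__":
    # Assumes Windows is installed to C:, as in the note above.
    system32 = Path("C:/Windows/System32")
    in_system32 = system32 / "hidparse.sys"
    in_drivers = system32 / "drivers" / "hidparse.sys"
    if copies_match(in_system32, in_drivers):
        print("match")
    else:
        print("mismatch or missing file")
```

A mismatch here only tells you the two copies differ; it says nothing about which one is the correct version.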

[–] billwashere@lemmy.world 1 points 3 months ago

I’m not sure you can blame the CEO. As much as I despise C-level execs this seems like a failure at a much lower level. Now the question of whether this is a culture failure is a different story because to me that DOES come from the CEO or at least that level.

[–] emax_gomax@lemmy.world 0 points 3 months ago

How difficult would it be for companies to have staged releases or oversee upgrades themselves? I mostly just use Linux, but upgrading itself is a relatively painless process, and logging into remote machines to trigger an update is no harder. Why is this something an independent party should be able to do without end user discretion?
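To make "staged releases" concrete, here is a minimal sketch in Python of one common way to do it: hash each hostname into a stable bucket, roll the update out to a growing percentage of buckets, and halt if too many updated hosts fail. Everything here is hypothetical (the `STAGES` schedule, the function names, the 1% failure threshold); it is not how CrowdStrike or any particular vendor ships updates.

```python
import hashlib
from typing import List, Optional

# Hypothetical schedule: cumulative percent of the fleet at each stage.
STAGES = [1, 10, 50, 100]


def bucket_for(host: str) -> int:
    """Map a hostname to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def hosts_in_stage(hosts: List[str], stage: int) -> List[str]:
    """Hosts that should have the update once the given stage is reached."""
    percent = STAGES[stage]
    return [host for host in hosts if bucket_for(host) < percent]


def next_stage(stage: int, failed: int, updated: int,
               max_failure_rate: float = 0.01) -> Optional[int]:
    """Return the next stage index, or None to halt the rollout."""
    if updated and failed / updated > max_failure_rate:
        return None
    return min(stage + 1, len(STAGES) - 1)
```

Because the bucket depends only on the hostname, every later stage is a superset of the earlier ones, so a bad update caught at the first stage only ever touches that first small slice of machines.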

[–] NuXCOM_90Percent@lemmy.zip 14 points 4 months ago* (last edited 4 months ago) (3 children)

So we should have five different cyber security solutions at any given site? That wheezing is the sound of every IT person on the planet queuing to swing a sock full of nickels at you.

Crowdstrike was near ubiquitous because it was the best tool out there. And plenty of threats were prevented because of it.

The answer isn't to force every single site to manage everything themselves. It is to increase oversight on CI/CD models.

[–] HubertManne@moist.catsweat.com 12 points 4 months ago (1 children)

I read his comment as being more about the kernel-level access than the one company.

[–] NuXCOM_90Percent@lemmy.zip 3 points 4 months ago (2 children)

Like it or not, that is the most effective way to collect the data these solutions need.

This isn't Riot anti-cheat, where the effectiveness is questionable. Crowdstrike was demonstrably amazing at its job.

[–] riskable@programming.dev 8 points 4 months ago* (last edited 4 months ago) (1 children)

Crowdstrike has clients that run on macOS and Linux. Only the Windows version requires kernel-level access. I believe it has something to do with the absolute shitshow that is Windows' security model, but it might also be because it runs a 31-year-old filesystem that still doesn't allow one process to read another process's files while they're open.

[–] NuXCOM_90Percent@lemmy.zip 2 points 4 months ago (1 children)

There have been issues with Linux and Mac clients in the past. Not to this scale but market share is very much a factor.

Kernel access is a mess, but it is also important to understand that even the less privileged software can cause problems.

I do firmly believe more hardware should run Linux but it is also important to understand the support burden. But, regardless, that is a different conversation.

[–] bamboo@lemm.ee 1 points 4 months ago

Less privileged software can also cause problems, but you can limit the scope in which those problems can occur.

[–] Chakravanti@lemmy.ml 1 points 4 months ago

I'd rather say they still are.

[–] TheDemonBuer@lemmy.world 3 points 4 months ago (1 children)

"Crowdstrike was near ubiquitous because it was the best tool out there."

I understand the reason for it, but that ubiquity comes with potential dangers, as we saw on Friday. But, no, I don't think the solution is "five different cyber security solutions" at every site. However, different cyber security solutions for different industries might not be such a bad idea. Or, I suppose the root of the problem might be the ubiquity of the OS. Should every PC be running the same jack of all trades but master of none OS?

[–] NuXCOM_90Percent@lemmy.zip 3 points 4 months ago (1 children)

Again, all you are doing is increasing complexity and punting it to a support staff who are likely unqualified to even know what Crowdstrike did.

This was one of those rare cases of capitalism working. There are many options. There was one that was miles ahead of all the others and that dominated.

[–] HK65@sopuli.xyz 1 points 4 months ago (2 children)

How did capitalism as in private ownership structures help here?

[–] sandalbucket@lemmy.world 1 points 4 months ago

Private ownership and investment of capital created Crowdstrike as a profit-seeking venture. It also created MS Defender, SentinelOne, Trellix, Carbon Black, etc. Competition in the marketplace (and there was/is lots of competition) forced these products to be as good as they could and/or self-stratify into pricing tiers. Crowdstrike, being the best (and most expensive), is the most widely used. Note that not every enterprise requires that level of security, and so while CS is widely used, it is not ubiquitous. This outage could have been significantly worse.

[–] sandalbucket@lemmy.world 1 points 4 months ago

I want to spin up a separate thread here if that’s okay.

Please give me an example of any EDR solution produced through “public ownership structures”. I don’t think such a thing exists, but I welcome being proven wrong.

[–] AtHeartEngineer@lemmy.world 2 points 4 months ago

Also the obligatory: "don't run infrastructure on Microsoft products, run Linux"