Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library

(semgrep.dev)

173 points | by j12y 2 hours ago

18 comments

wlkr 38 minutes ago
This might just be the frequency illusion at play, but there seem to have been a number of high-profile supply chain attacks of late in major packages. There are several articles on the first few pages of HN right now with different cases.
Looking back ten years to `left-pad`, are there more successful attacks now than ever? I would suspect so, and surely the value of a successful attack has also increased, so are we actually getting better as a broad community at detecting them before package release? It's a complex space, and commercial software houses should do better, but it seems that whilst there are some excellent commercial products (e.g. CI scan tools), generally accessible, idiot friendly tooling is somewhat lacking for projects which start as hobby/amateur code but end up being a dependency in many other projects.
I've cross-posted my comment from the current SAP supply chain attack thread [0].
[0]: https://news.ycombinator.com/item?id=47964003
- JohnMakin 34 minutes ago
  People are ramming tons of code into places without ever looking at it, it would follow that supply chain attacks would increase thusly.
  - eddythompson80 24 minutes ago
    Yeah, and ultimately nobody cares. Everyone assumes it’s just some process miss, and we need to add another step to the process and move on. Fuck-ups that would have killed the credibility of projects 10 years ago are now treated as “eeh, what are you gonna do. Sometimes you ship malware. Will look into it.”
jackdoe 59 minutes ago
I can't wait to have no dependencies.
An extreme example is now when I make interactive educational apps for my daughter, I just make Opus use plain js and html; from double pendulums to fluid simulations, works one shot. Before I had hundreds of dependencies.
Luckily with MIT licensed code I can just tell Opus to extract exactly the pieces I need, embed them, and tweak them for my use case. So far it works great for hobby projects, but hopefully in the future production software will have no dependencies.
- mandevil 27 minutes ago
  The problem with this is now you are solely responsible for managing all of the changes, all of the variation of life. Chrome changed the shape of this API, you are responsible for finding it and updating it. Morocco changed when their daylight savings took effect, now you need to update your date/time handling code. There are a lot of these things that we take for granted because our libraries handle it for us, and with no dependencies you have to do all the work. Not a big deal for making a double-pendulum simulator for your daughter to play with that will stop mattering next week, but is a concern for a company which is trying to build something that can run indefinitely into the future.
- v4nderstruck 17 minutes ago
  well surely Opus would never introduce vulnerabilities into the code so that sounds like the solution.
- Aperocky 52 minutes ago
  I am torn because I like rust over go, and rust is better from an LLM perspective. But the dependency philosophy on rust is basically a security blackhole whereas go is much better.
  - kblissett 50 minutes ago
    I have found Go is an amazing language for LLMs. What do you prefer about Rust?
    - Aperocky 21 minutes ago
      A portion of the context that is required is exported to the compiler. In addition, Rust binaries are generally smaller, both in terms of size and footprint.
  - mamcx 23 minutes ago
    Doesn't vendoring basically copy what Go does?
- gib444 23 minutes ago
  Your LLM isn't a dependency?
mkeeter 1 hour ago
A repository search shows 2.2K repos with the text "A Mini Shai-Hulud has Appeared", all created within the past day:
https://github.com/search?q=A%20Mini%20Shai-Hulud%20has%20Ap...
- rhdunn 1 hour ago
  The repository names all look like two terms/words from Dune (harkonnen, mentat, ornithopter, etc.) followed by a number. This would indicate that the account (possibly a GitHub auth/actions token) has been compromised and then used to create the repository.
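
A rough way to flag names of the shape described above, as a sketch only. The word list and the hyphen separator are assumptions made for illustration; the actual vocabulary and naming scheme used by the attacker are not confirmed in this thread.

```python
import re

# Hypothetical word list; the real generator's vocabulary is not known here.
DUNE_TERMS = ("harkonnen", "mentat", "ornithopter", "fremen", "sandworm")

# Assumption: two terms joined by hyphens, followed by a number.
PATTERN = re.compile(
    r"^(?:{0})-(?:{0})-\d+$".format("|".join(DUNE_TERMS)),
    re.IGNORECASE,
)


def looks_generated(repo_name: str) -> bool:
    """Return True if the name matches the assumed term-term-number shape."""
    return PATTERN.match(repo_name) is not None


if __name__ == "__main__":
    for name in ("mentat-fremen-42", "pytorch-lightning", "mentat-42"):
        print(name, looks_generated(name))
```

A match is only a hint that a repository deserves a closer look, not proof of compromise.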
- spate141 1 hour ago
  what's this all about?
  - foo12bar 1 hour ago
    FTFA
    > The attack steals credentials, authentication tokens, environment variables, and cloud secrets, while also attempting to poison GitHub repositories.
    - CodeAndCuffs 1 hour ago
      That doesn't really explain why there is a bunch of GitHub repos created as well.
      If I remember correctly from Shai-Hulud 2, the attacker exfiltrated creds by posting them in public GitHub repos with minor, easily reversible obfuscation. I believe it was double b64 last time.
      I'm assuming the logic there is that every security researcher and company is going to pull and scan those creds for their stuff and their clients' stuff. So the attacker is just 1 of N people downloading it. As opposed to trying to send it to their own machine directly.
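
To show why double base64 counts as easily reversible, here is a minimal sketch. The function names and the sample string are made up for illustration; this is not the malware's code or its actual file format.

```python
import base64


def double_b64_encode(plaintext: str) -> str:
    """Apply base64 twice (illustration of the obfuscation, not real malware code)."""
    once = base64.b64encode(plaintext.encode("utf-8"))
    return base64.b64encode(once).decode("ascii")


def double_b64_decode(blob: str) -> str:
    """Undo both layers; no key or secret is needed, which is the point."""
    once = base64.b64decode(blob)
    return base64.b64decode(once).decode("utf-8")


if __name__ == "__main__":
    sample = "EXAMPLE_TOKEN=not-a-real-secret"  # made-up placeholder value
    blob = double_b64_encode(sample)
    print(blob)
    print(double_b64_decode(blob))
```

Anyone who finds such a blob can recover the contents with two standard-library calls, so the encoding hides nothing from other readers of the repository.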
      - arsome 59 minutes ago
        I think it's more about convenience and bypassing filters - developers are already logged in to github, already have access to create repos and publish code, firewalls will allow it. Even fancy HIDS systems will think the git push is rather normal.
        If they have a clue, the attacker still will not download that without using a botnet tunnel or Tor at a minimum.
        Note though that these credentials aren't even encrypted using some lightweight ECC to prevent others from capturing them, they're posted in cleartext. Embarrassment might be part of the point.
  - progbits 1 hour ago
    Malware uploading the credentials it managed to steal
brahman81 1 hour ago
Thanks to the community for reporting the security issues with PyTorch Lightning 2.6.2 and 2.6.3 - we're actively looking into it.
In the meantime, please use 2.6.1 until we publish 2.6.4.
For more details: https://github.com/Lightning-AI/pytorch-lightning/security/a...
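
A quick local check against the version numbers mentioned above could look like the sketch below. It assumes the relevant distribution names are `lightning` and `pytorch-lightning`, and that the affected set is exactly 2.6.2 and 2.6.3 as stated in this thread; the linked advisory is the authority on both. It reads installed package metadata rather than importing the package.

```python
from importlib import metadata

# Versions named as compromised in this thread (PyPI releases only).
AFFECTED_VERSIONS = {"2.6.2", "2.6.3"}
# Assumption: these are the distribution names worth checking.
DIST_NAMES = ("lightning", "pytorch-lightning")


def is_affected(version: str) -> bool:
    """Return True if the version string is in the affected set."""
    return version.strip() in AFFECTED_VERSIONS


def installed_versions(names=DIST_NAMES) -> dict:
    """Map each installed distribution name to its version, skipping missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        status = "AFFECTED" if is_affected(version) else "not in the affected list"
        print(f"{name} {version}: {status}")
```

This only inspects the environment it runs in; other virtualenvs, containers, and CI caches need their own check.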
gcapu 28 minutes ago
I just saw on GitHub this message from April 20 and I'm a bit confused.
"deependujha hi @thebaptiste, thanks for inquiring. Release of 2.6.2 is blocked due to some internal reasons. Will notify once release is made. "
I'd hate it if they knew of the problem that long ago and didn't warn until now. If someone has more info and can clarify I'd be thankful.
https://github.com/Lightning-AI/pytorch-lightning/issues/216...
- mil22 16 minutes ago
  For those using uv: https://docs.astral.sh/uv/reference/settings/#exclude-newer

> Running pip install lightning is all that is needed to activate

FYI, pip added cooldowns in 26.1:

  * https://discuss.python.org/t/announcement-pip-26-1-release/107108
  * https://ichard26.github.io/blog/2026/04/whats-new-in-pip-26.1/

To use:

  * CLI: pip install --uploaded-prior-to=P1D ...
  * Env Var: PIP_UPLOADED_PRIOR_TO=P1D pip install ...
  * Config: pip config set global.uploaded-prior-to P1D
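
The idea behind these cooldown settings is to ignore any release uploaded more recently than some cutoff. The sketch below illustrates that logic only; it is not how pip or uv implement it, and the version numbers and upload times are made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def cooldown_cutoff(days: int, now: Optional[datetime] = None) -> datetime:
    """Return the newest upload time that is still allowed."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


def eligible_versions(uploads: Dict[str, datetime], cutoff: datetime) -> List[str]:
    """Keep only versions uploaded at or before the cutoff."""
    return [version for version, ts in uploads.items() if ts <= cutoff]


if __name__ == "__main__":
    # Made-up release history for a hypothetical package.
    now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
    uploads = {
        "1.0.0": datetime(2026, 1, 10, tzinfo=timezone.utc),
        "1.0.1": datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc),
    }
    limit = cooldown_cutoff(1, now)
    print(limit.isoformat())
    print(eligible_versions(uploads, limit))
```

With a one-day cooldown, a release published three hours ago is skipped and the resolver falls back to the older version, which gives the community time to notice a bad upload.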

achandra03 1 hour ago
Bless the Maker and His water.
ks2048 53 minutes ago
I'm curious what they do with various kinds of credentials if they get access.
I can see trying to steal crypto, but what do they do if they get some AWS credentials? Try to run some crypto mining instances? Try to use your account for other types of crimes? Or is it mainly trying to steal data and then ask for ransoms?
- bigfluffydonkey 39 minutes ago
  It's always crypto. A client got some AWS credentials stolen and, without anyone checking the account, the hacker managed to spin up big EC2 instances across many regions. The bill after a month, as I recall, was around 100K. Since the activity was clearly fraudulent, the bill was forgiven eventually. So remember to lock down your AWS key permissions...
0fflineuser 1 hour ago
The nixpkg from unstable seems to be infected as it's 2.6.2 https://search.nixos.org/packages?channel=unstable&include_h...
- minkowski 1 hour ago
  Nixpkgs uses the GitHub source, not the PyPI dist, for lightning; unclear to me from the advisory whether this should also be considered compromised.
  - andymcsherry 40 minutes ago
    Andy from Lightning here. Thanks for pointing that out, we are updating the CVE. Only the versions from PyPI were affected. The malicious code was not checked into the GitHub repository.
  - deforciant 40 minutes ago
    github is fine, the package was only pushed into pypi directly
caycep 1 hour ago
just to clarify it's not PyTorch, it's the library for this Lightning AI company?
- mort96 26 minutes ago
  Oh shit I had assumed PyTorch Lightning was affiliated with PyTorch. Not a great name for an unaffiliated third party thing.
- lostmsu 1 hour ago
  Yes
upupupandaway 1 hour ago
Not a security guy here. How did the dependency get compromised, exactly? Did they submit a PR into the main repo at github and it was approved by the maintainers? Or just host compromised versions in other mirrors?
- andymcsherry 33 minutes ago
  Andy from Lightning here. The malicious code was not submitted to the main repo at GitHub. It appears our PyPI credentials were leaked and compromised packages were published directly there for versions 2.6.2 and 2.6.3.
csvance 1 hour ago
The decision to run all of my experiments in a monorepo with a single uv.lock continues to be validated. I usually only update it a few times a year. It was pinned at 2.6.1 for lightning \o/
throwa356262 1 hour ago
Advisory, fresh from the oven
https://github.com/Lightning-AI/pytorch-lightning/security/a...
sieve 9 minutes ago
I find this constant churn in the software world to be tiresome. I get it if there is a security update. Or you are building something new; it takes time and a series of updates to reach feature parity on 1.0. But most software is not like that. All these online registries make the problem worse. Any random tool installation pulls in 300 different dependencies.
This is why I have been building, for my own usecases, a new language + compiler + vm that is completely source based. The compiler does not understand linking. You must vendor every single dependency you use, including the standard library, so that it makes its way into the bytecode. The register VM itself is a few thousand lines of freestanding C. Any competent programmer can audit it over a weekend.
v1 deliberately keeps FFI (outside of a bounded set of linux syscalls) outside the current spec as libc has the habit of infecting everything it touches and I want to keep Vm0 freestanding. The last time I compiled the VM, it produced a 70KB binary and supported a loader with structural verification, the entire instruction set using a threaded interpreter, a simple Cheney+MS GC, concurrency via an Erlang-style M:N scheduler working on a single thread, and 20-odd marshaled functions.
Most software in the world does not need anything more than this. Everyone acts as if they are building the next Google.
rvz 1 hour ago
Shai-Hulud strikes again and continues to turn innocent packages into zombies.
Think twice before looking at a package and most importantly, always pin your dependencies.
- pixel_popping 59 minutes ago
  Yeah, pin the malware :p
0xbadcafebee 1 hour ago
something something Safety Requires A Building Code something thing
- csvance 54 minutes ago
  Shai-Hulud dug my 100 ft trench. Should be OSHA compliant right?
spate141 1 hour ago
ah shit, here we go again
- 12_throw_away 1 hour ago
  this is fine, we are definitely a perfectly normal industry that knows what it is doing