Interesting data leak that happened recently, and the ethical paradox of open-source AI

gooner23
Summary
Mercor, the Peter Thiel-funded data annotation company that directly provides RLHF data (effectively trade secrets) to OpenAI, Anthropic, and Google, got breached. 4TB might not seem like a lot, but the haul auctioned by the Lapsus$ extortion group reportedly included:

Evaluation Rubrics: The exact grading sheets and internal rulebooks OpenAI and Anthropic give to experts to teach the AI logic and safety.
Prompts & Answers: The flawless, expert-written source code and reasoning chains used to fine-tune the models.

Could potentially jailbreak recent models with this information

Meta has indefinitely paused all work with Mercor. OpenAI started its own review. Anthropic has not publicly commented on its exposure. Google is understood to be assessing the breach’s scope.

What more could they have gotten?


How it happened
  • The Initial Vector (March 19, 2026): TeamPCP compromised Trivy, an open-source vulnerability scanner maintained by Aqua Security that is used by thousands of development teams. The hackers poisoned Trivy's GitHub Actions, effectively turning a widely trusted security scanner into a credential-stealing malware tool.


  • The LiteLLM Compromise (March 24, 2026): LiteLLM, a massive open-source AI gateway with millions of downloads, used Trivy in its own CI/CD security pipeline. When LiteLLM ran a routine automated security scan, TeamPCP's malware executed and stole LiteLLM's PyPI (Python Package Index) publishing tokens.


  • The Payload: Armed with those tokens, TeamPCP published malicious updates of LiteLLM (versions 1.82.7 and 1.82.8). When developers or automated systems pulled the latest LiteLLM package, it installed deeply embedded malware that swept their host machines for cloud credentials, API keys, .env files, and Kubernetes secrets.
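If you pull LiteLLM anywhere in your stack, the first sanity check is whether one of the poisoned builds ever landed in your environment. A minimal sketch in Python, assuming the version numbers reported above (1.82.7 and 1.82.8) are accurate and that the package is installed under its usual litellm distribution name:

```python
# Check the installed LiteLLM version against the reportedly compromised
# releases. The bad version numbers come from the reporting quoted above
# and are not independently confirmed here.
from importlib.metadata import PackageNotFoundError, version

REPORTED_BAD_VERSIONS = {"1.82.7", "1.82.8"}

try:
    installed = version("litellm")
except PackageNotFoundError:
    print("litellm is not installed in this environment")
else:
    if installed in REPORTED_BAD_VERSIONS:
        print(f"WARNING: litellm {installed} matches a reportedly compromised release")
        print("Rotate cloud credentials, API keys, .env secrets, and kube configs")
        print("reachable from any machine that installed it.")
    else:
        print(f"litellm {installed} is not one of the reported bad versions")
```

Note this only checks the current environment; every CI runner and container that pulled latest during the window would need the same check, plus credential rotation regardless.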
We now have a full client-side Claude Code leak from Anthropic themselves, plus the training data pipeline, evaluations, and prompts. What more do we need, other than the cost of the infrastructure, to have our own AI companies :feelsgood:

I don't think the Claude Code leak was that significant, although it did let open source developers create interesting tools like claw code.

Should startups that handle this level of power even be using open source dependencies, and how do you even handle something like this? (One standard mitigation is sketched below.)





^ Now verified by audits
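On the "how do you even handle something like this" question: the standard mitigation for a hijacked release is hash pinning. You record the artifact digest when you review a dependency, and installs refuse anything that does not match, so a maintainer token stolen later cannot silently swap the bytes under the same version. pip supports this natively (requirements files installed with --require-hashes); the sketch below just shows the underlying idea in plain Python. The file name and digest are made-up placeholders, not real LiteLLM artifacts.

```python
# The idea behind hash-pinned installs: trust a digest recorded at review
# time, not whatever the package index serves today. EXPECTED holds
# placeholder values for illustration only.
import hashlib
import sys
from pathlib import Path

EXPECTED = {
    # filename -> sha256 recorded when the artifact was reviewed (placeholder)
    "example-1.0.0-py3-none-any.whl":
        "0000000000000000000000000000000000000000000000000000000000000000",
}

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path) -> bool:
    want = EXPECTED.get(path.name)
    if want is None:
        print(f"{path.name}: no recorded hash, refusing to trust it")
        return False
    got = sha256_of(path)
    if got != want:
        print(f"{path.name}: HASH MISMATCH (got {got})")
        return False
    print(f"{path.name}: ok")
    return True

if __name__ == "__main__":
    results = [verify(Path(p)) for p in sys.argv[1:]]
    sys.exit(0 if all(results) else 1)
```

The trade-off: pinned hashes only protect teams that pin and re-review on every upgrade, and they do nothing about the upstream CI compromise itself, which is why the "should we even use open source dependencies" question has no clean answer.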
 
Reactions (+1): kisslessvirgin and masai jumps enjoyer
Mercor engineers are paid 800k+ right out of college btw
 
Reactions (+1): kisslessvirgin
i will be so so surprised if anyone actually reads this
 
Reactions (+1): kisslessvirgin and RichardSpencel
interesting
 
Not that important tbh
 
will read later
 
Not that important tbh
I mean it's probably a first of its kind. data leaks usually only help me get Papa John's for cheap ngl. But how do we value what they actually got, when anything AI touches is valued in the trillions?
 
Reactions (+1): Pay
I mean it's probably a first of its kind. data leaks usually only help me get Papa John's for cheap ngl. But how do we value what they actually got, when anything AI touches is valued in the trillions?
i feel like it's already over, most kid billionaires legit just used GPT but with a learned model.
i think most facets have already been looked through.
 
Reactions (+1): gooner23

omg so much reading and useless info gosh
 
Reactions (So Sad): gooner23
omg so much reading and useless info gosh
I mean it was in the title
 
