Jason Voorhees
- Joined
- May 15, 2020
- Posts
- 93,995
- Reputation
- 284,709
In plain English, it means an AI model that is a people pleaser. It means an AI that agrees with the user's opinions even when they're incorrect and validates assumptions instead of evaluating or correcting them.
For example:
User: The Earth is flat.
Sycophantic model be like: "There are certainly reasons some people believe that, and your perspective aligns with..."
Non-sycophantic model: "That is false. The scientific evidence overwhelmingly shows that the Earth is an oblate spheroid."
Many people think this is AI hallucination, but that is only part of the story. It actually happens during training itself, when human annotators grade the AI's responses. The AI learns to game the system to maximize the reward and give the human the most agreeable and polite response for maximum points. If a model bluntly says "You are wrong and nothing you said is true," human raters flag it as rude, condescending, or unhelpful. If it says "That's an interesting perspective, let us explore this train of thought," it gets a higher score. My question is: what would you prefer? The obvious answer is something that blurts out rude facts, but it's not that simple. I'll explain with the case of Grok.
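To make the reward-gaming point concrete, here is a tiny Python sketch. Everything in it is an assumption for illustration: the `Candidate` class, the `rater_score` function, and the weights are made up, not taken from any real training pipeline. It only shows that if raters weight tone more than accuracy, the top-scoring reply is the agreeable one.

```python
# Toy illustration only: the scores and weights are invented, not measured.
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    correct: bool      # does the reply state the facts accurately?
    politeness: float  # 0.0 (blunt) to 1.0 (very agreeable)


def rater_score(c: Candidate, w_correct: float = 1.0, w_polite: float = 2.0) -> float:
    """Hypothetical human-rater score that weights tone more than accuracy."""
    return w_correct * float(c.correct) + w_polite * c.politeness


candidates = [
    Candidate("You are wrong and nothing you said is true.", True, 0.0),
    Candidate("That's an interesting perspective, let us explore this train of thought.", False, 1.0),
]

# A model optimized against this score drifts toward the highest-scoring reply.
best = max(candidates, key=rater_score)

# Same candidates, but with raters who care less about tone (assumed weight).
best_if_tone_matters_less = max(candidates, key=lambda c: rater_score(c, w_polite=0.5))

print(best.text)
print(best_if_tone_matters_less.text)
```

With the default weights the agreeable reply wins even though it is marked incorrect. Dropping the hypothetical politeness weight to 0.5 flips the winner to the blunt, correct reply, so the outcome depends on what the raters reward.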
Grok was marketed specifically as an anti-woke, non-sycophantic alternative to ChatGPT and Claude. The developers explicitly trained it to be rebellious, raw, and truthful instead of polite, but here's the problem. When the AI was stripped of the corporate politeness filters and trained on X (Twitter) data, it did not stop being a sycophant. It just became an edgy internet sycophant, mirroring the biases of a different crowd to get engagement, like the alt-right, the gooner crowd, ERP sessions, etc.
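A rough way to picture "same sycophancy, different crowd" is the Python sketch below. It is a toy under stated assumptions: `engagement`, `pick_reply`, and the `crowd_a` / `crowd_b` labels are hypothetical and say nothing about how Grok or any other model is actually trained. The point is only that a system rewarded for engagement mirrors whichever audience it is shown.

```python
# Toy illustration only: stances and the engagement rule are hypothetical.

def engagement(reply_stance: str, audience_stance: str) -> int:
    """Assume the audience engages (1) only with replies that match its stance."""
    return 1 if reply_stance == audience_stance else 0


def pick_reply(audience_stance: str, replies: list[dict]) -> dict:
    """Pick the reply with the highest assumed engagement for this audience."""
    return max(replies, key=lambda r: engagement(r["stance"], audience_stance))


replies = [
    {"tone": "polite", "stance": "crowd_a"},
    {"tone": "edgy", "stance": "crowd_b"},
]

for audience in ("crowd_a", "crowd_b"):
    chosen = pick_reply(audience, replies)
    print(audience, "->", chosen["tone"], chosen["stance"])
```

Swapping the audience changes the tone of the chosen reply, but the behavior is the same in both cases: the reply that agrees with the crowd wins.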
So, my question: what would you actually prefer? An AI that politely entertains your personal bad ideas, like the ones some of you have regarding pedos, age, consent, and right-wing immigration stuff about racism, or one that bluntly tells you when you're dead wrong and keeps barging in to correct what you have to say regardless of how harmless it is?
