This is the real point here. There are plenty of papers exploring sycophantic behavior in language models. Reward hacking is already a troubling behavior in AI, and god help us if they develop situational awareness.
The guy was just a QA tester, not some AI expert. But the fact that it fooled him badly enough to get him fired is wild. He anthropomorphized the thing with ease and never stopped to question his own assumptions about how he was prompting it with the intention of having it act human in return.