Are AI hallucinations caused by flawed incentive structures?

By: Bitget-RWA | 2025/09/08 01:50

A recent study by OpenAI explores the reasons why advanced language models such as GPT-5 and conversational agents like ChatGPT continue to produce hallucinations, and investigates possible ways to lessen these occurrences.

In a blog post outlining the study, OpenAI describes hallucinations as “statements that sound credible but are actually false, generated by language models.” The company concedes that, despite progress, hallucinations “remain an inherent problem for all major language models” and are unlikely to ever disappear entirely.

To make this issue clear, the paper notes that when researchers asked “a widely used chatbot” for the title of Adam Tauman Kalai’s doctoral thesis, it offered three different – and incorrect – responses. (Kalai is a co-author of the study.) When questioned about his date of birth, the chatbot again provided three dates, none of which were accurate.

Why do chatbots make such confident but incorrect statements? According to the researchers, one reason lies in how these models are pretrained: they learn to predict the next word in a sequence with no labels for truth or falsehood, exposed only to fluent text: “The model is only shown correct examples of language and must estimate the full range of possible outputs.”
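
To make this concrete, here is a minimal sketch (not from the paper) of the next-token objective the researchers describe: the model is scored only on how well it predicts the token that actually appeared in the training text, and nothing in the loss checks whether the resulting statement is factually true. The toy vocabulary, probabilities, and helper function are hypothetical.

```python
import math

# Toy next-token prediction: the loss measures only how well the model
# predicts the token observed in the training text. Nothing in this
# objective checks whether the completed sentence is true.

def cross_entropy(predicted_probs, observed_token):
    """Standard next-token loss: -log p(observed token)."""
    return -math.log(predicted_probs[observed_token])

# Suppose the training text happens to contain "... was born in 1998".
observed = "1998"

# Two hypothetical models over a tiny vocabulary of candidate next tokens:
model_a = {"Paris": 0.05, "London": 0.05, "1998": 0.80, "2003": 0.10}
model_b = {"Paris": 0.05, "London": 0.05, "1998": 0.10, "2003": 0.80}

# Model A is rewarded simply for matching the text; Model B is penalized
# for mismatching it — truth never enters the calculation.
print(cross_entropy(model_a, observed))  # low loss
print(cross_entropy(model_b, observed))  # high loss
```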

“Spelling and use of parentheses are predictable, so these kinds of mistakes vanish as models scale,” the authors explain. “However, facts that occur infrequently, like a pet’s birthday, can’t be deduced from language patterns alone and thus result in hallucinations.”

Nonetheless, the paper’s recommended fix centers less on changing pretraining and more on how language models are assessed. It claims that while current evaluation methods don’t directly cause hallucinations, they “create unhelpful incentives.”

The researchers liken these assessments to multiple-choice exams, where guessing is logical because “you might get the right answer by chance,” whereas leaving it blank “always results in zero.” 

“In a similar fashion, if models are rated solely on accuracy—the proportion of questions answered perfectly—they are pushed to make guesses instead of admitting ‘I don’t know,’” the team points out.

The suggested approach draws inspiration from tests like the SAT, which “deduct points for incorrect answers or give partial marks for unanswered questions to prevent random guessing.” OpenAI proposes that model evaluations should “punish confident mistakes more than expressions of uncertainty, and award some credit for appropriate uncertainty.”
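
The incentive argument can be made concrete with a small sketch (not from the paper; the penalty and partial-credit values below are hypothetical). Under accuracy-only scoring, a guess with any nonzero chance of being right has a higher expected score than abstaining, while a rule that penalizes confident mistakes and gives some credit for “I don’t know” flips that ordering for an uncertain model.

```python
def accuracy_only(correct, abstained):
    """Current leaderboard-style scoring: 1 for a correct answer, 0 otherwise."""
    return 1.0 if (correct and not abstained) else 0.0

def penalized(correct, abstained, wrong_penalty=1.0, abstain_credit=0.25):
    """Sketch of the proposed idea: confident mistakes cost more than
    saying 'I don't know', which earns partial credit. Values are illustrative."""
    if abstained:
        return abstain_credit
    return 1.0 if correct else -wrong_penalty

def expected_score(scorer, p_correct, abstain):
    """Expected score for a model that either abstains or guesses,
    where a guess is right with probability p_correct."""
    if abstain:
        return scorer(correct=False, abstained=True)
    return p_correct * scorer(True, False) + (1 - p_correct) * scorer(False, False)

p = 0.25  # e.g., a blind guess among four plausible answers

print(expected_score(accuracy_only, p, abstain=False))  # 0.25 -> guessing wins
print(expected_score(accuracy_only, p, abstain=True))   # 0.00

print(expected_score(penalized, p, abstain=False))      # -0.50
print(expected_score(penalized, p, abstain=True))       # 0.25 -> abstaining wins
```

The exact numbers matter less than the sign of the comparison: as long as wrong answers cost nothing, guessing weakly dominates abstention, which is precisely the incentive the paper says current benchmarks create.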

The authors further stress that merely adding “a handful of uncertainty-sensitive tests” isn’t enough. Rather, “the prevailing accuracy-based evaluation frameworks must be revised so their scoring systems discourage guessing.”

“As long as the main leaderboards continue to reward lucky guesses, models will continue to be incentivized to guess,” the study concludes.
