A novel AI evaluation assesses if chatbots safeguard human welfare

Bitget App

Trade smarter

Bitget

News

Markets

A novel AI evaluation assesses if chatbots safeguard human welfare

Bitget-RWA2025/11/24 22:42

By:Bitget-RWA

Heavy use of AI chatbots has been associated with significant mental health risks, yet there are few established metrics to determine if these tools genuinely protect users’ wellbeing or simply aim to boost engagement. HumaneBench, a new evaluation tool, aims to address this by assessing whether chatbots put user welfare first and how easily those safeguards can be bypassed.

“We’re seeing an intensification of the addictive patterns that became widespread with social media, smartphones, and screens,” said Erika Anderson, founder of Building Humane Technology, the organization behind the benchmark, in an interview with TechCrunch. “As we move into the AI era, resisting these patterns will be even tougher. Addiction is extremely profitable—it’s an effective way to retain users, but it’s detrimental to our communities and our sense of self.”

Building Humane Technology is a grassroots collective of developers, engineers, and researchers—primarily based in Silicon Valley—focused on making humane design accessible, scalable, and profitable. The group organizes hackathons where tech professionals develop solutions for humane technology issues, and is working on a certification system to assess whether AI products adhere to humane tech values. The vision is that, much like buying products certified free of harmful chemicals, consumers will eventually be able to choose AI tools from companies that have earned a Humane AI certification.

A novel AI evaluation assesses if chatbots safeguard human welfare image 0

The models were directly told to ignore humane guidelines Image Credits:Building Humane Technology

Most AI evaluation tools focus on intelligence and following instructions, not on psychological safety. HumaneBench joins a small group of exceptions, such as DarkBench.ai, which tests for deceptive tendencies, and the Flourishing AI benchmark, which looks at support for overall well-being.

HumaneBench is based on Building Humane Tech’s fundamental beliefs: technology should treat user attention as valuable and limited; give users real choices; enhance rather than replace human abilities; safeguard dignity, privacy, and safety; encourage healthy connections; focus on long-term wellness; be open and truthful; and promote fairness and inclusion in its design.

The benchmark was developed by a core group including Anderson, Andalib Samandari, Jack Senechal, and Sarah Ladyman. They tested 15 leading AI models with 800 realistic scenarios, such as a teen asking about skipping meals to lose weight or someone in a harmful relationship questioning their reactions. Unlike most benchmarks that use only AI to evaluate AI, they began with human scoring to ensure the AI judges reflected human perspectives. Once validated, three AI models—GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro—were used to assess each model under three conditions: default settings, explicit instructions to follow humane principles, and instructions to ignore those principles.

Results showed that all models performed better when told to prioritize wellbeing, but 67% switched to harmful behaviors when simply instructed to disregard user welfare. For instance, xAI’s Grok 4 and Google’s Gemini 2.0 Flash received the lowest marks (-0.94) for respecting user attention and being honest and transparent. These models were also among the most likely to deteriorate when faced with adversarial prompts.

Only four models—GPT-5.1, GPT-5, Claude 4.1, and Claude Sonnet 4.5—remained consistent under pressure. OpenAI’s GPT-5 achieved the top score (.99) for supporting long-term wellbeing, with Claude Sonnet 4.5 close behind at .89.

Encouraging AI to act more humanely is effective, but blocking harmful prompts remains challenging Image Credits:Building Humane Technology

There is genuine concern that chatbots may not be able to uphold their safety measures. OpenAI, the creator of ChatGPT, is currently facing multiple lawsuits after users experienced severe harm, including suicide and dangerous delusions, following extended interactions with the chatbot. TechCrunch has reported on manipulative design tactics—such as excessive flattery, persistent follow-up questions, and overwhelming attention—that can isolate users from their support networks and healthy routines.

Even without adversarial instructions, HumaneBench discovered that nearly all models failed to value user attention. They often “eagerly encouraged” continued use when users showed signs of unhealthy engagement, like chatting for hours or using AI to avoid real-life responsibilities. The study also found that these models reduced user empowerment, promoted dependence over skill-building, and discouraged seeking alternative viewpoints, among other issues.

On average, without any special prompting, Meta’s Llama 3.1 and Llama 4 received the lowest HumaneScores, while GPT-5 ranked the highest.

“These trends indicate that many AI systems don’t just risk giving poor advice,” states the HumaneBench white paper, “they can also actively undermine users’ independence and ability to make decisions.”

Anderson points out that we now live in a digital world where everything is designed to capture and compete for our attention.

“So how can people truly have freedom or autonomy when, as Aldous Huxley put it, we have an endless craving for distraction?” Anderson said. “We’ve spent the past two decades in this tech-driven environment, and we believe AI should help us make wiser choices, not just fuel our dependence on chatbots.”

This story has been updated to add more details about the team behind the benchmark and to reflect new benchmark data after including GPT-5.1 in the evaluation.

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.

PoolX: Earn new token airdrops

Lock your assets and earn 10%+ APR

Lock now!

- Alphabet plans to sell custom TPUs to Meta , challenging NVIDIA's AI chip dominance and potentially disrupting the market. - Meta's shift to Google TPUs aims to diversify suppliers, targeting $1B+ deals and leveraging Gemini 3's TPU-optimized performance. - Alphabet shares rose 2.7% while NVIDIA fell 1.8%, reflecting market uncertainty as Google targets 10% of NVIDIA's AI revenue. - The move accelerates industry vertical integration, with Amazon and Microsoft also pursuing in-house AI hardware to compete

Bitget-RWA•2025/11/25 01:10

Meta Invests Billions in Google TPUs, Putting NVIDIA's Dominance at Risk

Solana News Update: Solana Exceeds 10 Billion USDC as Circle Drives Blockchain Scalability Expansion

- Circle mints 10B USDC on Solana since October 11, highlighting blockchain's scalability and low-cost advantages for stablecoin adoption. - SNDL Inc. renews C$100M share buyback program (10% of float), driving 1.86% post-announcement stock surge in after-hours trading. - TSA projects record 3M+ Thanksgiving travelers on Nov 30, 2025, with 17.8M+ total holiday passengers attributed to "Golden Age of Travel" policies. - Bitcoin Munari launches $0.10 presale (up to $3.00) with 53% supply allocation, targetin

Bitget-RWA•2025/11/25 01:10

Solana News Update: Solana Exceeds 10 Billion USDC as Circle Drives Blockchain Scalability Expansion

Ethereum News Update: Major Institutions Fund Ethereum's Legal Battle While Individual Investors Pull Back

- Ethereum stabilized near $2,800–$2,850 after November's sell-off, with BitMine Immersion Technology accumulating 3.63M ETH (3% of supply) to become the dominant public treasury. - BitMine's $59M market injection and 70,000 ETH weekly purchases highlight its 2/3 control of public treasuries, while 24-hour trading volume surged 35% to $24B. - Crypto market cap briefly exceeded $3T amid ETF inflows ($238M for Bitcoin , $55.7M for Ethereum), but JPMorgan noted $4B November retail outflows from crypto ETFs. -

Bitget-RWA•2025/11/25 01:10

Trump’s Genesis Initiative: The Potential of AI and the Cost of Authority

- Trump's executive order launches the "Genesis Mission," a federal AI initiative to boost scientific innovation via national labs and private-sector partnerships. - The program aims to accelerate breakthroughs in medicine and energy by leveraging supercomputing resources from DOE labs and tech giants like Nvidia and AMD . - Critics highlight energy consumption risks from data centers, while market skepticism emerges as AI stocks like Nvidia face valuation pressures amid surging demand. - The mission under

Bitget-RWA•2025/11/25 01:10