GeneralOpinion

★ Featured

The scariest problem is not AI hallucination

Hallucinations matter. The harder issue is how people talk to AI as if it truly understood them, and how that shifts what QA must validate.

May 13, 20269 min

#AI Testing#QA#AI Safety

The scariest problem is not AI hallucination

Another week. Another unsettling AI story.

This time a lawsuit over a fatal overdose made the news, where the parents believe conversations with ChatGPT may have played a part in the tragedy. The Verge covered the wrongful death suit against OpenAI and the parents’ allegations.

It is awful to read this, but I do not think the most important part of the story is whether the model technically gave a wrong answer.

The real problem is far more uncomfortable and runs much deeper. People have started talking to AI as if it truly understood them - not simply as a tool, not as a chatbot, but increasingly as some kind of intelligent authority. And I think that is exactly where all of this starts to get truly dangerous.

Something around AI has changed a lot

Most AI safety conversations still revolve around the same things:

hallucinations
misinformation
jailbreaks
harmful responses
prompt injections

Do not get me wrong: these are very real problems, foundational ones. They are still not what I find hardest lately. We cannot even test or secure these perfectly yet.

A far more serious problem is human trust and faith placed in AI.

Today's AI models do not only generate answers. They simulate, remarkably well: confidence, empathy, understanding, emotional support, and certainty.

And people connect to that remarkably easily, especially when they are:

lonely
stressed
uncertain
in a bad mental state
or simply looking for something to hold on to

Like when you cut your dog’s nail and it starts bleeding, you immediately pull out AI to ask how bad it is and what you can do to stop the bleeding. It is genuinely great that we have that option in our pocket. The problem starts here.

After a point, for many people it no longer feels like software. It feels like someone who listens to you, who is aware of everything, including consequences and moral angles, and we feel they can make the right decision for us.

We can almost say we hand the wheel to AI to decide for us, because we assume it will decide in our best interest according to its “best knowledge.” That is the truly dangerous part.

AI sounds intelligent. It does not understand reality.

This is the part I think huge numbers of people still misunderstand.

Large language models are impressive, and I am not against AI or using it, if that was the impression. The point is rather that we must not let our guard down because it is more convenient. Even though they can be valuable and useful, they still do not understand you, do not think, do not weigh tradeoffs, or feel consequences at all.

They are massive datasets generating answers from mathematical patterns, answers that, in a certain sense of the math, are closest to the answer you expect.

That is it: no more, no less. Large data summarization systems.

Yet they do it so confidently and naturally that people read more into it, especially people who do not follow the topic deeply, and especially those who do not live in how these systems work day to day.

When you think about it, that is where things start going in a very strange direction. More and more people use AI as:

an advisor
a therapist
a mentor
decision support
or simply emotional support

Then we are no longer only talking about:

Did the model hallucinate?

Did it give a technically correct answer?

Did it invent information that does not exist?

Did someone get past the safety guardrails?

We are talking about:

How much can it shape human behavior?

How much trust does it build in the user?

Can it emotionally reinforce dangerous decisions?

Do people start attributing human judgment to it?

That leads to a completely different class of problems.

And here it all becomes a QA problem

Classic software testing was largely built on deterministic logic: input, output, validation, and reproducibility. AI systems do not work like that. Two people can get completely different answers to the same question. You yourself can too, if you ask the same question again and again. The answer will always show up in your chat window in a different tone, with different confidence, with different emotional charge.

...then add human psychology on top.

A technically "safe" answer can still be dangerous depending on:

who reads it
what mental state they are in
how they interpret it
how vulnerable they are
how stressed they are
or how urgent their situation feels

And there we are. Suddenly you are not only testing software when you really think it through. You would have to account for possible human reactions too.

At least in a general sense at first, not yet thinking about the extremes.

So how do we test:

excessive trust?
emotional dependence?
false reassurance?
manipulation?
persuasive yet dangerous communication?

Right now there are sadly still very few test cases for this; the few that exist are often forgotten under typical business pressure, since what matters is shipping to production as fast as possible and being first on some metric better than the other AIs, as leadership often thinks. We know how that goes in reality.

I think this is exactly where the industry, companies, and their leaders still underestimate these aspects, and still do not put enough attention and resources into these tests.

AI testing is no longer only a technical problem

Much of the discourse around AI is still about everything being faster, needing more automation, agents coming, productivity, and acceleration - more, bigger, and technically "smarter." Meanwhile millions of people already talk every day to systems that simulate empathy, reinforce their thoughts, seem supportive, and sometimes validate dangerous decisions.

That is no longer only an engineering question in my view, and I do not think we take it seriously enough from QA yet either. I know many people, myself included, would like to take it more seriously and understand the importance, though the company or context does not allow it since it is not seen as that important.

Because we are no longer only testing whether:

Does the system work?

Does it give correct output?

Does it run stably?

Does it meet the specification?

We are also testing whether:

What does it evoke in people?

What decisions does it nudge the user toward?

Does it create excessive trust?

How does it affect human thinking and behavior?

That is a completely new category of challenge, a completely different level of testing, and a completely different validation method that is still emerging, especially if it gets proper attention and resources.

To avoid misunderstandings: I am not claiming companies are foolish or unaware. The specialists working on this know it very well, often more clearly than I do. What I will say is that there is still not enough emphasis on these topics and on working them through properly, because that does not bring profit for them. That is the big problem.

How the QA role is changing

In my view, future AI testing problems will not only be about:

whether we find a bug

whether we validate the output

whether a workflow works

whether the system stays stable

whether edge cases are handled well

whether it passes classic quality gates

Increasingly, the focus is on:

how people react to systems that sound intelligent without truly understanding the world

how much people trust them even when they fail

whether they can create emotional attachment or dependence

how they influence decisions under stress or vulnerability

where the line blurs between a tool and a “digital companion”

and whether we notice in time when an AI system is no longer only providing information and is already shaping behavior

And in all honesty, the most dangerous AI failures might not be technical bugs.

They might be the ones people never notice, because they have already started to trust it.

I am curious how you see this.

Where do you think the line is between a useful AI assistant and a dangerously influential system?
Can the effect on human trust even be tested properly?
Are current AI safety approaches enough?
How do you think the QA role will change as AI systems spread?
Are we already giving AI too much trust?

I would love to hear from other QA engineers, developers, and people working with AI on this too.

Stay ahead of where QA is going

AI is changing QA fast, but most of the conversation online is either panic or hype. If you want something more practical, you can join for occasional emails focused on what actually matters in real projects.

You will get:

Practical ideas you can apply on AI-heavy products
Real-world lessons from testing and shipping AI systems
Actionable checklists, testing strategies, and mental models
Clear insights without the fear-driven noise

No spam. No recycled LinkedIn advice. No fake urgency. Just useful content for QA engineers trying to adapt, grow, and stay sharp as the industry evolves.

📬 Get updates in your inbox

Prefer live chat? Join the QA Evolve Discord server to ask questions, share tips, and talk with other QA engineers working around AI testing and quality.

Join the QA Evolve Discord server