
AI voices – replacing humans with machines?

April 8, 2025
Paul Kremsleithner

"Artificial intelligence will soon replace voice-over artists" – that's what we thought when we first used computer-generated narration for our animation and web project "gehirngesund" (brain-healthy). These voices were intended to accompany our explanatory videos about how to keep your brain healthy and what steps you can take to help prevent dementia.

We used the AI voices from Elevenlabs.io. Initially, we were truly impressed, both by the technical quality and by the astonishing realism of these synthetic voices. They sounded clean, neutral, and pleasant. But during production, a turning point came, and with it a realization that shaped the rest of the process.


Source: Midjourney

What are AI-generated voices, and how do they work?

To understand the differences between real and artificial voices, it's worth taking a quick technical look behind the scenes.
AI-generated voices are based on technologies such as machine learning, speech synthesis, and natural language processing. The goal is to transform written text into spoken language – as realistically and understandably as possible. The process typically looks like this:

Data collection and training: First, huge amounts of human speech data are collected – including various accents, emotions, and intonations. This data serves as the basis for training an AI model, which is then able to recognize and reproduce speech patterns.

Speech synthesis: The trained model transforms text into spoken language. Text-to-speech (TTS) systems analyze phonetic structures and context information to generate the most natural pronunciation, intonation, and sentence melody possible.

Real-time processing: Modern AI systems are capable of converting text into speech almost in real time – often via cloud services. This makes them particularly flexible and scalable.

Fine-tuning and personalization: Users can influence many parameters – such as pitch, speed, or even emotion. Techniques such as voice cloning can even recreate specific voices with deceptive realism.
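
Parameters such as pitch, speed, and pauses are often expressed in SSML, the W3C Speech Synthesis Markup Language, which many text-to-speech engines accept at least in part. The following is a minimal Python sketch, not the setup used in this project: the function name build_ssml and its defaults are hypothetical, and whether a particular service honors each tag is an assumption to check against that service's documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(sentences, rate="medium", pitch="medium", pause_ms=300):
    """Wrap sentences in SSML with one prosody setting and pauses in between.

    Hypothetical helper for illustration only. The element and attribute
    names (speak, prosody, s, break, rate, pitch, time) come from the
    W3C SSML specification; everything else is an assumption.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-US"})
    # One <prosody> element sets speaking rate and pitch for all sentences.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    for index, sentence in enumerate(sentences):
        s = ET.SubElement(prosody, "s")
        s.text = sentence
        # Insert an explicit pause between sentences, but not after the last.
        if index < len(sentences) - 1:
            ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(
        ["Sleep matters.", "So does exercise."],
        rate="slow",
        pitch="-2st",
        pause_ms=400,
    ))
```

The resulting string would be sent to a TTS engine in place of plain text. Markup like this sets the tempo and the pauses explicitly, but every value still has to be chosen by someone; the engine does not decide on its own where a pause feels right.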



ORF presenter Nadja Mader recording the narration for gehirngesund.
Still from the animation "Alcohol: A Risk Factor for Dementia." A video for gehirngesund.at
gehirngesund.at features nine videos that explain risk factors for dementia and present methods of prevention.

The Art of Human Speaking Voices

At the beginning of our project, we decided on AI voices for practical reasons – a solution that convinced us both in terms of efficiency and cost. The results were solid, understandable, and clear. But then came the comparison.

Through Isolde Kühas, the founder of the association, we received confirmation from Nadja Mader – a nationally known television, event, and radio presenter – that she would provide the voiceovers for our videos.

After listening to her recordings and slightly adjusting the speed of the clips, something happened that we hadn't expected: The videos significantly improved in quality.

And this wasn't a subtle, barely noticeable improvement – it was a clear, almost astonishing difference. The content suddenly seemed more personal, more vibrant, more accessible.

Without the comparison to the real voice, it would have been hard to notice that something was missing from the AI versions. But the direct comparison made visible (or rather, audible) what human voices can achieve – and where AI reaches its limits.

What makes a human voice so special?

The art of human speakers lies in interpretation. In the small decisions that are often made intuitively: Which syllable is stressed? When is a short pause taken? How does a sentence sound when it is sincerely meant?

For real people, a creative process lies between the written text and the spoken result – an interplay of experience, emotion, and spontaneity. While AI merely calculates what "probably sounds good," speakers sense what is meant – and convey it with their voice.

The decision to pronounce an "O" particularly softly or to shorten a sentence rhythmically – these are creative ideas that give a text personality. No matter how many parameters an AI may offer (pitch, style, tempo, emotion, etc.), intuition remains the preserve of humans for the time being.


Source: Midjourney

Where does AI make sense – and where doesn't it?

Based on our experience, we would clearly say today: It depends on the application.

Suitable application areas for AI voices:

  • Telephone announcements
  • E-learning courses
  • Prototypes
  • Accessible communication
  • Automated announcements

Wherever information needs to be conveyed functionally, AI voices can be an excellent solution. They are flexible, understandable, technically sophisticated – and significantly cheaper and faster than human speakers.

Less suitable for AI:

  • Storytelling
  • Character voice-overs
  • Audiobooks
  • Advertising
  • Emotional or personal themes

In all of these areas, it's not just the content that determines the quality of the result, but above all the way it's delivered. And here, emotion, authenticity, and trust play a central role – elements that must be conveyed through real voices.

Emotion and trust – two key factors

Emotion

The difference was immediately noticeable in our videos. The AI versions were good – factually correct, understandable, technically flawless. But they didn't feel right.

Our goal was to speak to viewers as equals, not like a digital textbook. With the real voice, there was suddenly the added feeling of being addressed by a human – not a machine. AI can't (yet) create this closeness, this warmth.

Trust

We only became aware of another point once Nadja's recordings were available: Her voice is familiar to many people – and that's precisely what creates trust.

Although the content was exactly the same, Nadja's voice suddenly gave it more weight. The statements seemed more credible, more approachable.

This raises an intriguing question: What happens when well-known voices are recreated by AI? What if voice cloning and deepfakes make it possible to have famous voices say anything? Does this enhance credibility – or undermine it? And who actually has the right to these voices? What happens when people "continue to live virtually" – without their consent? Here we encounter an ethical dimension that cannot be neglected in the discussion surrounding AI voices.


Conclusion: Human or machine – or both?

AI-generated voices are undoubtedly a great tool. They are inexpensive, efficient, accessible – and open up new possibilities, especially in accessible communication.

But what they (still) lack is what defines us as human beings: spontaneity, depth, authenticity.

Our conclusion: Not better or worse – but context-dependent.
We will continue to use AI voices for prototypes and factual content in the future. But when it comes to truly reaching people, humans remain irreplaceable.




