
AI voices – Replace humans with machines?

8 April 2025
Paul Kremsleithner

"Artificial intelligence will soon replace voice-over artists" – that was our thinking when we first used computer-generated narration voices for our animation and web project "brain health." These voices were meant to accompany our explanatory videos, which show how to keep the brain healthy and which measures can help combat dementia.

We used the AI voices from Elevenlabs.io. Initially, we were impressed by both the technical quality and the astonishing naturalness of these synthetic voices. They sounded clear, neutral, and pleasant. However, as production progressed, we noticed that the impact of the voices was changing – a realisation that had a decisive effect on the further course of the project.



What are AI-generated voices – and how do they work?

AI-generated voices convert written text into spoken language. They use technologies such as machine learning, speech synthesis, and natural language processing to create voices that sound as realistic and understandable as possible. The process is divided into several steps:

Data collection and training

First, the systems collect large amounts of human speech data, including various accents, emotions, and intonations. This data forms the basis for training an AI model that recognises and replicates speech patterns.

Speech synthesis

The trained model converts text into spoken language. Text-to-speech (TTS) systems analyse phonetic structures and context to generate natural pronunciation, intonation, and sentence melody.

Real-time processing

Modern AI systems convert text into speech in near real time, often via cloud services. This enables flexible and scalable use.

Fine-tuning and personalisation

Users can now adjust many parameters, such as pitch, speech rate, and emotions. Techniques like voice cloning can even recreate individual voices with deceptive realism.
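
To make the idea of adjustable parameters more concrete, here is a minimal sketch of how speech rate and pitch can be written down in SSML, the W3C markup language for speech synthesis. The helper name build_ssml is hypothetical, and whether a particular voice service accepts SSML at all is an assumption that has to be checked against that service's documentation. The fragment is simplified: a complete SSML document also carries version and namespace attributes on the speak element.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap plain text in an SSML <prosody> element (hypothetical helper).

    rate and pitch take SSML keywords such as "slow"/"fast" or "low"/"high".
    """
    # escape() protects characters such as & and < inside the spoken text
    body = escape(text)
    # quoteattr() returns the attribute value already wrapped in quotes
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{body}</prosody>"
        "</speak>"
    )


# Example: a slower, lower-pitched reading of one sentence
ssml = build_ssml("Sleep & exercise help keep the brain healthy.", rate="slow", pitch="low")
print(ssml)
```

The resulting string would be sent to a speech engine in place of the plain text. The sketch only shows where such settings live; it does not produce audio itself.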


ORF presenter Nadja Mader recording the texts for gehirngesund.

The Art of Human Speaking Voices

At the beginning of our project, we opted for AI voices for practical reasons. The solution won us over with its efficiency and low cost. The results were solid, clear, and easy to understand. However, the comparison with a human voice revealed a clear difference.

Through Isolde Kühas, the founder of the association, Nadja Mader, a nationally known television, event, and radio presenter, agreed to provide the voiceovers for our videos.

Once we had listened to her recordings and slightly adjusted the speed of the clips, an unexpected effect emerged: the videos improved significantly in quality.

This improvement was not subtle; it was clear and tangible. The content seemed more personal, lively, and accessible.

Without the direct comparison with the human voice, the shortcomings of the AI versions would have been barely noticeable. However, the comparison made it clear what strengths human voices have and where AI reaches its limits.

What makes a human voice so special?

The strength of human speakers lies in interpretation. They often make small decisions intuitively: Which syllable do they emphasise? When do they pause briefly? How does a sentence sound when it is sincerely meant?

Between the written text and the spoken result, human speakers go through a creative process – an interplay of experience, emotion, and spontaneity. While AI calculates what "probably sounds good," speakers sense what is meant and convey this with their voice.

The decision to pronounce an "o" particularly softly or to shorten a sentence rhythmically lends personality to a text. However many parameters an AI offers – pitch, style, tempo, emotion – intuition remains, for the time being, the preserve of humans.


Source: Midjourney

Where do AI voices make sense – and where don't they?

Whether AI voices are a good fit depends heavily on the area of application.

Suitable areas of application for AI voices include:

  • Telephone announcements
  • E-learning courses
  • Prototypes
  • Accessible communication
  • Automated announcements

In these areas, AI voices primarily serve to convey information functionally. They impress with their flexibility, good intelligibility, and technical reliability – while being more cost-effective and faster than human speakers.

AI voices are less suitable for:

  • Storytelling
  • Character voice-overs
  • Audiobooks
  • Advertising
  • Emotional or personal topics

Here, it's not just the content that determines quality, but above all the way it's delivered. Emotion, authenticity, and trust play a central role – qualities that human voices convey better.

Emotion and trust – two decisive factors

Emotion

The difference was clearly evident in our videos: The AI versions were factually correct, understandable, and technically flawless. Nevertheless, they didn't seem truly alive.

Our goal was to speak to viewers as equals, not like a digital textbook. The real voice created a personal connection and conveyed a closeness and warmth that AI can't yet provide.

Trust

Another critical aspect became clear when Nadja's recordings were available: Her voice is familiar to many people, which inspires trust.

Although the content remained unchanged, Nadja's voice added weight to it, making it seem more credible and approachable.

These observations raise an important question: What happens when AI recreates well-known voices? Voice cloning and deepfakes make it possible to use famous voices for any statement. Does this increase credibility – or undermine it? Who owns the rights to such voices? And how do we deal with people "living on virtually" without their consent? These ethical questions should not be ignored in the discussion about AI voices.


Conclusion: Human or machine – or both?

AI-generated voices offer many advantages: They are cost-effective, efficient, and easily accessible. They open up new possibilities, especially in accessible communication.

However, they currently lack something crucial: the spontaneity, depth, and authenticity that only humans can convey.

Our conclusion is therefore: It's not a question of better or worse, but of the appropriate use. We continue to rely on AI voices for prototypes and factual content. However, when it comes to truly reaching people, humans remain indispensable.

