Disadvantages of automatic speech-to-text translation

An AI transcriptionist helps a company process voice recordings with minimal human involvement: more cheaply, faster, and in larger volumes.


The main problem with automatic transcription is the imperfection of the recognition algorithms. Even good language models make mistakes and write down the wrong words. This usually happens when:

The speaker has noticeable speech defects, so the pronunciation differs greatly from the standard.

The recording is of poor quality: loud crackling or a lot of extraneous noise makes it impossible to isolate the speech and split it into individual sounds.

The speaker uses unfamiliar words: for example, metaphors or uncommon terms that the AI does not know. This happens especially often during interviews with experts.

In most cases, a misrecognized word is not a problem: the reader will mentally correct the error and still understand the idea. However, in complex niches, typos in the text can radically change the meaning.

Automatic services cannot yet be trusted 100%. With background noise, the error rate rises to 40–70% depending on the quality of the recording. At the same time, AI has no self-checking mechanism. The only way to eliminate speech recognition errors is to involve an editor, or the speaker themselves, to correct the transcript.


The graph shows the results of an accuracy check of 10 transcription services. The main metric is Word Error Rate (WER), the percentage of incorrectly recognized words.
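
For reference, WER is typically computed as the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the recognized text, divided by the number of reference words. A minimal Python sketch of that calculation, written here as an illustration rather than taken from the study:

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance, classic dynamic-programming table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One wrong word out of five -> WER = 0.2 (20%)
print(word_error_rate("press one to get help", "press one to get held"))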

The blue line shows the ideal ratio between WER and AI confidence.

It is assumed that if the AI is 80% confident in the result, the WER should be around 20%. However, most tools do not meet this requirement: the researchers found that the AI makes too many mistakes while still reporting high confidence. Only AssemblyAI shows the ideal ratio. Whisper and Microsoft show the opposite pattern: at low confidence, they make fewer mistakes.
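
In other words, the ideal line corresponds to expected WER = 1 − confidence. A quick way to check a tool against that line on your own recordings might look like the snippet below; the numbers are invented purely for illustration:

# Hypothetical measurements: the confidence a tool reports vs. the WER measured on a test set.
samples = [
    {"confidence": 0.80, "wer": 0.35},
    {"confidence": 0.90, "wer": 0.28},
]

for s in samples:
    expected_wer = 1.0 - s["confidence"]   # the "ideal ratio" line from the graph
    gap = s["wer"] - expected_wer          # positive gap = the tool is overconfident
    print(f"confidence {s['confidence']:.0%}: expected WER {expected_wer:.0%}, "
          f"measured {s['wer']:.0%}, overconfidence {gap:+.0%}")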

This graph shows that the technology is not yet mature. Even services created by large corporations are far from ideal. Therefore, when implementing a transcriptionist for audio processing, take possible errors into account and correct the results manually if 100% transcription accuracy is required.

What tasks can voice-to-text translation help solve?
Voice recognition is a universal technology that is used in many areas of business. We will consider only the priority areas.

Call analytics
Phrases like "All conversations are recorded to improve the quality of service" are often heard before you are connected to a call center operator. Today, it is still mostly people who listen to these recordings and compile reports.

AI-powered systems are great for simple tasks, such as determining the category of a call. They handle such tasks well even if the audio is noisy or the speaker's pronunciation is unclear.

Algorithm for using technology for call analytics
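
As an illustration of such a "simple task", a recognized transcript can be assigned to a category with nothing more than keyword matching; real systems use trained classifiers, but the idea is the same. A minimal sketch, with categories and keywords invented for the example:

# Hypothetical rule-based categorization of a call transcript produced by a speech-to-text engine.
CATEGORIES = {
    "billing":   ["invoice", "payment", "refund", "charge"],
    "technical": ["error", "not working", "crash", "freezes"],
    "sales":     ["price", "buy", "order", "discount"],
}

def categorize_call(transcript: str) -> str:
    """Return the category whose keywords occur most often in the transcript."""
    text = transcript.lower()
    scores = {name: sum(text.count(word) for word in words)
              for name, words in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(categorize_call("Hi, I was charged twice and need a refund for my last invoice."))
# -> billing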

Call center automation
Voice recognition algorithms are used in call centers to build automated answering systems. They understand the caller's speech and answer simple questions, and advanced bots can hold a dialogue like a real person. Such a system does not ask you to choose an option from a menu ("Press 1 to get help") but waits for a specific question.
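
A toy version of that idea: instead of a keypad menu, the recognized phrase is matched against a set of intents and the closest one supplies the reply. The intent names, keywords, and answers below are invented for the example:

# Toy answering machine: match the recognized phrase to an intent instead of a keypad menu.
INTENTS = {
    "opening_hours": (["open", "hours", "schedule"],  "We are open daily from 9 am to 8 pm."),
    "order_status":  (["order", "delivery", "track"], "Please tell me your order number."),
    "human_agent":   (["operator", "human", "agent"], "Connecting you to an operator."),
}

def answer(recognized_speech: str) -> str:
    """Pick the intent with the most keyword hits and return its scripted reply."""
    text = recognized_speech.lower()
    best_reply, best_hits = "Sorry, I did not catch that. Could you rephrase?", 0
    for keywords, reply in INTENTS.values():
        hits = sum(1 for kw in keywords if kw in text)
        if hits > best_hits:
            best_reply, best_hits = reply, hits
    return best_reply

print(answer("Hello, when are you open on weekends?"))   # -> "We are open daily from 9 am to 8 pm."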