Stabilizing Live Speech Translation in Google Translate - Deepstash

Curated from: ai.googleblog.com

Real Time Conversion Of Languages We Don't Understand

The transcription feature in the Google Translate app may be used to create a live, translated transcription for events like meetings and speeches, or for a story at the dinner table. In such settings, it is useful for the translated text to be displayed promptly to help keep the reader engaged.

In early versions of this feature, the translated text suffered from multiple real-time revisions. These were caused by the non-monotonic relationship between the source and the translated text, in which words at the end of the source sentence can influence words at the beginning of the translation.

A New Update

A new version of the Google Translate app significantly reduces translation revisions and improves the user experience. The research enabling this is presented in two papers. The first formulates an evaluation framework tailored to live translation and develops methods to reduce instability. The second demonstrates that these methods do very well compared to alternatives, while still retaining the simplicity of the original approach. The resulting model is much more stable and provides a noticeably improved reading experience within Google Translate.

Evaluating Live Translation: The Metrics

Erasure: Measures the additional reading burden on the user due to instability. It is the number of words that are erased and replaced for every word in the final translation.

Lag: Measures the average time that has passed between when a user utters a word and when the word’s translation displayed on the screen becomes stable. Requiring stability avoids rewarding systems that can only manage to be fast due to frequent corrections.

BLEU score: Measures the quality of the final translation. Quality differences in intermediate translations are captured by a combination of all metrics.
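
The erasure metric above can be illustrated with a small sketch. It assumes that each update replaces the text on screen and that only the words after the shared prefix of two consecutive updates count as erased; the function names and the sample updates are hypothetical and are not taken from the Translate app.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading words shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def normalized_erasure(updates: List[str]) -> float:
    """Words erased across all updates, per word of the final translation."""
    if not updates:
        return 0.0
    tokenized = [u.split() for u in updates]
    erased = 0
    for prev, curr in zip(tokenized, tokenized[1:]):
        # Everything after the shared prefix of the previous display is erased.
        erased += len(prev) - common_prefix_len(prev, curr)
    final_len = len(tokenized[-1])
    return erased / final_len if final_len else 0.0


# Hypothetical sequence of translations shown to the reader.
updates = [
    "I",
    "I am",
    "I was going",
    "I was going home",
]
print(normalized_erasure(updates))  # 0.25: one word ("am") erased, four final words
```

A system that only ever appends words scores 0 under this definition, which is the zero-erasure behaviour that streaming models aim for.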

The Performance Measure of Quality

It is important to recognize the inherent trade-offs between these different aspects of quality. Transcribe enables live translation by stacking machine translation on top of real-time automatic speech recognition. For each update to the recognized transcript, a fresh translation is generated in real time; several updates can occur each second. This approach placed Transcribe at one extreme of the three-dimensional quality framework: it exhibited minimal lag and the best quality, but also had high erasure. Understanding this allowed us to work towards finding a better balance.
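
The re-translation loop described above can be sketched in a few lines. This is only an illustration: `TOY_TABLE`, `toy_translate`, and `retranslate` are hypothetical names, and the lookup table stands in for a real machine translation model.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for a translation model: maps each German
# transcript prefix to an English translation of that prefix.
TOY_TABLE = {
    "ich": "I",
    "ich habe": "I have",
    "ich habe den Film": "I have the movie",
    "ich habe den Film gesehen": "I saw the movie",
}


def toy_translate(source: str) -> str:
    """Look up the translation of a transcript prefix in the toy table."""
    return TOY_TABLE[source]


def retranslate(transcripts: Iterable[str],
                translate: Callable[[str], str]) -> List[str]:
    """Translate every transcript update from scratch; return each display."""
    return [translate(t) for t in transcripts]


# Each speech recognition update triggers a fresh translation.
asr_updates = list(TOY_TABLE)
for shown in retranslate(asr_updates, toy_translate):
    print(shown)
```

In this toy run the last update changes "have" to "saw" once the verb at the end of the source arrives, which is the kind of revision that produces erasure.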

Stabilizing Re-translation

One straightforward solution to reduce erasure is to decrease the frequency with which translations are updated. Along this line, “streaming translation” models (for example, STACL and MILk) intelligently learn to recognize when sufficient source information has been received to extend the translation safely, so the translation never needs to be changed. In doing so, streaming translation models are able to achieve zero erasure.

Real-Time Streaming

In our paper, “Re-translation versus Streaming for Simultaneous Translation”, we show that our original “re-translation” approach to live translation can be fine-tuned to reduce erasure and achieve a more favourable erasure/lag/BLEU trade-off. Without training any specialized models, we applied a pair of inference-time heuristics to the original machine translation models — masking and biasing.

The End Game

The end of an ongoing translation tends to flicker because it is more likely to have dependencies on source words that have yet to arrive. We reduce this by truncating some number of words from the translation until the end of the source sentence has been observed. This masking process thus trades latency for stability, without affecting quality. This is very similar to delay-based strategies used in streaming methods such as Wait-k, but applied only during inference and not during training.
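
A minimal sketch of the masking idea follows, assuming a fixed mask of `k` words and a flag that says whether the end of the source sentence has been observed. The function name and parameters are hypothetical; the source does not state how the mask size is chosen.

```python
def mask_translation(translation: str, k: int, source_complete: bool) -> str:
    """Hide the last k words of a translation until the source is complete."""
    if k < 0:
        raise ValueError("k must be non-negative")
    words = translation.split()
    if source_complete or k == 0:
        # Nothing is hidden once the full source sentence has been seen.
        return " ".join(words)
    # Drop the unstable tail; an empty string results if k >= len(words).
    return " ".join(words[:-k])


print(mask_translation("I have the movie", k=2, source_complete=False))  # I have
print(mask_translation("I saw the movie", k=2, source_complete=True))    # I saw the movie
```

A larger `k` hides more of the unstable tail and so shows words later, which is the latency-for-stability trade described above; the final translation is unchanged.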

Zero-Flicker Streaming

The combination of masking and biasing produces a re-translation system with high quality and low latency, while virtually eliminating erasure. The table below shows how the metrics react to the heuristics we introduced and how they compare to the other systems discussed above. The graph demonstrates that even with a very small erasure budget, re-translation surpasses zero-flicker streaming translation systems (MILk and Wait-k) trained specifically for live translation.

The Bottom Line

The solution outlined above returns a decent translation very quickly, while allowing it to be revised as more of the source sentence is spoken. The simple structure of re-translation enables the application of our best speech and translation models with minimal effort. However, reducing erasure is just one part of the story — we are also looking forward to improving the overall speech translation experience through new technology that can reduce lag when the translation is spoken, or that can enable better transcriptions when multiple people are speaking.

IDEAS CURATED BY

jessicadelgado
