Google DeepMind Hails Breakthrough Advances In More Natural AI-Based Speech Generation

We've no doubt all listened to a PC "speak" to us at some point in our lives. Today's text-to-speech technology is clear enough that understanding the words is rarely an issue, though we're sure many would prefer voices that sounded less robotic and less obviously stitched together.

Google's DeepMind works on developing incredibly smart computers, and one of its projects involves recreating human speech as naturally as possible. To that end, DeepMind has developed its WaveNet model, which in blind listening tests closes the gap between the best existing text-to-speech systems and natural human speech by roughly 50%. It still falls short of real human speech (though not by much, in our opinion), but it's nonetheless a massive improvement. Given where technology is going, with ever-faster hardware, there's little doubt that this is just the beginning when it comes to improvements.

Google WaveNet Accuracy

What makes WaveNet different is that it's a neural network that models the raw audio waveform itself, generating speech one sample at a time. That sets it apart from existing solutions, which either stitch together segments of recorded human speech to form full words (concatenative synthesis) or push parameters through a vocoder to produce computer-generated sound that merely attempts to sound realistic (parametric synthesis).
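
To make that idea a bit more concrete, here's a minimal sketch of the core building block: a stack of dilated causal convolutions, where each output sample can only depend on earlier samples. This is illustrative PyTorch code with made-up layer sizes and names (TinyWaveNetSketch is ours, not DeepMind's), and it omits WaveNet's gated activations and skip connections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedLayer(nn.Module):
    """One dilated causal convolution: output at time t sees only inputs at times <= t."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.left_pad = dilation          # (kernel_size - 1) * dilation, with kernel_size=2
        self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pad(x, (self.left_pad, 0))  # pad the past only, keeping the layer causal
        return torch.relu(self.conv(x))

class TinyWaveNetSketch(nn.Module):
    """Toy stack of dilated causal convolutions (dilations 1, 2, 4, ...) predicting a
    distribution over quantized sample values (256 levels) at every time step."""
    def __init__(self, channels: int = 32, levels: int = 256, n_layers: int = 8):
        super().__init__()
        self.embed = nn.Conv1d(1, channels, kernel_size=1)
        self.layers = nn.ModuleList(
            [CausalDilatedLayer(channels, dilation=2 ** i) for i in range(n_layers)]
        )
        self.out = nn.Conv1d(channels, levels, kernel_size=1)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, time), values in [-1, 1]
        h = self.embed(waveform)
        for layer in self.layers:
            h = h + layer(h)              # residual connection
        return self.out(h)                # (batch, levels, time) logits, one per sample
```

At generation time a model like this is run autoregressively: predict the next sample, append it to the waveform, and repeat, which is a big part of why the results sound so natural and why the process is so slow.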

Another upside to WaveNet is that it can accurately recreate not only U.S. English but also Mandarin Chinese. There's no reason to doubt that in the future, many more languages will be supported, and that its quality will improve even further. There is a downside, though: the computation behind WaveNet is enormous, since the audio has to be built one sample at a time (on the order of 16,000 samples for every second of speech), so it's not going to be for consumer use quite yet. However, hardware continues to get faster, and the algorithms could become more efficient, so it might not take that long for us to see real-world benefits. In the meantime, be sure to hit up the URL below and check out the comparison examples provided by Google. Prepare to be impressed.
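
For the curious, here's some quick back-of-the-envelope arithmetic on why that sample-by-sample generation is so costly. The 16 kHz rate matches what DeepMind describes for its speech output; the durations are just illustrative.

```python
# Each generated audio sample requires (at least) one pass through the network,
# so the number of network evaluations scales directly with audio length.
SAMPLE_RATE_HZ = 16_000                      # 16 kHz speech output

for seconds in (1, 10, 60):
    evaluations = SAMPLE_RATE_HZ * seconds
    print(f"{seconds:>3} s of audio -> {evaluations:,} network evaluations")

#   1 s of audio -> 16,000 network evaluations
#  10 s of audio -> 160,000 network evaluations
#  60 s of audio -> 960,000 network evaluations
```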