Google’s DeepMind researchers have been at it again, and this time their world-changing technology comes in the form of WaveNet. WaveNet is a convolutional neural network that comes closer than ever to mimicking human speech in both US English and Mandarin Chinese. The network can also switch between different voices and create unique musical fragments.
This type of text-to-speech (TTS) system differs from those currently in use because it is trained on raw audio waveforms from multiple speakers. The trained network then generates audio one sample at a time: each synthesized sample is fed back into the network to predict the next one. One of DeepMind’s researchers comments, “As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.”
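The feedback loop described above can be sketched in a few lines. This is a toy illustration only: `predict_next` is a hypothetical stand-in for the real convolutional network (which outputs a distribution over 256 mu-law quantized amplitude levels), not DeepMind's implementation.

```python
import random

def predict_next(history, n_levels=256):
    # Placeholder "model": returns a pseudo-random quantized amplitude
    # loosely conditioned on the last sample. A real WaveNet would run
    # the full history through dilated convolutions and sample from the
    # resulting probability distribution over amplitude levels.
    last = history[-1] if history else 0
    return (last * 31 + random.randrange(n_levels)) % n_levels

def generate(n_samples, seed_sample=128):
    samples = [seed_sample]
    for _ in range(n_samples):
        # Feed everything generated so far back in to get the next sample.
        samples.append(predict_next(samples))
    return samples[1:]

audio = generate(16000)  # roughly one second of audio at 16 kHz
```

The loop makes the cost of the approach concrete: producing a single second of speech at 16 kHz means 16,000 sequential network evaluations, which is why the method is so compute-hungry.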
WaveNet can also learn the different characteristics of various voices, female and male, including breathing and mouth gestures. The researchers state, “To make sure [WaveNet] knew which voice to use for any given utterance, we conditioned the network on the identity of the speaker. Interestingly, we found that training on many speakers made it better at modeling a single speaker than training on that speaker alone, suggesting a form of transfer learning.”
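Conditioning on speaker identity can be pictured as feeding a speaker embedding into the predictor alongside the audio history, so the same network produces different voices from the same context. The sketch below is purely illustrative, with an assumed one-hot embedding and a made-up toy predictor; the names are not DeepMind's.

```python
def one_hot(index, size):
    # Minimal one-hot speaker embedding (illustrative assumption).
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def predict_next(history, speaker_id, n_speakers=4, n_levels=256):
    # Toy stand-in for a conditioned network: the speaker embedding is
    # mixed into the prediction, so identical histories yield different
    # samples for different speakers.
    embedding = one_hot(speaker_id, n_speakers)
    bias = int(sum(i * v for i, v in enumerate(embedding)) * 17)
    last = history[-1] if history else 0
    return (last * 31 + bias) % n_levels

same_history = [10, 20, 30]
print(predict_next(same_history, speaker_id=0))
print(predict_next(same_history, speaker_id=2))  # same context, different voice
```

The point of the design is that one shared network serves every voice, which is also what lets training data from many speakers improve the model of any single speaker.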
One area where WaveNet may struggle is availability: it will be limited to Google products, at least for the time being, because the approach requires so much data and computing power. To produce realistic speech, the network must generate around 16,000 audio samples per second, one after another. But WaveNet can still be applied to remodeling music audio and to speech recognition, and it is something we will see much more of soon.