AT&T Enables Voice-Controlled Apps for iPhone 3G

One of the biggest complaints iPhone users have is that the phone lacks voice dialing. How and when that key missing feature gets added is still open to speculation, but the engineers at AT&T Labs Research have at least been cooking up a way for developers to easily implement voice-controlled applications for "multimedia devices with broadband access," such as the iPhone 3G.

AT&T has developed a software framework that it calls the AT&T Watson Speech Mashup. Essentially, the Watson ASR (automatic speech recognition) engine is a Web-based service that developers can hook into their applications--ergo the "mashup." When someone uses one of these apps on a supported device and reaches a field that needs input, the user speaks the information instead of typing it. The spoken data is sent to the remote application server where the Watson ASR engine resides. The engine converts the speech to text and sends the text back to the app, which populates the input field with it.
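The round trip described above can be sketched in a few lines of Python. This is only an illustration of the flow, not AT&T's API: the function names, the audio placeholders, and the JSON reply with a "text" key are all assumptions, and the remote Watson ASR engine is replaced by a local stub that returns canned transcripts.

```python
import json

def fake_asr_service(audio: bytes) -> str:
    """Hypothetical stand-in for the remote recognizer.

    A real service would decode the audio; this stub just looks up
    a canned transcript and returns it as a JSON string.
    """
    canned = {
        b"audio:location": "Florham Park, New Jersey",
        b"audio:find": "Japanese restaurants",
    }
    return json.dumps({"text": canned.get(audio, "")})

def fill_field(form: dict, field: str, audio: bytes, recognize=fake_asr_service) -> dict:
    """Send audio to the recognizer and put the returned text in a form field."""
    reply = json.loads(recognize(audio))  # server converts speech to text
    form[field] = reply["text"]           # app populates the input field
    return form

form = {"location": "", "find": ""}
fill_field(form, "location", b"audio:location")
fill_field(form, "find", b"audio:find")
print(form)
```

Passing the recognizer in as a parameter keeps the app code the same whether the text comes from a stub, as here, or from a network call to a hosted service.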


Credit: AT&T

For instance, in this AT&T video demo (the link opens a video in a new window), the demonstrator uses the Yellowpages.com mobile Website to search for Japanese restaurants in Florham Park, NJ, by saying "Florham Park, New Jersey," when the Location field is selected and "Japanese restaurants" when the Find field is selected.

"This new capability provides network-hosted speech technologies for multimedia devices with broadband access (iPhone, BlackBerry, IPTV set-top box, SmartPhones, etc.) without the need to install, configure, and manage speech recognition software and equipment. This enables easy and rapid development of new speech and multimodal mobile services as well as new web-based services. The software implementation is based on well-established web programming models, such as SOA, REST, AJAX, JavaScript and JSON."


Credit: AT&T

Depending on what it costs developers to implement this technology and how easily it integrates into applications, this could well be an efficient means of adding voice-control functionality to applications. With the AT&T Watson ASR servers doing all the heavy lifting of speech-to-text conversion, the only possible issue we foresee is lag: the time it takes for the spoken information to be transmitted to the server, converted to text, and transmitted back to the device. This is presumably why AT&T envisions the technology for "devices with broadband access," where fast data transfer speeds can hopefully minimize latency.
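To see why connection speed matters here, the lag can be roughly modeled as upload time plus recognition time plus download time plus network round-trip delay. The sketch below is a back-of-the-envelope estimate only; every figure in it (clip size, link speeds, recognition time, round-trip delay) is an assumed value for illustration, not a measurement of AT&T's service.

```python
def round_trip_seconds(audio_bytes, reply_bytes, uplink_bps, downlink_bps,
                       recognition_s, network_rtt_s):
    """Estimate the delay between finishing speaking and seeing the text."""
    upload = audio_bytes * 8 / uplink_bps      # time to send the audio clip
    download = reply_bytes * 8 / downlink_bps  # time to receive the text reply
    return upload + recognition_s + download + network_rtt_s

# Assumed inputs: a 48,000-byte audio clip, a 200-byte text reply,
# 0.5 s of server-side recognition and 0.3 s of network round-trip delay.
fast = round_trip_seconds(48_000, 200, 200_000, 1_000_000, 0.5, 0.3)
slow = round_trip_seconds(48_000, 200, 50_000, 100_000, 0.5, 0.3)
print(f"faster link: {fast:.2f} s, slower link: {slow:.2f} s")
```

Under these assumptions the upload of the audio clip dominates the total, which is consistent with the emphasis on broadband access.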

We see no reason why developers couldn't create a voice-dialing application for the iPhone that uses the Watson ASR to dial phone numbers via voice commands. Perhaps this is how AT&T plans to bring voice dialing to the iPhone 3G?