Watch Microsoft’s VASA-1 AI Make The Mona Lisa Sing Like A Rap Star In Wild Demo
The fear of AI being used to make deepfakes of people may have just gotten a bit scarier. Microsoft’s newly announced VASA-1 model can not only produce lip movements synchronized with audio, but also capture a large spectrum of facial nuances and natural head motions that the company says contribute to the perception of authenticity and liveliness. Min Choi shared a video created with VASA-1 on X/Twitter of “Mona Lisa rapping Paparazzi.”
Microsoft just dropped VASA-1.
This AI can make single image sing and talk from audio reference expressively. Similar to EMO from Alibaba
10 wild examples:
1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD
— Min Choi (@minchoi) April 18, 2024
With great power comes great responsibility, and Microsoft says it understands this when it comes to VASA-1’s capabilities. The company acknowledges that the technology could be misused, but adds that “it is imperative to recognize the substantial positive potential” of its technique. The benefits Microsoft lists include enhancing educational equity, improving accessibility for individuals with communication challenges, and offering companionship or therapeutic support to those in need. Microsoft concludes that it is dedicated to developing AI responsibly, with the ultimate goal of advancing human well-being.
With all that said, the software giant says it has no plans to release an online demo, API, product, additional implementation details, or any related offerings for VASA-1 until it is certain that the technology will be used responsibly and in accordance with proper regulations. So… perhaps never?