Recently, NVIDIA Corp. joined the ranks of Meta Platforms Inc., OpenAI, and Runway AI Inc. by unveiling a generative artificial intelligence model capable of producing "new" music and audio from natural-language prompts.
The model, named Fugatto (short for Foundational Generative Audio Transformer Opus 1), stands out, according to NVIDIA, for its ability to alter human voices and create novel sounds that other models cannot produce.
NVIDIA is best known for the powerful GPUs that drive AI models rather than for the models themselves, and the company has not publicly released Fugatto, citing safety concerns.
NVIDIA highlighted that Fugatto differs from other music and audio generation models in its ability to absorb and modify existing sounds. For instance, it can listen to a passage played on a piano and transform that sound into human vocals or into notes from other instruments such as the violin. It can also take a recorded human voice and alter the accent and the emotional expression of the singing.
The claim that Fugatto produces entirely novel sounds may be somewhat misleading: like all AI models, it generates its responses to user prompts from existing training data. Even so, NVIDIA asserts that Fugatto can create unprecedented "soundscapes" by layering two different audio effects.
In a video released on YouTube, NVIDIA showcased Fugatto's capabilities, such as generating train sounds that gradually transition into orchestral performances or transforming joyful sounds into angry ones.
NVIDIA claims that such functionalities have not been observed in prior audio generation models. Additionally, beyond basic prompt engineering, Fugatto offers users more refined controls to edit the soundscapes they create.
Bryan Catanzaro, NVIDIA's Vice President of Applied Deep Learning Research, told Reuters that generative AI could reshape music production much as electronic synthesizers once did.
He stated, "Looking back at 50 years of synthetic audio, music today sounds different because of computers and synthesizers. Generative AI will bring new capabilities to music, to video games, and to those who simply want to create."
NVIDIA is not the first company to experiment with generative AI for music creation. Last month, Meta introduced a new model called Movie Gen, which can create both video and soundscapes for generated short films.
Regarding the data used to train Fugatto, NVIDIA disclosed little, saying only that it consists of "millions of audio samples" drawn from open-source data. Like Meta, the company confirmed that it currently has no plans to make Fugatto available to AI developers. Catanzaro said his team is still discussing how to release the model to the public safely.