Stability AI Launches New Sound Generation AI Tool


AI startup Stability AI has launched a new sound generator, Stable Audio Open, which was trained on royalty-free sounds.

Stability AI, the startup responsible for the AI-powered art generator Stable Diffusion, has released an open AI model for generating sounds and music. The model was purportedly trained exclusively on royalty-free recordings.

The generative model, Stable Audio Open, produces a recording up to 47 seconds long from a text description (e.g., “Rock beat played in a treated studio, session drumming on an acoustic kit”). The model was trained on approximately 486,000 samples from the Free Music Archive and Freesound, two free audio libraries.

According to Stability AI, the model can generate drum beats, instrument riffs, ambient noises, and “production elements” for videos, films, and TV programs. It can also be used to “edit” existing songs or to apply the style of one song (e.g., smooth jazz) to another.

In a post on its corporate blog, Stability AI stated that a significant benefit of this open release is that users can fine-tune the model on their own audio data. “For instance, a drummer could generate new beats by fine-tuning on samples of their drum recordings.”

Nevertheless, Stable Audio Open has its limitations. It cannot produce complete songs, melodies, or vocals, at least not high-quality ones. Stability AI says the model is not optimized for this purpose and recommends that users seeking those capabilities opt for the company’s premium Stable Audio service.

Its terms of service also prohibit commercial use of Stable Audio Open. In addition, the model performs less well when prompted in languages other than English or when asked for music from different genres and cultures, a bias that Stability AI attributes to the training data.

In a model description, Stability AI notes that the data source potentially lacks diversity and that not all cultures are equally represented in the dataset. “The biases present in the training data will be reflected in the samples generated by the model.”

Stability AI, which has long struggled to revitalize its faltering business, recently faced controversy when its VP of generative audio, Ed Newton-Rex, resigned because he disagreed with the company’s stance that training generative AI models on copyrighted works constitutes “fair use.” Stable Audio Open appears to be an attempt to change that narrative while not so subtly advertising Stability AI’s paid products.

Edwin Aboyi

11 months ago