Stability AI debuts Stable Audio, bringing text-to-audio generation to the masses

Summary

Stability AI has released Stable Audio, a new AI technology that lets users generate short audio clips from text prompts. It is built on the same core diffusion techniques the company uses for image generation in Stable Diffusion, and was trained on over 800,000 pieces of licensed music from the audio library AudioSparx. The model has 1.2 billion parameters and works directly with raw audio samples rather than MIDI or other symbolic representations. Stable Audio is available for free with limited generation options, or in a Pro plan for $12/month.

Q&As

What is Stability AI's new technology called?
Stability AI's new technology is called Stable Audio.

What techniques does Stable Audio use to generate audio clips?
Stable Audio uses a diffusion model trained on audio to generate new audio clips.

How does Stable Audio differ from symbolic generation techniques?
Stable Audio works directly with raw audio samples for higher quality output, whereas symbolic generation techniques commonly work with MIDI files.
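
To make the distinction concrete, here is a minimal sketch contrasting a symbolic note event with the raw samples needed to represent the same sound. It does not show how Stable Audio works internally. The 44,100 Hz sample rate, the field names in `symbolic_note`, and the sine-wave rendering are illustrative assumptions that do not come from the article.

```python
import math

# Assumed sample rate (CD quality); the summary does not state one.
SAMPLE_RATE = 44_100

# Symbolic (MIDI-like) representation: a single event describes a whole note.
# The field names are hypothetical, not an actual MIDI file layout.
symbolic_note = {"pitch": 69, "start_s": 0.0, "duration_s": 1.0, "velocity": 100}


def midi_pitch_to_hz(pitch: int) -> float:
    """Convert a MIDI note number to a frequency (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


def render_sine(note: dict, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Render the note as raw audio samples of a plain sine wave."""
    freq = midi_pitch_to_hz(note["pitch"])
    n_samples = int(note["duration_s"] * sample_rate)
    amplitude = note["velocity"] / 127  # scale MIDI velocity into 0..1
    return [
        amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
        for i in range(n_samples)
    ]


raw_samples = render_sine(symbolic_note)
print(len(symbolic_note), "fields vs", len(raw_samples), "samples")
```

One second of a single note takes four fields in symbolic form and tens of thousands of values as raw audio. That gap is why a model generating raw audio has far more to predict, and also why it can capture timbre and texture that note data alone cannot express.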

What data was used to train the Stable Audio model?
The Stable Audio model was trained on over 800,000 pieces of licensed music from the audio library AudioSparx.

What are the features of the free and Pro plans for Stable Audio?
The free version of Stable Audio allows 20 generations per month, with tracks of up to 20 seconds each, while the Pro version raises these limits to 500 generations per month and tracks of up to 90 seconds.
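
As a rough illustration of what these limits mean in practice, the sketch below computes the maximum audio each plan could produce in a month, assuming every generation runs to the full track length. The `Plan` class and its field names are hypothetical; only the generation counts and track lengths come from the answer above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical container for the plan limits quoted above."""
    name: str
    generations_per_month: int
    max_track_seconds: int

    def max_audio_seconds(self) -> int:
        # Upper bound: every generation uses the full track length.
        return self.generations_per_month * self.max_track_seconds


FREE = Plan("Free", generations_per_month=20, max_track_seconds=20)
PRO = Plan("Pro", generations_per_month=500, max_track_seconds=90)

for plan in (FREE, PRO):
    total = plan.max_audio_seconds()
    print(f"{plan.name}: up to {total} s (~{total / 60:.1f} min) per month")
```

Under that assumption, the free plan tops out at 400 seconds of audio per month and the Pro plan at 45,000 seconds (750 minutes).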

AI Comments

👍 I'm excited to see how Stability AI is making text-to-audio generation technology accessible to everyone with Stable Audio. It's great to see how the company has expanded its capabilities beyond image and code generation.

👎 I'm disappointed that Stable Audio can't generate music in the style of specific artists such as The Beatles. It would be great to have that functionality available.

AI Discussion

Me: It's about an AI technology called Stable Audio that can generate audio clips from simple text prompts. Stability AI, the organization behind the technology, just released it to the public. It's based on the same AI techniques used in Stable Diffusion, which is their text-to-image generation AI technology.

Friend: That's really interesting! It sounds like this could have a lot of applications for musicians and audio production.

Me: Absolutely. It could be a great tool to quickly create audio samples for music production. It also has the potential to make audio generation more accessible to people who don't have a lot of experience with music production. Stability AI is releasing a prompt guide to help users with text prompts that will lead to the types of audio files they want to generate. They're also offering both a free version and a Pro version of the technology.

Action items

Explore the on-demand library from VB Transform 2023 to learn more about Stability AI's Stable Audio technology.
Register for the free version of Stable Audio to experiment with text-to-audio generation.
Read the Stable Audio prompt guide to learn how to use the right prompts for audio generation.

Technical terms

Stability AI: The AI company behind Stable Diffusion (image generation), StableCode (code generation), and Stable Audio (audio generation).
Stable Audio: A technology developed by Stability AI that enables users to generate short audio clips from simple text prompts.
Stable Diffusion: A text-to-image generation AI technology developed by Stability AI.
SDXL: A new Stable Diffusion base model for improved image composition, released by Stability AI in July.
StableCode: A technology developed by Stability AI that enables users to generate code from text prompts.
Diffusion Model: A type of AI model used by Stable Audio to generate audio clips.
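MIDI (Musical Instrument Digital Interface): A standard and file format that represents music as note events (such as pitch, timing, and velocity) rather than as recorded sound.
Symbolic Generation: A technique that generates music as symbolic note data, commonly MIDI files, rather than as raw audio.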
Harmonai: A research lab created by Zach Evans at Stability AI for music generation.
AudioSparx: An audio library whose licensed music was used to train the Stable Audio model.
Contrastive Language Audio Pretraining (CLAP): A technique for learning a shared representation of text and audio, used by the Stable Audio text model.