
Try ‘Riffusion,’ an AI model that composes music by visualizing it


Summary

Riffusion is an AI model that creates music by visualizing it. It is built on Stable Diffusion, an image-generation model that works by gradually replacing visual noise with what it predicts a prompt ought to look like. Riffusion's creators fine-tuned Stable Diffusion on spectrograms, which are visual representations of audio, by feeding it a collection of spectrogram images paired with relevant tags such as "blues guitar" and "jazz piano." Once trained, the model produces spectrograms that closely match its prompts, and those images can then be converted back into audio. The model can also transition between two prompts, such as "church bells" and "electronic beats," in a surprisingly natural way. Riffusion is a demo project rather than an attempt to reinvent music, but its creators are excited to see people engaging with it and building on top of its code.
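
Spectrograms are what let an image model work with sound. The sketch below (plain Python, no audio libraries) shows the general idea of turning audio into a spectrogram image: slice the signal into overlapping windowed frames, take each frame's frequency magnitudes with a discrete Fourier transform, convert to decibels, and scale to 8-bit pixels. It is an illustration under stated assumptions, not Riffusion's code: the frame size, hop length, and -80 dB floor are hypothetical choices, and a real pipeline would use an FFT library and typically a mel frequency scale. The image keeps only magnitudes, so turning a generated spectrogram back into audio requires estimating the discarded phase, for example with the Griffin-Lim algorithm.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Hann window of length n; tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_magnitude(signal: list[float], n_fft: int = 64, hop: int = 32) -> list[list[float]]:
    """Return |STFT| as a list of frames, each with n_fft // 2 + 1 frequency bins.

    n_fft and hop are hypothetical illustration values, not Riffusion's settings.
    """
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(n_fft)]
        # Naive DFT over the non-negative frequency bins (slow, but fine for a demo).
        bins = []
        for k in range(n_fft // 2 + 1):
            acc = sum(chunk[i] * cmath.exp(-2j * math.pi * k * i / n_fft) for i in range(n_fft))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def to_image(frames: list[list[float]], floor_db: float = -80.0) -> list[list[int]]:
    """Map magnitudes to 0-255 pixels: rows are frequency (highest at top), columns are time."""
    peak = max(max(f) for f in frames) or 1.0
    n_bins = len(frames[0])
    image = []
    for k in reversed(range(n_bins)):  # high frequencies go on the top row
        row = []
        for frame in frames:
            db = 20 * math.log10(max(frame[k] / peak, 1e-12))  # dB relative to the loudest bin
            db = max(db, floor_db)  # clip very quiet bins to the floor
            row.append(round(255 * (db - floor_db) / -floor_db))
        image.append(row)
    return image


# Usage: 0.1 s of a 1 kHz sine sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]
image = to_image(stft_magnitude(tone))
print(len(image), "rows x", len(image[0]), "columns")  # 33 rows x 24 columns
```

Each column of `image` is one moment in time and each row is one frequency band; for this pure tone, a single bright horizontal line appears at the 1 kHz row.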

Q&As

What is Riffusion?
Riffusion is an AI model that composes music by visualizing it.

How does Riffusion generate music?
Riffusion fine-tunes Stable Diffusion on spectrograms, generates new spectrogram images from text prompts, and converts those images into audio.

What is Stable Diffusion?
Stable Diffusion is an open-source machine learning model for generating images from text prompts; it supercharged the AI world over the last year.
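
The "gradually replacing visual noise" idea can be made concrete. Diffusion models are trained on a fixed forward process that mixes clean data x0 with Gaussian noise eps, x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, and a network learns to predict eps so that generation can run the process in reverse, starting from pure noise. The minimal sketch below implements only that forward step, assuming the linear schedule from the original DDPM formulation (1,000 steps, beta rising from 0.0001 to 0.02). Stable Diffusion uses its own schedule and adds noise in a compressed latent space rather than to pixels, and the helper names here are hypothetical.

```python
import math
import random


def alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed values)."""
    result, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        result.append(prod)
    return result


def add_noise(x0: list[float], t: int, abar: list[float], rng: random.Random) -> tuple[list[float], list[float]]:
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1).

    Returns (x_t, eps); a denoising network would be trained to predict eps from x_t and t.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps


# Usage: noise a tiny stand-in "image" lightly and then almost completely.
abar = alpha_bars()
rng = random.Random(0)
clean = [1.0, -1.0, 0.5, 0.0]
slightly_noisy, _ = add_noise(clean, 10, abar, rng)
mostly_noise, _ = add_noise(clean, 999, abar, rng)
```

At t = 10 the sample is still close to `clean`; at t = 999 almost nothing of it remains, since sqrt(abar[999]) is below 0.01.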

How is Riffusion different from other AI music generators?
Most other AI music generators rely on speech synthesis models or specially trained audio models, whereas Riffusion repurposes an image model and takes advantage of its latent space to create music that fades smoothly from one prompt to another.
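
A common way to walk between two points in a diffusion model's latent space, for example the starting noise for two different prompts, is spherical linear interpolation (slerp). The sketch below is a generic slerp in plain Python; that Riffusion uses exactly this function is an assumption, and the two 2-D vectors in the usage lines are hypothetical stand-ins for real latents, which have thousands of dimensions.

```python
import math


def slerp(t: float, v0: list[float], v1: list[float], dot_threshold: float = 0.9995) -> list[float]:
    """Spherical linear interpolation between two nonzero vectors, for t in [0, 1].

    Falls back to plain linear interpolation when the vectors are nearly parallel
    or anti-parallel, where sin(theta) is close to zero and slerp is unstable.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    if abs(cos_theta) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(cos_theta)  # angle between the two vectors
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


# Usage: five evenly spaced steps between two hypothetical prompt latents.
start = [1.0, 0.0]  # hypothetical latent for "church bells"
end = [0.0, 1.0]    # hypothetical latent for "electronic beats"
path = [slerp(i / 4, start, end) for i in range(5)]
```

Slerp keeps the midpoint at the same length as the endpoints (the t = 0.5 point here has norm 1, where a straight line would give about 0.71). That matters for diffusion latents because high-dimensional Gaussian noise vectors all have nearly the same norm, so a shrunken midpoint would look unlike the noise the model was trained to denoise.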

What resources are available to test and use Riffusion?
A live demo of Riffusion is available at Riffusion.com, and the code is available via the about page.

AI Comments

👍 This article does a great job of exploring Riffusion, a fascinating new process for generating music with AI, and its potential implications. It is thorough and provides a range of examples of the spectrograms and audio that the process produces.

👎 While the article does a great job of explaining the process of Riffusion, it fails to explore the potential ethical implications of using AI to generate music.

AI Discussion

Me: It's about a new AI model called Riffusion that composes music by visualizing it. It uses a machine learning model called Stable Diffusion to generate images of spectrograms and then converts those images to sound. Because it was trained on spectrograms, it can "understand" what different kinds of audio look like and create new compositions. It's kind of like a mashup of different genres.

Friend: That's really interesting! How far along is this technology?

Me: It's still a hobby project, but it has already created some pretty cool compositions. They want it to generate longer clips, but they haven't worked out how to do that yet. It could be a great tool for musicians, but it could also be problematic, since AI-generated music could take away from traditional artistry.

Action items

Experiment with the Riffusion live demo to create your own AI-generated music.
Read up on other AI-generated music projects, such as Dance Diffusion, to learn more about the possibilities of AI-generated music.
Try to replicate the Riffusion project by using the code available on the about page and experimenting with different spectrograms.

Technical terms

AI-generated music: Music created by artificial intelligence.
Riffusion: An AI model that composes music by visualizing it.
DALL-E 2: OpenAI's machine learning model for generating images from text prompts.
Stable Diffusion: An open-source machine learning model for generating images from text prompts that supercharged the AI world over the last year.
Fine-tuning: Further training an already mostly trained model on a large amount of one specific kind of content so that it specializes in producing more examples of that content.
Spectrograms: Visual representations of audio that show the amplitude of different frequencies over time.
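Latent Space: The internal representation space of a machine learning model, including the "no-man’s-land" between its more well-defined nodes; moving through it is what lets Riffusion fade from one prompt to another.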
Dance Diffusion: A specially trained audio model for creating AI-generated music.