AI Engineering Workshop

AI Music Generation: From Prompt to Production

A comprehensive hands-on workshop exploring AI music generation tools, voice cloning, stem separation, and the legal battles reshaping the music industry. Learn practical workflows for producing professional-quality music with Udio, Suno, Stable Audio, and RVC.

"That exact song that you just generated into this digital audio workstation and give you stem by stem."

Phlo Young, demonstrating stem separation with the Wave tool (00:51:36)

  • 54 min: Hands-on workshop
  • 3 Types: Text-to-audio, audio-to-audio, voice conversion
  • RIAA: Lawsuits against Suno & Udio

The AI Music Generation Landscape

AI music generation has evolved rapidly, with tools now capable of producing studio-quality tracks from simple text prompts. Young categorizes the landscape into three primary types of tools, each serving different creative workflows.

Text-to-Audio

Generate full songs with lyrics from text descriptions. The most accessible entry point for AI music creation.

Tools: Udio, Suno

Audio-to-Audio

Transform simple audio inputs (whistling, beatboxing) into full musical arrangements with instruments and structure.

Tools: Stable Audio

Voice Conversion

Clone and transform voices using as little as 6 seconds of audio. Enables artists to "sing" in any voice.

Tools: RVC

The Workflow

The most powerful approach chains several tools: use GPT-4 to write lyrics → generate music with Udio/Suno → apply voice conversion with RVC → separate stems for post-production. This hybrid workflow lets creators produce professional tracks without a traditional recording studio.

Live Demo: Creating Music in Real-Time

Young demonstrates the power of AI music generation by creating original songs live based on audience suggestions, including a birthday song for an attendee named Patrick and an 'AI Engineer World's Fair' themed track.

Udio is described as a collaborative partner that helps you craft songs. It generates 30-second clips that can be extended into full tracks with coherent structure, verses, and choruses.

✅ Text-to-music with lyrics

✅ Genre and mood control

✅ Extendable clips

✅ Coherent song structure

Suno is positioned as an in-house music producer. It offers capabilities similar to Udio's, with different model strengths and prompt behaviors.

✅ Full song generation

✅ Vocal and instrumental modes

✅ Style transfer capabilities

✅ Multiple output versions

Practical Tip: Use Rate Your Music for Better Prompts

Young shares a professional technique for getting better results: Use "Rate Your Music" (rateyourmusic.com) to find specific genre labels, then incorporate those into your prompts to reverse-engineer specific sounds.

Example Workflow:

  1. Search Rate Your Music for your target genre
  2. Copy the genre labels and descriptors
  3. Paste into Udio/Suno as style prompts
  4. Get authentic, genre-accurate results
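
The copy-and-paste steps above can be sketched as a small helper that turns collected genre labels and descriptors into one style prompt. This is only an illustration: `build_style_prompt`, the `max_chars` limit, and the example labels are assumptions rather than anything shown in the workshop, and the actual length limit depends on the tool you paste into.

```python
def build_style_prompt(genres, descriptors, max_chars=200):
    """Join genre labels and descriptors into one comma-separated style prompt.

    max_chars is a hypothetical limit; check the tool you are using.
    """
    seen = set()
    parts = []
    for label in list(genres) + list(descriptors):
        # Normalize whitespace and case so duplicates are detected reliably.
        cleaned = " ".join(label.split()).lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            parts.append(cleaned)

    prompt = ""
    for part in parts:
        candidate = part if not prompt else f"{prompt}, {part}"
        if len(candidate) > max_chars:
            break  # stop before exceeding the assumed limit
        prompt = candidate
    return prompt


# Example labels chosen for illustration only.
style = build_style_prompt(
    genres=["Synthwave", "Italo-Disco"],
    descriptors=["nocturnal", "melancholic", "Synthwave"],
)
print(style)  # synthwave, italo-disco, nocturnal, melancholic
```

Deduplicating and lowercasing keeps the prompt compact when the same label appears as both a genre and a descriptor.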

Voice Conversion: When AI Brings Voices Back

One of the most striking demonstrations involved AI-generated vocals mimicking famous artists such as Kanye West and Drake. The most emotional example was country artist Randy Travis, who lost his voice after a stroke but was able to "sing" again through AI voice conversion.

"Behavior that we didn't expect and it just works, which is really fascinating."

Speaker's reflection (00:06:16)


Technical Requirements

RVC (Retrieval-based Voice Conversion) requires minimal training data:

  • Minimum: 6 seconds of voice sample
  • Recommended: 10-30 seconds for better quality
  • Hardware: Decent GPU for local training
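
Before training, it can help to check a voice sample against the durations listed above. The sketch below uses Python's standard `wave` module and assumes the sample is an uncompressed WAV file. The function names and category labels are illustrative and are not part of RVC.

```python
import wave

MIN_SECONDS = 6.0            # minimum cited in the workshop
RECOMMENDED = (10.0, 30.0)   # recommended range cited in the workshop


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def classify_sample(duration):
    """Label a duration against the thresholds above (labels are illustrative)."""
    if duration < MIN_SECONDS:
        return "too short"
    if duration < RECOMMENDED[0]:
        return "usable"
    if duration <= RECOMMENDED[1]:
        return "recommended"
    return "above recommended range"


print(classify_sample(8.0))   # usable
print(classify_sample(20.0))  # recommended
```

For a real file, `classify_sample(wav_duration_seconds("my_voice.wav"))` combines the two steps, where `my_voice.wav` is a placeholder path.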

Voice Blending Technique

RVC can combine multiple voice sources, an approach presented as a way to reduce copyright risk:

  • Example: 6 seconds your voice + 6 seconds Eminem
  • Alternative: Use TTS robot voices as base
  • Benefit: Avoids direct cloning of specific artists
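
The source does not say how the two samples are combined. One simple reading, assumed here, is to concatenate the first 6 seconds of each recording into a single reference clip before training. The sketch below does that with Python's standard `wave` module for uncompressed WAV files. `read_head` and `blend_reference_clip` are hypothetical helpers, not RVC functions.

```python
import wave


def read_head(path, seconds):
    """Return (params, frames) for the first `seconds` of a WAV file.

    If the file is shorter than requested, all of its frames are returned.
    """
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(int(seconds * src.getframerate()))
    return params, frames


def blend_reference_clip(path_a, path_b, out_path, seconds_each=6.0):
    """Concatenate the heads of two WAV files into one clip (assumed approach)."""
    params_a, frames_a = read_head(path_a, seconds_each)
    params_b, frames_b = read_head(path_b, seconds_each)

    format_a = (params_a.nchannels, params_a.sampwidth, params_a.framerate)
    format_b = (params_b.nchannels, params_b.sampwidth, params_b.framerate)
    if format_a != format_b:
        raise ValueError("clips must share channels, sample width, and sample rate")

    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(params_a.nchannels)
        dst.setsampwidth(params_a.sampwidth)
        dst.setframerate(params_a.framerate)
        dst.writeframes(frames_a + frames_b)
```

A call such as `blend_reference_clip("mine.wav", "other.wav", "blend.wav")`, with placeholder file names, would produce a 12-second clip when both inputs are at least 6 seconds long.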

Important Warning

Ethical Consideration

When asked about ethical AI music use, Young responded: "I'mma plead the fifth because I'm not a lawyer" (00:47:25). The legal landscape around voice cloning and copyright remains unclear, with creators navigating gray areas between inspiration and infringement.

Stem Separation: The Post-Production Revolution

One of the most powerful capabilities demonstrated was stem separation: isolating individual instrument tracks and vocals from mixed audio. This enables creators to edit, remix, and master AI-generated music with professional-level control.
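
To make the idea of stems concrete, the toy example below treats each stem as a short list of integer samples and shows that the full mix is their sample-by-sample sum. It does not perform separation. Real tools estimate the stems from the mix, which is the hard part. The sample values and the `mix` helper are invented for illustration.

```python
def mix(stems):
    """Sum equally long stems sample by sample to form a mix."""
    tracks = list(stems.values())
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all stems must have the same length")
    return [sum(track[i] for track in tracks) for i in range(length)]


# Made-up sample values standing in for real audio.
stems = {
    "vocals": [100, 200, 0, -100],
    "drums": [50, -50, 50, -50],
    "bass": [10, 10, -10, -10],
}

full_mix = mix(stems)
instrumental = mix({name: t for name, t in stems.items() if name != "vocals"})

print(full_mix)      # [160, 160, 40, -160]
print(instrumental)  # [60, -40, 40, -60]
```

Once stems are available, dropping or rebalancing one of them before summing is what makes remixing and instrumental versions possible.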

Wave is an online digital audio workstation (DAW) with free accounts currently available. It provides stem separation directly in the browser.

Online DAW
Free Accounts
Browser-Based

Generation Time: 30-90 seconds for stem extraction

UVR5 is an open-source stem separation tool for local processing. It works on any song, not just AI-generated music.

Open Source
Local Processing
Universal

"literally you can get and that's not just a AI generated music any song you can put into uvr5" (00:53:51)

The Complete Production Workflow

  1. Generate Lyrics: Use GPT-4 with custom prompts for coherent songwriting
  2. Create Music: Generate songs with Udio or Suno from text prompts
  3. Voice Convert: Apply RVC for voice transformation if needed
  4. Separate Stems: Extract individual tracks with Wave or UVR5
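
The four steps can also be written down as data, which makes it easy to see which stage is optional and which tools are interchangeable. This sketch only describes the workflow. It does not call any of the tools, and `Stage`, `WORKFLOW`, and `plan` are names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    tools: tuple
    optional: bool = False


# Stages and tools as listed in the workflow; voice conversion is "if needed".
WORKFLOW = (
    Stage("Generate Lyrics", ("GPT-4",)),
    Stage("Create Music", ("Udio", "Suno")),
    Stage("Voice Convert", ("RVC",), optional=True),
    Stage("Separate Stems", ("Wave", "UVR5")),
)


def plan(workflow, include_optional=True):
    """Return numbered step descriptions, optionally skipping optional stages."""
    steps = [s for s in workflow if include_optional or not s.optional]
    return [
        f"{i}. {s.name} ({' or '.join(s.tools)})"
        for i, s in enumerate(steps, start=1)
    ]


for line in plan(WORKFLOW, include_optional=False):
    print(line)
# 1. Generate Lyrics (GPT-4)
# 2. Create Music (Udio or Suno)
# 3. Separate Stems (Wave or UVR5)
```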

The Legal Battle: RIAA vs. AI Music

The rapid advancement of AI music generation has collided with copyright law. The RIAA (Recording Industry Association of America), acting on behalf of major record labels, has brought lawsuits against both Suno and Udio for copyright infringement, raising critical questions about training data, artist rights, and the future of music creation.

The Core Legal Questions

Training Data Copyright

Do AI models have the right to train on copyrighted music without permission or compensation? This is the central question in the RIAA lawsuits.

Artist Style vs. Expression

Copyright protects specific works, not an artist's style. But AI models can now mimic artists so convincingly that the distinction is blurring.

Transformative Use Defense

AI companies argue their models create transformative works. The RIAA argues this is mass copyright infringement disguised as innovation.

Practical Impact on Creators

  • ⚠️ Artist Name Restrictions: Platforms now block prompts containing specific artist names
  • ⚠️ Content Moderation: AI-generated songs may be flagged (like the Randy Travis example)
  • ⚠️ Uncertain Liability: Legal exposure for creators using AI tools remains unclear
  • ⚠️ Platform Terms: Always check terms of service before commercial use
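
Because prompts containing artist names may be blocked, a creator might pre-check a prompt before submitting it. The sketch below is a hypothetical local check against a list you maintain yourself. It says nothing about how Udio or Suno actually filter prompts, and `find_artist_names` and `BLOCKLIST` are illustrative names.

```python
import re


def find_artist_names(prompt, artist_names):
    """Return the names from artist_names that appear as whole words in prompt."""
    hits = []
    for name in artist_names:
        # Lookarounds prevent matching a name embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            hits.append(name)
    return hits


# Artists mentioned in the workshop, used here only as sample entries.
BLOCKLIST = ["Drake", "Kanye West"]

print(find_artist_names("moody trap beat in the style of drake", BLOCKLIST))
# ['Drake']
print(find_artist_names("a mallard and a drakelike duck", BLOCKLIST))
# []
```

A non-empty result is a cue to replace the name with genre labels and descriptors instead.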

Key Takeaways for AI Engineers

1. AI Music Quality is Production-Ready

  • Tools like Udio and Suno can generate studio-quality tracks
  • The Randy Travis example suggests AI can help create emotionally authentic music
  • Coherent structure, lyrics, and instrumentation

2. Prompt Engineering Matters

  • Use 'Rate Your Music' labels to reverse-engineer specific sounds
  • Combine with GPT-4 for lyric writing
  • Get significantly better results with genre-specific prompts

3. Stem Separation Enables Post-Production

  • Tools like Wave (free accounts) and UVR5 (open-source)
  • Isolate instruments and vocals for professional editing
  • Enable remixing and mastering capabilities

4. Legal Uncertainty Remains

  • RIAA lawsuits against Suno and Udio highlight unresolved copyright questions
  • Avoid using specific artist names in prompts
  • Stay updated on legal developments

5. Voice Cloning Requires Minimal Data

  • RVC can train voice models with as little as 6 seconds of audio
  • Consider blending voices for copyright safety
  • Technical capability accessible to hobbyists

6. Hybrid Workflows Win

  • GPT-4 for lyrics → Udio/Suno for generation → RVC for voice → Wave/UVR5 for stems
  • The best results combine multiple tools
  • Professional music without traditional recording studios

Tools and Resources

Music Generation

  • Udio: Text-to-music generation platform
  • Suno: AI music producer with full song capabilities
  • Stable Audio: Audio-to-audio transformation by Stability AI
  • ElevenLabs: Text-to-speech and music generation

Voice & Post-Production

  • RVC: Retrieval-based Voice Conversion toolkit
  • Wave: Online DAW with stem separation (wavtool.com)
  • UVR5: Open-source stem separation tool
  • Rate Your Music: Genre reference for prompts (rateyourmusic.com)

Speaker Resources

Young provides additional resources, including GPT-4 prompt templates and music model prompts, at:

ai-talk.com/music

Click on the notes section for comprehensive prompts and templates.

Source Video

Research Note: All quotes in this report are timestamped and link to exact moments in the video for validation. The analysis is based on the full transcript (10,616 lines), processed by dedicated agents for transcript analysis and insight extraction.

Technologies Mentioned: Udio, Suno, Stable Audio, RVC, Wave, UVR5, GPT-4, Rate Your Music

Research sourced from the AI Engineer World's Fair transcript. The analysis covers text-to-music generation, voice conversion, stem separation, and the legal landscape. All quotes were verified against the original VTT file.