---
base_model:
- stabilityai/stable-audio-open-1.0
tags:
- music-generation
- trap
- rap
- hip-hop
- beat-generation
- fine-tuning
- music-tagging
---
As a music and AI enthusiast, I wanted to dive into music generation technologies.
I started by exploring existing music generation models such as Suno and Stable Audio 2.0, but none of them generated modern trap/rap/R&B beats well. That gave me the idea of fine-tuning an open-source model on a large set of trap beats. I chose Stable Audio Open 1.0, as I found it to be the most suitable open-source foundation for this kind of task.
# Results

[**Here**](https://github.com/Gab404/Stable-BeaT) is the GitHub repository for model inference.

All of the following results were generated with 200 steps, a CFG scale of 7, a start time of 0 s and a duration of 47 s.

---

### Prompt 1

*A dark and melancholic cloud trap beat, with nostalgic piano, plucked bass and synth bells, at 110 BPM.*

| Stable Audio Open 1.0 | StableBeaT |
|:--|:--|
| | |

| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|:--|:--|:--|:--|:--|:--|
| **106.13** | **1159.43** | **0.000091** | **0.460** | **0.000073** | **0.489** |

---

### Prompt 2

*A laid back lo-fi jazz rap at 85 BPM, featuring deep sub, plucked bass, and vocal chop, with chill and jazzy relaxed moods.*

| Stable Audio Open 1.0 | StableBeaT |
|:--|:--|
| | |

| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|:--|:--|:--|:--|:--|:--|
| **82.72** | **784.82** | **0.000030** | **0.457** | **0.000015** | **0.429** |

---

### Prompt 3

*Melancholic trap beat at 105 BPM with shimmering synth bells and deep sub bass, minor chord progressions on piano, and airy vocal pads, evoking a cinematic and emotional atmosphere.*

| Stable Audio Open 1.0 | StableBeaT |
|:--|:--|
| | |

| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|:--|:--|:--|:--|:--|:--|
| **100.45** | **2540.28** | **0.000284** | **1.412** | **0.0000585** | **0.523** |

---

### Prompt 4

*A jazzy chillhop beat at 101 BPM featuring synth bells, vocal pad, and movie sample, evoking trap nostalgic and chill moods.*

| Stable Audio Open 1.0 | StableBeaT |
|:--|:--|
| | |

| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|:--|:--|:--|:--|:--|:--|
| **148.02** | **4287.26** | **0.00179** | **2.963** | **0.000195** | **0.552** |

---

### Prompt 5

*Smooth and
seductive at 115 BPM trap beat with electric guitar riffs, plucked bass, vocal adlibs, and warm synth pads. Relaxed, romantic, and sexy mood.*

| Stable Audio Open 1.0 | StableBeaT |
|:--|:--|
| | |

| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|:--|:--|:--|:--|:--|:--|
| **82.72** | **1056.42** | **0.000046** | **0.645** | **0.000089** | **0.478** |

---

### Prompt 6

*A moody cloud trap beat, boomy bass, synth bells and melodic piano, evoking etherate mood at 100 BPM.*

| Stable Audio Open 1.0 | StableBeaT |
|:--|:--|
| | |

| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|:--|:--|:--|:--|:--|:--|
| **144.2** | **2458.5** | **0.000356** | **0.738** | **0.00206** | **0.363** |

---

### Prompt 7

*A smooth neo-soul R&B instrumental at 90 BPM in D major, featuring live bass, soft Rhodes keys, and warm analog drum grooves.*

| Stable Audio Open 1.0 | StableBeaT |
|:--|:--|
| | |

| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|:--|:--|:--|:--|:--|:--|
| **130.81** | **1000.87** | **0.000166** | **0.679** | **0.000007288** | **0.250** |

---

# Dataset

I used 20,000 trap/rap beats spanning various subgenres such as cloud, trap, R&B, EDM, industrial hip-hop and jazzy chillhop. From each instrumental, I extracted two segments of 20 to 35 seconds and kept track of their starting timestamps, which gave a dataset of 40,000 audio clips totalling about 277 hours. This allowed the model not only to learn the content of the beats but also to capture the temporal structure inherent to the musical phrases.

A key goal of this project was to enable the model to learn new instruments (synth bells, deep sub, plucked bass, snare, ...), tempos, and rhythmic patterns that are strongly associated with trap and its subgenres.
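
The segment extraction described above can be sketched roughly as follows. This is only an illustration, not the script used for the dataset: the function name `pick_segments`, the whole-second granularity, and the choice to keep the two segments from overlapping are all assumptions.

```python
import random


def pick_segments(track_duration, n_segments=2, min_len=20, max_len=35, rng=None):
    """Pick (start, duration) pairs in whole seconds for one track.

    Assumption: segments are drawn from disjoint windows so they never overlap.
    """
    rng = rng or random.Random()
    total = int(track_duration)
    if total < n_segments * max_len:
        raise ValueError("track too short for the requested segments")

    # Split the track into equal windows and draw one segment inside each.
    window = total // n_segments
    segments = []
    for i in range(n_segments):
        duration = rng.randint(min_len, max_len)
        window_start = i * window
        latest_start = window_start + window - duration
        start = rng.randint(window_start, latest_start)
        # "start" and "duration" mirror the fields stored with each clip.
        segments.append({"start": start, "duration": duration})
    return segments


if __name__ == "__main__":
    # Example: a 3-minute beat yields two clips with their start timestamps.
    print(pick_segments(180, rng=random.Random(0)))
```

Storing `start` next to `duration` is what lets the timing information be passed to the model alongside the audio during fine-tuning.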
To teach the model these instruments, tempos and patterns, I tagged each segment by computing its similarity with curated lists of instruments, moods, and genres using a LAION CLAP model. Additionally, I used the Essentia library to extract the BPM (deeptemp-k16-3) and the key/scale of each audio segment, keeping only predictions with a confidence above 70%.

```json
{
  "39118.wav": {
    "instruments_tags": [
      "plucked guitar",
      "synth bells",
      "movie sample"
    ],
    "genres_tags": [
      "rap with soul"
    ],
    "moods_tags": [
      "trap melancholic",
      "love"
    ],
    "key": "G",
    "scale": "minor",
    "tempo": 109.0,
    "start": 63,
    "duration": 26
  }
}
```

I also generated synonyms to give the model more varied language. This combination of features (instrumentation, tempo, key, mood, and genre) provided a rich set of musical metadata.
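
The similarity-based tagging described above could look something like the sketch below. It assumes the audio embedding and the tag text embeddings have already been computed with CLAP; the vectors here are toy values, and the function names, `k`, and `threshold` are hypothetical choices rather than the settings actually used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_tags(audio_emb, tag_embs, k=3, threshold=0.0):
    """Return up to k tags whose embedding is most similar to the audio.

    audio_emb: embedding of one audio segment (assumed to come from CLAP).
    tag_embs:  dict mapping each curated tag to its text embedding.
    """
    scored = sorted(
        ((cosine(audio_emb, emb), tag) for tag, emb in tag_embs.items()),
        reverse=True,
    )
    # Keep the k best matches, dropping any below the similarity threshold.
    return [tag for score, tag in scored[:k] if score >= threshold]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real CLAP vectors.
    instrument_embs = {
        "synth bells": [1.0, 0.0],
        "deep sub": [0.0, 1.0],
        "snare": [-1.0, 0.0],
    }
    print(top_tags([0.9, 0.1], instrument_embs, k=2, threshold=0.05))
```

Running the same function once per curated list (instruments, moods, genres) would produce the three tag fields shown in the JSON example.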