Motivation
As an indie video game creator and composer, I often write tunes that work fine on their own, but the transition from one area to the next is abrupt and kills the mood. A classic showcase of the problem is Chrono Trigger: each track is beautiful on its own, but the transitions are rather clumsy, which is a real shame. Therefore, I will use deep learning on an AMD graphics chip to create transitions between two tracks.
Method
Using PyTorch with ROCm on the Radeon Pro W7900, we train a transformer to predict MIDI-like note tokens. These are snippets from preprocessed MIDI tracks containing themes from popular media, scraped from MIDI sites such as midiworld.com; the dataset is around 50 MB uncompressed. To predict a transition rather than train in the typical causal setting, we mask out a random number of notes after the token that is to be predicted. Masking out a region means setting the corresponding attention scores of the first attention head to -float("inf") before the softmax, so that the transformer cannot use information from those tokens.
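As a minimal sketch of this masking (the function name and tensor shapes are my own for illustration, not taken from the project's code), the attention scores for the hidden span are set to -inf so that the softmax assigns those tokens zero weight:

```python
import torch
import torch.nn.functional as F

def attention_with_gap(q, k, gap_start, gap_len):
    """Scaled dot-product attention weights for one head, with a
    contiguous span of tokens hidden from every query.

    q, k: (seq_len, d_k) query and key matrices.
    gap_start: index of the first hidden token (the token right after
               the one being predicted).
    gap_len: number of tokens to hide (drawn at random during training).
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (seq_len, seq_len)

    # No query may attend to keys inside the gap: exp(-inf) == 0 after softmax.
    scores[:, gap_start:gap_start + gap_len] = float("-inf")
    return F.softmax(scores, dim=-1)

# Example: a 16-token sequence where tokens 6..9 are hidden.
q, k = torch.randn(16, 64), torch.randn(16, 64)
attn = attention_with_gap(q, k, gap_start=6, gap_len=4)
assert attn[:, 6:10].sum().item() == 0.0  # the gap receives zero attention
```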
To achieve this, the project uses Kevin-Yang's (@jason9693) MIDI tokenizer and the Music Transformer architecture as implemented by Damon Gwinn, Ben Myrick, and Ryan Marshall.
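For anyone reproducing the preprocessing, it amounts to something like the snippet below. The encode_midi/decode_midi names follow the processor module in jason9693's midi-neural-processor repository, but the exact signatures and the file paths here are assumptions on my part:

```python
# Assumed interface of jason9693's midi-neural-processor (processor.py);
# verify the signatures against the repository before relying on them.
from processor import encode_midi, decode_midi

# Encode a MIDI file into a flat list of integer event tokens
# (note-on/note-off, time-shift, and velocity events).
tokens = encode_midi("data/some_theme.mid")  # hypothetical path
print(len(tokens), tokens[:8])

# After generation, a token stream can be written back out as a .mid file.
decode_midi(tokens, file_path="out/transition.mid")
```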
Results
The loss of this version is, as expected, lower than that of the original causal model, since the masked setup gives the model access to context on both sides of the gap. In practice, though, it has mostly learned to repeat previous sequences of notes and is not yet able to produce good transitions, so the project is sadly still work in progress. Nonetheless, I am personally happy with my learning journey, and the model occasionally produces non-trivial note sequences that are pleasant to listen to. On the AMD Radeon Pro W7900 with ROCm drivers, training for 100 epochs took around 10 hours, which is less than I expected for a transformer trained from scratch. Originally, I thought I would have to use a GAN, since GANs are easier to train at smaller scales.