Musical Chains: Music Generation with Clojure

June 25th 2013

All right, let’s talk Music Generation. I was reading about Genetic Algorithms recently, and I wondered whether they could be applied to music. Turns out they can, and have been. There’s some pretty cool music that’s been generated through these algorithms. Unfortunately, there isn’t really a good way for computers to tell good music from cacophony, so the fitness check in these algorithms is usually people. Though I wish I had enough knowledge about AI to take a stab at making my computer appreciate music, I’m sadly nowhere close. So I read some more, and settled upon a different approach: Markov Chains. Don’t worry if you have no idea what those are, I didn’t until recently either. Just keep reading.

So, for this foray into music, I decided to use my language du jour, Clojure. Why? Mostly because I just started learning it and want to use it wherever possible, but also because it gives me access to the wonderful Overtone music synthesis library. By the way, please excuse my horribly un-idiomatic Clojure in this post; like I said, I’m still just getting the hang of Lisp. Anyway, before I started generating music in earnest, I had some setting-up to do. First I set up Overtone, following their Getting Started guide. I created a file for my music-playing functions (player.clj), set a namespace, and pulled in the Overtone library (this is a Leiningen project, so Overtone is listed as a dependency):

(ns markovmusic.player (:use overtone.live))

Then I set up some “instruments” to play the music. This is pretty much copied directly from the Overtone wiki, with a few small simplifications and modifications. For our purposes, we only need to know that each function produces its waveform at a given frequency, for a given duration, and at a given volume. The numbers after each argument are the default values for the argument. These are the four main waveforms used in music synthesis. Each has a unique sound, and unique properties; you can test them all out to see which one you like most. If you’re interested in more information on the differences between these waveforms, and about music synthesis in general, see this helpful introduction.

(definst saw-wave [freq 440 sustain 0.4 vol 0.4]
  (* (env-gen (lin-env 0 sustain 0) 1 1 0 1 FREE)
     (saw freq)
     vol))

(definst sin-wave [freq 440 sustain 0.4 vol 0.4]
  (* (env-gen (lin-env 0 sustain 0) 1 1 0 1 FREE)
     (sin-osc freq)
     vol))

(definst square-wave [freq 440 sustain 0.4 vol 0.4]
  (* (env-gen (lin-env 0 sustain 0) 1 1 0 1 FREE)
     (lf-pulse freq)
     vol))

(definst triangle-wave [freq 440 sustain 0.4 vol 0.4]
  (* (env-gen (lin-env 0 sustain 0) 1 1 0 1 FREE)
     (lf-tri freq)
     vol))

One thing to note about these functions is that they all set the attack and release to zero (that’s the “0 sustain 0” part). This is just to simplify them; we can add in effects later to improve/modulate the sound. OK, now that we can make sounds, let’s move on to music. But before I get to the actual code, a little detour/road-map about what’s ahead.

Markov Chains

First, the foundation of our music generation plan. Markov Chains, named after Russian mathematician Andrey Markov, are mathematical systems that hop from one state to another; the ones used here move between a finite number of states. They are “memoryless”, meaning that the choice of the next state depends solely on the current state, not on any of the states that came before. Markov chains are extremely useful in modeling certain real-life processes which have an element of randomness, such as stock-market trends. The Wikipedia page on Markov Chains is a great resource on the topic, which I encourage you to at least skim. For now, I’ll only discuss Markov Chains as they pertain to this project. I like to think of Markov Chains, simplistically, as a lookup table or probability matrix. For each state of a system, there is a list of probabilities that the system will go to each other possible state. For our purposes the ‘state’ will be the note, with every note in a song or group of songs being a possible state. The plan is to read in songs as a series of notes, and then calculate the probabilities for each note to transition to any other note. Hopefully, by following this chain repeatedly from each note to the next, we will end up with something that has at least a passing resemblance to music.
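To make the lookup-table picture concrete, here is a minimal sketch of a Markov chain as a nested hash-map. The three states and their probabilities are made up for illustration, and ‘row-total’ and ‘path-probability’ are hypothetical helpers, not part of the project. Because the chain is memoryless, the probability of a whole path is just the product of its step-by-step transition probabilities.

```clojure
;; Hypothetical transition table: outer keys are the current state, inner
;; maps give the probability of each possible next state. Values are made up.
(def transitions
  {:c {:d 0.5, :e 0.5}
   :d {:c 1.0}
   :e {:c 0.25, :d 0.75}})

(defn row-total
  "Sum of the outgoing probabilities of one state; it should come to 1."
  [table state]
  (reduce + (vals (table state))))

(defn path-probability
  "Probability of following `path` step by step. Each step looks only at the
  current state, so the individual steps are simply multiplied together.
  A transition that is missing from the table counts as probability 0."
  [table path]
  (reduce * (map (fn [[from to]] (get-in table [from to] 0.0))
                 (partition 2 1 path))))

;; The path :c -> :e -> :d -> :c multiplies 0.5, 0.75 and 1.0.
(println (path-probability transitions [:c :e :d :c]))
```

The project’s real tables have the same shape, only with note states as keys instead of keywords.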

MIDI

Of course, before we follow a Markov Chain, we must create one. To get the initial data to populate our table, I’m reading in a few music files and recording the note progressions. For this purpose, I am using the MIDI file format. The MIDI format was explained to me in terms of an analogy which I think makes a lot of sense, so here goes. Most audio formats that we use typically, such as mp3, are precise recordings of a piece of audio. They are like records (the old kind, that you play in a phonograph), and reliably produce the exact sequence of sounds when played. MIDI files, on the other hand, don’t have the actual audio data, just the instructions to produce it. They are like sheet music; your computer’s sound card is the pianist. Therefore, MIDI files can sound slightly different when played on different computers. They also make it far easier to access the actual notes that make up a piece of music, making them ideally suited for our purposes.

Back on Track

OK, since that’s out of the way, we can return to the Clojure. My first step was to create an interface between my code and the MIDI files. For that, I basically converted this Java StackOverflow answer into Clojure, and adapted it to the format I wanted to use. The structure I decided on to represent the songs was a vector of hash-maps, with each hash-map containing two keys, :sound and :duration. :duration corresponds with the duration of a chord or note, while :sound corresponds with another hash-map which has each note in the sound as the keys and their respective velocities (which roughly correspond to volume, but more accurately represent how hard the input instrument was struck) as values. Wow, that was a convoluted sentence. Anyway, the code works by checking for note-on and note-off events in the MIDI file, and adding a new sound/duration pair to the vector representing the song whenever the set of notes currently being played changes. Most of the functionality is in the ‘parse-midi-file’ function, but there is also an ‘add-note’ helper function for adding notes to the current state. It deals with some irregularities in some files, such as zero-velocity notes and duplicate notes. Anyway, here’s the code, in a separate file (midi.clj) and namespace.

(ns markovmusic.midi
  (:import (java.io File)
           (javax.sound.midi MidiSystem Sequence MidiMessage MidiEvent ShortMessage Track)))

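(defn add-note [msg notes]
  (let [k (.getData1 msg)   ; MIDI note number
        v (.getData2 msg)]  ; velocity
    (if (> v 0)
      ;; A duplicate note-on adds its velocity to the one already recorded.
      (assoc notes k (+ v (get notes k 0)))
      ;; A zero-velocity note-on is treated as a note-off.
      (dissoc notes k))))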

(defn parse-midi-file
  ([file-name] (parse-midi-file file-name 0))
  ([file-name track] 
   (let [note-on 0x90
         note-off 0x80
         sequence (MidiSystem/getSequence (File. file-name))
         track  (-> sequence .getTracks (aget track))]
     (loop [current-notes {}
            parsed []
            last-time 0
            event-index 0]
       (let [event (.get track event-index)
             message (.getMessage event)]
         (cond
           (= (inc event-index) (.size track)) parsed
           (not (instance? ShortMessage message)) 
             (recur current-notes parsed last-time (inc event-index))
           (= (.getCommand message) note-on) 
             (if (= (.getTick event) last-time)
               (recur 
                 (add-note message current-notes)
                 parsed
                 last-time
                 (inc event-index))
               (recur
                 (add-note message current-notes)
                 (conj parsed 
                       {:sound current-notes 
                        :duration (- (.getTick event) last-time)})
                 (.getTick event)
                 (inc event-index)))
           (= (.getCommand message) note-off) 
             (if (= (.getTick event) last-time)
               (recur
                 (dissoc current-notes (.getData1 message))
                 parsed
                 last-time
                 (inc event-index))
               (recur
                 (dissoc current-notes (.getData1 message))
                 (conj parsed 
                       {:sound current-notes 
                        :duration (- (.getTick event) last-time)})
                 (.getTick event)
                 (inc event-index)))
           :else (recur current-notes parsed last-time (inc event-index))))))))
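The bookkeeping in ‘parse-midi-file’ is easier to see without the Java interop. The sketch below runs the same idea over plain Clojure maps. The event format (:tick, :type, :note, :velocity) and the names ‘events->song’ and ‘demo-events’ are assumptions made for this example, and unlike ‘add-note’ it overwrites a duplicate note’s velocity rather than summing it.

```clojure
(defn events->song
  "Fold a sequence of note events into a vector of {:sound ... :duration ...}
  maps. Whenever the tick advances, the notes that were sounding up to that
  point are recorded together with how long they lasted."
  [events]
  (:parsed
    (reduce
      (fn [{:keys [notes parsed last-tick]} {:keys [tick type note velocity]}]
        {:notes (if (and (= type :on) (pos? velocity))
                  (assoc notes note velocity)
                  (dissoc notes note)) ; note-off, or zero-velocity note-on
         :parsed (if (= tick last-tick)
                   parsed ; same instant: still part of the same change
                   (conj parsed {:sound notes
                                 :duration (- tick last-tick)}))
         :last-tick tick})
      {:notes {} :parsed [] :last-tick 0}
      events)))

;; Made-up events: a rest, then note 60, then note 62.
(def demo-events
  [{:tick 512 :type :on  :note 60 :velocity 65}
   {:tick 640 :type :off :note 60 :velocity 0}
   {:tick 640 :type :on  :note 62 :velocity 79}
   {:tick 896 :type :off :note 62 :velocity 0}])

(println (events->song demo-events))
```

The leading entry with an empty :sound is the silence before the first note, which is also what the real parser produces at the start of a file.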

So now we can read in MIDI files; all I need is some files to read. I picked a few simple, recognizable pieces from 8notes.com: Eine Kleine Nachtmusik, Greensleeves, Happy Birthday, Ode to Joy, and Scarborough Fair. I found that some pieces or combinations work better than others – pieces with lots of short, choppy rhythms tend to produce really terrible, broken-sounding “compositions”. Some pieces seem like they would work well, but then turn out horrible. Anyway, I digress. So I downloaded and saved those files for later use. I tested out the MIDI parser, which seems to work fine. Parsing Happy Birthday in the REPL, for example, results in this:

[{:sound {}, :duration 512}
 {:sound {60 65}, :duration 128}
 {:sound {60 71}, :duration 128}
 {:sound {62 79}, :duration 256}
 {:sound {60 67}, :duration 256}
 {:sound {65 85}, :duration 256}
 {:sound {64 75}, :duration 512}
 {:sound {60 73}, :duration 128}
 {:sound {60 73}, :duration 128}
 {:sound {62 80}, :duration 256}
 {:sound {60 71}, :duration 256}
 {:sound {67 91}, :duration 256}
 {:sound {65 73}, :duration 512}
 {:sound {60 66}, :duration 128}
 {:sound {60 72}, :duration 128}
 {:sound {72 92}, :duration 256}
 {:sound {69 72}, :duration 256}
 {:sound {65 63}, :duration 256}
 {:sound {64 75}, :duration 256}
 {:sound {62 71}, :duration 256}
 {:sound {70 91}, :duration 128}
 {:sound {70 74}, :duration 128}
 {:sound {69 69}, :duration 256}
 {:sound {65 68}, :duration 256}
 {:sound {67 77}, :duration 256}
 {:sound {65 73}, :duration 768}]

Now that we can parse the files, we need to construct a probability matrix. For that I created a new file and namespace, chain.clj (as in Markov “Chains”) and markovmusic.chain. I decided to first create a frequency matrix, to count how many times each note followed each other note. I planned on using Markov chains for durations as well as notes, so my function for generating the frequency matrix takes another function as an argument. This function dictates what value in each element of the song is counted for the frequency matrix. The ‘generate-frequency-matrix’ function also takes an initial matrix, so that I can add on to matrices from other songs and ‘combine’ Markov chains. Here’s the function:

(defn generate-frequency-matrix
  ([song func] (generate-frequency-matrix song func {}))
  ([song func matrix] 
   (if (< (count song) 2)
     matrix
     (recur 
       (vec (rest song)) 
       func 
       (val-inc matrix (func song 0) (func song 1))))))

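It uses the ‘val-inc’ function, which is a simple function that I made to increment a value nested in a two-level hash-map, treating a missing entry as zero (in the actual file it has to be defined above ‘generate-frequency-matrix’, since Clojure needs a name to exist before it is used):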

(defn val-inc [t r c]
  (assoc-in t [r c] (inc (get-in t [r c] 0))))

To use as the selection function, I created ‘get-duration’ and ‘get-notes’, which (obviously) get the duration/notes at a specific position in the song vector:

(defn get-duration [song position]
  ((get song position) :duration))

(defn get-notes [song position]
  (keys ((get song position) :sound)))

I also created a ‘get-volume’ function, but I’m not showing it here since I didn’t end up using it and removed it. If you want, it should be fairly simple to write one.

The ‘generate-frequency-matrix’ function now generates a frequency count. Here’s the output when the arguments are a parsed Happy Birthday and the ‘get-notes’ function:

{(70) {(69) 1, (70) 1},
 (69) {(65) 2},
 (72) {(69) 1},
 (67) {(65) 2},
 (64) {(62) 1, (60) 1},
 (65) {(67) 1, (60) 1, (64) 2},
 (62) {(70) 1, (60) 2},
 (60) {(72) 1, (67) 1, (65) 1, (62) 2, (60) 3},
 nil {(60) 1}}
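For comparison, the same counting can be sketched with ‘partition’, which yields each pair of neighbouring states directly. This is an alternative formulation, not the code the project uses; ‘song->note-states’ and ‘transition-counts’ are names invented for the example.

```clojure
(defn song->note-states
  "Reduce a parsed song to its sequence of note states (the keys of :sound).
  A rest has an empty :sound map, so its state is nil."
  [song]
  (map (comp keys :sound) song))

(defn transition-counts
  "Count how often each state is immediately followed by each other state."
  [states]
  (reduce (fn [matrix [from to]]
            ;; fnil lets inc start from 0 when the pair has not been seen yet.
            (update-in matrix [from to] (fnil inc 0)))
          {}
          (partition 2 1 states)))

;; Works on any sequence of states, e.g. bare note numbers:
(println (transition-counts [60 60 62 60 65]))
```

Passing an existing matrix instead of the empty map as the starting value would give the same “add on to another song” behaviour as the three-argument version above.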

Now this frequency table must be converted to a probability matrix. In the following snippet, frequency-to-probability does this. It uses ‘mapmap’, a strangely named function that I wrote which simply applies a function to every value in a hash-map while leaving the keys untouched. I feel like there is a better implementation, or an equivalent function in the standard library, but it eludes me.

(defn mapmap [f m]
  (reduce #(assoc %1 %2 (f (m %2))) {} (keys m)))

(defn frequency-to-probability [freqmatrix]
  (mapmap
    (fn [notedata]
      (let [total (* 1.0 (reduce + (vals notedata)))]
        (mapmap #(/ % total) notedata)))
    freqmatrix ))
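On the hunch that the standard library has something closer to ‘mapmap’: one candidate is ‘reduce-kv’, which walks a map’s keys and values together. Below is a sketch of the same normalization built on it. The names ‘map-vals’, ‘normalize-row’ and ‘counts->probabilities’ are made up for this example.

```clojure
(defn map-vals
  "Apply f to every value of m, leaving the keys untouched."
  [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} m))

(defn normalize-row
  "Turn one row of counts into probabilities that sum to 1."
  [row]
  (let [total (double (reduce + (vals row)))]
    (map-vals #(/ % total) row)))

(defn counts->probabilities
  "Normalize every row of a frequency matrix."
  [matrix]
  (map-vals normalize-row matrix))

(println (counts->probabilities {60 {60 3, 62 1}, 62 {60 2}}))
```

Each row is normalized on its own, so a note that appears rarely still gets a full probability distribution over its successors.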

For the last function in the chain namespace, I need to be able to randomly pick one of the entries in a row of the probability matrix, taking into account each entry’s probability. The ‘make-choice’ function below accepts a hash-map in which the keys are the choices and the values are the probabilities, and picks one of the keys.

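(defn make-choice [choices]
  (let [items (vec (keys choices))
        probabilities (vec (vals choices))
        last-index (dec (count items))
        r (rand)]
    (loop [i 0
           cumulative-probability (nth probabilities 0)]
      ;; Also stop at the last item, so that floating-point rounding in the
      ;; running sum can never push the index past the end.
      (if (or (< r cumulative-probability) (= i last-index))
        (nth items i)
        (recur 
          (inc i) 
          (+ cumulative-probability 
             (nth probabilities (inc i))))))))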
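Random functions are awkward to check by hand, so here is a hedged variant that accepts the random number as an optional argument. ‘weighted-choice’ is a hypothetical name; it follows the same cumulative-probability idea as ‘make-choice’, and falls back to the last entry if rounding leaves the running total slightly below the random number.

```clojure
(defn weighted-choice
  "Pick a key of `choices` (a map of choice -> probability). `r` is a number
  in [0, 1); it defaults to (rand) but can be supplied to get repeatable
  results."
  ([choices] (weighted-choice choices (rand)))
  ([choices r]
   (let [entries (seq choices)
         running-totals (reductions + (map val entries))]
     ;; Find the first entry whose running total exceeds r. The search
     ;; returns the whole entry, so a nil key (a rest) is still a valid pick.
     (key (or (some (fn [[entry total]] (when (< r total) entry))
                    (map vector entries running-totals))
              (last entries))))))

;; With r fixed, the result is predictable:
(println (weighted-choice {:a 0.25, :b 0.75} 0.1))
```

A quick sanity check in the REPL is to tally many draws, for example with (frequencies (repeatedly 10000 #(weighted-choice {:a 0.25, :b 0.75}))), and see whether the counts come out roughly in proportion.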

OK, now that we have all the mechanisms for dealing with Markov Chains in place, we can move on to playing some music. Let’s head back to player.clj, and fill it out some more. First, a simple function for playing a note in the MIDI range for a given duration and volume (since Overtone uses Hz for frequency, unlike MIDI):

(defn play-note [midinote sustain vol]
  (saw-wave (midi->hz midinote) sustain vol))
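For the curious, the conversion behind a function like ‘midi->hz’ is the standard equal-temperament formula: note 69 is the A at 440 Hz, and every 12 notes doubles the frequency. The sketch below writes that formula out under the hypothetical name ‘midi-note->hz’; it is an illustration of the math, not Overtone’s source.

```clojure
(defn midi-note->hz
  "Frequency in Hz of a MIDI note number, assuming 12-tone equal temperament
  with A4 (note 69) tuned to 440 Hz."
  [note]
  (* 440.0 (Math/pow 2.0 (/ (- note 69) 12.0))))

;; Middle C is MIDI note 60.
(println (midi-note->hz 60))
```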

Then two small functions that just make it a little easier to get the note and duration probability matrices. They refer to the chain namespace through the alias ‘chain’, so the ns form in player.clj also needs to require markovmusic.chain under that alias. Notice that these functions generate the matrices for vectors of songs. To get the matrix for only one song, you must pass a vector containing that single song. The reasoning behind this will be evident later.

(defn note-matrix [songs] 
  (chain/frequency-to-probability 
    (reduce 
      #(chain/generate-frequency-matrix %2 chain/get-notes %1) {} songs)))
(defn duration-matrix [songs] 
  (chain/frequency-to-probability 
    (reduce 
      #(chain/generate-frequency-matrix %2 chain/get-duration %1) {} songs)))

And finally, the actual playing function. The ‘play’ function takes two probability matrices, one for pitch and one for duration, and plays notes following the Markov chains… FOREVER! Or at least until you hit Ctrl-C.

(defn play [note-matrix duration-matrix]
  (loop [note-count 1
         sound (first (keys note-matrix))
         duration (first (keys duration-matrix))]
    (do
      (if (not (or (= sound nil) (= (count sound) 0) (< duration 96)))
        (do
        (doseq [note sound]
          (play-note note (/ duration 1000.0) (/ 70 128.0)))
        (Thread/sleep duration)
        (println (str "Playing Note #" note-count ", pitch(es): " (clojure.string/join " & " sound) ", velocity: " 70 ", duration: " duration))))
      (recur
        (inc note-count)
        (chain/make-choice (get note-matrix sound {(rand-nth (keys note-matrix)) 1}))
        (chain/make-choice (get duration-matrix duration {(rand-nth (keys duration-matrix)) 1}))))))

That function should be mostly self-explanatory, but I want to point out a few things. The first ‘if’ statement, in line 6, causes the player to skip over any rests/stops in the music as well as any notes with durations less than 96 ticks (which usually corresponds to a 1/16 note). Though this deviates slightly from the original idea of only using the Markov chain, I found that it greatly increases the quality of the output. Also, in the call to play-note, the duration and volume are converted from their range in MIDI to the range that Overtone uses: ticks are treated as milliseconds and divided by 1000 to get seconds, and a fixed velocity of 70 is divided by 128. Sound and duration start at the first key of each matrix, which is an arbitrary value since hash-maps are unordered, and whenever the current value has no row in its matrix, the next one is picked at random from the matrix’s keys, for an extra dash of stochasticity.
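The filtering and unit conversion described above can be pulled out into small pure functions, which makes them easy to try in the REPL without making any sound. These helpers are a sketch with invented names, and they keep the post’s simplification of treating one MIDI tick as one millisecond.

```clojure
(def min-duration
  "Shortest duration, in ticks, that the player will bother to play."
  96)

(defn playable?
  "True when there is at least one note and the duration is long enough."
  [sound duration]
  (boolean (and (seq sound) (>= duration min-duration))))

(defn ticks->seconds
  "Treats one tick as one millisecond, as the post's code does."
  [ticks]
  (/ ticks 1000.0))

(defn velocity->amplitude
  "Scale a MIDI velocity into the 0-1 range by dividing by 128."
  [velocity]
  (/ velocity 128.0))

(println (playable? [60 64] 256) (ticks->seconds 256) (velocity->amplitude 64))
```

Strictly speaking, how long a tick lasts depends on the file’s resolution and tempo, so a more faithful player would read those from the MIDI sequence instead of assuming milliseconds.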

To wrap up player.clj, I added a function called ‘random-play’. In case you couldn’t guess, this just plays random notes within a reasonable range of pitches, durations, and volumes. I intentionally made it similar to ‘play’, so it could probably have been considerably more concise.

(defn random-play []
  (while true
    (let [sound (repeatedly (rand-int 3) #(+ (rand-int 20) 60))
          volume (+ (rand-int 60) 40)
          duration (rand-nth [128 256 512 1024 48 96 384])]
      (do
        (doseq [note sound]
          (play-note note (/ duration 1000.0) (/ volume 128.0)))
        (Thread/sleep duration)
        (println (str "Playing Pitch(es): " (clojure.string/join " & " sound) ", velocity: " volume ", duration: " duration))))))

Both ‘play’ and ‘random-play’ print out the details of each note that is played. For example, here is a snippet from a call to ‘play’:

Playing Note #2, pitch(es): 48 & 60, velocity: 70, duration: 1024
Playing Note #16, pitch(es): 53 & 64, velocity: 70, duration: 256
Playing Note #17, pitch(es): 62 & 53, velocity: 70, duration: 256
Playing Note #18, pitch(es): 60, velocity: 70, duration: 256
Playing Note #19, pitch(es): 59, velocity: 70, duration: 256
Playing Note #21, pitch(es): 55 & 52 & 48 & 60, velocity: 70, duration: 1024
Playing Note #22, pitch(es): 55 & 52 & 48, velocity: 70, duration: 256
Playing Note #23, pitch(es): 62 & 55 & 52 & 48, velocity: 70, duration: 256
Playing Note #24, pitch(es): 55 & 52 & 48, velocity: 70, duration: 512
Playing Note #25, pitch(es): 60 & 55 & 52 & 48, velocity: 70, duration: 512

And here’s some output from random-play:

Playing Pitch(es): 79, velocity: 63, duration: 256
Playing Pitch(es): , velocity: 61, duration: 256
Playing Pitch(es): 69 & 63, velocity: 83, duration: 384
Playing Pitch(es): , velocity: 90, duration: 96
Playing Pitch(es): 60, velocity: 89, duration: 48
Playing Pitch(es): 71 & 70, velocity: 59, duration: 384
Playing Pitch(es): 79, velocity: 90, duration: 128
Playing Pitch(es): 78, velocity: 86, duration: 384
Playing Pitch(es): , velocity: 76, duration: 512
Playing Pitch(es): , velocity: 76, duration: 96
Playing Pitch(es): 76, velocity: 79, duration: 384
Playing Pitch(es): 67, velocity: 44, duration: 1024

Bringing it all Together

We’re all set! For the final touch, I wrote core.clj, which contains another ‘play’ and a few predefined songs. This ‘play’ takes one or more parsed songs as its arguments, and plays music based on the Markov chain generated from them. The predefined songs are those I downloaded earlier, but parsed and given short names.

(ns markovmusic.core
  (:require [markovmusic.midi :as midi])
  (:require [markovmusic.chain :as chain])
  (:require [markovmusic.player :as player]))

(def ek 
  (midi/parse-midi-file 
    "/home/vishnu/Downloads/mozart_eine_kleine_easy.mid" 1))
(def gs 
  (midi/parse-midi-file 
    "/home/vishnu/Downloads/greensleeves.mid" 1))
(def hb 
  (midi/parse-midi-file 
    "/home/vishnu/Downloads/happy_birthday_easy.mid" 1))
(def oj 
  (midi/parse-midi-file 
    "/home/vishnu/Downloads/beethoven_ode_to_joy.mid" 1))
(def sf 
  (midi/parse-midi-file 
    "/home/vishnu/Downloads/scarborough_fair.mid" 1))

(defn play [& songs]
  (player/play 
    (player/note-matrix songs) 
    (player/duration-matrix songs)))

Now, what it all comes down to: the music! Fair warning: don’t expect too much, this is just a beginning.

First, a sampling of ‘random-play’ (Sorry IE users, these probably won’t play for you):

Yeah, doesn’t sound so good. But honestly, it actually sounded better than I expected. This is probably because all of the notes and durations were selected from ranges that I observed often in other midi files; if they had been drawn from the entire possible ranges, it would have sounded far worse.

Now for the generated music:

Well, it’s not Mozart, but it’s not too terrible either. Each of these samples was the first few hundred notes from a call to ‘(play ek gs hb oj)’. That plays music based on the Markov chains generated from Eine Kleine Nachtmusik, Greensleeves, Happy Birthday, and Ode to Joy. One thing that sticks out to listeners is the lack of “closure” at the end of each sample – remember that the music would keep going infinitely, and that each sample is only a short beginning portion. Also, if you listened to both samples fully, you might have noticed that the first clip tended to get stuck in a rut of Greensleeves now and then, before breaking out. That seems to be an occupational hazard of Markov Chains; might as well make it a feature and call it “Variations on a Theme”. A few more examples, from different combinations of the same five predefined songs:

Happy Birthday:

Eine Kleine Nachtmusik:

And one of my favorites, Happy Birthday and Ode to Joy:

Future Improvements

This is just an extremely rough start for music generation; there’s plenty of room for improvement. One idea I considered was making the duration more closely tied to the note, instead of having its own probability matrix. Perhaps I could use only one Markov chain, which decides both duration and pitch. As it stands, the duration often seems out-of-sync with pitch. Also, this music generator could be adapted to use the current and previous sounds to determine the next sound, instead of only using the current sound. This would make it a “second-order Markov chain”, and would probably make the output sound more cohesive and melodic, but also more similar to the pieces it is drawn from. Lastly, a larger and better-suited corpus of MIDI files would drastically change the performance of this generator, as might using a more nuanced sound for each note (try using a different oscillator, and adjusting the attack and release times).
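The first two ideas mostly come down to changing what counts as a “state”. As a rough sketch (the function names are invented, and none of this is in the project): pairing each sound with its duration gives a single joint chain, and keying the table on the last two states instead of one gives a second-order chain.

```clojure
(defn joint-states
  "Turn a parsed song into a sequence of [notes duration] pairs, so that one
  chain decides pitch and duration together."
  [song]
  (map (juxt (comp keys :sound) :duration) song))

(defn second-order-counts
  "Count transitions where the key is the pair [previous current] and the
  inner key is the state that followed that pair."
  [states]
  (reduce (fn [matrix [a b c]]
            (update-in matrix [[a b] c] (fnil inc 0)))
          {}
          (partition 3 1 states)))

;; After the pair 60, 62 this toy melody continues once with 64, once with 65.
(println (second-order-counts [60 62 64 60 62 65]))
```

The trade-off mentioned above shows up in the tables themselves: with longer keys, most rows end up with a single successor, so the output follows the source pieces more closely unless the corpus is large.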

OK, that’s that! I hope you enjoyed generating music, and have been inspired to try it out on your own. Here’s the full code for my project on GitHub: https://github.com/vishnumenon/markovmusic. If you download the project and try it out, remember that the songs are not included and that the file names in core.clj are specific to my computer and will not work for you. Sorry about the lack of niceties like documentation and tests; hopefully this post counts as documentation.
