Saturday, August 28, 2010

Activity 11: Playing Notes by Image Processing

Playing notes by image processing?! Is this even possible?! At first, I thought it couldn't be done. But after pondering it for some time and brainstorming with a colleague/friend, I realized that it can be done, although it would be very tedious! Using all the image processing techniques that I have learned so far, playing notes by image processing can now be realized. First things first: what image do we need in order to play notes from it? A musical score sheet, of course. The one that I used was the score sheet of Twinkle Twinkle Little Star.
Figure 1. Musical score sheet of Twinkle Twinkle Little Star.

Since the score sheet has two parts, the treble-staff and the bass-staff, I separated the two staffs and then cut each staff into three lines.


Figure 2. Treble-staff separated into three lines.



Figure 3. Bass-staff separated into three lines.

Determining the pitch and note length
So now, how are we going to play notes from these images? The placement and shape of a note in a musical score sheet carry information about its pitch and its timing or length. The placement of a note on the staff tells us its pitch, that is, its value in the heptatonic scale, also known as the Do-Re-Mi-Fa-So-La-Ti scale, sometimes written as the letters C-D-E-F-G-A-B respectively. Since I have two staffs in the score sheet, the treble and the bass, I might as well describe the difference between the two. The treble staff is marked by a treble clef, also known as the G-clef. In a treble staff, the bottom line is defined as E in the heptatonic scale, and the space above it is defined as F. Every upward/downward step from a line to a space, or vice versa, represents an upward/downward step in the heptatonic scale. The bass staff, on the other hand, is marked by a bass clef, also known as the F-clef. The same principles apply in the bass staff, except that its bottom line is defined as G.
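The line-and-space rule described above can be written as a tiny lookup. This is only a sketch in Python (not the Scilab code used for this activity), and the names `LETTERS`, `BOTTOM_LINE` and `note_letter` are made up for illustration.

```python
# Note letters of the heptatonic scale, in ascending order.
LETTERS = "CDEFGAB"

# Letter sitting on the bottom line of each staff, as stated in the text.
BOTTOM_LINE = {"treble": "E", "bass": "G"}


def note_letter(staff, steps):
    """Return the letter `steps` line/space steps above the bottom line.

    steps = 0 is the bottom line, 1 is the space above it, 2 is the second
    line, and so on. Negative values go below the bottom line.
    """
    start = LETTERS.index(BOTTOM_LINE[staff])
    # The scale repeats every 7 steps, so wrap around with modulo.
    return LETTERS[(start + steps) % 7]


if __name__ == "__main__":
    # Walk up the treble staff from the bottom line to the top line.
    print([note_letter("treble", s) for s in range(9)])
```

The function only returns the letter; which octave the note belongs to still has to be tracked separately.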

The timing or length of a note is defined by its shape. A whole note has a different form from a half note, which in turn differs from a quarter note. Explaining this would be very lengthy, so I suggest you visit this site.

Now that we know what the important things are, pitch and timing, how do we implement them for note playing? First, let us implement the pitch part. Since the pitch is defined by the note's position on the staff, I applied morphological operations to pin down the y-locations of the notes in the staff.



Figure 4. Morphological processes applied to the three lines in the treble staff to pin down the y-location of the notes in the staff

The image above was scanned so that only the first pixel encountered in each blob remains. From these y-coordinates, the pitch was then obtained using a pixel to real value ratio. The frequencies of the heptatonic scale were obtained from this site.
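To make the pixel-to-pitch idea concrete, here is a rough Python sketch (again, not the Scilab code used here). It assumes that the y-coordinate of the bottom staff line and the spacing between staff lines are already known in pixels, that y is measured at the centre of the note head, and that the treble bottom line is E4 and the bass bottom line is G2. Instead of a lookup table, the frequencies come from the standard equal-temperament formula with A4 = 440 Hz. All function and parameter names are hypothetical.

```python
# Semitone offsets of C, D, E, F, G, A, B above C within one octave.
MAJOR_SEMITONES = [0, 2, 4, 5, 7, 9, 11]


def y_to_steps(y, bottom_line_y, line_spacing):
    """Convert an image y-coordinate into line/space steps above the bottom line.

    Image rows grow downward, so a smaller y means a higher note.
    One step (line to space) is half the distance between two staff lines.
    """
    half_spacing = line_spacing / 2.0
    return int(round((bottom_line_y - y) / half_spacing))


def steps_to_frequency(steps, bottom_degree, bottom_octave):
    """Frequency in Hz of the note `steps` steps above the bottom line.

    bottom_degree is the index of the bottom-line letter in C-D-E-F-G-A-B
    (E = 2, G = 4) and bottom_octave is its octave number (assumed, e.g. E4).
    """
    degree = bottom_degree + steps
    octave = bottom_octave + degree // 7      # carry into the next octave
    semitone = MAJOR_SEMITONES[degree % 7]
    midi = 12 * (octave + 1) + semitone       # MIDI note number, A4 = 69
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


if __name__ == "__main__":
    # Hypothetical numbers: bottom line at y = 100 px, lines 10 px apart.
    steps = y_to_steps(85, bottom_line_y=100, line_spacing=10)
    print(steps, steps_to_frequency(steps, bottom_degree=2, bottom_octave=4))
```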

Now that the pitch problem is solved, only the timing part is left. To get the timing of each note, a correlation technique was used. Since only two types of notes are present in the score sheet, a half note and a quarter note, only two note images need to be correlated with each line.

Figure 5. Images of the half note and quarter note that were used in the correlation technique.

After a line was correlated with the image of a note, the result was binarized at a threshold chosen so that at least one pixel per note would be left.

Figure 6. Binarized correlation images of the 1st treble line with the (top) half note and (bottom) quarter note.
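The correlate-then-threshold step can be illustrated on a toy example. The Python sketch below is not the Scilab code used here. It uses a direct sliding-window correlation on tiny binary images, and it maps pixels to +1/-1 before multiplying (my own assumption) so that a filled note head does not also score perfectly against the hollow template. The 3x3 "note heads" and the `fraction` threshold are invented for illustration.

```python
def signed(pixel):
    """Map a binary pixel to +1 (foreground) or -1 (background)."""
    return 1 if pixel else -1


def find_notes(line, template, fraction=0.9):
    """Return (row, col) positions where the template matches the line image.

    The score at each position is (matching pixels - mismatching pixels).
    A position is kept if its score reaches `fraction` of the perfect score,
    which plays the role of the binarization threshold.
    """
    lh, lw = len(line), len(line[0])
    th, tw = len(template), len(template[0])
    cut = fraction * th * tw
    hits = []
    for r in range(lh - th + 1):
        for c in range(lw - tw + 1):
            score = sum(signed(line[r + i][c + j]) * signed(template[i][j])
                        for i in range(th) for j in range(tw))
            if score >= cut:
                hits.append((r, c))
    return hits


# Toy templates: a hollow head (half note) and a filled head (quarter note).
HALF = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
QUARTER = [[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]

# Toy "line" containing one half note (left) and one quarter note (right).
LINE = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]]

if __name__ == "__main__":
    print(find_notes(LINE, HALF), find_notes(LINE, QUARTER))
```

On a real scanned score, the threshold would have to be tuned by hand, just like the thresholding value mentioned above.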

The resulting images from the correlations of the line with the two notes were then added. This image is the basis for classifying the length of each note.
Figure 7. Sum of the images in figure 6.

To classify the length of a note, we check whether the x-location of each pixel in the image in figure 7 is equal to an x-location found in the half note or the quarter note correlation image. If it matches an x-location in one of the correlation images, then the note's length is defined by that correlation image.
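In code, this classification boils down to merging the two sets of x-locations and remembering which set each one came from. The following Python sketch is not the Scilab code used here, and it assumes the x-locations have already been extracted from the two binarized correlation images. The name `classify_lengths` and the beat values (quarter note = 1 beat, half note = 2 beats) are illustrative choices.

```python
def classify_lengths(half_xs, quarter_xs, half_beats=2.0, quarter_beats=1.0):
    """Return a list of (x, beats) pairs sorted from left to right.

    half_xs and quarter_xs are the x-locations of the pixels left in the
    half note and quarter note correlation images respectively.
    """
    half_set = set(half_xs)
    quarter_set = set(quarter_xs)
    notes = []
    # The union of both sets plays the role of the summed image.
    for x in sorted(half_set | quarter_set):
        if x in half_set:
            notes.append((x, half_beats))
        else:
            notes.append((x, quarter_beats))
    return notes


if __name__ == "__main__":
    # Hypothetical x-locations for one line of the score.
    print(classify_lengths(half_xs=[170], quarter_xs=[20, 45, 70, 95, 120, 145]))
```

Sorting by x also gives the order in which the notes should be played.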

Playing the note in Scilab
So now that we know the pitch and length of all the notes, how are we going to play them in Scilab? First, we must make a sinusoidal wave with the frequency and length obtained from the image processing. The length of the note is set using the function soundsec. In Scilab, the function sound is used to play sinusoidal waves. However, sound plays the sinusoid at a constant amplitude, unlike how a musical instrument plays a note. In a musical instrument, a played note has an amplitude envelope. This envelope is divided into four parts: attack, decay, sustain and release.
Figure 8. Sound envelope of an instrument.

Now that I have the sound envelope, I multiplied it, element by element, with the sinusoidal wave of each note. After all the notes were multiplied with the sound envelope, the result was normalized. Normalization is required by the Scilab function wavwrite, which expects the sound data to be in the range of -1 to 1.
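The envelope, multiplication and normalization steps can be sketched as follows. This is a Python illustration, not the Scilab code used here. The piecewise-linear envelope shape, the fractions used for attack, decay and release, the sustain level and the 22050 Hz sampling rate are all assumptions picked for the example.

```python
import math


def adsr(n, attack=0.1, decay=0.1, sustain_level=0.7, release=0.2):
    """Piecewise-linear attack/decay/sustain/release envelope of n samples.

    attack, decay and release are fractions of the note length; the
    sustain part fills whatever is left.
    """
    a, d, r = int(n * attack), int(n * decay), int(n * release)
    s = n - a - d - r
    env = [i / float(a) for i in range(a)]                               # 0 -> 1
    env += [1.0 - (1.0 - sustain_level) * i / float(d) for i in range(d)]
    env += [sustain_level] * s                                           # flat
    env += [sustain_level * (1.0 - i / float(r)) for i in range(r)]      # -> 0
    return env


def tone(freq, seconds, rate=22050):
    """Sinusoid of the given frequency multiplied sample by sample with the envelope."""
    n = int(seconds * rate)
    env = adsr(n)
    return [env[i] * math.sin(2.0 * math.pi * freq * i / rate) for i in range(n)]


def normalize(samples):
    """Scale samples so that the largest magnitude is 1 (range -1 to 1)."""
    peak = max(abs(x) for x in samples) if samples else 0.0
    if peak == 0.0:
        return list(samples)
    return [x / peak for x in samples]


if __name__ == "__main__":
    # Two notes played one after the other, then normalized.
    song = normalize(tone(261.63, 0.5) + tone(392.00, 1.0))
    print(len(song), min(song), max(song))
```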

The two parts of the song were then combined by wave superposition to generate the full audio sound of the musical score sheet.
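Superposition here just means adding the two waveforms sample by sample. A minimal Python sketch of that, plus saving the result as a 16-bit mono WAV file with the standard `wave` module, is given here. This is not the Scilab code used here; the function names, the zero-padding of the shorter part and the 22050 Hz rate are assumptions.

```python
import struct
import wave


def mix(part_a, part_b):
    """Add two sample lists element by element and rescale to the range -1 to 1.

    The shorter part is padded with silence so both have the same length.
    """
    n = max(len(part_a), len(part_b))
    a = list(part_a) + [0.0] * (n - len(part_a))
    b = list(part_b) + [0.0] * (n - len(part_b))
    total = [x + y for x, y in zip(a, b)]
    peak = max(abs(x) for x in total) if total else 0.0
    if peak == 0.0:
        return total
    return [x / peak for x in total]


def to_pcm16(samples):
    """Pack samples in the range -1 to 1 as little-endian 16-bit integers."""
    ints = [int(round(x * 32767)) for x in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(path, samples, rate=22050):
    """Write a mono, 16-bit WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(to_pcm16(samples))


if __name__ == "__main__":
    # Hypothetical treble and bass parts, one sample list each.
    treble = [0.0, 0.5, 1.0, 0.5, 0.0]
    bass = [0.0, 0.25, 0.5]
    write_wav("twinkle_mix.wav", mix(treble, bass))
```

Rescaling after the sum matters because two parts that each fit in the range -1 to 1 can add up to values outside it.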

Listening part
Now for the listening part. Since Blogger only lets you upload photo and video files, all audio files generated using Scilab were converted into video files.

Warning: Bass notes in the audio/video file may not be heard using ordinary speakers. Use a speaker that is capable of playing low frequency sounds. Enjoy!

Twinkle Twinkle Little Star - Treble part

Twinkle Twinkle Little Star - Bass part

Twinkle Twinkle Little Star

Generating notes in Scilab got me very interested. The following audio/video files are the results of my interest in music and Scilab.

Livin' on a Prayer - Bass part

The music in the following audio/video file is something that I composed myself. It was created to simulate music that involves four instruments. Enjoy!

Feel Good song - by Mabi


Unfortunately, the audio/video files uploaded to Blogger do not play back very clearly, so you might as well download the audio files from here:

I really enjoyed this activity, so I'll definitely give myself a grade of 10 on this one.

2 comments:

  1. Try uploading the files to uploading.com.
    :D
    http://uploading.com/files/aeccd6m7/mabi%2Bsong.mp3/

  2. Add a drum track to the Feel Good song.

    I felt like playing hi-hats around 41 seconds into the song. :)