How do humans understand and appreciate music? An exploration using computers.
The singer swings his arms around as he delves into a raga alapana. The violinist listens carefully and plays the appropriate following phrases. The percussion artists join once the composition starts, and the audience begins to keep track of the tala enthusiastically. Applause occurs sometimes in the middle of a piece, when a particularly telling svara phrase is sung or an interesting mrudangam pattern played. As the three hour concert comes to a close and the mangalam is sung, the curtains come down and listeners leave, content and filled with music.
Music is an art form, a source of entertainment, a means of communication, a way to celebrate, a method of therapy, a source of joy. So what do computers have to do with music? Can a machine recognise ragas? Can Indian music be given a notation? Can a computer transliterate mrudangam beats? Can it separate a concert out into different songs? Can it identify why certain songs make us cheerful, and others make us melancholic?
These are some of the questions that are being explored by Prof. Hema Murthy and her students in a project that is part of CompMusic, a worldwide Music Information Retrieval Project that is examining various traditional forms of music. The music genres covered by the project are Carnatic (South India), Hindustani (North India), Turkish-makam (Turkey), Arab-Andalusian (Maghreb, Northwest Africa), and Beijing Opera (China). The project deliberately focuses on certain traditional forms of music that have not been documented the way some other systems of music, such as western classical, have. One of the primary goals of the project is to showcase these systems of music to the world.
Prof. Xavier Serra, coordinator of the project, a researcher in the field of sound and music computing, and a musician himself, first heard Carnatic music when he was an expert speaker at the “Winter School on Speech and Audio Processing” in 2010 on “Audio Content Analysis and Retrieval”. In his own words, he had not heard anything like it before. He convinced Prof. Murthy to join the CompMusic project, overcoming her initial reservation that dissecting and analysing music would lead to losing the pleasure of simply listening to it. Prof. Murthy immediately saw that it was important for a musician to be part of the project, and TM Krishna, a popular Carnatic vocal musician, agreed to be a collaborator. In addition, a student of Krishna's, vocalist and engineer Vignesh Ishwar, also joined the project officially.
The first challenge faced by the team was the concept of a svara in Indian classical music. Although loosely translated as a musical note, a svara is not so much a note of a single fixed frequency as a range of sounds hovering around a certain frequency. One may vocalise a particular svara, but really be singing a combination of them. When, say, one sings the svara ‘ma’ in the raga Sankarabharanam, ‘ma’ is the only svara pronounced, but the svaras ‘ga’ and ‘pa’ are also touched upon.
In western music, pieces are composed to be performed in a prescribed fixed scale. However, in Carnatic music the tonic is the reference note established by the lead performer, relative to which other notes in a melody are structured.
Musicians generally perform across three octaves, the lower, middle and upper octaves. An octave consists of seven svaras, ‘sa ri ga ma pa dha ni’.
The tonic is the note referred to as ‘sa’ in the middle octave range of the performer. In order to identify a raga, the first requirement is to determine the tonic, because it gives a frame of reference. The same phrase of svaras may be identified as completely different ragas if the frame of reference is different!
The basic unit of measurement of musical intervals is the cent, a logarithmic measure. An octave is divided into 1200 cents, spanning 12 semitones of 100 cents each. However, the range of frequencies in an octave depends on the tonic. For example, if the tonic is 220 Hz, then the higher octave ‘sa’ is at 440 Hz, so the octave spans a range of 220 Hz, and those 220 Hz are divided into 1200 cents. But for someone whose tonic is 160 Hz, the octave spans a range of 160 Hz, and those 160 Hz are divided into 1200 cents. Yet both these octaves are heard the same way by a listener. So tonic normalisation needs to be done: the pitch histogram in the Hz scale is converted to a pitch histogram in the cent scale.
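This normalisation is simple to sketch. The function below is an illustrative stand-in (not the project's code): it maps a frequency in Hz to cents relative to a given tonic, so that performers with different tonics land on the same cent scale.

```python
import math

def hz_to_cents(freq_hz, tonic_hz):
    """Map a frequency to cents relative to the performer's tonic ('sa')."""
    return 1200.0 * math.log2(freq_hz / tonic_hz)

# The higher-octave 'sa' is 1200 cents above the tonic, whatever the tonic:
print(hz_to_cents(440.0, 220.0))  # 1200.0
print(hz_to_cents(320.0, 160.0))  # 1200.0
```

Applied pointwise to a pitch track, this turns a histogram over Hz into a histogram over cents, which is what makes performances at different tonics directly comparable.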
How does a listener determine what the tonic is? Usually one can immediately determine which note is the ‘sa’ (even if the note itself is not articulated and only the lyrics of a song are sung). This is because the svaras ‘sa’ and ‘pa’ are ‘prakruti svaras’, or fixed svaras, which are sung in a plain way compared to the other svaras. That is, the bandwidths of frequencies for these notes are sharper. Hence Prof. Murthy and the team used signal processing and machine learning techniques to find out which are the sharper notes. The pitch histogram of the music is processed using what is known as a group delay function.
The group delay technique emphasises the svaras ‘sa’ and ‘pa’ and this gives the tonic. With this method, the group was able to achieve about 90 percent accuracy in identifying the tonic. To further fine-tune these methods, they segmented the sound of the tambura, an instrument that provides pitch to the musician. They determined the tonic from the tambura alone, and this helped increase the accuracy of tonic identification to around 98 percent.
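The group delay processing itself is beyond a short sketch, but the underlying intuition, that the fixed svaras show up as sharp, heavily populated peaks in the pitch histogram, can be illustrated with a toy peak-picker (all names and values below are invented for illustration):

```python
from collections import Counter

def sharpest_peak(pitches_cents, bin_width=10):
    """Return the centre of the most heavily populated histogram bin.

    A crude stand-in for peak sharpness: pitches sung plainly (like 'sa')
    pile up in a narrow bin, while ornamented svaras spread out.
    """
    bins = Counter(int(p // bin_width) for p in pitches_cents)
    best_bin, _ = bins.most_common(1)[0]
    return best_bin * bin_width + bin_width / 2

# Toy pitch track: values cluster tightly near 0 cents (the tonic)
# and spread loosely elsewhere.
track = [2, 3, 1, 4, 2, 3, 150, 310, 480, 470, 520, 2, 3, 1]
print(sharpest_peak(track))  # 5.0 (the bin containing the tonic)
```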
The next problem addressed was that of melodic processing. Can a computer listen to a song in a particular raga and determine what that raga is? First of all, what is a raga? Loosely, it is a collection of svaras with some rules as to what combinations they can be sung in. But a raga is not merely the notes that make the scale. The phrases (sequences of notes) that are intrinsic to the raga, the various gamakas (ornamentations) employed, the silences between the notes, the inflections – all of these make up the aesthetics of a raga.
How does a listener identify a raga? Somehow, within a few seconds of a musician singing a raga, a reasonably musically literate listener is able to identify it. As the group realised, each raga has certain unique typical motifs or signature phrases. A typical motif is a phrase that a particular raga is replete with, and that does not figure in any other raga. A motif can be quantified by pitch contours and viewed as a time frequency trajectory. The group realised that most of these motifs come from compositions set to tune in the raga, typically those composed by the ‘musical trinity’ – Shyama Sastri, Thyagaraja and Muthuswamy Dikshitar. In particular, they realised that it is the pallavi or the first segment of a song that typically contains the richest phrases of the raga. As an analogue, it is in the initial few phrases of an alapana (a particular form of melodic improvisation typically performed before a composition) that the identity of the raga that is to be performed is established. Prof. Murthy likens this to an artist drawing an outline of a landscape, or a portrait, before filling in the details, or a computational mathematician, who can gauge the behaviour of a matrix by looking at its first few eigenvalues.
The first task was thus to build a database of typical motifs of commonly performed ragas. The team used an algorithm called the longest common subsequence and a variant of it called the rough longest common subsequence to identify those phrases that are frequently repeated in compositions.
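The classic dynamic-programming longest common subsequence is easy to sketch; the team's rough variant relaxes the exact-match condition to tolerate small pitch deviations, but the toy version below (with made-up svara strings) shows the core idea of finding material shared across renditions:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two symbol sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            # Extend a match, or carry forward the best partial result.
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

# Hypothetical svara strings standing in for two phrases of a composition:
p1 = "srgmpdn"
p2 = "srgpdns"
print(lcs_length(p1, p2))  # 6 -- the shared phrase 's r g p d n'
```

In practice the comparison runs over pitch contours rather than discrete symbols, which is why the rough (approximate-match) variant is needed.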
When we hear a raga, we first identify it with a smaller group of ragas, its cohorts. The team set about the task of defining the cohorts for a number of commonly performed ragas. They eventually realised that what really happens when one listens to a piece of music is raga verification rather than raga identification. When we hear a raga, we do not actually compare it with every one of the hundreds of ragas we might know. First we think, ‘hey, it sounds like this song’. We identify it with some smaller subset – its cohorts – and then verify which of the ragas it is in the smaller group. The machine is trained to do the same – the given time-frequency trajectory is first identified with a small set of cohort ragas, each of which has some typical motifs derived from compositions and alapanas. These typical motifs are fed as queries and if they occur in the input raga, the raga is thus verified.
The ragas Sankarabharanam and Kalyani differ by a single svara. But anyone with a little knowledge of Carnatic music can tell them apart. Interestingly, comparisons of their time frequency trajectories also show how unlike each other they are. This is why the computer has to be trained to recognise raga motifs as time frequency trajectories rather than as mere notes. Carnatic music is primarily an oral tradition and notations only provide a rough framework.
The next task that the team worked on involved percussion. Percussion is a complex part of Carnatic music. The raga is the melody, while the rhythm aspects are the laya and tala. Specifically, the tala is the rhythmic structure in which a composition is set. The mrudangam is the primary percussion instrument used in Carnatic music, while other instruments such as the kanjira, ghatam and morsing are also used. The mrudangam playing can vary depending on the lead artist, the mrudangam artist himself, the song, the emotion conveyed by the music and so on. The silences in between strokes are as important as the beats themselves. Sometimes the playing is deliberately slightly off beat. This is called ‘syncopation’. Improvisation is a very important part of Carnatic music and musicians usually meet for the first time on stage with no prior rehearsals.
One of the purposes of analysing percussion is to put markers on a composition, to determine which part of the tala the song starts in, ends in and so on. TM Krishna pointed out that ‘moharras’ (certain predetermined patterns played) are more or less fixed. But first, the beats of the mrudangam have to be transcribed in some way. Prof. Murthy was actually approached by Padma Vibhushan Sangeeta Kalanidhi Dr. Umayalapuram Sivaraman, a renowned senior mrudangam artist, who wanted the beats of his mrudangam to be displayed on a screen when he played. Every beat played on the mrudangam can be articulated orally as syllables, say for example, ‘ta ka dhi mi’. This process of the vocal articulation of percussion syllables, or ‘sollus’ as they are known, is also called ‘konakkol’.
The mrudangam stroke is viewed as an FM-AM (frequency and amplitude modulation) signal since the sound of the mrudangam involves both pitch and volume. The strokes played on the right hand side of the mrudangam are pitch strokes, while those played on the left side are not. There is still, however, a certain coupling between them. In order to analyse the strokes and syllables of the mrudangam, Prof. Murthy relied on her work in speech recognition. Just as in speech, each syllable has a vowel with consonants on either side. In a similar way, a mrudangam syllable has an onset, attack and decay. Onset is the beginning of the syllable, which reaches its crescendo in the attack, and then it begins to decay. They used what is known as a hidden Markov model, in which onset, attack and decay are the three states. Transitions between these can also be made states in this model. Using this model, they went about the process of classifying strokes. The team devised features to process strokes of the mrudangam, kanjira and some other percussion instruments.
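A minimal, hand-rolled sketch of the three-state idea follows. The transition and emission numbers are invented for illustration, and real systems work with acoustic features rather than coarse energy labels; the point is only how a left-to-right onset/attack/decay structure decodes the shape of a stroke:

```python
import math

STATES = ["onset", "attack", "decay"]
# Left-to-right transitions: a state either stays put or moves forward.
TRANS = {
    "onset":  {"onset": 0.5, "attack": 0.5, "decay": 0.0},
    "attack": {"onset": 0.0, "attack": 0.5, "decay": 0.5},
    "decay":  {"onset": 0.0, "attack": 0.0, "decay": 1.0},
}
# Invented emission probabilities for quantised frame energy.
EMIT = {
    "onset":  {"low": 0.2, "mid": 0.6, "high": 0.2},
    "attack": {"low": 0.1, "mid": 0.2, "high": 0.7},
    "decay":  {"low": 0.7, "mid": 0.2, "high": 0.1},
}

def viterbi(obs):
    """Most likely onset/attack/decay path for a sequence of energy labels."""
    # A stroke must begin in the onset state.
    v = [{s: ((math.log(EMIT[s][obs[0]]) if s == "onset" else -math.inf), None)
          for s in STATES}]
    for o in obs[1:]:
        row = {}
        for s in STATES:
            score, pred = max(
                ((v[-1][p][0] + math.log(TRANS[p][s])
                  if TRANS[p][s] > 0 else -math.inf), p)
                for p in STATES)
            row[s] = (score + math.log(EMIT[s][o]), pred)
        v.append(row)
    path = [max(STATES, key=lambda s: v[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):  # backtrack through stored predecessors
        path.append(v[t][path[-1]][1])
    return list(reversed(path))

print(viterbi(["mid", "high", "high", "low", "low"]))
# → ['onset', 'attack', 'attack', 'decay', 'decay']
```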
After transcribing the strokes played by an artist, the machine uses the Markov model to locate the moharra. The moharra in turn gives the number of aksharas or beats in the tala and so the cycle length is determined. The team is now working on how to find the point in the tala where the song begins.
An important goal of music information retrieval is music archival. In this respect, one task handled by the group was concert segmentation, a major part of music archival. Most available recordings of Carnatic music are continuous, but we often need to listen to only one particular song or raga. This requires that the concert be segmented into different segments. Prof. Murthy initially suggested that they segment concerts into different songs using applause. How can applause be detected? Mapped as a time vs amplitude graph, it has the shape of an eye. A few in the audience start clapping, it reaches a crescendo and then again becomes subdued. The team developed some descriptors to detect applause, and some criteria were fixed for the duration of an applause.
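The ‘eye’ shape can be captured with a very simple test on the energy envelope. The sketch below is illustrative only (the team's actual descriptors and duration criteria are more involved): it flags an envelope that rises to a single interior peak and then subsides.

```python
def short_time_energy(signal, frame_len=4):
    """Sum of squared samples per non-overlapping frame (a crude envelope)."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def looks_like_applause(envelope, min_frames=3):
    """True if the envelope rises to one interior peak and then falls."""
    if len(envelope) < min_frames:
        return False
    peak = envelope.index(max(envelope))
    rising = all(envelope[i] <= envelope[i + 1] for i in range(peak))
    falling = all(envelope[i] >= envelope[i + 1]
                  for i in range(peak, len(envelope) - 1))
    return 0 < peak < len(envelope) - 1 and rising and falling

# Toy signal: a swell of clapping that builds up and dies away.
sig = [0.1] * 4 + [0.5] * 4 + [1.0] * 4 + [0.4] * 4 + [0.1] * 4
print(looks_like_applause(short_time_energy(sig)))  # True
```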
However, the assumption made here is that applauses occur only at the end of a certain piece in a concert. But this is simply not true in a Carnatic concert! For example, they found that they had a concert recording with, say, 9 items, which was segmented into, say, 20 parts if applause were used as a means of segmentation. This is because Carnatic music has certain kinds of improvisation that take place at various points during a concert, and the audience may spontaneously clap if they find a particular part of the alapana or svaraprasthara (an improvisation technique in which svaras are sung extempore to a particular line of the composition) appealing.
Now, in every complete item of the concert, at some point or another, the vocal, violin and percussion all occur. Moreover, they all appear together as an ensemble at some point. Also, a composition occurs in each item, and an item always ends with at least a small segment of the composition. The machine is thus trained to look for those places where the vocal, violin and percussion are present together. It then goes backwards and forwards in order to identify the complete item and segment it out. How is this ‘merging’ done from the right and left? Changes in the raga can be detected and this helps determine when the performer has moved on to the next composition. The group was able to successfully apply this technique to segment about 70 continuous recordings of concerts into separate items. Web discussion groups such as rasikas.org that carry reviews and lists of items performed in concerts made it possible to match the song names to the segmented items.
In addition, the team was able to quantify applause. The strength of the applause indicates the highlights of a concert, so this algorithm can be used to pick out the best or most popular parts of a performance.
The group has also been able to do some work on singer identification. Drawing on experience from Prof. Murthy’s work in speaker recognition, they have been able to work on identifying musicians by the timbre of their voices. This is possible by actually training a machine learning algorithm to learn voice characteristics of different singers. This is also important in archival and concert categorisation.
The CompMusic team has contributed metadata for items from Carnatic music concerts and albums to MusicBrainz, an open online music encyclopedia and database which aims to serve as a repository of metadata for music from across the world. The group is also considering working on an application like SoundHound, in which one can query a song by humming a part of it. The CompMusic project has developed a web-based browsing application called Dunya, which is freely available for use. The work that has been done in different genres by different groups can be tested on this platform. For example, one could think of applications like:
1. One can sing a snippet and the browser determines the tonic and range of frequencies,
2. If one is listening to a piece of music and reproducing it, the browser tells how accurately it has been copied – it can tell whether what has been reproduced is in the right ‘sruti’ (pitch),
3. One can feed it a piece of music and it can change the pitch and play the music alone (without the lyrics) at a different pitch.
All these applications are likely to be of great use to students of music. The browser is maintained by the core CompMusic team. Each group develops the algorithms and once they are robust, they are integrated into the tool kit on the browser.
Initially, Prof. Murthy did not expect to find enough students interested in working on Carnatic music. To her surprise, she found that a large number of them – many with no background in Carnatic music at all – were enthusiastic about the project. Shrey Dutta, for instance, went on to learn to play the veena, began to listen to Carnatic music, and did much of the motif recognition work. He now says that he only needs to look at the time frequency trajectory of a raga to identify it, not even requiring the analysis that the machine does! Jom Kuriakose also came to Prof. Murthy with little background in Carnatic music. He was, however, fascinated with percussion, and now works directly with Umayalapuram Sivaraman on onset detection. Prof. Murthy is very happy that students have been incredibly open about working in what is considered a niche and old-fashioned genre of music.
Prof. Murthy stresses several times that any kind of machine learning should be implemented only with proper context. A proper knowledge base should be the frame of reference for any machine learning techniques. Machine learning involves big data, and as one pumps in more information, it learns on average. Signal processing gives results in the particular case, but can make errors. Combining the two makes for an ideal recipe for music information retrieval techniques.
Far from taking away the joy of listening to music, Prof. Murthy says, analysing Carnatic music has brought to light many things she had not realised earlier. A raga can sound drastically different when interpreted by different musicians. Moreover, the structures of ragas have changed over the centuries, and keep evolving with time. Why do some things work in music, and others fail? Why can one sing a particular gamaka in a particular raga but not in another? Several things are intuitive about music. Can a machine understand these subtleties? Computers can be trained to play chess, to prove mathematical theorems, to diagnose diseases, to recognise ragas and so much more. Can they also be trained to think intelligently and creatively? To give elegant proofs, and discern between good and mediocre music? If pointed in the right direction, can they even see things that humans may miss? This requires a deep understanding of human cognition and the human creative process. By working on various forms of music, and more generally on the understanding of art and creativity, CompMusic is a significant contribution to the vast ocean of artificial intelligence.
Prof. Hema A Murthy is a Professor in the Department of Computer Science. She obtained her PhD from the Department of Computer Science and Engineering at IIT Madras in 1992. Her areas of research are speech processing, speech synthesis and recognition, network traffic analysis and modelling, music information retrieval, music processing, time series modelling, and pattern recognition.
Arundhathi Krishnan is a PhD student in the Department of Mathematics and a Carnatic musician. Her area of research is Functional Analysis and Operator Theory. She can be reached at email@example.com
All the images have been taken from Prof. Murthy's research papers. Cover Image Source: S Hariharan