Polyphonic Pitch Recognition

In the INSANE HOLLOW pages of my P-ART homepage I explain how Music for a ritual dance (1984), a tape composition of mine, forms the background of my Installation for ritual dance (1987). Since that time, I've been dreaming of a tape-to-score converter that would let my polyphonic tape compositions be played from a score.

In the nineties I experimented with a pitch-to-MIDI converter (Roland CP-40) to convert monophonic audio performances of mine into MIDI messages.
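To make the idea of pitch-to-MIDI conversion concrete, here is a minimal sketch of its last step: mapping a detected frequency to the nearest MIDI note number. It is not the algorithm of the Roland CP-40 or of any other device; it assumes equal temperament with A4 tuned to 440 Hz, and the function names are my own, chosen for illustration.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz: float) -> int:
    """Return the nearest MIDI note number for a frequency in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave, one octave per doubling of frequency
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))


def midi_to_name(note: int) -> str:
    """Return a note name such as 'C4' (middle C = MIDI note 60)."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


if __name__ == "__main__":
    for f in (261.63, 440.0, 880.0):
        n = hz_to_midi(f)
        print(f, n, midi_to_name(n))
```

The hard part of a real converter is the step before this one: deciding which frequency is actually being played at each moment.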

Furthermore, I used new MIDI recording facilities to compose in real time as a performer. Building a MIDI scanner under the keyboard of my grand piano on the one hand, and using intuitive music notation software on the other, made it possible to produce scores without writing out note after note, so that I could play the music as it came to me.

Many computer-based problems still have to be solved before polyphonic music can be converted into scores. With Melodyne (http://www.celemony.com) I can only record single melody tracks and save them as notes, each with its position in time and pitch, while respecting the phrasing of the melody line. You can even take a note and move it to any pitch or time position you like. The transition between notes is always kept in a way that makes musical sense, and automatic formant correction makes the moved note sound the way you expect it to.

Polyphonic pitch recognition is very complex. However, advances have been made in signal separation and polyphonic harmonic tracking, both of which are part of the 'polyphonic pitch-to-MIDI' problem.

It might still help to understand how our auditory perception works, if only because perception is an additive (even creative) process rather than one of pure analysis. Just as the decimal 'software' system in our brain runs on an underlying "neuro-binary" system, the computer is able to do decimal arithmetic based on a "binary-digit" system. The concept is quite similar, although it is certainly implemented differently on each system.

The group of sounds that the human perceptual system tags as having "the same pitch" is so complicated and weird that any pitch-analysis algorithm with relevance to the human listener must essentially be isomorphic to the human processing ability. The same is true of other perceptual features such as loudness, tempo, chord quality, etc.

When two instruments are playing notes from the same chord (particularly at octave or fifth intervals), they share some harmonics, which makes segregation more difficult. It can also give the effect of a 'harmonic root' an octave or so below, which is a kind of virtual pitch phenomenon (e.g. you can hear a low note even when its fundamental is absent).
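The overlap of harmonics can be illustrated with a small sketch. It assumes ideal harmonic tones whose partials are exact integer multiples of the fundamental, pure (just) intervals, and fundamentals given in whole Hz; real instruments and equal-tempered tuning deviate slightly from this. The function names are hypothetical.

```python
import math


def harmonics(f0: int, count: int = 8) -> list[int]:
    """First `count` harmonics of an ideal tone with fundamental f0 (Hz)."""
    return [f0 * k for k in range(1, count + 1)]


def shared_harmonics(f0_a: int, f0_b: int, count: int = 8) -> list[int]:
    """Frequencies that appear among the harmonics of both tones."""
    return sorted(set(harmonics(f0_a, count)) & set(harmonics(f0_b, count)))


def implied_root(f0_a: int, f0_b: int) -> int:
    """Largest frequency of which both fundamentals are whole multiples.

    This is only a crude stand-in for the perceived 'harmonic root'.
    """
    return math.gcd(f0_a, f0_b)


if __name__ == "__main__":
    print(shared_harmonics(220, 440))  # octave: A3 and A4
    print(shared_harmonics(220, 330))  # pure fifth: A3 and E4
    print(implied_root(220, 330))      # common root of the fifth
```

For the octave, every one of the first four harmonics of the upper note (440, 880, 1320 and 1760 Hz) coincides with a harmonic of the lower note, so the upper note adds nothing new in that range. For the pure fifth, the two notes share 660 and 1320 Hz, and their common root is 110 Hz, one octave below the lower note, although neither instrument plays it.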

So the engineering problem is very difficult. Purely raw signal-processing approaches to the problem are doomed to fail because they do not incorporate information about the structure inherent in musical signals. There is structure on many different levels, from the low-level sinusoidal/harmonic model, through Gestalt grouping and strong time-domain correlations (e.g. pitch varies slowly over time), to much higher levels of modelling that incorporate elements of music theory.

After six years of work, Innovative Music Systems (IMS) claims to have created the ultimate polyphonic technology for MP3-to-MIDI and WAVE-to-MIDI conversion. IntelliScore for PC (on a Mac it runs only via Virtual PC; see http://www.intelliscore.net) converts MP3 tracks and audio tracks in WAVE format into MIDI files. IntelliScore also lets you assign notes to different filters (= 'patches' such as classical full orchestra, choir, Indian raga, etc.) based on their pitch. These features let you control the conversion process and save editing time.

If you have any experience with polyphonic pitch recognition, I warmly invite you to mail me.
