Background Automated pitch detection in polyphonic music remains a difficult challenge (Benetos et al., 2013). Robust solutions exist for simple cases such as monodies. Implementations of perceptual and cognitive models have so far been less successful than engineering methods, in particular machine-learning models. One reference model (Klapuri, 2006) preselects pitch candidates based on harmonic summation and searches for multiple pitches through cancellation.

Aims The aim was to devise a model for pitch detection in polyphonic music that can transcribe in detail traditional Norwegian music played on the Hardanger fiddle, where more than two strings are played at the same time. The new model should be applicable to other types of music as well. Perceptual and cognitive models should guide the improvement of the state of the art.

Main Contribution The model neither relies on machine-learning training on a given set of samples nor explicitly relies on stylistic rules. Instead, the methodology consists of devising a set of rules that are as simple and general as possible while giving satisfactory results for the chosen corpus of music. We follow some general principles of the model by Klapuri (2006) while introducing new heuristics. We present a new method for harmonic summation that penalises sparse harmonic series, in particular those in which odd partials are absent, since this would indicate that the actual harmonic series belongs to a multiple of the given pitch candidate. In addition, a multiple of a fundamental can be selected as a pitch alongside the fundamental itself if its attack phase is sufficiently distinctive. For that purpose, we introduce the concept of a pitch percept that persists over the whole extent of the note and serves as a reference for the detection of higher pitches at harmonic intervals.

Results The proposed method yields transcriptions of relatively good quality, with low rates of false positives and false negatives.
The model is still being refined. We are applying this method to the analysis of recordings of Norwegian folk music, a large part of which consists of Hardanger fiddle pieces and a cappella singing.

Implications Automated transcription is of high interest for musicology and music information retrieval. It makes it possible, for instance, to build large corpora of scores for music analysis, and it opens new perspectives for computational musicology. By designing computer models that rely on rules as simple and general as possible rather than on machine learning, and whose pitch-detection behaviour comes closer to human capabilities, we hypothesise that the mechanisms thus modelled might point to general computational capabilities that could be found in cognitive models as well. At the same time, improvement of the model based on expertise in music perception and cognition remains desirable.

References
Benetos et al. (2013). Automatic music transcription: challenges and future directions. Journal of Intelligent Information Systems, 41, 407-434.
Klapuri (2006). Multiple fundamental frequency estimation by summing harmonic amplitudes. ISMIR 2006.

Keywords: pitch, computational model, harmonic summation, Norwegian folk music, Hardanger fiddle.
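Illustrative sketch The idea of penalising sparse harmonic series, stated under Main Contribution, can be illustrated with a small example. The abstract does not specify the actual weighting, so everything below is an assumption made for illustration only: the function names, the 3% frequency tolerance, the limit of four harmonics, and the use of the fraction of odd partials present as the penalty factor. It is not the authors' implementation.

```python
def partial_amplitude(peaks, target_hz, tolerance=0.03):
    """Largest peak amplitude within a relative tolerance of target_hz, else 0.0.

    `peaks` is a list of (frequency_hz, amplitude) pairs (hypothetical input format).
    """
    best = 0.0
    for freq_hz, amplitude in peaks:
        if abs(freq_hz - target_hz) <= tolerance * target_hz:
            best = max(best, amplitude)
    return best


def plain_harmonic_sum(peaks, f0_hz, max_harmonic=4, tolerance=0.03):
    """Unpenalised harmonic summation: add the amplitudes found at h * f0."""
    return sum(
        partial_amplitude(peaks, h * f0_hz, tolerance)
        for h in range(1, max_harmonic + 1)
    )


def penalised_harmonic_sum(peaks, f0_hz, max_harmonic=4, tolerance=0.03):
    """Harmonic summation scaled by the fraction of odd partials present.

    Assumed penalty (not taken from the abstract): if no odd partial
    (h = 1, 3, ...) is found, the score drops to zero, because the observed
    series then more plausibly belongs to a multiple of the candidate.
    """
    amplitudes = [
        partial_amplitude(peaks, h * f0_hz, tolerance)
        for h in range(1, max_harmonic + 1)
    ]
    odd = amplitudes[0::2]  # partials h = 1, 3, 5, ...
    odd_present = sum(1 for a in odd if a > 0.0)
    return sum(amplitudes) * (odd_present / len(odd))


# Synthetic spectrum of a single 440 Hz tone with four partials.
peaks = [(440.0, 1.0), (880.0, 0.5), (1320.0, 0.25), (1760.0, 0.125)]

for candidate_hz in (220.0, 440.0):
    print(
        candidate_hz,
        plain_harmonic_sum(peaks, candidate_hz),
        penalised_harmonic_sum(peaks, candidate_hz),
    )
```

With this synthetic input, the plain sum still gives the 220 Hz candidate a non-zero score, because its even partials coincide with the partials of the 440 Hz tone. The penalised version reduces that score to zero, since none of the odd partials of 220 Hz is present, while the 440 Hz candidate keeps its full score.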
Computational model of pitch detection, perceptive foundations, and application to Norwegian fiddle music