We present a method for tempo estimation from audio recordings based on signal processing and peak tracking, and not depending on training on ground-truth data. First, an accentuation curve, emphasizing the temporal location and accentuation of notes, is based on a detection of bursts of energy localized in time and frequency. This enables the detection of notes in dense polyphonic texture, while ignoring spectral fluctuation produced by vibrato and tremolo. Periodicities in the accentuation curve are detected using an improved version of autocorrelation function. Hierarchical metrical structures, composed of a large set of periodicities in pairwise harmonic relationships, are tracked over time. In this way, the metrical structure can be tracked even if the rhythmical emphasis switches from one metrical level to another. This approach, compared to all the other participants to the Music Information Retrieval Evaluation eXchange (MIREX) Audio Tempo Extraction competition from 2006 to 2018, is the third best one among those that can track tempo variations. While the two best methods are based on machine learning, our method suggests a way to track tempo founded on signal processing and heuristics-based peak tracking. Moreover, the approach offers for the first time a detailed representation of the dynamic evolution of the metrical structure. The method is integrated into MIRtoolbox, a Matlab toolbox freely available.
This item's license is: Attribution 4.0 International