Design
mir_eval should implement common metrics used to evaluate music information retrieval and audio signal processing algorithms. Every metric should be based on a pre-existing publication. Each implementation should be well documented and as "transparent" as possible, so that it is easy to understand how a metric is computed and to modify it. mir_eval should also be modular: subtasks shared across metrics get their own functions, both to prevent code duplication and to make individual subtasks easy to replace.
- Each metric for each task has its own function in that task's submodule (e.g. mir_eval.beat). A metric function should not load or preprocess data; it should operate directly on the raw annotations it is given.
- All shared/non-domain-specific functionality (e.g. F-measure, sampling intervals) should be in mir_eval.util.
- Any functionality shared across the metrics of a single task that is not meant to be used outside the context of computing a metric should be defined in underscore-prefixed (private) functions in that task's submodule.
- Any task-specific preprocessing functions should go in the task’s submodule.
- Each task should have an evaluator which performs all data loading, preprocessing, and evaluation. The evaluators should not define any new functionality, but instead should provide a usage example and a black-box system for going from annotation to score.
- Each task submodule should have an OrderedDict whose keys are the names of the metrics for that task and whose values are the functions used to compute those metrics.
- All metric functions should include example usage that covers loading and preprocessing.
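To make these rules concrete, here is a hypothetical sketch of a small event-based task submodule. The task, the metric and helper names, the 0.05 s window, and the greedy matching are illustrative assumptions, not mir_eval's actual code. Private helpers carry a leading underscore, metric functions take already-loaded annotations, and an OrderedDict maps metric names to functions.

```python
# Hypothetical event-based task submodule laid out according to the rules above.
# The task, function names, and 0.05 s window are illustrative assumptions.
import collections


def _validate(reference, estimated):
    """Private helper shared by this task's metrics; not meant for outside use."""
    for events in (reference, estimated):
        if list(events) != sorted(events):
            raise ValueError("event times must be sorted")


def _count_hits(reference, estimated, window):
    """Private helper: greedy one-to-one matching of events within +/- window."""
    used = set()
    hits = 0
    for ref in reference:
        for i, est in enumerate(estimated):
            if i not in used and abs(ref - est) <= window:
                used.add(i)
                hits += 1
                break
    return hits


def precision(reference, estimated, window=0.05):
    """Fraction of estimated events that match a reference event."""
    _validate(reference, estimated)
    if not estimated:
        return 0.0
    return _count_hits(reference, estimated, window) / len(estimated)


def recall(reference, estimated, window=0.05):
    """Fraction of reference events matched by an estimated event."""
    _validate(reference, estimated)
    if not reference:
        return 0.0
    return _count_hits(reference, estimated, window) / len(reference)


# Public registry: metric name -> metric function, in a fixed reporting order.
METRICS = collections.OrderedDict([
    ("Precision", precision),
    ("Recall", recall),
])


def evaluate(reference, estimated):
    """Apply every registered metric to already-loaded annotations."""
    return collections.OrderedDict(
        (name, metric(reference, estimated)) for name, metric in METRICS.items()
    )
```

Because the registry is ordered, an evaluator can report every score in a stable order without hard-coding metric names.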
mir_eval has submodules both for evaluation of specific tasks and for common functionality/utility.
The task submodules contain metrics for specific MIR/signal processing "tasks", to be used for quantitative analysis.
The beat submodule replicates the functionality of the beat evaluation toolbox.
The melody submodule implements all metrics from "Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges".
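Melody metrics typically compare pitches in cents rather than Hz. The sketch below assumes the 10 Hz reference frequency and 50-cent tolerance that are common in melody evaluation; the function names are hypothetical, and treating 0 Hz as unvoiced is a simplifying assumption. It computes a raw pitch accuracy: the fraction of voiced reference frames whose estimated pitch falls within the tolerance.

```python
# Hypothetical sketch: cents conversion and raw pitch accuracy.
# Base frequency, tolerance, and the 0 Hz = unvoiced convention are assumptions.
import math


def hz_to_cents(freq_hz, base_frequency=10.0):
    """Convert Hz to cents above base_frequency; non-positive values map to 0."""
    if freq_hz <= 0:
        return 0.0
    return 1200.0 * math.log2(freq_hz / base_frequency)


def raw_pitch_accuracy(ref_hz, est_hz, cent_tolerance=50.0):
    """Fraction of voiced reference frames whose estimate is within the tolerance.

    Assumes both sequences are already sampled on the same time grid and that
    0 Hz marks an unvoiced frame.
    """
    voiced = [(ref, est) for ref, est in zip(ref_hz, est_hz) if ref > 0]
    if not voiced:
        return 0.0
    correct = sum(
        1
        for ref, est in voiced
        if est > 0 and abs(hz_to_cents(ref) - hz_to_cents(est)) <= cent_tolerance
    )
    return correct / len(voiced)
```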
The segment submodule implements all of the MIREX metrics:
- boundary retrieval precision, recall, and F-measure, as used in "A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting" and "Structural Segmentation of Musical Audio by Constrained Clustering"
- pairwise precision, recall, and F-measure, also from "Structural Segmentation of Musical Audio by Constrained Clustering"
- normalized conditional entropy-based over- and under-segmentation scores, as described in "Towards Quantitative Measures of Evaluating Song Segmentation"
- the Rand clustering index.
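Pairwise precision and recall treat segmentation as a clustering of time frames: a pair of frames counts as a hit when both annotations give the two frames the same label. This hypothetical sketch assumes both label sequences were already sampled on the same frame grid, and it uses a quadratic loop over frame pairs for clarity rather than speed.

```python
# Hypothetical sketch: frame-pair precision/recall/F-measure for segment labels.
from itertools import combinations


def pairwise_scores(ref_labels, est_labels):
    """Return (precision, recall, F-measure) over all pairs of frames.

    Assumes both label sequences are sampled on the same frame grid.
    """
    if len(ref_labels) != len(est_labels):
        raise ValueError("label sequences must have the same length")
    ref_pairs = est_pairs = both = 0
    for i, j in combinations(range(len(ref_labels)), 2):
        same_ref = ref_labels[i] == ref_labels[j]
        same_est = est_labels[i] == est_labels[j]
        ref_pairs += same_ref  # pair grouped together in the reference
        est_pairs += same_est  # pair grouped together in the estimate
        both += same_ref and same_est  # pair grouped together in both
    precision = both / est_pairs if est_pairs else 0.0
    recall = both / ref_pairs if ref_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

The labels themselves never need to match across annotations; only whether two frames share a label within each annotation matters.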
The separation submodule replicates the functionality of the BSS Eval toolbox.
The onset submodule computes the precision, recall, and F-measure of estimated onset times, as described in "Evaluating the Online Capabilities of Onset Detection Methods".
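Onset precision and recall depend on matching each estimated onset to at most one reference onset within a tolerance window. The sketch below is a hypothetical stand-alone version, assuming a ±50 ms window and a greedy two-pointer pass over sorted onset lists; for this one-dimensional window criterion the greedy pass finds a maximum one-to-one matching, but mir_eval's own routine may be implemented differently.

```python
# Hypothetical sketch: onset F-measure, precision, and recall with a +/- window.
def onset_scores(reference, estimated, window=0.05):
    """Return (F-measure, precision, recall) for onset times in seconds."""
    ref, est = sorted(reference), sorted(estimated)
    if not ref or not est:
        return 0.0, 0.0, 0.0
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= window:
            hits += 1  # match this pair and move past both
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimate is too early to match any remaining reference onset
        else:
            i += 1  # reference is too early to match any remaining estimate
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall), precision, recall
```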
The chord submodule contains functionality for mapping chords into different dialects (e.g. min/maj, triads, quads, etc.) and for computing the frame-wise accuracy.
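Chord evaluation first reduces labels to a smaller vocabulary and then compares frames. This sketch is a deliberate simplification: the quality tables, the "X" marker for out-of-vocabulary chords, and skipping those reference frames are illustrative assumptions, not mir_eval's chord grammar. Labels are assumed to look like "root:quality/bass".

```python
# Hypothetical sketch: reduce chord labels to maj/min, then score frame by frame.
# These small quality tables are assumptions; real chord grammars are richer.
MAJ_QUALITIES = {"maj", "maj6", "maj7", "maj9", "7", "9"}
MIN_QUALITIES = {"min", "min6", "min7", "min9", "minmaj7"}


def reduce_to_majmin(label):
    """Map a label to "root:maj", "root:min", "N" (no chord), or "X"."""
    if label == "N":
        return "N"
    root, _, quality = label.split("/")[0].partition(":")  # drop the bass note
    quality = quality or "maj"  # a bare root is read as a major triad
    if quality in MAJ_QUALITIES:
        return root + ":maj"
    if quality in MIN_QUALITIES:
        return root + ":min"
    return "X"  # outside the maj/min vocabulary


def frame_accuracy(ref_labels, est_labels):
    """Frame-wise accuracy after reduction; reference "X" frames are skipped."""
    reduced = [
        (reduce_to_majmin(ref), reduce_to_majmin(est))
        for ref, est in zip(ref_labels, est_labels)
    ]
    scored = [(ref, est) for ref, est in reduced if ref != "X"]
    if not scored:
        return 0.0
    return sum(ref == est for ref, est in scored) / len(scored)
```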
The pattern submodule contains functions to compute the standard and establishment F-measure, precision, and recall (F, P, R); the occurrence F-measure, precision, and recall (F_occ, P_occ, R_occ); the three-layer F-measure, precision, and recall (F_3, P_3, R_3); and the first five target proportion metric (FFP).
The utility submodules contain functionality shared across tasks, along with tools for qualitative analysis.
The sonify submodule contains functions for creating audio signals from algorithm output, including synthesizing clicks for temporal events, chords, and chromagrams.
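A click track is the simplest of these sonifications: place a short, decaying tone at each event time. The sketch below uses only the standard library and is hypothetical; the function name, the 1 kHz, 100 ms click with a 10 ms decay, and the 8 kHz sample rate are assumptions, not the submodule's actual defaults.

```python
# Hypothetical sketch: synthesize a click track from event times (seconds).
# Click frequency, duration, decay, and sample rate are assumptions.
import math


def synthesize_clicks(times, fs=8000, click_freq=1000.0, click_dur=0.1, length=None):
    """Return a list of samples with a decaying sine "click" at each event time."""
    n_click = int(fs * click_dur)
    # Exponentially decaying sinusoid with a 10 ms time constant.
    click = [
        math.sin(2 * math.pi * click_freq * n / fs) * math.exp(-n / (0.01 * fs))
        for n in range(n_click)
    ]
    if length is None:
        length = int(fs * max(times, default=0.0)) + n_click
    signal = [0.0] * length
    for time in times:
        start = int(round(time * fs))
        for n, value in enumerate(click):
            if 0 <= start + n < length:  # clip clicks that run past the end
                signal[start + n] += value
    return signal
```

Mixing the returned signal with the original audio is a quick way to hear whether detected events line up with the music.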
Functions for plotting the output of different algorithms.
The util submodule contains utility functions (preprocessing, basic metrics) shared across tasks.
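Sampling labeled intervals onto a uniform time grid is the kind of shared preprocessing step that belongs here, since both chord and segment evaluation compare labels frame by frame. This is a hypothetical sketch; the function name, the 0.1 s default step, and the "N" fill value are assumptions.

```python
# Hypothetical sketch: sample labeled intervals on a uniform time grid.
def intervals_to_samples(intervals, labels, sample_size=0.1, fill_value="N"):
    """Sample labeled (start, end) intervals on a grid starting at time 0.

    Each sample takes the label of the first interval with start <= t < end,
    or fill_value if no interval covers it.
    """
    end = max(stop for _, stop in intervals)
    times = [k * sample_size for k in range(int(round(end / sample_size)))]
    sampled = []
    for time in times:
        label = fill_value
        for (start, stop), interval_label in zip(intervals, labels):
            if start <= time < stop:
                label = interval_label
                break
        sampled.append(label)
    return times, sampled
```

Sampling the reference and the estimate with the same step is what makes their label sequences directly comparable.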
The io submodule contains functions for reading data from files.
Beat evaluator:
- mir_eval.io.load_events
- mir_eval.beat._clean_beats
- Compute all metrics
Segment evaluator:
- mir_eval.io.load_annotation
- mir_eval.util.adjust_intervals
- Compute all metrics
Melody evaluator:
- mir_eval.io.load_time_series
- mir_eval.melody.hz2cents . . .
Onset evaluator:
- mir_eval.io.load_events
- Compute all metrics
Separation evaluator:
- Load audio with either scipy.io or librosa
- Resample with either scipy.signal or librosa
- Make the signals the same length
- Compute the BSS Eval metrics
Chord evaluator:
- mir_eval.io.load_annotation
- Reduce the chord alphabet
- Sample the label sequences
- Score frame by frame
Pattern evaluator:
- mir_eval.io.load_patterns
- Compute all metrics
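Using the beat steps above as an example, an evaluator might look roughly like this. Everything here is a hypothetical stand-in: the file format (one time per line, "#" comments), the 5 s trimming in clean_beats, the ±70 ms F-measure window, and the command-line arguments are assumptions, and the functions are local substitutes for the mir_eval.io and mir_eval.beat calls listed above.

```python
# Hypothetical beat evaluator following the steps above; not mir_eval's own script.
import argparse
import collections
import json


def load_events(lines):
    """Parse one event time (seconds) per line; blanks and '#' comments are skipped."""
    times = []
    for line in lines:
        line = line.split("#")[0].strip()
        if line:
            times.append(float(line.split()[0]))
    return times


def clean_beats(beats, min_time=5.0):
    """Assumed preprocessing: sort, drop duplicates, ignore beats before min_time."""
    return sorted({beat for beat in beats if beat >= min_time})


def f_measure(reference, estimated, window=0.07):
    """Greedy one-to-one matching within +/- window seconds on sorted beat lists."""
    i = j = hits = 0
    while i < len(reference) and j < len(estimated):
        if abs(reference[i] - estimated[j]) <= window:
            hits, i, j = hits + 1, i + 1, j + 1
        elif estimated[j] < reference[i]:
            j += 1
        else:
            i += 1
    if hits == 0:
        return 0.0
    precision, recall = hits / len(estimated), hits / len(reference)
    return 2 * precision * recall / (precision + recall)


METRICS = collections.OrderedDict([("F-measure", f_measure)])


def main(argv=None):
    """Load, preprocess, and score a reference/estimate pair of beat files."""
    parser = argparse.ArgumentParser(description="Hypothetical beat evaluator")
    parser.add_argument("reference_file")
    parser.add_argument("estimated_file")
    args = parser.parse_args(argv)
    with open(args.reference_file, encoding="utf-8") as handle:
        reference = clean_beats(load_events(handle))
    with open(args.estimated_file, encoding="utf-8") as handle:
        estimated = clean_beats(load_events(handle))
    scores = collections.OrderedDict(
        (name, metric(reference, estimated)) for name, metric in METRICS.items()
    )
    print(json.dumps(scores, indent=2))
    return scores
```

Calling main(["reference.txt", "estimated.txt"]) from a script entry point would print the scores as JSON; the file names are placeholders. The evaluator adds no new functionality of its own: it only chains loading, preprocessing, and the registered metrics.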
All code should get a perfect pylint score, and every module, class, and function should have a docstring in Sphinx format.
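As an illustration of a Sphinx-format docstring, here is a hypothetical F-measure helper documented with Sphinx info fields (:param:, :type:, :returns:, :rtype:). The function itself is only an example, not a required API.

```python
# Hypothetical helper illustrating a Sphinx-format docstring.
def f_measure(precision, recall, beta=1.0):
    """Compute the F-measure, the weighted harmonic mean of precision and recall.

    :param precision: Precision score in [0, 1].
    :type precision: float
    :param recall: Recall score in [0, 1].
    :type recall: float
    :param beta: Relative weight of recall; ``beta=1.0`` weights both equally.
    :type beta: float
    :returns: The F-measure, or 0.0 when precision and recall are both 0.
    :rtype: float
    """
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
```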