Audio Effects Research

Contributions

To add a publication or suggest an update, submit a new issue on our GitHub repo and fill in the info: new publication or publication update.

Support

To show your support, please consider giving our repo a star: https://github.com/mcomunita/audio-effects-research

Tasks

Classification/Recognition/Identification - tasks concerned with recognizing the type (e.g., distortion, phaser, reverb) of audio effect(s) or specific device (e.g., Proco Rat distortion, LA2A compressor) used to process a certain audio example.

Estimation/Regression/Extraction - tasks concerned with estimating the control settings (e.g., amount of gain, cutoff frequency, modulation speed) used to process a certain audio example, or with estimating the internal coefficients of certain processing blocks (e.g., allpass filter, biquad filter, low-frequency oscillator).

Modeling - tasks concerned with capturing - as accurately as possible - the behaviour of a specific audio effect implementation, typically an analog device or part of an electronic circuit.

Processing - broad category of tasks concerned with processing an audio signal. This category includes: automatic audio effects control, where the task is to control one or more audio effects in a “meaningful” way from a perceptual point of view; automatic mixing, where, given an audio multitrack, a system generates a “meaningful” mix from a perceptual/stylistic point of view; audio processing graph estimation, where, given a processed audio signal, a system generates a “meaningful” signal chain that can be used to replicate the original processing; anti-aliasing techniques; and creative uses of audio effects or derivation of new audio effects that do not strictly model specific devices.

Removal/Inversion - tasks concerned with removing the audio effect(s) applied to a certain audio example so that the unprocessed signal can be recovered from the processed one.

Style Transfer - tasks concerned with replicating the sonic characteristics of a reference audio example when applied to an input audio example, regardless of the content or specific audio effects and effects implementations used to process the reference and the input examples.

Review - overview of a specific subtopic or task in the field of audio effects research.

Paradigms

White-box - modeling is based on complete knowledge or thorough understanding of the system (e.g., circuit schematic) and typically employs ordinary/partial differential equations to describe its behaviour and numerical methods to solve them in the continuous or discrete domain. Therefore, such methods are often associated with a time-consuming design process and with computationally demanding, non-transferable implementations.

Gray-box - modeling combines a partial theoretical structure - referred to as a block-oriented model - with data - typically input/output measurements - to complete the model. Although they reduce the prior knowledge necessary to model a device, gray-box approaches still require ad hoc measurements and optimization procedures, as well as knowledge of the underlying implementation.

Black-box - modeling requires minimal knowledge of the system and mostly relies on input-output measurements. A major advantage is that black-box models reduce the modeling process to collecting adequate data. However, these models often lack interpretability and might entail time-consuming optimizations.

Automatic Mixing

For research on automatic mixing check out Christian Steinmetz’s work: https://csteinmetz1.github.io/AutomaticMixingPapers/index.html

Methods

Differentiable DSP - a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks [Hayes, B., Shier, J., Fazekas, G., McPherson, A. and Saitis, C., 2024. A review of differentiable digital signal processing for music and speech synthesis. Frontiers in Signal Processing]. In other words, traditional digital signal processing blocks (e.g., biquad filter, non-linearity, low-frequency oscillator) are implemented in a framework that supports automatic differentiation and gradient descent (e.g., PyTorch) so that learning from data is possible.
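
As a rough illustration of the idea (not taken from the cited review), the sketch below fits the gain of a tanh waveshaper to a target signal by gradient descent. To stay dependency-free the gradient is derived by hand here; in a framework such as PyTorch it would be obtained by automatic differentiation. The function names, the target gain of 3.0, the step size and the iteration count are all assumptions chosen for the example.

```python
import math

def waveshaper(x, gain):
    """DSP block with one learnable parameter: y = tanh(gain * x)."""
    return [math.tanh(gain * s) for s in x]

def loss_and_grad(x, target, gain):
    """Mean squared error and its derivative with respect to the gain."""
    n = len(x)
    loss, grad = 0.0, 0.0
    for s, t in zip(x, target):
        y = math.tanh(gain * s)
        err = y - t
        loss += err * err / n
        # Chain rule: d/dgain tanh(gain * s) = (1 - y^2) * s
        grad += 2.0 * err * (1.0 - y * y) * s / n
    return loss, grad

# Input: five cycles of a sine. Target: the same block with a hidden gain.
x = [math.sin(2.0 * math.pi * 5.0 * i / 200.0) for i in range(200)]
target = waveshaper(x, 3.0)

gain = 1.0  # initial guess
for _ in range(2000):
    loss, grad = loss_and_grad(x, target, gain)
    gain -= 2.0 * grad  # plain gradient descent step (step size is an assumption)
```

After the loop, `gain` should have moved from its initial value towards the hidden value used to generate `target`.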

Dynamic Convolution - techniques where the impulse response or processing kernels of a system are varied as a function of the present and/or past input amplitudes to model non-linear or hysteretic behaviours of a system.
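
A minimal sketch of the idea, assuming the simplest case where the kernel depends only on the present input amplitude: each input sample is convolved with the impulse response assigned to its amplitude range. The `kernel_bank` structure and the example kernels are hypothetical and only serve to show the mechanism.

```python
def dynamic_convolution(x, kernel_bank):
    """Convolve each input sample with a kernel chosen by its amplitude.

    kernel_bank is a list of (max_level, kernel) pairs sorted by ascending
    max_level; all kernels are assumed to have the same length.
    """
    length = len(kernel_bank[0][1])
    y = [0.0] * (len(x) + length - 1)
    for n, sample in enumerate(x):
        # Fall back to the last kernel if the sample exceeds every level.
        kernel = kernel_bank[-1][1]
        for max_level, candidate in kernel_bank:
            if abs(sample) <= max_level:
                kernel = candidate
                break
        # Overlap-add the scaled kernel at the position of the sample.
        for k, h in enumerate(kernel):
            y[n + k] += sample * h
    return y

# Hypothetical bank: quiet samples see one response, loud samples another.
bank = [(0.5, [1.0, 0.5]), (1.0, [0.6, 0.2])]
quiet = dynamic_convolution([0.1, 0.0, 0.0], bank)
loud = dynamic_convolution([1.0, 0.0, 0.0], bank)
```

With a single kernel in the bank the function reduces to ordinary linear convolution, which is a convenient sanity check.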

Equations - techniques where modeling or emulation is based on solving or approximating the physical equations describing a system’s behaviour. Often based on iterative or numerical methods.
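
As a hedged example of this approach, the sketch below simulates a common textbook diode clipper (series resistor, capacitor to ground, two antiparallel diodes) described by dv/dt = (u - v)/(RC) - (2 Is / C) sinh(v / Vt). The equation is discretized with backward Euler and the resulting implicit equation is solved with Newton's method at each sample. The component values, the function name and the choice of backward Euler are assumptions for illustration only.

```python
import math

def diode_clipper(u, fs, R=2.2e3, C=10e-9, Is=2.52e-9, Vt=45.3e-3):
    """Backward Euler simulation of a diode clipper, one Newton solve per sample."""
    T = 1.0 / fs
    v = 0.0
    out = []
    for un in u:
        v_prev = v
        for _ in range(50):  # Newton iterations, started from the previous output
            # Residual of v - v_prev - T * f(v, un) = 0
            g = v - v_prev - T * ((un - v) / (R * C)
                                  - (2.0 * Is / C) * math.sinh(v / Vt))
            dg = 1.0 + T / (R * C) + T * (2.0 * Is / (C * Vt)) * math.cosh(v / Vt)
            step = g / dg
            v -= step
            if abs(step) < 1e-12:
                break
        out.append(v)
    return out

fs = 48000
sine = [2.0 * math.sin(2.0 * math.pi * 220.0 * n / fs) for n in range(960)]
clipped = diode_clipper(sine, fs)
```

With these values a 2 V sine is limited to a few hundred millivolts, which is the expected soft-clipping behaviour of the diode pair.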

Neural Network - techniques where neural networks are used to solve a task by learning from data.

State-space - The state-space approach represents an electronic circuit as a system of first-order differential equations, describing the circuit's dynamics in terms of state variables, inputs, and outputs. This method uses matrix algebra to model the relationships between circuit components, allowing for efficient digital simulation of complex analog systems. Methods developed for the simulation of state-space systems include the K-method, NK-method and DK-method.
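
To make the matrix formulation concrete, the sketch below simulates a linear series RLC low-pass with states x = [inductor current, capacitor voltage], so that dx/dt = A x + B u and the output is the capacitor voltage. The discretization uses the trapezoidal rule, x[n] = (I - hA)^-1 ((I + hA) x[n-1] + h B (u[n] + u[n-1])) with h = T/2. The circuit, component values and function name are assumptions, and this is a plain linear state-space simulation rather than any of the K-method variants.

```python
def simulate_rlc(u, fs, R=1e3, L=0.1, C=100e-9):
    """Trapezoidal-rule simulation of a series RLC low-pass in state-space form."""
    h = 0.5 / fs  # half the sampling period
    # Continuous-time model: L di/dt = u - R i - v,  C dv/dt = i
    A = [[-R / L, -1.0 / L], [1.0 / C, 0.0]]
    B = [1.0 / L, 0.0]
    # M = I - h A and P = I + h A
    M = [[1.0 - h * A[0][0], -h * A[0][1]], [-h * A[1][0], 1.0 - h * A[1][1]]]
    P = [[1.0 + h * A[0][0], h * A[0][1]], [h * A[1][0], 1.0 + h * A[1][1]]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    Minv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]
    x = [0.0, 0.0]
    u_prev = 0.0
    y = []
    for un in u:
        rhs = [P[0][0] * x[0] + P[0][1] * x[1] + h * B[0] * (un + u_prev),
               P[1][0] * x[0] + P[1][1] * x[1] + h * B[1] * (un + u_prev)]
        x = [Minv[0][0] * rhs[0] + Minv[0][1] * rhs[1],
             Minv[1][0] * rhs[0] + Minv[1][1] * rhs[1]]
        u_prev = un
        y.append(x[1])  # output equation: y = [0, 1] x
    return y

step_response = simulate_rlc([1.0] * 4800, fs=48000)
```

For a unit step the capacitor voltage should settle at 1.0, since the circuit has unity gain at DC.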

K-method - modified state-space representation of an electronic circuit where nonlinearities are assumed to be memoryless. See [Borin, G., De Poli, G. and Rocchesso, D., 2000. Elimination of delay-free loops in discrete-time models of nonlinear acoustic systems], [Yeh, D.T. and Smith, J.O., 2006. Discretization of the '59 Fender Bassman tone stack] and [Yeh, D.T.M., 2009. Digital implementation of musical distortion circuits by analysis and simulation - Section 4.3].

Nodal K-method or NK-method - a systematic procedure for generating the parameters of the K-method formulation from the netlist description of a circuit, using an algorithm similar to Modified Nodal Analysis (MNA). Introduced in [Yeh, D.T., Abel, J.S. and Smith, J.O., 2009. Automated physical modeling of nonlinear audio circuits for real-time audio effects—Part I: Theoretical development] and [Yeh, D.T., 2011. Automated physical modeling of nonlinear audio circuits for real-time audio effects—Part II: BJT and vacuum tube examples]. See also [Yeh, D.T.M., 2009. Digital implementation of musical distortion circuits by analysis and simulation - Section 4.4].

Discrete K-method or DK-method or Nodal DK-method - The DK-method involves transforming the continuous-time differential equations that describe the behavior of circuit components into discrete-time difference equations. This transformation allows for the simulation of the circuit in a step-by-step manner over discrete time intervals. Introduced in [Yeh, D.T., Abel, J.S. and Smith, J.O., 2009. Automated physical modeling of nonlinear audio circuits for real-time audio effects—Part I: Theoretical development] and [Yeh, D.T., 2011. Automated physical modeling of nonlinear audio circuits for real-time audio effects—Part II: BJT and vacuum tube examples]. See also [Yeh, D.T.M., 2009. Digital implementation of musical distortion circuits by analysis and simulation - Section 4.5].

Wave Digital Filters - a method for digitally modeling analog circuits based on the theory of traveling waves (scattering theory). It uses wave variables instead of standard circuit variables (voltage and current) to represent the behavior of circuit elements. WDFs are designed to preserve key properties of analog circuits, such as passivity and energy conservation, making them stable and robust for digital implementation. See [Fettweis, A., 1986. Wave digital filters: Theory and practice] or [https://ccrma.stanford.edu/~dtyeh/papers/wdftutorial.pdf] for the details.
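
As a small worked case, the sketch below builds a first-order RC low-pass from three wave digital elements: a resistive voltage source with port resistance R, a capacitor with port resistance T/(2C) (bilinear transform), and a two-port parallel junction that scatters waves between them. Voltage waves a = v + Ri and b = v - Ri are assumed; the circuit, component values and function name are illustrative choices, not taken from the cited references.

```python
def wdf_rc_lowpass(x, fs, R=1e3, C=100e-9):
    """RC low-pass built from wave digital elements (voltage-wave convention)."""
    T = 1.0 / fs
    G1 = 1.0 / R              # port conductance of the resistive source
    G2 = 1.0 / (T / (2 * C))  # port conductance of the capacitor
    state = 0.0               # wave stored in the capacitor (one-sample delay)
    out = []
    for e in x:
        a1 = e      # resistive source: the wave it emits equals the source voltage
        a2 = state  # capacitor: emits the wave it received one sample earlier
        # Parallel junction: both ports share the same voltage v
        v = (G1 * a1 + G2 * a2) / (G1 + G2)
        b2 = 2.0 * v - a2  # wave sent back into the capacitor
        state = b2
        out.append(v)      # capacitor voltage is the filter output
    return out

wdf_step = wdf_rc_lowpass([1.0] * 2000, fs=48000)
```

Under these assumptions the structure is equivalent to discretizing H(s) = 1/(1 + sRC) with the bilinear transform, so the step response should rise towards 1.0.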

Port-Hamiltonian - a method for digitally modeling analog circuits based on the principles of Hamiltonian mechanics, focusing on the conservation of energy within a system. See [Van Der Schaft, A., 2006. Port-Hamiltonian systems: an introductory survey.]

Volterra Series - mathematical tool used for modeling and simulating nonlinear systems with memory. It extends the convolution model of linear systems by representing a nonlinear system as an infinite series of integral operators, analogous to a Taylor series expansion but in the context of systems and signals. Whereas the output of a linear time-invariant system is a single convolution of the input with an impulse response, the output of a nonlinear system depends on both the current and past inputs in a nonlinear manner. A Volterra series expresses this output as a sum of multidimensional convolutions of the input signal with a series of kernels (functions), one for each order of nonlinearity.
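
In practice the series is truncated. The sketch below evaluates a discrete-time Volterra model cut off at second order, y[n] = sum_i h1[i] x[n-i] + sum_i sum_j h2[i][j] x[n-i] x[n-j], with hypothetical kernels `h1` and `h2` chosen only to make the arithmetic easy to follow.

```python
def volterra2(x, h1, h2):
    """Second-order truncated Volterra series with memory length len(h1)."""
    m = len(h1)
    y = []
    for n in range(len(x)):
        # Current and past input samples, with zeros before the signal starts
        taps = [x[n - i] if n - i >= 0 else 0.0 for i in range(m)]
        linear = sum(h1[i] * taps[i] for i in range(m))
        quadratic = sum(h2[i][j] * taps[i] * taps[j]
                        for i in range(m) for j in range(m))
        y.append(linear + quadratic)
    return y

h1 = [1.0, 0.5]                # first-order kernel (ordinary impulse response)
h2 = [[0.1, 0.0], [0.0, 0.0]]  # second-order kernel
y = volterra2([1.0, 2.0], h1, h2)
```

Setting `h2` to all zeros recovers a plain FIR filter, which shows how the first-order term coincides with the linear case.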

Waveshaping - application of a nonlinear function to an input audio signal. This function can be mathematically defined or represented as a waveshaping curve. The input signal, usually a simple waveform like a sine wave, is passed through this nonlinear function, which modifies the amplitude of the signal in a non-linear fashion, thereby altering its harmonic content. See [Le Brun, M., 1979. Digital waveshaping synthesis. Journal of the Audio Engineering Society] or [Roads, C., 1979. A tutorial on non-linear distortion or waveshaping synthesis. Computer Music Journal].
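
The effect on harmonic content can be checked numerically. The sketch below, with an assumed tanh shaping curve and drive value, passes a sine through the waveshaper and measures individual harmonics with a single-bin DFT. Because tanh is an odd function, only odd harmonics are expected to appear.

```python
import math

def waveshape(x, drive=5.0):
    """Apply a static nonlinear curve (here tanh) sample by sample."""
    return [math.tanh(drive * s) for s in x]

def harmonic_magnitude(y, k):
    """Magnitude of DFT bin k, scaled so a unit sine at that bin gives 1."""
    n = len(y)
    re = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(y))
    im = sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(y))
    return 2.0 * math.hypot(re, im) / n

n, cycles = 1024, 8  # exactly 8 periods in the analysis window
sine = [math.sin(2.0 * math.pi * cycles * i / n) for i in range(n)]
shaped = waveshape(sine)

fundamental = harmonic_magnitude(shaped, cycles)
second = harmonic_magnitude(shaped, 2 * cycles)
third = harmonic_magnitude(shaped, 3 * cycles)
```

Here `second` should be numerically negligible while `third` is clearly non-zero; an asymmetric curve would introduce even harmonics as well.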

Wiener-Hammerstein - a class of models used to emulate the nonlinear and dynamic behavior of audio devices and other systems. These models combine linear and nonlinear elements in a structured way, making them particularly well-suited for capturing the characteristics of audio processing equipment like amplifiers, distortion units, and other effects that exhibit both linear filtering and nonlinear distortion. A Wiener model consists of a linear block followed by a nonlinear block. A Hammerstein model consists of a nonlinear block followed by a linear block. A Wiener-Hammerstein model combines both structures, placing a linear block both before and after the nonlinear block.
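
The three structures can be sketched as one cascade. The example below assumes the simplest possible blocks: one-pole low-pass filters for the linear parts and a tanh curve for the static nonlinearity. The coefficients and function names are hypothetical.

```python
import math

def one_pole_lowpass(x, coeff):
    """Linear block: y[n] = (1 - coeff) * x[n] + coeff * y[n-1]."""
    y, state = [], 0.0
    for s in x:
        state = (1.0 - coeff) * s + coeff * state
        y.append(state)
    return y

def wiener_hammerstein(x, pre_coeff=0.5, drive=4.0, post_coeff=0.8):
    """Linear filter -> static nonlinearity -> linear filter."""
    pre = one_pole_lowpass(x, pre_coeff)
    shaped = [math.tanh(drive * s) for s in pre]
    return one_pole_lowpass(shaped, post_coeff)

dc_response = wiener_hammerstein([1.0] * 500)
```

With `post_coeff=0.0` the last block becomes an identity and the cascade reduces to a Wiener model; with `pre_coeff=0.0` it reduces to a Hammerstein model.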