In music, “portamento” is a term used for hundreds of years, referring to the effect of sliding a note at one pitch into a note of a lower or higher pitch. But only instruments whose pitch can vary continuously, such as the human voice, stringed instruments and trombones, can produce this effect.
Today, a student at MIT has invented a new algorithm that produces a portamento effect between two audio signals in real time. During experiments, the algorithm seamlessly merged various audio clips, such as a piano note sliding into a human voice and one song melting into another. His paper describing the algorithm won the “Best Student Paper” award at the recent International Conference on Digital Audio Effects.
The algorithm relies on “optimal transport,” a geometry-based framework that determines the most efficient ways to move objects – or data points – between multiple origin and destination configurations. Formulated in the 1700s, the framework has been applied to supply chains, fluid dynamics, image alignment, 3D modeling, computer graphics, and more.
In work from a class project, Trevor Henderson, now a computer science graduate student, applied optimal transport to interpolating audio signals – or mixing one signal into another. The algorithm first divides the audio signals into short segments. Then it finds the optimal way to move the pitches of each segment to the pitches of the other signal, to produce the smooth sliding of the portamento effect. The algorithm also includes specialized techniques to maintain the fidelity of the audio signal during its transition.
“Optimal transport is used here to determine how to map the pitches of one sound to the pitches of the other,” says Henderson, a classically trained organist who plays electronic music and has been a DJ on WMBR 88.1, the MIT radio station. “If it’s transforming one chord into a chord with a different harmony, or with more notes, for example, the notes will split from the first chord and find a position to slide seamlessly to in the other chord.”
According to Henderson, this is one of the first techniques to apply optimal transport to the transformation of audio signals. He has already used the algorithm to build equipment that provides a seamless transition between songs on his radio show. DJs could also use the equipment to switch between tracks during live performances. Other musicians could use it to mix instruments and vocals on stage or in the studio.
The co-author of Henderson’s paper is Justin Solomon, the X-Consortium Career Development Assistant Professor in the Department of Electrical Engineering and Computer Science. Solomon, who also plays cello and piano, heads the Geometric Data Processing Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL) and is a member of the Center for Computational Engineering.
Henderson took Solomon’s course, 6.838 (Shape Analysis), which asks students to apply geometric tools such as optimal transport to real-world applications. Student projects generally focus on 3D shapes from virtual reality or computer graphics, so Henderson’s project surprised Solomon. “Trevor saw an abstract connection between geometry and shifting frequencies in audio signals to create a portamento effect,” says Solomon. “He walked in and out of my office all semester with DJ equipment. It wasn’t what I expected, but it was quite entertaining.”
For Henderson, it wasn’t too much of a stretch. “When I see a new idea, I ask, ‘Is this applicable to music?’” he says. “So when we talked about optimal transport, I wondered what would happen if I hooked it up to audio spectra.”
A good way to think about optimal transport, says Henderson, is as finding “a lazy way to build a sandcastle.” In this analogy, the framework is used to calculate how to move each grain of sand from its position in a shapeless heap to a corresponding position in a sandcastle, using as little labor as possible. In computer graphics, for example, optimal transport can be used to transform or morph shapes by finding the optimal movement of each point from one shape to another.
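The sandcastle picture can be made concrete with a small Python sketch of optimal transport along a single line. It assumes sand sits at a handful of positions, that the two shapes hold the same total amount of sand, and that the labor of a move is the amount moved times the distance. Under those assumptions, matching sand in left-to-right order gives a least-labor plan. The names `transport_plan_1d` and `total_work` and the sample positions are illustrative only and are not taken from Henderson’s paper.

```python
def transport_plan_1d(source, target):
    """Return a least-work list of (from_pos, to_pos, mass) moves.

    source and target are lists of (position, mass) pairs holding the
    same total mass. On a line, when cost grows convexly with distance,
    matching mass in sorted order is an optimal plan.
    """
    if abs(sum(m for _, m in source) - sum(m for _, m in target)) > 1e-9:
        raise ValueError("source and target must hold the same total mass")
    src, dst = sorted(source), sorted(target)
    plan = []
    i = j = 0
    src_left, dst_left = src[0][1], dst[0][1]
    while i < len(src) and j < len(dst):
        moved = min(src_left, dst_left)  # ship as much as both sides allow
        if moved > 0:
            plan.append((src[i][0], dst[j][0], moved))
        src_left -= moved
        dst_left -= moved
        if src_left <= 1e-12:            # this pile is used up, go to the next
            i += 1
            if i < len(src):
                src_left = src[i][1]
        if dst_left <= 1e-12:            # this slot is full, go to the next
            j += 1
            if j < len(dst):
                dst_left = dst[j][1]
    return plan


def total_work(plan):
    """Labor of a plan: mass moved times distance travelled, summed."""
    return sum(abs(to_pos - from_pos) * mass for from_pos, to_pos, mass in plan)


# A single heap at position 0 is split into two towers at positions 1 and 3.
heap = [(0.0, 1.0)]
castle = [(1.0, 0.5), (3.0, 0.5)]
plan = transport_plan_1d(heap, castle)
```

Read the positions as frequencies and the masses as loudness, and the same plan describes one note splitting into two, much like the chord example Henderson gives.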
Applying this theory to audio clips involves a few more ideas from signal processing. Musical instruments produce sound through the vibrations of components that differ from instrument to instrument. Violins use strings, brass instruments use air inside hollow bodies, and humans use vocal cords. These vibrations can be captured as audio signals, in which each pitch appears as a frequency with its own amplitude (peak height).
Conventionally, the transition between two audio signals is done with a fade, where one signal is reduced in volume while the other rises. Henderson’s algorithm, on the other hand, smoothly slides frequency segments from one clip to another, with no fading of the volume.
The algorithm does this by splitting two audio clips into windows of approximately 50 milliseconds. Then, it performs a Fourier transform, which transforms each window into its frequency components. Frequency components in a window are grouped into individual synthesized “notes”. Optimal transport then maps how notes in one signal window will move to notes in the other.
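The first two steps, windowing and the Fourier transform, can be sketched in plain Python. This is a simplified illustration, not Henderson’s implementation: the 8,000 Hz sample rate, the non-overlapping windows, the Hann taper and the function names are all assumptions made for the example, and a slow textbook transform is used so that nothing beyond the standard library is needed.

```python
import cmath
import math

SAMPLE_RATE = 8000      # assumed sample rate, in Hz
WINDOW_SECONDS = 0.05   # roughly the 50 ms windows described above


def split_into_windows(samples, sample_rate=SAMPLE_RATE, seconds=WINDOW_SECONDS):
    """Cut a signal into back-to-back windows (no overlap, for simplicity)."""
    size = round(sample_rate * seconds)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def magnitude_spectrum(window):
    """Discrete Fourier transform of one window, returned as magnitudes.

    A Hann taper is applied first so the window edges do not smear energy
    across the spectrum. Only bins up to half the sample rate are kept.
    """
    n = len(window)
    tapered = [x * 0.5 * (1 - math.cos(2 * math.pi * k / n))
               for k, x in enumerate(window)]
    spectrum = []
    for b in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * b * k / n)
                    for k, x in enumerate(tapered))
        spectrum.append(abs(total))
    return spectrum


def peak_frequency(window, sample_rate=SAMPLE_RATE):
    """Frequency, in Hz, of the strongest component in one window."""
    mags = magnitude_spectrum(window)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / len(window)


# A 0.2 second test tone at 440 Hz (the A above middle C).
tone = [math.sin(2 * math.pi * 440 * k / SAMPLE_RATE) for k in range(1600)]
windows = split_into_windows(tone)
strongest = peak_frequency(windows[0])
```

With these settings each window holds 400 samples, so the frequency bins are 20 Hz apart. Grouping neighboring bins into single “notes” and matching them between the two clips would follow from here.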
Then an “interpolation parameter” takes over. It is essentially a value that determines where each note will be on the way from its starting pitch in one signal to its ending pitch in the other. Manually changing the value of the parameter will sweep the pitches between the two positions, producing the portamento effect. This single parameter can also be programmed and controlled by, for example, a crossfader, a slider component on a DJ’s mixer that smoothly fades between songs. As the crossfader slides, the interpolation parameter changes to produce the effect.
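The role of the interpolation parameter can be shown in a few lines of Python. The sketch assumes the notes have already been paired up, that each pair is given as a start pitch, an end pitch and a loudness, and that pitch moves in a straight line in hertz as the parameter goes from 0 to 1. The function names and the 0 to 127 crossfader range are hypothetical choices for the example, not details from the paper.

```python
def interpolate_notes(matches, t):
    """Place each matched note part-way between its two pitches.

    matches is a list of (start_hz, end_hz, loudness) triples and t is the
    interpolation parameter: 0 gives the first signal's pitches, 1 gives
    the second signal's, and values in between slide from one to the other.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("interpolation parameter must lie between 0 and 1")
    return [((1 - t) * start + t * end, loudness)
            for start, end, loudness in matches]


def crossfader_to_parameter(position, max_position=127):
    """Map a hypothetical crossfader reading (0..max_position) onto 0..1."""
    return min(max(position / max_position, 0.0), 1.0)


# One note sliding from A3 (220 Hz) up to E4 (330 Hz).
matches = [(220.0, 330.0, 1.0)]
halfway = interpolate_notes(matches, 0.5)
at_end = interpolate_notes(matches, crossfader_to_parameter(127))
```

In a conventional fade, the same parameter would scale the two volumes and leave every pitch where it is. Here the loudness is untouched and the pitch itself moves.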
Behind the scenes, two innovations ensure a distortion-free signal. First, Henderson used a new application of a signal processing technique called “frequency reassignment,” which groups the frequency bins together to form single notes that can easily be moved from one signal to the other. Second, he invented a way to synthesize new phases for each audio signal while stitching the 50-millisecond windows together, so that neighboring windows do not interfere with each other.
Next, Henderson wants to experiment with feeding the output of the effect back into its input. This, he thinks, could automatically create another classical music effect, “legato,” which is a smooth transition between separate notes. Unlike a portamento, which plays all the notes between a start and end note, a legato seamlessly transitions between two separate notes without capturing any notes in between.