Sunday, January 24, 2010

User Guided Audio Selection from Complex Sound Mixtures

(comment left on Nate Brown's blog.)

Research Group:
Paris Smaragdis
Adobe Systems Inc.

pdf link:

Traditional audio editing and manipulation software typically represents an audio file as a waveform, and selecting and editing individual components of that waveform can be quite difficult even for trained experts. So Paris Smaragdis at Adobe Systems set out to create an audio selection algorithm that would provide an object-based way to identify, select, and edit an individual sound within an entire audio file.

Although the implementation of the algorithm is quite complex, the basic model Smaragdis used is Probabilistic Latent Component Analysis (PLCA), which separates an audio signal into spectral bases, their temporal weights, and basis priors. In other words, with PLCA we can identify separate elements of an audio file.
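
To make the three pieces concrete, here is a minimal sketch of a plain PLCA decomposition fitted with EM on a magnitude spectrogram. It is not the paper's implementation (which adds the user guidance); the function name `plca`, its parameters, and the variable names are my own assumptions. The model is P(f,t) = sum over z of P(z) P(f|z) P(t|z), where P(f|z) are the spectral bases, P(t|z) the temporal weights, and P(z) the basis priors.

```python
import numpy as np

def plca(V, n_components, n_iter=100, seed=0):
    """Fit P(f,t) ~= sum_z P(z) P(f|z) P(t|z) to a nonnegative spectrogram V.

    V has shape (n_freq, n_time). Returns (W, H, pz) where
    W[:, z] = P(f|z), H[z, :] = P(t|z), and pz[z] = P(z).
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    eps = 1e-12  # guards against division by zero

    # Random starting distributions, each normalized to sum to 1.
    W = rng.random((n_freq, n_components))
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_components, n_time))
    H /= H.sum(axis=1, keepdims=True)
    pz = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # Current model of the spectrogram and the data/model ratio.
        R = (W * pz) @ H
        Q = V / (R + eps)

        # E-step and M-step folded together: weight the data by the
        # posterior P(z|f,t), then sum over the other axis.
        W_new = W * pz * (Q @ H.T)
        H_new = (H * pz[:, None]) * (W.T @ Q)
        pz_new = W_new.sum(axis=0)

        # Renormalize so each factor is a proper distribution again.
        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=1, keepdims=True) + eps)
        pz = pz_new / (pz_new.sum() + eps)

    return W, H, pz
```

Calling `plca(V, 2)` on a spectrogram with two clearly different sounds should give one basis per sound, with `H` showing when each one is active.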

With this model in hand, Smaragdis just needed some way for the user to tell the software which audio elements to modify. This was done by letting the user mimic the sound they want to modify by whistling, humming, singing, or even playing a matching instrument. The software then runs PLCA on that input and matches it to the similar part of the audio file. The user is then free to modify that matched part.
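
Once some components have been matched to the user's guide sound, one common way to pull them out of the mixture is a soft mask over the spectrogram. The sketch below assumes that approach; I am not claiming it is exactly what the paper does. The function `select_components` and the `target` index list are hypothetical, and `W`, `H`, and `pz` are assumed to be PLCA factors (spectral bases, temporal weights, basis priors) computed beforehand.

```python
import numpy as np

def select_components(V, W, H, pz, target):
    """Split spectrogram V into (selected, remainder) by soft masking.

    W[:, z] = P(f|z), H[z, :] = P(t|z), pz[z] = P(z).
    target is a list of component indices assumed to match the
    user's guide sound. Hypothetical helper for illustration only.
    """
    eps = 1e-12  # guards against division by zero in silent bins

    # Model of the whole mixture, and of the selected components only.
    full = (W * pz) @ H
    part = (W[:, target] * pz[target]) @ H[target, :]

    # Fraction of each time-frequency bin explained by the selection.
    mask = part / (full + eps)

    return mask * V, (1.0 - mask) * V
```

The two returned spectrograms add back up to `V`, so the selected sound can be made louder, quieter, or removed entirely before the parts are recombined.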

Smaragdis ran a few experiments and found that his algorithm was able to correctly pick out voices in an audio file, given input from several different users of both genders, as well as other particular sounds in an audio file.

He also noted that while the algorithm is able to pick out unique sounds from an audio source, it sadly could not pick out "...targets that strongly correlated to non-targets," or in other words, "one violin from an orchestra playing in unison."

Discussion: This paper is particularly interesting for novice or casual audio editors who want to edit isolated sounds in an audio file. It is often difficult to manipulate a given piece enough to express what you have in mind artistically, but this algorithm is paving the way for more user-friendly audio interaction. (Which is cool)

It is a shame, though, that a conductor still can't edit those irritating out-of-tune instruments in a whole band or orchestra, but the algorithm is still a significant step closer to audio editing perfection.

I remember a few years back, after listening to my roommate's friend's work in an audio program called Fruity Loops, that I would really have loved to read in the song "Time's Scar" and edit out the tambourine to give the song a more somber tone, but I just couldn't do it with that kind of software. Given this particular work, that just might now be possible.

  1. I find it interesting that Adobe is working on this. They realize that even low-end consumers are learning to edit digital photos and now want to edit audio that wasn't recorded professionally. I think this method of selection is akin to the magic wand, and I hope they develop more ways to select pieces of audio.