Capturing Audio Information from Video
Sound waves can be described as vibrations in a medium, whether that is air, water, or another material. Though invisible to the naked eye, those vibrations can be transmitted to nearby objects like bags, aluminum foil, and the surface of water. Researchers at MIT, Microsoft, and Adobe have recently developed an algorithm that can identify these vibrations in a video and recreate the sounds that caused them.
For the most accurate reconstruction, the video must be captured at a frame rate higher than the frequencies in the audio signal (by the Nyquist criterion, at least twice the highest frequency to be recovered), so the researchers used high-speed cameras filming at 2,000-6,000 frames per second. They were also able to use footage from more typical cameras recording at just 60 FPS by exploiting the rolling shutter of some sensors, which record one row of pixels at a time rather than a full frame at once; a quick movement can therefore be detected as an object shifts between the capture of one row and the next. To cope with vibrations smaller than a single pixel, the algorithm watches for color shifts that indicate a moving border in the image. A pixel straddling a border is a blend of the colors on either side, so when the border moves, the pixel's color shifts toward one side or the other.
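The sub-pixel idea can be illustrated with a minimal sketch. This is not the researchers' actual algorithm, just a toy model of the blending principle: if a border pixel's value is a linear mix of the two known colors on either side, the blend fraction reveals where the border sits within the pixel, and tiny frame-to-frame changes in that fraction trace the vibration. The colors and frame values below are invented for illustration.

```python
def border_position(observed, left_color, right_color):
    """Return the fraction (0..1) of the pixel covered by the left side,
    assuming the pixel's value is a linear blend of the two known colors."""
    return (observed - right_color) / (left_color - right_color)

# A dark object (brightness 0.2) against a bright background (0.9).
# Observed brightness of one border pixel across four frames (made-up data):
frames = [0.55, 0.57, 0.53, 0.55]

positions = [border_position(v, left_color=0.2, right_color=0.9) for v in frames]

# Changes in the blend fraction are sub-pixel displacements of the border:
displacement = [p - positions[0] for p in positions]
```

A real system would track many such edge pixels and aggregate their motions, but even this toy version shows how a brightness change of a few percent maps to a border movement of a few hundredths of a pixel.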
As it stands, this technology can capture intelligible audio from video of some objects, or at least reveal how many people are speaking and who they are. There are obvious applications in law enforcement and forensics, but like any new technology, it could find a variety of uses not yet imagined.