Audio-Visual Speech Source Separation
In complex room settings, machine listening systems can degrade in performance because of reverberation, background noise, and unwanted interfering sounds. Likewise, machine vision systems can suffer from visual occlusions, poor lighting, and background clutter. Because the two modalities tend to fail under different conditions, combining audio and visual data can help overcome these limitations and improve machine perception in complex audio-visual environments.