Neuroscience Graduate Program
Ph.D. Thesis Defense
NATURAL SPEECH REPRESENTATIONS IN THE HUMAN BRAIN DURING A COCKTAIL PARTY
Advisor: Assoc. Prof. Dr. Tolga Çukur
Humans are remarkably adept at selectively listening to a desired speaker in a crowded environment while filtering out non-target speakers in the background. Attention is key to solving this difficult cocktail-party task, yet a detailed characterization of attentional effects on speech representations is lacking. It remains unclear at which levels of speech features attentional modulation occurs in each brain area during the cocktail-party task, and how strong that modulation is. It is also unclear whether unattended speech is represented in cortex during selective listening and, if so, at what feature levels its representations are maintained. To address these questions, we recorded whole-brain blood-oxygen-level-dependent (BOLD) responses while subjects either passively listened to single-speaker stories or selectively attended to a male or a female speaker in temporally overlaid stories, in separate experiments. Spectral, articulatory, and semantic models of the natural stories were constructed to enable a comprehensive assessment of the hierarchy of speech features. Intrinsic selectivity profiles were identified via voxelwise models fit to passive-listening responses. Attentional modulations were then quantified based on model predictions for attended and unattended stories in the cocktail-party task. We find that acoustic representations are confined to the early auditory cortex whereas linguistic representations are broadly distributed across cortex; that attention causes broad modulations at multiple levels of speech representation (articulatory and semantic), which grow stronger towards later stages of processing; and that unattended speech is represented up to the semantic level in parabelt auditory cortex. These results provide insights into speech perception and the attentional mechanisms that underlie the ability to selectively listen to a desired speaker in noisy multi-speaker environments.
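The analysis logic summarized in the abstract (fit voxelwise models on passive listening, then compare predictions for the attended and unattended stories) can be illustrated with a minimal sketch on synthetic data. Everything below is an assumption made for illustration and is not taken from the thesis: the ridge estimator, the regularization value, the simulated features and responses, and the normalized-difference attention index. Hemodynamic delays and cross-validation, which a real fMRI analysis would need, are omitted.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; returns one weight column per voxel."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (
        np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    )

rng = np.random.default_rng(0)
n_time, n_features, n_voxels = 300, 10, 5  # hypothetical sizes

# Passive listening: simulate responses and fit a model for every voxel.
X_passive = rng.standard_normal((n_time, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y_passive = X_passive @ W_true + 0.5 * rng.standard_normal((n_time, n_voxels))
W_hat = fit_ridge(X_passive, Y_passive, alpha=1.0)

# Cocktail party: two overlaid stories. The simulated responses follow the
# attended story strongly and the unattended story weakly (an assumption).
X_att = rng.standard_normal((n_time, n_features))
X_unatt = rng.standard_normal((n_time, n_features))
Y_cocktail = (
    X_att @ W_true
    + 0.3 * (X_unatt @ W_true)
    + 0.5 * rng.standard_normal((n_time, n_voxels))
)

# Prediction accuracy of the passive-listening model for each story.
r_att = columnwise_corr(X_att @ W_hat, Y_cocktail)
r_unatt = columnwise_corr(X_unatt @ W_hat, Y_cocktail)

# Hypothetical attention index in [-1, 1]; positive values mean the voxel's
# responses are better explained by the attended story.
attention_index = (r_att - r_unatt) / (np.abs(r_att) + np.abs(r_unatt))
print(attention_index)
```

In this sketch, a voxel with an index near zero would respond similarly to both stories, while a nonzero correlation for the unattended story alone would hint that unattended speech is still represented at that feature level.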
Keywords: functional magnetic resonance imaging (fMRI), cocktail-party, dorsal and ventral stream, encoding model, natural speech
DATE: 27.08.2021, Friday
Meeting ID: 876 074 2468