Shumin Que
My academic background combines a linguistic perspective (MSc from Edinburgh) with rigorous technical training in Computer Science (MSc from Sheffield). This duality allows me to approach Speech Processing problems—such as Prosody Modeling and Emotion Recognition—with both structural intuition and computational precision.
What is your PhD about?
My PhD project is a collaboration with the Yorkshire Ambulance Service (YAS). The core objective is to develop advanced deep learning models to analyse emergency calls. Specifically, I will focus on detecting subtle acoustic cues—such as laboured breathing, vocal strain, or signs of extreme distress—that human call handlers might miss or take too long to identify.
By automating the detection of these non-linguistic features, we aim to predict life-threatening Category 1 cases (such as cardiac arrests) faster and more accurately, ultimately integrating these AI tools into call centres to save lives.
Why is it important to do this research?
This research is vital because every second counts in life-threatening emergencies like cardiac arrests. Currently, call handlers face high stress and huge call volumes, which can make it hard to spot subtle signs of distress in the first few seconds. By developing AI that listens for non-linguistic cues—like laboured breathing or vocal strain—we can identify ‘Category 1’ patients faster and more accurately. Ultimately, this project isn’t just about improving algorithms; it’s about saving lives by getting ambulances to the right people sooner.
What drew you to studying this PhD?
What drew me to this PhD is the unique combination of social impact and exceptional resources. First, it’s incredibly meaningful. Knowing that my research on audio cues like laboured breathing could directly help save lives in ‘Category 1’ emergencies gives me a strong sense of purpose.
Second, this is a rare opportunity to integrate resources from both academia and the real world. The collaboration with Yorkshire Ambulance Service provides access to real-life emergency data that is usually inaccessible. Being able to work with Dr. Ning Ma and Professor Jon Barker while tackling real NHS challenges is exactly the kind of environment where I want to apply my skills.
What does a Sustainable Sound Future mean to you?
To me, a Sustainable Sound Future is defined by interdisciplinary connection. It signifies a unique ecosystem that integrates resources from multiple institutions, allowing for the cross-fertilisation of ideas. It is about having direct access to brilliant minds across different acoustic fields, ensuring that our solutions are robust and well-rounded. Most importantly, it means growing alongside a strong, supportive cohort. This collaborative network fosters the resilience and diverse perspectives necessary to drive long-term innovation in sound technology.
What were you doing before joining the CDT?
I was working as a Research Assistant Intern at the LivePerson Centre for Speech and Language. During this time, I conducted research on speech emotion recognition and conversation analysis, working under the supervision of Professor Thomas Hain and Dr. Mingjie Chen.
What do you do on a typical PhD day so far?
So far, a typical day involves rigorous literature review, running preliminary experiments, and synthesising insights from my supervisors and peer collaborators.
Tell us a fun acoustic fact!
A fun acoustic fact is the McGurk Effect. It demonstrates that what we see can override what we hear. If you play audio of someone saying ‘Ba-Ba’ but show a video of lips mouthing ‘Ga-Ga’, your brain will trick you into hearing ‘Da-Da’. I love this fact because it perfectly validates my recent research on VisualSpeech. It proves that visual cues are not just supplementary—they fundamentally alter how we perceive prosody and speech, which is why multi-modal modelling is so powerful.