
Shumin Que

My academic background combines a linguistic perspective (MSc from Edinburgh) with rigorous technical training in Computer Science (MSc from Sheffield). This duality allows me to approach Speech Processing problems—such as Prosody Modeling and Emotion Recognition—with both structural intuition and computational precision. When I am not fine-tuning models or analyzing WER scores, you can find me hiking in the mountains or experimenting with new recipes in the kitchen. I bring the same level of curiosity and endurance from the trails to my research lab.

Project Title: Sound Analysis for Predicting Category 1 Ambulance Calls

Project Partner: Yorkshire Ambulance Service (YAS)

Supervisors: Ning Ma & Jon Barker

What is your PhD about?
My PhD project is a collaboration with the Yorkshire Ambulance Service (YAS). I’m working on an AI project to help save lives in emergency situations. When people call 999, every second counts. My research uses Artificial Intelligence to listen to these calls and instantly recognise sounds of critical danger, such as someone struggling to breathe, which can indicate a cardiac arrest. The goal is to build a system for the Yorkshire Ambulance Service that acts as a ‘second pair of ears,’ helping operators prioritise the most critical patients so ambulances can be dispatched faster.

Why is it important to do this research?
This research is vital because every second counts in life-threatening emergencies like cardiac arrests. Currently, call handlers face high stress and huge call volumes, which can make it hard to spot subtle signs of distress in the first few seconds. By developing AI that listens for non-linguistic cues—like laboured breathing or vocal strain—we can identify ‘Category 1’ patients faster and more accurately. Ultimately, this project isn’t just about improving algorithms; it’s about saving lives by getting ambulances to the right people sooner.

What drew you to studying this PhD? 
What drew me to this PhD is the unique combination of social impact and exceptional resources.

First, it’s incredibly meaningful. Knowing that my research on audio cues like laboured breathing could directly help save lives in ‘Category 1’ emergencies gives me a strong sense of purpose.

Second, this is a rare opportunity to integrate resources from both academia and the real world. The collaboration with Yorkshire Ambulance Service provides access to real-life emergency data that is usually inaccessible. Being able to work with Dr. Ning Ma and Professor Jon Barker while tackling real NHS challenges is exactly the kind of environment where I want to apply my skills.

What does a Sustainable Sound Future mean to you?
To me, a Sustainable Sound Future is defined by interdisciplinary connection. It signifies a unique ecosystem that integrates resources from multiple institutions, allowing for the cross-fertilization of ideas. It is about having direct access to brilliant minds across different acoustic fields, ensuring that our solutions are robust and well-rounded. Most importantly, it means growing alongside a strong, supportive cohort. This collaborative network fosters the resilience and diverse perspectives necessary to drive long-term innovation in sound technology.

What were you doing before joining the CDT?

I was working as a Research Assistant Intern at the LivePerson Centre for Speech and Language. During this time, I conducted research on speech emotion recognition and conversation analysis, working under the supervision of Professor Thomas Hain and Dr. Mingjie Chen.

What do you do on a typical PhD day so far?
Reading, reading, reading. Engaging in rigorous literature review, conducting preliminary experiments, and synthesizing insights from supervisors and peer collaborators.

Tell us a fun acoustic fact!
A fun acoustic fact is the McGurk Effect. It demonstrates that what we see can override what we hear. If you play audio of the sound ‘Ba-Ba’ but show a video of lips moving to ‘Ga-Ga’, your brain will trick you into hearing ‘Da-Da’. I love this fact because it resonates with my recent research on VisualSpeech. It shows that visual cues are not just supplementary; they fundamentally alter how we perceive prosody and speech, which is why multi-modal modelling is so powerful.