Research Results

Selected Publications

DEMENTIA CLASSIFICATION | ICASSP 2025

A Multi-Stage Feature Pipeline on Timestamped Speech Transcriptions for Dementia Assessment

B. Thallinger, L. Wagner, T. Bloder, M. Zusag

This paper details our approach to the “Prediction and Recognition of Cognitive Decline through Spontaneous Speech (PROCESS)” Grand Challenge at ICASSP 2025, which focuses on early dementia detection using speech alone. The challenge comprises two tasks: (1) classifying individuals as healthy controls (HC), with mild cognitive impairment (MCI), or with dementia, and (2) predicting Mini-Mental State Exam (MMSE) scores. We focused on the latter, developing a robust pipeline that leverages a diverse set of linguistic and temporal features extracted from transcribed speech. Our ensemble model scored at the top for the MMSE prediction task with a root mean squared error (RMSE) of 2.46 on the held-out test set while maintaining diagnostic transparency and real-world clinical applicability.
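For readers unfamiliar with the metric, the following is a minimal sketch of how an RMSE is computed. The scores below are invented for illustration and are not data from the challenge; the function name `rmse` is our own.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical MMSE scores (0 to 30 scale), made up for this example.
true_scores = [29, 27, 22, 30]
predicted_scores = [27, 28, 25, 28]
error = rmse(true_scores, predicted_scores)
print(f"RMSE: {error:.2f}")
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones, and the result is expressed in MMSE points.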

SPEECH RECOGNITION | INTERSPEECH 2024

CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions

L. Wagner, B. Thallinger, M. Zusag

We demonstrate that carefully adjusting the tokenizer of the Whisper speech recognition model significantly improves the precision of word-level timestamps when applying dynamic time warping to the decoder’s cross-attention scores. We fine-tune the model to produce more verbatim speech transcriptions and employ several techniques to increase robustness against multiple speakers and background noise. These adjustments achieve state-of-the-art performance on benchmarks for verbatim speech transcription, word segmentation, and the timed detection of filler events, and can further mitigate transcription hallucinations. The code is available as open source.
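The general idea of aligning tokens to audio frames with dynamic time warping can be sketched as follows. This is a generic textbook DTW on a toy matrix, not the CrisperWhisper implementation; the attention values, the cost definition `1 - attention`, and the function names are assumptions made for illustration.

```python
def dtw_path(cost):
    """Return the minimum-cost monotonic path through `cost`.

    cost[i][j] is the cost of aligning token i with audio frame j.
    The path runs from (0, 0) to (n-1, m-1); each step advances the
    token index, the frame index, or both by one.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] holds the cheapest accumulated cost to reach cell (i-1, j-1).
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_previous

    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return path


def token_start_frames(path):
    """Map each token index to the first frame it is aligned with."""
    starts = {}
    for token, frame in path:
        starts.setdefault(token, frame)
    return starts


# Toy cross-attention scores: 3 tokens (rows) x 6 audio frames (columns).
attention = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.1],
    [0.1, 0.1, 0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.1, 0.1, 0.9, 0.9],
]
# Assumed cost: high attention means low alignment cost.
cost = [[1.0 - a for a in row] for row in attention]

path = dtw_path(cost)
starts = token_start_frames(path)
print(path)
print(starts)
```

Multiplying each start frame by the frame duration of the model would turn these indices into timestamps; how sharply the attention peaks around the correct frames determines how precise those timestamps are.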

APHASIA | ÖGN 2023

Human level fully automatic aphasia detection leveraging automatic speech recognition for language-agnostic feature extraction.

M. Zusag, L. Wagner, P. Schöllauf, T. Bloder, M. Cekolj, M. Müller-Mezin, A. Calleja-Dincer, C. Stepan

Aphasia is a common and debilitating speech and language disorder affecting millions of people worldwide. The manifestation of aphasia can vary significantly from individual to individual, making it a challenge for healthcare professionals, particularly speech pathologists, to accurately diagnose and classify the disorder. While various efforts have been made to automate the detection and evaluation of aphasic speech, the task remains challenging. However, the use of advanced machine learning models for the detection of aphasia and other speech disorders has the potential to significantly reduce the demand on clinical resources and thus support clinical teams effectively.

APHASIA | INTERSPEECH 2023

Careful Whisper - leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification

L. Wagner, M. Zusag, T. Bloder

This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments. By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts. We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech. Basic distance measures from these prototypes serve as input features for standard machine learning classifiers, yielding human-level accuracy for the distinction between recordings of people with aphasia and a healthy control group. Furthermore, the most frequently occurring aphasia types can be distinguished with 90% accuracy. The pipeline is directly applicable to other diseases and languages, showing promise for robustly extracting diagnostic speech biomarkers.
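The prototype idea can be illustrated with a minimal sketch: average the feature vectors of healthy speakers into a prototype, then use the distance of a new recording from that prototype as a classifier input. The two features, their values, and the function names below are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean


def build_prototype(feature_vectors):
    """Average a list of feature vectors component-wise."""
    return [mean(column) for column in zip(*feature_vectors)]


def distance_to_prototype(features, prototype):
    """Euclidean distance between one recording and the prototype."""
    return math.dist(features, prototype)


# Hypothetical features per recording: [words per second, fillers per minute].
healthy_recordings = [
    [2.0, 1.0],
    [4.0, 3.0],
]
prototype = build_prototype(healthy_recordings)

# A new recording, also invented; its distance would be one input
# feature for a standard classifier.
new_recording = [6.0, 6.0]
distance = distance_to_prototype(new_recording, prototype)
print(prototype, distance)
```

A single distance per prototype keeps the classifier input small and easy to interpret: a large value simply means the recording deviates strongly from typical healthy speech along the chosen features.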

SPEECH BIOMARKER | INTERSPEECH 2023

Providing interpretable insights for neurological speech and cognitive disorders from interactive serious games

M. Zusag, L. Wagner

We propose an automated pipeline for robustly identifying neurological disorders from interactive therapeutic exercises, which are gathered via the mobile therapy app myReha. The app captures speech and cognitive parameters from over 30,000 tasks in various scenarios. Users get immediate and highly accurate feedback on pronunciation and coherence in language tasks, while voice recordings are fed to a feature extraction pipeline in the backend. These features are then used to construct speech characteristics, which are highly indicative of different neurological disorders, such as acquired aphasia after stroke. The data is presented visually in the web application nyra.insights, which allows medical professionals to quickly derive recommendations for treatment and closely monitor outcomes. During the Show and Tell session, users can experiment with the interactive myReha app and experience the real-time speech analysis capabilities via the nyra.insights web platform.

EFFECTIVENESS | ECNR 2023

Real world data analysis: Tablet-based cognitive neurorehabilitation in the outpatient care of patients after brain injury

T. Bloder, P. Schöllauf, M. Zusag, L. Wagner, M. Müller-Mezin, C. Stepan

Cognitive impairments such as aphasia or disorders of attention, memory, or perception are very common symptoms of neurological diseases after brain damage. Since patients need individual therapy plans and a high intensity of therapy in neurorehabilitation, the use of digital rehabilitation tools is seen as having great potential. This study evaluates the effectiveness of digital speech and cognitive therapy within a real-world mobile health data set using the digital neurorehabilitation platform myReha. With this tablet app, patients receive exercise plans customized by artificial intelligence from a large catalog of over 35 language and cognition exercises with over 30,000 examples. These exercise plans can be used independently by patients both in the clinic and on an outpatient basis. The study includes real-world data from 183 patients with cognitive deficits following a brain injury sustained more than four weeks earlier, who trained with the myReha app for 60 days. The results demonstrated a significant improvement in all evaluated speech and cognitive abilities over the intervention period.

EFFECTIVENESS | ÖGN 2023

Tablet-based cognitive neurorehabilitation in the outpatient care of patients after brain injury

P. Schöllauf, M. Zusag, T. Bloder, L. Wagner, M. Cekolj, M. Müller-Mezin, A. Calleja-Dincer, C. Stepan

Individual therapy plans and high therapy intensity have been proven to be effective in the rehabilitation of cognitive deficits such as language impairments. Computer-based therapy programs can achieve the necessary intensity and thus increase the effectiveness of therapy. Recent studies show the great benefit of digital and telerehabilitative interventions in patients with neurological disorders. Use of myReha in an outpatient environment demonstrated efficacy for patients with brain injuries, with significant improvements across all evaluated therapeutic domains.

FEASIBILITY | ECNR 2023

Feasibility Study of Digital Neurorehabilitation for Language and Cognitive Impairment in Patients with Brain Damage

P. Schöllauf, M. Müller-Mezin, S. Poell, M. Muellner, S. Gagl, T. Bloder, M. Zusag, L. Wagner, C. Stepan

Cognitive and language impairments are common sequelae in patients who have experienced brain damage. Traditional rehabilitation methods often require intensive, long-term care, which may not be feasible for all patients due to geographical, financial, or time constraints. Digital neurorehabilitation with myReha, facilitated by technology-based interventions, has the potential to improve accessibility and personalization of therapeutic strategies. However, its effectiveness in real-world settings remains under-explored.