Event Details

Computer Vision-Based Tracking and Feature Extraction for Lingual Ultrasound

Presenter: Khalid Al-Hammuri
Supervisor:

Date: Tue, April 23, 2019
Time: 11:30:00 - 12:30:00
Place: EOW 230

ABSTRACT

Lingual ultrasound is emerging as an important tool for providing visual feedback to second language learners. In this study, ultrasound videos were collected from five Arabic speakers as they pronounced fourteen Arabic sounds in three different vowel contexts. Each sound was repeated three times, giving 630 ultrasound videos in total (5 speakers × 14 sounds × 3 vowel contexts × 3 repetitions).

The algorithm consists of four steps. First, the ultrasound image is denoised using a combined curvelet transform and shock filter. Second, the tongue contour area is selected automatically. Third, the tongue contour is approximated and missing data are estimated. Fourth, the tongue contour is transformed from image space into a full concatenated signal, from which features are extracted.

The automatic tongue tracking was validated by measuring the mean sum of distances between the automatically and manually tracked tongue contours, which was 0.9558 mm. For feature extraction, the average mean squared error between the tongue signatures extracted from different repetitions of a sound was 0.000858, which indicates that the algorithm could extract a unique signature for each sound, across different vowel contexts, with a high degree of similarity between repetitions.

Unlike related work, the algorithm is efficient and robust: it extracts the tongue contour and the significant features of dynamic tongue movement from all video frames, rather than from a single significant static frame as in the conventional method. It needed no training data and imposed no limit on video size or number of frames. It did not fail during tongue extraction and required no manual reinitialization. Even when the ultrasound recordings were missing some tongue contour information, the approach could estimate the missing data with a high degree of accuracy.
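The two validation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the presenter's implementation. It assumes the mean sum of distances is the symmetric nearest-point definition commonly used in tongue contour tracking, and that tongue signatures are equal-length sequences of numbers. The function names and the toy contours are hypothetical.

```python
import math


def mean_sum_of_distances(contour_a, contour_b):
    """Symmetric mean nearest-point distance between two contours.

    Each contour is a list of (x, y) points in the same unit (e.g. mm).
    Assumed definition: for every point on one contour, take the distance
    to the closest point on the other contour, do this in both directions,
    and average over all points.
    """
    if not contour_a or not contour_b:
        raise ValueError("both contours must contain at least one point")

    def nearest(point, contour):
        # Distance from `point` to the closest point on `contour`.
        return min(math.hypot(point[0] - q[0], point[1] - q[1]) for q in contour)

    total = sum(nearest(p, contour_b) for p in contour_a)
    total += sum(nearest(q, contour_a) for q in contour_b)
    return total / (len(contour_a) + len(contour_b))


def mean_squared_error(signal_a, signal_b):
    """Mean squared error between two equal-length signals."""
    if not signal_a or len(signal_a) != len(signal_b):
        raise ValueError("signals must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(signal_a, signal_b)) / len(signal_a)


# Toy data (hypothetical): a manual contour and an automatic contour
# that is offset by exactly 1 unit in y at every point.
manual_contour = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
automatic_contour = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

msd = mean_sum_of_distances(automatic_contour, manual_contour)
mse = mean_squared_error([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
```

With these toy contours every nearest-point distance is 1, so `msd` is 1.0. A lower value means the automatic contour lies closer to the manual one, which is how the reported 0.9558 mm would be read under this assumed definition.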