Overview and Objectives
I’m an intern in the Computational Neurobiology Lab at the Salk Institute for Biological Studies.
The research goal of our project is to develop an objective, vision-based assessment of adductor spasmodic dysphonia (ADSD) — the most common subtype of focal laryngeal dystonia, which affects the larynx and therefore the voice — that correlates with subjective voice analyses of ADSD patients.
Specifically, the objectives are:
1) to develop and test a computer vision/machine learning system that predicts perceptual voice quality in ADSD from laryngoscope videos, and
2) to identify dynamics of glottal features associated with voice quality.
We have recently met our objectives, and the drafted paper is currently under review by the co-authors. I will be the first author on the paper. Co-authors will include Gerald Berke, Avraham Mendelsohn, Laura Froeschke, Simon Fei, and Lauren Sy, among others. After the review process, we plan to submit the paper to The Journal of the Acoustical Society of America.
Brief - Methods, Development, Results, and Impact
I developed and tested the computer vision pipeline (see figure to the right), which aims to meet objective #1.
Image segmentation is done with pixel thresholding: a single luminance threshold (YT) binarizes the grayscale luminance channel of the image. This step is automated by a single-layer neural network (YT NN) that predicts each image's optimal YT for segmenting the glottis.
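The thresholding step can be sketched as follows. The original pipeline was built in MATLAB; this is an illustrative Python sketch, and the function name, list-of-lists image representation, and threshold direction (foreground = darker than YT, since the glottal opening typically appears dark in laryngoscope frames) are assumptions for illustration, not the paper's exact implementation.

```python
def binarize(luminance, yt):
    """Binarize a grayscale luminance image (2-D list of 0-255 values).

    Pixels darker than the threshold yt become foreground (True); the
    direction of the comparison is an illustrative assumption.
    """
    return [[pixel < yt for pixel in row] for row in luminance]

# Toy 2x2 frame: two bright pixels, two dark pixels.
bw = binarize([[200, 40], [30, 210]], yt=100)
```

In the actual pipeline, the YT NN would supply the `yt` value per image rather than it being fixed by hand.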
Then, to distinguish the glottis from the other objects in the binary image (BW), each object receives a score (IOU hat in the figure). This step is automated by another single-layer neural network (OSM NN), which scores each object on the likelihood that it is the desired glottis object.
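The object-scoring step amounts to scoring each connected component and keeping the arg-max. A minimal Python sketch, where the lambda scorer stands in for the OSM NN and the feature names (`area`, `dist_to_center`) are hypothetical:

```python
def select_glottis(objects, score_fn):
    """Pick the connected component most likely to be the glottis.

    `objects` is a list of per-object feature dicts; `score_fn` stands in
    for the OSM NN, mapping an object's features to a predicted IOU-like
    score. The object with the maximum score is returned.
    """
    return max(objects, key=score_fn)

# Toy candidates; the scorer favours large, centred objects (assumed heuristic).
candidates = [
    {"label": 1, "area": 120, "dist_to_center": 5.0},
    {"label": 2, "area": 300, "dist_to_center": 40.0},
    {"label": 3, "area": 900, "dist_to_center": 3.0},
]
best = select_glottis(candidates, lambda o: o["area"] - 10 * o["dist_to_center"])
```

In the real pipeline the score would come from the trained OSM NN rather than a hand-written heuristic.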
Finally, the object with the maximum score is taken to be the glottis, and glottal metrics such as glottal area and glottal major-axis length are calculated from it. Summary statistics are then calculated on these glottal metrics and used as variables in our generalized regression models, which seek to predict voice quality.
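As a sketch of the metric step: glottal area reduces to a foreground pixel count per frame, and summary statistics over the frames become regression inputs. This Python illustration uses made-up masks and only one metric; the paper computes more.

```python
from statistics import mean, stdev

def glottal_area(mask):
    """Glottal area in pixels: count of foreground (True) pixels in a binary mask."""
    return sum(sum(row) for row in mask)

# Hypothetical per-frame glottis masks from one video.
frames = [
    [[True, True], [True, False]],
    [[True, False], [False, False]],
    [[True, True], [True, True]],
]
areas = [glottal_area(f) for f in frames]
# Summary statistics of the metric, used as regression variables.
features = {"area_mean": mean(areas), "area_std": stdev(areas)}
```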
The second figure to the right depicts the regression models. Each scatterplot represents a different model, and each data point represents a patient. For each scatterplot, the x-axis is the patient's actual voice quality rating, and the y-axis is the voice quality rating predicted from our computed glottal metrics.
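The prediction step can be illustrated with a single-predictor least-squares fit. This is a stand-in for the paper's generalized regression models (which use multiple glottal-metric variables); the data here are invented.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit: y ~ slope * x + intercept (one predictor)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

# Hypothetical glottal-metric values vs. perceptual voice quality ratings.
metric = [1.0, 2.0, 3.0, 4.0]
rating = [2.0, 4.0, 6.0, 8.0]
slope, intercept = fit_line(metric, rating)
# Predicted ratings (y-axis of the scatterplots) from the fitted model.
predicted = [slope * m + intercept for m in metric]
```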
Our results are significant and hold value in further understanding ADSD and how characteristics of the disorder relate to voice quality.
(Many specifics of the methods, results, and impact have been left out. We hope to publish our paper soon (see the Overview section), at which point I'll be able to share the finalized paper with all its details and results!)
Previous Solution Attempts
CNN Optimization Architecture
Built a convolutional neural network (CNN) in MATLAB to predict patients' ADSD severity from images of their larynx. Designed and implemented an optimization architecture for testing parameters core to the CNN's internals, as well as parameters governing the training/testing methodology.
The figure shows this optimization architecture.
- CNN_prep prepares the parameters, image data, label data, training options, and CNN architecture layers.
- The outermost loop overwrites parameter values for optimization.
- The partition loop allows various partitionings of patients' data into training and testing folds, to test different partitioning combinations within a single k-fold cross-validation run.
- The deep learning training monitor:
  - The x-axis is time (training iterations).
  - The y-axis is loss (root mean square error).
  - The green and red curves are, respectively, the average training and validation loss across partitions; the gray band around each curve is +/- one standard deviation.
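The patient-level partition loop described above can be sketched in Python (the original architecture was MATLAB). The round-robin fold assignment and the function name are illustrative assumptions; the key idea is that splitting happens at the patient level, so all of a patient's images stay in the same fold and test folds contain only unseen patients.

```python
def patient_folds(patient_ids, k):
    """Partition patients into k folds and yield (train, test) splits.

    Round-robin assignment is an illustrative choice; each fold takes
    a turn as the held-out test set, as in k-fold cross-validation.
    """
    folds = [[] for _ in range(k)]
    for i, pid in enumerate(patient_ids):
        folds[i % k].append(pid)
    for t in range(k):
        train = [p for f in range(k) if f != t for p in folds[f]]
        yield train, folds[t]

# Six hypothetical patients split into 3 folds.
splits = list(patient_folds([1, 2, 3, 4, 5, 6], k=3))
```

Varying the assignment rule (e.g., shuffling before splitting) is one way the loop can test different partitioning combinations within one cross-validation run.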