Quinn Peterson



Intern in the Computational Neurobiology Lab at the Salk Institute for Biological Studies. Worked on a project aimed at using machine learning to objectively rate severity in patients with adductor spasmodic dysphonia (ADSD).


Currently finalizing an algorithm that detects the visibility and location of the glottis in nasolaryngoscopic video recordings. This algorithm uses the YT Predictor (see section below) together with an object-scoring step. I am also supervising two interns who are annotating data using both MATLAB's Video Labeler app and the ELAN linguistic annotation software.

Our team expects this algorithm to be a major stepping stone toward the project's larger goal. Specifically, it will objectively quantify patients' glottal activity over time. Going forward, this information should help bridge the gap between the nasolaryngoscopic recordings and an objective predictor of patients' ADSD severity.

YT Predictor

Prepared data, a data pipeline, and an optimization architecture to train a shallow neural network to predict an optimal luminance threshold (YT) per frame.

For a single frame, a YT is used to segment the image into two binary regions: pixels brighter than YT's value, and pixels darker than it. An automated way to set this segmentation threshold, YT, is therefore key to automatically locating the glottis.
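The thresholding step can be sketched as follows. This is a minimal illustration in Python/NumPy (the project itself used MATLAB); the toy frame and threshold value are invented for demonstration.

```python
import numpy as np

def segment_by_threshold(frame, yt):
    """Split a grayscale frame into a binary mask: True where the
    luminance exceeds the threshold YT, False elsewhere."""
    return frame > yt

# Toy 4x4 "frame" with a bright 2x2 patch on a dark background
frame = np.zeros((4, 4))
frame[1:3, 1:3] = 200
mask = segment_by_threshold(frame, 100)
# mask is True only over the bright 2x2 patch
```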

Generally, the network's input is a single frame, and its output is a predicted YT for that frame. More specifically, the network takes in six numbers representing statistical moments of the frame's luminance distribution and predicts the optimal YT. Each frame's target YT (the training label) was created by looping through all possible YTs and finding the one that maximized the IOU. Here, the IOU is defined as the intersection-over-union between the annotated glottis area and the nearest binary object in the resulting segmented image.
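The target-generation step (sweep all candidate YTs, keep the one maximizing the IOU) can be sketched like this. A Python/NumPy illustration, not the lab's MATLAB code; for simplicity it scores the whole thresholded mask against the annotation rather than picking out the nearest binary object, as the real pipeline does.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def best_threshold(frame, annotation, candidates=range(256)):
    """Brute-force search: return the YT whose segmentation best
    overlaps the annotated glottis area (highest IoU)."""
    best_yt, best_score = 0, -1.0
    for yt in candidates:
        score = iou(frame > yt, annotation)
        if score > best_score:
            best_yt, best_score = yt, score
    return best_yt, best_score

# Toy frame: "glottis" region at luminance 180 on a background of 50
frame = np.full((8, 8), 50)
frame[2:5, 3:6] = 180
annotation = np.zeros((8, 8), dtype=bool)
annotation[2:5, 3:6] = True
yt, score = best_threshold(frame, annotation)
# Any YT between the two luminance levels yields a perfect IoU of 1.0
```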

The three figures show this process and its purpose:

Fig 1 depicts the annotated glottis area (top), the segmented image based on a predicted YT (middle), and the overlap of these which gives the IOU (bottom).

Fig 2 depicts the regression of predicted vs. actual YTs for a test set after training the shallow neural network. Each point is a frame from the test set. The R value of ~0.8 indicates the network learned the desired mapping and generalizes it well to unseen data.

Fig 3 depicts the resulting IOU for each testing frame, after segmenting each frame based on its predicted YT. Despite the small peak near IOU ≈ 0, the network's predicted YTs yield generally high IOUs. This is good: it means the resulting segmented images contain a binary object that closely matches the annotated glottis area.

CNN Optimization Architecture

Built a convolutional neural network (CNN) in MATLAB to predict patients’ ADSD severity from images of their larynx. Designed and implemented an optimization architecture for testing parameters core to the CNN's internals, as well as parameters governing the training/testing methodology.

The figure shows this optimization architecture.

- The CNN_prep step prepares parameters, image data, label data, training options, and the CNN architecture layers
- The outer-most loop overwrites parameter values for optimization
- The partition loop partitions patients’ data into training and testing folds in various ways, testing different possible partitioning combinations within a single k-fold cross-validation run
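The structure above (an outer parameter sweep wrapped around a patient-level k-fold partition loop) can be sketched as follows. This is an illustrative Python skeleton, not the MATLAB implementation; the parameter names and values are hypothetical stand-ins, and the training call is a placeholder.

```python
from itertools import product

def patient_kfold(patient_ids, k):
    """Yield (train, test) splits at the patient level, so no single
    patient's frames appear in both the training and testing folds."""
    folds = [patient_ids[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test

# Hypothetical parameter grid (names are illustrative, not the lab's settings)
param_grid = {"learning_rate": [1e-3, 1e-4], "filter_size": [3, 5]}
patients = [f"patient_{i}" for i in range(6)]

results = []
for lr, fs in product(*param_grid.values()):           # outer-most loop
    for train, test in patient_kfold(patients, k=3):   # partition loop
        # train the CNN on `train` with (lr, fs), evaluate on `test`
        results.append((lr, fs, tuple(test)))
```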

Learning Curves

- Used to monitor the network's training progress
- The x-axis is time (training iterations)
- The y-axis is loss (root mean square error)
- Respectively, the green and red curves are the average training and validation loss across partitions. The gray band around each curve is ±1 standard deviation.
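Computing the plotted curves from the per-partition losses reduces to a mean and standard deviation across the partition axis. A small NumPy sketch with simulated loss curves (the decay shape and noise are invented for illustration):

```python
import numpy as np

# Simulated per-partition loss curves (rows: partitions, cols: iterations)
rng = np.random.default_rng(0)
iters = np.arange(100)
curves = np.exp(-iters / 30)[None, :] + 0.05 * rng.standard_normal((5, 100))

mean_loss = curves.mean(axis=0)   # the plotted curve
std_loss = curves.std(axis=0)     # half-width of the gray band
upper, lower = mean_loss + std_loss, mean_loss - std_loss
```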

Regression - The CNN's Purpose

- Goal of the CNN: given an unseen ADSD patient’s data, predict the patient's ADSD severity rating
- Each observation represents a single patient
- The x-axis is the true ADSD severity rating
- The y-axis is the CNN’s prediction
- Uses a “leave-one-out” cross-validation technique, varying across partitions which patient is held out as the test patient. Accordingly, the predictions in the figure are accumulated across the resulting partitions.
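The leave-one-out scheme described above can be sketched in a few lines. A Python illustration (the project used MATLAB), with made-up patient labels:

```python
def leave_one_out(patients):
    """Yield (train_patients, test_patient) pairs, holding out each
    patient exactly once as the test case."""
    for i, test in enumerate(patients):
        yield patients[:i] + patients[i + 1:], test

patients = ["p1", "p2", "p3", "p4"]
splits = list(leave_one_out(patients))
# 4 splits; each patient appears as the held-out test patient once
```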

Image Edge-Object Analysis

Utilized MATLAB’s computer vision toolbox to perform edge detection on frames taken from the patients’ nasolaryngoscopic recordings. Analyzed attributes of the resulting edge-objects for two purposes. First, to identify characteristics that distinguish noise from objects of interest among the edge-objects. Second, to serve as data for creating a high-level feature representation of the patients’ frames (see "Edge-Object Feature Representation" header below).
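The three edge-object attributes analyzed here (size, orientation, eccentricity) follow from an object's second-order central moments, much as MATLAB's `regionprops` computes them. A self-contained NumPy sketch, assuming a binary mask for one edge-object (note that sign conventions for orientation differ between libraries):

```python
import numpy as np

def edge_object_stats(mask):
    """Size, orientation (degrees), and eccentricity of a binary object,
    derived from the eigen-decomposition of its pixel covariance."""
    ys, xs = np.nonzero(mask)
    n = len(xs)
    x0, y0 = xs.mean(), ys.mean()
    mxx = ((xs - x0) ** 2).mean()
    myy = ((ys - y0) ** 2).mean()
    mxy = ((xs - x0) * (ys - y0)).mean()
    # Eigenvalues of the 2x2 covariance matrix give the axis lengths
    common = np.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam1 = (mxx + myy) / 2 + common   # major axis
    lam2 = (mxx + myy) / 2 - common   # minor axis
    orientation = np.degrees(0.5 * np.arctan2(2 * mxy, mxx - myy))
    eccentricity = np.sqrt(1 - lam2 / lam1) if lam1 > 0 else 0.0
    return n, orientation, eccentricity

# A horizontal line segment: size 5, orientation 0 deg, eccentricity 1
mask = np.zeros((7, 7), dtype=bool)
mask[3, 1:6] = True
size, orient, ecc = edge_object_stats(mask)
```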

The figure shows characteristics of edge-objects, accumulated across several frames for a single patient.

- Each observation represents a single edge-object
- The three axes are some of the core characteristics of these edge-objects: eccentricity, orientation (degrees), and size (pixels)
- Notice spikes in the orientation distribution near 0, ±45, and ±90 degrees. Many of these edge-objects are noise. This is a good example of how this analysis helped distinguish noise in the edge-object space.

Edge-Object Feature Representation / LSTM Data Prep

Discretized the three edge-object dimensions and considered the resulting voxels to be features. Each feature’s value is the number of edge-objects within its voxel. Constructed time series for these features across each patient's frames.

- The x-axis is time (frames)
- The y-axis is the feature values
- Each separately colored plot is a different feature