Immunology ML Research

ResNet pipeline achieving 99% accuracy, reducing error detection from hours to seconds, deployed department-wide via Docker and Pennsieve.

Overview

High-parameter flow cytometry lets immunologists characterize complex cell populations, but unmixing errors can silently contaminate downstream analysis and produce inaccurate results. Expert immunologists told me that catching these errors can take hours of manual review.

To automate this quality control step, I worked with a team of researchers and expert immunologists at the University of Pennsylvania to build a ResNet-based pipeline that detects these errors. The convolutional layers in the ResNet architecture let the model pick up spatial patterns in density plots, where unmixing errors appear as diagonal correlations and asymmetric distributions. The figures below compare correct unmixing (left) with incorrect unmixing (right).

Correct Unmixing

Incorrect Unmixing
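
To make the diagonal-correlation signature concrete, here is a minimal sketch on synthetic data. It is not the project's plotting code: the Gaussian channels, the 30% leak between channels, the bin count, and the `density_image` helper are all assumptions chosen for illustration.

```python
import numpy as np


def density_image(x, y, bins=64, value_range=(-5.0, 5.0)):
    """Render two channels as a log-scaled 2D histogram, normalized to uint8."""
    hist, _, _ = np.histogram2d(x, y, bins=bins, range=[value_range, value_range])
    hist = np.log1p(hist)  # compress the dynamic range so sparse regions stay visible
    peak = hist.max()
    if peak > 0:
        hist = hist / peak
    return (hist * 255).astype(np.uint8)


rng = np.random.default_rng(0)
n_events = 5000

# Synthetic stand-ins for two unmixed fluorescence channels (not real cytometry data).
chan_a = rng.normal(size=n_events)
chan_b = rng.normal(size=n_events)

# Hypothetical unmixing error: 30% of channel A's signal leaks into channel B.
chan_b_leaky = chan_b + 0.3 * chan_a

img_ok = density_image(chan_a, chan_b)
img_bad = density_image(chan_a, chan_b_leaky)

# The leak tilts the point cloud along the diagonal, which shows up as correlation.
corr_ok = np.corrcoef(chan_a, chan_b)[0, 1]
corr_bad = np.corrcoef(chan_a, chan_b_leaky)[0, 1]
print(f"correct: r = {corr_ok:.2f}, leaky: r = {corr_bad:.2f}")
```

In this toy setup a single correlation coefficient already separates the two cases. Real density plots also show asymmetric distributions, which is one reason an image classifier is a reasonable choice over a single summary statistic.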

Using a dataset of 335 samples stained with a 36-color panel and an 80/20 train/validation split, I trained the model to achieve 99% accuracy while reducing detection time from hours to seconds. The model was further tested on a separate unseen dataset, where it achieved 100% accuracy. I also designed a system that automates the entire detection process: it takes in an FCS file, converts it to a CSV file, generates a density plot, and classifies the plot using the model.

The system was deployed for use in the immunology department at Penn via Docker and Pennsieve.

A current limitation is that the model only works well on datasets acquired on the same machine. We are working on a more customized model that can generalize to other machines.

Key Languages, Platforms, and Frameworks Used

Python

PyTorch

Torchvision

NumPy

Pandas

Matplotlib

Seaborn

Pillow