"Modeling Differential Expression In scRNA-seq Data With A Difference O" by Alicia Petrany

Date Approved

4-21-2025

Embargo Period

4-21-2025

Document Type

Thesis

Degree Name

Master of Science (M.S.) Bioinformatics

Department

Bioinformatics

College

College of Science & Mathematics

Advisor

Yong Chen, Ph.D.

Committee Member 1

Benjamin Carone, Ph.D.

Committee Member 2

Alison Krufka, Ph.D.

Keywords

Bioinformatics; Negative Binomial; Next Generation Sequencing; RNA Sequencing; Single Cell

Disciplines

Bioinformatics | Life Sciences

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful high-throughput sequencing technology that quantifies the transcriptome at single-cell resolution. Differential expression (DE) analysis is a key scRNA-seq analysis task that identifies genes with statistically significant expression changes in response to biological stimuli. Existing DE methods inherently attempt to determine whether two sets of negative binomially distributed read counts are significantly different, but they lack exact testing strategies to do so. This work introduces a novel theoretical distribution, the Difference of Two Negative Binomial Distributions (DOTNB), and implements it within DEGage, an R package for DE analysis. Benchmarking DEGage against DESeq2, DESingle, edgeR, Monocle3, and scDD showed that DEGage offered greater sensitivity and robustness against scRNA-seq-specific technical effects. After benchmarking, DEGage successfully identified regulators of long-term memory consolidation in engram neurons and canonical prostate cancer markers in a large-scale dataset of heterogeneous prostate cancer tissue. Given their success in the validation studies, DOTNB and DEGage can be further applied to new scRNA-seq projects and other forms of negative binomially distributed count data.
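
To illustrate the idea of a difference of two negative binomial counts, here is a minimal Python sketch. It is not the thesis's DOTNB derivation or the DEGage implementation, neither of which is given in this record. It assumes two independent variables X and Y, each parameterized by a size r and a success probability p, and approximates P(X - Y = z) with a truncated convolution sum. The function names, the parameterization, and the truncation limit k_max are all hypothetical choices made for illustration.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial PMF: P(K = k) for size r > 0 and success probability 0 < p < 1.

    Assumed parameterization: k counts failures before the r-th success,
    so the mean is r * (1 - p) / p.
    """
    if k < 0:
        return 0.0
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def diff_nb_pmf(z, r1, p1, r2, p2, k_max=2000):
    """Approximate P(X - Y = z) for independent X ~ NB(r1, p1) and Y ~ NB(r2, p2).

    Uses the convolution sum over k of P(X = k + z) * P(Y = k),
    truncated after k_max terms (an illustrative cutoff, not a tuned value).
    """
    start = max(0, -z)  # smallest k for which k + z is non-negative
    return sum(
        nb_pmf(k + z, r1, p1) * nb_pmf(k, r2, p2)
        for k in range(start, start + k_max)
    )


# Example: probability that the two counts are equal under the assumed parameters.
prob_equal = diff_nb_pmf(0, 2.0, 0.3, 3.0, 0.4)
```

Working in log space with math.lgamma keeps the PMF numerically stable for large counts, which matters for sequencing data, where counts can span several orders of magnitude.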
