Date Approved

5-18-2020

Embargo Period

5-19-2020

Document Type

Thesis

Degree Name

M.S. Computer Science

Department

Computer Science

College

College of Science & Mathematics

First Advisor

Hnatyshyn, Serhiy

Second Advisor

Hnatyshyn, Vasil

Third Advisor

Thayasivam, Umashanger

Subject(s)

Drug interactions; Drug Development; Machine learning

Disciplines

Artificial Intelligence and Robotics | Computer Sciences | Pharmacy and Pharmaceutical Sciences

Abstract

Drug discovery is a long, expensive, and complex, yet crucial process for the benefit of society. Selecting potential drug candidates requires an understanding of how well a compound will perform at its task, and more importantly, how safe the compound will act in patients. A key safety insight is understanding a molecule's potential for drug-drug interactions. The metabolism of many drugs is mediated by members of the cytochrome P450 superfamily, notably, the CYP3A4 enzyme. Inhibition of these enzymes can alter the bioavailability of other drugs, potentially increasing their levels to toxic amounts. Four models were developed to predict CYP3A4 inhibition: logistic regression, random forests, support vector machine, and neural network. Two novel convolutional approaches were explored for data featurization: SMILES string auto-extraction and 2D structure auto-extraction. The logistic regression model achieved an accuracy of 83.2%, the random forests model, 83.4%, the support vector machine model, 81.9%, and the neural network model, 82.3%. Additionally, the model built with SMILE string auto-extraction had an accuracy of 82.3%, and the model with 2D structure auto-extraction, 76.4%. The advantages of the novel featurization methods are their ability to learn relevant features from compound SMILE strings, eliminating feature engineering. The developed methodologies can be extended towards predicting any structure-activity relationship and fitted for other areas of drug discovery and development.

Share

COinS