Author(s)

Curtis White

Date Approved

11-3-2014

Document Type

Thesis

Degree Name

M.S. Computer Science

Department

Computer Science

College

College of Science & Mathematics

First Advisor

Hnatyshin, Vasil

Subject(s)

Data mining--Mathematics;Bioinformatics

Disciplines

Computer Sciences

Abstract

Metabolomics, the study of the set of metabolites present within an organism, cell, or tissue, is the science of comprehensively evaluating changes in the metabolome with the goal of elucidating the underlying biological mechanisms of a living system. There is an opinion in the field that its future development is contingent upon two factors: the advancement of analytical instrumentation, and the development of data mining methodologies for extracting meaningful and interpretable experimental results. Many data mining methodologies exist, but selecting a particular technique for one's data is intricate. The selection and application of a technique must take into account issues such as justifiability, reproducibility, and traceability. The Random Forests methodology stands out among data mining techniques, since it can be used for classification, feature extraction, and analysis. The Random Forests algorithm has many customizable parameters that affect the outcome of a particular run, and identifying the best values for these parameters is a task in itself. My work focuses on the study of the Random Forests algorithm, and on determining its optimal configuration parameters, for sample classification in the field of metabolomics.
