M.S. in Engineering
Electrical & Computer Engineering
Henry M. Rowan College of Engineering
National Science Foundation
Electrical engineering--Research; Missing observations (Statistics)
Electrical and Computer Engineering
Missing data are common in real-world applications. Training, validation, or field data often have missing features in some (or even all) of their instances: bad sensors, failed pixels, malfunctioning equipment, unexpected noise causing signal saturation, and data corruption are all familiar scenarios in practical applications.
In this thesis, the feasibility of an ensemble of classifiers trained on a feature subset space is investigated as an effective and practical solution for the missing feature problem. Two ensemble-of-classifiers approaches motivated by the Random Subspace Method are proposed to allow supervised classifiers to handle data with missing features. A sufficiently large number of classifiers is trained, each on a random subset of the features. Instances with missing features are then classified by majority vote among those classifiers whose training feature subsets did not include the missing features. The proposed algorithm, Learn++.MF, and a modified version of it, Learn++.MFv2, are introduced in this effort. We also investigate the effect of varying the cardinality of the random feature subsets on classification performance, discuss the conditions under which the proposed approaches are most effective, and present simulation results on several benchmark datasets.
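The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the Learn++.MF or Learn++.MFv2 algorithm itself, whose details the abstract does not give. It only shows training many classifiers on random feature subsets and then voting among those whose subsets avoid the missing features. The nearest-centroid base classifier, the function names, the default parameter values, and the use of NaN to mark a missing feature are all assumptions made for illustration.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in base classifier (an assumption, not the thesis's choice)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class, computed on the features this classifier sees.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every row of X to every centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def train_subspace_ensemble(X, y, n_classifiers=50, subset_size=2, seed=0):
    """Train each classifier on its own random subset of the feature indices."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    ensemble = []
    for _ in range(n_classifiers):
        feats = np.sort(rng.choice(n_features, size=subset_size, replace=False))
        ensemble.append((feats, NearestCentroid().fit(X[:, feats], y)))
    return ensemble


def predict_with_missing(ensemble, x):
    """Classify one instance x, where missing features are marked as NaN.

    Only classifiers whose feature subset contains no missing feature vote.
    Returns None when no classifier is usable.
    """
    missing = np.isnan(x)
    votes = {}
    for feats, clf in ensemble:
        if missing[feats].any():
            continue  # this classifier was trained on a feature that is missing
        label = clf.predict(x[feats][None, :])[0]
        votes[label] = votes.get(label, 0) + 1
    if not votes:
        return None
    return max(votes, key=votes.get)  # plain majority vote
```

In this sketch, `subset_size` plays the role of the feature-subset cardinality whose effect the thesis investigates: smaller subsets leave more classifiers usable when features are missing, but each classifier then sees less information.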
Mohammed, Hussein Syed, "Random feature subspace ensemble based approaches for the analysis of data with missing features" (2006). Theses and Dissertations. 910.