Date Approved

12-31-2006

Embargo Period

4-5-2016

Document Type

Thesis

Degree Name

M.S. in Engineering

Department

Electrical & Computer Engineering

College

Henry M. Rowan College of Engineering

First Advisor

Polikar, Robi

Subject(s)

Electrical engineering--Research; Missing observations (Statistics)

Disciplines

Electrical and Computer Engineering

Abstract

Missing data are common in real-world applications. Training, validation, or field data often have missing features in some (or even all) of their instances, since bad sensors, failed pixels, malfunctioning equipment, unexpected noise causing signal saturation, data corruption, and similar failures are familiar in many practical applications.

In this thesis, the feasibility of an ensemble of classifiers trained on a feature subset space is investigated as an effective and practical solution for the missing feature problem. Two ensemble-of-classifiers approaches, motivated by the Random Subspace Method, are proposed to allow supervised classifiers to handle data with missing features. A sufficiently large number of classifiers is trained, each on a random subset of the features. An instance with missing features is then classified by majority voting among those classifiers whose training features did not include any of the missing ones. The proposed algorithm, Learn++.MF, and a modified version, Learn++.MFv2, are introduced in this effort. We also investigate the effect of varying the cardinality of the random feature subsets on classification performance, discuss the conditions under which the proposed approaches are most effective, and present simulation results on several benchmark datasets.
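
The voting scheme described in the abstract can be illustrated with a minimal sketch. This is not the Learn++.MF or Learn++.MFv2 algorithm itself, whose details are not given here; it only shows the general idea of training classifiers on random feature subsets and letting only the classifiers that avoid the missing features vote. The nearest-centroid base classifier, the function names (`train_ensemble`, `classify`), the uniform sampling of feature subsets, and the use of `None` to mark a missing feature are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_centroids(X, y, feats):
    """Fit a nearest-centroid model using only the feature indices in feats.

    The base classifier is an illustrative assumption; any supervised
    classifier could be substituted.
    """
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, f in enumerate(feats):
            acc[k] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroids(centroids, row, feats):
    """Return the class whose centroid is closest on the features in feats."""
    def sq_dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, centroids[c]))
    return min(centroids, key=sq_dist)


def train_ensemble(X, y, n_classifiers, subset_size, seed=0):
    """Train n_classifiers models, each on a uniformly random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_classifiers):
        feats = sorted(rng.sample(range(n_features), subset_size))
        ensemble.append((feats, train_centroids(X, y, feats)))
    return ensemble


def classify(ensemble, row):
    """Majority vote among classifiers that use none of the missing features.

    Missing features are marked with None (an assumed convention).
    Returns None when no classifier in the ensemble is usable.
    """
    missing = {i for i, v in enumerate(row) if v is None}
    votes = Counter(
        predict_centroids(model, row, feats)
        for feats, model in ensemble
        if missing.isdisjoint(feats)
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy data: two well-separated classes over four features.
X_train = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1],
           [10, 10, 10, 10], [9, 9, 9, 9], [10, 9, 10, 9]]
y_train = [0, 0, 0, 1, 1, 1]

ensemble = train_ensemble(X_train, y_train, n_classifiers=50, subset_size=2)
print(classify(ensemble, [9.5, None, 10.2, 9.0]))  # feature 1 is missing
```

The sketch also makes the role of subset cardinality visible: smaller subsets leave more classifiers usable when features are missing, but each classifier then sees less information, while larger subsets make it more likely that no classifier can vote at all.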
