Research Area:  Machine Learning
Increasing access to incredibly large, non-stationary datasets and corresponding demands to analyse these data have led to the development of new online algorithms for performing machine learning on data streams. An important feature of real-world data streams is concept drift, whereby the distributions underlying the data can change arbitrarily over time. The presence of concept drift in a data stream causes many classical data mining techniques to become unsuitable, and therefore new approaches must be developed in their place. In pursuit of this goal, we introduce the dynamic logistic regressor (DLR), a sequential Bayesian approach for performing binary classification on non-stationary data streams. We proceed to show how the DLR framework can be extended to cope with missing observations and with missing and corrupted labels. We then describe a new meta-algorithm for performing classification and regression on data streams with concept drift. The convex hull of receiver operating characteristic (ROC) curves has long been used for identifying potentially optimal classifiers. Unfortunately, the ROC curve does not perform as expected when learning from data streams exhibiting concept drift. We introduce a modification to the ROC curve that provides an easily maintainable online summary of a classifier's performance, even in the presence of concept drift. We similarly modify the recently introduced regression error characteristic (REC) curve, giving analogous dynamic summaries of online regressors.
Name of the Researcher:  Roman Garnett, Stephen J. Roberts
Name of the Supervisor(s):  Michael Osborne
Year of Completion:  2008
University:  University of Oxford
Thesis Link:   Home Page Url