Drift Detection in Data Stream Classification without Fully Labelled Instances
Proceedings of the IEEE Evolving and Adaptive Intelligent Systems Conference
Drift detection is an important issue in classification-based stream mining, because it allows operators to be informed of unintended changes in the system. Current detection approaches usually assume that fully labelled streams are available, which is often unrealistic in on-line real-world applications.
We propose two ways to improve the economy and applicability of drift detection: (1) a semi-supervised approach that employs single-pass active learning filters to select the most interesting samples for supervising classifier performance, and (2) a fully unsupervised approach based on the degree of overlap between the classifier's output certainty distributions.
Both variants rely on a modified version of the Page-Hinkley test, in which a fading factor is introduced to down-weight older samples, making the test more flexible in detecting successive drifts in a stream. The approaches are compared with the fully supervised variant (state of the art, SoA) on two real-world on-line applications: the semi-supervised approach detects three real drifts in these streams with a delay that is lower than or equal to that of the supervised variant (about 200 versus 300 samples in one case, and about 70 samples in the other), while requiring only 20% labelled samples.
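To illustrate the idea of a Page-Hinkley test with a fading factor, the following is a minimal sketch and not the authors' exact formulation, which the abstract does not spell out. It assumes the common variant in which the cumulative deviation is multiplied by a fading factor before each update, so that older samples contribute less. The class name `FadingPageHinkley`, the helper `first_alarm`, and all parameter values (`delta`, `threshold`, `fading`) are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FadingPageHinkley:
    """Page-Hinkley test for an upward shift in a monitored signal
    (e.g. classifier error), with a fading factor on the cumulative sum.
    This is an assumed variant for illustration only."""
    delta: float = 0.005     # tolerated magnitude of change (assumed value)
    threshold: float = 5.0   # alarm threshold, often called lambda (assumed value)
    fading: float = 0.99     # fading factor in (0, 1]; 1.0 gives the classic test
    n: int = 0               # number of samples seen so far
    mean: float = 0.0        # running mean of the signal
    cum: float = 0.0         # faded cumulative deviation from the mean
    cum_min: float = 0.0     # minimum of the cumulative deviation so far

    def update(self, x: float) -> bool:
        """Feed one sample; return True if a drift alarm is raised."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # Older deviations are shrunk by the fading factor before adding the new one.
        self.cum = self.fading * self.cum + (x - self.mean - self.delta)
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold

    def reset(self) -> None:
        """Restart the test, e.g. after an alarm, to catch successive drifts."""
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0


def first_alarm(stream: Iterable[float], detector: FadingPageHinkley) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if none occurs."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic error indicator: no errors for 200 samples, then only errors.
    stream = [0.0] * 200 + [1.0] * 100
    print(first_alarm(stream, FadingPageHinkley()))
```

In the semi-supervised setting, the monitored signal `x` could be the error indicator computed only on the samples selected for labelling; in the unsupervised setting it could be an overlap measure of the certainty distributions. Both mappings are assumptions made here for illustration.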