Automatic Feed Phase Identification in Multivariate Process Profiles by Sequential Binary Classification
In this paper, we propose a new strategy for the retrospective identification of feed phases from feed profiles, enriched with online sensor data, of an Escherichia coli (E. coli) fed-batch fermentation process. In contrast to conventional (static), data-driven multi-class machine learning (ML), we exploit process knowledge to constrain our classification system, which yields more parsimonious models than static ML approaches. In particular, we enforce unidirectionality on a set of binary, multivariate classifiers, each trained to discriminate between two adjacent feed phases, by linking the classifiers through a one-way switch. The switch is activated when the output of the currently active classifier changes. The next binary classifier in the chain then takes over and discriminates between the next pair of feed phases, and so on. From a complexity/parsimony perspective, the benefit of our approach is three-fold: i) the multi-class learning task is broken down into binary subproblems, which usually have simpler decision surfaces and tend to be less susceptible to the class-imbalance problem; ii) we exploit the fact that the process follows a rigid feed cycle structure (i.e. batch-feed-batch-feed), which allows us to focus on the subproblems involving phase transitions as they occur during the process while discarding off-transition classifiers; and iii) only one binary classifier is active at a time, which keeps the effective model complexity low. We further use a combination of logistic regression and Lasso (i.e. regularized logistic regression, RLR) as a wrapper to extract the most relevant features for the individual subproblems from the full set of high-dimensional sensor data. Our results show that the proposed method clearly outperforms static ML approaches in terms of accuracy and robustness.
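The one-way switch described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `identify_phases` and `threshold_classifier`, the toy thresholds, and the two-feature profile are all assumptions made for illustration. Each chain entry stands in for a trained binary classifier (in the paper's setting, presumably an RLR model) that returns 1 once a sample belongs to the next phase.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a binary classifier maps one feature vector to
# 0 (still in the current phase) or 1 (already in the next phase).
BinaryClassifier = Callable[[Sequence[float]], int]


def identify_phases(samples: Sequence[Sequence[float]],
                    chain: List[BinaryClassifier]) -> List[int]:
    """Label each time-ordered sample with a phase index 0..len(chain).

    chain[k] discriminates phase k from phase k + 1. Only chain[active]
    is evaluated at a given time step, and `active` can only increase.
    """
    active = 0
    labels = []
    for x in samples:
        # One-way switch: advance when the active classifier reports the
        # next phase; there is no path back to an earlier classifier.
        if active < len(chain) and chain[active](x) == 1:
            active += 1
        labels.append(active)
    return labels


def threshold_classifier(feature: int, threshold: float) -> BinaryClassifier:
    """Toy stand-in for a trained classifier (assumption, not from the paper)."""
    return lambda x: int(x[feature] > threshold)


# Toy profile: feature 0 mimics a feed rate, feature 1 an arbitrary sensor.
chain = [
    threshold_classifier(0, 0.5),   # batch -> feed
    lambda x: int(x[0] <= 0.5),     # feed -> batch
    threshold_classifier(0, 0.5),   # batch -> feed
]
profile = [(0.0, 1.0), (0.1, 1.1), (0.9, 1.0), (0.8, 0.9),
           (0.2, 1.2), (0.0, 1.0), (0.7, 1.0), (0.1, 1.0)]

print(identify_phases(profile, chain))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

In this toy run the last sample has a low feed rate, yet it keeps the final phase label because the chain is exhausted and the switch cannot move backwards. That is the unidirectionality constraint: a static multi-class model would be free to assign such a sample to an earlier phase.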