FABIA: A Probabilistic Model for Biclustering and its Application to Analyzing Big Data in Drug Design
ISMB 2014 Proceedings
Unsupervised bicluster analysis is an active topic in data science and has become an invaluable tool for extracting hidden knowledge from high-dimensional data. Over the years, biclustering has demonstrated its worth in many biomedical applications, e.g., to identify tightly co-expressed gene sets in cancer subgroups. Biclustering simultaneously organizes a data matrix into subsets of rows and columns such that the entities of each row subset are similar to each other on the corresponding column subset, and vice versa. This simultaneous grouping of rows (genes or chemical fingerprints) and columns (conditions or compounds) makes it possible to identify subgroups within the conditions. Drug design is one example: researchers want to reveal how compounds affect gene expression, and the effects of compounds may be similar only on a subgroup of genes. Standard clustering methods, which group rows or columns using all entries of the other dimension, are not suited to these kinds of problems. We therefore present a biclustering approach, called FABIA, which goes well beyond conventional clustering concepts. FABIA is a multiplicative latent variable model that extracts linear dependencies between row and column subsets by forcing both the hidden factors and their loadings to be sparse.
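To make the multiplicative model concrete, the following is a minimal sketch of the generative side only, assuming the FABIA model form X = Σ_k λ_k z_kᵀ + Υ, with sparse loading vectors λ_k over rows, sparse factor vectors z_k over columns, and additive Gaussian noise Υ. It does not fit anything: FABIA itself estimates the factors and loadings with sparseness-inducing priors, whereas this sketch simply plants exact zeros by hand. All function names (`sparse_vector`, `outer`, `generate`, `support`) are hypothetical illustrations. The point it shows is that each rank-one term λ_k z_kᵀ is nonzero exactly on the rows where λ_k is nonzero crossed with the columns where z_k is nonzero, which is a bicluster.

```python
import random


def sparse_vector(length, nonzero, rng):
    """Vector that is exactly zero outside `nonzero`; entries inside have |value| in [1, 2]."""
    return [
        rng.choice((-1.0, 1.0)) * rng.uniform(1.0, 2.0) if i in nonzero else 0.0
        for i in range(length)
    ]


def outer(u, v):
    """Rank-one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def generate(n_rows, n_cols, biclusters, noise_sd=0.1, seed=0):
    """Sample X = sum_k lambda_k z_k^T + noise (hypothetical helper, not FABIA's fitting code).

    `biclusters` is a list of (row_indices, col_indices) pairs; each pair fixes
    the nonzero support of one loading vector lambda_k (rows) and one factor
    z_k (columns). Returns X and the noise-free rank-one components.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, noise_sd) for _ in range(n_cols)] for _ in range(n_rows)]
    components = []
    for rows, cols in biclusters:
        lam = sparse_vector(n_rows, set(rows), rng)  # sparse loadings over rows
        z = sparse_vector(n_cols, set(cols), rng)    # sparse factor over columns
        comp = outer(lam, z)                         # multiplicative rank-one term
        components.append(comp)
        for i in range(n_rows):
            for j in range(n_cols):
                X[i][j] += comp[i][j]
    return X, components


def support(matrix):
    """Row and column indices touched by nonzero entries of `matrix`."""
    rows = {i for i, r in enumerate(matrix) if any(x != 0.0 for x in r)}
    cols = {j for r in matrix for j, x in enumerate(r) if x != 0.0}
    return rows, cols


if __name__ == "__main__":
    X, comps = generate(8, 6, [([0, 1, 2], [0, 1]), ([4, 5], [3, 4, 5])])
    for k, comp in enumerate(comps):
        rows, cols = support(comp)
        print(f"bicluster {k}: rows {sorted(rows)}, columns {sorted(cols)}")
```

Running the script prints the row and column index sets of the two planted biclusters (rows 0 to 2 with columns 0 and 1, and rows 4 and 5 with columns 3 to 5). Nonzero magnitudes are kept at least 1 so that every product inside a bicluster is guaranteed nonzero; with sampled sparse priors, as in FABIA, the supports would instead be only approximately sparse.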
FABIA is a mathematically well-founded Bayesian analysis technique that enables the unsupervised exploration of high-dimensional big data and thereby sheds new light on the dark matter of many problems.