Detecting rare copy number variations (CNVs) with sparse coding
ISMB 2013 Proceedings
High-density oligonucleotide genotyping microarrays, especially Affymetrix SNP6 chips, are widely used for high-resolution copy number analysis. To identify CNVs more reliably, we have proposed a maximum a posteriori factor analysis model called cn.FARMS. Its latent variable, the factor, captures the simultaneous increase or decrease of DNA amount at neighboring chromosome locations, as measured by the intensities of oligonucleotide probes. Such an increase or decrease indicates an amplification or deletion of a DNA region, i.e., a CNV. cn.FARMS considerably reduces the false discovery rate (FDR) by combining adjacent chromosome locations into an ensemble vote (agreement of multiple measurements) instead of relying on a single measurement, as other methods do. Standard factor analysis assumes a Gaussian factor distribution, which is, however, a wrong assumption for CNVs: Redon et al. 2006 showed that most CNVs affect fewer than three individuals out of 270 HapMap samples. Such rare events are hard to detect with cn.FARMS because they would be interpreted as noise. Therefore, we propose a factor analysis model with a Laplacian prior, which leads to a sparse factor distribution. We have applied the Laplacian cn.FARMS model to the HapMap dataset to detect CNVs. We could verify most of the published copy number variable regions and found new ones. However, many known CNVs seem to be false positives.
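The effect of swapping the Gaussian prior for a Laplacian one can be illustrated with a minimal sketch. This is not the cn.FARMS implementation. It assumes a one-factor model x = lam * z + noise for a single window of adjacent probes, with the loadings `lam` and the diagonal noise variances `psi` already known rather than fitted. Under those assumptions the MAP factor value of each sample has a closed form. The term `c = lam^T Psi^-1 x` is the precision-weighted "vote" of all probes in the window. A standard Gaussian prior shrinks `c` proportionally, while a Laplace prior p(z) proportional to exp(-gamma |z|) soft-thresholds it, so non-carriers get exactly zero. The function names, the toy data, and the value of `gamma` are hypothetical illustrations.

```python
import numpy as np


def map_factor_gaussian(x, lam, psi):
    """MAP factor per sample under a standard Gaussian prior z ~ N(0, 1).

    x:   (n_probes, n_samples) centered log intensities of one probe window
    lam: (n_probes,) factor loadings (assumed known here)
    psi: (n_probes,) noise variances, i.e. the diagonal of Psi (assumed known)
    """
    a = np.sum(lam**2 / psi)      # lam^T Psi^-1 lam
    c = (lam / psi) @ x           # lam^T Psi^-1 x: precision-weighted vote of all probes
    return c / (a + 1.0)          # maximizer of -a/2 z^2 + c z - z^2/2


def map_factor_laplace(x, lam, psi, gamma):
    """MAP factor per sample under a Laplace prior p(z) ~ exp(-gamma * |z|)."""
    a = np.sum(lam**2 / psi)
    c = (lam / psi) @ x
    # maximizer of -a/2 z^2 + c z - gamma |z|: soft thresholding of c
    return np.sign(c) * np.maximum(np.abs(c) - gamma, 0.0) / a


# Hypothetical toy window: 10 probes, 270 samples, only two carriers (one gain, one loss).
rng = np.random.default_rng(0)
n_probes, n_samples = 10, 270
lam = np.full(n_probes, 0.5)
psi = np.full(n_probes, 0.025)            # gives a = 10 * 0.25 / 0.025 = 100
z_true = np.zeros(n_samples)
z_true[[3, 150]] = [2.0, -2.0]
noise = np.sqrt(psi)[:, None] * rng.standard_normal((n_probes, n_samples))
x = np.outer(lam, z_true) + noise

z_gauss = map_factor_gaussian(x, lam, psi)
z_lap = map_factor_laplace(x, lam, psi, gamma=60.0)  # hypothetical rate, about 6 noise SDs of c

print("nonzero (Gaussian):", np.count_nonzero(z_gauss))
print("nonzero (Laplace): ", np.flatnonzero(z_lap))
```

Under the Gaussian prior every sample receives a small nonzero factor value. The Laplace prior sets the non-carriers exactly to zero and keeps only the two carriers with the correct signs. This exact-zero behavior is what a sparse factor distribution means here. The sketch does not reproduce the parameter-fitting step, which is where a Gaussian model tends to absorb such rare events into the noise terms.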