HapRFN: a deep learning method to identify short IBD segments
ASHG 2016 Proceedings
For whole-genome sequencing data, HapFABIA was shown to be superior at detecting short identical-by-descent (IBD) segments that are tagged by rare variants. Nevertheless, HapFABIA still has several problems: (1) Deciding whether an individual possesses an IBD segment is often difficult because HapFABIA supplies only soft bicluster memberships. (2) HapFABIA can extract only 10-30 IBD segments at once and therefore needs multiple iterations. However, the IBD segments identified in different iterations are not necessarily decorrelated, so they may be redundant, overlap, or even be split into smaller segments. (3) Processing very large data sets is time-intensive.
We recently introduced Rectified Factor Networks (RFNs) as an unsupervised deep learning approach. Each code unit of an RFN represents a bicluster and therefore an IBD segment: the samples for which the code unit is active share the bicluster (IBD segment), and the features (SNVs) that have activating weights to the code unit tag the IBD segment. HapRFN, which is built on RFNs, overcomes the problems of HapFABIA. (1) RFNs provide sparser codes via their rectified linear units, which immediately supply bicluster memberships as the factors that are different from zero. (2) RFNs can learn thousands of factors, and therefore many IBD segments, simultaneously. As a result, all IBD segments are mutually decorrelated and thus neither redundant nor overlapping. (3) RFNs allow much faster processing of very large data sets by using techniques from deep learning, such as efficient matrix multiplications and network implementations on graphics processing units (GPUs).
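As an illustration of how a rectified code yields hard bicluster memberships, the following minimal sketch reads candidate segments out of an already trained network. It is not the HapRFN implementation and does not show RFN training. The function name `read_out_segments`, the parameters `bias` and `weight_threshold`, the toy genotype matrix `X`, and the hand-set weight matrix `W` are all hypothetical and chosen only for this example.

```python
import numpy as np


def read_out_segments(X, W, bias=0.0, weight_threshold=0.0):
    """Read candidate IBD segments from rectified codes (illustrative only).

    X: (samples x SNVs) matrix of rare-variant indicators.
    W: (SNVs x code units) weight matrix, assumed to be already trained.
    bias and weight_threshold are hypothetical parameters of this sketch.
    """
    # Rectified linear code: every negative pre-activation becomes exactly 0.
    H = np.maximum(X @ W - bias, 0.0)

    segments = []
    for k in range(W.shape[1]):
        # Samples with a non-zero code share the bicluster (hard membership).
        samples = np.flatnonzero(H[:, k] > 0)
        # SNVs with activating (positive) weights tag the segment.
        snvs = np.flatnonzero(W[:, k] > weight_threshold)
        # A shared segment needs at least two samples and one tagging SNV.
        if samples.size >= 2 and snvs.size > 0:
            segments.append((k, samples, snvs))
    return H, segments


# Toy data: 5 individuals x 6 rare SNVs (1 = minor allele present).
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=float)

# Hand-set weights standing in for a trained network with two code units.
W = np.array([
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 0],
], dtype=float)

H, segments = read_out_segments(X, W, bias=1.0)
for k, samples, snvs in segments:
    print(f"code unit {k}: samples {samples.tolist()}, tagging SNVs {snvs.tolist()}")
```

In this toy setting, membership is a simple non-zero test on the code matrix, in contrast to thresholding soft memberships. The whole readout is a single matrix multiplication followed by a rectification, which is the kind of operation that maps well onto GPUs.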