Multiscale DNA partitioning: statistical evidence for segments
XXVII International Biometric Conference
DNA segmentation, i.e. the partitioning of DNA into compositionally homogeneous segments, is a basic task in bioinformatics. Different algorithms have been proposed for various partitioning criteria such as GC content, local ancestry in population genetics, or copy number variation. A critical component of any such method is the choice of an appropriate number of segments. Some methods use model selection criteria and do not provide suitable error control. Other methods that are based on simulating a statistic under a null model provide suitable error control only if the correct null model is chosen. Here, we focus on partitioning with respect to GC content and propose a new approach that provides statistical error control: it guarantees with a user-specified probability that the number of identified segments does not exceed the number of actually present segments. The method is based on a statistical multiscale criterion, which makes it a segmentation method that searches for segments of any length (on all scales) simultaneously. It is also very accurate in localizing segments: under benchmark scenarios, our approach leads to a segmentation that is more accurate than the approaches discussed in the comparative review of Elhaik et al. (2010). In our real data examples, we find segments that often correspond well to the available genome annotation.
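To illustrate what checking a candidate segment "on all scales" can mean, the following minimal Python sketch scans every sub-interval of a sequence and reports the largest standardized deviation of the GC count from a hypothesized constant GC proportion. It is an illustration only and not the statistic or implementation of the presented method: the function names `gc_indicator` and `multiscale_statistic`, the plain binomial standardization, and the brute-force scan over all intervals are assumptions made for this example. A threshold calibrated to the user-specified error probability would still be needed to turn the statistic into a test, and is not shown.

```python
import math


def gc_indicator(seq):
    """Map a DNA string to 0/1 values: 1 for G or C, 0 otherwise."""
    return [1 if base in "GCgc" else 0 for base in seq]


def multiscale_statistic(bits, p):
    """Largest standardized deviation of the GC count from length * p,
    taken over all contiguous sub-intervals (hypothetical helper).

    Assumes 0 < p < 1 and independent positions; both are simplifying
    assumptions for illustration only.
    """
    n = len(bits)
    # Prefix sums give the GC count of any interval in constant time.
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)

    sd = math.sqrt(p * (1.0 - p))
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            length = j - i
            count = prefix[j] - prefix[i]
            dev = abs(count - length * p) / (sd * math.sqrt(length))
            best = max(best, dev)
    return best


# A compositionally homogeneous toy sequence versus one with two
# clearly different halves, both with overall GC proportion 0.5.
homogeneous = gc_indicator("GA" * 50)
two_blocks = gc_indicator("G" * 50 + "A" * 50)

stat_homogeneous = multiscale_statistic(homogeneous, 0.5)
stat_two_blocks = multiscale_statistic(two_blocks, 0.5)
```

In this toy example the statistic stays small for the homogeneous sequence and becomes large for the two-block sequence, because some interval (here, either half) deviates strongly from the hypothesized proportion. A single constant segment would therefore be rejected for the second sequence under any reasonable threshold.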