Ulrich Bodenhofer, Sepp Hochreiter,
"Utilizing Private Variants in Large Genome-Wide Association Studies: Issues, Techniques, Experiences"
ASHG 2014 Proceedings, 2014
High-throughput sequencing technologies have facilitated the identification of large numbers of single-nucleotide variations (SNVs), many of which have already been shown to be associated with diseases or other complex traits. Several large sequencing studies, such as the 1000 Genomes Project, the UK10K project, and the NHLBI Exome Sequencing Project, have consistently reported a large proportion of private SNVs, that is, variants that are unique to a family or even a single individual. The role that private SNVs play in diseases and other traits is currently poorly understood, largely because it is statistically very challenging to consider private SNVs in association testing. Single-marker tests generally cannot be used for private SNVs. Burden tests can potentially deal with private SNVs, but only if the number of private SNVs occurring in a region is correlated with the trait under consideration. Moreover, burden tests are at a disadvantage if deleterious and protective SNVs occur together in the same region. Non-burden tests like the popular SNP-set (Sequence) Kernel Association Test (SKAT) typically exploit correlations between SNVs, a strategy that is not applicable to private SNVs either, since singular events are generally uncorrelated. We propose the Position-Dependent Kernel Association Test (PODKAT), which is designed to detect associations of very rare and private SNVs with the trait under consideration even if the burden scores are not correlated with the trait. PODKAT assumes that the closer two SNVs are on the genome, the more likely they are to have similar effects on the trait under consideration. This assumption is fulfilled as long as deleterious, neutral, and protective variants are grouped sufficiently well along the genome. This contribution focuses on the use of PODKAT for large whole-genome studies.
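
The positional assumption can be illustrated with a minimal sketch. The abstract does not specify the kernel, so the triangular similarity function, the 10,000 bp width, and all function names below are assumptions chosen for illustration, not the authors' implementation. The sketch shows how two samples that carry different private SNVs, and therefore share no variant, can still be treated as similar when their variants lie close together on the genome.

```python
# Illustrative sketch only: the triangular similarity and the default
# width of 10,000 bp are assumptions, not taken from the abstract.

def position_similarity(pos_a, pos_b, max_dist=10_000):
    """Similarity of two genomic positions: 1 at distance 0,
    decreasing linearly to 0 at max_dist and beyond."""
    return max(0.0, 1.0 - abs(pos_a - pos_b) / max_dist)


def sample_kernel(genotypes, positions, max_dist=10_000):
    """Sample-by-sample kernel matrix.

    K[i][j] = sum over variant pairs (a, b) of
              genotypes[i][a] * similarity(a, b) * genotypes[j][b]
    """
    n = len(genotypes)
    m = len(positions)
    sim = [
        [position_similarity(positions[a], positions[b], max_dist)
         for b in range(m)]
        for a in range(m)
    ]
    return [
        [
            sum(genotypes[i][a] * sim[a][b] * genotypes[j][b]
                for a in range(m) for b in range(m))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three samples, each carrying exactly one private SNV (minor allele count 1).
positions = [1_000, 1_500, 900_000]
genotypes = [
    [1, 0, 0],  # sample 0: private SNV at position 1,000
    [0, 1, 0],  # sample 1: private SNV at position 1,500
    [0, 0, 1],  # sample 2: private SNV at position 900,000
]

kernel = sample_kernel(genotypes, positions)
```

In this toy data set, a kernel that only compares genotypes variant by variant would give every pair of distinct samples a similarity of zero, because no variant is shared. With the positional term, samples 0 and 1 receive a similarity close to 1 because their variants are 500 bp apart, while sample 2 remains dissimilar to both.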