Sparse Bayesian Unbounded Linear Models with Unknown Design over Finite Alphabets
Language of the talk title:
English
Original conference title:
Probabilistic Modelling in Genomics 2022
Language of the conference title:
English
Original abstract:
In population genetics, haplotype structure can provide crucial information. However, many sequencing methods, such as pool sequencing, yield only allele frequency data rather than haplotype information. A method to reconstruct the unknown haplotype structure $S$ and the haplotype frequencies $\omega$ from the observed allele frequency data matrix $Y = S\omega+\varepsilon$ was proposed in \cite{pelizzola2021multiple}. There, $Y\in [0,1]^{N\times T}$ contains the relative allele frequencies for $N$ SNPs from $T$ samples. Since that approach yields only point estimates, we provide a Bayesian approach to the problem. More specifically, we propose a hierarchical Bayesian model with carefully calibrated hyperparameters and hyper-priors that also provides credible intervals. The joint estimation of $S$ and $\omega$ is not unique unless the reconstruction is constrained. To ensure identifiability in Bayesian inference, we introduce a shrinkage prior. When the number of haplotypes is unknown, we perform model selection within our Bayesian framework to choose the number of haplotypes adaptively.
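
As a rough illustration of the observation model $Y = S\omega+\varepsilon$, the following sketch simulates allele frequency data and shows one simple source of non-uniqueness: relabelling the haplotypes leaves the fit unchanged. It is not the method of the talk. The binary alphabet $\{0,1\}$ for $S$, the Dirichlet draw for the columns of $\omega$, the Gaussian noise, the clipping to $[0,1]$, and the name `simulate_pool_data` with all its parameters are assumptions made for illustration only.

```python
import numpy as np


def simulate_pool_data(n_snps=50, n_haplotypes=4, n_samples=6,
                       noise_sd=0.01, seed=0):
    """Simulate Y = S @ omega + eps under illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # Assumed binary haplotype structure: N SNPs x K haplotypes.
    S = rng.integers(0, 2, size=(n_snps, n_haplotypes))
    # Assumed haplotype frequencies: K x T, each column sums to 1.
    omega = rng.dirichlet(np.ones(n_haplotypes), size=n_samples).T
    # Assumed Gaussian noise; clip so Y stays a relative frequency.
    eps = rng.normal(0.0, noise_sd, size=(n_snps, n_samples))
    Y = np.clip(S @ omega + eps, 0.0, 1.0)
    return Y, S, omega


Y, S, omega = simulate_pool_data()

# Permuting the columns of S together with the rows of omega gives
# the same product, so (S, omega) cannot be recovered uniquely from
# Y without further constraints.
perm = np.random.default_rng(1).permutation(S.shape[1])
same_fit = np.allclose(S @ omega, S[:, perm] @ omega[perm, :])
print(Y.shape, same_fit)
```

Label permutation is only the most elementary ambiguity; the abstract does not state which ambiguities the shrinkage prior is meant to resolve.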