A Modern Perspective on Query Likelihood with Deep Generative Retrieval Models
Proceedings of the 7th ACM SIGIR International Conference on the Theory of Information Retrieval
Existing neural ranking models follow the text matching paradigm, in which document-to-query relevance is estimated by predicting a matching score. Drawing on the rich literature of classical generative retrieval models, we introduce and formalize the paradigm of deep generative retrieval models, in which relevance is defined via the cumulative probability of generating the query terms. This paradigm offers a principled probabilistic view of relevance estimation while still enabling the use of modern BERT and Transformer architectures. In contrast to the matching paradigm, the probabilistic nature of these generative rankers readily provides a fine-grained measure of uncertainty, without imposing additional computational overhead or requiring any model modification. We adapt several current neural generative models to our framework and also introduce a novel generative ranker (T-PGN), which combines the encoding capacity of Transformers with the Pointer Generator Network model. We conduct an extensive set of evaluation experiments on passage retrieval, using the MS MARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking collections. Lastly, to demonstrate the potential benefits of neural generative retrieval models for downstream tasks, we leverage the uncertainty information they provide to significantly improve performance on the cut-off prediction task.
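As a minimal sketch of the query-likelihood idea the abstract describes, the snippet below scores a document by the cumulative log-probability of generating the query terms, and computes a per-step entropy as one simple uncertainty signal. The `token_log_probs` callback and the toy uniform vocabulary are illustrative stand-ins, not the paper's actual trained decoders (which are BERT/Transformer-based).

```python
import math

def query_likelihood_score(query_tokens, token_log_probs):
    """Relevance score = cumulative log-probability of generating the query.

    token_log_probs: callable (prefix, token) -> log P(token | document, prefix).
    This stands in for a trained seq2seq decoder conditioned on the document;
    the interface is an assumption for illustration.
    """
    score = 0.0
    prefix = []
    for tok in query_tokens:
        score += token_log_probs(tuple(prefix), tok)
        prefix.append(tok)
    return score

def step_entropy(dist):
    """Shannon entropy of one decoding step's token distribution — a
    fine-grained uncertainty signal of the kind a generative ranker exposes
    for free during scoring."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Toy stand-in: a uniform distribution over a tiny vocabulary.
VOCAB = ["deep", "generative", "retrieval", "models"]

def toy_log_probs(prefix, token):
    # Uniform probability for in-vocabulary tokens, near-zero otherwise.
    return math.log(1.0 / len(VOCAB)) if token in VOCAB else math.log(1e-9)

score = query_likelihood_score(["deep", "retrieval"], toy_log_probs)
uncertainty = step_entropy({t: 1.0 / len(VOCAB) for t in VOCAB})
```

In a real ranker the per-query score would come from a trained generator, and the sequence of per-step entropies could feed a downstream component such as the cut-off predictor mentioned above.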