Multi-Modal Joint Embedding Space Learning for Cross-Modality Retrieval
Language of the talk title:
English
Original abstract:
Cross-modality retrieval encompasses retrieval tasks where the fetched items are of a different type than the search query, e.g., retrieving pictures relevant to a given text query. The state-of-the-art approach to cross-modality retrieval relies on learning a joint embedding space of the two modalities, in which items from either modality are retrieved using nearest-neighbor search. In my talk I will review two different learning paradigms -- Deep Canonical Correlation Analysis and Pairwise Ranking Losses -- which both yield embedding spaces exhibiting properties beneficial for retrieval. I will present potential application scenarios as well as experimental retrieval results on two different modality pairs, namely text and images as well as audio and sheet music.
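To make the pairwise ranking paradigm concrete, below is a minimal sketch (in PyTorch) of a bidirectional max-margin ranking loss of the kind commonly used to train joint embedding spaces. The function name, margin value, and batch setup are illustrative assumptions, not details taken from the talk itself.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional max-margin ranking loss on a joint embedding space.

    img_emb, txt_emb: (batch, dim) L2-normalized embeddings, where row i
    of each matrix forms a matching cross-modal pair.
    """
    # Cosine similarity matrix: scores[i, j] = sim(image i, text j).
    scores = img_emb @ txt_emb.t()
    diagonal = scores.diag().view(-1, 1)

    # Hinge on mismatched pairs in both retrieval directions:
    # a matching pair should score at least `margin` above any mismatch.
    cost_txt = (margin + scores - diagonal).clamp(min=0)      # image -> text
    cost_img = (margin + scores - diagonal.t()).clamp(min=0)  # text -> image

    # Zero out the matching pairs on the diagonal.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    return cost_txt.sum() + cost_img.sum()

# Usage sketch with random embeddings projected onto the unit sphere.
img = F.normalize(torch.randn(32, 128), dim=1)
txt = F.normalize(torch.randn(32, 128), dim=1)
loss = pairwise_ranking_loss(img, txt)

# At retrieval time, items from the other modality are ranked by cosine
# similarity, i.e., nearest-neighbor search in the joint space.
ranking = (img @ txt.t()).argsort(dim=1, descending=True)
```

Minimizing this loss pulls matching cross-modal pairs together while pushing mismatched pairs at least a margin apart, which is what makes plain nearest-neighbor search effective in the learned space.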