Pretrained Transformers for Text Ranking: BERT and Beyond
Authors
Abstract
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems, and for researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures, and dense retrieval techniques that perform ranking directly. Two themes pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, many open research questions remain, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
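As a rough illustration of the two high-level categories mentioned above, the sketch below contrasts dense retrieval (scoring every document in a corpus against a query representation) with reranking (re-scoring a small candidate list in a second stage). This is a minimal illustrative sketch: the toy bag-of-words `embed` function is a hypothetical stand-in for a transformer encoder such as BERT, and none of the names or scoring choices come from the survey itself.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "encoder" standing in for a transformer
    # encoder such as BERT; real systems produce dense vectors.
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def dense_retrieve(query, corpus, k=2):
    # "Dense retrieval": score the whole corpus against the query
    # representation directly and keep the top-k candidates.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def rerank(query, candidates):
    # Second stage of a multi-stage pipeline: a (stand-in for a)
    # stronger, slower model re-scores only the candidate list.
    q = embed(query)
    return sorted(candidates, key=lambda d: cosine(q, embed(d)), reverse=True)

corpus = [
    "neural text ranking with transformers",
    "cooking recipes for pasta",
    "dense retrieval of passages",
]
candidates = dense_retrieve("text ranking", corpus, k=2)
results = rerank("text ranking", candidates)
```

The two-stage structure is the point of the sketch: retrieval trades some quality for speed over the full corpus, while the reranker spends more compute per document on a short list, which is exactly the effectiveness/efficiency tradeoff the survey discusses.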
Similar Resources
Symbol Ranking Text Compression
In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon’s method adds the concept of “symbol ranking”, as in ‘the next symbol is the one 3rd most likely in the present contex...
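The symbol-ranking idea described above can be sketched in a few lines: for each position, rank the candidate symbols by how often they have followed the current context so far, and emit the rank of the symbol that actually occurs (0 = most likely); predictable text then yields runs of small numbers that compress well. This is a hypothetical first-order illustration, not Shannon's procedure or any particular compressor.

```python
from collections import defaultdict, Counter

def symbol_rank_encode(text, order=1):
    # Adaptive symbol ranking: counts[ctx] tracks how often each
    # symbol has followed context ctx so far. At each position we
    # emit the rank of the actual symbol in that frequency ordering
    # (unseen symbols get rank == number of symbols seen in ctx),
    # then update the counts.
    counts = defaultdict(Counter)
    ranks = []
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        ranking = [s for s, _ in counts[ctx].most_common()]
        ranks.append(ranking.index(ch) if ch in ranking else len(ranking))
        counts[ctx][ch] += 1
    return ranks
```

Highly repetitive input produces mostly zeros (e.g., `symbol_rank_encode("aaaa")` gives `[0, 0, 0, 0]`), which a back-end entropy coder could then encode cheaply.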
A Glance into the Future of Transformers ... and Beyond
An overview of the research and developments into new transformer design, undertaken in the Department of Electrical and Computer Engineering, University of Canterbury, is presented. Initially, single phase, 50 Hz, l1/0.23kV pole mount distribution transformers were fitted with either silicon or amorphous steel cores. The transformer tanks were filled with either standard transformer oil or liq...
Enabling Data Retrieval: by Ranking and Beyond
The ubiquitous usage of databases for managing structured data, compounded with the expanded reach of the Internet to end users, has brought forward new scenarios of data retrieval. Users often want to express non-traditional fuzzy queries with soft criteria, in contrast to Boolean queries, and to explore what choices are available in databases and how they match the query criteria. Conventiona...
Text Summarization: News and Beyond
Redundancy in large text collections, such as the web, creates both problems and opportunities for natural language systems. On the one hand, the presence of numerous sources conveying the same information causes difficulties for end users of search engines and news providers; they must read the same information over and over again. On the other hand, redundancy can be exploited to identify imp...
Beyond Forks: Finding and Ranking Star Factorings for Decoupled Search
Star-topology decoupling is a recent search reduction method for forward state space search. The idea basically is to automatically identify a star factoring, then search only over the center component in the star, avoiding interleavings across leaf components. The framework can handle complex star topologies, yet prior work on decoupled search considered only factoring strategies identifying f...
Journal
Journal title: Synthesis Lectures on Human Language Technologies
Year: 2021
ISSN: 1947-4040, 1947-4059
DOI: https://doi.org/10.2200/s01123ed1v01y202108hlt053