Distilling Monolingual Models from Large Multilingual Transformers

Authors

Abstract

Although language modeling has been trending steadily upwards, the models available for low-resourced languages are limited to large multilingual ones such as mBERT and XLM-RoBERTa, which come with significant deployment overheads in terms of model size, inference speed, etc. We attempt to tackle this problem by proposing a novel methodology that applies knowledge distillation techniques to filter language-specific information from a large multilingual model into small, fast monolingual models that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks, each in six languages. We further dive into possible modifications of this basic setup by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to better understand the different components and to find out what works best for two under-resourced languages, Swahili and Slovene.
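At its core, the approach described above is soft-label knowledge distillation. The snippet below is a minimal sketch for orientation only, not the authors' code: the loss shape, temperature, and mixing weight alpha are assumptions, and student_logits/teacher_logits stand for per-token vocabulary logits produced by a small monolingual student and a multilingual teacher (e.g. mBERT) on the same batch.

```python
# Minimal knowledge-distillation loss sketch (illustrative, not the
# paper's code). The student is pulled toward the teacher's
# temperature-smoothed token distribution while still fitting the
# hard masked-language-modeling labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: both distributions smoothed by the temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so its gradient magnitude
    # stays comparable to the hard-label term as T changes.
    kd = F.kl_div(log_p_student, log_p_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    # Hard-label MLM cross-entropy; -100 marks positions to ignore.
    vocab = student_logits.size(-1)
    ce = F.cross_entropy(student_logits.reshape(-1, vocab),
                         labels.reshape(-1), ignore_index=-100)
    return alpha * kd + (1.0 - alpha) * ce

# Smoke test with random tensors in place of real model outputs.
B, S, V = 8, 128, 30000
student = torch.randn(B, S, V, requires_grad=True)
teacher = torch.randn(B, S, V)
labels = torch.full((B, S), -100)
labels[:, 5] = 42
distillation_loss(student, teacher, labels).backward()
```

Note that distilling into a monolingual student with its own smaller vocabulary, as the vocabulary-tuning experiments in the abstract suggest, would additionally require mapping or re-tokenizing so that teacher and student distributions are defined over comparable token sets.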

Similar resources

Multilingual vs. Monolingual User Models for Personalized Multilingual Information Retrieval

This paper demonstrates that a user of multilingual search has different interests depending on the language used, and that the user model should reflect this. To demonstrate this phenomenon, the paper proposes and evaluates a set of result re-ranking algorithms based on various user model representations.

Full text

Multilingual Versus Monolingual WSD

Although it is generally agreed that Word Sense Disambiguation (WSD) is an application-dependent task, the great majority of efforts have aimed at developing WSD systems without considering their application. We argue that this strategy is not appropriate, since some aspects, such as the sense repository and the disambiguation process itself, vary according to the application. Taking...

Full text

Multilingual Aspects of Monolingual Corpora

If someone were to collect opinions among computational linguists on what had been the most important trend in linguistics in the last decade, it is highly probable that the majority would answer that it was the massive use of large natural language corpora in many linguistic fields. The concept of collecting large amounts of written or spoken natural language data has become extremely important...

Full text

Multilingual Approach to e-Learning from a Monolingual Perspective

This paper describes the efforts undertaken in an international research project, LT4eL, from the perspective of one of the participating languages, Czech. The project aims at exploiting language technologies to add new functionalities to the open source Learning Management System ILIAS. The new functionalities are based both on existing and on newly developed tools for all languages involved. Th...

Full text

Distilling Intractable Generative Models

A generative model’s partition function is typically expressed as an intractable multi-dimensional integral, whose approximation presents a challenge to numerical and Monte Carlo integration. In this work, we propose a new estimation method for intractable partition functions, based on distilling an intractable generative model into a tractable approximation thereof, and using the latter for pr...

Full text
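As a hedged illustration of the idea in this abstract (a minimal sketch under assumed choices, not the paper's estimator): one way to "distill" an intractable model into a tractable approximation is to moment-match a Gaussian surrogate q to rough MCMC samples from the unnormalized density p~, and then estimate the partition function by importance sampling, Z = E_{x~q}[p~(x)/q(x)].

```python
# Toy partition-function estimation by distillation (illustrative only).
# Target: unnormalized Gaussian p~(x) = exp(-0.5 x'Ax), whose known
# Z = (2*pi)^(d/2) / sqrt(det A) lets us check the estimate.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.6], [0.6, 1.0]])          # target precision matrix
log_p_tilde = lambda x: -0.5 * x @ A @ x        # intractable in general

# "Distill": draw rough samples from p~ with a short Metropolis chain,
# then fit a tractable Gaussian q by moment matching.
x, samples = np.zeros(2), []
for i in range(20000):
    prop = x + 0.8 * rng.standard_normal(2)
    if np.log(rng.random()) < log_p_tilde(prop) - log_p_tilde(x):
        x = prop
    if i >= 2000:                                # discard burn-in
        samples.append(x)
samples = np.asarray(samples)
mu, Sigma = samples.mean(axis=0), np.cov(samples.T)

# Importance sampling with q as proposal: Z = E_{x~q}[p~(x) / q(x)].
d, n = 2, 50000
xs = rng.multivariate_normal(mu, Sigma, size=n)
diff = xs - mu
log_q = (-0.5 * np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Sigma), diff)
         - 0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1]))
log_p = -0.5 * np.einsum("ni,ij,nj->n", xs, A, xs)
Z_hat = np.exp(log_p - log_q).mean()

Z_true = (2 * np.pi) ** (d / 2) / np.sqrt(np.linalg.det(A))
print(f"estimated Z ~ {Z_hat:.3f}, true Z = {Z_true:.3f}")
```

The estimate is unbiased whenever q places mass everywhere p~ does; how closely the distilled q matches p~ governs its variance.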

Journal

Journal title: Electronics

Year: 2023

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics12041022