Semi-Supervised Contextual Word Sense Disambiguation for Data Augmentation

Authors

Abstract

Deep learning architectures that have recently come to prominence in artificial intelligence have led to notable progress in Word Sense Disambiguation (WSD), one of the important problems in natural language processing. Supervised methods outperform their competitors, mainly because of the size of the training data they use. For English, a large amount of hand-labeled WSD data is available online. Low-resource languages (LRLs), however, lack data suited to the problem, and collecting and labeling enough of it is time-consuming and costly. To address and overcome this problem, this study aims to show that a semi-supervised contextual word sense disambiguation approach can be used for data augmentation, with the augmented data later serving as training data for supervised learning. Because test data is especially hard to find for LRLs, a dataset available online was used to demonstrate the accuracy of the approach and its usability in future work. The method relies on a seed set and context embeddings. It was evaluated with 9 different language models (ELMo, BERT, RoBERTa, etc.), and the effect of each model on the results is reported. Compared with the initial baseline approach, accuracy improved by about 28 percentage points (initial approach with ELMo: 50.39%; ELMo Seed-Based Average Similarity Model: 78.06%). These results show that the proposed method is a promising way to create data for LRLs. This article is an extended version of our work in [18].
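The abstract names a seed set, context embeddings, and an average-similarity model but does not give the details. The sketch below shows one plausible reading and is not the authors' implementation. Each sense has a few labeled seed embeddings, and an unlabeled occurrence is assigned the sense whose seeds are most similar to it on average. Several things here are assumptions for illustration: the use of cosine similarity, the function names, the sense labels, and the toy 3-dimensional vectors, which stand in for real contextual embeddings from a model such as ELMo or BERT.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(context_vec, seed_vecs):
    """Mean cosine similarity between one context vector and a sense's seed vectors."""
    return sum(cosine(context_vec, s) for s in seed_vecs) / len(seed_vecs)


def assign_sense(context_vec, seed_sets):
    """Return (best_sense, scores): the sense with the highest average seed similarity."""
    scores = {
        sense: average_similarity(context_vec, vecs)
        for sense, vecs in seed_sets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical seed set: two labeled embeddings per sense of the word "bank".
# Real embeddings would have hundreds of dimensions; these are toy values.
seed_sets = {
    "bank/finance": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "bank/river": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

# Hypothetical embedding of an unlabeled occurrence of "bank".
unlabeled = [0.7, 0.3, 0.1]

sense, scores = assign_sense(unlabeled, seed_sets)
print(sense, scores)
```

In a data augmentation setting, occurrences labeled this way could be added to the training set for a supervised model. A confidence threshold on the score gap between the top two senses would be a natural safeguard, though the abstract does not say whether one is used.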


Similar resources

Word Sense Disambiguation with Semi-Supervised Learning

Current word sense disambiguation (WSD) systems based on supervised learning are still limited in that they do not work well for all words in a language. One of the main reasons is the lack of sufficient training data. In this paper, we investigate the use of unlabeled training data for WSD, in the framework of semi-supervised learning. Four semi-supervised learning algorithms are evaluated on 2...


Word Sense Disambiguation by Semi-supervised Learning

In this paper we propose to use a semi-supervised learning algorithm to deal with the word sense disambiguation problem. We evaluated a semi-supervised learning algorithm, the local and global consistency algorithm, on a widely used benchmark corpus for word sense disambiguation. This algorithm yields encouraging experimental results. It achieves better performance than orthodox supervised learning algor...


Semi-Supervised Learning for Word Sense Disambiguation: Quality vs. Quantity

In this paper, we discuss the importance of the quality against the quantity of automatically extracted examples for word sense disambiguation (WSD). We first show that we can build a competitive WSD system with a memory-based classifier and a feature set reduced to easily and efficiently computable features. We then show that adding automatically annotated examples improves the performance of ...


Review: Semi-Supervised Learning Methods for Word Sense Disambiguation

Word sense disambiguation (WSD) is an open problem of natural language processing, which governs the process of identifying the appropriate sense of a word in a sentence, when the word has multiple meanings. Many approaches have been proposed to solve the problem, of which supervised learning approaches are the most successful. However, supervised machine learning approaches are limited by the difficulties...


Investigating Problems of Semi-supervised Learning for Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the problem of determining the right sense of a polysemous word in a given context. In this paper, we will investigate the use of unlabeled data for WSD within the framework of semi-supervised learning, in which the original labeled dataset is iteratively extended by exploiting unlabeled data. This paper addresses two problems occurring in this approach: deter...



Journal

Journal title: TBV Bilgisayar Bilimleri ve Mühendisliği Dergisi

Year: 2021

ISSN: 1305-8991

DOI: https://doi.org/10.54525/tbbmd.835744