Extracting Information-Rich Part of Texts Using Text Denoising
Abstract
The aim of this paper is to report on a novel text reduction technique, called Text Denoising, that highlights the information-rich content in large volumes of text, especially from the biomedical domain. At the core of the technique is a text readability index, which embodies the hypothesis that complex text is more information-rich than the rest. The technique was applied to three tasks: extracting text that bears biomedical relations, keyphrase indexing, and extracting sentences that describe protein interactions. In each case, the reduced text produced by Text Denoising is more information-rich than the text it leaves out.
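The abstract does not spell out the pipeline, so the following is only an illustrative sketch under stated assumptions. It scores each sentence with the Flesch Reading Ease formula, which stands in for the paper's readability index (the exact index is not given here). It treats a lower score as more complex text and keeps the least readable fraction of sentences. The function names `count_syllables`, `flesch_reading_ease` and `denoise`, the `keep_ratio` parameter, and the vowel-group syllable heuristic are all hypothetical choices made for illustration.

```python
import math
import re

# Assumption: Flesch Reading Ease stands in for the paper's readability index.
# Lower scores mean harder text, which the hypothesis treats as more information-rich.
_VOWEL_GROUPS = re.compile(r"[aeiouy]+")
_WORDS = re.compile(r"[A-Za-z]+")


def count_syllables(word: str) -> int:
    """Rough syllable estimate: one per vowel group, at least one per word."""
    word = word.lower()
    count = len(_VOWEL_GROUPS.findall(word))
    # Treat a final 'e' as silent ("rate"), except in "-le" endings ("table").
    if count > 1 and word.endswith("e") and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def flesch_reading_ease(sentence: str) -> float:
    """Flesch Reading Ease for one sentence (words per sentence = word count)."""
    words = _WORDS.findall(sentence)
    if not words:
        return math.inf  # no words: rank as the easiest possible text
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) - 84.6 * syllables / len(words)


def denoise(sentences: list[str], keep_ratio: float = 0.5) -> list[str]:
    """Hypothetical Text Denoising step: keep the least readable sentences.

    Keeps ceil(len(sentences) * keep_ratio) sentences (at least one) with the
    lowest readability scores, preserving their original order.
    """
    if not sentences:
        return []
    n_keep = max(1, math.ceil(len(sentences) * keep_ratio))
    by_complexity = sorted(range(len(sentences)),
                           key=lambda i: flesch_reading_ease(sentences[i]))
    kept = sorted(by_complexity[:n_keep])
    return [sentences[i] for i in kept]


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat.",
        "BRCA1 phosphorylation modulates transcriptional activation of p53-dependent genes.",
        "We did it again.",
        "Interleukin-6 signaling induces STAT3 dimerization and nuclear translocation.",
    ]
    for s in denoise(docs, keep_ratio=0.5):
        print(s)
```

On the four sample sentences, the sketch keeps the two biomedical sentences and drops the two plain ones. The tokenizer and syllable counter are deliberately crude (for example, "p53-dependent" is split into "p" and "dependent"). A fixed `keep_ratio` is one simple cutoff. A fixed score threshold would be an equally simple alternative.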
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Ontologies and Information Extraction
An ontology is a description of conceptual knowledge organized in a computer-based representation, while information extraction (IE) is a method for analyzing texts expressing facts in natural language and extracting relevant pieces of information from these texts. IE and ontologies are involved in two main and related tasks: • Ontology is used for Information Extraction: IE needs ontologies as p...
Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm
Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...
Teaching Information Literacy to Iranian Children Aged 7 to 11
Purpose: To develop instructional objectives for implementing an information literacy instruction program for Iranian children (7-11 years old) based on the information literacy standards of the American Association of School Librarians (AASL). Methodology: In this research, the following methods were used: a literature review in order to extract the instructional objectives of information literacy b...
The Impact of Input Enrichment in Long Text vs. Short Texts on Grammatical Accuracy in Writing Among Elementary Language Learners
This study was conducted to investigate the influence of teaching accurate grammar in writing via enriched long text and short text for elementary students at Shokouhe_Farhang institute. The homogenized subjects were divided into two groups of 18 and 17 participants. A writing exam was used as a pretest to check the students' knowledge of the English past tense. The control group received the...
Publication date: 2013