Insights from Network Structure for Text Mining
نویسندگان
چکیده
Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.
منابع مشابه
Situation and Text: Representation of Migrants Whilst the Escalation of Refugee Crisis in Great Britain as Compared to Russia
Increasing migration is a vital concern for a globalizing sociocultural environment in today’s world. The UK and developed European countries have become an attractive destination for asylum seekers (labelled as “migrants”) in the past decade. The rapid rise in the number of asylum seekers, which was labelled “migration crisis” (Ruz, 2015), made this topic an integral part of scientific discuss...
متن کاملMine Your Own Business: Market-Structure Surveillance Through Text Mining
W 2.0 provides gathering places for Internet users in blogs, forums, and chat rooms. These gathering places leave footprints in the form of colossal amounts of data regarding consumers’ thoughts, beliefs, experiences, and even interactions. In this paper, we propose an approach for firms to explore online user-generated content and “listen” to what customers write about their and their competit...
متن کاملUsing Complex Argumentative Interactions to Reconstruct the Argumentative Structure of Large-Scale Debates
In this paper we consider the insights that can be gained by considering large scale argument networks and the complex interactions between their constituent propositions. We investigate metrics for analysing properties of these networks, illustrating these using a corpus of arguments taken from the 2016 US Presidential Debates. We present techniques for determining these features directly from...
متن کاملTopic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms to can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and many more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
متن کاملارائه مدلی برای استخراج اطلاعات از مستندات متنی، مبتنی بر متنکاوی در حوزه یادگیری الکترونیکی
As computer networks become the backbones of science and economy, enormous quantities documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text Mining has become an important research area that discoveries unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011