Genre annotation for the Web

Abstract

This paper describes a digital curation study aimed at comparing the composition of large Web corpora, such as enTenTen, ukWac or ruWac, by means of automatic text classification. First, it presents a Deep Learning model suitable for classifying texts from these corpora using a small number of communicative functions, such as Argumentation or Reporting. Second, it reports the results of applying this classification to these corpora and compares their composition. Finally, it introduces a framework for interpreting genre classification results using linguistic features. The framework can help in comparing general reference corpora obtained across languages.

Similar resources

Web services for genre vocabularies

This paper presents an approach for providing terminology Web services for controlled vocabulary terms. Services are implemented within a service oriented framework. A set of experimental services for genre vocabularies is provided through the MS Office Research pane, a built-in feature of Internet Explorer (IE) when users have loaded MS Office 2003. Web browsers, such as Mozilla Firefox and Op...

Web-Specific Genre Visualization

User interfaces to WWW search engines typically present results as ranked lists of documents. Such lists give users little help in understanding document variation: we propose a richer representation of retrieval results in the search interface. Fundamental to us is the notion of document grouping. We use both stylistic genre-based document categorization and statistical content-based clusterin...

Annotation for the Deep Web

most of these approaches build on the assumption that the information sources are static, such as static HTML pages or books in a library (scenarios A and B in Table 1). Today, however, a large percentage of Web pages are dynamic. Estimates about the ratio of static to dynamic pages based on Web pages actually crawled by search engines typically conclude that dynamic Web pages outnumber static ...

Shallow Discourse Genre Annotation in CallHome Spanish

The classification of speech genre is not yet an established task in language technologies. However, we believe that it is a task that will become fairly important as large amounts of audio (and video) data become widely available. The technological capability to easily transmit and store all human interactions in audio and video could have a radical impact on our social structure. The major open ...

Emotional Sentence Annotation Helps Predict Fiction Genre

Fiction, a prime form of entertainment, has evolved into multiple genres which one can broadly attribute to different forms of stories. In this paper, we examine the hypothesis that works of fiction can be characterised by the emotions they portray. To investigate this hypothesis, we use the works of fiction in Project Gutenberg and we attribute basic emotional content to each individual se...

Journal

Journal title: Register Studies

Year: 2021

ISSN: 2542-9485, 2542-9477

DOI: https://doi.org/10.1075/rs.19015.sha