Grand Challenge: Producing Meaningful Texts

Author

  • Sandra Kuebler
Abstract

We will develop an automatic system capable of understanding scientific and humanistic texts, identifying novel ideas and key concepts, and then producing specialized summaries understandable to different audiences. The system will collect information from research publications, identify major topics, and produce readable summaries that can be targeted towards, for instance, other researchers, the general public, school children, or people with reading impairments. This task requires contributions from diverse fields, e.g., Computational Linguistics, Artificial Intelligence, Cognitive Science, and Science of Science. Subtasks include the automatic linguistic analysis of texts and recordings, production of text from abstract representations of meaning, identification of key topics in documents and of texts with a high potential for being cited, etc. While major progress has been made in some subtasks, major limitations remain: automatic generation systems often produce incoherent or unreadable text, while readable text can be reliably produced only in very limited areas such as fantasy football reporting. As such, the approaches we develop will catapult computational language understanding into a new era. We will produce a system that generates press releases summarizing university research that are judged (1) understandable and (2) indistinguishable from human-authored press releases by at least 50% of adult readers. The system can be adapted to rephrase medical texts, patents, intelligence information from a wide range of sources, etc. IU is ideally suited to address this challenge because we have a critical mass of researchers from different fields working on those subareas, e.g., Computer Science, Computational Linguistics, Cognitive Science, and Information and Library Science.

Grand Challenge: Producing Meaningful Texts

1. The Grand Challenge

We will develop an automatic system capable of understanding scientific and humanistic texts, identifying novel ideas and key concepts, and then producing specialized summaries understandable to different audiences. The system will collect information from research publications, identify major topics, and produce readable summaries that can be targeted towards, for instance, other researchers, the general public, primary and secondary school children, or people with reading impairments. This task requires contributions from diverse fields, e.g., Computational Linguistics, Artificial Intelligence, Cognitive Science, and Science of Science, as well as from domain experts from the sciences and the humanities. Subtasks include the automatic linguistic analysis of texts and recordings, production of text from abstract representations of meaning, identification of key topics in documents and of texts with a high potential for being cited, etc. While major progress has been made in some subtasks, major limitations remain: automatic generation systems often produce incoherent or unreadable text, while readable text can be reliably produced only in very limited areas such as fantasy football reporting. As such, the approaches we develop will catapult computational text understanding into a new era.

Similar resources

Automatic Brain Tissue Segmentation of Multi-sequence MR images using Random Decision Forests MICCAI Grand Challenge: MR Brain Image Segmentation 2013

This work is integrated in the MICCAI Grand Challenge: MR Brain Image Segmentation 2013. It aims for the automatic segmentation of the brain into cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM). The provided dataset contains patients with white matter lesions, which makes the segmentation task more challenging. The proposed algorithm uses multi-sequence MR images to extract meaning...

Introduction to the Shared Task on Comparing Semantic Representations

Seven groups participated in the STEP 2008 shared task on comparing semantic representations as output by practical wide-coverage NLP systems. Each of these groups developed their own system for producing semantic representations for texts, each in their own semantic formalism. Each group was requested to provide a short sample text, producing a shared task set of seven texts, allowing participa...

The Human Language Project: Building a Universal Corpus of the World's Languages

We present a grand challenge to build a corpus that will include all of the world’s languages, in a consistent structure that permits large-scale cross-linguistic processing, enabling the study of universal linguistics. The focal data types, bilingual texts and lexicons, relate each language to one of a set of reference languages. We propose that the ability to train systems to translate into a...

The Grand Challenges and nursing.

What are the Grand Challenges? For nearly 30 years, governments around the world have been allocating their resources to address problems known by several specific names, but generally known as the Grand Challenges. These problems were modeled after the work of David Hilbert, a mathematician, who more than 100 years ago developed a list of the unsolved problems he believed t...

A Grand Convergence in Mortality is Possible: Comment on Global Health 2035

The grand challenge in global health is the inequality in mortality and life expectancy between countries and within countries. According to Global Health 2035, the Lancet Commission celebrating the 20th anniversary of the World Development Report (WDR) of 1993, the world now has the unique opportunity to achieve a grand convergence in global mortality within a generation. This article comments...

Publication date: 2015