Estimating Comma Placement in Natural Language
Abstract
We study the feasibility of identifying comma locations using both n-gram models and stochastic context-free grammars (SCFGs). Specifically, our algorithms take an input sentence without commas and return the positions where commas should be inserted, along with probability or confidence estimates. With minor modifications, this approach can be generalized to correcting comma placement; however, we focus on the simpler comma insertion problem. Two widely used tools for processing natural language are n-gram models and SCFG parsers. N-grams provide a linear Markov model, while SCFG parsers build a rich hierarchical structure. In English, commas are typically used to separate phrases in a sentence. Hence, it would seem logical that SCFGs should perform better than the n-gram model, which examines sentences in chunks of n words. However, n-grams can easily be trained on very large data sets, and thus can provide a rich source of statistical information. Since these language models are very different, this paper evaluates both.
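As a rough illustration of the comma insertion task described in the abstract, the following is a minimal sketch of a bigram-style estimator. It is not the authors' implementation: the function names (`train`, `predict`), the tiny training corpus, and the 0.5 threshold are all assumptions made for this example. It takes a sentence without commas and returns candidate positions with a probability estimate for each.

```python
from collections import Counter


def tokenize(sentence):
    # Detach commas so that "," becomes its own token, then split on whitespace.
    return sentence.replace(",", " , ").split()


def train(sentences):
    """Count, for each adjacent word pair, how often a comma separated it."""
    comma_counts = Counter()  # (left, right) -> times a comma sat between them
    pair_counts = Counter()   # (left, right) -> times the pair occurred at all
    for sentence in sentences:
        words = []            # the sentence with commas removed
        comma_after = set()   # indices into `words` that were followed by a comma
        for tok in tokenize(sentence.lower()):
            if tok == ",":
                if words:
                    comma_after.add(len(words) - 1)
            else:
                words.append(tok)
        for i in range(len(words) - 1):
            pair = (words[i], words[i + 1])
            pair_counts[pair] += 1
            if i in comma_after:
                comma_counts[pair] += 1
    return comma_counts, pair_counts


def predict(sentence, model, threshold=0.5):
    """Return (index, probability) for gaps where a comma is likely.

    `index` i means "insert a comma after word i" (0-based).
    Word pairs never seen in training are skipped (no smoothing here).
    """
    comma_counts, pair_counts = model
    words = sentence.lower().split()
    results = []
    for i in range(len(words) - 1):
        pair = (words[i], words[i + 1])
        seen = pair_counts[pair]
        if seen == 0:
            continue
        p = comma_counts[pair] / seen
        if p >= threshold:
            results.append((i, p))
    return results


# Hypothetical toy corpus, for illustration only.
corpus = [
    "however, we focus on insertion",
    "however, n-grams scale well",
    "we focus on n-grams",
]
model = train(corpus)
print(predict("however we focus on n-grams", model))
```

A model of this kind only sees the two words around each gap, which is the locality limitation the abstract contrasts with the hierarchical structure built by an SCFG parser.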
Similar resources
Automatic Comma Insertion for Japanese Text Generation
This paper proposes a method for automatically inserting commas into Japanese texts. In Japanese sentences, commas play an important role in explicitly separating the constituents, such as words and phrases, of a sentence. The method can be used as an elemental technology for natural language generation such as speech recognition and machine translation, or in writing-support tools for non-nati...
NoDE: A Benchmark of Natural Language Arguments
In recent years, natural models of argumentation and argument mining are becoming more and more important topics in the argumentation community. Given this tendency, there is the need to produce standard datasets on which natural language approaches to argumentation can be evaluated. In this paper, we present NoDE, a benchmark of natural language arguments composed of three datasets, built ...
The white ‘comma’ as a distractive mark on the wings of comma butterflies
http://dx.doi.org/10.1016/j.anbehav.2013.10.003 (2013, The Authors). Distractive marks have been suggested to prevent predator detection or recognition of a prey, by drawing the attention away from recognizable traits of the bearer. The white ‘comma’ on the wings of comma butterflies, Polygonia c-album, has been suggested to represent such a distractive mark. In a lab...
Argument Mining Using Argumentation Scheme Structures
Argumentation schemes are patterns of human reasoning which have been detailed extensively in philosophy and psychology. In this paper we demonstrate that the structure of such schemes can provide rich information to the task of automatically identifying complex argumentative structures in natural language text. By training a range of classifiers to identify the individual proposition types which ...
Correcting Comma Errors in Learner Essays, and Restoring Commas in Newswire Text
While the field of grammatical error detection has progressed over the past few years, one area of particular difficulty for both native and non-native learners of English, comma placement, has been largely ignored. We present a system for comma error correction in English that achieves an average of 89% precision and 25% recall on two corpora of unedited student essays. This system also achiev...
Publication date: 2012