Human Reading and the Curse of Dimensionality
Abstract
Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities.

An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1). OCR systems classify one character at a time, while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979), and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, & Wolverton, 1984; Reicher, 1969).

[Figure 1: Character classification versus character sequence classification. Panel "OCR (Low Dimensionality)": single characters from "Dorothy lived in the ..." are mapped to "D", "O", "R". Panel "Human Reading (High Dimensionality)": successive fixations on "Dorothy lived in the midst of the ..." are mapped to "DOROTHY LI", "LIVED IN THE", "MIDST OF THE".]

This is an interesting difference because high dimensionality is associated with poor classification learning, the so-called curse of dimensionality (Denker et al., 1987; Geman, Bienenstock, & Doursat, 1992). OCR systems are designed to classify single characters to minimize such problems. The fact that most people learn to read quite well, even with high-dimensional inputs and outputs, implies that variance...
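The scale of the difference in output dimensionality can be illustrated with some simple counting. The sketch below is only an illustration, not the paper's method: the 27-symbol alphabet (26 letters plus space), the helper names `possible_strings` and `observed_windows`, and the use of the Figure 1 sentence as sample text are all assumptions. It compares the number of strings that could in principle appear in a window of 1, 8, or 13 characters with the number that actually occur in the sample, which is one rough way to see how character sequence regularities shrink the set of things to be classified.

```python
# Illustrative only: the alphabet size and the sample text are assumptions,
# not values taken from the paper.
ALPHABET_SIZE = 27  # 26 letters plus space (assumed)


def possible_strings(window: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct character strings of length `window`."""
    return alphabet_size ** window


def observed_windows(text: str, window: int) -> set[str]:
    """Distinct length-`window` substrings that actually occur in `text`."""
    text = text.upper()
    return {text[i:i + window] for i in range(len(text) - window + 1)}


if __name__ == "__main__":
    sample = "Dorothy lived in the midst of the"
    for window in (1, 8, 13):
        total = possible_strings(window)
        seen = len(observed_windows(sample, window))
        print(f"window={window:2d}  possible={total:.3e}  observed={seen}")
```

The sliding window here visits every character position, so it does not model consistent fixation positions, which the abstract names as a second source of reduction.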
Similar resources
Sub-pixel classification of hydrothermal alteration zones using a kernel-based method and hyperspectral data; A case study of Sarcheshmeh Porphyry Copper Mine and surrounding area, Kerman, Iran
Remote sensing image analysis can be carried out at the per-pixel (hard) and sub-pixel (soft) scales. The former refers to the purity of image pixels, while the latter refers to the mixed spectra resulting from all objects composing the image pixels. Spectral unmixing methods have been developed to decompose mixed spectra. Data-driven unmixing algorithms utilize the reference data called...
Income Inequality and the Oil Curse: The Case of Oil-Rich Developing Countries
While most of the literature on the natural resource curse highlights its effect on the growth rate and the level of income, this paper shifts the focus toward the effect of oil dependence on the distribution of income in oil-rich developing countries (including Iran and 18 other countries). Moreover, the paper studies the impact of institutional quality and the interaction effect of different institution...
On Bias, Variance, 0/1 - Loss, and the Curse of Dimensionality
The purpose of this document is to summarize the main points from the paper "On Bias, Variance, 0/1 Loss, and the Curse of Dimensionality", written by Jerome H. Friedman (1997).
Using Randomization to Break the Curse of Dimensionality
This paper introduces random versions of successive approximations and multigrid algorithms for computing approximate solutions to a class of finite and infinite horizon Markovian decision problems (MDPs). We prove that these algorithms succeed in breaking the “curse of dimensionality” for a subclass of MDPs known as discrete decision processes (DDPs).
Publication date: 1995