Better PostScript than PostScript: portable self-extracting PostScript representation of scanned document images

Authors

  • Qin Zhang
  • John M. Danskin
Abstract

We present a Pattern Matching Based Compression (PMBC) system which compresses scanned documents into PostScript format. The output of a PMBC system is a pattern library, or font, and a series of pattern indices and positions. PMBC represents scanned documents in the same way that word processing programs represent their output pages. We explore various PostScript representations of this output file, choosing the one resulting in the smallest output after compression with gzip. The resulting PostScript file doesn't require a separate decompression program to view and print, and is at least 50% smaller than the PostScript files generated by other conventional programs, such as tiitops.
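The "pattern library plus indices and positions" representation described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the names `encode`, `decode`, `Bitmap`, and `Placement` are hypothetical, the marks are assumed to be already segmented from the page, and exact bitmap equality stands in for a real pattern matcher, which would also accept near matches.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]          # rows of '0'/'1' characters (hypothetical format)
Placement = Tuple[int, int, int]  # (pattern index, x, y) of the top-left corner


def encode(marks: List[Tuple[Bitmap, int, int]]) -> Tuple[List[Bitmap], List[Placement]]:
    """Build a pattern library ("font") and a list of placements.

    Assumption: exact equality replaces the pattern-matching step.
    """
    library: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in marks:
        if bitmap not in index_of:
            # First time this shape is seen: add it to the library.
            index_of[bitmap] = len(library)
            library.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return library, placements


def decode(library: List[Bitmap], placements: List[Placement],
           width: int, height: int) -> List[str]:
    """Redraw the page by stamping each library pattern at its position."""
    page = [["0"] * width for _ in range(height)]
    for idx, x, y in placements:
        for dy, row in enumerate(library[idx]):
            for dx, bit in enumerate(row):
                if bit == "1":
                    page[y + dy][x + dx] = "1"
    return ["".join(row) for row in page]


# Toy page: the same narrow mark appears twice, so it is stored once.
GLYPH_I: Bitmap = ("1", "1", "1")
GLYPH_L: Bitmap = ("10", "10", "11")
marks = [(GLYPH_I, 0, 0), (GLYPH_L, 2, 0), (GLYPH_I, 5, 0)]

library, placements = encode(marks)
page = decode(library, placements, width=6, height=3)
```

In this toy case three marks collapse into a two-entry library. The saving grows with the number of repeated shapes on a page, which is the same reason a word processor stores each glyph once in a font and then refers to it by index.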

Similar resources

PostScript and Acrobat / PDF - applications, troubleshooting, and cross-platform publishing

When there are many people who don't need to expect something more than the benefits to take, we will suggest you to have willing to reach all benefits. Be sure and surely do to take this postscript and acrobatpdf applications troubleshooting and cross platform publishing that gives the best reasons to read. When you really need to get the reason why, this postscript and acrobatpdf applications...

Generating Type 1 Fonts from METAFONT Sources

Nowadays most printers demand PostScript files with scalable fonts instead of bitmapped fonts, as the latter are not adequate for most cases. In addition, PDF files generated from PostScript files with embedded bitmapped fonts are poorly rendered on a computer screen. On the other hand, traditionally, PostScript files generated from TeX sources contained bitmapped fonts just because METAFONT gen...

Semi-Structured File Analysis for Information Integration

This paper describes a PostScript file analyzer for extracting information from Web PostScript documents. Our motivation for studying this problem is the building of an information integration system. The information extracted from these semi-structured files can be used to model the contents of Web information sources and to define semantic links between items of information. Extracted informat...

Research and Realization about Conversion Algorithm of PDF Format into PS Format

This paper first introduces the characteristics of PostScript and PDF documents as a basis, and argues for the necessity and feasibility of converting the PDF document format into a PostScript language program. Second, it studies the main algorithms and techniques of the conversion process and implements information extraction for PDF documents, achieving th...

Bilingual PRESRI - Integration of Multiple Research Paper Databases

Collecting all the papers in a research field is a first step towards an exhaustive survey. A number of research paper databases are available for searching papers. However, searchers are compelled to repeat the same search operation for each database if there are multiple databases for a research field. To improve such inefficient searching, we have developed PRESRI, which can construct an exh...

Publication date: 1997