MaxTract: Converting PDF to LaTeX, MathML and Text

Authors

  • Josef B. Baker
  • Alan P. Sexton
  • Volker Sorge
Abstract

In this paper we present the first public, online demonstration of MaxTract, a tool that converts PDF files containing mathematics into multiple formats including LaTeX, HTML with embedded MathML, and plain text. Using a bespoke PDF parser and image analyser, we directly extract character and font information to use as input for a linear grammar which, in conjunction with specialised drivers, can accurately recognise and reproduce both the two-dimensional relationships between symbols in mathematical formulae and the one-dimensional relationships present in standard text. The main goals of MaxTract are to provide translation services into standard mathematical markup languages and to add accessibility to mathematical documents on multiple levels. This includes both accessibility in the narrow sense of providing access to content for print-impaired users, such as those with visual impairments, dyslexia or dyspraxia, and accessibility in the more general sense of enabling any user to access the mathematical content at more reusable levels than the merely visual. MaxTract enriches the regular text with mathematical markup, producing output that is compatible with web browsers, screen readers, and tools such as copy and paste. The output can also be used directly, within the limits of the presentation MathML produced, as machine-readable mathematical input to software systems such as Mathematica or Maple.
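To make the idea of recovering two-dimensional relationships from character and font information concrete, here is a minimal sketch in Python. It is not MaxTract's grammar or its drivers: the `Glyph` record, the `relation` and `to_latex` functions, and the 0.2 threshold are all hypothetical names and values chosen for illustration. The sketch assumes each extracted character comes with a left edge, a baseline height (with y growing upward, as in PDF coordinates) and a font size, and it only distinguishes inline, superscript and subscript placement.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one extracted character (not MaxTract's format)."""
    char: str
    x: float         # left edge of the glyph
    baseline: float  # baseline height; y grows upward as in PDF coordinates
    size: float      # font size in points


def relation(base: Glyph, nxt: Glyph, tol: float = 0.2) -> str:
    """Classify nxt relative to base as 'sup', 'sub' or 'inline'.

    The tolerance is an illustrative assumption: a glyph counts as a script
    only if it is set in a smaller font and its baseline is shifted by more
    than tol times the base font size.
    """
    shift = (nxt.baseline - base.baseline) / base.size
    smaller = nxt.size < base.size
    if smaller and shift > tol:
        return "sup"
    if smaller and shift < -tol:
        return "sub"
    return "inline"


def to_latex(glyphs: list[Glyph]) -> str:
    """Emit a LaTeX string, grouping consecutive script glyphs in one brace pair."""
    out = []
    base = None          # most recent glyph sitting on the main baseline
    open_rel = "inline"  # kind of script group currently open, if any
    for g in sorted(glyphs, key=lambda g: g.x):
        rel = "inline" if base is None else relation(base, g)
        if rel != open_rel:
            if open_rel != "inline":
                out.append("}")   # close the previous script group
            if rel == "sup":
                out.append("^{")
            elif rel == "sub":
                out.append("_{")
            open_rel = rel
        out.append(g.char)
        if rel == "inline":
            base = g
    if open_rel != "inline":
        out.append("}")
    return "".join(out)


formula = [
    Glyph("x", 0, 100, 10),
    Glyph("2", 6, 104, 7),   # smaller font, raised baseline
    Glyph("+", 12, 100, 10),
    Glyph("y", 20, 100, 10),
    Glyph("i", 26, 97, 7),   # smaller font, lowered baseline
]
print(to_latex(formula))  # x^{2}+y_{i}
```

A real system has to handle far more than this, for example fractions, radicals, nested scripts and multi-line layouts, which is presumably where a grammar over the extracted symbols becomes necessary.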

Similar resources

Tagged mathematics in PDFs for accessibility and other purposes

PDF has been the preferred format for publishing mathematics for many years now. With changes to methods of delivery (i.e., electronic rather than predominantly paper) there need to be corresponding enhancements in the document format. Not least among these can be implicit legal obligations to satisfy Accessibility criteria. The answer developed for PDF is tagging of document structure and cont...

PDF/A-3u as an Archival Format for Accessible Mathematics

Including LaTeX source of mathematical expressions, within the PDF document of a text-book or research paper, has definite benefits regarding ‘Accessibility’ considerations. Here we describe three ways in which this can be done, fully compatibly with international standards ISO32000, ISO19005-3, and the forthcoming ISO32000-2 (PDF 2.0). Two methods use embedded files, also known as ‘attachments’...

Augmenting Presentation MathML for Search

The ubiquity of text search is both a boon and a bane for the quest for math search. A bane in that users’ expectations are high regarding accuracy, in-context highlighting and similar features. Yet also a boon with the availability of highly evolved search engine libraries; Youssef has previously shown how an appropriate ‘textualization’ of mathematics into an indexable form allows standard text...

CEDRICS: When CEDRAM Meets Tralics

We describe CEDRICS, a general purpose system for automated journal production entirely based on a LaTeX input format. We show how the very basic ideas that initiated the whole effort turned into an efficient system because of the ability of LaTeX markup to parametrise simultaneously and without compromise high typographical quality for the PDF output as well as accurate XML metadata with (presen...

A Reappraisal of Online Mathematics Teaching Using LaTeX

The mathematics language LaTeX is often seen as a legacy technology that is awkward to use. MathML, a verbose language designed for data exchange and to be written and understood by machines, is by contrast seen as something that will aid online mathematics, and the lack of browser support for it is bemoaned. However, LaTeX can already do many of the things that MathML might promise. LaTeX is here proposed ...

Publication date: 2012