Multi-granularity Prediction for Scene Text Recognition

Abstract

Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character-level representation, while no independent language model (LM) is adopted. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93.35% on standard benchmarks.
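As a rough illustration of what predicting at several granularities means, the sketch below builds character-level and subword-level targets for the same word. It is a toy example and not the authors' implementation: the vocabulary `TOY_VOCAB`, the helper names, and the greedy longest-match segmentation are all assumptions made for illustration, standing in for a trained BPE or WordPiece tokenizer.

```python
# Toy illustration only: the vocabulary below is hypothetical, and greedy
# longest-match segmentation stands in for a trained BPE/WordPiece tokenizer.


def char_targets(word):
    """Character-level target: one label per character."""
    return list(word)


def subword_targets(word, vocab):
    """Greedy longest-match segmentation into subword units.

    Falls back to a single character when no longer vocabulary entry matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first and shrink until one is accepted.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def multi_granularity_targets(word, vocab):
    """Collect the targets a single recognizer could be trained to predict."""
    return {
        "char": char_targets(word),
        "subword": subword_targets(word, vocab),
    }


TOY_VOCAB = {"read", "ing", "text", "s"}  # hypothetical subword vocabulary
targets = multi_granularity_targets("reading", TOY_VOCAB)
print(targets)
```

With this toy vocabulary, "reading" maps to seven character labels and to the two subword units "read" and "ing". The point is only that one word image can be supervised in more than one output space, so subword statistics can reach the model without a separate language model.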

Similar resources

Scene Text Detection via Holistic, Multi-Channel Prediction

Recently, scene text detection has become an active research topic in computer vision and document analysis because of its great importance and significant challenge. However, the vast majority of the existing methods detect text within local regions, typically through extracting character, word or line level candidates followed by candidate aggregation and false positive elimination, which potent...

Text Recognition and Translation of Multi-Oriented, Multi-Language and Curved Text in Natural Scene Images

This study is about text detection and recognition in natural scene images. The main focus is on the detection, recognition and eventually, translation, of multi-oriented, multi-language and curvilinear text in such images. The study attempts to provide a solution that can detect and recognise such text since current leading mobile applications such as Word Lens and Google Goggles do not suppor...

Fused Text Segmentation Networks for Multi-oriented Scene Text Detection

In this paper, we introduce a novel end-to-end framework for multi-oriented scene text detection from an instance-aware segmentation perspective. We present Fused Text Segmentation Networks, which combine multi-level features during feature extraction, as text instances may rely on finer feature expression compared to general objects. It detects and segments the text instance jointly and simultaneous...

Towards Text Recognition in Natural Scene Images

In this paper, we propose a novel methodology for text detection in natural scene images. The proposed methodology is based on an efficient binarization and enhancement technique followed by a suitable connected component analysis procedure. Image binarization successfully processes natural scene images having shadows, non-uniform illumination, low contrast and large signal-dependent noise. Conn...

Scene Text Recognition with Bilateral Regression

This paper focuses on improving the recognition of text in images of natural scenes, such as storefront signs or street signs. This is a difficult problem due to lighting conditions, variation in font shape and color, and complex backgrounds. We present a word recognition system that addresses these difficulties using an innovative technique to extract and recognize foreground text in an image....

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19815-1_20