urdu

Urdu in a parallel grammar development environment

Journal: Language Resources and Evaluation 2007

Miriam Butt, Tracy Holloway King

Abstract. In this paper, we report on the role of the Urdu grammar in the Parallel Grammar (ParGram) project (Butt et al., 1999; Butt et al., 2002). The Urdu grammar was able to take advantage of standards in analyses set by the original grammars in order to speed development. However, novel constructions, such as correlatives and extensive complex predicates, resulted in expansions of the anal...


Towards AI-Enabled Approach for Urdu Text Recognition: A Legacy for Urdu Image Apprehension

Journal: IEEE Access 2022

Recognizing Urdu text in natural images is more challenging as compared to other languages, such as English, due to the cursive nature of the Urdu script. However, Urdu scene text has not received enough attention from both industry and academia due to the lack of a dataset of Urdu scene text. We propose a large-scale Urdu Scene Text Dataset (USTD) to address this problem, which is designed for Urdu text detection and recognition. The proposed dataset contains 29674 annotations (1...


The Parallel Grammar Project

2002

Miriam Butt, Helge Dyvik, Tracy Holloway King, Hiroshi Masuichi, Christian Rohrer

We report on the Parallel Grammar (ParGram) project, which uses the XLE parser and grammar development platform for six languages: English, French, German, Japanese, Norwegian, and Urdu.


Hindi Urdu Machine Transliteration using Finite-State Transducers

2008

Muhammad Ghulam Abbas Malik, Christian Boitet, Pushpak Bhattacharyya

Finite-state Transducers (FST) can be very efficient to implement inter-dialectal transliteration. We illustrate this on the Hindi and Urdu language pair. FSTs can also be used for translation between surface-close languages. We introduce UIT (universal intermediate transcription) for the same pair on the basis of their common phonetic repository in such a way that it can be extended to other l...


A Tagged Corpus and a Tagger for Urdu

2014

Bushra Jawaid, Amir Kamran, Ondrej Bojar

In this paper, we describe a release of a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012) who use three different taggers and then apply a voting scheme to disambiguate among the different choices suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release the tagged corpus. Additi...


Analysis and Development of Urdu POS Tagged Corpus

2009

Ahmed Muaz, Aasim Ali, Sarmad Hussain

In this paper, two corpora of Urdu (with 110K and 120K words) tagged with different POS tagsets are used to train TnT and Tree taggers. Error analysis of both taggers is done to identify frequent confusions in tagging. Based on the analysis of tagging, and syntactic structure of Urdu, a more refined tagset is derived. The existing tagged corpora are tagged with the new tagset to develop a singl...


Recognition of Printed Urdu Script

2003

Umapada Pal, Anirban Sarkar

This paper deals with an Optical Character Recognition system for printed Urdu, a popular Indian script. The development of OCR for this script is difficult because (i) a large number of characters have to be recognized (ii) there are many similar shaped characters. In the proposed system individual characters are recognized using a combination of topological, contour and water reservoir concep...


Extracting and Classifying Urdu Multiword Expressions

2011

Annette Hautli, Sebastian Sulger

This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurren...


SCTUR: A Sentiment Classification Technique for URDU Text

2014

Nasir Gul

Sentiment analysis is an important current research area. The demand for sentiment analysis and classification is growing day by day; this paper presents a novel method to classify Urdu documents, as no previous work on sentiment classification for Urdu text has been recorded. We approach the problem by determining whether a review or sentence is positive, negative or neutral. For the purpose we use ...


Named Entity Recognition System for Urdu

2012

UmrinderPal Singh, Vishal Goyal, Gurpreet Singh Lehal

Named Entity Recognition (NER) is a task that finds person names, location names, brand names, abbreviations, dates, times, etc. and classifies them into predefined categories. NER plays a major role in various Natural Language Processing (NLP) fields such as Information Extraction, Machine Translation and Question Answering. This paper describes the problems of NER in the c...
