Classifying Arab Names Geographically

Authors

  • Hamdy Mubarak
  • Kareem Darwish
Abstract

Different names may be popular in different countries. Hence, a person's name may give a clue to that person's country of origin. Along with other features, mapping names to countries can be helpful in a variety of applications, such as tagging Twitter users with their countries. This paper describes the collection of Arabic Twitter user names, written either in Arabic or transliterated into Latin characters, along with their stated geographical locations. To classify previously unseen names, we trained naive Bayes and Support Vector Machine (SVM) multi-class classifiers using primarily bag-of-words features. We are able to map Arabic user names to specific Arab countries with 79% accuracy and to specific regions (Gulf, Egypt, Levant, Maghreb, and others) with 94% accuracy. For transliterated Arabic names, the accuracy per country and per region was 67% and 83%, respectively. The approach is generic and language independent, and can be used to collect and classify names for other countries or regions. Considering language-dependent name features (such as compound names and person titles) yields better results.
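The abstract says the classifiers use primarily bag-of-words features but gives no further implementation detail. The following is a minimal sketch of only the naive Bayes half of such a setup, under stated assumptions:

- each name is lower-cased and split on whitespace into tokens;
- a multinomial naive Bayes model with add-one smoothing is trained on (name, country) pairs;
- a predicted country is then collapsed into one of the regions named in the abstract.

Collapsing predicted countries into regions is just one simple option; the paper may instead train region-level classifiers directly. Several pieces are hypothetical and invented for illustration: the toy training pairs, the country codes, the `REGION` mapping, and the helpers `tokenize`, `NameNaiveBayes` and `to_region`. They do not reflect the paper's data, features, or real name frequencies. The SVM classifier is not shown.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL country-to-region map: the region names follow the abstract,
# but this membership list is illustrative only. Unlisted codes map to "others".
REGION = {
    "SA": "Gulf", "KW": "Gulf", "AE": "Gulf",
    "EG": "Egypt",
    "JO": "Levant", "LB": "Levant",
    "MA": "Maghreb", "TN": "Maghreb",
}


def to_region(country):
    """Collapse a predicted country code into a coarse region label."""
    return REGION.get(country, "others")


def tokenize(name):
    """Bag-of-words features: lower-cased, whitespace-separated name tokens."""
    return name.lower().split()


class NameNaiveBayes:
    """Multinomial naive Bayes over name tokens with add-one (Laplace) smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)        # training names per country
        self.token_counts = defaultdict(Counter)   # token frequencies per country
        self.vocab = set()
        for name, label in zip(names, labels):
            tokens = tokenize(name)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(self.token_counts[c].values())
                             for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, name, label):
        """Unnormalized log P(label | name) = log P(label) + sum of log P(token | label)."""
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_tokens[label] + len(self.vocab)
        for tok in tokenize(name):
            if tok not in self.vocab:
                continue  # tokens never seen in training carry no class evidence
            score += math.log((self.token_counts[label][tok] + 1) / denom)
        return score

    def predict(self, name):
        """Return the country label with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_posterior(name, c))


# HYPOTHETICAL toy training pairs (name, country code), invented for illustration.
TRAIN = [
    ("Ahmed Al-Otaibi", "SA"), ("Fahad Al-Qahtani", "SA"),
    ("Mohamed Abdelfattah", "EG"), ("Mahmoud Abdelaziz", "EG"),
    ("Youssef Benali", "MA"), ("Hamza El Idrissi", "MA"),
]

model = NameNaiveBayes().fit([n for n, _ in TRAIN], [c for _, c in TRAIN])
for query in ["Saad Al-Qahtani", "Mohamed El Idrissi"]:
    country = model.predict(query)
    print(query, "->", country, to_region(country))
# prints:
# Saad Al-Qahtani -> SA Gulf
# Mohamed El Idrissi -> MA Maghreb
```

The second query shows the bag-of-words evidence adding up. Two of its tokens appear only in the toy Maghreb names, and one appears only in the toy Egyptian names, so the Maghreb label wins.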


Similar resources

A Literary Anthroponomastics of Three Selected African Novels: A Cross Cultural Perspective

Names as markers of identity are a source of a wide variety of information. This paper explores the names of characters to show the sociocultural factors which influence the choice of names and the effects that the names of these characters have on the roles they play. Using a variety of personal names from Ayi Kwei Armah’s Fragments, Buchi Emecheta’s The Joys of Motherhood, a...


A System for Identifying and Classifying Names in Persian Texts

Named entity recognition (NER) is a system that can identify one or more kinds of names in a text and classify them into specified categories. These categories can be name of people, organizations, companies, places (country, city, street, etc.), time related to names (date and time), financial values, percentages, etc. Although during the past decade a lot of research has been done on NER in ...


Hepatitis E Virus Infection in Dromedaries, North and East Africa, United Arab Emirates, and Pakistan, 1983–2015

A new hepatitis E virus (HEV-7) was recently found in dromedaries and 1 human from the United Arab Emirates. We screened 2,438 dromedary samples from Pakistan, the United Arab Emirates, and 4 African countries. HEV-7 is long established, diversified and geographically widespread. Dromedaries may constitute a neglected source of zoonotic HEV infections.


Implicit Discrimination in Hiring: Real World Evidence

This is the first study providing evidence of a new form of discrimination, implicit discrimination, acting in real economic life. In a two-stage field experiment we first measure the difference in callbacks for interview for applicants with Arab/Muslim sounding names compared to applicants with Swedish sounding names using the corresponden...


Postmodern Orientalized Terrorism: Don DeLillo’s The Names

The terrorism of obscurantism is one of the hallmarks of Don DeLillo’s The Names (1982), distinguishing it as one of the "difficult writings" in his canon. Terrorism, however, is not confined to the novel’s poetics of writing, it constitutes, as the arch-motif of the novel, its politics as well. Relying on the Orientalist bulk of knowledge about the Orient, DeLillo, in this novel, inaugurates a...



Publication date: 2015