Towards a Job Title Classification System

Authors

  • Faizan Javed
  • Matt McNair
  • Ferosh Jacob
  • Meng Zhao
Abstract

Document classification for text, images, and other applicable entities has long been a focus of academic research and also finds application in many industrial settings. Amid a plethora of approaches to such problems, machine learning techniques have succeeded in a variety of scenarios. In this paper we discuss the design of a machine learning-based, semi-supervised job title classification system for the online job recruitment domain, currently in production at CareerBuilder.com, and propose enhancements to it. The system leverages a varied collection of classification as well as clustering algorithms. These algorithms are encompassed in an architecture that makes it easy to leverage existing off-the-shelf machine learning tools and techniques while taking into consideration the challenges of constructing a scalable classification system for a large taxonomy of categories. Because the system is continuously evolving and still under development, we first discuss the existing semi-supervised classification system, which combines clustering and classification components in a proximity-based classifier setup and whose results are already used across numerous products at CareerBuilder. We then elucidate our long-term goals for job title classification and propose enhancements to the existing system in the form of a two-stage coarse- and fine-level classifier augmentation that constructs a cascade of hierarchical vertical classifiers. Preliminary results are presented from an experimental evaluation on real-world industrial data.
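To make the idea of a two-stage coarse and fine cascade of proximity-based classifiers more concrete, here is a minimal sketch under explicit assumptions. It is not the CareerBuilder implementation: the bag-of-words vectors, cosine similarity, nearest-centroid rule, and the names `NearestCentroid` and `CoarseToFineCascade` are hypothetical choices for illustration, and the clustering component of the semi-supervised system is not modeled. Stage one assigns a coarse category; stage two applies a classifier trained only on the fine labels inside that category.

```python
from collections import Counter, defaultdict
import math


def vectorize(text):
    # Assumption: lowercased bag-of-words term counts (no stemming, no stop words).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NearestCentroid:
    """Hypothetical proximity-based classifier: pick the label whose mean vector is most similar."""

    def fit(self, texts, labels):
        sums = defaultdict(Counter)
        counts = Counter(labels)
        for text, label in zip(texts, labels):
            sums[label].update(vectorize(text))
        # Average the summed term counts to get one centroid per label.
        self.centroids = {
            label: Counter({t: c / counts[label] for t, c in vec.items()})
            for label, vec in sums.items()
        }
        return self

    def predict(self, text):
        vec = vectorize(text)
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


class CoarseToFineCascade:
    """Hypothetical two-stage cascade: coarse category first, then a per-category fine classifier."""

    def fit(self, texts, coarse_labels, fine_labels):
        self.coarse = NearestCentroid().fit(texts, coarse_labels)
        grouped = defaultdict(lambda: ([], []))
        for text, coarse, fine in zip(texts, coarse_labels, fine_labels):
            grouped[coarse][0].append(text)
            grouped[coarse][1].append(fine)
        # One fine-level classifier per coarse category ("vertical" classifiers).
        self.fine = {
            coarse: NearestCentroid().fit(g_texts, g_fine)
            for coarse, (g_texts, g_fine) in grouped.items()
        }
        return self

    def predict(self, text):
        coarse = self.coarse.predict(text)
        return coarse, self.fine[coarse].predict(text)


# Toy, made-up training data (not from the paper).
titles = [
    "java software developer",
    "python software developer",
    "network systems administrator",
    "registered nurse icu",
    "licensed practical nurse",
]
coarse = ["IT", "IT", "IT", "Healthcare", "Healthcare"]
fine = [
    "Software Developer",
    "Software Developer",
    "Network Administrator",
    "Registered Nurse",
    "Licensed Practical Nurse",
]

model = CoarseToFineCascade().fit(titles, coarse, fine)
print(model.predict("senior java developer"))  # ('IT', 'Software Developer')
print(model.predict("practical nurse"))        # ('Healthcare', 'Licensed Practical Nurse')
```

Restricting each fine classifier to one coarse category keeps every second-stage model small, which is one plausible reason a cascade scales better than a single flat classifier over a large taxonomy; the trade-off is that a coarse-stage error cannot be corrected downstream.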

Similar resources

Semantic Similarity Strategies for Job Title Classification

Automatic and accurate classification of items enables numerous downstream applications in many domains. These applications can range from faceted browsing of items to product recommendations and big data analytics. In the online recruitment domain, we refer to classifying job ads to pre-defined or custom occupation categories as job title classification. A large-scale job title classification ...

Stressful jobs and non-stressful jobs: a cluster analysis of office jobs.

The purpose of the study was to determine if office jobs could be characterized by a small number of combinations of stressors that could be related to job-title information and self-report of psychological strain. Two-hundred-and-sixty-two office workers from three public service organizations provided data on nine job stressors and seven indicators of psychological strain. Using cluster analy...

Classification of Web Job Advertisements: A Case Study

This work is concerned with classifying Web job advertisements against a standard classification system of occupations, by applying and comparing different text classification techniques. As a first step, we evaluated the classification algorithms using a hit/not-hit approach, that is either the prediction is correct or not compared to a gold classification provided by domain experts. Then, we ...

Job strain and cardiovascular risk factors: a cross sectional study of employed Danish men and women.

As part of the World Health Organisation initiated MONICA project, 2000 men and women aged 30, 40, 50, and 60 from the general population were invited to undergo a medical examination with special emphasis on cardiovascular disease. A total of 1504 (75%) participated, 1209 of whom were employed. The participants answered a questionnaire on working, social, and health conditions and underwent cl...

The Sources of Wage Variation: A Three-Way High-Dimensional Fixed Effects Model

This paper estimates a wage equation with three high-dimensional fixed effects – worker, firm, and job title – using a longitudinal matched employer-employee dataset covering virtually all Portuguese wage earners over a little more than two decades. The variation in log real hourly wages is decomposed into different components related to worker, firm, and job title characteristics (both observe...

Journal:
  • CoRR

Volume: abs/1606.00917

Publication date: 2014