Factors Influencing Robustness and Effectiveness of Conditional Random Fields in Active Learning Frameworks
نویسندگان
چکیده
Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in terms of stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation of performance when small variation of the training data or a small variation of the parameters occur, is a major issue for machine learning models, but even more so in the active learning framework which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological features, syntactic features, or semantic features) and c) Gaussian prior variance as one of the important CRFs parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographical, morphological and contextual features as a group of basic features play an important role in learning effective models across all iterations.
منابع مشابه
Active learning: a step towards automating medical concept extraction
OBJECTIVE This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort and (2) the robustness of incremental active learning framework across different selection criteria and data sets are determined. MATERIALS AND METHODS The compar...
متن کاملAspects of Semi-supervised and Active Learning in Conditional Random Fields
Conditional random fields are among the state-of-the art approaches to structured output prediction, and the model has been adopted for various real-world problems. The supervised classification is expensive, since it is usually expensive to produce labelled data. Unlabeled data are relatively cheap, but how to use it? Unlabeled data can be used to estimate marginal probability of observations,...
متن کاملConditional Random Fields for Airborne Lidar Point Cloud Classification in Urban Area
Over the past decades, urban growth has been known as a worldwide phenomenon that includes widening process and expanding pattern. While the cities are changing rapidly, their quantitative analysis as well as decision making in urban planning can benefit from two-dimensional (2D) and three-dimensional (3D) digital models. The recent developments in imaging and non-imaging sensor technologies, s...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملFactors influencing the Acceptance of E-learning among the Employees of Isfahan Oil Pipelines Company
This research aimed to determine the relationship between perceived organizational support (management support and organizational support), task interdependente and computer self-efficacy with behavioral intention to adopt e-learning system with the mediating role of individual norms in Isfahan Oil and Gas Pipeline Company. The research population was consisted of 250 employees and after the ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015