An open-source toolkit for mining Wikipedia
نویسندگان
چکیده
منابع مشابه
An open-source toolkit for mining Wikipedia
The online encyclopedia Wikipedia is a vast repository of information. For developers and researchers it represents a giant multilingual database of concepts and semantic relations; a promising resource for natural language processing and many other research areas. In this paper we introduce the Wikipedia Miner toolkit: an open-source collection of code that allows researchers and developers to...
متن کاملPyCP: An Open-Source Conformal Predictions Toolkit
The Conformal Predictions framework is a new game-theoretic approach to reliable machine learning, which provides a methodology to obtain error calibration under classification and regression settings. The framework combines principles of transductive inference, algorithmic randomness and hypothesis testing to provide guaranteed error calibration in online settings (and calibration in offline s...
متن کاملFamilia: An Open-Source Toolkit for Industrial Topic Modeling
Familia is an open-source toolkit for pragmatic topic modeling in industry. Familia abstracts the utilities of topic modeling in industry as two paradigms: semantic representation and semantic matching. Efficient implementations of the two paradigms are made publicly available for the first time. Furthermore, we provide off-the-shelf topic models trained on large-scale industrial corpora, inclu...
متن کاملTHUMT: An Open Source Toolkit for Neural Machine Translation
This paper introduces THUMT, an opensource toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University. THUMT implements the standard attention-based encoder-decoder framework on top of Theano and supports three training criteria: maximum likelihood estimation, minimum risk training, and semi-supervised training. It features a visualiza...
متن کاملJoshua: An Open Source Toolkit for Parsing-Based Machine Translation
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, ngram language model integration, beamand cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed c...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Artificial Intelligence
سال: 2013
ISSN: 0004-3702
DOI: 10.1016/j.artint.2012.06.007