Vandalism Detection in Wikipedia: a Bag-of-Words Classifier Approach

نویسنده

  • Amit Belani
چکیده

A bag-of-words based probabilistic classifier is trained using regularized logistic regression to detect vandalism in the English Wikipedia. Isotonic regression is used to calibrate the class membership probabilities. Learning curve, reliability, ROC, and cost analysis are performed.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages

The malicious modification of articles, termed vandalism, is a serious problem for open access encyclopedias such as Wikipedia. Wikipedia’s counter-vandalism bots and past vandalism detection research have greatly reduced the exposure and damage of common and obvious types of vandalism. However, there remains increasingly more sneaky types of vandalism that are clearly out of context of the sen...

متن کامل

Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals - Lab Report for PAN at CLEF 2010

Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edits with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism indicating features are extracted from edits in a vandalism corpus and ar...

متن کامل

Wikipedia Vandalism Detection Through Machine Learning : Feature Review and New Proposals ∗ Lab Report for PAN at CLEF 2010

Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edits with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism indicating features are extracted from edits in a vandalism corpus and ar...

متن کامل

Wiki Vandalysis - Wikipedia Vandalism Analysis

Wikipedia describes itself as the “free encyclopedia that anyone can edit”. Along with the helpful volunteers who contribute by improving the articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Deterring and reverting vandalism has become one of the major challenges of Wikipedia as its size grows. Wikipedia editors fight vandalism both manuall...

متن کامل

Wiki Vandalysis - Wikipedia Vandalism Analysis - Lab Report for PAN at CLEF 2010

Wikipedia describes itself as the “free encyclopedia that anyone can edit”. Along with the helpful volunteers who contribute by improving the articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Deterring and reverting vandalism has become one of the major challenges of Wikipedia as its size grows. Wikipedia editors fight vandalism both manuall...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1001.0700  شماره 

صفحات  -

تاریخ انتشار 2009