Unsupervised Spam Detection by Document Probability Estimation with Maximal Overlap Method
Similar resources
Unsupervised Spam Detection by Document Probability Estimation with Maximal Overlap Method
In this paper, we study content-based spam detection for spam generated by copying a seed document with some random perturbations. We propose an unsupervised detection algorithm based on an entropy-like measure called document complexity, which reflects how many similar documents exist in the input collection of documents. As the document complexity, however, is an ideal measure like ...
Unsupervised Spam Detection by Document Complexity Estimation
In this paper, we study content-based spam detection for a specific type of spam called blog and bulletin board spam. We develop an efficient unsupervised algorithm, DCE, that detects spam documents from a mixture of spam and non-spam documents using a compression-based similarity measure called the document complexity. Using suffix trees, the algorithm computes the document complexity for ...
BotOnus: an online unsupervised method for Botnet detection
Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods produce a number of good ideas, but they are far from complete yet, since most of them cannot detect botnets in an early stage ...
Semantic Document Distance Measures and Unsupervised Document Revision Detection
In this paper, we model the document revision detection problem as a minimum cost branching problem that relies on computing document distances. Furthermore, we propose two new document distance measures, word vector-based Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED). Our revision detection system is designed for a large scale corpus and implemented in Apache Spar...
Unsupervised and Supervised Neural Network Learning for Spam Detection
With the rise of technology over the last several decades, spam detection has become an important machine learning problem. Nowadays, an abundance of labeled email data makes it possible to build automatic systems that detect spam effectively. The majority of these systems are based on supervised machine learning classifiers. Even though some unsupervised systems exist as well, they are much less popular due to...
Journal
Journal title: Transactions of the Japanese Society for Artificial Intelligence
Year: 2011
ISSN: 1346-0714,1346-8030
DOI: 10.1527/tjsai.26.297