Unsupervised Spam Detection by Document Probability Estimation with Maximal Overlap Method

Similar resources

Unsupervised Spam Detection by Document Probability Estimation with Maximal Overlap Method

In this paper, we study content-based spam detection for spam generated by copying a seed document with some random perturbations. We propose an unsupervised detection algorithm based on an entropy-like measure called document complexity, which reflects how many similar documents exist in the input collection. As the document complexity, however, is an ideal measure like ...

Unsupervised Spam Detection by Document Complexity Estimation

In this paper, we study content-based spam detection for a specific type of spam, namely blog and bulletin board spam. We develop an efficient unsupervised algorithm, DCE, that detects spam documents in a mixture of spam and non-spam documents using a compression-based similarity measure called the document complexity. Using suffix trees, the algorithm computes the document complexity for ...

BotOnus: an online unsupervised method for Botnet detection

Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods produce a number of good ideas, but they are far from complete yet, since most of them cannot detect botnets in an early stage ...

Semantic Document Distance Measures and Unsupervised Document Revision Detection

In this paper, we model the document revision detection problem as a minimum cost branching problem that relies on computing document distances. Furthermore, we propose two new document distance measures, word vector-based Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED). Our revision detection system is designed for a large scale corpus and implemented in Apache Spar...

Unsupervised and Supervised Neural Network Learning for Spam Detection

With the rise of technology over the last several decades, spam detection has become an important machine learning problem. Nowadays, an abundance of labeled email data makes it possible to build automatic systems that detect spam effectively. The majority of these systems are based on supervised machine learning classifiers. Even though some unsupervised systems exist as well, they are much less popular due to...

Journal

Journal title: Transactions of the Japanese Society for Artificial Intelligence

Year: 2011

ISSN: 1346-0714,1346-8030

DOI: 10.1527/tjsai.26.297