Learning at Low False Positive Rates
Abstract
Most spam filters are configured to operate at a very low false-positive rate. Typically, however, the filters are trained with techniques that optimize accuracy or entropy rather than performance in this configuration. We describe two different techniques for optimizing for the low false-positive region. One method weights good (non-spam) data more heavily than spam. The other uses a two-stage technique: it first finds the data in the low false-positive region and then learns using only this subset. We show that with two different learning algorithms, logistic regression and Naive Bayes, we achieve substantial improvements, reducing missed spam by as much as 20% (relative) for logistic regression and 40% for Naive Bayes at the same low false-positive rate.
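The abstract names the two techniques but not their exact form, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' code. Weighting is modeled as a hypothetical multiplier `good_weight` on the log loss of good (non-spam) examples. The "low false-positive region" is assumed to be the set of examples that a first-stage model scores above the threshold for a looser false-positive rate (`loose_fpr`). All function names, parameters, and the synthetic two-feature data are illustrative assumptions, and only logistic regression is shown (not Naive Bayes). Each model is scored the way the abstract reports results: the fraction of spam missed at a fixed false-positive rate.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, good_weight=1.0, lr=0.5, epochs=200):
    """Batch gradient descent on a class-weighted log loss.

    y[i] = 1 for spam, 0 for good mail. Each good example's loss is
    multiplied by good_weight (an assumed knob, not the paper's notation).
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb, total = [0.0] * d, 0.0, 0.0
        for x, t in zip(X, y):
            c = good_weight if t == 0 else 1.0
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            g = c * (p - t)  # gradient of the weighted log loss w.r.t. the logit
            for j in range(d):
                gw[j] += g * x[j]
            gb += g
            total += c
        w = [wj - lr * gj / total for wj, gj in zip(w, gw)]
        b -= lr * gb / total
    return w, b

def scores(model, X):
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def threshold_at_fpr(s, y, fpr):
    # Flag as spam when score > threshold. Only good messages ranked above
    # the (k+1)-th highest good score are flagged, so at most k = floor(fpr * n_good).
    good = sorted((si for si, t in zip(s, y) if t == 0), reverse=True)
    k = int(fpr * len(good))
    return good[k] if k < len(good) else -math.inf

def miss_rate(s, y, thr):
    # Fraction of spam that is NOT flagged at this threshold.
    spam = [si for si, t in zip(s, y) if t == 1]
    return sum(si <= thr for si in spam) / len(spam)

def two_stage(X, y, target_fpr, loose_fpr, good_weight=1.0):
    # Stage 1: train on everything, then keep only examples scoring above a
    # looser-FPR threshold (assumed definition of the "low FP region").
    s1 = scores(train_logreg(X, y, good_weight), X)
    cut = threshold_at_fpr(s1, y, loose_fpr)
    idx = [i for i, si in enumerate(s1) if si > cut]
    # Stage 2: retrain on that subset only.
    m2 = train_logreg([X[i] for i in idx], [y[i] for i in idx], good_weight)
    s2 = scores(m2, X)
    # Messages below the stage-1 cut are always treated as good mail.
    final = [s2[i] if s1[i] > cut else -math.inf for i in range(len(X))]
    return final, threshold_at_fpr(final, y, target_fpr)

# Synthetic, illustrative data: spam centered at +1, good mail at -1.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    X.append([rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)])
    y.append(1)
for _ in range(600):
    X.append([rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)])
    y.append(0)

target = 0.01  # operating point: at most 1% of good mail flagged
for gw in (1.0, 5.0):
    s = scores(train_logreg(X, y, good_weight=gw), X)
    thr = threshold_at_fpr(s, y, target)
    print(f"good_weight={gw}: missed spam {miss_rate(s, y, thr):.3f}")
final, thr = two_stage(X, y, target, loose_fpr=0.1)
print(f"two-stage: missed spam {miss_rate(final, y, thr):.3f}")
```

The thresholds here are chosen on the training data itself, which keeps the sketch short. A faithful evaluation would pick the threshold and measure missed spam on held-out mail. Whether either variant helps on this toy data is not claimed; the example only shows how each technique changes what the learner optimizes for.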
Similar resources
Automatic Sperm Analysis in Microscopic Images of Human Semen: Segmentation Using Minimization of Information Distance
Introduction: The morphological features of human sperm are key indicators for monitoring male fertility problems. Automated analysis of microscopic videos has therefore become the preferred approach in infertility treatment over recent decades. Materials and Methods: In the proposed method, a hypothesis-testing framework was first defined to distinguish sperms from backgroun...
Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
This paper develops a new approach for extremely fast detection in domains where the distribution of positive and negative examples is highly skewed (e.g. face detection or database retrieval). In such domains a cascade of simple classifiers each trained to achieve high detection rates and modest false positive rates can yield a final detector with many desirable features: including high detect...
Efficient pedestrian detection by directly optimize the partial area under the ROC curve
Many typical applications of object detection operate within a prescribed false-positive range. In this situation the performance of a detector should be assessed on the basis of the area under the ROC curve over that range, rather than over the full curve, as the performance outside the range is irrelevant. This measure is labelled as the partial area under the ROC curve (pAUC). Effective casc...
Accepted Version: Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve
McPAD: A multiple classifier system for accurate payload-based anomaly detection
Anomaly-based network Intrusion Detection Systems (IDS) are valuable tools for the defense-in-depth of computer networks. Unsupervised or unlabeled learning approaches for network anomaly detection have been recently proposed. Such anomaly-based network IDS are able to detect (unknown) zero-day attacks, although much care has to be dedicated to controlling the amount of false positives generate...
Publication date: 2006