Breaking Audio CAPTCHAs

Authors

  • Jennifer Tam
  • Jirí Simsa
  • Sean Hyde
  • Luis von Ahn
Abstract

CAPTCHAs are computer-generated tests that humans can pass but current computer systems cannot. CAPTCHAs provide a method for automatically distinguishing a human from a computer program, and therefore can protect Web services from abuse by so-called “bots.” Most CAPTCHAs consist of distorted images, usually text, for which a user must provide some description. Unfortunately, visual CAPTCHAs limit access to the millions of visually impaired people using the Web. Audio CAPTCHAs were created to solve this accessibility issue; however, the security of audio CAPTCHAs was never formally tested. Some visual CAPTCHAs have been broken using machine learning techniques, and we propose using similar ideas to test the security of audio CAPTCHAs. Audio CAPTCHAs are generally composed of a set of words to be identified, layered on top of noise. We analyzed the security of current audio CAPTCHAs from popular Web sites by using AdaBoost, SVM, and k-NN, and achieved correct solutions for test samples with accuracy up to 71%. Such accuracy is enough to consider these CAPTCHAs broken. Training several different machine learning algorithms on different types of audio CAPTCHAs allowed us to analyze the strengths and weaknesses of the algorithms so that we could suggest a design for a more robust audio CAPTCHA.
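As a rough illustration of the classification step the abstract names, here is a minimal k-NN sketch in plain Python. The abstract does not say how the audio is segmented or which features are extracted, so everything concrete here is an assumption: the two-dimensional feature vectors are made-up toy values, the "word" and "noise" labels are a guess at a plausible labelling, and `knn_predict` is a hypothetical helper name, not the authors' code.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_vector, label) pairs.
    query: feature vector with the same length as the training vectors.
    """
    # Rank training points by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (assumption): invented 2-D vectors standing in for per-segment
# audio features. Real features would come from the audio signal itself.
TOY_TRAIN = [
    ((0.9, 0.8), "word"),
    ((1.0, 0.7), "word"),
    ((0.8, 0.9), "word"),
    ((0.1, 0.2), "noise"),
    ((0.2, 0.1), "noise"),
    ((0.0, 0.3), "noise"),
]

if __name__ == "__main__":
    print(knn_predict(TOY_TRAIN, (0.85, 0.75)))
    print(knn_predict(TOY_TRAIN, (0.15, 0.15)))
```

In a setup like this, each audio segment would be labelled independently and the per-segment labels combined into a full answer. AdaBoost or an SVM could stand in for `knn_predict` behind the same interface.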

Similar resources

Decaptcha: Breaking 75% of eBay Audio CAPTCHAs

CAPTCHA tests aim at preventing attackers from performing automatic website registration. In this paper we show that our prototype Decaptcha is able to successfully break 75% of eBay audio CAPTCHAs. We compare its performance with the state-of-the-art, readily available speech recognition system Sphinx and discuss the implications for eBay security.

Improving Audio CAPTCHAs

CAPTCHAs are computer generated tests that humans can pass but current computer systems cannot. CAPTCHAs provide a method for automatically distinguishing a human from a computer program, and therefore can protect web services from bots. Most CAPTCHAs consist of distorted images, usually text, for which a user must provide some description. Unfortunately, visual CAPTCHAs limit access to the mil...

Breaking CAPTCHAs on the Dark Web

On the Dark Web, several websites inhibit automated scraping attempts by employing CAPTCHAs. Scraping important content from a website is possible if these CAPTCHAs are solved by a web scraper. For this purpose, a machine learning tool, TensorFlow, and an optical character recognition tool, Tesseract, are used to solve simple CAPTCHAs. Two sets of CAPTCHAs, which are also used on some Dark Web web...

Breaking a Visual CAPTCHA: A Novel Approach Using HMM

Completely Automated Public Turing Tests to Tell Computers and Humans Apart (CAPTCHAs) are the automatic filters that are widely used these days to disallow any automated script that can perform the work of a human. CAPTCHAs are built in such a way that it is very difficult for any automated script to break them. In this paper, a novel approach for EZ-Gimpy CAPTCHAs has been proposed that first...

A Projection-based Segmentation Algorithm for Breaking MSN and YAHOO CAPTCHAs

Defeating a CAPTCHA test requires two procedures: segmentation and recognition. Recent research shows that the problem of segmentation is much harder than recognition. In this paper, a new projection-based segmentation algorithm is proposed for the MSN and Yahoo CAPTCHAs. Experimental results show that the proposed algorithm can improve correct segmentation rates ranging from 9% to 14% over the...


Publication date: 2008