Detecting Fake Content with Relative Entropy Scoring

نویسندگان

  • Thomas Lavergne
  • Tanguy Urvoy
  • François Yvon
چکیده

How to distinguish natural texts from artificially generated ones ? Fake content is commonly encountered on the Internet, ranging from web scraping to random word salads. Most of this fake content is generated for spam purpose. In this paper, we present two methods to deal with this problem. The first one uses classical language models, while the second one is a novel approach using short range information between words.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Filtering artificial texts with statistical machine learning techniques

Fake content is flourishing on the Internet, ranging from basic random word salads to web scraping. Most of this fake content is generated for the purpose of nourishing fake web sites aimed at biasing search engine indexes: at the scale of a search engine, using automatically generated texts render such sites harder to detect than using copies of existing pages. In this paper, we present three ...

متن کامل

On the Use of Similarity Search to Detect Fake Scientific Papers

Fake scientific papers have recently become of interest within the academic community as a result of the identification of fake papers in the digital libraries of major academic publishers [8]. Detecting and removing these papers is important for many reasons. We describe an investigation into the use of similarity search for detecting fake scientific papers by comparing several methods for sig...

متن کامل

Using LDA to detect semantically incoherent documents

Detecting the semantic coherence of a document is a challenging task and has several applications such as in text segmentation and categorization. This paper is an attempt to distinguish between a ‘semantically coherent’ true document and a ‘randomly generated’ false document through topic detection in the framework of latent Dirichlet analysis. Based on the premise that a true document contain...

متن کامل

Detecting Fake Websites Using Swarm Intelligence Mechanism in Human Learning

The internet and its various services have made users to easily communicate with each other. Internet benefits including online business and e-commerce. E-commerce has boosted online sales and online auction types. Despite their many uses and benefits, the internet and their services have various challenges, such as information theft, which challenges the use of these services. Information thef...

متن کامل

Detecting Fake Escrow Websites using Rich Fraud Cues and Kernel Based Methods

The ability to automatically detect fraudulent escrow websites is important in order to alleviate online auction fraud. Despite research on related topics, fake escrow website categorization has received little attention. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of features extracted from...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008