An Off-the-shelf Approach to Authorship Attribution
Abstract
Authorship detection is a challenging task because of the many design choices the user has to make. Performance depends heavily on the right set of features, the amount of data, in-sample vs. out-of-sample settings, and profile- vs. instance-based approaches. So far, the variety of combinations has rendered off-the-shelf methods for authorship detection inappropriate. We propose a novel and generally deployable method that does not share these limitations. We treat authorship attribution as an anomaly detection problem in which author regions are learned in feature space. The right feature space for a given task is identified automatically by representing the optimal solution as a linear mixture of multiple kernel functions (multiple kernel learning, MKL). Our approach allows both labelled and unlabelled examples to be included, which remedies the in-sample and out-of-sample problems. Empirically, we observe that the proposed technique is either better than or on par with baseline competitors. In either case, our method relieves the user of critical design choices (e.g., the feature set) and can therefore be used as an off-the-shelf method for authorship attribution.
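The idea of scoring a text against an author region under a linear mixture of kernels can be illustrated with a minimal sketch. This is not the authors' implementation: the mixture weights are fixed by hand rather than learned, the author region is approximated by the distance to the author's centroid in feature space instead of a trained anomaly detector, and the two feature sets (character trigrams and word counts) as well as all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams (an assumed stylometric feature set)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_counts(text):
    """Count lower-cased whitespace-separated tokens (second assumed feature set)."""
    return Counter(text.lower().split())


def cosine_kernel(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# One base kernel per feature set; hypothetical choice for this sketch.
FEATURES = (char_ngrams, word_counts)


def mixed_kernel(x, y, weights):
    """Linear mixture sum_m w_m * k_m(x, y) of the base kernels."""
    return sum(w * cosine_kernel(f(x), f(y)) for w, f in zip(weights, FEATURES))


def anomaly_score(text, author_texts, weights):
    """Squared feature-space distance from `text` to the author's centroid.

    Larger values mean the text looks less like the author's known writing.
    """
    n = len(author_texts)
    k_xx = mixed_kernel(text, text, weights)
    k_xa = sum(mixed_kernel(text, a, weights) for a in author_texts) / n
    k_aa = sum(
        mixed_kernel(a, b, weights) for a in author_texts for b in author_texts
    ) / (n * n)
    return k_xx - 2 * k_xa + k_aa


# Toy data, invented for the example.
known = ["the cat sat on the mat", "the cat sat on the rug"]
weights = (0.5, 0.5)  # fixed by hand here; MKL would learn these
print(anomaly_score("the cat sat on the sofa", known, weights))
print(anomaly_score("quantum flux regulators hum loudly", known, weights))
```

In this sketch a text that shares vocabulary and character patterns with the known samples receives a smaller score than an unrelated one. A full MKL approach would instead optimise the weights jointly with the detector, which is what removes the need to pick a feature set by hand.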
Similar resources
The author who wasn’t there? Fairness and attribution in publications following access to population biobanks
We conducted a document analysis that explored publication ethics and authorship in the context of population biobanks from both a theoretical (e.g. normative documents) and practical (e.g. biobank-specific documentation) perspective. The aim was to provide an overview of the state of authorship attribution in population biobanks and attempt to fill the gap in discussions around the issue. Our ...
Personal Sense and Idiolect: Combining Authorship Attribution and Opinion Analysis
Subjectivity analysis and authorship attribution are very popular areas of research. However, work in these two areas has been done separately. Our conjecture is that by combining information about subjectivity in texts and authorship, the performance of both tasks can be improved. In the paper a personalized approach to opinion mining is presented, in which the notions of personal sense and id...
Entropy-Based Authorship Search in Large Document Collections
The purpose of authorship search is to identify documents written by a particular author in large document collections. Standard search engines match documents to queries based on topic, and are not applicable to authorship search. In this paper we propose an approach to authorship search based on information theory. We propose relative entropy of style markers for ranking, inspired by the lang...
The Computational-Linguistic Approach to Forensic Authorship Attribution
This article examines the diversity of methods in authorship attribution through a lens which focuses attention on a single common element. The current state of authorship attribution study is spread throughout so many academic and non-academic disciplines that it is nigh impossible to describe all of the various assumptions about language and authorship. The disciplines involved in authorship...
EPSMS and the Document Occurrence Representation for Authorship Identification - Notebook for PAN at CLEF 2011
This paper describes the participation of the PISIS team in the authorship identification track of PAN’11. We adopted two different strategies for the tasks of authorship attribution and authorship verification. For authorship attribution we performed experiments with a document occurrence representation using a standard classification-based approach. Results obtained with this approach were mi...
Publication date: 2014