Visualizing Authorship for Identification
Abstract
As misuse of online anonymity grows, researchers have begun to create visualization tools that promote greater user accountability in online communities. In this study we created an authorship visualization called Writeprints that can help identify individuals based on their writing style. The visualization creates unique writing style patterns that can be automatically identified in a manner similar to fingerprint biometric systems. Writeprints is a principal component analysis based technique that uses a dynamic feature-based sliding window algorithm, making it well suited to visualizing authorship across larger groups of messages. We evaluated the effectiveness of the visualization across messages from three English and Arabic forums in comparison with Support Vector Machines (SVM) and found that Writeprints provided excellent classification performance, significantly outperforming SVM in many instances. Based on our results, we believe the visualization can assist law enforcement in identifying cyber criminals and also help users authenticate fellow online members in order to deter cyber deception.
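The abstract names two ingredients, a sliding window over the text and principal component analysis, without giving details. The following is a minimal sketch of how those two ideas can fit together, not the authors' implementation: the four character-ratio features, the window size and step, and every function name are assumptions made for illustration, and the PCA step uses plain power iteration with deflation so the example needs only the standard library.

```python
import math

def window_features(text, size=40, step=20):
    """Return one small style-feature vector per sliding character window."""
    vectors = []
    for start in range(0, max(len(text) - size, 0) + 1, step):
        chunk = text[start:start + size]
        n = len(chunk) or 1
        vectors.append([
            sum(c.isalpha() for c in chunk) / n,    # letter ratio (assumed feature)
            sum(c.isupper() for c in chunk) / n,    # uppercase ratio (assumed feature)
            sum(c.isspace() for c in chunk) / n,    # whitespace ratio (assumed feature)
            sum(c in ".,;:!?" for c in chunk) / n,  # punctuation ratio (assumed feature)
        ])
    return vectors

def center_and_covariance(rows):
    """Mean-center the rows and return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / max(n - 1, 1)
            for j in range(d)] for i in range(d)]
    return centered, cov

def top_components(cov, k=2, iters=500):
    """Approximate the k leading eigenvectors by power iteration with deflation."""
    d = len(cov)
    mat = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # fixed, non-uniform start vector
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left; keep the current direction
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d))
                         for i in range(d))
        components.append(v)
        # Deflate: remove the direction just found before searching for the next.
        mat = [[mat[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components

def writeprint_points(text, size=40, step=20):
    """Project each window's centered feature vector onto two components."""
    centered, cov = center_and_covariance(window_features(text, size, step))
    components = top_components(cov, k=2)
    return [[sum(c[j] * row[j] for j in range(len(row))) for c in components]
            for row in centered]

if __name__ == "__main__":
    sample = "Hello, World! This is a TEST of the sliding window. " * 4
    for x, y in writeprint_points(sample):
        print(f"{x:+.4f} {y:+.4f}")
```

Each printed pair is one window's position in the two-dimensional projected space. Plotting the pairs for one author's messages gives a rough idea of the kind of per-author pattern the abstract describes, under the assumptions stated above.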
Related resources
Visualizing Multiple System Atrophy Studies Based on Collaboration Network and Centrality Indices in Web of Science Database
Introduction: Social network analysis is an analytical method based on graph theory that identifies relationships between individuals or factors in order to analyze the social structures resulting from those relationships. The objective of this study was to analyze co-authorship and co-word networks based on scientometric indicators and centrality measures in the studies on multiple atrophy system dise...
Author Entropy: A Metric for Characterization of Software Authorship Patterns
We propose the concept of author entropy and describe how file-level entropy measures may be used to understand and characterize authorship patterns within individual files, as well as across an entire project. As a proof of concept, we compute author entropy for 28,955 files from 33 open-source projects. We explore patterns of author entropy, identify techniques for visualizing author entropy,...
Recognizing contributions in wikis: Authorship categories, algorithms, and visualizations
Wikis are designed to support collaborative editing, without focusing on individual contribution, such that it is not straightforward to determine who contributed to a specific page. However, as wikis are increasingly adopted in settings such as business, government and education, where editors are largely driven by career goals, there is a perceived need to modify wikis so that each editor’s c...
Authorship Identification in Large Email Collections: Experiments Using Features that Belong to Different Linguistic Levels - Notebook for PAN at CLEF 2011
The aim of this paper is to explore the usefulness of features from different linguistic levels for email authorship identification. Using various email datasets provided by the PAN’11 lab, we tested several feature groups in both authorship attribution and authorship verification subtasks. The selected feature groups combined with Regularized Logistic Regression and One-Class SVM machine learni...
Publication date: 2006