A Robust Cybersecurity Topic Classification Tool
Abstract
In this research, we use user-defined labels from three internet text sources (Reddit, StackExchange, arXiv) to train 21 different machine learning models for the topic classification task of detecting cybersecurity discussions in natural English text. We analyze the false positive and false negative rates of each model in cross-validation experiments. We then present a Cybersecurity Topic Classification (CTC) tool, which takes the majority vote of the trained models as its decision mechanism for detecting cybersecurity-related text. We also show that the CTC tool provides lower false positive and false negative rates on average than any of the individual models. The CTC tool is scalable to hundreds of thousands of documents, with wall-clock time on the order of hours.
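The majority-vote decision and the false positive and false negative rates mentioned in the abstract can be illustrated with a minimal sketch. This is not the CTC tool's implementation: the names `majority_vote`, `error_rates`, and `keyword_model` are hypothetical, and the keyword-matching "models" are toy stand-ins for the 21 trained classifiers.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a "model" is any callable mapping a document to
# 1 (cybersecurity) or 0 (not cybersecurity).
Model = Callable[[str], int]


def majority_vote(models: Sequence[Model], document: str) -> int:
    """Return 1 if more than half of the models label the document 1."""
    votes = sum(model(document) for model in models)
    return 1 if 2 * votes > len(models) else 0


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def keyword_model(keyword: str) -> Model:
    """Hypothetical toy model: flags a document if it contains the keyword."""
    return lambda doc: int(keyword in doc.lower())


# Three toy voters; an odd count (such as 21) means a vote can never tie.
toy_models = [keyword_model(k) for k in ("malware", "exploit", "firewall")]

docs = [
    "New malware exploit bypasses the firewall",
    "Recipe for lentil soup",
]
predictions = [majority_vote(toy_models, d) for d in docs]
```

With an odd number of voters the strict-majority test is unambiguous. For an even number, this sketch resolves ties to 0, which is a design choice of the sketch and not something the abstract specifies.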
Similar resources
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Computer Assisted Topic Classification
Pre-publication version. Cite as JITP 4:4, forthcoming. There are a few known changes to the colors of the graphs and there may be other editorial changes as suggested by the editors. Abstract: Social scientists interested in mixed methods research have traditionally turned to human annotators to classify the documents or events used in thei...
Ge.tracker: a Robust, Lightweight Topic Tracking System
We describe a topic tracking system developed at GE R&D Center in connection with our participation in DARPA TDT evaluations. The TDT tracking task is specified as follows: given Nt training news stories on a topic, the system must find all subsequent stories on the same topic in all tracked news sources. These sources include radio and television news broadcasts, as well as newswire feeds. The...
GETracker3: A Robust, Lightweight Topic Tracking System
We describe a topic tracking system developed at GE Corporate R&D Center in connection with our participation in DARPA TDT3 evaluations. The TDT tracking task is specified as follows: Given Nt training news stories on a topic, the system must find all subsequent stories on the same topic in all tracked news sources. These sources include radio and television news broadcasts, as well as newswire...
Topic Classification for Suicidology
Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vecto...
Journal
Journal title: International journal of network security and applications
Year: 2022
ISSN: 0975-2307, 0974-9330
DOI: https://doi.org/10.5121/ijnsa.2022.14101