Scaling Software Security Analysis to Millions of Malicious Programs and Billions of Lines of Code
نویسنده
چکیده
Software security is a big data problem. The volume of new software artifacts created far outpaces the current capacity of software analysis. This gap has brought an urgent challenge to our security community—scalability. If our techniques cannot cope with an ever increasing volume of software, we will always be one step behind attackers. Thus developing scalable analysis to bridge the gap is essential. In this dissertation, we argue that automatic code reuse detection enables an efficient data reduction of a high volume of incoming malware for downstream analysis and enhances software security by efficiently finding known vulnerabilities across large code bases. In order to demonstrate the benefits of automatic software similarity detection, we discuss two representative problems that are remedied by scalable analysis: malware triage and unpatched code clone detection. First, we tackle the onslaught of malware. Although over one million new malware are reported each day, existing research shows that most malware are not written from scratch; instead, they are automatically generated variants of existing malware. When groups of highly similar variants are clustered together, new malware more easily stands out. Unfortunately, current systems struggle with handling this high volume of malware. We scale clustering using feature hashing and perform semantic analysis using co-clustering. Our evaluation demonstrates that these techniques are an order of magnitude faster than previous systems and automatically discover highly correlated features and malware groups. Furthermore, we design algorithms to infer evolutionary relationships among malware, which helps analysts understand trends over time and make informed decisions about which malware to analyze first. Second, we address the problem of detecting unpatched code clones at scale. When buggy code gets copied from project to project, eventually all projects will need to be patched. We call clones of buggy code that have been fixed in only a subset of projects unpatched code clones. Unfortunately, code copying is usually ad-hoc and is often not
منابع مشابه
Trust Relationship Modeling for Software Assurance
Software assurance, as practiced through the Common Criteria, is a mixture of processes, heuristics, and lessons learned from earlier failures. At the other end of the spectrum, formal methods establish rigorous mathematical properties of portions of code. By themselves, neither of these practices are scalable to software systems with millions or billions of lines of code. We propose a framewor...
متن کاملA Mutual Authentication Method for Internet of Things
Today, we are witnessing the expansion of various Internet of Things (IoT) applications and services such as surveillance and health. These services are delivered to users via smart devices anywhere and anytime. Forecasts show that the IoT, which is controlled online in the user environment, will reach 25 billion devices worldwide by 2020. Data security is one of the main concerns in the IoT. ...
متن کاملMapping of McGraw Cycle to RUP Methodology for Secure Software Developing
Designing a secure software is one of the major phases in developing a robust software. The McGraw life cycle, as one of the well-known software security development approaches, implements different touch points as a collection of software security practices. Each touch point includes explicit instructions for applying security in terms of design, coding, measurement, and maintenance of softwar...
متن کاملTowards More Effective Virus Detectors
Viruses (or malware) are a scourge, with potentially unlimited fraudulent uses. Smart viruses can hide, mutate and disable detection methods. Computers are an important part of everyday life to many people across the world. The Internet has revolutionized everyday life. The Internet has also brought an ugly side of computers: a plethora of malware. Home computers are most vulnerable to attacks ...
متن کاملDyVSoR: dynamic malware detection based on extracting patterns from value sets of registers
To control the exponential growth of malware files, security analysts pursue dynamic approaches that automatically identify and analyze malicious software samples. Obfuscation and polymorphism employed by malwares make it difficult for signature-based systems to detect sophisticated malware files. The dynamic analysis or run-time behavior provides a better technique to identify the threat. In t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015