Scaling Software Security Analysis to Millions of Malicious Programs and Billions of Lines of Code

نویسنده

Jiyong Jang

چکیده

Software security is a big data problem. The volume of new software artifacts created far outpaces the current capacity of software analysis. This gap has brought an urgent challenge to our security community—scalability. If our techniques cannot cope with an ever increasing volume of software, we will always be one step behind attackers. Thus developing scalable analysis to bridge the gap is essential. In this dissertation, we argue that automatic code reuse detection enables an efficient data reduction of a high volume of incoming malware for downstream analysis and enhances software security by efficiently finding known vulnerabilities across large code bases. In order to demonstrate the benefits of automatic software similarity detection, we discuss two representative problems that are remedied by scalable analysis: malware triage and unpatched code clone detection. First, we tackle the onslaught of malware. Although over one million new malware are reported each day, existing research shows that most malware are not written from scratch; instead, they are automatically generated variants of existing malware. When groups of highly similar variants are clustered together, new malware more easily stands out. Unfortunately, current systems struggle with handling this high volume of malware. We scale clustering using feature hashing and perform semantic analysis using co-clustering. Our evaluation demonstrates that these techniques are an order of magnitude faster than previous systems and automatically discover highly correlated features and malware groups. Furthermore, we design algorithms to infer evolutionary relationships among malware, which helps analysts understand trends over time and make informed decisions about which malware to analyze first. Second, we address the problem of detecting unpatched code clones at scale. When buggy code gets copied from project to project, eventually all projects will need to be patched. We call clones of buggy code that have been fixed in only a subset of projects unpatched code clones. Unfortunately, code copying is usually ad-hoc and is often not

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Trust Relationship Modeling for Software Assurance

Software assurance, as practiced through the Common Criteria, is a mixture of processes, heuristics, and lessons learned from earlier failures. At the other end of the spectrum, formal methods establish rigorous mathematical properties of portions of code. By themselves, neither of these practices are scalable to software systems with millions or billions of lines of code. We propose a framewor...

متن کامل

A Mutual Authentication Method for Internet of Things

Today, we are witnessing the expansion of various Internet of Things (IoT) applications and services such as surveillance and health. These services are delivered to users via smart devices anywhere and anytime. Forecasts show that the IoT, which is controlled online in the user environment, will reach 25 billion devices worldwide by 2020. Data security is one of the main concerns in the IoT. ...

متن کامل

Mapping of McGraw Cycle to RUP Methodology for Secure Software Developing

Designing a secure software is one of the major phases in developing a robust software. The McGraw life cycle, as one of the well-known software security development approaches, implements different touch points as a collection of software security practices. Each touch point includes explicit instructions for applying security in terms of design, coding, measurement, and maintenance of softwar...

متن کامل

Towards More Effective Virus Detectors

Viruses (or malware) are a scourge, with potentially unlimited fraudulent uses. Smart viruses can hide, mutate and disable detection methods. Computers are an important part of everyday life to many people across the world. The Internet has revolutionized everyday life. The Internet has also brought an ugly side of computers: a plethora of malware. Home computers are most vulnerable to attacks ...

متن کامل

DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers

To control the exponential growth of malware files, security analysts pursue dynamic approaches that automatically identify and analyze malicious software samples. Obfuscation and polymorphism employed by malwares make it difficult for signature-based systems to detect sophisticated malware files. The dynamic analysis or run-time behavior provides a better technique to identify the threat. In t...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

Scaling Software Security Analysis to Millions of Malicious Programs and Billions of Lines of Code

نویسنده

چکیده

منابع مشابه

Trust Relationship Modeling for Software Assurance

A Mutual Authentication Method for Internet of Things

Mapping of McGraw Cycle to RUP Methodology for Secure Software Developing

Towards More Effective Virus Detectors

DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers

عنوان ژورنال:

اشتراک گذاری