Statistical Database Modeling for Privacy Preserving Database Generation
Abstract
Testing database applications is of great importance. Although various studies have investigated testing techniques for database design, relatively few efforts have explicitly addressed the testing of database applications, which requires a large amount of representative data. Because testing over live production databases is often infeasible due to the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating a synthetic database based on a priori knowledge about a production database. Our approach is to fit a general location model using various characteristics (e.g., constraints, statistics, rules) extracted from the production database and then to generate synthetic data using the learnt model. Because the extracted characteristics may contain information that an attacker could use to derive confidential information, we present a disclosure analysis method based on the cell suppression technique. Our method effectively and efficiently removes aggregate private information during data generation.
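The abstract describes a two-stage pipeline: fit a general location model (a multinomial distribution over the cells formed by categorical attributes, plus a multivariate normal for the continuous attributes with cell-specific means and a covariance shared by all cells) to characteristics extracted from the production database, then sample synthetic rows from it, with cell suppression limiting what the released characteristics can reveal. The sketch below illustrates that idea under stated assumptions only. The characteristic values, the names `CELL_COUNTS`, `CELL_MEANS`, `SHARED_COV`, `suppress_small_cells`, `cholesky`, and `generate`, and the minimum-count suppression rule are all hypothetical. In particular, the threshold rule is a plain primary-suppression stand-in, not the paper's disclosure analysis method.

```python
import math
import random
from collections import Counter

# Hypothetical characteristics "extracted" from a production database:
# row counts per categorical cell (sex, region), per-cell means of two
# continuous attributes (salary, age), and one covariance matrix shared by
# all cells, as the general location model assumes. Values are made up.
CELL_COUNTS = {("F", "north"): 120, ("F", "south"): 3,
               ("M", "north"): 95, ("M", "south"): 82}
CELL_MEANS = {("F", "north"): [52000.0, 34.0], ("F", "south"): [61000.0, 45.0],
              ("M", "north"): [50000.0, 31.0], ("M", "south"): [48000.0, 29.0]}
SHARED_COV = [[4.0e7, 1.2e4], [1.2e4, 60.0]]


def suppress_small_cells(counts, threshold):
    """Assumed primary suppression rule: drop cells whose count is below `threshold`."""
    return {cell: n for cell, n in counts.items() if n >= threshold}


def cholesky(matrix):
    """Return lower-triangular L with L @ L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def generate(counts, means, cov, n_rows, seed=0):
    """Sample synthetic rows from a general location model built from the inputs."""
    rng = random.Random(seed)
    cells = list(counts)
    total = sum(counts.values())
    weights = [counts[c] / total for c in cells]  # multinomial cell probabilities
    lower = cholesky(cov)
    rows = []
    for _ in range(n_rows):
        cell = rng.choices(cells, weights=weights)[0]  # categorical part
        z = [rng.gauss(0.0, 1.0) for _ in cov]
        # Continuous part: the chosen cell's mean plus correlated noise L @ z.
        x = [means[cell][i] + sum(lower[i][k] * z[k] for k in range(i + 1))
             for i in range(len(cov))]
        rows.append((*cell, *x))
    return rows


if __name__ == "__main__":
    released = suppress_small_cells(CELL_COUNTS, threshold=5)
    synthetic = generate(released, CELL_MEANS, SHARED_COV, n_rows=1000)
    print(Counter(row[:2] for row in synthetic))
```

With a threshold of 5, the ("F", "south") cell (3 rows) is suppressed, so no synthetic rows are drawn from it and its mean is never exposed through the generated data. A real disclosure analysis would also need complementary suppression, so that a suppressed cell cannot be recovered from published marginal totals; the sketch omits this.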
Similar sources
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about publishing social network graph data have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node, assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
Separating indexes from data: a distributed scheme for secure database outsourcing
Database outsourcing aims to relieve organizations of the burden of database management. Since data is a critical asset of organizations, its privacy should be preserved against outside adversaries and untrusted servers. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...
Privacy Preserving Data Generation for Database Application Performance Testing
Synthetic data plays an important role in software testing. In this paper, we initiate the study of synthetic data generation models for the purpose of application software performance testing. In particular, we discuss models for protecting privacy in synthetic data generation. Within this model, we investigate the feasibility and techniques for privacy preserving synthetic database gene...
A Survey on Preserving Privacy for Sensitive Association Rules in Databases
Privacy preserving data mining (PPDM) is a novel research area that aims to protect sensitive knowledge from disclosure. Many researchers in this area have recently made efforts to preserve the privacy of sensitive knowledge in statistical databases. In this paper, we present a detailed overview and classification of approaches that have been applied to knowledge hiding in the context of asso...
Privacy Preserving Data Mining Using Additive Perturbation on Relational Streaming Data
Data mining is concerned with extracting the important data required from a database and ignoring the rest. With the success of data mining, privacy preservation has also acquired great importance. The newer concept of privacy preserving data mining (PPDM) is concerned with preserving the privacy of sensitive individual data. In this paper, privacy of sensitive attribute data concerned with individual...
Publication date: 2005