Table Servers: Protecting Confidentiality in Tabular Data Releases

Authors

  • Alan F. Karr
  • Adrian Dobra
  • Ashish P. Sanil
Abstract

Introduction. Federal statistical agencies must balance concern over the confidentiality of data with their obligation to report information to the public [5]. Advances in information technology threaten confidentiality, but new technologies can also protect confidentiality while meeting user needs in innovative ways. Here we describe table servers being developed by the National Institute of Statistical Sciences (NISS) that disseminate tabular summaries of statistical data in response to user queries for marginal sub-tables of a large (e.g., 40 dimensions with 4 categories each) contingency table containing counts or sums. Table servers evaluate disclosure risk dynamically, in light of previously answered queries.

Abstractions. The query space Q, which contains all 2^K sub-tables of a K-way table, is partially ordered by set inclusion of the variables in sub-tables. The set R(t) of all tables released through some time t contains direct releases in response to queries and indirect releases (previously unreleased children of direct releases); R(t) is specified by the released frontier RF(t) of its maximal elements (Figure 2). Underlying dynamic release decisions is a risk criterion RC defined on subsets of Q: at all times the system must satisfy RC(R(t)) ≤ α, where α is a risk threshold set by the operators. A typical risk criterion is the accuracy of bounds, based on R(t), for sensitive (small-count) cells in the full table. Bounds can be computed using network methods [3, 9] and the “shuttle algorithm” [2]. There are also exact techniques for special cases. For example, if the released sub-tables constitute the minimal sufficient statistics of a decomposable graphical model [8] (Figure 1), then bounds can be expressed as explicit functions of these sub-tables [4]. Whenever an answered query releases previously unreleased information, other queries become unanswerable.
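As a rough illustration of the query space and the released frontier, the sketch below represents each sub-table by the set of variables it retains. The variable names A to D, the function names, and the toy list of direct releases are all hypothetical and do not come from the NISS system; explicit enumeration of R(t) is only feasible for small K.

```python
from itertools import combinations

def released_frontier(direct_releases):
    """Maximal elements (by set inclusion) among the directly released tables."""
    tables = {frozenset(t) for t in direct_releases}
    return {t for t in tables if not any(t < other for other in tables)}

def is_released(table, frontier):
    """A sub-table is in R(t) if its variables are contained in some frontier table."""
    table = frozenset(table)
    return any(table <= f for f in frontier)

def released_set(frontier):
    """Enumerate R(t) explicitly: every subset of every frontier table (small K only)."""
    out = set()
    for f in frontier:
        items = sorted(f)
        for r in range(len(items) + 1):
            out.update(frozenset(c) for c in combinations(items, r))
    return out

# Hypothetical 4-way table on variables A, B, C, D: the query space has 2**4 sub-tables.
K = 4
query_space_size = 2 ** K

# Hypothetical direct releases; {"A"} and {"B"} are children of {"A", "B"}.
direct = [{"A", "B"}, {"B", "C"}, {"B"}, {"A"}]
frontier = released_frontier(direct)
R = released_set(frontier)
```

Here the frontier reduces to the two tables on {A, B} and {B, C}, and R(t) also contains their children (the one-way margins and the grand total), which corresponds to the indirect releases described above.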
Consequently (Figure 2), at time t there is an unreleasable set U(t) of sub-tables whose release would be too risky, with an unreleasable frontier UF(t) of its minimal elements. Release rules determine which requests for unreleased tables will be fulfilled. The simplest is the myopic rule of releasing T at t as long as RC(R(t) ∪ {T}) ≤ α. To prevent the table server from taking excessively large steps, one can restrict eligibility for release to tables that add only one variable to a previously released table. To prevent a single user (or a set of colluding users) from driving the table server into a region of Q that suits their needs but not those of other users, release rules can be biased against releases that add large numbers of tables to U(t). Rules can also incorporate the value of releasing T [6, 12].
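The myopic rule can be sketched as follows, under strong simplifying assumptions. The risk criterion here is a stand-in: it counts sensitive cells whose bound width falls below a hypothetical `min_width`, using a crude upper bound (the smallest released marginal count containing the cell) and a lower bound of 0. The network methods and the shuttle algorithm cited above give sharper bounds and are not implemented here. The toy 2×2×2 table, the function names, and the parameter values are all illustrative assumptions.

```python
def marginal(full, variables):
    """Sum the full table over every variable not listed in `variables`."""
    out = {}
    for cell, count in full.items():
        key = tuple(cell[v] for v in variables)
        out[key] = out.get(key, 0) + count
    return out

def upper_bound(cell, margins):
    """Crude upper bound: the smallest released marginal count containing `cell`."""
    return min(m[tuple(cell[v] for v in vs)] for vs, m in margins.items())

def risk(full, released, sensitive, min_width):
    """Assumed risk criterion: number of sensitive cells with bound width < min_width.
    The lower bound is taken as 0, so the width equals the upper bound."""
    margins = {vs: marginal(full, vs) for vs in released}
    return sum(1 for cell in sensitive if upper_bound(cell, margins) < min_width)

def myopic_release(full, released, request, sensitive, min_width, alpha):
    """Release `request` only if the risk of R(t) plus the request stays within alpha."""
    candidate = released | {request}
    if risk(full, candidate, sensitive, min_width) <= alpha:
        return candidate, True
    return released, False

# Hypothetical 2x2x2 table of counts, indexed by (x0, x1, x2).
full = {
    (0, 0, 0): 1,  (0, 0, 1): 20, (0, 1, 0): 15, (0, 1, 1): 30,
    (1, 0, 0): 25, (1, 0, 1): 10, (1, 1, 0): 40, (1, 1, 1): 35,
}
sensitive = [cell for cell, n in full.items() if n <= 2]  # small-count cells
released = {()}  # start with only the grand total released
decisions = []
for request in [(0,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]:
    released, ok = myopic_release(full, released, request, sensitive,
                                  min_width=20, alpha=0)
    decisions.append(ok)
```

In this toy run the requests for the margins on (0,), (0, 1), and (1, 2) are granted, while (0, 2) and the full table (0, 1, 2) are refused because they would push the upper bound on the sensitive cell below the assumed width.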

Similar resources

Optimal Tabular Releases from Confidential Data

We describe and illustrate NISS-developed optimal tabular release technology, which releases sets of sub-tables of large contingency tables that maximize data utility (in our examples, the number of sub-tables released) subject to a constraint on disclosure risk (tightness of bounds on small-count, risky cells in the underlying table). This approach explicitly accommodates the mandate of Federa...

Software Systems for Tabular Data Releases

We describe two classes of software systems that release tabular summaries of an underlying database. Table servers respond to user queries for (marginal) sub-tables of the “full” table summarizing the entire database, and are characterized by dynamic assessment of disclosure risk, in light of previously answered queries. Optimal tabular releases are static releases of sets of sub-tables that a...

Cyclic Perturbation: Protecting Confidentiality While Preserving Data Utility in Tabular Data

When disseminating data on individuals, information organizations must balance the interests of data users in better access and the interests of data providers in confidentiality. Cyclic perturbation is a new method for protecting sensitive data in categorical tables. In the disseminated data product, the true table values are altered in a way that preserves the table’s marginal totals. Further...

The Bureau of Transportation Statistics’ Statistical Disclosure Limitation Method for Tabular Data: a Review

Overview The United States Department of Transportation’s Bureau of Transportation Statistics (BTS) is developing its confidentiality policy which is based on its legislative mandate (49 U.S.C. 111(i)) to protect individually identifiable information. Because the field of statistical disclosure limitation (SDL) research is still evolving, BTS wants to take advantage of the latest SDL research i...

Publication date: 2001