Private Topic Modeling

Authors

  • Mijung Park
  • James R. Foulds
  • Kamalika Chaudhuri
  • Max Welling
Abstract

We develop a privatised stochastic variational inference method for Latent Dirichlet Allocation (LDA). The iterative nature of stochastic variational inference presents a challenge: multiple iterations are required to obtain accurate posterior distributions, yet each iteration increases the amount of noise that must be added to achieve a reasonable degree of privacy. We propose a practical algorithm that overcomes this challenge by combining: (1) a relaxed notion of differential privacy, called concentrated differential privacy, which provides high-probability bounds on cumulative privacy loss rather than focusing on single-query loss, and is therefore well suited to iterative algorithms; and (2) privacy amplification resulting from subsampling of large-scale data. Focusing on conjugate exponential family models, our private variational inference privatises all the posterior distributions simply by perturbing the expected sufficient statistics. Using Wikipedia data, we illustrate the effectiveness of our algorithm on large-scale data.
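The three ingredients named in the abstract (perturbing expected sufficient statistics, amplification by subsampling, and cumulative accounting under a concentrated notion of privacy) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. All function names and parameters are hypothetical, the noise is assumed to be Gaussian, the accounting assumes the zero-concentrated variant of concentrated differential privacy (the abstract does not say which formulation is used), and the amplification formula is the standard bound for a pure epsilon-DP step run on a random subsample with sampling rate q.

```python
import math
import numpy as np


def perturb_expected_stats(stats, sensitivity, sigma, rng):
    """Gaussian mechanism (assumed): add N(0, (sigma * sensitivity)^2) noise per entry."""
    noise = rng.normal(0.0, sigma * sensitivity, size=stats.shape)
    # Clipping at zero is post-processing, so it does not weaken the guarantee.
    return np.maximum(stats + noise, 0.0)


def amplified_epsilon(epsilon, q):
    """Epsilon of an epsilon-DP step when run on a random subsample of rate q."""
    return math.log1p(q * math.expm1(epsilon))


def cumulative_epsilon_zcdp(sigma, steps, delta):
    """Cumulative (epsilon, delta)-DP bound for `steps` Gaussian releases.

    Assumes zero-concentrated DP: each release with noise scale
    sigma * sensitivity costs rho = 1 / (2 * sigma^2), costs add up
    across steps, and rho-zCDP implies
    (rho + 2 * sqrt(rho * log(1 / delta)), delta)-DP.
    """
    rho = steps / (2.0 * sigma ** 2)
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical expected word-topic counts for one minibatch (topics x words).
    stats = rng.gamma(shape=2.0, scale=1.0, size=(3, 5))
    noisy = perturb_expected_stats(stats, sensitivity=1.0, sigma=4.0, rng=rng)
    print(noisy.shape)
    print(amplified_epsilon(epsilon=1.0, q=0.01))
    print(cumulative_epsilon_zcdp(sigma=4.0, steps=100, delta=1e-5))
```

In this sketch the noisy statistics would replace the exact ones in each stochastic variational update. The sketch does not combine subsampling amplification with the concentrated-privacy accounting, which needs a more careful analysis than these two stand-alone formulas.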


Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...


Distributed LDA-based Topic Modeling and Topic Agglomeration in a Latent Space

We describe the methodology that we followed to automatically extract topics corresponding to known events provided by the SNOW 2014 challenge in the context of the SocialSensor project. A data crawling tool and selected filtering terms were provided to all the teams. The crawled data was to be divided into 96 (15-minute) timeslots spanning a 24-hour period and participants were asked to produce ...


Topic Modeling and Classification of Cyberspace Papers Using Text Mining

Global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...


Private cryptocurrency versus central bank digital money: Evolutionary game theory modeling of the distribution of Seigniorage Shares

When the monopoly of money creation is removed and private money can be exchanged between people, the issue of Seigniorage share will arise, which is currently conceivable with the advent of cryptocurrencies. The question of the present study is that if we are in a situation where private cryptocurrencies along with money are common in the society with the state publisher, what share of the Sei...


Policy-Based Automation of Dynamique and Multipoint Virtual Private Network Simulation on OPNET Modeler

The simulation of large-scale networks is a challenging task, especially if the network to simulate is a Dynamic Multipoint Virtual Private Network, which requires expert knowledge to configure its component technologies properly. Studying these network architectures in a real environment is almost impossible because it requires a very large amount of equipment; however, this task is feasible...


Modeling the operational risk in Iranian commercial banks: case study of a private bank

The Basel Committee on Banking Supervision from the Bank for International Settlement classifies banking risks into three main categories including credit risk, market risk, and operational risk. The focus of this study is on the operational risk measurement in Iranian banks. Therefore, issues arising when trying to implement operational risk models in Iran are discussed, and ...



Journal:
  • CoRR

Volume abs/1609.04120

Publication date: 2016