Towards a private vector space model for confidential documents

Abstract

We introduce a method to anonymize document vector spaces. The anonymized vector spaces can be used to analyze confidential documents without disclosing private information. The method is inspired by microaggregation, a popular technique in statistical disclosure control. URL: http://doi.acm.org/10.1145/2480362.2480543
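To make the idea of microaggregation concrete, the following is a minimal sketch of generic fixed-size microaggregation applied to document vectors. It is not the method of the paper, whose details are not given in the abstract. The function names `centroid` and `microaggregate`, the use of Euclidean distance, and the grouping heuristic (seed each group with the vector farthest from the centroid of the remaining vectors) are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def microaggregate(vectors, k):
    """Replace each vector with the centroid of a group of at least k vectors.

    Hypothetical helper: a simplified grouping heuristic, assumed here
    for illustration only.
    """
    if k < 1 or len(vectors) < k:
        raise ValueError("need k >= 1 and at least k vectors")
    remaining = list(range(len(vectors)))
    protected = [None] * len(vectors)
    while remaining:
        if len(remaining) < 2 * k:
            # Too few vectors left for two groups: the last group takes all.
            group = remaining
        else:
            # Seed the group with the vector farthest from the overall
            # centroid, then add its k - 1 nearest neighbours.
            center = centroid([vectors[i] for i in remaining])
            seed = max(remaining, key=lambda i: dist(vectors[i], center))
            by_distance = sorted(
                remaining, key=lambda i: dist(vectors[i], vectors[seed])
            )
            group = by_distance[:k]
        representative = centroid([vectors[i] for i in group])
        for i in group:
            protected[i] = representative
        remaining = [i for i in remaining if i not in group]
    return protected


# Toy term-weight vectors for five documents (made-up values).
docs = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [3.0, 0.0, 0.0],
]
anonymized = microaggregate(docs, k=2)
```

In the output, every published vector is shared by at least k documents, so no single document can be told apart from the others in its group by its vector alone. Larger values of k give stronger protection at the cost of more distortion.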

Similar resources

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

Spherical microaggregation: Anonymizing sparse vector spaces

Unstructured texts are a very popular data type and still widely unexplored in the privacy preserving data mining field. We consider the problem of providing public information about a set of confidential documents. To that end we have developed a method to protect a Vector Space Model (VSM), to make it public even if the documents it represents are private. This method is inspired by ...

Dynamic Anonymous Index for Confidential Data

In this paper we introduce a k-anonymous vector space model that can be used as an index of a set of confidential documents. This model makes it possible to index, for example, encrypted data. Documents can be added or removed while maintaining the k-anonymity property of the vector space. URL: http://dx.doi.org/10.1007/978-3-642-54568-9_23

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid increase in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...

Publication date: 2017