Building an anonymized dataset is widely recognized as a hard problem. Pseudonymizing identifiers is not sufficient: the pseudonymized records can be cross-referenced with publicly available data, and the pseudonyms can eventually be re-identified.
It is nevertheless important to be able to learn useful information from a database, but not at the cost of the privacy of the people recorded in it.
This is where a differential-privacy-preserving database becomes useful.
The database client only supports count aggregation queries of the form:
SELECT COUNT(*)
FROM db
WHERE movie = [queried movie name]
AND rating >= [queried rating level]
This is an example of how you may give instructions:
querier = DpQuerier("imdb-dp.csv", privacy_budget=1)
count = querier.get_count("Seven Samurai", rating_threshold=3, epsilon=0.25)
Some libraries are required:
- numpy
- termcolor
- scipy
- attr
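As a rough illustration of how a budgeted count query like the one in the usage example could be answered, the sketch below adds Laplace noise to the true count and deducts the spent epsilon from a total privacy budget. It is a minimal sketch under stated assumptions, not this project's implementation: the class name `SketchQuerier`, the in-memory `rows` list, the `seed` parameter and the `BudgetExhausted` exception are all hypothetical, and the real `DpQuerier` may use a different mechanism. It also assumes each person contributes at most one row per movie, so that a count has sensitivity 1.

```python
import numpy as np


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class SketchQuerier:
    """Hypothetical stand-in for DpQuerier (assumption, not the project's code)."""

    def __init__(self, rows, privacy_budget, seed=None):
        # rows: iterable of (movie, rating) pairs, assumed one row per person per movie
        self.rows = list(rows)
        self.remaining = float(privacy_budget)
        self.rng = np.random.default_rng(seed)

    def get_count(self, movie, rating_threshold, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExhausted("not enough privacy budget left")
        # Sequential composition: each answered query spends its epsilon.
        self.remaining -= epsilon
        true_count = sum(
            1 for m, r in self.rows if m == movie and r >= rating_threshold
        )
        # Adding or removing one row changes a count by at most 1,
        # so the sensitivity is 1 and the Laplace scale is 1 / epsilon.
        noise = self.rng.laplace(loc=0.0, scale=1.0 / epsilon)
        return float(true_count + noise)


# Toy data, invented for illustration only.
rows = [
    ("Seven Samurai", 5),
    ("Seven Samurai", 2),
    ("Seven Samurai", 4),
    ("Ikiru", 5),
]
querier = SketchQuerier(rows, privacy_budget=1, seed=0)
noisy_count = querier.get_count("Seven Samurai", rating_threshold=3, epsilon=0.25)
print(noisy_count, querier.remaining)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy per query, but a less accurate count. With a total budget of 1, four queries at `epsilon=0.25` can be answered before the sketch refuses further queries.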
Distributed under the MIT License. See LICENSE for more information.
Raphaël Reis Nunes - raphael.reisnunes@epfl.ch
Project Link
This project is a homework assignment for the EPFL course COM-402.