This is the third post of my data privacy series. Please refer to the first post for the definition of differential privacy and the second post for privacy mechanisms.
Robustness of differential privacy against post-processing
If a mechanism $M$ is $\epsilon$-differentially private to begin with, then any further operation $f$ on the output of $M$ is still $\epsilon$-differentially private.
Proof:
Let's assume the operation $f$ applied to the output of $M$ is stochastic, as a deterministic operation is just a special case. Then, for any set of outcomes $S$, we have

$\Pr[f(M(D)) \in S] = \sum_z \Pr[f(z) \in S] \Pr[M(D) = z] \le \sum_z \Pr[f(z) \in S] \, e^{\epsilon} \Pr[M(D') = z] = e^{\epsilon} \Pr[f(M(D')) \in S]$ (1)

where $D$ and $D'$ are "adjacent" data sets and the inequality is due to the fact that $M$ is $\epsilon$-differentially private. Therefore, $f \circ M$ is $\epsilon$-differentially private.
Actually, the above property should be apparent from the beginning: if it did not hold, the definition of privacy would be meaningless, since the "private" information could disclose unauthorized information simply through some malicious post-processing.
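To make the robustness property concrete, here is a minimal Python sketch (the function names and parameters are illustrative, not from any particular library): a count is released through a Laplace mechanism, and any post-processing of the released value, such as rounding and clamping, keeps exactly the same privacy guarantee.

```python
import random

def laplace_noise(scale):
    # A Laplace(scale) sample is the difference of two i.i.d.
    # exponential samples with mean `scale`.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def laplace_mechanism(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity/epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon)

def post_process(released):
    """Any function of the released value -- here, clamping to a
    non-negative integer. Post-processing never touches the raw data,
    so it cannot weaken the epsilon guarantee of the release."""
    return max(0, round(released))

noisy = laplace_mechanism(42, epsilon=0.5)   # 0.5-differentially private
cleaned = post_process(noisy)                # still 0.5-differentially private
```

Note that `post_process` operates only on the already-released value, never on the underlying data set, which is exactly why the guarantee is preserved.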
On the other hand, if multiple mechanisms are applied sequentially, we expect the privacy of the overall mechanism (the sequential combination of the multiple accesses) to degrade. We have this very simple lemma as follows.
Lemma (Sequential Composition):
If two mechanisms $M_1$ and $M_2$ are $\epsilon_1$- and $\epsilon_2$-differentially private, respectively, then the overall mechanism $(M_1, M_2)$ is $(\epsilon_1 + \epsilon_2)$-differentially private, provided that the two mechanisms are independent.
Proof:

$\Pr[(M_1(D), M_2(D)) = (z_1, z_2)] = \Pr[M_1(D) = z_1] \Pr[M_2(D) = z_2] \le e^{\epsilon_1} \Pr[M_1(D') = z_1] \, e^{\epsilon_2} \Pr[M_2(D') = z_2] = e^{\epsilon_1 + \epsilon_2} \Pr[(M_1(D'), M_2(D')) = (z_1, z_2)]$ (2)

where $D$ and $D'$ are "adjacent" data sets.
One can of course generalize the above lemma to a combination of many mechanisms by applying it repeatedly.
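As a small sketch of how sequential composition is used as a "privacy budget" in practice (the helper names here are my own, not a standard API), the costs of independent releases over the same data set simply add up:

```python
import random

def laplace_noise(scale):
    # Laplace sample as the difference of two i.i.d. exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon):
    """An epsilon-differentially private count release (sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon)

# Two independent queries against the SAME data set: by sequential
# composition, the overall mechanism costs epsilon_1 + epsilon_2.
true_count = 100
epsilons = [0.3, 0.7]
releases = [noisy_count(true_count, eps) for eps in epsilons]
total_epsilon = sum(epsilons)  # the combined release is 1.0-DP
```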
We can also compose multiple mechanisms in parallel where each member mechanism can only access a disjoint portion of the data set. Then, we have the following lemma.
Lemma (Parallel Composition):
If two mechanisms $M_1$ and $M_2$ apply to disjoint subsets $D_1$ and $D_2$ of a data set and both are $\epsilon$-differentially private, then the overall mechanism is $\epsilon$-differentially private, provided that the member mechanisms are independent.
Proof:
Let $D$ and $D'$ be "adjacent" data sets, and write $D_1, D_2$ and $D_1', D_2'$ for their respective disjoint subsets. Since $D$ and $D'$ differ in only one record, we cannot have both $D_1 \neq D_1'$ and $D_2 \neq D_2'$. Without loss of generality, assume $D_1 \neq D_1'$ and $D_2 = D_2'$. Then,

$\Pr[(M_1(D_1), M_2(D_2)) = (z_1, z_2)] = \Pr[M_1(D_1) = z_1] \Pr[M_2(D_2) = z_2] \le e^{\epsilon} \Pr[M_1(D_1') = z_1] \Pr[M_2(D_2') = z_2] = e^{\epsilon} \Pr[(M_1(D_1'), M_2(D_2')) = (z_1, z_2)]$ (3)
Again, the parallel composition lemma can obviously be generalized to the case with more than two member mechanisms.
Example 1 (Disclosing gender population sizes of a group):
Consider a data set containing only the gender attribute, and suppose we disclose the population size of each gender. To make the disclosure differentially private, one can add Laplace noise to each population size before releasing it. Let $M_m$ and $M_f$ denote the mechanisms outputting the population sizes for the males and females, respectively. If each mechanism is $\epsilon$-differentially private, the combined mechanism is also $\epsilon$-differentially private. Note that we have parallel composition here since the gender populations are disjoint.
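A sketch of this example in Python (the toy data and names are hypothetical): because every record feeds exactly one of the two counts, releasing both noisy counts costs $\epsilon$ in total, not $2\epsilon$.

```python
import random

def laplace_noise(scale):
    # Laplace sample as the difference of two i.i.d. exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

genders = ["M", "F", "F", "M", "M", "F", "F"]  # toy data set
epsilon = 0.5

def noisy_gender_count(records, value, epsilon):
    """epsilon-DP count of records equal to `value` (sensitivity 1)."""
    true_count = sum(1 for g in records if g == value)
    return true_count + laplace_noise(1.0 / epsilon)

# The two counts touch disjoint subsets of the data, so by parallel
# composition the pair of releases is still epsilon-DP overall.
male_count = noisy_gender_count(genders, "M", epsilon)
female_count = noisy_gender_count(genders, "F", epsilon)
```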
Example 2 (Disclosing disease population sizes of a group):
Consider a setting similar to the previous example, but now we disclose the population sizes of a number of diseases, e.g., AIDS, SARS, lung cancer, etc. Again, we add Laplace noise to each population size before disclosing it, such that each disclosure mechanism is $\epsilon$-differentially private. Say we consider $m$ different diseases and disclose the population size for each of them. The overall mechanism in this case will be $m\epsilon$-differentially private instead of $\epsilon$-differentially private. Note that we have sequential composition here rather than parallel composition, since the disease populations are generally not disjoint (e.g., one person can have both AIDS and lung cancer).
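A matching sketch (the records and disease list below are made up for illustration): the per-disease sub-populations overlap, so parallel composition does not apply and each release spends its own share of the budget.

```python
import random

def laplace_noise(scale):
    # Laplace sample as the difference of two i.i.d. exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

# Each person may carry several diseases, so the per-disease
# sub-populations overlap and parallel composition does NOT apply.
records = [{"AIDS"}, {"SARS", "lung cancer"}, {"lung cancer"}, set()]
diseases = ["AIDS", "SARS", "lung cancer"]
epsilon = 0.5  # budget spent per released count

noisy_counts = {
    d: sum(1 for r in records if d in r) + laplace_noise(1.0 / epsilon)
    for d in diseases
}

# Sequential composition: m releases at epsilon each cost m * epsilon.
total_epsilon = len(diseases) * epsilon
```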