Composition of privacy mechanisms

This is the third post of my data privacy series. Please refer to the first post for the definition of differential privacy and the second post for privacy mechanism.

Robustness of differential privacy versus processing

If a mechanism K is \epsilon-differential private to begin with, then further operation on K is still \epsilon-differential private.

Proof:

Let’s assume the operation f(\cdot) applying on the output of K is stochastic as deterministic operation is just a special case. Then, we have

(1)   \begin{align*} Pr(F(K(D)) \in S ) & = \sum_f Pr(F(\cdot)=f) Pr(f(K(D)) \in S) \\ & = \sum_f Pr(F(\cdot)=f) Pr(K(D) \in f^{-1}(S)) \\ & \overset{(a)}{\le} \sum_f Pr(F(\cdot)=f) e^\epsilon Pr(K(D') \in f^{-1}(S)) \\ & = \exp(\epsilon) \sum_f Pr(F(\cdot)=f) Pr(f(K(D')) \in S) \\ & = \exp(\epsilon) Pr(F(K(D')) \in S), \end{align*}

where D and D' are “adjacent” data sets and (a) is due to the fact that K(\cdot) is \epsilon-differential private. Therefore, F(K(\cdot)) is \epsilon-differential private.

Actually, the above property should be apparent from the beginning. Because if this is not satisfied, the definition of privacy is meaningless since the “private” information could disclose unauthorized information simply through some malicious processing.

On the other hand, if multiple mechanisms are sequentially explored, we expect that the privacy of the overall mechanism (as a sequential combination of the multiple access) should reduce. We have this very simple lemma as follows.

Lemma (Sequential Composition):

If we have two mechanisms K_1 and K_2 are \epsilon_1– and \epsilon_2-differential private, respectively, the overall mechanism is \epsilon_1+\epsilon_2-differential private provided that the two mechanisms are independent.

Proof:

(2)   \begin{align*} & Pr((K_1(D), K_2(D)) \in (S_1,S_2)) \nonumber \\ =& Pr(K_1(D) \in S_1) Pr(K_2(D)\in S_2) \\ \le& e^{\epsilon_1+\epsilon_2} Pr(K_1(D') \in S_1) Pr(K_2(D')\in S_2) \\ =& e^{\epsilon_1+\epsilon_2} Pr((K_1(D'), K_2(D')) \in (S_1,S_2)), \end{align*}

where D and D' are “adjacent” data sets.

One can of course generalize the above lemma to combination of many mechanisms by applying it repeatedly.

We can also compose multiple mechanisms in parallel where each member mechanism can only access a disjoint portion of the data set. Then, we have the following lemma.

Lemma (Parallel Composition):

If we have two mechanisms K_1 and K_2 applying on disjoint subsets of a data set and K_1 and K_2 are \epsilon-differential private, then the overall mechanism is \epsilon-differential private provided that the member mechanisms are independent.

Proof:

Let D=(D_1,D_2) and D'=(D'_1,D'_2) are “adjacent” data sets. Since they are adjacent, we cannot have both D_1\neq D'_1 and D_2 \neq D'_2. Without loss of generality, assume D_1 \neq D'_1 and D_2 = D'_2. Then,

(3)   \begin{align*} & Pr((K_1(D_1), K_2(D_2)) \in (S_1,S_2)) \nonumber \\ =& Pr(K_1(D_1) \in S_1) Pr(K_2(D_2)\in S_2) \\ \le& \left( e^{\epsilon} Pr(K_1(D'_1) \in S_1) \right) Pr(K_2(D'_2)\in S_2) \\ =& e^{\epsilon} Pr((K_1(D'_1), K_2(D'_2)) \in (S_1,S_2)), \end{align*}

Again, the parallel composition lemma can obviously be generalized to the case with more than two member mechanisms.

Example 1 (Disclosing gender population sizes of a group):

Consider that we have a data set containing only the gender attribute and we disclose the population sizes of each gender. To ensure the data set is differentially private, one can add laplacian noise to each population size before disclosing. Let K_1(D) and K_2(D) denote the mechanism of outputing the population for the males and females, respectively. If each mechanism is \epsilon-differential private, the combined mechanism is also \epsilon-differential private. Note that we have parallel composition here since the gender populations are disjoint.

Example 2 (Disclosing disease population sizes of a group):

Consider a similar setting as the previous example but we try to disclose the population sizes of a number of diseases, e.g., AIDS, SARS, lung cancer, etc. Again, we add laplacian noise to each population size before disclosing such that each disclosing mechanism is \epsilon-differentially private. Let say we consider n different diseases here and we will disclose the population size for each of them. The overall mechanism in this case will be n \epsilon-differentially private instead of \epsilon-differentially private. Note that we have sequential composition here rather than parallel composition since the diseases populations are generally not disjoint (e.g., one can have both AIDS and lung cancer).

Leave a Reply

Your email address will not be published. Required fields are marked *