Question 3
Consider the restricted Adult dataset “adult-rest.csv.” The dataset contains only the Race, Sex and Salary attributes. You have been tasked to release a differentially private variant of this dataset using the Stability-based Histogram algorithm (see Appendix). You choose = 1 and δ = 10−6. Let D be the initial dataset and let D0 be the differentially private dataset.
(a) Implement the algorithm in Python to obtain an output dataset D0.
(b) What is the “threshold” in Step 7 of the algorithm (see Appendix).
(c) Which points (rows) from D do not appear in D0 and why?
(d) For any point query qx, where x is a point in the domain, let |qx(D)−qx(D0)| be the absolute error between the original (true) answer from D and the noisy answer from D0. Which point in the domain has the largest error, and what is the error?
Appendix
4.1 Jaccard Index
Let A and B be two sets. Then their Jaccard index J is defined as: J(A,B) = |A∩B| |A∪B| , if A∪B is not empty, and J(A,B) = 0, if A∪B is empty. For example, if A = {1,2,3} and B = {2,3,4}, then: J(A,B) = 2 4 = 1 2 . Note that for any two sets A and B, 0 ≤ J(A,B) ≤ 1.
4.2 Stability-based Histogram Algorithm
This algorithm is taken from [BNS16, Vad17, BV18].
Algorithm 1: Stability-based Histogram Input: Domain X, dataset D ∈Xn, parameters and δ. 1 Randomly shuffle the rows of D. 2 Initialize Dout ←∅. 3 for each distinct point x ∈ D do 4 if qx(D) = 0 then set ax = 0. 5 if qx(D) > 0 then 6 Set ax ← qx(D) + Lap XXXXXXXXXXif ax <>δ/ + 1 then8 Set ax ← 0. 9 else 10 Set ax ← round(ax). 11 if ax > 0 then append ax copies of x to Dout. 12 Output Dout.
Students succeed in their courses by connecting and communicating with an expert until they receive help on their questions
Consult our trusted tutors.