
Collusion-Resistant Anonymous Data Collection Method

Collusion-Resistant Anonymous Data Collection Method. Mafruz Zaman Ashrafi, See-Kiong Ng, Institute for Infocomm Research, Singapore. Introduction. Quality data is a prerequisite for good data mining results. Collecting good-quality data requires effort and money.


Presentation Transcript


  1. Collusion-Resistant Anonymous Data Collection Method • Mafruz Zaman Ashrafi, See-Kiong Ng • Institute for Infocomm Research, Singapore

  2. Introduction • Quality data is a prerequisite for good data mining results. • Collecting good-quality data requires effort and money. • The Internet is a convenient and low-cost platform for large-scale data collection.

  3. Some Motivating Examples

  4. Corporate Survey • A large organization wishes to poll its employees for sensitive information. • E.g., how satisfied they are with their bosses’ management skills. • Individuals need to rate their bosses. • However, they are afraid of the price to pay for honesty.

  5. Health Information • A drug company wishes to find out the adverse effects of a drug. • E.g., the relationship between the effects of the drug and those of other drugs. • Patients need to disclose all the drugs they are taking. • However, disclosing drug information may reveal their health condition.

  6. Traffic Monitoring • Individual drivers wish to avoid roads with problematic conditions. • E.g., find out the congested road intersections and other bottlenecks. • Individuals need to disclose their GPS information. • However, disclosing GPS information may reveal their current position.

  7. Introduction Cont’d.. • However, collecting data online has its challenges. • Privacy is the number-one concern for online respondents. • Respondents are reluctant to provide truthful information if their privacy is not protected.

  8. Technical Challenges

  9. Objective: Online Data Collection • Two Actors: Data Collector and Respondents • The data collector wants to obtain the responses from a set of respondents. • The respondents submit honest responses only if the data collector is unable to link a particular response to its respondent.

  10. Challenges • How does the data collector guarantee that it is unable to associate a particular response with the corresponding respondent? • How can a collusion attack be mitigated? • How can an honest respondent pull out his response without revealing it to the data collector if he finds a threat to his anonymity? • How can we reduce the computational and communication overhead?

  11. Related Works • 1. Randomized Response • - Respondents’ responses are associated with the result of the toss of a coin. • - Only the respondent knows whether the answer reflects the toss of the coin or his true experience. • Pros: • - A well-known technique. • - Easy to use. • Cons: • - Adds noise to the resulting response set, which can distort the accuracy of the data mining results.
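The noise that randomized response introduces can be illustrated with a minimal Python sketch. It assumes one common variant of the technique, which the slides do not specify: the respondent answers truthfully on heads and otherwise reports the outcome of a second coin toss. The function names `randomized_response`, `estimate_true_rate` and `simulate` are hypothetical.

```python
import random

def randomized_response(truth, rng):
    """Answer truthfully on heads; otherwise report a second coin toss.

    Assumed variant: the slides only say a coin toss is involved.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(observed_yes_rate):
    """Invert P(yes) = 0.5 * pi + 0.25 to recover the true 'yes' rate pi."""
    return 2.0 * observed_yes_rate - 0.5

def simulate(true_rate, n, seed=0):
    """Collect n randomized answers and return the estimated true rate."""
    rng = random.Random(seed)
    answers = [randomized_response(rng.random() < true_rate, rng) for _ in range(n)]
    return estimate_true_rate(sum(answers) / n)
```

The collector never learns any individual's true answer, yet can still estimate the population rate. The estimate is only approximate, which is the accuracy cost listed under Cons.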

  12. Related Works Cont’d… • 2. Cryptographic Techniques • - Respondents employ two sets of keys to encrypt their responses before sending them to the data collector. • - Each respondent in turn strips off a layer of encryption and shuffles the decrypted results. • - All respondents verify the intermediate results before the data collector obtains the actual response set. • Pros: • - A deterministic technique. • - The data mining results are accurate. • Cons: • - Vulnerable to collusion attacks. • - Higher communication overhead.

  13. Building Blocks of Our Approach • ElGamal Crypto • - is an asymmetric public-key encryption scheme. • - is probabilistic. • - achieves semantic security. • - is malleable. • Substitution Cipher • - Replaces a character with another character. • - Example:
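Both building blocks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the prime `P = 2357` and base `G = 2` are toy textbook-sized parameters, and the digit table `SUB` is a made-up permutation.

```python
import secrets

P = 2357   # toy prime (assumption); real use needs a far larger group
G = 2      # base for the toy group (assumption)

def keygen():
    """Return (private key x, public key y = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

def encrypt(m, y, r=None):
    """ElGamal encryption of m (1 <= m < P); fresh randomness r each call."""
    if r is None:
        r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(y, r, P)) % P

def decrypt(c, x):
    """Recover m = c2 / c1^x mod P, using Fermat's little theorem to invert."""
    c1, c2 = c
    shared = pow(c1, x, P)
    return (c2 * pow(shared, P - 2, P)) % P

# Hypothetical digit substitution table and its inverse.
SUB = dict(zip("0123456789", "7204819536"))
SUB_INV = {v: k for k, v in SUB.items()}

def substitute(text, table):
    """Replace every character of text using the given table."""
    return "".join(table[ch] for ch in text)
```

Encrypting the same message with two different values of `r` gives two different ciphertexts, which is the probabilistic property. Multiplying `c2` by a constant `k` makes the ciphertext decrypt to `k * m mod P`, which is the malleability listed on the slide.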

  14. The Hybrid Model • Employs both ElGamal and Substitution Cipher. • Builds an Onion for a response. • Removing the encryption layers (De-Onion) yields the original response. [Figure: An Onion. The original response is wrapped in ElGamal Encryption, then an Onion Layer consisting of a Substitution Cipher and a further ElGamal Encryption.]
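One way to compose the two primitives into an onion is sketched below. The slides do not say how the layers are wired together, so several choices here are assumptions: the substitution is applied to the decimal digits of the inner ciphertext component `c2`, the inner `c1` travels alongside unchanged, and the outer modulus is chosen larger than any 39-digit number so that the substituted value fits as a message. All names and parameters are hypothetical.

```python
import secrets

P_INNER = 2**127 - 1   # toy Mersenne prime, 39 decimal digits (assumption)
P_OUTER = 2**521 - 1   # larger Mersenne prime for the outer layer (assumption)
G = 3                  # base used in both toy groups (assumption)
WIDTH = 39             # digits needed to write any value below P_INNER

def keygen(p):
    x = secrets.randbelow(p - 2) + 1
    return x, pow(G, x, p)

def eg_encrypt(m, y, p):
    r = secrets.randbelow(p - 2) + 1
    return pow(G, r, p), (m * pow(y, r, p)) % p

def eg_decrypt(c, x, p):
    c1, c2 = c
    return (c2 * pow(pow(c1, x, p), p - 2, p)) % p

def make_tables():
    """Random digit permutation and its inverse."""
    digits = list("0123456789")
    shuffled = digits[:]
    secrets.SystemRandom().shuffle(shuffled)
    enc = dict(zip(digits, shuffled))
    return enc, {v: k for k, v in enc.items()}

def substitute(n, table):
    """Substitute each decimal digit of n, zero-padded to WIDTH digits."""
    return int("".join(table[d] for d in str(n).zfill(WIDTH)))

def onion(m, y_in, enc, y_out):
    """ElGamal encrypt, substitute digits, ElGamal encrypt again."""
    c1, c2 = eg_encrypt(m, y_in, P_INNER)
    return c1, eg_encrypt(substitute(c2, enc), y_out, P_OUTER)

def de_onion(layered, x_out, dec, x_in):
    """Undo the three steps in reverse order."""
    c1, outer = layered
    c2 = substitute(eg_decrypt(outer, x_out, P_OUTER), dec)
    return eg_decrypt((c1, c2), x_in, P_INNER)
```

Zero-padding to a fixed width matters: without it, a substituted value with leading zeros would lose digits when converted back to an integer, and the inverse substitution would fail.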

  15. The Hybrid Model Cont’d.. • An example • Onion: 1234567890 (original response) → 9809364789 → 7893456720 → 2901560011 • De-Onion: 2901560011 → 7893456720 → 9809364789 → 1234567890 (original response)

  16. The Protocol

  17. The Protocol • The Protocol has five phases • Data Preparation • Data Submission • Anonymization • Verification • Decryption

  18. Phase I: Data Preparation • Suppose there are 3 respondents (Alice, Bob and Carol). • Bob’s Data Preparation Process [Figure: Bob’s original response 1234 is wrapped in successive layers to produce Bob’s encrypted response dBob. Labels: Bob’s Sec. key, Alice’s Sec. key, Carol’s Sec. key, DM’s Pri. key. Values shown: 8902, 2453, 8091, 7609, 6652, 5436, 7065, 2309, 3905, 1039, 9081, 2098, 8893.]

  19. Phase I: Data Preparation (cont’d..) • Bob also computes a partial intermediate verification code WBob. [Figure: WBob = 6652 4240 7056 bb, with corresponding entries for Alice and Carol.]

  20. Phase II: Data Submission • Each participant submits an encrypted response, i.e. d and W, to the data miner. • The Data Miner • - Computes the verification code ΩC = WBob WAlice WCarol. • - Encrypts ΩC using its secondary key and sends the resulting encrypted value to each participant. • - Shuffles the response set {d1, d2, d3}. • - Sends {d1, d2, d3} to Carol.
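The data miner's two bookkeeping steps in this phase can be sketched as follows. The operator that combines the partial codes is not legible in the slide, so this sketch assumes a product modulo a prime, which is a guess consistent with ElGamal's multiplicative structure. `P`, `combine_codes` and `shuffle_responses` are hypothetical names.

```python
import secrets

P = 2**127 - 1   # toy prime modulus (assumption)

def combine_codes(partial_codes):
    """Combine the partial codes W into one value (assumed: product mod P)."""
    omega = 1
    for w in partial_codes:
        omega = (omega * w) % P
    return omega

def shuffle_responses(responses):
    """Return a randomly permuted copy of the encrypted responses."""
    shuffled = list(responses)
    secrets.SystemRandom().shuffle(shuffled)
    return shuffled
```

Under this assumption the combined code does not depend on the order in which the partial codes arrive, so it stays checkable however the responses themselves are later shuffled.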

  21. Phase III: Anonymization • Carol “de-onions” one layer from each of the responses {d1, d2, d3}. [Figure, e.g. for one response: ElGamal Decryption, Substitution De-Cipher, ElGamal Decryption and Intermediate verification, yielding d’x. Values shown: 3905, 7056, 5607, 8893.]

  22. Phase III: Anonymization (cont’d..) • … and computes the intermediate verification VCarol. [Figure: VCarol = 7809 2291 6790 VC, with entries for Bob, Alice and Carol.] • Shuffles the resulting set {d’y, d’z, d’x}. • Sends {d’y, d’z, d’x} to the Data Miner.

  23. Phase III: Anonymization (cont’d..) • The Data Miner sends the randomized set {d’y, d’z, d’x} to the next participant (e.g., Alice). • Like Carol, Alice ‘de-onions’ one layer from each element of {d’y, d’z, d’x}. • Computes the intermediate verification. • Shuffles the resulting set {d’p, d’q, d’r}. • Sends {d’p, d’q, d’r} to the Data Miner.

  24. Phase III: Anonymization (cont’d..) • The data miner sends {d’p, d’q, d’r} to the last participant (i.e. Bob), who ‘de-onions’ another layer from this set. • Bob computes the intermediate verification, shuffles the results into the set S = {d’m, d’n, d’o} and sends S to the data miner.
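The strip-and-shuffle rounds of Phase III can be approximated with a simplified Python sketch. It is not the protocol on the slides: it uses a single ElGamal layer under the product of the respondents' public keys, omits the substitution cipher, the data miner's own key and the verification codes, and adds a re-randomization step so that ciphertexts cannot be traced through a shuffle. All names and parameters are hypothetical.

```python
import secrets

P = 2**127 - 1   # toy prime modulus (assumption)
G = 3            # base for the toy group (assumption)

def keygen():
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

def product(values):
    out = 1
    for v in values:
        out = (out * v) % P
    return out

def encrypt_layers(m, public_keys):
    """Encrypt m under the product of all respondents' public keys."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(product(public_keys), r, P)) % P

def anonymize_round(batch, x, remaining_public_keys):
    """One respondent strips its layer, re-randomizes, then shuffles."""
    y_rem = product(remaining_public_keys)
    out = []
    for c1, c2 in batch:
        c2 = (c2 * pow(pow(c1, x, P), P - 2, P)) % P   # strip own layer
        s = secrets.randbelow(P - 2) + 1               # fresh randomness
        out.append(((c1 * pow(G, s, P)) % P, (c2 * pow(y_rem, s, P)) % P))
    secrets.SystemRandom().shuffle(out)
    return out

def run_rounds(responses, n_respondents=3):
    """Encrypt all responses, then let each respondent take one round."""
    keys = [keygen() for _ in range(n_respondents)]
    publics = [y for _, y in keys]
    batch = [encrypt_layers(m, publics) for m in responses]
    for i, (x, _) in enumerate(keys):
        batch = anonymize_round(batch, x, publics[i + 1:])
    return [c2 for _, c2 in batch]
```

After the last round the second components equal the original responses, but in an order that no single participant chose, which is the anonymization effect the phase aims for.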

  25. Phase IV: Verification • The data miner computes the final secondary encryption value ‘R’ from S. • Sends ‘R’ along with its secondary secret key to all participants. • Bob, Alice and Carol decrypt the intermediate verification code they received in Phase II. • They also compute ΩV and check that ΩV = ΩC. • If the check passes, each of them sends their secondary secret key to the data miner.

  26. Phase V: Decryption • The data miner uses the respondents’ secondary keys to strip off the remaining encryption layers from S. • It uses its own primary key to strip off the final layer and reveal the original responses {…, 1234, …}.

  27. Results and Analysis

  28. Performance Analysis - Communication Overhead • Brickell et al., KDD 2006

  29. Complexity • Computation • - Respondent’s, O(N) • - Data Miner, O(N^2) • Communication • - Participant’s, O(N)

  30. Conclusion • The privacy of individuals is an important issue in online data collection. • Ignoring respondents’ privacy will result in inaccurate data. • Privacy-preserving online data collection must be (i) deterministic and (ii) efficient.

  31. Conclusion • Deterministic: We employ crypto techniques. • Collusion Resistance: We incorporate the onion/de-onion technique (using ElGamal + Substitution) to create a protective layer against collusion. • Efficiency: Verification is done on single values instead of entire datasets.

  32. Thank you Q&A

  33. The Protocol cont’d.. • Suppose there are 3 respondents (Alice, Bob and Carol). • 1. Data Preparation (Bob’s) [Figure: Bob’s original response 1234 is wrapped in successive layers to produce Bob’s encrypted response dBob. Labels: Bob’s, Alice’s and Carol’s Sec. keys; DM’s Pri. key; Bob’s, Alice’s and Carol’s Pri. keys; three Substitution Cipher steps. Values shown: 8902, 2453, 8091, 7609, 2094, 4240, 9081, 1039, 6652, 5607, 7056, 3905, 8893.] • Bob generates a random number θ and computes ba = gθ and bb = gθ+7609. • Bob also generates WBob = 6652 4240 7056 bb.


  35. Related Works Cont’d… • 3. Mix Networks • - Respondents send their responses to an intermediate hop. • - Each hop strips off a layer of encryption, which reveals the next hop’s address, and forwards the result to it. • - The process continues until the response reaches the data collector. • Pros: • - Requires less communication overhead. • Cons: • - A probabilistic approach that only works well if all participants are honest. • - Intermediate hops can collaborate to breach an honest respondent’s anonymity.

  36. The Hybrid Model Cont’d.. • An example • Onion: 1234567890 (original response) → ElGamal Encryption → 9809364789 → Substitution Cipher → 7893456720 → ElGamal Encryption → 2901560011 • De-Onion: 2901560011 → ElGamal Decryption → 7893456720 → Substitution De-cipher → 9809364789 → ElGamal Decryption → 1234567890 (original response)
