
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments

IEEE INFOCOM 2012, March, Orlando, USA. Presenter: Qin Liu (a, b). Joint work with Chiu C. Tan (b), Jie Wu (b), and Guojun Wang (a). (a) Central South University, China; (b) Temple University, USA. 2012-3-26.


Presentation Transcript


  1. Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments. IEEE INFOCOM 2012, March, Orlando, USA. Presenter: Qin Liu (a, b). Joint work with Chiu C. Tan (b), Jie Wu (b), and Guojun Wang (a). (a) Central South University, China; (b) Temple University, USA. 2012-3-26

  2. Introduction

  3. Cloud Computing Model
     • Cloud computing, as a new commercial paradigm, enables users to outsource data to a cloud
     • Data is described by a set of keywords
     • Users retrieve files with a set of keywords
     • Example (figure): the cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}, …; Bob queries with keywords A, B and receives F1 and F2
     • The cloud will learn the user's search pattern and access pattern

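  4. Private Search (Ostrovsky et al., CRYPTO 2005)
     • Given a public dictionary that contains all keywords, e.g., dictionary = <A, B, C, D>
     • Files in the cloud: F1: {A, B}, F2: {B, D}, F3: {C, D}, …
     • Bob's query (figure): one encrypted bit per dictionary keyword, [1] [1] [0] [0]
     • Homomorphic encryption:
       • E(x)*E(y) = E(x+y)
       • E(x)^y = E(x*y)
     • Key trick: map unmatched files to 0
       • E(0)*E(0) = E(0+0) = E(0)
       • E(0)^F3 = E(0*F3) = E(0)
     • The buffer returned to Bob is a compressed version of all files F1, F2, F3
       • Survival: a matched file that shares a buffer slot only with unmatched files is preserved, e.g., E(F2)*E(0) = E(F2)
       • Collision: a slot that receives more than one matched file cannot be recovered (NA in the figure)
       • Unmatched: a slot that receives only unmatched files holds 0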
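The two homomorphic identities on this slide can be checked with a small experiment. The sketch below uses a toy Paillier cryptosystem, which is one well-known scheme with exactly these properties; the slide itself does not say which cryptosystem is used, so treat that choice as an assumption. The primes, the integer file contents (F2 = 25, F3 = 37), and all function names are made up for illustration, and the key size is far too small for real use. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair (assumed scheme; primes are illustrative only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """E(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover m from c using L(x) = (x - 1) / n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

rng = random.Random(0)
n, priv = keygen()
n2 = n * n
F2, F3 = 25, 37  # hypothetical file contents encoded as small integers

# E(x)*E(y) = E(x+y)
sum_ct = encrypt(n, 3, rng) * encrypt(n, 4, rng) % n2
# E(x)^y = E(x*y)
prod_ct = pow(encrypt(n, 3, rng), 4, n2)
# Key trick: an unmatched file collapses to E(0): (E(0)*E(0))^F3 = E(0)
unmatched = pow(encrypt(n, 0, rng) * encrypt(n, 0, rng) % n2, F3, n2)
# Survival: E(F2)*E(0) = E(F2)
survivor = encrypt(n, F2, rng) * unmatched % n2

results = [decrypt(n, priv, c) for c in (sum_ct, prod_ct, unmatched, survivor)]
print(results)  # [7, 12, 0, 25]
```

The party doing the multiplications and exponentiations never needs the private key, which is why the cloud can process files without learning which ones matched.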

  5. Problem: Cost Grows Linearly
     • Processing each query is expensive; given n users, the cloud needs to execute n queries
       • Performance bottleneck
     • The cloud returns all matched files, even if a user is interested in only a small percentage of them
       • Wasted bandwidth

  6. Our Solution: EIRQ Scheme (Efficient Information Retrieval for Ranked Query)
     • A trusted proxy server, the aggregation and distribution layer (ADL), is introduced between the users and the cloud
       • Aggregates user queries
       • Distributes search results
       • Supports ranked queries

  7. Ranked Queries
     • Queries are classified into ranks
     • ADL constructs a mask matrix
     • Cloud filters a certain percentage of matched files
     • Example (figure): the cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}, …; a rank-0 query retrieves 100% of its matched files, a rank-1 query 50%
       • Alice queries {A, B} with rank 0 and receives F1 and F2
       • Bob queries {A, C} with rank 1; his matched files are F1 and F3
       • ADL sends the mask matrix to the cloud; the cloud returns F1 and F2, and F3 is filtered with 50% probability
     • Challenges: the cloud
       • Cannot know which files are filtered/returned
       • Cannot know each query's rank

  8. Scheme Description

  9. Intuition of EIRQ
     • Key techniques:
       • Construct a mask matrix to protect query ranks
       • Filter files without knowing which files are filtered
     • Workflow between User, ADL, and Cloud (figure):
       • Step 1: QueryGen (User to ADL: keywords, rank)
       • Step 2: Matrix Construct (ADL to Cloud: mask matrix)
       • Step 3: FileFilter (Cloud to ADL: buffer)
       • Step 4: File Recovery (ADL to User: certain percentage of files matching user keywords)

  10. Goal
     • Queries are classified into r ranks: 0, 1, …, r-1
     • A rank-i query retrieves a fraction (1 - i/r) of its matched files
       • Files that match rank-0 queries will not be filtered
       • Files that match rank-1 queries are filtered with probability 1/r
       • Files that match rank-i queries are filtered with probability i/r
     • The cloud
       • Cannot know which files are filtered/returned
       • Cannot know each query's rank
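As a worked instance of the goal above, take r = 4 ranks, the setting used in the evaluation slides. The notation Pr[...] is introduced here for illustration only.

```latex
\[
  \Pr[\text{returned} \mid \text{rank } i] = 1 - \frac{i}{r},
  \qquad
  \Pr[\text{filtered} \mid \text{rank } i] = \frac{i}{r},
  \qquad i = 0, 1, \dots, r-1 .
\]
\[
  r = 4:\quad
  1 - \tfrac{0}{4} = 100\%,\quad
  1 - \tfrac{1}{4} = 75\%,\quad
  1 - \tfrac{2}{4} = 50\%,\quad
  1 - \tfrac{3}{4} = 25\% .
\]
```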

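  11. Construct Mask Matrix
     • ADL constructs a mask matrix that is encrypted with its public key, and sends it to the cloud
     • Example (figure): Alice queries {A, B} with rank 0; Bob queries {A, C} with rank 1; number of ranks r = 2
     • The matrix has one row per dictionary keyword (number of keywords) and one column per rank (number of ranks):
       A: [1] [1]
       B: [1] [1]
       C: [1] [0]
       D: [0] [0]
       …: [0] [0]
     • For a keyword, the number of 1s is determined by the rank i of the query in which it appears: r - i
       • If the keyword appears in several queries, the highest rank takes over
     • The ratio of 1s to r determines the probability that a file containing the keyword is returned: (r-i)/r
       • If a file contains several queried keywords, the highest ratio takes over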
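The construction rule on this slide can be sketched in plaintext as follows. This is only an illustration: in the scheme each entry is encrypted under the ADL's public key before being sent to the cloud, and the function name `build_mask_matrix` plus the choice to place the 1s in the leading columns are assumptions (the latter matches the row C: [1] [0] in the figure).

```python
def build_mask_matrix(dictionary, queries, r):
    """Plaintext mask matrix: one row per dictionary keyword, r columns.

    queries is a list of (keywords, rank) pairs with 0 <= rank < r.
    A keyword's row gets r - i leading 1s, where i is the highest rank
    (smallest rank number) among the queries that contain the keyword.
    Keywords that appear in no query get an all-zero row.
    """
    best_rank = {}
    for keywords, rank in queries:
        if not 0 <= rank < r:
            raise ValueError(f"rank {rank} is outside 0..{r - 1}")
        for kw in keywords:
            # "High rank takes over": keep the smallest rank number seen.
            best_rank[kw] = min(rank, best_rank.get(kw, rank))

    matrix = {}
    for kw in dictionary:
        ones = r - best_rank[kw] if kw in best_rank else 0
        matrix[kw] = [1] * ones + [0] * (r - ones)
    return matrix

# Example from the slide: Alice {A, B} rank 0, Bob {A, C} rank 1, r = 2.
dictionary = ["A", "B", "C", "D"]
queries = [({"A", "B"}, 0), ({"A", "C"}, 1)]
mask = build_mask_matrix(dictionary, queries, r=2)
for kw in dictionary:
    print(kw, mask[kw])
# A [1, 1]
# B [1, 1]
# C [1, 0]
# D [0, 0]
```

Keyword A appears in both a rank-0 and a rank-1 query, so its row follows rank 0 and is all 1s.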

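  12. Filter Files
     • The cloud chooses a random column of the mask matrix for each file
     • Files: F1: {A, B}, F2: {B, D}, F3: {C, D}, …
     • Mask matrix: A: [1] [1], B: [1] [1], C: [1] [0], D: [0] [0], …: [0] [0]
     • For F3 (keywords C and D):
       • First column, chosen with probability 50%: E(1)*E(0) = E(1), E(1)^F3 = E(F3)
       • Second column, chosen with probability 50%: E(0)*E(0) = E(0), E(0)^F3 = E(0)
     • A file that matches a rank-i query is filtered with probability i/r
     • F1 and F2 will be returned; F3 will be filtered with 50% probability
     • The results are stored in a buffer that is returned to the ADL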
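A quick plaintext simulation can confirm the return probabilities stated on this slide. It is a sketch under simplifying assumptions: the matrix entries are left unencrypted, so the simulation can see whether the selected entries sum to zero, whereas the cloud only ever handles ciphertexts. The names `filter_once` and `return_rates` and the trial count are made up for illustration.

```python
import random

# Mask matrix and files from the slide (r = 2 ranks).
MASK = {"A": [1, 1], "B": [1, 1], "C": [1, 0], "D": [0, 0]}
FILES = {"F1": {"A", "B"}, "F2": {"B", "D"}, "F3": {"C", "D"}}

def filter_once(files, mask, r, rng):
    """One filtering pass: pick a random column per file, keep the file
    if any of its keywords has a 1 in that column."""
    returned = []
    for name, keywords in files.items():
        col = rng.randrange(r)
        # Plaintext stand-in for multiplying the encrypted entries.
        count = sum(mask[kw][col] for kw in keywords)
        if count > 0:
            returned.append(name)
    return returned

def return_rates(files, mask, r, trials=10000, seed=0):
    """Fraction of passes in which each file is returned."""
    rng = random.Random(seed)
    hits = {name: 0 for name in files}
    for _ in range(trials):
        for name in filter_once(files, mask, r, rng):
            hits[name] += 1
    return {name: hits[name] / trials for name in files}

rates = return_rates(FILES, MASK, r=2)
print(rates)  # F1 and F2 are always returned; F3 about half the time
```

F2 is always returned even though D has an all-zero row, because B contributes a 1 in every column.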

  13. Evaluation

  14. Setup • Our simulations are conducted with MATLAB R2010a, running on a local machine with an Intel Core 2 Duo E8400 3.0 GHz CPU and 8 GB RAM. We summarize the parameters in Table.

  15. Percentage of Returned Files
     • Queries are classified into ranks 0 to 3
     • Expected percentage of matched files returned:
       • Rank-0: 100%
       • Rank-1: 75%
       • Rank-2: 50%
       • Rank-3: 25%
     • Our results:
       • Rank-0: 100%
       • Rank-1: 75%
       • Rank-2: 52%
       • Rank-3: 29%

  16. Computation Cost
     • ADL: 14.8270 s to 14.8788 s
     • EIRQ: 14.8664 s to 14.9269 s

  17. Communication Cost
     • EIRQ works better when there are only a few users
     • 5 users in each rank, 4 common keywords
       • EIRQ: 439 KB buffer
       • ADL: 834 KB buffer

  18. Conclusion
     (1) An ADL is introduced to avoid the performance bottleneck at the cloud
     (2) The EIRQ scheme allows queries with a higher rank to retrieve a higher percentage of matched files
     (3) Our solution protects access pattern, search pattern, and rank privacy from the cloud

  19. Thank you!

  20. Background
     • System Model
     • Adversary Model
     • Ostrovsky Scheme

  21. System Model
     • Users in the organization send queries to the ADL
     • The ADL aggregates user queries and queries the cloud with a combined query
     • The cloud returns the files matching the combined query to the ADL
     • The ADL distributes the results to each user

  22. Adversary Model
     • The ADL is assumed to be trusted by all users
     • The cloud is the only adversary
       • Honest but curious: it obeys our schemes, but still wants to learn additional information
     • Our goal is to protect the following from the cloud:
       • Access pattern
       • Search pattern
       • Rank privacy: hiding the rank of each user query

  23. Ostrovsky Scheme (CRYPTO 2005)
     • Files in the cloud: F1: {A, B}, F2: {B}, F3: {C}
     • Public dictionary: <A, B, C, D, E>
     • Alice's keywords: A, B
     • Alice's query is a string of 0s and 1s, [1], [1], [0], [0], [0], encrypted using homomorphic encryption
     • Let E() be encryption
       • E(x)*E(y) = E(x+y)
       • E(x)^y = E(x*y)

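  24. Ostrovsky Scheme (CRYPTO 2005)
     • Alice's query: [1], [1], [0], [0], [0]
     • For each file, the cloud multiplies the query entries of the file's keywords, then raises the result to the file content:
       • F1: {A, B} gives [2], then [2]^F1
       • F2: {B} gives [1], then [1]^F2
       • F3: {C} gives [0], then [0]^F3
     • The magic is that the unmatched file F3 is processed to 0
     • Alice's buffer: [2, 2*F1], [1, 1*F2], [0, 0]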

  25. Ostrovsky Scheme (CRYPTO 2005)
     • Buffer returned to Alice: [2, 2*F1], [1, 1*F2], [0, 0]
     • Alice decrypts the buffer
       • F2 is obtained directly
       • F1 is obtained by dividing 2*F1 by 2
     • The buffer size only relates to the number of matched files
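The walk-through on slides 23 to 25 can be replayed on plaintext values, as sketched below. Encryption is deliberately left out, so the numbers in the buffer are what Alice would see after decryption. The integer file contents (11, 25, 37) and the function names are made up for illustration, and the sketch uses one buffer slot per file rather than the compressed buffer described on the slide.

```python
DICTIONARY = ["A", "B", "C", "D", "E"]
# Each file: (keyword set, content encoded as a hypothetical integer).
FILES = {"F1": ({"A", "B"}, 11), "F2": ({"B"}, 25), "F3": ({"C"}, 37)}

def make_query(dictionary, keywords):
    """Query = one bit per dictionary keyword (1 if the user wants it)."""
    return {kw: 1 if kw in keywords else 0 for kw in dictionary}

def process_files(query, files):
    """Cloud side: for each file, count matching keywords c and store
    the pair (c, c * content). Unmatched files become (0, 0)."""
    buffer = []
    for keywords, content in files.values():
        c = sum(query[kw] for kw in keywords)
        buffer.append((c, c * content))
    return buffer

def recover(buffer):
    """User side: divide out the match count; skip empty entries."""
    return [scaled // c for c, scaled in buffer if c > 0]

query = make_query(DICTIONARY, {"A", "B"})
buffer = process_files(query, FILES)
print(buffer)           # [(2, 22), (1, 25), (0, 0)]
print(recover(buffer))  # [11, 25]
```

The first component of each pair is what lets Alice undo the scaling: F1 matched two keywords, so its content arrives doubled.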

  26. Cloud Security
     • The cloud may leak user privacy
     • Searchable encryption
       • Will not reveal what the users are searching for (search pattern)
       • Will reveal whether two users are interested in the same files (access pattern)
     • Example (figure): the cloud stores F1: {A, B}, F2: {B}, F3: {C}; Alice queries {A, B} and receives F1 and F2; Bob queries {A, C} and receives F3 and F1

  27. Construction of EIRQ
     • Step 1. Each user runs the QueryGen algorithm to send keywords and query rank to the ADL
     • Example (figure):
       • Dictionary: <A, B, C, D>
       • Files in the cloud: File 1: {A, B}, File 2: {B}, File 3: {C}
       • Ranks 0 to 2: Rank 0: 100%, Rank 1: 50%, Rank 2: 0%
       • Alice sends keywords A, B with rank 1; Bob sends keywords B, C with rank 1
