Keyword search on encrypted data
Keyword search problem • Linux utility: grep • Information retrieval • Basic operation • Advanced operations – relevance analysis and ranking • Search engines • highly complicated problem
New settings • Search data in the cloud • Filter encrypted emails • Privacy preserving log retrieval
Basic techniques • Symmetric encryption • Public key encryption • Simple keyword matching • A little relevance evaluation
Secure keyword search with symmetric encryption • Paper: Song 2000 • The seed is random and different for each Wi • Key idea: Li and Ri are self-verifiable • Advantage of XOR
Setting of ki • ki = Fk’(Wi), where k’ is secret • To search for W, the user reveals W and k = Fk’(W) to the server • For each ciphertext Ci, the server checks whether Ci ⊕ W has the form <Li, Fk(Li)> • The check reveals nothing if Ci is not the ciphertext for W • Li is random and different for each Wi, so the server cannot learn anything from Li
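The check on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction from the paper: HMAC-SHA256 stands in for the pseudo-random function F, each word is hashed to a fixed 32-byte block (so ciphertexts here are not decryptable), and the names `prf`, `pad_word`, `word_key`, `encrypt_word`, and `server_match` are hypothetical.

```python
import hashlib
import hmac
import os

HALF = 16  # assumed size in bytes of each half of a 32-byte word block


def prf(key: bytes, msg: bytes) -> bytes:
    # Assumption: truncated HMAC-SHA256 plays the role of the PRF F.
    return hmac.new(key, msg, hashlib.sha256).digest()[:HALF]


def pad_word(word: str) -> bytes:
    # Assumption: a word is mapped to a fixed 32-byte block by hashing.
    return hashlib.sha256(word.encode()).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def word_key(secret: bytes, word: str) -> bytes:
    # ki = Fk'(Wi), where the secret k' never leaves the user.
    return prf(secret, pad_word(word))


def encrypt_word(secret: bytes, word: str) -> bytes:
    k_i = word_key(secret, word)
    l_i = os.urandom(HALF)                # fresh random seed Li
    mask = l_i + prf(k_i, l_i)            # <Li, Fki(Li)>
    return xor(pad_word(word), mask)      # Ci = Wi XOR <Li, Fki(Li)>


def server_match(cipher: bytes, w_block: bytes, k: bytes) -> bool:
    # Server side: Ci XOR W must have the self-verifying form <L, Fk(L)>.
    t = xor(cipher, w_block)
    left, right = t[:HALF], t[HALF:]
    return hmac.compare_digest(prf(k, left), right)
```

To search, the user hands the server `pad_word(W)` and `word_key(secret, W)`; the server runs `server_match` against every stored ciphertext, which is the linear scan the scheme requires.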
Hidden search • In the previous scheme, W is revealed • Weakness: each search has to release k for W • Easy to collect information • Solution: encrypt Wi with a private key, then XOR with <Li, Fk(Li)> • Still has weaknesses • Wi encryption must be deterministic • Access pattern is leaked • Linear scan over the whole doc collection
Typical method for speedy keyword-based search • Use an “inverted index” • Word -> doc1:pos, doc2:pos, … • Or simply word -> doc1, doc2, … • However, an inverted index reveals the word frequency
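A plain (unencrypted) inverted index of the simpler word -> doc1, doc2, … kind might look like the sketch below. The function name `build_inverted_index` and the sample documents are made up for illustration. The length of each posting list is exactly the frequency information the slide warns about.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each word to the list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # set() so a word repeated inside one document is listed once
        for word in sorted(set(text.lower().split())):
            index[word].append(doc_id)
    return dict(index)


# Hypothetical sample collection
docs = {
    "doc1": "search data in the cloud",
    "doc2": "filter encrypted data",
    "doc3": "cloud data",
}
index = build_inverted_index(docs)

# The posting-list lengths leak how common each word is.
frequencies = {word: len(ids) for word, ids in index.items()}
```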
Recent developments • Curtmola et al. 2006 • “Searchable symmetric encryption: improved definitions and efficient constructions” • Solves this problem with a solution that is indistinguishable under chosen ciphertext attack (IND-CCA) • Allows an inverted index • Hides word frequency
Setup • D – the set of documents {D1,…,Dn} • max – the maximum number of distinct words in a document • Li – the list of document IDs that contain the keyword wi, plus some dummy entries to reach max • A – array that contains all elements of all lists Li (size max * |D|) • T – table that contains the entries <wi, address of Li’s first node>
Symmetric encryption function, used to encrypt words and document ids • id(Dj) in the entry for wi is encoded as enc(wi||j) to make the entries indistinguishable • Pseudo-random function f • Two pseudo-random permutation functions • One maps a word to its table entry • One maps the position of the next node of Li to an index of array A
Building the index table T 1. Each entry holds the address of Li’s first node and the key used to encrypt the node Ni,1 2. Set the remaining entries to random values of the same size as the existing entries
Generating Li • With Ki,0, we can decrypt all nodes in the list • For the remaining max – |D(wi)| dummy nodes, store a doc id that already appears in the first |D(wi)| entries • This can be done with the help of a look-up table I
Search • Generate the trapdoor • Search
Property • Each keyword search returns the same number of encrypted document ids – the attacker cannot distinguish word frequency
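The padding idea behind this property can be illustrated with a heavily simplified sketch. It is not the construction from the paper: it keeps only the keyed lookup and the padding, and it omits the node encryption, the array A, and the enc(wi||j) encoding, so repeated ids are visible here. HMAC-SHA256 stands in for the pseudo-random function, and `trapdoor`, `build_padded_index`, `search`, and `max_len` are hypothetical names.

```python
import hashlib
import hmac
from itertools import cycle, islice


def trapdoor(key: bytes, word: str) -> str:
    # Assumption: HMAC-SHA256 as the keyed function that hides the word.
    return hmac.new(key, word.encode(), hashlib.sha256).hexdigest()


def build_padded_index(key: bytes, postings: dict[str, list[str]],
                       max_len: int) -> dict[str, list[str]]:
    table = {}
    for word, ids in postings.items():
        if len(ids) > max_len:
            raise ValueError("max_len is smaller than a posting list")
        # Dummy entries reuse ids already in the list, as on the slide.
        table[trapdoor(key, word)] = list(islice(cycle(ids), max_len))
    return table


def search(table: dict[str, list[str]], td: str) -> list[str]:
    # The server sees only the trapdoor and a list of fixed length.
    return table.get(td, [])
```

Every hit returns `max_len` entries, so a rare word and a common word produce results of the same size; the client removes duplicates after decryption in a real scheme.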
Search public-key encrypted data • Users who encrypt the data (with public key) can be different from the owner of the private key
Cyclic group • For example, if G = { g^0, g^1, g^2, g^3, g^4, g^5 } mod p is a group, then g^6 = g^0, and G is cyclic • The order is the number of elements in G (6 in this example; p is the modulus) • g is the generator
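A small numeric instance makes the example concrete. The values g = 3 and p = 7 are chosen here for illustration and are not from the slides; the helper name `powers` is also hypothetical. With these values the powers of g run through six distinct elements before wrapping around.

```python
def powers(g: int, p: int) -> list[int]:
    """List g^0, g^1, ... mod p until the sequence starts repeating."""
    seen = []
    x = 1  # g^0
    while x not in seen:
        seen.append(x)
        x = (x * g) % p
    return seen


elements = powers(3, 7)           # g^0 .. g^5 mod 7
wraps = pow(3, 6, 7) == pow(3, 0, 7)  # g^6 equals g^0
```

By contrast, `powers(2, 7)` stops after three elements, so 2 does not generate the whole group modulo 7.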
Bilinear-map construction • Two groups G1, G2 of prime order p • A bilinear map G1 × G1 -> G2 • Properties: