
Bhiksha Raj Pittsburgh, PA 15213 21 April 2011



  1. Hearing without Listening. Collaborators: Manas Pathak (CMU), Shantanu Rane (MERL), Paris Smaragdis (UIUC), Madhusudana Shashanka (UTC) and Jose Portelo (INESC). Bhiksha Raj, Pittsburgh, PA 15213, 21 April 2011

  2. About Me • Bhiksha Raj • Assoc. Prof in LTI • Areas: • Computational Audition • Speech recognition • Was at Mitsubishi Electric Research prior to CMU • Led research effort on speech processing

  3. A Voice Problem • Call center records customer response • Thousands of hours of customer calls • Must be mined for improved response to market conditions and improved customer response • Audio records carry confidential information • Social security number • Names, addresses • Cannot be revealed!! Call center handles calls relating to various company services and products

  4. Mining Audio • Can be dealt with as a problem of keyword spotting • Must find words/phrases such as “problem”, “disk controller” • Data holder willing to provide list of terms to search for • Must not discover anything that was not specified! • E.g. Social security number, names, addresses, other identifying information • Trivial solution: Ship keyword spotter to data holder • Unsatisfactory • Does not prevent operator of spotter from having access to private information • Work on encrypted data?

  5. A more important problem • ABC NEWS Oct 2008 • Inside Account of U.S. Eavesdropping on Americans Despite pledges by President George W. Bush and American intelligence officials to the contrary, hundreds of US citizens overseas have been eavesdropped on as they called friends and family back home...

  6. The Problem • Security: • NSA must monitor call for public safety • Caller may be a known miscreant • Call may relate to planning subversive activity • Privacy: • Terrible violation of individual privacy • The gist of the problem: • NSA is possibly looking for key words or phrases • Did we hear “bomb the pentagon”?? • Or if some key people are calling in • Was that Ayman al Zawahiri’s voice? • But must have access to all audio to do so • Similar to the mining problem

  7. Abstracting the problem • Data holder willing to provide encrypted data • A locked box • Mining entity willing to provide encrypted keywords • Sealed packages • Must find if keywords occur in data • Find if contents of sealed packages are also present in the locked box • Without unsealing packages or opening the box! • Data are spoken audio

  8. Privacy Preserving Voice Processing • Problems are examples of need for privacy preserving voice processing algorithms • The majority of the world’s communication has historically been done by voice • Voice is considered a very private mode of communication • Eavesdropping is a social no-no • Many places that allow recording of images in public disallow recording of voice! • Yet little current research on how to maintain the privacy of voice ..

  9. The History of Privacy Preserving Technologies for Voice

  10. The History of Counter Espionage Techniques against Private Voice Communication

  11. Voice Database Privacy: Not Covered • Can “link” uploaders of video to Flickr, YouTube etc., by matching the audio recordings • “Matching the uploaders of videos across accounts” • Lei, Choi, Janin, Friedland, ICSI Berkeley • More generally, can identify common “anonymous” speakers across databases • And, through their links to other identified subjects, can possibly identify them as well.. • Topic for another day

  12. Automatic Speech Processing Technologies • Lexical content comprehension • Recognition • Determining the sequence of words that was spoken • Keyterm spotting • Detecting if specified terms have occurred in a recording • Biometrics and non-lexical content recognition • Identification • Identifying the speaker • Verification/Authentication • Confirming that a speaker is who he/she claims to be

  13. Audio Problems involve Pattern Matching • Biometrics: • Speaker verification / authentication: • Binary classification • Does the recording belong to speaker X or not • Speaker identification: • Multi-class classification • Who (if anyone) from a given known set is the speaker • Recognition: • Key-term spotting: • Binary classification • Is the current segment of audio an instance of the keyword? • Continuous speech recognition • Multi- or infinite-class classification

  14. Simplistic View of Voice Privacy: Learning what is not intended • User is willing to permit the system to have access to the intended result, but not anything else • Recognition/spotting System allowed access to “text” content of message • But system should not be able to identify the speaker, his/her mood etc • Biometric system allowed access to identity of speaker • But should not be able to learn the text content as well • E.g. discover a passphrase

  15. Signal Processing Solutions to the simple problems • Voice is produced by a source filter mechanism • Lungs are the source • Vocal tract is the filter • Mathematical models allow us to separate the two • The source contains much of the information about the speaker • The filter contains information about the sounds

  16. Eliminating Speaker Identity • Modifying the Pitch • Lowering it makes the voice somewhat unrecognizable • Of course, one can undo this • Voice morphing • Map the entire speech (source and filter) to a neutral third voice • Even so, the cadence gives identity away

  17. Keep the voice, corrupt the content • Much tougher • The “excitation” also actually carries spoken content • Cannot destroy content without corrupting the excitation

  18. Signal Processing is not the Solution • Often no “acceptable to expose” data • Voice recognition: • Medical transcription. HIPAA rules • Court records • Police records • Keyword spotting: Signal processing inapplicable • Cannot modify only undesired portions of data • Biometrics: • Identification: Data holder may not know who is being identified (NSA) • Verification/Authentication – many problems

  19. What follows • A related problem: • Song matching • Two voice matching problems • Keyword spotting • Voice authentication • Cryptographic approach • An alternative approach • Issues and questions • Caveat: Have squiggles (cannot be avoided) • Will be kept to a minimum

  20. A related problem • Alice has just found a short piece of music on the web • Possibly from an illegal site! • She likes it. She would like to find out the name of the song

  21. Alice and her song • Bob has a large, organized catalogue of songs • Simple solution: • Alice sends her song snippet to Bob • Bob matches it against his catalogue • Returns the ID of the song that has the best match to the snippet

  22. What Bob does • Bob uses a simple correlation-based procedure • Receives snippet W = w[0]…w[N] • For each song S • At each lag t • Finds Correlation(W,S,t) = Σ_i w[i]·s[t+i] • Score(S) = maximum correlation: max_t Correlation(W,S,t) • Returns song with largest correlation [Figure: Bob's catalogue of songs S_1, S_2, S_3, …, S_M]
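Bob's scoring loop can be sketched in plain, unencrypted Python. The function and variable names here are our own, not from the slides:

```python
def correlation(w, s, t):
    """Corr(W, S, t) = sum over i of w[i] * s[t + i]."""
    return sum(w[i] * s[t + i] for i in range(len(w)))

def score(w, s):
    """Score(S) = maximum correlation over all lags t."""
    return max(correlation(w, s, t) for t in range(len(s) - len(w) + 1))

def best_match(w, catalogue):
    """Return the ID of the song whose best lag correlates most with W."""
    return max(catalogue, key=lambda song_id: score(w, catalogue[song_id]))
```

The encrypted protocol in the following slides replaces exactly these plaintext sums and comparisons.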


  28. Relationship to Speech Recognition • Words are like Alice’s snippets. • Word spotting: Find locations in a recording where patterns that match the word occur • Computing “match” score is analogous to finding the correlation with an (or the best of many) examples of the word • Continuous speech recognition: Repeated word spotting

  29. Alice has a problem • Her snippet may have been illegally downloaded • She may go to jail if Bob sees it • Bob may be the DRM police..

  30. Solving Alice’s Problem • Alice could encrypt her snippet and send it to Bob • Bob could work entirely on the encrypted data • And obtain an encrypted result to return to Alice! • A job for SFE • Except we won’t use a circuit

  31. Correlation is the root of the problem • The basic operation performed is a correlation • Corr(W,S,t) = Σ_i w[i]·s[t+i] • This is actually a dot product: • W = [w[0] w[1] … w[N]]^T • S_t = [s[t] s[t+1] … s[t+N]]^T • Corr(W,S,t) = W·S_t • Bob can compute Encrypt[Corr(W,S,t)] if Alice sends him Encrypt[W] • Assumption: All data are integers • Encryption is performed over large enough finite fields to hold all the numbers

  32. Paillier Encryption • Alice uses Paillier encryption • Random primes P, Q • N = PQ • G = N + 1 • L = (P−1)(Q−1) • M = L^−1 mod N • Public key = (N, G), Private key = (L, M) • Encrypt message m from (0…N−1), with random r: • Enc[m] = G^m · r^N mod N^2 • Dec[u] = {[(u^L mod N^2) − 1] / N}·M mod N • Important property (additive homomorphism): • Enc[m]·Enc[v] = G^m·G^v·(rr′)^N = G^(m+v)·(rr′)^N = Enc[m+v] • (Enc[m])^v = Enc[v·m]
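To make the formulas concrete, here is a toy Paillier implementation. The primes are deliberately tiny and the code is insecure by construction; a real system would use a vetted cryptographic library with large random primes:

```python
import math
import random

P, Q = 293, 433             # toy primes -- far too small for real use
N = P * Q
N2 = N * N
G = N + 1                   # G = N + 1, as on the slide
L = (P - 1) * (Q - 1)       # L = (P-1)(Q-1)
M = pow(L, -1, N)           # M = L^-1 mod N (Python 3.8+)

def encrypt(m):
    """Enc[m] = G^m * r^N mod N^2, for a random r coprime to N."""
    r = random.randrange(2, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(2, N)
    return pow(G, m, N2) * pow(r, N, N2) % N2

def decrypt(u):
    """Dec[u] = {[(u^L mod N^2) - 1] / N} * M mod N."""
    return (pow(u, L, N2) - 1) // N * M % N
```

Multiplying two ciphertexts adds the plaintexts, and raising a ciphertext to a plaintext power multiplies them, which is exactly what the SIP protocol exploits.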

  33. Solving Alice’s Problem • Alice generates public and private keys • She encrypts her snippet using her public key and sends it to Bob: • Alice → Bob: Enc[W] = Enc[w[0]], Enc[w[1]], … Enc[w[N]] • To compute the dot product with any snippet s[t]…s[t+N] • Bob computes Π_i Enc[w[i]]^s[t+i] = Enc[Σ_i w[i]·s[t+i]] = Enc[Corr(W,S,t)] • Bob can compute the encrypted correlations from Alice’s encrypted input without ever holding the decryption key • The above technique is the Secure Inner Product (SIP) protocol
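The SIP step can be sketched by pairing a toy Paillier scheme (tiny, insecure parameters of our own choosing) with Bob's plaintext samples; Bob works only on Alice's ciphertexts:

```python
import math
import random

# Toy Paillier parameters -- insecure by design, for illustration only.
P, Q = 293, 433
N, N2 = P * Q, (P * Q) ** 2
G, L = N + 1, (P - 1) * (Q - 1)
MU = pow(L, -1, N)

def enc(m):
    r = random.randrange(2, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(2, N)
    return pow(G, m, N2) * pow(r, N, N2) % N2

def dec(u):
    return (pow(u, L, N2) - 1) // N * MU % N

# Alice -> Bob: her snippet, element-wise encrypted.
w = [3, 1, 4]
enc_w = [enc(x) for x in w]

def enc_correlation(enc_w, s, t):
    """Bob computes Enc[Corr(W,S,t)] from Alice's ciphertexts alone."""
    c = 1
    for i, ew in enumerate(enc_w):
        # Enc[w[i]]^s[t+i] = Enc[w[i]*s[t+i]]; multiplying accumulates the sum.
        c = c * pow(ew, s[t + i], N2) % N2
    return c  # = Enc[sum_i w[i]*s[t+i]]
```

Only Alice, holding the private key, can decrypt the result; Bob sees nothing about w[i].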

  34. Solving Alice’s Problem – 2 • Given the encrypted snippet Enc[W] from Alice • Bob computes Enc[Corr(S,W,t)] for all S and all t • i.e. he computes the correlation of Alice’s snippet with all of his songs at all lags • The correlations are encrypted, however • Bob only has to find which Corr(S,W,t) is the highest • And send the metadata for the corresponding S to Alice • Except, Bob cannot find argmax_S max_t Corr(S,W,t) • He only has encrypted correlation values

  35. Bob Solves the Problem • Bob ships the encrypted correlations back to Alice • She can decrypt them all, find the maximum value, and send the index back to Bob • Bob retrieves the song corresponding to the index • Problem: Bob effectively sends his catalogue over to Alice • Problems for Bob • Alice may be a competitor pretending to be a customer • She is phishing to find out what Bob has in his catalogue • She can find out what songs Bob has by comparing the correlations against those from a catalogue of her own • Even if Bob sends the correlations in permuted order

  36. The SMAX protocol • Alice sends her public key to Bob • Bob adds a random number to each correlation • Enc[Corr_noisy(W,S,t)] = Enc[Corr(W,S,t)]·Enc[r] = Enc[Corr(W,S,t) + r] • A different random number corrupts each correlation • Bob permutes the noisy correlations and sends the permuted correlations to Alice • Bob → Alice: P(Enc[Corr_noisy(W,S,t)]) for all S, t • P is a permutation • Alice can decrypt them to get P(Corr_noisy(W,S,t)) • Bob retains P(−r), i.e. the permuted set of random numbers

  37. The SMAX protocol • Bob has P(−r), Alice has P(Corr(W,S,t) + r) • Bob generates a public/private key pair, encrypts P(−r), and sends it along with his public key to Alice • Bob → Alice: Enc_Bob[P(−r)] • Alice encrypts the noisy correlations she has and “adds” them to what she got from Bob: • Enc_Bob[P(Corr(W,S,t) + r)]·Enc_Bob[P(−r)] = P(Enc_Bob[Corr(W,S,t)]) • Alice now has encrypted and permuted correlations.

  38. The SMAX protocol • Alice now has encrypted and permuted correlations: • P(Enc_Bob[Corr(W,S,t)]) • Alice adds a random number q: • P(Enc_Bob[Corr(W,S,t)])·Enc_Bob[q] = P(Enc_Bob[Corr(W,S,t) + q]) • The same number q corrupts everything • Alice permutes everything with her own permutation scheme and ships it all back to Bob: • Alice → Bob: P_Alice(P(Enc_Bob[Corr(W,S,t) + q]))

  39. The SMAX protocol • Bob has: P_Alice(P(Enc_Bob[Corr(W,S,t) + q])) • Bob decrypts them and finds the index of the largest value, I_max • Bob un-permutes I_max and sends the result to Alice: • Bob → Alice: P^−1(I_max) • Alice un-permutes it with her own permutation to get the true index of the song in Bob’s catalogue • P_Alice^−1(P^−1(I_max)) = Idx • The above technique is insecure; in reality, the process is more involved • Alice finally has the index in Bob’s catalogue for her song
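The index bookkeeping in SMAX — blinding values cancelling and the two permutations being undone — can be walked through in plaintext. This sketch omits the encryption entirely (all names are ours), so it shows only the arithmetic, not the cryptography:

```python
import random

corrs = [12, 47, 5, 33]                  # Bob's correlation values (toy data)
n = len(corrs)

# Round 1: Bob blinds each value with its own random r and permutes with P.
r = [random.randrange(100) for _ in range(n)]
perm_bob = list(range(n)); random.shuffle(perm_bob)
to_alice = [corrs[j] + r[j] for j in perm_bob]     # P(Corr + r)
bob_keeps = [-r[j] for j in perm_bob]              # P(-r), Bob retains this

# Round 2: Alice removes Bob's blinding (homomorphically in the real
# protocol), adds one shared q, and applies her own permutation P_Alice.
q = random.randrange(100)
unblinded = [a + b for a, b in zip(to_alice, bob_keeps)]   # P(Corr)
perm_alice = list(range(n)); random.shuffle(perm_alice)
to_bob = [unblinded[j] + q for j in perm_alice]            # P_Alice(P(Corr + q))

# Round 3: adding the same q everywhere preserves the argmax, so Bob can
# find it; composing the two permutation lookups then recovers the index.
i_max = max(range(n), key=lambda i: to_bob[i])
idx = perm_bob[perm_alice[i_max]]        # undo P_Alice, then P
```

Neither side ever sees the other's permutation applied to unblinded values, which is what hides the correlation ordering.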

  40. Retrieving the Song • Alice cannot simply send the index of the best song to Bob • It will tell Bob which song it is • The song may not be available for public download • i.e. Alice’s snippet is illegal


  42. Retrieving Alice’s song: OT • Alice sends Enc[Idx] to Bob (and her public key) • Alice → Bob: Enc[Idx] • For each song S_i, Bob computes • Enc[M(S_i)]·(Enc[−i]·Enc[Idx])^r_i = Enc[M(S_i) + r_i(Idx−i)] • Bob sends this to Alice for every song • Bob → Alice: Enc[M(S_i) + r_i(Idx−i)] • M(S_i) is the metadata for S_i • Alice only decrypts the Idx-th entry to get M(S_Idx) • Decrypting the rest is pointless • Decrypting any other entry only yields the masked value M(S_i) + r_i(Idx−i)
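The masking trick is easiest to see in plaintext: for song i, Bob forms M(S_i) + r_i·(Idx − i), so only the entry with i = Idx survives unshifted. A sketch with toy metadata values of our own (the real protocol does this on ciphertexts):

```python
import random

metadata = [101, 202, 303, 404]      # toy M(S_i) values
idx = 2                              # Alice's index (encrypted in reality)

# Bob's per-song random r_i multiplies (idx - i); this is zero only at i = idx,
# so every other entry is shifted by an unknown random amount.
masked = [m + random.randrange(1, 10**6) * (idx - i)
          for i, m in enumerate(metadata)]
```

Alice decrypts only entry `idx` and gets the true metadata; every other entry is statistically useless to her.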

  43. Unrealistic solution • The proposed method is computationally unrealistic • Public key encryption is very slow • Repeated shipping of entire catalog • Can be simplified to one shipping of catalog • Still inefficient • Illustrates a few primitives • SIP: Secure inner product; computing dot products of private data • SMAX: Finding the index of the largest entry without revealing entries • Also possible – SVAL: Finding the largest entry (and nothing else) • OT as a mechanism for retrieving information • Have ignored a few issues..

  44. Security Picture • Assumption: Honest but curious • What do they gain by being malicious? • Bob: Manipulating numbers • Can prevent Alice from learning what she wants • Bob learns nothing of Alice’s snippet • Alice: • Can prevent herself from learning the identity of her song • Key Assumption: • Sufficient if malicious activity does not provide immediate benefit to the agent

  45. Pattern matching for voice: Basics • The basic approach includes • A feature computation module • Derives features from the audio • A pattern matching module • Stores “models” for the various classes • Assumption: Feature computation performed by entity holding voice data [Figure: Feature Computation → Pattern Matching]

  46. Pattern matching in Audio: The basic tools • Gaussian mixtures • Class distributions are generally modelled as Gaussian mixtures • “Score” computation is equivalent to computing log likelihoods • Rather than computing correlations • Hidden Markov models • Time series models are usually Hidden Markov models • With Gaussian mixture distributions as state output densities • Conceptually not substantially more complex than Gaussian mixtures

  47. Computing a log Gaussian • The Gaussian likelihood for a vector X, given a Gaussian with mean μ and covariance Θ, is given by • N(X; μ, Θ) = (2π)^(−d/2) |Θ|^(−1/2) exp(−½ (X−μ)^T Θ^−1 (X−μ)) • The log Gaussian is given by • log N(X; μ, Θ) = −½ log((2π)^d |Θ|) − ½ (X−μ)^T Θ^−1 (X−μ) • With a little arithmetic manipulation, we can write • log N(X; μ, Θ) = C + μ^T Θ^−1 X − ½ X^T Θ^−1 X, where C collects the terms that do not depend on X

  48. Computing a log Gaussian • Which can be further modified to • log N(X; μ, Θ) = W·X̃ • X̃ is a vector containing a constant 1, all the components of X, and all pairwise products X_i X_j • W is a vector of weights derived from μ and Θ • In other words, the log Gaussian is an inner product • If Bob has parameters W and Alice has data X̃, they can compute the log Gaussian without exposing the data using SIP • By extension, can compute log Mixture Gaussians privately
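The inner-product identity can be checked numerically. This sketch uses a diagonal covariance for brevity (the slide's statement covers full covariances, where X̃ also carries cross-products X_i X_j); all variable names are ours:

```python
import math

mu = [1.0, -2.0]      # Gaussian mean (Bob's parameters)
var = [0.5, 2.0]      # diagonal covariance entries
x = [0.3, 0.7]        # Alice's feature vector

def log_gaussian(x, mu, var):
    """Direct evaluation of the diagonal-covariance log Gaussian."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mu, var))

# Bob's weight vector W: one constant term, then linear and quadratic weights.
const = sum(-0.5 * math.log(2 * math.pi * v) - m * m / (2 * v)
            for m, v in zip(mu, var))
W = [const] + [m / v for m, v in zip(mu, var)] + [-0.5 / v for v in var]

# Alice's extended feature vector X~ = [1, x, x^2] (element-wise squares).
x_ext = [1.0] + x + [xi * xi for xi in x]

# The two computations agree, so SIP over (W, X~) yields the log Gaussian.
assert abs(log_gaussian(x, mu, var) - sum(a * b for a, b in zip(W, x_ext))) < 1e-9
```

Since the score is exactly a dot product, the SIP protocol from the song-matching example carries over unchanged to Gaussian (and mixture-Gaussian) scoring.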

  49. Keyword Spotting • Very similar to song matching • The “snippet” is now the keyword • A “song” is now a recording • Instead of finding a best match, we must find every recording in which a “correlation” score exceeds a threshold • Complexity is comparable to song retrieval • Actually worse • We will use HMMs instead of sample sequences, and correlations are replaced by best-state-sequence scores

  50. A Biometric Problem: Authentication • Passwords and text-based forms of authentication are physically insecure • Passwords may be stolen, guessed etc. • Biometric authentication is based on “non-stealable” signatures • Fingerprints • Iris • Voice • We will ignore the problem of voice mimicry..
