2. Example 1: Checking Equality
• Two large files at two different locations.
• Are they identical?
– By communicating only a small amount of information!
3. Checking Equality
The Challenge
• Two large numbers N1 and N2, n bits each
• Communication allowed: m << n bits
• Possible?
4. Checking Equality
Impossibility
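• Suppose the message is a deterministic function of N1 alone
• Since m << n, there are fewer m-bit messages than n-bit numbers
– So two different values of N1 produce the same m-bit message
– Switching N2 from one of them to the other changes the right answer (YES -> NO) but not the message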
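The counting argument above can be tried on a toy case. This is a minimal sketch, assuming a hypothetical deterministic scheme (`low_bits`, which sends the low 3 bits of an 8-bit number); `find_collision` is an illustrative helper, not something from the slides.

```python
def find_collision(message, n_bits):
    """Return two distinct n-bit numbers that `message` maps to the same value."""
    seen = {}
    for x in range(2 ** n_bits):
        key = message(x)
        if key in seen:
            return seen[key], x      # same message, different numbers
        seen[key] = x
    return None                      # only possible if message is injective

def low_bits(x):
    """Hypothetical deterministic scheme: send only the low 3 bits (m = 3)."""
    return x & 0b111

a, b = find_collision(low_bits, 8)   # n = 8 bits
print(a, b, low_bits(a) == low_bits(b))
```

With N1 = a, the receiver sees the same message whether N2 = a (answer YES) or N2 = b (answer NO), so it must be wrong on one of them.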
5. Checking Equality
Randomized Algorithms
• Send N1 mod M for some number M; the other side compares it with N2 mod M
• If N1 = N2 then you always get YES
• If N1 != N2 then you get a (wrong) YES exactly when M divides N1 - N2
6. Checking Equality
Analysis
• What is the probability that N1 != N2 but M divides N1 - N2?
• Probability over what?
• Over the choice of M, not over N1, N2
• Choose M at random in the range 1..2^m
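To make "probability over M" concrete, the sketch below fixes N1 and N2 and counts how many M in a range divide their difference. The function name `false_yes_probability` and the example numbers are assumptions chosen for illustration.

```python
from fractions import Fraction

def false_yes_probability(n1, n2, upper):
    """Exact chance that a uniform M in 1..upper divides n1 - n2 (n1 != n2)."""
    diff = abs(n1 - n2)
    bad = sum(1 for m in range(1, upper + 1) if diff % m == 0)
    return Fraction(bad, upper)

# N1 - N2 = 60, M uniform in 1..16: the divisors 1,2,3,4,5,6,10,12,15 are bad.
p = false_yes_probability(1000, 940, 16)
print(p)   # 9/16
```

The inputs stay fixed; only M is random, so the bound has to hold for the worst-case pair N1, N2.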
7. Checking Equality
Analysis
• How many factors does N1 - N2 have?
– N1 - N2 <= 2^n, so roughly (2^n)^(1/log n) = 2^(n/log n)
• If we choose M randomly in the range 1..2 * (2^n)^(1/log n)
– Probability that N1 != N2 but M divides N1 - N2 is <= 1/2
– So m is ~ n/log n bits (minor gains)
8. Checking Equality
Use Prime Numbers
• How many distinct prime factors does N1 - N2 have?
– N1 - N2 <= 2^n, so at most 2n/log n
• If we choose M to be a random prime in 1..4n
– There are at least 4n/log(4n) primes in that range
– Probability that N1 != N2 but M divides N1 - N2 is <= ~ 1/2
– So m is ~ log n bits (major gains)
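The prime version of the analysis can be checked exactly on a small case. This is a sketch under assumptions: `primes_up_to` and `false_yes_probability_prime` are illustrative names, and the example uses n = 16 bits with a difference built from the six smallest primes.

```python
from fractions import Fraction

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def false_yes_probability_prime(n1, n2, n_bits):
    """Exact chance that a uniform random prime in 1..4n divides n1 - n2."""
    primes = primes_up_to(4 * n_bits)
    diff = abs(n1 - n2)
    bad = sum(1 for p in primes if diff % p == 0)
    return Fraction(bad, len(primes))

# 30030 = 2*3*5*7*11*13 fits in 16 bits; there are 18 primes up to 64.
print(false_yes_probability_prime(30030, 0, 16))   # 1/3
```

Even for a difference packed with small prime factors, only 6 of the 18 candidate primes give a wrong YES.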
9. Checking Equality
The Solution
• Two large numbers N1 and N2, n bits each
• ~ log n bits of communication
– Remainder w.r.t. a random prime in the range 1..4n
• Error prob < 1/2 (only a YES can be wrong; a NO is always correct)
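The one-round protocol might look like the following sketch. The function names (`make_message`, `check_message`, `random_prime`) are assumptions, and trial division is used only because the modulus is tiny (at most 4n). Here the prime is sent along with the remainder, so the message is about 2 log(4n) bits, still on the order of log n.

```python
import random

def is_prime(x):
    """Trial division; fine for the small moduli used here (at most 4n)."""
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True

def random_prime(upper, rng):
    """Uniform random prime in 2..upper, by rejection sampling."""
    while True:
        candidate = rng.randint(2, upper)
        if is_prime(candidate):
            return candidate

def make_message(n1, n_bits, rng):
    """Sender: pick a random prime p <= 4n and send (p, N1 mod p)."""
    p = random_prime(4 * n_bits, rng)
    return p, n1 % p

def check_message(n2, message):
    """Receiver: answer YES iff N2 leaves the same remainder mod p."""
    p, remainder = message
    return n2 % p == remainder

rng = random.Random(0)            # fixed seed so the sketch is repeatable
n1 = 2 ** 64 + 12345              # a 65-bit number
print(check_message(n1, make_message(n1, 65, rng)))   # equal inputs: always True
```

A NO from `check_message` is a proof that the numbers differ; only a YES carries any risk.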
10. Checking Equality
Reducing Error Prob
• Repeat k times with independent random primes; answer YES only if every round says YES
• Communication is k log n bits
• Error prob < (1/2)^k
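Repetition can be sketched as below. For brevity both numbers live in one process; in the two-party setting each round would send its prime and remainder. `probably_equal` and `primes_up_to` are illustrative names, not from the slides.

```python
import random

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def probably_equal(n1, n2, n_bits, k, rng):
    """Run k independent rounds; YES only if every round's remainders agree."""
    primes = primes_up_to(4 * n_bits)
    for _ in range(k):
        p = rng.choice(primes)       # fresh random prime each round
        if n1 % p != n2 % p:
            return False             # one mismatch proves N1 != N2
    return True

rng = random.Random(0)
x = 2 ** 64 + 12345
print(probably_equal(x, x, 65, 99, rng))   # equal inputs: always True
```

Because the rounds are independent, the chance that every one of them gives a wrong YES is the product of the per-round chances.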
11. Checking Equality
Example Numbers
• 10 GB file, n = 10^10
• Desired error prob: about 10^-30 (2^-99 ≈ 1.6 * 10^-30)
• Communication: 99 rounds * 33 bits (log n ≈ 33) = 3267 bits ≈ 400 bytes
If 10 billion people do 10 billion checks a day, the prob that even one of the day's checks is erroneous is about 1 in 10 billion
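The arithmetic on this slide can be reproduced directly. The sketch takes n = 10^10 and the 99 rounds of 33 bits from the slide; the variable names are illustrative.

```python
import math

n = 10 ** 10                              # file size in bits, as on the slide
bits_per_round = round(math.log2(n))      # log2(10^10) is about 33.2
rounds = 99
error_prob = 0.5 ** rounds                # about 1.6e-30
total_bits = rounds * bits_per_round      # 3267
total_bytes = total_bits / 8              # a little over 400

checks_per_day = 10 ** 10 * 10 ** 10      # 10 billion people, 10 billion checks each
daily_bound = checks_per_day * error_prob # union bound on "at least one wrong check"

print(bits_per_round, total_bits, total_bytes)
print(error_prob, daily_bound)
```

The daily bound comes out on the order of 10^-10, which matches the slide's "1 in 10 billion" up to a small constant.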
12. Another Example
PCA
• Fit a line through the origin to a collection of points so as to maximize the sum of squares of the projections onto the line
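For points in the plane this objective has a closed form, sketched below in plain Python. It assumes 2-D points and no centering (the line must pass through the origin); `best_line_direction` and `sum_sq_projections` are illustrative names.

```python
import math

def best_line_direction(points):
    """Unit vector along the line through 0 maximizing the sum of squared projections."""
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # The objective is sxx*cos^2(t) + 2*sxy*sin(t)*cos(t) + syy*sin^2(t);
    # rewriting it in terms of the angle 2t shows the maximizer is:
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def sum_sq_projections(points, direction):
    """Sum over all points of (projection onto the unit vector `direction`)^2."""
    dx, dy = direction
    return sum((x * dx + y * dy) ** 2 for x, y in points)

points = [(2.0, 1.1), (-1.0, -0.4), (3.0, 1.6), (-2.0, -0.9)]
d = best_line_direction(points)
print(d, sum_sq_projections(points, d))
```

In higher dimensions the same direction is the top eigenvector of the matrix of second moments, which is what a PCA routine would return for uncentered data.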
13. PCA
Random Sampling
• Too many points?
• Pick a random sample
– The fitted line doesn't change too much?
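A quick experiment on synthetic data suggests what to expect. This is only a sketch: the data (points near the line y = 0.5x with small noise), the sample size of 200, and the helper `best_line_angle` are all assumptions, and well-behaved data like this says nothing about harder inputs.

```python
import math
import random

def best_line_angle(points):
    """Angle of the line through 0 maximizing the sum of squared projections (2-D)."""
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

rng = random.Random(0)                    # fixed seed so the run is repeatable
points = []
for _ in range(10000):
    x = rng.uniform(-1.0, 1.0)
    points.append((x, 0.5 * x + rng.gauss(0.0, 0.05)))

sample = rng.sample(points, 200)          # uniform sample without replacement
full_angle = best_line_angle(points)
sample_angle = best_line_angle(sample)
print(full_angle, sample_angle, abs(full_angle - sample_angle))
```

Uniform sampling does well here because no single point matters much; a few far-away points would change that.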
14. PCA
Random Sampling
• How should you sample here?
15. Puzzle
Checking Matrix Products
• Given three matrices A, B and C, check whether A = BC
– Entries mod p for simplicity
• Matrices are n x n
• Easy to do in n^3 time
• Can you do better?
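For reference, the easy n^3 baseline might look like this sketch. It is the straightforward check, not an answer to the puzzle, and the name `product_equals` is illustrative.

```python
def product_equals(a, b, c, p):
    """Check A == B*C (mod p) entry by entry: about n^3 multiplications."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of B*C, reduced mod p.
            entry = sum(b[i][k] * c[k][j] for k in range(n)) % p
            if entry != a[i][j] % p:
                return False
    return True

b = [[1, 2], [3, 4]]
c = [[5, 6], [7, 8]]
a = [[19, 22], [43, 50]]                  # the true product B*C
print(product_equals(a, b, c, 7))         # True
```

Each of the n^2 entries costs n multiplications, which is where the n^3 comes from.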