SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
Lecture 4
Barna Saha
AT&T-Labs Research
September 19, 2013
Outline
Heavy Hitter Continued
Frequency Moment Estimation
Dimensionality Reduction
Heavy Hitter
Heavy Hitter Problem: For 0 < < φ < 1 find a set of
elements S including all i such that fi > φm and there is no
element in S with frequency ≤ (φ − )m.
Count-Min sketch guarantees: fi ≤ ˆfi ≤ fi + m with
probability ≥ 1 − δ in space e
log 1
(φ− )δ .
Insert only: Maintain a min-heap of size k = 1
φ− , when an
item arrives estimate frequency and if above φm include it in
the heap. If heap size more than k, discard the minimum
frequency element in the heap.
Heavy Hitter
Turnstile model:
Maintain dyadic intervals over binary search tree and maintain
log n count-min sketch with using space e
log 2 log n
δ(φ− ) one for
each level.
At every level at most 1
φ heavy hitters.
Estimate frequency of children of the heavy hitter nodes until
leaf-level is reached.
Return all the leaves with estimated frequency above φm.
Analysis
At most 2
φ− nodes at every level is examined.
Each true frequency > (φ − )m with probability at least
1 − δ(φ− )
2 log n .
By union bound all true frequencies are above (φ − )m with
probability at least 1 − δ.
l2 frequency estimation
|fi − ˆfi | ≤ ± f 2
1 + f 2
2 + ....f 2
n [Count-sketch]
F2 = f 2
1 + f 2
2 + ....f 2
n
How do we estimate F2 in small space ?
AMS-F2 Estimation
H = {h : [n] → {+1, −1}} four-wise independent hash
functions
Maintain Zj = Zj + ahj (i) on arrival of (i, a) for
j = 1, ..., t = c
2
Return Y = 1
t
t
j=1 Z2
j
Analysis
Zj = n
i=1 fi hj (i)
E Zj = 0, E Z2
j = F2.
Var Z2
j = E Z4
j − (E Zj )2 ≤ 4F2
2 .
E Y = F2. Var Y = 1
t2
t
j=1 Var(Z2
j ) = 4 2
c F2
2
By Chebyshev Inequality Pr |Y − E Y | > F2 ≤ 4
c
Boosting by Median
Keep Y1, Y2, ...Ys, s = O(log 1δ)
Return A = median(Y1, Y2, .., Ys)
By Chernoff bound Pr |A − F2| > F2 < δ
Linear Sketch
Algorithm maintains a linear sketch [Z1, Z2, ...., Zt]x = Rx
where R is a t × n random matrix with entries {+1, −1}.
Use Y = ||Rx||2
2 to estimate t||x|2
2. t = O( 1
2 ).
Streaming algorithm operating in the sketch model can be
viewed as dimensionality reduction technique.
Dimensionality Reduction
Streaming algorithm operating in the sketch model can be
viewed as dimensionality reduction technique.
stream S: point in n dimensional space, want to compute l2(S)
sketch operator can be viewed as an approximate embedding
of ln
2 to sketch space C such that
1. Each point in C can be described using only small number
(say m) of numbers so C ⊂ Rm
and
2. value of l2(S) is approximately equal to F(C(S)).
F(Y1, Y2, ..Yt) = median(Y1, Y2, .., Yt)
Dimensionality Reduction
F(Y1, Y2, ..Yt) = median(Y1, Y2, .., Yt)
Disadvantage: F is not a norm–performing any nontrivial
operations in the sketch space (e.g. clustering, similarity
search, regression etc.) becomes difficult.
Can we embed from ln
2 to lm
2 , m << n approximately
preserving the distance ? Johnson-Lindenstrauss Lemma
Interlude to Normal Distribution
Normal distribution N(0, 1):
Range (−∞, ∞)
Density f (x) = e−x2
/
√
2π
Mean=0, Variance=1
Basic facts
If X and Y are independent random variables with normal
distribution then so is X + Y
If X and Y are independent with mean 0 then
E [X + Y ]2 = E X2 + E Y 2
E cX = cE X , Var cX = c2Var X
A Different Linear Sketch
Instead of ±1 let ri be a i.i.d. random variable from N(0, 1).
Consider Z = i ri xi
E Z2 = E ( i ri xi )2 = i E r2
i x2
i = i Var ri x2
i =
i x2
i = ||x||2
2.
As before we maintain Z = [Z1, Z2, ..., Zt] and define
Y = ||Z||2
2
E Y = t||x||2
2
We show that there exists constant C > 0 s.t. for small
enough > 0
Pr |Y − t||x||2
2| > t||x||2
2 ≤ e−C 2t
(JL lemma)
set t = O( 1
2 log 1
δ )
Johnson Lindenstrauss Lemma
Lemma
For any 0 < epsilon < 1 and any integer m, let t be a positive
integer such that
t >
4 ln m
2/2 + 3/3
Then for any set V of m points in Rn, there is a map f : Rn → Rt
such that for all u and v ∈ V ,
(1 − )||u − v||2
2 ≤ ||f (u) − f (v)||2
2 ≤ (1 + )||u − v||2
2.
Furthermore this map can be found in randomized polynomial time.

Más contenido relacionado

La actualidad más candente

Newton’s Divided Difference Formula
Newton’s Divided Difference FormulaNewton’s Divided Difference Formula
Newton’s Divided Difference FormulaJas Singh Bhasin
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common SubsequenceSwati Swati
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common SubsequenceKrishma Parekh
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common SubsequenceSyeda
 
Longest common subsequence(dynamic programming).
Longest common subsequence(dynamic programming).Longest common subsequence(dynamic programming).
Longest common subsequence(dynamic programming).munawerzareef
 
Naville's Interpolation
Naville's InterpolationNaville's Interpolation
Naville's InterpolationVARUN KUMAR
 
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...BRNSS Publication Hub
 
Applications of Numerical Functional Analysis in Atomless Finite Measure Spaces
Applications of Numerical Functional Analysis in Atomless Finite Measure SpacesApplications of Numerical Functional Analysis in Atomless Finite Measure Spaces
Applications of Numerical Functional Analysis in Atomless Finite Measure SpacesQUESTJOURNAL
 
Cs229 notes7b
Cs229 notes7bCs229 notes7b
Cs229 notes7bVuTran231
 
Functions of several variables
Functions of several variablesFunctions of several variables
Functions of several variableslord
 
5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...
5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...
5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...RyotaroTsukada
 
Numerical Differentiation and Integration
 Numerical Differentiation and Integration Numerical Differentiation and Integration
Numerical Differentiation and IntegrationMeenakshisundaram N
 
Partial Differentiation & Application
Partial Differentiation & Application Partial Differentiation & Application
Partial Differentiation & Application Yana Qlah
 
CHAIN RULE AND IMPLICIT FUNCTION
CHAIN RULE AND IMPLICIT FUNCTIONCHAIN RULE AND IMPLICIT FUNCTION
CHAIN RULE AND IMPLICIT FUNCTIONNikhil Pandit
 
Complex Variables and Numerical Methods
Complex Variables and Numerical MethodsComplex Variables and Numerical Methods
Complex Variables and Numerical MethodsDhrumit Patel
 

La actualidad más candente (20)

Newton’s Divided Difference Formula
Newton’s Divided Difference FormulaNewton’s Divided Difference Formula
Newton’s Divided Difference Formula
 
Imc2017 day1-solutions
Imc2017 day1-solutionsImc2017 day1-solutions
Imc2017 day1-solutions
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common Subsequence
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common Subsequence
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common Subsequence
 
Longest common subsequence(dynamic programming).
Longest common subsequence(dynamic programming).Longest common subsequence(dynamic programming).
Longest common subsequence(dynamic programming).
 
Section3 stochastic
Section3 stochasticSection3 stochastic
Section3 stochastic
 
Naville's Interpolation
Naville's InterpolationNaville's Interpolation
Naville's Interpolation
 
Complex varible
Complex varibleComplex varible
Complex varible
 
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
 
Applications of Numerical Functional Analysis in Atomless Finite Measure Spaces
Applications of Numerical Functional Analysis in Atomless Finite Measure SpacesApplications of Numerical Functional Analysis in Atomless Finite Measure Spaces
Applications of Numerical Functional Analysis in Atomless Finite Measure Spaces
 
Cs229 notes7b
Cs229 notes7bCs229 notes7b
Cs229 notes7b
 
Numerical method
Numerical methodNumerical method
Numerical method
 
Functions of several variables
Functions of several variablesFunctions of several variables
Functions of several variables
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 
5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...
5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...
5.4 Saddle-point interpretation, 5.5 Optimality conditions, 5.6 Perturbation ...
 
Numerical Differentiation and Integration
 Numerical Differentiation and Integration Numerical Differentiation and Integration
Numerical Differentiation and Integration
 
Partial Differentiation & Application
Partial Differentiation & Application Partial Differentiation & Application
Partial Differentiation & Application
 
CHAIN RULE AND IMPLICIT FUNCTION
CHAIN RULE AND IMPLICIT FUNCTIONCHAIN RULE AND IMPLICIT FUNCTION
CHAIN RULE AND IMPLICIT FUNCTION
 
Complex Variables and Numerical Methods
Complex Variables and Numerical MethodsComplex Variables and Numerical Methods
Complex Variables and Numerical Methods
 

Destacado

Развитие интеграционных процессов бизнеса и учреждений профобразования
Развитие интеграционных процессов бизнеса и учреждений профобразованияРазвитие интеграционных процессов бизнеса и учреждений профобразования
Развитие интеграционных процессов бизнеса и учреждений профобразованияAtner Yegorov
 
Pautova doklad о молодежи
Pautova doklad о  молодежиPautova doklad о  молодежи
Pautova doklad о молодежиAtner Yegorov
 
Формирование кадров через стажировки
Формирование кадров через стажировкиФормирование кадров через стажировки
Формирование кадров через стажировкиAtner Yegorov
 
S1 mobile based-smart-life-v2.5-owen-j_nipa
S1 mobile based-smart-life-v2.5-owen-j_nipaS1 mobile based-smart-life-v2.5-owen-j_nipa
S1 mobile based-smart-life-v2.5-owen-j_nipaAtner Yegorov
 
доклад кудряшов слайды для презентации
доклад кудряшов слайды для презентациидоклад кудряшов слайды для презентации
доклад кудряшов слайды для презентацииAtner Yegorov
 
Эскизный проект ул.Б.Хмельницкого г.Чебоксары
Эскизный проект ул.Б.Хмельницкого г.ЧебоксарыЭскизный проект ул.Б.Хмельницкого г.Чебоксары
Эскизный проект ул.Б.Хмельницкого г.ЧебоксарыAtner Yegorov
 
Meroprijatija centra 1348719387-1
Meroprijatija centra 1348719387-1Meroprijatija centra 1348719387-1
Meroprijatija centra 1348719387-1Atner Yegorov
 
Дистанционное обучение в Чувашии
Дистанционное обучение в ЧувашииДистанционное обучение в Чувашии
Дистанционное обучение в ЧувашииAtner Yegorov
 
2 release q4_2010_rus_f2
2 release q4_2010_rus_f22 release q4_2010_rus_f2
2 release q4_2010_rus_f2Atner Yegorov
 
курсы для беременных
курсы для беременныхкурсы для беременных
курсы для беременныхAtner Yegorov
 
Подготовка специалистов в университете Дубна
Подготовка специалистов в университете ДубнаПодготовка специалистов в университете Дубна
Подготовка специалистов в университете ДубнаAtner Yegorov
 
Развитие туризма в Чувашии 2011-2016
Развитие туризма в Чувашии 2011-2016Развитие туризма в Чувашии 2011-2016
Развитие туризма в Чувашии 2011-2016Atner Yegorov
 
Биотехнологический кластер в Чувашии
Биотехнологический кластер в ЧувашииБиотехнологический кластер в Чувашии
Биотехнологический кластер в ЧувашииAtner Yegorov
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг ЧувашииAtner Yegorov
 
Путь к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынку
Путь к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынкуПуть к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынку
Путь к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынкуAtner Yegorov
 
маунтинбайк
маунтинбайкмаунтинбайк
маунтинбайкAtner Yegorov
 
Профиль здоровья г.чебоксары
Профиль здоровья г.чебоксарыПрофиль здоровья г.чебоксары
Профиль здоровья г.чебоксарыAtner Yegorov
 

Destacado (19)

Развитие интеграционных процессов бизнеса и учреждений профобразования
Развитие интеграционных процессов бизнеса и учреждений профобразованияРазвитие интеграционных процессов бизнеса и учреждений профобразования
Развитие интеграционных процессов бизнеса и учреждений профобразования
 
Pautova doklad о молодежи
Pautova doklad о  молодежиPautova doklad о  молодежи
Pautova doklad о молодежи
 
Формирование кадров через стажировки
Формирование кадров через стажировкиФормирование кадров через стажировки
Формирование кадров через стажировки
 
S1 mobile based-smart-life-v2.5-owen-j_nipa
S1 mobile based-smart-life-v2.5-owen-j_nipaS1 mobile based-smart-life-v2.5-owen-j_nipa
S1 mobile based-smart-life-v2.5-owen-j_nipa
 
доклад кудряшов слайды для презентации
доклад кудряшов слайды для презентациидоклад кудряшов слайды для презентации
доклад кудряшов слайды для презентации
 
Эскизный проект ул.Б.Хмельницкого г.Чебоксары
Эскизный проект ул.Б.Хмельницкого г.ЧебоксарыЭскизный проект ул.Б.Хмельницкого г.Чебоксары
Эскизный проект ул.Б.Хмельницкого г.Чебоксары
 
Meroprijatija centra 1348719387-1
Meroprijatija centra 1348719387-1Meroprijatija centra 1348719387-1
Meroprijatija centra 1348719387-1
 
Дистанционное обучение в Чувашии
Дистанционное обучение в ЧувашииДистанционное обучение в Чувашии
Дистанционное обучение в Чувашии
 
2 release q4_2010_rus_f2
2 release q4_2010_rus_f22 release q4_2010_rus_f2
2 release q4_2010_rus_f2
 
курсы для беременных
курсы для беременныхкурсы для беременных
курсы для беременных
 
Подготовка специалистов в университете Дубна
Подготовка специалистов в университете ДубнаПодготовка специалистов в университете Дубна
Подготовка специалистов в университете Дубна
 
Развитие туризма в Чувашии 2011-2016
Развитие туризма в Чувашии 2011-2016Развитие туризма в Чувашии 2011-2016
Развитие туризма в Чувашии 2011-2016
 
Биотехнологический кластер в Чувашии
Биотехнологический кластер в ЧувашииБиотехнологический кластер в Чувашии
Биотехнологический кластер в Чувашии
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг Чувашии
 
Путь к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынку
Путь к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынкуПуть к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынку
Путь к Европейскому БИО Сертификату (EG) 834/2007 и доступ к рынку
 
550l rab
550l rab550l rab
550l rab
 
маунтинбайк
маунтинбайкмаунтинбайк
маунтинбайк
 
Book 11
Book 11Book 11
Book 11
 
Профиль здоровья г.чебоксары
Профиль здоровья г.чебоксарыПрофиль здоровья г.чебоксары
Профиль здоровья г.чебоксары
 

Similar a Lec 4-slides

Optimal multi-configuration approximation of an N-fermion wave function
 Optimal multi-configuration approximation of an N-fermion wave function Optimal multi-configuration approximation of an N-fermion wave function
Optimal multi-configuration approximation of an N-fermion wave functionjiang-min zhang
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted ParaproductsVjekoslavKovac1
 
Actuarial Science Reference Sheet
Actuarial Science Reference SheetActuarial Science Reference Sheet
Actuarial Science Reference SheetDaniel Nolan
 
Multivriada ppt ms
Multivriada   ppt msMultivriada   ppt ms
Multivriada ppt msFaeco Bot
 
Refresher probabilities-statistics
Refresher probabilities-statisticsRefresher probabilities-statistics
Refresher probabilities-statisticsSteve Nouri
 
University of manchester mathematical formula tables
University of manchester mathematical formula tablesUniversity of manchester mathematical formula tables
University of manchester mathematical formula tablesGaurav Vasani
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化Akira Tanimoto
 
International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI) International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI) inventionjournals
 
Generating Chebychev Chaotic Sequence
Generating Chebychev Chaotic SequenceGenerating Chebychev Chaotic Sequence
Generating Chebychev Chaotic SequenceCheng-An Yang
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 

Similar a Lec 4-slides (20)

Lecture5
Lecture5Lecture5
Lecture5
 
Imc2016 day2-solutions
Imc2016 day2-solutionsImc2016 day2-solutions
Imc2016 day2-solutions
 
Section5 stochastic
Section5 stochasticSection5 stochastic
Section5 stochastic
 
Section2 stochastic
Section2 stochasticSection2 stochastic
Section2 stochastic
 
Optimal multi-configuration approximation of an N-fermion wave function
 Optimal multi-configuration approximation of an N-fermion wave function Optimal multi-configuration approximation of an N-fermion wave function
Optimal multi-configuration approximation of an N-fermion wave function
 
Section4 stochastic
Section4 stochasticSection4 stochastic
Section4 stochastic
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted Paraproducts
 
Actuarial Science Reference Sheet
Actuarial Science Reference SheetActuarial Science Reference Sheet
Actuarial Science Reference Sheet
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
2.1 Calculus 2.formulas.pdf.pdf
2.1 Calculus 2.formulas.pdf.pdf2.1 Calculus 2.formulas.pdf.pdf
2.1 Calculus 2.formulas.pdf.pdf
 
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
 
presentazione
presentazionepresentazione
presentazione
 
5.n nmodels i
5.n nmodels i5.n nmodels i
5.n nmodels i
 
Multivriada ppt ms
Multivriada   ppt msMultivriada   ppt ms
Multivriada ppt ms
 
Refresher probabilities-statistics
Refresher probabilities-statisticsRefresher probabilities-statistics
Refresher probabilities-statistics
 
University of manchester mathematical formula tables
University of manchester mathematical formula tablesUniversity of manchester mathematical formula tables
University of manchester mathematical formula tables
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI) International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)
 
Generating Chebychev Chaotic Sequence
Generating Chebychev Chaotic SequenceGenerating Chebychev Chaotic Sequence
Generating Chebychev Chaotic Sequence
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 

Más de Atner Yegorov

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Atner Yegorov
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Atner Yegorov
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Atner Yegorov
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Atner Yegorov
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015Atner Yegorov
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыAtner Yegorov
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Atner Yegorov
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанAtner Yegorov
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, ТатарстанAtner Yegorov
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистикиAtner Yegorov
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИAtner Yegorov
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Atner Yegorov
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Atner Yegorov
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициативаAtner Yegorov
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ Atner Yegorov
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизацииAtner Yegorov
 
Город Innopolis
Город InnopolisГород Innopolis
Город InnopolisAtner Yegorov
 
презентация1
презентация1презентация1
презентация1Atner Yegorov
 
сквер дк салют
сквер дк салютсквер дк салют
сквер дк салютAtner Yegorov
 

Más de Atner Yegorov (20)

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум Чебоксары
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, Татарстан
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, Татарстан
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистики
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИ
 
Relap
RelapRelap
Relap
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициатива
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизации
 
Город Innopolis
Город InnopolisГород Innopolis
Город Innopolis
 
презентация1
презентация1презентация1
презентация1
 
сквер дк салют
сквер дк салютсквер дк салют
сквер дк салют
 

Último

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Lec 4-slides

  • 1. Lecture 4 Barna Saha AT&T-Labs Research September 19, 2013
  • 2. Outline Heavy Hitter Continued Frequency Moment Estimation Dimensionality Reduction
  • 3. Heavy Hitter Heavy Hitter Problem: For 0 < < φ < 1 find a set of elements S including all i such that fi > φm and there is no element in S with frequency ≤ (φ − )m. Count-Min sketch guarantees: fi ≤ ˆfi ≤ fi + m with probability ≥ 1 − δ in space e log 1 (φ− )δ . Insert only: Maintain a min-heap of size k = 1 φ− , when an item arrives estimate frequency and if above φm include it in the heap. If heap size more than k, discard the minimum frequency element in the heap.
  • 4. Heavy Hitter Turnstile model: Maintain dyadic intervals over binary search tree and maintain log n count-min sketch with using space e log 2 log n δ(φ− ) one for each level. At every level at most 1 φ heavy hitters. Estimate frequency of children of the heavy hitter nodes until leaf-level is reached. Return all the leaves with estimated frequency above φm. Analysis At most 2 φ− nodes at every level is examined. Each true frequency > (φ − )m with probability at least 1 − δ(φ− ) 2 log n . By union bound all true frequencies are above (φ − )m with probability at least 1 − δ.
  • 5. l2 frequency estimation |fi − ˆfi | ≤ ± f 2 1 + f 2 2 + ....f 2 n [Count-sketch] F2 = f 2 1 + f 2 2 + ....f 2 n How do we estimate F2 in small space ?
  • 6. AMS-F2 Estimation H = {h : [n] → {+1, −1}} four-wise independent hash functions Maintain Zj = Zj + ahj (i) on arrival of (i, a) for j = 1, ..., t = c 2 Return Y = 1 t t j=1 Z2 j
  • 7. Analysis Zj = n i=1 fi hj (i) E Zj = 0, E Z2 j = F2. Var Z2 j = E Z4 j − (E Zj )2 ≤ 4F2 2 . E Y = F2. Var Y = 1 t2 t j=1 Var(Z2 j ) = 4 2 c F2 2 By Chebyshev Inequality Pr |Y − E Y | > F2 ≤ 4 c
  • 8. Boosting by Median Keep Y1, Y2, ...Ys, s = O(log 1δ) Return A = median(Y1, Y2, .., Ys) By Chernoff bound Pr |A − F2| > F2 < δ
  • 9. Linear Sketch Algorithm maintains a linear sketch [Z1, Z2, ...., Zt]x = Rx where R is a t × n random matrix with entries {+1, −1}. Use Y = ||Rx||2 2 to estimate t||x|2 2. t = O( 1 2 ). Streaming algorithm operating in the sketch model can be viewed as dimensionality reduction technique.
  • 10. Dimensionality Reduction Streaming algorithm operating in the sketch model can be viewed as dimensionality reduction technique. stream S: point in n dimensional space, want to compute l2(S) sketch operator can be viewed as an approximate embedding of ln 2 to sketch space C such that 1. Each point in C can be described using only small number (say m) of numbers so C ⊂ Rm and 2. value of l2(S) is approximately equal to F(C(S)). F(Y1, Y2, ..Yt) = median(Y1, Y2, .., Yt)
  • 11. Dimensionality Reduction F(Y1, Y2, ..Yt) = median(Y1, Y2, .., Yt) Disadvantage: F is not a norm–performing any nontrivial operations in the sketch space (e.g. clustering, similarity search, regression etc.) becomes difficult. Can we embed from ln 2 to lm 2 , m << n approximately preserving the distance ? Johnson-Lindenstrauss Lemma
  • 12. Interlude to Normal Distribution Normal distribution N(0, 1): Range (−∞, ∞) Density f (x) = e−x2 / √ 2π Mean=0, Variance=1 Basic facts If X and Y are independent random variables with normal distribution then so is X + Y If X and Y are independent with mean 0 then E [X + Y ]2 = E X2 + E Y 2 E cX = cE X , Var cX = c2Var X
  • 13. A Different Linear Sketch Instead of ±1 let ri be a i.i.d. random variable from N(0, 1). Consider Z = i ri xi E Z2 = E ( i ri xi )2 = i E r2 i x2 i = i Var ri x2 i = i x2 i = ||x||2 2. As before we maintain Z = [Z1, Z2, ..., Zt] and define Y = ||Z||2 2 E Y = t||x||2 2 We show that there exists constant C > 0 s.t. for small enough > 0 Pr |Y − t||x||2 2| > t||x||2 2 ≤ e−C 2t (JL lemma) set t = O( 1 2 log 1 δ )
  • 14. Johnson Lindenstrauss Lemma Lemma For any 0 < epsilon < 1 and any integer m, let t be a positive integer such that t > 4 ln m 2/2 + 3/3 Then for any set V of m points in Rn, there is a map f : Rn → Rt such that for all u and v ∈ V , (1 − )||u − v||2 2 ≤ ||f (u) − f (v)||2 2 ≤ (1 + )||u − v||2 2. Furthermore this map can be found in randomized polynomial time.