calculation | consulting
This is an early draft of some notes
on the relationship between
statistical physics and deep learning
charles@calculationconsulting.com
calculation | consulting stat phys of deep learning
Who Are We?
Dr. Charles H. Martin, PhD
University of Chicago, Chemical Physics
NSF Fellow in Theoretical Chemistry
Over 10 years experience in applied Machine Learning
Developed ML algos for Demand Media; the first $1B IPO since Google
Tech: Aardvark (now Google), eHow, GoDaddy, …
Wall Street: BlackRock
Fortune 500: Big Pharma, Telecom, eBay
www.calculationconsulting.com
charles@calculationconsulting.com
Data Scientists are Different
theoretical physics : machine learning specialist
experimental physics : data scientist
engineer : software, browser tech, dev ops, …
not all techies are the same
Statistical Physics of Information Theory
not my ideas, just a summary of notes from the web & the book: Merhav (2009)
http://webee.technion.ac.il/people/merhav/papers/p138f.pdf
"If I have seen further than others, it is by standing on the shoulders of giants" (Isaac Newton)

Energies: unnormalized probabilities
in stat phys and ML, energies give unnormalized probabilities
$x_j = e^{-E_j}$, i.e. $E_j = -\ln x_j$
with a scale parameter $\beta$: $x_j^{\beta} = e^{-\beta E_j}$
in ML, $\beta$ is an (optional) scale / smoothing parameter
in stat phys, $\beta$ is the inverse Temperature

Energy normalization: Partition Function (Z)
the normalization factor Z is $Z = \sum_j e^{-E_j}$
to get probabilities, we do a soft-max transform: $p_j = e^{-E_j} / Z$
but we also include the inverse Temperature: $p_j = e^{-\beta E_j} / \sum_k e^{-\beta E_k}$
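A minimal sketch of the soft-max transform with an inverse temperature, assuming energies are given as a plain list of floats. The function name `softmax` and the example energies are illustrative and not from the slides.

```python
import math

def softmax(energies, beta=1.0):
    """Return p_j = exp(-beta * E_j) / Z for a list of energies."""
    # Shift by the minimum energy so the largest exponent is 0 (numerical stability).
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function, up to the constant shift factor
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]              # hypothetical energies
p_cold = softmax(energies, beta=10.0)   # low temperature: mass piles onto the lowest energy
p_hot = softmax(energies, beta=0.1)     # high temperature: close to uniform
```

Large beta (low temperature) sharpens the distribution toward the lowest energy; small beta smooths it toward uniform.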

Old School Nets: from Z to sigmoid activations
modern nets are layers of nodes and activation functions
What happened to E and Z ?
They are easy to recover in simple cases…

Old School Nets: from Z to sigmoid activations
consider 1 layer of an RBM

Old School Nets: from Z to sigmoid activations
let's compute p(h|x) directly from the Energy function
we expect the conditional probabilities to factor
and to have sigmoid activations

Old School Nets: from Z to sigmoid activations
http://www.youtube.com/watch?v=lekCh_i32iE&t=18m31s
we find that the conditional probabilities do factor
and we can recover the local sigmoid activations
but we don’t include Temperature…although old models did
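The factorization can be checked numerically on a tiny RBM. The sketch below assumes the common binary RBM energy E(x, h) = -b.x - c.h - h^T W x, which the slides do not spell out. All weights and the input vector are made-up illustrative values. It compares p(h_j = 1 | x) from brute-force normalization over all hidden states with the local sigmoid formula.

```python
import itertools
import math

def energy(x, h, W, b, c):
    """E(x, h) = -b.x - c.h - h^T W x (assumed standard binary RBM form)."""
    vis = sum(bi * xi for bi, xi in zip(b, x))
    hid = sum(cj * hj for cj, hj in zip(c, h))
    inter = sum(h[j] * W[j][i] * x[i] for j in range(len(h)) for i in range(len(x)))
    return -(vis + hid + inter)

def p_hidden_on_brute(j, x, W, b, c):
    """p(h_j = 1 | x) by summing exp(-E) over every hidden configuration."""
    states = list(itertools.product([0, 1], repeat=len(c)))
    weights = [math.exp(-energy(x, h, W, b, c)) for h in states]
    z_x = sum(weights)  # normalization for this fixed x
    return sum(w for h, w in zip(states, weights) if h[j] == 1) / z_x

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def p_hidden_on_sigmoid(j, x, W, c):
    """p(h_j = 1 | x) from the local activation c_j + sum_i W_ji x_i."""
    return sigmoid(c[j] + sum(W[j][i] * x[i] for i in range(len(x))))

# Hypothetical parameters: 2 hidden units, 3 visible units.
W = [[0.5, -1.0, 0.3], [1.2, 0.1, -0.7]]
b = [0.1, -0.2, 0.0]
c = [-0.3, 0.4]
x = [1, 0, 1]
brute = [p_hidden_on_brute(j, x, W, b, c) for j in range(2)]
local = [p_hidden_on_sigmoid(j, x, W, c) for j in range(2)]
```

The two lists agree to rounding error: the visible bias term cancels in the conditional, and each hidden unit contributes an independent factor.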

Scaled Energies: w/ Temperature
we do see T in some simple reinforcement learning methods

Scaled Energies: Temperature smoothing
and T arises as a smoothing parameter in Dark Knowledge

Scaled Energies: Max Norm Regularization
http://www.deeplearningbook.org/slides/dls_2016.pdf
We frequently have to rescale the weights in the deep net
I simply observe that this is, effectively, energy rescaling

Scaled Energies: Batch Norm Regularization
a recent idea out of Google
[figure: ReLU layer with batch statistics mean = 0, variance = 1; Z ~ E energy]
local layer energies must be rescaled explicitly on each batch step
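A minimal sketch of the per-batch rescaling, assuming a single feature and a plain list of pre-activations. The learned scale and shift parameters of full batch normalization are omitted here, and the names `batch_norm` and `relu` are illustrative.

```python
import math

def batch_norm(values, eps=1e-5):
    """Rescale one feature across a batch to mean 0 and (approximately) variance 1."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    # eps guards against division by zero when the batch has no spread.
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def relu(v):
    return max(0.0, v)

pre_activations = [2.0, 4.0, 6.0, 8.0]   # hypothetical layer outputs for one batch
normed = batch_norm(pre_activations)
activations = [relu(v) for v in normed]
```

Because the statistics are recomputed on every batch, the rescaling happens explicitly at each step, as the slide notes.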

Recap: energies and temperatures
Neural Networks define energies at each layer
Sigmoid activations result from normalization and factorization
Local energies / weights must be rescaled carefully
Lots of hacks to get good convergence
Let's turn to some stat mech / stats to see how T arises

Boltzmann Distribution: classic argument (Hill)
https://charlesmartin14.wordpress.com/2013/11/14/metric-learning-some-quantum-statistical-mechanics/
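given many discrete states $j$ with occupation numbers $n_j$, the distribution is multinomial: $\Omega(n) = N! / \prod_j n_j!$
given the constraints (constant N, E): $\sum_j n_j = N$, $\sum_j n_j E_j = E$
what is the most probable distribution?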

Boltzmann Distribution: the most likely distribution?
we expect the most likely distribution of states
and the most likely energy distribution
to both be highly peaked
i.e. concentrate to the means very fast
[equation: min log (distribution) s.t. (constraints)]

Boltzmann Distribution: Lagrange multiplier problem
so peaked that we can minimize the log of the distribution
as [equation lost in extraction], giving [equation lost in extraction]
where [symbols lost in extraction] are Lagrange multipliers

Boltzmann Distribution: Stirling’s Approximation
see Art of Computer Programming by Knuth
we apply Stirling's asymptotic expansion, $\ln n! \approx n \ln n - n$,
to the terms in the multinomial distribution
when taking [symbol lost in extraction]; note that [symbol lost in extraction] term vanishes
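A quick numerical look at Stirling's approximation, assuming the leading form ln n! ≈ n ln n - n and the standard next correction (1/2) ln(2 pi n). The helper names are illustrative. The exact value comes from `math.lgamma(n + 1)`.

```python
import math

def log_factorial(n):
    """Exact ln n! via the log-gamma function."""
    return math.lgamma(n + 1)

def stirling_leading(n):
    """Leading Stirling terms: n ln n - n."""
    return n * math.log(n) - n

def stirling_corrected(n):
    """Leading terms plus the (1/2) ln(2 pi n) correction."""
    return stirling_leading(n) + 0.5 * math.log(2 * math.pi * n)

def relative_error(n):
    """Relative error of the leading-order approximation."""
    return abs(log_factorial(n) - stirling_leading(n)) / log_factorial(n)

errors = {n: relative_error(n) for n in (10, 100, 1000)}
```

The relative error of the leading terms shrinks as n grows, which is why the lower-order terms can be dropped in the large-N argument.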

Boltzmann Distribution: Lagrange multiplier problem
after applying Stirling’s approximation and taking partials, we get [equation lost in extraction]
giving the mean number of events [equation lost in extraction]
this leads to the final most likely distribution …

Boltzmann Distribution: and Partition Function
optimal probability: $p_j = e^{-\beta E_j} / Z$
average energy: $\langle E \rangle = \sum_j p_j E_j$
partition function: $Z = \sum_j e^{-\beta E_j}$
central result of Gibbs statistical mechanics
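The three quantities can be computed directly for a small discrete system. This is a sketch assuming the standard Boltzmann form p_j = exp(-beta E_j) / Z. The two-level example and the function names are illustrative.

```python
import math

def partition_function(energies, beta):
    """Z = sum_j exp(-beta * E_j)."""
    return sum(math.exp(-beta * e) for e in energies)

def boltzmann(energies, beta):
    """Optimal probabilities p_j = exp(-beta * E_j) / Z."""
    z = partition_function(energies, beta)
    return [math.exp(-beta * e) / z for e in energies]

def average_energy(energies, beta):
    """<E> = sum_j p_j * E_j."""
    return sum(p * e for p, e in zip(boltzmann(energies, beta), energies))

two_level = [0.0, 1.0]                      # hypothetical two-level system
z = partition_function(two_level, beta=1.0)
avg = average_energy(two_level, beta=1.0)   # analytically 1 / (1 + e)
```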

Partition Function: a generating function
we get all sorts of useful stuff out of it
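One example of the useful stuff, as a hedged sketch: derivatives of ln Z with respect to beta generate the cumulants of the energy. The mean is -d ln Z / d beta and the variance is d^2 ln Z / d beta^2. The code checks both with central finite differences. The energies, beta and step size are illustrative choices.

```python
import math

def log_z(energies, beta):
    """ln Z(beta) for a discrete set of energies."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

def exact_moments(energies, beta):
    """Mean and variance of E under the Boltzmann distribution."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    mean = sum(w * e for w, e in zip(weights, energies)) / z
    second = sum(w * e * e for w, e in zip(weights, energies)) / z
    return mean, second - mean ** 2

def moments_from_log_z(energies, beta, h=1e-3):
    """Mean and variance of E from finite-difference derivatives of ln Z."""
    up = log_z(energies, beta + h)
    mid = log_z(energies, beta)
    down = log_z(energies, beta - h)
    mean = -(up - down) / (2 * h)          # -d ln Z / d beta
    var = (up - 2 * mid + down) / (h * h)  # d^2 ln Z / d beta^2
    return mean, var

energies = [0.0, 1.0, 3.0]   # hypothetical energy levels
exact = exact_moments(energies, beta=0.7)
approx = moments_from_log_z(energies, beta=0.7)
```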

Ground State Energy: the low Temp limit
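The slide body did not survive extraction, so the following is only a sketch of the standard statement the title suggests: -(1/beta) ln Z approaches the lowest energy as beta grows. The function name and energies are illustrative.

```python
import math

def free_energy(energies, beta):
    """F(beta) = -(1/beta) * ln Z, computed with a shift for numerical stability."""
    e_min = min(energies)
    shifted = sum(math.exp(-beta * (e - e_min)) for e in energies)
    return e_min - math.log(shifted) / beta

energies = [1.5, 2.0, 4.0]                  # hypothetical energy levels
f_warm = free_energy(energies, beta=1.0)    # below the ground state energy
f_cold = free_energy(energies, beta=100.0)  # very close to min(energies)
```

At low temperature the sum in Z is dominated by its largest term, so ln Z behaves like a soft maximum over -beta * E_j.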

Statistical Physics: an ML viewpoint
we can derive and describe these results
using language familiar to the ML community
• max entropy principle
• KL divergence
• Chernoff bounds
• sums of random numbers
• concentration to the mean
• extreme value statistics
some results may be familiar; others surprising

Canonical Ensemble: from states to energies
microcanonical: maximum entropy
canonical: minimum free energy at constant T
the Boltzmann-Gibbs distribution minimizes the free energy

Canonical Ensemble: from states to energies
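sum over states: $Z = \sum_x e^{-\beta E(x)}$
sum over energy levels: $Z = \sum_E \Omega(E)\, e^{-\beta E}$
many states ($x$) can have the same energy level E
we count them w/ density of states $\Omega(E)$
entropy $S(E) = \ln \Omega(E)$
free energy $F = E - T S$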
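A toy check that summing over states and summing over energy levels give the same Z. It assumes N independent two-level units, each with energy 0 or 1. This model is chosen for illustration and is not from the slides. Its density of states is the binomial coefficient C(N, k).

```python
import itertools
import math

def z_over_states(n, beta):
    """Z as a sum over all 2^n microstates; the energy is the number of excited units."""
    return sum(math.exp(-beta * sum(s)) for s in itertools.product([0, 1], repeat=n))

def z_over_levels(n, beta):
    """Z as a sum over energy levels k, weighted by the density of states C(n, k)."""
    return sum(math.comb(n, k) * math.exp(-beta * k) for k in range(n + 1))

def entropy(n, k):
    """S(E) = ln Omega(E) for energy level k."""
    return math.log(math.comb(n, k))

n, beta = 10, 0.5
z_states = z_over_states(n, beta)
z_levels = z_over_levels(n, beta)
```

The level sum needs n + 1 terms instead of 2^n, which is the practical payoff of counting states with a density of states.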

Free Energy: back to probabilities

Free Energy: KL Divergence
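The slide body did not survive extraction, so this is only a sketch of one standard identity consistent with the title. For any trial distribution q, the free energy functional F[q] = <E>_q - T S[q] exceeds the equilibrium free energy -T ln Z by exactly T times the KL divergence from q to the Boltzmann distribution p. All names and numbers below are illustrative.

```python
import math

def boltzmann(energies, beta):
    """Equilibrium distribution and partition function."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

def free_energy_functional(q, energies, beta):
    """F[q] = <E>_q - T * S[q], with T = 1 / beta."""
    avg_e = sum(qi * e for qi, e in zip(q, energies))
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return avg_e - entropy / beta

def kl_divergence(q, p):
    """KL(q || p) = sum_i q_i ln(q_i / p_i)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

energies, beta = [0.0, 1.0, 2.0], 2.0
q = [0.5, 0.3, 0.2]                       # hypothetical trial distribution
p, z = boltzmann(energies, beta)
f_eq = -math.log(z) / beta                # equilibrium free energy
gap = free_energy_functional(q, energies, beta) - f_eq
```

Since the KL divergence is non-negative and zero only at q = p, minimizing F[q] recovers the Boltzmann distribution.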

Temperature: a Chernoff parameter
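given $X_1, X_2, \dots$ i.i.d. vars, and a function $E(x)$
how fast does the probability of the event $\sum_{i=1}^N E(X_i) \le N E_0$ decay, where $E_0 < \langle E(X) \rangle$?
apply Chernoff bound w/ exponential bound on the Indicator
minimize over $\beta \ge 0$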
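A sketch of the Chernoff argument on a toy case, assuming fair coin flips X_i in {0, 1} with E(x) = x and the event sum_i X_i <= N * E0 for E0 below the mean. These choices are illustrative and not from the slides. The bound exp(N * (beta * E0 + ln E[exp(-beta X)])) is minimized over beta >= 0 on a grid and compared with the exact binomial tail.

```python
import math

def chernoff_exponent(beta, e0, p=0.5):
    """beta * E0 + ln E[exp(-beta * X)] for X ~ Bernoulli(p)."""
    return beta * e0 + math.log(1.0 - p + p * math.exp(-beta))

def chernoff_bound(n, e0, p=0.5, grid=5000, beta_max=5.0):
    """Minimize the exponent over a grid of beta >= 0; return (bound, best beta)."""
    best_beta, best_val = 0.0, 0.0   # beta = 0 gives the trivial bound 1
    for k in range(grid + 1):
        beta = beta_max * k / grid
        val = chernoff_exponent(beta, e0, p)
        if val < best_val:
            best_beta, best_val = beta, val
    return math.exp(n * best_val), best_beta

def exact_tail(n, e0, p=0.5):
    """Exact P(sum_i X_i <= n * e0) from the binomial distribution."""
    k_max = math.floor(n * e0 + 1e-9)
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_max + 1))

bound, beta_star = chernoff_bound(n=100, e0=0.3)
exact = exact_tail(n=100, e0=0.3)
```

For this toy case the minimizing beta is ln((1 - E0) / E0), and it plays the role of an inverse temperature selected by the energy constraint.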

Temperature: a Chernoff parameter
principle of minimum free energy
the minimizing $\beta$ is the equilibrium inverse temperature
see book for details & caveats
S is really a rate function, as in large deviations theory

Free Energy: thermodynamic limit
free energy density
these may differ: the order of the limits matters
annealed (w/ moments)

Free Energy: indicates Phase Transitions (PT)
thermodynamic functions change abruptly with external changes
should be analytic
first order PT
second order PT
discontinuous

Random Energies: sum of exponentials of random numbers
say we have $e^{nA}$ i.i.d. events
each w/ probability $e^{-nB}$
what is the probability that at least one event occurs?

sums of exp(rand(x)): concentration result
# successes = sum of i.i.d. binary random vars
w/ expectation $e^{n(A-B)}$
A < B: vanishes completely
A > B: concentrates to mean very fast
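A numerical sketch of the two regimes, assuming M = exp(n * A) independent trials that each succeed with probability exp(-n * B). This parametrization is an assumption taken from the A versus B comparison on the slide. The code evaluates the probability of at least one success and the relative spread of the success count.

```python
import math

def prob_at_least_one(n, a, b):
    """1 - (1 - p)^M with M = exp(n*a), p = exp(-n*b), computed stably."""
    m = math.exp(n * a)
    p = math.exp(-n * b)
    # (1 - p)^M = exp(M * ln(1 - p)); expm1/log1p keep precision for tiny p.
    return -math.expm1(m * math.log1p(-p))

def relative_std(n, a, b):
    """Std / mean of the binomial success count; small values mean concentration."""
    m = math.exp(n * a)
    p = math.exp(-n * b)
    return math.sqrt(m * p * (1.0 - p)) / (m * p)

n = 50
rare = prob_at_least_one(n, a=0.2, b=0.5)      # A < B: essentially never
common = prob_at_least_one(n, a=0.5, b=0.2)    # A > B: essentially always
spread = relative_std(n, a=0.5, b=0.2)         # shrinks like exp(-n * (A - B) / 2)
```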

sums of exp(rand(x)): proof of concentrations
either 1 event or 0 events are seen, depending on A/B
$\ln(1 - x) = -x + \dots$

Random Energy Model (REM): setup

Random Energy Model (REM): …

Replica Method: an old trick to eval Z
expected value of ln Z, in moments of Z: $\langle \ln Z \rangle = \lim_{m \to 0} (\langle Z^m \rangle - 1) / m$
express w/ integer m
analytic continuation to real m as m -> 0
bad branch cut? deal w/ later
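A Monte Carlo sketch of the identity behind the trick, assuming the form <ln Z> = lim_{m -> 0} (<Z^m> - 1) / m. It uses a toy ensemble in which each Z is a sum of exp(-beta * E_j) over a few Gaussian random energies. The ensemble, sample sizes and seed are illustrative assumptions, and the code evaluates small real m directly instead of continuing from integer m.

```python
import math
import random

def sample_partition_functions(n_samples=2000, n_states=8, beta=1.0, seed=0):
    """Draw Z = sum_j exp(-beta * E_j) with i.i.d. standard normal energies."""
    rng = random.Random(seed)
    return [
        sum(math.exp(-beta * rng.gauss(0.0, 1.0)) for _ in range(n_states))
        for _ in range(n_samples)
    ]

def mean(values):
    return sum(values) / len(values)

def replica_estimate(zs, m):
    """(<Z^m> - 1) / m, which tends to <ln Z> as m -> 0."""
    return (mean([z ** m for z in zs]) - 1.0) / m

zs = sample_partition_functions()
quenched = mean([math.log(z) for z in zs])   # <ln Z>, the hard average
annealed = math.log(mean(zs))                # ln <Z>, the easy m = 1 moment
estimate = replica_estimate(zs, m=1e-4)
```

The annealed value upper-bounds the quenched one by Jensen's inequality, and the small-m estimate matches the quenched average up to a correction of order m.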

Summary
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 

Cc stat phys draft

  • 1. calculation | consulting: This is an early draft of some notes on the relationship between statistical physics and deep learning. charles@calculationconsulting.com
  • 3. calculation | consulting: stat phys of deep learning. Who Are We? Dr. Charles H. Martin, PhD (University of Chicago, Chemical Physics; NSF Fellow in Theoretical Chemistry). Over 10 years of experience in applied Machine Learning. Developed ML algos for Demand Media, the first $1B IPO since Google. Tech: Aardvark (now Google), eHow, GoDaddy, … Wall Street: BlackRock. Fortune 500: Big Pharma, Telecom, eBay. www.calculationconsulting.com
  • 4. Data Scientists are Different: not all techies are the same. theoretical physics: machine learning specialist; experimental physics: data scientist; engineer: software, browser tech, dev ops, …
  • 5. Statistical Physics of Information Theory: not my ideas, just a summary. Notes from the web & the book: Merhav (2009) http://webee.technion.ac.il/people/merhav/papers/p138f.pdf "If I have seen further than others, it is by standing on the shoulders of giants" (Isaac Newton)
  • 7. Energies: unnormalized probabilities
 in stat phys and ML, energies give unnormalized probabilities: $x_j = e^{-\beta E_j}$, i.e. $\beta E_j = -\ln x_j$. In ML, $\beta$ is an (optional) scale / smoothing parameter; in stat phys, $\beta$ is the inverse Temperature
  • 8. Energy normalization: Partition Function (Z)
 the normalization factor is $Z = \sum_j e^{-\beta E_j}$; to get probabilities, we do a soft-max transform, $p_j = e^{-\beta E_j} / Z$, but we also include the inverse Temperature $\beta$
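A minimal Python sketch of the soft-max transform with an inverse temperature, assuming a small list of made-up energies. The function name `boltzmann_probs` and the example values are illustrative and not from the slides.

```python
import math

def boltzmann_probs(energies, beta=1.0):
    """Return p_j = exp(-beta * E_j) / Z for a list of energies."""
    # Shift by the minimum energy for numerical stability;
    # the shift cancels between numerator and Z.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]          # hypothetical energy levels
p_cold = boltzmann_probs(energies, beta=10.0)  # low temperature
p_hot = boltzmann_probs(energies, beta=0.1)    # high temperature
print(p_cold)
print(p_hot)
```

At large beta (low temperature) nearly all the probability sits on the lowest energy; at small beta the distribution is smoothed toward uniform.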
  • 9. Old School Nets: from Z to sigmoid activations
 modern nets are layers of nodes and activation functions. What happened to E and Z? They are easy to recover in simple cases…
  • 10. Old School Nets: from Z to sigmoid activations
 consider 1 layer of an RBM
  • 11. Old School Nets: from Z to sigmoid activations
 let's compute p(h|x) directly from the Energy function; we expect the conditional probabilities to factor and to have sigmoid activations
  • 12. Old School Nets: from Z to sigmoid activations
 http://www.youtube.com/watch?v=lekCh_i32iE&t=18m31s
  • 13. Old School Nets: from Z to sigmoid activations
 we find that the conditional probabilities do factor, and we can recover the local sigmoid activations; but we don't include Temperature… although old models did
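The factorization can be checked numerically on a tiny RBM. This sketch assumes binary units and the common energy convention E(x, h) = -b·x - c·h - h·W·x; the weights, biases, and function names are made up for illustration and are not taken from the slides or the linked video.

```python
import itertools
import math

def energy(x, h, W, b, c):
    """RBM energy E(x, h) = -b.x - c.h - h.W.x for binary units (assumed convention)."""
    bx = sum(bi * xi for bi, xi in zip(b, x))
    ch = sum(ci * hi for ci, hi in zip(c, h))
    hWx = sum(h[i] * W[i][j] * x[j] for i in range(len(h)) for j in range(len(x)))
    return -bx - ch - hWx

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def conditional_by_enumeration(x, W, b, c):
    """p(h_i = 1 | x) from the energy, summing over all hidden configurations."""
    n_h = len(c)
    configs = list(itertools.product([0, 1], repeat=n_h))
    weights = [math.exp(-energy(x, h, W, b, c)) for h in configs]
    z_x = sum(weights)  # normalization over h at fixed x
    return [sum(w for h, w in zip(configs, weights) if h[i] == 1) / z_x
            for i in range(n_h)]

def conditional_by_sigmoid(x, W, c):
    """p(h_i = 1 | x) = sigmoid(c_i + sum_j W_ij x_j), the factorized form."""
    return [sigmoid(c[i] + sum(W[i][j] * x[j] for j in range(len(x))))
            for i in range(len(c))]

# Hypothetical parameters: 2 hidden units, 3 visible units.
W = [[0.5, -1.0, 0.3], [1.2, 0.1, -0.7]]
b = [0.1, -0.2, 0.0]
c = [-0.3, 0.4]
x = [1, 0, 1]

p_enum = conditional_by_enumeration(x, W, b, c)
p_sig = conditional_by_sigmoid(x, W, c)
print(p_enum, p_sig)
```

The two lists agree to floating-point precision: normalizing the energy over the hidden units at fixed x gives independent sigmoid activations, with no explicit temperature.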
  • 14. Scaled Energies: w/ Temperature
 we do see T in some simple reinforcement learning methods
  • 15. Scaled Energies: Temperature smoothing
 and T arises as a smoothing parameter in Dark Knowledge
  • 16. Scaled Energies: Max Norm Regularization
 We frequently have to rescale the weights in the deep net; I simply observe that this is, effectively, energy rescaling. http://www.deeplearningbook.org/slides/dls_2016.pdf
  • 17. Scaled Energies: Batch Norm Regularization
 most recent ideas out of Google Deep Mind; local layer energies must be rescaled explicitly on each batch step (figure labels: ReLU; mean = 0, variance = 1; Z ~ E energy) http://www.deeplearningbook.org/slides/dls_2016.pdf
  • 19. Recap: energies and temperatures
 Neural Networks define energies at each layer; Sigmoid activations result from normalization and factorization; Local energies / weights must be rescaled carefully; Lots of hacks to get good convergence. Let's turn to some stat mech / stats to see how T arises. http://www.deeplearningbook.org/slides/dls_2016.pdf
  • 20. Boltzmann Distribution: classic argument (Hill)
 given many discrete states and the constraints (constant N, E), what is the most probable distribution? https://charlesmartin14.wordpress.com/2013/11/14/metric-learning-some-quantum-statistical-mechanics/
  • 21. Boltzmann Distribution: the most likely distribution?
 we expect the most likely distribution of states and the most likely energy distribution to both be highly peaked, i.e. to concentrate to the means very fast
  • 22. Boltzmann Distribution: Lagrange multiplier problem
 the distribution is so peaked that we can minimize the log of the distribution subject to (s.t.) the constraints, using Lagrange multipliers
  • 23. Boltzmann Distribution: Stirling's Approximation
 we apply an asymptotic expansion to the terms in the multinomial distribution (see Art of Computer Programming by Knuth); note that one term vanishes
  • 24. Boltzmann Distribution: Lagrange multiplier problem
 after applying Stirling's approximation and taking partials, we get the mean number of events; this leads to the final most likely distribution …
  • 25. Boltzmann Distribution: and Partition Function
 optimal probability; average energy; partition function: the central result of Gibbs statistical mechanics
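The argument above can be sketched as the standard textbook derivation. The notation is an assumption, since the slide equations did not survive: $n_j$ is the number of events in state $j$ with energy $E_j$, $N$ and $E$ are the fixed totals, and $\alpha$, $\beta$ are the Lagrange multipliers.

```latex
\begin{align*}
W(\{n_j\}) &= \frac{N!}{\prod_j n_j!},
  \qquad \sum_j n_j = N, \quad \sum_j n_j E_j = E \\
\ln W &\approx N \ln N - \sum_j n_j \ln n_j
  \qquad \text{(Stirling: } \ln n! \approx n \ln n - n \text{)} \\
0 &= \frac{\partial}{\partial n_j}
  \Big[ \ln W - \alpha \sum_k n_k - \beta \sum_k n_k E_k \Big]
  = -\ln n_j - 1 - \alpha - \beta E_j \\
n_j &= e^{-(1+\alpha)} \, e^{-\beta E_j} \\
p_j &= \frac{n_j}{N} = \frac{e^{-\beta E_j}}{Z},
  \qquad Z = \sum_j e^{-\beta E_j},
  \qquad \langle E \rangle = \sum_j p_j E_j
\end{align*}
```

The multiplier $\alpha$ is fixed by normalization and is absorbed into $Z$; $\beta$ is fixed by the energy constraint and plays the role of the inverse temperature.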
  • 26. Partition Function: a generating function
 we get all sorts of useful stuff out of it
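Two standard examples of the "useful stuff" are the mean energy, -d ln Z / dβ, and the energy variance, d² ln Z / dβ². The sketch below checks both with finite differences on a made-up set of energy levels; the helper names and values are illustrative only.

```python
import math

def log_z(energies, beta):
    """ln Z(beta) = ln sum_j exp(-beta * E_j)."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

def mean_energy(energies, beta):
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return sum(w * e for w, e in zip(weights, energies)) / z

def energy_variance(energies, beta):
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    mean = sum(w * e for w, e in zip(weights, energies)) / z
    return sum(w * (e - mean) ** 2 for w, e in zip(weights, energies)) / z

energies = [0.0, 0.5, 1.3, 2.0]  # hypothetical energy levels
beta, step = 0.7, 1e-4

# Central finite differences of ln Z with respect to beta.
d1 = (log_z(energies, beta + step) - log_z(energies, beta - step)) / (2 * step)
d2 = (log_z(energies, beta + step) - 2 * log_z(energies, beta)
      + log_z(energies, beta - step)) / step ** 2

mean_from_z = -d1  # <E> = -d ln Z / d beta
var_from_z = d2    # Var(E) = d^2 ln Z / d beta^2
print(mean_from_z, mean_energy(energies, beta))
print(var_from_z, energy_variance(energies, beta))
```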
  • 27. Ground State Energy: the low Temp limit
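The low-temperature limit named in this slide title can be illustrated with the standard free energy F = -(1/β) ln Z, which approaches the lowest energy level as β grows. The energy values and function name below are made up for the sketch.

```python
import math

def free_energy(energies, beta):
    """F = -(1/beta) * ln Z, computed with a log-sum-exp shift for stability."""
    e_min = min(energies)
    shifted = sum(math.exp(-beta * (e - e_min)) for e in energies)
    return e_min - math.log(shifted) / beta

energies = [-1.0, 0.2, 0.9, 3.0]  # hypothetical levels; ground state is -1.0
betas = (1.0, 10.0, 100.0)
free_energies = [free_energy(energies, beta) for beta in betas]
for beta, f in zip(betas, free_energies):
    print(beta, f)
```

As beta increases, only the ground state contributes to Z, so F rises toward min(E) from below.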
  • 28. Statistical Physics: an ML viewpoint
 we can derive and describe these results using language familiar to the ML community:
      • max entropy principle
      • KL divergence
      • Chernoff bounds
      • sums of random numbers
      • concentration to the mean
      • extreme value statistics
 some results may be familiar; others surprising
  • 29. Canonical Ensemble: from states to energies
 microcanonical: maximum entropy; canonical: minimum free energy at constant T; the Boltzmann-Gibbs distribution minimizes the free energy
  • 30. Canonical Ensemble: from states to energies
 sum over states = sum over energy levels; many states can have the same energy level E; we count them w/ the density of states; entropy S = ln(density of states); free energy
  • 31. Free Energy: back to probabilities
  • 32. Free Energy: KL Divergence
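One standard identity connecting free energy and KL divergence, offered here as an assumption about what these two slides cover, is F[q] = F + (1/β) KL(q || p): the free energy of any trial distribution q exceeds the equilibrium value by the KL divergence to the Boltzmann distribution p. The sketch checks it on made-up numbers.

```python
import math

def boltzmann(energies, beta):
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

def variational_free_energy(q, energies, beta):
    """F[q] = <E>_q - (1/beta) * S[q], with S the Shannon entropy in nats."""
    mean_e = sum(qi * e for qi, e in zip(q, energies))
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return mean_e - entropy / beta

def kl(q, p):
    """KL(q || p) in nats."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

energies = [0.0, 1.0, 2.5]  # hypothetical energy levels
beta = 1.5
p, z = boltzmann(energies, beta)
f_eq = -math.log(z) / beta  # equilibrium free energy

q = [0.2, 0.5, 0.3]         # arbitrary trial distribution
gap = variational_free_energy(q, energies, beta) - f_eq
kl_term = kl(q, p) / beta
print(gap, kl_term)
```

Since KL is non-negative and zero only at q = p, this is one way to see that the Boltzmann-Gibbs distribution minimizes the free energy.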
  • 33. Temperature: a Chernoff parameter
 given i.i.d. vars X1, X2, … and a function of them, how fast does the probability of the event (sum) decay? apply the Chernoff bound w/ an exponential bound on the Indicator, then minimize over the free parameter
  • 34. Temperature: a Chernoff parameter
  • 35. Temperature: a Chernoff parameter
 principle of minimum free energy: the minimizing parameter is the equilibrium inverse temperature; S is really a rate function, as in large deviations theory; see book for details & caveats
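A minimal sketch of the Chernoff argument for a concrete case that is not from the slides: X_i ~ Bernoulli(p) and the event sum(X_i) >= n·a with a > p. The free parameter s plays the role of the inverse temperature; minimizing the bound is the same as maximizing the per-sample exponent, which for this case equals the Bernoulli KL divergence (the rate function).

```python
import math

def chernoff_exponent(a, p, s):
    """Per-sample exponent s*a - ln E[exp(s*X)] for X ~ Bernoulli(p)."""
    return s * a - math.log(1.0 - p + p * math.exp(s))

def best_exponent(a, p, s_max=10.0, steps=100001):
    """Maximize the exponent over s in [0, s_max] by a simple grid search."""
    grid = (s_max * k / (steps - 1) for k in range(steps))
    return max(chernoff_exponent(a, p, s) for s in grid)

def bernoulli_kl(a, p):
    """KL(Bernoulli(a) || Bernoulli(p)), the known rate function for this case."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def exact_tail(n, a, p):
    """Exact P(sum of n Bernoulli(p) >= n*a) from the binomial distribution."""
    k_min = math.ceil(n * a)
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

p, a, n = 0.3, 0.5, 200  # hypothetical parameters
rate = best_exponent(a, p)
bound = math.exp(-n * rate)  # optimized Chernoff bound
tail = exact_tail(n, a, p)
print(rate, bernoulli_kl(a, p))
print(tail, bound)
```

The grid search is a stand-in for the analytic minimization; the optimized exponent matches the KL rate and the exact tail stays below the bound.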
  • 36. Free Energy: thermodynamic limit
 free energy density; annealed (w/ moments); these may differ: the order of the limits matters
  • 37. Free Energy: indicates Phase Transitions (PT)
 thermodynamic functions should be analytic, but at a PT they change abruptly with external changes (figure labels: first order PT; second order PT; discontinuous)
  • 38. Random Energies: sum of exponentials of random numbers
 say we have i.i.d. events, each w/ a given probability; what is the probability that at least one event occurs?
  • 39. sums of exp(rand(x)): concentration result
 # successes = sum of i.i.d. binary random vars, w/ a known expectation; for A < B it vanishes completely; for A > B it concentrates to the mean very fast
  • 40. sums of exp(rand(x)): proof of concentrations
 either 1 event or 0 events are seen, depending on A/B; uses $\ln(1 - x) = -x + \dots$
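A numerical sketch of this dichotomy, assuming the setup used in Merhav's notes: exp(n·A) i.i.d. events, each with probability exp(-n·B). These specifics are an assumption, since the slide equations did not survive. The probability that at least one event occurs is 1 - (1 - p)^N.

```python
import math

def prob_at_least_one(n, a, b):
    """P(at least one of exp(n*a) i.i.d. events occurs), each w/ probability exp(-n*b)."""
    num_events = math.exp(n * a)
    p = math.exp(-n * b)
    # 1 - (1 - p)**num_events, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(num_events * math.log1p(-p))

sizes = (5, 10, 20)
few = [prob_at_least_one(n, a=1.0, b=2.0) for n in sizes]   # A < B
many = [prob_at_least_one(n, a=2.0, b=1.0) for n in sizes]  # A > B
print(few)
print(many)
```

For A < B the probability decays to zero as n grows; for A > B it saturates at one almost immediately.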

  • 41. Random Energy Model (REM): setup
  • 42. Random Energy Model (REM): …
  • 43. Replica Method: an old trick to eval Z
 express the expected value of ln Z in moments of Z; evaluate w/ integer m, then analytically continue to real m as m -> 0; bad branch cut? deal w/ later
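The identity behind the trick is E[ln Z] = lim_{m -> 0} (E[Z^m] - 1) / m. The sketch below checks it numerically on a toy model with made-up Gaussian energies. It evaluates the moments directly at small real m, so it illustrates the limit only, not the analytic continuation from integer m.

```python
import math
import random

random.seed(0)

# Hypothetical toy "disorder": each sample draws 8 Gaussian energies and forms Z.
beta = 1.0
z_samples = []
for _ in range(2000):
    energies = [random.gauss(0.0, 1.0) for _ in range(8)]
    z_samples.append(sum(math.exp(-beta * e) for e in energies))

def mean(values):
    return sum(values) / len(values)

quenched = mean([math.log(z) for z in z_samples])  # E[ln Z]
annealed = math.log(mean(z_samples))               # ln E[Z]

def replica_estimate(m):
    """(E[Z^m] - 1) / m, which tends to E[ln Z] as m -> 0."""
    return (mean([z ** m for z in z_samples]) - 1.0) / m

estimates = [replica_estimate(m) for m in (1.0, 0.1, 1e-4)]
print(quenched, annealed)
print(estimates)
```

The quenched average E[ln Z] is never larger than the annealed ln E[Z] (Jensen's inequality), and the replica estimate approaches the quenched value as m shrinks.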