Learning Sparse Neural Networks using
L0 Regularization
- Varun Reddy G
Neural Networks
- Very good, flexible function approximators
- Scale well
Some problems
1. Highly overparameterized
2. Can easily overfit
One of the solutions:
Model compression and sparsification
- A typical Lp-regularized loss adds a penalty term λ||θ||p to the data loss, where ||θ||p is the Lp norm of the parameters, L(.) is the loss function and λ weights the penalty.
- The L0 norm simply counts the number of non-zero parameters in the model.
- It penalizes all non-zero values equally, unlike other Lp norms, whose penalty grows with the magnitude of θj and therefore shrinks larger values more.
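The difference between the penalties can be checked numerically. The sketch below is only an illustration: the helper name `lp_penalty` and the example vector are made up, and for p > 0 it uses the common regularizer form Σ|θj|^p (the p-th power of the norm) rather than the norm itself.

```python
import numpy as np

def lp_penalty(theta, p):
    """Return the count of non-zeros for p == 0, else sum_j |theta_j|**p."""
    theta = np.asarray(theta, dtype=float)
    if p == 0:
        return float(np.count_nonzero(theta))
    return float(np.sum(np.abs(theta) ** p))

# Hypothetical parameter vector: one exact zero, one tiny, two larger weights.
theta = np.array([0.0, 0.01, -0.5, 3.0])
# Scaling every weight changes the L1 and L2 penalties but not the L0 count.
scaled = 10.0 * theta

for p in (0, 1, 2):
    print(f"p={p}: original={lp_penalty(theta, p):.4f}  scaled={lp_penalty(scaled, p):.4f}")
```

The tiny weight 0.01 costs as much as the weight 3.0 under L0, which is the "penalizes all non-zero values equally" property described above.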
So, now the error function looks like this
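Written out, and assuming the notation of the original paper linked under Resources (the network h, the dataset size N and the weight λ are not named on the slides), the generic Lp-regularized loss and its L0 special case would read roughly as follows:

```latex
\begin{align}
\mathcal{R}(\theta) &= \frac{1}{N}\sum_{i=1}^{N}\mathcal{L}\big(h(x_i;\theta),\,y_i\big)
  + \lambda\,\lVert\theta\rVert_p
  && \text{(generic } L_p \text{ penalty)} \\
\mathcal{R}(\theta) &= \frac{1}{N}\sum_{i=1}^{N}\mathcal{L}\big(h(x_i;\theta),\,y_i\big)
  + \lambda\,\lVert\theta\rVert_0,
  \qquad \lVert\theta\rVert_0 = \sum_{j=1}^{|\theta|}\mathbb{I}[\theta_j \neq 0]
  && \text{(} L_0 \text{ case)}
\end{align}
```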
However, this function is computationally intractable: the L0 penalty is non-differentiable, and the parameter vector θ has 2^|θ| possible on/off states to search over.
So, we reformulate the problem to try to make it continuous.
- Consider the following re-parameterization: each parameter θj is written as the product of a free weight and a binary gate zj, where zj ∈ {0, 1} indicates whether the parameter is present or not.
Now, if we let q(zj | πj) = Bern(πj), where πj is the probability that zj = 1, then we can reformulate the loss on average as the expected data loss under q(z | π) plus λ times the sum of the πj, because the expected number of open gates is Σj πj.
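In symbols, and again borrowing the paper's notation as an assumption (the tilde on the free weights and the element-wise product ⊙ do not appear on the slides), the gating and the averaged loss would be:

```latex
\begin{align}
\theta_j &= \tilde{\theta}_j z_j, \qquad z_j \in \{0, 1\}, \qquad
  \lVert\theta\rVert_0 = \sum_{j=1}^{|\theta|} z_j \\
\mathcal{R}(\tilde{\theta}, \pi) &=
  \mathbb{E}_{q(z \mid \pi)}\!\left[\frac{1}{N}\sum_{i=1}^{N}
  \mathcal{L}\big(h(x_i;\tilde{\theta}\odot z),\,y_i\big)\right]
  + \lambda\sum_{j=1}^{|\theta|}\pi_j
\end{align}
```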
Now, the second term is easy to minimize, but the first term, due to the discrete nature of z, is difficult to
optimize.
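A quick Monte Carlo check makes the "easy" second term concrete: the expected number of open Bernoulli gates equals the sum of the πj. This is a minimal sketch with made-up probabilities, not code from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gate probabilities pi_j (probability that gate j is 1).
pi = np.array([0.9, 0.5, 0.1, 0.0])
n_samples = 200_000

# Draw binary gates z_j ~ Bern(pi_j) for every sample.
z = rng.random((n_samples, pi.size)) < pi

# Average L0 norm over the samples versus the closed form sum_j pi_j.
mc_l0 = z.sum(axis=1).mean()
exact_l0 = pi.sum()
print(f"Monte Carlo: {mc_l0:.3f}   closed form: {exact_l0:.3f}")
```

The penalty is a smooth function of π, so it is easy to minimize. The data-loss term, by contrast, depends on π only through the discrete samples z, which is what blocks ordinary gradient-based optimization.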
Let s be a continuous random variable with distribution q(s), and let each z be given by a hard-sigmoid rectification of s.
Hard-sigmoid
g(.) = min(1, max(0, .))
So, now z is given by
z = min(1, max(0, s))
This is equivalent to
z = 0 if s ≤ 0
z = s if 0 < s < 1
z = 1 if s ≥ 1
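The two forms of the hard-sigmoid can be compared directly. The function names below are illustrative, not from the slides.

```python
import numpy as np

def hard_sigmoid(s):
    """Clamp s to the interval [0, 1]: min(1, max(0, s))."""
    return np.minimum(1.0, np.maximum(0.0, s))

def hard_sigmoid_piecewise(s):
    """Same function written as the three-case definition."""
    s = np.asarray(s, dtype=float)
    return np.where(s <= 0.0, 0.0, np.where(s >= 1.0, 1.0, s))

# Values on both sides of, and exactly at, the two break points.
s = np.array([-0.3, 0.0, 0.25, 1.0, 1.7])
print(hard_sigmoid(s))
print(hard_sigmoid_piecewise(s))
```

Any s at or below 0 maps to exactly 0 and any s at or above 1 maps to exactly 1, which is what lets a continuous s produce gates that are exactly closed or exactly open.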
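Looking at the loss function, we have to penalize every non-zero θ. A gate z is non-zero exactly when s > 0, so the second term is the probability that s > 0, which is 1 − Q(s ≤ 0), where Q(.) is the CDF of s.
Substituting these, our loss function becomes the expected data loss under q(s), with gates z = g(s), plus λ times the sum over all gates of 1 − Q(sj ≤ 0), where g(s) is our hard-sigmoid function.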
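In symbols, and assuming the notation of the original paper (the per-gate parameters φj and the tilde on the free weights are not spelled out on the slides), this loss would read:

```latex
\begin{align}
q(z_j \neq 0 \mid \phi_j) &= 1 - Q(s_j \le 0 \mid \phi_j) \\
\mathcal{R}(\tilde{\theta}, \phi) &=
  \mathbb{E}_{q(s \mid \phi)}\!\left[\frac{1}{N}\sum_{i=1}^{N}
  \mathcal{L}\big(h(x_i;\tilde{\theta}\odot g(s)),\,y_i\big)\right]
  + \lambda\sum_{j=1}^{|\theta|}\big(1 - Q(s_j \le 0 \mid \phi_j)\big)
\end{align}
```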
Re-parameterization Trick
We can choose q(s), with parameters ɸ, so that it admits the re-parameterization trick: the loss is expressed as an expectation over a parameter-free noise distribution p(ϵ) and a deterministic, differentiable transformation f(.) of the parameters ɸ and the noise ϵ, i.e. s = f(ɸ, ϵ).
Note: the variables in the above definition do not correspond to those in the picture.
Therefore, the objective becomes an expectation over p(ϵ), with s replaced by f(ɸ, ϵ).
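As a sketch in the paper's notation (assumed here, since the slide's equation image is not available), the re-parameterized objective would be:

```latex
\begin{align}
s &= f(\phi, \epsilon), \qquad \epsilon \sim p(\epsilon) \\
\mathcal{R}(\tilde{\theta}, \phi) &=
  \mathbb{E}_{p(\epsilon)}\!\left[\frac{1}{N}\sum_{i=1}^{N}
  \mathcal{L}\big(h(x_i;\tilde{\theta}\odot g(f(\phi,\epsilon))),\,y_i\big)\right]
  + \lambda\sum_{j=1}^{|\theta|}\big(1 - Q(s_j \le 0 \mid \phi_j)\big)
\end{align}
```

Because the randomness now enters only through ϵ, a Monte Carlo estimate of the expectation is differentiable with respect to both the weights and ɸ.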
Choosing the q(s)
We are free to choose q(s). Something that worked well in practice is a binary concrete random variable distributed in (0, 1) with probability density qs(s | ɸ) and cumulative distribution function Qs(s | ɸ).
The parameters of this distribution are ɸ = (log ⍺, β), where log ⍺ is the location and β is the temperature.
We stretch this distribution to an interval (ɣ, 𝛿) with ɣ < 0 and 𝛿 > 1, and apply the hard-sigmoid to its random samples, so that z is exactly 0 or exactly 1 with non-zero probability.
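The sampling procedure described above can be sketched in NumPy. This is a minimal illustration under stated assumptions: the function name, the clipping of the uniform noise, and the values β = 2/3, ɣ = −0.1, 𝛿 = 1.1 are illustrative choices, not taken from the slides.

```python
import numpy as np

def sample_gates(log_alpha, beta, gamma, delta, rng, n_samples):
    """Sample gates: binary concrete -> stretch to (gamma, delta) -> hard-sigmoid."""
    log_alpha = np.asarray(log_alpha, dtype=float)
    # Parameter-free noise u ~ Uniform(0, 1), kept away from 0 and 1 so the logs are finite.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=(n_samples,) + log_alpha.shape)
    # Binary concrete sample in (0, 1): sigmoid((logit(u) + log_alpha) / beta).
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log1p(-u) + log_alpha) / beta))
    # Stretch to the interval (gamma, delta), then clamp to [0, 1].
    s_bar = s * (delta - gamma) + gamma
    return np.minimum(1.0, np.maximum(0.0, s_bar))

rng = np.random.default_rng(0)
log_alpha = np.array([-3.0, 0.0, 3.0])  # hypothetical locations for three gates
z = sample_gates(log_alpha, beta=2.0 / 3.0, gamma=-0.1, delta=1.1, rng=rng, n_samples=10_000)

print("fraction exactly 0:", (z == 0.0).mean(axis=0))
print("fraction exactly 1:", (z == 1.0).mean(axis=0))
```

A low log ⍺ pushes most samples to exactly 0, and a high log ⍺ pushes most of them to exactly 1. The stretch beyond both ends of [0, 1] is what gives the clamped values non-zero probability mass.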
 So, with the above changes, the objective function is
Eq. 9
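For the stretched binary concrete, the probability that a gate is non-zero has a closed form, sigmoid(log ⍺ − β log(−ɣ/𝛿)), so the L0 penalty can be computed without sampling. The sketch below cross-checks that formula against a Monte Carlo estimate; all names and numeric values are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prob_gate_open(log_alpha, beta, gamma, delta):
    """Closed form for P(z != 0) = 1 - Q(s_bar <= 0) under the stretched binary concrete."""
    return sigmoid(log_alpha - beta * np.log(-gamma / delta))

log_alpha = np.array([-3.0, 0.0, 3.0])        # hypothetical gate locations
beta, gamma, delta = 2.0 / 3.0, -0.1, 1.1     # assumed temperature and stretch interval

# Expected L0 norm: sum of the per-gate probabilities of being non-zero.
penalty = prob_gate_open(log_alpha, beta, gamma, delta).sum()

# Monte Carlo cross-check: sample gates and count how often each is non-zero.
rng = np.random.default_rng(0)
u = rng.uniform(1e-6, 1.0 - 1e-6, size=(200_000, log_alpha.size))
s = sigmoid((np.log(u) - np.log1p(-u) + log_alpha) / beta)
z = np.clip(s * (delta - gamma) + gamma, 0.0, 1.0)
mc_penalty = (z > 0.0).mean(axis=0).sum()

print(f"closed form: {penalty:.3f}   Monte Carlo: {mc_penalty:.3f}")
```

Since the closed form is differentiable in log ⍺, the penalty term can be minimized directly with gradient descent alongside the data loss.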
Results
Summary
1. Force network weights to become exactly 0 by penalizing the L0 norm.
2. To remove the non-differentiability, re-parameterize the weights with gates drawn from a continuous distribution and passed through a hard-sigmoid.
3. To make the objective function continuous and keep the sampling step out of the main network, use the re-parameterization trick.
4. Learn the parameters of q(s) and use them to compute the gates at inference time.
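One way to use the learned parameters at inference time is to drop the noise and compute a deterministic gate from log ⍺ alone. The sketch below follows the estimator as I understand it from the original paper; the function name, the example weights and the values of ɣ and 𝛿 are assumptions for illustration.

```python
import numpy as np

def inference_gate(log_alpha, gamma, delta):
    """Deterministic gate: stretch sigmoid(log_alpha) to (gamma, delta), then clamp to [0, 1]."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(log_alpha, dtype=float)))
    return np.clip(s * (delta - gamma) + gamma, 0.0, 1.0)

# Hypothetical learned gate locations and free weights for three parameters.
log_alpha = np.array([-5.0, 0.0, 5.0])
theta_tilde = np.array([0.8, -1.2, 0.3])

z_hat = inference_gate(log_alpha, gamma=-0.1, delta=1.1)
theta = theta_tilde * z_hat  # final sparse weights

print("gates:  ", z_hat)
print("weights:", theta)
```

Weights whose gate evaluates to exactly 0 can then be removed from the network altogether.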
Resources
- Numenta Journal Club: https://www.youtube.com/watch?v=HD2uvsAEZFM
- Original paper: https://arxiv.org/abs/1712.01312
