Parallel asynchronous inference of word senses with Azure
Sergey Bartunov, MSU
Learning as optimization
$$F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$$
Here $r$ is the regularizer, $f_i$ is the loss, $x_i$ is an object, and $\theta$ are the parameters.
• $N$ can be huge
• regularizer and loss can be complex
• parameters’ dimensionality can be very large
Commodity PC is not enough!
Learning word embeddings
For each word find its embedding such that similar words have close embeddings
[Figure: words in embedding space: Java, Platform, .NET, Mono; Railways, Ticket, Train; Politics, Party, Socialism]
Learning word embeddings
…compiled for a specific hardware platform, since different central processor…
object: word and its context
loss: $-\log p(v \mid w)$, where
$$p(v \mid w) = \frac{\exp(A_w^\top B_v)}{\sum_{v'=1}^{V} \exp(A_w^\top B_{v'})}$$
parameters: word embeddings $A_w, B_w \in \mathbb{R}^D$, $w \in \{1, \dots, V\}$
Skip-gram (Mikolov et al., 2013)
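The softmax above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `skipgram_prob` and the toy vectors are made up, and the full sum over the vocabulary is computed directly, which is only practical for a tiny V.

```python
import math

def skipgram_prob(A_w, B):
    """Return p(v | w) for every context word v.

    A_w is the input embedding of word w; B is a list with one output
    embedding per vocabulary word (hypothetical toy data below).
    """
    scores = [sum(a * b for a, b in zip(A_w, B_v)) for B_v in B]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: V = 3 context words, D = 2 dimensions.
A_w = [1.0, 0.0]
B = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
probs = skipgram_prob(A_w, B)
loss = -math.log(probs[0])  # loss for observing context word 0
```

Context words whose output vectors align with `A_w` get the highest probability, so the loss for them is the smallest.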
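Gradient optimization
$$F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$$
gradient descent with step size $\eta_t$:
$$\theta^{t+1} = \theta^{t} - \eta_t \nabla F(\theta^{t})$$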
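As a worked instance of the gradient descent update, the sketch below minimizes a one-dimensional objective chosen only for illustration: a quadratic regularizer r(θ) = λθ²/2 and squared losses f_i(x_i; θ) = (θ − x_i)²/2. Neither choice comes from the slides, and the constant step size is an assumption.

```python
def grad_F(theta, xs, lam):
    """Gradient of F(theta) = lam/2 * theta^2 + sum_i 0.5 * (theta - x_i)^2."""
    return lam * theta + sum(theta - x for x in xs)

def gradient_descent(xs, lam, step, n_steps, theta0=0.0):
    theta = theta0
    for _ in range(n_steps):
        # Every step touches all N objects, which is what makes it costly for huge N.
        theta -= step * grad_F(theta, xs, lam)
    return theta

xs = [1.0, 2.0, 3.0, 4.0]
theta_hat = gradient_descent(xs, lam=1.0, step=0.1, n_steps=50)
theta_star = sum(xs) / (1.0 + len(xs))  # closed-form minimizer for this toy objective
```

For this toy problem the iterate can be checked against the closed-form minimizer `theta_star`.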
Stochastic optimization
$$F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$$
stochastic gradient descent with step size $\eta_t$:
$$\theta^{t+1} = \theta^{t} - \eta_t G(\theta^{t}), \qquad \mathbb{E}\, G(\theta) = \nabla F(\theta)$$
for example: $G(\theta) = \nabla \left[ r(\theta) + N f_j(x_j; \theta) \right]$, $j \sim \mathrm{Uniform}(1, N)$
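The unbiasedness condition can be checked numerically on a toy problem. The sketch below uses an illustrative objective (quadratic regularizer, squared losses) and a 1/t step-size schedule; these are assumptions for the example, not details from the talk. Indices are 0-based here, whereas the slide samples j from 1 to N.

```python
import random

def grad_F(theta, xs, lam):
    """Full gradient of F(theta) = lam/2 * theta^2 + sum_i 0.5 * (theta - x_i)^2."""
    return lam * theta + sum(theta - x for x in xs)

def stochastic_grad(theta, xs, lam, j):
    """G(theta) = grad of [r(theta) + N * f_j(x_j; theta)] for a single index j."""
    return lam * theta + len(xs) * (theta - xs[j])

def sgd(xs, lam, n_steps, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for t in range(1, n_steps + 1):
        j = rng.randrange(len(xs))          # j ~ Uniform over the N objects
        step = 1.0 / ((lam + len(xs)) * t)  # decaying step size (an assumption)
        theta -= step * stochastic_grad(theta, xs, lam, j)
    return theta

xs = [1.0, 2.0, 3.0, 4.0]
lam = 1.0
# Averaging G over all equally likely j recovers the full gradient.
mean_G = sum(stochastic_grad(0.7, xs, lam, j) for j in range(len(xs))) / len(xs)
theta_sgd = sgd(xs, lam, n_steps=20000)
```

Each step looks at one object instead of all N, and the iterate still approaches the minimizer of F (2.0 for this toy data).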
Learning word embeddings
• 400k word vocabulary, 300-dimensional embeddings
• 240 million parameters to train!
• 1 GB memory snapshot
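The figures above follow from simple arithmetic, sketched below. The factor of two stands for the two embedding tables A and B; the 4 bytes per parameter is an assumption (32-bit floats), not something the slide states.

```python
V = 400_000          # vocabulary size
D = 300              # embedding dimensionality
n_params = 2 * V * D # one D-dimensional vector per word in each of A and B

bytes_per_param = 4  # assumption: 32-bit floats
size_gb = n_params * bytes_per_param / 1e9
```

This gives 240 million parameters and roughly 0.96 GB, in line with the 1 GB snapshot.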
Stochastic parallel optimization
[Diagram: core 1, core 2, …, core K all connected to the shared parameters; arrows show the data flow]
no synchronization!! (see e.g. the Hogwild paper)
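The lock-free update pattern can be sketched with ordinary threads. This is only an illustration under stated assumptions: the objective (mean squared distance to toy data), the worker count, and the step size are made up, and CPython's global interpreter lock means the threads interleave rather than run truly in parallel, so no speed-up should be expected from this code.

```python
import random
import threading

def hogwild_sgd(xs, n_workers=4, steps_per_worker=10000, step=0.01):
    theta = [0.0]  # shared parameter, read and written by all workers without a lock

    def worker(seed):
        rng = random.Random(seed)
        for _ in range(steps_per_worker):
            x = rng.choice(xs)
            grad = theta[0] - x      # gradient of 0.5 * (theta - x)^2, read without locking
            theta[0] -= step * grad  # racy read-modify-write; occasional lost updates are tolerated

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return theta[0]

xs = [1.0, 2.0, 3.0, 4.0]
theta_shared = hogwild_sgd(xs)
target = sum(xs) / len(xs)  # minimizer of the average loss
```

Despite the unsynchronized writes, the shared parameter ends up close to the minimizer; the exact value varies from run to run because the interleaving is not deterministic.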
Stochastic parallel optimization
My laptop (2 cores, 8 GB RAM): 22 hours
[…]: 2 hours
Dataset: English Wikipedia 2012 (5.7 GB raw text, 1 billion words)
Learning polysemic word embeddings
[Figure: words in embedding space, with one point per meaning of "Platform": Java, Platform (1), .NET, Mono; Railways, Ticket, Platform (2), Train; Platform (3), Politics, Party, Socialism]
Learning polysemic word embeddings
…compiled for a specific hardware platform, since different central processor…
(computer meaning) loss: $-\log p(v \mid w, z = 1)$
…as the safe distance from the platform edge increases with the speed…
(railway meaning) loss: $-\log p(v \mid w, z = 2)$
… Socialist Party; the Socialist Workers Platform and the Committee for a…
(political meaning) loss: $-\log p(v \mid w, z = 3)$
$$p(v \mid w, z = k) = \frac{\exp(A_{wk}^\top B_v)}{\sum_{v'=1}^{V} \exp(A_{wk}^\top B_{v'})}$$
word meanings are unobserved
Learning polysemic word embeddings
$$\log p(W, V \mid A, B, \alpha) = \log \int p(z \mid \alpha) \prod_i \prod_j p(v_{ij} \mid w_i, z_i, A, B)\, dz \to \max_{A, B}$$
word meanings are unobserved, hence the EM algorithm must be employed
• How to choose a prior that automatically increases the number of word meanings when necessary? Bayesian nonparametrics (Orbanz, 2014)
• How to put the EM procedure into a stochastic optimization framework? Stochastic variational inference (Blei et al., 2012)
EM algorithm
… Socialist Party; the Socialist Workers Platform and the Committee for a…
E-step: disambiguate the word given its context
p(z = politics) = 0.96
p(z = transport) = 0.01
p(z = computer) = 0.03
M-step: update word embeddings by weighted gradient, with step size $\eta_t$
$$\theta^{t+1} = \theta^{t} + \eta_t \nabla \left[ \sum_k p(z_i = k) \log p(v_{ij} \mid w_i, z_i = k, \theta^{t}) \right]$$
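The two steps can be sketched on a toy model. This is a minimal illustration rather than the authors' implementation: the E-step below applies Bayes' rule with a fixed prior over senses, the M-step updates only the sense vectors of one word, and every name and number is hypothetical.

```python
import math

def softmax_probs(a, B):
    scores = [sum(x * y for x, y in zip(a, b)) for b in B]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def e_step(prior, A_w, B, context):
    """Posterior over senses k given the ids of the context words."""
    log_post = []
    for p_k, a in zip(prior, A_w):
        probs = softmax_probs(a, B)
        log_post.append(math.log(p_k) + sum(math.log(probs[v]) for v in context))
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def weighted_loglik(A_w, B, context, post):
    """sum_k post[k] * sum_j log p(v_j | w, z = k)."""
    return sum(q * sum(math.log(softmax_probs(a, B)[v]) for v in context)
               for q, a in zip(post, A_w))

def m_step(A_w, B, context, post, step):
    """One gradient-ascent step on weighted_loglik with respect to the sense vectors."""
    new_A = []
    for q, a in zip(post, A_w):
        probs = softmax_probs(a, B)
        expected = [sum(p * b[d] for p, b in zip(probs, B)) for d in range(len(a))]
        grad = [q * sum(B[v][d] - expected[d] for v in context) for d in range(len(a))]
        new_A.append([x + step * g for x, g in zip(a, grad)])
    return new_A

# Toy data: one word with 2 senses, 3 context words, D = 2.
A_w = [[1.0, 0.0], [0.0, 1.0]]
B = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
context = [0, 0]
post = e_step([0.5, 0.5], A_w, B, context)
new_A = m_step(A_w, B, context, post, step=0.1)
```

The sense whose vector predicts the context well receives most of the posterior mass, so the weighted gradient moves mainly that sense's vector.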
Learning polysemic word embeddings
• 400k word vocabulary, 300-dimensional embeddings, max. 30 meanings per word
• 7.2 billion parameters to train!
• 18 GB memory snapshot
Learning polysemic word embeddings
My laptop (2 cores, 8 GB RAM): 6 days!
[…]: 16 hours
Dataset: English Wikipedia 2012 (5.7 GB raw text, 1 billion words)
Results
julia> expected_pi(vm, dict.word2id["cloud"])
30-element Array{Float64,1}:
0.404964
0.134444
0.0987207
0.361865
5.70338e-6
5.18419e-7
4.7129e-8
4.28446e-9
3.89496e-10
3.54087e-11
⋮
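In the output above, the entries after the fourth shrink by a roughly constant factor of about 0.0909 per position (5.70338e-6, 5.18419e-7, 4.7129e-8, …). A stick-breaking prior with Beta(1, α) sticks and α = 0.1 would produce exactly this decay for unused meanings. Both the form of the prior and the value of α are assumptions here; the slides only mention Bayesian nonparametrics. A minimal sketch of the expected prior weights under that assumption:

```python
def expected_prior_weights(alpha, n_senses):
    """E[pi_k] for pi_k = beta_k * prod_{r<k} (1 - beta_r), with independent beta_r ~ Beta(1, alpha)."""
    mean_beta = 1.0 / (1.0 + alpha)  # mean of a Beta(1, alpha) variable
    return [mean_beta * (1.0 - mean_beta) ** k for k in range(n_senses)]

weights = expected_prior_weights(alpha=0.1, n_senses=30)  # alpha = 0.1 is an assumption
ratio = weights[5] / weights[4]  # decay factor between consecutive senses
```

Here `ratio` equals α / (1 + α) = 1/11, which is about 0.0909.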
Results
julia> nearest_neighbors(vm, dict, "cloud", 1)
10-element Array{(Any,Any,Any),1}:
("clouds",1,0.791538f0)
("haze",2,0.6702103f0)
("nimbostratus",1,0.653774f0)
("altostratus",1,0.6300289f0)
("noctilucent",1,0.6294991f0)
("cumulonimbus",1,0.6289225f0)
("stratocumulus",1,0.6274564f0)
("cumulus",2,0.6273055f0)
("clouds",2,0.6201524f0)
("cirrostratus",1,0.6146165f0)
Results
julia> nearest_neighbors(vm, dict, "cloud", 2)
10-element Array{(Any,Any,Any),1}:
("louis",5,0.5705162f0)
("vrain",1,0.55054826f0)
("lucie",1,0.52579653f0)
("clair",1,0.52284604f0)
("johns",2,0.5215208f0)
("marys",1,0.5036709f0)
("nazianz",1,0.4979607f0)
("lawrence",2,0.49513188f0)
("missouri",3,0.49284995f0)
("joseph",2,0.4928328f0)
Results
julia> nearest_neighbors(vm, dict, "cloud", 3)
10-element Array{(Any,Any,Any),1}:
("computing",1,0.7052178f0)
("middleware",1,0.68975633f0)
("cloud-based",1,0.6546666f0)
("context-aware",1,0.6417114f0)
("enterprise",1,0.63958025f0)
("virtualization",1,0.6359488f0)
("soa",1,0.6349716f0)
("distributed",1,0.6310058f0)
("unicore",1,0.62737936f0)
("client-server",1,0.6239226f0)
Results
julia> nearest_neighbors(vm, dict, "cloud", 4)
10-element Array{(Any,Any,Any),1}:
("mist",1,0.56100917f0)
("clouds",3,0.54695433f0)
("fire",5,0.53125167f0)
("flame",3,0.52561617f0)
("dragon",1,0.5224602f0)
("sorceror",1,0.5199405f0)
("shining",2,0.5165066f0)
("shadow",1,0.516233f0)
("mysterious",2,0.5153119f0)
("smoke",3,0.51471066f0)
Results
julia> disambiguate(vm, dict, "cloud",
split("weather forecast cold rainy"))
30-element Array{Float64,1}:
0.999278
9.49993e-7
1.52921e-8
0.000720983
0.0
0.0
0.0
0.0
0.0
0.0
⋮
Results
julia> disambiguate(vm, dict, "cloud",
split("multi-core virtual machine"))
30-element Array{Float64,1}:
0.000243637
6.16926e-5
0.998918
0.000776869
0.0
0.0
0.0
0.0
0.0
0.0
⋮
Dmitry Kondrashkin, Anton Osokin, Dmitry P. Vetrov
and thanks to Microsoft Research and the Microsoft Azure team!
project page: bayesgroup.ru/adagram
sources: github.com/sbos/AdaGram.jl

Parallel asynchronous inference of word senses with Microsoft Azure

  • 1. Parallel asynchronous inference of word senses with Azure. Sergey Bartunov, MSU
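  • 2. Learning as optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$
  • 3. Learning as optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$, where $r$ is the regularizer, $f_i$ the loss, $x_i$ an object, and $\theta$ the parameters.
  • 4. Learning as optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$, where $r$ is the regularizer, $f_i$ the loss, $x_i$ an object, and $\theta$ the parameters. $N$ can be huge; the regularizer and loss can be complex; the parameters' dimensionality can be very large.
  • 5. Learning as optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$, where $r$ is the regularizer, $f_i$ the loss, $x_i$ an object, and $\theta$ the parameters. $N$ can be huge; the regularizer and loss can be complex; the parameters' dimensionality can be very large. A commodity PC is not enough!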
  • 6. Learning word embeddings: for each word, find an embedding such that similar words have close embeddings. [Figure: example words Java, Platform, .NET, Mono, Railways, Ticket, Train, Politics, Party, Socialism]
  • 7. Learning word embeddings. Example: "…compiled for a specific hardware platform, since different central processor…"
  • 8. Learning word embeddings. Example: "…compiled for a specific hardware platform, since different central processor…" Object: a word and its context.
  • 9. Learning word embeddings. Example: "…compiled for a specific hardware platform, since different central processor…" Object: a word and its context. Loss: $-\log p(v \mid w)$, where $p(v \mid w) = \frac{\exp(A_w^\top B_v)}{\sum_{v'=1}^{V} \exp(A_w^\top B_{v'})}$.
  • 10. Learning word embeddings. Example: "…compiled for a specific hardware platform, since different central processor…" Object: a word and its context. Loss: $-\log p(v \mid w)$, where $p(v \mid w) = \frac{\exp(A_w^\top B_v)}{\sum_{v'=1}^{V} \exp(A_w^\top B_{v'})}$. Parameters: word embeddings $A_w, B_w \in \mathbb{R}^D$, $w \in \{1, \dots, V\}$. Skip-gram (Mikolov et al., 2013).
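The softmax on the slide above can be evaluated directly for a toy vocabulary. The following is a minimal sketch, not the speaker's code: the sizes `V` and `D`, the random embeddings, and the helper names `dot` and `skipgram_log_probs` are all assumptions made for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skipgram_log_probs(A, B, w):
    """Return [log p(v | w) for every v] under a full softmax over the vocabulary."""
    scores = [dot(A[w], B_v) for B_v in B]       # A_w^T B_v for each context word v
    top = max(scores)                            # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]

random.seed(0)
V, D = 10, 4                                     # toy sizes, chosen arbitrarily
A = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]
B = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]

log_probs = skipgram_log_probs(A, B, w=3)
loss = -log_probs[7]                             # loss for the pair (word 3, context word 7)
```

The normalizer sums over all `V` context words, which is why each exact evaluation costs O(V·D) and becomes expensive for a 400k-word vocabulary.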
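  • 11. Gradient optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$. Gradient descent: $\theta^{t+1} = \theta^t - \gamma_t \nabla F(\theta^t)$.
  • 12. Stochastic optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$. Stochastic gradient descent: $\theta^{t+1} = \theta^t - \gamma_t G(\theta^t)$.
  • 13. Stochastic optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$. Stochastic gradient descent: $\theta^{t+1} = \theta^t - \gamma_t G(\theta^t)$, where $\mathbb{E}\, G(\theta) = \nabla F(\theta)$.
  • 14. Stochastic optimization: $F(\theta) = r(\theta) + \sum_{i=1}^{N} f_i(x_i; \theta) \to \min_{\theta}$. Stochastic gradient descent: $\theta^{t+1} = \theta^t - \gamma_t G(\theta^t)$, where $\mathbb{E}\, G(\theta) = \nabla F(\theta)$. For example: $G(\theta) = \nabla \left[ r(\theta) + N f_j(x_j; \theta) \right]$, $j \sim \mathrm{Uniform}(1, N)$.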
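A one-dimensional toy problem makes the unbiasedness condition concrete. This sketch assumes a quadratic regularizer and squared losses, neither of which comes from the talk; the data `xs`, the weight `lam`, and the 1/t step schedule are likewise illustrative choices.

```python
import random

xs = [1.0, 2.0, 4.0, 7.0]        # toy objects x_i (assumed data)
N = len(xs)
lam = 0.5                        # r(theta) = lam / 2 * theta**2 (assumed regularizer)

def full_grad(theta):
    """Gradient of F(theta) = r(theta) + sum_i 0.5 * (theta - x_i)**2."""
    return lam * theta + sum(theta - x for x in xs)

def stochastic_grad(theta, j):
    """G(theta) built from one object: grad of r(theta) + N * f_j(x_j; theta)."""
    return lam * theta + N * (theta - xs[j])

# Averaging G over a uniformly chosen j recovers the full gradient.
theta0 = 3.0
mean_G = sum(stochastic_grad(theta0, j) for j in range(N)) / N

# Stochastic gradient descent with a decreasing step size.
random.seed(0)
theta = 0.0
for t in range(1, 20001):
    j = random.randrange(N)
    step = 1.0 / ((lam + N) * t)
    theta -= step * stochastic_grad(theta, j)

theta_star = sum(xs) / (lam + N)  # closed-form minimizer of F for this toy problem
```

Each update touches a single object, so the per-step cost does not grow with `N`; the price is noise, which the decreasing step size averages out.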
  • 15. Learning word embeddings: 400k-word vocabulary, 300-dimensional embeddings; 240 million parameters to train!; 1 GB memory snapshot
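The figures on this slide can be checked with a short calculation. It assumes the model stores two embedding matrices ($A$ and $B$, as in the skip-gram parameters) and uses 32-bit floats; the second point is an assumption, although it is consistent with the `f0` (Float32) literals in the results slides.

```latex
\begin{aligned}
\text{parameters} &= 2 \times V \times D = 2 \times 400{,}000 \times 300 = 2.4 \times 10^{8}, \\
\text{memory} &= 2.4 \times 10^{8} \times 4\ \text{bytes} = 9.6 \times 10^{8}\ \text{bytes} \approx 1\ \text{GB}.
\end{aligned}
```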
  • 16. Stochastic parallel optimization. [Diagram: core 1, core 2, …, core K; shared parameters]
  • 17. Stochastic parallel optimization. [Diagram: core 1, core 2, …, core K; shared parameters; data flow]
  • 18. Stochastic parallel optimization. [Diagram: core 1, core 2, …, core K; shared parameters; data flow]
  • 19. Stochastic parallel optimization. [Diagram: core 1, core 2, …, core K; shared parameters; data flow] No synchronization! (See e.g. the Hogwild paper.)
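The lock-free access pattern from the diagram can be sketched with threads that update one shared parameter list without any locking. This is an illustration only: the toy objective, the names `optimum`, `theta`, and `worker`, and the choice of examples that each touch a single coordinate are assumptions. Python threads show the access pattern rather than a real multi-core speed-up, because the interpreter serializes bytecode execution.

```python
import random
import threading

D = 8                                        # number of shared parameters (toy size)
optimum = [float(d + 1) for d in range(D)]   # hypothetical target values
theta = [0.0] * D                            # shared parameters, no lock around them

def worker(seed, n_steps, step=0.1):
    rng = random.Random(seed)                # each worker draws its own data stream
    for _ in range(n_steps):
        d = rng.randrange(D)                 # a sparse example touching one coordinate
        grad = theta[d] - optimum[d]         # gradient of 0.5 * (theta[d] - optimum[d])**2
        theta[d] -= step * grad              # unsynchronized read-modify-write

threads = [threading.Thread(target=worker, args=(k, 5000)) for k in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

max_error = max(abs(t - o) for t, o in zip(theta, optimum))
```

Occasional lost or stale updates do not prevent convergence here, since every write still moves a coordinate toward its optimum. Sparse updates, where workers rarely touch the same coordinates at the same time, are the regime in which this style of training is usually argued to work well.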
  • 20. Stochastic parallel optimization My laptop: 2 cores, 8 GB RAM
  • 21. Stochastic parallel optimization My laptop: 2 cores, 8 GB RAM
  • 22. Stochastic parallel optimization My laptop: 2 cores, 8 GB RAM
  • 23. Stochastic parallel optimization. My laptop: 2 cores, 8 GB RAM. 22 hours; 2 hours. Dataset: English Wikipedia 2012 (5.7 GB raw text, 1 billion words)
  • 24. Learning polysemic word embeddings. [Figure: example words Java, Platform (1), .NET, Mono, Railways, Ticket, Platform (2), Train, Platform (3), Politics, Party, Socialism]
  • 25. Learning polysemic word embeddings. "…compiled for a specific hardware platform, since different central processor…" (computer meaning)
  • 26. Learning polysemic word embeddings. "…compiled for a specific hardware platform, since different central processor…" (computer meaning); "…as the safe distance from the platform edge increases with the speed…" (railway meaning)
  • 27. Learning polysemic word embeddings. "…compiled for a specific hardware platform, since different central processor…" (computer meaning); "…as the safe distance from the platform edge increases with the speed…" (railway meaning); "… Socialist Party; the Socialist Workers Platform and the Committee for a…" (political meaning)
  • 28. Learning polysemic word embeddings. "…compiled for a specific hardware platform, since different central processor…" (computer meaning); "…as the safe distance from the platform edge increases with the speed…" (railway meaning); "… Socialist Party; the Socialist Workers Platform and the Committee for a…" (political meaning). Losses for the three meanings: $-\log p(v \mid w, z = 1)$, $-\log p(v \mid w, z = 2)$, $-\log p(v \mid w, z = 3)$.
  • 29. Learning polysemic word embeddings. "…compiled for a specific hardware platform, since different central processor…" (computer meaning); "…as the safe distance from the platform edge increases with the speed…" (railway meaning); "… Socialist Party; the Socialist Workers Platform and the Committee for a…" (political meaning). Losses for the three meanings: $-\log p(v \mid w, z = 1)$, $-\log p(v \mid w, z = 2)$, $-\log p(v \mid w, z = 3)$, where $p(v \mid w, z = k) = \frac{\exp(A_{wk}^\top B_v)}{\sum_{v'=1}^{V} \exp(A_{wk}^\top B_{v'})}$. Word meanings are unobserved.
  • 30. Learning polysemic word embeddings: $\log p(W, V \mid A, B, \alpha) = \log \int p(z \mid \alpha) \prod_i \prod_j p(v_{ij} \mid w_i, z_i, A, B)\, dz \to \max_{A, B}$. Word meanings are unobserved, hence the EM algorithm must be employed.
  • 31. Learning polysemic word embeddings: $\log p(W, V \mid A, B, \alpha) = \log \int p(z \mid \alpha) \prod_i \prod_j p(v_{ij} \mid w_i, z_i, A, B)\, dz \to \max_{A, B}$. Word meanings are unobserved, hence the EM algorithm must be employed. How to choose a prior that allows the number of word meanings to increase automatically when necessary? How to put the EM procedure into a stochastic optimization framework?
  • 32. Learning polysemic word embeddings: $\log p(W, V \mid A, B, \alpha) = \log \int p(z \mid \alpha) \prod_i \prod_j p(v_{ij} \mid w_i, z_i, A, B)\, dz \to \max_{A, B}$. Word meanings are unobserved, hence the EM algorithm must be employed. How to choose a prior that allows the number of word meanings to increase automatically when necessary? Bayesian nonparametrics (Orbanz, 2014). How to put the EM procedure into a stochastic optimization framework?
  • 33. Learning polysemic word embeddings: $\log p(W, V \mid A, B, \alpha) = \log \int p(z \mid \alpha) \prod_i \prod_j p(v_{ij} \mid w_i, z_i, A, B)\, dz \to \max_{A, B}$. Word meanings are unobserved, hence the EM algorithm must be employed. How to choose a prior that allows the number of word meanings to increase automatically when necessary? Bayesian nonparametrics (Orbanz, 2014). How to put the EM procedure into a stochastic optimization framework? Stochastic variational inference (Blei et al, 2012).
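One common nonparametric prior over an open-ended number of meanings is the stick-breaking construction of the Dirichlet process. Whether the talk uses exactly this form is an assumption here, although the tail of the `expected_pi` output on slide 39 shrinks by a factor of about 0.0909 per entry, which is what a Beta(1, 0.1) stick prior would give. The sketch below computes expected meaning probabilities from independent Beta-distributed stick proportions; the function name and the values of `alpha` and `T` are illustrative.

```python
def expected_stick_weights(beta_params):
    """E[pi_k] = E[beta_k] * prod_{r<k} (1 - E[beta_r]) for independent Beta sticks."""
    weights, remaining = [], 1.0
    for a, b in beta_params:
        mean = a / (a + b)                   # mean of Beta(a, b)
        weights.append(remaining * mean)     # break off a share of the remaining stick
        remaining *= 1.0 - mean
    return weights

alpha = 0.1                                  # concentration parameter (assumed value)
T = 30                                       # truncation level: at most 30 meanings
prior_sticks = [(1.0, alpha)] * T            # every stick proportion ~ Beta(1, alpha)
pi = expected_stick_weights(prior_sticks)
ratio = pi[1] / pi[0]                        # equals alpha / (1 + alpha)
```

A small `alpha` concentrates the prior on few meanings, and meanings beyond those supported by data keep geometrically vanishing probability. That is how the number of effective meanings can grow only when needed.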
  • 34. EM algorithm. "… Socialist Party; the Socialist Workers Platform and the Committee for a…"
  • 35. EM algorithm. E-step: disambiguate the word given its context. "… Socialist Party; the Socialist Workers Platform and the Committee for a…" p(z = politics) = 0.96, p(z = transport) = 0.01, p(z = computer) = 0.03
  • 36. EM algorithm. E-step: disambiguate the word given its context. "… Socialist Party; the Socialist Workers Platform and the Committee for a…" p(z = politics) = 0.96, p(z = transport) = 0.01, p(z = computer) = 0.03. M-step: update word embeddings by weighted gradient: $\theta^{t+1} = \theta^t + \gamma_t \nabla \left[ \sum_k p(z_i = k) \log p(v_{ij} \mid w_i, z_i = k, \theta^t) \right]$.
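The two steps above can be sketched for a single occurrence of one word. This is a toy illustration under stated assumptions, not the AdaGram.jl implementation: the prior over meanings is a fixed hypothetical vector, only the meaning vectors of the word (`A_w`) are updated while the context vectors `B` stay fixed, and all names and sizes are invented for the example.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_probs(a_wk, B):
    """p(v | w, z = k) for every context word v, given one meaning vector."""
    scores = [dot(a_wk, b_v) for b_v in B]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def e_step(A_w, B, prior, context):
    """Posterior p(z = k | context) for one occurrence of the word."""
    log_post = []
    for a_wk, p_k in zip(A_w, prior):
        probs = softmax_probs(a_wk, B)
        log_post.append(math.log(p_k) + sum(math.log(probs[v]) for v in context))
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def expected_log_lik(A_w, B, post, context):
    """sum_k p(z = k) * sum_j log p(v_j | w, z = k)."""
    return sum(r * sum(math.log(softmax_probs(a, B)[v]) for v in context)
               for a, r in zip(A_w, post))

def m_step(A_w, B, post, context, step):
    """One gradient-ascent step on each meaning vector, weighted by its posterior."""
    new_A_w = []
    for a_wk, r in zip(A_w, post):
        probs = softmax_probs(a_wk, B)
        dims = range(len(a_wk))
        mean_b = [sum(p * b_v[d] for p, b_v in zip(probs, B)) for d in dims]
        grad = [r * sum(B[v][d] - mean_b[d] for v in context) for d in dims]
        new_A_w.append([a + step * g for a, g in zip(a_wk, grad)])
    return new_A_w

random.seed(0)
V, D, K = 6, 3, 3                            # toy vocabulary, dimension, meanings
B = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(V)]
A_w = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(K)]
prior = [0.5, 0.3, 0.2]                      # hypothetical prior over meanings
context = [1, 4]                             # ids of the observed context words

post = e_step(A_w, B, prior, context)
before = expected_log_lik(A_w, B, post, context)
A_w = m_step(A_w, B, post, context, step=0.05)
after = expected_log_lik(A_w, B, post, context)
```

Meanings with a negligible posterior receive a negligible gradient, so each occurrence mostly trains the meaning that best explains its context.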
  • 37. Learning polysemic word embeddings: 400k-word vocabulary, 300-dimensional embeddings, max. 30 meanings per word; 7.2 billion parameters to train!; 18 GB memory snapshot
  • 38. Learning polysemic word embeddings. My laptop: 2 cores, 8 GB RAM. 6 days!; 16 hours. Dataset: English Wikipedia 2012 (5.7 GB raw text, 1 billion words)
  • 39. Results
      julia> expected_pi(vm, dict.word2id["cloud"])
      30-element Array{Float64,1}:
       0.404964
       0.134444
       0.0987207
       0.361865
       5.70338e-6
       5.18419e-7
       4.7129e-8
       4.28446e-9
       3.89496e-10
       3.54087e-11
       ⋮
  • 40. Results
      julia> nearest_neighbors(vm, dict, "cloud", 1)
      10-element Array{(Any,Any,Any),1}:
       ("clouds",1,0.791538f0)
       ("haze",2,0.6702103f0)
       ("nimbostratus",1,0.653774f0)
       ("altostratus",1,0.6300289f0)
       ("noctilucent",1,0.6294991f0)
       ("cumulonimbus",1,0.6289225f0)
       ("stratocumulus",1,0.6274564f0)
       ("cumulus",2,0.6273055f0)
       ("clouds",2,0.6201524f0)
       ("cirrostratus",1,0.6146165f0)
  • 41. Results
      julia> nearest_neighbors(vm, dict, "cloud", 2)
      10-element Array{(Any,Any,Any),1}:
       ("louis",5,0.5705162f0)
       ("vrain",1,0.55054826f0)
       ("lucie",1,0.52579653f0)
       ("clair",1,0.52284604f0)
       ("johns",2,0.5215208f0)
       ("marys",1,0.5036709f0)
       ("nazianz",1,0.4979607f0)
       ("lawrence",2,0.49513188f0)
       ("missouri",3,0.49284995f0)
       ("joseph",2,0.4928328f0)
  • 42. Results
      julia> nearest_neighbors(vm, dict, "cloud", 3)
      10-element Array{(Any,Any,Any),1}:
       ("computing",1,0.7052178f0)
       ("middleware",1,0.68975633f0)
       ("cloud-based",1,0.6546666f0)
       ("context-aware",1,0.6417114f0)
       ("enterprise",1,0.63958025f0)
       ("virtualization",1,0.6359488f0)
       ("soa",1,0.6349716f0)
       ("distributed",1,0.6310058f0)
       ("unicore",1,0.62737936f0)
       ("client-server",1,0.6239226f0)
  • 43. Results
      julia> nearest_neighbors(vm, dict, "cloud", 4)
      10-element Array{(Any,Any,Any),1}:
       ("mist",1,0.56100917f0)
       ("clouds",3,0.54695433f0)
       ("fire",5,0.53125167f0)
       ("flame",3,0.52561617f0)
       ("dragon",1,0.5224602f0)
       ("sorceror",1,0.5199405f0)
       ("shining",2,0.5165066f0)
       ("shadow",1,0.516233f0)
       ("mysterious",2,0.5153119f0)
       ("smoke",3,0.51471066f0)
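Each result triple reads as (word, meaning index, similarity). A per-meaning neighbor query of this kind can be sketched as follows, assuming cosine similarity between meaning vectors; the slides do not state the metric. The function `toy_nearest_neighbors` and the three-dimensional vectors are hypothetical stand-ins, not the AdaGram.jl function or its trained model.

```python
import math

def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm

def toy_nearest_neighbors(sense_vectors, word, sense, k=10):
    """Rank all other (word, meaning) vectors by cosine similarity to the query."""
    query = sense_vectors[(word, sense)]
    scored = [(w, s, cosine(query, vec))
              for (w, s), vec in sense_vectors.items()
              if (w, s) != (word, sense)]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:k]

# Hypothetical toy vectors, not taken from the trained model.
toy = {
    ("cloud", 1): [1.0, 0.1, 0.0],
    ("cloud", 3): [0.0, 0.1, 1.0],
    ("clouds", 1): [0.9, 0.2, 0.0],
    ("haze", 2): [0.8, 0.0, 0.3],
    ("computing", 1): [0.1, 0.0, 0.9],
}

weather = toy_nearest_neighbors(toy, "cloud", 1, k=2)
tech = toy_nearest_neighbors(toy, "cloud", 3, k=1)
```

Because every meaning has its own vector, the same surface word can land in unrelated neighborhoods, as in the weather, place-name, computing, and fantasy groups listed in the results.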
  • 44. Results
      julia> disambiguate(vm, dict, "cloud", split("weather forecast cold rainy"))
      30-element Array{Float64,1}:
       0.999278
       9.49993e-7
       1.52921e-8
       0.000720983
       0.0
       0.0
       0.0
       0.0
       0.0
       0.0
       ⋮
  • 45. Results
      julia> disambiguate(vm, dict, "cloud", split("multi-core virtual machine"))
      30-element Array{Float64,1}:
       0.000243637
       6.16926e-5
       0.998918
       0.000776869
       0.0
       0.0
       0.0
       0.0
       0.0
       0.0
       ⋮
  • 46. and thanks to Microsoft Research and the Microsoft Azure team! Dmitry Kondrashkin, Anton Osokin, Dmitry P. Vetrov. Project page: bayesgroup.ru/adagram; sources: github.com/sbos/AdaGram.jl