Casual taaaalk july_21th_2016

For Beginners
AI Safety
ⓒ 2016 UEC Tokyo.
July 29, 2016
Kurihara Lab
Xcompass Intelligence Ltd.
Ashihara Yuta

Let Me Introduce Myself
Name : Ashihara Yuta
Occupation : Researcher (Xcompass Intelligence Ltd.)
Ph.D. Student (UEC Kurihara Lab.)
WBA Future Leaders (Society Branch)
Hobby : Fishing (not phishing)
NicoNico Doga (wrestling series, Jikkyo Play)
Motorcycling (retiring this year)
Watching movies

Today’s Topic
Title : “Concrete Problems in AI Safety”
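Author : Dario Amodei, Chris Olah, Jacob Steinhardt,
Paul Christiano, John Schulman, Dan Mané
Published : June 21, 2016
+ JSAI (Japanese Society for Artificial Intelligence) Annual Conference, Ethics Committee, public discussion
+ JSAI Ethics Committee, Code of Ethics (draft)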

Deep Learning Background ①
・ (Loosely) inspired by what (little) we know about the biological brain.

Deep Learning Background ②
・ Lower layers have a low level of abstraction

Deep Learning Background ②
・ Higher layers have a high level of abstraction

Deep Learning Concept
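・ In deep learning methods, the intermediate layers
　capture the features of the input object
・ In other words, the information needed to recognize
　the object is somewhere in the intermediate layers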

Demo2
？

Vector Background
・ Word vectors compressed to 2D vectors form a shape in 2D space
　 e.g. word2vec, LDA, NNLM…

Vector Background
・ Well-compressed word vectors are sometimes meaningful
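One common way to see this is vector arithmetic on word vectors. The following is only a toy sketch: the 2D vectors are hand-picked, hypothetical values (not the output of word2vec or any trained model), and `analogy` is a helper name invented for this illustration.

```python
import math

# Hypothetical 2D "word vectors", hand-picked for illustration only.
# Axis 0 loosely stands for "royalty", axis 1 for "maleness".
VECTORS = {
    "king": (0.9, 0.8),
    "queen": (0.9, 0.2),
    "man": (0.1, 0.8),
    "woman": (0.1, 0.2),
    "apple": (0.5, 0.5),
}


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c)."""
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    # Exclude the query words themselves from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(vectors[w], target))


print(analogy("king", "man", "woman"))  # prints "queen" for these toy vectors
```

With real embeddings the result is only approximate, which is why the slide says "sometimes".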

My Former Research Theme
[Figure: diagram with three Encoder blocks, RNN1, RNN2, RNN3, and a Decoder]

Target

Summary
・ Deep Learning : (has the ability to diffuse)
has the ability to compress
・ Compressed Information : Useful but…

AI Safety

Today’s Topic (Repeated)
Title : “Concrete Problems in AI Safety”
Author : Dario Amodei, Chris Olah, Jacob Steinhardt,
Paul Christiano, John Schulman, Dan Mané
Published : June 21, 2016
+ JSAI (Japanese Society for Artificial Intelligence) Annual Conference, Ethics Committee, public discussion
+ JSAI Ethics Committee, Code of Ethics (draft)

Things to Keep in Mind When Making AI…
・ Avoiding Negative Side Effects
　→ Don’t knock over a vase to clean faster
・ Avoiding Reward Hacking
　→ Don’t game the reward function
・ Scalable Oversight
　→ Human checks may have to be relatively infrequent
・ Safe Exploration
　→ Putting a wet mop in an electrical outlet is a bad idea
・ Robustness to Distributional Shift
　→ A factory work floor may be more dangerous than an office floor

AI Safety
Avoiding Negative Side Effects
　・ Define or Learn an Impact Regularizer
　　→ Side effects may be more similar across tasks than the main goals are
　・ Penalize Influence
　　→ This idea as written would not quite work
　・ Multi-Agent Approaches
　　→ Cooperative Inverse Reinforcement Learning
　・ Reward Uncertainty
　　→ Being uncertain about the reward function may work better than fixing a single one
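The impact regularizer idea above can be sketched as a task reward minus a penalty for unintended changes to the environment. This is a minimal sketch under stated assumptions, not the paper's formulation: the state is a plain dictionary, and `impact_penalty`, `regularized_reward`, the feature names, and the weight of 5.0 are all hypothetical choices made for the vase example.

```python
def impact_penalty(before, after, intended):
    """Count state features that changed but were not part of the task."""
    return sum(
        1
        for key in before
        if key not in intended and before[key] != after[key]
    )


def regularized_reward(task_reward, before, after, intended, weight=5.0):
    """Task reward minus a weighted penalty for unintended changes."""
    return task_reward - weight * impact_penalty(before, after, intended)


# Hypothetical cleaning task: only "floor_clean" is meant to change.
before = {"floor_clean": False, "vase_intact": True}
intended = {"floor_clean"}

# Fast route: higher task reward, but the vase gets knocked over.
fast = regularized_reward(
    10.0, before, {"floor_clean": True, "vase_intact": False}, intended
)
# Careful route: slightly lower task reward, no side effect.
careful = regularized_reward(
    8.0, before, {"floor_clean": True, "vase_intact": True}, intended
)

print(fast, careful)  # prints "5.0 8.0"
```

With this weight the careful route scores higher. Choosing the weight and deciding what counts as "unintended" are the hard parts that the sketch leaves open.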

AI Safety
Avoiding Reward Hacking
　・ Partially Observed Goals
　　→ Don’t say “Perfect.” with your eyes closed.
　・ Careful Engineering
　　→ No comment…
　・ Multiple Rewards
　　→ Bad behaviors can still occur
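The multiple rewards idea can be sketched by combining several reward estimates so that fooling one of them is not enough. This is a toy sketch, assuming three independent checks with made-up scores; `combined_reward` is a hypothetical helper, and taking the minimum is just one possible way to combine the estimates.

```python
def combined_reward(estimates):
    """Combine several reward estimates by taking the most pessimistic one."""
    return min(estimates)


# Hypothetical scores from three independent checks of a cleaning job.
honest_cleaning = [0.8, 0.7, 0.9]
# The agent fools the first check (for example, by hiding the mess)
# but the other two checks still see the problem.
hacked_cleaning = [1.0, 0.1, 0.2]

print(combined_reward(honest_cleaning))  # prints 0.7
print(combined_reward(hacked_cleaning))  # prints 0.1
```

A behavior that fools all checks in the same way would still get a high combined score, so this reduces the problem rather than removing it.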

AI Safety
Scalable Oversight
　・ Distant supervision
　　→ where feedback is more interactive and i.i.d
　・ Hierarchical reinforcement learning
　　→ Top -> Middle -> Low

AI Safety
Safe Exploration
　・ Use Demonstrations : Simulated Exploration
　　→ Exploring in simulated environments leaves less room for catastrophe
　・ Human Oversight
　　→ But some actions are too fast for humans to judge
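The simulated exploration idea can be sketched as "try it in a simulator first, and only explore what passed". This is a minimal sketch under stated assumptions: `simulate_is_catastrophic` stands in for a real simulator, and the action names and helper functions are hypothetical.

```python
import random


def simulate_is_catastrophic(action):
    """Hypothetical simulator: flags actions known to end badly."""
    return action in {"mop_electrical_outlet"}


def safe_actions(actions, simulator=simulate_is_catastrophic):
    """Keep only the actions that did not fail in simulation."""
    return [a for a in actions if not simulator(a)]


def explore(actions, rng, simulator=simulate_is_catastrophic):
    """Pick a random exploratory action among those that passed the check."""
    allowed = safe_actions(actions, simulator)
    if not allowed:
        raise ValueError("no action passed the simulated check")
    return rng.choice(allowed)


ACTIONS = ["mop_floor", "mop_electrical_outlet", "empty_bin"]
rng = random.Random(0)  # fixed seed so the run is repeatable
tried = {explore(ACTIONS, rng) for _ in range(100)}

print("mop_electrical_outlet" in tried)  # prints False
```

The filter is only as good as the simulator: anything the simulator does not model can still go wrong in the real world.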

AI Safety
Robustness to Distributional Shift
　・ Omitted because it is technical…

AI Safety Summary
・ The journey of making AI means keeping an eye on it until it becomes a good one
・ It is not over once the program works

AI Safety(?) in Japan

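・ Contribution to humanity
　→ As experts, eliminate threats to safety
・ Honest conduct
　→ Do not make false or unclear claims
・ Fairness
　→ Recognize the possibility of creating unfairness or inequality
・ Continuous self-improvement
　→ Strive for constant self-improvement
・ Verification and warning
　→ Sound the alarm about potential dangers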

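・ Educating society
　→ Argue for correction when society holds a mistaken understanding
・ Compliance with laws and regulations
　→ Judge ethically when laws and regulations are not consistent
・ Respect for others
　→ Must not cause the loss of others’ information or property
・ Respect for others’ privacy
　→ Bear a duty to handle personal information appropriately
・ Accountability
　→ Demand an explanation from those who misuse the technology,
　　　and prevent the misuse if it is not justified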

Japan and America
America
・ The “manual” to avoid making bad AI
・ Focus on the problem concretely
Japan
・ A guideline for “how researchers and experts ought to be”
・ Developing AI that aims for human happiness
I think both are very important ways of thinking.

Think About It … AI
