7. Markov Decision Process
MDP < S, A, P, R, 𝛾 >
- S: set of states
- A: set of actions
- P(s, a, s’): probability of transitioning from state s to state s’ after action a
- R(s): reward function
- 𝛾: discount factor
Trace: {<s0,a0,r0>, …, <sn,an,rn>}
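The tuple and trace above can be sketched in code. A minimal toy MDP with hypothetical states and actions (not from the lecture), sampling a trace of ⟨s, a, r⟩ triples:

```python
import random

# Toy MDP sketch: P maps (s, a) -> list of (next_state, probability),
# R maps each state to its reward. All names here are illustrative.
P = {
    ("s0", "go"): [("s1", 0.8), ("s0", 0.2)],
    ("s1", "go"): [("s1", 1.0)],
}
R = {"s0": 0.0, "s1": 1.0}

def sample_trace(s, policy, steps, rng=random.Random(0)):
    """Roll out <s, a, r> triples, as in the trace {<s0,a0,r0>, ...}."""
    trace = []
    for _ in range(steps):
        a = policy(s)
        trace.append((s, a, R[s]))
        next_states, probs = zip(*P[(s, a)])
        s = rng.choices(next_states, weights=probs)[0]
    return trace

trace = sample_trace("s0", lambda s: "go", steps=3)
```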
8. Definitions
- Return: total discounted reward: G = r0 + 𝛾r1 + 𝛾²r2 + …
- Policy: Agent’s behavior
- Deterministic policy: π(s) = a
- Stochastic policy: π(a | s) = P[At = a | St = s]
- Value function: Expected return starting from state s:
- State-value function: Vπ(s) = Eπ[R | St = s]
- Action-value function: Qπ(s, a) = Eπ[R | St = s, At = a]
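The return definition above can be computed directly from a reward sequence. A minimal sketch (reward values are illustrative):

```python
# Discounted return: G = r0 + gamma*r1 + gamma^2*r2 + ...
# Accumulating from the back avoids explicit powers of gamma.
def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```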
9. Deep Q Learning
- Model-free, off-policy technique to learn optimal Q(s, a):
- Q_{i+1}(s, a) ← Q_i(s, a) + 𝛼(R + 𝛾 max_{a'} Q_i(s', a') − Q_i(s, a))
- Optimal policy then: π(s) = argmax_{a'} Q(s, a')
- Requires exploration (ε-greedy) to visit a variety of transitions from the states.
- Take a random action with probability ε; start ε high and decay it to a low value as training progresses.
- Deep Q Learning: approximate Q(s, a) with neural network: Q(s, a, 𝜃)
- Do stochastic gradient descent on the loss L(𝜃) = (R + 𝛾 max_{a'} Q(s', a', 𝜃) − Q(s, a, 𝜃))²
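Before approximating Q with a network, the update rule above can be instantiated in tabular form. A minimal sketch with hypothetical states and actions:

```python
import collections

# Tabular Q-learning update, matching:
# Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
Q = collections.defaultdict(float)  # unseen (s, a) pairs default to 0.0
alpha, gamma = 0.5, 0.9

def q_update(s, a, r, s_next, actions):
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# One observed transition (illustrative): reward 1.0 going s0 -> s1.
q_update("s0", "right", 1.0, "s1", actions=["left", "right"])
```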
12. Monitored Session
- Handles pitfalls of distributed training.
- Saving and restoring checkpoints.
- Hooks are a general interface for injecting computation into the TensorFlow training loop.
15. Policy Gradient
- Given a policy π_𝜃(a | s), find 𝜃 that maximizes the expected return:
  J(𝜃) = ∑_s d^π(s) V(s)
- In Deep RL, we approximate π 𝜃(a | s) with neural network.
- Usually with softmax layer on top to estimate probabilities of each action.
- We can estimate J(𝜃) from samples of observed behavior: ∑_{k=0..T} p_𝜃(𝜏_k | π) R(𝜏_k)
- Do stochastic gradient ascent using the update:
  𝜃_{i+1} = 𝜃_i + 𝛼 (1/T) ∑_{k=0..T} ∇ log p_𝜃(𝜏_k | π) R(𝜏_k)
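The update above can be sketched for the simplest case: a softmax policy over discrete actions whose logits are the parameters 𝜃 directly (a bandit-style simplification that ignores the state; all values are illustrative). For a softmax policy, ∇ log π(a) = onehot(a) − softmax(𝜃).

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(theta, trajectories, alpha=0.1):
    """theta <- theta + alpha * (1/T) * sum_k grad log p(tau_k) * R(tau_k)."""
    grad = np.zeros_like(theta)
    for actions, ret in trajectories:
        for a in actions:
            g = -softmax(theta)   # grad log pi(a) = onehot(a) - softmax(theta)
            g[a] += 1.0
            grad += g * ret
    return theta + alpha * grad / len(trajectories)

theta = np.zeros(2)
# One trajectory that took action 1 and received return 1.0:
theta = reinforce_step(theta, [([1], 1.0)])
```

After the step, the logit of the rewarded action rises, making it more probable.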
18. Async Advantage Actor-Critic (A3C)
- Asynchronous: uses multiple instances of environments and networks.
- Actor-Critic: uses both a policy and an estimate of the value function.
- Advantage: estimates how much the outcome differed from what was expected.
Image by Arthur Juliani
Let’s start by defining a problem that we are trying to solve.
...
Agents divide into model-based and model-free agents.
Model-based agents try to simulate the environment internally and make decisions based on that simulation.
Model-free agents just take an observation and choose an action.
This is interesting because it is very close to how animals and people learn: from limited feedback provided by the environment or a teacher. Animals receive positive reinforcement when developing reflexes, and children receive positive or negative reinforcement from parents for their behaviour.
Let’s review some theory around RL.
The set of states and actions, together with rules for transitioning from one state to another, make up a Markov decision process. One episode of this process (e.g. one game) forms a finite sequence of states, actions and rewards.
One more term: a sequence [(s, a), ..] is called a trajectory.
Model free - meaning there is no MDP approximation or learning inside the agent.
Observations are stored into replay buffers and used as training data for the model.
Off-policy means that learning the optimal policy is independent of the actions the agent actually takes.
Because a greedy policy would be deterministic, we force the agent to explore by taking a random action with probability ε, where ε starts high and slowly decays as training progresses.
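A minimal linear decay schedule for ε, with illustrative constants:

```python
# Linear epsilon decay: start exploratory, end mostly greedy.
def epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10000):
    frac = min(step / decay_steps, 1.0)  # clamp so epsilon never goes below eps_end
    return eps_start + frac * (eps_end - eps_start)
```

At step 0 the agent acts randomly with probability 1.0; past `decay_steps` it explores only 5% of the time.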
For example, an Atari game has an enormous number of possible states (the number of colors raised to the number of pixels).
E.g. the Breakout game on an 84x84-pixel screen with 256 colors has at least 256^(84x84) states.
Visiting each state even once would take far too long, so we approximate Q with a neural network that can learn to handle states based on their similarity.
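To see how hopeless enumerating states is, we can count the decimal digits of 256^(84x84) without ever computing the number itself:

```python
import math

# digits of N = floor(log10(N)) + 1; log10(256**7056) = 7056 * log10(256)
digits = int(84 * 84 * math.log10(256)) + 1  # about 17 thousand digits
```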
Deep Q Learning - popularized by DeepMind - was the first Deep RL model that worked.
Expected return can be defined in a few ways.
One way is to define it as the sum of the state-value function over states, each weighted by how likely we are to end up in that state under the current policy (also called the stationary distribution).
This can be estimated from observations - trajectories - as a sum of the probability of each trajectory under the policy multiplied by the reward from that trajectory.
Asynchronous: Unlike DQN, where a single agent represented by a single neural network interacts with a single environment, A3C utilizes multiple incarnations of the above in order to learn more efficiently. In A3C there is a global network, and multiple worker agents which each have their own set of network parameters. Each of these agents interacts with its own copy of the environment at the same time as the other agents interact with theirs. The reason this works better than having a single agent (beyond the speedup of getting more work done) is that the experience of each agent is independent of the experience of the others. In this way the overall experience available for training becomes more diverse.
Actor-Critic: Actor-Critic combines the benefits of both approaches. In the case of A3C, our network will estimate both a value function V(s) (how good a certain state is to be in) and a policy π(s) (a set of action probability outputs). These will each be separate fully-connected layers sitting at the top of the network. Critically, the agent uses the value estimate (the critic) to update the policy (the actor) more intelligently than traditional policy gradient methods.
The insight of using advantage estimates rather than just discounted returns is to allow the agent to determine not just how good its actions were, but how much better they turned out to be than expected.
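The advantage idea above can be sketched directly: compare the discounted return actually observed from each step with the critic's value estimate for that step. All numbers here are illustrative:

```python
# Advantage estimate: A_t = G_t - V(s_t), where G_t is the discounted
# return from step t and V(s_t) is the critic's prediction.
def advantages(rewards, values, gamma):
    g, out = 0.0, []
    for r, v in zip(reversed(rewards), reversed(values)):
        g = r + gamma * g  # discounted return from this step onward
        out.append(g - v)
    return list(reversed(out))

# Critic predicted 0.5 at both steps; actual returns were higher.
adv = advantages([1.0, 1.0], [0.5, 0.5], gamma=0.9)
```

Positive advantages mean the actions turned out better than the critic expected, so the policy update reinforces them.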
Mean and median human-normalized scores on 57 Atari games using the human starts evaluation metric.
D-DQN - double DQN.
A3C paper - https://arxiv.org/pdf/1602.01783.pdf