1. Robert van Straalen presented a method using machine learning and reinforcement learning to optimize control of an industrial waste water treatment plant.
2. An initial neural network model was developed to predict air flow and energy consumption.
3. A reinforcement learning algorithm was then used to choose control settings that meet airflow needs while minimizing energy use.
2. AI FOR WASTE WATER TREATMENT.
Machine learning for real-time control
Robert van Straalen
3. • Robert van Straalen
• Lead data scientist @ Data Science Lab
• Data scientist @ ING
• Database marketing, BI, Software development
• MSc. Artificial Intelligence @ Utrecht University
ABOUT ME.
4. • Waste water treatment
• Drinking water treatment
• Dikes, sewer systems, etc.
• Innovation culture
8. WWTP AMSTERDAM WEST.
• Largest WWTP in the region
• Built in 2005
• Handles 63M m3/year
• ± 1 million people
• 7 water treatment lanes
• Sludge processing lane
• Programme to explore how AI can help with optimization
9. • Primary treatment: Sedimentation
• Separate solids from fluids
• Secondary treatment: Aeration
• Add microorganisms
• Add oxygen to feed them
• The microorganisms convert NH4 & O2 to NO3 & H2O (reaction balanced below)
• Tertiary treatment:
• Sedimentation again
• Sludge processing
• Clean water, biogas, struvite crystals
HOW DOES A WWTP WORK?
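For reference, the conversion in the aeration step is nitrification; a plausible balanced form of the overall reaction (an addition of mine, not spelled out on the slide) is:

```latex
% Overall nitrification: ammonium is oxidized to nitrate
\mathrm{NH_4^+} + 2\,\mathrm{O_2} \longrightarrow \mathrm{NO_3^-} + \mathrm{H_2O} + 2\,\mathrm{H^+}
```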
17. • Part 1: Basics
• Determine continuously (i.e. every minute):
• How much air flows through each lane?
• How much energy is consumed by each lane?
• Part 2: Optimization
• Determine a control strategy that continuously (i.e. every minute):
• Controls the aeration process
• Blower settings
• Valve settings
• such that
• The right amount of air is pumped through each tank
• With minimal energy consumption
PROBLEM DEFINITION.
18. • Goal: determine air flow & energy consumption for each lane
• Approach:
1. Determine air flow per lane
1. Use the data from the flow meter for AT 5 (reliable)
2. Build a model on this data
3. Use the model to predict the flow for the remaining ATs
2. Determine the total energy consumption of the 6 blowers
1. Compute from kW measurements
3. Energy consumption per lane = (air flow lane / air flow total) × energy consumption total (sketched in code below)
APPROACH.
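The attribution rule in step 3 is simple enough to state as code. A minimal sketch, with function and variable names of my own choosing (not from the talk):

```python
def lane_energy(airflow_lane: float, airflow_total: float, energy_total: float) -> float:
    """Attribute the total blower energy to one lane, proportional to its share of the air flow."""
    return (airflow_lane / airflow_total) * energy_total

# Example: a lane taking 2,000 of 10,000 units of air flow gets 20% of 300 kW
print(lane_energy(2_000, 10_000, 300))  # -> 60.0
```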
20. • Input scaling
• Pressure only dependent on blowers & valves:
• ➤ no bias
• ➤ non-negative constraints
• Output layers:
• Linear activation
• Use a bias initializer with the average output value
• Tune loss_weights for the right balance
• Adam optimizer; learning rate 0.0003 (network sketched below)
NEURAL NETWORK.
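A minimal Keras sketch of a network with these properties. The layer sizes, input/output dimensions, output names and loss_weights values are assumptions; only the no-bias/non-negative pressure head, the linear outputs with a mean-value bias initializer, and Adam at 0.0003 come from the slide:

```python
from tensorflow.keras import Model, constraints, initializers, layers, optimizers

N_BLOWERS, N_VALVES = 6, 7                           # assumed dimensions
inputs = layers.Input(shape=(N_BLOWERS + N_VALVES,))  # pre-scaled settings

# Pressure depends only on blowers & valves: no bias, non-negative weights.
pressure = layers.Dense(1, use_bias=False,
                        kernel_constraint=constraints.NonNeg(),
                        name="pressure")(inputs)

hidden = layers.Dense(64, activation="relu")(inputs)

# Linear output heads, bias initialized at the average observed target value.
air_flow = layers.Dense(1, bias_initializer=initializers.Constant(5_500.0),
                        name="air_flow")(hidden)     # assumed mean air flow
energy = layers.Dense(1, bias_initializer=initializers.Constant(200.0),
                      name="energy")(hidden)         # assumed mean energy

model = Model(inputs, [pressure, air_flow, energy])
model.compile(optimizer=optimizers.Adam(learning_rate=0.0003),
              loss="mae",
              # balance the three targets; these weights are placeholders
              loss_weights={"pressure": 1.0, "air_flow": 0.01, "energy": 0.5})
```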
21. MODEL PERFORMANCE.
• Pressure: mean absolute error 0.37 (on a range of approx. 0-90)
• Air flow: mean absolute error 177 (on a range of approx. 0-11,000)
• Energy consumption: mean absolute error 3.1 (on a range of 0-400)
22. • predict.py
• Python module to get model predictions
• api.py
• Flask script for API definition
• Dockerfile:
• Python + packages
• Start Gunicorn web server
• azure-pipelines.yml
• Build & Test pipeline
DEPLOYMENT.
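A sketch of what api.py could look like; the interface of the predict module is an assumption:

```python
# api.py -- minimal Flask wrapper around the prediction module.
# `predict.predict_batch` is a hypothetical name for the part-1 model call.
from flask import Flask, jsonify, request

import predict  # predict.py: loads the trained model and runs inference

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    payload = request.get_json()              # e.g. blower & valve settings
    return jsonify(predict.predict_batch(payload))

# In the Docker image this app is served by Gunicorn, e.g.:
#   gunicorn --bind 0.0.0.0:8000 api:app
```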
23. • So we predicted air flow / energy consumption...
• That’s nice and all
• But we haven’t optimized anything yet!
That was just the first part!
24. • Observe the environment
• Choose an action
• Execute the action in the environment
• Observe the reward and changes in the environment
• Evaluate the choice of your action
• Repeat
REINFORCEMENT LEARNING.
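The loop above maps directly onto OpenAI gym's interface. A runnable toy version with random actions (the real agent, DDPG, comes later), assuming the classic pre-0.26 gym API:

```python
import gym

env = gym.make("Pendulum-v1")        # stand-in continuous-control environment
state = env.reset()                  # observe the environment
for _ in range(200):
    action = env.action_space.sample()         # choose an action (random here)
    state, reward, done, _ = env.step(action)  # execute it, observe the reward
    if done:                                   # repeat
        state = env.reset()
```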
25. 1. Observe required airflow
• Derived from water influent + oxygen/ammonium measurements
2. Choose new settings for blowers & valves
• This is done by the agent
3. Observe resulting air flow & energy consumption
• This is done using the model from the 1st part
4. Evaluate the result
• Penalize deviations w.r.t. the required airflow (reward function sketched below)
• Penalize energy consumption
5. Adapt the agent’s model based on the evaluation
6. Back to 1.
APPROACH.
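Step 4's evaluation can be written as a single reward function. A sketch; the weights are placeholders, not values from the talk:

```python
def reward(required_airflow: float, achieved_airflow: float,
           energy: float, w_dev: float = 1.0, w_energy: float = 0.1) -> float:
    """Penalize deviation from the required airflow plus energy consumption."""
    deviation = abs(required_airflow - achieved_airflow)
    return -(w_dev * deviation + w_energy * energy)
```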
26. • Deep Deterministic Policy Gradient
• Similar to Deep Q-learning, but for continuous action spaces
• Two models, trained iteratively:
• Actor controls the settings (chooses actions)
• Critic evaluates the result
• Model from part 1 is a simulation of the environment
• Tricks like experience replay, target networks, warmup, random-process exploration
DDPG ALGORITHM.
27. • Environment in OpenAI’s gym format
• Define possible actions
• Define what the states look like
• Define a function that, given a state and an action, returns a reward
• This uses the model from part 1 (environment sketched below)
• DDPG from Keras-RL library
• Relatively clear design; DDPG comes ready-implemented
• States: Pressure, Air flow, Blowers, Valves
• Actions: Blowers, Valves
• Reward: required vs. achieved airflow + energy consumption
IMPLEMENTATION.
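A sketch of the gym-format environment wrapping the part-1 model. The dimensions, the plant_model callable and the reward weights are assumptions, and the classic pre-0.26 gym API is assumed:

```python
import gym
import numpy as np
from gym import spaces

N_BLOWERS, N_VALVES = 6, 7            # assumed numbers of controls

class AerationEnv(gym.Env):
    """Gym-style environment; the part-1 model simulates the plant."""

    def __init__(self, plant_model, required_airflow):
        self.plant_model = plant_model            # settings -> (pressure, flow, energy)
        self.required_airflow = required_airflow
        # State: pressure, air flow, blower settings, valve settings
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(2 + N_BLOWERS + N_VALVES,))
        # Action: new blower + valve settings, scaled to [-1, 1]
        self.action_space = spaces.Box(-1.0, 1.0, shape=(N_BLOWERS + N_VALVES,))

    def step(self, action):
        pressure, airflow, energy = self.plant_model(action)
        # Required vs. achieved airflow + energy consumption (weights assumed)
        rew = -(abs(self.required_airflow - airflow) + 0.1 * energy)
        obs = np.concatenate([[pressure, airflow], action])
        return obs, rew, False, {}

    def reset(self):
        return np.zeros(self.observation_space.shape)
```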
28. • Actor: State ➤ Action
• Input = State (Pressure, Air flow, Blowers, Valves)
• Scaling
• 2 dense layers with ReLU activation
• Output with tanh activation ➤ Blowers + Valves
• Critic: State + Action ➤ Reward
• Input = State + Action
• Scaling
• 2 dense layers with ReLU activation
• Output with linear activation ➤ Reward
N.B. The actor network is pre-trained on the current control strategy
ACTOR & CRITIC.
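Putting the two networks together with Keras-RL's DDPGAgent. The layer widths, state/action dimensions and hyperparameters are assumptions; the structure (2 dense ReLU layers, tanh actor output, linear critic output) follows the slide:

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

N_STATE, N_ACTION = 15, 13                  # assumed dimensions

# Actor: State -> Action
state_in = layers.Input(shape=(N_STATE,))   # pressure, air flow, blowers, valves
x = layers.Dense(64, activation="relu")(state_in)
x = layers.Dense(64, activation="relu")(x)
action_out = layers.Dense(N_ACTION, activation="tanh")(x)  # blowers + valves
actor = Model(state_in, action_out)

# Critic: State + Action -> Reward
action_in = layers.Input(shape=(N_ACTION,))
y = layers.Concatenate()([state_in, action_in])
y = layers.Dense(64, activation="relu")(y)
y = layers.Dense(64, activation="relu")(y)
q_out = layers.Dense(1, activation="linear")(y)
critic = Model([state_in, action_in], q_out)

# DDPG with experience replay, target networks, warmup and OU exploration
agent = DDPGAgent(nb_actions=N_ACTION, actor=actor, critic=critic,
                  critic_action_input=action_in,
                  memory=SequentialMemory(limit=100_000, window_length=1),
                  nb_steps_warmup_actor=1_000, nb_steps_warmup_critic=1_000,
                  random_process=OrnsteinUhlenbeckProcess(
                      theta=0.15, mu=0.0, sigma=0.3, size=N_ACTION),
                  target_model_update=1e-3)
agent.compile(Adam(learning_rate=0.001), metrics=["mae"])
# agent.fit(env, nb_steps=50_000)   # env: e.g. the AerationEnv sketch above
```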
30. • Impediment:
• Direct control of blowers & valves not possible for the next two years or so
• Scope:
• Don’t focus on energy consumption
• Focus on nitrogen emissions (N2O, NO3, NH4)
SURPRISE!
31. • A new environment model that
• given a ‘state’ (measurements)
• given a proposed action (control settings)
• predicts the next state (measurements)
• An influent forecast model that
• given date/time & previous measurements
• predicts the expected influent
• A control strategy model that
• given a ‘state’
• given an influent forecast
• predicts the best action (control settings)
FOLLOW-UP.
33. • Who has the domain knowledge?
• Which data is reliable?
• Data augmentation for better generalization
• Andrej Karpathy’s “A Recipe for Training Neural Networks”:
• Tensorboard: write_images makes the logs explode, but it helps in validating the weights
• Tuning a neural net based on the loss curve is one thing, but tuning a reinforcement learning agent on a reward curve...
• Reinforcement learning is still in its infancy
• Translate continuous actions to discrete actions (sketched below)
LEARNINGS.
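One way to read that last learning: snap the agent's continuous output to a small set of discrete setpoints. A hypothetical sketch, with all names and bin choices my own:

```python
import numpy as np

def discretize(action: float, low: float = -1.0, high: float = 1.0,
               n_levels: int = 5) -> float:
    """Map a continuous action in [low, high] to the nearest of n_levels setpoints."""
    levels = np.linspace(low, high, n_levels)   # e.g. 5 allowed valve positions
    return levels[np.abs(levels - action).argmin()]

print(discretize(0.3))  # -> 0.5 (nearest of -1.0, -0.5, 0.0, 0.5, 1.0)
```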