On-Prem Solution for the Selection of Wind Energy Models

Ana M. Martinez, Vestas Wind Systems A/S
#UnifiedDataAnalytics #SparkAISummit

Is this a good piece of art?
pixabay.com
Classification: Public

Is this a good site?

SiteHunt®
• Enables early identification of potential wind farms
• SiteHunt® FirstView: 3km resolution
• SiteHunt® CloseUp: 1km – 300m resolution
• SiteHunt® DeepDive: 10 – 25m resolution

Wind resources enrichment
Existing modelling options:
– Physical modelling leads to time-consuming simulations.
– Sub-optimal geostatistical approaches.

Motivation
• Deep learning (DL) has recently proven successful on similar tasks with hierarchically structured data.
• What tools and data do we have at Vestas for
this task?
• What is missing?

HPC is our primary tool
• 650 compute nodes (Lenovo).
• ~ 16000 CPU cores.
• Total memory > 100 TB.
• >5 PB HDD storage (EMC Isilon).
• 56 Gb/s InfiniBand (IB) interconnect.
• ~500 TFLOPS.
• 20 GPU nodes.
• Sun Grid Engine scheduler.

Data is our spine
• Vestas Climate Library (petabyte scale).
– Hourly wind resource data from 2000-01-01 to present at 3km horizontal resolution.
– More than 50 parameters.
– From ground level to beyond 500m.
– ORC database, started in 2012.
• Elevation database.
• Roughness database.

US: average wind speed at 80m

US: avg. 80m wind speed (wspd) > 3m/s, terrain below 1500m

US: exclude national parks, protected areas, national forests and federal land

US: remove urban areas and airports

High-voltage grid proximity (up to 30 km from the grid)
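The screening steps on the maps above can be read as successive filters over grid cells. The sketch below is only an illustration under that reading: the `Cell` record and all of its field names are hypothetical, and only the thresholds (3 m/s, 1500m, 30 km) come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One grid cell. All field names are hypothetical, for illustration only."""
    wspd_80m: float          # average wind speed at 80m, in m/s
    elevation_m: float       # terrain elevation, in m
    excluded_land: bool      # national park, protected area, national forest or federal land
    urban_or_airport: bool   # urban area or airport
    grid_distance_km: float  # distance to the nearest high-voltage line, in km


def is_candidate(cell: Cell) -> bool:
    """Apply the screening thresholds from the slides, one after another."""
    return (
        cell.wspd_80m > 3.0
        and cell.elevation_m < 1500.0
        and not cell.excluded_land
        and not cell.urban_or_airport
        and cell.grid_distance_km <= 30.0
    )


# Made-up cells, purely to exercise the filter.
cells = [
    Cell(7.2, 400.0, False, False, 12.0),   # passes every filter
    Cell(2.5, 400.0, False, False, 12.0),   # too little wind
    Cell(7.2, 1800.0, False, False, 12.0),  # terrain too high
    Cell(7.2, 400.0, True, False, 12.0),    # excluded land
    Cell(7.2, 400.0, False, False, 45.0),   # too far from the grid
]
candidates = [c for c in cells if is_candidate(c)]
```

Only the first cell survives all filters here; in practice each filter would be a raster mask rather than a per-cell record.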

Siting
• Improve siting by not relying on point estimates from meteorological masts.
• Wind resources in higher resolution.

Technical Solution
Pipeline stages:
• Data Extraction
• Data Preparation
• Model Selection (Hyperparameter search)
• Model Training & Evaluation
• Model Deployment

Wind resource downscale: PoC Example

Data Extraction & Preparation
Inputs:
• Wind data (HR/LR): orc format, ~1.5PB
• Elevation data (VHR): hgt format, ~400GB
• Roughness (HR): GeoTIFF format
Tools: Apache Spark* (pyspark), Apache Hive*, python
Derived features of the vector field: curl, divergence, laplacian
* All product names, logos and brands are property of their respective owners. All company, product and service names used in this document are for identification purposes only. Use of these names, logos and brands does not imply endorsement.
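The derived features named on this slide (curl, divergence, laplacian) can be approximated with finite differences on a regular grid. The sketch below is a minimal illustration, not the Vestas implementation: it assumes a 2D horizontal wind field stored as nested lists `u[j][i]` and `v[j][i]` (j along y, i along x), constant grid spacings `dx` and `dy`, and interior grid points only. Which field the laplacian is applied to is not stated in the slides; here it is computed for a scalar field.

```python
def ddx(f, j, i, dx):
    """Central difference of f along x at interior point (j, i)."""
    return (f[j][i + 1] - f[j][i - 1]) / (2.0 * dx)


def ddy(f, j, i, dy):
    """Central difference of f along y at interior point (j, i)."""
    return (f[j + 1][i] - f[j - 1][i]) / (2.0 * dy)


def divergence(u, v, j, i, dx, dy):
    """du/dx + dv/dy for the horizontal wind (u, v)."""
    return ddx(u, j, i, dx) + ddy(v, j, i, dy)


def curl(u, v, j, i, dx, dy):
    """Vertical component of the curl: dv/dx - du/dy."""
    return ddx(v, j, i, dx) - ddy(u, j, i, dy)


def laplacian(f, j, i, dx, dy):
    """Five-point laplacian of a scalar field f."""
    d2x = (f[j][i + 1] - 2.0 * f[j][i] + f[j][i - 1]) / dx ** 2
    d2y = (f[j + 1][i] - 2.0 * f[j][i] + f[j - 1][i]) / dy ** 2
    return d2x + d2y


# Synthetic check on a 3x3 grid with x = i and y = j (unit spacing).
# Rigid rotation u = -y, v = x has zero divergence and curl 2.
u = [[-float(j) for i in range(3)] for j in range(3)]
v = [[float(i) for i in range(3)] for j in range(3)]
# f = x^2 + y^2 has laplacian 4 everywhere.
f = [[float(i * i + j * j) for i in range(3)] for j in range(3)]

div_center = divergence(u, v, 1, 1, 1.0, 1.0)
curl_center = curl(u, v, 1, 1, 1.0, 1.0)
lap_center = laplacian(f, 1, 1, 1.0, 1.0)
```

On real data the grid spacing would follow from the 3km or 1km resolution, and the same stencils could be applied column-wise in Spark or with an array library.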

VCL 3km points: global coverage (19 years).

VCL 1km points: Saudi Arabia coverage (1 year).
Figure labels: 1km, 3km

Terrain data: SRTM (very high resolution, up to 30m).

Each red point generates one row per timestamp in the dataset.
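The "one row per point per timestamp" layout can be sketched as a cross product of points and timestamps. This is an illustration only: the point identifiers, the start date and the hourly step are assumptions (the slides state hourly data for the 3km library, not explicitly for this dataset).

```python
from datetime import datetime, timedelta
from itertools import product


def hourly_timestamps(start, hours):
    """Return `hours` consecutive hourly timestamps starting at `start`."""
    return [start + timedelta(hours=h) for h in range(hours)]


def build_rows(point_ids, timestamps):
    """One row per (point, timestamp) pair; features would be joined on later."""
    return [
        {"point_id": point_id, "timestamp": ts}
        for point_id, ts in product(point_ids, timestamps)
    ]


# Hypothetical point identifiers and a hypothetical 24-hour window.
point_ids = ["p0", "p1", "p2"]
timestamps = hourly_timestamps(datetime(2018, 1, 1), 24)
rows = build_rows(point_ids, timestamps)
print(len(rows))  # 3 points x 24 hours = 72 rows
```

The row count grows as points times timestamps, which is why a distributed engine is a natural fit for this step.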

DNN
INPUT: swdown, xhour/yhour, temperature, u_LR, v_LR, heights_HR, roughness_HR
TARGET: u_HR, v_HR

BASELINE
INPUT: heights, u_LR, v_LR
TARGET: u_HR, v_HR

Figure labels: u, v, wind speed, q

Feed Forward neural network
• Input parameters
• Hidden layers: first_neuron (width), shape (brick)
• Output parameters
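A way to read the architecture parameters above: with a "brick" shape, every hidden layer is assumed to have the same width, `first_neuron`. The sketch below only computes layer widths and the resulting number of trainable parameters for fully connected layers; the input and output sizes are hypothetical, and the interpretation of "brick" is an assumption based on how the term is commonly used in hyperparameter tools.

```python
def brick_layer_widths(n_inputs, n_outputs, first_neuron, hidden_layers):
    """Layer widths for a feed-forward net whose hidden layers all share one width."""
    return [n_inputs] + [first_neuron] * hidden_layers + [n_outputs]


def count_parameters(widths):
    """Weights plus biases of the dense layers connecting consecutive widths."""
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(widths, widths[1:]))


# Hypothetical sizes: 8 input features, 2 targets (u_HR and v_HR).
widths = brick_layer_widths(n_inputs=8, n_outputs=2, first_neuron=56, hidden_layers=5)
n_params = count_parameters(widths)
print(widths)    # [8, 56, 56, 56, 56, 56, 2]
print(n_params)  # 13386
```

The parameter count gives a quick sense of how model size scales with width and depth across the search grid.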

Hyperparameter selection
48 combinations
#neurons   #hidden layers   #epochs   dropout
12         1                200       0.2
56         1                200       0.2
128        1                200       0.2
256        1                200       0.2
12         5                200       0.2
56         5                200       0.2
…
…
256        10               400       0.5
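The grid above can be expanded as a Cartesian product. The sketch below assumes the value sets are exactly the values visible in the table (4 widths, hidden layers 1, 5 and 10, epochs 200 and 400, dropout 0.2 and 0.5); the elided rows are not shown in the slides, so this is an assumption, although it is consistent with the stated total of 48. The dictionary keys other than `first_neuron` are hypothetical names.

```python
from itertools import product

# Assumed value sets, taken from the rows visible in the table.
grid = {
    "first_neuron": [12, 56, 128, 256],
    "hidden_layers": [1, 5, 10],
    "epochs": [200, 400],
    "dropout": [0.2, 0.5],
}


def expand_grid(grid):
    """Return one dict per combination of the values in `grid`."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


combinations = expand_grid(grid)
print(len(combinations))  # 4 * 3 * 2 * 2 = 48
```

Each dictionary in `combinations` describes one candidate model, which makes the search trivially parallel.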

Hyperparameter search
Existing tools not directly applicable:
– Talos.
– KubeFlow.
– MLflow.
– Elephas.

Model Selection
• Job scheduler: qsub array, one docker container per job
• Input per job: configuration + train/val data
• Output per job: keras_model.h5, TensorBoard* logs, params.json
• Tools: talos*, TensorBoard*
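One way an array job can pick its configuration is by indexing the list of combinations with the task ID. The sketch below is an assumption about how such a mapping could look, not the presenters' code: it relies on Sun Grid Engine exposing the array index in the `SGE_TASK_ID` environment variable (1-based when submitted with something like `qsub -t 1-48`), and the helper names and the small grid are hypothetical.

```python
import json
import os
import tempfile
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the values in `grid`."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def read_task_id(environ):
    """Read the array task ID; fall back to 1 when it is missing or not numeric."""
    raw = environ.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else 1


def config_for_task(task_id, combinations):
    """Map a 1-based task ID to its hyperparameter configuration."""
    if not 1 <= task_id <= len(combinations):
        raise IndexError(f"task id {task_id} outside 1..{len(combinations)}")
    return combinations[task_id - 1]


def write_params(config, out_dir):
    """Store the configuration next to the job's other outputs as params.json."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "params.json")
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)
    return path


# Small hypothetical grid; inside a real job, pass os.environ instead of a dict.
combinations = expand_grid({"first_neuron": [12, 56], "dropout": [0.2, 0.5]})
task_id = read_task_id({"SGE_TASK_ID": "3"})
config = config_for_task(task_id, combinations)
params_path = write_params(config, tempfile.mkdtemp())
```

Because every task derives its configuration from its own index, the jobs need no coordination beyond the shared list of combinations.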

Model Training & Evaluation
Evaluation measures:
• MAE, RMSE, BIAS, STDE
• Riemann sum between the CDF* differences (CDF diff.)
• Pearson correlation coefficient (Pearson’s r)
Data split: Train / Validation / Test
• Learn candidate models.
• Learn hyperparameters.
• Train baseline & winning DNN model.
• Time-consecutive data kept to evaluate.
* Cumulative distribution function
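The evaluation measures listed above can be sketched in plain Python. The definitions are assumptions where the slides are silent: STDE is taken as the standard deviation of the error (so that RMSE² = BIAS² + STDE²), and the CDF difference is taken as a midpoint Riemann sum of the absolute difference between the two empirical CDFs. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def mean(values):
    return sum(values) / len(values)


def error_metrics(pred, obs):
    """MAE, RMSE, BIAS and STDE of the error pred - obs."""
    err = [p - o for p, o in zip(pred, obs)]
    bias = mean(err)
    return {
        "MAE": mean([abs(e) for e in err]),
        "RMSE": math.sqrt(mean([e * e for e in err])),
        "BIAS": bias,
        "STDE": math.sqrt(mean([(e - bias) ** 2 for e in err])),
    }


def pearson_r(pred, obs):
    """Pearson correlation coefficient (undefined for constant inputs)."""
    mp, mo = mean(pred), mean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of the sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def cdf_diff(pred, obs, n_bins=200):
    """Midpoint Riemann sum of |F_pred(x) - F_obs(x)| over the joint value range."""
    sp, so = sorted(pred), sorted(obs)
    lo, hi = min(sp[0], so[0]), max(sp[-1], so[-1])
    if hi == lo:
        return 0.0
    dx = (hi - lo) / n_bins
    return sum(
        abs(ecdf(sp, lo + (k + 0.5) * dx) - ecdf(so, lo + (k + 0.5) * dx)) * dx
        for k in range(n_bins)
    )


# Made-up values, purely to exercise the functions.
pred = [1.0, 2.0, 3.0, 4.0]
obs = [1.0, 2.0, 3.0, 5.0]
metrics = error_metrics(pred, obs)
r = pearson_r(pred, obs)
area = cdf_diff(pred, obs)
```

The CDF difference compares distributions rather than paired values, so it can favour a different method than RMSE does.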

Do we downscale? PoC Example

Quantitative results
Method              RMSE     Bias      Pearson’s r   CDF diff.
Closest 3km point   0.0305   -0.0063   0.9827        0.0077
Linear regression   0.0315    0.0080   0.9836        0.0114
DNN                 0.0294    0.0022   0.9853        0.0093

Quantitative results
Method              RMSE     Bias      Pearson’s r   CDF diff.
Closest 3km point   0.0752   -0.021    0.9223        0.0218
Linear regression   0.0663    0.0009   0.9236        0.0178
DNN                 0.0538    0.0021   0.9459        0.0075
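The "closest 3km point" baseline in the tables above can be thought of as a nearest-neighbour lookup: each high-resolution location takes the value of the nearest low-resolution grid point. The sketch below is one possible reading, assuming great-circle (haversine) distance on latitude/longitude; the coordinates, identifiers and wind values are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_point(target, lr_points):
    """Return the id of the low-resolution point nearest to target = (lat, lon)."""
    return min(
        lr_points,
        key=lambda pid: haversine_km(target[0], target[1], lr_points[pid][0], lr_points[pid][1]),
    )


# Hypothetical low-resolution points: id -> (lat, lon, wind speed).
lr_points = {
    "A": (24.00, 45.02, 6.1),
    "B": (24.03, 45.00, 5.4),
    "C": (23.90, 45.10, 7.0),
}
target = (24.00, 45.00)
nearest_id = closest_point(target, lr_points)
baseline_wind = lr_points[nearest_id][2]  # the baseline copies the nearest LR value
```

With many points, a spatial index would replace the linear scan, but the baseline itself stays this simple.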

Ongoing work
• Use of convolutional + recurrent neural networks.
• Test different evaluation scenarios.
• Test higher resolution terrain information.
• Connect and automate the end-to-end cycle.

Potential
• ML importance across the whole value chain.
– Power forecasting.
– Long-term correction of wind measurements.
– Wind resources enrichment.
– Troubleshooting turbine errors.
– Condition monitoring.
– Wind farm control.
– Wind turbine surface damage detection.
– …

Vestas Team
• Ana M. Martinez
• Hjalte Vinther Kiefer
• Hans Harhoff Andersen
• Tiago Miguel da Costa Luna
