This document summarizes Vestas Wind Systems' work on using deep learning models to improve wind resource modeling. Vestas has developed a tool called SiteHunt that provides wind resource data at different resolutions to help identify potential wind farm sites. The company is exploring the use of deep learning models to downscale lower-resolution wind data to higher resolutions. An initial proof of concept showed that a deep neural network improved downscaling accuracy compared with traditional methods. Ongoing work includes testing more advanced neural network architectures and automating the end-to-end modeling process.
7. SiteHunt®
• Enables early identification of potential wind farms
– SiteHunt® FirstView: 3km resolution
– SiteHunt® CloseUp: 1km – 300m resolution
– SiteHunt® DeepDive: 10 – 25m resolution
Classification: Public
10. Wind resources enrichment
Existing modelling options:
– Physical modelling leads to time-consuming simulations.
– Sub-optimal geostatistical approaches.
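To make "geostatistical approaches" concrete, here is a minimal sketch of inverse distance weighting (IDW), a simple deterministic relative of kriging, used to interpolate coarse (e.g. 3km) values onto finer target points. It is an illustration under stated assumptions, not the method Vestas compared against; the function name and parameters are hypothetical.

```python
import numpy as np


def idw_downscale(lr_xy, lr_values, hr_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting from coarse points onto fine target points.

    lr_xy: (N_lr, 2) projected coordinates of the coarse-grid points (metres).
    lr_values: (N_lr,) values at those points, e.g. wind speed.
    hr_xy: (N_hr, 2) projected coordinates of the target points.
    Returns an (N_hr,) array of interpolated values.
    """
    # Distance from every target point to every coarse point: shape (N_hr, N_lr)
    d = np.linalg.norm(hr_xy[:, None, :] - lr_xy[None, :, :], axis=-1)
    # Closer points get larger weights; eps avoids division by zero on exact hits
    w = 1.0 / np.maximum(d, eps) ** power
    return (w @ lr_values) / w.sum(axis=1)


# Two coarse points 3 km apart; the midpoint receives the average of their values
lr_xy = np.array([[0.0, 0.0], [3000.0, 0.0]])
print(idw_downscale(lr_xy, np.array([4.0, 8.0]), np.array([[1500.0, 0.0]])))
```

Because the interpolated value depends only on horizontal distance, IDW cannot reflect terrain effects such as elevation or roughness, which is one reason purely spatial interpolation tends to be sub-optimal for wind.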
11. Motivation
• Deep learning (DL) has recently proved successful on similar tasks involving data with a hierarchical structure.
• What tools and data do we have at Vestas for this task?
• What is missing?
12. HPC is our primary tool
• 650 compute nodes (Lenovo).
• ~16,000 CPU cores.
• Total memory > 100 TB.
• > 5 PB HDD storage (EMC Isilon).
• 56 Gb/s InfiniBand (IB) interconnect.
• ~500 TFLOPS.
• 20 GPU nodes.
• Sun Grid Engine scheduler.
13. Data is our spine
• Vestas Climate Library (petabyte scale).
– Hourly wind resource data from 2000-01-01 to present at 3km horizontal resolution.
– More than 50 parameters.
– From ground level to beyond 500m.
– ORC database, started in 2012.
• Elevation database.
• Roughness database.
19. Siting
• Improve siting by not relying on point estimates from meteorological masts.
• Provide wind resources at higher resolution.
22. Data Extraction & Preparation
• Wind data (HR/LR): ORC format, ~1.5 PB.
• Elevation data (VHR): hgt format, ~400 GB.
• Roughness (HR): GeoTIFF format.
• Processing: Apache Hive*, Apache Spark* (pyspark) and python.
• Derived features from the wind vector field: curl, divergence, Laplacian.
* All product names, logos and brands are property of their respective owners. All company, product and service names used in this document are for identification purposes only. Use of these names, logos and brands does not imply endorsement.
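The derived features listed above can be computed with finite differences on a regular grid. A minimal NumPy sketch, assuming u/v arrays indexed [y, x] with uniform spacing; the function names and the choice of taking the Laplacian of each wind component separately are assumptions, since the slide does not say which field the Laplacian is applied to.

```python
import numpy as np


def _laplacian(f, dx, dy):
    """Sum of second derivatives of a 2-D field indexed [y, x]."""
    df_dy, df_dx = np.gradient(f, dy, dx)
    return np.gradient(df_dx, dx, axis=1) + np.gradient(df_dy, dy, axis=0)


def vector_field_features(u, v, dx, dy):
    """Curl, divergence and component Laplacians of a gridded 2-D wind field.

    u, v: 2-D arrays of the wind components indexed [y, x] (rows increase along y).
    dx, dy: uniform grid spacing in metres along x and y.
    """
    # np.gradient returns derivatives along axis 0 (y) first, then axis 1 (x)
    du_dy, du_dx = np.gradient(u, dy, dx)
    dv_dy, dv_dx = np.gradient(v, dy, dx)
    curl = dv_dx - du_dy          # vertical component of the vorticity
    divergence = du_dx + dv_dy
    return curl, divergence, _laplacian(u, dx, dy), _laplacian(v, dx, dy)


# Solid-body rotation u = -y, v = x: curl is 2 everywhere, divergence is 0
X, Y = np.meshgrid(np.arange(10) * 1000.0, np.arange(10) * 1000.0)
curl, div, lap_u, lap_v = vector_field_features(-Y, X, 1000.0, 1000.0)
```

np.gradient uses central differences in the interior and one-sided differences at the edges, so the two outermost rows and columns of the Laplacian are less accurate than the interior.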
23. Data Extraction & Preparation
[Map of VCL 3km and 1km points]
• VCL 3km point – global coverage (19 years).
• VCL 1km point – Saudi Arabia coverage (1 year).
• Terrain data – SRTM (very high resolution, up to 30m).
• Each red point on the map generates one row per timestamp in the dataset.
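The slide states that each red point yields one row per timestamp. A minimal in-memory NumPy sketch of that layout, assuming hourly arrays shaped (time, points) and pairing each 1km point with its nearest 3km point; the pairing rule, function names and column order are assumptions, since the slides do not say how LR and HR points were matched. At VCL scale this join would run in Spark rather than in NumPy.

```python
import numpy as np


def nearest_lr_index(hr_lat, hr_lon, lr_lat, lr_lon):
    """Index of the nearest LR (3km) point for each HR (1km) point.

    Uses an equirectangular approximation (no dateline wrap), which is
    adequate for matching points a few kilometres apart.
    """
    lat_mid = np.radians((hr_lat[:, None] + lr_lat[None, :]) / 2.0)
    dlat = np.radians(hr_lat[:, None] - lr_lat[None, :])
    dlon = np.radians(hr_lon[:, None] - lr_lon[None, :]) * np.cos(lat_mid)
    return np.argmin(dlat ** 2 + dlon ** 2, axis=1)


def build_rows(u_lr, v_lr, u_hr, v_hr, heights_hr, nn):
    """Flatten to one row per (timestamp, HR point).

    u_lr, v_lr: (T, N_lr); u_hr, v_hr: (T, N_hr); heights_hr: (N_hr,);
    nn: (N_hr,) nearest LR index per HR point.
    Returns X with columns [u_LR, v_LR, heights_HR] and y with [u_HR, v_HR].
    """
    T = u_hr.shape[0]
    X = np.column_stack([
        u_lr[:, nn].ravel(),        # LR wind at the matched point, time-major order
        v_lr[:, nn].ravel(),
        np.tile(heights_hr, T),     # static terrain feature repeated for every timestamp
    ])
    y = np.column_stack([u_hr.ravel(), v_hr.ravel()])
    return X, y
```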
27. Data Extraction & Preparation
• DNN
– INPUT: swdown, xhour/yhour, temperature, u_LR, v_LR, heights_HR, roughness_HR.
– TARGET: u_HR, v_HR.
• BASELINE
– INPUT: heights, u_LR, v_LR.
– TARGET: u_HR, v_HR.
[Figure panels: u, v, wind speed, q]
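The DNN inputs include xhour/yhour, which reads like a cyclical encoding of the hour of day. A minimal sketch under that assumption (the sin/cos mapping and which component is called xhour are guesses), together with the wind speed derived from the u/v components shown in the figure:

```python
import numpy as np


def encode_hour(hour):
    """Map hour of day (0-23) onto the unit circle so 23:00 and 00:00 end up close."""
    angle = 2.0 * np.pi * np.asarray(hour, dtype=float) / 24.0
    # Assumed mapping: xhour = sin, yhour = cos
    return np.sin(angle), np.cos(angle)


def wind_speed(u, v):
    """Horizontal wind speed from the zonal (u) and meridional (v) components."""
    return np.hypot(u, v)
```

With a raw 0 to 23 integer, a model would see 23 and 0 as far apart even though they are one hour apart; the two-component encoding removes that discontinuity.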
31. Model Selection
• Job scheduler: qsub array jobs.
• Jobs run in docker containers, each with its own configuration + train/val data.
• Output per job: keras_model.h5, TensorBoard logs, params.json.
• Tools: TensorBoard*, talos*.
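With `qsub -t 1-N`, Sun Grid Engine starts N tasks and sets the `SGE_TASK_ID` environment variable in each one. A minimal sketch of how a task could pick its configuration from a hyperparameter grid and write params.json next to its outputs; the grid values, directory layout and function names are hypothetical.

```python
import itertools
import json
import os

# Hypothetical search space; the grid actually used is not given in the slides.
GRID = {
    "layers": [2, 3, 4],
    "units": [64, 128],
    "learning_rate": [1e-3, 1e-4],
}


def config_for_task(task_id, grid=GRID):
    """Hyperparameter combination for a 1-based SGE array task id."""
    keys = sorted(grid)
    combos = list(itertools.product(*(grid[k] for k in keys)))
    return dict(zip(keys, combos[task_id - 1]))


def main(out_dir="runs"):
    # SGE sets SGE_TASK_ID in each task of `qsub -t 1-N`; outside an array job
    # it is missing or "undefined", so fall back to task 1 for local testing.
    raw = os.environ.get("SGE_TASK_ID", "")
    task_id = int(raw) if raw.isdigit() else 1
    params = config_for_task(task_id)
    run_dir = os.path.join(out_dir, f"task_{task_id:03d}")
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, "params.json"), "w") as f:
        json.dump(params, f, indent=2)
    # Training would go here, saving keras_model.h5 and TensorBoard logs in run_dir.
    return run_dir, params


if __name__ == "__main__":
    main()
```

This grid has 12 combinations, so it would be submitted as `qsub -t 1-12 train.sh`, where the hypothetical `train.sh` starts the script inside the docker container.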
32. Model Training & Evaluation
Evaluation measures:
• MAE, RMSE, BIAS, STDE.
• Riemann sum of the differences between the CDFs* (CDF diff.).
• Pearson correlation coefficient (Pearson's r).
Data split (Train / Validation / Test):
• Learn candidate models.
• Learn hyperparameters.
• Train the baseline & winning DNN model.
• Time-consecutive data kept to evaluate.
* Cumulative distribution function
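The evaluation measures above can be sketched in NumPy as follows. This is a minimal illustration, not Vestas' evaluation code: STDE is taken as the standard deviation of the error, CDF diff. as a Riemann sum of the absolute difference between the empirical CDFs on an evenly spaced grid, and the split fractions in `time_consecutive_split` are hypothetical.

```python
import numpy as np


def evaluation_measures(pred, obs, n_grid=1000):
    """Error measures between predicted and observed values (e.g. u_HR)."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    err = pred - obs
    bias = err.mean()
    stde = err.std()  # population std of the error, so RMSE**2 == BIAS**2 + STDE**2
    # CDF diff.: Riemann sum of |F_pred(x) - F_obs(x)| on an evenly spaced grid
    # (taking the absolute difference is an assumption)
    grid = np.linspace(min(pred.min(), obs.min()), max(pred.max(), obs.max()), n_grid)
    cdf_pred = np.searchsorted(np.sort(pred), grid, side="right") / pred.size
    cdf_obs = np.searchsorted(np.sort(obs), grid, side="right") / obs.size
    cdf_diff = np.abs(cdf_pred - cdf_obs).sum() * (grid[1] - grid[0])
    return {
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "BIAS": bias,
        "STDE": stde,
        "CDF diff.": cdf_diff,
        "Pearson's r": np.corrcoef(pred, obs)[0, 1],
    }


def time_consecutive_split(n_rows, train_frac=0.7, val_frac=0.15):
    """Contiguous train/validation/test index ranges for time-ordered rows.

    The most recent block is held out as the test set, so evaluation uses
    data that come after everything the models were fitted or tuned on.
    """
    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    idx = np.arange(n_rows)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With the absolute difference, the integral behind CDF diff. equals the 1-Wasserstein (earth mover's) distance between the predicted and observed distributions, so it complements the pointwise MAE and RMSE.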
52. Ongoing work
• Use of convolutional + recurrent neural networks.
• Test different evaluation scenarios.
• Test higher resolution terrain information.
• Connect and automate the end-to-end cycle.
53. Potential
• ML matters across the whole value chain:
– Power forecasting.
– Long-term correction of wind measurements.
– Wind resources enrichment.
– Troubleshooting turbine errors.
– Condition monitoring.
– Wind farm control.
– Wind turbine surface damage detection.
– …
54. Vestas Team
• Ana M. Martinez
• Hjalte Vinther Kiefer
• Hans Harhoff Andersen
• Tiago Miguel da Costa Luna
55. DON’T FORGET TO RATE AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT