SlideShare una empresa de Scribd logo
1 de 18
Descargar para leer sin conexión
Split Knowledge Transfer in Learning Under
Privileged Information Framework
Niharika Gaurahaa
, Fabian S¨oderdahlb
and Ola Spjutha
Varna, Bulgaria, Sep 9, 2019
a
Department of Pharmaceutical Biosciences, Uppsala University, Sweden
b
Statisticon AB, Uppsala, Sweden 1
Our domain
We study life science problems, and in particular effects of drugs,
from various aspects.
Some data can be very expensive to generate, others cheaper.
2
Our domain
Another example: Imaging in multiple resolutions
Can we improve predictions using data from multiple resolutions at
training time? 3
Learning Under Privileged Information
• The classical machine learning paradigm assumes, training
examples in the form of iid pair:
(x1, y1), ..., (xl , yl ), xi ∈ X, yi ∈ {−1, +1}
• LUPI paradigm assumes, training examples in the form of iid
triplets:
(x1, x∗
1 , y1), ..., (xl , x∗
l , yl ), xi ∈ X, x∗
i ∈ X∗
, yi ∈ {−1, +1}
• x∗
i ∈ X∗ is Privileged Information (PI) or additional data
available for each training example, but not available for
testing examples.
4
LUPI contd...
Traning Data
in feature
space X
New Object
in feature
space X
Train Model Predictive
Rules
Prediction
Traning Data
in PI space X∗
5
LUPI and CP
We introduced CP in LUPI at COPA 2018 1.
We used SVM+ implementation of LUPI. Only minor
improvements in efficiency. Does not scale well for larger problems.
1
Niharika Gauraha, Lars Carlsson, Ola Spjuth. ”Conformal prediction in learning under privileged information
paradigm with applications in drug discovery”. Proceedings of the Seventh Workshop on Conformal and
Probabilistic Prediction and Applications, PMLR 91:147-156, 2018.
6
Knowledge Transfer LUPI
For knowledge transfer from space X∗− > X, for each privileged
feature a regression function is learned by using p decision features
in X as explanatory variables and privileged feature vectors X∗(i),
for i = 1, . . . , m, as response variables. In total m regression
functions φj (x), j = 1, . . . , m are learned.
7
Knowledge Transfer LUPI
We can augment the training data set as follows:



y1 x1 φ1(x1) . . . φm(x1)
. . . . . . . . . . . . . . .
yl xl φ1(xl ) . . . φm(xl )



Then, we apply some standard algorithm to this modified training
data and construct an p + m-dimensional decision rule f (x, Φ(x)),
where
Φ(x) = (φ1(x) φ2(x) . . . φm(x))
8
Knowledge Transfer LUPI: Contd...
Figure adapted from: Izmailov et al ”Feature Selection in Learning Using Privileged Information”. ICDM 2017
9
Knowledge Transfer LUPI: Contd...
Various regression techniques can be used for approximating
privileged features. For example, multiple linear Regression and
Kernelized Ridge Regression (KRR) with RBF kernel etc.
In LUPI with knowledge transfer approach the same data is used to
learn the m-regression functions Φ = φj (x), j = 1, . . . , m as for
creating the final decision rule.
10
Split Knowledge Transfer LUPI: Training
• Inspired by the cross-validation approach, we propose LUPI
with Split Knowledge Transfer, SKT-LUPI .
• The training set is split into K folds of equal (or almost equal)
size, and then the following two steps are repeated for
k = 1 . . . K.
step 1 We use the kth fold (Xk, X∗
k) for learning the m transfer
function Φk = (Φk1, . . . Φkm)T and the rest of the training set
is augmented with its predicted values from m regressions
¯Xk = (X−k Φk(X−k).
step 2 Using this augmented design matrix ¯Xk and the corresponding
labels, a decision rule, Fk is constructed.
11
Split Knowledge Transfer LUPI: Prediction
• For a new test object x ∈ X, we compute ˆx∗
1 , . . . , ˆx∗
K
regression estimates using regression functions learned
previously.
• Predictions are made for each combination (x, x∗
k ), for
k = 1, . . . , K in the m + p dimension space using the
corresponding decision rule Fk.
• We use the synergy method2 to combine the results from K
decision rules.
2
Vapnik, Vladimir, and Rauf Izmailov. ”Synergy of monotonic rules.” The
Journal of Machine Learning Research 17.1 (2016): 4722-4754.
12
Experiment 1: Synthetic Dataset
Table 1: Comparison of error rates (in %) on the synthetic dataset 3
between four types of classification scenarios considered: SVM on
standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with
split knowledge transfer (SKT-LUPI) with 5-folds, and SVM on privileged
features. Multiple linear regression is used for knowledge transfer.
Training Size SVM KT-LUPI SKT-LUPI SVM on PI
25 9.86 7.68 6.10 3.39
35 7.26 5.42 4.80 3.19
45 7.21 4.69 4.22 2.98
55 6.88 5.04 4.84 2.51
3
Vapnik and Izmailov, ”Learning with Intelligent Teacher”, COPA 2016
Proceedings of the 5th International Symposium on Conformal and
Probabilistic Prediction with Applications, vol 9653 pp 3-19 13
Experiment 2: Knowledge Transfer using Linear Transformation
Table 2: Comparison of error rates (in %) on modified real datasets 4
between four types of classification scenarios considered: SVM on
standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with
split knowledge transfer (SKT-LUPI) with 10-folds, and SVM on all
features (standard features and privileged features). Multiple linear
regression is used for knowledge transfer.
Dataset SVM KT-LUPI SKT-LUPI SVM on all features
Parkinsons 10.25 8.58 7.05 6.53
Ionosphere 5.49 4.92 4.29 4.85
kc2 17.28 17.09 16.99 16.80
4
Izmailov et al. ”Feature selection in learning using privileged information”. In
2017 IEEE International Conference on Data Mining Workshops, pp 957-963.
14
Experiment 3: Knowledge Transfer using non-Linear Transfor-
mation
Table 3: Comparison of error rates (in %) on modified real datasets 5
between four types of classification scenarios considered: SVM on
standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with
split knowledge transfer (SKT-LUPI) with 10-folds, and SVM on all
features (standard features and privileged features). Kernel ridge
regression is used for knowledge transfer.
Dataset SVM KT-LUPI SKT-LUPI SVM on all features
Parkinsons 10.25 8.97 7.82 6.53
Ionosphere 5.49 5.07 4.57 4.85
kc2 17.28 17.23 16.95 16.80
5
Izmailov et al. ”Feature selection in learning using privileged information”. In
2017 IEEE International Conference on Data Mining Workshops, pp 957-963.
15
Error rates
16
Conclusions
SKT-LUPI shows improved performance over KT-LUPI for the
studied datasets.
Limitations: Only a few (generated) LUPI datasets
Future work:
• SKT-LUPI with conformal prediction
• Explore feature selection / dimension reduction
• Application in image classification, compare with Deep
Learning
• Generating LUPI datasets
17
Acknowledgements
This project received financial support from the Swedish
Foundation for Strategic Research (SSF) as part of the HASTE
project under the call ‘Big Data and Computational Science’.
The computations were performed on resources provided by SNIC
through Uppsala Multidisciplinary Center for Advanced
Computational Science (UPPMAX) under project
SNIC 2019/8 − 15.
18

Más contenido relacionado

Más de Ola Spjuth

Más de Ola Spjuth (10)

The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life Sciences
 
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in SwedenStorage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
 
Agile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryAgile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discovery
 
Enabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceEnabling Translational Medicine with e-Science
Enabling Translational Medicine with e-Science
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
 
Interoperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceInteroperability and scalability with microservices in science
Interoperability and scalability with microservices in science
 
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
 
Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...
 
Accessing and scripting CDK from Bioclipse
Accessing and scripting CDK from BioclipseAccessing and scripting CDK from Bioclipse
Accessing and scripting CDK from Bioclipse
 

Último

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Último (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 

Split Knowledge Transfer in Learning Under Privileged Information Framework

  • 1. Split Knowledge Transfer in Learning Under Privileged Information Framework Niharika Gaurahaa , Fabian S¨oderdahlb and Ola Spjutha Varna, Bulgaria, Sep 9, 2019 a Department of Pharmaceutical Biosciences, Uppsala University, Sweden b Statisticon AB, Uppsala, Sweden 1
  • 2. Our domain We study life science problems, and in particular effects of drugs, from various aspects. Some data can be very expensive to generate, others cheaper. 2
  • 3. Our domain Another example: Imaging in multiple resolutions Can we improve predictions using data from multiple resolutions at training time? 3
  • 4. Learning Under Privileged Information • The classical machine learning paradigm assumes, training examples in the form of iid pair: (x1, y1), ..., (xl , yl ), xi ∈ X, yi ∈ {−1, +1} • LUPI paradigm assumes, training examples in the form of iid triplets: (x1, x∗ 1 , y1), ..., (xl , x∗ l , yl ), xi ∈ X, x∗ i ∈ X∗ , yi ∈ {−1, +1} • x∗ i ∈ X∗ is Privileged Information (PI) or additional data available for each training example, but not available for testing examples. 4
  • 5. LUPI contd... Traning Data in feature space X New Object in feature space X Train Model Predictive Rules Prediction Traning Data in PI space X∗ 5
  • 6. LUPI and CP We introduced CP in LUPI at COPA 2018 1. We used SVM+ implementation of LUPI. Only minor improvements in efficiency. Does not scale well for larger problems. 1 Niharika Gauraha, Lars Carlsson, Ola Spjuth. ”Conformal prediction in learning under privileged information paradigm with applications in drug discovery”. Proceedings of the Seventh Workshop on Conformal and Probabilistic Prediction and Applications, PMLR 91:147-156, 2018. 6
  • 7. Knowledge Transfer LUPI For knowledge transfer from space X∗− > X, for each privileged feature a regression function is learned by using p decision features in X as explanatory variables and privileged feature vectors X∗(i), for i = 1, . . . , m, as response variables. In total m regression functions φj (x), j = 1, . . . , m are learned. 7
  • 8. Knowledge Transfer LUPI We can augment the training data set as follows:    y1 x1 φ1(x1) . . . φm(x1) . . . . . . . . . . . . . . . yl xl φ1(xl ) . . . φm(xl )    Then, we apply some standard algorithm to this modified training data and construct an p + m-dimensional decision rule f (x, Φ(x)), where Φ(x) = (φ1(x) φ2(x) . . . φm(x)) 8
  • 9. Knowledge Transfer LUPI: Contd... Figure adapted from: Izmailov et al ”Feature Selection in Learning Using Privileged Information”. ICDM 2017 9
  • 10. Knowledge Transfer LUPI: Contd... Various regression techniques can be used for approximating privileged features. For example, multiple linear Regression and Kernelized Ridge Regression (KRR) with RBF kernel etc. In LUPI with knowledge transfer approach the same data is used to learn the m-regression functions Φ = φj (x), j = 1, . . . , m as for creating the final decision rule. 10
  • 11. Split Knowledge Transfer LUPI: Training • Inspired by the cross-validation approach, we propose LUPI with Split Knowledge Transfer, SKT-LUPI . • The training set is split into K folds of equal (or almost equal) size, and then the following two steps are repeated for k = 1 . . . K. step 1 We use the kth fold (Xk, X∗ k) for learning the m transfer function Φk = (Φk1, . . . Φkm)T and the rest of the training set is augmented with its predicted values from m regressions ¯Xk = (X−k Φk(X−k). step 2 Using this augmented design matrix ¯Xk and the corresponding labels, a decision rule, Fk is constructed. 11
  • 12. Split Knowledge Transfer LUPI: Prediction • For a new test object x ∈ X, we compute ˆx∗ 1 , . . . , ˆx∗ K regression estimates using regression functions learned previously. • Predictions are made for each combination (x, x∗ k ), for k = 1, . . . , K in the m + p dimension space using the corresponding decision rule Fk. • We use the synergy method2 to combine the results from K decision rules. 2 Vapnik, Vladimir, and Rauf Izmailov. ”Synergy of monotonic rules.” The Journal of Machine Learning Research 17.1 (2016): 4722-4754. 12
  • 13. Experiment 1: Synthetic Dataset Table 1: Comparison of error rates (in %) on the synthetic dataset 3 between four types of classification scenarios considered: SVM on standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with split knowledge transfer (SKT-LUPI) with 5-folds, and SVM on privileged features. Multiple linear regression is used for knowledge transfer. Training Size SVM KT-LUPI SKT-LUPI SVM on PI 25 9.86 7.68 6.10 3.39 35 7.26 5.42 4.80 3.19 45 7.21 4.69 4.22 2.98 55 6.88 5.04 4.84 2.51 3 Vapnik and Izmailov, ”Learning with Intelligent Teacher”, COPA 2016 Proceedings of the 5th International Symposium on Conformal and Probabilistic Prediction with Applications, vol 9653 pp 3-19 13
  • 14. Experiment 2: Knowledge Transfer using Linear Transformation Table 2: Comparison of error rates (in %) on modified real datasets 4 between four types of classification scenarios considered: SVM on standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with split knowledge transfer (SKT-LUPI) with 10-folds, and SVM on all features (standard features and privileged features). Multiple linear regression is used for knowledge transfer. Dataset SVM KT-LUPI SKT-LUPI SVM on all features Parkinsons 10.25 8.58 7.05 6.53 Ionosphere 5.49 4.92 4.29 4.85 kc2 17.28 17.09 16.99 16.80 4 Izmailov et al. ”Feature selection in learning using privileged information”. In 2017 IEEE International Conference on Data Mining Workshops, pp 957-963. 14
  • 15. Experiment 3: Knowledge Transfer using non-Linear Transfor- mation Table 3: Comparison of error rates (in %) on modified real datasets 5 between four types of classification scenarios considered: SVM on standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with split knowledge transfer (SKT-LUPI) with 10-folds, and SVM on all features (standard features and privileged features). Kernel ridge regression is used for knowledge transfer. Dataset SVM KT-LUPI SKT-LUPI SVM on all features Parkinsons 10.25 8.97 7.82 6.53 Ionosphere 5.49 5.07 4.57 4.85 kc2 17.28 17.23 16.95 16.80 5 Izmailov et al. ”Feature selection in learning using privileged information”. In 2017 IEEE International Conference on Data Mining Workshops, pp 957-963. 15
  • 17. Conclusions SKT-LUPI shows improved performance over KT-LUPI for the studied datasets. Limitations: Only a few (generated) LUPI datasets Future work: • SKT-LUPI with conformal prediction • Explore feature selection / dimension reduction • Application in image classification, compare with Deep Learning • Generating LUPI datasets 17
  • 18. Acknowledgements This project received financial support from the Swedish Foundation for Strategic Research (SSF) as part of the HASTE project under the call ‘Big Data and Computational Science’. The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under project SNIC 2019/8 − 15. 18