Reproducibility in artificial intelligence
By Carlos Toxtli
Some AI projects that I've done
● Hum2Song : Compose the musical accompaniment of a melody produced by a human voice.
● MultiAffect : Reproducible Research Framework for Multimodal Affect and Action Recognition
● AutomEditor: AutomEditor is an AI-based video editor.
● DeepStab: Real-time Video Object Stabilization tool by using Deep Learning
● DeepPiracy: Video piracy detection system by using Longest Common Subsequence and DL
● VR-360-musi: Transforms a YouTube video into five stems by using AI and places them into a room.
● ReputationAgent: System that detects inaccurate and unfair reviews given to gig workers.
● TaskBot: Research and development of a bot that helps teams to delegate tasks
● ExpertTwin: Enhanced workspace by an AI agent that provides content to knowledge workers
● LivenessDetection: Design and development of Machine Vision algorithms to validate identity
● QuantumDrugDiscovery: Drug discovery by using Quantum Computing.
● Awesome Machine Learning Jupyter Notebooks for Colab: Curated list of notebooks
● Awesome Robotic Process Automation: Curated list of notebooks
● Artificial Intelligence By Example Second Edition, Book
● Explainable AI, Book
● Among others ...
● Overview
● Reproducibility problems
● Solutions for reproducibility
● Understanding techniques
● Conclusions
What is Reproducibility?
Reproducibility means obtaining consistent computational
results using the same input data, computational steps,
methods, code, and conditions of analysis.
Replicability means obtaining consistent results across
studies aimed at answering the same scientific question,
each of which has obtained its own data.
Researchers over the years have investigated the factors that affect reproducibility in
data science related studies. Some common findings are that non-reproducible studies:
● Lack information about, or access to, the dataset in its original form and order
● Do not document the software environment used
● Do not control randomization
● Do not provide the actual implementation of the proposed techniques
● Require a large amount of computational resources that not
everybody can afford.
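Randomization control, one of the factors listed above, can be illustrated with a minimal sketch. It assumes only Python's standard `random` module and NumPy; the helper name `seeded_run` and the seed value are hypothetical, and deep learning frameworks need their own seeding calls in addition to these.

```python
import random

import numpy as np


def seeded_run(seed):
    """Return a shuffled index list and random weights for a fixed seed."""
    # Seed the standard-library generator used by random.shuffle.
    random.seed(seed)
    # Use a dedicated NumPy generator instead of hidden global state.
    rng = np.random.default_rng(seed)

    order = list(range(10))
    random.shuffle(order)          # e.g. the order in which samples are visited
    weights = rng.normal(size=3)   # e.g. an initial weight vector
    return order, weights


first_order, first_weights = seeded_run(42)
second_order, second_weights = seeded_run(42)
```

Two runs with the same seed produce the same sample order and the same initial weights, which is the property a reader needs in order to repeat an experiment.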
Looking for solutions ...
During my work in academia, I have explored three different solutions:
● Reproducibility framework
● Reproducible benchmarking
● Reproducible standalone methods
Reproducible Framework for Multimodal Tasks
Text Classification Benchmark
Machine Learning notebooks (~100)
My journey
I will explain what is needed
to produce and use any of
these approaches.
Reproducibility framework
A reproducible research framework standardizes:
● Data processing
● Feature engineering
● Training methods
● Evaluation methods
● Research document formatting
● Administration interface
Additionally, it should be accessible in order to have a
broader impact. Some of the desired features may be:
● No client requirements (online)
● No special hardware requirements
● No extra configuration
● Free of charge
MultiAffect: Reproducible Research
Framework for Multimodal Video
Classification and Regression Tasks at
utterance-level with spatio-temporal
feature fusion by using Face, Body,
Audio, Text, and Emotion features
So with this in mind, I created MultiAffect
The main goal of MultiAffect is to give guidance on how to
reproduce research experiments in a fixed setting.
These are the 5 main components:
● Platform Setup: Ensures that the machine is
properly configured
● Feature Extractor: Monitors the feature extraction
and manages the extracted features
● Model Trainer: Defines, trains, and fine-tunes the
models
● Evaluator: Calculates and reports the performance
● Research Paper Template: Defines the minimum set
of sections and mandatory citations
Platform Setup
Preparing a host machine to replicate machine learning research is usually
challenging, time-consuming, and expensive. One of the reasons is that
most of the models available today require a large-scale dataset for
training, and multimedia datasets have a high storage requirement. In
machine learning tasks, the feature extraction step helps algorithms
reduce the dimensionality of the data and helps the model focus on its
most significant or discriminative parameters. However, extracting
features from multimedia samples is a highly demanding task in terms of
Dealing with faulty code and
compiled libraries
Some of the tools required to perform the data extraction
need to be compiled for the host operating system. Scientific tools
are commonly built from multiple libraries and sometimes depend on
specific versions of certain libraries for certain operating systems;
this makes them prone to compilation errors. Sometimes the
code is not provided, and extra effort is needed to implement the
instructions described in the publication. Even when the code is available,
it is sometimes not ready to run, and significant
effort is required to make it work, if it works at all.
The solution is a
virtual machine
The software challenges can be mitigated by using virtual machines or
containers. Virtual machines and containers provide a base operating system
that can have the proper configuration built in. These approaches can
run on top of the host operating system or on online infrastructure. The
hardware challenges can be overcome by investing in sufficiently powerful
on-site infrastructure or by using online on-demand infrastructure.
Conventional research paper replication depends on multiple factors as we
have explored.
MultiAffect over
The MultiAffect framework uses Google
Colaboratory to publish the Jupyter interactive
notebook and to perform the computation in
the attached virtual machine. Google
Colaboratory is a free research tool that enables
users with a Google account to host and run code
over Google's infrastructure. Google
Colaboratory offers users the ability to execute
their code segments on CPUs, GPUs, and TPUs
(an AI accelerator application-specific integrated
circuit). At the time this work was published, Google
Colaboratory offered a virtual machine with a Tesla
K80 GPU, 12 GB of RAM, and 350 GB of storage.
This platform provides enough resources to
perform video action recognition.
Ubuntu as
This platform includes a Debian-based operating system, so the provided
instructions are platform-specific. Local replication of our framework
requires an Ubuntu 18.04 operating system in order to install all the
libraries successfully. Our platform is agnostic to the Python version: all
the code executed in the notebook is written in Python, and it can be
executed with version 2 or 3 of the interpreter. Our framework is able
to set up and run the experiment from the online platform, enabling
users to deploy and execute the code in a free-of-charge environment
without special requirements on the client side.
Fine tuning the setup
The definition of the setup was an incremental
process of three main steps: (1) Initial setup:
The first functional version; (2) Packing
components: Uploading components in
batches to cloud storage; and (3) Optimal
setup: A version that loads faster.
In this step, the libraries were downloaded and compiled
directly from the notebook by running shell commands
from the notebook cells. Pre-requisites, missing
dependencies, and additional packages were installed in
the same notebook. The dataset and the pre-trained
models were downloaded from their original sources to
the virtual machine. The feature extraction, training,
and evaluation code were directly inserted into the
notebook in separate cells. The first version was tested
until it successfully extracted the features, trained, and
evaluated the models from the notebook. A backup of this
notebook was documented and set as the initial version.
Packing components
Each individual compiled library was packaged into a zip file that contains
the binary files as well as the configuration files. The pre-trained models
that were individually downloaded from their original sources were packed
together into a single file. Downloading a single large file from a
high-speed source usually has lower latency than downloading
multiple large files from sources with different bandwidths. The
outcome of this task is a collection of zip files that were uploaded to a
Google Drive account. The files were shared with public access to be able
to be downloaded in Google Colaboratory notebooks logged with different
After packaging the files from the initial setup and storing them
in the cloud, we started a branch of the initial setup that
loads these files. The optimal setup notebook is a
simplified version of the initial notebook: the
long section documenting the setup process was
replaced with a section that downloads the prerequisites. The files
are downloaded with a Python tool called GDown
that is already installed in Google Colaboratory. It is important
to mention that the virtual machine attached to Google
Colaboratory notebooks already has an Ubuntu
distribution with the most common machine learning
tools and libraries installed. This optimal version is
tailored to Google Colaboratory only.
Optimizing the loading time
For each of the installed libraries, we measured the time it takes to
install the prerequisites plus the compilation time. On average, the
overall setup of each library was five times slower than downloading
and extracting a previously compiled and zipped version of the library.
The total setup time for the Google Colaboratory environment was
reduced from 43 minutes to 6 minutes after implementing the
pre-compiled tools strategy and downloading the files from the same
Google infrastructure.
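The pack-once, extract-many strategy described above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the directory name `compiled_lib`, the file names, and the helpers `pack_directory` and `timed_extract` are hypothetical, and the download step from cloud storage is left out.

```python
import shutil
import tempfile
import time
import zipfile
from pathlib import Path


def pack_directory(src_dir, archive_base):
    """Zip a directory of compiled files; returns the path of the .zip file."""
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(src_dir))


def timed_extract(archive_path, dest_dir):
    """Extract an archive and return the elapsed time in seconds."""
    start = time.perf_counter()
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    build = tmp / "compiled_lib"            # stand-in for a compiled library
    build.mkdir()
    (build / "tool.bin").write_bytes(b"\x00" * 1024)
    (build / "tool.cfg").write_text("threads=2\n")

    archive = pack_directory(build, tmp / "compiled_lib_packed")
    seconds = timed_extract(archive, tmp / "restored")
    restored = sorted(p.name for p in (tmp / "restored").iterdir())
```

Timing the extraction the same way as the original compile step is what makes a comparison such as "five times slower" measurable per library.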
Feature extractors
MultiAffect includes a feature extraction module as an independent component.
Multimodal feature extraction is often a highly demanding task, as it requires
some pre-processing of the videos before features can be extracted. Some
common pre-processing tasks are separating the audio, extracting frames,
identifying faces, cropping faces, removing the background, skeleton detection
(pose), and emotion detection, among many other procedures. Our feature extraction
methodology is based on the common ground found in submissions. Our feature
extraction process aims to be invariant to factors such as person
descriptors (i.e., gender, age, race), scale, position, background, and language. Our
approach considers ten features from five different modalities: face, body, audio,
text, and emotions.
Audio features
OpenSMILE (1582 features): The audio is
extracted from the videos and processed
by OpenSMILE, which extracts audio features such
as loudness, pitch, jitter, etc.
It was tested on the full video-clip length (general) and
on 20 fragments (temporal).
Text features
Opinion Lexicon (6 features): depends on the ratio of
sentiment words (adjectives, adverbs, verbs and
nouns), which express positive or negative sentiments.
Subjective Lexicon (4 features): They used the
subjective Lexicon from MPQA (Multi-Perspective
Question Answering) that models the sentiment by its
type and intensity.
Word vectors: GloVe and BERT embeddings
Face features
OpenFace (709 features): Facial behavior analysis tool
that provides accurate facial landmark detection,
head pose estimation, facial action unit recognition,
and eye-gaze estimation. We get points that
represent the face.
VGG16 FC6 (4096 features): The faces are cropped
(224×224×3) and aligned, the background is zeroed out, and
the result is passed through a pretrained VGG16 to get a
4096-dimensional feature vector from the FC6 layer.
Body Features
OpenPose (BODY_25) (11
features): The normalized angles
between the joints. I did not use
the calculated features because
they were 25x224x224.
VGG16 FC6 skeleton image (4096
features): I drew the skeleton (neck
in the center) on a black
background, fed it to a VGG16, and
extracted a feature vector from the
FC6 layer.
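The idea of a normalized angle between joints can be sketched as follows. The slides do not say how the angles were normalized, so dividing by π is an assumption, and the function name `joint_angle` and the keypoint coordinates are hypothetical.

```python
import numpy as np


def joint_angle(a, b, c):
    """Angle at keypoint b formed by segments b->a and b->c, scaled to [0, 1]."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    v1, v2 = a - b, c - b
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to guard against rounding slightly outside [-1, 1].
    angle = np.arccos(np.clip(cosine, -1.0, 1.0))
    # Assumption: normalize by pi so a straight limb maps to 1.0.
    return angle / np.pi


# Hypothetical 2D keypoints (x, y) for shoulder, elbow, and wrist.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
straight_arm = joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
```

Angles of this kind do not depend on where the person stands in the frame or how large they appear, which fits the invariance goals of the feature extraction.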
Emotion features
EmoPy (7 features): A deep neural net toolkit for
emotion analysis via Facial Expression Recognition
Other (28 features): 4 other models from different FER
contest participants.
7 categories per model, 35 features in total
20 samples per video clip were predicted (temporal);
from there I computed their normalized sum (general)
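The step from the temporal predictions to the general feature can be sketched as below. The exact normalization is not stated in the slides, so dividing the summed scores by their total is an assumption, and the random predictions only stand in for the output of a real emotion model.

```python
import numpy as np


def clip_level_emotions(sample_predictions):
    """Collapse per-sample predictions (n_samples, n_classes) into one vector."""
    sample_predictions = np.asarray(sample_predictions, dtype=float)
    summed = sample_predictions.sum(axis=0)
    # Assumption: divide by the total so the clip-level vector sums to 1.
    return summed / summed.sum()


rng = np.random.default_rng(0)
temporal = rng.random((20, 7))                    # 20 samples x 7 emotion categories
temporal /= temporal.sum(axis=1, keepdims=True)   # each row is a distribution
general = clip_level_emotions(temporal)
```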
Model trainer
MultiAffect uses different deep
learning models to recognize affect. Among
them are RNNs (Recurrent Neural
Networks), CNNs (Convolutional Neural
Networks), and simple DNNs (Deep Neural
Networks) such as MLPs (Multilayer Perceptrons).
The MultiAffect framework is designed to perform classification and regression tasks.
Depending on the performed task, the platform is adjusted to display meaningful
evaluations. The classification task gives accuracy, F1-score, recall, precision, AUC,
and other metrics for the training, validation, and testing sets. In the case of a
regression task, the framework computes the MSE (Mean Squared Error) and the CCC
(Concordance Correlation Coefficient), which describes how well a new test or
measurement reproduces a gold standard test.
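The two regression metrics named above can be written in a few lines of NumPy. This is a sketch of the standard definitions, not the framework's own code; the function names and the sample values are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between gold labels and predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (population variances)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    covariance = np.mean((y_true - mean_t) * (y_pred - mean_p))
    denominator = y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2
    return float(2 * covariance / denominator)


gold = [0.1, 0.4, 0.35, 0.8]
perfect = ccc(gold, gold)                       # identical predictions
shifted = ccc(gold, [v + 0.5 for v in gold])    # same shape, constant offset
```

Unlike plain correlation, the CCC penalizes a constant offset: the shifted predictions are perfectly correlated with the gold labels but score below 1.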
Plotting the
The results obtained from our reproducible framework for
the classification task are two plots, one to visualize the
accuracy while training and one for the training and testing
loss; and a confusion matrix obtained while evaluating the
model on the test data. On the other hand, for the
regression tasks the results are displayed in a scatter plot
that shows the correlation between the predicted and gold
standard labels.
In order to test its generalizability, we performed experiments on
two main tasks: affect recognition and video action recognition.
The video action and affect recognition tasks are addressed through the
training and testing of classification and regression models,
respectively. One of the main goals of the proposed framework is to
be able to perform both tasks by only configuring a new set of
variables, without making any change to the code. Another goal was
to deliver results comparable to existing work.
Video Action Recognition for Automatic Video Editing
All (Quadmodal): BodyTF+FaceTF+AudioG+EmoT
Model  acc_val  acc_train  acc_test  f1_score  f1_test  Loss
All    1.00     1.00       0.90      1.00      0.90     0.01
Confusion matrices of the Quadmodal model (train, validation, test)
Affect recognition experiment
arousal 0.3730994852
Valence 0.2109641637
Emotion: "Surprise"
Results (the plot shows an almost 45-degree line)
Let's switch
approaches to
You can use MultiAffect as a tool for any video
categorization and regression tasks. You can try it out
from this URL:
If you want to compare which of the existing
techniques works better for your problem, you will
need a tool that benchmarks all the methods. This is
why I adapted an existing Text Classification
Benchmarking tool to be used as a tool in the cloud.
You can find it here:
Text Classification
Benchmarking tool
This is a Google Colaboratory notebook
with instructions that includes these
models:
● Word ngram + LR (Logistic
Regression)
● Char ngram + LR
● (Word + Char ngram) + LR
● RNN no embedding
● RNN + GloVe embedding
● CNN (multi-channel)
● Google BERT
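The three n-gram baselines in the list above can be sketched with scikit-learn. This is not the benchmarking notebook's code: the TF-IDF weighting, the n-gram ranges, and the tiny review texts are assumptions chosen only to make the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union

# Hypothetical toy reviews; a real benchmark would load a labeled dataset.
texts = [
    "great work, delivered on time",
    "excellent and fast delivery",
    "very good communication",
    "late and rude, terrible job",
    "bad quality, never again",
    "awful experience and slow",
]
labels = ["fair", "fair", "fair", "unfair", "unfair", "unfair"]


def word_vectorizer():
    return TfidfVectorizer(analyzer="word", ngram_range=(1, 2))


def char_vectorizer():
    return TfidfVectorizer(analyzer="char", ngram_range=(2, 4))


models = {
    "word_ngram_lr": make_pipeline(word_vectorizer(), LogisticRegression(max_iter=1000)),
    "char_ngram_lr": make_pipeline(char_vectorizer(), LogisticRegression(max_iter=1000)),
    # Concatenate word and character features before the classifier.
    "word_char_ngram_lr": make_pipeline(
        make_union(word_vectorizer(), char_vectorizer()),
        LogisticRegression(max_iter=1000),
    ),
}

# Training accuracy only; a benchmark would score a held-out test set.
scores = {name: model.fit(texts, labels).score(texts, labels)
          for name, model in models.items()}
prediction = models["word_char_ngram_lr"].predict(["fast and great work"])[0]
```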
Experiment, learning fairness from reviews
It promoted to improve fairness in reviews
The last approach: independent ML/DL methods
Sometimes you may already know which algorithm is best
for your requirements. For that case I adapted more than 100
notebooks so that you can use them as a tool and train
models from the cloud by only uploading your data. You
can find them here:
The process of adapting a notebook is: 1) open it in Colab
from GitHub, 2) add extra libraries, 3) download the data
from Drive.
Let’s do a quick recap of all the ML/DL/RL methods to
identify which method fits your problem better.
● Sklearn
● Weka
● Matlab
Linear Regression
When to use it?
● Simple regression problems
○ How much the rent should be in a certain area
○ How much I should charge for a specific amount of work
● Problems where we want to define a rule that separates two
categories that are similar, i.e. Premium or Basic price for customers
under certain parameters (number of rooms vs number of cars)
XOR problem - Not linear
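A simple regression such as the rent example can be sketched with a least-squares line fit. The areas and rents below are made-up numbers that lie exactly on a line, and `np.polyfit` with degree 1 is just one of several ways to fit it.

```python
import numpy as np

# Hypothetical data: apartment area in square meters and monthly rent.
area = np.array([30.0, 50.0, 70.0, 90.0])
rent = np.array([500.0, 700.0, 900.0, 1100.0])

# Degree-1 polynomial fit returns [slope, intercept].
slope, intercept = np.polyfit(area, rent, 1)


def predict_rent(square_meters):
    """Rent predicted by the fitted line for a given area."""
    return slope * square_meters + intercept


estimate = predict_rent(60.0)
```

A single line like this cannot separate the XOR pattern, which is why that problem is the usual example of where linear models stop working.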
Decision Tree
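from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import pydotplus
iris = load_iris()
# Fit a decision tree on the iris features and labels
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
# Export the fitted tree as Graphviz DOT text
dot_data = export_graphviz(clf, out_file=None, filled=True, rounded=True)
# Build a graph object that can be rendered as an image
graph = pydotplus.graph_from_dot_data(dot_data)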
When to use it?
● When we need to know what decisions the machine is taking
● When we need to explain to others how the features are evaluated
● When there are not many features
Random Forest
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
iris = load_iris()
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
# Train a forest of 100 trees on the training split
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('accuracy is', accuracy_score(y_test, y_pred))
When to use it?
● When we want to know alternatives of how to evaluate a problem.
● When we want to manually discard flows that are biased
● When we want to manage ensembles from one single method.
Naive Bayes
When to use it?
● When we want to know the probabilities of the different cases.
● When we need a probabilistic model.
● When we need a model that is easy to justify on paper
k-Nearest Neighbor
When to use it?
● When intuition says that the problem can be solved by finding the
most similar option.
● When the information is not exhaustive.
● When we want to justify the decision of the algorithm with common
human reasoning.
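The "most similar option" intuition can be sketched without any library beyond the Python standard library. The points, labels, and the function name `knn_predict` are hypothetical, and Euclidean distance with a majority vote is only the simplest variant.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["basic", "basic", "basic", "premium", "premium", "premium"]

near_first_group = knn_predict(points, labels, query=(1.1, 1.0), k=3)
near_second_group = knn_predict(points, labels, query=(5.0, 5.0), k=3)
```

The neighbors that cast the votes can be shown to a person directly, which is what makes the decision easy to justify.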
When to use it?
● When we don’t know how to understand the data
● When we want to optimize resources by grouping related elements.
● When we want the computer to create the labels for us.
Support Vector Machine
When to use it?
● It was the most effective technique before neural networks, and it can
achieve excellent results with less processing.
● It is based on very strong mathematical principles: it
creates complex multidimensional hyperplanes that separate the classes.
● It is not a white-box technique, but it may be the best option for problems
where we want to get the best of a machine learning approach without
dealing with neural networks.
Logistic Regression
No regularization
L2 regularization
When to use it?
● When we want to optimize a regression
● When we want to binarize the output
● As a preliminary analysis before implementing neural networks
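The effect of L2 regularization can be sketched with scikit-learn, where a smaller `C` means a stronger penalty. Using two iris classes as the binary problem and the specific `C` values are assumptions for illustration; a very large `C` only approximates the unregularized case.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# Keep only two classes so the output is binary.
mask = iris.target < 2
X, y = iris.data[mask], iris.target[mask]

# scikit-learn applies an L2 penalty by default; C is its inverse strength.
weak_penalty = LogisticRegression(C=1e4, max_iter=5000).fit(X, y)
strong_penalty = LogisticRegression(C=0.01, max_iter=5000).fit(X, y)

weak_norm = float(np.linalg.norm(weak_penalty.coef_))
strong_norm = float(np.linalg.norm(strong_penalty.coef_))
binary_output = strong_penalty.predict(X[:5])   # class labels 0 or 1
```

The stronger penalty shrinks the coefficient vector, which is the smoothing effect that the regularized plot shows compared with the unregularized one.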
When to use it?
● When we have very few features and there are no extra details that can be
extracted from hidden layers.
● These are in fact neural networks, and we do not always need to use
them for deep learning; they can be used for machine learning when we
benchmark against other machine learning techniques.
● When we want to get the power of neural networks and we don’t have
much computational power.
ML & DL frameworks
It’s time for Deep Learning
Deep Learning
Artificial Neural Networks = Multi-Layer Perceptron
● TensorFlow
● Keras
● PyTorch
● Sklearn
When to use it?
● Classifiers when common machine Learning Algorithms performs
● Models with many features.
● Multi-class projects.
Convolutional Neural Networks (CNN)
● TensorFlow
● Keras
● PyTorch
● Caffe
When to use it?
● When we want to process images
● When we want to process videos
● When we have highly dimensional data
Recurrent Neural Networks (RNN)
● TensorFlow
● Keras
● PyTorch
When to use it?
● When sequences are provided
○ Text sequences
○ Image sequences (videos)
○ Time series
● When we need to provide an ordered output
Mixed approaches
Mixed Deep learning features
When to use it?
● When we want to benchmark models
● When different models are stronger when they are combined
● When the individual processing is not exhaustive
● MLBox
● H2O
● Google AutoML
When to use it?
● On every new model
● When we have enough time to train multiple models
● When we don’t know which hyperparameters are better.
Reinforcement Learning
● OpenAI Gym
● Google Dopamine
● RLLib
● Keras-RL
● Tensorforce
● Facebook Horizon
When to use it?
● When a robot explores a place and needs to learn from the environment.
● When we can try as many times as we want in a simulator.
● When we want to find the optimal path
Techniques to improve the learning process
Principal Component Analysis (PCA)
Feature selection
When to use it?
● When we have too many features and we do not know which of them
are useful.
● When we want to reduce the dimensionality of our model.
● When we want to plot our decision boundaries.
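Reducing many features to two components, for example to plot decision boundaries, can be sketched with a singular value decomposition. The function name `pca_project` and the random data are hypothetical; library implementations such as scikit-learn's `PCA` follow the same idea.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project X onto its top principal components; also return their variances."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (len(X) - 1)
    return centered @ vt[:n_components].T, explained_variance[:n_components]


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 5))
features[:, 0] *= 10.0            # make one direction dominate the variance
projected, variances = pca_project(features, n_components=2)
```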
Data Augmentation
When to use it?
● When we have limited data
● When we want to help our model to generalize more
● When our unseen data comes in very different formats.
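A few simple image augmentations can be sketched with NumPy alone. The choice of transformations, the noise level, and the function name `augment` are assumptions; real pipelines usually rely on the augmentation utilities of a deep learning framework.

```python
import numpy as np


def augment(image, rng):
    """Return simple variants of an (H, W, C) image with values in [0, 1]."""
    flipped = image[:, ::-1, :]                       # horizontal flip
    noisy = np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0)
    shifted = np.roll(image, shift=2, axis=1)         # 2-pixel shift that wraps around
    return [flipped, noisy, shifted]


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))     # stand-in for a real training image
variants = augment(image, rng)
```

Each original sample yields three extra training samples with the same label, which is how augmentation stretches a limited dataset.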
Generative models
Discriminative: Predicts from Data
Generative: Generates from data distribution
Generative models
● Autoencoders
● Adversarial Networks
● Sequence Models
● Transformers
● TensorFlow
● Keras
● PyTorch
When to use it?
● When we want to compress data.
● When we need to change one type of input to another type of output.
● When we don’t need much variability in the generated data.
Generative Adversarial Networks
When to use it?
● When we need to transfer a style
● When we need more variability in the generated output
● When we need to keep context in the generation.
Sequence models
When to use it?
● When we generate text
● When we generate the next sequence from a series
● When the order in the generated output matters.
When to use it?
● When context is an essential part of the generated output
● When we need to keep consistency in the frequency space.
● When we have enough computational resources.
Put notebooks into production
It may seem that running code from a notebook in the cloud is just for testing
purposes, but you can actually run it as a service by running it from a Docker
container locally.
I created a script that automatically prepares a container and executes it
every time you need it as a command-line application.
docker run psykohack/google-colab
More and more AI research is being distributed nowadays in a redistributable
format. Some valuable resources can be found in:
● Nowadays we can reproduce state-of-the-art AI algorithms from a web
based platform.
● Complex tasks can be executed in notebooks structured as
● Our main job is to prepare the data to feed the algorithm that fits the
most to our needs.
● AI prototyping is drastically accelerated by using these technologies.
● Since these technologies sit between pure-code and pure-tool
approaches, they give the flexibility to iterate faster.

More Related Content

What's hot

Qtp Interview Questions
Qtp Interview QuestionsQtp Interview Questions
Qtp Interview Questionskspanigra
JPQL/ JPA Activity 1
JPQL/ JPA Activity 1JPQL/ JPA Activity 1
JPQL/ JPA Activity 1SFI
Asynchronous Programming in Android
Asynchronous Programming in AndroidAsynchronous Programming in Android
Asynchronous Programming in AndroidJohn Pendexter
Managing Your Runtime With P2
Managing Your Runtime With P2Managing Your Runtime With P2
Managing Your Runtime With P2Pascal Rapicault
Peering Inside the Black Box: A Case for Observability
Peering Inside the Black Box: A Case for ObservabilityPeering Inside the Black Box: A Case for Observability
Peering Inside the Black Box: A Case for ObservabilityVMware Tanzu
JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...
JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...
JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...0xdaryl
Google ART (Android RunTime)
Google ART (Android RunTime)Google ART (Android RunTime)
Google ART (Android RunTime)Niraj Solanke
Debugging Modern C++ Application with Gdb
Debugging Modern C++ Application with GdbDebugging Modern C++ Application with Gdb
Debugging Modern C++ Application with GdbSenthilKumar Selvaraj
Micronaut: A new way to build microservices
Micronaut: A new way to build microservicesMicronaut: A new way to build microservices
Micronaut: A new way to build microservicesLuram Archanjo
Reverse engineering android apps
Reverse engineering android appsReverse engineering android apps
Reverse engineering android appsPranay Airan
Jython 2.7 and techniques for integrating with Java - Frank Wierzbicki
Jython 2.7 and techniques for integrating with Java - Frank WierzbickiJython 2.7 and techniques for integrating with Java - Frank Wierzbicki
Jython 2.7 and techniques for integrating with Java - Frank Wierzbickifwierzbicki
Qtp interview questions and answers
Qtp interview questions and answersQtp interview questions and answers
Qtp interview questions and answersITeLearn
IBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages Heaven
IBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages HeavenIBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages Heaven
IBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages HeavenPaul Withers
Domino OSGi Development
Domino OSGi DevelopmentDomino OSGi Development
Domino OSGi DevelopmentPaul Fiore
Toward dynamic analysis of obfuscated android malware
Toward dynamic analysis of obfuscated android malwareToward dynamic analysis of obfuscated android malware
Toward dynamic analysis of obfuscated android malwareZongXian Shen
Post-mortem Debugging of Windows Applications
Post-mortem Debugging of  Windows ApplicationsPost-mortem Debugging of  Windows Applications
Post-mortem Debugging of Windows ApplicationsGlobalLogic Ukraine
p2, modular provisioning for OSGi
p2, modular provisioning for OSGip2, modular provisioning for OSGi
p2, modular provisioning for OSGiPascal Rapicault
Inside Android's Dalvik VM - NEJUG Nov 2011
Inside Android's Dalvik VM - NEJUG Nov 2011Inside Android's Dalvik VM - NEJUG Nov 2011
Inside Android's Dalvik VM - NEJUG Nov 2011Doug Hawkins

What's hot (20)

Qtp Interview Questions
Qtp Interview QuestionsQtp Interview Questions
Qtp Interview Questions
JPQL/ JPA Activity 1
JPQL/ JPA Activity 1JPQL/ JPA Activity 1
JPQL/ JPA Activity 1
Asynchronous Programming in Android
Asynchronous Programming in AndroidAsynchronous Programming in Android
Asynchronous Programming in Android
Managing Your Runtime With P2
Managing Your Runtime With P2Managing Your Runtime With P2
Managing Your Runtime With P2
Peering Inside the Black Box: A Case for Observability
Peering Inside the Black Box: A Case for ObservabilityPeering Inside the Black Box: A Case for Observability
Peering Inside the Black Box: A Case for Observability
JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...
JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...
JavaOne 2015 CON7547 "Beyond the Coffee Cup: Leveraging Java Runtime Technolo...
Google ART (Android RunTime)
Google ART (Android RunTime)Google ART (Android RunTime)
Google ART (Android RunTime)
Debugging Modern C++ Application with Gdb
Debugging Modern C++ Application with GdbDebugging Modern C++ Application with Gdb
Debugging Modern C++ Application with Gdb
Micronaut: A new way to build microservices
Micronaut: A new way to build microservicesMicronaut: A new way to build microservices
Micronaut: A new way to build microservices
Reverse engineering android apps
Reverse engineering android appsReverse engineering android apps
Reverse engineering android apps
Jython 2.7 and techniques for integrating with Java - Frank Wierzbicki
Jython 2.7 and techniques for integrating with Java - Frank WierzbickiJython 2.7 and techniques for integrating with Java - Frank Wierzbicki
Jython 2.7 and techniques for integrating with Java - Frank Wierzbicki
Qtp interview questions and answers
Qtp interview questions and answersQtp interview questions and answers
Qtp interview questions and answers
IBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages Heaven
IBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages HeavenIBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages Heaven
IBM Connect 2014 BP204: It's Not Infernal: Dante's Nine Circles of XPages Heaven
Domino OSGi Development
Domino OSGi DevelopmentDomino OSGi Development
Domino OSGi Development
Introduction to jython
Introduction to jythonIntroduction to jython
Introduction to jython
Toward dynamic analysis of obfuscated android malware
Toward dynamic analysis of obfuscated android malwareToward dynamic analysis of obfuscated android malware
Toward dynamic analysis of obfuscated android malware
Post-mortem Debugging of Windows Applications
Post-mortem Debugging of  Windows ApplicationsPost-mortem Debugging of  Windows Applications
Post-mortem Debugging of Windows Applications
p2, modular provisioning for OSGi
p2, modular provisioning for OSGip2, modular provisioning for OSGi
p2, modular provisioning for OSGi
Gwt portlet
Gwt portletGwt portlet
Gwt portlet
Inside Android's Dalvik VM - NEJUG Nov 2011
Inside Android's Dalvik VM - NEJUG Nov 2011Inside Android's Dalvik VM - NEJUG Nov 2011
Inside Android's Dalvik VM - NEJUG Nov 2011

Similar to Reproducibility in artificial intelligence

Continuous Localisation On A Massive Scale
Continuous Localisation On A Massive ScaleContinuous Localisation On A Massive Scale
Continuous Localisation On A Massive ScaleGary Lefman
Machine learning in cybersecutiry
Machine learning in cybersecutiryMachine learning in cybersecutiry
Machine learning in cybersecutiryVishwas N
Languages don't matter anymore!
Languages don't matter anymore!Languages don't matter anymore!
Languages don't matter anymore!Soluto
Advantages of golang development services & 10 most used go frameworks
Advantages of golang development services & 10 most used go frameworksAdvantages of golang development services & 10 most used go frameworks
Reproducibility in artificial intelligence

  • 3. Some AI projects that I've done ● Hum2Song : Compose the musical accompaniment of a melody produced by a human voice. ● MultiAffect : Reproducible Research Framework for Multimodal Affect and Action Recognition ● AutomEditor: AutomEditor is an AI-based video editor. ● DeepStab: Real-time Video Object Stabilization tool by using Deep Learning ● DeepPiracy: Video piracy detection system by using Longest Common Subsequence and DL ● VR-360-musi: Transforms a Youtube video into five stems by using AI and place them into a room. ● ReputationAgent: System that detects inaccurate and unfair reviews given to gig workers. ● TaskBot: Research and development of a bot that helps teams to delegate tasks ● ExpertTwin: Enhanced workspace by an AI agent that provides content to knowledge workers ● LivenessDetection: Design and development of Machine VIsion algorithms to validate identity ● QuantumDrugDiscovery: Drug discovery by using Quantum Computing. ● Awesome Machine Learning Jupyter Notebooks for Colab: Curated list of notebooks ● Awesome Robotic Process Automation: Curated list of notebooks ● Artificial Intelligence By Example Second Edition, Book ● Explainable AI, Book ● Among others ...
  • 4. Index ● Overview ● Reproducibility problems ● Solutions for reproducibility ● Understanding techniques ● Conclusions
  • 5. What is Reproducibility? Reproducibility means obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis. Replicability means obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
  • 6.
  • 7. Causes Researchers over the years have investigated the factors that affect reproducibility in data-science-related studies. Common findings are that non-reproducible studies lack: ● Information about, or access to, the dataset in its original form and order ● A description of the software environment used ● Randomization control ● The actual implementation of the proposed techniques In addition, some studies require computational resources that not everybody can afford.
  • 8. Looking for solutions ... During my work in academia I have explored three different solutions: ● Reproducibility framework ● Reproducible benchmarking ● Reproducible standalone methods
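As an illustration of the "randomization control" cause listed above, here is a minimal sketch of a seeded train/test split. The function names, the seed value 42, and the 20% test ratio are assumptions made for this example; they do not come from the slides.

```python
import random


def seeded_shuffle(samples, seed=42):
    """Return a shuffled copy of `samples` that is identical for a given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Split `samples` into (train, test) in a repeatable way."""
    shuffled = seeded_shuffle(samples, seed)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train_a, test_a = train_test_split(data)
train_b, test_b = train_test_split(data)
print(train_a == train_b and test_a == test_b)  # prints True
```

Publishing the seed together with the dataset in its original order lets another researcher rebuild exactly the same split. Deep learning libraries keep their own random generators, which would need to be seeded separately.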
  • 9. Reproducible Framework for Multimodal Tasks
  • 11. Machine Learning notebooks (~100)
  • 12. My journey I will explain what is needed to produce and use any of these approaches.
  • 13. Reproducibility framework A reproducible research framework standardizes: ● Data processing ● Feature engineering ● Training methods ● Evaluation methods ● Research document formatting ● Administration interface
  • 14. Inclusiveness Additionally, it should be accessible in order to have a broader impact. Some of the desired features are: ● No client requirements (online) ● No special hardware requirements ● No extra configuration ● Free of charge
  • 15. MultiAffect: Reproducible Research Framework for Multimodal Video Classification and Regression Tasks at utterance-level with spatio-temporal feature fusion by using Face, Body, Audio, Text, and Emotion features So with this in mind, I created MultiAffect
  • 16. MultiAffect framework The main goal of MultiAffect is to give guidance on how to reproduce research experiments in a fixed setting. These are the 5 main components: ● Platform Setup: Ensures that the machine is properly configured ● Feature Extractor: Monitors the feature extraction and manages the extracted features ● Model Trainer: Defines, trains, and fine-tunes the model ● Evaluator: Calculates and reports the performance metrics ● Research Paper Template: Defines the minimum set of sections and mandatory citations
  • 17. Platform Setup Preparing a host machine to replicate machine learning research is usually challenging, time-consuming, and expensive. One of the reasons is that most of the models available today require a large-scale dataset for training, and multimedia datasets in particular have a high storage requirement. In machine learning tasks, the feature extraction step reduces the dimensionality of the data and helps the model focus on its most significant or discriminative parameters. However, extracting features from multimedia samples is a highly demanding task in terms of computation.
  • 18. Dealing with faulty code and compiled libraries Some of the tools that are required to perform the data extraction need to be compiled for the host operating system. Scientific tools are commonly built from multiple libraries and sometimes depend on specific versions of certain libraries for certain operating systems; this makes them prone to compilation errors. Sometimes the code is not provided, and extra effort is needed to implement the instructions described in the publication. Even when the code is available, it is sometimes not ready to run, and significant effort is required to make it work, if it works at all.
  • 19. The solution is a virtual machine The software challenges can be mitigated by using virtual machines or containers. Virtual machines and containers provide a base operating system with the proper configuration built in. These approaches can run on top of the host operating system or on online infrastructure. The hardware challenges can be overcome by investing in sufficiently powerful on-site infrastructure or by using online on-demand infrastructure. Conventional research paper replication depends on multiple factors, as we have explored.
  • 20. MultiAffect over Google Colaboratory The MultiAffect framework uses Google Colaboratory to publish the Jupyter interactive notebook and to perform the computation in the attached virtual machine. Google Colaboratory is a free research tool that enables users with a Google account to host and run code on Google's infrastructure. Google Colaboratory lets users execute their code segments on CPUs, GPUs, and TPUs (an AI accelerator application-specific integrated circuit). At the time of publication of this work, Google Colaboratory offered a virtual machine with a Tesla K80 GPU, 12 GB of RAM, and 350 GB of storage. This platform provides enough resources to perform video action recognition.
  • 21.
  • 22. Ubuntu as Operating System This platform runs a Debian-based operating system, so the provided instructions are platform-specific. Local replication of our framework requires an Ubuntu 18.04 operating system in order to install all the libraries successfully. Our platform is agnostic to the Python version: all the code executed in the notebook is written in Python and can be executed with version 2 or 3 of the interpreter. Our framework is able to set up and run the experiment from the online platform, enabling users to deploy and execute the code in a free-of-charge environment and without special requirements on the client side.
  • 23. Fine tuning the setup process The definition of the setup was an incremental process of three main steps: (1) Initial setup: The first functional version; (2) Packing components: Uploading components in batches to cloud storage; and (3) Optimal setup: A version that loads faster.
  • 24. Initial setup In this step, the libraries were downloaded and compiled directly from the notebook by running shell commands from the notebook cells. Prerequisites, missing dependencies, and additional packages were installed in the same notebook. The dataset and the pre-trained models were downloaded from their original sources to the virtual machine. The feature extraction, training, and evaluation code was inserted directly into the notebook in separate cells. The first version was tested until it successfully extracted the features, trained, and evaluated the models from the notebook. A backup of this notebook was documented and set as the initial version.
  • 25. Packing components Each individual compiled library was packaged into a zip file that contains the binary files as well as the configuration files. The pre-trained models that were individually downloaded from their original sources were packed together into a single file. Downloading a single large file from a high-speed source often takes less time than downloading multiple large files from sources with different bandwidths. The outcome of this task is a collection of zip files that were uploaded to a Google Drive account. The files were shared with public access so that they can be downloaded from Google Colaboratory notebooks logged in with different accounts.
  • 26. Optimal setup After packaging the files from the initial setup and storing them in the cloud, we started a branch of the initial setup that loads these files. The optimal setup notebook is a simplified version of the initial notebook: the long section documenting the setup process was replaced with a section that downloads the prerequisites. The files were downloaded by using a Python tool called GDown that is already installed in Google Colaboratory. It is important to mention that the virtual machine attached to the Google Colaboratory notebooks already has an Ubuntu distribution with the most common machine learning tools and libraries installed. This optimal version is tailored to Google Colaboratory only.
  • 27. Optimizing the loading time For each of the installed libraries, we measured the time it takes to install the prerequisites plus the compilation time. On average, the overall setup of each library was five times slower than downloading and extracting a previously compiled and zipped version of the library. The total setup time for the Google Colaboratory environment was reduced from 43 minutes to 6 minutes after implementing the pre-compiled tools strategy and downloading the files from the same Google infrastructure.
  • 28. Feature extractors MultiAffect includes a feature extraction module as an independent component. Multimodal feature extraction is often a highly demanding task, as it requires some pre-processing of the videos before features can be extracted. Some common pre-processing tasks are: separating the audio, extracting frames, identifying faces, cropping faces, removing the background, skeleton detection (pose), and emotion detection, among many other procedures. Our feature extraction methodology is based on the common ground found in submissions. Our feature extraction process aims to be invariant to factors such as person descriptors (i.e., gender, age, race), scale, position, background, and language. Our approach considers ten features from five different modalities: face, body, audio, text, and emotions.
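The packing step can be sketched with the Python standard library. This is an illustration only: the helper names, directory layout, and file names are hypothetical, and the upload to Google Drive is not shown.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def pack_component(component_dir, archive_stem):
    """Zip a compiled library directory (binaries plus configuration files)."""
    archive = shutil.make_archive(str(archive_stem), "zip", root_dir=str(component_dir))
    return Path(archive)


def unpack_component(archive_path, target_dir):
    """Extract a previously packed component into target_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)


# Demo with a stand-in "compiled library" (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    build.mkdir()
    (build / "tool.bin").write_bytes(b"\x00\x01")
    (build / "tool.cfg").write_text("threads=2\n")

    archive = pack_component(build, Path(tmp) / "tool_packed")
    unpack_component(archive, Path(tmp) / "restored")
    print(sorted(p.name for p in (Path(tmp) / "restored").iterdir()))
```

The archive is written outside the directory being packed so that it does not include itself. `ZipFile.extractall` does not restore Unix executable bits, so restored binaries may need their permissions set again.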
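A minimal sketch of how per-step setup times could be recorded and compared. The `timed` and `speedup` helpers are hypothetical; only the 43-minute and 6-minute totals come from the slide.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of a setup step, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def speedup(before, after):
    """How many times faster the 'after' setup is than the 'before' setup."""
    return before / after


timings = {}
with timed("placeholder step", timings):
    time.sleep(0.01)  # stand-in for "install prerequisites and compile"

# Totals reported on the slide: 43 minutes before, 6 minutes after.
print(f"{speedup(43, 6):.1f}x faster, {43 - 6} minutes saved")
```

With the reported totals, the whole environment loads roughly seven times faster, which is in line with the per-library average of five times mentioned above.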
  • 29. Audio features OpenSMILE (1582 features): The audio is extracted from the videos and are processed by OpenSMILE that extract audio features such as loudness, pitch, jitter, etc. It was tested on video-clip length (general) and 20 fragments (temporal).
  • 30. Text features Opinion Lexicon (6 features): depends on the ratio of sentiment words (adjectives, adverbs, verbs and nouns), which express positive or negative sentiments. Subjective Lexicon (4 features): They used the subjective Lexicon from MPQA (Multi-Perspective Question Answering) that models the sentiment by its type and intensity. Word vectors GloVe, and BERT embeddings
  • 31. Face features OpenFace (709 features): Facial behavior analysis tool that provides accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. We get points that represents the face. VGG16 FC6 (4096 features): The faces are cropped (224×224×3), aligned, zero out the background, and passed through a pretrained VGG16 to get a take a dimensional feature vector from FC6 layer.
  • 32. Body Features OpenPose (BODY_25) (11 features): The normalized angles between the joints.I did not use the calculated features because were 25x224x224 VGG16 FC6 Skelethon image (4096 features): I drew the skeleton (neck in the center) on a black background and feed a VGG16 and extracted a feature vector of the FC6 layer.
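One way to turn pose keypoints into a normalized joint angle is sketched below. The slide does not say which joints form the 11 angles or how they were normalized, so the function name and the division by π are assumptions made for illustration.

```python
import math


def normalized_joint_angle(a, b, c):
    """Angle at joint b formed by points a-b-c, scaled to [0, 1] (1 == 180 degrees)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm1 = math.hypot(*v1)
    norm2 = math.hypot(*v2)
    if norm1 == 0 or norm2 == 0:
        return 0.0  # degenerate case: a keypoint coincides with the joint
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (norm1 * norm2)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.acos(cosine) / math.pi


# Hypothetical shoulder-elbow-wrist keypoints in image coordinates.
elbow_angle = normalized_joint_angle((120, 80), (150, 130), (200, 140))
```

Angles like this depend only on the relative position of the keypoints, so they do not change with the scale or position of the person in the frame.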
  • 33. Emotion features EmoPy (7 features): A deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER). Other (28 features): Other 4 models from different FER contest participants. 7 categories per model, 35 features in total 20 samples per video clip were predicted (temporal) from there I computed its normalized sum (general)
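The step from temporal to general emotion features can be sketched as below. The exact normalization is not given on the slide; this sketch assumes the per-sample scores are summed per category and divided by the grand total, and the function name is hypothetical.

```python
def normalized_sum(temporal_scores):
    """Collapse per-sample class scores into one distribution per clip."""
    num_classes = len(temporal_scores[0])
    totals = [0.0] * num_classes
    for sample in temporal_scores:
        for i, score in enumerate(sample):
            totals[i] += score
    grand_total = sum(totals)
    if grand_total == 0:
        return totals  # no evidence for any class
    return [t / grand_total for t in totals]


# Made-up example: 20 samples, 7 emotion categories, one-hot predictions.
samples = [[1 if c == s % 7 else 0 for c in range(7)] for s in range(20)]
general = normalized_sum(samples)
```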
  • 34. Model trainer The MultiAffect models use different deep learning models to recognize affect. Among them we find RNNs (Recurrent Neural Networks), CNNs (Convolutional Neural Networks), and simple DNNs (Deep Neural Networks) as MLPs (Multilayer Perceptrons).
  • 36. Evaluator The MultiAffect framework is designed to perform classification and regression tasks. Depending on the performed task, the platform is adjusted to display meaningful evaluations. The classification task gives accuracy, F1-score, recall, precision, AUC and other metrics for the training, validation, and testing sets. In the case of a regression task, the framework computes the MSE (Mean Squared Error) and the CCC (Concordance Correlation Coefficient), which describes how well a new test or measurement reproduces a gold standard test.
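The two regression metrics can be written in a few lines. This is a sketch using the standard definitions (population variance and covariance for the CCC), not the framework's own implementation; the function names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def mse(y_true, y_pred):
    """Mean squared error between gold labels and predictions."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def ccc(y_true, y_pred):
    """Concordance correlation coefficient: 1 means perfect agreement."""
    mean_t, mean_p = mean(y_true), mean(y_pred)
    var_t = mean([(t - mean_t) ** 2 for t in y_true])
    var_p = mean([(p - mean_p) ** 2 for p in y_pred])
    cov = mean([(t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)])
    # The squared mean difference penalizes predictions that are shifted.
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
```

Unlike Pearson correlation, the CCC drops when predictions are perfectly correlated with the gold standard but offset from it, which is why it suits agreement with a gold standard.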
  • 37. Plotting the results The results obtained from our reproducible framework for the classification task are two plots, one to visualize the accuracy while training and one for the training and testing loss; and a confusion matrix obtained while evaluating the model on the test data. On the other hand, for the regression tasks the results are displayed in a scatter plot that shows the correlation between the predicted and gold standard labels.
  • 38. Experimentation In order to test its generalizability, we performed experiments on two main tasks: affect recognition and video action recognition. The video action and affect recognition tasks are approached by training and testing classification and regression models, respectively. One of the main goals of the proposed framework is to be able to perform both tasks by only configuring a new set of variables, without making any change to the code. Another goal was to deliver results comparable to existing work.
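The idea of switching tasks through configuration alone can be illustrated as follows. Every name here (the config keys, the defaults table, the function) is hypothetical and not taken from MultiAffect; the sketch only shows how one variable can select the loss, output layer, and metrics.

```python
# Hypothetical per-task defaults; MultiAffect's real settings may differ.
TASK_DEFAULTS = {
    "classification": {"loss": "categorical_crossentropy",
                       "output_activation": "softmax",
                       "metrics": ["accuracy", "f1"]},
    "regression": {"loss": "mse",
                   "output_activation": "linear",
                   "metrics": ["mse", "ccc"]},
}


def build_settings(config):
    """Derive model settings from a user configuration, without code changes."""
    task = config["task"]
    if task not in TASK_DEFAULTS:
        raise ValueError(f"unknown task: {task!r}")
    settings = dict(TASK_DEFAULTS[task])
    settings["output_units"] = config["num_classes"] if task == "classification" else 1
    settings["modalities"] = list(config.get("modalities", []))
    return settings


action_settings = build_settings(
    {"task": "classification", "num_classes": 3, "modalities": ["face", "audio"]})
affect_settings = build_settings({"task": "regression", "modalities": ["face"]})
```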
  • 39. Video Action Recognition for Automatic Video Editing
  • 40. All (Quadmodal): BodyTF+FaceTF+AudioG+EmoT Results for the All model: acc_val = 1.00, acc_train = 1.00, acc_test = 0.90, f1_score = 1.00, f1_test = 0.90, loss = 0.01
  • 41. Confusion matrices of the Quadmodal model (train, validation, test)
  • 44. Results (the plot shows an almost 45-degree line)
  • 45. Let's switch approaches to Benchmarking You can use MultiAffect as a tool for any video categorization and regression tasks. You can try it out from this URL: Now, if you want to compare which of the existing techniques works better for your problem, you will need a tool that benchmarks all the methods. This is why I adapted an existing Text Classification Benchmarking tool to run as a tool in the cloud; you can find it here:
  • 46. Text Classification Benchmarking tool This is a Google Colaboratory notebook with instructions that includes these methods: ● Word ngram + LR (Logistic regression) ● Char ngram + LR ● (Word + Char ngram) + LR ● RNN no embedding ● RNN + GloVe embedding ● CNN (multi-channel) ● RNN + CNN ● Google BERT
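The first three methods in the list can be benchmarked with scikit-learn in a few lines. This is a sketch of the benchmarking idea, not the notebook's code: the n-gram ranges, the TF-IDF weighting, the helper names, and the toy reviews are all assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline, make_union


def word_features():
    return TfidfVectorizer(analyzer="word", ngram_range=(1, 2))


def char_features():
    return TfidfVectorizer(analyzer="char", ngram_range=(2, 4))


def benchmark(texts, labels, cv=2):
    """Return the mean cross-validated accuracy of each candidate method."""
    candidates = {
        "word ngram + LR": make_pipeline(word_features(), LogisticRegression()),
        "char ngram + LR": make_pipeline(char_features(), LogisticRegression()),
        "(word + char ngram) + LR": make_pipeline(
            make_union(word_features(), char_features()), LogisticRegression()),
    }
    return {name: cross_val_score(model, texts, labels, cv=cv).mean()
            for name, model in candidates.items()}


# Tiny made-up dataset, only to show the call; real benchmarks need real data.
texts = ["great work, very fair review", "excellent and helpful worker",
         "great job, thank you", "very good and fast delivery",
         "terrible work, unfair review", "bad and slow worker",
         "awful job, never again", "very poor and late delivery"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = benchmark(texts, labels)
```

Running every candidate through the same cross-validation split is what makes the scores comparable.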
  • 48. It promoted to improve fairness in reviews
  • 49. The last approach: independent ML/DL methods Sometimes you may already know which algorithm is best for your requirements. For that case I adapted >100 notebooks so they can be used as tools to train models from the cloud by only uploading your data. You can find them here: The process of adapting a notebook is 1) open it in Colab from GitHub, 2) add extra libraries, 3) download the data from Drive. Let’s do a quick recap of all the ML/DL/RL methods to identify which method best fits your problem.
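Step 1 of the adaptation process relies on Colab's URL scheme for opening notebooks hosted on GitHub. The helper below is a small sketch; the function name and the example user, repository, and path are placeholders. Steps 2 and 3 appear only as comments because they run as notebook cells.

```python
def colab_url(user, repo, notebook_path, branch="master"):
    """Build the Colab link that opens a notebook stored on GitHub."""
    return ("https://colab.research.google.com/github/"
            f"{user}/{repo}/blob/{branch}/{notebook_path}")


# 1) Open in Colab from GitHub (placeholder repository):
link = colab_url("someuser", "somerepo", "notebooks/demo.ipynb")

# 2) Add extra libraries in the first notebook cell, for example:
#    !pip install <library-name>
# 3) Download the data from Drive, for example with gdown:
#    !gdown <drive-file-id>
```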
  • 53. When to use it? ● Simple regression problems ○ How much rent should cost in a certain area ○ How much I should charge for a specific amount of work ● Problems where we want to define a rule that separates two similar categories, e.g. Premium or Basic price for customers under certain parameters (number of rooms vs number of cars)
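Assuming this slide refers to linear regression, the rent example can be sketched with an ordinary least-squares fit. The data points are made up for illustration and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: apartment area in square meters vs. monthly rent.
areas = [30, 45, 60, 80]
rents = [600, 825, 1050, 1350]
slope, intercept = fit_line(areas, rents)
predicted_rent = slope * 70 + intercept  # estimate for a 70 square meter flat
```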
  • 54. XOR problem - Not linear
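Why XOR is "not linear" can be checked by brute force: no rule of the form `w1*x1 + w2*x2 + b > 0` classifies all four XOR points correctly, while AND is easy. The grid of candidate weights below is an arbitrary choice for illustration.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]
XOR_LABELS = [0, 1, 1, 0]


def best_linear_accuracy(labels):
    """Best accuracy of any rule `w1*x1 + w2*x2 + b > 0` on a coarse grid."""
    grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    best = 0.0
    for w1, w2, b in product(grid, repeat=3):
        hits = sum(
            int(w1 * x1 + w2 * x2 + b > 0) == label
            for (x1, x2), label in zip(INPUTS, labels)
        )
        best = max(best, hits / len(INPUTS))
    return best


and_accuracy = best_linear_accuracy(AND_LABELS)
xor_accuracy = best_linear_accuracy(XOR_LABELS)
```

A single line can separate the AND points, but for XOR the best any line can do is three out of four, which is why a non-linear model is needed.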
  • 56. Code
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_graphviz
    import pydotplus

    iris = load_iris()
    # Fit the tree on the iris features and labels.
    clf = DecisionTreeClassifier().fit(iris.data, iris.target)
    # Class names must follow the label order in iris.target_names.
    dot_data = export_graphviz(clf, out_file=None, filled=True, rounded=True,
                               feature_names=iris.feature_names,
                               class_names=['Setosa', 'Versicolor', 'Virginica'])
    graph = pydotplus.graph_from_dot_data(dot_data)
  • 58. When to use it? ● When we need to know what decisions the machine is making ● When we need to explain to others how the features are evaluated ● When there are not many features
  • 60. Code
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    iris = load_iris()
    # Hold out 20% of the samples for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2)
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print('accuracy is', accuracy_score(y_test, y_pred))
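Since this talk is about reproducibility, note that the random forest example on this slide gives a slightly different accuracy on every run, because both the data split and the forest are randomized. A sketch of the same experiment with fixed seeds follows; the function name and the seed value are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_experiment(seed):
    """Train and evaluate with every source of randomness pinned to `seed`."""
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


first_run = run_experiment(42)
second_run = run_experiment(42)  # same seed, so the same split and forest
```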
  • 61. When to use it? ● When we want to know alternative ways to evaluate a problem. ● When we want to manually discard flows that are biased. ● When we want to manage ensembles from one single method.
  • 63. When to use it? ● When we want to know the probabilities of the different cases. ● When we need a probabilistic model. ● When we need a model that is easy to prove on paper.
  • 65. When to use it? ● When intuition says that the problem can be solved by picking the most similar option. ● When the information is not exhaustive. ● When we want to justify the decision of the algorithm with common human reasoning.
  • 67. When to use it? ● When we don’t know how to understand the data ● When we want to optimize resources by grouping related elements. ● When we want the computer to create the labels for us.
  • 69. When to use it? ● It was the most effective technique before Neural Networks, and it can achieve excellent results with less processing. ● Mathematically speaking, it rests on very strong principles: it creates complex multidimensional hyperplanes that separate the classes precisely. ● It is not a white box technique, but it may be the best option for problems where we want to get the best of the Machine Learning approach without dealing with Neural Networks.
  • 71. When to use it? ● When we want to optimize a regression ● When we want to binarize the output ● As a preliminary analysis before implementing neural networks
  • 73. When to use it? ● When we have very few features and there are no extra details that can be extracted from hidden layers. ● These are in fact neural networks, and we do not always need to use them for deep learning; they can be used as a machine learning method when we benchmark against other machine learning techniques. ● When we want to get the power of neural networks and we don’t have much computational power.
  • 74. ML & DL frameworks
  • 75. It’s time for Deep Learning
  • 77. Artificial Neural Networks = Multi-Layer Perceptron
  • 79. When to use it? ● For classifiers, when common machine learning algorithms perform poorly. ● Models with many features. ● Multi-class projects.
  • 81. MNIST
  • 84. When to use it? ● When we want to process images ● When we want to process videos ● When we have high-dimensional data
  • 87. When to use it? ● When sequences are provided ○ Text sequences ○ Image sequences (videos) ○ Time series ● When we need to provide an ordered output
  • 91. When to use it? ● When we want to benchmark models ● When different models are stronger when evaluated together ● When the individual processing is not exhaustive
  • 93. Frameworks ● TPOT ● MLBox ● H2O ● Google AutoML
  • 94. When to use it? ● On every new model ● When we have enough time to train multiple models ● When we don’t know which hyperparameters are better.
  • 96. Frameworks ● OpenAI Gym ● Google Dopamine ● RLLib ● Keras-RL ● Tensorforce ● Facebook Horizon
  • 97. When to use it? ● When a robot explores a place and needs to learn from the environment. ● When we can run as many trials as we want in a simulator. ● When we want to find the optimal path.
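Finding a path by trial and error in a simulator can be sketched with tabular Q-learning on a toy corridor. This is an illustrative sketch that does not use any of the frameworks listed in this deck; the environment, the hyperparameters, and all names are assumptions.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Deterministic corridor: reward 1 only when the goal cell is reached."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # pure exploration; Q-learning is off-policy
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for every non-terminal cell according to the learned values."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


policy = greedy_policy(train())
```

The agent is never told where the goal is; the discounted reward makes moving toward it worth more than moving away, and the greedy policy reads that off the table.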
  • 98. Techniques to improve the learning process
  • 99. Principal Component Analysis (PCA) Feature selection
  • 100. When to use it? ● When we have too many features and we do not know which of them are useful. ● When we want to reduce the dimensionality of our model. ● When we want to plot our decision boundaries.
  • 102. When to use it? ● When we have limited data ● When we want to help our model generalize better ● When our unseen data comes in very different formats.
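A minimal PCA can be written with NumPy's singular value decomposition. This is a sketch of the standard technique with a hypothetical function name and made-up data; in practice a library implementation such as scikit-learn's `PCA` would normally be used.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance


# Made-up data: 50 samples, 4 features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.1])
X_2d, components, variance = pca(X, n_components=2)
```

Reducing to two components is what makes it possible to draw the data, and a decision boundary, on a flat plot.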
  • 104. Discriminative: Predicts from Data Generative: Generates from data distribution
  • 105. Generative models ● Autoencoders ● Adversarial Networks ● Sequence Models ● Transformers
  • 108. When to use it? ● When we want to compress data. ● When we need to change one type of input into another type of output. ● When we don’t need much variability in the generated data.
  • 110. When to use it? ● When we need to transfer a style ● When we need more variability in the generated output ● When we need to keep context in the generation.
  • 112. When to use it? ● When we generate text ● When we generate the next sequence from a series ● When the order in the generated output matters.
  • 114. When to use it? ● When context is an essential part of the generated output ● When we need to keep consistency in the frequency space. ● When we have enough computational resources.
  • 115. Put notebooks into production It may seem that running code from a notebook in the cloud is only for testing purposes, but you can actually run it as a service from a local Docker container. I created a script that automatically prepares a container and executes it as a command line application whenever you need it. Example: docker run psykohack/google-colab Code:
  • 116. Resources More and more AI research is now being distributed in a redistributable format. Some valuable resources can be found at:
  • 117. Conclusions ● Nowadays we can reproduce state-of-the-art AI algorithms from a web-based platform. ● Complex tasks can be executed in notebooks structured as frameworks. ● Our main job is to prepare the data to feed the algorithm that best fits our needs. ● AI prototyping is drastically accelerated by using these technologies. ● Since these technologies sit between pure-code and pure-tool approaches, they give us the flexibility to iterate faster.

Editor's Notes

  1. This is how neural networks process the images to predict an output