Data Analysis with TensorFlow in PostgreSQL
Dave Page
12 May 2021
Dave Page
● EDB (CTO Office)
○ VP & Chief Architect, Database Infrastructure
● PostgreSQL
○ Core Team
○ pgAdmin Lead Developer
2021 Copyright © EnterpriseDB Corporation All Rights Reserved
In this talk...
● What are PostgreSQL, pl/python3 and TensorFlow?
● Why would I use them together?
● Examples of analysis types.
● Calling TensorFlow from PostgreSQL.
● Preparing data.
● Designing a network.
● Training a model.
● Performing analysis.
Software
What is PostgreSQL?
50,000 foot overview
● Relational, SQL based database.
● Fully enterprise ready; increasingly replacing Oracle, SQL Server, DB2 and more.
● Used in pretty much every sector: government, law enforcement, financial, healthcare…
● Possibly the most SQL Standard compliant database there is.
● Highly extensible:
○ Plugin extension modules.
○ Plugin procedural languages (e.g. Python, Perl, R, Java, v8).
○ Low level code hooks.
What is pl/python3?
50,000 foot overview
● Procedural language for PostgreSQL.
● Write stored procedures, functions and anonymous blocks within your database.
● Supports Python 3:
○ Don’t try to use pl/python, which uses the now-obsolete Python 2!
● The vast Python ecosystem of libraries may be used.
● Combines the power of Python with PostgreSQL.
What is TensorFlow?
50,000 foot overview
● Open Source Machine Learning library.
● Originated from the Google Brain team.
● Extremely powerful and flexible.
● Supports a variety of languages:
○ Python
○ C/C++
○ R
○ JavaScript
○ …
● Library of pre-built models and datasets.
● Supports distributed learning.
Why?
Not just for fun
● Our data is already in the database.
● We can easily use the power of SQL to choose and format data for analysis:
○ SQL is designed for working with datasets:
■ datum ~= scalar
■ tuple ~= vector
■ array/set ~= matrix/tensor
○ SELECT … FROM … WHERE …
○ Mathematical functions & operators: sqrt(), log(), power(), mod(), round()...
○ Aggregates and Window Functions, Common Table Expressions.
Analysis types
Regression analysis
● Model relationships between input values (features) and outputs.
● Analyse new or hypothetical inputs and predict outputs.
● For example, house prices:
○ Inputs:
■ Number of bedrooms
■ Property type (detached, semi, flat etc.)
■ Property condition
■ Proximity to the beach
■ Proximity to major roads or a rail link to the city
■ Council tax cost
■ Number of nearby pubs serving CAMRA recommended beer
○ Output:
■ The price of the house
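As a rough illustration of modelling a relationship between an input and an output, the sketch below fits a straight line to a handful of invented house prices using ordinary least squares in plain Python. The figures and the names `fit_line` and `predict` are made up for this example; a neural network extends the same idea to many features and non-linear relationships.

```python
# Illustrative only: these prices are invented, not real data.
bedrooms = [1, 2, 3, 4, 5]
prices = [150_000, 200_000, 250_000, 300_000, 350_000]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predict an output for a new (or hypothetical) input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(bedrooms, prices)
print(predict(model, 6))  # 400000.0 for this invented data
```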
Time series analysis
● Analyse time series data and make predictions.
● More powerful than linear analysis, predicting:
○ Linear trends (upwards or downwards)
○ Seasonal variability, e.g.
■ Summer is busier than winter.
■ Friday and Saturday night account for 60% of trade.
■ January is always the slowest month.
■ Multiple seasonalities can be predicted together.
○ Noise is inherently smoothed out, unless it overshadows trends and seasonal variations.
● Useful for multiple purposes:
○ Capacity management of application deployments.
○ Sales predictions.
○ Stock management.
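To make "trend plus seasonality" concrete, here is a minimal sketch on a synthetic series. It uses a classical moving-average decomposition rather than a neural network, so it is not the method used later in this talk; the series, the period of 4 and the function name `moving_average` are all assumptions chosen for the example.

```python
# Synthetic series: a linear upward trend plus a repeating 4-step seasonal pattern.
SEASON = [3.0, -1.0, -4.0, 2.0]  # sums to zero over one full period
series = [float(t) + SEASON[t % 4] for t in range(24)]


def moving_average(values, window):
    """Trailing moving average; averaging over one full period cancels the season."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


# trend[i] is the average of the 4 values ending at time step i + 3.
trend = moving_average(series, window=4)

# A trailing window of 4 is centred 1.5 steps in the past, so add 1.5 to
# estimate the trend at the current step, then subtract it from the series.
seasonal = [series[i + 3] - (trend[i] + 1.5) for i in range(len(trend))]

print(trend[:4])     # [1.5, 2.5, 3.5, 4.5]
print(seasonal[:4])  # [2.0, 3.0, -1.0, -4.0]
```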
Other types of analysis
Not covered in this talk!
● Text prediction/generation.
● Text classification.
● Image classification.
● Object detection.
● Audio analysis.
● Speech recognition.
● The list goes on!
Getting set up
Setting up pl/python3
● Install PostgreSQL:
○ If using EDB installers, use StackBuilder to install the LanguagePack.
○ On Linux, install the pl/python3 package, e.g. on Debian/Ubuntu: postgresql-plpython3-13.
● Run psql or pgAdmin, and execute:
○ CREATE EXTENSION plpython3u;
Setting up the Python environment
● Any Python libraries that will be used need to be added to the Python environment, using pip or the OS package manager:
○ On Linux, using the system Python:
■ sudo pip3 install <package 1> …
○ On macOS, using the EDB LanguagePack:
■ sudo /Library/edb/languagepack/v1/Python-3.7/bin/pip install <package 1> …
○ On Windows, using the EDB LanguagePack (as Administrator):
■ C:\edb\languagepack\v1\Python-3.7\bin\pip install <package 1> …
● Recommended starter packages:
○ tensorflow
○ numpy (will be installed automatically as a dependency of tensorflow)
○ pandas
○ matplotlib
○ seaborn
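One way to confirm the packages landed in the right environment is to ask Python which of them it can find. The sketch below uses only the standard library; the helper name `missing_packages` is made up here. Running the same check from inside a pl/python3 function would show what the server's interpreter can see, which may differ from the Python on your shell's PATH.

```python
import importlib.util

# The starter packages recommended above; extend the tuple as needed.
RECOMMENDED = ("tensorflow", "numpy", "pandas", "matplotlib", "seaborn")


def missing_packages(names):
    """Return the names that cannot be imported in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages(RECOMMENDED) or "none")
```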
A brief introduction to pl/python3
A.K.A. Making sure it all works
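A smoke test along the following lines is one way to make sure it all works. This is a sketch, not the demo from the talk: the function name `describe_environment` is hypothetical, and the small stand-in class exists only so the code also runs outside the database. Inside a pl/python3 function body, PostgreSQL supplies the real `plpy` module and `plpy.notice()` sends the message back to the client.

```python
# Inside a plpython3u function, PostgreSQL provides the `plpy` module.
# The stand-in below exists only so this sketch also runs outside the database.
try:
    plpy
except NameError:
    class _PlpyStandIn:
        def notice(self, message):
            print("NOTICE:", message)

    plpy = _PlpyStandIn()


def describe_environment():
    """Report the Python version in use, to prove the language is working."""
    import sys
    version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    plpy.notice("Hello from Python {}".format(version))
    return version


describe_environment()
```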
Data preparation
Preparing the data
● Cleanup:
○ Goal: maximise the accuracy of the model.
○ Method: eliminate data that might skew results.
○ Requires: analysis and understanding of existing data.
○ Applies mostly to regression analysis where we're trying to model a relationship, rather than time series.
● Multiple data sets:
○ Training data is used to teach the model.
○ Validation data is used during training to validate what has been learnt.
○ Test data is optionally used to test the model.
○ Training vs. validation data is typically randomly selected for regression analysis.
○ Training vs. validation data is typically sequential for time series analysis.
○ Ratio of training to validation (and test) data is usually skewed towards training, e.g. 3:1 or 4:1.
Correlations
Analysis
● Some features have stronger correlations to the output than others.
● We can exclude uncorrelated or loosely correlated features to simplify the neural network (model) and increase accuracy.
NOTICE: Correlation data:
crim zn indus chas nox rm age dis rad tax ptratio b lstat medv
crim 1.000000 -0.200469 0.406583 -0.055892 0.420972 -0.219247 0.352734 -0.379670 0.625505 0.582764 0.289946 -0.385064 0.455621 -0.388305
zn -0.200469 1.000000 -0.533828 -0.042697 -0.516604 0.311991 -0.569537 0.664408 -0.311948 -0.314563 -0.391679 0.175520 -0.412995 0.360445
indus 0.406583 -0.533828 1.000000 0.062938 0.763651 -0.391676 0.644779 -0.708027 0.595129 0.720760 0.383248 -0.356977 0.603800 -0.483725
chas -0.055892 -0.042697 0.062938 1.000000 0.091203 0.091251 0.086518 -0.099176 -0.007368 -0.035587 -0.121515 0.048788 -0.053929 0.175260
nox 0.420972 -0.516604 0.763651 0.091203 1.000000 -0.302188 0.731470 -0.769230 0.611441 0.668023 0.188933 -0.380051 0.590879 -0.427321
rm -0.219247 0.311991 -0.391676 0.091251 -0.302188 1.000000 -0.240265 0.205246 -0.209847 -0.292048 -0.355501 0.128069 -0.613808 0.695360
age 0.352734 -0.569537 0.644779 0.086518 0.731470 -0.240265 1.000000 -0.747881 0.456022 0.506456 0.261515 -0.273534 0.602339 -0.376955
dis -0.379670 0.664408 -0.708027 -0.099176 -0.769230 0.205246 -0.747881 1.000000 -0.494588 -0.534432 -0.232471 0.291512 -0.496996 0.249929
rad 0.625505 -0.311948 0.595129 -0.007368 0.611441 -0.209847 0.456022 -0.494588 1.000000 0.910228 0.464741 -0.444413 0.488676 -0.381626
tax 0.582764 -0.314563 0.720760 -0.035587 0.668023 -0.292048 0.506456 -0.534432 0.910228 1.000000 0.460853 -0.441808 0.543993 -0.468536
ptratio 0.289946 -0.391679 0.383248 -0.121515 0.188933 -0.355501 0.261515 -0.232471 0.464741 0.460853 1.000000 -0.177383 0.374044 -0.507787
b -0.385064 0.175520 -0.356977 0.048788 -0.380051 0.128069 -0.273534 0.291512 -0.444413 -0.441808 -0.177383 1.000000 -0.366087 0.333461
lstat 0.455621 -0.412995 0.603800 -0.053929 0.590879 -0.613808 0.602339 -0.496996 0.488676 0.543993 0.374044 -0.366087 1.000000 -0.737663
medv -0.388305 0.360445 -0.483725 0.175260 -0.427321 0.695360 -0.376955 0.249929 -0.381626 -0.468536 -0.507787 0.333461 -0.737663 1.000000
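Excluding loosely correlated features can be sketched as a simple filter on the medv (output) column of the table above. The correlation values are copied from that table; the threshold of 0.3 and the name `select_features` are assumptions for illustration, and the right cut-off depends on your data.

```python
# Correlation of each feature with the output (medv), copied from the table above.
MEDV_CORRELATION = {
    "crim": -0.388305, "zn": 0.360445, "indus": -0.483725, "chas": 0.175260,
    "nox": -0.427321, "rm": 0.695360, "age": -0.376955, "dis": 0.249929,
    "rad": -0.381626, "tax": -0.468536, "ptratio": -0.507787, "b": 0.333461,
    "lstat": -0.737663,
}


def select_features(correlations, threshold):
    """Keep features whose absolute correlation with the output meets the threshold."""
    return [name for name, r in correlations.items() if abs(r) >= threshold]


# Hypothetical threshold; a strong negative correlation is as useful as a positive one.
kept = select_features(MEDV_CORRELATION, threshold=0.3)
dropped = sorted(set(MEDV_CORRELATION) - set(kept))

print("Kept:", kept)
print("Dropped:", dropped)  # ['chas', 'dis']
```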
Eliminating outliers
Analysis
● Outlier values in the training/validation data can make it harder to build an accurate model.
● Analyse the input features and automatically remove rows with outliers using an algorithm such as interquartile range (IQR), i.e. those values that lie more than 1.5 × IQR below the first quartile or above the third quartile:
NOTICE: Outliers detected using IQR:
row crim zn indus chas nox rm age dis rad tax ptratio b lstat medv
0 False False False False False False False False False False False False False False
1 False False False False False False False False False False False False False False
2 False False False False False False False False False False False False False False
3 False False False False False False False False False False False False False False
...
18 False False False False False False False False False False False True False False
19 False False False False False False False False False False False False False False
...
Eliminating outliers
Example code
# Outlier detection
# Note: 'data' is a Pandas dataframe containing our raw data
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1
plpy.notice('Outliers detected using IQR:\n{}\n'.
            format((data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR))))
# Outlier Removal
plpy.notice('Removing outliers...')
data = data[~((data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR))).any(axis=1)]
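The same rule can be worked through by hand on a single column. The sketch below uses only the standard library and an invented list of values; `statistics.quantiles` with `method='inclusive'` interpolates linearly between data points, which should agree with the default behaviour of the Pandas `quantile()` call above.

```python
from statistics import quantiles

# Invented values for one feature; 100 is the obvious outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

q1, _median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Anything outside these limits is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower or v > upper]
cleaned = [v for v in values if lower <= v <= upper]

print(q1, q3, iqr)   # 3.0 7.0 4.0
print(lower, upper)  # -3.0 13.0
print(outliers)      # [100]
```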
Visualisation
Everyone likes a pretty picture
Creating data sets
Example code
# Figure out how many rows to use for training, validation and test
test_rows = int((actual_rows/100) * test_pct)
validation_rows = int((actual_rows/100) * validation_pct)
training_rows = actual_rows - test_rows - validation_rows
# Split the data into input and output dataframes (the last column is the output)
input = data[columns[:-1]]
output = data[columns[-1:]]
# Split the input and output into training, validation and test sets
training_input = input[:training_rows]
training_output = output[:training_rows]
validation_input = input[training_rows:training_rows+validation_rows]
validation_output = output[training_rows:training_rows+validation_rows]
test_input = input[training_rows+validation_rows:]
test_output = output[training_rows+validation_rows:]
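The slicing above keeps rows in their existing order, which suits time series. For regression, where training and validation rows are typically chosen at random, the rows can be shuffled first. A minimal sketch of both options in plain Python follows; the function name `split_rows`, its parameters and the fixed seed are assumptions for illustration.

```python
import random


def split_rows(rows, validation_pct, test_pct, shuffle, seed=42):
    """Split rows into (training, validation, test) lists.

    shuffle=True suits regression (random selection);
    shuffle=False keeps the order, as needed for time series.
    """
    rows = list(rows)
    if shuffle:
        # A seeded generator makes the split repeatable between runs.
        random.Random(seed).shuffle(rows)

    test_rows = int(len(rows) / 100 * test_pct)
    validation_rows = int(len(rows) / 100 * validation_pct)
    training_rows = len(rows) - test_rows - validation_rows

    training = rows[:training_rows]
    validation = rows[training_rows:training_rows + validation_rows]
    test = rows[training_rows + validation_rows:]
    return training, validation, test


training, validation, test = split_rows(range(100), validation_pct=20,
                                        test_pct=10, shuffle=False)
print(len(training), len(validation), len(test))  # 70 20 10
```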
Building
Designing a model
● A model is an interconnected layered network of known mathematical functions with trainable parameters (or filters); a.k.a. a neural network.
● Different model architectures are suited to different types of task:
○ Regression might use a simple network with multiple layers:
■ The number of input filters matches the number of input features.
■ Inner layers can be constructed as desired for best results; often based on trial and error and experience.
■ The number of output filters matches the number of outputs.
■ Layers are dense; an activation function allows modelling of non-linear functions.
○ The WaveNet architecture is well suited to time series analysis, despite being originally designed for audio analysis:
■ A single filter on the input layer.
■ Multiple layers of filters with increasing dilation to detect seasonal patterns, e.g. 2, 4, 8, 16, 32.
■ A single filter on the output layer.
■ Layers are convolutional; all filters in one layer connect to all filters in the next.
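To show what a dense layer with an activation function actually computes, here is a tiny forward pass in plain Python. The weights and biases are hand-picked, hypothetical values (training would normally find them), and the names `dense` and `relu` are just for this sketch.

```python
def relu(x):
    """Rectified linear unit: passes positive values, clamps negatives to zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One dense layer: every input feeds every unit.

    weights[j][i] is the trainable weight linking input i to unit j.
    """
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


# Hypothetical parameters: 2 inputs -> 2 hidden units (ReLU) -> 1 linear output.
hidden = dense([1.0, 2.0], weights=[[1.0, -1.0], [0.5, 0.5]],
               biases=[0.0, 1.0], activation=relu)
output = dense(hidden, weights=[[2.0, 3.0]], biases=[-1.0])

print(hidden)  # [0.0, 2.5]  (the first unit is clamped by ReLU)
print(output)  # [6.5]
```

Without the activation, stacking layers would collapse into a single linear function; the clamping in the first hidden unit is what lets the network bend.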
Creating the model
Regression analysis
# Define the model
# 2 layers of 13 filters for the input features, and one layer of one filter for the output
l1 = tf.keras.layers.Dense(units=13, input_shape=(13,), activation = 'relu')
l2 = tf.keras.layers.Dense(units=13, activation = 'relu')
l3 = tf.keras.layers.Dense(units=1)
model = tf.keras.Sequential([l1, l2, l3])
# Compile it
model.compile(loss=tf.keras.losses.MeanSquaredError(),
optimizer='adam')
Creating the model
Time series analysis
# Define the model
model = keras.models.Sequential()
# Input layer
model.add(keras.layers.InputLayer(input_shape=[None, 1]))
# Add multiple 1D convolutional layers with increasing dilation rates to
# allow each layer to detect patterns over longer time frequencies
for dilation_rate in (1, 2, 4, 8, 16, 32):
    model.add(keras.layers.Conv1D(filters=32, kernel_size=2, strides=1,
                                  dilation_rate=dilation_rate, padding="causal", activation="relu"))
# Add one output layer, with 1 filter to give us one output per time step
model.add(keras.layers.Conv1D(filters=1, kernel_size=1))
# Create a learning optimiser and compile the model
optimizer = keras.optimizers.Adam(learning_rate=3e-4)
model.compile(loss=keras.losses.Huber(), optimizer=optimizer, metrics=["mae"])
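The effect of the increasing dilation rates can be sketched without TensorFlow. The helper names below are made up, and the convolution is simplified to one filter with no bias or activation, so treat it as an illustration of the idea rather than what Keras does internally.

```python
def receptive_field(kernel_size, dilation_rates):
    """Number of time steps (including the current one) the stacked layers can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)


def causal_dilated_conv(series, kernel, dilation):
    """1D causal convolution: output[t] only uses series[t], series[t - dilation], ..."""
    k = len(kernel)
    out = []
    for t in range(len(series)):
        total = 0.0
        for i, weight in enumerate(kernel):
            index = t - (k - 1 - i) * dilation
            if index >= 0:  # steps before the start count as zero ("causal" padding)
                total += weight * series[index]
        out.append(total)
    return out


# With kernel_size=2 and the dilation rates used above, each output
# can draw on the 64 most recent time steps.
print(receptive_field(2, (1, 2, 4, 8, 16, 32)))  # 64

# With dilation 2, each output combines the current value and the one 2 steps back.
print(causal_dilated_conv([1, 2, 3, 4, 5], kernel=[1.0, 1.0], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]
```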
Training
Training the model
● Training is repeated multiple times (or epochs), hopefully improving each time:
○ The training data set is used for learning.
○ The validation data set is used to validate results during training.
○ The test data is optionally used to test the model after training.
● We monitor a metric to assess how well the network is learning:
○ For regression, I've had success with Mean Squared Error (which I monitor as Root Mean Squared Error).
○ For time series, Huber loss works well (it's less sensitive to outliers than MSE).
● A callback is used to checkpoint (save) the model each time we see a better accuracy than any previous epoch.
● With regression analysis, we use an 'early stopping' callback to exit the training epoch loop when no further significant improvement is made, to prevent the network learning the training data rather than the mathematical relationship.
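To see why Huber loss is less sensitive to outliers than MSE, compare the two on a small set of invented prediction errors. The functions below are plain Python sketches of the standard definitions, with `delta=1.0` assumed as the switch-over point between the quadratic and linear regions.

```python
def squared_error(error):
    """Per-sample squared error, as averaged by MSE."""
    return error ** 2


def huber(error, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error ** 2
    return delta * (abs_error - 0.5 * delta)


# Invented errors: two small misses and one outlier.
errors = [0.5, -0.5, 10.0]

mse = sum(squared_error(e) for e in errors) / len(errors)
mean_huber = sum(huber(e) for e in errors) / len(errors)

print(mse)         # 33.5  (dominated by the single outlier)
print(mean_huber)  # 3.25
```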
Training the model
Regression analysis
# Imports needed by this excerpt (max_z and epochs are assumed to be defined earlier in the full script)
from math import sqrt
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback, ModelCheckpoint
# Save a checkpoint each time our loss metric improves.
checkpoint = ModelCheckpoint("checkpoint.h5", save_best_only=True)
# Use early stopping
early_stopping = EarlyStopping(patience=50)
# Display output. This would go to stdout automatically if we weren't using pl/python
logger = LambdaCallback(
    on_epoch_end=lambda epoch, logs: plpy.notice(
        'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format(
            epoch,
            sqrt(logs['loss']), round(100 / max_z * sqrt(logs['loss']), 5),
            sqrt(logs['val_loss']), round(100 / max_z * sqrt(logs['val_loss']), 5))))
# Train it!
history = model.fit(training_input, training_output,
                    validation_data=(validation_input, validation_output),
                    epochs=epochs, verbose=False, batch_size=50,
                    callbacks=[logger, checkpoint, early_stopping])
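What the checkpoint and early stopping callbacks do together can be sketched in a few lines of plain Python. This is a simplified stand-in, not the Keras implementation: it replays a list of invented validation losses, remembers the best epoch (where a checkpoint would be saved), and stops once `patience` epochs pass without improvement.

```python
def train_with_early_stopping(validation_losses, patience):
    """Replay per-epoch validation losses.

    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0

    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            # A real run would save a checkpoint of the model here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, best_loss, epoch

    return best_epoch, best_loss, len(validation_losses) - 1


# Invented losses: improvement stalls after epoch 2, so with patience=3
# training stops at epoch 5 and never reaches the final value.
losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8, 3.4]
print(train_with_early_stopping(losses, patience=3))  # (2, 3.5, 5)
```

The example also shows the trade-off: a patience that is too small can stop before a later improvement (3.4 here), which is presumably why a generous value such as 50 is used above.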
Training the model
Time series analysis
# Save checkpoints when we get the best model
model_checkpoint = keras.callbacks.ModelCheckpoint("checkpoint.h5", save_best_only=True)
# Use early stopping to prevent over fitting
early_stopping = keras.callbacks.EarlyStopping(patience=50)
# Display output. This would go to stdout automatically if we weren't using pl/python
logger = keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: plpy.notice(
        'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format(
            epoch,
            sqrt(logs['loss']), round(100 / max_z * sqrt(logs['loss']), 5),
            sqrt(logs['val_loss']), round(100 / max_z * sqrt(logs['val_loss']), 5))))
# Train it!
history = model.fit(train_set, epochs=100,
                    validation_data=valid_set,
                    callbacks=[early_stopping, logger, model_checkpoint])
Use once vs. use many
● Each model is trained with a specific data set.
● With regression analysis, we can re-use a model with any input features to predict an output:
○ In practice this means we might use the model repeatedly over time to model different inputs.
● With time series analysis we can reuse the model to predict different timeframes:
○ In practice, this means we might only use a model once when performing time series analysis.
● Models can be 're-trained' as new data becomes available:
○ If the data distribution has changed, the model might degrade.
○ It may be preferable to re-train from scratch.
● For complex problems, it may be useful to start with a suitable pre-trained generic model, and continue training with specific data:
○ This is known as transfer learning.
Using
Using the model
Regression analysis
CREATE OR REPLACE FUNCTION public.rg_analysis(
    input_values double precision[],
    model_path text)
    RETURNS double precision[]
    LANGUAGE 'plpython3u'
AS $BODY$
import tensorflow as tf

# Reset everything
tf.keras.backend.clear_session()
tf.random.set_seed(42)

# Load the model from the path passed in by the caller
model = tf.keras.models.load_model(model_path)

# Are we dealing with a single prediction,
# or a list of them?
if not any(isinstance(sub, list) for sub in input_values):
    data = [input_values]
else:
    data = input_values

# Make the prediction(s)
result = model.predict([data])[0]
result = [item for elem in result for item in elem]
return result
$BODY$;
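The "single prediction or a list of them" check can be tried outside the database. In the sketch below, `fake_predict` is a stand-in for a trained model (it just sums each row), and `as_batch` and `flatten` are hypothetical helper names; the point is only to show how one row and many rows end up in the same shape.

```python
def as_batch(input_values):
    """Wrap a single row of features in a list, so we always have a list of rows."""
    if any(isinstance(sub, list) for sub in input_values):
        return input_values
    return [input_values]


def flatten(predictions):
    """Turn one-output-per-row predictions ([[a], [b]]) into a flat list ([a, b])."""
    return [item for row in predictions for item in row]


def fake_predict(batch):
    """Stand-in for a trained model: returns one output per input row."""
    return [[sum(row)] for row in batch]


print(flatten(fake_predict(as_batch([1.0, 2.0]))))                # [3.0]
print(flatten(fake_predict(as_batch([[1.0, 2.0], [3.0, 4.0]]))))  # [3.0, 7.0]
```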
Using the model
Time series analysis
# Load the best model from the last checkpoint
model = keras.models.load_model("checkpoint.h5")
cnn_forecast = model_forecast(model,
                              series[..., np.newaxis],
                              window_size)
cnn_forecast = cnn_forecast[train_samples - window_size:-1, -1, 0]
plt.figure(figsize=(10, 6))
plot_series(dates,
            np.concatenate([series[:train_samples],
                            np.full(valid_samples, None, dtype=float)]),
            label="Training Data")
plot_series(dates,
            np.concatenate([np.full(train_samples, None, dtype=float),
                            series[train_samples:]]),
            label="Validation Data")
plot_series(dates,
            np.concatenate([np.full(train_samples, None, dtype=float),
                            cnn_forecast]),
            label="Forecast Data")
plt.savefig('ts_analysis.png')
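The `model_forecast` helper is not shown on the slide. Assuming it feeds the model overlapping windows of `window_size` values, with the last output of each window predicting the next time step, the slice `[train_samples - window_size:-1]` picks out exactly the forecasts for the validation period. The sketch below checks that index arithmetic in plain Python; the function names and the small sizes are made up for illustration.

```python
def sliding_windows(series, window_size):
    """All overlapping windows of window_size consecutive values."""
    return [series[i:i + window_size]
            for i in range(len(series) - window_size + 1)]


def forecast_time_steps(series_length, window_size):
    """Time step predicted by each window: window i covers i..i+window_size-1,
    so (under the assumption above) it predicts step i + window_size."""
    n_windows = series_length - window_size + 1
    return [i + window_size for i in range(n_windows)]


series = list(range(10))  # time steps 0..9
window_size = 3
train_samples = 6         # steps 0..5 for training, 6..9 for validation

print(sliding_windows(series, window_size)[:2])  # [[0, 1, 2], [1, 2, 3]]

steps = forecast_time_steps(len(series), window_size)
# Same slice as applied to cnn_forecast above: the start skips forecasts that fall
# in the training period, and -1 drops the forecast beyond the end of the series.
validation_steps = steps[train_samples - window_size:-1]
print(validation_steps)  # [6, 7, 8, 9]
```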
Conclusion
Summary
In this talk:
● We introduced PostgreSQL, TensorFlow and pl/python3.
● Discussed why we might use them together.
● Introduced two (of many) types of analysis we can perform:
○ Regression.
○ Time Series.
● Showed how we can call TensorFlow from PostgreSQL using pl/python3.
● Walked through the main steps of performing an analysis, considering regression and time series problems:
○ Preparing the data.
○ Creating a model.
○ Training the model.
○ Using the model.
Questions and resources
Questions?
● EDB blog, includes posts on machine learning and other topics:
○ https://www.enterprisedb.com/dave-page
● Experimental code from my ML/AI journey:
○ https://github.com/dpage/ml-experiments
● Other resources:
○ https://www.postgresql.org
○ https://www.tensorflow.org
○ https://www.postgresql.org/docs/current/plpython.html
○ https://pandas.pydata.org
○ https://numpy.org
○ https://matplotlib.org
○ https://seaborn.pydata.org

Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenEDB
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube EDB
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EDB
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLEDB
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLEDB
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?EDB
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresEDB
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLEDB
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!EDB
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesEDB
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoEDB
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJEDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 

Más de EDB (20)

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQL
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQL
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQL
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJ
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos données
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - Italiano
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Data Analysis with TensorFlow in PostgreSQL

  • 1. Data Analysis with TensorFlow in PostgreSQL
    Dave Page, 12 May 2021
  • 2. Dave Page
    ● EDB (CTO Office)
      ○ VP & Chief Architect, Database Infrastructure
    ● PostgreSQL
      ○ Core Team
      ○ pgAdmin Lead Developer
  • 3. In this talk...
    ● What are PostgreSQL, pl/python3 and TensorFlow?
    ● Why would I use them together?
    ● Examples of analysis types.
    ● Calling TensorFlow from PostgreSQL.
    ● Preparing data.
    ● Designing a network.
    ● Training a model.
    ● Performing analysis.
    2021 Copyright © EnterpriseDB Corporation All Rights Reserved
  • 5. What is PostgreSQL?
    50,000 foot overview
    ● Relational, SQL-based database.
    ● Fully enterprise ready; increasingly replacing Oracle, SQL Server, DB2 and more.
    ● Used in pretty much every sector: government, law enforcement, financial, healthcare…
    ● Possibly the most SQL Standard compliant database there is.
    ● Highly extensible:
      ○ Plugin extension modules.
      ○ Plugin procedural languages (e.g. Python, Perl, R, Java, v8).
      ○ Low level code hooks.
  • 6. What is pl/python3?
    50,000 foot overview
    ● Procedural language for PostgreSQL.
    ● Write stored procedures, functions and anonymous blocks within your database.
    ● Supports Python 3:
      ○ Don’t try to use pl/python, which uses the now-obsolete Python 2!
    ● The vast Python ecosystem of libraries may be used.
    ● Combines the power of Python with PostgreSQL.
  • 7. What is TensorFlow?
    50,000 foot overview
    ● Open Source Machine Learning library.
    ● Originated from the Google Brain team.
    ● Extremely powerful and flexible.
    ● Supports a variety of languages:
      ○ Python
      ○ C/C++
      ○ R
      ○ JavaScript
      ○ …
    ● Library of pre-built models and datasets.
    ● Supports distributed learning.
  • 8. Why?
    Not just for fun
    ● Our data is already in the database.
    ● We can easily use the power of SQL to choose and format data for analysis:
      ○ SQL is designed for working with datasets:
        ■ datum ~= scalar
        ■ tuple ~= vector
        ■ array/set ~= matrix/tensor
      ○ SELECT … FROM … WHERE …
      ○ Mathematical functions & operators: sqrt(), log(), power(), mod(), round()...
      ○ Aggregates and Window Functions, Common Table Expressions.
  • 10. Regression analysis
    ● Model relationships between input values (features) and outputs.
    ● Analyse new or hypothetical inputs and predict outputs.
    ● For example, house prices:
      ○ Inputs:
        ■ Number of bedrooms
        ■ Property type (detached, semi, flat etc.)
        ■ Property condition
        ■ Proximity to the beach
        ■ Proximity to major roads or a rail link to the city
        ■ Council tax cost
        ■ Number of nearby pubs serving CAMRA recommended beer
      ○ Output:
        ■ The price of the house
  • 11. Time series analysis
    ● Analyse time series data and make predictions.
    ● More powerful than linear analysis, predicting:
      ○ Linear trends (upwards or downwards)
      ○ Seasonal variability, e.g.
        ■ Summer is busier than winter.
        ■ Friday and Saturday night account for 60% of trade.
        ■ January is always the slowest month.
        ■ Multiple seasonalities can be predicted together.
      ○ Noise is inherently smoothed out, unless it overshadows trends and seasonal variations.
    ● Useful for multiple purposes:
      ○ Capacity management of application deployments.
      ○ Sales predictions.
      ○ Stock management.
  • 12. Other types of analysis
    Not covered in this talk!
    ● Text prediction/generation.
    ● Text classification.
    ● Image classification.
    ● Object detection.
    ● Audio analysis.
    ● Speech recognition.
    ● The list goes on!
  • 14. Setting up pl/python3
    ● Install PostgreSQL:
      ○ If using EDB installers, use StackBuilder to install the LanguagePack.
      ○ On Linux, install the pl/python3 package, e.g. on Debian/Ubuntu: postgresql-plpython3-13.
    ● Run psql or pgAdmin, and execute:
      ○ CREATE EXTENSION plpython3u;
  • 15. Setting up the Python environment
    ● Any Python libraries that will be used need to be added to the Python environment, using pip or the OS package manager:
      ○ On Linux, using the system Python:
        ■ sudo pip3 install <package 1> …
      ○ On macOS, using the EDB LanguagePack:
        ■ sudo /Library/edb/languagepack/v1/Python-3.7/bin/pip install <package 1> …
      ○ On Windows, using the EDB LanguagePack (as Administrator):
        ■ C:\edb\languagepack\v1\Python-3.7\bin\pip install <package 1> …
    ● Recommended starter packages:
      ○ tensorflow
      ○ numpy (will be installed automatically as a dependency of tensorflow)
      ○ pandas
      ○ matplotlib
      ○ seaborn
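Before calling TensorFlow from a database function, it can help to confirm that the interpreter can actually locate the starter packages. The following is a minimal sketch, assuming it is run with the same Python installation that pl/python3 uses; the helper name `missing_packages` is hypothetical and not part of the talk.

```python
import importlib.util


def missing_packages(names):
    """Return the names that the current interpreter cannot locate."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# The starter packages recommended on the slide.
starter = ["tensorflow", "numpy", "pandas", "matplotlib", "seaborn"]
print("Missing packages:", missing_packages(starter))
```

An empty list suggests the packages are visible to that interpreter; anything listed would still need to be installed with the matching pip.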
  • 16. A brief introduction to pl/python3
    A.K.A. Making sure it all works
  • 18. Preparing the data
    ● Cleanup:
      ○ Goal: maximise the accuracy of the model.
      ○ Method: eliminate data that might skew results.
      ○ Requires: analysis and understanding of existing data.
      ○ Applies mostly to regression analysis where we're trying to model a relationship, rather than time series.
    ● Multiple data sets:
      ○ Training data is used to teach the model.
      ○ Validation data is used during training to validate what has been learnt.
      ○ Test data is optionally used to test the model.
      ○ Training vs. validation data is typically randomly selected for regression analysis.
      ○ Training vs. validation data is typically sequential for time series analysis.
      ○ Ratio of training to validation (and test) data is usually skewed towards training, e.g. 3:1 or 4:1.
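The difference between a random split (regression) and a sequential split (time series) can be sketched in a few lines. This is an illustrative example only: the function name `split_rows`, its parameters and the fixed seed are assumptions, not code from the talk.

```python
import random


def split_rows(rows, validation_pct, shuffle, seed=0):
    """Split rows into (training, validation) lists.

    shuffle=True suits regression data; shuffle=False keeps time order,
    so the validation rows are the most recent ones.
    """
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    validation_rows = int(len(rows) / 100 * validation_pct)
    training_rows = len(rows) - validation_rows
    return rows[:training_rows], rows[training_rows:]


# 25% validation gives the 3:1 ratio mentioned on the slide.
training, validation = split_rows(range(100), 25, shuffle=False)
print(len(training), len(validation))  # 75 25
```

For a regression data set the same call with `shuffle=True` would pick the validation rows at random instead of taking the tail of the data.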
  • 19. Correlations
    Analysis
    ● Some features have stronger correlations to the output than others.
    ● We can exclude uncorrelated or loosely correlated features to simplify the neural network (model) and increase accuracy.

    NOTICE: Correlation data:
                 crim        zn     indus      chas       nox        rm       age       dis       rad       tax   ptratio         b     lstat      medv
    crim     1.000000 -0.200469  0.406583 -0.055892  0.420972 -0.219247  0.352734 -0.379670  0.625505  0.582764  0.289946 -0.385064  0.455621 -0.388305
    zn      -0.200469  1.000000 -0.533828 -0.042697 -0.516604  0.311991 -0.569537  0.664408 -0.311948 -0.314563 -0.391679  0.175520 -0.412995  0.360445
    indus    0.406583 -0.533828  1.000000  0.062938  0.763651 -0.391676  0.644779 -0.708027  0.595129  0.720760  0.383248 -0.356977  0.603800 -0.483725
    chas    -0.055892 -0.042697  0.062938  1.000000  0.091203  0.091251  0.086518 -0.099176 -0.007368 -0.035587 -0.121515  0.048788 -0.053929  0.175260
    nox      0.420972 -0.516604  0.763651  0.091203  1.000000 -0.302188  0.731470 -0.769230  0.611441  0.668023  0.188933 -0.380051  0.590879 -0.427321
    rm      -0.219247  0.311991 -0.391676  0.091251 -0.302188  1.000000 -0.240265  0.205246 -0.209847 -0.292048 -0.355501  0.128069 -0.613808  0.695360
    age      0.352734 -0.569537  0.644779  0.086518  0.731470 -0.240265  1.000000 -0.747881  0.456022  0.506456  0.261515 -0.273534  0.602339 -0.376955
    dis     -0.379670  0.664408 -0.708027 -0.099176 -0.769230  0.205246 -0.747881  1.000000 -0.494588 -0.534432 -0.232471  0.291512 -0.496996  0.249929
    rad      0.625505 -0.311948  0.595129 -0.007368  0.611441 -0.209847  0.456022 -0.494588  1.000000  0.910228  0.464741 -0.444413  0.488676 -0.381626
    tax      0.582764 -0.314563  0.720760 -0.035587  0.668023 -0.292048  0.506456 -0.534432  0.910228  1.000000  0.460853 -0.441808  0.543993 -0.468536
    ptratio  0.289946 -0.391679  0.383248 -0.121515  0.188933 -0.355501  0.261515 -0.232471  0.464741  0.460853  1.000000 -0.177383  0.374044 -0.507787
    b       -0.385064  0.175520 -0.356977  0.048788 -0.380051  0.128069 -0.273534  0.291512 -0.444413 -0.441808 -0.177383  1.000000 -0.366087  0.333461
    lstat    0.455621 -0.412995  0.603800 -0.053929  0.590879 -0.613808  0.602339 -0.496996  0.488676  0.543993  0.374044 -0.366087  1.000000 -0.737663
    medv    -0.388305  0.360445 -0.483725  0.175260 -0.427321  0.695360 -0.376955  0.249929 -0.381626 -0.468536 -0.507787  0.333461 -0.737663  1.000000
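Dropping loosely correlated features can be sketched as a simple filter on the output column of the correlation matrix. The values below are the medv correlations from the matrix above; the function name `select_features` and the 0.3 cut-off are illustrative assumptions, not values from the talk.

```python
# Correlation of each feature with the output (medv), copied from the
# correlation matrix on the slide.
corr_with_medv = {
    "crim": -0.388305, "zn": 0.360445, "indus": -0.483725,
    "chas": 0.175260, "nox": -0.427321, "rm": 0.695360,
    "age": -0.376955, "dis": 0.249929, "rad": -0.381626,
    "tax": -0.468536, "ptratio": -0.507787, "b": 0.333461,
    "lstat": -0.737663,
}


def select_features(correlations, threshold):
    """Keep features whose absolute correlation meets the threshold."""
    return [name for name, r in correlations.items() if abs(r) >= threshold]


# The 0.3 cut-off is an assumption chosen for illustration only.
print(select_features(corr_with_medv, 0.3))
```

With this cut-off, chas and dis would be excluded, while strongly correlated features such as lstat and rm are kept. A sensible threshold depends on the data and is usually found by experiment.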
  • 20. Eliminating outliers
    Analysis
    ● Outlier values in the training/validation data can make it harder to build an accurate model.
    ● Analyse the input features and automatically remove rows with outliers using an algorithm such as interquartile range (IQR), i.e. those values that lie more than 1.5 × IQR below the first quartile or above the third quartile:

    NOTICE: Outliers detected using IQR:
    row     crim      zn   indus    chas     nox      rm     age     dis     rad     tax ptratio       b   lstat    medv
    0      False   False   False   False   False   False   False   False   False   False   False   False   False   False
    1      False   False   False   False   False   False   False   False   False   False   False   False   False   False
    2      False   False   False   False   False   False   False   False   False   False   False   False   False   False
    3      False   False   False   False   False   False   False   False   False   False   False   False   False   False
    ...
    18     False   False   False   False   False   False   False   False   False   False   False    True   False   False
    19     False   False   False   False   False   False   False   False   False   False   False   False   False   False
    ...
  • 21. Eliminating outliers
    Example code
        # Outlier detection
        # Note: 'data' is a Pandas dataframe containing our raw data
        Q1 = data.quantile(0.25)
        Q3 = data.quantile(0.75)
        IQR = Q3 - Q1
        # True wherever a value lies beyond the 1.5 * IQR fences
        outliers = (data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR))
        plpy.notice('Outliers detected using IQR:\n{}\n'.format(outliers))

        # Outlier removal: drop every row with an outlier in any column
        plpy.notice('Removing outliers...')
        data = data[~outliers.any(axis=1)]
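The IQR fences can also be worked through on a small list using only the standard library, which may make the rule easier to follow. This is a sketch with made-up sample values; the helper name `iqr_fences` is hypothetical.

```python
import statistics


def iqr_fences(values):
    """Return the (lower, upper) fences used by the 1.5 * IQR rule."""
    # method="inclusive" interpolates linearly between the data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


values = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # 100 is an obvious outlier
lower, upper = iqr_fences(values)
kept = [v for v in values if lower <= v <= upper]
print(lower, upper, kept)  # -3.0 13.0 [1, 2, 3, 4, 5, 6, 7, 8]
```

Here Q1 is 3 and Q3 is 7, so the IQR is 4 and anything outside the range -3 to 13 is treated as an outlier.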
  • 22. Visualisation
    Everyone likes a pretty picture
  • 23. Creating data sets
    Example code
        # Figure out how many rows to use for training, validation and test
        test_rows = int((actual_rows/100) * test_pct)
        validation_rows = int((actual_rows/100) * validation_pct)
        training_rows = actual_rows - test_rows - validation_rows

        # Split the data into input and output dataframes (the last column is the output)
        input = data[columns[:-1]]
        output = data[columns[-1:]]

        # Split the input and output into training, validation and test sets
        training_input = input[:training_rows]
        training_output = output[:training_rows]
        validation_input = input[training_rows:training_rows+validation_rows]
        validation_output = output[training_rows:training_rows+validation_rows]
        test_input = input[training_rows+validation_rows:]
        test_output = output[training_rows+validation_rows:]
  • 25. Designing a model
    ● A model is an interconnected layered network of known mathematical functions with trainable parameters (or filters); a.k.a. a neural network.
    ● Different model architectures are suited to different types of task:
      ○ Regression might use a simple network with multiple layers:
        ■ The number of input filters matches the number of input features.
        ■ Inner layers can be constructed as desired for best results; often based on trial and error and experience.
        ■ The number of output filters matches the number of outputs.
        ■ Layers are dense; an activation function allows modelling of non-linear functions.
      ○ The WaveNet architecture is well suited to time series analysis, despite being originally designed for audio analysis:
        ■ A single filter on the input layer.
        ■ Multiple layers of filters with increasing dilation to detect seasonal patterns, e.g. 2, 4, 8, 16, 32.
        ■ A single filter on the output layer.
        ■ Layers are convolutional; all filters in one layer connect to all filters in the next.
  • 26. Creating the model
    Regression analysis
        import tensorflow as tf

        # Define the model
        # 2 layers of 13 filters for the input features, and one layer of one filter for the output
        # Note: input_shape must match the number of input feature columns
        l1 = tf.keras.layers.Dense(units=13, input_shape=(2,), activation='relu')
        l2 = tf.keras.layers.Dense(units=13, activation='relu')
        l3 = tf.keras.layers.Dense(units=1)
        model = tf.keras.Sequential([l1, l2, l3])

        # Compile it
        model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer='adam')
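To get a feel for what "trainable parameters" means for a dense network like this, the weights and biases can be counted by hand. The sketch below is plain arithmetic rather than a TensorFlow call; the helper name `dense_parameter_count` is hypothetical.

```python
def dense_parameter_count(input_features, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    inputs = input_features
    for units in layer_units:
        total += inputs * units + units  # weights plus one bias per unit
        inputs = units  # this layer's outputs feed the next layer
    return total


# 2 inputs, then layers of 13, 13 and 1 units, as in the slide's code.
print(dense_parameter_count(2, [13, 13, 1]))  # 235
```

That is 39 parameters in the first layer, 182 in the second and 14 in the output layer. Widening the inner layers grows the count quickly, which is one reason to drop weakly correlated input features.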
  • 27. Creating the model
    Time series analysis
        from tensorflow import keras

        # Define the model
        model = keras.models.Sequential()

        # Input layer
        model.add(keras.layers.InputLayer(input_shape=[None, 1]))

        # Add multiple 1D convolutional layers with increasing dilation rates to
        # allow each layer to detect patterns over longer time frequencies
        for dilation_rate in (1, 2, 4, 8, 16, 32):
            model.add(keras.layers.Conv1D(filters=32, kernel_size=2, strides=1,
                                          dilation_rate=dilation_rate,
                                          padding="causal", activation="relu"))

        # Add one output layer, with 1 filter to give us one output per time step
        model.add(keras.layers.Conv1D(filters=1, kernel_size=1))

        # Create a learning optimiser and compile the model
        # (learning_rate replaces the deprecated lr argument)
        optimizer = keras.optimizers.Adam(learning_rate=3e-4)
        model.compile(loss=keras.losses.Huber(), optimizer=optimizer, metrics=["mae"])
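The increasing dilation rates determine how far back in time each output step can "see". A small sketch of that arithmetic for a stack of stride-1 dilated convolutions follows; the helper name `receptive_field` is hypothetical.

```python
def receptive_field(kernel_size, dilation_rates):
    """Number of input time steps that influence one output step."""
    # Each stride-1 layer extends the window by (kernel_size - 1) * dilation.
    return 1 + sum((kernel_size - 1) * rate for rate in dilation_rates)


# kernel_size=2 with the dilation rates used in the slide's code.
print(receptive_field(2, (1, 2, 4, 8, 16, 32)))  # 64
```

With these settings each prediction depends on the previous 64 time steps, so a seasonal pattern with a longer period would presumably need more layers or larger dilation rates.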
  • 29. Training the model
    ● Training is repeated multiple times (or epochs), hopefully improving each time:
      ○ The training data set is used for learning.
      ○ The validation data set is used to validate results during training.
      ○ The test data is optionally used to test the model after training.
    ● We monitor a metric to assess how well the network is learning:
      ○ For regression, I've had success with Mean Squared Error (which I monitor as Root Mean Squared Error).
      ○ For time series, Huber loss works well (it's less sensitive to outliers than MSE).
    ● A callback is used to checkpoint (save) the model each time we see a better result than in any previous epoch.
    ● With regression analysis, we use an 'early stopping' callback to exit the training epoch loop when no further significant improvement is made, to prevent the network learning the training data rather than the mathematical relationship.
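The claim that Huber loss is less sensitive to outliers than MSE can be checked on a few made-up prediction errors. This is a plain Python sketch of the two formulas, not the Keras implementation; delta = 1.0 is assumed here, and the function names are hypothetical.

```python
from math import sqrt


def mse(errors):
    """Mean squared error of a list of prediction errors."""
    return sum(e * e for e in errors) / len(errors)


def huber(errors, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    def one(e):
        a = abs(e)
        return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)
    return sum(one(e) for e in errors) / len(errors)


errors = [0.5, -0.5, 10.0]  # the last error is an outlier
print("MSE:", mse(errors), "RMSE:", sqrt(mse(errors)), "Huber:", huber(errors))
```

The single outlier contributes 100 to the squared-error sum but only 9.5 to the Huber sum, so it dominates the MSE far more than the Huber loss.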
  • 30. Training the model
    Regression analysis

        from math import sqrt
        from tensorflow.keras.callbacks import (EarlyStopping, LambdaCallback,
                                                ModelCheckpoint)

        # Save a checkpoint each time our loss metric improves.
        checkpoint = ModelCheckpoint("checkpoint.h5", save_best_only=True)

        # Use early stopping
        early_stopping = EarlyStopping(patience=50)

        # Display output. This would go to stdout automatically if we weren't using pl/python
        logger = LambdaCallback(
            on_epoch_end=lambda epoch, logs: plpy.notice(
                'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format(
                    epoch,
                    sqrt(logs['loss']),
                    round(100 / max_z * sqrt(logs['loss']), 5),
                    sqrt(logs['val_loss']),
                    round(100 / max_z * sqrt(logs['val_loss']), 5))))

        # Train it!
        history = model.fit(training_input, training_output,
                            validation_data=(validation_input, validation_output),
                            epochs=epochs,
                            verbose=False,
                            batch_size=50,
                            callbacks=[logger, checkpoint, early_stopping])
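The logging callback above turns the MSE loss into an RMSE and then expresses it as a percentage of `max_z`. A minimal sketch of that arithmetic follows, assuming `max_z` is the largest value of the output column (the slide does not show its definition); the function name `rmse_report` and the sample figures are made up for illustration.

```python
from math import sqrt

# Convert an MSE loss into an RMSE and a percentage of the output range.
# rmse_report is a hypothetical helper; max_output stands in for max_z,
# which is assumed to be the largest value of the output column.

def rmse_report(mse, max_output):
    """Return (rmse, rmse as a percentage of max_output)."""
    rmse = sqrt(mse)
    percent = round(100 / max_output * rmse, 5)
    return rmse, percent

# Hypothetical figures: an MSE of 4.0 on outputs that peak at 50.0
rmse, percent = rmse_report(4.0, 50.0)
print(rmse, percent)  # 2.0 4.0
```

Reporting the error relative to the output range makes the figures easier to compare between data sets with different scales.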
  • 31. Training the model
    Time series analysis

        from math import sqrt
        from tensorflow import keras

        # Save checkpoints when we get the best model
        model_checkpoint = keras.callbacks.ModelCheckpoint("checkpoint.h5", save_best_only=True)

        # Use early stopping to prevent overfitting
        early_stopping = keras.callbacks.EarlyStopping(patience=50)

        # Display output. This would go to stdout automatically if we weren't using pl/python
        # Note: this model is compiled with Huber loss, so sqrt(loss) is only a
        # rough error figure here rather than a true RMSE.
        logger = keras.callbacks.LambdaCallback(
            on_epoch_end=lambda epoch, logs: plpy.notice(
                'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format(
                    epoch,
                    sqrt(logs['loss']),
                    round(100 / max_z * sqrt(logs['loss']), 5),
                    sqrt(logs['val_loss']),
                    round(100 / max_z * sqrt(logs['val_loss']), 5))))

        # Train it!
        history = model.fit(train_set, epochs=100,
                            validation_data=valid_set,
                            callbacks=[early_stopping, logger, model_checkpoint])
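The interplay of checkpointing and early stopping can be illustrated without TensorFlow. The sketch below simulates an epoch loop over a pre-computed list of validation losses; it is a simplified illustration of the idea, not the Keras implementation, and the function name, the loss values and the small `patience` are made up for the example.

```python
# Simulate "save the best model" and "stop after `patience` epochs without
# improvement". train_with_early_stopping is a hypothetical helper.

def train_with_early_stopping(val_losses, patience):
    """Return (best_epoch, best_loss, stopped_epoch) for the given losses."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    stopped_epoch = None  # stays None if training runs to completion
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # "Checkpoint": remember the best epoch seen so far.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stopped_epoch = epoch
                break
    return best_epoch, best_loss, stopped_epoch

# Hypothetical validation losses, one per epoch.
losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.43, 0.39]
result = train_with_early_stopping(losses, patience=3)
print(result)  # (2, 0.4, 5)
```

Training stops at epoch 5, so the slightly better loss at epoch 6 is never reached. That is the trade-off a generous setting such as `patience=50` is meant to reduce.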
  • 32. Use once vs. use many
    ● Each model is trained with a specific data set.
    ● With regression analysis, we can re-use a model with any input features to predict an output:
        ○ In practice, this means we might use the model repeatedly over time to model different inputs.
    ● With time series analysis, a model is tied to the series and timeframe it was trained on, so it cannot readily be reused to predict different timeframes:
        ○ In practice, this means we might only use a model once when performing time series analysis.
    ● Models can be 're-trained' as new data becomes available:
        ○ If the data distribution has changed, the model might degrade.
        ○ It may be preferable to re-train from scratch.
    ● For complex problems, it may be useful to start with a suitable pre-trained generic model, and continue training with specific data:
        ○ This is known as transfer learning.
  • 33. Using
  • 34. Using the model
    Regression analysis

        CREATE OR REPLACE FUNCTION public.rg_analysis(
            input_values double precision[],
            model_path text)
            RETURNS double precision[]
            LANGUAGE 'plpython3u'
        AS $BODY$
        import tensorflow as tf

        # Reset everything
        tf.keras.backend.clear_session()
        tf.random.set_seed(42)

        # Load the model from the path passed to the function
        model = tf.keras.models.load_model(model_path)

        # Are we dealing with a single prediction,
        # or a list of them?
        if not any(isinstance(sub, list) for sub in input_values):
            data = [input_values]
        else:
            data = input_values

        # Make the prediction(s)
        result = model.predict([data])[0]

        # Flatten the per-row outputs into a single list
        result = [item for elem in result for item in elem]

        return result
        $BODY$;
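The function above accepts either one set of input features or a list of them, and returns a flat list of predictions. That input and output handling can be tried on its own, without TensorFlow or PostgreSQL. In the sketch below, the helper names `as_batch` and `flatten` are made up for illustration, and the nested list passed to `flatten` merely stands in for the output of `model.predict()`.

```python
# Normalise input to a list of rows, and flatten per-row outputs.
# as_batch and flatten are hypothetical helpers mirroring the logic above.

def as_batch(input_values):
    """Wrap a single row of features in a list; leave a list of rows as is."""
    if not any(isinstance(sub, list) for sub in input_values):
        return [input_values]
    return input_values

def flatten(rows):
    """Turn [[a], [b], ...] into [a, b, ...]."""
    return [item for elem in rows for item in elem]

single = as_batch([1.0, 2.0])
many = as_batch([[1.0, 2.0], [3.0, 4.0]])
flat = flatten([[0.1], [0.2]])  # stand-in for the model's per-row outputs
print(single, many, flat)
```

Either way the model always receives a batch, so a single prediction is simply a batch of one row.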
  • 35. Using the model
    Time series analysis

        import numpy as np
        import matplotlib.pyplot as plt
        from tensorflow import keras

        # model_forecast() and plot_series() are helper functions that are
        # not shown on this slide.

        # Load the best model from the last checkpoint
        model = keras.models.load_model("checkpoint.h5")

        # Forecast over the whole series, then keep the validation period
        cnn_forecast = model_forecast(model, series[..., np.newaxis], window_size)
        cnn_forecast = cnn_forecast[train_samples - window_size:-1, -1, 0]

        # Plot each series, padded with NaN outside its own period
        plt.figure(figsize=(10, 6))
        plot_series(dates,
                    np.concatenate([series[:train_samples],
                                    np.full(valid_samples, None, dtype=float)]),
                    label="Training Data")
        plot_series(dates,
                    np.concatenate([np.full(train_samples, None, dtype=float),
                                    series[train_samples:]]),
                    label="Validation Data")
        plot_series(dates,
                    np.concatenate([np.full(train_samples, None, dtype=float),
                                    cnn_forecast]),
                    label="Forecast Data")
        plt.savefig('ts_analysis.png')
  • 37. Summary
    In this talk, we:
    ● Introduced PostgreSQL, TensorFlow and pl/python3.
    ● Discussed why we might use them together.
    ● Introduced two (of many) types of analysis we can perform:
        ○ Regression.
        ○ Time Series.
    ● Showed how we can call TensorFlow from PostgreSQL using pl/python3.
    ● Walked through the main steps of performing an analysis, considering regression and time series problems:
        ○ Preparing the data.
        ○ Creating a model.
        ○ Training the model.
        ○ Using the model.
  • 38. Questions and resources
    Questions?
    ● EDB blog, includes posts on machine learning and other topics:
        ○ https://www.enterprisedb.com/dave-page
    ● Experimental code from my ML/AI journey:
        ○ https://github.com/dpage/ml-experiments
    ● Other resources:
        ○ https://www.postgresql.org
        ○ https://www.tensorflow.org
        ○ https://www.postgresql.org/docs/current/plpython.html
        ○ https://pandas.pydata.org
        ○ https://numpy.org
        ○ https://matplotlib.org
        ○ https://seaborn.pydata.org