SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
© 2ndQuadrant 2016
Big Data & PostgreSQL
Using TABLESAMPLE to Analyze
Very Large Datasets
By Umair Shahid
© 2ndQuadrant 2016
Who am I?
● Got “pushed” into PostgreSQL in 2004, ended
up falling in love with it
● Not a hardcore techie, yet passionate about
open source software
● Heading the productization efforts at
2ndQuadrant
● Interested in Big Data, specifically the newer
PostgreSQL features supporting it
© 2ndQuadrant 2016
What is the problem?
Number of Rows Size on Disk (MB) Time Taken (ms)
1k 0.23 219.706
100k 24 1,302.135
1M 195 7,696.386
5M 951 40,691.603
10M 1,923 60,012.457
100M 19,456 801,493.319
© 2ndQuadrant 2016
Why is this significant?
● Data mining has typically been a painful process
● Major contributor to the pain has been the time it
takes for queries to return
● Many false steps before the required data is
identified
● Waiting time is wasted time
● Sampling, count based or time based, reduces
the wasted time significantly
© 2ndQuadrant 2016
What is TABLESAMPLE?
● Ability to read a random sample of
data in a table
● Defined in SQL:2003 (5th revision of
SQL)
● Implemented in PostgreSQL 9.5
© 2ndQuadrant 2016
Syntax
SELECT select_expression
FROM table_name
TABLESAMPLE sampling_method ( argument [, ...] )
[ REPEATABLE ( seed ) ]
...
© 2ndQuadrant 2016
sampling_method
● argument is percentage of rows
● SYSTEM
○ Block level sampling
○ Very fast
○ Non-independent rows
● BERNOULLI
○ Row level sampling
○ Slower than SYSTEM
○ Independent rows (uniformly random)
© 2ndQuadrant 2016
© 2ndQuadrant 2016
Demo sampling methods
© 2ndQuadrant 2016
REPEATABLE results
● (Reminder: [ REPEATABLE ( seed ) ])
● Optional argument
● Used if random, yet repeatable results are
required
● seed and argument need to be the same to
produce repeatable results
● Any changes made to the table will result in a
different data set
© 2ndQuadrant 2016
Now it gets interesting …
● TABLESAMPLE allows for additional sampling methods
via extensions
● tsm_system_time specifies max number of
milliseconds to spend reading a table
● Implements the syntax:
SELECT select_expression
FROM table_name
TABLESAMPLE SYSTEM_TIME (argument)
© 2ndQuadrant 2016
Demo tsm_system_time
© 2ndQuadrant 2016
Enter Orange ...
● Funded by AXLE (http:
//axleproject.eu)
● Same project funded
TABLESAMPLE
● Available integrated
with PostgreSQL in
2UDA (http:
//2ndquadrant.
com/2uda)
● Uses TABLESAMPLE
to very quickly create
visualizations for data
● Can quickly create
predictive models
© 2ndQuadrant 2016
Demo Orange
You can find a very helpful tutorial at
http://2ndquadrant.com/2uda
© 2ndQuadrant 2016
Other Big Data features in PostgreSQL
● JSON & JSONB
● HSTORE
● XML
● Scale-out by partitioning
○ Check out Postgres-XL (http://www.
postgres-xl.org/)
● etc ...
© 2ndQuadrant 2016
Umair Shahid
Email: umair.shahid@2ndQuadrant.com
Twitter: @pg_umair
2ndQuadrant is hiring - All geographies!
Thank you for your time!

Más contenido relacionado

La actualidad más candente

[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to OpportunityEqunix Business Solutions
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLPGConf APAC
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1PgTraining
 
PostgreSQL Rocks Indonesia
PostgreSQL Rocks IndonesiaPostgreSQL Rocks Indonesia
PostgreSQL Rocks IndonesiaPGConf APAC
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PGConf APAC
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesTanel Poder
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178Kai Sasaki
 
Oracle to PostgreSQL migration
Oracle to PostgreSQL migrationOracle to PostgreSQL migration
Oracle to PostgreSQL migrationstrikr .
 
Simple Works Best
 Simple Works Best Simple Works Best
Simple Works BestEDB
 
Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesPGConf APAC
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneEnkitec
 
SQL Server In-Memory OLTP: What Every SQL Professional Should Know
SQL Server In-Memory OLTP: What Every SQL Professional Should KnowSQL Server In-Memory OLTP: What Every SQL Professional Should Know
SQL Server In-Memory OLTP: What Every SQL Professional Should KnowBob Ward
 
Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in CloudInMobi Technology
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_casewyukawa
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder
 
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan PachenkoPGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan PachenkoEqunix Business Solutions
 
PostgreSQL on AWS: Tips & Tricks (and horror stories)
PostgreSQL on AWS: Tips & Tricks (and horror stories)PostgreSQL on AWS: Tips & Tricks (and horror stories)
PostgreSQL on AWS: Tips & Tricks (and horror stories)Alexander Kukushkin
 
Inside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPInside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPBob Ward
 

La actualidad más candente (20)

[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQL
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1
 
PostgreSQL Rocks Indonesia
PostgreSQL Rocks IndonesiaPostgreSQL Rocks Indonesia
PostgreSQL Rocks Indonesia
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
Oracle to PostgreSQL migration
Oracle to PostgreSQL migrationOracle to PostgreSQL migration
Oracle to PostgreSQL migration
 
Case Studies on PostgreSQL
Case Studies on PostgreSQLCase Studies on PostgreSQL
Case Studies on PostgreSQL
 
Simple Works Best
 Simple Works Best Simple Works Best
Simple Works Best
 
Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst Practices
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
SQL Server In-Memory OLTP: What Every SQL Professional Should Know
SQL Server In-Memory OLTP: What Every SQL Professional Should KnowSQL Server In-Memory OLTP: What Every SQL Professional Should Know
SQL Server In-Memory OLTP: What Every SQL Professional Should Know
 
Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in Cloud
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata Migrations
 
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan PachenkoPGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
 
TPC-H in MongoDB
TPC-H in MongoDBTPC-H in MongoDB
TPC-H in MongoDB
 
PostgreSQL on AWS: Tips & Tricks (and horror stories)
PostgreSQL on AWS: Tips & Tricks (and horror stories)PostgreSQL on AWS: Tips & Tricks (and horror stories)
PostgreSQL on AWS: Tips & Tricks (and horror stories)
 
Inside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTPInside SQL Server In-Memory OLTP
Inside SQL Server In-Memory OLTP
 

Destacado

(Ab)using 4d Indexing
(Ab)using 4d Indexing(Ab)using 4d Indexing
(Ab)using 4d IndexingPGConf APAC
 
Use Case: PostGIS and Agribotics
Use Case: PostGIS and AgriboticsUse Case: PostGIS and Agribotics
Use Case: PostGIS and AgriboticsPGConf APAC
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollPGConf APAC
 
PostgreSQL on Amazon RDS
PostgreSQL on Amazon RDSPostgreSQL on Amazon RDS
PostgreSQL on Amazon RDSPGConf APAC
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native CompilationPGConf APAC
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!PGConf APAC
 
PostgreSQL: Past present Future
PostgreSQL: Past present FuturePostgreSQL: Past present Future
PostgreSQL: Past present FuturePGConf APAC
 
Swapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgrSwapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgrPGConf APAC
 
There is Javascript in my SQL
There is Javascript in my SQLThere is Javascript in my SQL
There is Javascript in my SQLPGConf APAC
 
Introduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDIntroduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDPGConf APAC
 
Security Best Practices for your Postgres Deployment
Security Best Practices for your Postgres DeploymentSecurity Best Practices for your Postgres Deployment
Security Best Practices for your Postgres DeploymentPGConf APAC
 
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data AnalyticsSupersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analyticsmason_s
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresqlZaid Shabbir
 
What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6EDB
 
Useful PostgreSQL Extensions
Useful PostgreSQL ExtensionsUseful PostgreSQL Extensions
Useful PostgreSQL ExtensionsEDB
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA EDB
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 
Managing replication of PostgreSQL (Simon Riggs)
Managing replication of PostgreSQL (Simon Riggs)Managing replication of PostgreSQL (Simon Riggs)
Managing replication of PostgreSQL (Simon Riggs)Ontico
 
Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql Soumya Ranjan Subudhi
 

Destacado (20)

(Ab)using 4d Indexing
(Ab)using 4d Indexing(Ab)using 4d Indexing
(Ab)using 4d Indexing
 
Use Case: PostGIS and Agribotics
Use Case: PostGIS and AgriboticsUse Case: PostGIS and Agribotics
Use Case: PostGIS and Agribotics
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
 
PostgreSQL on Amazon RDS
PostgreSQL on Amazon RDSPostgreSQL on Amazon RDS
PostgreSQL on Amazon RDS
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native Compilation
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
 
PostgreSQL: Past present Future
PostgreSQL: Past present FuturePostgreSQL: Past present Future
PostgreSQL: Past present Future
 
Swapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgrSwapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgr
 
There is Javascript in my SQL
There is Javascript in my SQLThere is Javascript in my SQL
There is Javascript in my SQL
 
Introduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDIntroduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XID
 
Security Best Practices for your Postgres Deployment
Security Best Practices for your Postgres DeploymentSecurity Best Practices for your Postgres Deployment
Security Best Practices for your Postgres Deployment
 
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data AnalyticsSupersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
 
What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6
 
Useful PostgreSQL Extensions
Useful PostgreSQL ExtensionsUseful PostgreSQL Extensions
Useful PostgreSQL Extensions
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
Managing replication of PostgreSQL (Simon Riggs)
Managing replication of PostgreSQL (Simon Riggs)Managing replication of PostgreSQL (Simon Riggs)
Managing replication of PostgreSQL (Simon Riggs)
 
Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql
 

Similar a Big Data and PostgreSQL

Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedShubham Tagra
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedShubham Tagra
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Distributed System explained (with Java Microservices)
Distributed System explained (with Java Microservices)Distributed System explained (with Java Microservices)
Distributed System explained (with Java Microservices)Mario Romano
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...NETWAYS
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overviewjoeyrobert
 
Tubular Labs - Using Elastic to Search Over 2.5B Videos
Tubular Labs - Using Elastic to Search Over 2.5B VideosTubular Labs - Using Elastic to Search Over 2.5B Videos
Tubular Labs - Using Elastic to Search Over 2.5B VideosTubular Labs
 
Improving DragonFly's performance with PostgreSQL by Francois Tigeot
Improving DragonFly's performance with PostgreSQL by Francois TigeotImproving DragonFly's performance with PostgreSQL by Francois Tigeot
Improving DragonFly's performance with PostgreSQL by Francois Tigeoteurobsdcon
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
Experiences testing dev versions of MySQL and why it is good for you
Experiences testing dev versions of MySQL and why it is good for youExperiences testing dev versions of MySQL and why it is good for you
Experiences testing dev versions of MySQL and why it is good for youSimon J Mudd
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQueryRyuji Tamagawa
 
Travelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian WideraTravelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian WideraITCamp
 
Scaling FreeSWITCH Performance
Scaling FreeSWITCH PerformanceScaling FreeSWITCH Performance
Scaling FreeSWITCH PerformanceMoises Silva
 
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mydbops
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 

Similar a Big Data and PostgreSQL (20)

Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructureApplication of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speed
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speed
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Distributed System explained (with Java Microservices)
Distributed System explained (with Java Microservices)Distributed System explained (with Java Microservices)
Distributed System explained (with Java Microservices)
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overview
 
Tubular Labs - Using Elastic to Search Over 2.5B Videos
Tubular Labs - Using Elastic to Search Over 2.5B VideosTubular Labs - Using Elastic to Search Over 2.5B Videos
Tubular Labs - Using Elastic to Search Over 2.5B Videos
 
Improving DragonFly's performance with PostgreSQL by Francois Tigeot
Improving DragonFly's performance with PostgreSQL by Francois TigeotImproving DragonFly's performance with PostgreSQL by Francois Tigeot
Improving DragonFly's performance with PostgreSQL by Francois Tigeot
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
Experiences testing dev versions of MySQL and why it is good for you
Experiences testing dev versions of MySQL and why it is good for youExperiences testing dev versions of MySQL and why it is good for you
Experiences testing dev versions of MySQL and why it is good for you
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQuery
 
Travelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian WideraTravelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian Widera
 
Scaling FreeSWITCH Performance
Scaling FreeSWITCH PerformanceScaling FreeSWITCH Performance
Scaling FreeSWITCH Performance
 
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
Lecture 2a
Lecture 2aLecture 2a
Lecture 2a
 

Más de PGConf APAC

PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...PGConf APAC
 
PGConf APAC 2018: PostgreSQL 10 - Replication goes Logical
PGConf APAC 2018: PostgreSQL 10 - Replication goes LogicalPGConf APAC 2018: PostgreSQL 10 - Replication goes Logical
PGConf APAC 2018: PostgreSQL 10 - Replication goes LogicalPGConf APAC
 
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQLPGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQLPGConf APAC
 
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC
 
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...PGConf APAC
 
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018PGConf APAC
 
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companion
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companionPGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companion
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companionPGConf APAC
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodbPGConf APAC
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC
 
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQLPGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQLPGConf APAC
 
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...PGConf APAC
 
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...PGConf APAC
 
PGConf APAC 2018 - PostgreSQL performance comparison in various clouds
PGConf APAC 2018 - PostgreSQL performance comparison in various cloudsPGConf APAC 2018 - PostgreSQL performance comparison in various clouds
PGConf APAC 2018 - PostgreSQL performance comparison in various cloudsPGConf APAC
 
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...PGConf APAC
 
PGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from TrenchesPGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from TrenchesPGConf APAC
 
PGConf APAC 2018 Keynote: PostgreSQL goes eleven
PGConf APAC 2018 Keynote: PostgreSQL goes elevenPGConf APAC 2018 Keynote: PostgreSQL goes eleven
PGConf APAC 2018 Keynote: PostgreSQL goes elevenPGConf APAC
 
Amazon (AWS) Aurora
Amazon (AWS) AuroraAmazon (AWS) Aurora
Amazon (AWS) AuroraPGConf APAC
 

Más de PGConf APAC (17)

PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...
 
PGConf APAC 2018: PostgreSQL 10 - Replication goes Logical
PGConf APAC 2018: PostgreSQL 10 - Replication goes LogicalPGConf APAC 2018: PostgreSQL 10 - Replication goes Logical
PGConf APAC 2018: PostgreSQL 10 - Replication goes Logical
 
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQLPGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
 
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQLPGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
 
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...
 
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018
 
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companion
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companionPGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companion
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companion
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
 
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQLPGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
 
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
 
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...
 
PGConf APAC 2018 - PostgreSQL performance comparison in various clouds
PGConf APAC 2018 - PostgreSQL performance comparison in various cloudsPGConf APAC 2018 - PostgreSQL performance comparison in various clouds
PGConf APAC 2018 - PostgreSQL performance comparison in various clouds
 
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...
 
PGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from TrenchesPGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from Trenches
 
PGConf APAC 2018 Keynote: PostgreSQL goes eleven
PGConf APAC 2018 Keynote: PostgreSQL goes elevenPGConf APAC 2018 Keynote: PostgreSQL goes eleven
PGConf APAC 2018 Keynote: PostgreSQL goes eleven
 
Amazon (AWS) Aurora
Amazon (AWS) AuroraAmazon (AWS) Aurora
Amazon (AWS) Aurora
 

Último

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Último (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Big Data and PostgreSQL

  • 1. © 2ndQuadrant 2016 Big Data & PostgreSQL Using TABLESAMPLE to Analyze Very Large Datasets By Umair Shahid
  • 2. © 2ndQuadrant 2016 Who am I? ● Got “pushed” into PostgreSQL in 2004, ended up falling in love with it ● Not a hardcore techie, yet passionate about open source software ● Heading the productization efforts at 2ndQuadrant ● Interested in Big Data, specifically the newer PostgreSQL features supporting it
  • 3. © 2ndQuadrant 2016 What is the problem? Number of Rows Size on Disk (MB) Time Taken (ms) 1k 0.23 219.706 100k 24 1,302.135 1M 195 7,696.386 5M 951 40,691.603 10M 1,923 60,012.457 100M 19,456 801,493.319
  • 4. © 2ndQuadrant 2016 Why is this significant? ● Data mining has typically been a painful process ● Major contributor to the pain has been the time it takes for queries to return ● Many false steps before the required data is identified ● Waiting time is wasted time ● Sampling, count based or time based, reduces the wasted time significantly
  • 5. © 2ndQuadrant 2016 What is TABLESAMPLE? ● Ability to read a random sample of data in a table ● Defined in SQL:2003 (5th revision of SQL) ● Implemented in PostgreSQL 9.5
  • 6. © 2ndQuadrant 2016 Syntax SELECT select_expression FROM table_name TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ] ...
  • 7. © 2ndQuadrant 2016 sampling_method ● argument is percentage of rows ● SYSTEM ○ Block level sampling ○ Very fast ○ Non-independent rows ● BERNOULLI ○ Row level sampling ○ Slower than SYSTEM ○ Independent rows (uniformly random)
  • 9. © 2ndQuadrant 2016 Demo sampling methods
  • 10. © 2ndQuadrant 2016 REPEATABLE results ● (Reminder: [ REPEATABLE ( seed ) ]) ● Optional argument ● Used if random, yet repeatable results are required ● seed and argument need to be the same to produce repeatable results ● Any changes made to the table will result in a different data set
  • 11. © 2ndQuadrant 2016 Now it gets interesting … ● TABLESAMPLE allows for additional sampling methods via extensions ● tsm_system_time specifies max number of milliseconds to spend reading a table ● Implements the syntax: SELECT select_expression FROM table_name TABLESAMPLE SYSTEM_TIME (argument)
  • 12. © 2ndQuadrant 2016 Demo tsm_system_time
  • 13. © 2ndQuadrant 2016 Enter Orange ... ● Funded by AXLE (http: //axleproject.eu) ● Same project funded TABLESAMPLE ● Available integrated with PostgreSQL in 2UDA (http: //2ndquadrant. com/2uda) ● Uses TABLESAMPLE to very quickly create visualizations for data ● Can quickly create predictive models
  • 14. © 2ndQuadrant 2016 Demo Orange You can find a very helpful tutorial at http://2ndquadrant.com/2uda
  • 15. © 2ndQuadrant 2016 Other Big Data features in PostgreSQL ● JSON & JSONB ● HSTORE ● XML ● Scale-out by partitioning ○ Check out Postgres-XL (http://www. postgres-xl.org/) ● etc ...
  • 16. © 2ndQuadrant 2016 Umair Shahid Email: umair.shahid@2ndQuadrant.com Twitter: @pg_umair 2ndQuadrant is hiring - All geographies! Thank you for your time!