SlideShare una empresa de Scribd logo
1 de 23
Simple Analytics with MongoDB
About Me
I’m Ross Affandy. Senior
Developer Cum System
Administrator at Carlist.MY
MongoPress Core
Developer
I will talking about:
- Our stack (architecture)
- Our problem
- Our solution
- Our lesson
Stack in cloud

Platform – Linux (Amazon Distro)
Database – MongoDB
Language – PHP (API)
Webserver – NginX
(Sorry node.js – I’m not developing event-driven programming or require long
pulling persistent connection)
Using Amazon EC2 micro instance
600MB RAM
8GB EBS root partition
30GB EBS partition for MongoDB storage (format as xfs filesystem)
Why Amazon Cloud?
I want to save 70% of my time managing infrastructure and focus to writing code
Business Analytics Essential
- Bank use business analytics to predict & prevent credit card fraud
- Retailers use business analytics to predict the best location for
store and reach target market
- Even sports team use business analytics to determine game
strategy and ticket price
Problem to solve
Real time data collection :
- Implementing pageview counter
- Simple Analytics

Why MongoDB?
- MySQL usually blocked on file system reads
- Good at saving large volume of data
- Support asynchronous insert ( fire & forget )
- Fast access to large binary object
- Read/write ratio is highly skewed to reads
- Upsert ( simplify my code )
Data structure and how it look like?
Now the story begin!
Problem / Challenge
We face many exciting challenges ( expect the unexpected )
Implementation
We use map reduce to gather the information that we collect
What is map reduce in MongoDB and why we use it?
- Equal to count/sum/avg/group by function with MySQL.
- Map reduce is easier to understand
- Useful to process large dataset concurrently in large cluster of machines
(sorry for this, we don’t have budget yet )
Problem
Map reduce very slow and crash the server due to the javascript engine
and lack of processing power (low RAM and cpu)
MongoDB also has a group() function. Why not use it?
Group() function only return single bson object (less than 16mb). Not
useful for unique data more than 10,000 value
Problem / Challenge
Problem / Challenge
Problem / Challenge
Problem / Challenge
Moving to aggregation framework
Quickly running latest version of MongoDB just to get aggregation
function
Changing PHP query to using aggregation instead of map reduce

Good news

Server not crash

Bad news

Aggregation is better but still need more RAM to process 2 million
document. Still slow.
Experiment

Test run on Amazon SSD + 64GB RAM (Virginia)
- Copy 12GB data to another amazon EC2 instance
- Run the map reduce and aggregation query to see what break.
Nothing break. Server look happy 
Problem Solve?
Yes, but server cost is too expensive.
Solution

Denormalization
- In computing, denormalization is the process of attempting to optimise the
read performance of a database by adding redundant data or by grouping
data.In some cases, denormalisation helps cover up the inefficiencies
inherent in relational database software. A relational normalised database
imposes a heavy access load over physical storage of data even if it is well
tuned for high performance.
- Copying of the same data into multiple documents or tables in order to
simplify/optimize query processing
- Be careful about duplicate data that will easier make database big
When to denormalize?
Query data volume or IO per query VS total data volume.
Processing complexity VS total data volume.
Now everytime user access the page, we run 2 query.
1) Capture the data for analytics
2) Update other collection to replace group by. Later on will be use to display
to user.
Summary / Lesson learned
- We learned what makes MongoDB a good analytics tool
- Data modeling is important.
What questions do I have?
What answers do I have?
- Design query before design schema
- Simplified everything
MapReduce is slower and is not supposed to be used in “real time.”
TIPS
Always run load / stress test before go live
1) capacity planning
2) capacity testing
3) performance tuning
Tools
1) Dex performance tuning tool from mongolab is really helpful https://github.com/mongolab/dex
It's not about winning,
It's all about taking part!
Contact
Website: http://www.carlist.my
Email: enquiries@carlist.my
We also hiring!
jobs@carlist.my
Q&A?

Más contenido relacionado

La actualidad más candente

In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsSrinath Perera
 
DAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQLDAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQLKamal Gupta
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...In-Memory Computing Summit
 
How to optimize asp dot-net application
How to optimize asp dot-net applicationHow to optimize asp dot-net application
How to optimize asp dot-net applicationsonia merchant
 
BlazingSQL + RAPIDS AI at GTC San Jose 2019
BlazingSQL + RAPIDS AI at GTC San Jose 2019BlazingSQL + RAPIDS AI at GTC San Jose 2019
BlazingSQL + RAPIDS AI at GTC San Jose 2019Rodrigo Aramburu
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseCloudera, Inc.
 
Designing your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresDesigning your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresOzgun Erdogan
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon
 
How to optimize asp dot net application ?
How to optimize asp dot net application ?How to optimize asp dot net application ?
How to optimize asp dot net application ?sonia merchant
 
Query Anything, Anywhere with Kubernetes
Query Anything, Anywhere with KubernetesQuery Anything, Anywhere with Kubernetes
Query Anything, Anywhere with KubernetesAlluxio, Inc.
 
Scalable data pipeline at Traveloka - Facebook Dev Bandung
Scalable data pipeline at Traveloka - Facebook Dev BandungScalable data pipeline at Traveloka - Facebook Dev Bandung
Scalable data pipeline at Traveloka - Facebook Dev BandungRendy Bambang Junior
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashCloudera, Inc.
 
Scalable data systems at Traveloka
Scalable data systems at TravelokaScalable data systems at Traveloka
Scalable data systems at TravelokaRendy Bambang Junior
 
Cassandra vs. MongoDB
Cassandra vs. MongoDBCassandra vs. MongoDB
Cassandra vs. MongoDBScaleGrid.io
 
Microsoft SQL Server - Benchmark Presentation
Microsoft SQL Server - Benchmark PresentationMicrosoft SQL Server - Benchmark Presentation
Microsoft SQL Server - Benchmark PresentationMicrosoft Private Cloud
 

La actualidad más candente (18)

Sergejus Barinovas
Sergejus BarinovasSergejus Barinovas
Sergejus Barinovas
 
In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common Patterns
 
DAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQLDAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQL
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
 
How to optimize asp dot-net application
How to optimize asp dot-net applicationHow to optimize asp dot-net application
How to optimize asp dot-net application
 
BlazingSQL + RAPIDS AI at GTC San Jose 2019
BlazingSQL + RAPIDS AI at GTC San Jose 2019BlazingSQL + RAPIDS AI at GTC San Jose 2019
BlazingSQL + RAPIDS AI at GTC San Jose 2019
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
 
Designing your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresDesigning your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with Postgres
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
 
How to optimize asp dot net application ?
How to optimize asp dot net application ?How to optimize asp dot net application ?
How to optimize asp dot net application ?
 
Query Anything, Anywhere with Kubernetes
Query Anything, Anywhere with KubernetesQuery Anything, Anywhere with Kubernetes
Query Anything, Anywhere with Kubernetes
 
Scalable data pipeline at Traveloka - Facebook Dev Bandung
Scalable data pipeline at Traveloka - Facebook Dev BandungScalable data pipeline at Traveloka - Facebook Dev Bandung
Scalable data pipeline at Traveloka - Facebook Dev Bandung
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on Flash
 
Scalable data systems at Traveloka
Scalable data systems at TravelokaScalable data systems at Traveloka
Scalable data systems at Traveloka
 
Cassandra vs. MongoDB
Cassandra vs. MongoDBCassandra vs. MongoDB
Cassandra vs. MongoDB
 
Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
 
Microsoft SQL Server - Benchmark Presentation
Microsoft SQL Server - Benchmark PresentationMicrosoft SQL Server - Benchmark Presentation
Microsoft SQL Server - Benchmark Presentation
 

Destacado

Thoughts on MongoDB Analytics
Thoughts on MongoDB AnalyticsThoughts on MongoDB Analytics
Thoughts on MongoDB Analyticsrogerbodamer
 
Social Analytics on MongoDB at MongoNYC
Social Analytics on MongoDB at MongoNYCSocial Analytics on MongoDB at MongoNYC
Social Analytics on MongoDB at MongoNYCPatrick Stokes
 
Blazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkBlazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkMongoDB
 
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...MongoDB
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for AnalyticsMongoDB
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBMongoDB
 
Webinar: MongoDB and Analytics: Building Solutions with the MongoDB BI Connector
Webinar: MongoDB and Analytics: Building Solutions with the MongoDB BI ConnectorWebinar: MongoDB and Analytics: Building Solutions with the MongoDB BI Connector
Webinar: MongoDB and Analytics: Building Solutions with the MongoDB BI ConnectorMongoDB
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishMongoDB
 
MongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDBMongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDBMongoDB
 

Destacado (9)

Thoughts on MongoDB Analytics
Thoughts on MongoDB AnalyticsThoughts on MongoDB Analytics
Thoughts on MongoDB Analytics
 
Social Analytics on MongoDB at MongoNYC
Social Analytics on MongoDB at MongoNYCSocial Analytics on MongoDB at MongoNYC
Social Analytics on MongoDB at MongoNYC
 
Blazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkBlazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & Spark
 
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for Analytics
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 
Webinar: MongoDB and Analytics: Building Solutions with the MongoDB BI Connector
Webinar: MongoDB and Analytics: Building Solutions with the MongoDB BI ConnectorWebinar: MongoDB and Analytics: Building Solutions with the MongoDB BI Connector
Webinar: MongoDB and Analytics: Building Solutions with the MongoDB BI Connector
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at Wish
 
MongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDBMongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDB
 

Similar a Klmug presentation - Simple Analytics with MongoDB

Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
Scaling Your Web Application
Scaling Your Web ApplicationScaling Your Web Application
Scaling Your Web ApplicationKetan Deshmukh
 
Mongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesMongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesronwarshawsky
 
MongoDB performance
MongoDB performanceMongoDB performance
MongoDB performanceMydbops
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11CloudExpoEurope
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11aseager
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11aseager
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Yahoo Developer Network
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
MongoDB Tips and Tricks
MongoDB Tips and TricksMongoDB Tips and Tricks
MongoDB Tips and TricksM Malai
 
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Sujee Maniyam
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingGleicon Moraes
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
 

Similar a Klmug presentation - Simple Analytics with MongoDB (20)

Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Scaling Your Web Application
Scaling Your Web ApplicationScaling Your Web Application
Scaling Your Web Application
 
Mongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesMongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategies
 
MongoDB performance
MongoDB performanceMongoDB performance
MongoDB performance
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
MongoDB Tips and Tricks
MongoDB Tips and TricksMongoDB Tips and Tricks
MongoDB Tips and Tricks
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2
 
Vectorization whitepaper
Vectorization whitepaperVectorization whitepaper
Vectorization whitepaper
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handling
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 

Último

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Klmug presentation - Simple Analytics with MongoDB

  • 2. About Me I’m Ross Affandy. Senior Developer Cum System Administrator at Carlist.MY MongoPress Core Developer
  • 3. I will talking about: - Our stack (architecture) - Our problem - Our solution - Our lesson
  • 4. Stack in cloud Platform – Linux (Amazon Distro) Database – MongoDB Language – PHP (API) Webserver – NginX (Sorry node.js – I’m not developing event-driven programming or require long pulling persistent connection) Using Amazon EC2 micro instance 600MB RAM 8GB EBS root partition 30GB EBS partition for MongoDB storage (format as xfs filesystem) Why Amazon Cloud? I want to save 70% of my time managing infrastructure and focus to writing code
  • 5. Business Analytics Essential - Bank use business analytics to predict & prevent credit card fraud - Retailers use business analytics to predict the best location for store and reach target market - Even sports team use business analytics to determine game strategy and ticket price
  • 6. Problem to solve Real time data collection : - Implementing pageview counter - Simple Analytics Why MongoDB? - MySQL usually blocked on file system reads - Good at saving large volume of data - Support asynchronous insert ( fire & forget ) - Fast access to large binary object - Read/write ratio is highly skewed to reads - Upsert ( simplify my code )
  • 7.
  • 8. Data structure and how it look like?
  • 9. Now the story begin!
  • 10. Problem / Challenge We face many exciting challenges ( expect the unexpected ) Implementation We use map reduce to gather the information that we collect What is map reduce in MongoDB and why we use it? - Equal to count/sum/avg/group by function with MySQL. - Map reduce is easier to understand - Useful to process large dataset concurrently in large cluster of machines (sorry for this, we don’t have budget yet ) Problem Map reduce very slow and crash the server due to the javascript engine and lack of processing power (low RAM and cpu) MongoDB also has a group() function. Why not use it? Group() function only return single bson object (less than 16mb). Not useful for unique data more than 10,000 value
  • 15.
  • 16. Moving to aggregation framework Quickly running latest version of MongoDB just to get aggregation function Changing PHP query to using aggregation instead of map reduce Good news Server not crash Bad news Aggregation is better but still need more RAM to process 2 million document. Still slow.
  • 17.
  • 18. Experiment Test run on Amazon SSD + 64GB RAM (Virginia) - Copy 12GB data to another amazon EC2 instance - Run the map reduce and aggregation query to see what break. Nothing break. Server look happy  Problem Solve? Yes, but server cost is too expensive.
  • 19. Solution Denormalization - In computing, denormalization is the process of attempting to optimise the read performance of a database by adding redundant data or by grouping data.In some cases, denormalisation helps cover up the inefficiencies inherent in relational database software. A relational normalised database imposes a heavy access load over physical storage of data even if it is well tuned for high performance. - Copying of the same data into multiple documents or tables in order to simplify/optimize query processing - Be careful about duplicate data that will easier make database big When to denormalize? Query data volume or IO per query VS total data volume. Processing complexity VS total data volume. Now everytime user access the page, we run 2 query. 1) Capture the data for analytics 2) Update other collection to replace group by. Later on will be use to display to user.
  • 20. Summary / Lesson learned - We learned what makes MongoDB a good analytics tool - Data modeling is important. What questions do I have? What answers do I have? - Design query before design schema - Simplified everything MapReduce is slower and is not supposed to be used in “real time.” TIPS Always run load / stress test before go live 1) capacity planning 2) capacity testing 3) performance tuning Tools 1) Dex performance tuning tool from mongolab is really helpful https://github.com/mongolab/dex
  • 21. It's not about winning, It's all about taking part!
  • 23. Q&A?