4. MariaDB AX
• GPLv2 Open Source
• Columnar, Massively Parallel MariaDB Storage Engine
• Scalable, high-performance analytics platform
• Built-in redundancy and high availability
• Runs on premises, on AWS cloud
• Full SQL syntax and capabilities regardless of platform
[Diagram: Big Data Sources / Analytics / Insight; ELT Tools; MariaDB ColumnStore with Node 1, Node 2, Node 3 ... Node N on Local / AWS® / GlusterFS® storage; BI Tools]
5. MariaDB AX Architecture
• User Module: Processes SQL Requests
• Performance Module: Distributed Processing Engine
[Diagram: User Connections; MariaDB Front End and Query Engine; User Module 1 ... User Module n; Performance Module 1, Performance Module 2 ... Performance Module n; Columnar Distributed Data Storage]
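The split between User Modules and Performance Modules can be pictured as scatter-gather aggregation. The sketch below is only a conceptual illustration in plain Python, not how ColumnStore is implemented: the function names and the sample data are hypothetical, and it assumes each Performance Module computes a partial result that a User Module then merges.

```python
from typing import List, Tuple


def partial_aggregate(chunk: List[float]) -> Tuple[float, int]:
    """Work done on one (hypothetical) Performance Module: local sum and row count."""
    return sum(chunk), len(chunk)


def combine(partials: List[Tuple[float, int]]) -> float:
    """Work done on the (hypothetical) User Module: merge partial results into AVG."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Values of one column, spread across three illustrative Performance Modules.
pm_chunks = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]

partials = [partial_aggregate(chunk) for chunk in pm_chunks]
average = combine(partials)
print(partials, average)
```

Shipping (sum, count) pairs rather than local averages keeps the merged result exact even when the modules hold different numbers of rows.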
6. Windowing Functions
• Aggregate over a series of related rows
• Simplified function for complex statistical analytics over sliding window per row
- Cumulative, moving or centered aggregates
- Simple statistical functions like rank, max, min, average, median
- More complex functions such as distribution, percentile, lag, lead
- Without running complex sub-queries
Supported functions:
MAX, MIN, COUNT, SUM, AVG, VARIANCE, VAR_POP, VAR_SAMP, STD, STDDEV, STDDEV_POP, STDDEV_SAMP, ROW_NUMBER
RANK, DENSE_RANK, PERCENT_RANK, NTH_VALUE, FIRST_VALUE, LAST_VALUE, CUME_DIST, LAG, LEAD, NTILE, PERCENTILE_CONT, PERCENTILE_DISC, MEDIAN
Source: InfiniDB SQL Syntax Guide
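A small sketch of cumulative, centered, lag and rank windows over the same rows. It uses Python's built-in sqlite3 module (SQLite 3.25 or newer) purely as a stand-in so the queries can run anywhere; it is not MariaDB ColumnStore, and the `sales` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

# One pass over the table, four different windows, no sub-queries.
rows = conn.execute(
    """
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total,
           AVG(amount) OVER (ORDER BY day
                             ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS centered_avg,
           LAG(amount) OVER (ORDER BY day) AS prev_amount,
           RANK() OVER (ORDER BY amount DESC) AS amount_rank
    FROM sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row carries its own running total, a three-row centered average, the previous day's amount (NULL on the first row) and its rank by amount.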
8. MariaDB AX
High-performance columnar storage engine that supports a wide variety of analytical use cases with SQL in highly scalable distributed environments
• Parallel query processing for distributed environments: Faster, More Efficient Queries
• Single SQL Interface for OLTP and analytics: Easier Enterprise Analytics
• Power of SQL and Freedom of Open Source to Big Data Analytics: Better Price Performance
9. Customer Use Cases
Industry | Category | Use Case
Gaming | Behavior Analytics | Projecting and predicting user behavior based on past and current data.
Advertising | Customer Analytics | Customer behavior data for market segmentation and predictive analytics.
Advertising | Loyalty Analytics | Customer analytics focusing on a person’s commitment to a product, company, or brand.
Web, E-commerce | Click Stream Analytics | Web activity analysis, software testing, and market research using analytics on which areas of web pages users click while browsing [Deal News].
Marketing | Promotional Testing | Using marketing and campaign management data to identify the best criteria to be used for a particular marketing offer.
Social Network | Network Analytics | Relationship analytics among network nodes.
Financial | Fraud Analytics | Monitoring user financial transactions and identifying patterns of behavior to predict and detect abnormal or fraudulent activity, preventing damage to user and institution.
Healthcare | Patient Analytics | Analyzing patient medical records to identify patterns to be used for improved medical treatment.
Healthcare | Clinical Analytics | Analyzing clinical data and its impact on patients to identify patterns to be used for improved medical treatment.
Telco | Network and Application Performance Analytics | Streaming data from network devices and applications, enriched with business operations data, to uncover actionable insights for network planning, operations and marketing analytics.
Aviation | Flight Analytics | Proactively project parts replacement, maintenance and airplane retirement based on real-time and historically collected flight parameter data [Boeing].
11. But First: What do Containers give me?
Encapsulation of Dependencies
• O/S packages & Patches
• Execution environment (e.g. Python 2.7)
• Application Code & Dependencies
Process Isolation
• Isolate the process from anything else running
Faster, Lightweight virtualization
12. Virtual Machines vs. Containers
[Diagram: Virtual machines: three stacks of App / Bins/Libs / Guest OS on Hypervisor, Host Operating System, Infrastructure. Containers: three stacks of App / Bins/Libs on Docker Engine, Operating System, Infrastructure.]
13. What about orchestration and Management?
Orchestration and Management of Containers and higher-level constructs (services, deployments, etc.) is evolving
• Amazon ECS
• Google Container Engine
• Azure Container Service
14. Brilliant for Stateless Components
Source: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
19. Containers + Distributed Database = Challenges
• Data Durability
– Ephemeral container storage
• Cluster Formation
– Configuration and Coordination of User Modules and Performance Modules
• Cluster Maintenance & Changes
– Planned and unplanned node failures
– Scale-up and scale-down
• Application Connections
– Application tier (analytics tools) should not need to change connection information when DB topology changes
21. Motivation
• How to Test Complex, Scale-out Deployments
[Images: Monthly AWS Bill; Under-utilized Laptop]
22. Aspirational vs. Practical
• Aspirational: 100% “Cloud Native”
• Practical: “Desire for Kubernetes to enable low-friction porting of apps from VMs to containers”
Source: https://kubernetes.io/docs/concepts/cluster-administration/networking/
23. Minikube for Single Node Kubernetes
https://kubernetes.io/docs/getting-started-guides/minikube/
25. Minikube & Kubernetes Tips
• Do the tutorials (https://kubernetes.io/docs/tutorials/)
• Read (and re-read) the Documentation
• Visual cues (terminal colors) for layers
• Hypervisor selection
• VM location and size
• Recent version of Kubernetes (--kubernetes-version=v1.8.5 --bootstrapper kubeadm)
• Setting DOCKER env (eval $(minikube docker-env))
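The hypervisor, VM size and Kubernetes version tips can be gathered into one `minikube start` command line. The sketch below only assembles and prints the command; it does not run it. The helper name and the sizing values are hypothetical, and `--vm-driver` is the flag name from minikube releases of the Kubernetes v1.8 era (later releases use `--driver`).

```python
import shlex
from typing import List


def minikube_start_args(
    kubernetes_version: str,
    vm_driver: str,
    cpus: int,
    memory_mb: int,
    disk_size: str,
) -> List[str]:
    """Build the argument list for `minikube start` (illustrative helper)."""
    return [
        "minikube",
        "start",
        f"--kubernetes-version={kubernetes_version}",
        "--bootstrapper",
        "kubeadm",
        f"--vm-driver={vm_driver}",   # hypervisor selection
        f"--cpus={cpus}",             # VM size
        f"--memory={memory_mb}",
        f"--disk-size={disk_size}",
    ]


# Example sizing values are assumptions, not recommendations from the slides.
args = minikube_start_args("v1.8.5", "virtualbox", 4, 8192, "40g")
command = " ".join(shlex.quote(a) for a in args)
print(command)
```

The VM location is controlled separately, through the MINIKUBE_HOME environment variable, rather than through a `minikube start` flag.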
29. Mimicking Virtual Machines
“If there exists a headless service in the same namespace as the pod and with the
same name as the subdomain, the cluster’s KubeDNS Server also returns an A record
for the Pod’s fully qualified hostname.”
Source: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
Also: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
• Leverage KubeDNS Server for easy naming
• Run sshd in each container
• Shared key for ssh stored as a Kubernetes secret
• Utilize StatefulSets
https://github.com/WonkyWumpus/easy-sshd-ubuntu-1604
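As a rough illustration of the naming scheme, the helper below builds the stable DNS names that StatefulSet pods get behind a headless service, following the `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>` pattern from the Kubernetes documentation. The StatefulSet and service name `pm`, the `default` namespace and the helper names are assumptions for the example.

```python
from typing import List


def pod_fqdn(
    statefulset: str,
    ordinal: int,
    service: str,
    namespace: str = "default",
    cluster_domain: str = "cluster.local",
) -> str:
    """Fully qualified DNS name of one StatefulSet pod behind a headless service."""
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def all_pod_fqdns(statefulset: str, service: str, replicas: int) -> List[str]:
    """Names for ordinals 0..replicas-1, which stay stable across pod restarts."""
    return [pod_fqdn(statefulset, i, service) for i in range(replicas)]


# Hypothetical three-replica StatefulSet named "pm" with a headless service "pm".
names = all_pod_fqdns("pm", "pm", 3)
for name in names:
    print(name)
```

Because the names do not change when a pod is rescheduled, they can be used the way fixed VM hostnames would be.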
30. SSH with Ease
tboyd$ kubectl describe pod m01|grep IP
IP: 172.17.0.7
tboyd$ minikube ssh
$ ssh -i .ssh/easy-key root@172.17.0.7
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.9.13 x86_64)
root@m01:~# ssh root@m02
root@m02:~#
31. MariaDB AX with StatefulSets
• Builds on the easy-sshd github
• MariaDB AX software staged to images
• Manually run standard install process to create MariaDB AX Cluster
https://github.com/WonkyWumpus/mdb-cs-easy-sshd-ubuntu-1604
34. MariaDB AX: Standard Install & Config
• Beware: execute install from pm-0 container!
• Install .debs
• Run /usr/local/mariadb/columnstore/bin/postConfigure
• Access cluster through UM Service
https://mariadb.com/kb/en/library/installing-and-configuring-a-multi-server-columnstore-system-11x/
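One way to guard against running the install from the wrong container is to check the pod's StatefulSet ordinal first. This is a hedged sketch, not part of the MariaDB AX tooling: it assumes the Performance Module StatefulSet is named `pm`, so the first pod's hostname is `pm-0`, and both function names are hypothetical.

```python
import re
from typing import Tuple


def statefulset_ordinal(hostname: str) -> Tuple[str, int]:
    """Split a StatefulSet pod hostname like 'pm-0' into ('pm', 0)."""
    short_name = hostname.split(".")[0]  # tolerate a fully qualified name
    match = re.fullmatch(r"(?P<name>.+)-(?P<ordinal>\d+)", short_name)
    if match is None:
        raise ValueError(f"not a StatefulSet-style hostname: {hostname!r}")
    return match.group("name"), int(match.group("ordinal"))


def should_run_install(hostname: str) -> bool:
    """Only the first Performance Module pod (assumed to be 'pm-0') runs the install."""
    name, ordinal = statefulset_ordinal(hostname)
    return name == "pm" and ordinal == 0


for host in ["pm-0", "pm-1", "um-0", "pm-0.pm.default.svc.cluster.local"]:
    print(host, should_run_install(host))
```

In a container, the hostname to check could come from `socket.gethostname()`.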
35. MariaDB AX + Kubernetes: Possible Future Directions
• Leverage persistent volumes and persistent volume claims
• Cluster formation and config moved into docker images
– MariaDB AX running and waiting to join cluster
– Intelligent entrypoint script for automatic cluster join
• User Module Tier automatic scaling
• Performance Module Tier automatic scaling
• Logic to tie DB Roots and Persistent External Storage
– 24 DB Roots: Instantaneously burst from 1 to 24 PM nodes!
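The "24 DB Roots" idea can be sketched as a simple assignment problem: with a fixed number of DB Roots on external storage, changing the number of PM nodes only changes which node mounts which root. The round-robin policy and the function name below are illustrative assumptions, not ColumnStore's actual placement logic.

```python
from typing import Dict, List


def assign_dbroots(num_dbroots: int, num_pms: int) -> Dict[int, List[int]]:
    """Spread DB Roots 1..num_dbroots over PM nodes 1..num_pms, round-robin."""
    if num_pms < 1 or num_pms > num_dbroots:
        raise ValueError("need between 1 and num_dbroots PM nodes")
    assignment: Dict[int, List[int]] = {pm: [] for pm in range(1, num_pms + 1)}
    for dbroot in range(1, num_dbroots + 1):
        pm = (dbroot - 1) % num_pms + 1
        assignment[pm].append(dbroot)
    return assignment


# The same 24 DB Roots served by 1, 3 and 24 PM nodes.
for pms in (1, 3, 24):
    layout = assign_dbroots(24, pms)
    print(pms, {pm: len(roots) for pm, roots in layout.items()})
```

Because 24 divides evenly by 1, 2, 3, 4, 6, 8, 12 and 24, each of those cluster sizes gives every PM node the same number of DB Roots.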
36. Resources
• Kubernetes Documentation
• MariaDB ColumnStore Documentation
• MariaDB AX Datasheet
• IHME Customer Story
• What’s New in MariaDB AX
• 5 Simple Steps to get Started with MariaDB and Tableau
• Extract more Value with MariaDB ColumnStore Analytics