NoSQL for the SQL Server Pro

NoSQL for the DBA
Lynn Langit

April 2013 – Big Data Tech Con

Data Expertise / Lynn Langit
• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
– Pluralsight – Google Cloud Series
– DevelopMentor – SQL Server Series
– 2 books on SQL Server BI
– Cloudera trainer (certified)
• Former MSFT FTE
– 4 years

but first…
Business Intelligence
to BigData

What is the relationship?

Business
NoSQL ????
Intelligence

“The Past” BI = Effective Reports

Data
optimized for
Static
READING

BI = Optimized RDBMS
SQL queries & Data Stored on disk

BI = Transactional Data

Collecting • What happened?
Transactional • Why did that happen?
data • Decision Support Systems

Enter
Big Data
Q: What is it?

A: Your Data, plus more data….

BigData Pipeline - STEP 1 – Acquire

Acquire
Process
Store
Query & Mine
Visualize

Big Data – an example from weather

13

Big Data – an example from weather
• Source Data
• National weather data
• Satellite data
• Airplanes with sensors
• Sensors on boats
• Sensors in the ocean
• Sensors on the ground
• Historical Data
• Social Media
• Results
• More accurate predictions
• Tsunami
• Tornado

Big Data – an example from health care
• Medical records
• Regular
• Emergency
• Genetic data – 23andMe
• Food data
• SparkPeople
• Purchasing
• Grocery card
• credit card
• Search – Google
• Social media
• Twitter
• Facebook
• Exercise
• Nike Fuel Band
• Kinect
• Location - phone

BigData = ‘Next State’ Questions

• What could happen?
Collecting • Why didn’t this happen?
• When will the next new thing
Behavioral happen?
data • What will the next new thing be?
• What happens?

What is the reality of personalized medicine?
2500

2000 Key Monitoring

1500

Sensor Readings
1000

500

Other Behavioral
0 data

12:00 12:30 1:00 1:30 2:00 2:30

BigData and Verticals
• Retail
• Manufacturing
• Health Care
• Banking
• Education

Collecting BigData
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data
Standards
• M2M
• Public Datasets
– Freebase
– Azure DataMarket
– Hillary Mason’s list

19

DEMO – Hilary Mason’s Datasets
• Who is Hilary Mason and why do you care
about her datasets?
• How do you get her datasets?
• What do you do with her datasets?

Collecting Data – a note about Faces
• Facial recognition
• Voice recognition
• Gesture capture and analysis

21

Big Data in India

Update: “The total number of AADHAARs issued as of 24-Mar-
2013 is over 304 million. This is more than 25% of the
population of India.”

BigData Pipeline – STEP 5 - Visualize

Acquire
Process
Store
Query & Mine
Visualize

DEMO - Visualizing Big Data: Wind Map

26

Demo - Visualizing Big Data – D3

27

BigData Pipeline – STEP 2 - Process

Acquire
Process
Store
Query & Mine
Visualize

How do you clean up the mess?
• Data Hygiene
• Data Scrubbing
• Data Sprawl
• The true cost of data
• …and what about data integrity?
• …and security?
• …should your data be in the cloud?

Is NoSQL just Hadoop?

HUGE Hype factor since 2011

Apache Hadoop
• a software framework that supports data-intensive distributed
applications
• under a free license enables applications to work with thousands of
nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS)
papers

What is the relationship?

NoSQL Hadoop ??? BigData

How you ‘get’ Hadoop
Open source

• roll your own

Commercial distribution

• Cloudera
• MapR
• Hortonworks
• More…

Rent it via the cloud

• AWS

Demo – Get and Use Cloudera CDH4 VM

About Hadoop MapReduce

Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png

Demo - HDInsight – MapReduce w/Java

Demo - HDInsight – MapReduce w/ Hive

Example Comparison: RDBMS vs. Hadoop

Traditional RDBMS Hadoop / MapReduce

Data Size Gigabytes (Terabytes) Petabytes and greater

Access Interactive and Batch Batch – NOT Interactive

Updates Read / Write many times Write once, Read many times

Structure Static Schema Dynamic Schema

Integrity High (ACID) Low

Scaling Nonlinear Linear

Query Can be near immediate Has latency (due to batch
Response processing)
Time

BigData Pipeline STEP 3 – Store

Acquire
Process
Store
Query & Mine
Visualize

“Small” BigData vs. “Big” BigData

Hadoop

NoSQL

RDBMS

The reality…two pivots

Storage Methods Storage Locations
• SQL (RDBMS) • On premises
• NoSQL or Hadoop • Cloud-hosted

Cloud-hosted NoSQL up to 50x CHEAPER

So many NoSQL options
• More than just the Elephant in the room
• Over 120+ types of NoSQL databases

Flavors of NoSQL
Key/Value Key/value Wide-Column Document Graph
Volatile Persistent

Key / Value Database
• Just keys and values
– No schema
• Persistent or Volatile
• Examples
– AWS Dynamo DB
– Riak

DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud

NoSQL BLOB Storage Buckets in the Cloud

• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
• Others
– Dropbox
– Box
– More…

DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 / Glacier

Column Database
• Wide, sparse column sets
• Schema-light
• Examples:
– Cassandra
– HBase w/Hadoop
– BigTable
– GAE HR DS

Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
• HBase
• Cassandra
• xVelocity (SQL 2012 Tabular)
• Column-stores
– Relational
– Dense
– Example:
• SQL Server 2012
– Columnstore index

DEMO – SQL Server ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)

Document Database (Mongo DB)
• document-oriented (collection of
JSON documents) w/semi structured
data
– Encodings include BSON, JSON, XML…
• binary forms
– PDF, Microsoft Office documents --
Word, Excel…)
• Examples:
– MongoDB
– Couchbase

Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly
finding connections, patterns and
relationships between the objects
within lots of data
• Examples:
– Neo4J
– Google Freebase

CAP Theorem applied = ‘how big is it?’

• CA = RDBMS
– Highly-available consistency
– Ex. SQL Server
• CP = NoSQL
– Enforced consistency
– Ex. Hadoop
• AP = NoSQL
– Eventual consistency
– Ex. MongoDB

“Small” BigData vs. “Big” BigData

Hadoop

Key/Value or
Column

Document or
Graph

RDBMS

Cloud-hosted RDBMS

• AWS RDS – SQL Server,
mySQL, Oracle
– Medium cost
– Solid feature set, i.e.
backup, snapshot
– Use existing tooling
• Google – mySQL
– Lowest cost
– Most limited RDBMS
functionality
• Microsoft – SQLAzure
– Highest cost

DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models

Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png

NoSQL Applied
Columnstore Log Files
HBase
Key/Value

Product Catalogs
DynamoDB
Document

Social Games
MongoDB
Graph

Social aggregators
Neo4j
RDBMS

Line-of-Business
SQL Server

Cloud Offerings– RDBMS AND NoSQL
AWS Google Microsoft
RDBMS RDS – all major mySQL SQL Azure
NoSQL buckets S3 or Glacier Cloud Storage Azure Blobs
NoSQL Key-Value DynamoDB H/R Data on GAE Azure Tables
Streaming ML or Custom EC2 Prospective Search StreamInsight
(Mahout) &
Prediction API
NoSQL Document or MongoDB on EC2 Freebase MongoDB on
Graph Windows Azure
NoSQL – Column Elastic MapReduce none HDInsight
Hadoop (HBase) using S3 & EC2
Dremel/Warehousi RedShift BigQuery none
ng

BigData Pipeline STEP 4 – Query

Acquire
Process
Store
Query & Mine
Visualize

Can Excel help?
• Connector to Hadoop
• Data Explorer
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Visualize with PowerView
• Data Mining w/Predixion

Demo - Hadoop Connector to Excel

Google BigQuery w/Excel
• Hadoop-like (Dremel) based service
• For massive amounts of data
• SQL-like query language

DEMO - Google BigQuery
• Hadoop-like (Dremel) based service
• For massive amounts of data
• SQL-like query language

Dremel Realized => Impala
• Interactive Hadoop?

Other types of cloud data services

Hosting public datasets Cleaning / matching (your)
• Pay to read data
• Earn revenue by offering for read • ETL – Microsoft Data
Explorer, Google Refine
• Data Quality – Windows Azure
Data
Market, InfoChimps, DataMarket
.com

NoSQL To-Do List
Understand CAP & types of NoSQL databases
• Use NoSQL when business needs designate
• Use the right type of NoSQL for your business problem

Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, i.e. dev, test , training environments

Learn noSQL access technologies
• New query languages, i.e. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon
Karmasphere, Microsoft Excel connectors, etc…

The Changing Data Landscape

Other
Services
RDBMS
NoSQL

• recipes)

www.TeachingKidsProgramming.org
• Free Courseware (
• Do a Recipe  Teach a Kid (Ages 10 ++)
• Java or Microsoft SmallBasic  TKP site
• C# via Pluralsight

Toward Data Craftsmanship…

Follow me @LynnLangit

RSS my blog
www.LynnLangit.com

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions

NoSQL for the SQL Server Pro

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (19)

Similar a NoSQL for the SQL Server Pro

Similar a NoSQL for the SQL Server Pro (20)

Más de Lynn Langit

Más de Lynn Langit (20)

Último

Último (20)

NoSQL for the SQL Server Pro

Notas del editor