SlideShare una empresa de Scribd logo
1 de 28
NEW ENGLAND SQL SERVER
Big Data 101
SPONSORED BY
Paresh Motiwala
BIG DATA 101
SERIOUSLY, THIS IS JUST 101
BY PARESH MOTIWALA
PREPARED FOR NESQL
PARESH MOTIWALA, PMP ®
• pareshmotiwala@gmail.com
• http://www.linkedin.com/in/pareshmotiwala
• @pareshmotiwala
• www.circlesofgrowth.com
BIG DATA 101
• Who should attend
• DBAs
• CIO
• Marketing peeps
• Developers
• Big Data Enthusiasts
• Who should not attend
BIG DATA 101
• Agenda for the day:
• Sources
• Privacy concerns
• Storing- Hadoop
• Processing – MapReduce
• Presentation
• Summary
BIG DATA 101
SO WHY SHOULD I CARE ABOUT THIS?
Data is the new Electricity (Satya Nadella, Spring 2016)
https://www.microsoft.com/en-us/sql-server/data-driven
Companies Generate data, Distribute, Meter, and Use it
Where is data stored?
Current: SQL Server, Oracle, Teradata, DB2, Netezza, Open Source Databases
Casandra, MySQL, MongoDB
Unstructured: Hadoop, Spark, Data Lakes
What type of data is stored?
Traditional: Rows and Columns
Big Data Explosion: Images, streaming data, internet-connected devices (IoT),
Machine data
BIG DATA IS DRIVING TRANSFORMATIVE CHANGES
Traditional Big Data
Relational data
with highly modeled schema
All data
with schema agility
Specialized HW Commodity HW
Data
characteristics
Costs
Culture
Operational reporting
Focus on rear-view analysis
Experimentation leading
to intelligent action
With machine learning, graph, a/b testing
BIG DATA 101
• Sources
• Cell Phones
• Social Media
• Credit Cards
• GPSs
• Bread Crumbs
BIG DATA 101
• 5 Vs of Big Data
• Volume
• Variety
• Velocity
• Veracity
• Value
BIG DATA 101
• Desired Properties:
• Robustness- Fault Tolerance
• Low Latency
• Scalability
• Generalization
• Extensibility
• Ad hoc Queries
• Minimal Maintenance
• Debuggability
BIG DATA 101
• Flow
Collection Pre-processing
Intervention Visualization
Hygiene
Analysis
OVER 90% OF TODAY’S DATA
WAS CREATED IN PAST 2 YEARS
BIG DATA 101
• 5 Rs of Data Quality
• Relevancy
• Recency
• Range
• Robustness
• Reliability
• Ephemeral Vs. Durability
• Refresh of Data
BIG DATA 101
• Privacy of Data
• If I collect the data, is it mine?
• Ownership Vs Rights
• Share Answers not Data
• OpAl (http://www.trust.mit.edu/projects/)
• Enigma
• Let them know
• Why you are collecting
• What you are collecting
• FIPP- Fair Information Privacy Principles
• Individual Control
• Transparency
• Respect for Context
• Security
• Access and Accuracy
• Focused Collection
• FERPA- Family Education Rights and Privacy Act
BIG DATA 101
WHAT IS A DATA LAKE? ---COURTESY : JAMES SERRA
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native
format until it is needed.
• A place to store unlimited amounts of data in any format inexpensively, especially for archive
purposes
• Allows collection of data that you may or may not use later: “just in case”
• A way to describe any large data pool in which the schema and data requirements are not defined
until the data is queried: “just in time” or “schema on read”
• Complements EDW and can be seen as a data source for the EDW – capturing all data but only
passing relevant data to the EDW
• Frees up expensive EDW resources (storage and processing), especially for data refinement
• Allows for data exploration to be performed without waiting for the EDW team to model and load
the data (quick user access)
• Some processing in better done with Hadoop tools than ETL tools like SSIS
• Easily scalable
BIG DATA 101
THE “DATA LAKE” USES A BOTTOMS-UP APPROACH
Ingest all data
regardless of requirements
Store all data
in native format without
schema definition
Do analysis
Using analytic engines
like Hadoop
Interactive queries
Batch queries
Machine Learning
Data warehouse
Real-time analytics
Devices
Courtesy : James Serra
BIG DATA 101
BIG DATA 101
BIG DATA 101
BIG DATA 101
• MapReduce
• Map –Sends Queries
• Reduce – Collects Results
• Job Tracker
• Task Tracker
• YARN
Near Realtime Data Analytics Pipeline using Azure Steam Analytics
Big Data Analytics Pipeline using Azure Data Lake
Interactive Analytics and Predictive Pipeline using Azure Data Factory
Base Architecture : Big Data Advanced Analytics Pipeline
Data Sources Ingest Prepare
(normalize, clean, etc.)
Analyze
(stat analysis, ML, etc.)
Publish
(for programmatic
consumption, BI/visualization)
Consume
(Alerts, Operational Stats,
Insights)
Machine Learning
Telemetry
Azure SQL
(Predictions)
HDI Custom ETL
Aggregate /Partition
Azure Storage Blob
dashboard of
predictions / alerts
Live / real-time data
stats, Anomalies and
aggregates
Customer
MIS
Event
Hub PowerBI
dashboard
Stream Analytics
(real-time analytics)
Azure Data Lake Analytics
(Big Data Processing)
Azure Data Lake
Storage
Azure SQL
Data
in Motion
Data
at Rest
dashboard of
operational stats
21
Scheduledhourly
transferusingAzure
DataFactory
Machine Learning
(Anomaly Detection)
VISION FOR BIG
DATA AND DATA
WAREHOUSING
Azure Data Factory
+
Federated Query
On-Premises
Data Warehouse “Big Data”
Cloud
Comprehensive
Connected
Choice
Microsoft Azure Microsoft Azure
Microsoft SQL
Server
BIG DATA 101
PRESENTATION
• R
• Python
• Power BI
• Power BI Desktop
BIG DATA 101
•Someday Big Data
will just become
data
BIG DATA 101
• Summary:
• Sources
• Privacy concerns
• Storing- Hadoop
• Processing – MapReduce
• Presentation
BIG DATA 101 - CONCLUSION
SQL Server is the best Relational Database
The world is much bigger than any one relational
database
What is your company’s data strategy?
What is your company’s cloud strategy?
Learn adjacent technologies that will make you
valuable.
Power BI?
Hadoop?
NoSQL?
BIG DATA 101
• BIBLIOGRAPHY –
• http://www.datasciencecentral.com/
• https://www.youtube.com/playlist?list=PLt-
0mOCwxJ6B_OxTlpevxJNAa7GfCLd3l
• https://www.dezyre.com/article/hadoop-components-and-architecture-
big-data-and-hadoop-training/114
• MIT Big Data Analytics Course
• Data Lake presentation by James Serra
• Future of Data…..(or something like that) by George Walters
BIBLIOGRAPHY- BIG DATA 101
Ignite (IT Pros) - https://myignite.microsoft.com/videos
Channel9 (Developers) - https://channel9.msdn.com/
Microsoft Virtual Academy (Both) – http://mva.microsoft.com
Technet Virtual Labs (Hands-on!) -
https://technet.microsoft.com/en-us/virtuallabs/default
Free Azure for 1 month - https://azure.microsoft.com/en-us/free/
Free HDInsight (Hadoop as a service) for a week -
https://azure.microsoft.com/en-us/services/hdinsight/information-
request/
MSDN? Link that to Azure for monthly Azure money.

Más contenido relacionado

La actualidad más candente

Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopFebiyan Rachman
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the CloudRob Thomas
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012Gigaom
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleSpringPeople
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKaran Desai
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 

La actualidad más candente (20)

Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the Cloud
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
BigData
BigDataBigData
BigData
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeople
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 

Destacado (12)

Tempdb, More permanent than you think
Tempdb, More permanent than you thinkTempdb, More permanent than you think
Tempdb, More permanent than you think
 
Network or perish
Network or perishNetwork or perish
Network or perish
 
Network or Perish
Network or PerishNetwork or Perish
Network or Perish
 
Copy data management
Copy data managementCopy data management
Copy data management
 
Why DBAs shun the use of tools
Why DBAs shun the use of toolsWhy DBAs shun the use of tools
Why DBAs shun the use of tools
 
Tempdb3
Tempdb3Tempdb3
Tempdb3
 
Copy data management
Copy data managementCopy data management
Copy data management
 
Georgia's Marsh and Swamp Habitat
Georgia's Marsh and Swamp HabitatGeorgia's Marsh and Swamp Habitat
Georgia's Marsh and Swamp Habitat
 
Db forensics for sql rally
Db forensics for sql rallyDb forensics for sql rally
Db forensics for sql rally
 
Intro to sql
Intro to sqlIntro to sql
Intro to sql
 
International ipr
International iprInternational ipr
International ipr
 
International intellectual property rights
International intellectual property rightsInternational intellectual property rights
International intellectual property rights
 

Similar a Big data 101

Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Accelerating analytics in a new era of data
Accelerating analytics in a new era of dataAccelerating analytics in a new era of data
Accelerating analytics in a new era of dataArnon Shimoni
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQLPhilippe Julio
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Martin Bém
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung
 
Transform from database professional to a Big Data architect
Transform from database professional to a Big Data architectTransform from database professional to a Big Data architect
Transform from database professional to a Big Data architectSaurabh K. Gupta
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analyticsIke Ellis
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 

Similar a Big data 101 (20)

Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Accelerating analytics in a new era of data
Accelerating analytics in a new era of dataAccelerating analytics in a new era of data
Accelerating analytics in a new era of data
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
Transform from database professional to a Big Data architect
Transform from database professional to a Big Data architectTransform from database professional to a Big Data architect
Transform from database professional to a Big Data architect
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 

Último

➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 

Último (20)

➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 

Big data 101

  • 1. NEW ENGLAND SQL SERVER Big Data 101 SPONSORED BY Paresh Motiwala
  • 2. BIG DATA 101 SERIOUSLY, THIS IS JUST 101 BY PARESH MOTIWALA PREPARED FOR NESQL
  • 3. PARESH MOTIWALA, PMP ® • pareshmotiwala@gmail.com • http://www.linkedin.com/in/pareshmotiwala • @pareshmotiwala • www.circlesofgrowth.com
  • 4. BIG DATA 101 • Who should attend • DBAs • CIO • Marketing peeps • Developers • Big Data Enthusiasts • Who should not attend
  • 5. BIG DATA 101 • Agenda for the day: • Sources • Privacy concerns • Storing- Hadoop • Processing – MapReduce • Presentation • Summary
  • 7. SO WHY SHOULD I CARE ABOUT THIS? Data is the new Electricity (Satya Nadella, Spring 2016) https://www.microsoft.com/en-us/sql-server/data-driven Companies Generate data, Distribute, Meter, and Use it Where is data stored? Current: SQL Server, Oracle, Teradata, DB2, Netezza, Open Source Databases Casandra, MySQL, MongoDB Unstructured: Hadoop, Spark, Data Lakes What type of data is stored? Traditional: Rows and Columns Big Data Explosion: Images, streaming data, internet-connected devices (IoT), Machine data
  • 8. BIG DATA IS DRIVING TRANSFORMATIVE CHANGES Traditional Big Data Relational data with highly modeled schema All data with schema agility Specialized HW Commodity HW Data characteristics Costs Culture Operational reporting Focus on rear-view analysis Experimentation leading to intelligent action With machine learning, graph, a/b testing
  • 9. BIG DATA 101 • Sources • Cell Phones • Social Media • Credit Cards • GPSs • Bread Crumbs
  • 10. BIG DATA 101 • 5 Vs of Big Data • Volume • Variety • Velocity • Veracity • Value
  • 11. BIG DATA 101 • Desired Properties: • Robustness- Fault Tolerance • Low Latency • Scalability • Generalization • Extensibility • Ad hoc Queries • Minimal Maintenance • Debuggability
  • 12. BIG DATA 101 • Flow Collection Pre-processing Intervention Visualization Hygiene Analysis OVER 90% OF TODAY’S DATA WAS CREATED IN PAST 2 YEARS
  • 13. BIG DATA 101 • 5 Rs of Data Quality • Relevancy • Recency • Range • Robustness • Reliability • Ephemeral Vs. Durability • Refresh of Data
  • 14. BIG DATA 101 • Privacy of Data • If I collect the data, is it mine? • Ownership Vs Rights • Share Answers not Data • OpAl (http://www.trust.mit.edu/projects/) • Enigma • Let them know • Why you are collecting • What you are collecting • FIPP- Fair Information Privacy Principles • Individual Control • Transparency • Respect for Context • Security • Access and Accuracy • Focused Collection • FERPA- Family Education Rights and Privacy Act
  • 15. BIG DATA 101 WHAT IS A DATA LAKE? ---COURTESY : JAMES SERRA A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed. • A place to store unlimited amounts of data in any format inexpensively, especially for archive purposes • Allows collection of data that you may or may not use later: “just in case” • A way to describe any large data pool in which the schema and data requirements are not defined until the data is queried: “just in time” or “schema on read” • Complements EDW and can be seen as a data source for the EDW – capturing all data but only passing relevant data to the EDW • Frees up expensive EDW resources (storage and processing), especially for data refinement • Allows for data exploration to be performed without waiting for the EDW team to model and load the data (quick user access) • Some processing in better done with Hadoop tools than ETL tools like SSIS • Easily scalable
  • 16. BIG DATA 101 THE “DATA LAKE” USES A BOTTOMS-UP APPROACH Ingest all data regardless of requirements Store all data in native format without schema definition Do analysis Using analytic engines like Hadoop Interactive queries Batch queries Machine Learning Data warehouse Real-time analytics Devices Courtesy : James Serra
  • 20. BIG DATA 101 • MapReduce • Map –Sends Queries • Reduce – Collects Results • Job Tracker • Task Tracker • YARN
  • 21. Near Realtime Data Analytics Pipeline using Azure Steam Analytics Big Data Analytics Pipeline using Azure Data Lake Interactive Analytics and Predictive Pipeline using Azure Data Factory Base Architecture : Big Data Advanced Analytics Pipeline Data Sources Ingest Prepare (normalize, clean, etc.) Analyze (stat analysis, ML, etc.) Publish (for programmatic consumption, BI/visualization) Consume (Alerts, Operational Stats, Insights) Machine Learning Telemetry Azure SQL (Predictions) HDI Custom ETL Aggregate /Partition Azure Storage Blob dashboard of predictions / alerts Live / real-time data stats, Anomalies and aggregates Customer MIS Event Hub PowerBI dashboard Stream Analytics (real-time analytics) Azure Data Lake Analytics (Big Data Processing) Azure Data Lake Storage Azure SQL Data in Motion Data at Rest dashboard of operational stats 21 Scheduledhourly transferusingAzure DataFactory Machine Learning (Anomaly Detection)
  • 22. VISION FOR BIG DATA AND DATA WAREHOUSING Azure Data Factory + Federated Query On-Premises Data Warehouse “Big Data” Cloud Comprehensive Connected Choice Microsoft Azure Microsoft Azure Microsoft SQL Server
  • 23. BIG DATA 101 PRESENTATION • R • Python • Power BI • Power BI Desktop
  • 24. BIG DATA 101 •Someday Big Data will just become data
  • 25. BIG DATA 101 • Summary: • Sources • Privacy concerns • Storing- Hadoop • Processing – MapReduce • Presentation
  • 26. BIG DATA 101 - CONCLUSION SQL Server is the best Relational Database The world is much bigger than any one relational database What is your company’s data strategy? What is your company’s cloud strategy? Learn adjacent technologies that will make you valuable. Power BI? Hadoop? NoSQL?
  • 27. BIG DATA 101 • BIBLIOGRAPHY – • http://www.datasciencecentral.com/ • https://www.youtube.com/playlist?list=PLt- 0mOCwxJ6B_OxTlpevxJNAa7GfCLd3l • https://www.dezyre.com/article/hadoop-components-and-architecture- big-data-and-hadoop-training/114 • MIT Big Data Analytics Course • Data Lake presentation by James Serra • Future of Data…..(or something like that) by George Walters
  • 28. BIBLIOGRAPHY- BIG DATA 101 Ignite (IT Pros) - https://myignite.microsoft.com/videos Channel9 (Developers) - https://channel9.msdn.com/ Microsoft Virtual Academy (Both) – http://mva.microsoft.com Technet Virtual Labs (Hands-on!) - https://technet.microsoft.com/en-us/virtuallabs/default Free Azure for 1 month - https://azure.microsoft.com/en-us/free/ Free HDInsight (Hadoop as a service) for a week - https://azure.microsoft.com/en-us/services/hdinsight/information- request/ MSDN? Link that to Azure for monthly Azure money.

Notas del editor

  1. Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera) http://www.jamesserra.com/archive/2014/05/hadoop-and-data-warehouses/ http://www.jamesserra.com/archive/2014/12/the-modern-data-warehouse/ http://adtmag.com/articles/2014/07/28/gartner-warns-on-data-lakes.aspx http://intellyx.com/2015/01/30/make-sure-your-data-lake-is-both-just-in-case-and-just-in-time/ http://www.blue-granite.com/blog/bid/402596/Top-Five-Differences-between-Data-Lakes-and-Data-Warehouses http://www.martinsights.com/?p=1088 http://data-informed.com/hadoop-vs-data-warehouse-comparing-apples-oranges/ http://www.martinsights.com/?p=1082 http://www.martinsights.com/?p=1094 http://www.martinsights.com/?p=1102