Big Data from the trenches
Advice from the FSI industry
By: Azrul MADISA
About me…
• VP – Enterprise Data Architect @ Maybank
• Take care of Maybank’s data worldwide
• Nuts about data, analytics and software dev.
• Very hands-on, love to read
• Teach aikido to kids
Big Data landscape today
https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck
Too many big data technologies?
Wait … what?
I have to know ALL that?
Let’s change the game a bit…
Use case
The data journey
[Diagram stages: Acquisition, Dumping, Tidy data, Real-time analytics, Analytical model, Sandbox]
Example: credit scoring and loan origination
[Diagram stages: Acquisition, Dumping, Tidy data, Real-time analytics, Analytical model, Sandbox; components: Screens, Data staging area, Data warehouse, Score card builder, Decisioning, Sandbox, Data scientist]
Acquisition with quality
• Manage data quality up front
• Human-factor data quality
[Diagram: Data entry, Application, Data staging (over-night), Audit trail (weekly)]
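Managing human-factor quality "up front" means rejecting a bad record at data entry instead of discovering it in the over-night staging load. A minimal sketch, assuming a hypothetical record with `name` and `nric` fields and the YYMMDD-PB-#### layout of a Malaysian MyKad number such as 720324-03-8891; the field names and rules are illustrative, not the deck's actual checks:

```python
import re
from datetime import date

# Assumed layout of a Malaysian MyKad number: YYMMDD-PB-#### (birth date, place of birth, serial).
NRIC_PATTERN = re.compile(r"(\d{2})(\d{2})(\d{2})-\d{2}-\d{4}")


def _is_date(year: int, month: int, day: int) -> bool:
    """True if the calendar date exists."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


def validate_entry(record: dict) -> list[str]:
    """Return human-readable problems for one data-entry record (empty list = accept)."""
    problems = []
    if not (record.get("name") or "").strip():
        problems.append("name is empty")
    nric = record.get("nric") or ""
    match = NRIC_PATTERN.fullmatch(nric)
    if not match:
        problems.append("nric is not in YYMMDD-PB-#### format")
    else:
        yy, mm, dd = (int(g) for g in match.groups())
        # Two-digit years are ambiguous; accepting either century is an assumption of this sketch.
        if not any(_is_date(century + yy, mm, dd) for century in (1900, 2000)):
            problems.append("nric does not start with a valid birth date")
    return problems
```

The entry screen would show these messages to the clerk immediately, so only clean records reach staging; the weekly audit trail then catches whatever the rules miss.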
Acquisition with quality
• Non-human error
• Use the PEWMA (Probabilistic Exponentially Weighted Moving Average) algorithm
https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
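For machine-generated feeds, PEWMA keeps a running mean and variance but lets each point's own likelihood decide how much it moves the model, so a spike is flagged without dragging the baseline along. A minimal sketch of that update rule for a single numeric stream; the defaults for `alpha`, `beta`, the warm-up length and the 3-sigma threshold are assumptions, not values taken from the linked AWS post:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pewma:
    """Probabilistic EWMA anomaly detector for one numeric stream."""
    alpha: float = 0.98     # base smoothing factor (assumed value)
    beta: float = 1.0       # how strongly a likely (normal) point speeds up adaptation (assumed)
    training: int = 30      # warm-up length T; no alerts during warm-up (assumed)
    threshold: float = 3.0  # flag points whose |z-score| exceeds this (assumed)
    t: int = field(default=0, init=False)
    s1: float = field(default=0.0, init=False)  # running estimate of E[x]
    s2: float = field(default=0.0, init=False)  # running estimate of E[x^2]

    def update(self, x: float) -> tuple[float, bool]:
        """Score x against the model built from earlier points, then fold x in."""
        self.t += 1
        if self.t == 1:
            self.s1, self.s2 = x, x * x
            return 0.0, False
        mean = self.s1
        std = math.sqrt(max(self.s2 - mean * mean, 0.0))
        z = (x - mean) / std if std > 1e-12 else 0.0
        p = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
        if self.t <= self.training:
            a = 1.0 - 1.0 / self.t  # plain running average while warming up
        else:
            a = (1.0 - self.beta * p) * self.alpha  # outliers (p near 0) barely move the model
        self.s1 = a * self.s1 + (1.0 - a) * x
        self.s2 = a * self.s2 + (1.0 - a) * x * x
        return z, self.t > self.training and abs(z) > self.threshold
```

One detector per metric is enough; feed readings in arrival order and route any `True` to the data-quality queue.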
Data sandbox
Creating a sandbox on the cloud
• Why cloud:
– Scale data discovery as needed
– Merging private with public data
– Less bureaucratic
• But…
– Customer data on the cloud is a no-no
Creating a sandbox on the cloud
• Masking
– Non-numerical data => No sweat!
– E.g.
• En. Abdul Jalil => 837x2unxy237e832!@
• 720324-03-8891 => 472376-84-8732
• Masking numerical data?
What if there were a way to mask numerical data while keeping its statistical properties intact?
Easier for the regulators to digest
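The "no sweat" masking of non-numerical fields can be done with a keyed hash: the same input always yields the same token, so records still join inside the sandbox, but without the key nobody can recompute or brute-force the mapping. A minimal sketch using Python's standard `hmac` module; `SECRET_KEY` is a hypothetical placeholder, and unlike true format-preserving encryption (e.g. NIST FF1) this masking is one-way:

```python
import hashlib
import hmac

# Hypothetical key: in practice it stays on-premise (e.g. in an HSM) and never goes to the cloud.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_text(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace free text (e.g. a name) with a deterministic, non-reversible 20-char token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:20]


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every digit of an ID number but keep its layout (length, dashes)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[i % len(digest)] % 10))  # keyed pseudo-random digit
            i += 1
        else:
            out.append(ch)  # keep separators so downstream format checks still pass
    return "".join(out)
```

For example, `mask_digits("720324-03-8891")` returns another `######-##-####` string, and it returns the same string every time for the same key.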
Creating a sandbox on the cloud
• Random projection
• Usually used for dimension reduction
Original data (M × N) × Random matrix (N × N) = Masked data (M × N)
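The slide multiplies the M × N data by a random N × N matrix. A minimal NumPy sketch, assuming that matrix is a random *orthogonal* one (one possible choice; the slide does not specify it): an orthogonal matrix preserves pairwise Euclidean distances, inner products and the eigenvalues of the covariance matrix exactly, whereas a plain Gaussian matrix preserves them only approximately. Individual columns no longer mean anything on their own, because each masked column mixes all the original ones:

```python
import numpy as np


def random_orthogonal(n: int, seed: int) -> np.ndarray:
    """Draw an N x N random orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs using diag(r) so the draw is uniform over orthogonal matrices.
    return q * np.sign(np.diag(r))


def mask(data: np.ndarray, seed: int) -> np.ndarray:
    """Masked data (M x N) = original data (M x N) @ random matrix (N x N)."""
    return data @ random_orthogonal(data.shape[1], seed)
```

The seed (or the matrix itself) acts as the key and must stay on-premise: whoever holds `Q` can undo the masking with `masked @ Q.T`. Distance-based work such as clustering or k-nearest-neighbour scoring behaves the same on the masked copy.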
Fast real-time vs. batch analytics
Fast real-time analytics
• ‘Batch’ analytics:
[Diagram: User → Application → Over-night batch → Data warehouse → Descriptive and predictive analytics; Analytical model (monthly)]
Fast real-time analytics
• ‘Batch’ analytics:
[Diagram: User → Application → Over-night batch → Data warehouse → Descriptive and predictive analytics (monthly); Real-time decisioning]
Fast real-time analytics
• So what is real-time analytics?
[Diagram: User → Application → Real-time decisioning analytics; Analytical model updated in real time]
Fast real-time analytics
• So what is real-time analytics?
[Diagram: User → Application → Real-time analytics and decisioning; Analytical model updated in real time. Predictive analytics: Batch analytical model and Real-time analytical model]
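The difference from the batch flow is that the model learns from each event as it happens instead of waiting for the monthly rebuild. A minimal sketch, assuming a logistic-regression decision model (the slides do not name one): each event is scored with the current weights, and one stochastic-gradient step is taken as soon as the outcome is known. `OnlineLogisticModel` and `handle_event` are hypothetical names:

```python
import numpy as np


class OnlineLogisticModel:
    """Logistic regression whose weights are nudged after every single event."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x: np.ndarray) -> float:
        return float(1.0 / (1.0 + np.exp(-(x @ self.w + self.b))))

    def learn(self, x: np.ndarray, y: float) -> None:
        # One stochastic-gradient step on the log-loss; its gradient is (p - y) * x.
        err = self.predict_proba(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def handle_event(model: OnlineLogisticModel, x: np.ndarray, outcome=None, threshold: float = 0.5) -> bool:
    """Decide now with the current model; learn immediately if the outcome is known."""
    decision = model.predict_proba(x) >= threshold
    if outcome is not None:
        model.learn(x, outcome)
    return decision
```

A batch model would instead be refit on the whole warehouse extract once a month; in practice the two are often combined, with the batch model providing the starting weights that the online updates then adjust.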
Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
[Diagram: Location and user info → Real-time Analytical Marketing System → SMS campaign]
Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
[Diagram: Real-time Analytical Marketing System; the customer changes behaviour (e.g. buys something else) and the system learns the new behaviour]
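The Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max Q(s′, a′) − Q(s, a)]. A minimal tabular sketch for the SMS example, assuming the actions are the three themes from the charts (sports, concerts, movies), the reward is 1 when the user responds, and the state is a hypothetical user or segment key. With γ = 0 each message is treated as a one-off decision, so it behaves like an ε-greedy bandit; the constant learning rate is what lets it follow a user whose interest drifts:

```python
import random
from collections import defaultdict

ACTIONS = ("sports", "concerts", "movies")  # SMS themes, taken from the slide charts


class SmsQLearner:
    """Tabular Q-learning with epsilon-greedy exploration (a sketch, not a production policy)."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.0, epsilon: float = 0.1, seed: int = 0):
        self.alpha = alpha      # learning rate; a constant rate keeps tracking drifting interest
        self.gamma = gamma      # 0 treats each SMS as a one-off decision (assumption)
        self.epsilon = epsilon  # share of messages spent exploring other themes
        self.q = defaultdict(float)  # (state, action) -> estimated reward
        self.rng = random.Random(seed)

    def choose(self, state: str) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # explore
        best = max(self.q[(state, a)] for a in ACTIONS)
        return self.rng.choice([a for a in ACTIONS if self.q[(state, a)] == best])  # exploit, random tie-break

    def update(self, state: str, action: str, reward: float, next_state: str) -> None:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + self.gamma * max(self.q[(next_state, a)] for a in ACTIONS)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def preferred(self, state: str) -> str:
        return max(ACTIONS, key=lambda a: self.q[(state, a)])


def run_campaign(agent: SmsQLearner, state: str, liked: str, n_messages: int) -> None:
    """Simulate a user who only responds to one theme (hypothetical reward model)."""
    for _ in range(n_messages):
        action = agent.choose(state)
        reward = 1.0 if action == liked else 0.0
        agent.update(state, action, reward, state)
```

Running `run_campaign` with `liked="concerts"` and then with `liked="movies"` reproduces the effect in the charts: the preferred theme switches once the user's behaviour changes, because unanswered concert messages steadily lower that theme's Q-value while exploration discovers the new interest.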
Fast real-time analytics: Real-time analytics in action
[Figure: interest in concerts, movies and sports changing over time]
Fast real-time analytics: Real-time analytics in action
[Chart: tracked interest (0 to 0.5) in sports, concerts and movies against the number of messages (1 to over 10,000)]
Real-time analytical tracking and learning of people’s interest
Putting it all together under one architecture
Data architecture
• Some difficult questions around big data and analytics
– How can I invest in big data while managing cost?
– How can I “experiment” with big data while mitigating risks?
– How can I create a 360 view of data without boiling the ocean?
– How can I use overseas data without violating regulations?
Tiered data architecture
[Diagram: data sources (batch and real-time), overseas data sources and social networks feeding a data warehouse (staging, SQL access), a Big Data infrastructure (e.g. Hadoop), a real-time store, master / reference data, social / cloud public data and overseas data; a data virtualization layer presents an official data model to data consumers over SQL / REST / SOAP / MQ]
Tiered data architecture
• Investment / level of support
– Master data: Level 1
– Fast data: Level 1
– Hot data: Level 2
– Cold data: Level 3
– Data virtualization: Level 1
[Axes: investment in CPU / memory vs. investment in storage; level of support]
Tiered data architecture
• Invest where it matters
– Defer investment if needed
– Refocus investment without disrupting business
• Data virtualization
– Create a façade for data access
– Provide standard interface for data
– Single data model, single access, single quality checkpoint
• Allow ‘experimentation’
– E.g. cut-off point for hot / cold
• Overseas data access
– Data stays where it is; only aggregated data is transferred back
– More palatable to regulators
• 360 view
– Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed
• Single place to check for data quality
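The façade idea can be illustrated without a commercial virtualization product. A minimal sketch using SQLite as a stand-in (an assumption, not the deck's stack): two attached databases play the hypothetical hot and cold tiers, one view hides the tier split, and a second view joins master data to it for a 360 view, with no ETL copy. Moving the hot/cold cut-off changes where rows live but not what consumers query:

```python
import sqlite3


def build_facade() -> sqlite3.Connection:
    """Create master data plus hot/cold tiers behind two read-only views."""
    conn = sqlite3.connect(":memory:")                  # main: master data
    conn.execute("ATTACH DATABASE ':memory:' AS hot")   # hypothetical hot tier
    conn.execute("ATTACH DATABASE ':memory:' AS cold")  # hypothetical cold tier
    conn.execute("CREATE TABLE main.customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE hot.txn (customer_id INTEGER, amount REAL, txn_date TEXT)")
    conn.execute("CREATE TABLE cold.txn (customer_id INTEGER, amount REAL, txn_date TEXT)")
    # SQLite only lets TEMP views span attached databases.
    conn.execute("""
        CREATE TEMP VIEW transactions AS
        SELECT customer_id, amount, txn_date FROM hot.txn
        UNION ALL
        SELECT customer_id, amount, txn_date FROM cold.txn
    """)
    conn.execute("""
        CREATE TEMP VIEW customer_360 AS
        SELECT c.id, c.name, COUNT(t.amount) AS n_txn, COALESCE(SUM(t.amount), 0) AS total
        FROM main.customer AS c LEFT JOIN transactions AS t ON t.customer_id = c.id
        GROUP BY c.id, c.name
    """)
    return conn


def set_hot_cutoff(conn: sqlite3.Connection, cutoff: str) -> None:
    """Re-tier rows: txn_date >= cutoff lives in hot, older rows in cold (ISO dates assumed)."""
    with conn:
        conn.execute("INSERT INTO hot.txn SELECT * FROM cold.txn WHERE txn_date >= ?", (cutoff,))
        conn.execute("DELETE FROM cold.txn WHERE txn_date >= ?", (cutoff,))
        conn.execute("INSERT INTO cold.txn SELECT * FROM hot.txn WHERE txn_date < ?", (cutoff,))
        conn.execute("DELETE FROM hot.txn WHERE txn_date < ?", (cutoff,))
```

Consumers only ever run queries such as `SELECT * FROM customer_360`, so experimenting with the cut-off date, or later moving the cold tier to cheaper storage, does not disrupt them. The view is also the single checkpoint where data-quality rules can be applied.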
That’s all folks…
• LinkedIn:
– https://www.linkedin.com/in/azrul-madisa-6052419

Más contenido relacionado

La actualidad más candente

Rethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubRethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubCloudera, Inc.
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data PlatformAndrei Savu
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksMapR Technologies
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorDataWorks Summit
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use CasesInSemble
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Dataconomy Media
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLJen Stirrup
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Dr. Mohan K. Bavirisetty
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI StrategyAtScale
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...DataStax
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemmagda3695
 
Introduction: Architecting for Scale
Introduction: Architecting for ScaleIntroduction: Architecting for Scale
Introduction: Architecting for ScaleDataStax
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseFormant
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerationsAseem Bansal
 
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14Phillip Delaney
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data EcosystemIvo Vachkov
 

La actualidad más candente (20)

Rethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubRethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data Hub
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services Sector
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI Strategy
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Introduction: Architecting for Scale
Introduction: Architecting for ScaleIntroduction: Architecting for Scale
Introduction: Architecting for Scale
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerations
 
BigData
BigDataBigData
BigData
 
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 

Similar a Big data from the trenches

Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptxXanGwaps
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022HostedbyConfluent
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalVMware Tanzu Korea
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxAIMLSEMINARS
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesDATAVERSITY
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes StrategicMapR Technologies
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
It's All About the Data - Tia Dubuisson
It's All About the Data - Tia DubuissonIt's All About the Data - Tia Dubuisson
It's All About the Data - Tia DubuissonCatalina Arango
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 

Similar a Big data from the trenches (20)

Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
It's All About the Data - Tia Dubuisson
It's All About the Data - Tia DubuissonIt's All About the Data - Tia Dubuisson
It's All About the Data - Tia Dubuisson
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 

Último

NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 

Último (20)

NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 

Big data from the trenches

  • 1. Big Data from the trenches Advice from the FSI industry By: Azrul MADISA
  • 2. About me… • VP – Enterprise Data Architect @ Maybank • Take care of Maybank’s data world wide • Nuts about data, analytics and software dev. • Very hands on, love to read • Teach aikido to kids
  • 3. Big Data landscape today https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck
  • 4. Too many big data tech? Wait … what? I have to know ALL that?
  • 5. Let’s change the game a bit… Usecase
  • 7. The data journey Acquisition Dumping Tidy data Real Time Analytics Analytical model Sandbox
  • 8. Example: credit scoring and loan origination Acquisition Dumping Tidy data Real Time Analytics Analytical model Screens Data staging area Data warehouse Score card builder Decisioning Sandbox Data scientist
  • 10. Acquisition with quality • Manage data quality up front • Human-factor data quality Data Entry Data StagingApplication Over-night
  • 11. Acquisition with quality • Manage data quality up front • Human-factor data quality Data Entry Data Staging Application Over-night Audit trail Weekly
  • 12. Acquisition with quality • Non-human error • Use PEWMA algorithm https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
  • 14. Creating a sandbox on the cloud • Why cloud: – Scale data discovery as needed – Merging private with public data – Less bureaucratic • But… – Customer data on the cloud is a no-no
  • 15. Creating a sandbox on the cloud • Masking – Non-numerical data => No sweat! – E.g. • En. Abdul Jalil => 837x2unxy237e832!@ • 720324-03-8891 => 472376-84-8732 • Masking numerical data?
  • 16. Creating a sandbox on the cloud • Masking – Non-numerical data => No sweat! – E.g. • En. Abdul Jalil => 837x2unxy237e832!@ • 720324-03-8891 => 472376-84-8732 • Masking numerical data? What if there were a way to mask numerical data while keeping its statistical properties intact? Easier for the regulators to digest
  • 17. Creating a sandbox on the cloud • Random projection • Usually used for dimensionality reduction • Original data (M × N) × Random matrix (N × N) = Masked data (M × N)
  • 18. Fast real-time vs. batch analytics
  • 19. Fast real-time analytics • ‘Batch’ analytics: User Application Overnight batch Data warehouse Predictive analytics Descriptive analytics Analytical model Monthly
  • 20. Fast real-time analytics • ‘Batch’ analytics: User Application Overnight batch Data warehouse Predictive analytics Descriptive analytics Real-time decisioning Monthly
  • 21. Fast real-time analytics • So what is real-time analytics? User Application Real-time decisioning analytics Analytical model updated in real time
  • 22. Fast real-time analytics • So what is real-time analytics? User Application Real-time analytics and decisioning Analytical model updated in real time Predictive analytics Batch analytical model Real-time analytical model
  • 23. Fast real-time analytics • Q-learning • E.g. SMS advertisement campaign Real-time Analytical Marketing System Location, user info SMS campaign
  • 24. Fast real-time analytics • Q-learning • E.g. SMS advertisement campaign Real-time Analytical Marketing System Change behaviour (E.g. buy something else) Learn new behaviour
  • 25. Fast real-time analytics: Real-time analytics in action Over time Interest in concerts Interest in movies Interest in sports
  • 26. Fast real-time analytics: Real-time analytics in action [Chart: interest (0 to 0.5) against number of messages; series: sports, concerts, movies] Interest in concerts Interest in movies Interest in sports
  • 27. Fast real-time analytics: Real-time analytics in action [Chart: interest (0 to 0.5) against number of messages; series: sports, concerts, movies] Interest in concerts Interest in movies Interest in sports Real-time analytical tracking and learning of people’s interest
  • 28. Putting it all together under one architecture
  • 29. Data architecture • Some difficult questions around big data and analytics – How can I invest in big data while managing cost? – How can I “experiment” with big data while mitigating risks? – How can I create a 360 view of data without boiling the ocean? – How can I use overseas data without violating regulations?
  • 30. Tiered data architecture Data warehouse - Staging - SQL access Big Data Infra (E.g. Hadoop) Data sources Batch Real-time Real-time store Master / Reference Data Social / Cloud Public Data Overseas Data Overseas data sources Social network Batch
  • 31. Tiered data architecture Data consumer Data virtualization SQL / REST / SOAP / MQ Data warehouse - Staging - SQL access Big Data Infra (E.g. Hadoop) Data sources Batch Real-time Real-time store Master / Reference Data Social / Cloud Public Data Overseas Data Overseas data sources Social network Batch Official data model
  • 32. Tiered data architecture • Investment / level of support Master data Fast data Hot data Cold data Investment in CPU / memory Investment in storage Level 1 Level 1 Level 2 Level 3 Data virtualization Level 1 Level of support
  • 33. Tiered data architecture • Invest where it matters – Defer investment if needed – Refocus investment without disrupting business • Data virtualization – Create a façade for data access – Provide a standard interface for data – Single data model, single access, single quality checkpoint • Allow ‘experimentation’ – E.g. cut-off point for hot / cold • Overseas data access – Data stays where it is, only aggregated data is transferred back – More palatable to regulators • 360 view – Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed • Single place to check for data quality
  • 34. That’s all folks… • LinkedIn: – https://www.linkedin.com/in/azrul-madisa-6052419
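Slides 10 and 11 argue for catching human-factor errors at data entry, before the overnight load. They also call for an audit trail that is reviewed periodically (weekly, on slide 11). Below is a minimal Python sketch of that idea. Several details are hypothetical and chosen only for illustration:

- the record fields (`name`, `ic_number`, `loan_amount`);
- the `YYMMDD-NN-NNNN` identity-number pattern, modelled on the example on slide 15;
- the layout of each audit entry.

```python
import re
from datetime import date, datetime, timezone

# Hypothetical identity-number shape YYMMDD-NN-NNNN, modelled on the slide 15 example.
ID_PATTERN = re.compile(r"^\d{2}(\d{2})(\d{2})-\d{2}-\d{4}$")


def validate_entry(record):
    """Return a list of problems found in one data-entry record (empty list = clean)."""
    problems = []
    if not str(record.get("name") or "").strip():
        problems.append("name is empty")
    match = ID_PATTERN.match(str(record.get("ic_number") or ""))
    if match is None:
        problems.append("ic_number does not match YYMMDD-NN-NNNN")
    else:
        month, day = (int(g) for g in match.groups())
        try:
            date(2000, month, day)  # 2000 is a leap year, so 29 Feb is accepted
        except ValueError:
            problems.append("ic_number contains an impossible date")
    amount = record.get("loan_amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
        problems.append("loan_amount must be a positive number")
    return problems


def capture(record, audit_trail):
    """Validate at entry time; log each rejected record for the periodic review."""
    problems = validate_entry(record)
    if problems:
        audit_trail.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "record": dict(record),
            "problems": problems,
        })
    return not problems
```

Rejected records never reach the staging area. The `audit_trail` list is what a weekly review would read.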
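Slide 12 suggests the PEWMA algorithm (probabilistic exponentially weighted moving average) for catching non-human errors in data feeds. The sketch below follows the commonly published form of PEWMA:

- During a warm-up period it keeps a plain running mean and variance.
- After warm-up, each point's weight depends on its Gaussian probability, so improbable points pull the baseline less.

The default `alpha`, `beta`, warm-up length and 3-sigma threshold are assumptions to tune. They are not values taken from the slides or the linked AWS post.

```python
import math


class PEWMA:
    """Streaming anomaly detector using a probabilistic EWMA of mean and variance."""

    def __init__(self, alpha=0.99, beta=0.5, training=30, threshold=3.0):
        self.alpha = alpha          # base smoothing factor (weight kept on history)
        self.beta = beta            # how much extra weight probable points receive
        self.training = training    # warm-up length before any point is flagged
        self.threshold = threshold  # |z| above this is reported as an anomaly
        self.t = 0
        self.s1 = 0.0               # running weighted mean of x
        self.s2 = 0.0               # running weighted mean of x squared

    def update(self, x):
        """Feed one observation; return True if it looks anomalous."""
        self.t += 1
        if self.t == 1:
            self.s1, self.s2 = x, x * x
            return False
        mean = self.s1
        std = math.sqrt(max(self.s2 - mean * mean, 0.0))
        if std > 0:
            z = (x - mean) / std
        else:
            z = 0.0 if x == mean else math.inf  # any change from a flat history
        p = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # standard normal density at z
        if self.t <= self.training:
            a = 1 - 1 / self.t                  # plain running average while warming up
        else:
            a = self.alpha * (1 - self.beta * p)
        self.s1 = a * self.s1 + (1 - a) * x
        self.s2 = a * self.s2 + (1 - a) * x * x
        return self.t > self.training and abs(z) > self.threshold
```

Suppose the detector is fed a stream that hovers around 10 and then jumps to 100 after warm-up: the jump is flagged. With `alpha` close to 1, the baseline moves slowly.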
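Slide 15 shows names and identity numbers replaced by opaque values, with the identity number keeping its `NNNNNN-NN-NNNN` shape. One way to get that behaviour is keyed hashing (HMAC-SHA256). The same input always maps to the same token, but the mapping cannot be reversed without the key. This is a sketch only, with two caveats:

- `SECRET_KEY` is a placeholder.
- The digit substitution is not a standardised format-preserving encryption scheme such as NIST FF1. It is one-way and slightly biased.

```python
import hashlib
import hmac

# Placeholder only: in practice the key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_text(value: str, key: bytes = SECRET_KEY, length: int = 20) -> str:
    """Deterministic, one-way token for free text such as customer names."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:length]


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every digit with a keyed pseudo-random digit; keep separators like '-'."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    position = 0
    for ch in value:
        if ch.isdigit():
            # byte % 10 is slightly biased towards 0-5; acceptable for a sketch
            out.append(str(digest[position % len(digest)] % 10))
            position += 1
        else:
            out.append(ch)
    return "".join(out)
```

Because the mapping is deterministic, masked customer tables can still be joined on the masked identity number inside the sandbox.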
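Slide 17 masks numerical data by multiplying the M × N data matrix by a random N × N matrix. The slide does not say how that matrix is drawn. Assume it is orthogonal (here, the Q factor of a Gaussian matrix). Then every pairwise Euclidean distance, and every eigenvalue of the covariance matrix, is preserved exactly. Distance-based methods and linear regression therefore behave the same on the masked data. A minimal NumPy sketch:

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix via the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fixing the column signs makes the draw uniform over orthogonal matrices.
    return q * np.sign(np.diag(r))


def mask_numeric(data, seed=None):
    """Mask an M x N matrix as data @ R with a random orthogonal N x N matrix R."""
    rng = np.random.default_rng(seed)
    projection = random_orthogonal(data.shape[1], rng)
    return data @ projection, projection  # keep `projection` secret, like a key


def pairwise_distances(x):
    """Euclidean distance between every pair of rows."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
```

The rotation is not a cryptographic guarantee. Anyone holding N linearly independent original rows together with their masked versions can solve for the matrix, so it must be protected like a key.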
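Slides 19 to 22 contrast a batch model rebuilt on a monthly cycle with an analytical model updated in real time. A common way to do the latter is online learning: score each event as it arrives, then use its outcome to nudge the model. The sketch below uses logistic regression trained by stochastic gradient descent. The model choice, the feature layout and the learning rate are assumptions for illustration; the slides do not name a model.

```python
import math


class OnlineLogisticModel:
    """Logistic regression whose weights are updated one event at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        """Probability of a positive outcome for one event."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def learn(self, features, outcome):
        """One SGD step on the log-loss once the event's outcome (0 or 1) is known."""
        error = self.score(features) - outcome
        self.weights = [w - self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]
        self.bias -= self.learning_rate * error
```

The batch model can still be rebuilt monthly and used to seed the weights. The online copy then tracks drift between rebuilds.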
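Slides 23 and 24 use Q-learning for an SMS campaign. The system picks a message, observes how the customer reacts, and keeps learning when the customer's behaviour changes. Below is a minimal tabular Q-learning sketch with epsilon-greedy exploration. The following are all assumptions:

- the state (a simple location label);
- the three campaign categories, borrowed from slides 25 to 27;
- a 0/1 response reward;
- the values of `alpha`, `gamma` and `epsilon`.

```python
import random
from collections import defaultdict

# Hypothetical campaign categories, taken from the interests on slides 25 to 27.
ACTIONS = ["concerts", "movies", "sports"]


class QLearningCampaign:
    """Tabular Q-learning agent that picks which SMS campaign to send."""

    def __init__(self, actions, alpha=0.2, gamma=0.5, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount on the value of the next state
        self.epsilon = epsilon    # probability of exploring a random campaign
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> estimated value, default 0.0

    def best_action(self, state):
        """Greedy choice: the campaign with the highest estimated value."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose(self, state):
        """Epsilon-greedy: usually exploit, occasionally explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def learn(self, state, action, reward, next_state):
        """Q-learning update toward reward + gamma * max over a' of Q(next_state, a')."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Consider a simulation that rewards `concerts` for a few hundred messages and then switches to `movies`. The greedy choice follows the switch, because the value of the no-longer-rewarded campaign decays with every unrewarded send.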
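Slides 31 and 33 place a data virtualization layer in front of the storage tiers, which gives three properties:

- consumers see a single interface;
- the hot/cold cut-off can move without affecting them;
- overseas sources send back only aggregates.

The in-memory sketch below illustrates those three ideas. The class names, the row layout and the date-based cut-off are hypothetical. A real deployment would sit on a virtualization or query-federation product rather than Python lists.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Tier:
    """Hypothetical stand-in for one storage tier (e.g. warehouse or Hadoop)."""
    name: str
    rows: list  # each row: {"customer_id": str, "txn_date": date, "amount": float}

    def query(self, customer_id, start, end):
        return [r for r in self.rows
                if r["customer_id"] == customer_id and start <= r["txn_date"] <= end]


@dataclass
class OverseasSource:
    """Hypothetical overseas store: raw rows stay here, only aggregates leave."""
    country: str
    rows: list

    def aggregate(self, customer_id):
        amounts = [r["amount"] for r in self.rows if r["customer_id"] == customer_id]
        return {"country": self.country, "count": len(amounts), "total": sum(amounts)}


class DataVirtualizationFacade:
    """Single access point; consumers never need to know which tier holds a row."""

    def __init__(self, hot, cold, cutoff, overseas=()):
        self.hot = hot
        self.cold = cold
        self.cutoff = cutoff  # rows dated on or after the cut-off live in the hot tier
        self.overseas = list(overseas)

    def transactions(self, customer_id, start, end):
        rows = []
        if end >= self.cutoff:
            rows += self.hot.query(customer_id, max(start, self.cutoff), end)
        if start < self.cutoff:
            last_cold_day = min(end, self.cutoff - timedelta(days=1))
            rows += self.cold.query(customer_id, start, last_cold_day)
        return sorted(rows, key=lambda r: r["txn_date"])

    def customer_360(self, customer_id, start, end):
        """'Join' local detail with overseas aggregates without an ETL copy."""
        local = self.transactions(customer_id, start, end)
        return {
            "customer_id": customer_id,
            "local_total": sum(r["amount"] for r in local),
            "overseas": [src.aggregate(customer_id) for src in self.overseas],
        }
```

Callers of `transactions` are unaffected when `cutoff` moves, provided the rows are migrated between tiers at the same time.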