BI/Analytics for NoSQL:
Review of Architectures
What we'll answer in 50 minutes
•   Who is this guy?
•   How do I enable ad hoc, self-service
    reporting on NoSQL?
•   How do I improve the
    performance of dashboards
    on top of NoSQL?
•   How do I integrate NoSQL
    data with my other data not
    inside NoSQL?
•   How do I enable easy-to-build
    simple reports but also
    preserve the ability for rich
    NoSQL queries?
Nicholas Goodman

•    Open Source BI thought leader
       –    50+ Open Source BI customer projects
       –    Blogger, whitepapers, etc
•    Entrepreneur
       –    DynamoBI Corporation
       –    Bayon Technologies, Inc.
•    Data Geek, hacker, tinkerer, committer



    GOAL: Share perspectives,
    research, opinions.
    DISCLAIMER: Your Mileage ...
How do we answer those Q's?
Promise of “Big Data”
•   NoSQL/Hadoop/MapReduce Systems
     –   Keep more of it
     –   Cost effective analysis
     –   “Massive scale” data, now accessible to everyone (elastic)
     –   Not just SQL queries, more complex analysis




     ACCOMPLISHED: WEB SCALE, MASSIVE
     NEVER BEFORE SEEN SCALE OF DATA
     STORAGE AND PROCESSING
Reality Check!


•   Petabytes? Y
•   Cheap Storage? Y
•   Raw Processing? Y
•   Rich Query Languages? Y
•   Flexible data structures? Y
•   Reliable, Fault Tolerant? Y

•   Fast Queries? N
•   Ad Hoc access? N
•   Accessibility to commodity BI tools? N
•   Easy report authoring? N
•   Levels of Aggregation? N
•   Integrated Data? N

     Big Data has solved the INFRASTRUCTURE of
     raw/core data storage but has provided less value
     to what BUSINESS users want for analytics.
Data Gaps too!



•   Code, Developers             •   Analysts w/ Excel, Dashboards
•   MR, Rich Graph/Access        •   Simple 2D (tables, charts)
•   Hierarchical, Unstructured   •   Filtering and easy analytics
Levels of Aggregation

SAME DATA AT VARIOUS
LEVELS OF AGGREGATION
HUGELY IMPORTANT IN REAL
LIFE IMPLEMENTATIONS!

[Figure: levels of aggregation, 1 ROW TO 1 BILLION ROWS; labels: 10K, 1 MILLION, 100 MILLION, 100 BILLION]
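
Keeping the same data at several levels of aggregation can be sketched roughly as follows. This is a minimal illustration and not part of the original deck: the event fields (`ts`, `page`, `hits`) and the `rollup` helper are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical raw events (the finest level of detail); field names are assumptions.
events = [
    {"ts": "2011-03-01T10:15", "page": "/home", "hits": 3},
    {"ts": "2011-03-01T10:45", "page": "/home", "hits": 2},
    {"ts": "2011-03-01T11:05", "page": "/docs", "hits": 5},
    {"ts": "2011-03-02T09:30", "page": "/home", "hits": 4},
]

def rollup(rows, key_fn):
    """Sum `hits` per key; each key function defines one aggregation level."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row["hits"]
    return dict(totals)

# Same data, three levels: per hour, per day, and a single grand total.
by_hour = rollup(events, lambda r: r["ts"][:13])
by_day = rollup(events, lambda r: r["ts"][:10])
total = rollup(events, lambda r: "all")
```

A dashboard would typically read `by_day` or `total`, while a drill-down would read `by_hour` or the raw events; each level trades row count against detail.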
Architectures

•   NoSQL reports
•   NoSQL thru and thru
•   NoSQL + MySQL
•   NoSQL as ETL Source
•   NoSQL programs in BI Tools
•   NoSQL via BI Database (SQL)
NoSQL reports
•   Pay Developer to build applications for reports



[Diagram: Apps]

Advantages:
•   100% Richness of NoSQL
•   Up to date, current
•   Excellent performance on large datasets
•   Custom built, beautiful reports/dashboards
•   Single system to manage

Disadvantages:
•   $$, developer driven process
•   No commodity BI tools
•   Managing rollups/summaries
•   Schema-less = Harder!
•   Hard to integrate other reporting information
NoSQL thru and thru
•   Pay Developer to build FLEXIBLE applications for reports


[Diagram: Indices, Aggs, Advanced Apps]

Advantages:
•   All of NoSQL report advantages
•   Managed aggregations, rollups
•   “Guided Adhoc” available inside application
•   Higher performance for dashboards/summaries

Disadvantages:
•   $$, developer driven process
•   $$, app required for aggs
•   No commodity BI tools
•   Hard to integrate other reporting information
•   Limited AdHoc (only developer built combinations)
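
As a rough sketch of what "managed aggregations" and "guided ad hoc" can look like, the snippet below builds a pre-aggregated index with a map step and a summing reduce step, in the spirit of a map/reduce view. It is plain Python rather than any particular NoSQL product's API, and the document schema and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical documents; the schema is an assumption for illustration.
docs = [
    {"campaign": "spring", "channel": "twitter", "clicks": 10},
    {"campaign": "spring", "channel": "facebook", "clicks": 7},
    {"campaign": "summer", "channel": "twitter", "clicks": 4},
]

def map_doc(doc):
    """Emit (key, value) pairs, in the spirit of a map function."""
    yield (doc["campaign"], doc["channel"]), doc["clicks"]

def build_index(documents):
    """Reduce step: sum values per key into a pre-aggregated index."""
    index = defaultdict(int)
    for doc in documents:
        for key, value in map_doc(doc):
            index[key] += value
    return dict(index)

def clicks_for_campaign(index, campaign):
    """A 'guided' query: only combinations the developer indexed are available."""
    return sum(v for (c, _channel), v in index.items() if c == campaign)

index = build_index(docs)
spring_clicks = clicks_for_campaign(index, "spring")
```

Dashboards read the small `index` instead of scanning every document, which is where the performance gain comes from. The limitation is also visible: a question about a field that was never emitted as part of a key needs a developer to build a new index.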
NoSQL + MySQL
•   Pay Developer to build FLEXIBLE applications for reports


[Diagram: ETL App, MySQL]

Advantages:
•   Less IT $$ since developers aren't “building reports”
•   Rich, NoSQL analysis left in place (ETL + NoSQL)
•   Easy, Ad Hoc reporting via commodity BI tools
•   Easier to understand data for self service reports

Disadvantages:
•   Data freshness (24 hrs old)
•   Once into MySQL no rich NoSQL application use (M/R)
•   BI Tool can connect ONLY to data in MySQL, not NoSQL
•   Aggregations still self managed in MySQL
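
A minimal sketch of the "ETL App" step might look like the following: summarize semi-structured documents outside SQL, then push only the summary rows into a relational table that a BI tool can query. Everything here is an assumption for illustration: the document shape, the `daily_events` table, and the use of Python's built-in `sqlite3` as a stand-in for MySQL.

```python
import sqlite3
from collections import Counter

# Hypothetical semi-structured documents as they might sit in a NoSQL store.
raw_docs = [
    {"user": {"country": "US"}, "event": "view", "day": "2011-03-01"},
    {"user": {"country": "US"}, "event": "view", "day": "2011-03-01"},
    {"user": {"country": "DE"}, "event": "click", "day": "2011-03-01"},
    {"user": {}, "event": "view", "day": "2011-03-02"},  # missing field
]

def summarize(docs):
    """Heavy lifting stays outside SQL: count events per (day, country, event)."""
    counts = Counter()
    for doc in docs:
        country = doc.get("user", {}).get("country", "unknown")
        counts[(doc["day"], country, doc["event"])] += 1
    return counts

def load(conn, counts):
    """Push only the summarized rows into the relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_events "
        "(day TEXT, country TEXT, event TEXT, n INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_events VALUES (?, ?, ?, ?)",
        [(d, c, e, n) for (d, c, e), n in counts.items()],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
load(conn, summarize(raw_docs))

# What a BI tool would issue: a plain ad hoc SQL query over the summary table.
rows = conn.execute(
    "SELECT country, SUM(n) FROM daily_events GROUP BY country ORDER BY country"
).fetchall()
```

The sketch also shows the trade-off listed above: once the data is in the summary table, only the columns the ETL step chose to keep are available to the BI tool.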
NoSQL as ETL Data Source
•   NoSQL treated like any other data source


[Diagram: Informatica, Teradata]

Advantages:
•   Allows use of consolidated, BI tool for AdHoc
•   Enables integrated (combined) datasets for reporting
•   Aggregations Often “managed”
•   Best of Breed tools

Disadvantages:
•   ETL Development Expense
•   Data Latency
•   Loss of NoSQL language richness
•   Traditional DW tools are $$
•   Scaling issues with DW Database
NoSQL programs in BI Tools
•   Write a program in BI tool that flattens data, output into report




Advantages:
•   Rich use of NoSQL native language
•   Direct, up to date access
•   Access to 100% of dataset
•   Leverage “guided” report parameter pages
•   Less expensive than apps

Disadvantages:
•   Developer required to write program ($$)
•   Slow-er (aggs, summaries)
•   Lacks integration with other datasets
•   Still (usually) no AdHoc access
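
The "program that flattens data" could be as small as the sketch below, which turns one hierarchical document into flat rows that a report table can display. The `flatten` and `to_rows` helpers and the sample order document are hypothetical; a real BI tool would have its own scripting hooks for this step.

```python
def flatten(doc, prefix=""):
    """Turn nested dicts into one flat dict with dotted column names."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

def to_rows(doc, list_field):
    """Emit one flat row per element of a nested list, repeating parent fields."""
    parent = {k: v for k, v in doc.items() if k != list_field}
    rows = []
    for item in doc.get(list_field, []):
        row = flatten(parent)
        row.update(flatten(item, prefix=list_field + "."))
        rows.append(row)
    return rows

# Hypothetical order document; the structure is an assumption.
order = {
    "id": 42,
    "customer": {"name": "Ada", "city": "Seattle"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
rows = to_rows(order, "items")
```

Each row repeats the parent fields (`id`, `customer.name`, `customer.city`) next to one item, which is the simple 2D shape report tools expect. Because the flattening runs at report time, summaries over large datasets tend to be slow, as the slide notes.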
NoSQL via BI Database (SQL)
•   Enable NoSQL data access via SQL (gasp!)

[Diagram: Live Query; Cached, 24hr data]

Advantages:
•   Easy reports, easy (SQL)
•   Integration with other data
•   ETL is simple INSERT/MERGEs
•   Live, up to date access
•   High performance, cached data
•   AdHoc access to Live + Cached
•   Aggregations/Summaries

Disadvantages:
•   Another system in between
•   Still needs to be refreshed, nightly
•   Not all capabilities for NoSQL richness available via SQL
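
One way to picture "live + cached" access with a simple INSERT/MERGE refresh is the sketch below. It is an assumption-laden illustration, not the product described in the deck: `live_store` is a dictionary standing in for the NoSQL system, `sqlite3` stands in for the BI database, and SQLite's `INSERT OR REPLACE` plays the role of MERGE.

```python
import sqlite3

# Hypothetical "live" source: a stand-in for querying the NoSQL store directly.
live_store = {"2011-03-01": 120, "2011-03-02": 95, "2011-03-03": 40}

conn = sqlite3.connect(":memory:")  # stand-in for the BI database (assumption)
conn.execute("CREATE TABLE daily_cache (day TEXT PRIMARY KEY, hits INTEGER)")

def refresh_cache(conn, source, days):
    """Nightly refresh: SQLite's INSERT OR REPLACE plays the role of MERGE."""
    conn.executemany(
        "INSERT OR REPLACE INTO daily_cache (day, hits) VALUES (?, ?)",
        [(d, source[d]) for d in days],
    )
    conn.commit()

def hits_for(conn, source, day):
    """Serve from the cache when possible; fall back to a live query."""
    row = conn.execute(
        "SELECT hits FROM daily_cache WHERE day = ?", (day,)
    ).fetchone()
    if row is not None:
        return row[0], "cached"
    return source[day], "live"

refresh_cache(conn, live_store, ["2011-03-01", "2011-03-02"])
live_store["2011-03-02"] = 99  # a late-arriving correction
refresh_cache(conn, live_store, ["2011-03-02"])  # re-run is safe: row is replaced

cached = hits_for(conn, live_store, "2011-03-02")
live = hits_for(conn, live_store, "2011-03-03")
```

Days already refreshed come back from the cache table, while a day not yet loaded falls through to the live source. Re-running the refresh replaces rows instead of duplicating them, which is what makes the ETL step "simple".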
Mozilla: NoSQL thru and thru (DB)
•   Socorro Project: Crash reports, optionally sent to Mozilla
•   https://crash-stats.mozilla.com
X: NoSQL via SQL
•   Using “Splunk” (i.e., a commercial NoSQL-eee data aggregator, etc.)
•   Desire to use Tableau for advanced analytics/visualization
Meteor Solutions: NoSQL thru and thru
•   Using Cloudant BigCouch solution (SaaS)
•   High performance set of multi-purpose indices on predefined
    aggregations
•   Up to date aggregation/reports
•   Better fit for Social Media graph structures over relational DB
•   Custom built BI applications (dashboards/reports) providing a
    flexible guided view through data


[Diagram: Advanced Apps]
A,B,C: NoSQL + MySQL
•   Many, many companies (3 we've worked with)
•   All “web related” companies (semi structured, some, mostly
    volume)
•   Heavy lifting and storage, and “ETL/Data preparation” inside
    Hadoop
•   Push summarized, aggregated data into MySQL for analysis by
    easy dashboarding/BI Tools




[Diagram: ETL App, MySQL]
