Cubes
                   light-weight OLAP




Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
source

  github.com/Stiivi/cubes

         documentation

packages.python.org/cubes/
Overview

■   purpose
■   analytical modelling and OLAP
■   slicing and dicing
■   OLAP server
■   SQL backend
analytical data modelling
        lightweight
http://tendre.sme.sk
aggregation browsing
     slicing and dicing
modelling   reporting
            aggregation browsing
Architecture
 model     browser     backends     server (http)
Logical Model
 multidimensional, analytical
business/analyst’s
  point of view
transactions                       analysis
OLTP                               OLAP
application (operational) data     analytical data
Model
           {
               "name": "My Model",
               "description": ....,

               "cubes": [...],
               "dimensions": [...]
           }




cubes                         dimensions
measures                        levels, attributes, hierarchy
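
A minimal sketch of reading such a model, assuming the model file is plain JSON with the top-level keys shown above. It uses only the standard `json` module, not the Cubes loader; the sample values and the `undefined_dimensions` helper are illustrative and not part of the library.

```python
import json

# Illustrative model text; the keys mirror the slide, the values are made up.
MODEL_JSON = """
{
    "name": "My Model",
    "description": "Example model",
    "cubes": [{"name": "contracts", "dimensions": ["date", "category"]}],
    "dimensions": [{"name": "date"}, {"name": "category"}]
}
"""

def undefined_dimensions(model):
    """Return dimension names that cubes reference but the model does not define."""
    defined = {dim["name"] for dim in model["dimensions"]}
    missing = set()
    for cube in model["cubes"]:
        missing.update(name for name in cube["dimensions"] if name not in defined)
    return sorted(missing)

model = json.loads(MODEL_JSON)
print(model["name"], undefined_dimensions(model))
```

A check like this catches a cube that points at a dimension nobody declared before any data is touched.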
Facts

                  measurable


      fact

                    fact data cell




most detailed information
dimensions: location, type, time
Dimension

■ provide context for facts
■ used to filter queries or reports
■ control scope of aggregation of
  facts
Hierarchy


     2010 May 1st



        levels
Dimension

■   levels and attributes
■   hierarchy*
■   key attributes
■   label attributes

"dimensions": [
  {
    "name": "date",
    "levels": ...,
    "hierarchy": ...
  },
  ...
]

*partial support for multiple hierarchies
label attribute   key attribute
                  for links to slices
Cube
■ dimensions
■ measures

"cubes": [
  {
    "name": "contracts",
    "dimensions": ["date", "category"],
    "measures": [
      {
        "name": "amount",
        "label": "Contract Amount",
        "aggregations": ["sum"]
      }
    ]
  },
  ...
]


                *partial support for multiple hierarchies
localizable model and attributes

"attributes": [
  {
    "name": "group",
    "label": "Group code"
  },
  {
    "name": "group_label",
    "label": "Group",
    "locales": ["en", "sk"]
  }
]
Aggregation
  Browser

    ∑
∑ measures
get more details
Aggregation Browser

■ SQL Snowflake Browser
■ SQL Denormalized Browser
■ MongoDB Browser
■ Some HTTP Data Service Browser
■ ?

“batteries” that are included
Browser Workspace




logical model
                +   data
Cell
context of interest




cell
cell
Path

              [45,2]




[2012, 6]
                       list of level keys
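
A small sketch of how a path lines up with hierarchy levels, assuming a date hierarchy of year, month, day. `DATE_LEVELS` and `describe_path` are hypothetical names used for illustration, not Cubes functions.

```python
# Hypothetical hierarchy for a "date" dimension: levels from coarsest to finest.
DATE_LEVELS = ["year", "month", "day"]

def describe_path(levels, path):
    """Pair each key in a path with the hierarchy level it belongs to."""
    if len(path) > len(levels):
        raise ValueError("path is deeper than the hierarchy")
    return dict(zip(levels, path))

print(describe_path(DATE_LEVELS, [2012, 6]))
```

A shorter path simply stops at a coarser level: `[2012]` names a whole year, `[2012, 6]` one month of it.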
1   load_model("model.json")
2   create_workspace("sql",
                     model,
                     url="sqlite:///data.sqlite")
3   model.cube("sales")
4   workspace.browser(cube)

Application
    cubes
        Aggregation Browser
            backend
summary




drill-down
browser.aggregate(cell)
                            summary

browser.aggregate(cell,
                  drilldown=["sector"])
                         drill-down
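
To make the summary and drill-down distinction concrete, here is a pure-Python sketch over a handful of made-up facts. It is not the Cubes browser: the `aggregate` function below is a stand-in that takes a single dimension name and assumes one measure, `amount`, aggregated by sum.

```python
from collections import defaultdict

# Made-up facts; "amount" is the measure, "sector" a flat dimension.
FACTS = [
    {"sector": "health", "amount": 100},
    {"sector": "health", "amount": 50},
    {"sector": "transport", "amount": 70},
]

def aggregate(facts, drilldown=None):
    """Return a summary for all facts and, optionally, one row per dimension value."""
    summary = {
        "amount_sum": sum(f["amount"] for f in facts),
        "record_count": len(facts),
    }
    rows = []
    if drilldown:
        groups = defaultdict(list)
        for fact in facts:
            groups[fact[drilldown]].append(fact)
        for key in sorted(groups):
            group = groups[key]
            rows.append({
                drilldown: key,
                "amount_sum": sum(f["amount"] for f in group),
                "record_count": len(group),
            })
    return {"summary": summary, "drilldown": rows}

result = aggregate(FACTS, drilldown="sector")
for row in result["drilldown"]:
    print(row["sector"], row["amount_sum"], row["record_count"])
```

The summary always covers the whole cell; the drill-down rows split that same total by the chosen dimension.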
for row in result.drilldown:
    row["amount_sum"]
    row[label_attribute]
    row[key]
received_amount_sum
measure (received_amount) + aggregation (sum)

record_count
browser.facts(cell)

browser.values(cell, dimension)

browser.cell_details(cell)
Slicing and Dicing
April 2012  +  construction work  =  construction work in April 2012

dimensions: type, supplier, date
cut types

point      set                         range
[2010]     [[2010,10], [2010,12]]      from=[2010,10] to=[2010,12]
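
The three cut types can be sketched as predicates over a fact's path. This is an illustration under assumptions, not the library's implementation: paths are plain lists of level keys, fact paths are at full depth, and range bounds are treated as inclusive.

```python
def point_matches(path, cut_path):
    """A point cut matches any path that starts with the cut's path."""
    return path[:len(cut_path)] == cut_path

def set_matches(path, cut_paths):
    """A set cut matches if any of its paths matches as a point cut."""
    return any(point_matches(path, p) for p in cut_paths)

def range_matches(path, from_path, to_path):
    """A range cut matches paths between from_path and to_path (inclusive here)."""
    return (path[:len(from_path)] >= from_path
            and path[:len(to_path)] <= to_path)

fact_path = [2010, 11, 5]  # hypothetical fact dated 5 November 2010
print(point_matches(fact_path, [2010]))
print(set_matches(fact_path, [[2010, 10], [2010, 12]]))
print(range_matches(fact_path, [2010, 10], [2010, 12]))
```

November 2010 falls inside the year and inside the October to December range, but it is not one of the two months listed in the set.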
Implicit Hierarchy
       drilldown
whole cube

    cell = Cell(cube)
    browser.aggregate(cell)

        Total

    browser.aggregate(cell,
                      drilldown=["date"])

        2006 2007 2008 2009 2010

    cut = PointCut("date", [2010])
    cell = cell.slice(cut)

    browser.aggregate(cell,
                      drilldown=["date"])

        Jan Feb Mar Apr May ...
Drill-down Level
drilldown = ["date"]

                implicit: next level from the cell

drilldown = {"date": "month"}

                explicit
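
The implicit and explicit forms can be sketched as a single level-resolution step. `drilldown_level` and `DATE_LEVELS` are hypothetical names; the rule that the implicit level is the one right after the depth of the cell's cut is inferred from the slides above.

```python
DATE_LEVELS = ["year", "month", "day"]  # hypothetical hierarchy

def drilldown_level(levels, cut_path=(), explicit=None):
    """Pick the level to drill down to.

    explicit: level name requested by the caller, if any.
    Otherwise the level right after the depth of the cell's cut is used.
    """
    if explicit is not None:
        if explicit not in levels:
            raise ValueError("unknown level: %s" % explicit)
        return explicit
    depth = len(cut_path)
    if depth >= len(levels):
        raise ValueError("cell is already at the deepest level")
    return levels[depth]

print(drilldown_level(DATE_LEVELS))                    # no cut on date
print(drilldown_level(DATE_LEVELS, cut_path=[2010]))   # cell cut to one year
print(drilldown_level(DATE_LEVELS, explicit="month"))  # caller names the level
```

This matches the earlier example: with no cut the drill-down lists years, and after slicing to 2010 it lists months.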
Cross Table
 experimental interface
                                         2009     2010

     Assets           Due from Banks     3044     1803
     Assets              Investments    41012    36012
     Assets        Loans Outstanding   103657   118104
     Assets            Nonnegotiable     1202     1123
     Assets             Other Assets     2247     3071
     Assets        Other Receivables      984      811
     Assets              Receivables      176      171
     Assets               Securities       33      289
     Equity            Capital Stock    11491    11492
     Equity         Deferred Amounts      359      313
     Equity                    Other    -1683    -3043
     Equity        Retained Earnings    29870    28793
Liabilities               Borrowings   110040   128577
Liabilities   Derivative Liabilities   115642   110418
Liabilities                    Other       57        8
Liabilities        Other Liabilities     7321     5454
Liabilities             Sold or Lent     2323      998
rows = ["item.category",
        "item.subcategory"]

columns = ["year"]

measures = ["amount_sum"]

table = result.cross_table(
              rows,
              columns,
              measures
        )
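
Since the cross-table interface is marked experimental, the following is only a plain-Python sketch of the pivot it performs, not the library method. The `cross_table` function here is a stand-in; the four records reuse figures from the table above.

```python
# Flat drill-down records; values are taken from the table on the previous slide.
RECORDS = [
    {"item.category": "Assets", "item.subcategory": "Securities", "year": 2009, "amount_sum": 33},
    {"item.category": "Assets", "item.subcategory": "Securities", "year": 2010, "amount_sum": 289},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2009, "amount_sum": 11491},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2010, "amount_sum": 11492},
]

def cross_table(records, rows, columns, measure):
    """Pivot flat records into (row headers, column headers, 2D list of values)."""
    row_keys, col_keys, cells = [], [], {}
    for rec in records:
        r = tuple(rec[name] for name in rows)
        c = tuple(rec[name] for name in columns)
        if r not in row_keys:
            row_keys.append(r)
        if c not in col_keys:
            col_keys.append(c)
        cells[(r, c)] = rec[measure]
    # Missing combinations become None rather than zero.
    data = [[cells.get((r, c)) for c in col_keys] for r in row_keys]
    return row_keys, col_keys, data

row_keys, col_keys, data = cross_table(
    RECORDS, ["item.category", "item.subcategory"], ["year"], "amount_sum")
for header, values in zip(row_keys, data):
    print(header, values)
```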
Slicer
The HTTP OLAP Server



[Diagram: Application exchanges JSON over HTTP with Slicer, which sits on top of the Aggregation Browser]
GET /model

GET /aggregate

GET /values

GET /report
 logical model       configuration   data




$ slicer serve slicer.ini
[server]
backend: sql
log_level: info

[model]
path: model.json
locales: en,sk

[workspace]
url: postgres://localhost/database
schema: datamart
fact_prefix: ft_
dimension_prefix: dm_



∑      amount




GET /aggregate
GET /aggregate




{
    "cell": [],
    "drilldown": [],
    "summary": {
        "record_count": 62,
        "amount_sum": 1116860
    }
}
∑         amount
✂




GET /aggregate?cut=date:2010
GET /aggregate?cut=year:2010




{
    "cell": [
        {
            "path": ["2010"],
            "type": "point",
            "dimension": "year",
            "level_depth": 1
        }
    ],
    "drilldown": [],
    "summary": {
        "record_count": 31,
        "amount_sum": 566020
    }
}
GET /aggregate?drilldown=year



{
     "cell": [],
     "total_cell_count": 2,
     "drilldown": [
         {
             "record_count": 31,
             "amount_sum": 550840,
             "year": 2009
         },
         {
             "record_count": 31,
             "amount_sum": 566020,
             "year": 2010
         }
     ],
     "summary": {
         "record_count": 62,
         "amount_sum": 1116860
     }
}
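
A client-side sketch of the same requests, using only the standard library. The base URL and port are assumptions, and nothing is actually fetched: the drill-down reply shown above is pasted in as text so that the relationship between its rows and its summary can be checked.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # assumed address of a running Slicer server

def aggregate_url(cut=None, drilldown=None):
    """Build an /aggregate URL with the cut and drilldown parameters from the slides."""
    params = {}
    if cut:
        params["cut"] = cut
    if drilldown:
        params["drilldown"] = drilldown
    query = urlencode(params, safe=":")  # keep "date:2010" readable
    return BASE_URL + "/aggregate" + ("?" + query if query else "")

# The drill-down reply from the slide, as text instead of a live response.
RESPONSE = """
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {"record_count": 31, "amount_sum": 550840, "year": 2009},
        {"record_count": 31, "amount_sum": 566020, "year": 2010}
    ],
    "summary": {"record_count": 62, "amount_sum": 1116860}
}
"""

reply = json.loads(RESPONSE)
total = sum(row["amount_sum"] for row in reply["drilldown"])
print(aggregate_url(drilldown="year"))
print(total == reply["summary"]["amount_sum"])
```

The drill-down rows add up to the summary because the drill-down only partitions the cell; it does not filter it.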
GET /report

Content-Type: application/json

"cell": list of cuts
"queries": list of named queries

{
    "cell": [
        {
            "dimension": "date",
            "type": "range",
            "from": [2009],
            "to": [2011,6]
        }
    ],
    "queries": {
        "by_segment": {
            "query": "aggregate",
            "drilldown": ["segment"]
        },
        "by_year": {
            "query": "aggregate",
            "drilldown": {"date":"year"}
        }
    }
}
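
A report request body like the one above can be assembled from ordinary Python structures. This is a sketch of the payload only: `report_body` is a hypothetical helper, and sending the request is left out.

```python
import json

def report_body(cell, queries):
    """Serialize a report request: a list of cuts plus a mapping of named queries."""
    return json.dumps({"cell": cell, "queries": queries}, indent=4)

cell = [{"dimension": "date", "type": "range", "from": [2009], "to": [2011, 6]}]
queries = {
    "by_segment": {"query": "aggregate", "drilldown": ["segment"]},
    "by_year": {"query": "aggregate", "drilldown": {"date": "year"}},
}

body = report_body(cell, queries)
print(body)
```

All named queries share the same cell, so one request can feed several charts on a page.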
SQL Backend
 What data does it work with?
★   or   ❄

★   dimensions, fact table
❄   dimensions, fact table
Aggregation Browser


                     Browsing Context


               Snowflake Mapper     or     Denormalized Mapper



denormalized view




snowflake
           ❄
logical




              physical
          ❄
SQL Features
■ does not require DB write access
■ denormalisation
 ■   denormalised browsing, indexing


■ simple date datatype dimension
 ■   extraction of date parts during mapping


■ multiple schema support
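
Denormalisation and date-part extraction can be sketched with SQLite from the standard library. The table and column names are made up (they borrow the `ft_` and `dm_` prefixes from the configuration slide), and the view below is only an illustration of the idea, not the SQL that Cubes generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dm_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ft_contracts (category_id INTEGER, signed DATE, amount INTEGER);

    INSERT INTO dm_category VALUES (1, 'construction'), (2, 'services');
    INSERT INTO ft_contracts VALUES
        (1, '2009-03-10', 100),
        (1, '2010-04-02', 250),
        (2, '2010-04-20', 40);

    -- Denormalized view: one wide row per fact, date parts extracted from the date column.
    CREATE VIEW contracts_denorm AS
    SELECT c.name AS category,
           CAST(strftime('%Y', f.signed) AS INTEGER) AS year,
           CAST(strftime('%m', f.signed) AS INTEGER) AS month,
           f.amount AS amount
    FROM ft_contracts AS f
    JOIN dm_category AS c ON c.id = f.category_id;
""")

# Aggregate by the extracted year, as a drill-down on a date dimension would.
rows = conn.execute("""
    SELECT year, SUM(amount) AS amount_sum, COUNT(*) AS record_count
    FROM contracts_denorm
    GROUP BY year
    ORDER BY year
""").fetchall()
print(rows)
```

Browsing the view needs only read access; creating or materializing it is the one step that writes to the database.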
Slicer
command-line tool
■ model validation
  slicer model validate model.json



■ model translation
  slicer model translate model.json translation.json



■ workspace testing
  slicer test config.ini



■ denormalization
  slicer denormalize --materialize --index config.ini
Future
■ formatters for visualisation libraries
■ JavaScript library*             help needed

■ backends
■ derived measures


                        *http://github.com/Stiivi/cubes-js
Open Data

■ shared repository of models
■ shared repository of dimensions
■ public cubes
   open Slicer HTTP APIs




                           http://github.com/Stiivi/cubes/wiki
stay light
 Nutrition Facts
 Serving Size 1 cube

 Amount Per Serving
                       % Daily Value
 Total Fat 0g                    0%

   Saturated Fat 0g
   Trans Fat 0g
Thank You
              source:
    github.com/Stiivi/cubes
           documentation:
  packages.python.org/cubes/
             examples:
github.com/Stiivi/cubes-examples
Backup
Transactions                     Reporting

object–relational modelling      multidimensional modelling
ORM mapping                      logical model (and mapping)
database connection              browser
database engine                  workspace
Limitations

■ one cut per dimension in a cell
 ■   logical conjunction of cuts (cut1 AND cut2 AND cut3 ...)


■ dimension-only selection
■ one hierarchy per dimension (the default one)
 ■   some internals are ready for multiple hierarchies
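
The first limitation can be pictured with a cell modelled as a mapping from dimension name to a single point cut. This is a sketch with hypothetical helpers (`slice_cell`, `in_cell`), not the library's `Cell` class.

```python
def slice_cell(cell, dimension, path):
    """Return a new cell with a point cut on `dimension`; an existing cut is replaced."""
    new_cell = dict(cell)
    new_cell[dimension] = path
    return new_cell

def in_cell(fact_paths, cell):
    """A fact is in the cell only if every cut matches (cut1 AND cut2 AND ...)."""
    return all(
        fact_paths[dimension][:len(path)] == path
        for dimension, path in cell.items()
    )

cell = slice_cell({}, "date", [2010])
cell = slice_cell(cell, "type", ["construction"])
cell = slice_cell(cell, "date", [2012, 4])  # replaces the earlier date cut

fact = {"date": [2012, 4, 17], "type": ["construction"], "supplier": ["acme"]}
print(cell, in_cell(fact, cell))
```

Because a dimension holds at most one cut, a selection such as "2010 OR 2012" has to be expressed as a set cut on that dimension rather than as two separate cuts.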

Cubes - Lightweight Python OLAP (EuroPython 2012 talk)

  • 1. Cubes light-weight OLAP Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
  • 2. source github.com/Stiivi/cubes documentation packages.python.org/cubes/
  • 3. Overview ■ purpose ■ analytical modelling and OLAP ■ slicing and dicing ■ OLAP server ■ SQL backend
  • 6. aggregation browsing slicing and dicing
  • 7. modelling reporting aggregation browsing
  • 9. ✂ model browser http backends server
  • 12. transactions analysis OLTP OLAP application (operational) data analytical data
  • 13. Model { “name” = “My Model” “description” = .... “cubes” = [...] “dimensions” = [...] } cubes dimensions measures levels, attributes, hierarchy
  • 14. Facts measurable fact fact data cell most detailed information
  • 15. location type time dimensions
  • 16. Dimension ■ provide context for facts ■ used to filter queries or reports ■ control scope of aggregation of facts
  • 17. Hierarchy 2010 May 1st levels
  • 18. Dimension ■ levels and attributes “dimensions” = [ { ■ hierarchy* “name”:”date”, “levels”: ... ■ key attributes }, “hierarchy”: ... ... ■ label attributes ] *partial support for multiple hierarchies
  • 19. label attribute key attribute for links to slices
  • 20. Cube “cubes” = [ { “name”:”contracts”, “dimensions”: [ “date”, “category” ] “measures”: [ ■ dimensions { “name”: “amount”, “label”: “Contract Amount”, ■ measures } “aggregations”: [“sum”] ] }, ... ] *partial support for multiple hierarchies
  • 21. "attributes": [ { "name":"group", "label": "Group code" localizable }, { "name":"group_label", model and attributes "label": "Group", "locales": ["en", "sk"] } ]
  • 25. Aggregation Browser SQL Snowflake SQL Denormalized Some HTTP Data MongoDB Browser Browser Browser Service Browser ? “batteries” that are included
  • 27. Cell
  • 29. cell
  • 30. Path [45,2] [2012, 6] list of level keys
  • 31. 1 load_model("model.json") Application ∑ 3 model.cube("sales") 4 workspace.browser(cube) cubes Aggregation Browser backend 2 create_workspace("sql", model, url="sqlite:///data.sqlite")
  • 34. browser.aggregate(o cell, . drilldown=[9 "sector"]) drill-down
  • 35. for row in result.drilldown: row["amount_sum"] row[q label_attribute] row[k key]
  • 36. received_amount_sum measure aggregation record_count
  • 37. browser.facts(o cell) browser.values(o cell, 9 dimension) browser.cell_details(o cell)
  • 38. Slicing and Dicing ✂
  • 39. ✂ ✂ April 2012 construction work construction work in April 2012 type supplier date
  • 40. cut types ✂ point: [2010] set: [[2010,10], [2010,12]] range: from=[2010,10] to=[2010,12]
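The three cut types can be illustrated on paths, which are lists of level keys such as `[2010, 10]`. This is a toy sketch in plain Python, not the Cubes implementation: the helper names and the sample facts are made up, and it assumes that a range includes both of its ends.

```python
def point_cut(path):
    """Match every fact whose path starts with `path`; [2010] matches all of 2010."""
    return lambda p: p[:len(path)] == path


def set_cut(paths):
    """Match every fact that falls under any of the listed paths."""
    return lambda p: any(p[:len(x)] == x for x in paths)


def range_cut(from_path, to_path):
    """Match facts between two paths (assumption: both ends are included)."""
    def match(p):
        return from_path <= p[:len(from_path)] and p[:len(to_path)] <= to_path
    return match


# Made-up facts: a date path [year, month] and a measure.
facts = [
    {"date": [2009, 12], "amount": 10},
    {"date": [2010, 5], "amount": 5},
    {"date": [2010, 10], "amount": 20},
    {"date": [2010, 11], "amount": 30},
    {"date": [2010, 12], "amount": 40},
    {"date": [2011, 1], "amount": 50},
]


def total(facts, cut):
    """Sum the amount of all facts selected by the cut."""
    return sum(f["amount"] for f in facts if cut(f["date"]))


print(total(facts, point_cut([2010])))                    # 95
print(total(facts, set_cut([[2010, 10], [2010, 12]])))    # 60
print(total(facts, range_cut([2010, 10], [2010, 12])))    # 90
```

The point cut on `[2010]` selects the whole year, while the set and range cuts from the slide select only parts of the last quarter.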
  • 41. Implicit Hierarchy drilldown
  • 42. whole cube cell = Cell(cube) browser.aggregate(cell) Total browser.aggregate(cell, drilldown=["date"]) 2006 2007 2008 2009 2010 ✂ cut = PointCut("date", [2010]) cell = cell.slice(cut) browser.aggregate(cell, drilldown=["date"]) Jan Feb Mar Apr March April May ...
  • 43. Drill-down Level drilldown = ["date"] implicit: next from cell drilldown = {"date": "month"} explicit
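The difference between implicit and explicit drill-down can be sketched as a small level lookup. The function below is hypothetical and only mirrors the behaviour described in the slides and speaker notes: an implicit drill-down takes the level after the one the cell is cut at, and fails when the cell is already at the last level. The `year`, `month`, `day` hierarchy is an assumption.

```python
LEVELS = ["year", "month", "day"]  # assumed hierarchy of the "date" dimension


def drilldown_level(levels, cut_path, explicit=None):
    """Pick the level to drill down to.

    cut_path is the path of the cell's cut on this dimension ([] for no cut).
    explicit is a level name, as in drilldown={"date": "month"}.
    """
    if explicit is not None:
        if explicit not in levels:
            raise ValueError("unknown level: %s" % explicit)
        return explicit
    depth = len(cut_path)
    if depth >= len(levels):
        raise ValueError("cell is already at the last level")
    return levels[depth]


print(drilldown_level(LEVELS, []))                        # year
print(drilldown_level(LEVELS, [2010]))                    # month
print(drilldown_level(LEVELS, [2010], explicit="day"))    # day
```

With no cut the implicit level is the first one, which matches the "Total, then 2006 ... 2010" picture on the previous slide.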
  • 45. Category     Subcategory              2009     2010
         Assets       Due from Banks           3044     1803
         Assets       Investments             41012    36012
         Assets       Loans Outstanding      103657   118104
         Assets       Nonnegotiable            1202     1123
         Assets       Other Assets             2247     3071
         Assets       Other Receivables         984      811
         Assets       Receivables               176      171
         Assets       Securities                 33      289
         Equity       Capital Stock           11491    11492
         Equity       Deferred Amounts          359      313
         Equity       Other                   -1683    -3043
         Equity       Retained Earnings       29870    28793
         Liabilities  Borrowings             110040   128577
         Liabilities  Derivative Liabilities 115642   110418
         Liabilities  Other                      57        8
         Liabilities  Other Liabilities        7321     5454
         Liabilities  Sold or Lent             2323      998
  • 46. rows = ["item.category", "item.subcategory"] columns = ["year"] measures = ["amount_sum"] table = result.cross_table( rows, columns, measures )
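What a cross table does with drill-down records can be shown without the library. The function below is a stand-alone sketch, not the `result.cross_table` method from the slide; the records reuse a few numbers from the table on slide 45, and the attribute names follow the `rows`, `columns` and `measures` lists above.

```python
def cross_table(records, rows, column, measure):
    """Pivot flat records: one table row per combination of `rows` values,
    one cell per distinct `column` value, filled with `measure`."""
    columns = sorted({r[column] for r in records})
    table = {}
    for r in records:
        key = tuple(r[name] for name in rows)
        # Start each row with an empty cell (None) for every column.
        table.setdefault(key, dict.fromkeys(columns))[r[column]] = r[measure]
    return columns, table


records = [
    {"item.category": "Assets", "item.subcategory": "Securities", "year": 2009, "amount_sum": 33},
    {"item.category": "Assets", "item.subcategory": "Securities", "year": 2010, "amount_sum": 289},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2009, "amount_sum": 11491},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2010, "amount_sum": 11492},
]

columns, table = cross_table(
    records, ["item.category", "item.subcategory"], "year", "amount_sum"
)
for key, cells in table.items():
    print(key, [cells[c] for c in columns])
```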
  • 47. Slicer The HTTP OLAP Server ✂
  • 48. Application HTTP JSON Slicer ∑ Aggregation Browser
  • 49. GET /model GET /aggregate GET /values GET /report
  • 50. logical model configuration data $ slicer serve slicer.ini
  • 51. [server] backend: sql log_level: info [model] path: model.json locales: en,sk [workspace] url: postgres://localhost/database schema: datamart fact_prefix: ft_ dimension_prefix: dm_
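The configuration on this slide is a regular INI file, so it can be inspected with Python's standard `configparser`. The sketch below only reads the values shown on the slide. How Slicer itself interprets them is not covered here, and building a table name from `fact_prefix` and a cube name is an assumption for illustration.

```python
import configparser

# The slicer.ini content from the slide, laid out one option per line.
SLICER_INI = """
[server]
backend: sql
log_level: info

[model]
path: model.json
locales: en,sk

[workspace]
url: postgres://localhost/database
schema: datamart
fact_prefix: ft_
dimension_prefix: dm_
"""

config = configparser.ConfigParser()
config.read_string(SLICER_INI)

locales = config["model"]["locales"].split(",")

# Assumption: the fact table name is the prefix followed by the cube name.
fact_table = config["workspace"]["fact_prefix"] + "contracts"

print(config["server"]["backend"], locales, fact_table)
```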
  • 52. amount GET /aggregate
  • 53. GET aggregate { "cell": [], "drilldown": [], "summary": { "record_count": 62, "amount_sum": 1116860 } }
  • 54. amount ✂ GET /aggregate?cut=date:2010
  • 55. GET aggregate?cut=year:2010 { "cell": [ { "path": ["2010"], "type": "point", "dimension": "year", "level_depth": 1 } ], "drilldown": [], "summary": { "record_count": 31, "amount_sum": 566020 } }
  • 56. GET aggregate?drilldown=year { "cell": [], "total_cell_count": 2, "drilldown": [ { "record_count": 31, "amount_sum": 550840, "year": 2009 }, { "record_count": 31, "amount_sum": 566020, "year": 2010 } ], "summary": { "record_count": 62, "amount_sum": 1116860 } }
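The request and response on the last few slides can be handled with the standard library alone. In this sketch the base URL is a placeholder (the slides do not say where the server listens), `aggregate_url` and `totals_match` are hypothetical helpers, and the response text is the one shown on the slide above.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:5000"  # placeholder, not taken from the slides


def aggregate_url(base, cut=None, drilldown=None):
    """Build a GET /aggregate URL with optional cut and drilldown parameters."""
    params = {}
    if cut:
        params["cut"] = cut
    if drilldown:
        params["drilldown"] = drilldown
    query = urlencode(params, safe=":")  # keep "year:2010" readable
    return base + "/aggregate" + ("?" + query if query else "")


# Response body of GET aggregate?drilldown=year, copied from the slide.
RESPONSE = """
{
  "cell": [],
  "total_cell_count": 2,
  "drilldown": [
    {"record_count": 31, "amount_sum": 550840, "year": 2009},
    {"record_count": 31, "amount_sum": 566020, "year": 2010}
  ],
  "summary": {"record_count": 62, "amount_sum": 1116860}
}
"""


def totals_match(response, measure="amount_sum"):
    """Check that the drill-down rows add up to the summary value."""
    rows_total = sum(row[measure] for row in response["drilldown"])
    return rows_total == response["summary"][measure]


response = json.loads(RESPONSE)
print(aggregate_url(BASE, cut="year:2010"))
print(aggregate_url(BASE, drilldown="year"))
print(totals_match(response))
```

The summary always describes the whole cell, so the drill-down rows of a complete level should add up to it, as they do in the slide's example.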
  • 57. GET report Content-Type: application/json { "cell": [ { "dimension": "date", "type": "range", "from": [2009], "to": [2011,6] } ], "queries": { "by_segment": { "query": "aggregate", "drilldown": ["segment"] }, "by_year": { "query": "aggregate", "drilldown": {"date":"year"} } } } ("cell": list of cuts; "queries": list of named queries)
  • 58. SQL Backend What data does it work with?
  • 59. or ❄
  • 60. ★ dimensions fact table
  • 61. fact table dimensions
  • 63. Aggregation Browser, Browsing Context, Snowflake Mapper or Denormalized Mapper; snowflake ❄ or denormalized view
  • 64. logical physical ❄
  • 65. SQL Features ■ does not require DB write access ■ denormalisation ■ denormalised browsing, indexing ■ simple date datatype dimension ■ extraction of date parts during mapping ■ multiple schema support
  • 67. ■ model validation: slicer model validate model.json ■ model translation: slicer model translate model.json translation.json ■ workspace testing: slicer test config.ini ■ denormalization: slicer denormalize --materialize --index config.ini
  • 69. help needed ■ formatters for visualisation libraries ■ JavaScript library* ■ backends ■ derived measures *http://github.com/Stiivi/cubes-js
  • 70. Open Data ■ shared repository of models ■ shared repository of dimensions ■ public cubes open Slicer HTTP APIs http://github.com/Stiivi/cubes/wiki
  • 71. stay light Nutrition Facts Serving Size 1 cube Amount Per Serving % Daily Value Total Fat 0g 0% Saturated Fat 0g Trans Fat 0g
  • 72. Thank You source: github.com/Stiivi/cubes documentation: packages.python.org/cubes/ examples: github.com/Stiivi/cubes-examples
  • 74. Transactions vs. Reporting: object–relational modelling vs. multidimensional modelling; ORM mapping vs. logical model (and mapping); database connection vs. browser; database engine vs. workspace
  • 75. Limitations ■ one cut per dimension in a cell ■ logical conjunction of cuts (cut1 AND cut2 AND cut3 ...) ■ dimension-only selection ■ one hierarchy only (the default one); some internals are ready for multiple

Editor's notes

  1. OLAP and Logical Model, Architecture, Slicing and Dicing, HTTP Server, SQL Backend
  4. Q: Who is familiar with OLAP?
  5. quick setup and reporting; does not cover everything (intentionally)
  6. example application: public procurements of Slovakia
  7. quick setup and reporting; does not cover everything (intentionally)
  8. will talk about modelling first, then reporting, then a mix of both
  9. what does it look like and what does it do?
  10. FIXME: add slicer tool here
  11. not going into details, just aligning terminology and defining context
  12. it is not rare to see reports created directly from whatever data is available, instead of starting with business needs and trying to find a way to derive them from what is available
  13. different approach to data use, different needs
      in apps you focus on transactions (transactional data, OLTP); in reporting you focus on analysis (analytical data)
      logically separate (does not have to be physically separate)
  17. CONTEXT: where did the sale happen? who signed the contract?
      FILTER: how much was spent on construction work?
      AGGREGATION SCOPE: what was the revenue by country?
      also used for ordering or sorting; define master-detail relationships
  20. provides metadata to easily create apps
  27. what does the browser do?
  28. aggregating measures
  30. the aggregation browser has to have a concrete backend implementation
  31. + bunch of other stuff
  32. context
  33. before I talk about the aggregation browser, I have to introduce the cell
  36. our filter/selection defines the cell; this is a kind of multidimensional “breadcrumbs”
  37. path: the term is taken from file system terminology for easier understanding
      those are keys; note that what is displayed is the level label, not the key
  38. ... let’s put it into a picture
  40. the “aggregation result” was designed according to the usual look of a report
  41. FIXME: add picture
  42. you can specify multiple dimensions and an explicit level to drill down to (for example the “month” level of a date dimension)
  43. it provides a list of records, which are represented as dictionaries
      you have to find out which one is the level attribute or the key
  44. no need to find the context of the dimension of interest; if that is not sufficient, one can still fall back to the manual method
  46. facts: get details
      values: can be used to create selection boxes; a level can also be specified
      cell_details: used for creating the multidimensional breadcrumbs mentioned before; it contains data that describe the current context of interest in human-readable form
      ordering and pagination are supported
  47. what was that “cell” thing?
  49. also show hierarchy
  52. same drilldown, different cell
  53. implicit: raises an error if the current level is the last one
      example: you are exploring year 2010 (cell) and would like to see a split by year (a higher level)
  62. just to name a few...
  72. powered by SQLAlchemy
  73. powered by a great abstraction framework; construction of SQL statements
  77. denormalized
  78. thanks to the new browser and browsing context it is possible to switch transparently between the original snowflake and a generated denormalized view (which can be materialized and indexed based on dimension level keys)
  79. in which table and which column is the attribute?
  84. anyone who would like to contribute their skills is more than welcome, and I will help
  85. so if you have an open-source app (like Django) that many users use, you can publish a reporting model for others; put your cube in the Wiki
  87. MIT license