Dancing with the Elephant




1    4/8/2013        Teradata Confidential
UDA IN PRACTICE


• Teradata and Big Data
• Customer Churn Example
  > Examples of Code
  > How the UDA works in Practice
• IPTV Example
  > Data Science Workflow
  > Real-life Example
TERADATA AND BIG DATA
Modern information management: year zero



  In 1970, computer scientist
  and former war-time Royal
  Air Force pilot Ted Codd
  published a seminal
  academic paper that
  would change Information
  Management forever…
Lots of transactions, or lots of data to analyse?


  …Codd had envisaged
  “large, shared data banks”,
  queried any-which-way;
  but the first RDBMS
  implementations had
  focused on providing
  support for on-line
  transaction processing…
Modern information management: year nine


  …so in 1979, four
  academics and software
  engineers quit their day
  jobs, maxed out their
  credit cards – and built the
  world’s first MPP Relational
  Database Computer in a
  garage in California.
Teradata’s “shared nothing” hardware appliance
model has since been widely emulated*…

  Timeline figure (axis 1980 to 2010) showing: 1st Teradata implementation
  goes live at Wells Fargo; Kognitio (WhiteCross); IBM DB2 Parallel Edition;
  Netezza; Aster Data; DATAllegro; Greenplum; Vertica; Oracle Exadata; NeoView.




  * But some are more Massively Parallel Processor than others!
“Teradata was Big Data before there was Big Data”


  Total data volume under management:     ~40 Exabytes
  Largest single implementation:          ~40 Petabytes
  # customers in the Teradata PB club:    25
  Largest hybrid system:                  1,500 SSDs; 12,000 HDDs
Key takeaway: “Big Data” are typically non-relational
or “multi-structured”

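   I didn’t say Bill was ugly.   (stress on “I”)
   I didn’t say Bill was ugly.   (stress on “didn’t”)
   I didn’t say Bill was ugly.   (stress on “say”)
   I didn’t say Bill was ugly.   (stress on “Bill”)
   I didn’t say Bill was ugly.   (stress on “was”)
   I didn’t say Bill was ugly.   (stress on “ugly”)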
The Unified Data Architecture


  Users:      Engineers, Data Scientists, Quants, Business Analysts
  Tools:      Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
  Platforms:  Discovery Platform; Integrated Data Warehouse; Capture, Store, Refine
  Sources:    Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
Aster SQL-H Integration with Hadoop Catalog
A Business User’s Bridge to Analyzing Data in Hadoop

• Industry’s first database integration
  with Hadoop’s HCatalog
• Abstraction layer to easily and
  efficiently read structured and
  multi-structured data stored in HDFS
• Uses Hadoop Catalog (HCatalog) to
  perform data abstraction functions
  (e.g. automatically understands
  tables and data partitions)
• HDFS data presented to users as
  Aster tables
• Fully accessible within the Aster SQL
  and SQL-MapReduce processing
  engines, plus ODBC/JDBC and BI tools

  Diagram labels: SQL-H; Hadoop (MR, Hive, Pig, HCatalog, HDFS)

11   Confidential and proprietary. Copyright © 2012 Teradata Corporation.
UNIFIED DATA ARCHITECTURE
SQL-MapReduce 3-Way Join Example

     Scenario:     A telco has noticed an increase in the number
     of customers cancelling their service. They want to know
     what customer behavior is leading to termination.
     They have data in Hadoop, processed web logs on Aster, and
     store data in a Teradata EDW. They need to combine it to see all
     channels and get answers.


     What will we see?
          • Real working code examples

          • A 3-way join between Aster, Teradata, and Hadoop

          • Execution of nPath and Pathmap SQL-MapReduce
            functions sourced by the 3 way join.

          • Visualization of the results using Tableau.
Golden Path Analysis of Cancellation Paths
Identifying Top Multi-channel Cancellation Paths




  Diagram: HCatalog metadata and data on HDFS; data on Teradata; data on Aster.


Create Table structure in HCatalog

     DROP TABLE IF EXISTS hive_callcenter;
     CREATE TABLE hive_callcenter (
         customer_id INT,
         sessionid   INT,
         channel     STRING,
         action      STRING,
         datestamp   STRING
     )
     ROW FORMAT DELIMITED
     FIELDS TERMINATED BY '\t'   -- tab-delimited text files
     STORED AS TEXTFILE
     LOCATION '/apps/hive/warehouse/hive_callcenter';
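
As a rough illustration of what the table definition above implies for the text files under the HDFS location, the following Python sketch splits tab-delimited lines into the five declared columns. The `CallCenterEvent` type, the `parse_callcenter_line` helper, and the sample rows are hypothetical and are not part of the original deck; the sketch assumes the field delimiter is a tab character.

```python
from typing import NamedTuple


class CallCenterEvent(NamedTuple):
    """One row of hive_callcenter, using the column names from the DDL."""
    customer_id: int
    sessionid: int
    channel: str
    action: str
    datestamp: str  # kept as a string, as in the Hive table


def parse_callcenter_line(line: str) -> CallCenterEvent:
    """Split one tab-delimited text line into the five declared columns."""
    customer_id, sessionid, channel, action, datestamp = line.rstrip("\n").split("\t")
    return CallCenterEvent(int(customer_id), int(sessionid), channel, action, datestamp)


# Hypothetical sample rows, invented for illustration only.
sample_lines = [
    "1001\t1\tcallcenter\tBILL DISPUTE\t2012-03-01 10:15:00\n",
    "1001\t2\tcallcenter\tCANCEL SERVICE\t2012-03-04 09:02:00\n",
]
events = [parse_callcenter_line(line) for line in sample_lines]
for event in events:
    print(event)
```

Keeping `datestamp` as a string mirrors the Hive column type; the cast to a timestamp happens later, on the Aster side.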




Create the view into Hadoop using SQL-MR
function load_from_hcatalog

     DROP VIEW IF EXISTS hcat_telco_callcenter;

     CREATE VIEW hcat_telco_callcenter AS
     SELECT "customer_id",
            "sessionid",
            "channel"   :: character varying           AS "channel",
            "action"    :: character varying           AS "action",
            "datestamp" :: timestamp without time zone AS "datestamp"
     FROM "nc_system"."load_from_hcatalog" (
            ON "public"."mr_driver"
            SERVER ('presales27.asterdata.com')
            PORT ('9083')
            DBNAME ('default')
            TABLENAME ('hive_callcenter')
            USERNAME ('hive')
     );



Create the view into Teradata using SQL-MR
function load_from_teradata

     DROP VIEW IF EXISTS td_telco_store;

     CREATE VIEW td_telco_store AS
     SELECT *
     FROM load_from_teradata (
            ON mr_driver
            TDPID ('dbc')
            USERNAME ('dbc')
            PASSWORD ('dbc')
            QUERY ('SELECT * FROM icw.td_telco_store')
            NUM_INSTANCES ('2')
     );




Create 3-Way View/Join as input to nPath

DROP VIEW IF EXISTS td_telco_multi;

CREATE VIEW td_telco_multi AS
SELECT "u"."customer_id" AS "customer_id",
       "u"."sessionid"   AS "sessionid",
       "u"."channel"     AS "channel",
       "u"."action"      AS "action",
       "u"."datestamp"   AS "datestamp"
FROM (
     ( SELECT "t"."customer_id","t"."sessionid","t"."channel","t"."action","t"."datestamp"
       FROM "public"."td_telco_store" AS "t" )
     UNION ALL
     ( SELECT "t"."customer_id","t"."sessionid","t"."channel","t"."action","t"."datestamp"
       FROM "public"."hcat_telco_callcenter" AS "t" )
     UNION ALL
     ( SELECT "t"."customer_id","t"."sessionid","t"."channel","t"."action","t"."datestamp"
       FROM "public"."telco_online" AS "t" )
) AS "u";
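
Although the slide calls this a 3-way join, the view stacks rows from the three sources with UNION ALL rather than matching them on a key. The following sketch reproduces that shape locally with Python's built-in `sqlite3` module, purely as an illustration. The three tables are local stand-ins for the Teradata view, the HCatalog view, and the Aster table, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = "customer_id INTEGER, sessionid INTEGER, channel TEXT, action TEXT, datestamp TEXT"

# Local stand-ins for the three sources; in the deck these are a Teradata view,
# an HCatalog view and a native Aster table.
for table in ("td_telco_store", "hcat_telco_callcenter", "telco_online"):
    conn.execute(f"CREATE TABLE {table} ({columns})")

# Hypothetical rows, invented for illustration only.
conn.execute(
    "INSERT INTO td_telco_store VALUES "
    "(1001, 1, 'store', 'ASK ABOUT FEES', '2012-03-02 14:00:00')"
)
conn.execute(
    "INSERT INTO hcat_telco_callcenter VALUES "
    "(1001, 2, 'callcenter', 'CANCEL SERVICE', '2012-03-04 09:02:00')"
)
conn.execute(
    "INSERT INTO telco_online VALUES "
    "(1001, 3, 'online', 'VIEW BILL', '2012-03-01 08:30:00')"
)

# Same five columns from each source, stacked into one multi-channel view.
conn.execute("""
    CREATE VIEW td_telco_multi AS
    SELECT customer_id, sessionid, channel, action, datestamp FROM td_telco_store
    UNION ALL
    SELECT customer_id, sessionid, channel, action, datestamp FROM hcat_telco_callcenter
    UNION ALL
    SELECT customer_id, sessionid, channel, action, datestamp FROM telco_online
""")

rows = conn.execute(
    "SELECT channel, action FROM td_telco_multi ORDER BY datestamp"
).fetchall()
print(rows)
```

Ordering the combined rows by `datestamp` gives one customer's events across all channels in time order, which is the input shape a path analysis needs.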




Views of External Tables from Aster




First Pass Aster nPath for Churn Pathway




                       3 way Join




First Pass nPath Visual




Final Pass Aster nPath for Churn Pathway



                  3 way Join




Last Pass nPath Visual




IPTV
Starting point: Complaints Data




Churners – and data quality




What events lead up to a reboot?



 Note the number of paths with a reboot following another reboot!




     CREATE DIMENSION TABLE wrk.npath_reboot_5events AS
     SELECT path, COUNT(*) AS path_count
     FROM nPath (
            ON wrk.w_event_f
            PARTITION BY srv_id
            ORDER BY evt_ts DESC
            MODE (NONOVERLAPPING)
            PATTERN ('X{0,5}.reboot')
            SYMBOLS (true AS X,
                     evt_name = 'REBOOT' AS reboot)
            RESULT (FIRST (srv_id OF X) AS srv_id,
                    ACCUMULATE (evt_name OF ANY (X, reboot)) AS path)
     ) GROUP BY 1;

     SELECT *
     FROM GraphGen (
            ON (SELECT * FROM wrk.npath_reboot_5events
                ORDER BY path_count
                LIMIT 30)
            PARTITION BY 1
            ORDER BY path_count DESC
            item_format ('npath')
            item1_col ('path')
            score_col ('path_count')
            output_format ('sankey')
            justify ('right')
     );
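
To make the pattern in the nPath call above more concrete, here is a simplified Python model of it: events are partitioned per service, up to five events of any kind are matched greedily before a REBOOT, matches do not overlap, and identical paths are counted. This is only one reading of the semantics, not Aster's implementation. The `reboot_paths` function, the sample events, and the event names other than REBOOT are hypothetical, and the sketch sorts events in ascending time order so that each path reads chronologically, whereas the slide orders by `evt_ts` descending.

```python
from collections import Counter, defaultdict

MAX_PRECEDING = 5  # mirrors X{0,5} in the PATTERN clause


def reboot_paths(events):
    """Return a Counter of event paths that end in a REBOOT.

    `events` is an iterable of (srv_id, evt_ts, evt_name) tuples.
    """
    by_service = defaultdict(list)
    for srv_id, evt_ts, evt_name in events:
        by_service[srv_id].append((evt_ts, evt_name))

    path_counts = Counter()
    for rows in by_service.values():
        names = [name for _, name in sorted(rows)]  # ascending time order
        start = 0
        while start < len(names):
            # Greedy match: try the longest run of "any" events first.
            longest = min(MAX_PRECEDING, len(names) - start - 1)
            for k in range(longest, -1, -1):
                if names[start + k] == "REBOOT":
                    path_counts[tuple(names[start:start + k + 1])] += 1
                    start += k + 1  # non-overlapping: resume after the match
                    break
            else:
                start += 1  # no match beginning at this row
    return path_counts


# Hypothetical events, invented for illustration only.
events = [
    ("A", 1, "SYNC_LOSS"), ("A", 2, "PIXELATION"), ("A", 3, "REBOOT"), ("A", 4, "REBOOT"),
    ("B", 1, "SYNC_LOSS"), ("B", 2, "PIXELATION"), ("B", 3, "REBOOT"),
    ("C", 5, "SYNC_LOSS"), ("C", 6, "PIXELATION"), ("C", 7, "REBOOT"),
]
path_counts = reboot_paths(events)
print(path_counts.most_common())
```

Because the symbol X is defined as `true`, it also matches REBOOT rows, which is one way paths containing a reboot followed by another reboot can arise.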



View events data in Tableau




                                          There appears to be an issue with
                                          the data from 30th September
                                          onwards: the reboot data for
                                          October seems to have been
                                          aggregated and added to
                                          30th September.




Address data quality
• Remove paths with all reboots and exclude data from 30th
  September




                                                      It would appear
                                                      that events
                                                      with suffix 1
                                                      and 2 can be
                                                      added together
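
The clean-up steps on this slide could be sketched in Python as follows. Everything here is an assumption made for illustration: the deck gives no year for 30th September, so `BAD_DATA_FROM` uses a hypothetical one; the sketch excludes that day and everything after it; and it assumes the "suffix 1 and 2" variants are event names ending in the digit 1 or 2, such as the invented `SYNC_LOSS1` and `SYNC_LOSS2`.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical cut-off: the deck gives the day and month but not the year.
BAD_DATA_FROM = date(2012, 9, 30)


def keep_event(evt_date: date) -> bool:
    """Exclude events dated on or after the suspect day."""
    return evt_date < BAD_DATA_FROM


def normalise_event(evt_name: str) -> str:
    """Merge variants such as 'SYNC_LOSS1' and 'SYNC_LOSS2' into 'SYNC_LOSS'."""
    return re.sub(r"[12]$", "", evt_name)


def is_all_reboots(path) -> bool:
    """True when every event in the path is a REBOOT."""
    return all(name == "REBOOT" for name in path)


def clean_paths(path_counts: Counter) -> Counter:
    """Drop all-reboot paths and add together counts of merged event variants."""
    cleaned = Counter()
    for path, count in path_counts.items():
        if is_all_reboots(path):
            continue
        cleaned[tuple(normalise_event(name) for name in path)] += count
    return cleaned


# Hypothetical path counts, invented for illustration only.
raw_counts = Counter({
    ("REBOOT", "REBOOT"): 4,
    ("SYNC_LOSS1", "REBOOT"): 3,
    ("SYNC_LOSS2", "REBOOT"): 2,
})
cleaned_counts = clean_paths(raw_counts)
print(cleaned_counts)
```

Applying the date filter to the raw events before rebuilding the paths, rather than to the paths themselves, keeps the aggregated October figures out of every downstream count.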




Visualise as a Graph using Aster GraphGen

                                                                  Size of Node =
                                                                  number of customers
                                                                  Width of Edge =
                                                                  number of errors




     SELECT *
     FROM graphgen (
            ON (SELECT DISTINCT dmt_act_dslam,
                                nra_id,
                                nbr_of_srvid,
                                errorspersrv,
                                nbr_of_dslam
                FROM wrk.srvid_dslam_err)
            PARTITION BY 1
            ORDER BY errorspersrv
            item_format ('cfilter')
            item1_col ('dmt_act_dslam')
            item2_col ('nra_id')
            score_col ('errorspersrv')
            cnt1_col ('nbr_of_srvid')
            cnt2_col ('nbr_of_dslam')
            output_format ('sigma')
            directed ('false')
            width_max (10)
            width_min (1)
            nodesize_max (3)
            nodesize_min (1)
     );
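
One way to picture how node sizes and edge widths could be derived from these columns is a linear rescaling into the configured minimum and maximum values. The sketch below is an assumption for illustration only: the deck does not say how GraphGen maps counts to sizes, the `scale` helper is hypothetical, and the sample rows are invented. It sizes only the DSLAM nodes from `nbr_of_srvid` and sets edge widths from `errorspersrv`.

```python
WIDTH_MIN, WIDTH_MAX = 1, 10        # from width_min / width_max
NODESIZE_MIN, NODESIZE_MAX = 1, 3   # from nodesize_min / nodesize_max


def scale(value, lo, hi, out_min, out_max):
    """Linearly map value from [lo, hi] onto [out_min, out_max]."""
    if hi == lo:
        return float(out_min)
    return out_min + (value - lo) * (out_max - out_min) / (hi - lo)


# Hypothetical rows: (dmt_act_dslam, nra_id, nbr_of_srvid, errorspersrv, nbr_of_dslam)
rows = [
    ("DSLAM_A", "NRA_1", 10, 2.0, 2),
    ("DSLAM_B", "NRA_1", 30, 6.0, 2),
    ("DSLAM_C", "NRA_2", 50, 11.0, 1),
]
errors = [row[3] for row in rows]
customers = [row[2] for row in rows]

# Undirected edges between a DSLAM and its NRA, width driven by error rate.
edges = [
    (dslam, nra, scale(err, min(errors), max(errors), WIDTH_MIN, WIDTH_MAX))
    for dslam, nra, _, err, _ in rows
]
# DSLAM node sizes driven by the number of customers (service ids).
node_sizes = {
    dslam: scale(n, min(customers), max(customers), NODESIZE_MIN, NODESIZE_MAX)
    for dslam, _, n, _, _ in rows
}
print(edges)
print(node_sizes)
```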




Synch Issues by Hub Type




Error and Complaint rates by equipment type




Thank You, Any questions?




33   4/8/2013    Teradata Confidential

Más contenido relacionado

La actualidad más candente

DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...Dataconomy Media
 
When Graphs Meet Machine Learning
When Graphs Meet Machine LearningWhen Graphs Meet Machine Learning
When Graphs Meet Machine LearningJean Ihm
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUsCarol McDonald
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensFive Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensCitus Data
 
Hadoop and Vertica: Data Analytics Platform at Twitter
Hadoop and Vertica: Data Analytics Platform at TwitterHadoop and Vertica: Data Analytics Platform at Twitter
Hadoop and Vertica: Data Analytics Platform at TwitterDataWorks Summit
 
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project ADN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project ADataconomy Media
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...Databricks
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataSpark Summit
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriChetan Khatri
 
OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityCurtis Mosters
 
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas Fittl
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas FittlMonitoring Postgres at Scale | PostgresConf US 2018 | Lukas Fittl
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas FittlCitus Data
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databasesBerta Hermida Plaza
 
Gain Insights with Graph Analytics
Gain Insights with Graph Analytics Gain Insights with Graph Analytics
Gain Insights with Graph Analytics Jean Ihm
 
Klout changing landscape of social media
Klout changing landscape of social mediaKlout changing landscape of social media
Klout changing landscape of social mediaDataWorks Summit
 
OracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraph
OracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraphOracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraph
OracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraphKarin Patenge
 
Hadoop World Vertica
Hadoop World VerticaHadoop World Vertica
Hadoop World VerticaOmer Trajman
 
Machine learning with Spark
Machine learning with SparkMachine learning with Spark
Machine learning with SparkKhalid Salama
 
Building a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu VatsBuilding a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu VatsSpark Summit
 

La actualidad más candente (20)

DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
 
When Graphs Meet Machine Learning
When Graphs Meet Machine LearningWhen Graphs Meet Machine Learning
When Graphs Meet Machine Learning
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensFive Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
 
Hadoop and Vertica: Data Analytics Platform at Twitter
Hadoop and Vertica: Data Analytics Platform at TwitterHadoop and Vertica: Data Analytics Platform at Twitter
Hadoop and Vertica: Data Analytics Platform at Twitter
 
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project ADN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And Data
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
 
OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionality
 
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas Fittl
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas FittlMonitoring Postgres at Scale | PostgresConf US 2018 | Lukas Fittl
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas Fittl
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
 
Gain Insights with Graph Analytics
Gain Insights with Graph Analytics Gain Insights with Graph Analytics
Gain Insights with Graph Analytics
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Graph Analytics
Graph AnalyticsGraph Analytics
Graph Analytics
 
Klout changing landscape of social media
Klout changing landscape of social mediaKlout changing landscape of social media
Klout changing landscape of social media
 
OracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraph
OracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraphOracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraph
OracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraph
 
Hadoop World Vertica
Hadoop World VerticaHadoop World Vertica
Hadoop World Vertica
 
Machine learning with Spark
Machine learning with SparkMachine learning with Spark
Machine learning with Spark
 
Building a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu VatsBuilding a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu Vats
 

Similar a Dancing with the Elephant: Analyzing Customer Churn Using Teradata's Unified Data Architecture

Hybrid architecture integrateduserviewdata-peyman_mohajerian
Hybrid architecture integrateduserviewdata-peyman_mohajerianHybrid architecture integrateduserviewdata-peyman_mohajerian
Hybrid architecture integrateduserviewdata-peyman_mohajerianData Con LA
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionChetan Khatri
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 
Data Warehouse Offload
Data Warehouse OffloadData Warehouse Offload
Data Warehouse OffloadJohn Berns
 
SQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsSQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsDataWorks Summit
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureDataStax Academy
 
Intro to hadoop ecosystem
Intro to hadoop ecosystemIntro to hadoop ecosystem
Intro to hadoop ecosystemGrzegorz Kolpuc
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deckKeithETD_CTO
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLAdam Muise
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Hubert Fan Chiang
 
Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkTim Vincent
 
Microsoft R - ScaleR Overview
Microsoft R - ScaleR OverviewMicrosoft R - ScaleR Overview
Microsoft R - ScaleR OverviewKhalid Salama
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...The Hive
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drilltshiran
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 

Similar a Dancing with the Elephant: Analyzing Customer Churn Using Teradata's Unified Data Architecture (20)

Hybrid architecture integrateduserviewdata-peyman_mohajerian
Hybrid architecture integrateduserviewdata-peyman_mohajerianHybrid architecture integrateduserviewdata-peyman_mohajerian
Hybrid architecture integrateduserviewdata-peyman_mohajerian
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
Data Warehouse Offload
Data Warehouse OffloadData Warehouse Offload
Data Warehouse Offload
 
SQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsSQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analytics
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
 
Intro to hadoop ecosystem
Intro to hadoop ecosystemIntro to hadoop ecosystem
Intro to hadoop ecosystem
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deck
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and Spark
 
Microsoft R - ScaleR Overview
Microsoft R - ScaleR OverviewMicrosoft R - ScaleR Overview
Microsoft R - ScaleR Overview
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Dancing with the Elephant: Analyzing Customer Churn Using Teradata's Unified Data Architecture

  • 1. Dancing with the Elephant (4/8/2013, Teradata Confidential)
  • 2. UDA IN PRACTICE • Teradata and Big Data • Customer Churn Example > Examples of Code > How the UDA works in Practice • IPTV Example > Data Science Workflow > Real-life Example
  • 4. Modern information management: year zero In 1970, computer scientist and former war-time Royal Air Force pilot Ted Codd published a seminal academic paper that would change Information Management forever…
  • 5. Lots of transactions, or lots of data to analyse? …Codd had envisaged “large, shared data banks”, queried any-which-way; but the first RDBMS implementations had focused on providing support for on-line transaction processing…
  • 6. Modern information management: year nine …so in 1979, four academics and software engineers quit their day jobs, maxed-out their credit cards – and built the world’s first MPP Relational Database Computer in a garage in California.
  • 7. Teradata’s “shared nothing” hardware appliance model has since been widely emulated*… [Timeline, 1980 to 2010, showing: 1st Teradata implementation goes live at Wells Fargo; Netezza; DATAllegro; Oracle Exadata; Greenplum; IBM DB2 Parallel Edition; Kognitio (WhiteCross); Aster Data; Vertica; NeoView] * But some are more Massively Parallel Processor than others!
  • 8. “Teradata was Big Data before there was Big Data”
      Total data volume under management: ~40 Exabytes
      Largest single implementation: ~40 Petabytes
      # customers in the Teradata PB club: 25
      Largest hybrid system: 1,500 SSDs; 12,000 HDDs
  • 9. Key takeaway: “Big Data” are typically non-relational or “multi-structured” I didn’t say Bill was ugly. I didn’t say Bill was ugly. I didn’t say Bill was ugly. I didn’t say Bill was ugly. I didn’t say Bill was ugly. I didn’t say Bill was ugly.
  • 10. The Unified Data Architecture
      Users: Engineers, Data Scientists, Quants, Business Analysts
      Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
      Platforms: Discovery Platform; Integrated Data Warehouse; Capture, Store, Refine
      Sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  • 11. Aster SQL-H Integration with Hadoop Catalog: A Business User’s Bridge to Analyzing Data in Hadoop
      • Industry’s First Database Integration with Hadoop’s HCatalog
      • Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
      • Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g. automatically understands tables, data partitions)
      • HDFS data presented to users as Aster tables
      • Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools
      [Diagram labels: SQL-H, Hadoop, MR, Hive, Pig, HCatalog, HDFS]
      Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  • 13. SQL-MapReduce 3-Way Join Example
      Scenario: A Telco company has noticed an increase in the number of their customers cancelling their service. They want to know what customer behavior is leading to termination. They have data in Hadoop, processed web logs on Aster, and store data in a Teradata EDW. They need to combine it to see all channels and get answers.
      What will we see?
      • Real working code examples
      • A 3-way join between Aster, Teradata, and Hadoop
      • Execution of nPath and Pathmap SQL-MapReduce functions sourced by the 3-way join
      • Visualization of the results using Tableau
  • 14. Golden Path Analysis of Cancellation Paths: Identifying Top Multi-channel Cancellation Paths [Diagram labels: Data on TERADATA; HCatalog metadata & Data on HDFS; Data on ASTER]
  • 15. Create Table structure in HCatalog
        drop table if exists hive_callcenter;
        create table hive_callcenter(
          customer_id int,
          sessionid int,
          channel string,
          action string,
          datestamp string
        )
        row format delimited fields terminated by '\t'
        stored as TEXTFILE
        location '/apps/hive/warehouse/hive_callcenter';
  • 16. Create the view into Hadoop using SQL-MR function load_from_hcatalog
        DROP VIEW if exists hcat_telco_callcenter;
        CREATE VIEW hcat_telco_callcenter AS
        select "customer_id",
               "sessionid",
               "channel" :: character varying as "channel",
               "action" :: character varying as "action",
               "datestamp" :: timestamp without time zone as "datestamp"
        from "nc_system"."load_from_hcatalog" (
               on "public"."mr_driver"
               server ('presales27.asterdata.com')
               port ('9083')
               dbname ('default')
               tablename ('hive_callcenter')
               username ('hive')
        );
  • 17. Create the view into Teradata using SQL-MR function load_from_teradata
        DROP VIEW IF EXISTS td_telco_store;
        CREATE VIEW td_telco_store AS
        SELECT *
        FROM load_from_teradata(
               on mr_driver
               tdpid('dbc')
               username('dbc')
               PASSWORD('dbc')
               QUERY('SELECT * from icw.td_telco_store')
               NUM_INSTANCES('2')
        );
  • 18. Create 3-Way View/Join as input to nPath
        Drop View if exists td_telco_multi;
        CREATE VIEW td_telco_multi AS
        select "u"."customer_id" as "customer_id",
               "u"."sessionid" as "sessionid",
               "u"."channel" as "channel",
               "u"."action" as "action",
               "u"."datestamp" as "datestamp"
        from (
          ( select "t"."customer_id","t"."sessionid","t"."channel","t"."action","t"."datestamp"
            from "public"."td_telco_store" as "t" )
          union all
          ( select "t"."customer_id","t"."sessionid","t"."channel","t"."action","t"."datestamp"
            from "public"."hcat_telco_callcenter" as "t" )
          union all
          ( select "t"."customer_id","t"."sessionid","t"."channel","t"."action","t"."datestamp"
            from "public"."telco_online" as "t" )
        ) as "u";
  • 19. Views of External Tables from Aster
  • 20. First Pass Aster nPath for Churn Pathway 3 way Join
  • 21. First Pass nPath Visual
  • 22. Final Pass Aster nPath for Churn Pathway 3 way Join
  • 23. Last Pass nPath Visual
  • 24. IPTV
  • 25. Starting point: Complaints Data
  • 26. Churners – and data quality
  • 27. What events lead up to a reboot? Note number of paths with a reboot, following another reboot!
        CREATE dimension table wrk.npath_reboot_5events AS
        SELECT path, COUNT(*) AS path_count
        FROM nPath (ON wrk.w_event_f
          PARTITION BY srv_id
          ORDER BY evt_ts desc
          MODE (NONOVERLAPPING)
          PATTERN ('X{0,5}.reboot')
          SYMBOLS (true as X, evt_name = 'REBOOT' AS reboot)
          RESULT (FIRST(srv_id OF X) AS srv_id,
                  ACCUMULATE (evt_name OF ANY (X,reboot)) AS path)
        )
        GROUP BY 1;

        SELECT *
        FROM GraphGen (ON
          (SELECT * from wrk.npath_reboot_5events
           ORDER BY path_count
           LIMIT 30)
          PARTITION BY 1
          ORDER BY path_count desc
          item_format('npath')
          item1_col('path')
          score_col('path_count')
          output_format('sankey')
          justify('right'));
  • 28. View events data in Tableau: There looks to be an issue with the data on the 30th September and beyond. The Reboot data for October seems to have been aggregated and added to the 30th September.
  • 29. Address data quality • Remove paths with all reboots and exclude data from 30th September. It would appear that events with suffix 1 and 2 can be added together.
  • 30. Visualise as a Graph using Aster GraphGen
        Size of Node = number of customers
        Width of Edge = number of errors
        SELECT *
        FROM graphgen (ON
          (SELECT DISTINCT dmt_act_dslam, nra_id, nbr_of_srvid, errorspersrv, nbr_of_dslam
           FROM wrk.srvid_dslam_err)
          PARTITION BY 1
          ORDER BY errorspersrv
          item_format('cfilter')
          item1_col('dmt_act_dslam')
          item2_col('nra_id')
          score_col('errorspersrv')
          cnt1_col('nbr_of_srvid')
          cnt2_col('nbr_of_dslam')
          output_format('sigma')
          directed('false')
          width_max(10)
          width_min(1)
          nodesize_max (3)
          nodesize_min (1));
  • 31. Synch Issues by Hub Type
  • 32. Error and Complaint rates by equipment type
  • 33. Thank You, Any questions?
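
The nPath step on slide 27 groups events by srv_id, orders them by time, and counts the event sequences that end in a reboot. The idea can be sketched in plain Python as below. This is not Aster's nPath implementation, only a rough approximation under stated assumptions: events arrive as (srv_id, evt_ts, evt_name) tuples, they are sorted by ascending timestamp, each event is used in at most one path, and the function name `reboot_paths` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def reboot_paths(events, max_prefix=5, target="REBOOT"):
    """Count event paths that end in `target`.

    events: iterable of (srv_id, evt_ts, evt_name) tuples (assumed layout).
    Each path holds up to `max_prefix` events before the target event,
    plus the target itself. Events already used by an earlier path of the
    same srv_id are not reused (a loose analogue of non-overlapping matching).
    """
    # Group events per service id, like PARTITION BY srv_id.
    by_srv = defaultdict(list)
    for srv_id, evt_ts, evt_name in events:
        by_srv[srv_id].append((evt_ts, evt_name))

    counts = Counter()
    for rows in by_srv.values():
        rows.sort()  # ascending by timestamp (assumption)
        start = 0    # index of the first event not yet consumed
        for i, (_, name) in enumerate(rows):
            if name == target:
                lo = max(start, i - max_prefix)
                path = tuple(n for _, n in rows[lo:i + 1])
                counts[path] += 1
                start = i + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        ("a", 1, "ERR"), ("a", 2, "REBOOT"), ("a", 3, "REBOOT"),
        ("b", 1, "ERR"), ("b", 2, "REBOOT"),
    ]
    for path, n in reboot_paths(sample).most_common():
        print(" -> ".join(path), n)
```

Counting whole paths rather than single events is what makes repeated patterns, such as a reboot directly following another reboot, stand out.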

Editor's notes

  1. Slides from a real PoC using data from an IPTV network looking at Quality of Service and Churn
  2. So eBay measured… Note latency: Hadoop is batch-oriented. Note parallel efficiency: if the unit cost of acquisition is relatively low but I have to buy very many more units, the total cost of acquisition is still higher. And total cost of acquisition is not TCO: you also have to factor in development, integration, sys admin, maintenance, and other costs. Note also that Hadoop is an implementation of the MR programming model, not a DBMS; the impact of, for example, the lack of indexes and the lack of cost-based optimization is likely to be even more significant for more complex queries.
  3. Slides from a real PoC using data from an IPTV network looking at Quality of Service and Churn
  4. This scenario involves a Telco company that is experiencing an increased number of cancellations. They want to know what behaviors are leading up to the cancellation and have been unable to discover those reasons until now. The challenge has been twofold. First, their data is on multiple platforms. Second, analysis has been so time-consuming that they have been unable to estimate and budget the effort. They have data on Hadoop, processed web logs on Aster, and store data housed on their Teradata EDW. All of this data needs to be combined and then analyzed in a timely fashion. This is a common situation today across many industries. You may see a solution here to your own challenges. During this presentation we will see the real code behind the solution. We will see a 3-way join of data across the three platforms. This had never been done before it was done for this demonstration. We will see the analytic results output by nPath, a SQL-MR function that comes with the Teradata Aster platform, and the visualization of those analytic results using Tableau.
  5. This is what the environment looks like. On the left is a Hadoop cluster storing data on HDFS. This is a large volume of call-center data originally stored as VRU files. After processing on Hadoop it is made available through SQL-H, a new product released with Aster Database 5 that allows SQL queries against Hadoop data. On the right is the Teradata EDW, which contains structured store transactions. It is accessed through our Teradata connector, also using SQL. In the middle is online web log data stored and pre-processed on Aster using our SQL-MR functions. All of these sources are pulled together in a single SQL query on Aster and processed through nPath to discover the customers' behavior before cancellation.
  6. Let’s walk through the code required to perform this analysis. First, we create an HCatalog entry for the table. This code shows what is done on the Hadoop machine in order to create a table called hive_callcenter in HCatalog. It is what you might expect for any table definition: drop the table if it exists, then create the structure. You can see that the data is actually stored as a text file in Hadoop, with the location being a directory hierarchy.
  7. Next, we create a view on Aster pointing to hive_callcenter on Hadoop. In this case we create a view. However, since a view is just SQL code, we could put the select statement anywhere in our code. We are creating a permanent view since we will be using this table often. Notice that we called the view hcat_telco_callcenter. We’ll see this again later.
  8. This is how we created the view into Teradata for the store data and called it td_telco_store. Again, just plain old SQL.
  9. Here is where the 3-way join takes place. We create the view td_telco_multi using the views into Hadoop and Teradata along with data stored on Aster. Remember td_telco_store from the last page and hcat_telco_callcenter from the one before that. telco_online is the data stored on Aster. This is an ANSI-standard view created on Aster. This had, quite literally, never been done before it was done for this presentation.
  10. Here is another view of the views from Aqua Data Studio. Look closely and you can see the views td_telco_store and hcat_telco_callcenter. Telco_online is a regular table on Aster and is not seen here.Each of these tables/views has around a million rows. When we run a count on the view td_telco_multi we see a little over 3 million rows returned for the time period. As Aqua Data Studio and Tableau demonstrate, these data sources are available to most any BI, system tool, or application that understands ODBC/JDBC. So now that we have all of the views and in place to bring this data together in real-time, how do we supply it to nPath?
  11. It’s actually very simple. There is the 3-way join supplied into nPath. Notice that this is just another SQL query. There is some very sophisticated MapReduce code running under the covers of nPath, but to the business user, it is exposed as an external table function with replaceable parameters. This is what makes the very powerfull SQL-Mapreduce functions of Teradata Aster available to the business user without programming experience beyond SQL. Its just replaceable parameters on the function. Being this straightforward is also what makes fast analytic iterations possible. The most important parts of this nPath function are seen in the patterns searched for and the actions taken. They are very simple in this case. Look for all events that end the session in a cancellation of service. If it just an event label it as such. If it is a cancellation of service label it as Cancel Service. Getting the parameters right is the most challenging thing about using the SQL-MR functions. However, since no programming or projects are required, a business user can afford to try lots of different parameters and experiment and explore the data.
  12. This is the visualization of the output data we looked at on the previous slide. This represents all customers who cancelled their service and the pathways they took for 4 steps preceding cancellation. Starting at the left we see all of the channels that a customer could have entered with. There are 14 of them. They represent the call center data on Hadoop, the online web logs on Aster, and the store transactional data on the Teradata EDW. This is what the first pass of analysis often looks like in the real world. It’s very busy; it’s the first attempt at exploration. There is little or no filtering of data. As you may recall, the nPath statement we looked at was relatively simple. We can see from the thickness of the colored lines on the right side that there is a lot of activity around the call center and the store, but there is too much noise to determine what common behaviors exist that might be actionable. Following this there are numerous iterations of altering the nPath parameters to get to the final, quiet determination of common behavior.
  13. This is the final nPath function that will show us a real Golden Pathway for customer cancellations. It is very similar to the first pass nPath. It’s only a few lines of SQL and some additional parameters. It uses the same 3-way join of data and will execute the next steps identically to the first nPath. Notice in the PATTERN parameters that there is more specificity, and that the actions are more granular. This is how noise was removed from the data. Again, this is the real code that creates the visualizations. Let’s take a look at what this data looks like.
  14. Here it is. The Golden Path toward cancellation. It’s a lot cleaner and actually shows us what customers were doing before they cancelled in a way that we can do something about. Starting from the left, we see that customers came in through the online channel and reviewed their contract, followed by at least one, and usually two, calls to the call center either disputing their bill or registering a service complaint. The thickness of the lines shows us that there were more disputes than service complaints. These calls were followed by a visit to the store with a dispute or complaint. That is where the cancellations occurred. This is actionable. We can implement this model in our production systems by counting the online visits and calls to the call center. For the entire population of customers, if the number of online reviews is > 0 and the number of calls into the call center is > 1, then we have a customer who has a higher probability of cancelling their service and can be flagged for intervention on their next contact. This entire analysis took place over a few days.
Let’s think about this for a moment. Imagine trying to come to this conclusion using traditional SQL, without the SQL-MR function of nPath, and without the ability to join this data. The first challenge is pulling the data together, with the biggest challenge coming from the data on Hadoop. This currently requires a skilled engineer writing MapReduce code in a lower-level language just to pull the data out. The manipulation of the data once gathered together requires around 350-400 lines of complex, recursive SQL code. Neither the pulling of the Hadoop data nor the SQL development is trivial. Both require skilled programmers and, most likely, several months of work. In most shops, this level of resource allocation and time requires that a project be scoped with detailed requirements, resourced, approved and budgeted.
As challenging, expensive, and time-consuming as that project might be, the real problem is that this analysis requires many iterations. In fact, an unknown number of iterations. Each of those iterations may require a separate project. You know, Phase 1, 2, and 3, etc. This analysis actually took on the order of nine iterations through nPath over several days. So what really happens when confronted by analysis needs like this without nPath? I can tell you that it is usually nothing. You can’t pre-determine the number of iterations, so you can’t scope it. If you can’t do that, you aren’t going to get approval to budget and resource a project that has no end date in sight. The reality is that most organizations never get to an answer like this. However, using nPath, a business analyst, and a few days’ work, without ever having to approve a project, not only can one get to the answer, but one can also formulate an action plan. That is the real value proposition here. Difficult analysis done quickly by business analysts without the need to budget expensive and in-demand resources.
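The production rule described for the Golden Path (online reviews > 0 and call center calls > 1) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in these notes; the function name, the customer IDs and the counts are hypothetical.

```python
def flag_for_intervention(online_reviews, call_center_calls):
    """Apply the rule from the notes: at least one online contract review
    and more than one call into the call center."""
    return online_reviews > 0 and call_center_calls > 1

# Hypothetical per-customer counts: (online_reviews, call_center_calls).
customers = {
    "A": (1, 2),  # reviewed contract online, called twice -> flagged
    "B": (0, 3),  # never reviewed contract online -> not flagged
    "C": (2, 1),  # only one call -> not flagged
}

flagged = sorted(cid for cid, (reviews, calls) in customers.items()
                 if flag_for_intervention(reviews, calls))
print(flagged)
```

In practice the two counts would come from the same online and call center sources used in the analysis, and the flag would be checked on the customer's next contact.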
  15. Slides from a real PoC using data from an IPTV network looking at Quality of Service and Churn
  16. First we looked at analysing the complaints data, which was text files stored in Hadoop, and got nowhere with this. The text analytics showed that the comments fields held standard phrases such as “No fault found” or “customer issue”, or were just blank. A good example of fail fast: if it isn’t going to work, realise this and stop doing it as quickly as possible.
  17. Looked at patterns in data usage prior to a customer closing their account. Here each line represents a customer. It appears that just prior to account closure there was a huge surge in usage. This turned out to be an error in the data (again!).
  18. Decided to look at the number of home router reboots as a measure of quality of service. Here the pattern of 5 events preceding a reboot can be seen, along with the code used to generate the Sankey chart (now in native Aster format, which is viewed in a web browser).
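As a rough illustration of what "the 5 events preceding a reboot" means, the Python sketch below collects the five events immediately before each reboot and counts how often each path occurs. Path counts like these are what a Sankey chart draws as line thickness. This is not the Aster code shown on the slide; the event names, router IDs and function name are hypothetical.

```python
from collections import Counter

def paths_before(events, target="reboot", steps=5):
    """Count each sequence of `steps` events that immediately precedes `target`."""
    paths = Counter()
    for i, event in enumerate(events):
        if event == target and i >= steps:
            paths[tuple(events[i - steps:i])] += 1
    return paths

# Hypothetical time-ordered event streams, one list per home router.
router_events = {
    "router_1": ["sync_error", "sync_error", "high_tx", "sync_error", "sync_error", "reboot"],
    "router_2": ["login", "sync_error", "sync_error", "high_tx", "sync_error",
                 "sync_error", "reboot", "login"],
}

all_paths = Counter()
for stream in router_events.values():
    all_paths.update(paths_before(stream))

most_common_path, count = all_paths.most_common(1)[0]
print(most_common_path, count)
```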
  19. As data issues had been found previously, we went back and used SQL and Tableau to check the data. We found an issue on September 30th, but as the data only needs to be “good enough” to run analysis, we can safely ignore this day and just use the 1st to the 29th for our investigation.
  20. Final pattern with some of the noise cleaned up. The high transmitted blocks event doesn’t help much, because it just shows that if you use the service a lot then you are more likely to reboot. But the other 3 events show a thing called synchronisation speed errors, which can be detected on the network and lead to issues with the IPTV signal at the customer end.
  21. Using Aster’s built-in graph visualisation, we can now see the way the synchro errors affect users across the entire network in a single picture. Note the thick red line in the highlighted area and another one down and to the right of it.
  22. Talking to the network engineers, we found out that there are two different types of hub used. The older ones are on the left and the newer ones on the right. You can see from the colours that the newer ones are reporting far more errors than the older ones.
  23. Final chart. Blue = new hubs; orange = old hubs.
4th chart shows that the customers connected to the new hubs are complaining more.
3rd chart shows that complaints by customers connected to the new hubs take longer to resolve. These two charts show proof of the quality of service issues.
2nd chart shows bandwidth (higher is better), so new hubs are actually getting better bandwidth.
1st chart shows synchro speed (higher is better), so new hubs have worse synchro speed.
It looks like the top two are mirror images: as the bandwidth increases, the synchro speed decreases, causing the QoS issue. This turned out to be a firmware issue and not faulty hubs at all.
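One way to check a "mirror image" impression like this numerically is a correlation coefficient: a value close to -1 means one series falls as the other rises. The Python sketch below computes Pearson's r by hand. The numbers are made up purely for illustration and are not taken from the PoC data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily averages for the new hubs (illustrative values only).
bandwidth = [10, 12, 14, 16, 18]
synchro_speed = [9, 8, 7, 6, 5]

r = pearson_r(bandwidth, synchro_speed)
print(round(r, 3))
```

A strongly negative r would support the mirror-image reading, but it says nothing about the cause. Here the cause (firmware) was only found by talking to the network engineers.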