The document discusses using Teradata's Unified Data Architecture and SQL-MapReduce functions to analyze customer churn for a telecommunications company. It provides examples of creating views that join customer data from Teradata, Hadoop, and Aster sources. Graphing and visualization tools are used to identify patterns in customer reboot events and equipment issues that may lead to cancellations. The document demonstrates how to gain insights into customer behavior across multiple data platforms.
2. UDA IN PRACTICE
• Teradata and Big Data
• Customer Churn Example
> Examples of Code
> How the UDA works in Practice
• IPTV Example
> Data Science Workflow
> Real-life Example
4. Modern information management: year zero
In 1970, computer scientist and former war-time Royal Air Force pilot Ted Codd published a seminal academic paper that would change Information Management forever…
5. Lots of transactions, or lots of data to analyse?
…Codd had envisaged “large, shared data banks”, queried any-which-way; but the first RDBMS implementations had focused on providing support for on-line transaction processing…
6. Modern information management: year nine
…so in 1979, four academics and software engineers quit their day jobs, maxed out their credit cards – and built the world’s first MPP Relational Database Computer in a garage in California.
7. Teradata’s “shared nothing” hardware appliance model has since been widely emulated*…
[Timeline, 1980-2010: 1st Teradata implementation goes live at Wells Fargo; Kognitio (WhiteCross); IBM DB2 Parallel Edition; Netezza; NeoView; DATAllegro; Greenplum; Vertica; Aster Data; Oracle Exadata]
* But some are more Massively Parallel Processor than others!
8. “Teradata was Big Data before there was Big Data”
Total data volume under management: ~40 Exabytes
Largest single implementation: ~40 Petabytes
# customers in the Teradata PB club: 25
Largest hybrid system: 1,500 SSDs; 12,000 HDDs
9. Key takeaway: “Big Data” are typically non-relational or “multi-structured”
*I* didn’t say Bill was ugly.
I *didn’t* say Bill was ugly.
I didn’t *say* Bill was ugly.
I didn’t say *Bill* was ugly.
I didn’t say Bill *was* ugly.
I didn’t say Bill was *ugly*.
10. The Unified Data Architecture
Users: Engineers, Data Scientists, Quants, Business Analysts
Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
Platforms: Discovery Platform; Integrated Data Warehouse; Capture, Store, Refine
Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
26. Churners – and data quality
26 4/8/2013 Teradata Confidential
27. What events lead up to a reboot?
Note the number of paths with a reboot following another reboot!

The nPath call that builds the paths of up to 5 events preceding a reboot:

CREATE DIMENSION TABLE wrk.npath_reboot_5events AS
SELECT path, COUNT(*) AS path_count
FROM nPath
    (ON wrk.w_event_f
     PARTITION BY srv_id
     ORDER BY evt_ts DESC
     MODE (NONOVERLAPPING)
     PATTERN ('X{0,5}.reboot')
     SYMBOLS (true AS X,
              evt_name = 'REBOOT' AS reboot)
     RESULT (FIRST(srv_id OF X) AS srv_id,
             ACCUMULATE (evt_name OF ANY (X, reboot)) AS path)
    )
GROUP BY 1;

The GraphGen call that renders those paths as a Sankey chart:

SELECT *
FROM GraphGen
    (ON (SELECT * FROM wrk.npath_reboot_5events
         ORDER BY path_count
         LIMIT 30)
     PARTITION BY 1
     ORDER BY path_count DESC
     item_format('npath')
     item1_col('path')
     score_col('path_count')
     output_format('sankey')
     justify('right'));
28. View events data in Tableau
Looks like an issue with the data on the 30th September and beyond; the Reboot data for October seems to have been aggregated and added to September the 30th.
29. Address data quality
• Remove paths with all reboots and exclude data from 30th September
It would appear that events with suffix 1 and 2 can be added together.
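A minimal sketch of the kind of clean-up this slide describes, assuming an events table wrk.w_event_f with evt_ts and evt_name columns as in the nPath example; the suffix-merge expression and the date literal (including the year) are illustrative, not from the original deck:

```sql
-- Hypothetical clean-up: exclude the bad 30th September data and
-- merge event variants whose names differ only by a 1/2 suffix.
CREATE TABLE wrk.w_event_clean AS
SELECT srv_id,
       evt_ts,
       -- strip a trailing '1' or '2' so e.g. ERR1/ERR2 count together
       regexp_replace(evt_name, '[12]$', '') AS evt_name
FROM   wrk.w_event_f
WHERE  evt_ts < DATE '2012-09-30';
```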
30. Visualise as a Graph using Aster GraphGen
Size of Node = number of customers
Width of Edge = number of errors
SELECT *
FROM graphgen
(ON
(SELECT DISTINCT dmt_act_dslam,
nra_id,
nbr_of_srvid,
errorspersrv,
nbr_of_dslam
FROM wrk.srvid_dslam_err)
PARTITION BY 1
ORDER BY errorspersrv
item_format('cfilter')
item1_col('dmt_act_dslam')
item2_col('nra_id')
score_col('errorspersrv')
cnt1_col('nbr_of_srvid')
cnt2_col('nbr_of_dslam')
output_format('sigma')
directed('false')
width_max(10)
width_min(1)
nodesize_max (3)
nodesize_min (1));
32. Error and Complaint rates by equipment type
33. Thank You. Any questions?
Editor’s notes
Slides from a real PoC using data from an IPTV network looking at Quality of Service and Churn
So eBay measured… Note latency: Hadoop is batch-oriented. Note parallel efficiency: if the unit cost of acquisition is relatively low, but I have to buy very many more units, total cost of acquisition is still higher. And total cost of acquisition is not TCO; we also have to factor in development, integration, sys admin, maintenance and other costs. Note also that Hadoop is an implementation of the MapReduce programming model, not a DBMS; the impact of, for example, the lack of indexes and of cost-based optimization is likely to be even more significant for more complex queries.
Slides from a real PoC using data from an IPTV network looking at Quality of Service and Churn
This scenario involves a Telco company that is experiencing an increased number of cancellations. They want to know what behaviors lead up to a cancellation, and until now have been unable to discover those reasons. The challenge has been twofold: first, their data is on multiple platforms; second, analysis has been so time-consuming that they have been unable to estimate and budget the effort. They have data on Hadoop, processed web logs on Aster, and store data housed on their Teradata EDW. All of this data needs to be combined and then analyzed in a timely fashion. This is a common situation today across many industries, and you may see a solution here to your own challenges. During this presentation we will see the real code behind the solution: a 3-way join of data across the three platforms, which had never been done before it was done for this demonstration. We will see the analytic results output by nPath, a SQL-MR function that comes with the Teradata Aster platform, and the visualization of those results using Tableau.
This is what the environment looks like. On the left is a Hadoop cluster storing data on HDFS: a large volume of call-center data originally stored as VRU files. After processing on Hadoop it is made available through SQL-H, a new product released with Aster Database 5 that allows SQL queries against Hadoop data. On the right is the Teradata EDW, which contains structured store transactions; it is accessed through our Teradata connector, also using SQL. In the middle is online web log data stored and pre-processed on Aster using our SQL-MR functions. All of these sources are pulled together in a single SQL query on Aster and processed through nPath to discover the customers’ behavior before cancellation.
Let’s walk through the code required to perform this analysis. First, we create an HCatalog entry for the table. This code shows what is done on the Hadoop machine in order to create a table called hive-callcenter in HCatalog. It is what you might expect for any table definition: drop the table if it exists, then create the structure. You can see that the data is actually stored as a text file in Hadoop, with the location being a directory hierarchy.
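The DDL itself is not reproduced in these notes. A hedged sketch of what such a Hive/HCatalog definition might look like; only the table name comes from the deck (written here with an underscore, since Hive identifiers do not allow hyphens), and all columns and the HDFS location are invented for illustration:

```sql
-- Illustrative Hive/HCatalog DDL; column names and location are assumptions.
DROP TABLE IF EXISTS hive_callcenter;
CREATE TABLE hive_callcenter (
  cust_id     STRING,
  evt_ts      STRING,
  channel     STRING,
  call_reason STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/telco/callcenter';
```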
Next, we create a view on Aster pointing to hive-callcenter on Hadoop. In this case we create a view; however, since a view is actually just SQL code, we could put the SELECT statement anywhere in our code. We are creating a permanent view since we will be using this table often. Notice that we called the view hcat_telco_callcenter; we’ll see this again later.
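The view body is not shown in these notes. A sketch of how such a SQL-H view might look, using Aster’s load_from_hcatalog connector function; the server name, username and Hive database here are assumptions:

```sql
-- Hypothetical SQL-H view over the Hive table; argument names follow
-- the Aster load_from_hcatalog connector, values are illustrative.
CREATE VIEW hcat_telco_callcenter AS
SELECT *
FROM load_from_hcatalog (
  ON mr_driver
  server('hadoop-namenode')
  username('beehive')
  dbname('default')
  tablename('hive_callcenter')
);
```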
This is how we created the view into Teradata for the store data and called it td_telco_store. Again, just plain old SQL.
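Again the view definition is not reproduced here; a hedged sketch using Aster’s Teradata connector function, with the connection details and source table name invented for illustration:

```sql
-- Hypothetical view over the Teradata EDW store transactions;
-- tdpid, credentials and the source table are placeholders.
CREATE VIEW td_telco_store AS
SELECT *
FROM load_from_teradata (
  ON mr_driver
  tdpid('tdprod')
  username('analyst')
  password('********')
  query('SELECT * FROM store_db.store_transactions')
);
```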
Here is where the 3-way join takes place. We create the view td_telco_multi using the views into Hadoop and Teradata along with data stored on Aster. Remember td_telco_store from the last page and hcat_telco_callcenter from the one before that; telco_online is the data stored on Aster. This is an ANSI-standard view created on Aster, and it has, quite literally, never been done before it was done for this presentation.
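The text of td_telco_multi is not included in these notes. A minimal sketch, assuming the three sources are combined into one event stream with a shared column layout; whether the real view uses UNION ALL or explicit joins, and the column names, are assumptions:

```sql
-- Hypothetical 3-way combination across Hadoop (via SQL-H),
-- Teradata (via the connector) and native Aster data.
CREATE VIEW td_telco_multi AS
SELECT cust_id, evt_ts, channel, evt_name FROM hcat_telco_callcenter
UNION ALL
SELECT cust_id, evt_ts, channel, evt_name FROM td_telco_store
UNION ALL
SELECT cust_id, evt_ts, channel, evt_name FROM telco_online;
```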
Here is another view of the views, from Aqua Data Studio. Look closely and you can see the views td_telco_store and hcat_telco_callcenter. telco_online is a regular table on Aster and is not seen here. Each of these tables/views has around a million rows; when we run a count on the view td_telco_multi, we see a little over 3 million rows returned for the time period. As Aqua Data Studio and Tableau demonstrate, these data sources are available to almost any BI tool, system tool, or application that understands ODBC/JDBC. So now that we have all of the views in place to bring this data together in real time, how do we supply it to nPath?
It’s actually very simple: there is the 3-way join supplied into nPath. Notice that this is just another SQL query. There is some very sophisticated MapReduce code running under the covers of nPath, but to the business user it is exposed as an external table function with replaceable parameters. This is what makes the very powerful SQL-MapReduce functions of Teradata Aster available to the business user without programming experience beyond SQL: it’s just replaceable parameters on the function. Being this straightforward is also what makes fast analytic iterations possible. The most important parts of this nPath function are the patterns searched for and the actions taken. They are very simple in this case: look for all events that end the session in a cancellation of service; if it is just an event, label it as such; if it is a cancellation of service, label it as Cancel Service. Getting the parameters right is the most challenging thing about using the SQL-MR functions. However, since no programming or projects are required, a business user can afford to try lots of different parameters and to experiment and explore the data.
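The nPath text itself is on the slide rather than in these notes; a sketch of what a first-pass call over the combined view might look like, modeled on the reboot example elsewhere in this deck (the PATTERN, SYMBOLS and column names are all illustrative):

```sql
-- Hypothetical first-pass nPath: any run of events ending in a
-- cancellation, accumulated into one path per customer.
SELECT path, COUNT(*) AS path_count
FROM nPath (
  ON td_telco_multi
  PARTITION BY cust_id
  ORDER BY evt_ts
  MODE (NONOVERLAPPING)
  PATTERN ('EVT*.CANCEL')
  SYMBOLS (evt_name <> 'CANCEL_SERVICE' AS EVT,
           evt_name =  'CANCEL_SERVICE' AS CANCEL)
  RESULT (ACCUMULATE (evt_name OF ANY (EVT, CANCEL)) AS path)
)
GROUP BY 1
ORDER BY path_count DESC;
```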
This is the visualization of the output data we looked at on the previous slide. It represents all customers who cancelled their service and the pathways they took for the 4 steps preceding cancellation. Starting at the left we see all of the channels through which a customer could have entered; there are 14 of them. They represent the call-center data on Hadoop, the online web logs on Aster, and the store transactional data on the Teradata EDW. This is what the first pass of analysis often looks like in the real world: it’s very busy, it’s the first attempt at exploration, and there is little or no filtering of data. As you may recall, the nPath statement we looked at was relatively simple. We can see from the thickness of the colored lines on the right side that there is a lot of activity around the call center and the store, but there is too much noise to determine what common behaviors exist that might be actionable. Following this there are numerous iterations of altering the nPath parameters to get to the final, quiet determination of common behavior.
This is the final nPath function that will show us a real Golden Pathway for customer cancellations. It is very similar to the first pass nPath. It’s only a few lines of SQL and some additional parameters. It uses the same 3-way join of data and will execute the next steps identically to the first nPath. Notice in the PATTERN parameters that there is more specificity, and that the actions are more granular. This is how noise was removed from the data. Again, this is the real code that creates the visualizations. Let’s take a look at what this data looks like.
Here it is: the Golden Path toward cancellation. It’s a lot cleaner and actually shows us what customers were doing before they cancelled, in a way that we can do something about. Starting from the left, we see that customers came in through the online channel and reviewed their contract, followed by at least one, and usually two, calls to the call center either disputing their bill or registering a service complaint. The thickness of the lines shows us that there were more disputes than service complaints. These calls were followed by a visit to the store with a dispute or complaint, and that is where the cancellations occurred. This is actionable. We can implement this model in our production systems by counting the online visits and calls to the call center: for the entire population of customers, if the number of online reviews is > 0 and the number of calls into the call center is > 1, then we have a customer with a higher probability of cancelling their service, who can be flagged for intervention on their next contact. This entire analysis took place over a few days. Let’s think about this for a moment. Imagine trying to come to this conclusion using traditional SQL, without the SQL-MR function of nPath and without the ability to join this data. The first challenge is pulling the data together, with the biggest part coming from the data on Hadoop: this currently requires a skilled engineer writing MapReduce code in a lower-level language just to pull the data out. The manipulation of the data once gathered together requires around 350-400 lines of complex, recursive SQL code. Neither the pulling of the Hadoop data nor the SQL development is trivial; both require skilled programmers and, most likely, several months of work. In most shops, this level of resource allocation and time requires that a project be scoped with detailed requirements, resourced, approved and budgeted.
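The production rule described above (online reviews > 0 and call-center calls > 1) can be sketched as a plain aggregate over the combined view; the channel values and column names are assumptions:

```sql
-- Hypothetical intervention flag: customers with at least one online
-- contract review and more than one call-center call.
SELECT cust_id
FROM   td_telco_multi
GROUP  BY cust_id
HAVING SUM(CASE WHEN channel = 'ONLINE'      THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN channel = 'CALL_CENTER' THEN 1 ELSE 0 END) > 1;
```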
As challenging, expensive, and time-consuming as that project might be, the real problem is that this analysis requires many iterations; in fact, an unknown number of iterations. Each of those iterations may require a separate project: you know, Phase 1, 2, and 3, etc. This actually took on the order of nine iterations through nPath over several days. So what really happens when organizations are confronted by analysis needs like this without nPath? I can tell you that it is usually nothing. You can’t pre-determine the number of iterations, so you can’t scope it; and if you can’t do that, you aren’t going to get approval to budget and resource a project that has no end-date in sight. The reality is that most organizations never get to an answer like this. However, using nPath, a business analyst, and a few days’ work, without ever having to approve a project, not only can one get to the answer, one can also formulate an action plan. That is the real value proposition here: difficult analysis done quickly by business analysts, without the need to budget expensive and in-demand resources.
Slides from a real PoC using data from an IPTV network looking at Quality of Service and Churn
First we looked at analysing the complaints data, which was text files stored in Hadoop, and got nowhere with this: the text analytics showed that the comments fields held standard phrases such as “No fault found” or “customer issue”, or were just blank. A good example of failing fast: if it isn’t going to work, realise this and stop doing it as quickly as possible.
We then looked at patterns in data usage prior to a customer closing their account. Here each line represents a customer; it appears that just prior to account closure there was a huge surge in usage. This turned out to be an error in the data (again!).
We decided to look at the number of home router reboots as a measure of quality of service. Here the pattern of 5 events preceding a reboot can be seen, along with the code used to generate the Sankey chart (now a native Aster format, viewed in a web browser).
As previous data issues had been found, we went back and used SQL and Tableau to check the data. We found an issue on September 30th, but as the data only needs to be “good enough” to run the analysis, we can safely ignore this day and just use the 1st to the 29th for our investigation.
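The sort of SQL sanity check described here is straightforward; a sketch against the events table from the nPath example (the date-truncation expression is an assumption):

```sql
-- Hypothetical data-quality check: daily reboot counts make the
-- 30th September aggregation anomaly stand out.
SELECT CAST(evt_ts AS DATE) AS evt_day,
       COUNT(*)             AS reboots
FROM   wrk.w_event_f
WHERE  evt_name = 'REBOOT'
GROUP  BY 1
ORDER  BY 1;
```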
The final pattern with some of the noise cleaned up… The high transmitted-blocks event doesn’t help much, because it just shows that if you use the service a lot then you are more likely to reboot. But the other 3 events show a thing called synchronisation speed errors, which can be detected on the network and lead to issues with the IPTV signal at the customer end.
Using Aster’s built-in graph visualization, we can now see the way the synchro errors affect users across the entire network in a single picture. Note the thick red line in the highlighted area, and another one down and to the right of it.
Talking to the network engineers, we found out that there are two different types of hub in use. The older ones are on the left and the newer ones on the right; you can see from the colours that the newer ones are reporting far more errors than the older ones.
Final chart. Blue = new hubs; orange = old hubs. The 4th chart shows that the customers connected to the new hubs are complaining more. The 3rd chart shows that complaints by customers connected to the new hubs take longer to resolve; these two charts show proof of the quality-of-service issues. The 2nd chart shows bandwidth (higher is better), so the new hubs are actually getting better bandwidth. The 1st chart shows synchro speed (higher is better), so the new hubs have worse synchro speed. It looks like the top two are mirror images: as the bandwidth increases, the synchro speed decreases, causing the QoS issue. This turned out to be a firmware issue and not faulty hubs at all.