Sponsored by Data Transformed, the KNIME Meetup was a big success. Below are the slides from Dan's, Tom's, Anand's and Chhitesh's presentations.
Agenda:
Registration & Networking
Keynote – Dan Cox, CEO of Data Transformed
KNIME & Harvest Analytics – Tom Park
Office of State Revenue Case Study – Anand Antony
Using Spark with KNIME – Chhitesh Shrestha
Networking & Drinks
4. Energise Organisational Advantage through Awareness and Insight
5. Journey to Best in Class Analytics
We help our clients along this path (a maturity curve plotting value against time):
◦ Static – report and drill-down (Laggards)
◦ Reactive – monitor and alert (Followers)
◦ Proactive – discover and predict (Performers)
◦ Dynamic – analytics-enabled business processes (Innovators)
7. Data Transformed Skill Sets
BUDGET PLANNING (20%): Budgeting, Forecasting, Planning, Demand Planning, Workforce Management, Accounting, Financing, Cashflow, Sales Forecasting, Modelling, Campaign Forecasting
DATA PREPARATION (50%): Data Governance, Data Quality, Master Data Management, Data Warehousing, Data Science, ETL Applications, Data Analytics, SQL Language, Python Language, Scripting, Database Management, Application Development, Database Development, Textual ETL, Text Analytics, Hadoop Ecosphere, Analytical Databases, Relational Databases, Microsoft Analysis Server, OLAP, OLTP, Multi-Dimensional Databases, Data Vault Architectures, Star-Schema Architectures, Data Marting
VISUALISATION (30%): Dashboarding, Reporting, Charting, Location Analytics, Statistical Analytics, Data Analytics, Business Analysis, Story Telling, Semantic Layer, Presentation Layer, Collaboration
11. Traditional Systems Under Pressure
Challenges with traditional systems (ERP, CRM, SCM):
◦ Constrain data to the app
◦ Can’t manage new data
◦ Costly to scale
New data types (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs) now carry much of the business value, and volumes are exploding: 2.8 zettabytes in 2012, a projected 44 zettabytes by 2020. How organisations respond separates industry leaders from laggards.
12. Volume – exponential growth
Variety – new data types
Velocity – time to value
The digital floodgates have opened… and will never be turned off again
13. Big Data equals Big Opportunity
◦ 88% of big data is untouched (data sources and types)
◦ $15 trillion of value in new possibilities for companies
◦ Universal access, time to value
23. Acquire, Grow & Retain Customers
Who are your best customers and how can you keep them satisfied? Where can you find more customers like them?
Big data holds the insights into who your customers are and what motivates them.
24. Optimise Operations & Reduce Fraud
Are your operational processes and systems as efficient as they could be? Could you reduce waste and fraud if you had real-time visibility into your business?
Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets.
25. Transform Financial Processes
Do you have real-time access to reliable information about all aspects of your business? Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes?
Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
26. Manage Risk
How can you mitigate the financial and operational risks that could devastate your organisation? How can you manage regulatory change and reduce the risk of non-compliance?
Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
27. Create New Business Models
Are your competitors making bigger strides in changing your industry or creating new markets than you? Does your organisation’s culture support innovative thinking and exploration?
Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
28. Improve IT Economics
Is your existing IT infrastructure able to provide the insights that decision makers need? Are you doing enough to protect your data centre and data from potential criminal activity or fraud?
Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
29. Analytics Trends
1. Data Governance
2. Social Intelligence
3. Analytics Organisation-Wide
4. Community Collaboration
5. Integration of Everything
6. Cloud Analytics
7. Conversational Data
8. Data Journalism
9. Mature Mobility
10. Smart Analytics
30. Areas BIG DATA is Helping
1. Operations & Optimisation
2. Product Development
3. Customer Experience
4. Understanding and Targeting Customers
31. Performance Examples
Actian is Helping These Companies Achieve Leadership
Digital Marketing: Hyper-segmentation every hour
Banking: Enterprise Risk every 2 minutes
Retail: Enterprise Market Basket Analysis every minute
Defense: Network intrusion models every second
Fraud: Adjustments every nanosecond
Amazon Redshift – Actian Matrix: cloud-based, petabyte-scale data warehouse
32. The Value of Business Intelligence
Organisations competing with analytics substantially OUTPERFORM their peers by 220%.
47. KNIME @ OSR
Anand Antony
Senior Data Analyst
Operations Analytics and Intelligence
Office of State Revenue
anandjantony@gmail.com
Ph. 0414491765
48. OSR: Who are we?
As NSW’s principal revenue agency, OSR administers state taxation and revenue for, and on behalf of, the people of NSW
◦ Payroll tax
◦ Land tax
◦ Duties
◦ Grants such as First Home Benefits
49. Data Analytics Team: Who are we?
Operations Analytics & Intelligence is the analytics wing of the Operations Division in OSR
◦ Three teams – Business Intelligence, Data Analytics and Data Team
Data Analytics team consists of 10 analysts
Supports tax auditors by detecting possible non-compliant clients
◦ Via matching data from various sources and analysing them
◦ 60+ data sources
50. Data Analytics Scenario - Past
Data matching, preparation and analysis
◦ SPSS Clementine, SAS Enterprise Guide
Data mining
◦ Salford Systems
Reporting/Dashboards
◦ Excel
Fuzzy data matching
◦ SSA Name (Informatica)
51. Data Analytics Scenario - Current
Data matching, preparation and analysis
◦ KNIME (around 70% transitioned from Clementine/SAS)
Data mining
◦ Salford Systems
◦ Will be evaluating KNIME
Reporting/Dashboards
◦ Excel
Fuzzy data matching
◦ SSA Name (Informatica)
53. Why KNIME?
Start with canvas programming
Enrich with code via snippets
◦ Mostly Java Snippet at the moment
Fast and easy to learn for data scientists
Can tackle almost any analytic task
54. KNIME - Having the best of both worlds!
◦ Canvas programming + coding
55. What do we use KNIME for?
Pretty much for everything! (except reporting and data mining)
◦ Data reading (text files, databases, non-standard formats)
◦ Data merging (potentially fuzzy matching too in future)
◦ Data manipulation
◦ Creating new variables
◦ Data output
◦ Modelling (possibly in future)
56. Key nodes/functionalities
◦ Sorter, Column Reorder, Column Filter, Column Rename
◦ Concatenate, Joiner, Reference Row Filter (anti-join)
◦ Missing Value
◦ Math Formula, String Manipulation, Rule Engine, Java Snippet
◦ GroupBy (aggregate, dedupe)
◦ Value Counter, Pivoting
◦ Looping
◦ Regular expressions/wildcards in various nodes
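For readers more familiar with Python than KNIME, the table operations these nodes perform can be sketched with pandas. This is only an analogy, not KNIME's API; the column names and toy data are invented for illustration.

```python
import pandas as pd

# Toy data standing in for a KNIME table.
df = pd.DataFrame({
    "entity": ["A", "B", "B", "C"],
    "amount": [100.0, 250.0, 250.0, None],
})

# Sorter / Column Rename / Column Filter & Reorder
out = (df.sort_values("entity")               # Sorter
         .rename(columns={"amount": "amt"})   # Column Rename
         [["entity", "amt"]])                 # Column Filter / Column Reorder

# Missing Value node: replace missing amounts with a fixed value
out["amt"] = out["amt"].fillna(0.0)

# GroupBy node, used both to aggregate and to de-duplicate
agg = out.groupby("entity", as_index=False)["amt"].sum()  # aggregate
dedup = out.drop_duplicates(subset=["entity"])            # dedupe

# Reference Row Filter (anti-join): keep rows whose entity is NOT in a
# reference table (here, the first two entities of the aggregate).
ref = agg.head(2)["entity"]
anti = out[~out["entity"].isin(ref)]
```

Concatenate and Joiner map to `pd.concat` and `DataFrame.merge` in the same spirit.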
58. Case study 1
Officers fill in a questionnaire on the entity audited – one Excel spreadsheet per entity
Collate all the spreadsheets stored in a location
Massage the data to produce an analysis dataset with one row per entity
Key KNIME nodes/functionalities used
◦ List Files
◦ Table Row to Variable Loop Start, Loop End
◦ Java Snippet
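The collation pattern above (list files, loop over them, pivot to one row per entity) can be sketched in plain Python. This is a hedged illustration, not the OSR workflow: CSV stands in for the Excel spreadsheets, and all file, entity and question names are invented.

```python
import csv
import glob
import os
import tempfile

workdir = tempfile.mkdtemp()

# Simulate two questionnaire files, one per audited entity,
# each holding question/answer rows.
for entity, answers in [("ent1", {"q1": "yes", "q2": "no"}),
                        ("ent2", {"q1": "no", "q2": "yes"})]:
    with open(os.path.join(workdir, f"{entity}.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        for q, a in answers.items():
            writer.writerow([q, a])

# "List Files" + loop: read each file and massage it into one
# record per entity, pivoting question/answer pairs into columns.
rows = []
for path in sorted(glob.glob(os.path.join(workdir, "*.csv"))):
    entity = os.path.splitext(os.path.basename(path))[0]
    record = {"entity": entity}
    with open(path, newline="") as f:
        for q, a in csv.reader(f):
            record[q] = a
    rows.append(record)
```

In KNIME the same shape is achieved with List Files feeding a Table Row to Variable Loop Start, a reader inside the loop, and a Loop End collecting the per-entity rows.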
66. Case study 2 – Use of Flow variables
Technique
◦ Put metadata rules into a file
◦ Read them and convert into flow variables
Example
◦ Reorder variables in a dataset to match the order in the data dictionary
◦ We use the “Flow Variables” tab of the Column Reorder node to achieve this
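The idea of driving column order from metadata rather than hard-coding it can be sketched in pandas. This is only an analogy to the flow-variable technique; the column names and dictionary order are invented.

```python
import pandas as pd

# A dataset whose columns arrived in an arbitrary order.
df = pd.DataFrame({"amount": [1], "entity": ["A"], "date": ["2015-01-01"]})

# Metadata: the column order as given in the data dictionary.
# In KNIME this would arrive as flow variables read from a file.
dictionary_order = ["entity", "date", "amount"]

# Apply the dictionary order, keeping only columns that actually exist
# (unknown columns could be appended at the end if needed).
ordered = df[[c for c in dictionary_order if c in df.columns]]
```

Because the order lives in data, changing the dictionary file re-orders the output without touching the workflow itself, which is the point of the flow-variable approach.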
67. Use of flow variables
(Screenshot: configure via the “Flow Variables” tab, not the manual column list)
68. KNIME wishlist!
An offset function in some nodes, e.g. Rule Engine, Math Formula.
An offset function gives the value of a variable in a previous row; e.g. in SPSS Clementine, @OFFSET(var,1) gives the value in the previous row.
Note: within a Java Snippet this is readily achieved, since a variable retains its value until it is overwritten. We can first use the value populated from the previous row inside a formula, then update it with the current row's value for use in the next row.
72. Apache Spark on KNIME
Unleash the power of Big Data on Hadoop
73. The Big Data Problem: Data Volume
1. Storage is getting cheaper
2. Data sources are increasing
3. Thus, data is growing faster
But processing it all is still a problem. Why?
74. The Big Data Problem: Processing
Now that memory is cheaper, processing can move from disk into memory.
75. Why Apache Spark?
Apache Spark is an open-source parallel processing framework that enables users to run large-scale data analytics across clustered computers.
• Speed
• Flexible with programming platform
• Generality
• Run Everywhere
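The programming model Spark distributes across a cluster can be illustrated in miniature with plain Python. This is not Spark's API, just a toy map/reduce word count showing the style of computation: independent per-partition work (map) that can run in parallel, followed by a merge of partial results (reduce).

```python
from collections import Counter
from functools import reduce

lines = ["big data on hadoop", "spark on hadoop"]

# map: each line is processed independently into partial word counts,
# which is what makes the work parallelisable across machines.
mapped = [Counter(line.split()) for line in lines]

# reduce: merge the partial counts into a final result.
counts = reduce(lambda a, b: a + b, mapped)
```

In actual Spark the same shape would be written against distributed collections (e.g. `flatMap` and `reduceByKey`), with the framework handling the partitioning and shuffling.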
The cloud is everywhere, and we will continue to see adoption at extreme volumes. And big data is driving a lot of the cloud's growth: revenues for the top 50 public cloud providers shot up 47% in Q4 of 2013 to $6.2B according to Technology Business Research. Amazon Redshift and Google BigQuery are growing dramatically. Database players like Teradata are also jumping in the game.
Snowflake
It has been suggested that 80% of an analyst’s time is spent on data prep, while only 20% is spent looking for insights. Enter the personal data cleansing tools focused on the analyst. Tools like Trifacta, Alteryx, Paxata and Informatica Rev are making data preparation easier to use with less technology and infrastructure required to support it.
KNIME
Some may think that the jury is still deliberating, but NoSQL is making a mark in the industry. NoSQL was founded to provide scale, flexibility, and the ability to leverage large sets of data faster. Companies like MarkLogic, Cassandra, Couchbase, and MongoDB are bringing new innovation to the database market and are doing quite well with large production implementations in surprising places.
Whether you are of the belief that Hadoop will take over current database architecture, or there will be a mix of Hadoop and other styles of databases, one thing is clear, Hadoop is now a part of the big data architecture in many companies. The legacy data storage vendors have incorporated Hadoop into their architecture in one way or another. Some classical database providers have embraced the market leading Hadoop players like Teradata, SAP, and HP. Others, like IBM, have built their own flavor of Hadoop. Spark and Impala continue to mature, putting more pressure on the traditional stack. In any case, Hadoop looks like it is here to stay and is synonymous with big data architectures.
The concept of a big data lake, a large body of data that exists in a natural or unrefined state, is in early stages. This idea answers some fundamental questions around how to effectively store, manage and use the massive amounts of incoming data. The cutting edge companies Google and Facebook have developed useful ways to leverage the data lake, but should be considered early adopters. As it is, the data lake is still in a nascent concept, and we should expect to see advances in managing and securing the big data lake this year. And as Gartner points out, the data lake requires a new kind of management to be effective.
When new ways of doing things come about, it creates a new ecosystem around it. The same holds true for big data. We have new ways to store data, clean data, add content to data, bring in social media, analyze machine data, do deep analysis on data and, of course, visualize data. Over the next year we will see some surprising changes in the current ecosystem. Specifically, we will see MPP (Massively Parallel Processing) databases play a different and less prominent role.
Actian Matrix (better known as Amazon Redshift)
Your Ford Fusion sends 250GB of data back to Ford, who in turn lets you know that something is wrong with your car. Sounds like fantasy, but hardware and semi-conductor companies are betting on it. Companies like Ford, GE, and Rolls Royce jet engines are just a few examples of companies investing in IoT. In 2015, we will see a greater use from manufacturers. Some technology companies like Cisco will create solutions around the concept to help manage the massive amounts of data.
Acquire, grow and retain customers:
Who are your best customers and how can you keep them satisfied? Where can you find more customers like them?
Big data holds the insights into who your customers are and what motivates them. Analysing big data can help you discover ways to improve customer interactions, add value and build relationships that last.
Optimise operations and reduce fraud:
Are your operational processes and systems as efficient as they could be? Could you reduce waste and fraud if you had real-time visibility into your business? Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets. Gain the insights you need to reduce costs, increase efficiencies and productivity, and limit threats.
Transform financial processes
Do you have real-time access to reliable information about all aspects of your business? Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes? Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
Manage risk
How can you mitigate the financial and operational risks that could devastate your organisation? How can you manage regulatory change and reduce the risk of non-compliance? Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
Create new business models
Are your competitors making bigger strides in changing your industry or creating new markets than you? Does your organisation’s culture support innovative thinking and exploration? Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
Improve IT economics
Is your existing IT infrastructure able to provide the insights that decision makers need? Are you doing enough to protect your data centre and data from potential criminal activity or fraud? Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
Just as the business intelligence landscape has transformed to self-service data, so too must governance transform. Simple approaches like locking down all enterprise data won’t work any longer—nor will the approach of doing away with any process at all. Organizations will begin to investigate what governance means in a world of self-service analytics.
In 2014 we saw organizations begin to analyze social data in earnest. In 2015, the leading edge will start to take advantage of their capabilities. Tracking conversations at scale via social will let companies find out when a topic is starting to trend and what their customers are talking about. Social analytics will open the door to responsive product optimization.
Today’s data analyst may be an operations manager, a supply chain executive or even a salesperson. New, easier-to-use technologies that provide browser-based analytics let people answer ad-hoc business questions. Companies that recognise this as a strategic advantage will begin to support everyday analysts with data, tools and training to help them do more of what they’re doing.
The consumerization of IT is no longer theoretical, it’s a fact. People use products that they enjoy using, and analytics software is no different. Companies whose products inspire and empower are seeing their communities flourish. And prospective customers will also look to the health of product communities as important proof points in crowded marketplaces.
The last 10 years have seen a massive amount of innovation across the data space, resulting in mixed environments for everything from data storage to analytics to business applications. We won’t see a return to the age of monolithic systems. However, organizations are losing patience with multiple logins and clunky processes to move and manage data. Rapid integration leveraging simple interfaces is going to become the standard.
In 2015, we’ll start to see the first major use of cloud analytics for on-premise data. Until now, cloud analytics have been primarily used for data in cloud apps. In 2015 companies will begin to choose the cloud when it makes sense for their business case, not only because the data is there.
We are starting to see an age when data is interactive enough that it can become the backbone of a conversation. Now that people have speed-of-thought analytical tools, they can quickly analyze data, mash it up with other data and redesign it to create a new perspective. And as a result of these data conversations, organisations will get more insight from their data.
The arrival on the scene of Vox and the continued ascendance of sites like fivethirtyeight.com will force more newsrooms to integrate data analytics into their online presence. This trend will have a spillover effect from the public sphere to organisations, encouraging companies that are lagging in analytics to get with the times.
Workers are spending less time at their desks. But that doesn’t mean they should be less informed by data; in fact they have a greater need for data than ever before. Mobile solutions for many analytics emerged years ago and are finally reaching a level of maturity that means that mobile workers really can do light analysis from the road. And the emphasis on mobile has forced vendors to offer more natural and intuitive interfaces across the board.
Advances in graphical, intuitive modelling will mean that business users can begin to use predictive analytics without the need for extensive expert consultation or scripting. As self-service analytics becomes more mainstream, tasks such as forecasting and prediction will become more common – and a lot less painful.
Since I graduated with a Master of Statistics, analytics has been a core theme in my 20+ year career. Using data to solve problems is a passion that drives me to seek out and apply technology innovatively. In the new digital world, I aim to be a champion and an evangelist for the principle of "Evidence-based Decision Making".
Currently Director, Risk Analytics, Deloitte Australia
A Data Analyst with 15 years of experience (taxation – 10 years, data-driven marketing – 5 years). Experience across a spectrum of data analysis tasks (exploratory analysis, developing risk/predictive variables, predictive modelling, reporting). Well-developed programming skills in a range of data analysis software such as KNIME, SAS, and SPSS Clementine (IBM Modeler).
He’s a highly regarded Data Analyst at OSR.