Creating Insights
at the
Speed of Business
W. Daniel Cox, III CPA, CMA, CFM
Chief Executive Officer
WELCOME
to
Meet Up Group
Energise Organisational
Advantage through
Awareness and Insight
Registration & Networking
Keynote – Dan Cox, CEO of Data Transformed
KNIME & Harvest Analytics – Tom Park
Office of State Revenue Case Study – Anand Antony
Using Spark with KNIME – Chhitesh Shrestha
Networking & Drinks
Journey to Best in Class Analytics
We Help our Clients along this Path
Axes: Time, Value
Static: Report and Drill-down (Laggards)
Reactive: Monitor and Alert (Followers)
Proactive: Discover and Predict (Performers)
Dynamic: Analytics-enabled business processes (Innovators)
YOUR DATA. CLEARLY
Source Your Data
Prepare Your Data: Data Preparation
Plan With Data: Budget/Planning
Visualise All Data: Visualisation
Realise Data Value
BUDGET PLANNING
Budgeting
Forecasting
Planning
Demand Planning
Workforce Management
Accounting
Financing
Cashflow
Sales Forecasting
Modelling
Campaign Forecasting
DATA PREPARATION
Data Governance
Data Quality
Master Data Management
Data Warehousing
Data Science
ETL Applications
Data Analytics
SQL Language
Python Language
Scripting
Database Management
Application Development
Database Development
Textual ETL
Text Analytics
Hadoop Ecosphere
Analytical Databases
Relational Databases
Microsoft Analysis Server
OLAP
OLTP
Multi-Dimensional Databases
Data Vault Architectures
Star-Schema Architectures
Data Marting
Data Transformed Skill Sets
VISUALISATION 30%
BUDGET PLANNING 20%
DATA PREPARATION 50%
VISUALISATION
Dashboarding
Reporting
Charting
Location Analytics
Statistical Analytics
Data Analytics
Business Analysis
Story Telling
Semantic Layer
Presentation Layer
Collaboration
Actian – Fast, Industrialized, Open
Superior Big Data SQL with Industrialized strength
Chart axes: Performance (Slow to Fast); Enterprise Readiness (Immature to Industrial Strength)
Chart regions: Good Enough; Production Ready
Plotted: Traditional, Operational, Open Source, Vortex
Do YOU Have a BIG DATA Role?
Global Data Snapshot
…
Total World Population: 7,254,549,796
Internet Users: 3,035,749,340
Active Social Network Users: 2,078,680,860
Mobile Subscribers: 6,572,950,124
Traditional systems under pressure
• Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
New Data: Clickstream, Geolocation, Web Data, Internet of Things, Docs, emails, Server logs
Traditional: ERP, CRM, SCM
Data volume: 2.8 Zettabytes (2012), 44 Zettabytes (2020)
Business Value: Laggards, Industry Leaders
12 Zettabytes
Volume: Exponential Growth
Variety: New Data Types
Velocity: Time To Value
The Digital Floodgates have opened…
and will never be turned off again
Big Data equals Big Opportunity
Data Source & Type Untouched
Value New Possibilities
88% OF BIG DATA
$15 TRILLION
Universal Access Time To Value
% OF COMPANIES
Trends for BIG DATA
In the Cloud
Personal ETL
NoSQL
Hadoop
Data Lake
Ecosystem
Internet of Things
Big Data Trends
1. Big Data in the Cloud
2. Personal ETL
3. NoSQL
4. Hadoop
5. Data Lakes
6. Big Data Ecosystem
7. Internet of Things
BIG DATA
is STILL just
Data
It needs to be translated into Answers
Acquire, Grow & Retain Customers
Who are your best customers
and how can you keep them
satisfied?
Where can you find more
customers like them?
Big data holds the insights into
who your customers are and
what motivates them.
Optimise Operations & Reduce Fraud
Are your operational processes
and systems as efficient as
they could be?
Could you reduce waste and
fraud if you had real-time
visibility into your business?
Adopting a big data and analytics
strategy can help you plan,
manage and maximise
operations, supply chains and the
use of infrastructure assets.
Transform Financial Processes
Do you have real-time access
to reliable information about all
aspects of your business?
Do you have the visibility,
insight and control over
financial performance to better
measure, monitor and shape
business outcomes?
Analysing all of your data,
including big data, can drive
enterprise agility and provide
insights to help you make better
decisions
Manage Risk
How can you mitigate the
financial and operational risks
that could devastate your
organisation?
How can you manage
regulatory change and reduce
the risk of non-compliance?
Proactively identifying,
understanding and managing
financial and operational risk can
enable more risk-aware,
confident decision making
Create New Business Models
Are your competitors making
bigger strides in changing your
industry or creating new markets
than you?
Does your organisation’s culture
support innovative thinking and
exploration?
Explore strategic options for
business growth, using new
perspectives gained from exploiting
big data and analytics
Improve IT Economics
Is your existing IT infrastructure
able to provide the insights that
decision makers need?
Are you doing enough to protect
your data centre and data from
potential criminal activity or
fraud?
Lead the creation of new value
and agility for your business by
optimising big data and analytics
for faster insight at a lower cost
Analytics Trends
1. Data Governance
2. Social Intelligence
3. Analytics Organisation-Wide
4. Community Collaboration
5. Integration of Everything
6. Cloud Analytics
7. Conversational Data
8. Journalism Data
9. Mature Mobility
10. Smart Analytics
Areas BIG DATA is Helping
1. Operations & Optimising
2. Product Development
3. Customer Experience
4. Understanding and Targeting Customers
Performance Examples
Actian is Helping These Companies Achieve Leadership
Digital Marketing: Hyper-segmentation every hour
Banking: Enterprise Risk every 2 minutes
Retail: Enterprise Market Basket Analysis every minute
Defense: Network intrusion models every second
Fraud: Adjustments every nano-second
Amazon Redshift – Actian Matrix Cloud-based, Petabyte
Scale Data Warehouse
The Value of Business Intelligence
Organisations
competing with Analytics
Substantially OUTPERFORM
their peers by
220%
Data Transformed
Actian Vector: Example
https://youtu.be/dYTF5ZNioEI
Identical 150 Million Transaction Query
Comparison between Actian Vector & Oracle DBMS
Harvest Analytics
Tom Park
Overview
KNIME & Big Data
Tom Park
Gartner 2016 Magic Quadrant
Advanced Analytics Platforms
Niche Players (5):
FICO
Lavastorm
Megaputer
Prognoz
Accenture
Leaders (5):
SAS
IBM
KNIME
RapidMiner
Dell
Visionaries (4):
Microsoft
Alteryx
Alpine Data Labs
Predixion
Challengers (2):
SAP
Angoss
Changes from 2015 to 2016
Salford & TIBCO dropped for not satisfying the visual composition requirement
Main Big Data Technologies
NoSQL
Big Data Architecture
KNIME Big Data Extensions
Future Trends
Missing Ingredient to Success?
www.dataroos.com
Office of State
Revenue
Anand Antony
KNIME @ OSR
Anand Antony
Senior Data Analyst
Operations Analytics and Intelligence
Office of State Revenue
anandjantony@gmail.com
Ph. 0414491765
OSR: Who are we?
• As NSW’s principal revenue agency, OSR administers state taxation and revenue for, and on behalf of, the people of NSW
◦ Payroll tax
◦ Land tax
◦ Duties
◦ Grants such as First Home Benefits
Data Analytics Team: Who are we?
• Operations Analytics & Intelligence is the analytics wing of the Operations Division in OSR
◦ Three teams – Business Intelligence, Data Analytics and Data Team
• The Data Analytics team consists of 10 analysts
• Supports tax auditors by detecting possibly non-compliant clients
◦ By matching and analysing data from various sources
◦ 60+ data sources
Data Analytics Scenario - Past
• Data matching, preparation and analysis
◦ SPSS Clementine, SAS Enterprise Guide
• Data mining
◦ Salford Systems
• Reporting/Dashboards
◦ Excel
• Fuzzy data matching
◦ SSA Name (Informatica)
Data Analytics Scenario - Current
• Data matching, preparation and analysis
◦ KNIME (around 70% transitioned from Clementine/SAS)
• Data mining
◦ Salford Systems
◦ Will be evaluating KNIME
• Reporting/Dashboards
◦ Excel
• Fuzzy data matching
◦ SSA Name (Informatica)
Future: Unified Analytic & Data Management Platform
Internal & External Data Sources
Data Governance
Data Quality
Data Matching
Metadata Management
MapR Hadoop Distribution
Data Lake
Vortex, MapR
Advanced Data Analytics
Actian/Knime
Machine Learning
H2O/Spark
Actian/Knime
Governance
Visualisation
Presentation Layer
Datamart
On the fly / Sandpit
Spotfire / Tableau / Graph DBs
Why KNIME?
• Enrich workflows with code via snippet nodes
◦ Mostly Java Snippet at the moment
• Start with canvas programming
• Quick and easy for data scientists to learn
• Can tackle almost any analytic task
KNIME - Having the best of both worlds!
◦ Canvas programming ↔ Coding
What do we use KNIME for?
• Pretty much everything! (except reporting and data mining)
◦ Data reading (text files, databases, non-standard formats)
◦ Data merging (potentially fuzzy matching too in future)
◦ Data manipulation
◦ Creating new variables
◦ Data output
◦ Modelling (possibly in future)
Key nodes/functionalities
◦ Sorter, Column Reorder, Column Filter, Column Rename
◦ Concatenate, Joiner, Reference Row Filter (anti-join)
◦ Missing Value
◦ Math Formula, String Manipulation, Rule Engine, Java Snippet
◦ GroupBy (aggregate, dedupe)
◦ Value Counter, Pivoting
◦ Looping
◦ Regular expressions/wildcards in various nodes
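Two of the operations named above, the anti-join (Reference Row Filter) and the dedupe (GroupBy), can be sketched in plain Python. This is only an illustration of the logic, not KNIME's API: the function names, the `client_id` key and the sample records are all made up for the example.

```python
def anti_join(rows, reference_rows, key):
    """Keep only the rows whose key value does NOT appear in reference_rows."""
    reference_keys = {r[key] for r in reference_rows}
    return [r for r in rows if r[key] not in reference_keys]


def dedupe(rows, key):
    """Keep the first row seen for each key value (dicts preserve insertion order)."""
    first_seen = {}
    for r in rows:
        first_seen.setdefault(r[key], r)
    return list(first_seen.values())


# Hypothetical sample data: clients found in two sources, one already audited.
clients = [
    {"client_id": "A1", "source": "payroll"},
    {"client_id": "B2", "source": "payroll"},
    {"client_id": "B2", "source": "land"},
    {"client_id": "C3", "source": "land"},
]
already_audited = [{"client_id": "A1"}]

candidates = dedupe(anti_join(clients, already_audited, "client_id"), "client_id")
print([c["client_id"] for c in candidates])
```

Building the reference keys as a set keeps the anti-join linear in the number of rows, which matters once many sources are combined.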
Data Preparation Example
Case study 1
• Officers fill in a questionnaire on the entity audited: one Excel spreadsheet per entity
• Collate all the spreadsheets stored in a location
• Massage the data to produce an analysis dataset with one row per entity
• Key KNIME nodes/functionalities used
◦ List Files
◦ Table Row to Variable Loop Start, Loop End
◦ Java Snippet
• Questionnaire data for one client
Overview of KNIME flow
Bring data to tabular form
Within this Meta node, there is one
Java Snippet for each question in the
questionnaire
Details of a Java Snippet
Result of the Meta Node
To get a single record for a client
- Just take the last row for a “client block”!
- Explained in the next slide
For each “client block” aggregate the variables
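The "last row of a client block" idea can be sketched in plain Python. It assumes that the per-question snippets leave each answer in place on every later row of the same block, so only the final row of a block holds all answers. The cell layout `(client, question, answer)`, the function names and the sample values are hypothetical, and the real workflow does this with KNIME nodes rather than Python.

```python
def tabulate(cells):
    """Each output row carries every answer seen so far within its client block."""
    rows = []
    current_client = None
    answers = {}
    for client, question, answer in cells:
        if client != current_client:
            # A new client block starts: reset the carried values.
            current_client = client
            answers = {}
        answers[question] = answer  # stays populated on later rows of the block
        rows.append({"client": client, **answers})
    return rows


def last_row_per_block(rows):
    """Later rows overwrite earlier ones, so each client keeps its final row."""
    last = {}
    for row in rows:
        last[row["client"]] = row
    return list(last.values())


# Hypothetical questionnaire cells, in file order.
cells = [
    ("entity_1", "q1", "yes"),
    ("entity_1", "q2", "12"),
    ("entity_1", "q3", "retail"),
    ("entity_2", "q1", "no"),
    ("entity_2", "q2", "7"),
    ("entity_2", "q3", "mining"),
]

analysis = last_row_per_block(tabulate(cells))
for row in analysis:
    print(row)
```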
End result
1000 spreadsheets → 1000 rows
Case study 2 – Use of Flow variables
• Technique
◦ Input metadata rules into a file
◦ Read and convert into flow variables
• Example
◦ Reorder variables in a dataset as per the order in the data dictionary
◦ We use the “Flow Variables” tab of the Column Reorder node to achieve this
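The same metadata-driven idea can be shown in plain Python: the column order lives in a data dictionary file rather than being set by hand. This is a sketch only. The dictionary layout (a `variable` column), the column names and the choice to append unlisted columns at the end are all assumptions, not the OSR workflow.

```python
import csv
import io


def column_order_from_dictionary(dictionary_text):
    """Read the desired column order from a data dictionary in CSV form."""
    reader = csv.DictReader(io.StringIO(dictionary_text))
    return [row["variable"] for row in reader]


def reorder(record, order):
    """Listed columns first, in dictionary order; unlisted columns follow unchanged."""
    listed = [name for name in order if name in record]
    unlisted = [name for name in record if name not in order]
    return {name: record[name] for name in listed + unlisted}


# Hypothetical data dictionary; in practice this would be read from a file.
DATA_DICTIONARY = (
    "variable,description\n"
    "client_id,Client identifier\n"
    "tax_type,Type of tax\n"
    "amount,Assessed amount\n"
)

record = {"amount": 1200, "client_id": "A1", "notes": "n/a", "tax_type": "payroll"}
order = column_order_from_dictionary(DATA_DICTIONARY)
reordered = reorder(record, order)
print(list(reordered))
```

Changing the column order then means editing the dictionary file, not the workflow.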
Use of flow variables
Use this tab
Do not use this “manual” tab
KNIME wishlist!
• Offset function in some nodes, e.g. Rule Engine, Math Formula
An offset function gives the value of a variable in a previous row.
E.g. in SPSS Clementine, @OFFSET(var,1) gives the value in the previous row.
Note: within a Java Snippet this is readily achieved, since a variable retains its value until it is overwritten. We can therefore first use the value populated from the previous row inside a formula, then update it with the value from the current row so that it can be used in the next row.
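The use-then-update pattern from the note can be sketched in plain Python, where a variable that outlives each loop iteration plays the role of the retained snippet variable. The function name and sample values are made up for the illustration.

```python
def with_previous(values, initial=None):
    """Yield (current, previous) pairs, similar to an offset-by-one lookup."""
    previous = initial  # survives between iterations, like a retained variable between rows
    for current in values:
        yield current, previous  # 1) use the value left over from the previous row
        previous = current       # 2) then overwrite it, ready for the next row


# Hypothetical column of amounts; compute the change from the previous row.
amounts = [100, 120, 90, 150]
changes = [None if prev is None else cur - prev for cur, prev in with_previous(amounts)]
print(changes)
```

The order of the two steps matters: updating before using would compare each row with itself.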
Questions?
Data Transformed
Chhitesh Shrestha
Apache Spark on KNIME
Unleash the power of Big Data on Hadoop
The Big Data Problem: Data Volume
1. Storage is getting cheaper
2. Data sources are increasing
3. Thus, data is growing faster
YARN
But processing it is still a problem. Why?
The Big Data Problem: Processing
Now that memory is cheaper…
Why Apache Spark?
Apache Spark is an open-source parallel processing framework that enables users to run large-scale data analytics across clustered computers.
• Speed
• Flexible choice of programming platform
• Generality
• Runs Everywhere
Spark Components
Spark Comparison on Calculation of Average
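The comparison itself is not reproduced in this transcript. As a rough illustration of how an average can be computed across partitions in a parallel framework, the plain-Python sketch below has each partition report a (sum, count) pair, which are then combined. It does not use the Spark API, and the partition contents are invented.

```python
from functools import reduce


def partial_sum_count(partition):
    """Map step: each partition reports (sum, count) rather than its own mean."""
    return (sum(partition), len(partition))


def combine(a, b):
    """Reduce step: add sums and counts; associative, so partials merge in any order."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions):
    """Combine per-partition partials into one overall mean."""
    total, count = reduce(combine, map(partial_sum_count, partitions), (0, 0))
    if count == 0:
        raise ValueError("mean of empty data")
    return total / count


# Hypothetical data split unevenly across three partitions.
partitions = [[1, 2, 3], [4, 5], [6]]
print(distributed_mean(partitions))
```

Averaging the per-partition means instead would give the wrong answer when partitions differ in size, which is why the partials carry counts.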
List of Spark Nodes
Getting the data in and out of Spark
Data into Spark Data out of Spark
Statistics and Data Manipulation Nodes
Statistics Data Manipulation
Mining Nodes
Learners Predictors
Other Nodes
KNIME Spark Executor Architecture
Currently Supported Hadoop and KNIME Versions
Hadoop Versions
• Hortonworks HDP 2.2 with Spark 1.2.x
• Hortonworks HDP 2.3 with Spark 1.3.x
• Cloudera CDH 5.3 with Spark 1.2.x
• Cloudera CDH 5.4 with Spark 1.3.x
KNIME Versions
• KNIME Analytics Platform 3.1
• KNIME Server 4.2
Lots of talking… Let's view a demo
Data Transformed
YOUR DATA. CLEARLY.
info@DataTransformed.com.au
02 9956 3781
Actian Vortex on Hadoop 10 minute Demo
http://videos.actian.com/watch/6iEZqvJrEKL2btoqIDImcg
Demonstration of Vortex, Dataflow & Vector
Comparison between Actian Vortex & Cloudera Impala

 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Bi presentation to bkk
Bi presentation to bkkBi presentation to bkk
Bi presentation to bkk
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Top SAP Online training institute in Hyderabad
Top SAP Online training institute in HyderabadTop SAP Online training institute in Hyderabad
Top SAP Online training institute in Hyderabad
 
Keynote: Future of IT - future of enterprise it Canada
Keynote: Future of IT - future of enterprise it CanadaKeynote: Future of IT - future of enterprise it Canada
Keynote: Future of IT - future of enterprise it Canada
 
Tableau reseller partner in Djibouti Bilytica Best business Intelligence Comp...
Tableau reseller partner in Djibouti Bilytica Best business Intelligence Comp...Tableau reseller partner in Djibouti Bilytica Best business Intelligence Comp...
Tableau reseller partner in Djibouti Bilytica Best business Intelligence Comp...
 

Último

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 

Último (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 

KNIME Meetup 2016-04-16

  • 2. Creating Insights at the Speed of Business W. Daniel Cox, III CPA, CMA, CFM Chief Executive Officer
  • 4. Energise Organisational Advantage through Awareness and Insight: Registration & Networking; Keynote – Dan Cox, CEO of Data Transformed; KNIME & Harvest Analytics – Tom Park; Office of State Revenue Case Study – Anand Antony; Using Spark with KNIME – Chhitesh Shrestha; Networking & Drinks
  • 5. Journey to Best in Class Analytics: We Help our Clients along this Path. Axes: Time, Value. Static: Report and Drill-down (Laggards); Reactive: Monitor and Alert (Followers); Proactive: Discover and Predict (Performers); Dynamic: Analytics-enabled business processes (Innovators)
  • 6. YOUR DATA. CLEARLY: Source Your Data; Prepare Your Data (Data Preparation); Plan With Data (Budget/Planning); Visualise All Data (Visualisation); Realise Data Value
  • 7. Data Transformed Skill Sets: Visualisation 30%, Budget Planning 20%, Data Preparation 50%. BUDGET PLANNING: Budgeting, Forecasting, Planning, Demand Planning, Workforce Management, Accounting, Financing, Cashflow, Sales Forecasting, Modelling, Campaign Forecasting. DATA PREPARATION: Data Governance, Data Quality, Master Data Management, Data Warehousing, Data Science, ETL Applications, Data Analytics, SQL Language, Python Language, Scripting, Database Management, Application Development, Database Development, Textual ETL, Text Analytics, Hadoop Ecosphere, Analytical Databases, Relational Databases, Microsoft Analysis Server, OLAP, OLTP, Multi-Dimensional Databases, Data Vault Architectures, Star-Schema Architectures, Data Marting. VISUALISATION: Dashboarding, Reporting, Charting, Location Analytics, Statistical Analytics, Data Analytics, Business Analysis, Story Telling, Semantic Layer, Presentation Layer, Collaboration
  • 8. Actian – Fast, Industrialized, Open: Superior Big Data SQL with Industrialized strength. Chart axes: Performance (Slow to Fast) and Enterprise Readiness (Immature to Industrial Strength). Chart labels: Good Enough, Production Ready, Traditional, Operational, Open Source, Vortex
  • 9. Do YOU Have a BIG DATA Role
  • 10. Global Data Snapshot … 7,254,549,796 Total World Population 3,035,749,340 Internet Users 2,078,680,860 Active Social Network Users 6,572,950,124 Mobile Subscribers
  • 11. Traditional systems under pressure. Challenges: • Constrains data to app • Can’t manage new data • Costly to Scale. Traditional data: ERP, CRM, SCM. New data: Clickstream, Geolocation, Web Data, Internet of Things, Docs, emails, Server logs. Data volume: 2.8 Zettabytes (2012), 44 Zettabytes (2020); also shown: 12 Zettabytes. Business Value: Laggards (1) vs Industry Leaders (2)
  • 12. The Digital Floodgates have opened… and will never be turned off again. Volume: Exponential Growth; Variety: New Data Types; Velocity: Time To Value
  • 13. Big Data equals Big Opportunity. Data Source & Type; Untouched Value; New Possibilities; Universal Access; Time To Value. Figures shown: 88% of big data; $15 trillion; 1% of companies
  • 14. Trends for BIG DATA In the Cloud
  • 15. Trends for BIG DATA Personal ETL
  • 16. Trends for BIG DATA NoSQL
  • 17. Trends for BIG DATA Hadoop
  • 18. Trends for BIG DATA Data Lake
  • 19. Trends for BIG DATA Ecosystem
  • 20. Trends for BIG DATA Internet of Things
  • 21. Big Data Trends 1. Big Data in the Cloud 2. Personal ETL 3. NoSQL 4. Hadoop 5. Data Lakes 6. Big Data Ecosystem 7. Internet of Things
  • 22. BIG DATA is STILL just Data. It needs to be translated into Answers.
  • 23. Acquire, Grow & Retain Customers: Who are your best customers and how can you keep them satisfied? Where can you find more customers like them? Big data holds the insights into who your customers are and what motivates them.
  • 24. Optimise Operations & Reduce Fraud: Are your operational processes and systems as efficient as they could be? Could you reduce waste and fraud if you had real-time visibility into your business? Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets.
  • 25. Transform Financial Processes: Do you have real-time access to reliable information about all aspects of your business? Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes? Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
  • 26. Manage Risk: How can you mitigate the financial and operational risks that could devastate your organisation? How can you manage regulatory change and reduce the risk of non-compliance? Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
  • 27. Create New Business Models: Are your competitors making bigger strides in changing your industry or creating new markets than you? Does your organisation’s culture support innovative thinking and exploration? Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
  • 28. Improve IT Economics: Is your existing IT infrastructure able to provide the insights that decision makers need? Are you doing enough to protect your data centre and data from potential criminal activity or fraud? Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
  • 29. Analytics Trends 1. Data Governance 2. Social Intelligence 3. Analytics Organisation-Wide 4. Community Collaboration 5. Integration of Everything 6. Cloud Analytics 7. Conversational Data 8. Journalism Data 9. Mature Mobility 10. Smart Analytics
  • 30. Areas BIG DATA is Helping 1. Operations & Optimising 2. Product Development 3. Customer Experience 4. Understanding and Targeting Customers
  • 31. Performance Examples: Actian is Helping These Companies Achieve Leadership. Digital Marketing: Hyper-segmentation every hour; Banking: Enterprise Risk every 2 minutes; Retail: Enterprise Market Basket Analysis every minute; Defense: Network intrusion models every second; Fraud: Adjustments every nanosecond. Amazon Redshift – Actian Matrix: Cloud-based, Petabyte Scale Data Warehouse
  • 32. The Value of Business Intelligence Organisations competing with Analytics Substantially OUTPERFORM their peers by 220%
  • 34. Actian Vector: Example https://youtu.be/dYTF5ZNioEI Identical 150 Million Transaction Query Comparison between Actian Vector & Oracle DBMS
  • 36. Overview KNIME & Big Data Tom Park
  • 37. Gartner 2016 Magic Quadrant, Advanced Analytics Platforms. Leaders (5): SAS, IBM, KNIME, RapidMiner, Dell. Challengers (2): SAP, Angoss. Visionaries (4): Microsoft, Alteryx, Alpine Data Labs, Predixion. Niche Players (5): FICO, Lavastorm, Megaputer, Prognoz, Accenture
  • 38. Changes from 2015 to 2016: Salford & TIBCO dropped due to not satisfying the visual composition
  • 39. Main Big Data Technologies NO SQL
  • 41. KNIME Big Data Extensions
  • 47. KNIME @ OSR Anand Antony Senior Data Analyst Operations Analytics and Intelligence Office of State Revenue anandjantony@gmail.com Ph. 0414491765
  • 48. OSR: Who are we? • As NSW’s principal revenue agency, OSR administers state taxation and revenue for, and on behalf of, the people of NSW ◦ Payroll tax ◦ Land tax ◦ Duties ◦ Grants such as First Home Benefits
  • 49. Data Analytics Team: Who are we? • Operations Analytics & Intelligence is the analytics wing of the Operations Division in OSR ◦ Three teams – Business Intelligence, Data Analytics and Data Team • Data Analytics team consists of 10 analysts • Supports tax auditors by detecting possible non-compliant clients ◦ Via matching data from various sources and analysing them ◦ 60+ data sources
  • 50. Data Analytics Scenario - Past • Data matching, preparation and analysis ◦ SPSS Clementine, SAS Enterprise Guide • Data mining ◦ Salford Systems • Reporting/Dashboards ◦ Excel • Fuzzy data matching ◦ SSA Name (Informatica)
  • 51. Data Analytics Scenario - Current • Data matching, preparation and analysis ◦ KNIME (around 70% transitioned from Clementine/SAS) • Data mining ◦ Salford Systems ◦ Will be evaluating KNIME • Reporting/Dashboards ◦ Excel • Fuzzy data matching ◦ SSA Name (Informatica)
  • 52. Future: Unified Analytic & Data Management Platform. Internal & External Data Sources. Governance: Data Governance, Data Quality, Data Matching, Metadata Management. Data Lake: MapR Hadoop Distribution; Vortex, MapR. Advanced Data Analytics: Actian/KNIME. Machine Learning: H2O/Spark, Actian/KNIME. Presentation Layer: Datamart, On the fly / Sandpit. Visualisation: Spotfire/Tableau/Graph DBs
  • 53. Why KNIME? • Enrich with coding via coding snippets ◦ Mostly Java Snippet at the moment • Start with canvas programming • Fast and easy learning curve for data scientists • Can tackle almost any analytic task
  • 54. KNIME - Having the best of both worlds! ◦ Canvas programming and coding
  • 55. What do we use KNIME for? • Pretty much for everything! (except reporting and data mining) ◦ Data reading (text files, databases, non-standard formats) ◦ Data merging (potentially fuzzy matching too in future) ◦ Data manipulation ◦ Creating new variables ◦ Data output ◦ Modelling (possibly in future)
  • 56. Key nodes/functionalities ◦ Sorter, Column Reorder, Column Filter, Column Rename ◦ Concatenate, Joiner, Reference Row Filter (anti-join) ◦ Missing Value ◦ Math Formula, String Manipulation, Rule Engine, Java Snippet ◦ GroupBy (aggregate, dedupe) ◦ Value Counter, Pivoting ◦ Looping ◦ Regular expressions/wildcards in various nodes
  • 58. Case study 1 • Officers fill in a questionnaire on the entity audited – one Excel spreadsheet for one entity • Collate all the spreadsheets stored in a location • Massage the data to produce an analysis dataset with one row per entity • Key KNIME nodes/functionalities used ◦ List Files ◦ Table Row to Variable Loop Start, Loop End ◦ Java Snippet
  • 61. Bring data to tabular form: Within this Meta node, there is one Java Snippet for each question in the questionnaire
  • 62. Details of a Java Snippet
  • 63. Result of the Meta Node: To get a single record for a client, just take the last row for a “client block”! (Explained in the next slide)
  • 64. For each “client block” aggregate the variables
  • 66. Case study 2 – Use of Flow variables • Technique ◦ Input metadata rules into a file ◦ Read and convert into flow variables • Example ◦ Reorder variables in a dataset as per the order in the data dictionary ◦ We use the “Flow variables” tab in the Column Reorder node to achieve this
  • 67. Use of flow variables: Use this tab. Do not use this “manual” tab.
  • 68. KNIME wishlist! • Offset function in some nodes, e.g. Rule Engine, Math Formula. An offset function gives the value of a variable in a previous row; e.g. in SPSS Clementine, @OFFSET(var,1) gives the value in the previous row. Note: within a Java Snippet this is readily achieved, since a variable retains its value until it is overwritten. We can therefore first use the value populated from the previous row inside a formula, then update it with the value from the current row for use in the next row.
  • 72. Apache Spark on KNIME Unleash the power of Big Data on Hadoop
  • 73. The Big Data Problem: Data Volume 1. Storage is getting cheaper 2. Data sources are increasing 3. Thus, data is growing faster. But processing it is still a problem. Why? (Diagram label: YARN)
  • 74. The Big Data Problem: Processing. Now, as memory is cheaper.
  • 75. Why Apache Spark? Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics across clustered computers. • Speed • Flexible with programming platform • Generality • Run Everywhere
  • 77. Spark Comparison on Calculation of Average
  • 78. List of Spark Nodes
  • 79. Getting the data in and out of Spark: Data into Spark; Data out of Spark
  • 80. Statistics and Data Manipulation Nodes: Statistics; Data Manipulation
  • 83. KNIME Spark Executor Architecture
  • 84. Currently Supported Hadoop and KNIME Versions. Hadoop Versions: • Hortonworks HDP 2.2 with Spark 1.2.x • Hortonworks HDP 2.3 with Spark 1.3.x • Cloudera CDH 5.3 with Spark 1.2.x • Cloudera CDH 5.4 with Spark 1.3.x. KNIME Versions: • KNIME Analytics Platform 3.1 • KNIME Server 4.2
  • 85. Lots of talking… Let’s view a demo
  • 86. Data Transformed YOUR DATA. CLEARLY. info@DataTransformed.com.au 02 9956 3781
  • 87. Actian Vortex on Hadoop 10 minute Demo http://videos.actian.com/watch/6iEZqvJrEKL2btoqIDImcg Demonstration of Vortex, Dataflow & Vector Comparison between Actian Vortex & Cloudera Impala
  • 88. Actian Vector: Example https://youtu.be/dYTF5ZNioEI Identical 150 Million Transaction Query Comparison between Actian Vector & Oracle DBMS
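Slide 68 notes that a Java Snippet can emulate an offset function because a variable keeps its value from one row to the next. The following is a minimal Python sketch of that pattern, not KNIME code; the function name `previous_row_values` and the sample payment figures are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Optional


def previous_row_values(values: Iterable[float]) -> List[Optional[float]]:
    """For each row, return the value held by the previous row (None for the first row)."""
    previous: Optional[float] = None  # survives between rows, like a retained snippet variable
    result: List[Optional[float]] = []
    for current in values:
        result.append(previous)  # step 1: use the value populated from the previous row
        previous = current       # step 2: overwrite it so the next row can use it
    return result


# Hypothetical column of payments, one value per row.
payments = [100.0, 120.0, 90.0]
offsets = previous_row_values(payments)
# Row-over-row change, skipping the first row, which has no predecessor.
changes = [cur - prev for cur, prev in zip(payments, offsets) if prev is not None]
```

The order of the two steps inside the loop matters: reading `previous` before overwriting it is what makes the value behave like an offset of one row.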

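Case study 2 (slide 66) reorders the variables of a dataset to follow the order given in a data dictionary. As a rough illustration of that idea outside KNIME, the sketch below sorts column names by their position in the dictionary; the function name `reorder_columns`, the column names, and the choice to place unlisted columns at the end are all assumptions, not details from the talk.

```python
from typing import List


def reorder_columns(columns: List[str], dictionary_order: List[str]) -> List[str]:
    """Order columns as listed in the data dictionary; unlisted columns go last."""
    position = {name: index for index, name in enumerate(dictionary_order)}
    # sorted() is stable, so columns missing from the dictionary keep their original relative order.
    return sorted(columns, key=lambda name: position.get(name, len(position)))


# Hypothetical data dictionary order and dataset columns.
dictionary_order = ["client_id", "name", "payroll_tax"]
dataset_columns = ["payroll_tax", "notes", "client_id", "name"]
reordered = reorder_columns(dataset_columns, dictionary_order)
```

Keeping the order in a separate list mirrors the technique on the slide: the rule lives in metadata, so changing the data dictionary changes the output order without editing the workflow itself.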
Editor's notes

  1. The cloud is everywhere, and we will continue to see adoption at extreme volumes. And big data is driving a lot of the cloud's growth: revenues for the top 50 public cloud providers shot up 47% in Q4 of 2013 to $6.2B, according to Technology Business Research. Amazon Redshift and Google BigQuery are growing dramatically. Database players like Teradata are also jumping into the game. Snowflake
  2. It has been suggested that 80% of an analyst’s time is spent on data prep, while only 20% is spent looking for insights. Enter the personal data cleansing tools focused on the analyst. Tools like Trifacta, Alteryx, Paxata and Informatica Rev are making data preparation easier to use, with less technology and infrastructure required to support it. KNIME
  3. Some may think that the jury is still deliberating, but NoSQL is making a mark in the industry. NoSQL was founded to provide scale, flexibility, and the ability to leverage large sets of data faster. Companies like MarkLogic, Cassandra, Couchbase, and MongoDB are bringing new innovation to the SQL database market and are doing quite well with large production implementations in surprising places.
  4. Whether you believe that Hadoop will take over current database architecture, or that there will be a mix of Hadoop and other styles of databases, one thing is clear: Hadoop is now a part of the big data architecture in many companies. The legacy data storage vendors have incorporated Hadoop into their architecture in one way or another. Some classical database providers, like Teradata, SAP, and HP, have embraced the market-leading Hadoop players. Others, like IBM, have built their own flavor of Hadoop. Spark and Impala continue to mature, putting more pressure on the traditional stack. In any case, Hadoop looks like it is here to stay and is synonymous with big data architectures.
  5. The concept of a big data lake, a large body of data that exists in a natural or unrefined state, is in its early stages. This idea answers some fundamental questions around how to effectively store, manage and use the massive amounts of incoming data. The cutting-edge companies Google and Facebook have developed useful ways to leverage the data lake, but should be considered early adopters. As it is, the data lake is still a nascent concept, and we should expect to see advances in managing and securing the big data lake this year. And as Gartner points out, the data lake requires a new kind of management to be effective.
  6. When new ways of doing things come about, they create a new ecosystem around them. The same holds true for big data. We have new ways to store data, clean data, add content to data, bring in social media, analyze machine data, do deep analysis on data and, of course, visualize data. Over the next year we will see some surprising changes in the current ecosystem. Specifically, we will see MPP (Massively Parallel Processing) databases play a different and less prominent role. Actian Matrix (better known as Amazon Redshift)
  7. Your Ford Fusion sends 250GB of data back to Ford, who in turn lets you know that something is wrong with your car. It sounds like fantasy, but hardware and semiconductor companies are betting on it. Ford, GE, and Rolls-Royce (jet engines) are just a few examples of companies investing in IoT. In 2015, we will see greater use by manufacturers. Some technology companies like Cisco will create solutions around the concept to help manage the massive amounts of data.
  8. The cloud is everywhere, and we will continue to see adoption at extreme volumes. And big data is driving a lot of clouds growth: Revenues for the top 50 public cloud providers shot up 47% in Q4 of 2013 to $6.2B according to Technology Business Research. Amazon Redshift and Google Big Query are growing dramatically. Database players like Teradata are also jumping in the game. It has been suggested that 80% of an analyst’s time is spent on data prep, while only 20% is spent looking for insights. Enter the personal data cleansing tools focused on the analyst. Tools like Trifacta, Alteryx, Paxata and Informatica Rev are making data preparation easier to use with less technology and infrastructure required to support it. Some may think that the jury is still deliberating, but NoSQL is making a mark in the industry. NoSQL was founded to provide scale, flexibility, and the ability to leverage large sets of data faster. Companies like MarkLogic, Casandra, Couchbase, and MongoDB are bringing new innovation to the SQL database market and are doing quite well with large production implementations in surprising places. Whether you are of the belief that Hadoop will take over current database architecture, or there will be a mix of Hadoop and other styles of databases, one thing is clear, Hadoop is now a part of the big data architecture in many companies. The legacy data storage vendors have incorporated Hadoop into their architecture in one way or another. Some classical database providers have embraced the market leading Hadoop players like Teradata, SAP, and HP. Others, like IBM, have built their own flavor of Hadoop. Spark and Impala continue to mature, putting more pressure on the traditional stack. In any case, Hadoop looks like it is here to stay and is synonymous with big data architectures. The concept of a big data lake, a large body of data that exists in a natural or unrefined state, is in early stages. 
This idea answers some fundamental questions around how to effectively store, manage and use the massive amounts of incoming data. The cutting edge companies Google and Facebook have developed useful ways to leverage the data lake, but should be considered early adopters. As it is, the data lake is still in a nascent concept, and we should expect to see advances in managing and securing the big data lake this year. And as Gartner points out, the data lake requires a new kind of management to be effective. When new ways of doing things come about, it creates a new ecosystem around it. The same holds true for big data. We have new ways to store data, clean data, add content to data, bring in social media, analyze machine data, do deep analysis on data and, of course, visualize data. Over the next year we will see some surprising changes in the current ecosystem. Specifically, we will see MPP (Massively Parallel Processing) databases play a different and less prominent role. Your Ford Fusion sends 250GB of data back to Ford, who in turn lets you know that something is wrong with your car. Sounds like fantasy, but hardware and semi-conductor companies are betting on it. Companies like Ford, GE, and Rolls Royce jet engines are just a few examples of companies investing in IoT. In 2015, we will see a greater use from manufacturers. Some technology companies like Cisco will create solutions around the concept to help manage the massive amounts of data.
  10. Acquire, grow and retain customers: Who are your best customers and how can you keep them satisfied?  Where can you find more customers like them?  Big data holds the insights into who your customers are and what motivates them. Analysing big data can help you discover ways to improve customer interactions, add value and build relationships that last.
  11. Optimise operations and reduce fraud: Are your operational processes and systems as efficient as they could be?  Could you reduce waste and fraud if you had real-time visibility into your business?  Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets. Gain the insights you need to reduce costs, increase efficiencies and productivity, and limit threats.
  12. Transform financial processes: Do you have real-time access to reliable information about all aspects of your business?  Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes?  Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
  13. Manage risk: How can you mitigate the financial and operational risks that could devastate your organisation?  How can you manage regulatory change and reduce the risk of non-compliance?  Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
  14. Create new business models: Are your competitors making bigger strides in changing your industry or creating new markets than you?  Does your organisation’s culture support innovative thinking and exploration?  Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
  15. Improve IT economics: Is your existing IT infrastructure able to provide the insights that decision makers need?  Are you doing enough to protect your data centre and data from potential criminal activity or fraud?  Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
  16. Just as the business intelligence landscape has transformed to self-service data, so too must governance transform. Simple approaches like locking down all enterprise data won’t work any longer, nor will the approach of doing away with any process at all. Organizations will begin to investigate what governance means in a world of self-service analytics. In 2014 we saw organizations begin to analyze social data in earnest. In 2015, the leading edge will start to take advantage of their capabilities. Tracking conversations at scale via social will let companies find out when a topic is starting to trend and what their customers are talking about. Social analytics will open the door to responsive product optimization. Today’s data analyst may be an operations manager, a supply chain executive or even a salesperson. New, easier-to-use technologies that provide browser-based analytics let people answer ad-hoc business questions. Companies that recognize this as a strategic advantage will begin to support everyday analysts with data, tools and training to help them do what they’re doing. The consumerization of IT is no longer theoretical; it’s a fact. People use products that they enjoy using, and analytics software is no different. Companies whose products inspire and empower are seeing their communities flourish. And prospective customers will also look to the health of product communities as important proof points in crowded marketplaces. The last 10 years have seen a massive amount of innovation across the data space, resulting in mixed environments for everything from data storage to analytics to business applications. We won’t see a return to the age of monolithic systems. However, organizations are losing patience with multiple logins and clunky processes to move and manage data. Rapid integration leveraging simple interfaces is going to become the standard. In 2015, we’ll start to see the first major use of cloud analytics for on-premises data.
Until now, cloud analytics have been primarily used for data in cloud apps. In 2015 companies will begin to choose the cloud when it makes sense for their business case, not only because the data is there. We are starting to see an age when data is interactive enough that it can become the backbone of a conversation. Now that people have speed-of-thought analytical tools, they can quickly analyze data, mash it up with other data and redesign it to create a new perspective. And as a result of these data conversations, organizations will get more insight from their data. The arrival on the scene of Vox and the continued ascendance of sites like fivethirtyeight.com will force more newsrooms to integrate data analytics into their online presence. This trend will have a spillover effect from the public sphere to organizations, encouraging companies that are lagging in analytics to get with the times. Workers are spending less time at their desks. But that doesn’t mean they should be less informed by data; in fact they have a greater need for data than ever before. Mobile solutions for analytics emerged years ago and are finally reaching a level of maturity at which mobile workers really can do light analysis from the road. And the emphasis on mobile has forced vendors to offer more natural and intuitive interfaces across the board. Advances in graphical, intuitive modeling will mean that business users can begin to use predictive analytics without the need for extensive expert consultation or scripting. As self-service analytics becomes more mainstream, tasks such as forecasting and prediction will become more common, and a lot less painful.
  17. Since graduating with a Master of Statistics, analytics has been a core theme in my 20+ year career. Using data to solve problems is a passion that drives me to seek out and apply technology innovatively. In the new digital world, I aim to be a champion and an evangelist for the principle of "Evidence-based Decision Making". Currently Director, Risk Analytics, Deloitte Australia.
  18. A Data Analyst with 15 years of experience (Taxation - 10 years, Data-driven marketing - 5 years). Experience across a spectrum of data analysis tasks (exploratory analysis, developing risk/predictive variables, predictive modelling, reporting). Well-developed programming skills in a range of data analysis software such as KNIME, SAS and SPSS Clementine (IBM SPSS Modeler). He’s a highly regarded Data Analyst at OSR.