Kognitio Webinar: Showcasing the Data Scientist Lab functionality with External Scripting and how it can be used to run ‘R’ in an MPP environment
April 18, 8:00am PST, 11:00am EST, 4pm BST, 5pm CEST
Duration: 45 minutes plus Q&A
Dr. Sharon Kirkham, Principal, Kognitio Analytics Center of Excellence, showcases the power of external scripting with a demonstration of the ‘R’ statistical language, running in the massively parallel Kognitio Analytical Platform environment.
2. Today’s Web Seminar
Host: Michael Hiskey, Vice President, Marketing & Business Development
Keynote Presenter: Dr. Sharon Kirkham, Data Scientist, Kognitio Analytics Center of Excellence
Format & Agenda
• Big Data and Complexity – the need for Data Scientists
Question Break #1
• Data Manipulation – functional demonstration
Question Break #2
• Product forecasting with parallel R – practical demonstration
Question Break #3
4. The Data Science Lab
Data Scientists & Staff
Mathematical Algorithms
MPP Computing
BIG DATA
5. What do business users want to do?
Find patterns
Track lifetime journeys
Predict behavior
Forecast scenarios
Allocate scarce resources
Model value
Characterize groups
Visualize discovery
Respond, trigger, manage, promote
6. I’m a data scientist! Are you?
Entry level skills and development - aspiration
Machine Learning
Graduates
7. I’m a data scientist! Are you?
Business Expertise
Machine Learning
Interpretation skills
= Insight
Graduates
Need guidance
Data Scientist
9. Supporting the data scientist
Typical process – direct data preparation
Database
SQL processing
10. Supporting the data scientist
Typical process – produces analytical data set
Database
SQL processing
Data Set
11. Supporting the data scientist
Typical process – run analytics from server
Database
SQL processing
Data Set
???
12. Supporting the data scientist
Typical process – data samples often used
Database
SQL processing
Data Set
???
Data Samples
Process run iteratively = slow
13. Supporting the data scientist
Typical process – modelling process is honed
Database
SQL processing
Data Set
???
Data Samples
Process run iteratively = slow
14. Supporting the data scientist
Typical process – model is complete
Database
Data Set
???
15. Supporting the data scientist
Typical process – score full data (Ouch!)
Database
Data Set
???
Full data to score
16. Supporting the data scientist
Push processes to DB – still produce analytical data set
Analytical Platform
SQL processing
Data Set
17. Supporting the data scientist
Push processes to DB – translate specific processes
Analytical Platform
SQL processing
Data Set
???
Translation
18. Supporting the data scientist
Push processes to DB – results passed back
Analytical Platform
SQL processing
Data Set
???
Translation
Result Data Set
19. Supporting the data scientist
Push processes to DB – modelling process is honed
Analytical Platform
SQL processing
Data Set
???
Translation
Result Data Set
20. Supporting the data scientist
Push processes to DB – model scoring done in DB
Analytical Platform
SQL processing
Data Set
???
Result Data Set
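One way to picture the "translate, then score in the DB" idea on these slides is to render a fitted model as a SQL expression so that scoring runs where the data lives. The sketch below is illustrative only: the webinar does not say which model types or translation tools are used, and the function, table and column names (`linear_model_to_sql`, `analytical_data_set`, `customer_id`, `recency`, `spend`) are hypothetical.

```python
def linear_model_to_sql(intercept, coefficients, table, id_column):
    """Render a fitted linear model as a SQL scoring query.

    `coefficients` maps column names to fitted weights. All names are
    hypothetical; a linear model is assumed purely for illustration.
    """
    terms = [f"{weight!r} * {column}" for column, weight in coefficients.items()]
    expression = " + ".join([repr(intercept)] + terms)
    return f"SELECT {id_column}, {expression} AS score FROM {table}"


sql = linear_model_to_sql(
    0.5,
    {"recency": -0.02, "spend": 0.001},
    table="analytical_data_set",
    id_column="customer_id",
)
print(sql)
```

The generated statement can then be run against the full data set inside the database, rather than pulling the data out to a separate analytics server.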
21. Supporting the data scientist
But we always want more! Complex data structure
Analytical Platform
Data Set
???
Result Data Set
SQL cannot handle data complexity.
How do I integrate into my model?
22. Supporting the data scientist
But we always want more! non-standard processes
Database
SQL processing
Data Set
???
Data Samples
Back where we started
23. Supporting the data scientist
Bring Analytics to data – still produce analytical data set
SQL processing
SQL processing
24. Supporting the data scientist
Bring Analytics to data – can use other code for data prep
SQL processing
Kognitio scripting
Code executed using MPP
Data held in memory. Fast access to CPUs
25. Supporting the data scientist
Bring Analytics to data – run analytics natively in Kognitio
SQL processing
Kognitio scripting
Code executed using MPP
Data held in memory. Fast access to CPUs
One platform: flexible working from data prep through analytical process
26. New! Kognitio version 8:
Enabling and extending the Analytical Platform
External Tables
External Functions
Not Only SQL
Hadoop Connector
Other Connectors
Kognitio Storage as an External Table
General Availability: June 2013
27. External Scripting – Data Transformation
Examples where SQL is typically complex and extensive (e.g. using Perl):
Assembly: converting structured data into XML format, e.g. furnishing personalised content
Disassembly: converting XML into structured data
Parsing: extracting complex information from URLs; pulling words from large text fields, e.g. for sentiment analysis
Transposition: converting row-based information into columns for data mining, e.g. supporting classification or segmentation
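To make two of these transformation types concrete, here is a minimal sketch of parsing (pulling fields out of a URL) and transposition (turning rows into columns). The slide mentions Perl; Python is used here only for illustration, and the function names, sample URL and sample rows are hypothetical rather than taken from the webinar.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def parse_url(url):
    """Parsing: split a URL into host, path and its query parameters."""
    parts = urlparse(url)
    # parse_qs returns a list per key; keep the first value of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "path": parts.path, **query}


def transpose(rows):
    """Transposition: pivot (entity, attribute, value) rows into one record per entity."""
    wide = defaultdict(dict)
    for entity, attribute, value in rows:
        wide[entity][attribute] = value
    return dict(wide)


parsed = parse_url("http://example.com/shop/item?sku=123&ref=email")
pivoted = transpose([("c1", "age", 34), ("c1", "region", "north"), ("c2", "age", 51)])
print(parsed)
print(pivoted)
```

Both operations are a few lines in a scripting language, whereas the equivalent SQL tends to need string functions or one CASE expression per output column.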
30. R running in an MPP environment
Persistence Layer
Analytical Platform Layer
31. R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Kognitio platform specification: 16 servers, 128 cores, 462GB Kognitio RAM
This is old kit
2.9 billion rows of EPOS data
184-day time series for 12K products
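The step from raw EPOS rows to one daily time series per product can be sketched as a simple group-and-sum. This is an illustration under stated assumptions, not the method used in the demonstration: the row layout `(product_id, sale_date, quantity)`, the function name `collate` and the sample data are all hypothetical, and in the webinar this collation runs inside the platform rather than in client-side Python.

```python
from collections import defaultdict
from datetime import date


def collate(epos_rows, start, days):
    """Sum quantities per product per day into fixed-length daily series."""
    series = defaultdict(lambda: [0] * days)
    for product_id, sale_date, quantity in epos_rows:
        offset = (sale_date - start).days
        if 0 <= offset < days:  # ignore rows outside the window
            series[product_id][offset] += quantity
    return dict(series)


rows = [
    ("p1", date(2013, 1, 1), 2),
    ("p1", date(2013, 1, 1), 1),
    ("p1", date(2013, 1, 3), 4),
    ("p2", date(2013, 1, 2), 5),
]
daily = collate(rows, start=date(2013, 1, 1), days=3)
print(daily)
```

With `days=184` and roughly 12K products, the result would be about 12,000 series of 184 points each, which is the shape of input the forecasting step works on.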
32. R running in an MPP environment
Persistence Layer
Analytical Platform Layer
33. R running in an MPP environment
Persistence Layer
Analytical Platform Layer
1 output table in RAM
128 parallel instances of R
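The pattern on this slide, many independent script instances each handling its own share of products and all writing to a single output table, can be sketched as follows. This is a rough analogy, not the Kognitio implementation: a Python thread pool stands in for the parallel R instances, the "forecast" is a naive mean of the last seven points (the webinar does not state the forecasting method), and `forecast`, `forecast_all` and the sample series are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def forecast(item):
    """Naive per-product forecast: mean of the last 7 points (an assumption)."""
    product_id, series = item  # series is assumed to be non-empty
    window = series[-7:]
    return product_id, sum(window) / len(window)


def forecast_all(series_by_product, workers=4):
    """Fan products out to workers and gather results into one output table."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(forecast, series_by_product.items())
        return dict(results)


series_by_product = {"p1": [3, 0, 4, 5], "p2": [0, 5, 0, 1]}
output_table = forecast_all(series_by_product)
print(output_table)
```

Because each product's series is forecast independently, the work splits cleanly across workers, which is what lets the number of parallel instances scale with the number of cores.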
34. R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Application & Client Layer
Excel
All BI Tools
35. R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Application & Client Layer
Excel
All BI Tools
13 views of different analytical output
36. R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Application & Client Layer
Excel
All BI Tools
Result set contained # rows
12K forecasts and stats calculated in # seconds
2.9B EPOS items collated into time series in # seconds
38. Thank you for your participation today
• More information on today’s topic can be found at:
• kognitio.com/mpp_r
• kognitio.com/product-forecasting
• FREE TO USE – perpetual license
– www.kognitio.com/free
– Contact us for the pre‐release version 8
• Analyst White Papers
– EMA Comparative Analysis
– In‐memory database platforms
– www.kognitio.com/emacompinmem
• Today’s slides (and more): www.slideshare.net/Kognitio