Copyright © 2016, Oracle and/or its affiliates. All rights reserved.
Exploratory Analysis in the Data Lab
Team-Sport or for Nerds only?
Harald Erb
Oracle Business Analytics & Big Data
DOAG 2016 Konferenz, Nürnberg
Contact
• Harald Erb
• Principal Sales Consultant
• Information Architect
• Contact: +49 (0)6103 397-403
• harald.erb@oracle.com
Characteristics of Digital Business Leaders
They ‘Reframe’ Challenges
Looking at them from new perspectives and multiple angles
They Sprint
They work at pace - researching, testing and evaluating current ideas while generating new ones
They Appreciate That Failure Can Be Good
and are not afraid of new ideas
They Convert Data Into Value
They invest heavily in analyzing their own data and data from external sources to establish patterns and unnoticed opportunities
Synergizing Skills
Perf.
Mgmt.
Knowledge
Discovery
Dynamic Dashboards
and Reports
Volume and Fixed
Reporting
Knowledge Driven Business
Process
Executive:
Decisions affecting
strategy and direction
Business Analysts:
Day-to-Day performance
of a business unit
Information Consumer:
Reporting on
individual transactions
Automated Process:
Decisions affecting
execution of
individual transactions
Insight
Data Scientists:
Information analysis to
meet strategic goals
BICC
Analytical Competence Center (ACC)
» Separate group reporting to CxO, not part of a Business Intelligence Competence Center (BICC)
» Mission: broadening the adoption of Analytics across the organization
» Skilled resource pool of Data Scientists, Statisticians and Business Experts
» Data-driven approach (not development-driven) with privileged access to enterprise data sources
» Group will be assigned to projects for a limited time
ACC
Enabling Data-driven Decisions
Knowledge Discovery
1. Identify (business) question: Become clear about all aspects of the decision to be taken or the problem to be solved. Try to identify alternatives to your perception.
2. Verify earlier findings: Find out who has investigated this or a similar problem in the past and the approach that was taken.
3. Design of a solution model: Formulate a detailed hypothesis about how specific variables might influence the result of the chosen model.
4. Gather all necessary data: Gather all available information about the variables of your hypothesis. A dataset might address your business question directly, or its relevance might need to be derived.
5. Analyse the data: Apply a statistical model and evaluate the correctness of the approach. Repeat this procedure until the right method has been identified.
6. Present & implement results: Frame the results obtained in a comprehensible story. This kind of presentation is intended to motivate decision makers and relevant stakeholders to take action.
Non-Analysts & Executives should take a closer look at steps 1 and 6 of the analysis process if they plan to make use of statistical analysis.
Adapted from Thomas H. Davenport, Harvard Business Manager 2013
Data Projects: Process
Adapted from Hugenberg 2011, p. 168
Week
Task
Create Work Plan
Hypothesis
Business
question
Analysis
Source
Create Analysis Plan
Structure Problem
What?
How?
Hypothesis
Yes
No
?
Why?
Define Problem
Fundamental business question to
be solved:
Problem area:
Root of
problem:
Decision
maker:
Decision
criteria:
Boundaries of
problem
handling:
Solution
limitations:
• Necessary information?
• Available information? Which quality?
• Data owner?
• Available data sets?
• Business problem?
• What is at issue?
• What needs to be analyzed?
• Precise goal definition
• Delimitations
• Useful data / structure?
• Hypothesis definition
• Verify correlations
• Descriptive analysis
• Data preparation
• Select
Data Lab: Key Requirements
Based on
Raw Data
Full Access to Data
Sources
(Select only)
Complete
Sandbox
Environment
Agile
Experimentation,
Fail Fast
Data Lab Scenario
Sandbox
Data Management
Stages of Data Transformation
Refinement of Raw Data
Signal Data Information Knowledge Wisdom
L0 - Ingestion L1 - Cleansed L2 - Normalised
Accounts
Parties
Account
Parties
Party
Addresses
Party
Contacts
Party IDs
Party
Events
Party
Ratings
Account
Limits
Party
History
Collaterals
Account
Collaterals
Party
Collaterals
Account
Balances
Account
Relations
L3 – Presented
Customer
Dimension
Account
Dimension
Currency
Dimension
Product
Dimension
Organization
Dimension
Calendar
Dimension
Account Daily
Facts
Account
Transactions
Transaction
Types
Channel
Dimension
CoA
Dimension
Company
Dimension
• Format/Domain checks
• Completeness checks
• Duplicates detection
• Not null validations
• Enrichment
• Record level cleansing and business rules
• Referential integrity
• Context based business rules and quality checks
• Aggregate level checks
• Derived and enriched data for Self-service Business Intelligence
• File validation
• Row completeness
• Raw Data Stores for Data Science
Know nothing Know what Know how Know why
Data Warehouse
Data Lake
Source Systems
Addressing a key requirement for Data Labs
Data Management: Architecture (Logical View)
Line of Governance
Data Lake
Data
Processing
Data
Enrichment
Raw Data
Sets
Curated &
Transformed
Data Sets
Data
Aggregation
Data Lab
Sandboxes
Data Catalog Data Discovery
Transformations
Prototyping
Analytic Tools
Enterprise
Information
Store
Operational
Data Store
Data Federation &
Virtualization Layer
Common SQL Access to ALL Data
Orchestration, Scheduling & Monitoring
Metadata Management
Data
Ingestion
Batch
Integration
Real-Time
Integration
Data
Streaming
Data
Wrangling
Reporting /
Business
Intelligence
Data Driven
Applications
Advanced
Analytics
Non-structured
Sources
Logs
Social
Media
External
Data
Interactions
Structured Data
Master Data
Applications
Channels
Data Stores
Adhoc Files
or Relational
Data Sets
Data Management: Oracle Platform
Non-structured
Sources
Logs
Social
Media
External
Data
Interactions
Structured Data
Master Data
Applications
Channels
Data Stores
Oracle Software
Cloudera CDH 5.7+ /
Apache Software
Oracle Platform
Oracle Exadata
Oracle Big Data Appliance
Oracle Exalytics
Oracle x86 Servers
Orchestration, Scheduling & Monitoring
Metadata Management
Reporting /
Business
Intelligence
Data Driven
Applications
Advanced
Analytics
Data Management: Functional Areas
Non-structured
Sources
Logs
Social
Media
External
Data
Interactions
Structured Data
Master Data
Applications
Channels
Data Stores
Oracle Software
Cloudera CDH 5.7+ /
Apache Software
Oracle Platform
Orchestration, Scheduling & Monitoring
Metadata Management
Oracle Exadata
Oracle Big Data Appliance
Data
Ingestion
Data Store Data Discovery &
Analyze
Unified Data
Services
Process
Online
Lifecycle/Governance
Data Warehouse
Reporting /
Business
Intelligence
Data Driven
Applications
Advanced
Analytics
Data Management: Data Discovery & Analytics
Reporting /
Business
Intelligence
Data Driven
Applications
Advanced
Analytics
Non-structured
Sources
Logs
Social
Media
External
Data
Interactions
Structured Data
Master Data
Applications
Channels
Data Stores
Oracle Software
Cloudera CDH 5.7+ /
Apache Software
Oracle Platform
Oracle Exadata
Oracle Big Data Appliance
Oracle
GoldenGate
for Big Data
Flume
& Kafka
Oracle Data
Integrator
Oracle
Stream
Analytics
Sqoop
Oracle
NoSQL DB
Kudu
(Relational)
Filesystem
(HDFS)
HBase
(NoSQL)
Oracle Data Integrator
Batch
(Map Reduce, Hive, Pig, Spark)
Stream (Spark)
HBase (NoSQL)
WebHDFS,
Fluentd,
Storm, Tika
....
Oracle Database
Oracle SQL
Database Security
(Roles, Views, VPD, …)
Oracle Advanced Analytics
Oracle Advanced Security
In-Memory
Data
Ingestion
Data Store
Process
Online
Data Warehouse
Data Discovery &
Analyze
Security
(Sentry+RecordService)
Resource Management
(YARN)
Unified Data
Services
Search (Solr)
SQL
(Impala)
Model
(Spark ML)
Big Data Spatial
& Graph
Adv. Analytics
for Hadoop
Big Data
Discovery
Oracle Big Data SQL
Oracle Big Data
Connectors
Cloudera Navigator
Lifecycle/Governance
Oracle Enterprise Metadata Management (OEMM)
Oracle Data Factory Engine | Oracle Data Integrator | Oracle Enterprise Data Quality
Data Lab Scenario
Exploratory Analysis
Oracle Big Data Discovery
Activities
Data Lab
Download from: http://www.the-modeling-agency.com/crisp-dm.pdf
Generic tasks (bold) and
outputs (italic)
CRISP-DM reference model
Oracle Big Data Discovery
Know nothing Know what Know how Know why
Signal Data Information Knowledge Wisdom
(Operational) Business Intelligence (100…1000+ Users)
Data Warehouse
Data Lake
L0 - Ingestion L1 - Cleansed L2 - Normalised
Accounts
Parties
Account
Parties
Party
Addresses
Party
Contacts
Party IDs
Party
Events
Party
Ratings
Account
Limits
Party
History
Collaterals
Account
Collaterals
Party
Collaterals
Account
Balances
Account
Relations
L3 – Presented
Customer
Dimension
Account
Dimension
Currency
Dimension
Product
Dimension
Organization
Dimension
Calendar
Dimension
Account Daily
Facts
Account
Transactions
Transaction
Types
Channel
Dimension
CoA
Dimension
Company
Dimension
Source Systems
Oracle BI
Platform
Common
Enterprise
Information
Model
Oracle BI Data
Visualization
Oracle BI Dashboards,
(Ad-hoc) Reports, …
Data Projects (1…20+ Users)
Oracle Data
Visualization Desktop
Oracle Big Data Discovery
Team Sport: One tool for Business Analysts and Data Scientists
Oracle Big Data Discovery
DWH /
OLTP
Databases
Database
Administrator
(Enterprise IT)
Hadoop
Data
Integration
Specialist
(Enterprise IT)
Data
Engineer
Data
Science
Discovery
Output
Business
Analyst
New KPI, Report
Requirement
Data
Scientist
New Data Set
(cleaned / enriched)
Members of
the same
Data Project
Analysis Scenario 1: Prototype Testing
1. A flexible environment to exploit all available data for prototype testing discovery
2. Can driver comments really add value to our prototype testing discovery?
3. What is the relationship between errors?
Telemetry
3
1
2 Errors
Driver Comments
Analysis &
Dashboarding
Discovery
Lab
1.2 Billion rows
at 100Hz
Data Platform
FactoryStorage
Analysis Scenario 2: Investigate Car Complaints
Mission
Help the Quality Team to trace back warranty claims and
support issues to reduce warranty cost and minimize supplier risk,
in order to improve product quality and customer satisfaction.
Available Data
hadoop fs -cat /user/oracle/warranty/claims_full.txt | less
Internal Data
(Warranty Claims)
Additional Data
(e.g. demographics)
Data Processing Workflow
Supplying Oracle Big Data Discovery with data
File Upload
BDD Studio
Big Data Discovery
Data Proc. Client
New Data Set in
BDD Data Catalog
Data Loading – Data Ingest Overview
1. Data ingest is triggered by data upload, the command line interface, or the Hive Table Detector
2. Records are read and sampled into Spark
3. Data profiling occurs to determine schema, search configuration and which enrichments apply
4. Auto enrichments are performed
5. Data is ingested into Big Data Discovery
About the Sampling technique
• BDD leverages a Simple Random Sampling algorithm
– Each individual is chosen randomly and entirely by chance, with the same probability of being chosen at any stage during the sampling process. Each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals
• Sampling is dependable
– Accuracy depends on the size of the sample, not the size of the source
• A 1M random sample provides more than 99% confidence that the answer is within 0.2% of the value shown, no matter how big the source dataset is (1B/1T/1Q+).
• Sampling makes interactivity cheap
– Will you pay 10, 100, 1000x the cost to get the last <<1% of the confidence?
• Maybe sometimes, but not in discovery and not for every dataset
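The accuracy claim on the sampling slide can be checked with the textbook normal-approximation margin of error for a proportion. The sketch below is plain Python for illustration, not BDD's implementation; the function name `margin_of_error` and the worst-case assumption p = 0.5 are choices made here, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size: int, confidence: float = 0.99, p: float = 0.5) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion.

    p = 0.5 is the worst case (largest variance). The size of the source data
    set does not appear in the formula at all.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / sample_size)


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(f"n={n:>9,}: +/- {margin_of_error(n) * 100:.3f} percentage points")
```

Under these assumptions a 1M-row sample gives a half-width of roughly 0.13 percentage points at 99% confidence, which is inside the 0.2% quoted on the slide.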
Data Profiling & Enrichments
• Profiling is a process that determines the characteristics (columns) of each source Hive table discovered by Big Data Discovery during data processing.
– Attribute type determination (discovery)
• Includes strings to dates, geocodes, long or boolean
– Attribute value distributions
– Determines attribute searchability
– Provides “hints” to Studio as to what content (components) should be on the default Project page
• Enrichments are derived from a data set's additional information such as terms, locations, the language used, sentiment, and views. Big Data Discovery determines which enrichments are useful for each discovered data set and automatically runs them on samples of the data. As a result, additional derived metadata (columns) are added to the data set, such as geographic data, a suggestion of the detected language, or positive or negative sentiment.
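As a rough illustration of attribute type determination, the sketch below guesses a column type from sampled string values. The rules, the type names and the function `infer_type` are assumptions made for this example (for instance, only ISO `YYYY-MM-DD` dates are recognised); the slides do not describe how BDD actually decides.

```python
from datetime import datetime


def infer_type(values):
    """Guess a column type from sampled string values (illustrative rules only)."""
    non_empty = [v.strip() for v in values if v is not None and v.strip()]
    if not non_empty:
        return "string"

    def all_match(parser):
        # A type is accepted only if every sampled value parses.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    def parse_bool(v):
        if v.lower() not in ("true", "false"):
            raise ValueError(v)

    def parse_geocode(v):
        # Expects "latitude longitude", as in the geocode values shown in the deck.
        lat, lon = (float(part) for part in v.split())
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(v)

    # Checked from most specific to least specific.
    checks = [
        ("boolean", parse_bool),
        ("long", int),
        ("double", float),
        ("datetime", lambda v: datetime.strptime(v, "%Y-%m-%d")),
        ("geocode", parse_geocode),
    ]
    for name, parser in checks:
        if all_match(parser):
            return name
    return "string"


if __name__ == "__main__":
    print(infer_type(["2016-11-15", "2016-11-16"]))
    print(infer_type(["41.85003 -87.65005"]))
```

Because every sampled value has to parse, a single stray value makes the column fall back to `string`. That is a deliberate simplification in this sketch.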
Find
• A rich, interactive catalog of all data in Hadoop
• Familiar search and guided navigation for ease of use
• Data set summaries, user annotation and recommendations
• Personal and enterprise data upload to Hadoop via self-service
Like shopping online for your data
Find – Data Sets Tab
Navigate
• Project or Dataset – by Author and Tags
• Contains – datetime or Geo
• Number of records or attributes
• Recently Viewed, Most Popular, Newly Added
Data Quick Look
• Data Set Info
– Tags, Views, Last Updated
– Project, used, created by
• Actions
– Explore, Add to project, Edit Tags, Delete
• Related Data Sets by data source
– “Often used with these data sets”
• Preview
– First 15 rows, all columns
Search
• Keyword
• Data Sets
• Projects
• Data Set Metadata
• Project Metadata
• Recently Viewed, Most Popular, Newly Added
Find (Catalog) – Data Set Quick Look
• Data Set Info
– Tag
• Actions
– Explore
– Add to project
– Edit Tags
– Delete
• Summary
– Views
– Last Updated
Data Set Info
Quick Look
Find (Catalog) – Data Set Quick Look
• Used in Projects
– Project name
– Data Sets used
– Created by
Data Set
Used in Project
Quick Look
Find (Catalog) – Navigation & Search
• Searches
– Keyword
– Data Sets
– Projects
– Data Set Metadata
– Project Metadata
– Attribute Metadata
Search Everything
Like shopping online for your data
Find – Projects Tab
Projects Tab
• Search and navigate
• Project Categories
– Recently Viewed
– Most Popular
– View all
Projects Quick Look
• Add Tags, attributes, data sets, pages
• Open Project, Edit Tags, Delete
• Summary
Find (Catalog) – Projects Tab
• Searchable
• Navigable
• Project Categories
– Recently Viewed
– Most Popular
– View all
Search Everything
Discover Analytical Projects
Navigate by Metadata
Find (Catalog) – Project Quick Look
• Add Tags
• Attributes
• Data Sets
• Pages
• Actions
– Open Project
– Edit Tags
– Delete
• Summary
Project
Quick Look
Explore
• Visualize all attributes by type
• Sort attributes by information potential
• Assess attribute statistics, data quality and outliers
• Use scratch pad to uncover correlations between attributes
Quick Look by Geocode
• Overview
• Details
• Summary stats
• Refinable
Explore
Scratchpad
• Graphic type changes as additional attributes are added
• Auto-selects best visualization
• Offers next best graphics option(s)
Intuitive, machine-guided data exploration
Search, Navigate, Sort
• Attributes, refinements, keyword
• Sort by:
– Name (alpha)
– Information Potential
– Relationship to an attribute
Explore
• Search:
– Attributes
– Refinements
– Keyword
• Sort order
– Name
– Information Potential
– Relationship to an attribute
• Navigable
Tiles vs. Tabular view
Navigation menu
Search menu
Sort Order
Explore – Sort Attributes
• Sort order
– Name (alpha)
– Information Potential
• Based on Entropy
– Relationship to an attribute
• Based on Information Gain
Sort by
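The two sort orders named on this slide rest on standard information-theoretic quantities. The sketch below computes the textbook definitions of Shannon entropy and information gain in plain Python; how BDD scales or combines these scores is not stated in the slides, so the function names and the sample attributes are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(target, attribute):
    """Reduction in the entropy of target once attribute is known."""
    total = len(target)
    groups = {}
    for t, a in zip(target, attribute):
        groups.setdefault(a, []).append(t)
    # Entropy of the target within each attribute value, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - conditional


if __name__ == "__main__":
    # Hypothetical attributes, invented for illustration.
    claim_approved = ["yes", "yes", "no", "no"]
    supplier = ["A", "A", "B", "B"]
    colour = ["red", "blue", "red", "blue"]
    print(entropy(claim_approved))
    print(information_gain(claim_approved, supplier))
    print(information_gain(claim_approved, colour))
```

An attribute with a single constant value has zero entropy, which is why sorting by information potential pushes such columns to the bottom. An attribute that fully determines the selected one has maximal information gain.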
Core Capabilities: Explore – Quick Look – Geocode
• Overview
• Details
• Summary stats
– Refinable
Explore – Scratchpad
• Graphic type changes as additional attributes are added
• Auto-selects best visualization
• Offers next best graphics option(s)
Document all relevant information and insights about the original data sources
Working with Metadata
Available Code Books, Data Set specifications
Document all relevant information and insights directly in BDD Studio...
... and make immediate use of it, e.g. via text search in BDD
• Intuitive, user-driven data wrangling
• Data Shaping
• Extensive library of powerful data transformations and enrichments
• Preview results, undo, commit and replay transforms
• Test on sample data, then apply to full data set in Hadoop
Transform - Overview
Transform – Function Families
Available via a User Guided Interface
Full Guided Navigation
Attribute Transformation
Smart Attribute Filtering
Interactive Transform History
Visual Data Quality Summaries
Data Shaping Tab
Aggregation, Join, etc.
Transform - Overview
• Uses Groovy, an object-oriented programming language for the Java platform.
– Most code written in the Java language is also valid in Groovy
– It is flexible and easy to use.
• Features of the Editor include:
– Syntax highlighting color-codes the different elements in your transformation to indicate their type.
– Autocomplete shows a list of suggestions for the word you're typing when you press Ctrl+Space.
– Error checking uses a built-in static parser that checks your transformation when you preview or save it.
Transform Editor
Available via a User Guided Interface
Enrichments
• Infer language
• Detect sentiment
• Identify key phrases, entities, noun groups
• Whitelist tagger
• Address and IP geotagger
Function Families
• String
• Mathematical
• Conversion
• Conditional
• Datetime
• Data cleansing
• Geotagging
User-driven data wrangling
Commit, Create Data Set
• Commit
– Executes transformations on the sample data available in the app; this updates the original collection in the Dgraph.
• Create a Data Set
– Executes transformations on the entire data set in HDFS, creating a new entry in the BDD Catalog & Hive.
• Rollback
– Attributes created by a transformation, then deleted, will be removed on Commit.
Data Shaping
• Aggregate
• Persistent Join
• Row Filter
Transform - Overview
• Geo Tagging
– Partial Address match
– Address disambiguation (based on population)
– IPv4 support
• Numeric Transformation
– Round, Ceiling, Floor, Absolute Value
– Operators (+ - / *), sqrt, mod, log, ln, min/max, sin/asin/sinh, cos/acos/cosh, tan/atan/tanh
• String
– Split, Trim, Uppercase, Lowercase, Titlecase, Concatenate
• Type Conversion
– Boolean, Datetime, Double, Integer, String, Long, Time, Geocode
• Datetime (accessed via Transformation Editor)
– Date Diff, Date Add, getYear, Month, Day, Hour, Minute
– Truncate Date
• Data Cleansing
– Binning numerics
• Conditional
– Accessed via Transformation Editor
• If / else if statement
– Optionally based on refinement state
– Can be nested
• Drag/drop or double-click functions and attributes onto editor
Transform – Function Families
Transformation Functions
• In addition to transforming data, Big Data Discovery allows users to enrich data sets in numerous ways:
• Term Extraction – Extract relevant phrases and noun groups from unstructured text
• Entity Recognition – Find people, places and organizations mentioned in unstructured text
• Sentiment Analysis – Determine the document and sub-document level sentiment of unstructured text
• Geo Tagging – Generate standardized geographic information based on an unstructured address or IP address
• Language Detection
• Applied via User Guided Interface
Transform – Function Families
Enrichment Functions
Transform – Function Families
Text Enrichments
• I was looking for a laptop mainly to browse and for multimedia purpose and found this inexpensive built-like-a-tank Dell XPS 14. I am very impressed with the build quality, stunning looks and there is absolutely a very nice feel to it. I have many MBPs at home and this one is better in appearance and worked out of the box with minimal set up. I was up and running in 10 minutes or so.
• I found the touch pad not up to par with Apple track pad but with the latest driver update it came pretty close. I uninstalled the McAfee software and went with Norton security as it comes free with Comcast. There is a new BIOS update from February 2014 and also audio driver update that you may want to apply immediately. Do the Wifi-BT update as well.
Key Phrases: Windows Metro, Driver Issues, Usb Port, Light Gaming, Gaming Laptop
Document Sentiment: POSITIVE
Detected Language: en
Extract Key Phrases from Free Text
Text Enrichment
Determine sentiment and extract related phrases
Text Enrichment
Transform – Function Families
Geo-Tagging Enrichments
• Partial Address match
• Address disambiguation (based on population)
• IPv4 support
Input: west loop chicago
city Chicago
country US
county Cook County
geocode 41.85003 -87.65005
latitude 41.85003
longitude -87.65005
population 2695598
state Illinois
Input: 148.87.19.206
city Redwood City
country US
geocode 37.4852 -122.2364
latitude 37.4852
longitude -122.2364
state California
One goal of all the ETL work: Linking Data Sets via Joins
Combining Data Sets
• Join and blend data for deeper perspectives
• Compose project pages via drag and drop
• Use powerful search and guided navigation to ask questions
• See new patterns in rich, interactive data visualizations
Discover
Discovery Dashboards
• Control over layout, filtering, exposed dimensions/metrics
• Formatting controls
– Is Dimension, available aggregations, multi-OR/AND, Include in Available Refinements
– Component level: currency, # of decimals, date format, etc.
• Data set linking: 2 or more data sets within a project, visual attribute linking
• Automatic view creation and widening of data sets
Discover
Drag-and-drop dashboards for fast, easy analysis
Navigation
• Intelligent refine using any combination of data elements
• Color highlighting
• Alternate counts for metrics
• Linked navigation across data sets
Visualizations
• New visualizations
– D3 Chart library
– More intuitive interface
– Seamless integration of new visualization components (Custom Visualization Component extension)
Rich and highly interactive Library of Visualization Portlets
Oracle Big Data Discovery
Data Lab Scenario
Advanced Analytics
BDD Shell
Jupyter Notebook
Data Scientist continues with Machine Learning tasks
Oracle Big Data Discovery
DWH /
OLTP
Databases
Database
Administrator
(Enterprise IT)
Hadoop
Data
Integration
Specialist
(Enterprise IT)
Data
Engineer
Data
Science
Discovery
Output
Business
Analyst
New KPI, Report
Requirement
Data
Scientist
New Data Set
(cleaned / enriched)
Handling of sparse Data / NULL values
Shaping a Data Set for further processing
Aggregation
Shaping a Data Set for further processing
• Roll up low-level data to higher grains
– Production Year
– Vehicle Model Year
– Vehicle Make
• Intuitive UI helps analysts find the right grains
• Execute at full scale using Spark
• Results can be sampled or indexed in full
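Rolling claim-level rows up to a coarser grain can be sketched in a few lines of plain Python. This only illustrates the idea; BDD runs the aggregation in Spark through its UI. The records, field names and the `roll_up` helper are invented for this example.

```python
from collections import defaultdict

# Hypothetical claim-level rows; field names and values are invented for illustration.
CLAIMS = [
    {"make": "Alpha", "model_year": 2014, "claim_cost": 120.0},
    {"make": "Alpha", "model_year": 2014, "claim_cost": 80.0},
    {"make": "Alpha", "model_year": 2015, "claim_cost": 50.0},
    {"make": "Beta", "model_year": 2014, "claim_cost": 200.0},
]


def roll_up(rows, grain, measure):
    """Aggregate rows to the given grain: row count and summed measure per group."""
    groups = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        key = tuple(row[column] for column in grain)
        groups[key]["count"] += 1
        groups[key]["total"] += row[measure]
    return dict(groups)


if __name__ == "__main__":
    # Same data at two grains: (make, model_year) and the coarser (make,).
    for grain in (("make", "model_year"), ("make",)):
        print(grain)
        for key, stats in sorted(roll_up(CLAIMS, grain, "claim_cost").items()):
            print("  ", key, stats)
```

Choosing the grain is the analytical decision. The coarser the grain, the fewer rows remain and the more detail is averaged away.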
Combining multiple Data Sets
Shaping a Data Set for further processing
• Blend huge datasets in BDD
– UI to support experimentation, preview
– Execute at scale with Spark
• Results can be sampled or indexed in full
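Blending two data sets amounts to a join on a shared key. The sketch below shows a left outer join in plain Python, assuming the key is unique on the right-hand side. The data sets, the `zip_code` key and the `left_join` helper are hypothetical, and BDD itself executes joins in Spark.

```python
# Hypothetical data sets; names and values are invented for illustration.
CLAIMS = [
    {"claim_id": 1, "zip_code": "60607", "claim_cost": 120.0},
    {"claim_id": 2, "zip_code": "94065", "claim_cost": 80.0},
    {"claim_id": 3, "zip_code": "00000", "claim_cost": 50.0},
]
DEMOGRAPHICS = [
    {"zip_code": "60607", "median_age": 31},
    {"zip_code": "94065", "median_age": 38},
]


def left_join(left_rows, right_rows, key):
    """Left outer join on key; assumes key is unique in right_rows."""
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        # Left-hand values win if both sides share a column name.
        joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    for row in left_join(CLAIMS, DEMOGRAPHICS, "zip_code"):
        print(row)
```

Rows without a match keep only their left-hand columns. Counting those rows in a preview is a quick check on key quality before running the join at full scale.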
Export the new Data Set as a Hive Table in Hadoop
Shaping a Data Set for further processing
Data Density, Validation of attribute correlations
Analytic features in Oracle Big Data Discovery
Scatter Plot
Scatter Plot Matrix
BDD-Shell interface
Point of Contact with Data Scientists
• BDD Shell is an interactive tool designed to work with BDD without using Studio's front-end
• Provides a way to explore and manipulate the internals of BDD and interact with Hadoop
• Python-based shell
• Exposes all BDD data objects
• Easy-to-use Python Wrappers for BDD APIs and Python Utilities
• Use of Third-party Libraries, e.g., Pandas and NumPy
(Re-)use data from Oracle Big Data Discovery while working with the BDD Shell
Data Analysis with Python
List of Oracle Big Data Discovery Data Sets
Import Spark Machine Learning library MLlib
Converting an Oracle Big Data Discovery Data Set into an Apache Spark DataFrame
Import Package NumPy (Numerical Python)
Leveraging Notebooks for a better user experience
Point of Contact with Data Scientists
• Easiest way to use the BDD-Shell
– Visual appeal, ease of use, collaboration features of an integrated platform
– Power and flexibility of custom code
– Pick up BDD’s datasets and leverage Machine Learning algorithms to infer new insight
www.jupyter.org
(Re-)use data from Oracle Big Data Discovery while working with Jupyter
Data Analysis with Python
Exploratory data analysis in the Data Lab...
...better and more sustainable in an interdisciplinary team!
Supporting agile data experiments and data projects
Oracle’s Unified Big Data Management & Analytics Strategy
In the cloud and on-premises
Experiment: innovation through experimentation with data
• Big Data Discovery
• R on Hadoop
• Spatial and Graph for Hadoop
Aggregate: connect people to the information they need
• Big Data Preparation
• Data Integrator
• GoldenGate
• IoT
Manage: collect, secure and make data available
• Hadoop Platform
• Big Data SQL
• NoSQL Database
• Oracle Database
Analyze & act: transform the workplace with actionable insights
• Data Visualization
• Business Intelligence
• Spatial and Graph
• Advanced Analytics
Call Girls Indiranagar Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...Call Girls Indiranagar Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
Call Girls Indiranagar Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
Ā 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Ā 
Chintamani Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore ...Chintamani Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore ...
Ā 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
Ā 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
Ā 
Call Girls Hsr Layout Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ba...
Ā 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Ā 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
Ā 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Ā 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
Ā 
BDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort Service
BDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort ServiceBDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort Service
BDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort Service
Ā 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
Ā 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
Ā 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Ā 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
Ā 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
Ā 

Exploratory Analysis in the Data Lab - Team-Sport or for Nerds only?

  • 1. Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | Exploratory Analysis in the Data Lab: Team-Sport or for Nerds only? Harald Erb, Oracle Business Analytics & Big Data. DOAG 2016 Konferenz, Nürnberg
  • 2. Harald Erb • Principal Sales Consultant • Information Architect • Contact: +49 (0)6103 397-403 • harald.erb@oracle.com
  • 3. Characteristics of Digital Business Leaders • They 'Reframe' Challenges: looking at them from new perspectives and multiple angles • They Sprint: they work at pace, researching, testing and evaluating current ideas while generating new ones • They Appreciate That Failure Can Be Good and are not afraid of new ideas • They Convert Data Into Value: they invest heavily in analyzing their own data and data from external sources to establish patterns and unnoticed opportunities
  • 4. Synergizing Skills. Diagram labels: Perf. Mgmt., Knowledge Discovery, Dynamic Dashboards and Reports, Volume and Fixed Reporting, Knowledge Driven Business Process, Insight, BICC, ACC. Roles: • Executive: decisions affecting strategy and direction • Business Analysts: day-to-day performance of a business unit • Information Consumer: reporting on individual transactions • Automated Process: decisions affecting execution of an individual transaction • Data Scientists: information analysis to meet strategic goals. Analytical Competence Center (ACC): » Separate group reporting to the CxO, not part of a Business Intelligence Competence Center (BICC) » Mission: broadening the adoption of analytics across the organization » Skilled resource pool of Data Scientists, Statisticians and Business Experts » Data-driven approach (not development-driven) with privileged access to enterprise data sources » Group will be assigned to projects for a limited time
  • 5. Enabling Data-driven Decisions (Knowledge Discovery). 1. Identify the (business) question: become clear about all aspects of the decision to be taken or the problem to be solved, and try to identify alternatives to your perception. 2. Verify earlier findings: find out who has investigated this or a similar problem in the past and which approach was taken. 3. Design a solution model: formulate a detailed hypothesis of how specific variables might influence the result of the chosen model. 4. Gather all necessary data: gather all available information about the variables of your hypothesis. A data set might address your business question directly, or its relevance may need to be derived. 5. Analyse the data: apply a statistical model and evaluate the correctness of the approach. Repeat this procedure until the right method has been identified. 6. Present and implement results: frame the results obtained in a comprehensible story. This kind of presentation is intended to motivate decision makers and relevant stakeholders to take action. Non-analysts and executives should take a closer look at steps 1 and 6 of the analysis process if they plan to make use of statistical analysis. Adapted from Thomas H. Davenport, Harvard Business Manager 2013
  • 6. Data Projects: Process (adapted from Hugenberg 2011, p. 168). Define Problem: fundamental business question to be solved; problem area; root of problem; decision maker; decision criteria; boundaries of problem handling; solution limitations. Structure Problem: What? How? Why?; hypotheses (yes / no / ?). Create Analysis Plan: hypothesis, business question, analysis, source. Create Work Plan: week, task. • Necessary information? • Available information? Which quality? • Data owner? • Available data sets? • Business problem? • What is at issue? • What needs to be analyzed? • Precise goal definition • Delimitations • Useful data / structure? • Hypothesis definition • Verify correlations • Descriptive analysis • Data preparation • Select
  • 7. Data Lab: Key Requirements • Based on raw data • Full access to data sources (select only) • Complete sandbox environment • Agile experimentation, fail fast
  • 8. Data Lab Scenario: Sandbox Data Management
  • 9. Stages of Data Transformation: Refinement of Raw Data (addressing a key requirement for Data Labs). Signal, Data, Information, Knowledge, Wisdom; know nothing, know what, know how, know why. Source Systems; Data Lake; Data Warehouse. L0 - Ingestion. L1 - Cleansed. L2 - Normalised: Accounts, Parties, Account Parties, Party Addresses, Party Contacts, Party IDs, Party Events, Party Ratings, Account Limits, Party History, Collaterals, Account Collaterals, Party Collaterals, Account Balances, Account Relations. L3 - Presented: Customer Dimension, Account Dimension, Currency Dimension, Product Dimension, Organization Dimension, Calendar Dimension, Account Daily Facts, Account Transactions, Transaction Types, Channel Dimension, CoA Dimension, Company Dimension. Checks and rules: • Format/domain checks • Completeness checks • Duplicate detection • Not-null validations • Enrichment • Record-level cleansing and business rules • Referential integrity • Context-based business rules and quality checks • Aggregate-level checks • Derived and enriched data for self-service Business Intelligence • File validation • Row completeness • Raw data stores for Data Science
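The not-null, format and duplicate checks named on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `account_id` field, the format rule and the `cleanse` helper are hypothetical and are not taken from the presentation or from any Oracle product.

```python
import re

# Hypothetical format rule: two capital letters followed by six digits.
ACCOUNT_ID = re.compile(r"[A-Z]{2}\d{6}")


def cleanse(rows):
    """Split raw rows into accepted rows and (row, reason) rejects."""
    accepted, rejected, seen = [], [], set()
    for row in rows:
        account_id = row.get("account_id")
        if not account_id:
            # Not-null validation
            rejected.append((row, "account_id is null"))
        elif not ACCOUNT_ID.fullmatch(account_id):
            # Format/domain check
            rejected.append((row, "account_id has invalid format"))
        elif account_id in seen:
            # Duplicate detection
            rejected.append((row, "duplicate account_id"))
        else:
            seen.add(account_id)
            accepted.append(row)
    return accepted, rejected


raw = [
    {"account_id": "DE123456"},
    {"account_id": None},
    {"account_id": "bad-id"},
    {"account_id": "DE123456"},
]
ok, bad = cleanse(raw)
print(len(ok), [reason for _, reason in bad])
```

Keeping the rejects together with a reason, rather than dropping them silently, is one way to keep the raw data available for later inspection in the lab.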
  • 10. Data Management: Architecture (Logical View). Sources: Non-structured Sources (Logs, Social Media, External Data, Interactions); Structured Data (Master Data, Applications, Channels, Data Stores); ad hoc files or relational data sets. Data Ingestion: Batch Integration, Real-Time Integration, Data Streaming, Data Wrangling. Data Lake: Raw Data Sets, Data Processing, Data Enrichment, Curated & Transformed Data Sets, Data Aggregation. Data Lab: Sandboxes, Data Catalog, Data Discovery, Transformations, Prototyping, Analytic Tools. Enterprise Information Store; Operational Data Store. Data Federation & Virtualization Layer: common SQL access to ALL data. Line of Governance; Orchestration, Scheduling & Monitoring; Metadata Management. Consumers: Reporting / Business Intelligence, Data Driven Applications, Advanced Analytics
  • 11. Data Management: Oracle Platform. Sources: Non-structured Sources (Logs, Social Media, External Data, Interactions); Structured Data (Master Data, Applications, Channels, Data Stores). Oracle Software; Cloudera CDH 5.7+ / Apache Software. Oracle Platform: Oracle Exadata, Oracle Big Data Appliance, Oracle Exalytics, Oracle x86 Servers. Orchestration, Scheduling & Monitoring; Metadata Management. Consumers: Reporting / Business Intelligence, Data Driven Applications, Advanced Analytics
  • 12. Data Management: Functional Areas. Sources: Non-structured Sources (Logs, Social Media, External Data, Interactions); Structured Data (Master Data, Applications, Channels, Data Stores). Oracle Software; Cloudera CDH 5.7+ / Apache Software. Oracle Platform: Oracle Exadata, Oracle Big Data Appliance. Functional areas: Data Ingestion, Data Store, Data Discovery & Analyze, Unified Data Services, Process, Online, Lifecycle/Governance, Data Warehouse. Orchestration, Scheduling & Monitoring; Metadata Management. Consumers: Reporting / Business Intelligence, Data Driven Applications, Advanced Analytics
  • 13. Data Management: Data Discovery & Analytics. Consumers: Reporting / Business Intelligence, Data Driven Applications, Advanced Analytics. Sources: Non-structured Sources (Logs, Social Media, External Data, Interactions); Structured Data (Master Data, Applications, Channels, Data Stores). Oracle Software; Cloudera CDH 5.7+ / Apache Software. Oracle Platform: Oracle Exadata, Oracle Big Data Appliance. Components shown: Oracle GoldenGate for Big Data, Flume & Kafka, Oracle Data Integrator, Oracle Stream Analytics, Sqoop, Oracle NoSQL DB, Kudu (Relational), Filesystem (HDFS), HBase (NoSQL), Oracle Data Integrator, Batch (MapReduce, Hive, Pig, Spark), Stream (Spark), HBase (NoSQL), WebHDFS, Fluentd, Storm, Tika, ...; Oracle Database, Oracle SQL, Database Security (Roles, View, VPD, …), Oracle Advanced Analytics, Oracle Advanced Security, In-Memory. Functional areas: Data Ingestion, Data Store, Process, Online, Data Warehouse, Data Discovery & Analyze. Security (Sentry + RecordService), Resource Management (YARN), Unified Data Services, Search (Solr), SQL (Impala), Model (Spark ML), Big Data Spatial & Graph, Advanced Analytics for Hadoop, Big Data Discovery, Oracle Big Data SQL, Oracle Big Data Connectors, Cloudera Navigator. Lifecycle/Governance: Oracle Enterprise Metadata Management (OEMM); Oracle Data Factory Engine | Oracle Data Integrator | Oracle Enterprise Data Quality
  • 14. Data Lab Scenario: Exploratory Analysis, Oracle Big Data Discovery
  • 15. Activities Data Lab. CRISP-DM reference model: generic tasks (bold) and outputs (italic). Download from: http://www.the-modeling-agency.com/crisp-dm.pdf
  • 16. Oracle Big Data Discovery. Signal, Data, Information, Knowledge, Wisdom; know nothing, know what, know how, know why. Source Systems; Data Lake; Data Warehouse. L0 - Ingestion. L1 - Cleansed. L2 - Normalised: Accounts, Parties, Account Parties, Party Addresses, Party Contacts, Party IDs, Party Events, Party Ratings, Account Limits, Party History, Collaterals, Account Collaterals, Party Collaterals, Account Balances, Account Relations. L3 - Presented: Customer Dimension, Account Dimension, Currency Dimension, Product Dimension, Organization Dimension, Calendar Dimension, Account Daily Facts, Account Transactions, Transaction Types, Channel Dimension, CoA Dimension, Company Dimension. (Operational) Business Intelligence (100…1000+ users): Oracle BI Platform, Common Enterprise Information Model, Oracle BI Data Visualization, Oracle BI Dashboards, (ad hoc) Reports, … Data Projects (1…20+ users): Oracle Data Visualization Desktop, Oracle Big Data Discovery
  • 17. Oracle Big Data Discovery. Team Sport: one tool for Business Analysts and Data Scientists. Members of the same data project: Database Administrator (Enterprise IT), DWH / OLTP Databases; Data Integration Specialist (Enterprise IT), Hadoop; Data Engineer; Data Science. Discovery output: Business Analyst: new KPI, report requirement; Data Scientist: new data set (cleaned / enriched)
  • 18. Analysis Scenario 1: Prototype Testing
  • 19. Analysis Scenario 1: Prototype Testing. 1. A flexible environment to exploit all available data for prototype testing discovery. 2. Can driver comments really add value to our prototype testing discovery? 3. What is the relationship between errors? Diagram labels: Telemetry, Errors, Driver Comments, Analysis & Dashboarding, Discovery Lab, 1.2 billion rows at 100 Hz, Data Platform, Factory, Storage
  • 20. Analysis Scenario 2: Investigate Car Complaints
  • 21. Analysis Scenario 2: Investigate Car Complaints. Mission: help the Quality Team to trace back warranty claims and support issues to reduce warranty cost and minimize supplier risk, in order to improve product quality and customer satisfaction.
  • 22. Available Data. Internal data (warranty claims), inspected with: hadoop fs -cat /user/oracle/warranty/claims_full.txt | less. Additional data (e.g. demographics)
  • 23. Data Processing Workflow: Supplying Oracle Big Data Discovery with Data. File upload in BDD Studio; Big Data Discovery Data Processing client; new data set in the BDD Data Catalog
  • 24. Data Loading – Data Ingest Overview 1. Data ingest is triggered by data upload, the command line interface, or the Hive Table Detector 2. Records are read and sampled into Spark 3. Data profiling occurs, to determine the schema, the search configuration and which enrichments apply 4. Auto enrichments are performed 5. Data is ingested into Big Data Discovery
  • 25. About the Sampling Technique • BDD uses a simple random sampling algorithm – Each individual is chosen randomly and entirely by chance, with the same probability of being chosen at any stage of the sampling process. Each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals • Sampling is dependable – Accuracy depends on the size of the sample, not the size of the source • A 1M random sample provides more than 99% confidence that the answer is within 0.2% of the value shown, no matter how big the source data set is (1B/1T/1Q+) • Sampling makes interactivity cheap – Will you pay 10x, 100x or 1000x the cost to get the last <<1% of confidence? • Maybe sometimes, but not in discovery and not for every data set
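The statement about a 1M sample can be checked with the textbook normal-approximation margin of error for a proportion. The sketch below assumes a simple random sample and the worst case p = 0.5; the function name is illustrative and this is not how BDD itself reports accuracy.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size, confidence=0.99, p=0.5):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / sample_size)


# The source size does not appear in the formula, only the sample size.
for n in (10_000, 100_000, 1_000_000):
    print(n, f"{margin_of_error(n):.4%}")
```

For a sample of one million rows the result is roughly 0.13%, which is consistent with the "within 0.2%" figure on the slide. The formula ignores the finite population correction, which would only make the margin smaller.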
  • 26. Data Profiling & Enrichments • Profiling is a process that determines the characteristics (columns) of each source Hive table discovered by Big Data Discovery during data processing. – Attribute type determination (discovery) • Includes converting strings to dates, geocodes, long or boolean – Attribute value distributions – Determines attribute searchability – Provides "hints" to Studio as to what content (components) should be on the default project page • Enrichments are derived from a data set's additional information such as terms, locations, the language used, sentiment, and views. Big Data Discovery determines which enrichments are useful for each discovered data set and automatically runs them on samples of the data. As a result of automatically applied enrichments, additional derived metadata (columns) are added to the data set, such as geographic data, a suggestion of the detected language, or positive or negative sentiment.
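Attribute type determination can be pictured as trying progressively wider parsers on a sample of string values. This is only a sketch of the general idea: the `infer_type` helper, the type labels and the single date format are assumptions, geocodes are left out, and BDD's actual profiling rules are not described in the slides.

```python
from datetime import datetime


def infer_type(values):
    """Return the narrowest type that every non-empty value parses as."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return "string"

    def all_parse(parser):
        for v in present:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all(v.lower() in {"true", "false"} for v in present):
        return "boolean"
    if all_parse(int):
        return "long"
    if all_parse(float):
        return "double"
    # Assumed date format; a real profiler would try several.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "datetime"
    return "string"


print(infer_type(["2016-11-15", "2016-11-16"]))
print(infer_type(["12", "7", ""]))
```

Running the inference on a sample rather than the full table matches the ingest flow described earlier, at the cost that a rare malformed value outside the sample can go unnoticed.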
  • 27. Find • A rich, interactive catalog of all data in Hadoop • Familiar search and guided navigation for ease of use • Data set summaries, user annotation and recommendations • Personal and enterprise data upload to Hadoop via self-service
  • 28. Find • A rich, interactive catalog of all data in Hadoop • Familiar search and guided navigation for ease of use • Data set summaries, user annotation and recommendations • Personal and enterprise data upload to Hadoop via self-service
  • 29. Find – Data Sets Tab: like shopping online for your data. Navigate • Project or data set – by author and tags • Contains – datetime or geo • Number of records or attributes • Recently Viewed, Most Popular, Newly Added. Data Quick Look • Data set info – tags, views, last updated – project, used, created by • Actions – Explore, Add to project, Edit Tags, Delete • Related data sets by data source – "Often used with these data sets" • Preview – first 15 rows, all columns. Search • Keyword • Data sets • Projects • Data set metadata • Project metadata • Recently Viewed, Most Popular, Newly Added
  • 30. Find (Catalog) – Data Set Quick Look: Data Set Info • Data set info – Tag • Actions – Explore – Add to project – Edit Tags – Delete • Summary – Views – Last Updated
  • 31. Find (Catalog) – Data Set Quick Look: Data Set Used in Project • Used in projects – Project name – Data sets used – Created by
  • 32. Find (Catalog) – Navigation & Search: Search Everything • Searches – Keyword – Data sets – Projects – Data set metadata – Project metadata – Attribute metadata
  • 33. Find – Projects Tab: like shopping online for your data. Projects Tab • Search and navigate • Project categories – Recently Viewed – Most Popular – View all. Projects Quick Look • Add tags, attributes, data sets, pages • Open Project, Edit Tags, Delete • Summary
  • 34. Find (Catalog) – Projects Tab • Searchable • Navigable • Project categories – Recently Viewed – Most Popular – View all. Callouts: Search Everything, Discover Analytical Projects, Navigate by Metadata
  • 35. Find (Catalog) – Project Quick Look • Add tags • Attributes • Data sets • Pages • Actions – Open Project – Edit Tags – Delete • Summary
  • 36. Explore • Visualize all attributes by type • Sort attributes by information potential • Assess attribute statistics, data quality and outliers • Use the scratchpad to uncover correlations between attributes
  • 37. Explore: intuitive, machine-guided data exploration. Quick Look by Geocode • Overview • Details • Summary stats • Refinable. Explore Scratchpad • Graphic type changes as additional attributes are added • Autoselects the best visualization • Offers next-best graphics option(s). Search, Navigate, Sort • Attributes, refinements, keyword • Sort by: – Name (alpha) – Information potential – Relationship to an attribute
  • 38. Explore • Search: – Attributes – Refinements – Keyword • Sort order – Name – Information potential – Relationship to an attribute • Navigable. Callouts: Tiles vs. tabular view, Navigation menu, Search menu, Sort order
  • 39. Explore – Sort Attributes • Sort order – Name (alpha) – Information potential • Based on entropy – Relationship to an attribute • Based on information gain
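Entropy and information gain are standard information-theoretic measures, and the two sort orders can be pictured with their textbook definitions. The sketch below uses those definitions only; the function names are illustrative, and how BDD normalises or scales the scores is not stated in the slides.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(target, attribute):
    """Reduction in the entropy of target after grouping rows by attribute."""
    total = len(target)
    groups = {}
    for t, a in zip(target, attribute):
        groups.setdefault(a, []).append(t)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Hypothetical columns: does the supplier explain whether a claim was paid?
paid = ["yes", "yes", "no", "no"]
supplier = ["A", "A", "B", "B"]
colour = ["red", "blue", "red", "blue"]
print(information_gain(paid, supplier), information_gain(paid, colour))
```

In this toy data the supplier column fully determines the outcome while colour tells nothing about it, so sorting by relationship to `paid` would rank supplier first.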
  • 40. Core Capabilities: Explore – Quick Look – Geocode • Overview • Details • Summary stats – Refinable
  • 41. Explore – Scratchpad • Graphic type changes as additional attributes are added • Autoselects the best visualization • Offers next-best graphics option(s)
  • 42. Working with Metadata: document all relevant information and insights about the original data sources. Available code books and data set specifications. Document all relevant information and insights directly in BDD Studio and make immediate use of them, e.g. via text search in BDD
  • 43. Transform – Overview • Intuitive, user-driven data wrangling • Data shaping • Extensive library of powerful data transformations and enrichments • Preview results, undo, commit and replay transforms • Test on sample data, then apply to the full data set in Hadoop
  • 44. Transform – Function Families: available via a user-guided interface
  • 45. Transform – Overview. Callouts: Full Guided Navigation, Attribute Transformation, Smart Attribute Filtering, Interactive Transform History, Visual Data Quality Summaries, Data Shaping tab (aggregation, join, etc.)
  • 46. Transform Editor: available via a user-guided interface • Uses Groovy, an object-oriented programming language for the Java platform. – Code written in the Java language is valid in Groovy – It is flexible and easy to use. • Features of the editor include: – Syntax highlighting enables color-coding of different elements in your transformation to indicate their type. – Autocomplete lets you view a list of suggestions for the word you are typing by pressing Ctrl+Space. – Error checking includes a built-in static parser that performs error checking when you preview or save your transformation.
  • 47. Transform – Overview: user-driven data wrangling. Enrichments • Infer language • Detect sentiment • Identify key phrases, entities, noun groups • Whitelist tagger • Address and IP geotagger. Function Families • String • Mathematical • Conversion • Conditional • Datetime • Data cleansing • Geotagging. Data Shaping • Aggregate • Persistent join • Row filter. Commit, Create Data Set • Commit – Executes transformations on the sample data available in the app; this updates the original collection in the Dgraph. • Create a Data Set – Executes transformations on the entire data set in HDFS, creating a new entry in the BDD Catalog and Hive. • Rollback – Attributes created by a transformation, then deleted, will be removed on commit.
  • 48. Transform – Function Families: Transformation Functions • Geotagging – Partial address match – Address disambiguation (based on population) – IPv4 support • Numeric transformation – Round, Ceiling, Floor, Absolute Value – Operators (+ - / *), sqrt, mod, log, ln, min/max, sin/asin/sinh, cos/acos/cosh, tan/atan/tanh • String – Split, Trim, Uppercase, Lowercase, Titlecase, Concatenate • Type conversion – Boolean, Datetime, Double, Integer, String, Long, Time, Geocode • Datetime (accessed via Transformation Editor) – Date Diff, Date Add, getYear, Month, Day, Hour, Minute – Truncate Date • Data cleansing – Binning numerics • Conditional – Accessed via Transformation Editor • If / else if statement – Optionally based on refinement state – Can be nested • Drag and drop or double-click functions and attributes onto the editor
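Two of the function families above, binning numerics and if / else-if conditionals, are easy to picture in plain code. The sketch below is written in Python purely for illustration; in BDD these transformations are written in Groovy in the Transformation Editor, and the helper names and thresholds here are hypothetical rather than BDD functions.

```python
def bin_numeric(value, width):
    """Map a number to the lower edge of its fixed-width bin."""
    return (value // width) * width


def claim_band(amount):
    """If / else-if rule that labels a claim amount (thresholds are made up)."""
    if amount < 100:
        return "low"
    elif amount < 1000:
        return "medium"
    else:
        return "high"


amounts = [37, 250, 4999]
print([(bin_numeric(a, 100), claim_band(a)) for a in amounts])
```

Derived attributes like these are typically what gets previewed on the sample first and then applied to the full data set.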
  • 49. Transform – Function Families: Enrichment Functions • In addition to transforming data, Big Data Discovery allows users to enrich data sets in numerous ways: • Term extraction – Extract relevant phrases and noun groups from unstructured text • Entity recognition – Find people, places and organizations mentioned in unstructured text • Sentiment analysis – Determine the document-level and sub-document-level sentiment of unstructured text • Geotagging – Generate standardized geographic information based on an unstructured address or IP address • Language detection • Applied via the user-guided interface
  • 50. Transform – Function Families: Text Enrichments. Example input: • I was looking for a laptop mainly to browse and for multimedia purpose and found this inexpensive built-like-a-tank Dell XPS 14. I am very impressed with the build quality, stunning looks and there is absolutely a very nice feel to it. I have many MBPs at home and this one is better in appearance and worked out of the box with minimal set up. I was up and running in 10 minutes or so. • I found the touch pad not up to par with Apple track pad but with the latest driver update it came pretty close. I uninstalled the McAfee software and went with Norton security as it comes free with Comcast. There is a new BIOS update from February 2014 and also audio driver update that you may want to apply immediately. Do the Wifi-BT update as well. Enrichment output: Key Phrases: Windows Metro Driver Issues Usb Port Light Gaming Gaming Laptop; Detected Language: en; Document Sentiment: POSITIVE
  • 51. Text Enrichment: Extract Key Phrases from Free Text
  • 52. Text Enrichment: Determine sentiment and extract related phrases
  • 53. Transform – Function Families: Geo-Tagging Enrichments
      • Partial address match
      • Address disambiguation (based on population)
      • IPv4 support
      • Input: west loop chicago
          – city: Chicago
          – country: US
          – county: Cook County
          – geocode: 41.85003 -87.65005
          – latitude: 41.85003
          – longitude: -87.65005
          – population: 2695598
          – state: Illinois
      • Input: 148.87.19.206
          – city: Redwood City
          – country: US
          – geocode: 37.4852 -122.2364
          – latitude: 37.4852
          – longitude: -122.2364
          – state: California
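The idea of partial address matching with population-based disambiguation can be sketched as follows. This is only an illustration in plain Python, not how BDD implements geo tagging: the `GAZETTEER` list, the helper names and the second "Chicago" entry are assumptions made up for the example (only the Illinois entry and its population come from the slide). The standard-library `ipaddress` module is used to tell IPv4 input apart from free-text addresses.

```python
import ipaddress

# Tiny hypothetical gazetteer. The first entry uses the values shown on the
# slide; the second is an invented placeholder to create an ambiguity.
GAZETTEER = [
    {"city": "Chicago", "state": "Illinois", "country": "US", "population": 2695598},
    {"city": "Chicago", "state": "Examplestate", "country": "XX", "population": 1000},
]


def is_ipv4(text):
    """Return True if the input parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text.strip())
        return True
    except ValueError:
        return False


def geotag_address(text):
    """Partial match: any gazetteer city mentioned in the text is a candidate.
    Disambiguation: among candidates, prefer the most populous place."""
    lowered = text.lower()
    candidates = [p for p in GAZETTEER if p["city"].lower() in lowered]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p["population"])


for raw in ["west loop chicago", "148.87.19.206"]:
    if is_ipv4(raw):
        # An IP-to-location lookup would go here; it needs an external
        # database and is deliberately left out of this sketch.
        print(raw, "-> IPv4 address")
    else:
        print(raw, "->", geotag_address(raw))
```

Choosing the most populous candidate is a simple heuristic: it is right most of the time for short, underspecified inputs, at the cost of occasionally mislabeling references to small towns that share a name with a big city.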
  • 54. Combining Data Sets: One goal of all the ETL work is linking data sets via joins
  • 55. Discover
      • Join and blend data for deeper perspectives
      • Compose project pages via drag and drop
      • Use powerful search and guided navigation to ask questions
      • See new patterns in rich, interactive data visualizations
  • 56. Discover: Drag-and-drop dashboards for fast, easy analysis
      • Discovery Dashboards
          – Control over layout, filtering, exposed dimensions/metrics
          – Formatting controls: Is Dimension, available aggregations, multi-OR/AND, Include in Available Refinements
          – Component level: currency, number of decimals, date format, etc.
          – Data set linking: 2 or more data sets within a project, visual attribute linking
          – Automatic view creation and widening of data sets
      • Navigation
          – Intelligent refine using any combination of data elements
          – Color highlighting
          – Alternate counts for metrics
          – Linked navigation across data sets
      • Visualizations
          – New visualizations: D3 Chart library, more intuitive interface
          – Seamless integration of new visualization components (Custom Visualization Component extension)
  • 57. Oracle Big Data Discovery: Rich and highly interactive Library of Visualization Portlets
  • 58. Data Lab Scenario: Advanced Analytics (BDD Shell, Jupyter Notebook)
  • 59. Oracle Big Data Discovery: Data Scientist continues with Machine Learning tasks
      • [Diagram] Sources: DWH / OLTP Databases, Hadoop
      • [Diagram] Roles: Database Administrator (Enterprise IT), Data Integration Specialist (Enterprise IT), Data Engineer, Business Analyst, Data Scientist
      • [Diagram] Data Science Discovery Output: New KPI, Report Requirement (Business Analyst); New Data Set, cleaned / enriched (Data Scientist)
  • 60. Shaping a Data Set for further processing: Handling of sparse data / NULL values
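One common way to handle sparse attributes and NULL values before handing a data set to a machine learning step is sketched below in plain Python. This is an assumption about a reasonable workflow, not a description of what BDD does internally: the sample rows, the 50% sparsity threshold and the fill strategy (median for numbers, "unknown" for text) are all illustrative choices.

```python
from statistics import median

# Hypothetical sample rows; None stands for a NULL value.
rows = [
    {"make": "Audi", "mileage": 42000, "color": None},
    {"make": "BMW", "mileage": None, "color": None},
    {"make": None, "mileage": 58000, "color": "red"},
    {"make": "Audi", "mileage": 30000, "color": None},
]


def null_ratio(rows, attr):
    """Share of rows in which the attribute is NULL."""
    return sum(r[attr] is None for r in rows) / len(rows)


def shape(rows, max_null_ratio=0.5):
    """Drop attributes that are too sparse, then fill remaining NULLs."""
    attrs = list(rows[0].keys())
    kept = [a for a in attrs if null_ratio(rows, a) <= max_null_ratio]

    fills = {}
    for a in kept:
        present = [r[a] for r in rows if r[a] is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fills[a] = median(present)   # numeric: impute the median
        else:
            fills[a] = "unknown"         # text: explicit placeholder

    return [
        {a: (r[a] if r[a] is not None else fills[a]) for a in kept}
        for r in rows
    ]


shaped = shape(rows)
for r in shaped:
    print(r)
```

Here `color` is NULL in three of four rows and is dropped, while the single missing `mileage` and `make` values are filled so downstream code does not have to special-case them.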
  • 61. Shaping a Data Set for further processing: Aggregation
      • Roll up low-level data to higher grains
          – Production Year
          – Vehicle Model Year
          – Vehicle Make
      • Intuitive UI helps analysts find the right grains
      • Execute at full scale using Spark
      • Results can be sampled or indexed in full
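Rolling low-level records up to a coarser grain can be illustrated with a small in-memory sketch. The slide states that BDD executes aggregations at scale with Spark; the plain-Python version below only shows the logic, and the sample rows, the `roll_up` helper and the attribute names are hypothetical.

```python
from collections import defaultdict

# Hypothetical low-level records (values invented for illustration).
records = [
    {"make": "Audi", "model_year": 2012, "complaints": 3},
    {"make": "Audi", "model_year": 2013, "complaints": 5},
    {"make": "BMW", "model_year": 2012, "complaints": 2},
    {"make": "BMW", "model_year": 2012, "complaints": 4},
]


def roll_up(rows, grain, measure):
    """Sum `measure` for each distinct combination of the `grain` attributes."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[attr] for attr in grain)
        totals[key] += row[measure]
    return dict(totals)


# Two candidate grains: make only, and make plus model year.
by_make = roll_up(records, ["make"], "complaints")
by_make_and_year = roll_up(records, ["make", "model_year"], "complaints")

print(by_make)
print(by_make_and_year)
```

Trying several grains side by side, as done with `by_make` and `by_make_and_year`, is the programmatic counterpart of exploring grains in a UI before committing to one.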
  • 62. Shaping a Data Set for further processing: Combining multiple Data Sets
      • Blend huge datasets in BDD
          – UI to support experimentation, preview
          – Execute at scale with Spark
      • Results can be sampled or indexed in full
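Blending two data sets on a shared key boils down to a join. The following plain-Python hash join is a minimal sketch of that idea under stated assumptions: the two sample data sets, the key name `vin` and the `hash_join` helper are invented for illustration, and a real deployment would run the equivalent join in Spark as the slide notes.

```python
def hash_join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`.

    how="inner": keep only left rows that have at least one match.
    how="left":  also keep unmatched left rows (without right-hand columns).
    """
    # Build a lookup from key value to all matching right-hand rows.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)

    joined = []
    for l in left:
        matches = index.get(l[key], [])
        if matches:
            for r in matches:
                joined.append({**l, **r})
        elif how == "left":
            joined.append(dict(l))
    return joined


# Hypothetical data sets sharing the key "vin".
vehicles = [
    {"vin": "A1", "make": "Audi"},
    {"vin": "B2", "make": "BMW"},
    {"vin": "C3", "make": "Citroen"},
]
recalls = [
    {"vin": "A1", "recall": "airbag"},
    {"vin": "A1", "recall": "brakes"},
    {"vin": "B2", "recall": "software"},
]

inner = hash_join(vehicles, recalls, "vin")
left = hash_join(vehicles, recalls, "vin", how="left")

print(len(inner), len(left))
```

Comparing the row counts of the inner and left joins is a quick preview-style check of how many records on the left side have no partner on the right.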
  • 63. Shaping a Data Set for further processing: Export new Data Set (Hive Table in Hadoop)
  • 64. Analytic features in Oracle Big Data Discovery: Data Density, Validation of attribute correlations (Scatter Plot, Scatter Plot Matrix)
  • 65. Point of Contact with Data Scientists: BDD-Shell interface
      • Python-based shell
      • Exposes all BDD data objects
      • Easy-to-use Python wrappers for BDD APIs and Python utilities
      • Use of third-party libraries, e.g., Pandas and NumPy
      • BDD Shell is an interactive tool designed to work with BDD without using Studio's front-end
      • Provides a way to explore and manipulate the internals of BDD and interact with Hadoop
  • 66. Data Analysis with Python: (Re-)use data from Oracle Big Data Discovery while working with the BDD Shell
      • List of Oracle Big Data Discovery Data Sets
      • Import Spark Machine Learning library MLlib
      • Converting an Oracle Big Data Discovery Data Set into an Apache Spark DataFrame
      • Import Package NumPy (Numerical Python)
  • 67. Point of Contact with Data Scientists: Leveraging Notebooks for a better user experience
      • Easiest way to use the BDD-Shell
          – Visual appeal, ease of use, collaboration features of an integrated platform
          – Power and flexibility of custom code
          – Pick up BDD's datasets and leverage Machine Learning algorithms to infer new insight
  • 68. www.jupyter.org
  • 69. Data Analysis with Python: (Re-)use data from Oracle Big Data Discovery while working with Jupyter
  • 70. Exploratory data analysis in the Data Lab... ...better and more sustainable in an interdisciplinary team!
  • 71. Oracle's Unified Big Data Management & Analytics Strategy: Supporting agile data experiments and data projects (in the cloud and on-premises)
      • Experiment: innovation through experimentation with data
          – Big Data Discovery
          – R on Hadoop
          – Spatial and Graph for Hadoop
      • Aggregate: connect people to the information they need
          – Big Data Preparation
          – Data Integrator
          – GoldenGate
          – IoT
      • Manage: collect, secure and make data available
          – Hadoop Platform
          – Big Data SQL
          – NoSQL Database
          – Oracle Database
      • Analyze & act: transform the workplace with actionable insights
          – Data Visualization
          – Business Intelligence
          – Spatial and Graph
          – Advanced Analytics