H Y P E R G I A N T 2 0 1 9
HYPERGIANT2019|CONFIDENTIAL
In this workshop we will
discuss:
•Pilosa and what it does
•How this might impact
recommendation engines
•How this might impact
association schemes
•The Geometry of data in
a Pilosa Index
•An experimental variant
of the winnow algorithms
run on a Pilosa Index.
G O A L :
AN EXPERIMENT
ON DATA
SCIENCE
ALGORITHMS
ENABLED BY A
PILOSA INDEX—
W H O W E A R E
—
EXPERIENCE
+
INTELLIGENCE
A bleeding-edge dream team with
a deep understanding of the rich
panoply of advancements
available through MI.
PEOPLE
We blend strategy, design, and
development capabilities to
create experiences and define
new capabilities leveraging
Machine Intelligence.
PROCESS
Our signature, tech-agnostic
approach balances the
utilitarian and the
evolutionary.
METHOD
01: Strategy
02: Design
03: Applied Sciences
04: Development
Our team comprises digital product
strategists, data scientists, machine-
learning-focused engineers, creative
technologists, user experience designers,
and developers for all endpoints.
OUR SERVICES
DIV. 0001:
SPACE AGE SOLUTIONS
/ HYPERGIANT - 2019
Strategy + Design + Applied Science + Delivery > Technology-Agnostic Artificial Intelligence
M e t h o d o l o g y
[Diagram labels: Strategy, Design, Applied Sciences, Development; Output: User Experience, Machine Intelligence]
R E A L I Z E T H A T
E V E R Y T H I N G
C O N N E C T S T O
E V E R Y T H I N G E L S E
USERS
BUSINESS
DATA
The traditional design model wherein one weighs
the user value and the business value of a given
feature is outdated. It has been replaced with a
framework in which one weighs user value,
business value, and data value. If choices are
not made that respect the value of each, the
result will be unsatisfactory to one group.
W H A T W E B E L I E V E
H Y P E R G I A N T + P I L O S A
—
E N T E R A D I S T R I B U T E D B I N A R Y
I N D E X : P I L O S A .
•We see Pilosa as an important technology for
a more extensible future.
•We see it as a potential solution for
connecting the quagmire of enterprise
datasets into the meaningful data puddles that
are required to drive more fluid data science
mechanics.
•We see it as a competitive advantage in dealing
with the cost of real-time data access.
PILOSA
CHANGES THE
DIALOG AROUND
LARGE DATA
SETS, BOTH
STATIC AND IN
MOTION.
P I L O S A : C O N C E P T
—
W H A T I S P I L O S A ?
•“an open source, distributed bitmap index that
dramatically accelerates continuous analysis
across multiple, massive data sets.”
W H A T D O E S T H I S M E A N ?
•Data lakes are a problem, especially during the
initial exploratory statistical analysis of a
dataset, when even finding values can be slow
and tedious.
•Pilosa allows queries to be run over the
entire dataset quickly:
  •Example: 1.2B data points, 8 features, 0.07 seconds
H O W D O E S I T D O T H I S ?
•Pilosa focuses on “relationships between objects
and storing those relationships in bitmaps.”
•That is, it is focused on data features.
W H A T D O E S T H I S M E A N F O R U S ?
•This allows a data scientist to search quickly
for a combination of features over the entire
data set, find the data points with those
features, count them, etc.
H O W D O W E T H I N K O F I T ?
•Pilosa is a bitmap index. At its heart, it stores
a Boolean vector of features for each data point.
W H A T C A N T H E I N D E X D O F O R U S ?
•This allows initial multiway explorations to be
done quickly (if the data is already in an index)
•This allows for combination features to be built
and tested quickly
•Balanced data sets can be built quickly
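To make the Boolean-vector picture concrete, here is a minimal sketch of a bitmap index held in memory. It is not Pilosa's API: `ToyBitmapIndex`, its methods, and the feature names are all hypothetical, chosen only to show how a combination of features reduces to a bitwise AND followed by a bit count.

```python
from typing import Dict, Iterable, List


class ToyBitmapIndex:
    """Toy stand-in for a bitmap index: one integer bitmap per feature."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    def set_bit(self, feature: str, column: int) -> None:
        # Record that data point `column` has `feature`.
        self.rows[feature] = self.rows.get(feature, 0) | (1 << column)

    def intersect(self, features: Iterable[str]) -> int:
        # AND the bitmaps together: the data points that have every feature.
        bitmaps = [self.rows.get(f, 0) for f in features]
        if not bitmaps:
            raise ValueError("need at least one feature")
        result = bitmaps[0]
        for bitmap in bitmaps[1:]:
            result &= bitmap
        return result


def count(bitmap: int) -> int:
    """Number of data points whose bit is set."""
    return bin(bitmap).count("1")


def columns(bitmap: int) -> List[int]:
    """Ids of the data points whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical data points, each listed by the features it has.
points = [
    {"yellow", "cash"},
    {"yellow", "card"},
    {"green", "cash"},
    {"yellow", "cash"},
]
index = ToyBitmapIndex()
for column, features in enumerate(points):
    for feature in features:
        index.set_bit(feature, column)

both = index.intersect(["yellow", "cash"])
print(count(both), columns(both))
```

Testing a combination feature costs one AND per feature plus a count. Building a balanced data set amounts to taking equally many column ids from each class bitmap.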
W H A T E L S E C O U L D W E D O W I T H I T ?
•The index can be treated as a dataset in itself
•It is a data set built over binary features.
W H A T A L G O R I T H M S R U N O N T H I S ?
•Recommendation Engines
•Association Rule Learning
•Winnow Algorithms
•(Others)
R E C O M M E N D A T I O N E N G I N E S
& A S S O C I A T I O N R U L E
L E A R N I N G
—
A Deep Belief Network (DBN) is built from
stacked layers of Restricted Boltzmann
Machines (RBMs). Each RBM has two parts, a
visible layer and a hidden layer. During
training, data is passed back and forth
between the visible and hidden layers
probabilistically, and the resulting
approximations are used to update the
probability distributions.
D E E P B E L I E F N E T W O R K S
“A [Recommendation Engine] is a subclass
of information filtering system[s] that
seeks to predict the ‘rating’ or
‘preference’ a user would give to an
item.” — Wikipedia



The key idea is that they do not need to
be trained on complete data.
•Two features can work together:
  •Did the user watch the film? (Yes/No)
  •Did the user give a positive review? (Yes/No)
•In this setting (No, __) represents an incomplete
data point with no known review value: [0, *]
•Similarly, a richer ranking can be used:
  •Did the user watch the film? (Yes/No)
  •Did the user give an n-star review? (Yes/No, for each n)
•In this setting (Yes, No, No, Yes, No, No) is a
3-star review for a movie watched: [1,0,0,1,0,0]
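The encoding above can be sketched as a small function. The name `encode_rating`, the five-star scale, and the use of `None` for the unknown marker `*` are assumptions made for illustration.

```python
from typing import List, Optional


def encode_rating(watched: bool, stars: Optional[int],
                  max_stars: int = 5) -> List[Optional[int]]:
    """Encode one (user, film) pair as [watched, 1-star, ..., max_stars-star].

    Rating flags are None (unknown) when no review is available.
    """
    if not watched or stars is None:
        # Incomplete data point: the review flags have no known value.
        return [int(watched)] + [None] * max_stars
    if not 1 <= stars <= max_stars:
        raise ValueError("stars out of range")
    flags = [0] * max_stars
    flags[stars - 1] = 1  # exactly one star flag is set
    return [1] + flags


print(encode_rating(True, 3))
print(encode_rating(False, None))
```

Keeping unknown values distinct from 0 matters here, because a recommendation engine is expected to fill in exactly those missing entries.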
R E P R E S E N T A T I O N S !
Quality Assurance (QA) for Recommendation Engines, and for
Machine Learning in general, is hard. There is a general
lack of QA tools for ML, and a lack of knowledge around
what types of errors occur and what they look like.
Using a DBN Recommendation Engine, we can build out a
probability distribution for the population based upon a
set of features, and then quickly query across the
population to see which predictions differ from the
population proportion over the remaining features.
R E C O M M E N D A T I O N M E E T R E A L I T Y :
A S S O C I A T I O N R U L E L E A R N I N G
“Association rule learning is a rule-based
machine learning method for discovering
interesting relations between variables in
large databases. It is intended to
identify strong rules discovered in
databases using some measures of
interestingness.” — Wikipedia
W H A T I S A S S O C I A T I O N R U L E
L E A R N I N G ?
Following the original definition, the problem of
association rule mining can be stated as:
•I = {i_1, i_2, …, i_n} is a set of n binary attributes,
called items in the literature; for us, features.
•D = {t_1, t_2, …, t_m} is the database, or set of data points.
•A rule is given as “X implies Y”, where X and Y are sets
of features.
H U H ?
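As always, there are useful metrics:
•Supp(X): the proportion of the data which
contains all of X.
•Conf(Y|X): Supp(X and Y)/Supp(X), the
proportion of the data containing X
which also contains Y.
•Lift(X,Y): Supp(X and Y)/(Supp(X)·Supp(Y)),
a measurement of independence between X
and Y; a value of 1 indicates independence.
•Conviction(Y|X): (1 − Supp(Y))/(1 − Conf(Y|X)),
the ratio of the frequency with which X
would occur without Y if the two were
independent to the frequency with which X
is observed without Y.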
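A minimal sketch of the four metrics on a toy data set follows. It assumes the standard textbook definitions, Lift = Supp(X and Y)/(Supp(X)·Supp(Y)) and Conviction = (1 − Supp(Y))/(1 − Conf(Y|X)). The item names and baskets are made up for illustration.

```python
from typing import List, Set


def support(data: List[Set[str]], items: Set[str]) -> float:
    # Fraction of data points that contain every item in `items`.
    return sum(items <= row for row in data) / len(data)


def confidence(data: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    return support(data, x | y) / support(data, x)


def lift(data: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    return support(data, x | y) / (support(data, x) * support(data, y))


def conviction(data: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    conf = confidence(data, x, y)
    if conf == 1.0:
        return float("inf")  # the rule is never violated
    return (1.0 - support(data, y)) / (1.0 - conf)


# Hypothetical data points over three features a, b, c.
data = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
    {"c"},
    set(),
    {"c"},
]
X, Y = {"a", "b"}, {"c"}
print(support(data, X), confidence(data, X, Y))
print(lift(data, X, Y), conviction(data, X, Y))
```

Every metric reduces to a handful of support counts, which is the kind of query a bitmap index answers quickly.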
U S E F U L I D E A S
The (long-storied) example of a learned
association rule is the “beer and diapers”
rule for shopping between 5:00pm and
7:00pm.
This rule could then be stated as
“[5:00-7:00] & [diapers] implies [beer]”,
or similar, depending upon the confidence.
E X A M P L E !
Association Rule Learning has downsides:
the number of potential rules grows
exponentially with the size of the feature
set, and most definitions of ‘interesting’
rules require a large sampling over the
dataset.
These problems can be reduced through the
use of multiple queries over a Pilosa
index related to feature pairs.
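One way to read the feature-pair idea is sketched below. It restricts the search to single-feature rules “x implies y” and scores each from two counts, so the candidate set grows quadratically rather than exponentially. Plain Python integers stand in for the index's bitmaps. The function name, thresholds, and data are hypothetical, and this is not the authors' implementation.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def pair_rules(bitmaps: Dict[str, int], n_points: int,
               min_support: float, min_confidence: float
               ) -> List[Tuple[str, str, float, float]]:
    """Return (x, y, support, confidence) for pair rules passing both cutoffs."""
    rules = []
    for x, y in permutations(sorted(bitmaps), 2):
        x_count = bin(bitmaps[x]).count("1")
        if x_count == 0:
            continue  # x never occurs, so confidence is undefined
        both = bin(bitmaps[x] & bitmaps[y]).count("1")  # one intersect-and-count
        supp = both / n_points
        conf = both / x_count
        if supp >= min_support and conf >= min_confidence:
            rules.append((x, y, supp, conf))
    return rules


# Five hypothetical data points; bit i is set when point i has the feature.
bitmaps = {
    "diapers": 0b01111,  # points 0-3
    "beer": 0b00111,     # points 0-2
    "milk": 0b11000,     # points 3-4
}
for rule in pair_rules(bitmaps, n_points=5, min_support=0.5, min_confidence=0.7):
    print(rule)
```

Pairs that pass the cutoffs could then be extended to larger feature sets, which keeps the number of queries well below the full exponential search.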
D O W N S I D E S …
W I N N O W A L G O R I T H M
—
What does the geometry of a discretized
dataset in a Pilosa index look like?
There are discrete features, and
discretized continuous features. Together
these give it a geometry of the form:
G E O M E T R Y !
$H^{n_0} \times S^{n_1} \times \dots \times S^{n_m}$
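One reading of this product, offered as an assumption rather than the authors' stated construction, is that the binary features give the corners of a hypercube, while each discretized continuous feature becomes a one-hot vector, which is a vertex of a simplex. The sketch below builds such a vector. The helper names and bin edges are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def one_hot_bin(value: float, edges: Sequence[float]) -> List[int]:
    """One-hot vector over len(edges) + 1 bins: a simplex vertex."""
    idx = bisect_right(edges, value)  # index of the bin containing `value`
    return [1 if i == idx else 0 for i in range(len(edges) + 1)]


def encode_point(binary_flags: Sequence[bool],
                 continuous: Sequence[float],
                 bin_edges: Sequence[Sequence[float]]) -> List[int]:
    """Hypercube coordinates followed by one simplex vertex per continuous feature."""
    vector = [int(flag) for flag in binary_flags]
    for value, edges in zip(continuous, bin_edges):
        vector.extend(one_hot_bin(value, edges))
    return vector


# Two binary features and one continuous feature cut into three bins.
print(encode_point([True, False], [2.5], [[1.0, 5.0]]))
```

Every encoded data point is then a 0/1 vector, which is the input format the Winnow algorithms expect.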
Hypercubes and simplices both behave well
with respect to hyperplane separators.



Note that this implies there is a good
reason to believe that a linear
separator between two classes, or
several one-vs-many linear separators,
will behave well when treating the index
itself as a dataset.
W H A T D O E S T H I S M E A N ?
There are many classification algorithms
that find a linear separator between the
classes:
•SVM with a Linear Kernel
•Perceptron
•Winnow
L I N E A R S E P A R A T O R S
There are several versions of the Winnow
algorithm, which differ mainly in how
they treat the ‘other’ class.
They differ from perceptron algorithms
in that their weights are generally
updated multiplicatively rather than
additively, and they can only be used on
binary data.
•Define two classes {0,1}, initialize all weights $w_i$ to 1,
and set a threshold $\theta$ (generally $n/2$) and a
learning rate $r$ (generally 2).
•For each data point $(x, y)$ do:
  •Check if: $\sum_{i=1}^{n} w_i x_i > \theta$
  •If true, and $y=1$, the prediction is correct.
  •If true, and $y=0$, then set $w_i = 0$ for all $x_i > 0$.
  •If false, and $y=0$, the prediction is correct.
  •If false, and $y=1$, then set $w_i = r \cdot w_i$ for all $x_i > 0$.
•Return the weights for the linear classifier.
W I N N O W 1
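The Winnow1 steps can be sketched in a few lines of Python. The function names, the toy data, and the `epochs` parameter (repeated passes over the data) are assumptions added for illustration. The slide describes a single pass.

```python
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[int], int]


def winnow1(data: Sequence[Example], rate: float = 2.0,
            threshold: Optional[float] = None,
            epochs: int = 1) -> Tuple[List[float], float]:
    """Winnow1: promote by multiplying, demote by setting weights to zero."""
    n = len(data[0][0])
    theta = n / 2 if threshold is None else threshold
    weights = [1.0] * n
    for _ in range(epochs):
        for x, y in data:
            fires = sum(w * xi for w, xi in zip(weights, x)) > theta
            if fires and y == 0:
                # False positive: eliminate every active feature.
                weights = [0.0 if xi > 0 else w for w, xi in zip(weights, x)]
            elif not fires and y == 1:
                # False negative: promote every active feature.
                weights = [w * rate if xi > 0 else w for w, xi in zip(weights, x)]
    return weights, theta


def predict(weights: Sequence[float], theta: float, x: Sequence[int]) -> int:
    return int(sum(w * xi for w, xi in zip(weights, x)) > theta)


# Toy data: the label is 1 exactly when feature 0 or feature 1 is present.
toy = [
    ([0, 0, 1, 1, 1, 1], 0),
    ([1, 0, 0, 0, 0, 0], 1),
    ([0, 1, 0, 0, 0, 0], 1),
    ([1, 0, 1, 0, 0, 0], 1),
    ([0, 0, 0, 0, 1, 0], 0),
]
weights, theta = winnow1(toy, epochs=3)
print(weights, theta)
```

On this toy data the four irrelevant features are zeroed by the first false positive and never recover, which is the winnowing behavior the variant is named for.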
What does setting a coefficient to zero do? Once a
weight is set to zero, it can never change again, because
every update is multiplicative. This allows the removal of
‘noisy’ features, or features that may indicate
non-inclusion in the class.
This reduction of the space of variables ‘winnows’ the
useful (positive) features from the rest of them.
Since all the variables are normalized, the algorithm
(in some sense) performs dimension reduction, ranks
variable importance, and produces a classifier.
D R O P U N I M P O R T A N T V A R I A B L E S ?
•Define two classes {0,1}, initialize all weights $w_i$ to 1,
and set a threshold $\theta$ (generally $n/2$) and a
learning rate $r$ (generally 2).
•For each data point $(x, y)$ do:
  •Check if: $\sum_{i=1}^{n} w_i x_i > \theta$
  •If true, and $y=1$, the prediction is correct.
  •If true, and $y=0$, then set $w_i = w_i / r$ for all $x_i > 0$.
  •If false, and $y=0$, the prediction is correct.
  •If false, and $y=1$, then set $w_i = r \cdot w_i$ for all $x_i > 0$.
•Return the weights for the linear classifier.
W I N N O W 2
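A matching sketch of Winnow2 follows. The only change from Winnow1 is that a false positive divides the active weights by the learning rate instead of zeroing them, so a demoted feature can still be promoted later. As before, the function name, the toy data, and the `epochs` parameter are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[int], int]


def winnow2(data: Sequence[Example], rate: float = 2.0,
            threshold: Optional[float] = None,
            epochs: int = 1) -> Tuple[List[float], float]:
    """Winnow2: promote by multiplying, demote by dividing."""
    n = len(data[0][0])
    theta = n / 2 if threshold is None else threshold
    weights = [1.0] * n
    for _ in range(epochs):
        for x, y in data:
            fires = sum(w * xi for w, xi in zip(weights, x)) > theta
            if fires and y == 0:
                # False positive: shrink, but do not eliminate, active features.
                weights = [w / rate if xi > 0 else w for w, xi in zip(weights, x)]
            elif not fires and y == 1:
                # False negative: promote every active feature.
                weights = [w * rate if xi > 0 else w for w, xi in zip(weights, x)]
    return weights, theta


# Toy data: the label is 1 exactly when feature 0 or feature 1 is present.
toy = [
    ([0, 0, 1, 1, 1, 1], 0),
    ([1, 0, 0, 0, 0, 0], 1),
    ([0, 1, 0, 0, 0, 0], 1),
    ([1, 0, 1, 0, 0, 0], 1),
    ([0, 0, 0, 0, 1, 0], 0),
]
weights2, theta2 = winnow2(toy, epochs=3)
print(weights2, theta2)
```

Here the irrelevant features keep small non-zero weights, and feature 2 is promoted again after its demotion because it co-occurs with a positive example. Winnow1 could not do that.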
The Pilosa demo database:
•Contains information related to taxi
cabs in New York City
•Has over 1.2 billion entries
•Has several thousand features (I did
not play with all of them)
•Has many discretized continuous variables
•Has two types of taxi: green (0) and
yellow (1), with only 45 million green
taxi data points in the entire set.
D E M O D A T A !
Two general approaches, and two Winnow
algorithms:
•Choose a set of (independent) features,
find the sub-population, and choose a
sample from it.
•Choose a set of (independent) features,
find the sub-population, and assign it
to be 0 or 1 based upon whether there
are more (weighted) 0s or 1s in it.
S T R A T E G I E S !
From playing with the Pilosa queries and
the results of the algorithms, we learned
that the dataset is very sparse in terms
of combinations of features. This, together
with the 27x oversampling of the green
taxis, leads to a fairly rigid separation
of the green from the yellow taxis, as the
yellow seem more widely distributed.
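The 27x factor is consistent with the data set sizes quoted for the demo data, assuming it is the ratio of all entries to green taxi entries:

```latex
\[
\frac{1.2 \times 10^{9}}{4.5 \times 10^{7}} \approx 26.7 \approx 27
\]
```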
P O S T - F A C T O O B S E R V A T I O N S
The literature suggests that a threshold
value of half the number of features
produces good values and convergence.
Experimentation with smaller samples (and
the known geometry) suggests that a
smaller threshold would have faster
initial convergence.
T H R E S H O L D
Experimentation suggests that finding a
set of features with a non-empty
sub-population is the biggest difficulty in
these approaches. Running the algorithms
on 1000 sub-populations took over
35 minutes, with most of that time taken
up by the many queries over the features.
T I M E B E N C H M A R K S
This was run on a small virtual machine that Pilosa gave me
access to; it did not take advantage of many of the
cloud computing resources available.
In particular, Pilosa does have a TensorFlow interface,
which would likely have sped up the computations considerably.
That being said, the difference between running the
algorithm and just finding the features was a few seconds.
A C A V E A T A B O U T T I M E …
1. See how much a TensorFlow
implementation would speed up
computation
2. Experiment with alternatives to
Winnow1 and Winnow2
3. Design a better feature sampling
method than uniform over each feature
4. Run the experiment on a different
dataset to check performance
F U T U R E S T E P S
QUESTIONS FOR
TOMORROW
TODAY
T O M O R R O W I N G T O D A Y ™
Marc Boudria & Dr. Drew Lipman
marc@hypergiant.com
drew@hypergiant.com

 
Data Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business ProcessesData Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business ProcessesFormulatedby
 
Data Science Salon: Deep Learning as a Product @ Scribd
Data Science Salon: Deep Learning as a Product @ ScribdData Science Salon: Deep Learning as a Product @ Scribd
Data Science Salon: Deep Learning as a Product @ ScribdFormulatedby
 
Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...
Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...
Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...Formulatedby
 
Data Science Salon: The Age of Co-creation
Data Science Salon: The Age of Co-creationData Science Salon: The Age of Co-creation
Data Science Salon: The Age of Co-creationFormulatedby
 

Más de Formulatedby (20)

Data Science Salon: Are you sure you're an ethical technologist?: Build your ...
Data Science Salon: Are you sure you're an ethical technologist?: Build your ...Data Science Salon: Are you sure you're an ethical technologist?: Build your ...
Data Science Salon: Are you sure you're an ethical technologist?: Build your ...
 
Data Science Salon: In your own words: computing customer similarity from tex...
Data Science Salon: In your own words: computing customer similarity from tex...Data Science Salon: In your own words: computing customer similarity from tex...
Data Science Salon: In your own words: computing customer similarity from tex...
 
Data Science Salon: nterpretable Predictive Models in the Healthcare Domain
Data Science Salon: nterpretable Predictive Models in the Healthcare DomainData Science Salon: nterpretable Predictive Models in the Healthcare Domain
Data Science Salon: nterpretable Predictive Models in the Healthcare Domain
 
Data Science Salon: Applications of Embeddings and Deep Learning at Groupon
Data Science Salon: Applications of Embeddings and Deep Learning at GrouponData Science Salon: Applications of Embeddings and Deep Learning at Groupon
Data Science Salon: Applications of Embeddings and Deep Learning at Groupon
 
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
Data Science Salon: Kaggle 1st Place in 30 minutes: Putting AutoML to Work wi...
 
Data Science Salon: Smart Cities
Data Science Salon: Smart Cities Data Science Salon: Smart Cities
Data Science Salon: Smart Cities
 
Data Science Salon: Building a Data Driven Product Mindset
Data Science Salon: Building a Data Driven Product MindsetData Science Salon: Building a Data Driven Product Mindset
Data Science Salon: Building a Data Driven Product Mindset
 
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use CaseData Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
 
Data Science Salon: Adopting Machine Learning to Drive Revenue and Market Share
Data Science Salon: Adopting Machine Learning to Drive Revenue and Market ShareData Science Salon: Adopting Machine Learning to Drive Revenue and Market Share
Data Science Salon: Adopting Machine Learning to Drive Revenue and Market Share
 
Data Science Salon: Data visualization and Analysis in the Florida Panthers H...
Data Science Salon: Data visualization and Analysis in the Florida Panthers H...Data Science Salon: Data visualization and Analysis in the Florida Panthers H...
Data Science Salon: Data visualization and Analysis in the Florida Panthers H...
 
Data Science Salon: Machine Learning for Personalized Cancer Vaccines
Data Science Salon: Machine Learning for Personalized Cancer VaccinesData Science Salon: Machine Learning for Personalized Cancer Vaccines
Data Science Salon: Machine Learning for Personalized Cancer Vaccines
 
Data Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science CultureData Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science Culture
 
Data Science Salon: Digital Transformation: The Data Science Catalyst
Data Science Salon: Digital Transformation: The Data Science CatalystData Science Salon: Digital Transformation: The Data Science Catalyst
Data Science Salon: Digital Transformation: The Data Science Catalyst
 
Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...
Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...
Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...
 
Data Science Salon: Enabling self-service predictive analytics at Bidtellect
Data Science Salon: Enabling self-service predictive analytics at BidtellectData Science Salon: Enabling self-service predictive analytics at Bidtellect
Data Science Salon: Enabling self-service predictive analytics at Bidtellect
 
Data Science Salon: MCL Clustering of Sparse Graphs
Data Science Salon: MCL Clustering of Sparse GraphsData Science Salon: MCL Clustering of Sparse Graphs
Data Science Salon: MCL Clustering of Sparse Graphs
 
Data Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business ProcessesData Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business Processes
 
Data Science Salon: Deep Learning as a Product @ Scribd
Data Science Salon: Deep Learning as a Product @ ScribdData Science Salon: Deep Learning as a Product @ Scribd
Data Science Salon: Deep Learning as a Product @ Scribd
 
Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...
Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...
Data Science Salon: Building smart AI: How Deep Learning Can Get You Into Dee...
 
Data Science Salon: The Age of Co-creation
Data Science Salon: The Age of Co-creationData Science Salon: The Age of Co-creation
Data Science Salon: The Age of Co-creation
 

Último

Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 

Último (20)

Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 

Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pilosa Index

  • 1. HYPERGIANT 2019
  • 2. HYPERGIANT2019|CONFIDENTIAL An Experiment on Data Science Algorithms Enabled by a Pilosa Index. Goal: in this workshop we will discuss:
      • Pilosa and what it does
      • How this might impact recommendation engines
      • How this might impact association schemes
      • The geometry of data in a Pilosa index
      • An experimental variant of the Winnow algorithms run on a Pilosa index
  • 3. Who We Are
  • 4. Experience + Intelligence
      • People: A bleeding-edge dream team with a deep understanding of the rich panoply of advancements available through MI.
      • Process: We blend strategy, design, and development capabilities to create experiences and define new capabilities leveraging Machine Intelligence.
      • Method: Our signature, tech-agnostic approach balances the utilitarian and the evolutionary.
  • 5. Our Services. DIV. 0001: SPACE AGE SOLUTIONS / HYPERGIANT - 2019
      • 01: Strategy
      • 02: Design
      • 03: Applied Sciences
      • 04: Development
    We are made up of digital product strategists, data scientists, machine-learning-focused engineers, creative technologists, user experience designers, and developers for all endpoints. Strategy + Design + Applied Science + Delivery > Technology-Agnostic Artificial Intelligence.
  • 6. Hypergiant Methodology. HYPERGIANT2018|CONFIDENTIAL Output: User Experience, Machine Intelligence
  • 7. Realize that everything connects to everything else: users, business, data. The traditional design model, wherein one weighs the user value and the business value of a given feature, is outdated. It has been replaced with a framework in which one weighs user value, business value, and data value. If choices are not made that respect the value of each, the result will be unsatisfactory to one group.
  • 8. What We Believe: Hypergiant + Pilosa
  • 9. What We Believe. Enter a distributed binary index: Pilosa.
      • We see Pilosa as an important technology for a more extensible future.
      • We see it as a potential solution for connecting the quagmire of enterprise datasets into the meaningful data puddles that are required to drive more fluid data science mechanics.
      • We see it as a competitive advantage in dealing with the cost of real-time data access.
    Pilosa changes the dialog around large data sets, both static and in motion.
  • 10. Pilosa: Concept
  • 11. Pilosa: Concept
    What is Pilosa?
      • “an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets.”
    What does this mean?
      • Data lakes are a problem, especially during the initial exploratory statistical analysis of a dataset, when even finding values can be slow and tedious.
      • Pilosa allows queries to be run over the entire dataset quickly.
      • Example: 1.2B data points, 8 features, 0.07 seconds.
  • 12. Pilosa: Concept
    How does it do this?
      • Pilosa focuses on “relationships between objects and storing those relationships in bitmaps.”
      • That is, it is focused on data features.
    What does this mean for us?
      • A data scientist can quickly search the entire data set for a combination of features, find the data points with those features, count them, and so on.
  • 13. Pilosa: Concept
    How do we think of it?
      • Pilosa is a bitmap index. At its heart, it is a Boolean vector over the features of each data point.
    What can the index do for us?
      • Initial multiway explorations can be done quickly (if the data is already in an index).
      • Combination features can be built and tested quickly.
      • Balanced data sets can be built quickly.
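To make the "Boolean vector per feature" idea concrete, here is a minimal in-memory sketch. It is not Pilosa's API or storage format: the class `TinyBitmapIndex` and its method names are hypothetical, and each feature's bitmap is simply a Python integer whose bit `i` is set when data point `i` has that feature.

```python
from collections import defaultdict


class TinyBitmapIndex:
    """Toy bitmap index (an illustration only, not Pilosa's API)."""

    def __init__(self):
        # feature name -> integer used as a bitset over data point ids
        self.rows = defaultdict(int)

    def set_bit(self, feature, point_id):
        """Record that data point `point_id` has `feature`."""
        self.rows[feature] |= 1 << point_id

    def intersect(self, *features):
        """Bitmap of the data points that have every listed feature."""
        if not features:
            return 0
        result = self.rows.get(features[0], 0)
        for feature in features[1:]:
            result &= self.rows.get(feature, 0)
        return result

    @staticmethod
    def count(bitmap):
        """Number of data points in a bitmap."""
        return bin(bitmap).count("1")

    @staticmethod
    def ids(bitmap):
        """Ids of the data points in a bitmap."""
        return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


index = TinyBitmapIndex()
for point_id in (0, 1, 2):
    index.set_bit("yellow", point_id)
for point_id in (1, 2, 3):
    index.set_bit("cash", point_id)

both = index.intersect("yellow", "cash")
print(index.count(both), index.ids(both))
```

Finding a combination of features is a bitwise AND and counting is a population count, which is why this kind of query can stay fast over a very large number of data points.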
  • 14. Pilosa: Concept
    What else could we do with it?
      • The index can be treated as a dataset in itself.
      • It is a data set built over binary features.
    What algorithms run on this?
      • Recommendation engines
      • Association rule learning
      • Winnow algorithms
      • (Others)
  • 15. Recommendation Engines & Association Rule Learning
  • 16. Recommendation Engines: Deep Belief Networks. A Deep Belief Network (DBN) is made of layers of Restricted Boltzmann Machines (RBMs). An RBM has two parts, a visible part and a hidden part. Data bounces back and forth between the visible and the hidden parts, is approximated probabilistically, and is then used to update the probability distributions.
  • 17. Recommendation Engines. “A [Recommendation Engine] is a subclass of information filtering system[s] that seeks to predict the ‘rating’ or ‘preference’ a user would give to an item.” (Wikipedia) The key idea is that they do not need to be trained on complete data.
  • 18. Recommendation Engines: Representations!
      • Two features can work together:
          • Did the user watch the film? (Yes/No)
          • Did the user give a positive review? (Yes/No)
      • In this setting, (No, __) represents an incomplete data point with no known value for the review: [0, *]
      • Similarly, a richer ranking can be used:
          • Did the user watch the film? (Yes/No)
          • Did the user give an n-star review? (Yes/No, one feature for each n)
      • In this setting, (Yes, No, No, Yes, No, No) is a 3-star review for a movie that was watched: [1,0,0,1,0,0]
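The binary representation above can be sketched as a small encoder. The function name `encode_rating`, the five-star scale, and the use of `None` for the unknown `*` entries are assumptions made for illustration; the slide only fixes the layout of one "watched" bit followed by one bit per star level.

```python
def encode_rating(watched, stars=None, max_stars=5):
    """Encode one (user, film) pair as [watched, 1-star, ..., max_stars-star].

    Unknown entries are marked with None, standing in for the '*' on the slide.
    Treating a watched-but-unrated film as unknown is an assumption.
    """
    if not watched or stars is None:
        return [1 if watched else 0] + [None] * max_stars
    if not 1 <= stars <= max_stars:
        raise ValueError("stars must be between 1 and max_stars")
    vector = [1] + [0] * max_stars
    vector[stars] = 1  # position n holds the n-star bit
    return vector


print(encode_rating(True, 3))
print(encode_rating(False))
```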
  • 19. Recommendation Engines: recommendation, meet reality. Quality Assurance (QA) for recommendation engines, and for Machine Learning in general, is hard. There is a general lack of QA tools for ML, and a lack of knowledge about what types of errors occur and what they look like. Using a DBN recommendation engine, we can build out a probability distribution for the population based upon a set of features, and then query quickly across the population to see which predictions differ from the population proportion over the remaining features.
  • 20. Association Rule Learning: What is association rule learning? “Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.” (Wikipedia)
  • 21. Association Rule Learning: Huh? Following the original definition, the problem of association rule mining can be defined as follows:
      • $I = \{i_1, i_2, \dots, i_n\}$ is a set of n binary attributes, called items in the literature; for us, features.
      • $D = \{t_1, t_2, \dots, t_m\}$ is the database, or set of data points.
      • A rule is given as “X implies Y”, where X and Y are sets of features.
  • 22. Association Rule Learning: Useful ideas. As always, there are useful metrics:
      • Supp(X): the proportion of the data which contains all of X.
      • Conf(Y|X): Supp(X and Y)/Supp(X), the proportion of the data containing X which also contains Y.
      • Lift(X,Y): Supp(X and Y)/(Supp(X)·Supp(Y)), a measurement of independence between X and Y; a value of 1 means X and Y are independent.
      • Conviction(Y|X): (1 − Supp(Y))/(1 − Conf(Y|X)), the ratio of the frequency with which X would occur without Y if the two were independent to the frequency with which X is observed without Y.
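A minimal sketch of these four metrics, using the standard textbook definitions and plain Python sets rather than an index. The function names and the toy transactions are illustrative assumptions, not data from the talk.

```python
def support(transactions, items):
    """Proportion of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, x, y):
    """Conf(Y|X) = Supp(X and Y) / Supp(X). Assumes Supp(X) > 0."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


def lift(transactions, x, y):
    """Lift(X,Y) = Supp(X and Y) / (Supp(X) * Supp(Y)); 1 means independent."""
    joint = support(transactions, set(x) | set(y))
    return joint / (support(transactions, x) * support(transactions, y))


def conviction(transactions, x, y):
    """Conviction(Y|X) = (1 - Supp(Y)) / (1 - Conf(Y|X))."""
    conf = confidence(transactions, x, y)
    if conf == 1:
        return float("inf")  # the rule is never violated in this data
    return (1 - support(transactions, y)) / (1 - conf)


# Toy data: each transaction is the set of features present in one data point.
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer"},
    {"diapers"},
    {"beer"},
    {"milk"},
]
x, y = {"diapers"}, {"beer"}
print(support(transactions, x), confidence(transactions, x, y))
print(lift(transactions, x, y), conviction(transactions, x, y))
```

On an index, each call to `support` would correspond to one intersect-and-count query rather than a scan over the transactions.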
  • 23. Association Rule Learning: Example! The (long-storied) example of a learned association rule is the “beer and diapers” rule for shopping between 5:00pm and 7:00pm. This rule could then be stated as “[5:00-7:00] & [diapers] implies [beer]”, or similar, dependent upon the confidence.
  • 24. Association Rule Learning: Downsides… Association rule learning has downsides: the number of potential rules grows exponentially with the size of the feature set, and most definitions of ‘interesting’ rules require a large sampling over the dataset. These problems can be reduced through the use of multiple queries over a Pilosa index related to feature pairs.
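One way to read "multiple queries related to feature pairs" is to compute the support of every pair of features with one intersect-and-count per pair, and keep only the frequent pairs as candidates for rules. The sketch below assumes that reading. It uses Python integers as stand-in bitmaps rather than a real Pilosa client, and `frequent_pairs` and the toy rows are hypothetical.

```python
from itertools import combinations


def frequent_pairs(rows, n_points, min_support):
    """Return {(feature_a, feature_b): support} for pairs meeting min_support.

    `rows` maps a feature name to an integer bitset over data point ids.
    """
    result = {}
    for a, b in combinations(sorted(rows), 2):
        # One AND plus one population count per pair of features.
        pair_support = bin(rows[a] & rows[b]).count("1") / n_points
        if pair_support >= min_support:
            result[(a, b)] = pair_support
    return result


rows = {"beer": 0b0111, "diapers": 0b0110, "milk": 0b1000}
pairs = frequent_pairs(rows, n_points=4, min_support=0.5)
print(pairs)
```

The number of pairs grows quadratically with the number of features, which is far smaller than the exponential number of candidate rules, and larger feature sets only need to be explored where their pairs are already frequent.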
  • 25. Winnow Algorithm
  • 26. Winnow Algorithm: Geometry! What does the geometry of a discretized dataset in a Pilosa layer look like? There are discrete features and discretized continuous features. These give it a geometry that looks like: $H^{n_0} \times S^{n_1} \times \dots \times S^{n_m}$
  • 27. Winnow Algorithm: What does this mean? Hypercubes and simplices both behave well with respect to hyperplane separators. Note that this implies there is good reason to believe that a linear separator between two classes, or several one-vs-many linear separators, will behave well when treating the index itself as a dataset.
  • 28. Winnow Algorithm: Linear separators. There are many classification algorithms that find a linear separator between the classes:
      • SVM with a linear kernel
      • Perceptron
      • Winnow
  • 29. Winnow Algorithm. There are several versions of the Winnow algorithm, which differ mainly in how they treat the ‘other’ class. They differ from perceptron algorithms in that they are generally updated multiplicatively rather than additively, and can only be used on binary data.
  • 30. Winnow Algorithm: Winnow1
      • Define two classes {0,1}, initialize the weights $w_i$ to all ones, and set a threshold value $\theta$ (generally $n/2$) and a learning rate $r$ (generally 2).
      • For each data point $(x, y)$, check if $\sum_{i=1}^{n} w_i x_i > \theta$:
          • If true and $y=1$, the prediction is correct.
          • If true and $y=0$, then set $w_i = 0$ for all $x_i > 0$.
          • If false and $y=0$, the prediction is correct.
          • If false and $y=1$, then set $w_i = r \cdot w_i$ for all $x_i > 0$.
      • Return the weights for the linear classifier.
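The Winnow1 steps can be sketched in plain Python as follows. This is a single pass over in-memory data, not the experimental variant run against the index; the function names `winnow1` and `winnow1_predict` and the toy data are illustrative assumptions.

```python
def winnow1(data, n_features, threshold=None, rate=2.0):
    """One pass of Winnow1 over `data`, a list of (x, y) pairs.

    x is a list of 0/1 feature values and y is the class label, 0 or 1.
    Returns the learned weights and the threshold that was used.
    """
    theta = n_features / 2 if threshold is None else threshold
    weights = [1.0] * n_features
    for x, y in data:
        predicted = sum(w * xi for w, xi in zip(weights, x)) > theta
        if predicted and y == 0:
            # False positive: eliminate every active feature for good.
            weights = [0.0 if xi > 0 else w for w, xi in zip(weights, x)]
        elif not predicted and y == 1:
            # False negative: promote every active feature.
            weights = [rate * w if xi > 0 else w for w, xi in zip(weights, x)]
    return weights, theta


def winnow1_predict(weights, theta, x):
    """Classify x with the learned linear separator."""
    return int(sum(w * xi for w, xi in zip(weights, x)) > theta)


# Toy data in which the label is simply the first feature.
data = [
    ([1, 0, 0, 0], 1),
    ([1, 0, 0, 0], 1),
    ([0, 1, 1, 1], 0),
]
weights, theta = winnow1(data, n_features=4)
print(weights, theta)
```

On this toy data the first feature is promoted twice and the other three are eliminated by the single false positive.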
  • 31. Winnow Algorithm: Drop unimportant variables? What does setting a coefficient to zero do? Once a weight is set to zero, it cannot be changed! This allows the removal of ‘noisy’ features, or of features that may indicate non-inclusion in the class. This reduction of the space of variables ‘winnows’ the useful (positive) features from the rest of them. Since all the variables are normalized, this means the algorithm does (in some sense) dimension reduction, variable importance, and produces a classifier.
  • 32. Winnow Algorithm: Winnow2
      • Define two classes {0,1}, initialize the weights $w_i$ to all ones, and set a threshold value $\theta$ (generally $n/2$) and a learning rate $r$ (generally 2).
      • For each data point $(x, y)$, check if $\sum_{i=1}^{n} w_i x_i > \theta$:
          • If true and $y=1$, the prediction is correct.
          • If true and $y=0$, then set $w_i = w_i / r$ for all $x_i > 0$.
          • If false and $y=0$, the prediction is correct.
          • If false and $y=1$, then set $w_i = r \cdot w_i$ for all $x_i > 0$.
      • Return the weights for the linear classifier.
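A matching sketch of Winnow2, which differs from Winnow1 only in the false-positive case: active weights are divided by the learning rate instead of being set to zero, so a demoted feature can recover later. The names `winnow2` and `winnow2_predict` and the toy data are illustrative assumptions.

```python
def winnow2(data, n_features, threshold=None, rate=2.0):
    """One pass of Winnow2 over `data`, a list of (x, y) pairs.

    x is a list of 0/1 feature values and y is the class label, 0 or 1.
    Returns the learned weights and the threshold that was used.
    """
    theta = n_features / 2 if threshold is None else threshold
    weights = [1.0] * n_features
    for x, y in data:
        predicted = sum(w * xi for w, xi in zip(weights, x)) > theta
        if predicted and y == 0:
            # False positive: demote every active feature.
            weights = [w / rate if xi > 0 else w for w, xi in zip(weights, x)]
        elif not predicted and y == 1:
            # False negative: promote every active feature.
            weights = [rate * w if xi > 0 else w for w, xi in zip(weights, x)]
    return weights, theta


def winnow2_predict(weights, theta, x):
    """Classify x with the learned linear separator."""
    return int(sum(w * xi for w, xi in zip(weights, x)) > theta)


# Toy data in which the label is simply the first feature.
data = [
    ([1, 0, 0, 0], 1),
    ([1, 0, 0, 0], 1),
    ([0, 1, 1, 1], 0),
]
weights, theta = winnow2(data, n_features=4)
print(weights, theta)
```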
  • 33. Winnow Algorithm: Demo data! The Pilosa demo database:
      • Contains information related to taxi cabs in New York City
      • Has over 1.2 billion entries
      • Has several thousand features (I did not play with all of them)
      • Has many discretized continuous variables
      • Has two types of taxi: green (0) and yellow (1), with only 45 million green taxi data points in the entire set
  • 34. Winnow Algorithm: Strategies! Two general approaches, and two Winnow algorithms:
      • Choose a set of (independent) features, find the sub-population, and choose a sample from it.
      • Choose a set of (independent) features, find the sub-population, and assign it the label 0 or 1 according to whether it contains more (weighted) 0s or 1s.
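The two strategies can be sketched over a toy index as follows. Everything here is an assumption made for illustration: features are stored as Python sets of data point ids, the helper names are hypothetical, and the class weights stand in for whatever weighting the experiment used to offset the class imbalance.

```python
import random


def subpopulation(feature_rows, features):
    """Ids of the data points that have every chosen feature."""
    ids = None
    for feature in features:
        row = feature_rows.get(feature, set())
        ids = set(row) if ids is None else ids & row
    return ids or set()


def sample_point(ids, rng):
    """Strategy 1: pick one data point from the sub-population."""
    return rng.choice(sorted(ids))


def majority_label(ids, positive_ids, weight_pos=1.0, weight_neg=1.0):
    """Strategy 2: label the whole sub-population by weighted majority."""
    positives = len(ids & positive_ids)
    negatives = len(ids) - positives
    return 1 if positives * weight_pos > negatives * weight_neg else 0


feature_rows = {"cash": {0, 1, 2, 3}, "night": {2, 3, 4}, "airport": {5}}
yellow = {0, 2, 3}  # ids of the class-1 data points

ids = subpopulation(feature_rows, ["cash", "night"])
rng = random.Random(0)
print(ids, sample_point(ids, rng), majority_label(ids, yellow))
print(subpopulation(feature_rows, ["cash", "airport"]))
```

The last call returns an empty set, which is the sparse case: a randomly chosen combination of features may match no data points at all, so another combination has to be drawn.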
  • 35. Post-facto observations. From playing with the Pilosa queries and the results of the algorithms, we learned that the dataset is very sparse in terms of combinations of features. This, together with the 27x oversampling of the green taxis, leads to a fairly rigid separation of the green taxis from the yellow ones, as the yellow taxis seem more widely distributed.
  • 36. Threshold. The literature suggests that a threshold of half the number of features gives good results and convergence. Experimentation with smaller samples (and the known geometry) suggests that a smaller threshold would converge faster initially.
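One way to probe the threshold observation is to count how many mistakes Winnow1 makes on the same stream of examples for different values of the threshold. The sketch below is a hypothetical harness, not the experiment described on the slide. The function name `count_mistakes` and the toy streams are assumptions, and it demonstrates nothing beyond the toy data it is run on.

```python
def count_mistakes(data, n, theta, r=2.0):
    """Run one Winnow1 pass and return how many predictions were wrong."""
    w = [1.0] * n
    mistakes = 0
    for x, y in data:
        fires = sum(wi * xi for wi, xi in zip(w, x)) > theta
        if fires == bool(y):
            continue
        mistakes += 1
        if y == 1:
            # False negative: promote active features.
            w = [wi * r if xi > 0 else wi for wi, xi in zip(w, x)]
        else:
            # False positive: eliminate active features.
            w = [0.0 if xi > 0 else wi for wi, xi in zip(w, x)]
    return mistakes


# Toy stream of positive examples, made up for illustration.
positives = [([1, 0, 0, 0], 1)] * 3
for theta in (2.0, 1.0, 0.5):
    print(theta, count_mistakes(positives, n=4, theta=theta))
```

On a stream of sparse positive examples a lower threshold needs fewer promotions before the classifier fires. The trade-off is that a lower threshold also fires more readily on negative examples, so any comparison should use a stream containing both classes.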
  • 37. Time benchmarks. Experimentation suggests that finding a set of features with a non-empty sub-population is the biggest difficulty in these approaches. Running the algorithms on 1000 sub-populations took over 35 minutes, with most of that time spent on the many queries over the features.
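The cost described on slide 37 comes from retrying feature sets until one has a non-empty sub-population. A minimal sketch of such a retry loop follows, again using a dict of sets as a stand-in for the index rather than real Pilosa queries. The function name `find_nonempty`, the `max_tries` cap, and the choice to sample feature names uniformly are assumptions for illustration.

```python
import random


def find_nonempty(index, k, rng, max_tries=1000):
    """Draw k features uniformly until their intersection is non-empty.

    Returns (features, record_ids, tries), or (None, set(), max_tries)
    if every attempt produced an empty sub-population.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    names = sorted(index)
    for tries in range(1, max_tries + 1):
        chosen = rng.sample(names, k)
        ids = set.intersection(*(index[f] for f in chosen))
        if ids:
            return chosen, ids, tries
    return None, set(), max_tries


# Toy index, made up for illustration: only "a" and "b" overlap.
index = {"a": {1, 2}, "b": {2, 3}, "c": {7}}
print(find_nonempty(index, k=2, rng=random.Random(0)))
```

Each retry corresponds to one or more index queries, so in a sparse dataset the `tries` count is a rough proxy for where the time goes.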
  • 41. A caveat about time… This was run on a small virtual machine that Pilosa gave me access to, and it did not take advantage of many of the cloud computing resources available. In particular, Pilosa does have a TensorFlow interface, which would have dramatically improved the computations. That being said, the difference between running the algorithm and just finding the features was a few seconds.
  • 42. Future steps
    1. See how much a TensorFlow implementation would speed up computation.
    2. Experiment with alternatives to Winnow1 and Winnow2.
    3. Design a better feature sampling method than uniform over each feature.
    4. Run the experiment on a different dataset to check performance.
  • 44. TOMORROWING TODAY™ Marc Boudria & Dr. Drew Lipman, marc@hypergiant.com, drew@hypergiant.com