Skytree big data london meetup - may 2013
SAME DATA.
BETTER RESULTS.
PAUL SALAZAR
PAUL@SKYTREE.NET
SKYTREE’S FOCUS
"PRODUCTION GRADE" MACHINE LEARNING
Machine learning: the modern science of finding patterns and making predictions from data.
Also known as: multivariate statistics, data mining, pattern recognition, or advanced/predictive analytics.
Machine Learning Use Cases
•  Predict categories and classes: Classification
•  Predict values and numbers: Regression
•  Grouping and segmentation: Clustering
•  Detection and characterization: Density Estimation
•  Visualization and reduction: Dimension Reduction
•  Find similar items: Multidimensional Querying
Example Skytree Algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
(Figure labels: Recommendations, Predictions, Outlier Detection)
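K-means, one of the example algorithms listed above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a generic textbook version of Lloyd's algorithm in plain Python, not Skytree's implementation; the function name `kmeans`, the first-k initialisation and the iteration cap are illustrative assumptions. Each iteration costs O(N·k) distance evaluations.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: plain Lloyd's k-means with first-k initialisation."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]  # deterministic initialisation
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither labels nor centroids changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

On the four points (0, 0), (0, 1), (10, 10), (10, 11) with k = 2, this returns labels [0, 0, 1, 1] and centroids (0.0, 0.5) and (10.0, 10.5).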
What are the current options for ML on big data?
1.  Just use a subset of the data!
–  e.g. just take the first 1,000 rows. Result to expect: only the broadest patterns are captured → lower accuracy.
2.  Just use a simple ML method!
–  e.g. use logistic regression instead of a nonlinear SVM. Result to expect: entire types of patterns cannot be found → lower accuracy.
3.  Just use simple parallelism/MapReduce!
–  i.e. replace all the for-loops with parallel ones. Result to expect: only the simplest ML methods (not the O(N²)/O(N³) ones) can be significantly sped up this way → see #2.
4.  Just throw it in the cloud!
–  i.e. somehow use the large compute power of the cloud. Result to expect: the cost of sending the data to the cloud is even greater than the compute cost → see #1; see also #3.
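The arithmetic behind option 3 can be made concrete. The sketch below assumes an idealised, communication-free model in which an all-pairs (O(N²)) method's work is split perfectly evenly across workers; the function names are hypothetical and real systems add communication and coordination costs on top.

```python
def pairwise_ops(n: int) -> int:
    """Distinct pairs an all-pairs (O(N^2)) method must examine."""
    return n * (n - 1) // 2


def ideal_parallel_ops(n: int, workers: int) -> float:
    """Per-worker operations under perfect, communication-free parallelism."""
    return pairwise_ops(n) / workers


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        # 10x more data means ~100x more pairs; 10x more workers only divides by 10.
        print(f"N={n:>9,}  pairs={pairwise_ops(n):.2e}  "
              f"per worker (1,000 workers)={ideal_parallel_ops(n, 1_000):.2e}")
```

Ten times more data means roughly a hundred times more pairs, so even under this optimistic model the worker count has to grow quadratically with N just to hold the wall-clock time constant.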
Skytree’s Unique Differentiation: Fundamental Technology Breakthrough
Complexity of state-of-the-art machine learning methods:
1.  Querying: all-nearest-neighbors O(N²)
2.  Density estimation: kernel density estimation O(N²), kernel conditional density estimation O(N³)
3.  Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N²), kernel discriminant O(N²), support vector machine O(N³)
4.  Regression: linear regression, LASSO, kernel regression O(N²), regression tree, Gaussian process regression O(N³)
5.  Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N³), maximum variance unfolding O(N³); Gaussian graphical models, discrete graphical models
6.  Clustering: k-means, mean-shift O(N²), hierarchical clustering O(N³)
7.  Testing and matching: MST O(N³), bipartite cross-matching O(N³), n-point correlation 2-sample testing O(Nⁿ), n = 2, 3, 4, …
►  Unfortunately, O(N²) and O(N³) are computationally prohibitive for big data.
Skytree has invented a way to reduce the complexity of the above methods from O(N²) and O(N³) to O(N) or O(N log N).
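All-nearest-neighbors (item 1) shows the gap most directly. The sketch below contrasts the naive O(N²) double loop with a classic k-d tree search that prunes whole regions of space. The k-d tree is a textbook space-partitioning technique swapped in for illustration; it is not Skytree's method, which the slides do not describe. The function names are hypothetical, and the k-d tree's speedup is typical for low-dimensional data rather than guaranteed.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def all_nn_brute(points: Sequence[Point]) -> List[int]:
    """O(N^2): compare every point against every other point."""
    result = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                if d < best_d:
                    best_j, best_d = j, d
        result.append(best_j)
    return result


class KDNode:
    """One k-d tree node: a point index plus the axis it splits on."""

    def __init__(self, idx: int, axis: int, left: "Optional[KDNode]", right: "Optional[KDNode]"):
        self.idx, self.axis, self.left, self.right = idx, axis, left, right


def build_kdtree(points: Sequence[Point], indices: List[int], depth: int = 0) -> Optional[KDNode]:
    """Split on the median along alternating axes (re-sorting per level: O(N log^2 N))."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(
        indices[mid],
        axis,
        build_kdtree(points, indices[:mid], depth + 1),      # coordinates <= median
        build_kdtree(points, indices[mid + 1:], depth + 1),  # coordinates >= median
    )


def _nearest(node: Optional[KDNode], points: Sequence[Point], qi: int, best: list) -> None:
    """Branch-and-bound search; `best` is an [index, distance] pair updated in place."""
    if node is None:
        return
    p, q = points[node.idx], points[qi]
    if node.idx != qi:
        d = math.dist(p, q)
        if d < best[1]:
            best[0], best[1] = node.idx, d
    diff = q[node.axis] - p[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    _nearest(near, points, qi, best)
    # Every point on the far side is at least |diff| away, so skip it unless it could win.
    if abs(diff) < best[1]:
        _nearest(far, points, qi, best)


def all_nn_kdtree(points: Sequence[Point]) -> List[int]:
    """Nearest neighbour of every point via one tree built once and queried N times."""
    root = build_kdtree(points, list(range(len(points))))
    result = []
    for i in range(len(points)):
        best = [-1, math.inf]
        _nearest(root, points, i, best)
        result.append(best[0])
    return result
```

For the points (0, 0), (1, 0), (5, 5), (5, 6), both functions return [1, 0, 3, 2]; the tree version simply skips most of the pairwise comparisons.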
Performance
Up to 10,000x speedups (on one CPU)
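As a rough illustration of how large the gap between complexity classes can grow (not a reproduction of the benchmark behind the 10,000x figure, which the slides do not describe), the ratio of N² to N log₂ N operation counts is N / log₂ N. The hypothetical helper below evaluates it while ignoring constant factors, which in practice can dominate.

```python
import math


def op_count_ratio(n: int) -> float:
    """Ratio of N^2 to N*log2(N) operation counts, ignoring constant factors."""
    if n < 2:
        raise ValueError("n must be at least 2 so that log2(n) > 0")
    return (n * n) / (n * math.log2(n))
```

The function returns about 4,096 for N = 2¹⁶ = 65,536 and about 52,429 for N = 2²⁰ (roughly one million), so the asymptotic gap alone grows well past four orders of magnitude at that scale.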
How Does Skytree Do This?
•  Deep knowledge of algorithms: drawing on the latest from academia
•  Smart programming: efficient ways to compute O(N²) and O(N³) methods
•  Distributed systems: taking advantage of parallel computing speed
Team
EXECUTIVE TEAM
•  Martin Hack, CEO & Co-Founder: Sun, GreenBorder (Google)
•  Alexander Gray, PhD, CTO & Co-Founder: leading light for large-scale, fast algorithms
•  Paul Salazar, VP Sales: Red Hat, Greenplum
•  Leland Wilkinson, PhD, VP Data Visualization: creator of SYSTAT (SPSS/IBM)
•  Tim Marsland, PhD, VP Engineering: Sun Fellow, CTO Software, Apple, Oracle
BOARD OF DIRECTORS
•  Rick Lewis, USVP
•  Noah Doyle, Javelin Venture Partners
•  David Toth, Founder and CEO NetRatings (Nielsen)
TECH ADVISORY BOARD
•  Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
•  Prof. David Patterson, UC Berkeley: systems (inventor of RISC, RAID)
•  Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
•  Prof. James Demmel, UC Berkeley: high-performance computing
INVESTORS
•  USVP, Javelin Venture Partners, Scott McNealy, UPS
Product Overview
Skytree Adviser for Desktop: Data Science for Everyone
Skytree Server for Enterprises: Enterprise Machine Learning
Advanced Analytics:
•  Predict Categories/Classes
•  Detect Anomalies
•  Find Trends
•  Predict Values/Numbers
•  Identify Patterns
•  Find Outliers
Thank you for learning about Skytree
Read more at www.skytree.net
•  We’re hiring: check out our careers page.
•  Download Skytree Adviser for free.
•  Pick up a T-shirt.

More Related Content

What's hot (18)

Data clustering using map reduce (Varad Meru)
Deep learning with Tensorflow in R (mikaelhuss)
Research Papers Recommender based on Digital Repositories Metadata (Ricard de la Vega)
NBITSearch. Features.
Object multifunctional indexing with an open API (akvalex)
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud (Ansgar Scherp)
Introduction to Data streaming - 05/12/2014 (Raja Chiky)
Distributed machine learning (Stanley Wang)
PyTables
Large Data Analyze With PyTables
Data Wrangling and Visualization Using Python (MOHITKUMAR1379)
RasterFrames + STAC (Simeon Fitch)
Slide 1 (butest)
Similar image search (aliaishang)
Current clustering techniques (Poonam Kshirsagar)
Big dataanalyticsbeyondhadoop public_20_june_2013
Distributed Decision Tree Learning for Mining Big Data Streams (Arinto Murdopo)
Big Data Analytics: From SQL to Machine Learning and Graph Analysis (Yuanyuan Tian)

Similar to Skytree big data london meetup - may 2013 (20)

Machine learning for_finance (Stefan Duprey)
Python for data science (botsplash.com)
Machine Learning ICS 273A (butest)
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ... (Databricks)
Machine Learning and Real-World Applications (MachinePulse)
LR2. Summary Day 2
Neural Networks: Principal Component Analysis (PCA) (Mostafa G. M. Mostafa)
. An introduction to machine learning and probabilistic ... (butest)
Data Profiling in Apache Calcite (Julian Hyde)
about data mining and Exp about data mining and Exp. (MohammadMoreb)
Deep learning from a novice perspective (Anirban Santara)
Mat189: Cluster Analysis with NBA Sports Data (KathleneNgo)
Machine Learning Foundations for Professional Managers (Albert Y. C. Chen)
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES (Vikash Kumar)
Welcome to International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
Machine Learning with JavaScript (Ivo Andreev)
Throttling Malware Families in 2D (Mohamed Nassar)
Fr pca lda (ultraraj)
Feature Engineering - Getting most out of data for predictive models - TDC 2017 (Gabriel Moreira)
Deep Learning for Search (Bhaskar Mitra)

