Combining the Neo4j graph database with WEKA: a basic “toy” example drawn from mining SEC Form D filings
Experiment: Find the intersection among VC firms related to Google and its latest acquisitions (i.e., the “Dataset”), and play with “predicting” the chance of a newly funded startup being acquired by Google by examining its proximity to Google in the graph.
Weka: Machine learning toolkit containing classification and clustering algorithms. Used here for creating recommendations based on input.
Neo4j: Graph database. Well suited to social network data. Used here for finding the “shortest path” between two nodes.
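To make the “shortest path” idea concrete, here is a minimal plain-Java sketch of a breadth-first search over an unweighted, undirected graph. It is an illustration only and does not use Neo4j's API. The class name ShortestPathSketch and the node names StartupA and StartupB are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Plain-Java illustration of an unweighted shortest path; not Neo4j's API. */
class ShortestPathSketch {

    // Undirected adjacency list: node name -> names of neighbouring nodes.
    private final Map<String, List<String>> adjacency = new HashMap<>();

    /** Adds an undirected edge between two named nodes. */
    void connect(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Returns the nodes on a shortest path from start to goal, or an empty list if none exists. */
    List<String> shortestPath(String start, String goal) {
        // cameFrom doubles as the visited set; the start node maps to null.
        Map<String, String> cameFrom = new HashMap<>();
        Queue<String> frontier = new ArrayDeque<>();
        cameFrom.put(start, null);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String current = frontier.remove();
            if (current.equals(goal)) {
                // Walk the predecessor links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String n = goal; n != null; n = cameFrom.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    frontier.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

The number of hops on the returned path (its size minus one) is one possible proximity measure that could feed the “path” attribute used later in the Weka part.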
Neo4j can handle large sets of unstructured linked data:
RDF: Subject - Property - Object
Neo4j: Node 1 - Relationship - Node 2
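The correspondence between a triple and a node-relationship-node edge can be sketched in plain Java as follows. This is only an illustration of the data shape, not Neo4j or RDF library code. The names TripleGraphSketch, Triple, and outgoingEdges are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of mapping subject-property-object triples onto graph edges. */
class TripleGraphSketch {

    /** One statement, e.g. subject "Sequoia Capital", property "FUNDED", object "Google". */
    static final class Triple {
        final String subject;
        final String property;
        final String object;

        Triple(String subject, String property, String object) {
            this.subject = subject;
            this.property = property;
            this.object = object;
        }
    }

    /**
     * Groups triples by subject: each subject becomes a node whose outgoing
     * edges are rendered as "RELATIONSHIP -> target node".
     */
    static Map<String, List<String>> outgoingEdges(List<Triple> triples) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        for (Triple t : triples) {
            edges.computeIfAbsent(t.subject, k -> new ArrayList<>())
                 .add(t.property + " -> " + t.object);
        }
        return edges;
    }
}
```

Under this mapping, the statement “Sequoia Capital funded Google” becomes a single FUNDED edge from the Sequoia Capital node to the Google node.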
Statement: “Sequoia Capital funded Google”
Initialize the database:
GraphDatabaseService graphDb = new EmbeddedGraphDatabase("SEC");
IndexService index = new LuceneIndexService(graphDb);
// In Neo4j 1.x, the writes below must run inside a transaction (graphDb.beginTx()).
Create the nodes:
Node sequoia = graphDb.createNode();
sequoia.setProperty("name", "Sequoia Capital");
Node google = graphDb.createNode();
google.setProperty("name", "Google");
index.index(sequoia, "name", "Sequoia Capital");
Create the relationship:
Relationship rel = sequoia.createRelationshipTo(google, InvestorRelationTypes.FUNDED);
Traverser traverser = node.traverse(
    Order.DEPTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    new ReturnableEvaluator() {
        public boolean isReturnableNode(TraversalPosition currentPosition) {
            // The start node has no last relationship, so guard against null.
            Relationship last = currentPosition.lastRelationshipTraversed();
            return last != null && last.isType(InvestorRelationTypes.FUNDED);
        }
    },
    InvestorRelationTypes.BOARD, Direction.INCOMING,
    InvestorRelationTypes.FUNDED, Direction.INCOMING,
    InvestorRelationTypes.ACQUIRED, Direction.OUTGOING);
return traverser.getAllNodes();
“Path to Google:”
Weka: create attributes (table input); create the dataset for learning; build the predictive model; evaluate the quality of the model; predict the rank based on input.
Basic terms in WEKA: A set of data items, the dataset, is a very basic concept of machine learning. A dataset is roughly equivalent to a two-dimensional spreadsheet or database table. In WEKA, a dataset is a collection of Instances.
Instance: a dataset consists of instances
Attribute: each instance consists of attributes
Classifier
Example: Attributes

Neo4J and Weka 2

  • 15. 1) Create Attributes:
    Attribute pathAttribute = new Attribute("path");
    Attribute categoryAttribute = new Attribute("category");
    Attribute similarityAttribute = new Attribute("similarity");
    Attribute probabilityAttribute = new Attribute("probability");
    In Weka, a FastVector is the container for Attributes:
    FastVector allAttributes = new FastVector(4);
    allAttributes.addElement(pathAttribute);
    allAttributes.addElement(categoryAttribute);
    allAttributes.addElement(similarityAttribute);
    allAttributes.addElement(probabilityAttribute);
    2) Create Dataset: an Instance is a “container” of attribute values, and the dataset (Instances) is a container of Instance objects.
    Instances trainingDataSet = new Instances("VC", allAttributes, 17);
    For each instance we set the values to be trained upon:
    // Weka 3.6 API; from Weka 3.7 on, use new DenseInstance(4) instead.
    Instance instance = new Instance(4);
    instance.setDataset(trainingDataSet);
    instance.setValue(0, path);
    instance.setValue(1, category);
    instance.setValue(2, similarity);
    instance.setValue(3, rank);
    trainingDataSet.add(instance);
  • 16. 3) Train Classifier and Evaluate:
    // Weka needs to know which attribute to predict; here the last one ("probability").
    trainingDataSet.setClassIndex(3);
    RBFNetwork rbfLearner = new RBFNetwork();
    rbfLearner.setNumClusters(17);
    rbfLearner.buildClassifier(trainingDataSet);
    // Evaluating on the training data itself gives an optimistic estimate.
    Evaluation learningSetEvaluation = new Evaluation(trainingDataSet);
    learningSetEvaluation.evaluateModel(rbfLearner, trainingDataSet);
    4) Predict Unknown Cases:
    Instance testInstance = new Instance(4);
    testInstance.setDataset(trainingDataSet);
    testInstance.setValue(0, path);
    testInstance.setValue(1, category);
    testInstance.setValue(2, similarity);
    testInstance.setValue(3, 0); // placeholder; the class value is what we predict
    double prediction = rbfLearner.classifyInstance(testInstance);
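For a numeric target, the evaluation step boils down to comparing predicted values with known ones. Weka's Evaluation class reports, among other things, the mean absolute error and the root mean squared error. The following plain-Java sketch shows how these two measures are computed. It is not Weka code, and the class name ErrorMetricsSketch is hypothetical.

```java
/** Plain-Java illustration of two common error measures for numeric predictions. */
class ErrorMetricsSketch {

    /** Average of |actual - predicted| over all cases. */
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        checkInputs(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    /** Square root of the average squared difference; penalises large misses more heavily. */
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        checkInputs(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    // Both arrays must be non-empty and of equal length.
    private static void checkInputs(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }
}
```

Lower values mean the predictions sit closer to the known outcomes. Measuring them on data that was not used for training gives a more honest picture than measuring on the training set.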