RAPIDMINER: INTRODUCTION TO DATAMINING
AGENDA
  • Introduction to RapidMiner
  • Use of RapidMiner for Data Mining
  • Download and Installation Steps
  • Memory Usage, Plug-ins & Settings
  • Supported File Formats
Different levels of analysis that are available:
  • Artificial neural networks: Non-linear predictive models that resemble biological neural networks in structure.
  • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
  • Decision trees: Provide a set of rules that you can apply to a new dataset to predict the outcome. Examples:
      • Classification and Regression Trees (CART)
      • Chi Square Automatic Interaction Detection (CHAID)
    CART and CHAID are decision tree techniques used for classification of a dataset.
  • Rule induction: The extraction of useful if-then rules from data based on statistical significance.
  • Nearest neighbor: Classify records based on the k most similar records.
  • Data visualization: Visual interpretation of complex relationships in multidimensional data.
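To make the nearest-neighbor idea concrete, here is a minimal sketch in plain Python. It is not RapidMiner code: the function name `knn_classify`, the toy records, and the choice of Euclidean distance with majority voting are all assumptions made for illustration.

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    train: list of (feature_tuple, label) pairs; query: a feature tuple.
    """
    # Sort training records by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    # Count the labels of the k closest records and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two numeric features per record, one label each.
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.8), "high"),
    ((4.9, 5.1), "high"),
]

print(knn_classify(train, (1.1, 1.0)))
print(knn_classify(train, (4.8, 5.2)))
```

In practice the features would usually be normalized first, since a distance-based vote is sensitive to the scale of each attribute.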
Applications
Can be divided into four major kinds:
  • Classification
  • Numerical prediction
  • Association
  • Clustering
Some examples:
  • Automatic abstraction
  • Financial forecasting
  • Targeted marketing
  • Medical diagnosis
  • Credit card fraud detection
  • Weather forecasting, etc.
Introduction to RapidMiner
RapidMiner (formerly YALE*) is an environment for machine learning and data mining experiments. RapidMiner is used for both research and real-world data mining tasks.
Software versions:
  • Community edition
  • Enterprise edition (Community Edition + More Features + Services + Guarantees)
*YALE: Yet Another Learning Environment
Some properties of RapidMiner:
  • Written in Java
  • Knowledge discovery processes are modelled as operator trees
  • Internal XML representation ensures a standardized interchange format for data mining experiments
  • Scripting language allows for automating large-scale experiments
  • Multi-layered data view concept ensures efficient and transparent data handling
  • GUI, command-line mode (batch mode), and Java API for using RapidMiner from other programs
  • Several plugins already exist
  • A large set of high-dimensional visualization schemes for data and models offered by its plotting facility
Applications: text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.
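The operator-tree idea can be illustrated with a short Python sketch that builds a nested XML document and walks it. This only mimics the concept: the element names (`operator`, `parameter`), the operator classes, and the file names below are illustrative assumptions, not a verified RapidMiner process file.

```python
import xml.etree.ElementTree as ET

def operator(name, op_class, params=None, children=()):
    """Build one <operator> element with optional parameters and nested operators."""
    node = ET.Element("operator", {"name": name, "class": op_class})
    for key, value in (params or {}).items():
        ET.SubElement(node, "parameter", {"key": key, "value": str(value)})
    node.extend(children)
    return node

# Hypothetical process: load examples, train a learner, store the model.
root = operator("Root", "Process", children=[
    operator("Input", "ExampleSource", {"attributes": "data.aml"}),
    operator("Learner", "DecisionTree"),
    operator("Output", "ModelWriter", {"model_file": "tree.mod"}),
])

def operator_names(node):
    """Return operator names in depth-first order, the order of a tree walk."""
    names = [node.get("name")]
    for child in node.findall("operator"):
        names.extend(operator_names(child))
    return names

print(ET.tostring(root, encoding="unicode"))
print(operator_names(root))
```

Because the whole experiment is a single tree serialized as XML, it can be saved, exchanged, and re-run without the GUI, which is the point the slide makes about a standardized interchange format.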
Use of RapidMiner for Data Mining
Using RapidMiner
  • The GUI can be used to design the XML description of the operator tree.
  • Breakpoints can be used to check intermediate results.
Use from a separate program
  • The command-line version and the Java API can be used to invoke RapidMiner from your programs without using the GUI.
Download and Installation Steps
Download
  • The latest version of RapidMiner can be downloaded from http://rapid-i.com/content/blogsection/7/82/lang,en/ by selecting the appropriate version (Windows x86, x64, etc.) and RapidMiner edition.
Installation (Windows executable)
  • Download the Windows executable (.exe) file.
  • Double-click the rapidminer-xxx-install.exe file to run it.
  • Follow the instructions.

Supported File Formats
RapidMiner can read data files, and can read and write models, parameter sets, and attribute sets. The most important files are those containing the examples (instances).
Data files & attribute description files
  • ARFFEXAMPLESOURCE: reads the .arff format
  • DATABASEEXAMPLESOURCE: reads from databases
  • SPARSEFORMATEXAMPLESOURCE
  • DENSEFORMATEXAMPLESOURCE
  • Attribute description file (.aml): used to retrieve metadata about the instances
XML attributes that can be set:
  • Name: unique name of the attribute
  • Sourcefile: name of the file containing the data (default used if not specified)
  • Sourcecol: column within the file (starting from 1)
  • Sourcecol_end: sourcecol-sourcecol_end attributes are generated with the same properties
  • Valuetype: one of nominal, numeric, integer, real, ordered, binominal, polynominal, and file_path
  • Blocktype: one of single_value, value_series, value_series_start, value_series_end, interval, interval_start, interval_end
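As a rough illustration of how an attribute description can drive data loading, the Python sketch below parses a small XML snippet and uses it to name and type the columns of raw rows. The attribute names (name, sourcecol, valuetype, blocktype) come from the slide; the lower-case spelling, the element names `attributeset` and `attribute`, the `default_source` attribute, and the sample data are assumptions and should be checked against a real .aml file.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute description; element names are assumptions,
# attribute names (name, sourcecol, valuetype, blocktype) follow the slide.
AML = """
<attributeset default_source="data.dat">
  <attribute name="age" sourcecol="1" valuetype="integer" blocktype="single_value"/>
  <attribute name="income" sourcecol="2" valuetype="real" blocktype="single_value"/>
  <attribute name="segment" sourcecol="3" valuetype="nominal" blocktype="single_value"/>
</attributeset>
"""

# Map value types to Python converters; any other type is kept as a string.
CONVERTERS = {"integer": int, "real": float, "numeric": float}

def read_examples(aml_text, rows):
    """Turn raw rows (lists of strings) into dicts keyed by attribute name."""
    attrs = ET.fromstring(aml_text).findall("attribute")
    examples = []
    for row in rows:
        example = {}
        for attr in attrs:
            # sourcecol counts from 1, so subtract 1 for a Python list index.
            raw = row[int(attr.get("sourcecol")) - 1]
            convert = CONVERTERS.get(attr.get("valuetype"), str)
            example[attr.get("name")] = convert(raw)
        examples.append(example)
    return examples

rows = [["34", "52000.5", "A"], ["51", "61000", "B"]]
examples = read_examples(AML, rows)
print(examples)
```

Keeping the metadata in a separate description file means the same raw data file can be reinterpreted (different names, types, or column subsets) without touching the data itself.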
Model files (.mod files)
Contain the models generated by previous runs.
  • MODELWRITER: writes model files
  • MODELLOADER: reads model files
  • MODELAPPLIER: applies model files
Attribute construction files (.att files)
  • ATTRIBUTECONSTRUCTIONWRITER: writes an attribute set
  • ATTRIBUTECONSTRUCTIONLOADER: reads an attribute set
Parameter set files (.par files)
  • GRIDPARAMETEROPTIMIZATION: generates a set of optimal parameters for a particular task
  • PARAMETERSETLOADER: uses the parameter files
Attribute weight files (.wgt files)
Attribute selection is seen as attribute weighting, which allows for more flexibility.
  • ATTRIBUTEWEIGHTSWRITER: writes attribute weights to a file
  • ATTRIBUTEWEIGHTSLOADER: reads the attribute weights
  • ATTRIBUTEWEIGHTSAPPLIER: applies the weights to the example sets
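The grid-based parameter optimization mentioned above can be sketched generically in Python. This is not the RapidMiner operator: `grid_search`, the parameter names `depth` and `min_leaf`, and the scoring function are hypothetical stand-ins for a real learner evaluated by, for example, cross-validation.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every parameter combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:  # keep the highest-scoring combination
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function standing in for a validated learner;
# it is constructed to peak at depth=4, min_leaf=2.
def evaluate(depth, min_leaf):
    return -((depth - 4) ** 2) - (min_leaf - 2) ** 2

best_params, best_score = grid_search(
    evaluate, {"depth": [2, 4, 6], "min_leaf": [1, 2, 5]}
)
print(best_params, best_score)
```

The resulting parameter set is what a parameter file would store, so that a later run can load the tuned values instead of repeating the search.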
More questions… Reach us at support@dataminingtools.net
Visit: WWW.DATAMININGTOOLS.NET