This document summarizes a seminar on machine learning using big data. It discusses the history of data storage and traditional databases. It then introduces machine learning and the types of learning, including supervised and unsupervised learning. Specific algorithms for each type are covered such as k-means clustering for unsupervised and naive Bayes for supervised. Case studies on applications like Amazon product recommendations are presented. The document concludes by discussing tools for machine learning and future applications as more connected devices generate extensive data.
Machine Learning using Big data
1. Machine Learning Using Big Data
A SEMINAR ON
SEMINAR GUIDE: PROF A.K HASE PRESENTER: MR. VAIBHAV KURKUTE
15-04-2017
2. 1. History & Traditional Database
2. Introduction
3. Data Mining
4. What is Machine Learning ?
5. Types of Learning
6. Supervised Learning Algorithms
7. Unsupervised Learning Algorithms
8. Case Studies
9. Future Scope & Tools
10. Conclusion
Content
3. • Old Source of Data: Telephone (Text or Voice)
• Computer Invention & Business Uses
• Old Data Storage
• 21st Century Evolution
• Traditional Databases & Drawbacks
• Structured Data
• Use of MySQL Database
• Machine-Generated Data
• Unstructured Data
• Use of MongoDB, i.e. a NoSQL Database
• Hadoop Distributed File System (HDFS), HBase, Hive
History
5. • Generated fast, in unstructured form.
• Continuously processed and analyzed.
• Large amounts of data, like a million rows in an Excel sheet.
• Different types of data, mostly unstructured.
• The goal is to get knowledge out of this data.
1. Google processes 20 petabytes of data every day.
2. Facebook receives thousands of status updates every hour.
Introduction to Big Data
6. • Web: Google's index is estimated at 45 billion pages
• Transaction data: 5-50 TB/day
• Satellite image feeds: ~1TB/day/satellite
• Sensor networks/arrays
– CERN Large Hadron Collider ~100 petabytes/day
• Biological data: 1-10TB/day/sequencer
• TV: 2TB/day/channel; YouTube 4TB/day uploaded
• Digitized telephony: ~100 petabytes/day
How big is Big Data ?
8. • Data mining is of no use unless we can get useful information out of the data.
• The aim is to mine insights from the data and make it potentially useful.
• It turns previously unknown data into knowledge.
• What can it be used for?
1. Predicting future trends
2. Allowing businesses to make proactive, knowledge-driven decisions
3. E.g. from your travel history on Yatra.com, one can identify your hometown
4. E.g. Snyder & Vini Facebook status
Data Mining
10. • The machine learns on its own.
• No need to tell the machine exactly what to do.
• No need to hand-code explicit rules.
• We provide what we call the training data set.
• Algorithms learn patterns from it so as to create knowledge from data.
• Example:
If we give sample inputs and outputs like
2 -> f(x) -> 4, 3 -> f(x) -> 9 and
4 -> f(x) -> 16, then 5 -> f(x) -> ?
Machine Learning
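The f(x) example on this slide can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the unknown function is assumed to have the form y = a * x**p, and the helper names `fit_power_law` and `predict` are hypothetical, not part of the seminar. The parameters are estimated from the three sample pairs by least squares on the logarithms.

```python
import math

def fit_power_law(samples):
    """Estimate a and p in y = a * x**p by least squares on (log x, log y)."""
    xs = [math.log(x) for x, _ in samples]
    ys = [math.log(y) for _, y in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope of the fitted line = exponent
    a = math.exp(mean_y - p * mean_x)  # intercept of the fitted line = log(a)
    return a, p

def predict(model, x):
    """Apply the learned parameters to a new input."""
    a, p = model
    return a * x ** p

# Training data taken from the slide: 2 -> 4, 3 -> 9, 4 -> 16
training_data = [(2, 4), (3, 9), (4, 16)]
model = fit_power_law(training_data)
prediction = predict(model, 5)
print(f"learned exponent: {model[1]:.3f}, prediction for 5: {prediction:.3f}")
```

The learned exponent comes out close to 2, so the prediction for 5 is close to 25: the squaring rule was never written into the code, only recovered from the examples.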
11. • Here are a few examples:
1. Google’s self-driving cars
2. Blocking of suspicious credit cards & Spam Mails
3. Recommendation engines on an e-commerce site
4. Facebook Friend Suggestion
“People worry that computers will get too smart and take over the world, but the real
problem is that they're too stupid and they've already taken over the world”
Machine Learning
13. • Training data comes with correct answers, i.e. examples for the computer.
• Use the training data to prepare (train) the algorithm.
• Apply it to data without a correct answer.
• It works like a predictive algorithm.
Type: Supervised Learning
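As a rough sketch of this workflow (labelled training data first, then prediction on data without an answer), the snippet below uses a small k-nearest-neighbours classifier. The measurements and labels are made-up illustrative values, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(training_data, query, k=3):
    """Label `query` by majority vote among its k nearest training examples."""
    by_distance = sorted(training_data, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled examples: (height_cm, weight_kg) -> correct answer
training_data = [
    ((25, 4), "cat"), ((23, 5), "cat"), ((27, 3), "cat"),
    ((160, 450), "horse"), ((155, 500), "horse"), ((165, 480), "horse"),
]

# New records without a correct answer
label_small = knn_predict(training_data, (24, 4))
label_large = knn_predict(training_data, (158, 470))
print(label_small, label_large)
```

Here the "training" step is simply storing the labelled examples; other supervised algorithms build an explicit model from them instead.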
14. • No examples for the computer, i.e. no labelled training data.
• We give only the data to the algorithm.
• Here we know which algorithm to use.
• It works like an exploratory algorithm.
• We provide only input data, not output.
• Example:
Differentiating correctly between the faces of a horse, a cat and a human (clustering of data)
Type: Unsupervised Learning
15. • Clustering:
• Splitting records into groups that are not defined in advance
• Each group holds data with similar properties
• Association:
• Seeing what often appears together with what
• K-means clustering
Unsupervised Algorithm
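K-means clustering, named on this slide, can be sketched as follows. This is a minimal pure-Python version under simplifying assumptions: the first k points serve as starting centroids (real implementations usually pick them more carefully), and the sample points are invented for illustration.

```python
import math

def kmeans(points, k, max_iters=100):
    """Group points into k clusters; returns (centroids, assignments)."""
    centroids = list(points[:k])  # assumption: naive initialisation from the first k points
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments

# Hypothetical unlabelled data: two visually separate groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

No labels are supplied anywhere: the groups emerge only from the distances between the records.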
16. • Classification:
• Assigning Records to Predefined Groups
• E.g. recognizing handwritten numbers, or classifying emails as spam or not spam.
• Regression (predictive analysis):
• Predict the output value using training data
• Naïve Bayes classifier.
• Decision trees
• Nearest neighbors (kNN)
• Neural networks
Supervised Algorithm
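For the spam example and the Naïve Bayes classifier listed on this slide, a minimal sketch might look like the following. The four training messages are invented, the function names are hypothetical, and add-one (Laplace) smoothing is an assumption made here so that unseen word and class combinations do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def classify(model, text):
    """Return the class with the highest log-probability for `text`."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled emails
examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
model = train_naive_bayes(examples)
spam_guess = classify(model, "claim your free money")
ham_guess = classify(model, "monday project meeting")
print(spam_guess, ham_guess)
```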
17. • Classification:
• Assigning Records to Predefined Groups
• E.g. data used by a motor vehicle company to find where to sell.
• Regression (predictive analysis):
• Predict the output value using training data
• Naïve Bayes classifier.
• Decision trees
• Nearest neighbors (kNN)
• Neural networks
Supervised Algorithm
18. • A type of unsupervised learning.
• We have to predict using training data.
• Association rule mining uses if-then conditions.
• CASE STUDY 1:
How does Amazon predict which product will be sold with what?
Apriori Algorithm
19. • It is a type of market basket analysis.
• Information of this type is used in the form of “if–then” statements.
• Rules are computed from the data.
• Examine all possible rules between the items in an if–then format.
• Select only those that are most likely to be indicators of true dependence.
Case Study (Amazon)
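One common way to judge whether an if-then rule indicates a true dependence is to compute its support and confidence. The slides do not name these measures, so treat them as an assumption of this sketch; the transactions and helper names below are likewise invented for illustration.

```python
def count_containing(transactions, items):
    """Number of transactions that include every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions)

def support(transactions, items):
    """Fraction of all transactions that contain `items`."""
    return count_containing(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the 'if' part, the fraction that also contain the 'then' part."""
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / count_containing(transactions, antecedent)

# Hypothetical purchase history
transactions = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case", "screen guard"},
]

# Rule: if a customer buys a phone, then they also buy a case
rule_support = support(transactions, {"phone", "case"})
rule_confidence = confidence(transactions, {"phone"}, {"case"})
print(rule_support, rule_confidence)
```

With this toy data, 3 of the 5 baskets contain both items and 3 of the 4 phone baskets also contain a case, so a rule like this would rank above one that holds in only a single basket.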
21.
Case Study (Amazon)
• Generate frequent item sets.
• Start with two items, then continue with three items.
• Selection is based on how many transactions in the database include the items.
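The level-by-level generation of frequent item sets can be sketched as below. This is a simplified Apriori-style search, not the exact procedure from the seminar: the baskets, the `min_count` threshold and the function name are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Find item sets that occur in at least `min_count` transactions."""
    items = sorted({item for basket in transactions for item in basket})
    current = [frozenset([item]) for item in items]  # start with single items
    frequent = {}
    size = 1
    while current:
        # Count how many transactions include each candidate item set
        counts = {c: sum(c <= basket for basket in transactions) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        size += 1
        # Build larger candidates from the survivors of the previous level
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == size}
        # Apriori pruning: every smaller subset must itself be frequent
        current = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, size - 1))
        ]
    return frequent

# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_count=3)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

In this toy data the pair milk and butter appears in only two baskets, so it is dropped, and the three-item set is never counted because one of its pairs is already infrequent.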
24.
Real life application
• Some real-life applications of machine learning:
Recommender systems – suggesting similar people on Facebook/LinkedIn, similar movies, books etc. on Amazon
Business applications – customer segmentation, customer retention, targeted marketing etc.
Medical applications – disease diagnosis
Banking – credit card issuance, fraud detection etc.
Language translation, text to speech or vice versa
25.
Future scope
• Companies such as Google, Facebook, Microsoft and Bank of America use ML; those that do not are at a loss.
• With the current increase in the use of IoT (household, business, industry etc.), there is a need to analyze data continuously and draw conclusions using machine learning.
• With connected devices, we now have access to much more data and, along with it, an increased need to manage and understand what we know.
• In the future, users will receive more precise recommendations, and ads will become both more effective and less annoying.
26. Conclusion
• Machine learning can efficiently support fraud/error detection systems.
• Association rules are often the most accurate way to suggest products in market basket analysis.
• ML can play a good role in the different phases of software engineering, such as planning, analysis, design and testing.
• And mostly in analyzing data generated by the sensors used in IoT.
“Machine Learning is like magic where you can get answer to any question”