Python For BIG DATA ANALYTICS
View Mastering Python course details at http://www.edureka.co/python
Slide 2 www.edureka.co/python
Objectives
At the end of this module, you will be able to:
• Understand Python
• Understand a web scraping example using Python
• Understand PyDoop: Python API for Hadoop
• Implement a Word Count example in PyDoop
• Integrate Data Science with Python
• Implement Zombie Invasion modeling using Python
Slide 3 www.edureka.co/python
Why Python?
• Python is a great language for beginner programmers because it is easy to learn and easy to maintain.
• Python’s biggest strength is that the bulk of its library is portable. It also supports GUI programming and
can be used to create applications that are portable across Mac, Windows and Unix X Window systems.
• With libraries like PyDoop and SciPy, Python is well suited to Big Data Analytics.
Slide 4 www.edureka.co/python
Growing Interest in Python
Slide 5 www.edureka.co/python
Demo: Web Scraping using Python
• This example demonstrates how to scrape basic financial data from an IMDB webpage
• We shall use Beautiful Soup, an open-source Python library for parsing HTML, to extract data from
fetched webpages
• Scraping is used for a wide range of purposes, from data mining to monitoring and automated testing
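As a rough illustration of the extraction step, the sketch below parses a small inline HTML fragment. It swaps in the standard library's html.parser for Beautiful Soup so that it runs without extra packages. The HTML fragment, the class names "title" and "gross", and the helper names are all hypothetical and not taken from the demo. With Beautiful Soup, the same extraction would typically go through BeautifulSoup(html, "html.parser") and find_all.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the real demo would fetch a live page instead.
SAMPLE_HTML = """
<table>
  <tr><td class="title">Example Movie A</td><td class="gross">$100M</td></tr>
  <tr><td class="title">Example Movie B</td><td class="gross">$250M</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td>, paired with its class attribute."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # class of the <td> we are inside, if any
        self.cells = []            # list of (class, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_class = None

    def handle_data(self, data):
        # Keep only non-empty text that appears inside a classed <td>.
        if self.current_class and data.strip():
            self.cells.append((self.current_class, data.strip()))


def scrape_rows(html):
    """Return (title, gross) pairs found in the HTML fragment."""
    parser = CellCollector()
    parser.feed(html)
    titles = [text for cls, text in parser.cells if cls == "title"]
    gross = [text for cls, text in parser.cells if cls == "gross"]
    return list(zip(titles, gross))


rows = scrape_rows(SAMPLE_HTML)
print(rows)
```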
Slide 6 www.edureka.co/python
Demo: Collecting Tweets using Python
• This example demonstrates how to extract historical tweets for a particular brand such as “nike” or “apple”
• We shall make a REST API call to Twitter to extract the tweets
• This data can then be used to perform sentiment analysis for that brand on Twitter
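To give a feel for the sentiment analysis step, here is a toy sketch that scores text by counting words from two small word lists. The sample tweets and both word lists are made up for illustration, and the API call itself is left out. A real analysis would use a trained model or a much richer lexicon.

```python
# Hypothetical word lists and sample tweets, for illustration only.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "bad", "terrible"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


tweets = [
    "I love my new nike shoes, great fit!",
    "Terrible service at the apple store",
    "Just bought nike socks",
]
scores = [sentiment_score(t) for t in tweets]
print(scores)
```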
Slide 7 www.edureka.co/python
Big Data
• Lots of data (terabytes or petabytes)
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to
process using on-hand database management tools or traditional data processing applications
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
[Word cloud: Big Data, cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, mobile]
Slide 8 www.edureka.co/python
Unstructured Data is Exploding
[Chart labels: Complex, Unstructured; Relational]
• 2,500 exabytes of new information in 2012, with the internet as the primary driver
• The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year
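The units in these figures are easier to compare on one scale. The short calculation below is only a sketch and assumes decimal units (1 ZB = 10^6 PB). It converts the 800K petabytes to zettabytes, backs out the previous year's size from the 62% growth, and derives the growth implied by the 1.2 zettabyte forecast.

```python
PB_PER_ZB = 10**6  # decimal units: 1 ZB = 10**21 bytes, 1 PB = 10**15 bytes

current_pb = 800_000     # "800K petabytes" from the slide
growth_last_year = 0.62  # "grew by 62%"
forecast_zb = 1.2        # "1.2 zettabytes"

previous_pb = current_pb / (1 + growth_last_year)  # size one year earlier
current_zb = current_pb / PB_PER_ZB                # same figure in zettabytes
implied_growth = forecast_zb / current_zb - 1      # growth needed to hit forecast

print(f"previous year: {previous_pb:,.0f} PB")
print(f"current size:  {current_zb} ZB")
print(f"implied growth this year: {implied_growth:.0%}")
```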
Slide 9 www.edureka.co/python
Big Data Scenarios: Hospital Care
Hospitals are analyzing medical data and patient records to predict which patients are likely to seek
readmission within a few months of discharge. The hospital can then intervene in the hope of preventing
another costly hospital stay.
A medical diagnostics company analyzed millions of lines of data to develop the first non-intrusive test
for predicting coronary artery disease. To do so, researchers at the company analyzed over 100 million
gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease.
Slide 10 www.edureka.co/python
Big Data Scenarios: Amazon.com
Image: http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
Amazon has an unrivalled bank of data on online consumer purchasing behaviour that it can mine from
its 152 million customer accounts.
Amazon also uses Big Data to monitor, track and secure the 1.5 billion items in its retail store that are
lying around its 200 fulfilment centres around the world. Amazon stores the product catalogue data in S3.
S3 can write, read and delete objects of up to 5 TB each. The catalogue stored in S3 receives more than
50 million updates a week, and every 30 minutes all data received is crunched and reported back to the
different warehouses and the website.
Slide 11 www.edureka.co/python
Big Data Scenarios: Netflix
Image: http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
Netflix uses 1 petabyte to store the videos for streaming.
BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013.
The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the
rendering of the 3D CGI effects.
One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would require
about 2000 years to play.
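The MP3 figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes decimal units (1 PB = 10^9 MB) and a 365-day year. It comes out at roughly 1,900 years, which is consistent with the rounded 2000 years quoted above.

```python
MB_PER_PB = 10**9                 # decimal units: 1 PB = 10**15 bytes, 1 MB = 10**6 bytes
MB_PER_MINUTE = 1                 # "roughly one megabyte per minute"
MINUTES_PER_YEAR = 60 * 24 * 365  # ignoring leap years

minutes = MB_PER_PB / MB_PER_MINUTE  # total playing time in minutes
years = minutes / MINUTES_PER_YEAR   # total playing time in years

print(f"{years:,.0f} years of continuous playback")
```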
Slide 12 www.edureka.co/python
IBM’s Definition
• IBM’s Definition – Big Data Characteristics: Volume, Velocity, Variety
http://www-01.ibm.com/software/data/bigdata/
[Diagram labels: web logs, images, videos, audio, sensor data]
Slide 13 www.edureka.co/python
Hadoop for Big Data
• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of
commodity computers using a simple programming model
• It is an open-source data management framework with scale-out storage and distributed processing
Slide 14 www.edureka.co/python
Hadoop and MapReduce
Hadoop is a system for large-scale data processing.
It has two main components:
• HDFS – Hadoop Distributed File System (Storage)
» Distributed across “nodes”
» Natively redundant
» NameNode tracks locations
» Self-healing, high-bandwidth clustered storage
• MapReduce (Processing)
» Splits a task across processors “near” the data and assembles results
» JobTracker manages the TaskTrackers
[Diagram: MapReduce operating on key/value pairs]
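The key/value flow of MapReduce can be imitated in a few lines of plain Python. The sketch below runs in a single process and only mirrors the three phases (map, shuffle, reduce). It does not use Hadoop, and the input records, the "source,size" record format and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: each inner list stands for the records stored on one node.
splits = [
    ["web,120", "video,900"],
    ["web,80", "audio,300"],
]


def map_record(record):
    """Map: turn one 'source,size' record into a (key, value) pair."""
    source, size = record.split(",")
    return source, int(size)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run map over every split, then shuffle and reduce the results."""
    mapped = [map_record(r) for split in splits for r in split]
    return dict(reduce_group(k, v) for k, v in shuffle(mapped).items())


totals = run_job(splits)
print(totals)
```

In a real cluster the map calls would run on the nodes that hold each split, and the framework would perform the shuffle over the network.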
Slide 15 www.edureka.co/python
PyDoop – Hadoop with Python
• The PyDoop package provides a Python API for Hadoop MapReduce and HDFS
• PyDoop has several advantages over Hadoop’s built-in solutions for Python programming, i.e., Hadoop
Streaming and Jython
• One of the biggest advantages of PyDoop is its HDFS API. This allows you to connect to an HDFS
installation, read and write files, and get information on files, directories and global file system properties
• The MapReduce API of PyDoop allows you to solve many complex problems with minimal programming
effort. Advanced MapReduce concepts such as ‘Counters’ and ‘Record Readers’ can be implemented in
Python using PyDoop
With the PyDoop package, Python can be used to write Hadoop MapReduce programs and applications that
access the HDFS API for Hadoop.
Slide 16 www.edureka.co/python
Demo: Word Count using Hadoop Streaming API
• The example shows a simple word count application written in Python
• We shall use the Hadoop Streaming API to run MapReduce code written in Python
• A word count application can be used to index text documents/files for a given “search query”
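A streaming word count usually consists of a mapper that emits tab-separated "word, 1" lines and a reducer that sums the counts of sorted input. The sketch below is not the demo's code. It keeps both stages as functions over lines of text and simulates the sort step locally, so it runs without a cluster. The function names and the sample input are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


counts = run_locally(["the quick fox", "the lazy dog"])
print("\n".join(counts))
```

In an actual Hadoop Streaming job, each stage would live in its own script that reads sys.stdin and prints to standard output. The scripts are passed to the streaming jar through its -mapper and -reducer options, and Hadoop performs the sort between them.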
Slide 17 www.edureka.co/python
Python and Data Science
The day-to-day tasks of a data scientist involve many interrelated but different activities, such as accessing and
manipulating data, computing statistics, creating visual reports on that data, building predictive and
explanatory models, evaluating these models on additional data, and integrating models into production systems.
• Python is an excellent choice for a data scientist’s day-to-day activities, as it provides libraries for
all of these tasks
• Python has a diverse range of open-source libraries for just about everything that a data scientist
does in day-to-day work
• Python and most of its libraries are both open source and free
Slide 18 www.edureka.co/python
SciPy.org
SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and
engineering. Its core packages are:
• NumPy: base N-dimensional array package
• SciPy library: fundamental library for scientific computing
• Matplotlib: comprehensive 2D plotting
• IPython: enhanced interactive console
• SymPy: symbolic mathematics
• pandas: data structures and analysis
Slide 19 www.edureka.co/python
Demo: Zombie Invasion Model
This is a lighthearted example: a system of ODEs (ordinary differential equations) can be used to model a "zombie
invasion", using the equations specified by Philip Munz.
The system is given as:
dS/dt = P - B*S*Z - d*S
dZ/dt = B*S*Z + G*R - A*S*Z
dR/dt = d*S + A*S*Z - G*R
The program includes three scenarios that show how the zombie apocalypse varies with different initial
conditions.
This involves solving a system of first-order ODEs given by dy/dt = f(y, t), where y = [S, Z, R].
The symbols are:
S: the number of susceptible victims
Z: the number of zombies
R: the number of people "killed"
P: the population birth rate
d: the chance of a natural death
B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie)
G: the chance a dead person is resurrected into a zombie
A: the chance a zombie is totally destroyed
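The system above can be integrated numerically in a few lines. The sketch below uses a hand-written fourth-order Runge-Kutta step so that it runs with the standard library alone. In practice a library solver such as scipy.integrate.odeint would be the usual choice. The parameter values, initial conditions, time span and function names are illustrative assumptions, not values taken from the slides.

```python
def zombie_rhs(y, P, d, B, G, A):
    """Right-hand side f(y) of the system, with y = (S, Z, R)."""
    S, Z, R = y
    dS = P - B * S * Z - d * S
    dZ = B * S * Z + G * R - A * S * Z
    dR = d * S + A * S * Z - G * R
    return (dS, dZ, dR)


def rk4_step(f, y, h):
    """Advance y by one classical Runge-Kutta step of size h."""
    k1 = f(y)
    k2 = f(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k1)))
    k3 = f(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k2)))
    k4 = f(tuple(yi + h * ki for yi, ki in zip(y, k3)))
    return tuple(
        yi + h / 6.0 * (a + 2 * b + 2 * c + e)
        for yi, a, b, c, e in zip(y, k1, k2, k3, k4)
    )


def simulate(y0, params, t_end=5.0, steps=1000):
    """Return the list of states from t = 0 to t = t_end."""
    h = t_end / steps

    def f(y):
        return zombie_rhs(y, **params)

    trajectory = [y0]
    for _ in range(steps):
        trajectory.append(rk4_step(f, trajectory[-1], h))
    return trajectory


# Assumed, illustrative parameters and initial state (S0, Z0, R0).
params = dict(P=0.0, d=0.0001, B=0.0095, G=0.0001, A=0.0001)
trajectory = simulate((500.0, 0.0, 0.0), params)
S, Z, R = trajectory[-1]
print(f"S={S:.2f}  Z={Z:.2f}  R={R:.2f}")
```

Adding the three equations gives d(S + Z + R)/dt = P, so with P = 0 the total population stays constant. That is a convenient sanity check on any solver used for this model.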
Slide 20 www.edureka.co/python
How it Works?
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
Slide 21 www.edureka.co/python
Course Topics
• Module 1
» Getting Started with Python
• Module 2
» Sequences and File Operations
• Module 3
» Deep Dive - Functions, Sorting, Errors and Exception Handling
• Module 4
» Regular Expressions, its Packages and Object Oriented Programming in Python
• Module 5
» Debugging, Databases and Project Skeletons
• Module 6
» Machine Learning Using Python – I
• Module 7
» Machine Learning Using Python – II
• Module 8
» Introduction to Hadoop
• Module 9
» Hadoop and Python
• Module 10
» Web Scraping using Python and Project Work
Slide 22 www.edureka.co/python
Questions
Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Slide 23 Course URL

Más contenido relacionado

La actualidad más candente

Big data analytics project report
Big data analytics project reportBig data analytics project report
Big data analytics project reportManav Deshmukh
 
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...Universitat Politècnica de Catalunya
 
Federated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesFederated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesAlAtfat
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descentSuraj Parmar
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methodsProf.Nilesh Magar
 
Web scraping
Web scrapingWeb scraping
Web scrapingSelecto
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentationBushra Jbawi
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big dataPrashant Sharma
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 

La actualidad más candente (20)

Big data analytics project report
Big data analytics project reportBig data analytics project report
Big data analytics project report
 
What is big data?
What is big data?What is big data?
What is big data?
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...
 
Cnn
CnnCnn
Cnn
 
Case study
Case studyCase study
Case study
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 
Federated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesFederated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devices
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
Machine Learning: Bias and Variance Trade-off
Machine Learning: Bias and Variance Trade-offMachine Learning: Bias and Variance Trade-off
Machine Learning: Bias and Variance Trade-off
 
Web scraping
Web scrapingWeb scraping
Web scraping
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentation
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big data
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Big data storage
Big data storageBig data storage
Big data storage
 
Data visualization
Data visualizationData visualization
Data visualization
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 

Similar a Python for Big Data Analytics

Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...Edureka!
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th juneEdureka!
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big DataEdureka!
 
Python PPT
Python PPTPython PPT
Python PPTEdureka!
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data AnalyticsEdureka!
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonInsuk (Chris) Cho
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Eli White
 
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017Codemotion
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooksNatalino Busa
 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageEdureka!
 
What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?SofiaCarter4
 
Use r 2013 tutorial - r and cloud computing for higher education and research
Use r 2013   tutorial - r and cloud computing for higher education and researchUse r 2013   tutorial - r and cloud computing for higher education and research
Use r 2013 tutorial - r and cloud computing for higher education and researchkchine3
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceEdureka!
 
6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your Interviewer6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your InterviewerPrachiVarshney7
 
Researh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-pythonResearh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-pythonWaternomics
 
Researh toolbox - Data analysis with python
Researh toolbox  - Data analysis with pythonResearh toolbox  - Data analysis with python
Researh toolbox - Data analysis with pythonUmair ul Hassan
 

Similar a Python for Big Data Analytics (20)

Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th june
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big Data
 
Python PPT
Python PPTPython PPT
Python PPT
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of Python
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooks
 
Reproducible Science and Deep Software Variability
Reproducible Science and Deep Software VariabilityReproducible Science and Deep Software Variability
Reproducible Science and Deep Software Variability
 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming Language
 
What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?
 
Use r 2013 tutorial - r and cloud computing for higher education and research
Use r 2013   tutorial - r and cloud computing for higher education and researchUse r 2013   tutorial - r and cloud computing for higher education and research
Use r 2013 tutorial - r and cloud computing for higher education and research
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Python
Python Python
Python
 
6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your Interviewer6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your Interviewer
 
Researh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-pythonResearh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-python
 
Researh toolbox - Data analysis with python
Researh toolbox  - Data analysis with pythonResearh toolbox  - Data analysis with python
Researh toolbox - Data analysis with python
 

Más de Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

Más de Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Python for Big Data Analytics

  • 1. Python for Big Data Analytics. View Mastering Python course details at http://www.edureka.co/python
  • 2. Objectives. At the end of this module, you will be able to: understand Python; understand a web scraping example using Python; understand PyDoop, the Python API for Hadoop; implement a word count example in PyDoop; integrate data science with Python; implement zombie invasion modeling using Python.
  • 3. Why Python? Python is a great language for beginner programmers because it is easy to learn and easy to maintain. Python's biggest strength is that the bulk of its library is portable. It also supports GUI programming and can be used to create applications that are portable across Mac, Windows and the Unix X Window System. With libraries like PyDoop and SciPy, it is a dream come true for Big Data analytics.
  • 5. Demo: Web Scraping using Python. This example demonstrates how to scrape basic financial data from an IMDb web page. We shall use Beautiful Soup, an open-source Python library, to parse web pages and extract data from them. Scraping is used for a wide range of purposes, from data mining to monitoring and automated testing.
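The extraction step of the scraping demo can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: it swaps in the standard library's `html.parser` for Beautiful Soup so that it runs without extra packages, and `SAMPLE_HTML`, `TableScraper` and `scrape_financials` are hypothetical names with made-up figures rather than real IMDb markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment with made-up figures; real pages use different
# markup and change over time.
SAMPLE_HTML = """
<table id="box-office">
  <tr><th>Budget</th><td>$1,000</td></tr>
  <tr><th>Gross</th><td>$2,500</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of <th>/<td> cells, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a new row
        elif tag in ("th", "td"):
            self._in_cell = True    # text that follows belongs to a cell

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def scrape_financials(html):
    """Return a {label: value} dict built from two-cell table rows."""
    parser = TableScraper()
    parser.feed(html)
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


print(scrape_financials(SAMPLE_HTML))
```

With Beautiful Soup installed, the same idea is usually expressed by building a `BeautifulSoup(html, "html.parser")` object and calling `find_all("tr")` on it; fetching the page itself is a separate step handled by an HTTP client.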
  • 6. Demo: Collecting Tweets using Python. This example demonstrates how to extract historical tweets for a particular brand such as "nike" or "apple". We shall make a REST API call to Twitter to extract the tweets. This data can then be used to perform sentiment analysis for that brand on Twitter.
  • 7. Big Data. Lots of data (terabytes or petabytes). Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. (Figure: word cloud of Big Data terms.)
  • 8. Unstructured Data is Exploding. (Figure: growth of complex, unstructured data compared with relational data.) 2,500 exabytes of new information in 2012, with the internet as the primary driver. The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year.
  • 9. Big Data Scenarios: Hospital Care. Hospitals are analyzing medical data and patient records to predict which patients are likely to seek readmission within a few months of discharge. The hospital can then intervene in the hope of preventing another costly hospital stay. A medical diagnostics company analyzed millions of lines of data to develop the first non-intrusive test for predicting coronary artery disease. To do so, researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease.
  • 10. Big Data Scenarios: Amazon.com. Amazon has an unrivalled bank of data on online consumer purchasing behaviour that it can mine from its 152 million customer accounts. Amazon also uses Big Data to monitor, track and secure the 1.5 billion items in its retail store that are held in its 200 fulfilment centres around the world. Amazon stores the product catalogue data in S3, which can write, read and delete objects of up to 5 TB each. The catalogue stored in S3 receives more than 50 million updates a week, and every 30 minutes all data received is crunched and reported back to the different warehouses and the website.
  • 11. Big Data Scenarios: Netflix. Netflix uses 1 petabyte to store the videos for streaming. BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013. The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects. One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would require about 2,000 years to play.
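The playback figure on this slide can be checked with a short calculation. The sketch below assumes decimal units (1 petabyte = 10^15 bytes, 1 megabyte = 10^6 bytes) and a 365-day year; the constant and function names are illustrative only.

```python
BYTES_PER_PETABYTE = 10**15
BYTES_PER_MINUTE = 10**6          # "roughly one megabyte per minute"
MINUTES_PER_YEAR = 60 * 24 * 365


def playback_years(total_bytes, bytes_per_minute=BYTES_PER_MINUTE):
    """Years of continuous playback needed to play total_bytes of audio."""
    minutes = total_bytes / bytes_per_minute
    return minutes / MINUTES_PER_YEAR


years = playback_years(BYTES_PER_PETABYTE)
print(f"{years:.0f} years")
```

Under these assumptions the result comes out at a little over 1,900 years, which is consistent with the slide's rounded figure of 2,000 years.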
  • 12. IBM's Definition: Big Data Characteristics. The three characteristics are Volume, Velocity and Variety. Example data types: web logs, images, videos, audio, sensor data. Source: http://www-01.ibm.com/software/data/bigdata/
  • 13. Hadoop for Big Data. Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is an open-source data management framework with scale-out storage and distributed processing.
  • 14. Hadoop and MapReduce. Hadoop is a system for large-scale data processing. It has two main components:
      HDFS, the Hadoop Distributed File System (storage): distributed across "nodes"; natively redundant; the NameNode tracks locations.
      MapReduce (processing): splits a task across processors "near" the data and assembles the results; self-healing, high bandwidth; clustered storage; the Job Tracker manages the Task Trackers.
      (Figure: MapReduce key/value data flow.)
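The key/value flow behind MapReduce can be illustrated with a small single-process sketch. Nothing here uses Hadoop: the map, shuffle and reduce phases are plain Python functions, and the sales records, `sale_mapper` and `total_reducer` are hypothetical examples chosen only to show how keys group values.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical job: total sales per store from "store, amount" lines.
def sale_mapper(line):
    store, amount = line.split(",")
    yield store.strip(), float(amount)


def total_reducer(store, amounts):
    return sum(amounts)


SALES = ["north, 10.0", "south, 4.5", "north, 2.5"]
totals = run_job(SALES, sale_mapper, total_reducer)
print(totals)
```

On a real cluster the mapper calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data across the network; the in-memory dictionary here only stands in for that step.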
  • 15. PyDoop: Hadoop with Python. Python can be used to write Hadoop MapReduce programs and applications that access the HDFS API for Hadoop through the PyDoop package. The PyDoop package provides a Python API for Hadoop MapReduce and HDFS. PyDoop has several advantages over Hadoop's built-in solutions for Python programming, i.e., Hadoop Streaming and Jython. One of the biggest advantages of PyDoop is its HDFS API, which allows you to connect to an HDFS installation, read and write files, and get information on files, directories and global file system properties. The MapReduce API of PyDoop allows you to solve many complex problems with minimal programming effort. Advanced MapReduce concepts such as "Counters" and "Record Readers" can be implemented in Python using PyDoop.
  • 16. Demo: Word Count using the Hadoop Streaming API. The example shows a simple word count application written in Python. We shall use the Hadoop Streaming API to run MapReduce code written in Python. A word count application can be used to index text documents or files for a given search query.
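A word count for Hadoop Streaming is typically a pair of scripts that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below is one possible shape, not the demo's actual code: it keeps the mapper and reducer in a single hypothetical file, `wordcount.py`, selected by a command-line argument, and relies on the streaming convention that the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same word.

    Assumes the input is sorted by key, which Hadoop Streaming
    guarantees between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical usage: `python wordcount.py map` or `python wordcount.py reduce`.
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif role == "reduce":
        for out in reducer(sys.stdin):
            print(out)
```

The pipeline can be simulated locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` before submitting it to a cluster through the Hadoop Streaming jar, whose path depends on the installation.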
  • 17. Python and Data Science. The day-to-day tasks of a data scientist involve many interrelated but different activities, such as accessing and manipulating data, computing statistics, creating visual reports on that data, building predictive and explanatory models, evaluating these models on additional data, and integrating models into production systems. Python is an excellent choice for these activities because it has a diverse range of open-source libraries for just about everything a data scientist does in day-to-day work. Python and most of its libraries are both open source and free.
  • 18. SciPy.org. SciPy (pronounced "Sigh Pie") is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Its core packages are:
      NumPy: base N-dimensional array package
      SciPy library: fundamental library for scientific computing
      Matplotlib: comprehensive 2D plotting
      IPython: enhanced interactive console
      SymPy: symbolic mathematics
      pandas: data structures and analysis
  • 19. Demo: Zombie Invasion Model. This is a lighthearted example: a system of ODEs (ordinary differential equations) can be used to model a "zombie invasion", using the equations specified by Philip Munz. The system is given as:
      dS/dt = P - B*S*Z - d*S
      dZ/dt = B*S*Z + G*R - A*S*Z
      dR/dt = d*S + A*S*Z - G*R
      This involves solving a system of first-order ODEs of the form dy/dt = f(y, t), where y = [S, Z, R] and:
      S: the number of susceptible victims
      Z: the number of zombies
      R: the number of people "killed"
      P: the population birth rate
      d: the chance of a natural death
      B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie)
      G: the chance a dead person is resurrected into a zombie
      A: the chance a zombie is totally destroyed
      Three scenarios are given in the program to show how the zombie apocalypse varies with different initial conditions.
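The system above can be integrated numerically in a few lines. The sketch below is an illustration under stated assumptions, not the demo program: it uses a hand-written fourth-order Runge-Kutta step in pure Python so that it runs without extra packages (in the SciPy ecosystem, `scipy.integrate.odeint` is the usual tool for this), and the parameter values and initial population are illustrative choices that do not come from the slides.

```python
def zombie_rhs(y, P, d, B, G, A):
    """Right-hand side f(y) of the S/Z/R system from the slide."""
    S, Z, R = y
    dS = P - B * S * Z - d * S
    dZ = B * S * Z + G * R - A * S * Z
    dR = d * S + A * S * Z - G * R
    return [dS, dZ, dR]


def rk4_step(f, y, h):
    """Advance y by one classical Runge-Kutta step of size h."""
    k1 = f(y)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6.0 * (a + 2 * b + 2 * c + e)
        for yi, a, b, c, e in zip(y, k1, k2, k3, k4)
    ]


def simulate(y0, t_end, steps, **params):
    """Integrate from t = 0 to t_end and return the list of states."""
    h = t_end / steps
    f = lambda y: zombie_rhs(y, **params)
    states = [list(y0)]
    for _ in range(steps):
        states.append(rk4_step(f, states[-1], h))
    return states


# Illustrative values only (assumed, not taken from the slides).
PARAMS = dict(P=0.0, d=0.0001, B=0.0095, G=0.0001, A=0.0001)
trajectory = simulate([500.0, 0.0, 0.0], t_end=5.0, steps=1000, **PARAMS)
S_end, Z_end, R_end = trajectory[-1]
print(S_end, Z_end, R_end)
```

Adding the three equations gives d(S + Z + R)/dt = P, so with a birth rate of zero the total head count stays constant, which makes a convenient sanity check on any solver. Changing the initial conditions in the call to `simulate` is one way to explore different scenarios.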
  • 20. How It Works: live online class; class recording in LMS; 24/7 post-class support; module-wise quiz; project work; verifiable certificate.
  • 21. Course Topics:
      Module 1: Getting Started with Python
      Module 2: Sequences and File Operations
      Module 3: Deep Dive - Functions, Sorting, Errors and Exception Handling
      Module 4: Regular Expressions, its Packages and Object Oriented Programming in Python
      Module 5: Debugging, Databases and Project Skeletons
      Module 6: Machine Learning Using Python - I
      Module 7: Machine Learning Using Python - II
      Module 8: Introduction to Hadoop
      Module 9: Hadoop and Python
      Module 10: Web Scraping using Python and Project Work
  • 22. Questions. Twitter @edurekaIN, Facebook /edurekaIN; use #askEdureka for questions.