SlideShare una empresa de Scribd logo
1 de 30
Developer Series
Join our webinar series to expand your developer skills!
Security Tuesdays
Data Science and AI Wednesdays
Cloud Native and Red Hat OpenShift
Thursdays
meetup.com/IBM-Cloud-MEA
Introduction to Big Data
analysis & Machine Learning
in Python with PySpark
—
Anam Mahmood
Developer Advocate, UAE
Hashim Noor
Client Technical Specialist, UAE
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Get started at: https://ibm.biz/BdfPQ5
Let’s get started
• Sign up/Log in to your IBM Cloud
Account
https://ibm.biz/BdfPQ5
• Follow along for the hands-on:
https://github.com/Anam-
Mahmood/Introduction-to-Big-Data-
analysis-Machine-Learning-in-Python-
with-PySpark/blob/main/README.md
3
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
4
chat with everyone!
Q&A here!
Follow us to get notified
about upcoming events
View more info about this event
Workshop Resources
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Survey
https://ibm.biz/BdfPQN
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Agenda
6
What is Big Data? 07
The 5 V’s of Big Data 10
Big Data Digital transformation 11
What is Data Science? 12
Subsets of AI 13
Supervised and Unsupervised Learning 14
What is Apache Spark? 15
Components of Apache Spark 16
Why Apache Spark 17
Features 18
Use Cases 19
Hands-on 21
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
"Data is the new oil. It’s
valuable, but if
unrefined it cannot
really be used."
-Clive Humby
7
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
The amount of data in the world was
estimated to be
8
44
zettabytes
at the dawn of 2020.
https://hoteltechreport.com/news/big-data-examples
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Big Data
Dynamic, large and disparate volumes
of data being created by people, tools,
and machines.
https://towardsdatascience.com/what-is-big-data-lets-answer-this-question-933b94709caf
9
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Velocity
Velocity is the
speed at which data
accumulates.
The 5 V’s of Big Data
10
Volume
Volume is the scale
of the data, or the
increase in the
amount of data
stored.
Variety
Variety is the
diversity of the data.
Variety also reflects
that data comes
from different
sources
Veracity
Veracity is the
quality and origin of
data, and its
conformity to facts
and accuracy.
Value
Value is our ability
and need to turn
data into value.
Value isn't just
profit.
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
How big data is driving digital transformation?
11
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
What is Data Science?
Data science is an interdisciplinary
field leveraging insights from
many fields to extract knowledge
from data.
12
https://blog.finxter.com/artificial-intelligence-machine-learning-deep-learning-and-data-science-whats-the-difference/
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
The Subsets of AI
13
https://blog.devitpl.com/learning-machine-learning/
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
14
https://i.vas3k.ru/7w1.jpg
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Apache Spark
15
 Spark is an Apache project advertised as
“lightning-fast cluster computing”.
 Spark provides a faster and more general data
processing platform.
 Spark lets you run programs up to 100x faster in
memory, or 10x faster on disk, than Hadoop.
 Spark also makes it possible to write code more
quickly as you have over 80 high-level operators
at your disposal.
https://en.wikipedia.org/wiki/Apache_Spark
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Components of Apache Spark
16
Spark SQL
Spark SQL is
Apache Spark’s
module for working
with structured data.
Spark
Streaming
This component
allows Spark to
process real-time
streaming data.
Data can be
ingested from many
sources like Kafka,
etc.
MLlib
This library contains
a wide array of
machine learning
algorithms-
classification,
regression,
clustering, and
collaborative
filtering.
GraphX
Spark also comes
with a library to
manipulate graph
databases and
perform
computations called
GraphX.
Apache Spark
Core
It provides in-
memory
computing and
referencing
datasets in
external storage
systems.
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
 Python Pandas is intended for quick and easy data manipulation tasks.
 Pandas Dataframe help in:
 Data Manipulation tasks such as sorting, merging data frame.
 Modifying by updating, adding and deleting columns from a data frame.
 Cleaning and data preparation by imputing missing data or NANs.
 Pandas dataframe does not support parallelization.
Why Apache Spark?
17
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
 Fast processing – Big data is characterized by volume, variety, velocity, and veracity which needs to be processed at a
higher speed.
 Flexibility – Apache Spark supports multiple languages and allows the developers to write applications in Java, Scala, R, or
Python.
 In-memory computing – Spark stores the data in the RAM of servers which allows quick access and in turn accelerates the
speed of analytics.
 Real-time processing – Spark can process real-time streaming data.
 Better analytics – Apache Spark consists of a rich set of SQL queries, machine learning algorithms, complex analytics, etc.
Features
18
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Media & Entertainment
Industry
Apache Spark is used in
the gaming industry to
identify patterns from the
real-time in-game events
and respond.
Travel Industry
TripAdvisor, a leading
travel website that helps
users plan a perfect trip is
using Apache Spark to
speed up its personalized
customer
recommendations.
Healthcare
Many healthcare providers
are using Apache Spark to
analyse patient records
along with past clinical
data to identify which
patients are likely to face
health issues after being
discharged from the clinic.
E-Commerce Industry
Shopify wanted to analyse
the kinds of products its
customers were selling to
identify eligible stores with
which it can tie up - for a
business partnership.
Use cases
19
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
 Banks are using Spark to access and analyse the social media profiles, call recordings, complaint logs, emails, forum
discussions and a lot more.
 Architecture Flow
1. User logs into Watson Studio, creates a project and initiates and instance of Cloud Object Storage and Notebook.
2. User uploads the data file in the CSV and text format to the object storage.
3. User creates a notebook from the URL provided.
4. Then enter the credentials of the dataset.
5. Run the notebook.
Use Case – Finance Industry
20
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Hands-on
• Sign up/Log in to your IBM Cloud
Account
https://ibm.biz/BdfPQ5
• Follow along for the hands-on:
https://developer.ibm.com/tutoria
ls/getting-started-with-pyspark/
21
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Summary
22
 Big Data is Dynamic, large and disparate volumes of data being created by people, tools, and machines.
 The 5 V’s of Big Data, velocity, volume, variety, veracity and value.
 Data Science and the difference between AL, ML and Deep Learning.
 The five components of Apache spark and it’s features.
 Use cases of Apache spark
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
Survey
https://ibm.biz/BdfPQN
Get started at: https://ibm.biz/BdfPQ5
Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
24
Conference
begins
June 8
Digital Developer Conference
Learn industry-recognized data and AI skills from IBM experts,
partners, and the worldwide community
Register for free and get ready to build smart and secure data and
AI solutions with a focus on experimentation, eminence, and
education. The IBM Developer ecosystem of experts, enthusiasts,
and client partners have created an experience to explore AI-
centric solutions and platforms—dedicated to the worldwide
community of developers, data scientists, and academia. You will
also find regional and topical events where you can meet like-
minded people—as well as a data science course with a badge.
Learn more and register for free at ibm.biz/devcon-ai.
On-demand replays available after the event
Data & AI
What is
Call for Code?
IBM Developer / © 2021 IBM Corporation 25
Call for Code invites
developers and problem
solvers around the world
to build and contribute
to sustainable Open-
Source software
solutions, that address
social and humanitarian
issues, while ensuring
solutions are deployed to
make a real difference.
Call for Code has become
the only global, always-on
tech for good Open-Source
platform to deploy & scale
top projects through a host
of offerings:
Sponsors
Why join?
 Skill Building
 Social Good
 Ideas toAction
 Community (400K+
developers, 179 nations,
15K+ applications)
Awards
200K USD for Global
Challenge winner
10K USD for
University Challenge
winner
5K USD for Middle East
and Africa region winner
Clean Water and Sanitation
Water is the natural resource that is most
threatened by climate change and a prerequisite
for life on earth. From intelligent solutions for
small farmers to recycling showers, technology
can make a significant impact on the availability
of water and its consumption.
Zero Hunger
135 million people suffer from acute hunger, with
climate change a major contributing factor.
Technology can help grow more crops in areas
on the edge of drought or quickly distribute
perishables from small stores to local homeless
shelters.
Responsible Production and
Consumption
Worldwide consumption and production drives
the global economy yet is inextricably linked to
the environment. Technology can help make
recommendations on energy efficiency to
highlighting the carbon footprint of online
purchases.
26
IBM Developer / © 2021 IBM Corporation
2021 Call for Code fighting CLIMATE CHANGE
Resources and Events
IBM Developer / © 2021 IBM Corporation
Call for Code Main Page: ibm.biz/callforcode
FAQs: callforcode.org/faq/
Starter Kits: Zero Hunger, CleanWater and
Sanitation, Responsible production and green
consumption
Office hours Every Monday
from 2 PM to 3 PM GST (Dubai time)
crowdcast.io/e/mea_cfc_officehours
Join our MEA Call for Code Slack channel:
ibm.biz/mea_cfc_slack_channel
Get notified for upcoming Call for Code events by
following our Meetup Page:
meetup.com/IBM-Cloud-MEA/
Go to our Crowdcast page for replays of previous
events that could help you:
crowdcast.io/ibmdeveloper
27
Resources
28
IBM Developer: https://developer.ibm.com/
Meetup: https://www.meetup.com/IBM-Cloud-MEA/
Learning:
https://cognitiveclass.ai/
https://learn.ibm.com/
Big Data Fundamentals: https://cognitiveclass.ai/learn/big-data
Spark Fundamentals: https://cognitiveclass.ai/learn/spark
Thank you
Anam Mahmood
Developer Advocate, UAE
anam.mahmood@ibm.com
Hashim Noor
ClientTechnical Specialist, UAE
hashim.noor1@ibm.com
Get started at: https://ibm.biz/BdfEMY
29
30

Más contenido relacionado

La actualidad más candente

Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitSlim Baltagi
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Edureka!
 
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMSBig Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMSMatt Stubbs
 
Cascading User Group Meet
Cascading User Group MeetCascading User Group Meet
Cascading User Group MeetVinoth Kannan
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningKai Wähner
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)DataWorks Summit
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseDataWorks Summit
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataInMobi Technology
 
Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public CloudIMC Institute
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationDataWorks Summit
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineeringNovita Sari
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreTrendwise Analytics
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDataWorks Summit
 
Data Tools and the Data Scientist Shortage
Data Tools and the Data Scientist ShortageData Tools and the Data Scientist Shortage
Data Tools and the Data Scientist ShortageWes McKinney
 
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarialInteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarialMarcos Quezada
 
Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...TigerGraph
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsTokyo University of Science
 

La actualidad más candente (20)

Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?
 
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMSBig Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
 
Cascading User Group Meet
Cascading User Group MeetCascading User Group Meet
Cascading User Group Meet
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use case
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big Data
 
Practical advice to build a data driven company
Practical advice to build a data driven companyPractical advice to build a data driven company
Practical advice to build a data driven company
 
Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public Cloud
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
 
Data Tools and the Data Scientist Shortage
Data Tools and the Data Scientist ShortageData Tools and the Data Scientist Shortage
Data Tools and the Data Scientist Shortage
 
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarialInteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarial
 
Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
451 Research Impact Report
451 Research Impact Report451 Research Impact Report
451 Research Impact Report
 

Similar a Introduction to pyspark new

Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3DataWorks Summit
 
Analyzing Big Data - Jeff Scheel
Analyzing Big Data - Jeff ScheelAnalyzing Big Data - Jeff Scheel
Analyzing Big Data - Jeff ScheelKangaroot
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter AnalyticsAdrian Turcu
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next DecadePaula Koziol
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...DataWorks Summit
 
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumΑνδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumStarttech Ventures
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Career opportunities in open source framework
Career opportunities in open source frameworkCareer opportunities in open source framework
Career opportunities in open source frameworkedunextgen
 
Career opportunities in open source framework
Career opportunities in open source framework Career opportunities in open source framework
Career opportunities in open source framework edunextgen
 
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Cynthia Saracco
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016Anand Haridass
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefitsJohan Picard
 

Similar a Introduction to pyspark new (20)

Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3
 
Analyzing Big Data - Jeff Scheel
Analyzing Big Data - Jeff ScheelAnalyzing Big Data - Jeff Scheel
Analyzing Big Data - Jeff Scheel
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next Decade
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumΑνδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Career opportunities in open source framework
Career opportunities in open source frameworkCareer opportunities in open source framework
Career opportunities in open source framework
 
Career opportunities in open source framework
Career opportunities in open source framework Career opportunities in open source framework
Career opportunities in open source framework
 
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
Big data hadoop
Big data hadoopBig data hadoop
Big data hadoop
 
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
 
Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefits
 

Último

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Introduction to pyspark new

  • 1. Developer Series Join our webinar series to expand your developer skills! Security Tuesdays Data Science and AI Wednesdays Cloud Native and Red Hat OpenShift Thursdays meetup.com/IBM-Cloud-MEA
  • 2. Introduction to Big Data analysis & Machine Learning in Python with PySpark — Anam Mahmood Developer Advocate, UAE Hashim Noor Client Technical Specialist, UAE Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation Get started at: https://ibm.biz/BdfPQ5
  • 3. Let’s get started • Sign up/Log in to your IBM Cloud Account https://ibm.biz/BdfPQ5 • Follow along for the hands-on: https://github.com/Anam- Mahmood/Introduction-to-Big-Data- analysis-Machine-Learning-in-Python- with-PySpark/blob/main/README.md 3 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 4. 4 chat with everyone! Q&A here! Follow us to get notified about upcoming events View more info about this event Workshop Resources Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 5. Survey https://ibm.biz/BdfPQN Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 6. Agenda 6 What is Big Data? 07 The 5 V’s of Big Data 10 Big Data Digital transformation 11 What is Data Science? 12 Subsets of AI 13 Supervised and Unsupervised Learning 14 What is Apache Spark? 15 Components of Apache Spark 16 Why Apache Spark 17 Features 18 Use Cases 19 Hands-on 21 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 7. "Data is the new oil. It’s valuable, but if unrefined it cannot really be used." -Clive Humby 7 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 8. The amount of data in the world was estimated to be 8 44 zettabytes at the dawn of 2020. https://hoteltechreport.com/news/big-data-examples Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 9. Big Data Dynamic, large and disparate volumes of data being created by people, tools, and machines. https://towardsdatascience.com/what-is-big-data-lets-answer-this-question-933b94709caf 9 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 10. Velocity Velocity is the speed at which data accumulates. The 5 V’s of Big Data 10 Volume Volume is the scale of the data, or the increase in the amount of data stored. Variety Variety is the diversity of the data. Variety also reflects that data comes from different sources Veracity Veracity is the quality and origin of data, and its conformity to facts and accuracy. Value Value is our ability and need to turn data into value. Value isn't just profit. Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 11. How big data is driving digital transformation? 11 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 12. What is Data Science? Data science is an interdisciplinary field leveraging insights from many fields to extract knowledge from data. 12 https://blog.finxter.com/artificial-intelligence-machine-learning-deep-learning-and-data-science-whats-the-difference/ Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 13. The Subsets of AI 13 https://blog.devitpl.com/learning-machine-learning/ Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 14. 14 https://i.vas3k.ru/7w1.jpg Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 15. Apache Spark 15  Spark is an Apache project advertised as “lightning-fast cluster computing”.  Spark provides a faster and more general data processing platform.  Spark lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop.  Spark also makes it possible to write code more quickly as you have over 80 high-level operators at your disposal. https://en.wikipedia.org/wiki/Apache_Spark Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 16. Components of Apache Spark 16 Spark SQL Spark SQL is Apache Spark’s module for working with structured data. Spark Streaming This component allows Spark to process real-time streaming data. Data can be ingested from many sources like Kafka, etc. MLlib This library contains a wide array of machine learning algorithms- classification, regression, clustering, and collaborative filtering. GraphX Spark also comes with a library to manipulate graph databases and perform computations called GraphX. Apache Spark Core It provides in- memory computing and referencing datasets in external storage systems. Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 17.  Python Pandas is intended for quick and easy data manipulation tasks.  Pandas Dataframe help in:  Data Manipulation tasks such as sorting, merging data frame.  Modifying by updating, adding and deleting columns from a data frame.  Cleaning and data preparation by imputing missing data or NANs.  Pandas dataframe does not support parallelization. Why Apache Spark? 17 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 18.  Fast processing – Big data is characterized by volume, variety, velocity, and veracity which needs to be processed at a higher speed.  Flexibility – Apache Spark supports multiple languages and allows the developers to write applications in Java, Scala, R, or Python.  In-memory computing – Spark stores the data in the RAM of servers which allows quick access and in turn accelerates the speed of analytics.  Real-time processing – Spark can process real-time streaming data.  Better analytics – Apache Spark consists of a rich set of SQL queries, machine learning algorithms, complex analytics, etc. Features 18 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 19. Media & Entertainment Industry Apache Spark is used in the gaming industry to identify patterns from the real-time in-game events and respond. Travel Industry TripAdvisor, a leading travel website that helps users plan a perfect trip is using Apache Spark to speed up its personalized customer recommendations. Healthcare Many healthcare providers are using Apache Spark to analyse patient records along with past clinical data to identify which patients are likely to face health issues after being discharged from the clinic. E-Commerce Industry Shopify wanted to analyse the kinds of products its customers were selling to identify eligible stores with which it can tie up - for a business partnership. Use cases 19 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 20.  Banks are using Spark to access and analyse the social media profiles, call recordings, complaint logs, emails, forum discussions and a lot more.  Architecture Flow 1. User logs into Watson Studio, creates a project and initiates and instance of Cloud Object Storage and Notebook. 2. User uploads the data file in the CSV and text format to the object storage. 3. User creates a notebook from the URL provided. 4. Then enter the credentials of the dataset. 5. Run the notebook. Use Case – Finance Industry 20 Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 21. Hands-on • Sign up/Log in to your IBM Cloud Account https://ibm.biz/BdfPQ5 • Follow along for the hands-on: https://developer.ibm.com/tutoria ls/getting-started-with-pyspark/ 21 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 22. Summary 22  Big Data is Dynamic, large and disparate volumes of data being created by people, tools, and machines.  The 5 V’s of Big Data, velocity, volume, variety, veracity and value.  Data Science and the difference between AL, ML and Deep Learning.  The five components of Apache spark and it’s features.  Use cases of Apache spark Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 23. Survey https://ibm.biz/BdfPQN Get started at: https://ibm.biz/BdfPQ5 Introduction to Big Data analysis & Machine Learning in Python with PySpark, May 26 2021/ © 2021 IBM Corporation
  • 24. 24 Conference begins June 8 Digital Developer Conference Learn industry-recognized data and AI skills from IBM experts, partners, and the worldwide community Register for free and get ready to build smart and secure data and AI solutions with a focus on experimentation, eminence, and education. The IBM Developer ecosystem of experts, enthusiasts, and client partners have created an experience to explore AI- centric solutions and platforms—dedicated to the worldwide community of developers, data scientists, and academia. You will also find regional and topical events where you can meet like- minded people—as well as a data science course with a badge. Learn more and register for free at ibm.biz/devcon-ai. On-demand replays available after the event Data & AI
  • 25. What is Call for Code? IBM Developer / © 2021 IBM Corporation 25 Call for Code invites developers and problem solvers around the world to build and contribute to sustainable Open- Source software solutions, that address social and humanitarian issues, while ensuring solutions are deployed to make a real difference. Call for Code has become the only global, always-on tech for good Open-Source platform to deploy & scale top projects through a host of offerings: Sponsors Why join?  Skill Building  Social Good  Ideas toAction  Community (400K+ developers, 179 nations, 15K+ applications) Awards 200K USD for Global Challenge winner 10K USD for University Challenge winner 5K USD for Middle East and Africa region winner
  • 26. Clean Water and Sanitation Water is the natural resource that is most threatened by climate change and a prerequisite for life on earth. From intelligent solutions for small farmers to recycling showers, technology can make a significant impact on the availability of water and its consumption. Zero Hunger 135 million people suffer from acute hunger, with climate change a major contributing factor. Technology can help grow more crops in areas on the edge of drought or quickly distribute perishables from small stores to local homeless shelters. Responsible Production and Consumption Worldwide consumption and production drives the global economy yet is inextricably linked to the environment. Technology can help make recommendations on energy efficiency to highlighting the carbon footprint of online purchases. 26 IBM Developer / © 2021 IBM Corporation 2021 Call for Code fighting CLIMATE CHANGE
  • 27. Resources and Events IBM Developer / © 2021 IBM Corporation Call for Code Main Page: ibm.biz/callforcode FAQs: callforcode.org/faq/ Starter Kits: Zero Hunger, CleanWater and Sanitation, Responsible production and green consumption Office hours Every Monday from 2 PM to 3 PM GST (Dubai time) crowdcast.io/e/mea_cfc_officehours Join our MEA Call for Code Slack channel: ibm.biz/mea_cfc_slack_channel Get notified for upcoming Call for Code events by following our Meetup Page: meetup.com/IBM-Cloud-MEA/ Go to our Crowdcast page for replays of previous events that could help you: crowdcast.io/ibmdeveloper 27
  • 28. Resources 28 IBM Developer: https://developer.ibm.com/ Meetup: https://www.meetup.com/IBM-Cloud-MEA/ Learning: https://cognitiveclass.ai/ https://learn.ibm.com/ Big Data Fundamentals: https://cognitiveclass.ai/learn/big-data Spark Fundamentals: https://cognitiveclass.ai/learn/spark
  • 29. Thank you Anam Mahmood Developer Advocate, UAE anam.mahmood@ibm.com Hashim Noor ClientTechnical Specialist, UAE hashim.noor1@ibm.com Get started at: https://ibm.biz/BdfEMY 29
  • 30. 30