2. Contents
INTRODUCTION : Introduction to Data Science with Python, Installing Python, Programming
PYTHON BASICS : (basic syntax, data structures, data objects, math, comparison operators, condition statements,
loops, list, tuple, set, dicts, functions)
NUMPY PACKAGE : Array, selecting data, slicing, array manipulation, stacking, splitting arrays
PANDAS PACKAGE : overview, series and data frame, data manipulation
PYTHON ADVANCED : (treating missing values, removing duplicates, grouping, data munging with pandas,
histogram)
PYTHON ADVANCED : visualization with matplotlib
EDA : data cleaning, data wrangling
3. What is Python?
Python is a general-purpose, high-level programming language. It is used for:
• web development (server-side),
• software development,
• mathematics,
• system scripting.
What can Python do?
• Python can be used on a server to create web applications.
• Python can be used alongside software to create workflows.
• Python can connect to database systems. It can also read and modify files.
• Python can be used to handle big data and perform complex mathematics.
• Python can be used for rapid prototyping, or for production-ready software development.
5. Python basics
Basic Syntax
Python syntax is highly readable.
Statements in Python typically end with a new line.
A backslash (\) at the end of a line denotes explicit line continuation; an expression inside parentheses (), brackets [], or braces {} can also span several lines.
Python uses indentation to indicate a block of code and gives an error if indentation is skipped.
All contiguous lines indented with the same number of spaces form a block.
Semicolon ( ; ) allows multiple statements on a single line.
A group of individual statements that makes up a single code block is called a suite.
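The syntax rules above can be illustrated with a minimal sketch. The variable names are made up for illustration and are not part of the original slides.

```python
# Explicit line continuation with a backslash
total = 1 + 2 + \
        3 + 4

# Implicit line continuation inside parentheses
total_in_parens = (1 + 2 +
                   3 + 4)

# A semicolon separates two statements on a single line
a = 5; b = 10

# The two indented lines form the block (suite) of the if statement
if a < b:
    smaller = a
    larger = b
```

Both continuation styles give the same result; most style guides prefer the parenthesized form.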
6. Python Comments
Python allows in-code documentation by using comments.
We can comment a portion of the code in two ways.
1. Starting a line with a #: If a # is used at the beginning of a line, Python will
consider the rest of the line as a comment.
Example
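The slide's own example is not available here, so the following is only a minimal sketch of # comments, with hypothetical variable names.

```python
# This whole line is a comment and is ignored by Python
radius = 2  # a comment can also follow code on the same line
area = 3.14159 * radius ** 2
```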
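7. Using docstrings
2. Using docstrings: Python docstrings provide extended documentation capabilities. A docstring can span a single
line or multiple lines. The text is started and ended with triple quotes. Strictly speaking, a docstring is a string literal rather than a comment, but a triple-quoted string that is not assigned to anything is often used as a multi-line comment.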
Example:
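The slide's own example is not available here. A minimal sketch of both uses of triple quotes follows; the function name area is an assumption chosen for illustration.

```python
def area(width, height):
    """Return the area of a rectangle.

    Both arguments are expected to be numbers.
    """
    return width * height


"""
A triple-quoted string that is not assigned to anything
is evaluated and discarded, so it acts like a multi-line comment.
"""

# A real docstring is stored on the object and can be read back
doc = area.__doc__
```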
8. Python Data Types
The following are the standard built-in collection data types in Python:
List
Tuple
Set
Dictionary
9. LIST
List is a compound data type.
It contains items separated by commas and enclosed within square brackets ([]).
Items belonging to a list can be of different data types.
A list is ordered and changeable.
Lists allow duplicate data.
A list can also be created by using the list constructor list().
The append() method is used to add an item to the end of the list.
The remove() method is used to remove the first occurrence of a specific item from the list.
The built-in len() function is used to get a count of elements in the list.
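The list operations above can be sketched as follows. The sample values are invented for illustration.

```python
# Items may have different types, and duplicates are allowed
fruits = ["apple", "banana", 3, "apple"]

# The list() constructor builds a list from another iterable
numbers = list((1, 2, 3))

fruits.append("cherry")   # add an item at the end
fruits.remove("apple")    # remove the first occurrence of "apple"
count = len(fruits)       # number of items currently in the list
```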
11. Tuples
•A tuple is an ordered and unchangeable (immutable) collection.
•In Python, tuples are written with round brackets.
•Items in the tuple are separated by commas.
•The tuple() constructor can be used to make a tuple.
•The built-in len() function returns the number of items in a tuple.
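A minimal sketch of the tuple properties listed above, using made-up values:

```python
point = (3, 4)                 # round brackets, items separated by commas
from_list = tuple([1, 2, 3])   # tuple() constructor
size = len(from_list)

# Tuples are unchangeable: item assignment raises a TypeError
try:
    point[0] = 10
except TypeError as error:
    message = str(error)
```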
13. Dictionaries
•A dictionary is a collection of key-value pairs. Since Python 3.7 it preserves insertion order; in earlier versions it was unordered.
•It is changeable and indexed using the key.
•Dictionaries are enclosed by curly braces ({ }) and values can be assigned and accessed using
square brackets ([]).
•The built-in len() function returns the number of items.
•The dict() constructor can be used to make a dictionary.
•We can add an item to the dictionary by using a new key and assigning a value to it.
•Elements are stored in a dictionary as key-value pairs, and each key is unique.
•We can remove an item from a dictionary using the del statement.
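The dictionary operations above can be sketched as follows. The keys and values are hypothetical.

```python
person = {"name": "Asha", "age": 30}           # curly braces
other = dict(city="Pune", zip_code="411001")   # dict() constructor

person["email"] = "asha@example.com"   # a new key adds a new item
age = person["age"]                    # access a value by its key
del person["age"]                      # remove an item with the del statement
count = len(person)
```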
15. Sets
A set is an unordered collection.
It is iterable, mutable and has no duplicate elements.
Sets are enclosed by curly braces ({ }). An empty set must be created with set(), because {} creates an empty dictionary.
A set can be created using the set constructor.
Elements can be added to a set using the add() method.
A frozen set is an immutable object which can be created using the frozenset constructor.
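A minimal sketch of sets and frozen sets, with invented sample values:

```python
colors = {"red", "green", "red"}   # the duplicate "red" is dropped
from_list = set([1, 2, 2, 3])      # set() constructor
colors.add("blue")                 # add() inserts a new element

frozen = frozenset(["a", "b"])     # immutable variant of a set

# A frozenset has no add() method, so this raises AttributeError
try:
    frozen.add("c")
    frozen_is_immutable = False
except AttributeError:
    frozen_is_immutable = True
```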
17. Python Operators
Operators are the constructs used to perform operations on variables and values.
Operator Types
Arithmetic Operators
Comparison or Relational Operators
Assignment Operators
Logical Operators
18. Arithmetic Operators
Arithmetic Operators are used with numeric values to perform common
mathematical operations.
Operators are :
+ Addition
- Subtraction
/ Division
* Multiplication
% Modulus
** Exponentiation
// Floor Division
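Each arithmetic operator listed above can be tried on a pair of sample numbers (the values 7 and 2 are arbitrary):

```python
a, b = 7, 2

addition = a + b          # 9
subtraction = a - b       # 5
division = a / b          # 3.5 (always a float in Python 3)
multiplication = a * b    # 14
modulus = a % b           # 1 (remainder)
exponentiation = a ** b   # 49
floor_division = a // b   # 3 (rounded down)
```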
20. Comparison or Relational Operators
Comparison operators are used to compare two values.
Operators are:
== Equal
!= Not Equal
> Greater Than
< Less Than
>= Greater Than or Equal
<= Less Than or Equal
<> Not Equal (Python 2 only; removed in Python 3, use != instead)
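A short sketch of the comparison operators; each expression evaluates to True or False. The sample values are arbitrary.

```python
a, b = 5, 8

is_equal = a == b          # False
is_not_equal = a != b      # True
is_greater = a > b         # False
is_less = a < b            # True
is_greater_equal = a >= 5  # True
is_less_equal = b <= 5     # False
```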
22. Assignment Operator
Assignment operators are used to assign values to variables.
Operators are:
= assigns a value to a variable
+= adds the right operand to the left operand and assigns the result to the left operand
-= subtracts the right operand from the left operand and assigns the result to the left operand
*= multiplies the left operand by the right operand and assigns the result to the left operand
/= divides the left operand by the right operand and assigns the result to the left operand
%= computes the remainder when the left operand is divided by the right operand and assigns the
result to the left operand.
23. Assignment Operator (continued)
//= divides the left operand by the right operand and assigns the floored result to the left operand.
**= raises the left operand to the power of the right operand and assigns the result to the left operand.
&= performs bitwise AND on the operands and assigns the result to the left operand.
|= performs bitwise OR on the operands and assigns the result to the left operand.
^= performs bitwise XOR on the operands and assigns the result to the left operand.
>>= shifts the left operand right by the number of bits given by the right operand and assigns the result to the left operand.
<<= shifts the left operand left by the number of bits given by the right operand and assigns the result to the left operand.
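The assignment operators can be traced step by step on two sample variables. The starting values are arbitrary, and the comments show the value after each statement.

```python
x = 10
x += 5    # 15
x -= 3    # 12
x *= 2    # 24
x /= 4    # 6.0 (true division gives a float)
x %= 4    # 2.0

y = 7
y //= 2   # 3
y **= 3   # 27
y &= 12   # 8   (0b11011 & 0b01100)
y |= 3    # 11  (0b01000 | 0b00011)
y ^= 6    # 13  (0b1011 ^ 0b0110)
y >>= 1   # 6
y <<= 2   # 24
```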
25. Logical Operators
These operators are used to combine conditional statements.
Operators are:
and - returns True if both statements are true
or - returns True if either of the statements is true
not - reverses the result
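A minimal sketch of the logical operators, using hypothetical variables:

```python
age = 25
has_ticket = True

both = age >= 18 and has_ticket    # True: both conditions hold
either = age < 18 or has_ticket    # True: the second condition holds
negated = not has_ticket           # False: not reverses the result
```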
26. LOOPS AND CONDITIONS
Conditional Constructs
Conditional constructs are used to perform different computations or actions depending on whether a
condition evaluates to true or false. The conditions usually use comparisons and arithmetic expressions with
variables. These expressions are evaluated to the Boolean values True or False. The statements used for
decision making are called conditional statements, alternatively known as conditional expressions or
constructs.
Types of Conditional Statements
Python provides the following conditional constructs:
if statement
if .. else statement
if .. elif .. else statement
Nested if statement
27. If statement
The if statement in Python is made up of three main components:
the if KEYWORD itself,
an EXPRESSION that is tested for its truth value,
a CODE SUITE to execute if the expression evaluates to a non-zero or true value.
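The three components can be seen in a minimal sketch; the variable names and threshold are made up.

```python
temperature = 30
message = "normal"

# keyword: if, expression: temperature > 25, suite: the indented line
if temperature > 25:
    message = "warm"
```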
28. if .. else statement
Like other languages, Python features an else statement that can be paired with an if statement.
The else statement identifies a block of code to be executed if the conditional expression of the if
statement resolves to a false Boolean value.
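A short sketch of if paired with else, using an arbitrary sample number:

```python
number = 7

if number % 2 == 0:
    parity = "even"
else:
    # runs only when the if expression is false
    parity = "odd"
```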
29. If .. elif .. else statement (Chained conditions)
elif is the Python else-if statement. It allows one to check multiple expressions for truth value and execute a
block of code as soon as one of the conditions evaluates to be true. Like the else statement, the elif statement
is optional. Unlike else, there can be an arbitrary number of elif statements following an if.
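A minimal sketch of chained conditions. The grading thresholds are invented for illustration; only the first branch whose condition is true is executed.

```python
score = 72

if score >= 90:
    grade = "A"
elif score >= 75:
    grade = "B"
elif score >= 60:
    grade = "C"   # first true condition, so this branch runs
else:
    grade = "F"
```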
30. Nested If Statements
In Python, one if condition can also be nested within another if condition. Indentation is the way to figure
out the level of nesting.
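A short sketch of nesting, where the inner if is indented one level deeper than the outer one (sample value chosen arbitrarily):

```python
number = 12

if number > 0:
    # inner if: only reached when number is positive
    if number % 2 == 0:
        description = "positive even"
    else:
        description = "positive odd"
else:
    description = "zero or negative"
```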
31. Continue statement
Whenever a continue statement is encountered, Python skips the remaining statements in the loop body and starts the
next iteration of the loop. It can be used in both while and for loops. The while loop is conditional and the for loop is
iterative, so after a continue the while condition is re-tested, or the for loop fetches its next item, before the next
iteration can begin. If the condition is false or no items remain, the loop terminates normally.
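The code for this slide is not included in the text. The following is a reconstruction that is consistent with the output listed on the slide, assuming a counter that starts at 7 and skips the value 5; the start value and the helper list printed_values are assumptions.

```python
var = 7              # assumed starting value
printed_values = []  # hypothetical helper, records what gets printed

while var > 0:
    var = var - 1
    if var == 5:
        continue     # skip the print below and re-test the while condition
    print("Current variable value :", var)
    printed_values.append(var)

print("Good bye!")
```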
Output:
Current variable value : 6
Current variable value : 4
Current variable value : 3
Current variable value : 2
Current variable value : 1
Current variable value : 0
Good bye!
32. Functions
Functions are used to structure programs and are useful when the same code is needed in more than one section of a
program. They increase the reusability of code and remove redundancy.
Syntax:
def function_name(parameters):
    function body (statements)
The function body consists of indented statements. The function body ends where the indentation ends. Every time a
function is called, the function body is executed. The parameters in the function definition are optional.
A function may have a return statement that returns a result. Once the return statement is executed in the function body,
the function ends.
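A minimal sketch of defining and calling functions. The function names and the default value are assumptions made for illustration.

```python
def rectangle_area(width, height=1):
    """Return width * height; height defaults to 1."""
    area = width * height
    return area  # the function ends here


def greeting():
    # Parameters are optional: this function takes none
    return "Hello"


# The same function body is reused for every call
first = rectangle_area(3, 4)
second = rectangle_area(5)
text = greeting()
```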
35. Creating Arrays from Python Lists
First, with NumPy imported as np (import numpy as np), we can use np.array to create arrays from Python lists:
In[8]: # integer array:
np.array([1, 4, 2, 5, 3])
Out[8]: array([1, 4, 2, 5, 3])
Remember that unlike Python lists, NumPy is constrained to arrays that all contain the same type. If types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):
In[9]: np.array([3.14, 4, 2, 3])
Out[9]: array([ 3.14, 4. , 2. , 3. ])
If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:
In[10]: np.array([1, 2, 3, 4], dtype='float32')
Out[10]: array([ 1., 2., 3., 4.], dtype=float32)
Finally, unlike Python lists, NumPy arrays can explicitly be multidimensional; here’s one way of initializing a multidimensional array using a list of lists:
In[11]: # nested lists result in multidimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])
Out[11]: array([[2, 3, 4],
[4, 5, 6],
[6, 7, 8]])
The inner lists are treated as rows of the resulting two-dimensional array.
36. NumPy Array Attributes
First, let’s discuss some useful array attributes. We’ll start by defining three random arrays: a one-dimensional, two-dimensional, and
three-dimensional array. We’ll use NumPy’s random number generator, which we will seed with a set value in order to ensure that
the same random arrays are generated each time this code is run:
In[1]: import numpy as np
np.random.seed(0) # seed for reproducibility
x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array
Each array has attributes ndim (the number of dimensions), shape (the size of each
dimension), and size (the total size of the array):
In[2]: print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
37. Array Slicing: Accessing Subarray
Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked
by the colon (:) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:
x[start:stop:step]
If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1. We’ll take a look at accessing subarrays in one
dimension and in multiple dimensions.
One-dimensional subarrays
In[16]: x = np.arange(10)
x
Out[16]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In[17]: x[:5] # first five elements
Out[17]: array([0, 1, 2, 3, 4])
In[18]: x[5:] # elements after index 5
Out[18]: array([5, 6, 7, 8, 9])
In[19]: x[4:7] # middle subarray
Out[19]: array([4, 5, 6])
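Only one-dimensional slices are shown above. In multiple dimensions the same start:stop:step notation is used once per dimension, separated by commas. A minimal sketch on a small 3×4 array (the array and variable names are chosen for illustration):

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

top_left = grid[:2, :3]           # first two rows, first three columns
every_other_col = grid[:, ::2]    # all rows, every second column
reversed_rows = grid[::-1, :]     # rows in reverse order
```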
38. Reshaping of Arrays
Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the
reshape() method. For example, if you want to put the numbers
1 through 9 in a 3×3 grid, you can do the following:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
[[1 2 3]
[4 5 6]
[7 8 9]]
39. Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of
these, we can pass a list of indices giving the split points:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
[1 2 3] [99 99] [3 2 1]
Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar:
grid = np.arange(16).reshape((4, 4))
grid
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In[52]: upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
[[0 1 2 3]
[4 5 6 7]]
[[ 8 9 10 11]
[12 13 14 15]]
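np.hsplit works the same way along the columns. A minimal sketch on the same 4×4 grid (the names left and right are chosen for illustration):

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# One split point at column index 2 gives two subarrays
left, right = np.hsplit(grid, [2])
```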
41. Pandas
At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured
arrays in which the rows and columns are identified with labels rather than simple integer indices.
As we will see during the course of this chapter, Pandas provides a host of useful tools, methods, and
functionality on top of the basic data structures, but nearly everything that follows will require an
understanding of what these structures are. Thus, before we go any further, let’s introduce these three
fundamental Pandas data structures: the Series, DataFrame, and Index.
We will start our code sessions with the standard NumPy and Pandas imports:
import numpy as np
import pandas as pd
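As a first look at the three structures named above, here is a minimal sketch. The data values and labels are invented for illustration.

```python
import numpy as np
import pandas as pd

# Series: a one-dimensional array of values with labels
series = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])
value_b = series["b"]   # access by label instead of integer position

# DataFrame: a two-dimensional table with row and column labels
frame = pd.DataFrame(
    {"population": [100, 200], "area": [10.0, 20.0]},
    index=["X", "Y"],
)

# Index: the object that holds the row labels
index = frame.index
```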
47. Combining Datasets: Merge and Join
One essential feature offered by Pandas is its high-performance, in-memory join and merge operations. If you
have ever worked with databases, you should be familiar with this type of data interaction. The main interface for
this is the pd.merge function, and we’ll see a few examples of how this can work in practice.
Relational Algebra
The behavior implemented in pd.merge() is a subset of what is known as relational algebra, which is a formal set
of rules for manipulating relational data, and forms the conceptual foundation of operations available in most
databases. The strength of the relational algebra approach is that it proposes several primitive operations, which
become the building blocks of more complicated operations on any dataset.
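A minimal sketch of a join with pd.merge, assuming two small tables that share an employee column. The table contents are invented for illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa"],
    "group": ["Accounting", "Engineering", "Engineering"],
})
hire_dates = pd.DataFrame({
    "employee": ["Lisa", "Bob", "Jake"],
    "hire_date": [2004, 2008, 2012],
})

# Join the two tables on the shared "employee" column (inner join by default)
merged = pd.merge(employees, hire_dates, on="employee")
```

The rows of the result follow the key order of the left table, and each row combines the matching columns from both inputs.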
49. Visualization with matplotlib
Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will use some
standard shorthands for Matplotlib imports:
In[1]: import matplotlib as mpl
import matplotlib.pyplot as plt
show() or No show()? How to Display Your Plots
A visualization you can’t see won’t be of much use, but just how you view your Matplotlib plots depends on the
context. The best use of Matplotlib differs depending on how you are using it; roughly, the three applicable
contexts are using Matplotlib in a script, in an IPython terminal, or in an IPython notebook.
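For the script context, a minimal sketch follows. It assumes the non-interactive Agg backend so that it can run without a display, and it renders the figure to an in-memory PNG instead of opening a window; in a script with an interactive backend, plt.show() would be called at the marked place.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()

# In a script with an interactive backend, plt.show() would go here.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")   # render the figure as PNG bytes
png_bytes = buffer.getvalue()
plt.close(fig)
```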
50. matplotlib
Importing matplotlib
The plt interface is what we will use most often, as we’ll see throughout this chapter.