The document contains a series of slides about Python programming concepts. It begins with an introductory quote from Guido van Rossum about the origins of Python in 1989. The slides then cover various Python topics like data types, variables, operators, conditional statements, loops, functions, modules and libraries. Examples are provided to illustrate each concept. The document serves as a high-level overview of core Python programming principles.
4. “In December 1989, I was looking for a "hobby"
programming project that would keep me occupied during
the week around Christmas. My office ... would be closed,
but I had a home computer, and not much else on my
hands. I decided to write an interpreter for the new
scripting language I had been thinking about lately: a
descendant of ABC that would appeal to Unix/C hackers. I
chose Python as a working title for the project, being in a
slightly irreverent mood (and a big fan of Monty Python's
Flying Circus).”
— Guido van Rossum
4| @Apptrainers
5. The big technology companies have each largely aligned themselves with different language
stacks.
Oracle and IBM are aligned with Java (Oracle actually owns Java).
Google is known for its use of Python (first released in 1991), a very versatile, dynamic and
extensible language, although in reality it is also a heavy user of C++ and Java. Google has also
created its own language called Go (2009).
6. Easy to learn and powerful programming language
It has efficient high-level data structures and a simple but effective approach to object-
oriented programming.
Freely available in source or binary form for all major platforms from the Python Web
site, https://www.python.org/
The Python interpreter is easily extended with new functions and data types implemented
in C or C++ (or other languages callable from C).
Python is also suitable as an extension language for customizable applications.
Widely used (Google, NASA, Quora).
7. When you run a Python program, the interpreter executes it statement by statement. In
compiled languages like C or C++, a compiler first translates the whole program and only then
does the program start running.
As a result, interpreted languages are generally somewhat slower than compiled languages.
8. In Python you don't need to declare a variable's data type ahead of time. Python determines
the type automatically from the value assigned to the variable.
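A minimal sketch of this behaviour (the variable names are only illustrative): the same name can hold values of different types, and type() reports the type of whatever value is currently stored.

```python
# The type belongs to the value, not to the variable name.
value = 42
kind_before = type(value).__name__   # 'int'

value = "forty-two"                  # the same name now refers to a string
kind_after = type(value).__name__    # 'str'

print(kind_before, kind_after)
```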
9. Python programs are typically one-third to one-fifth the length of the equivalent Java code.
This means we can often write less code in Python to achieve the same thing as in Java.
10. There are many good options for saving and manipulating code
Sublime text (unlimited free trial available)
Notepad++
Xcode (Mac)
TextWrangler (Mac)
TextEdit (Mac)
Now there are multiple platforms for taking online courses for free
Coursera
Edx
Stanford Online
Khan Academy
Udacity
11. To download Python follow the instructions on the
official website!
https://www.python.org/
12. I would strongly recommend this video:
https://www.youtube.com/watch?v=HW29067qVWk
15. “GitHub is a code hosting platform for version control and collaboration. It lets you and others
work together on projects from anywhere”.
GitHub accounts can be public (free) or private (not free)
A repository is usually used to organize a single project. It can contain folders and files,
images, videos, spreadsheets, and data sets: anything your project needs.
16. Master in a repository: the final version.
Branch: used to try out new ideas that do not affect the master unless a pull request is
accepted. Changes committed to a branch are recorded, so you can keep track of different
versions.
Adding commits: keeps track of the history of progress on a branch or on master.
Forking a repository: creates a copy of the repository. Submit a pull request to the owner
so that the owner can incorporate your changes.
17. Download Python and Jupyter Notebook
Write a python code to print your name, your id, and your favorite quote!
Save the project as .html and as .ipynb
Install git and create a GitHub account
Upload your first project as .html to e-learning
Upload your first project as .ipynb to your Github account
Share the link of your Github with me on e-learning
19. You can type things directly into a running Python session
20. Most of the programming languages like C, C++, Java use braces { } to define a block of code.
Python uses indentation.
A code block (body of a function, loop etc.) starts with indentation and ends with the first
unindented line. The amount of indentation is up to you, but it must be consistent throughout
that block.
Generally four spaces are used for indentation and are preferred over tabs. Here is an
example.
for i in range(1,11):
    print(i)
    if i == 5:
        break
Incorrect indentation will result in an IndentationError.
21. In Python, we use the hash (#) symbol to start writing a comment.
It extends up to the newline character. Comments help programmers understand a program
better. The Python interpreter ignores comments.
#This is a comment
#print out Hello
print('Hello')
If we have comments that extend over multiple lines, one way of doing it is to use a hash (#)
at the beginning of each line.
Another way of doing this is to use triple quotes, either ''' or """.
These triple quotes are generally used for multi-line strings, but they can be used as
multi-line comments as well.
"""This is also
a perfect example
of multi-line comments"""
22. expression: A data value or set of operations to compute a value.
Examples: 1 + 4 * 3
42
Arithmetic operators we will use:
+ - * / addition, subtraction, multiplication, division
% modulus, a.k.a. remainder
** exponentiation
precedence: Order in which operations are computed.
* / % ** have a higher precedence than + -
1 + 3 * 4 is 13
Parentheses can be used to force a certain order of evaluation.
(1 + 3) * 4 is 16
Operator  Description     Example
=         Assignment      num = 7
+         Addition        num = 2 + 2
-         Subtraction     num = 6 - 4
*         Multiplication  num = 5 * 4
/         Division        num = 25 / 5
%         Modulo          num = 8 % 3
**        Exponent        num = 9 ** 2
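The precedence rules and operators above can be checked directly in a Python session. This is a small sketch; the variable names are only illustrative.

```python
# Multiplication binds tighter than addition.
without_parens = 1 + 3 * 4      # 13
# Parentheses force the addition to happen first.
with_parens = (1 + 3) * 4       # 16

remainder = 8 % 3               # 2
power = 9 ** 2                  # 81

print(without_parens, with_parens, remainder, power)
```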
23. In Python 3, the / operator always produces a real (float) result, even for integers. Use //
(floor division) to get an integer quotient. In Python 2, / on two integers behaves like //.
35 // 5 is 7
84 // 10 is 8
156 // 100 is 1
The % operator computes the remainder from a division of integers.
The operators + - * / % ** ( ) all work for real numbers.
The / produces an exact answer: 15.0 / 2.0 is 7.5
The same rules of precedence also apply to real numbers:
Evaluate ( ) before * / % before + -
When integers and reals are mixed, the result is a real number.
Example: 1 / 2.0 is 0.5
The conversion occurs on a per-operator basis:
7 // 3 * 1.2 + 3 // 2
2 * 1.2 + 3 // 2
2.4 + 3 // 2
2.4 + 1
3.4
24. Python has useful commands for performing calculations.
Command name Description
abs(value) absolute value
ceil(value) rounds up
cos(value) cosine, in radians
floor(value) rounds down
log(value) logarithm, base e
log10(value) logarithm, base 10
max(value1, value2) larger of two values
min(value1, value2) smaller of two values
round(value) nearest whole number
sin(value) sine, in radians
sqrt(value) square root
Constant Description
e 2.7182818...
pi 3.1415926...
To use many of these commands, you
must write the following at the top of your
Python program:
from math import *
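A short sketch of some of these commands in use. It uses "import math" with the "math." prefix instead of the star import, which is an alternative style rather than the slide's own. Note that abs, max, min and round are built in and need no import.

```python
import math

rounded_up = math.ceil(2.1)      # 3
rounded_down = math.floor(2.9)   # 2
root = math.sqrt(16)             # 4.0
log_base_10 = math.log10(100)    # 2.0

# Built-in functions: no import required.
magnitude = abs(-7)              # 7
largest = max(3, 9)              # 9
nearest = round(2.6)             # 3

print(rounded_up, rounded_down, root, log_base_10, magnitude, largest, nearest)
```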
25. variable: A named piece of memory that can store a value.
Usage:
Compute an expression's result,
store that result into a variable,
and use that variable later in the program.
assignment statement: Stores a value into a variable.
Syntax:
name = value
Examples:
x = 5
gpa = 3.14
A variable that has been given a value can be used in expressions.
x + 4 is 9
Exercise: Evaluate the quadratic equation for a given a, b, and c.
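One possible sketch for the exercise, assuming "evaluate the quadratic equation" means finding the roots of a*x**2 + b*x + c = 0 and that the roots are real. The coefficient values are made up for illustration.

```python
import math

# Example coefficients (assumed values): x**2 - 3x + 2 = 0
a = 1
b = -3
c = 2

# Store intermediate results in variables and reuse them.
discriminant = b ** 2 - 4 * a * c
root1 = (-b + math.sqrt(discriminant)) / (2 * a)
root2 = (-b - math.sqrt(discriminant)) / (2 * a)

print("roots:", root1, root2)
```

A negative discriminant would make math.sqrt raise a ValueError, so a complete solution would check for that case first.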
26. print: Produces text output on the console.
Syntax:
print("Message")
print(Expression)
Prints the given text message or expression value on the console, and moves the cursor down to the
next line.
print(Item1, Item2, ..., ItemN)
Prints several messages and/or expressions on the same line.
Examples:
print("Hello, world!")
age = 45
print("You have", 65 - age, "years until retirement")
Output:
Hello, world!
You have 20 years until retirement
27. input: Reads a line of text from user input and returns it as a string.
Use int() (or float()) to convert the text to a number. You can assign (store) the result into a variable.
Example:
age = int(input("How old are you? "))
print("Your age is", age)
print("You have", 65 - age, "years until retirement")
Output:
How old are you? 53
Your age is 53
You have 12 years until retirement
Exercise: Write a Python program that prompts the user for his/her amount of money, then
reports how many Nintendo Wiis the person can afford, and how much more money he/she
will need to afford an additional Wii.
28. for loop: Repeats a set of statements over a group of values.
Syntax:
for variableName in groupOfValues:
    statements
We indent the statements to be repeated with tabs or spaces.
variableName gives a name to each value, so you can refer to it in the statements.
groupOfValues can be a range of integers, specified with the range function.
Example:
for x in range(1, 6):
    print(x, "squared is", x * x)
Output:
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
30. The range function specifies a range of integers:
range(start, stop) - the integers between start (inclusive)
and stop (exclusive)
It can also accept a third value specifying the change between values.
range(start, stop, step) - the integers between start (inclusive)
and stop (exclusive) by step
Example:
for x in range(5, 0, -1):
    print(x)
print("Hello!")
Output:
5
4
3
2
1
Hello!
31. Some loops incrementally compute a value that is initialized outside the loop. This is
sometimes called a cumulative sum.
total = 0  # named total so the built-in sum() is not shadowed
for i in range(1, 11):
    total = total + (i * i)
print("sum of first 10 squares is", total)
Output:
sum of first 10 squares is 385
Exercise: Write a Python program that computes the factorial of an integer.
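The same accumulation pattern works with multiplication. The following is one possible sketch for the factorial exercise; the value of n is chosen only for illustration.

```python
n = 5            # example input (assumed value)
product = 1      # start at 1, the identity for multiplication

# Multiply the running product by 1, 2, ..., n.
for i in range(1, n + 1):
    product = product * i

print(n, "factorial is", product)
```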
32. if statement: Executes a group of statements only if a certain condition is true. Otherwise,
the statements are skipped.
Syntax:
if condition:
    statements
Example:
gpa = 3.4
if gpa > 2.0:
    print("Your application is accepted.")
33. if/else statement: Executes one block of statements if a certain
condition is True, and a second block of statements if it is False.
Syntax:
if condition:
    statements
else:
    statements
Example:
gpa = 1.4
if gpa > 2.0:
    print("Welcome to JUST University!")
else:
    print("Your application is denied.")
Multiple conditions can be chained with elif ("else if"):
if condition:
    statements
elif condition:
    statements
else:
    statements
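A small sketch of an if/elif/else chain. The function name classify and the 3.5 threshold are hypothetical and chosen only for illustration. The conditions are tested from top to bottom, and only the first matching branch runs.

```python
def classify(gpa):
    """Return a decision string for a GPA (thresholds are illustrative)."""
    if gpa >= 3.5:
        return "accepted with honors"
    elif gpa > 2.0:
        return "accepted"
    else:
        return "denied"

print(classify(3.8))
print(classify(3.4))
print(classify(1.4))
```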
34. while loop: Executes a group of statements as long as a condition is True.
good for indefinite loops (repeat an unknown number of times)
Syntax:
while condition:
    statements
Example:
number = 1
while number < 200:
    print(number, end=" ")
    number = number * 2
Output:
1 2 4 8 16 32 64 128
35. Many logical expressions use relational operators:
Operator  Meaning                   Example      Result
==        equals                    1 + 1 == 2   True
!=        does not equal            3.2 != 2.5   True
<         less than                 10 < 5       False
>         greater than              10 > 5       True
<=        less than or equal to     126 <= 100   False
>=        greater than or equal to  5.0 >= 5.0   True
Logical expressions can be combined with logical operators:
Operator  Example            Result
and       9 != 6 and 2 < 3   True
or        2 == 3 or -1 < 5   True
not       not 7 > 0          False
Exercise: Write code to display and count the factors of a number.
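A brief sketch combining relational and logical operators. The variable names and the value of x are only illustrative.

```python
x = 7

in_range = x > 0 and x <= 10     # both comparisons are True
out_of_range = x < 0 or x > 10   # both comparisons are False
not_positive = not x > 0         # 'not' applies to the result of x > 0

print(in_range, out_of_range, not_positive)
```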
36. string: A sequence of text characters in a program.
Strings start and end with quotation mark " or apostrophe ' characters.
Examples:
"hello"
"This is a string"
"This, too, is a string. It can be very long!"
A string may not span across multiple lines or contain a " character.
"This is not
a legal String."
"This is not a "legal" String either."
A string can represent special characters by preceding them with a backslash.
\t tab character
\n new line character
\" quotation mark character
\\ backslash character
Example: "Hello\tthere\nHow are you?"
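The escape sequences can be tried out as follows. This is a minimal sketch, and the sample strings other than the slide's own example are made up.

```python
message = "Hello\tthere\nHow are you?"   # contains a tab and a newline
quoted = "She said \"hi\""               # \" puts a quotation mark in the string
path = "C:\\temp"                        # \\ produces a single backslash

print(message)
print(quoted)
print(path)
```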
37. Characters in a string are numbered with indexes starting at 0:
Example:
name = "P. Diddy"
Accessing an individual character of a string:
variableName [ index ]
Example:
print(name, "starts with", name[0])
Output:
P. Diddy starts with P
index      0  1  2  3  4  5  6  7
character  P  .     D  i  d  d  y
(The character at index 2 is a space.)
38. len(string) - number of characters in a string (including spaces)
str.lower(string) - lowercase version of a string
str.upper(string) - uppercase version of a string
Example:
name = "Martin Douglas Stepp"
length = len(name)
big_name = str.upper(name)
print(big_name, "has", length, "characters")
Output:
MARTIN DOUGLAS STEPP has 20 characters
39. List: a compound data type:
[0]
[2.3, 4.5]
[5, "Hello", "there", 9.8]
[]
Use len() to get the length of a list
>>> names = ["Ben", "Chen", "Yaqin"]
>>> len(names)
3
43. Certain features of Python are not loaded by default
In order to use these features, you’ll need to import the modules that contain them.
E.g.
import matplotlib.pyplot as plt
import numpy as np
44. f = 7 / 2 # in python 2, f will be 3, unless "from __future__ import division" is used
f = 7 / 2 # in python 3, f = 3.5
f = 7 // 2 # f = 3 in both python 2 and 3
f = 7 / 2. # f = 3.5 in both python 2 and 3
f = 7 / float(2) # f is 3.5 in both python 2 and 3
f = int(7 / 2) # f is 3 in both python 2 and 3
45. Get the i-th element of a list
x = [i for i in range(10)] # is the list [0, 1, ..., 9]
zero = x[0] # equals 0, lists are 0-indexed
one = x[1] # equals 1
nine = x[-1] # equals 9, 'Pythonic' for last element
eight = x[-2] # equals 8, 'Pythonic' for next-to-last element
one_to_four = x[1:5] # [1, 2, 3, 4]
first_three = x[:3] # [0, 1, 2]
last_three = x[-3:] # [7, 8, 9]
three_to_end = x[3:] # [3, 4, ..., 9]
without_first_and_last = x[1:-1] # [1, 2, ..., 8]
copy_of_x = x[:] # [0, 1, 2, ..., 9]
another_copy_of_x = x[:3] + x[3:] # [0, 1, 2, ..., 9]
46. 1 in [1, 2, 3] # True
0 in [1, 2, 3] # False
x = [1, 2, 3]
y = [4, 5, 6]
x.extend(y) # x is now [1,2,3,4,5,6]
x = [1, 2, 3]
y = [4, 5, 6]
z = x + y # z is [1,2,3,4,5,6]; x is unchanged.
x, y = [1, 2] # x is 1 and y is 2
[x, y] = 1, 2 # same as above
x, y = [1, 2] # same as above
x, y = 1, 2 # same as above
_, y = [1, 2] # y is 2, didn't care about the first element
47. >>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print(i, a[i])
...
0 Mary
1 had
2 a
3 little
4 lamb
48. What is the expected output of the following code?
a = list(range(10))
b = a
b[0] = 100
print(a)
Output: [100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a = list(range(10))
b = a[:]
b[0] = 100
print(a)
Output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a = [0, 1, 2, 3, 4]
b = a
c = a[:]
a == b
Out[129]: True
a is b
Out[130]: True
a == c
Out[132]: True
a is c
Out[133]: False
49. Tuples are similar to lists, but are immutable
a_tuple = (0, 1, 2, 3, 4)
other_tuple = 3, 4
another_tuple = tuple([0, 1, 2, 3, 4])
heterogeneous_tuple = ('john', 1.1, [1, 2])
Can be sliced, concatenated, or repeated
a_tuple[2:4] # evaluates to (2, 3)
Cannot be modified
a_tuple[2] = 5
TypeError: 'tuple' object does not support item assignment
Note: a tuple is defined by the comma, not by the
parentheses, which are only used for
convenience and for grouping elements. So a = (1)
is not a tuple, but a = (1,) is.
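The note about the comma can be verified with type(). This is a small sketch with illustrative names.

```python
not_a_tuple = (1)     # just the integer 1 in parentheses
one_tuple = (1,)      # the trailing comma makes a one-element tuple
also_a_tuple = 1,     # parentheses are optional

print(type(not_a_tuple).__name__)   # int
print(type(one_tuple).__name__)     # tuple
print(type(also_a_tuple).__name__)  # tuple
```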
50. Useful for returning multiple values from functions
Tuples and lists can also be used for multiple assignments
def sum_and_product(x, y):
    return (x + y),(x * y)
sp = sum_and_product(2, 3) # equals (5, 6)
s, p = sum_and_product(5, 10) # s is 15, p is 50
x, y = 1, 2
[x, y] = [1, 2]
(x, y) = (1, 2)
x, y = y, x
51. a = [1, 2, 3, 4, 5, 6]
my_tuple=(a,)
my_tuple[0]=a #### ERROR
a = [1, 2, 3, 4, 5, 6]
my_tuple=(a)
my_tuple[0]=a #### No ERROR
a = [1, 2, 3, 4, 5, 6]
my_tuple=(a,)
my_tuple[0]=5 #### ERROR
a = [1, 2, 3, 4, 5, 6]
my_tuple=(a,)
my_tuple[0][0]=5 #### No ERROR
52. A dictionary associates values with unique keys
empty_dict = {} # Pythonic
empty_dict2 = dict() # less Pythonic
grades = { "Joel" : 80, "Tim" : 95 } # dictionary literal
joels_grade = grades["Joel"] # equals 80
grades["Tim"] = 99 # replaces the old value
grades["Kate"] = 100 # adds a third entry
num_students = len(grades) # equals 3
• Access/modify value with key
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")
54. Check for existence of key
joel_has_grade = "Joel" in grades # True
kate_has_grade = "Kate" in grades # False
joels_grade = grades.get("Joel", 0) # equals 80
kates_grade = grades.get("Kate", 0) # equals 0
no_ones_grade = grades.get("No One") # default is None
• Use "get" to avoid KeyError and add a default value
• Get all items
all_keys = grades.keys() # all keys
all_values = grades.values() # all values
all_pairs = grades.items() # all (key, value) pairs
In Python 2 these three calls return lists. In Python 3 they do not return lists but
iterable view objects.
#Which of the following is faster?
'Joel' in grades # faster. Hashtable
'Joel' in all_keys # slower when all_keys is a list (Python 2)
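A minimal sketch of the Python 3 behaviour: keys() returns a view object that tracks later changes to the dictionary, and list() turns it into an ordinary list when one is needed.

```python
grades = {"Joel": 80, "Tim": 95}

all_keys = grades.keys()        # a view object, not a list, in Python 3
grades["Kate"] = 100            # the view reflects this later insertion

keys_now = list(all_keys)       # convert to a real list when needed
print(keys_now)
```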
56. try:
    print(0 / 0)
except ZeroDivisionError:
    print("cannot divide by zero")
https://docs.python.org/3/tutorial/errors.html
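The same pattern can be wrapped in a function. The helper safe_divide below is hypothetical, and returning None on failure is just one possible design choice.

```python
def safe_divide(a, b):
    """Return a / b, or None if b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        print("cannot divide by zero")
        return None

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```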
57. Functions are defined using def
def double(x):
    """this is where you put an optional docstring
    that explains what the function does.
    for example, this function multiplies its
    input by 2"""
    return x * 2
• You can call a function after it is defined
z = double(10) # z is 20
• You can give default values to parameters
def my_print(message="my default message"):
    print(message)
my_print("hello") # prints 'hello'
my_print() # prints 'my default message'
58. Sometimes it is useful to specify arguments by name
def subtract(a=0, b=0):
    return a - b
subtract(10, 5) # returns 5
subtract(0, 5) # returns -5
subtract(b = 5) # same as above
subtract(b = 5, a = 0) # same as above
59. Functions are objects too
In [12]: def double(x): return x * 2
...: DD = double;
...: DD(2)
...:
Out[12]: 4
In [16]: def apply_to_one(f):
...: return f(1)
...: x=apply_to_one(DD)
...: x
...:
Out[16]: 2
60. Small anonymous functions can be created with the lambda keyword.
The power of lambda is better shown when you use them
as an anonymous function inside another function.
def myfunc(n):
    return lambda a : a * n
mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11))
print(mytripler(11))
A lambda function can take any number of arguments, but can only
have one expression.
x = lambda a : a + 10
print(x(5))
x = lambda a, b, c : a * b - c
print(x(5, 6, 2))
62. A very convenient way to create a new list
squares = [x * x for x in range(5)]
print (squares)
Out[52]: [0, 1, 4, 9, 16]
squares=[0,0,0,0,0]
for x in range(5):
    squares[x] = x * x
print (squares)
Out[64]: [0, 1, 4, 9, 16]
63. List comprehensions can also be used to filter a list
In [68]: even_numbers = []
In [69]: for x in range(5):
...:     if x % 2 == 0:
...:         even_numbers.append(x)
...: even_numbers
Out[69]: [0, 2, 4]
In [65]: even_numbers = [x for x in range(5) if x % 2 == 0]
In [66]: even_numbers
Out[66]: [0, 2, 4]
64. More complex examples:
# create 100 pairs (0,0) (0,1) ... (9,8), (9,9)
pairs = [(x, y)
for x in range(10)
for y in range(10)]
# only pairs with x < y,
# range(lo, hi) equals
# [lo, lo + 1, ..., hi - 1]
increasing_pairs = [(x, y)
for x in range(10)
for y in range(x + 1, 10)]
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3) …etc
65. Convenient tools in python to apply function to sequences of data
def double(x): return 2*x
b=range(5)
list(map(double, b))
Out[203]: [0, 2, 4, 6, 8]
In [204]: double(b)
Traceback (most recent call last):
TypeError: unsupported operand type(s) for *: 'int' and 'range'
def double(x): return 2*x
print ([double(i) for i in range(5)])
Out[205]: [0, 2, 4, 6, 8]
67. def is_even(x): return x%2==0
a=[0, 1, 2, 3]
list(filter(is_even, a))
Out[208]: [0, 2]
In [209]: [x for x in a if is_even(x)]
Out[209]: [0, 2]
a = [1, 2, 3, 4, 5, 6]
print(list(filter(lambda x : x % 2 == 0, a))) # Output: [2, 4, 6]
68. In [216]: from functools import reduce
In [217]: reduce(lambda x, y: x+y, range(10))
Out[217]: 45
In [220]: reduce(lambda x, y: x*y, [1, 2, 3, 4])
Out[220]: 24
69. zip is useful for combining multiple lists into a list of tuples
In [238]: list(zip(['a', 'b', 'c'], [1, 2, 3], ['A', 'B', 'C']))
Out[238]: [('a', 1, 'A'), ('b', 2, 'B'), ('c', 3, 'C')]
In [245]: names = ['James', 'Tom', 'Mary']
...: grades = [100, 90, 95]
...: list(zip(names, grades))
...:
Out[245]: [('James', 100), ('Tom', 90), ('Mary', 95)]
70. file_object = open(file_name [, access_mode])
access_mode: determines the mode in which the file
is opened, i.e., read, write, append, etc.
A complete list of possible values is given
in the table below. This parameter is optional,
and the default file access mode is read (r).
72. read(): reads the entire file and returns its contents as a string
readline(): reads the next line of the file (the first line on the first call), i.e. up to a
newline character, or up to EOF in a file with a single line, and returns a string
readlines(): reads the entire file line by line and returns a list of line strings
1 hello 40 50 hi
This is my course
Welcome to this course n wish you all the best
f = open("my_file2.txt", 'w')
f.write("Hello Everyone!")
f.close() # close the file so the text is flushed to disk
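The three read methods can be compared side by side. This is a sketch under stated assumptions: it writes to a temporary directory rather than the current folder, the file contents are made up, and it uses "with" blocks so each file is closed automatically.

```python
import os
import tempfile

# Write a small two-line file into a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_file2.txt")
with open(path, "w") as f:
    f.write("Hello Everyone!\nThis is my course\n")

with open(path) as f:            # default mode is 'r'
    whole_text = f.read()        # entire file as one string

with open(path) as f:
    first_line = f.readline()    # only the first line, including '\n'

with open(path) as f:
    all_lines = f.readlines()    # list with one string per line

print(repr(whole_text))
print(repr(first_line))
print(all_lines)
```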
73. Notice how each piece of data is
separated by a comma.
77. What is Numpy?
• Numpy, Scipy, and Matplotlib provide MATLAB-
like functionality in python.
• Numpy Features:
Typed multidimensional arrays (matrices)
Fast numerical computations (matrix math)
High-level math functions
78. Why do we need NumPy
Let’s see for ourselves!
79. Why do we need NumPy
• Python does numerical computations slowly.
• 1000 x 1000 matrix multiply
Python triple loop takes > 10 min.
Numpy takes ~0.03 seconds
80. NumPy Overview
1. Arrays
2. Shaping and transposition
3. Mathematical Operations
4. Indexing and slicing
5. Broadcasting
86. Arrays, Basic Properties
import numpy as np
a = np.array([[1,2,3],[4,5,6]],dtype=np.float32)
print(a.ndim, a.shape, a.dtype)
1. Arrays can have any number of dimensions, including zero (a scalar).
2. Arrays are typed: np.uint8, np.int64, np.float32, np.float64
3. Arrays are dense. Each element of the array exists and has the same type.
96. Arrays, danger zone
• Must be dense, no holes.
• Must be one type
• Cannot combine arrays of different shape
97. Shaping
a = np.array([1,2,3,4,5,6])
a = a.reshape(3,2)
a = a.reshape(2,-1)
a = a.ravel()
1. Total number of elements cannot change.
2. Use -1 to infer axis shape
3. Row-major by default (MATLAB is column-major)
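The shaping calls above can be traced step by step. This is a small sketch; the names b, c and d are introduced only to keep each intermediate result visible.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

b = a.reshape(3, 2)    # 3 rows, 2 columns, filled row by row
c = b.reshape(2, -1)   # -1 lets NumPy infer the second axis (3)
d = c.ravel()          # back to one dimension

print(b.shape, c.shape, d.shape)
print(c)
```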
100. Return values
• Numpy functions return either views or copies.
• Views share data with the original array, like
references in Java/C++. Altering entries of a
view, changes the same entries in the original.
• The numpy documentation says which functions
return views or copies
• a.copy() (or np.copy(a)) and a.view() make explicit copies and views.
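A minimal sketch of the difference: basic slicing returns a view, so writing through it changes the original, while copy() produces independent data. np.shares_memory is used here only as a convenient check.

```python
import numpy as np

original = np.arange(6)          # [0 1 2 3 4 5]

view = original[1:4]             # basic slicing returns a view
view[0] = 100                    # also changes original[1]

independent = original.copy()    # explicit copy
independent[0] = -1              # original[0] is unaffected

print(original)
print(np.shares_memory(original, view))
print(np.shares_memory(original, independent))
```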
104. Saving and loading arrays
np.savez('data.npz', a=a)
data = np.load('data.npz')
a = data['a']
1. NPZ files can hold multiple arrays
2. np.savez_compressed similar.
105. Mathematical operators
• Arithmetic operations are element-wise
• Logical operators return a bool array
• In place operations modify the array
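The three points can be illustrated with a small array. This is a sketch, and the values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

doubled = a * 2        # element-wise arithmetic creates a new array
mask = a > 2           # comparison returns a bool array

a += 10                # in-place operation modifies a itself

print(doubled, mask, a)
```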
109. Math, upcasting
Just as in Python and Java, the result of a math
operator is cast to the more general or precise
datatype.
uint64 + uint16 => uint64
float32 / int32 => float64 (float32 cannot represent every int32 value exactly)
Warning: upcasting does not prevent
overflow/underflow. You must manually cast first.
Use case: images often stored as uint8. You should
convert to float32 or float64 before doing math.
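A sketch of the overflow warning, using made-up pixel values: adding two uint8 arrays stays in uint8 and wraps around modulo 256, whereas converting to float32 first gives the expected sums.

```python
import numpy as np

a = np.array([200, 250], dtype=np.uint8)
b = np.array([100, 10], dtype=np.uint8)

wrapped = a + b                                      # still uint8: wraps modulo 256
safe = a.astype(np.float32) + b.astype(np.float32)   # cast first, then add

print(wrapped, wrapped.dtype)
print(safe, safe.dtype)
```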
113. Indexing
x[0,0] # top-left element
x[0,-1] # first row, last column
x[0,:] # first row (many entries)
x[:,0] # first column (many entries)
Notes:
Zero-indexing
Multi-dimensional indices are comma-separated (i.e., a
tuple)
117. Axes
a.sum() # sum all entries
a.sum(axis=0) # sum over rows
a.sum(axis=1) # sum over columns
a.sum(axis=1, keepdims=True)
1. Use the axis parameter to control which axis
NumPy operates on
2. Typically, the axis specified will disappear,
keepdims keeps all dimensions
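The effect of axis and keepdims is easiest to see from the resulting shapes. This sketch uses a small 2 x 3 array chosen only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)

total = a.sum()                  # all entries
axis0 = a.sum(axis=0)            # axis 0 disappears: one value per column
axis1 = a.sum(axis=1)            # axis 1 disappears: one value per row
kept = a.sum(axis=1, keepdims=True)   # summed axis kept with size 1

print(total, axis0, axis1, kept.shape)
```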
119. Broadcasting
a = a + 1 # add one to every element
When operating on multiple arrays, broadcasting rules are
used.
Each dimension must match, from right-to-left
1. Dimensions of size 1 will broadcast (as if the value was
repeated).
2. Otherwise, the dimension must have the same shape.
3. Extra dimensions of size 1 are added to the left as needed.
120. Broadcasting example
Suppose we want to add a color value to an image
a.shape is 100, 200, 3
b.shape is 3
a + b will pad b with two extra dimensions so it
has an effective shape of 1 x 1 x 3.
So, the addition will broadcast over the first and
second dimensions.
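A sketch of this example with placeholder data (an all-zero image and an arbitrary color), including a shape of 4 that cannot be broadcast against a trailing dimension of 3.

```python
import numpy as np

image = np.zeros((100, 200, 3), dtype=np.float32)      # placeholder image
color = np.array([0.1, 0.2, 0.3], dtype=np.float32)    # shape (3,)

tinted = image + color       # color is treated as shape (1, 1, 3)
print(tinted.shape)

# A trailing dimension of 4 does not match 3, so broadcasting fails.
try:
    image + np.ones(4, dtype=np.float32)
    failed = False
except ValueError:
    failed = True
print(failed)
```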
121. Broadcasting failures
If a.shape is 100, 200, 3 but b.shape is 4 then a + b
will fail. The trailing dimensions must have the
same shape (or be 1)
122. Tips to avoid bugs
1. Know what your datatypes are.
2. Check whether you have a view or a copy.
3. Know np.dot vs np.multiply.
123. numpy.dot
numpy.dot(a, b, out=None)
Dot product of two arrays. Specifically,
• If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
• If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
• If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a *
b is preferred.
• If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
• If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and
the second-to-last axis of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
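To make the np.dot versus np.multiply distinction concrete, here is a small sketch with two 2 x 2 matrices. The values are arbitrary.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = np.multiply(a, b)   # same as a * b
matrix_product = np.dot(a, b)     # same as a @ b for 2-D arrays

print(elementwise)
print(matrix_product)
```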
129. What is Pandas?
Pandas is a Python module that rounds out the capabilities of Numpy,
Scipy and Matplotlib. The word pandas is an acronym which is derived
from:
"Python and data analysis" and "panel data".
There is often some confusion about whether Pandas is an alternative to
Numpy, SciPy and Matplotlib.
The truth is that it is built on top of Numpy. This means that Numpy is
required by pandas.
Scipy and Matplotlib on the other hand are not required by pandas but they
are extremely useful. That's why the Pandas project lists them as "optional
dependency".
130. What is Pandas?
• Pandas is a software library written for the Python programming
language.
• It is used for data manipulation and analysis.
• It provides special data structures and operations for the
manipulation of numerical tables and time series.
132. Series
• A Series is a one-dimensional labelled array-like object.
• It is capable of holding any data type, e.g. integers, floats, strings,
Python objects, and so on.
• It can be seen as a data structure with two arrays: one functioning as
the index, i.e. the labels, and the other one contains the actual data.
133. Example
import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
S
The above code returns:
0 11
1 28
2 72
3 3
4 5
5 8
dtype: int64
134. • We can directly access the index and the values of our Series S:
print(S.index)
print(S.values)
RangeIndex(start=0, stop=6, step=1)
[11 28 72 3 5 8]
135. • If we compare this to creating an array in numpy, there are still lots of
similarities:
import numpy as np
X = np.array([11, 28, 72, 3, 5, 8])
print(X)
print(S.values)
# both are the same type:
print(type(S.values),
type(X))
[11 28 72 3 5 8]
[11 28 72 3 5 8]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
136. Another example:
fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]
S = pd.Series(quantities, index=fruits)
S
Output:
apples 20
oranges 33
cherries 52
pears 10
dtype: int64
137. If we add two series with the same indices, we get a new series with the same
index and the corresponding values will be added:
fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits)
print(S + S2)
print("sum of S: ", sum(S))
Output:
apples 37
oranges 46
cherries 83
pears 42
dtype: int64
sum of S: 115
138. The indices do not have to be the same for the Series addition. The index will be the
"union" of both indices. If an index doesn't occur in both Series, the value for this Series
will be NaN:
fruits = ['peaches', 'oranges', 'cherries', 'pears']
fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits2)
print(S + S2)
Output:
cherries 83.0
oranges 46.0
peaches NaN
pears 42.0
raspberries NaN
dtype: float64
139. fruits = ['apples', 'oranges', 'cherries', 'pears']
fruits_ro = ["mere", "portocale", "cireșe", "pere"]
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits_ro)
print(S+S2)
Output:
apples NaN
cherries NaN
cireșe NaN
mere NaN
oranges NaN
pears NaN
pere NaN
portocale NaN
dtype: float64
140. It's possible to access single values of a Series or more than one value
by a list of indices:
print(S['apples'])
20
print(S[['apples', 'oranges', 'cherries']])
apples 20
oranges 33
cherries 52
dtype: int64
141. Similar to Numpy we can use scalar operations or mathematical functions on a series:
import numpy as np
print((S + 3) * 4)
print("======================")
print(np.sin(S))
Output:
apples 92
oranges 144
cherries 220
pears 52
dtype: int64
======================
apples 0.912945
oranges 0.999912
cherries 0.986628
pears -0.544021
dtype: float64
142. pandas.Series.apply
Series.apply(func, convert_dtype=True, args=(), **kwds)
Parameter: Meaning
func: a function, which can be a NumPy function that will be applied to the entire Series or a
Python function that will be applied to every single value of the series
convert_dtype: A boolean value. If it is set to True (default), apply will try to find a better
dtype for elementwise function results. If False, leave as dtype=object
args: Positional arguments which will be passed to the function "func" in addition to the
values from the series.
**kwds: Additional keyword arguments will be passed as keywords to the function
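A sketch of how args is passed along. The function restock and its threshold and amount parameters are hypothetical, and the fruit data reuses the values from the earlier Series slides.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

def restock(quantity, threshold, amount):
    """Add `amount` to the stock if it is below `threshold`."""
    if quantity < threshold:
        return quantity + amount
    return quantity

# Each value of S becomes `quantity`; args supplies the remaining parameters.
result = S.apply(restock, args=(50, 10))
print(result)
```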
144. • We can also use Python lambda functions. Let's assume we have the
following task: test the amount of fruit for every kind. If there are no more
than 50 available, we will augment the stock by 10:
S.apply(lambda x: x if x > 50 else x+10 )
apples 30
oranges 43
cherries 52
pears 20
dtype: int64
145. Filtering with a Boolean array:
S[S>30]
oranges 33
cherries 52
dtype: int64
146. • A series can be seen as an ordered Python dictionary with a fixed
length.
"apples" in S
True
147. • We can even pass a dictionary to a Series object when we create it.
We get a Series with the dict's keys as the index. Current pandas versions
keep the keys in the dictionary's insertion order; older versions (before
0.23) sorted them.
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235,
"Rome": 2874038, "Paris": 2273305, "Vienna": 1805681,
"Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000,
"Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900,
"Milan": 1350680}
city_series = pd.Series(cities)
print(city_series)
149. NaN
Missing data is a common problem in data analysis, and Pandas makes it
easy to work with. In the following example, "Zurich" and "Stuttgart" are
not keys of the cities dictionary, so their values are set to NaN:
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart",
"Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series
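A minimal sketch of what this produces, using a shortened version of the cities dictionary. The call to isnull is an addition for illustration; it marks the entries that are missing.

```python
import pandas as pd

# Shortened version of the cities dictionary from the slides.
cities = {"London": 8615246, "Berlin": 3562166, "Paris": 2273305,
          "Hamburg": 1760433}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)

# Zurich and Stuttgart are not keys of cities, so their values are NaN
# and the dtype of the whole Series becomes float64.
print(my_city_series)

# isnull returns True for every missing entry.
missing = my_city_series.isnull()
print(missing)
```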
151. • Because of the NaN values, the population values of the other cities
are converted to floats as well. There is no missing data in the following
example, so the values are integers:
my_cities = ["London", "Paris", "Berlin", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series
157. Filtering out Missing Data
It's possible to filter out missing data with the Series method dropna. It
returns a Series which consists only of non-null data:
import pandas as pd
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome":
2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425,
"Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119,
"Barcelona":1602386, "Munich": 1493900, "Milan": 1350680}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
print(my_city_series.dropna())
159. Filling in Missing Data
• In many cases you don't want to filter out missing data, but you want to fill in
appropriate data for the empty gaps. A suitable method in many situations will be
fillna:
print(my_city_series.fillna(0))
London 8615246.0
Paris 2273305.0
Zurich 0.0
Berlin 3562166.0
Stuttgart 0.0
Hamburg 1760433.0
dtype: float64
160. • If we call fillna with a dictionary, we can provide the appropriate data, i.e.
the population of Zurich and Stuttgart:
missing_cities = {"Stuttgart":597939, "Zurich":378884}
my_city_series.fillna(missing_cities)
London 8615246.0
Paris 2273305.0
Zurich 378884.0
Berlin 3562166.0
Stuttgart 597939.0
Hamburg 1760433.0
dtype: float64
163. DataFrame
• The underlying idea of a DataFrame is based on spreadsheets. We
can see the data structure of a DataFrame as tabular and
spreadsheet-like.
• A DataFrame logically corresponds to a "sheet" of an Excel document.
• A DataFrame has both a row and a column index.
164. • Like a spreadsheet or Excel sheet, a DataFrame object contains an
ordered collection of columns.
• Each column has a single data type, but different columns can
have different types, e.g. the first column may consist of integers,
while the second one consists of Boolean values and so on.
• There is a close connection between the DataFrames and the Series
of Pandas.
• A DataFrame can be seen as a concatenation of Series, each Series
having the same index, i.e. the index of the DataFrame.
167. • This result is not what we have intended or expected. The reason is
that concat used 0 as the default for the axis parameter. Let's do it
with "axis=1":
shops_df = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df)
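The difference between the two axis settings can be sketched as follows. The slides that define shop1, shop2 and shop3 are not part of this section, so the index and the revenue figures below are hypothetical.

```python
import pandas as pd

years = range(2014, 2018)
# Hypothetical data; only the names shop1, shop2, shop3 come from the slide.
shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years)
shop2 = pd.Series([1203.45, 3441.62, 3007.83, 3619.53], index=years)
shop3 = pd.Series([3412.12, 3491.16, 3457.19, 1963.10], index=years)

# axis=0 (the default) stacks the Series end to end into one long Series.
stacked = pd.concat([shop1, shop2, shop3])
print(stacked.shape)

# axis=1 places them side by side: one column per Series, shared index.
shops_df = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df.shape)
```

Because the Series have no names, the columns of shops_df are labelled 0, 1 and 2.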
175. Custom Index
• We can see that an index (0,1,2, ...) has been automatically assigned
to the DataFrame. We can also assign a custom index to the
DataFrame object:
ordinals = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
"eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth"]
city_frame = pd.DataFrame(cities, index=ordinals)
print(city_frame)
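For the call above to work, cities has to be a dictionary of equally long lists, and the index needs one label per row. A three-row sketch, assuming the keys "name", "population" and "country" used on the later slides:

```python
import pandas as pd

# Hypothetical three-row stand-in for the cities dictionary of lists.
cities = {"name": ["London", "Berlin", "Madrid"],
          "population": [8615246, 3562166, 3165235],
          "country": ["England", "Germany", "Spain"]}

# Without an index argument the rows are labelled 0, 1, 2.
default_frame = pd.DataFrame(cities)

# The index list must contain exactly one label per row.
city_frame = pd.DataFrame(cities, index=["first", "second", "third"])
print(city_frame)
```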
177. Rearranging the Order of Columns
We can also define the order of the columns when we create the
DataFrame.
This also ensures a defined ordering of the columns if we create the
DataFrame from a dictionary.
Before Python 3.7, dictionaries did not guarantee any key order.
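A short sketch of fixing the column order at creation time with the columns parameter. The three-row cities dictionary is a hypothetical stand-in for the one used in the slides.

```python
import pandas as pd

# Hypothetical three-row stand-in for the cities dictionary of lists.
cities = {"name": ["London", "Berlin", "Madrid"],
          "population": [8615246, 3562166, 3165235],
          "country": ["England", "Germany", "Spain"]}

# columns fixes the order of the columns, independent of the key order.
city_frame = pd.DataFrame(cities, columns=["name", "country", "population"])
print(city_frame)
```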
180. • But what if you want to change the column names and the ordering
of an existing DataFrame?
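# reindex returns a new DataFrame, so the result is assigned back;
# columns= makes it reorder the columns rather than the rows
city_frame = city_frame.reindex(columns=["country", "name", "population"])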
print(city_frame)
182. • Now, we want to rename our columns. For this purpose, we will use
the DataFrame method 'rename'. This method supports two calling
conventions
• (index=index_mapper, columns=columns_mapper, ...)
• (mapper, axis={'index', 'columns'}, ...)
• We will rename the columns of our DataFrame into Romanian names
in the following example.
• We set the parameter inplace to True so that our DataFrame itself is
changed. If inplace is False, which is the default, rename returns a new
DataFrame and leaves the original unchanged.
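Both calling conventions and the effect of inplace can be sketched as follows. The three-row cities dictionary and the Romanian labels are illustrative assumptions, not taken from the original slide.

```python
import pandas as pd

# Hypothetical three-row stand-in for the cities dictionary of lists.
cities = {"name": ["London", "Berlin", "Madrid"],
          "population": [8615246, 3562166, 3165235],
          "country": ["England", "Germany", "Spain"]}
city_frame = pd.DataFrame(cities, columns=["country", "name", "population"])

# Illustrative Romanian column labels.
ro_names = {"country": "țară", "name": "nume", "population": "populație"}

# Convention 1: mapping passed to the columns keyword; returns a new frame.
renamed = city_frame.rename(columns=ro_names)

# Convention 2: mapper plus axis; equivalent result.
renamed2 = city_frame.rename(ro_names, axis="columns")

# Up to here city_frame still has its English column names.
original_columns = list(city_frame.columns)

# With inplace=True city_frame itself is changed and None is returned.
city_frame.rename(columns=ro_names, inplace=True)
print(city_frame)
```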
185. Existing Column as the Index of a DataFrame
• We want to create a more useful index in the following example. We
will use the country names as the index, i.e. the list associated with
the key "country" of our cities dictionary:
city_frame = pd.DataFrame(cities, columns=["name", "population"],
index=cities["country"])
print(city_frame)
187. • Alternatively, we can change an existing DataFrame.
• We can use the method set_index to turn a column into an index.
• By default, "set_index" does not work in place; it returns a new
DataFrame with the chosen column as the index:
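A minimal sketch of this behaviour, using a hypothetical three-row cities dictionary. The name city_frame2 is chosen here only to keep the original and the result apart.

```python
import pandas as pd

# Hypothetical three-row stand-in for the cities dictionary of lists.
cities = {"name": ["London", "Berlin", "Madrid"],
          "population": [8615246, 3562166, 3165235],
          "country": ["England", "Germany", "Spain"]}
city_frame = pd.DataFrame(cities)

# set_index returns a new DataFrame; city_frame keeps its country column.
city_frame2 = city_frame.set_index("country")
print(city_frame2)
print(city_frame)
```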
190. • We saw in the previous example that the set_index method returns a
new DataFrame object and doesn't change the original DataFrame. If
we set the optional parameter "inplace" to True, the DataFrame will
be changed in place, i.e. no new object will be created:
city_frame = pd.DataFrame(cities)
city_frame.set_index("country", inplace=True)
print(city_frame)
192. Label-Indexing on the Rows
• So far we have indexed DataFrames via the columns. We will now
demonstrate how to access rows of a DataFrame via the locators
'loc' and 'iloc'. (The older locator 'ix' was deprecated and has been
removed as of pandas 1.0.)
city_frame = pd.DataFrame(cities, columns=("name",
"population"), index=cities["country"])
print(city_frame.loc["Germany"])
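The difference between label-based and position-based access can be sketched as follows. The four-row cities dictionary is a hypothetical extract; it contains two German cities so that the label "Germany" matches more than one row.

```python
import pandas as pd

# Hypothetical four-row extract of the cities dictionary of lists.
cities = {"name": ["London", "Berlin", "Paris", "Hamburg"],
          "population": [8615246, 3562166, 2273305, 1760433],
          "country": ["England", "Germany", "France", "Germany"]}
city_frame = pd.DataFrame(cities, columns=["name", "population"],
                          index=cities["country"])

# loc selects by label; "Germany" occurs twice, so a DataFrame is returned.
germany = city_frame.loc["Germany"]
print(germany)

# iloc selects by integer position; row 0 is returned as a Series.
first_row = city_frame.iloc[0]
print(first_row)

# A list of labels returns all matching rows in the order of the list.
subset = city_frame.loc[["Germany", "France"]]
print(subset)
```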
196. Sum and Cumulative Sum
• We can calculate the sum of all the columns of a DataFrame or the
sum of certain columns:
print(city_frame.sum())
198. We can use "cumsum" to calculate the cumulative sum:
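A small sketch of sum and cumsum on the population column, assuming a hypothetical three-row cities dictionary. The variable name x follows the next slide.

```python
import pandas as pd

# Hypothetical three-row stand-in for the cities dictionary of lists.
cities = {"name": ["London", "Berlin", "Madrid"],
          "population": [8615246, 3562166, 3165235],
          "country": ["England", "Germany", "Spain"]}
city_frame = pd.DataFrame(cities, columns=["name", "population"],
                          index=cities["country"])

# Sum of a single column: one number.
total = city_frame["population"].sum()

# Cumulative sum: a Series with the running total per row.
x = city_frame["population"].cumsum()
print(total)
print(x)
```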
199. Assigning New Values to Columns
• x, which holds the previously calculated cumulative sums, is a
Pandas Series.
• We can assign x to the population column:
city_frame["population"] = x
print(city_frame)
201. • Instead of replacing the values of the population column
with the cumulative sum, we want to add the cumulative
population sum as a new column with the name
"cum_population".
city_frame = pd.DataFrame(cities, columns=["country",
"population", "cum_population"], index=cities["name"])
print(city_frame)
203. • We can see that the column "cum_population" is set to NaN, as we haven't
provided any data for it.
• We will assign now the cumulative sums to this column:
city_frame["cum_population"] = city_frame["population"].cumsum()
print(city_frame)
205. • We can also include a column name which is not contained
in the dictionary, when we create the DataFrame from the
dictionary. In this case, all the values of this column will be
set to NaN:
city_frame = pd.DataFrame(cities, columns=["country",
"area", "population"], index=cities["name"])
print(city_frame)
207. Accessing the Columns of a DataFrame
• There are two ways to access a column of a DataFrame. The result is
in both cases a Series:
# in a dictionary-like way:
print(city_frame["population"])
212. city_frame.population
From the previous example, we can see that we
have not copied the population column. "p" is a
view on the data of city_frame.
213. Assigning New Values to a Column
• The column area is still not defined. We can set all elements of the
column to the same value:
city_frame["area"] = 1572
print(city_frame)
215. • In this case, it would definitely be better to assign each city its
exact area. The list of area values needs to have the same length as
the number of rows in our DataFrame.
# area in square km:
area = [1572, 891.85, 605.77, 1285, 105.4, 414.6, 228, 755, 525.2, 517,
101.9, 310.4, 181.8]
# area can be given as a list, a Series, an array or a scalar
city_frame["area"] = area
print(city_frame)
218. Let's assume, we have only the areas of London, Hamburg and Milan.
The areas are in a series with the correct indices. We can assign this
series as well:
city_frame = pd.DataFrame(cities, columns=["country", "area",
"population"], index=cities["name"])
some_areas = pd.Series([1572, 755, 181.8], index=['London',
'Hamburg', 'Milan'])
city_frame['area'] = some_areas
print(city_frame)
220. Inserting new columns into existing
DataFrames
• In the previous example we have added the column area at creation
time. Quite often it will be necessary to add or insert columns into
existing DataFrames.
• For this purpose the DataFrame class provides a method "insert",
which allows us to insert a column into a DataFrame at a specified
location:
insert(self, loc, column, value, allow_duplicates=False)
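A minimal sketch of insert, assuming a hypothetical three-row cities dictionary. The area values are the first three from the area list on the earlier slide.

```python
import pandas as pd

# Hypothetical three-row stand-in for the cities dictionary of lists.
cities = {"name": ["London", "Berlin", "Madrid"],
          "population": [8615246, 3562166, 3165235],
          "country": ["England", "Germany", "Spain"]}
city_frame = pd.DataFrame(cities, columns=["country", "population"],
                          index=cities["name"])

# Insert the area column (square km) at position 1, i.e. between
# country and population. insert changes the DataFrame in place.
city_frame.insert(loc=1, column="area", value=[1572, 891.85, 605.77])
print(city_frame)
```

Unlike most DataFrame methods, insert changes the DataFrame directly and returns None.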
225. DataFrame from Nested Dictionaries
A nested dictionary of dictionaries can be passed to a DataFrame as
well.
The keys of the outer dictionary are taken as the columns, and the
inner keys, i.e. the keys of the nested dictionaries, are used as the row
indices:
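The mapping of outer keys to columns and inner keys to rows can be sketched as follows. The dictionary name growth and all its values are hypothetical. One inner key is left out on purpose to show that the gap is filled with NaN.

```python
import pandas as pd

# Hypothetical nested dictionary: outer keys are countries,
# inner keys are years. "2012" is missing for Switzerland.
growth = {"Switzerland": {"2010": 3.0, "2011": 1.8},
          "Germany": {"2010": 4.1, "2011": 3.6, "2012": 0.4}}

growth_frame = pd.DataFrame(growth)
print(growth_frame)
```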
232. Filling a DataFrame with random values:
import numpy as np
names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
index = ["January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December"]
df = pd.DataFrame((np.random.randn(12, 5)*1000).round(2),
columns=names, index=index)
print(df)
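randn: takes the dimensions of the result as arguments (here 12 and 5) and
returns an array of that shape filled with samples from the standard
normal distribution (mean 0, variance 1). Multiplying by 1000 scales the
standard deviation to 1000.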
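To draw from a normal distribution with a different mean and standard deviation, the randn samples can be scaled and shifted. In this sketch the names mu and sigma and their values are hypothetical, and the seed is fixed only to make the run repeatable.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the sketch is repeatable

# The arguments of randn are the dimensions of the result.
samples = np.random.randn(12, 5)

# Hypothetical target mean and standard deviation.
mu, sigma = 100.0, 10.0

# Scaling by sigma and shifting by mu gives samples from N(mu, sigma**2).
shifted = sigma * samples + mu
print(shifted.round(2))
```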
234. Summary
• So far we have covered the following:
• Python 3 (scalars, lists, dictionaries, loops, selection, functions)
• NumPy
• Pandas
• The reason for studying these packages is to be able to program the 5
steps in any data science process.