Data Science

Contents
• Introduction to Python
• Numpy
• Pandas
| @Apptrainers

“In December 1989, I was looking for a "hobby"
programming project that would keep me occupied during
the week around Christmas. My office ... would be closed,
but I had a home computer, and not much else on my
hands. I decided to write an interpreter for the new
scripting language I had been thinking about lately: a
descendant of ABC that would appeal to Unix/C hackers. I
chose Python as a working title for the project, being in a
slightly irreverent mood (and a big fan of Monty Python's
Flying Circus).”
— Guido van Rossum

• The big technology companies have each largely aligned themselves with different language stacks.
• Oracle and IBM are aligned with Java (Oracle actually owns Java).
• Google is known for its use of Python (1997), a very versatile, dynamic and extensible language, although in reality it is also a heavy user of C++ and Java. Google has also created its own language, Go (2009).

• Easy to learn, powerful programming language.
• It has efficient high-level data structures and a simple but effective approach to object-oriented programming.
• Freely available in source or binary form for all major platforms from the Python website, https://www.python.org/
• The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C).
• Python is also suitable as an extension language for customizable applications.
• Widely used (Google, NASA, Quora).

When you run a Python program, the interpreter executes it statement by statement (CPython first compiles the source to bytecode and then interprets that bytecode). In compiled languages like C or C++, by contrast, a compiler first translates the whole program into machine code, and only then does the program start running.
As a result, interpreted languages are generally slower than compiled languages.

• In Python you don't need to declare a variable's data type ahead of time; Python infers the type from the value the variable currently holds.
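A short illustration (the variable name `value` is just an example): the same name can refer to values of different types over time, and the built-in `type()` reports the type of whatever it currently holds.

```python
value = 42              # an int
print(type(value))      # <class 'int'>
value = 3.14            # the same name now refers to a float
print(type(value))      # <class 'float'>
value = "hello"         # and now to a str
print(type(value))      # <class 'str'>
```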

Python programs are often reported to be around 3 to 5 times shorter than equivalent Java programs, so we can usually write less code in Python to achieve the same thing as in Java.

• There are many good options for writing, saving and editing code:
Sublime Text (unlimited free trial available)
Notepad++
Xcode (Mac)
TextWrangler (Mac)
TextEdit (Mac)
• There are also multiple platforms offering free online courses:
Coursera
edX
Stanford Online
Khan Academy
Udacity

• To download Python, follow the instructions on the official website:
https://www.python.org/

I would strongly recommend this video:
https://www.youtube.com/watch?v=HW29067qVWk

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
https://github.com

“GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.”
GitHub repositories can be public or private. Private repositories used to require a paid plan, but free accounts can now create them too.
A repository is usually used to organize a single project. It can contain folders and files, images, videos, spreadsheets, and data sets: anything your project needs.

Master: the main branch of a repository, holding the definitive version (on GitHub it is now often named main).
Branch: lets you try out new ideas without affecting master unless a pull request is accepted. Changes committed to a branch stay on that branch, so you can keep track of different versions.
Commits: record the history of your progress on a branch or on master.
Forking a repository: creates your own copy of someone else's repository.
Pull request: submitted to the owner so that the owner can review and incorporate your changes.

• Download Python and Jupyter Notebook.
• Write a Python program that prints your name, your ID, and your favorite quote.
• Save the project as .html and as .ipynb.
• Install Git and create a GitHub account.
• Upload your first project as .html to e-learning.
• Upload your first project as .ipynb to your GitHub account.
• Share the link to your GitHub account with me on e-learning.

https://www.tutorialspoint.com/execute_python_online.php
https://www.onlinegdb.com/online_python_compiler

You can type things directly into a running Python session

Most programming languages, like C, C++ and Java, use braces { } to define a block of code. Python uses indentation.
A code block (the body of a function, loop, etc.) starts with indentation and ends with the first unindented line. The amount of indentation is up to you, but it must be consistent throughout that block.
By convention (PEP 8), four spaces are used per indentation level, and spaces are preferred over tabs. Here is an example:
for i in range(1, 11):
    print(i)
    if i == 5:
        break
Incorrect indentation results in an IndentationError.
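To see the error without crashing a session, one can hand badly indented source text to Python's built-in `compile()` and catch the exception. This is a small sketch; the string `bad_source` is made up for the demonstration.

```python
# The body of the for loop is not indented, which Python rejects.
bad_source = "for i in range(3):\nprint(i)\n"

caught = None
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    caught = err
    print("IndentationError:", err.msg)
```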

In Python, we use the hash (#) symbol to start a comment.
A comment extends up to the end of the line. Comments help programmers understand a program; the Python interpreter ignores them.
#This is a comment
#print out Hello
print('Hello')
If a comment extends over multiple lines, one way to write it is to put a hash (#) at the beginning of each line.
Another way is to use triple quotes, either ''' or """.
Triple quotes are normally used for multi-line strings, but an unassigned string is simply evaluated and discarded, so they can also serve as multi-line comments.
"""This is also
a perfect example
of multi-line comments"""

expression: A data value or set of operations to compute a value.
Examples: 1 + 4 * 3
42
Arithmetic operators we will use:
+ - * / addition, subtraction, multiplication, division
% modulus, a.k.a. remainder
** exponentiation
precedence: Order in which operations are computed.
** has the highest precedence, then * / %, then + -
1 + 3 * 4 is 13
Parentheses can be used to force a certain order of evaluation.
(1 + 3) * 4 is 16
Operator  Description     Example
=         Assignment      num = 7
+         Addition        num = 2 + 2
-         Subtraction     num = 6 - 4
*         Multiplication  num = 5 * 4
/         Division        num = 25 / 5
%         Modulo          num = 8 % 3
**        Exponent        num = 9 ** 2
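A quick way to check the precedence rules is to evaluate a few expressions directly. This sketch only uses the operators from the table; note that ** also groups from right to left.

```python
print(1 + 3 * 4)      # 13: * binds tighter than +
print((1 + 3) * 4)    # 16: parentheses force the addition first
print(2 * 3 ** 2)     # 18: ** binds tighter than *
print(2 ** 3 ** 2)    # 512: ** groups right to left, i.e. 2 ** 9
print(8 % 3)          # 2: remainder of 8 divided by 3
```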

In Python 3, / always performs true division and returns a float: 35 / 5 is 7.0, 84 / 10 is 8.4.
To get an integer quotient, use floor division //:
• 35 // 5 is 7
• 84 // 10 is 8
• 156 // 100 is 1
(In Python 2, / applied to two integers behaved like //.)
The % operator computes the remainder from a division of integers.
• The operators + - * / // % ** ( ) all work for real numbers.
• The / operator keeps the fractional part: 15.0 / 2.0 is 7.5
• The same rules of precedence also apply to real numbers:
Evaluate ( ) before ** before * / // % before + -
• When integers and reals are mixed, the result is a real number.
• Example: 1 / 2.0 is 0.5
The conversion occurs on a per-operator basis:
7 // 3 * 1.2 + 3 // 2
2 * 1.2 + 3 // 2
2.4 + 3 // 2
2.4 + 1
3.4
(With true division, 7 / 3 * 1.2 + 3 / 2 is approximately 4.3.)
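The difference between true division, floor division and remainder can be checked directly in a Python 3 session. The negative case is included because // rounds toward negative infinity, not toward zero.

```python
print(35 / 5)                  # 7.0  true division always gives a float
print(84 / 10)                 # 8.4
print(84 // 10)                # 8    floor division
print(84 % 10)                 # 4    remainder
print(-7 // 2)                 # -4   floor rounds toward negative infinity
print(7 // 3 * 1.2 + 3 // 2)   # 3.4  evaluated step by step as shown above
```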

Python has useful commands for performing calculations.
Command name         Description
abs(value)           absolute value
ceil(value)          rounds up
cos(value)           cosine, in radians
floor(value)         rounds down
log(value)           logarithm, base e
log10(value)         logarithm, base 10
max(value1, value2)  larger of two values
min(value1, value2)  smaller of two values
round(value)         nearest whole number (ties go to the even number: round(2.5) is 2)
sin(value)           sine, in radians
sqrt(value)          square root
Constant  Description
e         2.7182818...
pi        3.1415926...
abs, max, min and round are built in. The other commands and the constants e and pi live in the math module, so to use them you must write the following at the top of your Python program:
from math import *
(Writing import math and then calling math.sqrt(...) etc. is the more common style.)
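A minimal sketch using the `import math` style: the built-ins need no import, while ceil, floor, sqrt and pi come from the math module.

```python
import math

print(abs(-3))                  # 3    built-in, no import needed
print(max(4, 9), min(4, 9))     # 9 4  built-ins as well
print(round(2.5), round(3.5))   # 2 4  ties go to the nearest even number
print(math.ceil(2.1))           # 3
print(math.floor(2.9))          # 2
print(math.sqrt(16))            # 4.0
print(math.pi)                  # 3.141592653589793
```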

variable: A named piece of memory that can store a value.
Usage:
• Compute an expression's result,
• store that result into a variable,
• and use that variable later in the program.
assignment statement: Stores a value into a variable.
Syntax:
name = value
Examples:
x = 5
gpa = 3.14
A variable that has been given a value can be used in expressions.
x + 4 is 9
Exercise: Evaluate the quadratic equation for a given a, b, and c.
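One possible approach to the exercise, sketched under the assumption that it asks for the real roots of a*x**2 + b*x + c = 0; the coefficient values are made up. The point is the pattern from the slide: compute an intermediate result, store it in a variable, and reuse it.

```python
import math

a, b, c = 1, -3, 2                 # example coefficients: x**2 - 3x + 2
discriminant = b * b - 4 * a * c   # store an intermediate result...
root = math.sqrt(discriminant)     # ...and reuse it (assumes discriminant >= 0)
x1 = (-b + root) / (2 * a)
x2 = (-b - root) / (2 * a)
print(x1, x2)                      # 2.0 1.0
```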

• print: Produces text output on the console.
• Syntax:
print("Message")
print(Expression)
• Prints the given text message or expression value on the console, and moves the cursor down to the next line.
print(Item1, Item2, ..., ItemN)
• Prints several messages and/or expressions on the same line, separated by spaces.
• Examples:
print("Hello, world!")
age = 45
print("You have", 65 - age, "years until retirement")
Output:
Hello, world!
You have 20 years until retirement

• input: Reads a line of text typed by the user and returns it as a string.
• You can assign (store) the result of input into a variable. To use it as a number, convert it with int() or float().
• Example:
age = int(input("How old are you? "))
print("Your age is", age)
print("You have", 65 - age, "years until retirement")
Output:
How old are you? 53
Your age is 53
You have 12 years until retirement
• Exercise: Write a Python program that prompts the user for his/her amount of money, then reports how many Nintendo Wiis the person can afford, and how much more money he/she will need to afford an additional Wii.
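Why the int() conversion matters: input() always returns a string, and subtracting a string from an int fails. Since input() needs someone at the keyboard, this sketch uses a hard-coded string `typed` to stand in for what the user would type.

```python
typed = "53"                  # stands in for input("How old are you? ")

try:
    print("You have", 65 - typed, "years until retirement")
except TypeError as err:
    print("TypeError:", err)  # int minus str is not defined

age = int(typed)              # convert the text to an integer first
years_left = 65 - age
print("You have", years_left, "years until retirement")  # You have 12 years until retirement
```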

for loop: Repeats a set of statements over a group of values.
• Syntax:
for variableName in groupOfValues:
    statements
• We indent the statements to be repeated with tabs or spaces (four spaces is the convention).
• variableName gives a name to each value, so you can refer to it in the statements.
• groupOfValues can be a range of integers, specified with the range function.
• Example:
for x in range(1, 6):
    print(x, "squared is", x * x)
Output:
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
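groupOfValues is not limited to range(): any iterable works, including lists and strings. A small sketch with made-up values:

```python
colors = ["red", "green", "blue"]
for color in colors:           # iterate over the items of a list
    print(color, "has", len(color), "letters")

letters = []
for ch in "abc":               # iterate over the characters of a string
    letters.append(ch.upper())
print(letters)                 # ['A', 'B', 'C']
```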

The range function specifies a range of integers:
• range(start, stop) - the integers between start (inclusive) and stop (exclusive)
It can also accept a third value specifying the change between values.
• range(start, stop, step) - the integers between start (inclusive) and stop (exclusive) by step
Example:
for x in range(5, 0, -1):
    print(x)
print("Hello!")
Output:
5
4
3
2
1
Hello!
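Wrapping a range in list() is a handy way to see exactly which integers it produces:

```python
print(list(range(1, 6)))       # [1, 2, 3, 4, 5]  stop is excluded
print(list(range(5)))          # [0, 1, 2, 3, 4]  start defaults to 0
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]
print(list(range(5, 0)))       # []  a positive step never reaches a smaller stop
```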

• Some loops incrementally compute a value that is initialized outside the loop. This is sometimes called a cumulative sum.
total = 0   # named total rather than sum, to avoid hiding the built-in sum()
for i in range(1, 11):
    total = total + (i * i)
print("sum of first 10 squares is", total)
Output:
sum of first 10 squares is 385
Exercise: Write a Python program that computes the factorial of an integer.
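The same cumulative sum can also be written with the built-in sum() and a generator expression, which is one reason not to name a variable sum: doing so would hide the built-in.

```python
total = 0
for i in range(1, 11):
    total += i * i                               # accumulate in an explicit loop

same_total = sum(i * i for i in range(1, 11))    # built-in sum() runs the loop for us
print(total, same_total)                         # 385 385
```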

if statement: Executes a group of statements only if a certain condition is true. Otherwise, the statements are skipped.
Syntax:
if condition:
    statements
Example:
gpa = 3.4
if gpa > 2.0:
    print("Your application is accepted.")

if/else statement: Executes one block of statements if a certain condition is True, and a second block of statements if it is False.
• Syntax:
if condition:
    statements
else:
    statements
Example:
gpa = 1.4
if gpa > 2.0:
    print("Welcome to JUST University!")
else:
    print("Your application is denied.")
Multiple conditions can be chained with elif ("else if"):
if condition:
    statements
elif condition:
    statements
else:
    statements
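A sketch of an if/elif/else chain; the function `letter_grade` and its score thresholds are hypothetical, chosen only to show that the conditions are tested top to bottom and only the first true branch runs.

```python
def letter_grade(score):
    # Conditions are checked in order; the first True branch wins.
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"

print(letter_grade(95), letter_grade(85), letter_grade(40))  # A B F
```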

while loop: Executes a group of statements as long as a condition is True.
Good for indefinite loops (repeat an unknown number of times).
Syntax:
while condition:
    statements
Example:
number = 1
while number < 200:
    print(number, end=" ")
    number = number * 2
Output:
1 2 4 8 16 32 64 128
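Another indefinite loop, as a small sketch: counting how many times a number can be halved (with integer division) before it reaches 1. The number of iterations is not known in advance, which is exactly the case a while loop suits.

```python
n = 100
steps = 0
while n > 1:        # repeat until the condition becomes False
    n = n // 2      # halve, discarding any remainder
    steps += 1
print(steps)        # 6  (100 -> 50 -> 25 -> 12 -> 6 -> 3 -> 1)
```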

Many logical expressions use relational operators:
Operator  Meaning                   Example      Result
==        equals                    1 + 1 == 2   True
!=        does not equal            3.2 != 2.5   True
<         less than                 10 < 5       False
>         greater than              10 > 5       True
<=        less than or equal to     126 <= 100   False
>=        greater than or equal to  5.0 >= 5.0   True
Logical expressions can be combined with logical operators:
Operator  Example             Result
and       9 != 6 and 2 < 3    True
or        2 == 3 or -1 < 5    True
not       not 7 > 0           False
Exercise: Write code to display and count the factors of a number.

• string: A sequence of text characters in a program.
• Strings start and end with quotation mark " or apostrophe ' characters.
• Examples:
"hello"
"This is a string"
"This, too, is a string. It can be very long!"
• A string written with single " or ' quotes may not span multiple lines, and it may not contain its own delimiter character unescaped:
"This is not
a legal String."
"This is not a "legal" String either."
(Triple-quoted strings may span lines, and 'a "legal" string' is fine because it uses the other kind of quote.)
• A string can represent special characters by preceding them with a backslash:
• \t tab character
• \n new line character
• \" quotation mark character
• \\ backslash character
• Example: "Hello\tthere\nHow are you?"
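A short sketch of the escape sequences in action; each escape sequence counts as a single character in the resulting string.

```python
s = "Hello\tthere\nHow are you?"
print(s)                      # Hello<tab>there, then "How are you?" on a new line
print(len("\t"))              # 1: an escape sequence is one character
quote = "She said \"hi\""
print(quote)                  # She said "hi"
path = "C:\\temp"
print(path)                   # C:\temp
print('a "legal" string')     # the other quote type needs no escaping
```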

• Characters in a string are numbered with indexes starting at 0:
• Example:
name = "P. Diddy"
• Accessing an individual character of a string:
variableName[index]
• Example:
print(name, "starts with", name[0])
Output:
P. Diddy starts with P
index      0  1  2  3  4  5  6  7
character  P  .     D  i  d  d  y
(index 2 holds the space)

len(string) - number of characters in a string (including spaces)
str.lower(string) - lowercase version of a string (usually written string.lower())
str.upper(string) - uppercase version of a string (usually written string.upper())
Example:
name = "Martin Douglas Stepp"
length = len(name)
big_name = str.upper(name)
print(big_name, "has", length, "characters")
Output:
MARTIN DOUGLAS STEPP has 20 characters
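The method-call style is the one you will usually see; it gives the same result as calling str.upper with the string as an argument.

```python
name = "Martin Douglas Stepp"
big_name = name.upper()               # method-call style
small_name = name.lower()
print(big_name == str.upper(name))    # True: both spellings do the same thing
print(small_name)                     # martin douglas stepp
print(len(name))                      # 20 (the two spaces count)
```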

List: a compound data type:
[0]
[2.3, 4.5]
[5, "Hello", "there", 9.8]
[]
Use len() to get the length of a list
>>> names = ["Ben", "Chen", "Yaqin"]
>>> len(names)
3

http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html

Certain features of Python are not loaded by default
In order to use these features, you’ll need to import the modules that contain them.
E.g.
import matplotlib.pyplot as plt
import numpy as np

f = 7 / 2
# in Python 2, f will be 3, unless "from __future__ import division" is used
f = 7 / 2 # in Python 3, f = 3.5
f = 7 // 2 # f = 3 in both Python 2 and 3
f = 7 / 2. # f = 3.5 in both Python 2 and 3
f = 7 / float(2) # f is 3.5 in both Python 2 and 3
f = int(7 / 2) # f is 3 in both Python 2 and 3

• Get the i-th element of a list, or a slice of it
x = [i for i in range(10)] # is the list [0, 1, ..., 9]
zero = x[0] # equals 0, lists are 0-indexed
one = x[1] # equals 1
nine = x[-1] # equals 9, 'Pythonic' for last element
eight = x[-2] # equals 8, 'Pythonic' for next-to-last element
one_to_four = x[1:5] # [1, 2, 3, 4]
first_three = x[:3] # [0, 1, 2]
last_three = x[-3:] # [7, 8, 9]
three_to_end = x[3:] # [3, 4, ..., 9]
without_first_and_last = x[1:-1] # [1, 2, ..., 8]
copy_of_x = x[:] # [0, 1, 2, ..., 9]
another_copy_of_x = x[:3] + x[3:] # [0, 1, 2, ..., 9]

1 in [1, 2, 3] # True
0 in [1, 2, 3] # False
x = [1, 2, 3]
y = [4, 5, 6]
x.extend(y) # x is now [1,2,3,4,5,6]
x = [1, 2, 3]
y = [4, 5, 6]
z = x + y # z is [1,2,3,4,5,6]; x is unchanged.
x, y = [1, 2] # x is 1 and y is 2
[x, y] = 1, 2 # same as above
(x, y) = [1, 2] # same as above
x, y = 1, 2 # same as above
_, y = [1, 2] # y is 2, didn't care about the first element

>>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print(i, a[i])
...
0 Mary
1 had
2 a
3 little
4 lamb
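A more idiomatic way to get both the index and the item is the built-in enumerate(), which yields (index, item) pairs, so range(len(a)) and a[i] are not needed:

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i, word in enumerate(a):     # unpack each (index, item) pair
    print(i, word)

pairs = list(enumerate(a))
print(pairs[:2])                 # [(0, 'Mary'), (1, 'had')]
```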

What is the expected output of the following code?
a = list(range(10))
b = a
b[0] = 100
print(a)
a = list(range(10))
b = a[:]
b[0] = 100
print(a)
[100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a = [0, 1, 2, 3, 4]
b = a
c = a[:]
a == b
Out[129]: True
a is b
Out[130]: True
a == c
Out[132]: True
a is c
Out[133]: False

Tuples: similar to lists, but immutable
a_tuple = (0, 1, 2, 3, 4)
other_tuple = 3, 4
another_tuple = tuple([0, 1, 2, 3, 4])
heterogeneous_tuple = ('john', 1.1, [1, 2])
Can be sliced, concatenated, or repeated
a_tuple[2:4] # evaluates to (2, 3)
Cannot be modified
a_tuple[2] = 5
TypeError: 'tuple' object does not support item assignment
Note: a tuple is defined by the comma, not the parentheses, which are only used for convenience and grouping elements. So a = (1) is not a tuple, but a = (1,) is.
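A quick check of the comma rule, plus the concatenation and repetition mentioned above; both operations build new tuples rather than modifying the original.

```python
a = (1)
b = (1,)
c = 1,                       # the comma alone makes a tuple
print(type(a), type(b), type(c))   # <class 'int'> <class 'tuple'> <class 'tuple'>

t = (0, 1, 2)
print(t + (3,))              # (0, 1, 2, 3)        concatenation
print(t * 2)                 # (0, 1, 2, 0, 1, 2)  repetition
```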

Useful for returning multiple values from functions
Tuples and lists can also be used for multiple assignments
def sum_and_product(x, y):
    return (x + y), (x * y)
sp = sum_and_product(2, 3) # equals (5, 6)
s, p = sum_and_product(5, 10) # s is 15, p is 50
x, y = 1, 2
[x, y] = [1, 2]
(x, y) = (1, 2)
x, y = y, x # swaps the values of x and y

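a = [1, 2, 3, 4, 5, 6]
my_tuple = (a,)
my_tuple[0] = a #### ERROR: a tuple's items cannot be reassigned
a = [1, 2, 3, 4, 5, 6]
my_tuple = (a) #### (a) is just the list a, not a tuple
my_tuple[0] = a #### No ERROR: this assigns into the list itself
a = [1, 2, 3, 4, 5, 6]
my_tuple = (a,)
my_tuple[0] = 5 #### ERROR
a = [1, 2, 3, 4, 5, 6]
my_tuple = (a,)
my_tuple[0][0] = 5 #### No ERROR: the list inside the tuple is still mutable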

A dictionary associates values with unique keys
empty_dict = {} # Pythonic
empty_dict2 = dict() # less Pythonic
grades = {"Joel": 80, "Tim": 95} # dictionary literal
joels_grade = grades["Joel"] # equals 80
grades["Tim"] = 99 # replaces the old value
grades["Kate"] = 100 # adds a third entry
num_students = len(grades) # equals 3
• Access/modify value with key
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")

Check for existence of key
joel_has_grade = "Joel" in grades # True
kate_has_grade = "Kate" in grades # False (if Kate has not been added)
joels_grade = grades.get("Joel", 0) # equals 80
kates_grade = grades.get("Kate", 0) # equals 0
no_ones_grade = grades.get("No One") # default is None
• Use "get" to avoid a KeyError and supply a default value
• Get all items
all_keys = grades.keys() # all keys
all_values = grades.values() # all values
all_pairs = grades.items() # (key, value) tuples
#Which of the following is faster?
'Joel' in grades # fast: hash table lookup
'Joel' in all_keys # in Python 2 this was a list, so slower; in Python 3 it is also a fast hash lookup
In Python 3, keys(), values() and items() do not return lists but view objects, which are iterable and reflect later changes to the dictionary.
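A small sketch of get() with and without a default, and of a keys view reflecting a change made after it was created (dictionaries keep insertion order in Python 3.7 and later).

```python
grades = {"Joel": 80, "Tim": 95}

print(grades.get("Kate", 0))     # 0     missing key, default supplied
print(grades.get("No One"))      # None  missing key, no default given

keys = grades.keys()             # a view object in Python 3, not a list
grades["Kate"] = 100             # the view sees this later change
print("Kate" in keys)            # True
print(list(keys))                # ['Joel', 'Tim', 'Kate']
```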

a = [0, 0, 0, 1]
any(a)
Out[135]: True
all(a)
Out[136]: False
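any() returns True if at least one element is truthy, and all() returns True only if every element is truthy. The empty-sequence cases are worth remembering:

```python
print(any([0, 0, 0, 1]))              # True:  one truthy element is enough
print(all([0, 0, 0, 1]))              # False: the zeros are falsy
print(any([]), all([]))               # False True: the empty-sequence edge cases
print(all(x > 0 for x in [3, 1, 2]))  # True: works with any iterable
```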

try:
    print(0 / 0)
except ZeroDivisionError:
    print("cannot divide by zero")
https://docs.python.org/3/tutorial/errors.html

Functions are defined using def
def double(x):
    """this is where you put an optional docstring
    that explains what the function does.
    for example, this function multiplies its
    input by 2"""
    return x * 2
• You can call a function after it is defined
z = double(10) # z is 20
• You can give default values to parameters
def my_print(message="my default message"):
    print(message)
my_print("hello") # prints 'hello'
my_print() # prints 'my default message'

Sometimes it is useful to specify arguments by name
def subtract(a=0, b=0):
    return a - b
subtract(10, 5) # returns 5
subtract(0, 5) # returns -5
subtract(b=5) # same as above
subtract(b=5, a=0) # same as above

Functions are objects too
In [12]: def double(x): return x * 2
    ...: DD = double
    ...: DD(2)
    ...:
Out[12]: 4
In [16]: def apply_to_one(f):
    ...:     return f(1)
    ...: x = apply_to_one(DD)
    ...: x
    ...:
Out[16]: 2

Small anonymous functions can be created with the lambda keyword.
The power of lambda is better shown when you use it as an anonymous function inside another function.
def myfunc(n):
    return lambda a: a * n
mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11)) # 22
print(mytripler(11)) # 33
A lambda function can take any number of arguments, but can only have one expression.
x = lambda a: a + 10
print(x(5)) # 15
x = lambda a, b, c: a * b - c
print(x(5, 6, 2)) # 28

pairs = [(2, 'two'), (3, 'three'), (1, 'one'), (4, 'four')]
pairs.sort(key=lambda pair: pair[0])
print(pairs)
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
def getKey(pair): return pair[0]
pairs.sort(key=getKey)
print(pairs)
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
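list.sort() sorts in place; the built-in sorted() accepts the same key argument (plus reverse) but returns a new list and leaves the original untouched:

```python
pairs = [(2, 'two'), (3, 'three'), (1, 'one'), (4, 'four')]

by_name = sorted(pairs, key=lambda pair: pair[1])                    # sort by the word
by_number_desc = sorted(pairs, key=lambda pair: pair[0], reverse=True)

print(by_name)          # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
print(by_number_desc)   # [(4, 'four'), (3, 'three'), (2, 'two'), (1, 'one')]
print(pairs)            # unchanged: sorted() does not modify its input
```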

List comprehensions: a very convenient way to create a new list
squares = [x * x for x in range(5)]
print(squares)
[0, 1, 4, 9, 16]
squares = [0, 0, 0, 0, 0]
for x in range(5):
    squares[x] = x * x
print(squares)
[0, 1, 4, 9, 16]

In [68]: even_numbers = []
In [69]: for x in range(5):
    ...:     if x % 2 == 0:
    ...:         even_numbers.append(x)
    ...: even_numbers
Out[69]: [0, 2, 4]
In [65]: even_numbers = [x for x in range(5) if x % 2 == 0]
In [66]: even_numbers
Out[66]: [0, 2, 4]
Can also be used to filter a list

More complex examples:
# create 100 pairs (0,0) (0,1) ... (9,8), (9,9)
pairs = [(x, y)
         for x in range(10)
         for y in range(10)]
# only pairs with x < y,
# range(lo, hi) equals
# [lo, lo + 1, ..., hi - 1]
increasing_pairs = [(x, y)
                    for x in range(10)
                    for y in range(x + 1, 10)]
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3) …etc
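A quick check of these two comprehensions, including the equivalent nested loops: the for clauses run in the order written, with the first one as the outer loop.

```python
pairs = [(x, y) for x in range(10) for y in range(10)]
increasing_pairs = [(x, y) for x in range(10) for y in range(x + 1, 10)]

print(len(pairs))              # 100
print(len(increasing_pairs))   # 45  (= 10 * 9 / 2)
print(increasing_pairs[:3])    # [(0, 1), (0, 2), (0, 3)]

check = []                     # the same result with explicit nested loops
for x in range(10):
    for y in range(x + 1, 10):
        check.append((x, y))
print(check == increasing_pairs)   # True
```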

Convenient tools in Python to apply a function to sequences of data
def double(x): return 2 * x
b = range(5)
list(map(double, b))
Out[203]: [0, 2, 4, 6, 8]
In [204]: double(b)
Traceback (most recent call last):
TypeError: unsupported operand type(s) for *: 'int' and 'range'
def double(x): return 2 * x
print([double(i) for i in range(5)])
[0, 2, 4, 6, 8]

map_output = map(lambda x: x * 2, [1, 2, 3, 4])
print(map_output) # Output: a map object, e.g. <map object at 0x04D6BAB0>
list_map_output = list(map_output)
print(list_map_output) # Output: [2, 4, 6, 8]
list(map(lambda x: x * 2, [1, 2, 3, 4])) # Output: [2, 4, 6, 8]
list_a, list_b = [1, 2, 3], [10, 20, 30] # example inputs
list(map(lambda x, y: x + y, list_a, list_b)) # Output: [11, 22, 33]

def is_even(x): return x % 2 == 0
a = [0, 1, 2, 3]
list(filter(is_even, a))
Out[208]: [0, 2]
In [209]: [x for x in a if is_even(x)]
Out[209]: [0, 2]
a = [1, 2, 3, 4, 5, 6]
print(list(filter(lambda x: x % 2 == 0, a))) # Output: [2, 4, 6]

In [216]: from functools import reduce
In [217]: reduce(lambda x, y: x+y, range(10))
Out[217]: 45
In [220]: reduce(lambda x, y: x*y, [1, 2, 3, 4])
Out[220]: 24

Useful to combine multiple lists into a list of tuples
In [238]: list(zip(['a', 'b', 'c'], [1, 2, 3], ['A', 'B', 'C']))
Out[238]: [('a', 1, 'A'), ('b', 2, 'B'), ('c', 3, 'C')]
In [245]: names = ['James', 'Tom', 'Mary']
     ...: grades = [100, 90, 95]
     ...: list(zip(names, grades))
     ...:
Out[245]: [('James', 100), ('Tom', 90), ('Mary', 95)]
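Two common follow-ups, sketched with the same data: dict(zip(...)) pairs keys with values, and zip(*pairs) "unzips" a list of tuples. zip also stops at the shortest input.

```python
names = ['James', 'Tom', 'Mary']
grades = [100, 90, 95]

grade_book = dict(zip(names, grades))         # pair each key with a value
print(grade_book)                             # {'James': 100, 'Tom': 90, 'Mary': 95}

pairs = list(zip(names, grades))
unzipped_names, unzipped_grades = zip(*pairs) # unpack the pairs back into two tuples
print(unzipped_names)                         # ('James', 'Tom', 'Mary')

print(list(zip([1, 2, 3], ['a', 'b'])))       # [(1, 'a'), (2, 'b')]: shortest input wins
```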

• file object = open(file_name [, access_mode])
access_mode: determines the mode in which the file has to be opened, i.e., read, write, append, etc. It is an optional parameter; the default file access mode is read ('r'). Common values are 'r' (read), 'w' (write, emptying the file first), 'a' (append) and 'x' (create, failing if the file already exists); add 'b' for binary mode and '+' for both reading and writing.

read(): reads the entire file and returns its contents as a single string
readline(): reads the next line of the file (up to and including the newline character, or up to EOF for the last line) and returns it as a string
readlines(): reads the entire file line by line and returns a list of line strings
1 hello 40 50 hi
This is my course
Welcome to this course n wish you all the best
f = open("my_file2.txt", 'w')
f.write("Hello Everyone!")
f.close()
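A self-contained sketch of writing a file and reading it back with read(), readline() and readlines(). It writes into a temporary directory (an assumption made so the example doesn't touch your own files) and uses `with`, which closes the file automatically.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                  # throwaway directory for the demo
path = os.path.join(folder, "my_file2.txt")

with open(path, "w") as f:                   # 'w' creates or empties the file
    f.write("first line\nsecond line\n")

with open(path) as f:                        # default mode is 'r'
    first = f.readline()                     # 'first line\n'
    rest = f.readlines()                     # ['second line\n']

with open(path) as f:
    everything = f.read()                    # the whole file as one string

print(repr(first), rest, repr(everything))
```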

Notice how each piece of data is
separated by a comma.

Numpy
Numerical Computing in Python

What is Numpy?
• Numpy, Scipy, and Matplotlib provide MATLAB-like functionality in Python.
• Numpy Features:
• Typed multidimensional arrays (matrices)
• Fast numerical computations (matrix math)
• High-level math functions

Why do we need NumPy
Let’s see for ourselves!

Why do we need NumPy
• Python does numerical computations slowly.
• 1000 x 1000 matrix multiply (timings depend on the machine):
• Python triple loop takes > 10 min.
• Numpy takes ~0.03 seconds
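A scaled-down version of that experiment, sketched with 100 x 100 random matrices so the pure-Python version finishes quickly. Absolute timings depend on the machine, so none are claimed here; the sketch only checks that the triple loop and NumPy's `@` operator agree.

```python
import time
import numpy as np

n = 100                                   # small size so the loop version finishes quickly
rng = np.random.default_rng(0)
A = rng.random((n, n))
B = rng.random((n, n))

def matmul_loops(A, B):
    """Naive triple-loop matrix multiply on plain Python lists."""
    a, b = A.tolist(), B.tolist()
    rows, inner, cols = len(a), len(b), len(b[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for k in range(inner):
                s += a[i][k] * b[k][j]
            C[i][j] = s
    return np.array(C)

t0 = time.perf_counter()
C_loops = matmul_loops(A, B)
t1 = time.perf_counter()
C_numpy = A @ B
t2 = time.perf_counter()

print(f"triple loop: {t1 - t0:.4f} s   numpy: {t2 - t1:.6f} s")
print(np.allclose(C_loops, C_numpy))      # True: same result either way
```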

NumPy Overview
1. Arrays
2. Shaping and transposition
3. Mathematical Operations
4. Indexing and slicing
5. Broadcasting

Arrays
Structured lists of numbers.
• Vectors
• Matrices
• Images
• Tensors
• ConvNets
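Examples: a vector $\mathbf{p} = \begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}$ and an $m \times n$ matrix $A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$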

Arrays, Basic Properties
import numpy as np
a = np.array([[1,2,3],[4,5,6]], dtype=np.float32)
print(a.ndim, a.shape, a.dtype)
1. Arrays can have any number of dimensions, including zero (a scalar).
2. Arrays are typed: np.uint8, np.int64, np.float32, np.float64
3. Arrays are dense. Each element of the array exists and has the same type.
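Printing ndim and shape for a few arrays shows how scalars, vectors, matrices and images map onto dimensions. The height x width x channels layout for the image is an assumption (it is the common convention for color images).

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(a.ndim, a.shape, a.dtype)       # 2 (2, 3) float32

scalar = np.array(7.0)                # zero dimensions
vector = np.array([1.0, 2.0, 3.0])    # e.g. a point (p_x, p_y, p_z)
image = np.zeros((100, 200, 3), dtype=np.uint8)   # height x width x color channels
for arr in (scalar, vector, a, image):
    print(arr.ndim, arr.shape)
# 0 ()
# 1 (3,)
# 2 (2, 3)
# 3 (100, 200, 3)
```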

Arrays, creation
• np.ones, np.zeros
• np.arange
• np.concatenate
• a.astype (the ndarray.astype method)
• np.zeros_like,
np.ones_like
• np.random.random
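A sketch exercising each creation function from the list; the shapes and values are arbitrary examples.

```python
import numpy as np

zeros = np.zeros((2, 3))                 # 2 x 3 array of 0.0
ones = np.ones(4, dtype=np.int64)        # [1 1 1 1]
steps = np.arange(0, 10, 2)              # like range(): [0 2 4 6 8]
joined = np.concatenate([steps, ones])   # 1-D arrays joined end to end
as_float = steps.astype(np.float32)      # astype returns a converted copy
like = np.zeros_like(zeros)              # same shape and dtype as `zeros`
noise = np.random.random((2, 2))         # uniform samples in [0, 1)

print(joined)            # [0 2 4 6 8 1 1 1 1]
print(as_float.dtype)    # float32
print(like.shape)        # (2, 3)
```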

Arrays, danger zone
• Must be dense, no holes.
• Must be one type.
• Arrays of different shapes cannot be combined element-wise unless their shapes are compatible under broadcasting.

Shaping
a = np.array([1,2,3,4,5,6])
a = a.reshape(3,2)
a = a.reshape(2,-1)
a = a.ravel()
1. Total number of elements cannot change.
2. Use -1 to infer axis shape
3. Row-major by default (MATLAB is column-major)

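import numpy as np
a = np.array([1,2,3,4,5,6])
print(a)          # [1 2 3 4 5 6]
print('-'*20)
b = a.reshape(3,2)
print(b)          # [[1 2] [3 4] [5 6]]: 3 rows, 2 columns
print('-'*20)
c = a.reshape(2,-1)
print(c)          # [[1 2 3] [4 5 6]]: the -1 was inferred as 3
print('-'*20)
d = a.ravel()
print(d)          # [1 2 3 4 5 6]: flattened back to 1-D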

Return values
• Numpy functions return either views or copies.
• Views share data with the original array, like references in Java/C++. Altering entries of a view changes the same entries in the original.
• The numpy documentation says which functions return views or copies.
• np.copy (or a.copy()) makes an explicit copy, and a.view() makes an explicit view.
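A small sketch of the view/copy difference. np.shares_memory reports whether two arrays use the same underlying buffer.

```python
import numpy as np

a = np.arange(6)
v = a.reshape(2, 3)       # reshaping a contiguous array gives a view
s = a[1:4]                # basic slicing also gives a view
c = a.copy()              # explicit copy

v[0, 0] = 100             # writes through to a
s[0] = 200                # so does this
c[-1] = -1                # does not affect a

print(a)                                               # [100 200   2   3   4   5]
print(np.shares_memory(a, v), np.shares_memory(a, c))  # True False
```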

Transposition
a = np.arange(10).reshape(5,2)
a = a.T
a = a.transpose((1,0))
np.transpose permutes axes.
a.T reverses the order of all axes (for a 2-D array, it swaps rows and columns).
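Checking shapes makes the difference visible: for a 3-D array, .T reverses every axis, while transpose() accepts any permutation you choose.

```python
import numpy as np

a = np.arange(10).reshape(5, 2)
print(a.T.shape)                      # (2, 5)
print(a.transpose((1, 0)).shape)      # (2, 5): same as a.T for 2-D

x = np.zeros((2, 3, 4))
print(x.T.shape)                      # (4, 3, 2): all axes reversed
print(x.transpose((0, 2, 1)).shape)   # (2, 4, 3): only the last two axes swapped
```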

Saving and loading arrays
np.savez('data.npz', a=a)
data = np.load('data.npz')
a = data['a']
1. NPZ files can hold multiple arrays.
2. np.savez_compressed is similar, but compresses the file.
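A round-trip sketch saving two arrays into one .npz file and loading them back. The temporary path is an assumption so the example doesn't write into your working directory.

```python
import os
import tempfile
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

path = os.path.join(tempfile.mkdtemp(), "data.npz")   # temporary location
np.savez(path, a=a, b=b)          # the keyword names become the keys in the file

with np.load(path) as data:       # the loaded NpzFile can be used with `with`
    print(sorted(data.files))     # ['a', 'b']
    a_loaded = data["a"]
    b_loaded = data["b"]

print(np.array_equal(a, a_loaded), np.array_equal(b, b_loaded))  # True True
```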

Mathematical operators
• Arithmetic operations are element-wise.
• Comparison (logical) operators return a bool array.
• In-place operations (such as +=) modify the array.
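A short sketch of the three points: element-wise arithmetic, comparisons producing boolean arrays (combined with & and |), and an in-place update.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)        # [11 22 33]  element by element
print(a * b)        # [10 40 90]  element-wise, not a matrix product
print(a > 1)        # [False  True  True]  a boolean array
mask = (a > 1) & (b < 30)   # combine boolean arrays with & and |
print(mask)         # [False  True False]

a += 1              # in place: modifies a itself
print(a)            # [2 3 4]
```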

Math, upcasting
Just as in Python and Java, the result of a math operator is cast to the more general or precise datatype.
uint64 + uint16 => uint64
float32 / int16 => float32
float32 / int32 => float64 (not every int32 value fits exactly in a float32)
Warning: upcasting does not prevent overflow/underflow. You must manually cast first.
Use case: images are often stored as uint8. You should convert to float32 or float64 before doing math.
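np.result_type shows which dtype an operation would produce, and a small uint8 example shows the overflow warning in practice: array arithmetic silently wraps around unless you cast first.

```python
import numpy as np

print(np.result_type(np.uint64, np.uint16))   # uint64
print(np.result_type(np.float32, np.int16))   # float32
print(np.result_type(np.float32, np.int32))   # float64

pixels = np.array([200, 100], dtype=np.uint8)
print(pixels + pixels)              # [144 200]: 400 wrapped around modulo 256
as_float = pixels.astype(np.float32)
print(as_float + as_float)          # [400. 200.]: cast first, then do the math
```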

Math, universal functions
Also called ufuncs
Element-wise
Examples:
• np.exp
• np.sqrt
• np.sin
• np.cos
• np.isnan
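Each ufunc applies its operation to every element and returns an array of the same shape. A small sketch with arbitrary inputs:

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, np.nan])
print(np.sqrt(x))                    # 0, 1, 2 and nan: one result per element
print(np.isnan(x))                   # [False False False  True]
print(np.exp(np.array([0.0, 1.0])))  # e**0 and e**1
angles = np.array([0.0, np.pi / 2])
print(np.round(np.sin(angles), 6))   # sine of 0 and of pi/2, rounded: 0 and 1
```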

Indexing
x[0,0] # top-left element
x[0,-1] # first row, last column
x[0,:] # first row (many entries)
x[:,0] # first column (many entries)
Notes:
• Zero-indexing
• Multi-dimensional indices are comma-separated (i.e., a tuple)
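The indexing examples above, run on a concrete array; `x` here is just a 3 x 4 example. The last line shows that the comma-separated index really is a tuple.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # rows [0..3], [4..7], [8..11]
print(x[0, 0])     # 0: top-left element
print(x[0, -1])    # 3: first row, last column
print(x[0, :])     # [0 1 2 3]: first row
print(x[:, 0])     # [0 4 8]: first column
print(x[(1, 2)])   # 6: same as x[1, 2]
```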

Python Slicing
Syntax: start:stop:step
a = list(range(10))
a[:3] # indices 0, 1, 2
a[-3:] # indices 7, 8, 9
a[3:8:2] # indices 3, 5, 7
a[4:1:-1] # indices 4, 3, 2 (this one is tricky)

Axes
a.sum() # sum all entries
a.sum(axis=0) # sum over rows (axis 0 collapses: one result per column)
a.sum(axis=1) # sum over columns (axis 1 collapses: one result per row)
a.sum(axis=1, keepdims=True)
1. Use the axis parameter to control which axis NumPy operates on.
2. Typically, the axis specified will disappear; keepdims keeps all dimensions.
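A 2 x 3 example makes the axis semantics concrete: the axis you pass is the one that disappears, unless keepdims=True keeps it with size 1.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.sum())                              # 21
print(a.sum(axis=0))                        # [5 7 9]: axis 0 (rows) collapsed
print(a.sum(axis=1))                        # [ 6 15]: axis 1 (columns) collapsed
print(a.sum(axis=1, keepdims=True).shape)   # (2, 1): summed axis kept with size 1
```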

Broadcasting
a = a + 1 # add one to every element
When operating on multiple arrays, broadcasting rules are used.
Dimensions are compared from right to left:
1. Dimensions of size 1 will broadcast (as if the value was repeated).
2. Otherwise, the two dimensions must have the same size.
3. If one array has fewer dimensions, dimensions of size 1 are added on its left as needed.
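A sketch of the rules with a column and a row: shape (4,) is first padded on the left to (1, 4), then both size-1 axes stretch to give (3, 4). np.broadcast_shapes (available in NumPy 1.20 and later) computes the result shape without building arrays.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,), treated as (1, 4)
table = col + row                   # both size-1 axes are stretched
print(table.shape)                  # (3, 4)
print(table)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]
print(np.broadcast_shapes((3, 1), (4,)))   # (3, 4)
```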

Broadcasting example
Suppose we want to add a color value to an image:
a.shape is (100, 200, 3)
b.shape is (3,)
a + b will pad b with two extra dimensions so it has an effective shape of 1 x 1 x 3. So, the addition will broadcast over the first and second dimensions.

Broadcasting failures
If a.shape is (100, 200, 3) but b.shape is (4,), then a + b will fail. The trailing dimensions must have the same size (or be 1).
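Both image cases, sketched with a blank float32 image and made-up channel values: a length-3 color broadcasts over every pixel, while a length-4 array raises a ValueError because the trailing sizes 3 and 4 differ.

```python
import numpy as np

image = np.zeros((100, 200, 3), dtype=np.float32)    # a blank example image
color = np.array([0.1, 0.2, 0.3], dtype=np.float32)  # one value per color channel

tinted = image + color            # color acts like shape (1, 1, 3)
print(tinted.shape)               # (100, 200, 3)
print(tinted[50, 120])            # every pixel now holds the three channel values

wrong = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,): trailing sizes 3 and 4 differ
failed = False
try:
    image + wrong
except ValueError as err:
    failed = True
    print("ValueError:", err)
```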

Tips to avoid bugs
1. Know what your datatypes are.
2. Check whether you have a view or a copy.
3. Know np.dot vs np.multiply.

numpy.dot
numpy.dot(a, b, out=None)
Dot product of two arrays. Specifically,
• If both a and b are 1-D arrays, it is the inner product of vectors (without complex conjugation).
• If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
• If either a or b is 0-D (scalar), it is equivalent to multiply, and using numpy.multiply(a, b) or a * b is preferred.
• If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
• If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

numpy.multiply
numpy.multiply(a, b) multiplies its arguments element-wise; it is equivalent to a * b (including broadcasting).
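The "know np.dot vs np.multiply" tip in one small example: on 2 x 2 matrices, multiply works entry by entry while dot computes the matrix product (the same as @); on two 1-D vectors, dot gives the inner product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(np.multiply(A, B))   # element-wise: [[ 5 12] [21 32]]
print(np.dot(A, B))        # matrix product: [[19 22] [43 50]]
print(np.array_equal(np.dot(A, B), A @ B))   # True: @ is the preferred spelling

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))        # 32: inner product 1*4 + 2*5 + 3*6
```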

What is Pandas?
Pandas is a Python module that rounds out the capabilities of Numpy, Scipy and Matplotlib. The name pandas is derived from "Python and data analysis" and "panel data".
There is often some confusion about whether Pandas is an alternative to Numpy, SciPy and Matplotlib.
The truth is that it is built on top of Numpy. This means that Numpy is required by pandas.
Scipy and Matplotlib, on the other hand, are not required by pandas, but they are extremely useful. That's why the Pandas project lists them as "optional dependencies".

What is Pandas?
• Pandas is a software library written for the Python programming
language.
• It is used for data manipulation and analysis.
• It provides special data structures and operations for the
manipulation of numerical tables and time series.

Common Data Structures in Pandas
• Series
• Data Frame

Series
• A Series is a one-dimensional labelled array-like object.
• It is capable of holding any data type, e.g. integers, floats, strings,
Python objects, and so on.
• It can be seen as a data structure with two arrays: one functioning as
the index, i.e. the labels, and the other one contains the actual data.

Example
import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
S
The above code returns:
0 11
1 28
2 72
3 3
4 5
5 8
dtype: int64

• We can directly access the index and the values of our Series S:
print(S.index)
print(S.values)
RangeIndex(start=0, stop=6, step=1)
[11 28 72 3 5 8]

• If we compare this to creating an array in numpy, there are still lots of
similarities:
import numpy as np
X = np.array([11, 28, 72, 3, 5, 8])
print(X)
print(S.values)
# both are the same type:
print(type(S.values),
type(X))
[11 28 72 3 5 8]
[11 28 72 3 5 8]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>

Another example:
fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]
S = pd.Series(quantities, index=fruits)
S
Output:
apples 20
oranges 33
cherries 52
pears 10
dtype: int64

If we add two series with the same indices, we get a new series with the same
index and the corresponding values will be added:
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits)
print(S + S2)
print(“sum of S: ", sum(S))
Output:
apples 37
oranges 46
cherries 83
pears 42
dtype: int64
sum of S: 115

The indices do not have to be the same for Series addition. The resulting index is the
"union" of both indices. If a label occurs in only one of the two Series, the corresponding
value in the result is NaN:
fruits = ['peaches', 'oranges', 'cherries', 'pears']
fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits2)
print(S + S2)
Output:
cherries 83.0
oranges 46.0
peaches NaN
pears 42.0
raspberries NaN
dtype: float64
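When the NaN entries are unwanted, the Series method add can be used instead of the + operator; its fill_value parameter stands in for a label that is missing on one side. A minimal sketch, reusing the two fruit lists from this example:

```python
import pandas as pd

fruits = ['peaches', 'oranges', 'cherries', 'pears']
fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits2)

# a label found in only one Series is treated as 0 in the other one,
# so peaches and raspberries keep their own values instead of becoming NaN
total = S.add(S2, fill_value=0)
print(total)
```

Whether 0 is the right stand-in depends on the data: it is sensible for stock counts, much less so for measurements.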

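The indices can even be completely different. Here the second Series uses the Romanian
names of the fruits, so no label matches and every value of the sum is NaN:
fruits = ['apples', 'oranges', 'cherries', 'pears']
fruits_ro = ["mere", "portocale", "cireșe", "pere"]
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits_ro)
print(S + S2)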
Output:
apples NaN
cherries NaN
cireșe NaN
mere NaN
oranges NaN
pears NaN
pere NaN
portocale NaN
dtype: float64

It's possible to access single values of a Series or more than one value
by a list of indices:
print(S['apples'])
20
print(S[['apples', 'oranges', 'cherries']])
apples 20
oranges 33
cherries 52
dtype: int64
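Besides plain square brackets, a Series offers .loc for access by label and .iloc for access by integer position; the explicit form avoids ambiguity when the labels are themselves integers. A small sketch with the fruit Series used above:

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)

by_label = S.loc['cherries']             # access by index label
by_position = S.iloc[2]                  # access by position (third entry)
first_two = S.iloc[:2]                   # positional slice, end excluded
label_slice = S.loc['oranges':'pears']   # label slice, end label included
print(by_label, by_position)
print(label_slice)
```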

Similar to NumPy, we can apply scalar operations or mathematical functions to a Series:
import numpy as np
print((S + 3) * 4)
print("======================")
print(np.sin(S))
Output:
apples 92
oranges 144
cherries 220
pears 52
dtype: int64
======================
apples 0.912945
oranges 0.999912
cherries 0.986628
pears -0.544021
dtype: float64

pandas.Series.apply
Series.apply(func, convert_dtype=True, args=(), **kwds)
Parameters:
• func: a function, which can be a NumPy function that will be
applied to the entire Series or a Python function that will be
applied to every single value of the Series.
• convert_dtype: a Boolean value. If it is set to True (the default),
apply will try to find a better dtype for the results of an elementwise
function. If False, the results are left as dtype=object. (This
parameter is deprecated since pandas 2.1.)
• args: positional arguments that are passed to func in addition
to the values from the Series.
• **kwds: additional keyword arguments that are passed to func.

S.apply(np.sin)
apples 0.912945
oranges 0.999912
cherries 0.986628
pears -0.544021
dtype: float64
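The args and **kwds parameters listed above forward extra arguments to func after each value. A minimal sketch; the helper function restock is hypothetical and only serves this illustration:

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)

def restock(quantity, threshold, amount):
    """Hypothetical helper: add `amount` if the stock is below `threshold`."""
    return quantity + amount if quantity < threshold else quantity

# each value of S becomes the first argument; args supplies the rest positionally
restocked = S.apply(restock, args=(50, 10))
# extra keyword arguments (**kwds) are forwarded in the same way
restocked_kw = S.apply(restock, threshold=50, amount=10)
print(restocked)
```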

• We can also use Python lambda functions. Suppose we have the
following task: check the amount of fruit for every kind. If fewer
than 50 are available, we increase the stock by 10:
S.apply(lambda x: x if x >= 50 else x + 10)
apples 30
oranges 43
cherries 52
pears 20
dtype: int64

Filtering with a Boolean array:
S[S>30]
oranges 33
cherries 52
dtype: int64
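Conditions can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because these operators bind more tightly than > and <. A short sketch on the same fruit Series:

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)

mask = (S > 15) & (S < 50)  # Boolean Series with the same index as S
print(S[mask])               # apples and oranges
print(S[~mask])              # the remaining fruits
```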

• A series can be seen as an ordered Python dictionary with a fixed
length.
"apples" in S
True
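The dictionary analogy has a limit worth knowing: the in operator tests the index labels, not the values, and get returns a default instead of raising a KeyError for an unknown label. A small sketch (the label "bananas" is just an example of a missing key):

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)

print("apples" in S)        # True: 'apples' is a label
print(20 in S)              # False: 20 is a value, not a label
print(20 in S.values)       # True: search the values explicitly
print(S.get("bananas", 0))  # 0: default for a missing label
print(S.to_dict())          # back to a plain dictionary
```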

• We can even pass a dictionary to pd.Series when we create it.
We get a Series with the dict's keys as the index. The index keeps the
insertion order of the dictionary (pandas versions before 0.23 sorted the keys).
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235,
"Rome": 2874038, "Paris": 2273305, "Vienna": 1805681,
"Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000,
"Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900,
"Milan": 1350680}
city_series = pd.Series(cities)
print(city_series)
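A quick check of the ordering, assuming pandas 0.23 or later and using only the first three entries of cities:

```python
import pandas as pd

cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235}
city_series = pd.Series(cities)

# the index follows the insertion order of the dictionary keys
print(list(city_series.index))
print(city_series["Berlin"])
```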

NaN
A common problem in data analysis tasks is missing data. Pandas makes
working with missing data as easy as possible.
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart",
"Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series

• Because of the NaN values, the population values of the other cities are
turned into floats. There is no missing data in the following example,
so the values are int:
my_cities = ["London", "Paris", "Berlin", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series

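The Methods isnull() and notnull()
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart",
"Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
print(my_city_series.isnull())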

print(my_city_series.notnull())

• We also get a NaN if a value in the dictionary is None:
d = {"a":23, "b":45, "c":None, "d":0}
S = pd.Series(d)
print(S)
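Because NaN is a floating-point value, the single None is enough to turn the whole Series into float64, even though all other values are integers. A quick check:

```python
import pandas as pd

d = {"a": 23, "b": 45, "c": None, "d": 0}
S = pd.Series(d)

print(S.dtype)           # float64: None was stored as NaN
print(S.isnull().sum())  # 1 missing entry
print(S["d"] == 0)       # 0 is a genuine value, not a missing one
```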

print(pd.isnull(S))

print(pd.notnull(S))

Filtering out Missing Data
It's possible to filter out missing data with the Series method dropna. It
returns a Series which consists only of non-null data:
import pandas as pd
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome":
2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425,
"Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119,
"Barcelona":1602386, "Munich": 1493900, "Milan": 1350680}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
print(my_city_series.dropna())

Filling in Missing Data
• In many cases you don't want to filter out missing data, but to fill in
appropriate data for the empty gaps. A suitable method in many situations is
fillna:
print(my_city_series.fillna(0))
London 8615246.0
Paris 2273305.0
Zurich 0.0
Berlin 3562166.0
Stuttgart 0.0
Hamburg 1760433.0
dtype: float64

• If we call fillna with a dictionary, we can provide the appropriate data, i.e.
the population of Zurich and Stuttgart:
missing_cities = {"Stuttgart":597939, "Zurich":378884}
my_city_series.fillna(missing_cities)
London 8615246.0
Paris 2273305.0
Zurich 378884.0
Berlin 3562166.0
Stuttgart 597939.0
Hamburg 1760433.0
dtype: float64

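• Because of the NaN values, the populations are stored as floats. Once the
gaps are filled, we can convert the values back to integers with astype:
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235,
"Rome": 2874038, "Paris": 2273305, "Vienna": 1805681,
"Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000,
"Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900,
"Milan": 1350680}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart",
"Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series = my_city_series.fillna(0).astype(int)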
print(my_city_series)

London 8615246
Paris 2273305
Zurich 0
Berlin 3562166
Stuttgart 0
Hamburg 1760433
dtype: int64
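Filling with 0 makes a missing population indistinguishable from a real value of 0. As an alternative sketch, assuming a recent pandas version, the nullable integer dtype "Int64" keeps whole numbers and marks the gaps as <NA>; only four of the cities are used to keep it short:

```python
import pandas as pd

# four of the cities from above (Zurich and Stuttgart are missing on purpose)
cities = {"London": 8615246, "Berlin": 3562166, "Paris": 2273305, "Hamburg": 1760433}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]

my_city_series = pd.Series(cities, index=my_cities)  # float64 with NaN
my_city_int = my_city_series.astype("Int64")         # integers with <NA>
print(my_city_int)
```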

DataFrame
• The underlying idea of a DataFrame is based on spreadsheets. We
can see the data structure of a DataFrame as tabular and
spreadsheet-like.
• A DataFrame logically corresponds to a "sheet" of an Excel document.
• A DataFrame has both a row and a column index.

• Like a spreadsheet or Excel sheet, a DataFrame object contains an
ordered collection of columns.
• Each column holds values of a single data type, but different columns can
have different types, e.g. the first column may consist of integers,
while the second one consists of Boolean values and so on.
• There is a close connection between the DataFrames and the Series
of Pandas.
• A DataFrame can be seen as a concatenation of Series, each Series
having the same index, i.e. the index of the DataFrame.
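To make the connection concrete, a dictionary of Series can be passed straight to pd.DataFrame; the Series are aligned on their index labels and become the columns. The names price and stock and their values are made up for this sketch:

```python
import pandas as pd

# two made-up Series whose indices overlap only partly
price = pd.Series([1.5, 0.8, 3.0], index=["apples", "oranges", "cherries"])
stock = pd.Series([20, 33], index=["apples", "oranges"])

df = pd.DataFrame({"price": price, "stock": stock})
print(df)                 # cherries has no stock entry, so it shows NaN there
print(type(df["stock"]))  # each column is again a Series
```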

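import pandas as pd
years = range(2014, 2018)
shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years)
# shop2 and shop3 are defined in the same way, each with its own four
# yearly values (their figures are omitted here):
# shop2 = pd.Series([...], index=years)
# shop3 = pd.Series([...], index=years)
print(pd.concat([shop1, shop2, shop3]))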

• This result is not what we intended: concat stacked the three Series
into one long Series. The reason is that concat uses 0 as the default for
the axis parameter, i.e. it concatenates along the rows. Let's do it
with "axis=1":
shops_df = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df)
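The difference between the two axis values can be seen on a tiny example; the Series a and b are hypothetical stand-ins for the shop data:

```python
import pandas as pd

years = range(2014, 2016)
a = pd.Series([1.0, 2.0], index=years)
b = pd.Series([3.0, 4.0], index=years)

stacked = pd.concat([a, b])               # axis=0: one long Series, years repeat
side_by_side = pd.concat([a, b], axis=1)  # axis=1: a DataFrame, one column per Series
print(stacked)
print(side_by_side)
```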

cities = ["Zürich", "Winterthur", "Freiburg"]
shops_df.columns = cities
print(shops_df)
# alternative way: give names to series:
shop1.name = "Zürich"
shop2.name = "Winterthur"
shop3.name = "Freiburg"
print("------")
shops_df2 = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df2)

print(type(shops_df))
<class 'pandas.core.frame.DataFrame'>

DataFrames from Dictionaries
cities = {"name": ["London", "Berlin", "Madrid", "Rome", "Paris",
"Vienna", "Bucharest", "Hamburg", "Budapest", "Warsaw",
"Barcelona", "Munich", "Milan"],
"population": [8615246, 3562166, 3165235, 2874038, 2273305,
1805681, 1803425, 1760433, 1754000, 1740119, 1602386, 1493900,
1350680],
"country": ["England", "Germany", "Spain", "Italy", "France", "Austria",
"Romania", "Germany", "Hungary", "Poland", "Spain", "Germany",
"Italy"]}
city_frame = pd.DataFrame(cities)
print(city_frame)

Retrieving the Column Names
city_frame.columns.values
Output:
array(['name', 'population', 'country'], dtype=object)

Custom Index
• We can see that an index (0,1,2, ...) has been automatically assigned
to the DataFrame. We can also assign a custom index to the
DataFrame object:
ordinals = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
"eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth"]
city_frame = pd.DataFrame(cities, index=ordinals)
print(city_frame)

Rearranging the Order of Columns
We can also define and rearrange the order of the columns at the time
of creation of the DataFrame.
This makes sure that the columns have a defined order, regardless of the
order of the keys in the dictionary we create the DataFrame from.
(Before Python 3.7, dictionaries were not guaranteed to keep insertion order.)

city_frame = pd.DataFrame(cities, columns=["name", "country",
"population"])
print(city_frame)

• But what if you want to change the column order of an existing
DataFrame? reindex with the columns keyword returns a new DataFrame
with the columns in the given order:
city_frame = city_frame.reindex(columns=["country", "name", "population"])
print(city_frame)

• Now we want to rename our columns. For this purpose, we will use
the DataFrame method 'rename'. This method supports two calling
conventions:
• (index=index_mapper, columns=columns_mapper, ...)
• (mapper, axis={'index', 'columns'}, ...)
• We will rename the columns of our DataFrame into Romanian names
in the following example.
• We set the parameter inplace to True so that our DataFrame is
changed in place. With the default, inplace=False, rename returns a
new DataFrame instead.

city_frame.rename(columns={"name":"Nume", "country":"țară",
"population":"populație"}, inplace=True)
print(city_frame)
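Both calling conventions can be sketched on a small frame built from two of the cities (the name df is only for this example). The mapper form also accepts a function, which is then applied to every label:

```python
import pandas as pd

df = pd.DataFrame({"name": ["London", "Berlin"],
                   "population": [8615246, 3562166]})

# convention 1: index= / columns= keywords
df1 = df.rename(columns={"name": "city"})
# convention 2: a mapper plus the axis it applies to
df2 = df.rename({"name": "city"}, axis="columns")
# a function as mapper is applied to every column label
df3 = df.rename(str.upper, axis="columns")
print(df1.columns.tolist(), df2.columns.tolist(), df3.columns.tolist())
```

Without inplace=True, df itself keeps its original column names.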

Existing Column as the Index of a DataFrame
• We want to create a more useful index in the following example. We
will use the country name as the index, i.e. the list value associated with
the key "country" of our cities dictionary:
city_frame = pd.DataFrame(cities, columns=["name", "population"],
index=cities["country"])
print(city_frame)

• Alternatively, we can change an existing DataFrame.
• We can use the method set_index to turn a column into an index.
• By default, "set_index" does not work in place; it returns a new data
frame with the chosen column as the index:

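# start again from a DataFrame that still has the "country" column
city_frame = pd.DataFrame(cities)
city_frame2 = city_frame.set_index("country")
print(city_frame2)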

• We saw in the previous example that the set_index method returns a
new DataFrame object and doesn't change the original DataFrame. If
we set the optional parameter "inplace" to True, the DataFrame will
be changed in place, i.e. no new object will be created:
city_frame.set_index("country", inplace=True)
print(city_frame)

Label-Indexing on the Rows
• So far we have indexed DataFrames via the columns. We will now
demonstrate how we can access rows of DataFrames via the
locators 'loc' and 'iloc'. ('ix' was deprecated and has been removed
since pandas 1.0.)
city_frame = pd.DataFrame(cities, columns=["name",
"population"], index=cities["country"])
print(city_frame.loc["Germany"])
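The example above only uses loc. A sketch with a cut-down version of city_frame (four of the cities) shows iloc, which selects by integer position, and loc with a row label plus a column label:

```python
import pandas as pd

cities = {"name": ["London", "Berlin", "Hamburg", "Munich"],
          "population": [8615246, 3562166, 1760433, 1493900],
          "country": ["England", "Germany", "Germany", "Germany"]}
city_frame = pd.DataFrame(cities, columns=["name", "population"],
                          index=cities["country"])

germany = city_frame.loc["Germany"]             # all rows labelled "Germany"
first_row = city_frame.iloc[0]                  # first row by position, as a Series
pops = city_frame.loc["Germany", "population"]  # rows by label, one column
print(germany)
print(first_row)
```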

Sum and Cumulative Sum
• We can calculate the sum of all the columns of a DataFrame or the
sum of certain columns:
print(city_frame.sum())

city_frame["population"].sum()
33800614

We can use "cumsum" to calculate the cumulative sum:
x = city_frame["population"].cumsum()
print(x)

Assigning New Values to Columns
• x is a Pandas Series.
• We can reassign the previously calculated cumulative sums to the
population column:
city_frame["population"] = x
print(city_frame)

• Instead of replacing the values of the population column
with the cumulative sum, we want to add the cumulative
population sum as a new column with the name
"cum_population".
city_frame = pd.DataFrame(cities, columns=["country",
"population", "cum_population"], index=cities["name"])
print(city_frame)

• We can see that the column "cum_population" is set to NaN, as we haven't
provided any data for it.
• Now we assign the cumulative sums to this column:
city_frame["cum_population"] = city_frame["population"].cumsum()
print(city_frame)

• We can also include a column name that is not contained
in the dictionary when we create the DataFrame from the
dictionary. In this case, all the values of this column will be
set to NaN:
city_frame = pd.DataFrame(cities, columns=["country",
"area", "population"], index=cities["name"])
print(city_frame)

Accessing the Columns of a DataFrame
• There are two ways to access a column of a DataFrame. The result is
in both cases a Series:
# in a dictionary-like way:
print(city_frame["population"])

# as an attribute
print(city_frame.population)

print(type(city_frame.population))
<class 'pandas.core.series.Series'>

p = city_frame.population
Accessing a column this way does not copy the population column: "p" is a
view on the data of city_frame. (With pandas' Copy-on-Write mode enabled,
changes made through "p" are no longer written back to city_frame.)

Assigning New Values to a Column
• The column area still contains only NaN values. We can set all elements
of the column to the same value:
city_frame["area"] = 1572
print(city_frame)

• In this case, it is definitely better to assign the exact area to each
city. The list with the area values needs to have the same length as
the number of rows in our DataFrame.
# area in square km:
area = [1572, 891.85, 605.77, 1285, 105.4, 414.6, 228, 755, 525.2, 517,
101.9, 310.4, 181.8]
# area could also be a list, a Series, an array or a scalar
city_frame["area"] = area
print(city_frame)

Sorting DataFrames
city_frame = city_frame.sort_values(by="area", ascending=False)
print(city_frame)

Let's assume we have only the areas of London, Hamburg and Milan.
The areas are in a series with the correct indices. We can assign this
series as well:
city_frame = pd.DataFrame(cities, columns=["country", "area",
"population"], index=cities["name"])
some_areas = pd.Series([1572, 755, 181.8], index=['London',
'Hamburg', 'Milan'])
city_frame['area'] = some_areas
print(city_frame)
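The assignment is matched by label, not by position, so the order of the Series does not matter; rows without a matching label get NaN. A minimal sketch with the three cities from above, deliberately listed in a different order (df is a name used only here):

```python
import pandas as pd

df = pd.DataFrame({"population": [8615246, 1760433, 1350680]},
                  index=["London", "Hamburg", "Milan"])
# the Series lists the cities in a different order than the DataFrame rows
some_areas = pd.Series([181.8, 1572, 755], index=["Milan", "London", "Hamburg"])
df["area"] = some_areas  # matched by index label, not by position
print(df)
```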

Inserting new columns into existing DataFrames
• In the previous example we added the column area at creation
time. Quite often it will be necessary to add or insert columns into
existing DataFrames.
• For this purpose the DataFrame class provides a method "insert",
which allows us to insert a column into a DataFrame at a specified
location:
DataFrame.insert(loc, column, value, allow_duplicates=False)

city_frame = pd.DataFrame(cities, columns=["country",
"population"], index=cities["name"])
idx = 1
city_frame.insert(loc=idx, column='area', value=area)
print(city_frame)
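A sketch of insert on a two-row frame (df is a hypothetical name): it modifies the DataFrame in place and returns None, and with the default allow_duplicates it refuses a column name that already exists:

```python
import pandas as pd

df = pd.DataFrame({"country": ["England", "Germany"],
                   "population": [8615246, 3562166]},
                  index=["London", "Berlin"])

# insert 'area' as the second column (position 1); df itself is changed
df.insert(loc=1, column="area", value=[1572, 891.85])
print(df.columns.tolist())

# inserting an existing column name again raises a ValueError
duplicate_rejected = False
try:
    df.insert(0, "area", 0)
except ValueError as err:
    duplicate_rejected = True
    print("ValueError:", err)
```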

DataFrame from Nested Dictionaries
A nested dictionary of dictionaries can be passed to a DataFrame as
well.
The keys of the outer dictionary are taken as the columns, and the
inner keys, i.e. the keys of the nested dictionaries, are used as the row
indices:
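A minimal sketch with hypothetical keys and made-up values, showing which level of the nesting ends up where:

```python
import pandas as pd

# outer keys -> columns, inner keys -> row index (all values are invented)
nested = {"col_a": {"row1": 1.0, "row2": 2.0},
          "col_b": {"row1": 3.0, "row2": 4.0}}
frame = pd.DataFrame(nested)
print(frame)
```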

• Would you like to have the years in the columns and the countries in the
rows? No problem, you can transpose the data:
growth_frame.T

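• Reindexing with a list of labels selects and reorders the rows; a label that is not listed (here France) is dropped: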
growth_frame = growth_frame.T
growth_frame2 = growth_frame.reindex(["Switzerland", "Italy",
"Germany", "Greece"]) # remove France
print(growth_frame2)

Filling a DataFrame with random values:
import numpy as np
names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
index = ["January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December"]
df = pd.DataFrame((np.random.randn(12, 5)*1000).round(2),
columns=names, index=index)
print(df)
randn returns samples from the standard normal distribution (mean 0,
variance 1); its arguments give the shape of the result, here 12 rows and 5 columns.
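NumPy's Generator interface (NumPy 1.17 and later) is the approach its documentation recommends for new code; seeding it makes the table reproducible. A sketch producing a DataFrame of the same shape; the seed 42 is arbitrary:

```python
import numpy as np
import pandas as pd

names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

rng = np.random.default_rng(42)  # seeded generator: same numbers on every run
# standard normal samples (mean 0, variance 1), scaled by 1000 and rounded
values = (rng.standard_normal((12, 5)) * 1000).round(2)
df = pd.DataFrame(values, columns=names, index=months)
print(df)
```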

Summary
• So far we have covered the following:
• Python 3 (scalars, lists, dictionaries, loops, selection, functions)
• Numpy
• Pandas
• The reason for studying these packages is to be able to program the 5
steps of any data science process.