2. Scientific Computing
● Fortran
● MATLAB
● Scilab
● GNU Octave
● Mathematica
● Python
Fortran was the first widely used programming language for
scientific purposes.
MATLAB is one of the most popular programming languages
for scientific computing. It is created by MathWorks, and
a license is expensive. MATLAB is relatively easy for
scientists who are not programmers. It uses the matrix as
its basic data type, which is good for you if you are
familiar with discrete mathematics.
GNU Octave and Scilab are open source alternatives to MATLAB.
Mathematica is a language created by Wolfram Research. It is
also used for scientific purposes.
Python is a general purpose programming language
created by Guido van Rossum. Although it was not created
solely for scientific purposes, it serves them well because
it has many free libraries.
3. Why Python?
● Free & Open source
● General Purpose
● Readable
● Has many libraries
As Python is free and open source, you can use it at no cost.
It is important to use free and open source software,
because science should be accessible to everyone.
Python was created as a general purpose language, not only
for scientific work. It can be used for almost anything,
such as web development, hacking, etc.
Python is readable because, in Python, readability counts.
Using a readable language makes your programs much easier
to maintain.
Python has many libraries, such as numpy, scipy, scikit-learn
and matplotlib. These libraries let you do in Python most of
what you can do in MATLAB.
4. How easy is Python?
Python:
# show prompt and read input
your_name = raw_input("What's your name? ")
# define array (or list)
fruits = ["Strawberry", "Orange", "Grape"]
# print out array components
for fruit in fruits:
    print(fruit)
# swap two variables' values
a = 5
b = 7
a, b = b, a
Java:
// show prompt
System.out.print("What's your name? ");
// read input
Scanner keyboard = new Scanner(System.in);
String your_name = keyboard.nextLine();
// define array
String[] fruits = {"Strawberry", "Orange", "Grape"};
// print out array components
for (int i = 0; i < fruits.length; i++) {
    System.out.println(fruits[i]);
}
// swap two variables' values
int a = 5;
int b = 7;
int c = a;
a = b;
b = c;
To make it clear, I compare Python with Java. Not only
do you need to write less, but the Python code also seems
more readable and intuitive.
With this ease, I'm sure even Hirasawa Yui can
understand Python in a few days :)
5. Let's get started
● Debian and Ubuntu users can do this:
– sudo apt-get install python-numpy python-scipy python-sklearn python-matplotlib
● Windows users can download the batteries-included package
called Canopy (formerly EPD Free)
– https://www.enthought.com/products/canopy/
Most Linux distributions come with Python
pre-installed. So all you need to do is install the
“scientific” libraries, such as numpy, scipy, sklearn and
matplotlib. That's all.
Unlike Linux, Windows doesn't come with Python
pre-installed, so you need to install Python before
installing the libraries. There is no need to worry, since
Enthought, Inc. provides a package with batteries
included. All you need to do is download the package
and keep clicking the next button (again, I think Hirasawa Yui
can also do this easily).
Since Python is cross-platform, you can write your code
on Windows and run it on Linux, or vice versa. Java is not
the only “write once, run everywhere” language anymore.
6. Nice to meet you - はじめまして
● Python uses indentation to mark blocks
● There are no required semicolons, dollar
signs, curly braces around blocks or other
mythical characters. You can write Python
code in a normal way, with no need to type
$0meThing LIKE '%this%'
● Python is case sensitive, so THIS is
different from This, and neither is equal
to this. True is not true, and false is
different from FALSE
This is just a brief introduction to Python's syntax. I
hope you get the idea.
Learning Python is actually easy, but mastering it
takes more effort. To understand Python better,
please take a look at
http://www.diveintopython.net/
For now, you don't need to master it yet.
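To see the syntax rules at work, here is a minimal sketch (the function name describe and the three variables are made up for illustration). It shows a block marked only by indentation, and three names that differ only in case:

```python
# Blocks are marked by indentation alone: no braces, no semicolons.
def describe(number):
    if number % 2 == 0:
        # this line belongs to the "if" block because it is indented further
        return "even"
    return "odd"

# Names are case sensitive: these are three different variables.
this = 1
This = 2
THIS = 3

print(describe(this), describe(This), describe(THIS))
```

Removing the indentation inside describe would make Python refuse to run the code, which is why the indentation must be kept consistent.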
7. Show something
# integer value
print(1)
print(1+1)
# integer variable
a = 2
print(1+a)
a = a+2
print(a)
print(1+a)
print(a*2-2)
# string
print("Ok,..")
# list
a = [1,2,3,4,5,6,7]
print(a)
print(a[0])
print(a[-1])
print(a[1:5])
print(a[1:])
print(a[:5])
# dictionary
b = {"name" : "Yui", "position" : "guitar", "age" : 17}
print(b["name"])
print(b["age"])
Look, look
everyone...
You can show any value or variable by using print.
In Python 2.x, print is a statement and is written as:
print "some_value"
But in Python 3.x, print is a function and is written as:
print("some_value")
It is better to write code that runs on both Python 2.x and
3.x, which is why I use print("some_value") in the
examples.
Python supports various data types including, but not
limited to, int, float, str, list, dict and tuple. Python
has no separate double or char type: float is already double
precision, and a single character is a str of length 1.
You can get more information about Python data types on
http://www.diveintopython.net/native_data_types/index.html
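A quick way to check which type a value has is the built-in type function. The following is a small sketch in Python 3 syntax; the sample values are only illustrative:

```python
values = [17, 4.2, "Yui", [1, 2, 3], {"name": "Yui"}, (4.2, 10.6)]

# collect the type name of each value
type_names = []
for value in values:
    type_names.append(type(value).__name__)

for value, name in zip(values, type_names):
    print(name, value)

# a single character is just a str of length 1
initial = "Yui"[0]
print(type(initial).__name__, len(initial))
```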
8. Ask for something
# Ask for String value
name = raw_input("Your name? ")
print("hello "+name)
# Ask for int value, conversion needed
age = int(raw_input("Your age? "))
print("your age is "+str(age))
# Now, let's use dictionary
person = {}
person["name"] = raw_input("Your name? ")
person["age"] = int(raw_input("Your age? "))
print("hello "+person["name"]+" you are "+str(person["age"])+" years old")
Do you have
candy?
raw_input is not the only way to ask for input in Python 2.x. We can
also use the input function instead, which is easier yet less secure.
(In Python 3.x, raw_input no longer exists and input behaves like raw_input.)
Using input in Python 2.x, you can do something as follows:
have_candy = input("Have some candy? ")
if have_candy:
    how_many = input("How many? ")
    how_many = how_many - 1
    print("Now you only have "+str(how_many))
First, the program shows a prompt asking whether you have
candies or not. You can type True directly, and it will be
evaluated as a boolean. Next (if you typed True), the program
asks how many candies you have, and you can type an integer
value as the answer.
But this is less secure, since any Python code can be typed as
an answer and will be evaluated, for example 1+1, 5==5, etc.
On the other hand, raw_input treats every answer as a
str. Therefore, conversion is needed.
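One safe pattern is to always read text and convert it yourself. Here is a minimal sketch, assuming Python 3 (where input returns a str); the helper name parse_candy_count is hypothetical:

```python
def parse_candy_count(text):
    """Convert typed text into a non-negative int, or return None if invalid."""
    try:
        count = int(text.strip())
    except ValueError:
        # the text was not a plain integer, e.g. "1+1" or "lots"
        return None
    if count < 0:
        return None
    return count

print(parse_candy_count("7"))
print(parse_candy_count("1+1"))
```

In a real program you would call it as parse_candy_count(input("How many? ")) and ask again when the result is None. Nothing the user types is ever evaluated as code.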
9. Branching: Choose one and only one !!!
a = int(raw_input("Gimme a number"))
if a<5:
    print("Too few")
elif a<10:
    print("Fine...")
else:
    print("Too much")
Branching in Python is straightforward. Notice that you don't
need to write curly braces or “begin-end” around each
block. But you need to write a colon (:) at the end of each
condition, and indent the block below it.
If a is less than 5, the program shows “Too few”. If a is
not less than 5 but still less than 10, the program shows
“Fine...”. Otherwise, if a is not less than 10, the program
shows “Too much”.
For making comparisons in Python, you can use the following
symbols:
<, >, !=, ==, <=, >=
(Python 2.x also accepts <> as an older spelling of !=.)
You can also combine conditions with the “and” or “or” operator like this:
if a<5 and a>3:
    print("a must be 4")
Notice that in Python we use elif instead of else if.
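To check the branches without typing numbers by hand, the same logic can be wrapped in a function. This is only a sketch; the function name judge is made up for illustration:

```python
def judge(a):
    # exactly one of the three branches runs for any number
    if a < 5:
        return "Too few"
    elif a < 10:
        return "Fine..."
    else:
        return "Too much"

for number in [3, 5, 9, 10]:
    print(number, judge(number))

# a chained comparison: 3 < a < 5 means the same as a > 3 and a < 5
a = 4
print(3 < a < 5)
```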
10. Looping: Do it over, over and over again
There are 2 kinds of loops in Python:
● While
cake_list = ["tart", "donut", "shortcake"]
i = 0
while i < len(cake_list):
    print(cake_list[i])
    i = i+1
● For
cake_list = ["tart", "donut", "shortcake"]
for cake in cake_list:
    print(cake)
while hungry:
    eat()
Using python
just to say that?
Unlike in many other programming languages, the for loop in
Python iterates over the items of a list (or another
sequence) rather than over an index.
However, you can still do something as follows:
for i in [0,1,2,3,4]:
    print(i)
You can also use xrange to make it even easier (in Python 3.x, xrange is named range):
for i in xrange(5):
    print(i)
If you want to make a countdown loop from 4 to 1, you can
use the following:
for i in xrange(4,0,-1):
    print(i)
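When you need both the index and the item, the built-in enumerate gives you both at once. Here is a small sketch, assuming Python 3, where range plays the role of xrange:

```python
cake_list = ["tart", "donut", "shortcake"]

# enumerate yields (index, item) pairs, so no manual counter is needed
numbered = []
for i, cake in enumerate(cake_list):
    numbered.append("%d: %s" % (i, cake))
print(numbered)

# range(5) counts 0..4; range(4, 0, -1) counts down 4..1 (the stop value is excluded)
count_up = list(range(5))
countdown = list(range(4, 0, -1))
print(count_up)
print(countdown)
```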
11. np.array: Play with matrix easily
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.shape)
print(a.T)
print(a.diagonal())
print(a.max())
print(a.min())
print(a.mean())
print(a.reshape(6,1))
print(np.dot(a,a.T))
Homework
Done ...
The numpy library lets you use MATLAB-like matrices. To make
a numpy array, you can use np.array(list).
A numpy array has various attributes and methods, including
(but not limited to):
● shape: get the matrix shape
● transpose: do transpose
● diagonal: find matrix's diagonal
● max: find maximum element in array
● min: find minimum element in array
● mean: find mean of elements in array
● std: find standard deviation of elements in array
● reshape: change matrix shape
● dot: do matrix dot operation
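The operations listed above can be tried on a small 2x3 matrix. This sketch only uses the numpy calls named in the list; the values in the comments were worked out by hand:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)                 # (2, 3): two rows, three columns
print(a.T.shape)               # (3, 2): the transpose swaps rows and columns
print(a.diagonal())            # elements a[0][0] and a[1][1]
print(a.max(), a.min(), a.mean())
print(a.std())                 # standard deviation of all six elements
print(a.reshape(6, 1).shape)   # same six elements as one column

# matrix product of a (2x3) and its transpose (3x2) gives a 2x2 matrix
product = np.dot(a, a.T)
print(product)
```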
12. matplotlib.pyplot: Your visual friend
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3,4,5])
# y consists of the square of each item of x
y = x**2
plt.plot(x,y) # make the line plot
plt.scatter(x,y) # make the point plot
plt.show() # show up the plot
# read an image and show it
img = plt.imread('/home/gofrendi/k-on.png')
plt.imshow(img)
plt.show()
Umm, what are x and y?
Some kind of food
I guess ...
The matplotlib.pyplot module allows you to do plotting.
A graphical representation of numbers helps us
analyze things better.
The plot function draws a line through the given points,
while the scatter function draws each (x, y) pair as a dot.
The imread function reads an image file into an array, and
imshow displays it.
13. Classification
● Yui has strength = 4.2 and agility = 10.6
● Should she become a monk, a barbarian or a demon hunter?
No...
I don't want to be
monk
Classification is one of the most popular topics in computer
science. A “smart system” usually learns from data
(called the training set) before it is able to do the
classification task.
The trained classifier can then predict the class of new
data as needed.
Strength and agility are usually called features or
dimensions, while monk, barbarian, and demon hunter
are usually called classes or labels.
14. Let's Python
import numpy as np
table = np.array([
[8.4, 1, 2],
[1.6, 10, 1],
[3.6, 8.7, 2],
[1.2, 6.7, 3],
[8, 2.8, 2],
[9.3, 7.3, 1],
[4.3, 2.8, 3],
[6.6, 7.5, 2],
[8.5, 8.7, 1],
[0.4, 1.7, 3],
[8.2, 6.1, 1],
[4.5, 5, 3],
[3.3, 6.2, 2],
[5.6, 5.2, 2],
[2.4, 2.9, 3],
[9.2, 9.6, 1],
[7.2, 1.4, 2],
[3, 9.9, 1],
[2.7, 4, 3]
])
1 = barbarian
2 = monk
3 = demon hunter
Barbarian???
It sounds like
Barbeque
How could “barbarian”
become “barbeque”
Here, every target class is represented by a number, so that
the whole table fits in one numeric numpy array.
In this case, 1 is for barbarian, 2 is for monk, and 3 is for
demon hunter.
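To translate between class numbers and class names, a dictionary is enough. This is a minimal sketch: the names class_names and class_ids are made up, and the column order (strength, agility, class number) is the one implied by the way the table is plotted in this talk:

```python
# number -> name, as listed on the slide
class_names = {1: "barbarian", 2: "monk", 3: "demon hunter"}

# build the reverse lookup: name -> number
class_ids = {}
for number, name in class_names.items():
    class_ids[name] = number

# first row of the table: strength, agility, class number
row = [8.4, 1, 2]
print(class_names[row[2]])
print(class_ids["demon hunter"])
```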
15. Do some plotting
x = table[:, 0]
y = table[:, 1]
target = table[:, 2]
import matplotlib.pyplot as plt
# The others
plt.scatter(x,y, c=target)
# Yui
plt.plot(4.2,10.6, 'r^')
plt.show()
[Figure: scatter plot of the table. Legend: Barbarian, Monk, Demon Hunter, Yui]
Ah, save...
I don't need to
be a monk ...
A graphical representation is usually easier to interpret
than a table of numbers. Can you guess what
Yui should be?
16. sklearn: The Classifiers
# K-Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier
# Support Vector Machine
from sklearn.svm import SVC
# Decision Tree
from sklearn.tree import DecisionTreeClassifier
# Random Forest
from sklearn.ensemble import RandomForestClassifier
# Naive Bayes
from sklearn.naive_bayes import GaussianNB
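# Linear Discriminant Analysis (older sklearn versions: from sklearn.lda import LDA)
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
# Quadratic Discriminant Analysis (older sklearn versions: from sklearn.qda import QDA)
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA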
All sklearn classifiers share the same interface.
They have fit and predict methods, which we will see in
the next slide.
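As a small taste of that shared interface, here is a sketch using the k-nearest neighbors classifier with one neighbor. The four training points and their classes are made up for illustration and are not taken from the table in this talk:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# four made-up training points (two features each) with made-up classes
data = np.array([[1.0, 1.0], [1.5, 0.5], [9.0, 9.0], [8.5, 9.5]])
target = np.array([3, 3, 1, 1])

classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(data, target)          # learning phase

# ask for the class of two new points
new_points = np.array([[2.0, 1.0], [8.0, 8.0]])
prediction = classifier.predict(new_points)
print(prediction)
```

With one neighbor, each new point simply takes the class of the closest training point, so the first point gets class 3 and the second gets class 1.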
17. Do classification in a few lines
# prepare data & target
table = np.array([ [8.4, 1, 2], [1.6, 10, 1], …, [2.7, 4, 3] ])
data = table[:,:2]
target = table[:,2]
# use SVM as classifier
from sklearn.svm import SVC
classifier = SVC()
# fit classifier (learning phase)
classifier.fit(data, target)
# get classifier's learning result
prediction = classifier.predict(data)
# calculate true & false
correct_prediction = 0
wrong_prediction = 0
for i in xrange(len(prediction)):
    if prediction[i] == target[i]:
        correct_prediction += 1
    else:
        wrong_prediction += 1
print("correct prediction : "+str(correct_prediction))
print("wrong prediction : "+str(wrong_prediction))
You got 100
Mr. Classifier...
One of the most widely used classifiers is the SVM
(Support Vector Machine). We won't talk about the theory
here, but you can see how well it fits this data. Note that
the score above is measured on the same data the classifier
was trained on, so it does not tell how well it predicts new data.
Besides SVM, you can also use Decision Tree, Random
Forest, LDA, QDA and the others in exactly the same way.
For instance, we can use Gaussian Naive Bayes as
follows:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
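Counting correct and wrong predictions can also be done without a loop, because comparing two numpy arrays gives an array of True/False values. This is a sketch with hypothetical target and prediction arrays standing in for real classifier output:

```python
import numpy as np

# hypothetical values, standing in for the arrays from a real classifier
target = np.array([1, 2, 2, 3, 1, 3])
prediction = np.array([1, 2, 3, 3, 1, 1])

# (prediction == target) is True where the classifier was right
correct_prediction = int((prediction == target).sum())
wrong_prediction = len(target) - correct_prediction
accuracy = correct_prediction / float(len(target))

print(correct_prediction, wrong_prediction, accuracy)
```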
18. See the classifier visually
# look for max and min of x & y
x = data[:,0]
y = data[:,1]
x_max, x_min = x.max()+1, x.min()-1
y_max, y_min = y.max()+1, y.min()-1
# prepare the grid of x (abscissa) and y (ordinate) values
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))
# let the classifier predict each point
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# plot the contour of xx, yy and Z
plt.contourf(xx, yy, Z)
# plot the points
plt.scatter(x,y,c=prediction)
# see, where Yui really is
plt.plot(4.2,10.6, 'r^')
plt.show()
I am
Demon Hunter
Looking at the classifier's decision regions visually usually
gives you an idea about its characteristics.
This also helps you find out what went wrong in the
classification process.
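The meshgrid, ravel, np.c_ and reshape steps are easier to follow on a tiny grid. In this sketch the classifier is replaced by a made-up rule (class 1 where x is greater than y, else class 0), purely so that the result can be checked by eye:

```python
import numpy as np

# a 3 x 2 grid: x takes the values 0, 1, 2 and y takes the values 0, 1
xx, yy = np.meshgrid(np.arange(0, 3, 1), np.arange(0, 2, 1))
print(xx)   # x coordinate of every grid point
print(yy)   # y coordinate of every grid point

# flatten both grids and pair them up: one (x, y) row per grid point
points = np.c_[xx.ravel(), yy.ravel()]
print(points.shape)

# stand-in for classifier.predict: a made-up rule, not a trained model
Z = (points[:, 0] > points[:, 1]).astype(int)

# put the flat predictions back into the shape of the grid
Z = Z.reshape(xx.shape)
print(Z)
```

The reshaped Z has one predicted class per grid point, which is the form that plt.contourf expects together with xx and yy.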
19. Each classifier has its own characteristics
● Check this out:
– http://scikit-learn.org/stable/auto_examples/plot_classifier_comparison.html#example-plot-classifier-comparison-py
Just as people have different characteristics and
expertise, classifiers also differ from each other.
It's important to choose the classifier that best matches
your case to get the best result.