SlideShare a Scribd company logo
1 of 20
Statistics in NumPy and SciPy



        February 12, 2009
Enthought Python Distribution (EPD)

 MORE THAN SIXTY INTEGRATED PACKAGES

  • Python 2.6                     • Repository access
  • Science (NumPy, SciPy, etc.)   • Data Storage (HDF, NetCDF, etc.)
  • Plotting (Chaco, Matplotlib)   • Networking (twisted)
  • Visualization (VTK, Mayavi)    • User Interface (wxPython, Traits UI)
  • Multi-language Integration     • Enthought Tool Suite
    (SWIG,Pyrex, f2py, weave)        (Application Development Tools)
Enthought Training Courses




                 Python Basics, NumPy,
                 SciPy, Matplotlib, Chaco,
                 Traits, TraitsUI, …
PyCon


    http://us.pycon.org/2010/tutorials/

        Introduction to Traits




                                 Corran Webster
Upcoming Training Classes
  March 1 – 5, 2009
       Python for Scientists and Engineers
       Austin, Texas, USA

  March 8 – 12, 2009
       Python for Quants
       London, UK




                       http://www.enthought.com/training/
NumPy / SciPy
  Statistics
Statistics overview

 • NumPy methods and functions
   – .mean, .std, .var, .min, .max, .argmax, .argmin
   – median, nanargmax, nanargmin, nanmax,
    nanmin, nansum
 • NumPy random number generators
 • Distribution objects in SciPy (scipy.stats)
 • Many functions in SciPy
   – f_oneway, bayes_mvs
   – nanmedian, nanstd, nanmean
NumPy methods
 • All array objects have some “statistical”
   methods
  – .mean(), .std(), .var(), .max(), .min(), .argmax(),
   .argmin()
  – Take an axis keyword that allows them to work on
   N-d arrays (shown with .sum).


   axis=0               axis=1
NumPy functions

 • median
 • nan-functions (ignore nans)
   –   nanmax
   –   nanmin
   –   nanargmin
   –   nanargmax
   –   nansum
 • Can also use masks and regular functions
NumPy Random Number Generators

 • Based on Mersenne twister algorithm
 • Written using PyRex / Cython
 • Univariate (over 40)
 • Multivariate (only 3)
  – multinomial
  – dirichlet
  – multivariate_normal
 • Convenience functions
  – rand, randn, randint, ranf
Statistics
scipy.stats — CONTINUOUS DISTRIBUTIONS

over 80
continuous
distributions!

METHODS

pdf     entropy
cdf     nnlf
rvs     moment
ppf     freeze
stats
fit
sf
isf
Using stats objects
DISTRIBUTIONS


>>> from scipy.stats import norm
# Sample normal dist. 100 times.
>>> samp = norm.rvs(size=100)

>>> x = linspace(-5, 5, 100)
# Calculate probability dist.
>>> pdf = norm.pdf(x)
# Calculate cummulative Dist.
>>> cdf = norm.cdf(x)
# Calculate Percent Point Function
>>> ppf = norm.ppf(x)
Distribution objects
Every distribution can be modified by loc and scale keywords
(many distributions also have required shape arguments to select from a family)

LOCATION (loc) --- shift left (<0) or right (>0) the distribution




SCALE (scale) --- stretch (>1) or compress (<1) the distribution
Example distributions
NORM (norm) – N(µ,σ)


  Only location and scale       location   mean                    µ
  arguments:
                                scale      standard deviation      σ


LOG NORMAL (lognorm)

log(S) is N(µ, σ)
                                location   offset from zero (rarely used)
        S is lognormal
                                scale      eµ

         one shape parameter!   shape      σ
Setting location and Scale

NORMAL DISTRIBUTION


>>> from scipy.stats import norm
# Normal dist with mean=10 and std=2
>>> dist = norm(loc=10, scale=2)

>>> x = linspace(-5, 15, 100)
# Calculate probability dist.
>>> pdf = dist.pdf(x)
# Calculate cummulative dist.
>>> cdf = dist.cdf(x)

# Get 100 random samples from dist.
>>> samp = dist.rvs(size=100)

# Estimate parameters from data
>>> mu, sigma = norm.fit(samp)           .fit returns best
>>> print “%4.2f, %4.2f” % (mu, sigma)   shape + (loc, scale)
10.07, 1.95                              that explains the data
Statistics
scipy.stats — Discrete Distributions

 10 standard
 discrete
 distributions
 (plus any
 finite RV)

METHODS
pmf     moment
cdf     entropy
rvs     freeze
ppf
stats
sf
isf
Using stats objects
 CREATING NEW DISCRETE DISTRIBUTIONS


# Create loaded dice.
>>> from scipy.stats import rv_discrete
>>> xk = [1,2,3,4,5,6]
>>> pk = [0.3,0.35,0.25,0.05,
          0.025,0.025]
>>> new = rv_discrete(name='loaded',
                   values=(xk,pk))

# Calculate histogram
>>> samples = new.rvs(size=1000)
>>> bins=linspace(0.5,5.5,6)
>>> subplot(211)
>>> hist(samples,bins=bins,normed=True)

# Calculate pmf
>>> x = range(0,8)
>>> subplot(212)
>>> stem(x,new.pmf(x))
Statistics
CONTINUOUS DISTRIBUTION ESTIMATION USING GAUSSIAN KERNELS

# Sample two normal distributions
# and create a bi-modal distribution
>>> rv1 = stats.norm()
>>> rv2 = stats.norm(2.0,0.8)
>>> samples = hstack([rv1.rvs(size=100),
                        rv2.rvs(size=100)])


# Use a Gaussian kernel density to
# estimate the PDF for the samples.
>>> from scipy.stats.kde import gaussian_kde
>>> approximate_pdf = gaussian_kde(samples)
>>> x = linspace(-3,6,200)

# Compare the histogram of the samples to
# the PDF approximation.
>>> hist(samples, bins=25, normed=True)
>>> plot(x, approximate_pdf(x),'r')
Other functions in scipy.stats

 • Statistical Tests (Anderson, Wilcox, etc.)
 • Other calculations (hmean, nanmedian)
 • Work in progress
 • A great place to jump in and help
Other statistical Resources

 • scikits.statsmodels
 • RPy2
 • PyMC

More Related Content

What's hot

pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for PythonWes McKinney
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsPhoenix
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Edureka!
 
Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Alexander Hendorf
 
Web Development with Python and Django
Web Development with Python and DjangoWeb Development with Python and Django
Web Development with Python and DjangoMichael Pirnat
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
File handling in Python
File handling in PythonFile handling in Python
File handling in PythonMegha V
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)PyData
 
Data Analysis and Visualization using Python
Data Analysis and Visualization using PythonData Analysis and Visualization using Python
Data Analysis and Visualization using PythonChariza Pladin
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in PythonMarc Garcia
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPyDevashish Kumar
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 

What's hot (20)

Numpy tutorial
Numpy tutorialNumpy tutorial
Numpy tutorial
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data Analytics
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
 
Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]
 
Python libraries
Python librariesPython libraries
Python libraries
 
NUMPY
NUMPY NUMPY
NUMPY
 
Python functions
Python functionsPython functions
Python functions
 
Python Modules
Python ModulesPython Modules
Python Modules
 
Web Development with Python and Django
Web Development with Python and DjangoWeb Development with Python and Django
Web Development with Python and Django
 
NumPy.pptx
NumPy.pptxNumPy.pptx
NumPy.pptx
 
Python list
Python listPython list
Python list
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
File handling in Python
File handling in PythonFile handling in Python
File handling in Python
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
 
Data Analysis and Visualization using Python
Data Analysis and Visualization using PythonData Analysis and Visualization using Python
Data Analysis and Visualization using Python
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in Python
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPy
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 

Viewers also liked

Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingEnthought, Inc.
 
The Joy of SciPy
The Joy of SciPyThe Joy of SciPy
The Joy of SciPykammeyer
 
Доклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, СмирноваДоклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, СмирноваDmitry Dushkin
 
Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.Piotr Milanowski
 
Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]Samarth Shah
 
Python для анализа данных
Python для анализа данныхPython для анализа данных
Python для анализа данныхPython Meetup
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and developmentWes McKinney
 
The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]guestea12c43
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonChetan Khatri
 
09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeaster09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeasterOpenCascade
 
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...Ryan Rosario
 

Viewers also liked (14)

Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
 
The Joy of SciPy
The Joy of SciPyThe Joy of SciPy
The Joy of SciPy
 
Доклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, СмирноваДоклад АКТО-2012 Душкин, Смирнова
Доклад АКТО-2012 Душкин, Смирнова
 
Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.
 
MongoDB Replication Cluster
MongoDB Replication ClusterMongoDB Replication Cluster
MongoDB Replication Cluster
 
Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]Raspberry Pi and Scientific Computing [SciPy 2012]
Raspberry Pi and Scientific Computing [SciPy 2012]
 
Python для анализа данных
Python для анализа данныхPython для анализа данных
Python для анализа данных
 
Introduction to mongoDB
Introduction to mongoDBIntroduction to mongoDB
Introduction to mongoDB
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
 
The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]The Heisenberg Uncertainty Principle[1]
The Heisenberg Uncertainty Principle[1]
 
Data Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - PythonData Analytics with Pandas and Numpy - Python
Data Analytics with Pandas and Numpy - Python
 
09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeaster09 jus 20101123_optimisation_salomeaster
09 jus 20101123_optimisation_salomeaster
 
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
 
Bayesian model averaging
Bayesian model averagingBayesian model averaging
Bayesian model averaging
 

Similar to NumPy/SciPy Statistics

Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Francesco
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
getting started with numpy and pandas.pptx
getting started with numpy and pandas.pptxgetting started with numpy and pandas.pptx
getting started with numpy and pandas.pptxworkvishalkumarmahat
 
Scikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonScikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonZahid Hasan
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonDr. Volkan OBAN
 
Cheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnCheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnKarlijn Willems
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to pythonActiveState
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxAkashgupta517936
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesAndrew Ferlitsch
 
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...HendraPurnama31
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptxkarthikks82
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with AnacondaTravis Oliphant
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in RSujaAldrin
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 

Similar to NumPy/SciPy Statistics (20)

Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
getting started with numpy and pandas.pptx
getting started with numpy and pandas.pptxgetting started with numpy and pandas.pptx
getting started with numpy and pandas.pptx
 
Scikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonScikit learn cheat_sheet_python
Scikit learn cheat_sheet_python
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-Python
 
Cheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnCheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learn
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptx
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning Libraries
 
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptx
 
Aggregate.pptx
Aggregate.pptxAggregate.pptx
Aggregate.pptx
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 

More from Enthought, Inc.

Talk at NYC Python Meetup Group
Talk at NYC Python Meetup GroupTalk at NYC Python Meetup Group
Talk at NYC Python Meetup GroupEnthought, Inc.
 
Scientific Applications with Python
Scientific Applications with PythonScientific Applications with Python
Scientific Applications with PythonEnthought, Inc.
 
Scientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with MayaviScientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with MayaviEnthought, Inc.
 
February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?Enthought, Inc.
 
Parallel Processing with IPython
Parallel Processing with IPythonParallel Processing with IPython
Parallel Processing with IPythonEnthought, Inc.
 
Scientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: TraitsScientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: TraitsEnthought, Inc.
 
Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Enthought, Inc.
 
Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009Enthought, Inc.
 
Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009Enthought, Inc.
 

More from Enthought, Inc. (13)

Numpy Talk at SIAM
Numpy Talk at SIAMNumpy Talk at SIAM
Numpy Talk at SIAM
 
Talk at NYC Python Meetup Group
Talk at NYC Python Meetup GroupTalk at NYC Python Meetup Group
Talk at NYC Python Meetup Group
 
Scientific Applications with Python
Scientific Applications with PythonScientific Applications with Python
Scientific Applications with Python
 
SciPy 2010 Review
SciPy 2010 ReviewSciPy 2010 Review
SciPy 2010 Review
 
Scientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with MayaviScientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
Scientific Computing with Python Webinar March 19: 3D Visualization with Mayavi
 
Chaco Step-by-Step
Chaco Step-by-StepChaco Step-by-Step
Chaco Step-by-Step
 
February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?February EPD Webinar: How do I...use PiCloud for cloud computing?
February EPD Webinar: How do I...use PiCloud for cloud computing?
 
SciPy India 2009
SciPy India 2009SciPy India 2009
SciPy India 2009
 
Parallel Processing with IPython
Parallel Processing with IPythonParallel Processing with IPython
Parallel Processing with IPython
 
Scientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: TraitsScientific Computing with Python Webinar: Traits
Scientific Computing with Python Webinar: Traits
 
Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009Scientific Computing with Python Webinar --- August 28, 2009
Scientific Computing with Python Webinar --- August 28, 2009
 
Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009Scientific Computing with Python Webinar --- June 19, 2009
Scientific Computing with Python Webinar --- June 19, 2009
 
Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009Scientific Computing with Python Webinar --- May 22, 2009
Scientific Computing with Python Webinar --- May 22, 2009
 

Recently uploaded

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

NumPy/SciPy Statistics

  • 1. Statistics in NumPy and SciPy February 12, 2009
  • 2. Enthought Python Distribution (EPD) MORE THAN SIXTY INTEGRATED PACKAGES • Python 2.6 • Repository access • Science (NumPy, SciPy, etc.) • Data Storage (HDF, NetCDF, etc.) • Plotting (Chaco, Matplotlib) • Networking (twisted) • Visualization (VTK, Mayavi) • User Interface (wxPython, Traits UI) • Multi-language Integration • Enthought Tool Suite (SWIG,Pyrex, f2py, weave) (Application Development Tools)
  • 3. Enthought Training Courses Python Basics, NumPy, SciPy, Matplotlib, Chaco, Traits, TraitsUI, …
  • 4. PyCon http://us.pycon.org/2010/tutorials/ Introduction to Traits Corran Webster
  • 5. Upcoming Training Classes March 1 – 5, 2009 Python for Scientists and Engineers Austin, Texas, USA March 8 – 12, 2009 Python for Quants London, UK http://www.enthought.com/training/
  • 6. NumPy / SciPy Statistics
  • 7. Statistics overview • NumPy methods and functions – .mean, .std, .var, .min, .max, .argmax, .argmin – median, nanargmax, nanargmin, nanmax, nanmin, nansum • NumPy random number generators • Distribution objects in SciPy (scipy.stats) • Many functions in SciPy – f_oneway, bayes_mvs – nanmedian, nanstd, nanmean
  • 8. NumPy methods • All array objects have some “statistical” methods – .mean(), .std(), .var(), .max(), .min(), .argmax(), .argmin() – Take an axis keyword that allows them to work on N-d arrays (shown with .sum). axis=0 axis=1
  • 9. NumPy functions • median • nan-functions (ignore nans) – nanmax – nanmin – nanargmin – nanargmax – nansum • Can also use masks and regular functions
  • 10. NumPy Random Number Generators • Based on Mersenne twister algorithm • Written using PyRex / Cython • Univariate (over 40) • Multivariate (only 3) – multinomial – dirichlet – multivariate_normal • Convenience functions – rand, randn, randint, ranf
  • 11. Statistics scipy.stats — CONTINUOUS DISTRIBUTIONS over 80 continuous distributions! METHODS pdf entropy cdf nnlf rvs moment ppf freeze stats fit sf isf
  • 12. Using stats objects DISTRIBUTIONS >>> from scipy.stats import norm # Sample normal dist. 100 times. >>> samp = norm.rvs(size=100) >>> x = linspace(-5, 5, 100) # Calculate probability dist. >>> pdf = norm.pdf(x) # Calculate cummulative Dist. >>> cdf = norm.cdf(x) # Calculate Percent Point Function >>> ppf = norm.ppf(x)
  • 13. Distribution objects Every distribution can be modified by loc and scale keywords (many distributions also have required shape arguments to select from a family) LOCATION (loc) --- shift left (<0) or right (>0) the distribution SCALE (scale) --- stretch (>1) or compress (<1) the distribution
  • 14. Example distributions NORM (norm) – N(µ,σ) Only location and scale location mean µ arguments: scale standard deviation σ LOG NORMAL (lognorm) log(S) is N(µ, σ) location offset from zero (rarely used) S is lognormal scale eµ one shape parameter! shape σ
  • 15. Setting location and Scale NORMAL DISTRIBUTION >>> from scipy.stats import norm # Normal dist with mean=10 and std=2 >>> dist = norm(loc=10, scale=2) >>> x = linspace(-5, 15, 100) # Calculate probability dist. >>> pdf = dist.pdf(x) # Calculate cummulative dist. >>> cdf = dist.cdf(x) # Get 100 random samples from dist. >>> samp = dist.rvs(size=100) # Estimate parameters from data >>> mu, sigma = norm.fit(samp) .fit returns best >>> print “%4.2f, %4.2f” % (mu, sigma) shape + (loc, scale) 10.07, 1.95 that explains the data
  • 16. Statistics scipy.stats — Discrete Distributions 10 standard discrete distributions (plus any finite RV) METHODS pmf moment cdf entropy rvs freeze ppf stats sf isf
  • 17. Using stats objects CREATING NEW DISCRETE DISTRIBUTIONS # Create loaded dice. >>> from scipy.stats import rv_discrete >>> xk = [1,2,3,4,5,6] >>> pk = [0.3,0.35,0.25,0.05, 0.025,0.025] >>> new = rv_discrete(name='loaded', values=(xk,pk)) # Calculate histogram >>> samples = new.rvs(size=1000) >>> bins=linspace(0.5,5.5,6) >>> subplot(211) >>> hist(samples,bins=bins,normed=True) # Calculate pmf >>> x = range(0,8) >>> subplot(212) >>> stem(x,new.pmf(x))
  • 18. Statistics CONTINUOUS DISTRIBUTION ESTIMATION USING GAUSSIAN KERNELS # Sample two normal distributions # and create a bi-modal distribution >>> rv1 = stats.norm() >>> rv2 = stats.norm(2.0,0.8) >>> samples = hstack([rv1.rvs(size=100), rv2.rvs(size=100)]) # Use a Gaussian kernel density to # estimate the PDF for the samples. >>> from scipy.stats.kde import gaussian_kde >>> approximate_pdf = gaussian_kde(samples) >>> x = linspace(-3,6,200) # Compare the histogram of the samples to # the PDF approximation. >>> hist(samples, bins=25, normed=True) >>> plot(x, approximate_pdf(x),'r')
  • 19. Other functions in scipy.stats • Statistical Tests (Anderson, Wilcox, etc.) • Other calculations (hmean, nanmedian) • Work in progress • A great place to jump in and help
  • 20. Other statistical Resources • scikits.statsmodels • RPy2 • PyMC

Editor's Notes

  1. &lt;&lt;1,Parallel Processing with IPython&gt;&gt;
  2. [toc] level = 2 title = Statistics # end config