SlideShare una empresa de Scribd logo
1 de 20
Descargar para leer sin conexión
Pandas

                                Maik Röder
                         Python Barcelona Meetup
                             7. February 2013

                           Python Consultant
                         maikroeder@gmail.com


Friday, February 8, 13
Pandas
                         • Powerful and productive Python data
                           analysis and management library
                         • Panel Data System
                         • Open Sourced by AQR Capital
                           Management, LLC in late 2009
                         • 30.000 lines of tested Python/Cython code
                         • Used in production in many companies
Friday, February 8, 13
Pandas
                         • Rich data structures and functions to make
                           working with structured data fast, easy, and
                           expressive
                         • Built on top of Numpy with its high
                           performance array-computing features
                         • flexible data manipulation capabilities of
                           spreadsheets and relational databases
                         • Sophisticated indexing functionality
                          • slice, dice, perform aggregations, select
                             subsets of data
Friday, February 8, 13
The ideal tool for data
                               scientists
                         • Munging data
                         • Cleaning data
                         • Analyzing data
                         • Modeling data
                         • Organizing the results of the analysis into a
                           form suitable for plotting or tabular display


Friday, February 8, 13
Series
                 • one-dimensional array-like object
                         >>> s = Series((1,2,3,4,5))
                 • Contains an array of data (of any Numpy
                         data type)
                         >>> s.values
                 • Has an associated array of data labels, the
                         index (Default index from 0 to N - 1)
                         >>> s.index
Friday, February 8, 13
Series data structure
        >>> import numpy
      >>> randn = numpy.random.randn
      >>> from pandas import *
      >>> s = Series(randn(3),('a','b','c'))
      >>> s
      a    -0.889880
      b     1.102135
      c    -2.187296
      >>> s.mean()
      -0.65834710697853194

Friday, February 8, 13
Series to/from dict
          • Series to Python dict - No more explicit order
          >>> dict(s)
          {'a': -0.88988001423312313,
           'c': -2.1872960440695666,
           'b': 1.1021347373670938}
          • Back to a Series with a new Index from sorted
                   dictionary keys
          >>> Series(dict(s))
          a   -0.889880
          b    1.102135
          c   -2.187296
Friday, February 8, 13
Reindexing labels
                 >>> s
                 a   -0.496848
                 b     0.607173
                 c   -1.570596
                 >>> s.index
                 Index([a, b, c], dtype=object)
                 >>> s.reindex(['c','b','a'])
                 c   -1.570596
                 b     0.607173
                 a   -0.496848
Friday, February 8, 13
Vectorization
                 >>> s + s
                 a   -1.779760
                 b    2.204269
                 c   -4.374592
                 • Series work with Numpy
                 >>> numpy.exp(s)
                 a       0.410705
                 b       3.010586
                 c       0.112220
Friday, February 8, 13
DataFrame
                         • Like data.frame in the statistical
                           language/package R
                         • 2-dimensional tabular data structure
                         • Data manipulation with integrated
                           indexing
                         • Support heterogeneous columns
                         • Homogeneous columns
Friday, February 8, 13
DataFrame
                     >>> d = {'one': s*s, 'two': s+s}
                     >>> DataFrame(d)
                             one       two
                     a 0.791886 -1.779760
                     b 1.214701 2.204269
                     c 4.784264 -4.374592
                     >>> df.index
                     Index([a, b, c], dtype=object)
                     >>> df.columns
                     Index([one, two], dtype=objec)

Friday, February 8, 13
Dataframe add column
                 • Add a third column
                 >>> df['three'] = s * 3
                 • It will share the existing index
                 >>> df
                        one       two     three
                 a 0.791886 -1.779760 -2.669640
                 b 1.214701 2.204269 3.306404
                 c 4.784264 -4.374592 -6.561888

Friday, February 8, 13
Access to columns

                 • Access by attribute   • Access by dict like
                                           notation
                 >>> df.one              >>> df['one']
                        one                     one
                 a 0.791886              a 0.791886
                 b 1.214701              b 1.214701
                 c 4.784264              c 4.784264


Friday, February 8, 13
Reindexing

                 >>> df.reindex(['c','b','a'])
                 >>> df
                        one       two     three
                 c 4.784264 -4.374592 -6.561888
                 b 1.214701 2.204269 3.306404
                 a 0.791886 -1.779760 -2.669640



Friday, February 8, 13
Drop entries from an axis

            >>> df.drop('c')
            b 1.214701 2.204269 3.306404
            a 0.791886 -1.779760 -2.669640
            >>> df.drop(['b,'a'])
                   one       two     three
            c 4.784264 -4.374592 -6.561888


Friday, February 8, 13
Descriptive statistics
                 >>> df.mean()
                 one      2.263617
                 two     -1.316694
                 three   -1.975041
                 • Also: count, sum, median, min,
                         max, abs, prod, std, var,
                         skew, kurt, quantile, cumsum,
                         cumprod, cummax, cummin


Friday, February 8, 13
Computational Tools
                 • Covariance
                         >>> s1 = Series(randn(1000))
                         >>> s2 = Series(randn(1000))
                         >>> s1.cov(s2)
                         0.013973709323221539
                 • Also: pearson, kendall, spearman

Friday, February 8, 13
This and much more...
                         • Group by: split-apply-combine
                         • Merge, join and aggregate
                         • Reshaping and Pivot Tables
                         • Time Series / Date functionality
                         • Plotting with matplotlib
                         • IO Tools (Text, CSV, HDF5, ...)
                         • Sparse data structures
Friday, February 8, 13
Resources


                         • http://pypi.python.org/pypi/pandas
                         • http://code.google.com/p/pandas


Friday, February 8, 13
Out now...




Friday, February 8, 13

Más contenido relacionado

La actualidad más candente

Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)PyData
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1Jatin Miglani
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsPhoenix
 
Python pandas tutorial
Python pandas tutorialPython pandas tutorial
Python pandas tutorialHarikaReddy115
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for PythonWes McKinney
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in PythonMarc Garcia
 
Data Analysis with Python Pandas
Data Analysis with Python PandasData Analysis with Python Pandas
Data Analysis with Python PandasNeeru Mittal
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandasAkshitaKanther
 
Python: Modules and Packages
Python: Modules and PackagesPython: Modules and Packages
Python: Modules and PackagesDamian T. Gordon
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Dr. Volkan OBAN
 
Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPyHuy Nguyen
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysisPramod Toraskar
 
Arrays in python
Arrays in pythonArrays in python
Arrays in pythonmoazamali28
 

La actualidad más candente (20)

Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data Analytics
 
NumPy
NumPyNumPy
NumPy
 
Python pandas tutorial
Python pandas tutorialPython pandas tutorial
Python pandas tutorial
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
 
Data visualization in Python
Data visualization in PythonData visualization in Python
Data visualization in Python
 
Data Analysis with Python Pandas
Data Analysis with Python PandasData Analysis with Python Pandas
Data Analysis with Python Pandas
 
Numpy
NumpyNumpy
Numpy
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandas
 
Python: Modules and Packages
Python: Modules and PackagesPython: Modules and Packages
Python: Modules and Packages
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
NumPy.pptx
NumPy.pptxNumPy.pptx
NumPy.pptx
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet
 
Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPy
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
Numpy tutorial
Numpy tutorialNumpy tutorial
Numpy tutorial
 
Data Analysis in Python
Data Analysis in PythonData Analysis in Python
Data Analysis in Python
 
Pandas
PandasPandas
Pandas
 
Arrays in python
Arrays in pythonArrays in python
Arrays in python
 

Destacado (20)

Panda Presentation
Panda PresentationPanda Presentation
Panda Presentation
 
Presentation On Pandas
Presentation On PandasPresentation On Pandas
Presentation On Pandas
 
Panda powerpoint
Panda powerpointPanda powerpoint
Panda powerpoint
 
Introduction to Pandas
Introduction to PandasIntroduction to Pandas
Introduction to Pandas
 
PANDA BEAR
PANDA BEARPANDA BEAR
PANDA BEAR
 
Pandas
PandasPandas
Pandas
 
Hannah donaldson red panda!
Hannah donaldson red panda!Hannah donaldson red panda!
Hannah donaldson red panda!
 
Red panda Chayse Musolf
Red panda Chayse Musolf Red panda Chayse Musolf
Red panda Chayse Musolf
 
Giant pandas
Giant pandasGiant pandas
Giant pandas
 
Oso panda
Oso pandaOso panda
Oso panda
 
Giant panda
Giant pandaGiant panda
Giant panda
 
The Panda Bear
The Panda BearThe Panda Bear
The Panda Bear
 
Getting started with pandas
Getting started with pandasGetting started with pandas
Getting started with pandas
 
Red panda powerpoint
Red panda powerpointRed panda powerpoint
Red panda powerpoint
 
Giant Pandas
Giant PandasGiant Pandas
Giant Pandas
 
Panda bear
Panda bearPanda bear
Panda bear
 
Pandas
PandasPandas
Pandas
 
El Oso Panda
El Oso PandaEl Oso Panda
El Oso Panda
 
Osos panda
Osos pandaOsos panda
Osos panda
 
Oso panda
Oso pandaOso panda
Oso panda
 

Similar a Pandas

R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Using the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slidesUsing the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slidesTiffany Timbers
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in SparkDatabricks
 
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICSBIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICSTIBCO Spotfire
 
DataCamp Cheat Sheets 4 Python Users (2020)
DataCamp Cheat Sheets 4 Python Users (2020)DataCamp Cheat Sheets 4 Python Users (2020)
DataCamp Cheat Sheets 4 Python Users (2020)EMRE AKCAOGLU
 
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr
 
Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...
Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...
Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...Instaclustr
 
Fast, stable and scalable true radix sorting with Matt Dowle at useR! Aalborg
Fast, stable and scalable true radix sorting with Matt Dowle at useR! AalborgFast, stable and scalable true radix sorting with Matt Dowle at useR! Aalborg
Fast, stable and scalable true radix sorting with Matt Dowle at useR! AalborgSri Ambati
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data scienceLong Nguyen
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on RAjay Ohri
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQLIvan Zoratti
 
SciPy 2011 pandas lightning talk
SciPy 2011 pandas lightning talkSciPy 2011 pandas lightning talk
SciPy 2011 pandas lightning talkWes McKinney
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL ServerStéphane Fréchette
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in RSujaAldrin
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptxkarthikks82
 

Similar a Pandas (20)

R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Using the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slidesUsing the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slides
 
R Introduction
R IntroductionR Introduction
R Introduction
 
Big datacourse
Big datacourseBig datacourse
Big datacourse
 
Data Exploration in R.pptx
Data Exploration in R.pptxData Exploration in R.pptx
Data Exploration in R.pptx
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICSBIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
 
DataCamp Cheat Sheets 4 Python Users (2020)
DataCamp Cheat Sheets 4 Python Users (2020)DataCamp Cheat Sheets 4 Python Users (2020)
DataCamp Cheat Sheets 4 Python Users (2020)
 
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
 
Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...
Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...
Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...
 
Fast, stable and scalable true radix sorting with Matt Dowle at useR! Aalborg
Fast, stable and scalable true radix sorting with Matt Dowle at useR! AalborgFast, stable and scalable true radix sorting with Matt Dowle at useR! Aalborg
Fast, stable and scalable true radix sorting with Matt Dowle at useR! Aalborg
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data science
 
Ch1
Ch1Ch1
Ch1
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQL
 
SciPy 2011 pandas lightning talk
SciPy 2011 pandas lightning talkSciPy 2011 pandas lightning talk
SciPy 2011 pandas lightning talk
 
NumPy/SciPy Statistics
NumPy/SciPy StatisticsNumPy/SciPy Statistics
NumPy/SciPy Statistics
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptx
 

Más de maikroeder

Encode RNA Dashboard
Encode RNA DashboardEncode RNA Dashboard
Encode RNA Dashboardmaikroeder
 
Introduction to ggplot2
Introduction to ggplot2Introduction to ggplot2
Introduction to ggplot2maikroeder
 
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...maikroeder
 
Cms - Content Management System Utilities for Django
Cms - Content Management System Utilities for DjangoCms - Content Management System Utilities for Django
Cms - Content Management System Utilities for Djangomaikroeder
 
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik RöderPlone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Rödermaikroeder
 

Más de maikroeder (6)

Google charts
Google chartsGoogle charts
Google charts
 
Encode RNA Dashboard
Encode RNA DashboardEncode RNA Dashboard
Encode RNA Dashboard
 
Introduction to ggplot2
Introduction to ggplot2Introduction to ggplot2
Introduction to ggplot2
 
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
 
Cms - Content Management System Utilities for Django
Cms - Content Management System Utilities for DjangoCms - Content Management System Utilities for Django
Cms - Content Management System Utilities for Django
 
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik RöderPlone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Pandas

  • 1. Pandas Maik Röder Python Barcelona Meetup 7. February 2013 Python Consultant maikroeder@gmail.com Friday, February 8, 13
  • 2. Pandas • Powerful and productive Python data analysis and management library • Panel Data System • Open Sourced by AQR Capital Management, LLC in late 2009 • 30.000 lines of tested Python/Cython code • Used in production in many companies Friday, February 8, 13
  • 3. Pandas • Rich data structures and functions to make working with structured data fast, easy, and expressive • Built on top of Numpy with its high performance array-computing features • flexible data manipulation capabilities of spreadsheets and relational databases • Sophisticated indexing functionality • slice, dice, perform aggregations, select subsets of data Friday, February 8, 13
  • 4. The ideal tool for data scientists • Munging data • Cleaning data • Analyzing data • Modeling data • Organizing the results of the analysis into a form suitable for plotting or tabular display Friday, February 8, 13
  • 5. Series • one-dimensional array-like object >>> s = Series((1,2,3,4,5)) • Contains an array of data (of any Numpy data type) >>> s.values • Has an associated array of data labels, the index (Default index from 0 to N - 1) >>> s.index Friday, February 8, 13
  • 6. Series data structure >>> import numpy >>> randn = numpy.random.randn >>> from pandas import * >>> s = Series(randn(3),('a','b','c')) >>> s a -0.889880 b 1.102135 c -2.187296 >>> s.mean() -0.65834710697853194 Friday, February 8, 13
  • 7. Series to/from dict • Series to Python dict - No more explicit order >>> dict(s) {'a': -0.88988001423312313, 'c': -2.1872960440695666, 'b': 1.1021347373670938} • Back to a Series with a new Index from sorted dictionary keys >>> Series(dict(s)) a -0.889880 b 1.102135 c -2.187296 Friday, February 8, 13
  • 8. Reindexing labels >>> s a -0.496848 b 0.607173 c -1.570596 >>> s.index Index([a, b, c], dtype=object) >>> s.reindex(['c','b','a']) c -1.570596 b 0.607173 a -0.496848 Friday, February 8, 13
  • 9. Vectorization >>> s + s a -1.779760 b 2.204269 c -4.374592 • Series work with Numpy >>> numpy.exp(s) a 0.410705 b 3.010586 c 0.112220 Friday, February 8, 13
  • 10. DataFrame • Like data.frame in the statistical language/package R • 2-dimensional tabular data structure • Data manipulation with integrated indexing • Support heterogeneous columns • Homogeneous columns Friday, February 8, 13
  • 11. DataFrame >>> d = {'one': s*s, 'two': s+s} >>> DataFrame(d) one two a 0.791886 -1.779760 b 1.214701 2.204269 c 4.784264 -4.374592 >>> df.index Index([a, b, c], dtype=object) >>> df.columns Index([one, two], dtype=objec) Friday, February 8, 13
  • 12. Dataframe add column • Add a third column >>> df['three'] = s * 3 • It will share the existing index >>> df one two three a 0.791886 -1.779760 -2.669640 b 1.214701 2.204269 3.306404 c 4.784264 -4.374592 -6.561888 Friday, February 8, 13
  • 13. Access to columns • Access by attribute • Access by dict like notation >>> df.one >>> df['one'] one one a 0.791886 a 0.791886 b 1.214701 b 1.214701 c 4.784264 c 4.784264 Friday, February 8, 13
  • 14. Reindexing >>> df.reindex(['c','b','a']) >>> df one two three c 4.784264 -4.374592 -6.561888 b 1.214701 2.204269 3.306404 a 0.791886 -1.779760 -2.669640 Friday, February 8, 13
  • 15. Drop entries from an axis >>> df.drop('c') b 1.214701 2.204269 3.306404 a 0.791886 -1.779760 -2.669640 >>> df.drop(['b,'a']) one two three c 4.784264 -4.374592 -6.561888 Friday, February 8, 13
  • 16. Descriptive statistics >>> df.mean() one 2.263617 two -1.316694 three -1.975041 • Also: count, sum, median, min, max, abs, prod, std, var, skew, kurt, quantile, cumsum, cumprod, cummax, cummin Friday, February 8, 13
  • 17. Computational Tools • Covariance >>> s1 = Series(randn(1000)) >>> s2 = Series(randn(1000)) >>> s1.cov(s2) 0.013973709323221539 • Also: pearson, kendall, spearman Friday, February 8, 13
  • 18. This and much more... • Group by: split-apply-combine • Merge, join and aggregate • Reshaping and Pivot Tables • Time Series / Date functionality • Plotting with matplotlib • IO Tools (Text, CSV, HDF5, ...) • Sparse data structures Friday, February 8, 13
  • 19. Resources • http://pypi.python.org/pypi/pandas • http://code.google.com/p/pandas Friday, February 8, 13