1. Code Club (Wrangling Data With Python)
Tony Hirst & Sam Leon
@psychemedia / tony.hirst @ okfn.org
Week 4
2. Week 4 – Learning Objectives
By the end of this session, you will:
• Be able to recall and make use of what we did previously…
• Be able to merge data from different data frames
• Be familiar with the idea of “tidy data”
• Have taken first steps in being able to
reshape datasets (long-wide data
transforms, pivots)
• Be able to start cleaning datasets using a
variety of strategies
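As a rough preview of these objectives, the sketch below merges two data frames, reshapes a wide table into a long ("tidy") one, and cleans a messy text column. All table names, column names and values are invented for illustration; they are not taken from the session materials.

```python
import pandas as pd

# Hypothetical example data: names and values are invented for illustration.
population = pd.DataFrame({
    "country": ["UK", "France", "Spain"],
    "population_m": [64, 66, 47],
})
capitals = pd.DataFrame({
    "country": ["UK", "France", "Italy"],
    "capital": ["London", "Paris", "Rome"],
})

# Merge on the shared "country" column.
# An inner join keeps only the countries that appear in both frames.
merged = pd.merge(population, capitals, on="country", how="inner")

# Wide data: one column per year.
wide = pd.DataFrame({
    "country": ["UK", "France"],
    "2012": [1.0, 2.0],
    "2013": [1.5, 2.5],
})

# Wide-to-long transform: one row per (country, year) observation.
long = wide.melt(id_vars="country", var_name="year", value_name="value")

# Cleaning: strip whitespace, drop thousands separators, and turn
# anything that still is not a number into NaN.
messy = pd.DataFrame({"amount": [" 1,200", "350 ", "n/a"]})
cleaned = pd.to_numeric(
    messy["amount"].str.strip().str.replace(",", "", regex=False),
    errors="coerce",
)

print(merged)
print(long)
print(cleaned)
```

Changing `how="inner"` to `"left"`, `"right"` or `"outer"` controls which unmatched rows survive the merge.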
3. Weeks 1-3 – Recap
In previous weeks, you have learned…
• How to get data into and out of pandas from a
variety of file types (CSV, XLSX) on your computer
and the web, as well as from HTML web pages and
the World Bank Indicators API
• How Python can use lists to represent data and
how pandas represents tabular data using a data
frame
• How to filter, sort and generally manipulate tabular
data using pandas
• How to process data columns and derive new ones
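The recap points above can be sketched in a few lines. This is a minimal illustration that assumes a small, made-up CSV held in a string (so it runs without any files); the column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory so the example is self-contained.
csv_text = """name,region,sales,costs
A,North,100,60
B,South,250,200
C,North,175,90
"""

# Get data into pandas; read_csv also accepts a file path or a URL.
df = pd.read_csv(io.StringIO(csv_text))

# Filter: keep only the rows where region is "North".
north = df[df["region"] == "North"]

# Sort: highest sales first.
ranked = df.sort_values("sales", ascending=False)

# Derive a new column from existing ones.
df["profit"] = df["sales"] - df["costs"]

# Get data out of pandas again, here as CSV text rather than a file.
csv_out = df.to_csv(index=False)
print(csv_out)
```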
5. Ref: Python for Finance
Efficiency and Productivity Through Python
From Prototyping to Production
Financial industry context
- “quants” develop proof-of-concept models in
eg Matlab or R
- developers translate applications into
production code (C++, Java)
Inefficiencies – prototype code not reusable
Diverse skill sets – different programming languages & regimes
Legacy code – maintenance and development becomes complex
6. Ref: Python for Finance
Efficiency and Productivity Through Python
Shorter time to results
- eg ability to download data directly from a source API
- eg ability to perform vectorised, column based
operations, such as cumulative sum operations
- eg ability to reshape datasets
- eg ability to generate charts directly from
appropriately shaped data
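To give a flavour of the vectorised and reshaping operations listed above, here is a minimal pandas sketch. The sales figures and column names are invented for illustration and do not come from the referenced book.

```python
import pandas as pd

# Hypothetical example data: values are invented for illustration.
sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "region": ["North", "South", "North", "South"],
    "amount": [10, 20, 30, 40],
})

# Vectorised, column based operation: cumulative sum over the whole column,
# with no explicit loop.
sales["overall_total"] = sales["amount"].cumsum()

# The same idea applied within each region separately.
sales["region_total"] = sales.groupby("region")["amount"].cumsum()

# Reshape: one row per month, one column per region.
wide = sales.pivot(index="month", columns="region", values="amount")

print(sales)
print(wide)

# With matplotlib installed, wide.plot() would chart one line per region
# directly from this shape.
```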
8. Ref: Python for Finance
Efficiency and Productivity Through Python
A consistent technological framework
“Python has the potential to provide
a single, powerful, consistent
framework with which to streamline
end to end development and
production efforts…”