27-28/11/2018

Etherpad

Why Are We Here?

  • To learn basic concepts of programming (in Python)
  • How to solve problems in your research by…
    • Building scripts
    • Automating tasks
  • Mechanics of manipulating data
    • File I/O
    • Data structures

XKCD

How Are We Doing This?

Using the Python language

  • we need something ;)
  • free, well-documented, and cross-platform
  • large academic userbase
  • many libraries for specialist work

we won’t be covering the entire language

No, I mean “how are we doing this?”

Text editor

  • the more usual way to write code
  • edit-save-execute cycle

Jupyter notebook

  • interactive notebook-based interface
  • good for data exploration, prototyping, and teaching
  • not so good for writing scripts/‘production code’

Do I need to use Python afterwards?

  • No. ;)
    • The lesson is general; it’s just taught in Python
    • The principles are the same in nearly all languages
    • If your colleagues/field have settled on another language, maybe learn that
    • (language wars are unproductive… ;) )

What are we doing?

Analysing and visualising experimental data

  • Effectiveness of a new treatment for arthritis
  • Several patients, recording inflammation on each day
  • Tabular (comma-separated) data

We’re going to get the computer to do this for us

  • Why not just do it by hand?
  • AUTOMATION, REUSE, SHARING

01. Setup

Setting Up - 1

Before we begin…

  • make a neat working environment in the terminal
  • obtain data
cd ~/Desktop
mkdir python-novice-inflammation
cd python-novice-inflammation

LIVE DEMO

Setting Up - 2

Before we begin…

  • make a neat working environment
  • obtain data
cp 2018-03-29-standrews/lessons/python/files/python-novice-inflammation-data.zip ./
unzip python-novice-inflammation-data.zip
cp 2018-03-29-standrews/lessons/python/files/python-novice-inflammation-code.zip ./
unzip python-novice-inflammation-code.zip

(you can download files via Etherpad: http://pad.software-carpentry.org/2018-11-27-standrews)

LIVE DEMO

02. Getting Started

Python in the terminal

We start the Python console by executing the command python

$ python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

LIVE DEMO

Python REPL

  • Python’s console is a read-evaluate-print loop (REPL), just like the shell
>>> 3 + 5
8
>>> 12 / 7
1.7142857142857142
>>> 2 ** 16
65536
>>> 15 % 4
3
>>> (2 + 4) * (3 - 7)
-24

LIVE DEMO

My first variable

  • To do interesting things, we want persistent values
  • variables are like named boxes
  • data goes in the box
  • when we use the name of the box, we mean what’s in the box

Creating a variable

  • To assign a value use the equals sign: =
  • The variable name/box label goes on the left, and the data item goes on the right
  • Character strings, or just strings, are enclosed in quotes
>>> name = "Samia"
>>> name
'Samia'
>>> print(name)
Samia

LIVE DEMO

Working with variables

weight_kg = 55
print(weight_kg)
2.2 * weight_kg
print("weight in pounds", 2.2 * weight_kg)
weight_kg = 57.5
print("weight in kilograms is now:", weight_kg)
weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
weight_kg = 100
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
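The last two lines above show that assignment stores a value, not a formula: changing weight_kg does not update weight_lb. A minimal sketch of the same point, reusing the slide's variable names (saved_lb and unchanged are hypothetical names added for illustration):

```python
weight_kg = 57.5
weight_lb = 2.2 * weight_kg   # computed once, from the current value of weight_kg
saved_lb = weight_lb          # keep a copy so we can compare later

weight_kg = 100               # the weight_kg box now holds a new value
unchanged = (weight_lb == saved_lb)
print('weight_lb unchanged after reassigning weight_kg:', unchanged)

weight_lb = 2.2 * weight_kg   # recalculate explicitly to bring it up to date
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
```

If a value should follow another, it has to be recalculated after every change.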

LIVE DEMO

Exercise 01 (5min)

What are the values in mass and age after the following code is executed?

mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
  1. mass == 47.5, age == 122
  2. mass == 95.0, age == 102
  3. mass == 47.5, age == 102
  4. mass == 95.0, age == 122

Exercise 02 (5min)

What does the following code print out?

first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
  1. Hopper Grace
  2. Grace Hopper
  3. "Grace Hopper"
  4. "Hopper Grace"

03. Data Analysis

Examine the data

  • Inspect a data file using the shell
$ head data/inflammation-01.csv 
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
0,0,1,2,2,4,2,1,6,4,7,6,6,9,9,15,4,16,18,12,12,5,18,9,5,3,10,3,12,7,8,4,7,3,5,4,4,3,2,1
0,0,2,2,4,2,2,5,5,8,6,5,11,9,4,13,5,12,10,6,9,17,15,8,9,3,13,7,8,2,8,8,4,2,3,5,4,1,1,1
0,0,1,2,3,1,2,3,5,3,7,8,8,5,10,9,15,11,18,19,20,8,5,13,15,10,6,10,6,7,4,9,3,5,2,5,3,2,2,1
0,0,0,3,1,5,6,5,5,8,2,4,11,12,10,11,9,10,17,11,6,16,12,6,8,14,6,13,10,11,4,6,4,7,6,3,2,1,0,0
0,1,1,2,1,3,5,3,5,8,6,8,12,5,13,6,13,8,16,8,18,15,16,14,12,7,3,8,9,11,2,5,4,5,1,4,1,2,0,0
  • To load this data in Python, we’ll use the numpy library

We want to produce summary information about inflammation by patient and by day

Python libraries

  • Python contains many powerful, general tools
  • Specialised tools are contained in libraries or packages
  • We call on libraries/packages when needed
  • Packages are loaded with import
  • Packages are shared via repositories, e.g. PyPI and conda channels
>>> import numpy

LIVE DEMO

Load data from file

  • numpy provides a function loadtxt() to load tabular data:
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
  • dotted notation tells us loadtxt() belongs to numpy
  • fname: an argument expecting the path to a file
  • delimiter: an argument expecting the character that separates columns
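To try the two arguments without the workshop files, here is a minimal sketch that assumes a tiny made-up table held in memory. The values and the name csv_text are hypothetical; loadtxt() accepts a file-like object in place of a path, so io.StringIO stands in for the CSV file.

```python
import io
import numpy

# Hypothetical table: 3 patients (rows) by 4 days (columns), comma-separated.
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

# fname is the thing to read from; delimiter is the column separator.
data = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')

print(data)
```

Swapping the io.StringIO object for 'data/inflammation-01.csv' gives the call from the slide.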

Loaded data

>>> numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
       ..., 
       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
  • The matrix is truncated to fit the screen
  • ... indicates rows or columns left out of the display (they are still in the array)
  • Trailing zeros after the decimal point are not shown: 1. is the same number as 1.0 (1 == 1. == 1.0)

Assign the matrix to a variable called data

LIVE DEMO

What is our data?

>>> type(data)
<class 'numpy.ndarray'>

LIVE DEMO

Members and attributes

  • Creating the array also created information about the array
  • This information is stored in members or attributes that belong to data
  • Access an attribute as data.<attribute>, e.g. data.shape
>>> print(data.dtype)
float64
>>> print(data.shape)
(60, 40)
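A minimal sketch of reading the same attributes from a small hand-made array, so the numbers are easy to check by eye. The array small and its values are hypothetical, not the workshop data.

```python
import numpy

# Hypothetical array: 2 patients (rows) by 3 days (columns).
small = numpy.array([[0.0, 1.0, 2.0],
                     [3.0, 4.0, 5.0]])

print(small.dtype)   # the type of every element in the array
print(small.shape)   # (number of rows, number of columns)

# shape is a tuple, so it can be unpacked into two named values.
n_patients, n_days = small.shape
print('patients:', n_patients, 'days:', n_days)
```

Read the same way, the (60, 40) above means 60 rows (patients) and 40 columns (days).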

LIVE DEMO

Indexing arrays

  • We often work with subsets of data
    • individual rows (patients)
    • individual columns (days)
  • Counting of array elements starts at zero, not at one.
>>> print('first value in data:', data[0, 0])
first value in data: 0.0
>>> print('middle value in data:', data[30, 20])
middle value in data: 13.0
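Counting from zero is easier to see on an array small enough to read in full. A minimal sketch, assuming a made-up 2-by-3 array whose values encode their own position (the tens digit counts rows from one, the units digit counts columns from zero):

```python
import numpy

# Hypothetical array: value 10 sits in the first row, value 22 in the last column.
small = numpy.array([[10.0, 11.0, 12.0],
                     [20.0, 21.0, 22.0]])

first = small[0, 0]   # row index 0, column index 0: the top-left value
last = small[1, 2]    # row index 1, column index 2: the bottom-right value

print('first value:', first)
print('last value:', last)
```

The index order is always [row, column], and the largest valid index on each axis is one less than the size of that axis.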

LIVE DEMO

Slicing arrays

  • To get a range of data from the array, index with [ ] and specify start and end indices
  • Separate the start and end with a : (colon)
  • 0:4 means start at zero and go up to, but not including, 4
    • 0, 1, 2, 3
>>> print(data[0:4, 0:10])
[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
 [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
 [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
 [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]
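A minimal sketch of the same start:end rule on a small made-up array (grid and block are hypothetical names). The end index is excluded, so a slice start:end always contains end - start items.

```python
import numpy

# Hypothetical 4-by-5 array holding the integers 0..19, filled row by row.
grid = numpy.arange(20).reshape(4, 5)

# Rows 0 and 1 (0:2), columns 1, 2 and 3 (1:4).
block = grid[0:2, 1:4]

print(grid)
print(block)
print(block.shape)   # 2 - 0 rows, 4 - 1 columns
```

This is why data[0:4, 0:10] above prints exactly 4 rows of 10 values each.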

LIVE DEMO