Please do use the course Etherpad:
- Communal notes: share your understanding, and benefit from others
- Ask questions: get detailed answers with links and examples
- A record/reference for after the course
27-28/11/2018
Using the Python language
we won’t be covering the entire language
Text editor
Jupyter notebook
Python afterwards?
Python
Analysing and visualising experimental data
We’re going to get the computer to do this for us
Before we begin…
cd ~/Desktop
mkdir python-novice-inflammation
cd python-novice-inflammation
LIVE DEMO
Before we begin…
cp 2018-03-29-standrews/lessons/python/files/python-novice-inflammation-data.zip ./
unzip python-novice-inflammation-data.zip
cp 2018-03-29-standrews/lessons/python/files/python-novice-inflammation-code.zip ./
unzip python-novice-inflammation-code.zip
(you can download files via Etherpad: http://pad.software-carpentry.org/2018-11-27-standrews)
LIVE DEMO
Python in the terminal
We start the Python console by executing the command python
$ python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 6 2017, 12:04:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
LIVE DEMO
Python REPL
Python’s console is a read-evaluate-print-loop, just like the shell
>>> 3 + 5
8
>>> 12 / 7
1.7142857142857142
>>> 2 ** 16
65536
>>> 15 % 4
3
>>> (2 + 4) * (3 - 7)
-24
LIVE DEMO
=
>>> name = "Samia"
>>> name
'Samia'
>>> print(name)
Samia
LIVE DEMO
weight_kg = 55
print(weight_kg)
2.2 * weight_kg
print("weight in pounds", 2.2 * weight_kg)
weight_kg = 57.5
print("weight in kilograms is now:", weight_kg)
weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
weight_kg = 100
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
LIVE DEMO
What are the values in mass and age after the following code is executed?
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
mass == 47.5, age == 122
mass == 95.0, age == 102
mass == 47.5, age == 102
mass == 95.0, age == 122
What does the following code print out?
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
Hopper Grace
Grace Hopper
"Grace Hopper"
"Hopper Grace"
$ head data/inflammation-01.csv
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
0,0,1,2,2,4,2,1,6,4,7,6,6,9,9,15,4,16,18,12,12,5,18,9,5,3,10,3,12,7,8,4,7,3,5,4,4,3,2,1
0,0,2,2,4,2,2,5,5,8,6,5,11,9,4,13,5,12,10,6,9,17,15,8,9,3,13,7,8,2,8,8,4,2,3,5,4,1,1,1
0,0,1,2,3,1,2,3,5,3,7,8,8,5,10,9,15,11,18,19,20,8,5,13,15,10,6,10,6,7,4,9,3,5,2,5,3,2,2,1
0,0,0,3,1,5,6,5,5,8,2,4,11,12,10,11,9,10,17,11,6,16,12,6,8,14,6,13,10,11,4,6,4,7,6,3,2,1,0,0
0,1,1,2,1,3,5,3,5,8,6,8,12,5,13,6,13,8,16,8,18,15,16,14,12,7,3,8,9,11,2,5,4,5,1,4,1,2,0,0
In Python, we’ll use the numpy library
We want to produce summary information about inflammation by patient and by day
Python libraries
Python contains many powerful, general tools
import
PyPI and conda
>>> import numpy
LIVE DEMO
numpy provides a function loadtxt() to load tabular data:
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
loadtxt() belongs to numpy
fname: an argument expecting the path to a file
delimiter: an argument expecting the character that separates columns
>>> numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
array([[ 0., 0., 1., ..., 3., 0., 0.],
       [ 0., 1., 2., ..., 1., 0., 1.],
       [ 0., 1., 1., ..., 2., 1., 1.],
       ...,
       [ 0., 1., 1., ..., 1., 1., 1.],
       [ 0., 0., 0., ..., 0., 2., 0.],
       [ 0., 0., 1., ..., 1., 1., 0.]])
... indicate missing rows or columns
(1 == 1. == 1.0)
Assign the matrix to a variable called data
LIVE DEMO
>>> type(data)
<class 'numpy.ndarray'>
LIVE DEMO
data
data.<attribute>, e.g. data.shape
>>> print(data.dtype)
float64
>>> print(data.shape)
(60, 40)
LIVE DEMO
>>> print('first value in data:', data[0, 0])
first value in data: 0.0
>>> print('middle value in data:', data[30, 20])
middle value in data: 13.0
LIVE DEMO
[ and specify start and end indices
0:4 means start at zero and go up to but not including 4
0, 1, 2, 3
: (colon).
>>> print(data[0:4, 0:10])
[[ 0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
 [ 0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
 [ 0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
 [ 0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]
LIVE DEMO
Python assumes the first element
Python assumes the end element
>>> small = data[:3, 36:]
>>> print('small is:\n', small)
QUESTION: What would : on its own indicate?
LIVE DEMO
We can take slices of any series, not just arrays.
>>> element = 'oxygen'
>>> print('first three characters:', element[0:3])
first three characters: oxy
What is the value of element[:4]?
oxyg
gen
oxy
en
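A small sketch of the same half-open slicing rule, applied to a list and to a different string. The names days and word are illustrative only and are not part of the course data.

```python
# Slices are half-open: the start index is included, the end index is excluded.
days = ['mon', 'tue', 'wed', 'thu', 'fri']   # illustrative list
word = 'carbon'                               # illustrative string

first_two = days[0:2]    # items at index 0 and 1
from_third = days[2:]    # omitted end: run to the last item
prefix = word[:3]        # omitted start: begin at index 0

print(first_two)
print(from_third)
print(prefix)
```

The same notation works on each axis of a numpy array, which is why data[:3, 36:] selects the first three rows and every column from index 36 onwards.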
arrays know how to perform operations on their values
+, -, *, /, etc. are elementwise
>>> doubledata = data * 2.0
>>> print("original:\n", data[:3, 36:])
original:
[[ 2. 3. 0. 0.]
 [ 1. 1. 0. 1.]
 [ 2. 2. 1. 1.]]
>>> print("doubledata:\n", doubledata[:3, 36:])
doubledata:
[[ 4. 6. 0. 0.]
 [ 2. 2. 0. 2.]
 [ 4. 4. 2. 2.]]
LIVE DEMO
numpy functions
numpy provides functions to operate on arrays
>>> print(numpy.mean(data))
6.14875
>>> print(data.mean())
6.14875
>>> maxval = numpy.max(data)
>>> print('maximum inflammation:', maxval)
maximum inflammation: 20.0
>>> minval = data.min()
>>> print('minimum inflammation:', minval)
minimum inflammation: 0.0
LIVE DEMO
Extract a single row, or operate directly on a row
>>> patient_0 = data[0, :] # temporary variable
>>> print('maximum inflammation for patient 0:', patient_0.max())
maximum inflammation for patient 0: 18.0
>>> print('maximum inflammation for patient 0:', numpy.max(data[0, :]))
maximum inflammation for patient 0: 18.0
>>> print('maximum inflammation for patient 2:', numpy.max(data[2, :]))
maximum inflammation for patient 2: 19.0
LIVE DEMO
Tedious. Prone to errors/typos: easier way to do this…
numpy operations on axes
numpy functions take an axis= parameter: 0 (columns) or 1 (rows)
>>> print(numpy.max(data, axis=1)) # max by patient
[ 18. 18. 19. 17. 17. 18. 17. 20. 17. 18. 18. 18. 17. 16. 17. 18. 19. 19. 17. 19. 19. 16. 17. 15. 17. 17. 18. 17. 20. 17. 16. 19. 15. 15. 19. 17. 16. 17. 19. 16. 18. 19. 16. 19. 18. 16. 19. 15. 16. 18. 14. 20. 17. 15. 17. 16. 17. 19. 18. 18.]
>>> print(data.mean(axis=0)) # mean by day
[ 0. 0.45 1.11666667 1.75 2.43333333 3.15 3.8 3.88333333 5.23333333 5.51666667 5.95 5.9 8.35 7.73333333 8.36666667 9.5 9.58333333 10.63333333 11.56666667 12.35 13.25 11.96666667 11.03333333 10.16666667 10. 8.66666667 9.15 7.25 7.33333333 6.58333333 6.06666667 5.95 5.11666667 3.6 3.3 3.56666667 2.48333333 1.5 1.13333333 0.56666667]
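To see what the axis argument means without numpy, this sketch reproduces the two reductions on a tiny nested list. The table tiny and the variable names are made up for illustration; on real data, the numpy calls above are the tool to use.

```python
# Rows are patients, columns are days (illustrative values only).
tiny = [
    [0, 2, 4],
    [1, 3, 5],
]

# Plain-Python analogue of numpy.max(data, axis=1): one value per row (patient).
max_per_patient = [max(row) for row in tiny]

# Plain-Python analogue of numpy.mean(data, axis=0): one value per column (day).
mean_per_day = [sum(col) / len(col) for col in zip(*tiny)]

print(max_per_patient)
print(mean_per_day)
```

Reducing along axis=1 collapses the columns and leaves one number per row; reducing along axis=0 collapses the rows and leaves one number per column.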
LIVE DEMO
“The purpose of computing is insight, not numbers” - Richard Hamming
The best way to gain insight is often to visualise data
matplotlib
matplotlib is the de facto standard/base plotting library in Python
>>> import matplotlib.pyplot
LIVE DEMO
matplotlib.pyplot.imshow()
matplotlib.pyplot.imshow() renders matrix values as an image
>>> image = matplotlib.pyplot.imshow(data)
>>> matplotlib.pyplot.show()
matplotlib.pyplot.plot()
matplotlib.pyplot.plot() renders a line graph
We want to plot the average inflammation level on each day
>>> ave_inflammation = numpy.mean(data, axis=0)
>>> ave_plot = matplotlib.pyplot.plot(ave_inflammation)
>>> matplotlib.pyplot.show()
QUESTION: does this look reasonable?
.mean() looks artificial
>>> max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
>>> matplotlib.pyplot.show()
>>> min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
>>> matplotlib.pyplot.show()
QUESTION: does this look reasonable?
Can you create a plot showing the standard deviation (numpy.std()) of the inflammation data for each day across all patients?
We can put all three plots into a single figure
fig
) with fig = matplotlib.pyplot.figure()
fig
with ax = fig.add_subplot()
ax.set_ylabel()
ax.plot()
LIVE DEMO
Can you modify your script to display the three graphs on top of one another, instead of side by side?
Save your new script as exercise_05.py
for loops
word = "lead"
print(word[0])
print(word[1])
print(word[2])
print(word[3])
QUESTION: Why is this not a good approach?
LIVE DEMO
for loops
for loops perform actions for every item in a collection
>>> word = "lead"
>>> for char in word:
...     print(char)
...
l
e
a
d
LIVE DEMO
for loop syntax
for element in collection:
    <do things with element>
The for loop statement ends in a colon, :
tab (\t)
for loop
Values defined outside a loop can be modified in the loop
>>> length = 0
>>> for vowel in 'aeiou':
...     length = length + 1
...
>>> print("There are", length, "vowels")
QUESTION: What output does this program give you?
LIVE DEMO
for loop variables
>>> letter = "z"
>>> print(letter)
z
>>> for letter in "abc":
...     print(letter)
...
>>> print("after the loop, letter is:", letter)
LIVE DEMO
range()
range() is a Python function that creates a sequence of numbers
range type that can be iterated over in a loop
>>> seq = range(3)
>>> print("Range is:", seq)
>>> for val in seq:
...     print(val)
>>> seq = range(2, 5)
>>> seq = range(3, 10, 3)
>>> seq = range(10, 0, -1)
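Printing a range object shows only its description, not its contents. One way to inspect the numbers it would produce is to expand it with list(), as in this short sketch (the variable names are illustrative):

```python
# list() expands a range so that its contents can be inspected.
up_to_three = list(range(3))          # start defaults to 0, end is excluded
two_to_five = list(range(2, 5))       # explicit start and end
stepped = list(range(3, 10, 3))       # third argument is the step
countdown = list(range(10, 0, -1))    # a negative step counts downwards

print(up_to_three)
print(two_to_five)
print(stepped)
print(countdown)
```

As with slices, the end value is never included.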
LIVE DEMO
Can you write a loop that takes a string, e.g. Newton, and produces a new string with the characters in reverse order, e.g. notweN?
HINTS
ab + cd
mystr = ""
lists
lists are a built in Python datatype
>>> odds = [1, 3, 5, 7]
>>> print("odds are:", odds)
odds are: [1, 3, 5, 7]
>>> print('first and last:', odds[0], odds[-1])
first and last: 1 7
>>> for number in odds:
...     print(number)
LIVE DEMO
lists, like strings, are sequences
list elements can be changed: lists are mutable
strings are not mutable
>>> names = ["Curie", "Darwing", "Turing"] # typo in Darwin's name
>>> print("names is originally:", names)
names is originally: ['Curie', 'Darwing', 'Turing']
>>> names[1] = 'Darwin' # correct the name
>>> print('final value of names:', names)
final value of names: ['Curie', 'Darwin', 'Turing']
>>> name = "darwin"
>>> name[0] = "D"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
LIVE DEMO
There are risks to modifying lists in-place
>>> my_list = [1, 2, 3, 4]
>>> your_list = my_list
>>> print("my list:", my_list)
my list: [1, 2, 3, 4]
>>> my_list[1] = 0
>>> print("your list:", your_list)
QUESTION: What is the value of your_list?
LIVE DEMO
list copies
Copy a list by slicing it or using the list() function
new_list = old_list[:]
>>> my_list = [1, 2, 3, 4] # original list
>>> your_list = my_list[:] # copy 1
>>> your_other_list = list(my_list) # copy 2
>>> print("my_list:", my_list)
my_list: [1, 2, 3, 4]
>>> my_list[1] = 0 # change element
>>> print("my_list:", my_list)
my_list: [1, 0, 3, 4]
>>> print("your_list:", your_list)
your_list: [1, 2, 3, 4]
>>> print("your_other_list:", your_other_list)
your_other_list: [1, 2, 3, 4]
LIVE DEMO
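The difference between a second name for a list and a copy of it can be checked with the is operator. This is a minimal sketch; alias and shallow_copy are illustrative names.

```python
my_list = [1, 2, 3, 4]
alias = my_list            # a second name for the same list object
shallow_copy = my_list[:]  # a new list holding the same items

my_list[1] = 0             # modify the original in place

# "is" tests whether two names refer to the very same object.
print(alias is my_list, alias)
print(shallow_copy is my_list, shallow_copy)
```

Note that slicing and list() make shallow copies: the outer list is new, but any nested lists inside it would still be shared.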
list functions
lists are Python objects and have useful functions (methods)
>>> print(odds)
[1, 3, 5, 7]
>>> odds.append(9)
>>> print("odds after adding a value:", odds)
odds after adding a value: [1, 3, 5, 7, 9]
>>> odds.reverse()
>>> print("odds after reversing the list:", odds)
odds after reversing the list: [9, 7, 5, 3, 1]
>>> odds.pop()
1
>>> print("odds after popping:", odds)
odds after popping: [9, 7, 5, 3]
LIVE DEMO
Overloading refers to an operator (e.g. +) having more than one meaning, depending on the thing it operates on.
For numbers, + means add
For lists, + means concatenate
>>> vowels = ['a', 'e', 'i', 'o', 'u']
>>> vowels_welsh = ['a', 'e', 'i', 'o', 'u', 'w', 'y']
>>> print(vowels + vowels_welsh)
['a', 'e', 'i', 'o', 'u', 'a', 'e', 'i', 'o', 'u', 'w', 'y']
>>> counts = [2, 4, 6, 8, 10]
>>> repeats = counts * 2
>>> print(repeats)
[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
QUESTION: What does ‘multiplication’ (*) do for lists?
LIVE DEMO
<something> if some condition is true
if statement:
if <condition>:
    <executed if condition is True>
>>> num = 37
>>> if num > 100:
...     print('greater')
...
>>> num = 149
>>> if num > 100:
...     print('greater')
...
greater
LIVE DEMO
if-else statements
An if statement executes code if the condition evaluates as true
false?
if <condition>:
    <executed if condition is True>
else:
    <executed if condition is not True>
>>> num = 37
>>> if num > 100:
...     print('greater')
... else:
...     print('not greater')
...
not greater
LIVE DEMO
if-elif-else
elif (else if)
if <condition1>:
    <executed if condition1 is True>
elif <condition2>:
    <executed if condition2 is True and condition1 is not True>
else:
    <executed if no conditions True>
>>> num = -3
>>> if num > 0:
...     print(num, "is positive")
... elif num == 0:
...     print(num, "is zero")
... else:
...     print(num, "is negative")
...
-3 is negative
LIVE DEMO
Conditions can be combined using Boolean Logic
and, or and not
>>> if (4 > 0) or (2 > 0):
...     print('at least one part is true')
... else:
...     print('both parts are false')
...
at least one part is true
LIVE DEMO
What is the result of executing the code below?
>>> if 4 > 5:
...     print('A')
... elif 4 == 5:
...     print('B')
... elif 4 < 5:
...     print('C')
A
B
C
B and C
Two useful condition operators are == (equality) and in (membership)
>>> print(1 == 1)
True
>>> print(1 == 2)
False
>>> print('a' in 'toast')
True
>>> print('b' in 'toast')
False
>>> print(1 in [1, 2, 3])
True
>>> print(1 in range(3))
True
LIVE DEMO
We will write a new script to do this:
analyse_files.py
$ nano analyse_files.py
BUT we need to know how to interact with the filesystem to get filenames
os module
The os module allows interaction with the filesystem
import matplotlib.pyplot
import numpy as np
import os
LIVE DEMO
os.listdir()
The os.listdir() function lists the contents of a directory
for loop or list comprehension
data directory
# Get a list of inflammation data files
files = []
for fname in os.listdir('data'):
    if 'inflammation' in fname:
        files.append(fname)
print("Inflammation data files:", files)
$ python analyse_files.py
BUT something’s not quite right…
LIVE DEMO
os.path.join()
The os.listdir() function only returns filenames, not the path (relative or absolute)
os.path.join() builds a path from directory and filenames, suitable for the underlying OS
files = []
for fname in os.listdir('data'):
    if 'inflammation' in fname:
        files.append(os.path.join('data', fname))
print("Inflammation data files:", files)
$ python analyse_files.py
Inflammation data files: ['data/inflammation-05.csv', …]
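The output above starts with inflammation-05.csv because os.listdir() returns entries in arbitrary order. The sketch below shows one way to get a predictable order by sorting; it builds a throwaway directory with tempfile so it can run anywhere, and the file names in it are invented for illustration.

```python
import os
import tempfile

# Create a temporary directory containing a few empty, made-up files.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['inflammation-02.csv', 'notes.txt', 'inflammation-01.csv']:
        open(os.path.join(tmpdir, name), 'w').close()

    # Same filter-and-join logic as the script, wrapped in sorted().
    files = sorted(
        os.path.join(tmpdir, fname)
        for fname in os.listdir(tmpdir)
        if 'inflammation' in fname
    )
    basenames = [os.path.basename(path) for path in files]

print(basenames)
```

Sorting is optional for the analysis itself, but it makes the script's output easier to compare between runs and machines.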
LIVE DEMO
Now we have all the tools we need to
os
np.loadtxt()
np.mean(), np.max(), etc.
matplotlib
.add_subplot()
We’re going to build the rest of this script together
$ nano analyse_files.py
$ python analyse_files.py
Analysing data/inflammation-05.csv
Writing image to data/inflammation-05.png
Analysing data/inflammation-11.csv
Writing image to data/inflammation-11.png
[…]
LIVE DEMO
There are two suspicious features to some of the datasets
We’ll use if
statements to test for these conditions and give a warning
Is day zero value 0, and day 20 value 20?
$ nano analyse_files.py
# Test for suspicious maxima
if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
    print("Suspicious-looking maxima!")
$ python analyse_files.py
LIVE DEMO
Are all the minima zero? (do they sum to zero?)
$ nano analyse_files.py
# Test for suspicious maxima
if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
    print("Suspicious-looking maxima!")
elif np.sum(data.min(axis=0)) == 0:
    print('Minima sum to zero!')
$ python analyse_files.py
LIVE DEMO
If everything’s OK, let’s be reassuring
$ nano analyse_files.py
# Test for suspicious maxima
if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
    print("Suspicious-looking maxima!")
elif np.sum(data.min(axis=0)) == 0:
    print('Minima sum to zero!')
else:
    print('Seems OK!')
$ python analyse_files.py
LIVE DEMO
numpy.arrays, lists, strings, numbers
Python scripts: edit-save-execute
Python (Part 2)
Please do use the course Etherpad:
Analysing experimental data
We’re going to improve our code
Before we begin…
return to our neat working environment
$ cd ~/Desktop
$ cd python-novice-inflammation
Jupyter notebooks
Jupyter
At the command-line, start Jupyter notebook:
jupyter notebook
LIVE DEMO
Jupyter landing page
LIVE DEMO
LIVE DEMO
functions)
LIVE DEMO
Jupyter documents are made up of cells
A Jupyter cell can have one of several types
Change the first cell to Markdown
LIVE DEMO
Markdown allows us to enter formatted text.
Execute a cell with Shift + Enter
LIVE DEMO
Python code can be entered directly into a code cell
Execute a cell with Shift + Enter
LIVE DEMO
BUT the code is long and complicated
SO we will package our code for reuse: FUNCTIONS
Functions in code work like mathematical functions
\[y = f(x)\]
\(y\) is the returned value, or output(s)
The output \(y\) depends in some way on the value of \(x\) - defined by \(f()\).
Not all functions in code take an input, or produce a usable output, but the principle is generally the same.
fahr_to_kelvin() to convert Fahrenheit to Kelvin
\[f(x) = ((x - 32) \times \frac{5}{9}) + 273.15\]
LIVE DEMO
Calling fahr_to_kelvin() in the notebook is the same as calling any other function
print('freezing point of water:', fahr_to_kelvin(32))
print('boiling point of water:', fahr_to_kelvin(212))
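The function itself is written during the live demo and does not appear on the slides. A minimal sketch that follows the formula above might look like this; the parameter name temp_f is an assumption.

```python
def fahr_to_kelvin(temp_f):
    """Convert a temperature from Fahrenheit to Kelvin."""
    # Same arithmetic as the formula: ((x - 32) * 5/9) + 273.15
    return ((temp_f - 32) * (5 / 9)) + 273.15

print('freezing point of water:', fahr_to_kelvin(32))
print('boiling point of water:', fahr_to_kelvin(212))
```

The def line names the function and its parameter, the indented body does the work, and return hands the result back to the caller.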
LIVE DEMO
Create a new function in your notebook, and call it.
def kelvin_to_celsius(temp):
    return temp - 273.15
print('freezing point of water', kelvin_to_celsius(273.15))
LIVE DEMO
Composing Python functions works the same way as for mathematical functions: \(y = f(g(x))\)
We can convert F (temp_f) to C (temp_c) by executing the code:
temp_c = kelvin_to_celsius(fahr_to_kelvin(temp_f))
LIVE DEMO
We can wrap this composed function inside a new function:
fahr_to_celsius:
def fahr_to_celsius(temp_f):
    return kelvin_to_celsius(fahr_to_kelvin(temp_f))
print('freezing point of water in Celsius:', fahr_to_celsius(32.0))
This is how programs are built:
combining small bits into larger bits until the behaviour we want is obtained
LIVE DEMO
Can you write a function called outer() that:
string argument
print(outer("helium"))
hm
Variables defined within a function, including parameters, are not ‘visible’ outside the function
a = "Hello"
def my_fn(a):
    a = "Goodbye"
my_fn(a)
print(a)
LIVE DEMO
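A variation on the example above, as a sketch, makes the local change observable by returning it. The names greeting, shadow and returned are illustrative.

```python
greeting = "Hello"

def shadow(greeting):
    # Rebinding the parameter only changes the name inside the function.
    greeting = "Goodbye"
    return greeting

returned = shadow(greeting)

print(greeting)   # the variable outside the function
print(returned)   # the value handed back by the function
```

If a function needs to pass a new value to its caller, it should return that value; assigning to a parameter inside the function has no effect outside it.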
What would be printed if you ran the code below?
a, b = 3, 7
def swap(a, b):
    temp = a
    a = b
    b = temp
swap(a, b)
print(b, a)
7 3
3 7
3 3
7 7
Now we can write functions!
Let’s make the inflammation analysis easier to reuse: one function per operation
analyse_files.py
notebook from the first lesson
What operations should be put into functions?
The code is divisible into two sections
detect_problems()
def detect_problems(data):
    if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
        print('Suspicious looking maxima!')
    elif np.sum(data.min(axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')
LIVE DEMO
plot_data()
We’ll write a function called plot_data()
that plots the data to file
def plot_data(data, fname):
    # create figure and three axes
    fig = plt.figure(figsize=(10.0, 3.0))
    [...]
LIVE DEMO
Our code is now much more readable
detect_problems() and plot_data()
# Analyse each file in turn
for fname in files:
    print("Analysing", fname)
    # load data
    data = np.loadtxt(fname=fname, delimiter=',')
    # identify problems in the data
    detect_problems(data)
    # plot image in file
    imgname = fname[:-4] + '.png'
    plot_data(data, imgname)
Why should I bother?
How can I write Python programs that will work like Unix command-line tools?
|)
sys module
sys is a Python module for interacting with the interpreter and the system it runs on
Open a new file called sys_version.py in your editor
$ nano sys_version.py
import sys
print('version is', sys.version)
$ python sys_version.py
version is 3.6.3 |Anaconda custom (64-bit)| (default, Oct 6 2017, 12:04:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
LIVE DEMO
sys.argv
sys.argv is a variable that contains the command-line arguments used to call our script
Open a new file called sys_argv.py in your editor
$ nano sys_argv.py
import sys
print('sys.argv is', sys.argv)
$ python sys_argv.py
sys.argv is ['sys_argv.py']
$ python sys_argv.py item1 item2 somefile.txt
sys.argv is ['sys_argv.py', 'item1', 'item2', 'somefile.txt']
LIVE DEMO
We’re going to build a script that reports readings from data files
$ python readings.py mydata.csv
--min, --max, --mean
$ python readings.py --min mydata.csv
$ python readings.py --min mydata.csv myotherdata.csv
STDIN so we can use it with pipes
$ cat mydata.csv | python readings.py --min
We start with a script that doesn’t do all that
$ nano readings.py
import sys
import numpy

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = numpy.loadtxt(filename, delimiter=',')
    for m in numpy.mean(data, axis=1):
        print(m)
LIVE DEMO
There’s a way to tell if a Python file is being run as a script
(import readings)
($ python readings.py)
Python code has __name__ == '__main__' only when run as a script
We run main() only if the file is run as a script
if __name__ == '__main__':
    main()
Add this to readings.py and run the script
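A minimal, self-contained sketch of the same guard follows. Saved as a file and run with python, it calls main(); imported from another file, it only defines main(). The return value is made up for illustration.

```python
def main():
    # Stand-in for the real work a script would do.
    return "running as a script"

# __name__ is set by Python: '__main__' when the file is run directly,
# or the module's own name when the file is imported.
print("__name__ is", __name__)

if __name__ == '__main__':
    print(main())
```

This lets one file serve both as a command-line tool and as a module whose functions can be reused elsewhere.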
LIVE DEMO
We want to be able to analyse multiple files with one command
NOTE: wildcards are expanded by the shell before Python sees them
$ ls data/small-*
data/small-01.csv data/small-02.csv data/small-03.csv
$ python sys_argv.py data/small-*
sys.argv is ['sys_argv.py', 'data/small-01.csv', 'data/small-02.csv', 'data/small-03.csv']
1 onwards are filenames
def main():
    script = sys.argv[0]
    for filename in sys.argv[1:]:
        print(filename)
        data = numpy.loadtxt(filename, delimiter=',')
        for m in numpy.mean(data, axis=1):
            print(m)
We want to use --min, --max, --mean to tell the script what to calculate
$ python readings.py --max myfile.csv
The flag will be sys.argv[1], so filenames are sys.argv[2:]
def main():
    script = sys.argv[0]
    action = sys.argv[1]
    filenames = sys.argv[2:]
    if action not in ['--min', '--mean', '--max']:
        print('Action is not one of --min, --mean, or --max: ' + action)
        sys.exit(1)
    for f in filenames:
        process(f, action)
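The argument handling in main() can be tried out without a terminal by passing an argv-style list to a helper. The function parse_args() and the constant VALID_ACTIONS below are hypothetical names, not part of the course script; unlike the slide version, this sketch also reports a missing flag rather than failing with an IndexError.

```python
VALID_ACTIONS = ['--min', '--mean', '--max']

def parse_args(argv):
    """Split an argv-style list into (action, filenames).

    Raises ValueError when the flag is missing or not recognised.
    """
    if len(argv) < 2:
        raise ValueError('No action given: expected --min, --mean, or --max')
    action = argv[1]
    if action not in VALID_ACTIONS:
        raise ValueError('Action is not one of --min, --mean, or --max: ' + action)
    return action, argv[2:]

# argv[0] is always the script name, so the flag sits at index 1.
action, filenames = parse_args(['readings.py', '--max', 'a.csv', 'b.csv'])
print(action, filenames)
```

In a script, the caller would pass sys.argv and turn a ValueError into a message plus sys.exit(1).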
process()
We split the script into two functions for readability
The process() function prints the summarised data
def process(filename, action):
    data = numpy.loadtxt(filename, delimiter=',')
    if action == '--min':
        values = numpy.min(data, axis=1)
    elif action == '--mean':
        values = numpy.mean(data, axis=1)
    elif action == '--max':
        values = numpy.max(data, axis=1)
    for m in values:
        print(m)
LIVE DEMO
STDIN
The final change will let us use STDIN if no file is specified
sys.stdin catches STDIN from the operating system
if len(filenames) == 0:
    process(sys.stdin, action)
else:
    for f in filenames:
        process(f, action)
$ python readings.py --max < data/small-01.csv
LIVE DEMO
testing
centre()
import numpy as np

def centre(data, desired):
    return (data - np.mean(data)) + desired
LIVE DEMO
centre() on real data
Use numpy to create an artificial dataset
z = np.zeros((2, 2))
print(centre(z, 3.0))
LIVE DEMO
Try the function on real data…
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
print(centre(data, 0))
LIVE DEMO
mean, min, max, std
centred = centre(data, 0)
print('original min, mean, and max are:', numpy.min(data), numpy.mean(data), numpy.max(data))
print('min, mean, and max of centered data are:', numpy.min(centred), numpy.mean(centred), numpy.max(centred))
print('std dev before and after:', numpy.std(data), numpy.std(centred))
LIVE DEMO
#) is a good thing
Python provides for docstrings
Python’s help system
def centre(data, desired):
    """Returns the array in data, recentered around the desired value."""
    return (data - numpy.mean(data)) + desired
help(centre)
LIVE DEMO
centre() function requires two arguments
def centre(data, desired=0.0):
    """Returns the array in data, recentered around the desired value.

    Example: centre([1, 2, 3], 0) => [-1, 0, 1]
    """
    return (data - np.mean(data)) + desired
centre(data, 0.0)
centre(data, desired=0.0)
centre(data)
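Default arguments behave the same way for any function. The sketch below is a plain-list analogue of centre() that runs without numpy; centre_list() is a hypothetical name used only here.

```python
def centre_list(values, desired=0.0):
    """Return values shifted so that their mean equals desired."""
    mean = sum(values) / len(values)
    return [(v - mean) + desired for v in values]

by_default = centre_list([1, 2, 3])                   # desired falls back to 0.0
by_position = centre_list([1, 2, 3], 10.0)            # positional argument
by_keyword = centre_list([1, 2, 3], desired=10.0)     # keyword argument

print(by_default)
print(by_position)
print(by_keyword)
```

Parameters with defaults must come after those without, and naming the argument at the call site makes the intent clearer to readers.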
LIVE DEMO
Can you write a function called rescale() that
- takes an array as input
- returns an array with values scaled in the range [0.0, 1.0]
- has an informative docstring
If L and H are the lowest and highest values in the original array, then the replacement for a value v should be (v-L) / (H-L).
Programming n. - the process of making errors and correcting them until the code works
Python tries to tell you what has gone wrong by providing a traceback
def favourite_ice_cream():
    ice_creams = [
        "chocolate",
        "vanilla",
        "strawberry"
    ]
    print(ice_creams[3])

favourite_ice_cream()
LIVE DEMO
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-b0e1f9b712d6> in <module>()
      8     print(ice_creams[3])
      9
---> 10 favourite_ice_cream()

<ipython-input-1-b0e1f9b712d6> in favourite_ice_cream()
      6         "strawberry"
      7     ]
----> 8     print(ice_creams[3])
      9
     10 favourite_ice_cream()

IndexError: list index out of range
LIVE DEMO
Python
def some_function()
    msg = "hello, world!"
    print(msg)
    return msg
LIVE DEMO
  File "<ipython-input-3-dbf32ad5d3e8>", line 1
    def some_function()
                       ^
SyntaxError: invalid syntax
LIVE DEMO
def some_function():
    msg = "hello, world!"
    print(msg)
     return msg
LIVE DEMO
  File "<ipython-input-4-e169556d667b>", line 4
    return msg
    ^
IndentationError: unexpected indent
LIVE DEMO
NameErrors occur when a variable is not defined in scope
print(a)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-c5a4f3535135> in <module>()
----> 1 print(a)

NameError: name 'a' is not defined
LIVE DEMO
IndexError
letters = ['a', 'b']
print("Letter #1 is", letters[0])
print("Letter #2 is", letters[1])
print("Letter #3 is", letters[2])
Letter #1 is a
Letter #2 is b
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-62bced7460d2> in <module>()
      2 print("Letter #1 is", letters[0])
      3 print("Letter #2 is", letters[1])
----> 4 print("Letter #3 is", letters[2])

IndexError: list index out of range
LIVE DEMO
abbabbabba?
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) = 0:
        message = message + a
    else:
        message = message + "b"
print(message)
What does this function do?
def s(p):
    a = 0
    for v in p:
        a += v
    m = a / len(p)
    d = 0
    for v in p:
        d += (v - m) * (v - m)
    return numpy.sqrt(d / (len(p) - 1))
What does this function do?
def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value
    sample_mean = sample_sum / len(sample)
    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
First line of defence: sensible naming, style and documentation
We’ve focused on the basics of building code: variables, loops, functions, etc.
Write code that checks its own operation
Pythonic way to see if code runs correctly
Firefox source code is checks on the rest of the code!
assert that a condition is True
If True, the code may be correct
If False, the code is not correct
assert <condition>, "Some text describing the problem"
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for n in numbers:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)
QUESTION: What does this assertion do?
LIVE DEMO
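When an assertion fails, Python raises an AssertionError carrying the message text. The sketch below wraps the loop above in a function and catches the error so the message can be inspected; total_positive() is a hypothetical name introduced for illustration.

```python
def total_positive(numbers):
    """Sum the numbers, asserting that each one is positive."""
    total = 0.0
    for n in numbers:
        assert n > 0.0, 'Data should only contain positive values'
        total += n
    return total

try:
    total_positive([1.5, 2.3, 0.7, -0.001, 4.4])
except AssertionError as err:
    # The text after the comma in the assert statement becomes the error message.
    message = str(err)
    print('assertion failed:', message)
```

Assertions are skipped when Python is run with the -O flag, so they are best suited to catching programming mistakes rather than validating user input.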
def normalise_rectangle(rect):
    """Normalises a rectangle to the origin, longest axis 1.0 units."""
    x0, y0, x1, y1 = rect
    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dx) / dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0
    return (0, 0, upper_x, upper_y)
Preconditions must be true at the start of an operation or function
rect has four values
def normalise_rectangle(rect):
    """Normalises a rectangle to the origin, longest axis 1.0 units."""
    assert len(rect) == 4, "Rectangle must have four co-ordinates"
    x0, y0, x1, y1 = rect
    [...]
LIVE DEMO
Postconditions must be true at the end of an operation or function.
def normalise_rectangle(rect):
    """Normalises a rectangle to the origin, longest axis 1.0 units."""
    [...]
    assert 0 < upper_x <= 1.0, "Calculated upper x-coordinate invalid"
    assert 0 < upper_y <= 1.0, "Calculated upper y-coordinate invalid"
    return (0, 0, upper_x, upper_y)
LIVE DEMO
Assertions help understand programs
Fail early, fail often