Please do use the course Etherpad:
- Communal notes: share your understanding, and benefit from others
- Ask questions: get detailed answers with links and examples
- A record/reference for after the course
26/03/2018
Using the Python language
we won’t be covering the entire language
Text editor
Jupyter notebook
Python afterwards?
Python
Analysing and visualising experimental data
We’re going to get the computer to do this for us
Before we begin…
cd ~/Desktop
mkdir python-novice-inflammation
cd python-novice-inflammation
LIVE DEMO
Before we begin…
cp 2018-03-29-standrews/lessons/python/files/python-novice-inflammation-data.zip ./
unzip python-novice-inflammation-data.zip
cp 2018-03-29-standrews/lessons/python/files/python-novice-inflammation-code.zip ./
unzip python-novice-inflammation-code.zip
(you can download files via Etherpad: http://pad.software-carpentry.org/2018-03-29-standrews)
LIVE DEMO
Python in the terminal
We start the Python console by executing the command python
$ python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 6 2017, 12:04:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
LIVE DEMO
Python REPL
Python’s console is a read-evaluate-print loop (REPL), just like the shell
>>> 3 + 5
8
>>> 12 / 7
1.7142857142857142
>>> 2 ** 16
65536
>>> 15 % 4
3
>>> (2 + 4) * (3 - 7)
-24
LIVE DEMO
Variables are assigned with =
>>> name = "Samia"
>>> name
'Samia'
>>> print(name)
Samia
LIVE DEMO
weight_kg = 55
print(weight_kg)
2.2 * weight_kg
print("weight in pounds", 2.2 * weight_kg)
weight_kg = 57.5
print("weight in kilograms is now:", weight_kg)
weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
weight_kg = 100
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
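The last two print statements show that weight_lb keeps its old value after weight_kg is reassigned. A minimal sketch of the same sequence as a single script; the name weight_lb_updated is introduced here only for illustration and is not part of the lesson:

```python
# Assign a weight in kilograms and derive the weight in pounds from it.
weight_kg = 57.5
weight_lb = 2.2 * weight_kg

# Reassigning weight_kg does not update weight_lb: weight_lb holds the
# number that was computed at the time of assignment, not a formula.
weight_kg = 100
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)

# To bring the value in pounds up to date, run the calculation again.
weight_lb_updated = 2.2 * weight_kg
print('recalculated weight in pounds:', weight_lb_updated)
```

A variable stores a value, not the expression that produced it.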
LIVE DEMO
What are the values in mass and age after the following code is executed?
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
- mass == 47.5, age == 122
- mass == 95.0, age == 102
- mass == 47.5, age == 102
- mass == 95.0, age == 122
What does the following code print out?
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
- Hopper Grace
- Grace Hopper
- "Grace Hopper"
- "Hopper Grace"
$ head data/inflammation-01.csv
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
0,0,1,2,2,4,2,1,6,4,7,6,6,9,9,15,4,16,18,12,12,5,18,9,5,3,10,3,12,7,8,4,7,3,5,4,4,3,2,1
0,0,2,2,4,2,2,5,5,8,6,5,11,9,4,13,5,12,10,6,9,17,15,8,9,3,13,7,8,2,8,8,4,2,3,5,4,1,1,1
0,0,1,2,3,1,2,3,5,3,7,8,8,5,10,9,15,11,18,19,20,8,5,13,15,10,6,10,6,7,4,9,3,5,2,5,3,2,2,1
0,0,0,3,1,5,6,5,5,8,2,4,11,12,10,11,9,10,17,11,6,16,12,6,8,14,6,13,10,11,4,6,4,7,6,3,2,1,0,0
0,1,1,2,1,3,5,3,5,8,6,8,12,5,13,6,13,8,16,8,18,15,16,14,12,7,3,8,9,11,2,5,4,5,1,4,1,2,0,0
In Python, we’ll use the numpy library
We want to produce summary information about inflammation by patient and by day
Python libraries
- Python contains many powerful, general tools
- Libraries are loaded with import
- Libraries are available via PyPI and conda
>>> import numpy
LIVE DEMO
numpy provides a function loadtxt() to load tabular data:
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
- loadtxt() belongs to numpy
- fname: an argument expecting the path to a file
- delimiter: an argument expecting the character that separates columns
>>> numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
array([[ 0., 0., 1., ..., 3., 0., 0.],
[ 0., 1., 2., ..., 1., 0., 1.],
[ 0., 1., 1., ..., 2., 1., 1.],
...,
[ 0., 1., 1., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 0., 2., 0.],
[ 0., 0., 1., ..., 1., 1., 0.]])
- ... indicates rows or columns omitted from the display
- Values are shown as floating point numbers (1 == 1. == 1.0)
- Assign the matrix to a variable called data
LIVE DEMO
>>> type(data)
<class 'numpy.ndarray'>
LIVE DEMO
Attributes of data are accessed as data.<attribute>, e.g. data.shape
>>> print(data.dtype)
float64
>>> print(data.shape)
(60, 40)
LIVE DEMO
>>> print('first value in data:', data[0, 0])
first value in data: 0.0
>>> print('middle value in data:', data[30, 20])
middle value in data: 13.0
LIVE DEMO
- Index with [ ] and specify start and end indices, separated by : (colon)
- 0:4 means start at zero and go up to, but not including, 4: that is 0, 1, 2, 3
>>> print(data[0:4, 0:10])
[[ 0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
 [ 0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
 [ 0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
 [ 0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]
LIVE DEMO
- If the start index is omitted, Python assumes the first element
- If the end index is omitted, Python assumes the end element
>>> small = data[:3, 36:]
>>> print('small is:\n', small)
QUESTION: What would : on its own indicate?
LIVE DEMO
We can take slices of any series, not just arrays.
>>> element = 'oxygen'
>>> print('first three characters:', element[0:3])
first three characters: oxy
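The slicing rules for arrays carry over to strings and lists. A short sketch, reusing the element string from the slide; the readings list holds hypothetical values chosen only for illustration:

```python
element = 'oxygen'

# A slice [start:stop] includes start and excludes stop.
print('first three characters:', element[0:3])

# Omitting stop means "run to the end of the sequence".
print('from index 3 to the end:', element[3:])

# Negative indices count from the end of the sequence.
print('last character:', element[-1])

# The same syntax works on lists.
readings = [0, 0, 1, 3, 1, 2]
print('first two readings:', readings[:2])
```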
What is the value of element[:4]?
- oxyg
- gen
- oxy
- en
arrays know how to perform operations on their values
+, -, *, /, etc. are elementwise
>>> doubledata = data * 2.0
>>> print("original:\n", data[:3, 36:])
original:
[[ 2. 3. 0. 0.]
[ 1. 1. 0. 1.]
[ 2. 2. 1. 1.]]
>>> print("doubledata:\n", doubledata[:3, 36:])
doubledata:
[[ 4. 6. 0. 0.]
[ 2. 2. 0. 2.]
[ 4. 4. 2. 2.]]
LIVE DEMO
numpy functions
numpy provides functions to operate on arrays
>>> print(numpy.mean(data))
6.14875
>>> print(data.mean())
6.14875
>>> maxval = numpy.max(data)
>>> print('maximum inflammation:', maxval)
maximum inflammation: 20.0
>>> minval = data.min()
>>> print('minimum inflammation:', minval)
minimum inflammation: 0.0
LIVE DEMO
Extract a single row, or operate directly on a row
>>> patient_0 = data[0, :] # temporary variable
>>> print('maximum inflammation for patient 0:', patient_0.max())
maximum inflammation for patient 0: 18.0
>>> print('maximum inflammation for patient 0:', numpy.max(data[0, :]))
maximum inflammation for patient 0: 18.0
>>> print('maximum inflammation for patient 2:', numpy.max(data[2, :]))
maximum inflammation for patient 2: 19.0
LIVE DEMO
Tedious, and prone to errors/typos: there is an easier way to do this…
numpy operations on axes
numpy functions take an axis= parameter: axis=0 works down the columns (one result per column), axis=1 works along the rows (one result per row)
>>> print(numpy.max(data, axis=1)) # max by patient
[ 18. 18. 19. 17. 17. 18. 17. 20. 17. 18. 18. 18. 17. 16. 17. 18. 19. 19. 17. 19. 19. 16. 17. 15. 17. 17. 18. 17. 20. 17. 16. 19. 15. 15. 19. 17. 16. 17. 19. 16. 18. 19. 16. 19. 18. 16. 19. 15. 16. 18. 14. 20. 17. 15. 17. 16. 17. 19. 18. 18.]
>>> print(data.mean(axis=0)) # mean by day
[ 0. 0.45 1.11666667 1.75 2.43333333 3.15 3.8 3.88333333 5.23333333 5.51666667 5.95 5.9 8.35 7.73333333 8.36666667 9.5 9.58333333 10.63333333 11.56666667 12.35 13.25 11.96666667 11.03333333 10.16666667 10. 8.66666667 9.15 7.25 7.33333333 6.58333333 6.06666667 5.95 5.11666667 3.6 3.3 3.56666667 2.48333333 1.5 1.13333333 0.56666667]
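To see what the axis argument computes, the same two summaries can be sketched with plain Python lists. This is an illustration only, not how numpy works internally; the small table of values is hypothetical, with patients as rows and days as columns:

```python
# Three patients (rows) by four days (columns); hypothetical values.
data = [
    [0, 1, 3, 2],
    [0, 2, 5, 1],
    [0, 1, 4, 3],
]

# Equivalent of axis=1: collapse each row, giving one value per patient.
max_by_patient = [max(row) for row in data]

# Equivalent of axis=0: collapse each column, giving one value per day.
# zip(*data) regroups the rows into columns.
mean_by_day = [sum(column) / len(column) for column in zip(*data)]

print('max by patient:', max_by_patient)
print('mean by day:', mean_by_day)
```

The result has one entry per row for axis=1 and one entry per column for axis=0, which matches the 60 and 40 values printed above.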
LIVE DEMO
“The purpose of computing is insight, not numbers” - Richard Hamming
The best way to gain insight is often to visualise data
matplotlib
matplotlib is the de facto standard/base plotting library in Python
>>> import matplotlib.pyplot
LIVE DEMO
matplotlib.pyplot.imshow()
matplotlib.pyplot.imshow() renders matrix values as an image
>>> image = matplotlib.pyplot.imshow(data)
>>> matplotlib.pyplot.show()
matplotlib.pyplot.plot()
matplotlib.pyplot.plot() renders a line graph
We want to plot the average inflammation level on each day
>>> ave_inflammation = numpy.mean(data, axis=0)
>>> ave_plot = matplotlib.pyplot.plot(ave_inflammation)
>>> matplotlib.pyplot.show()
QUESTION: does this look reasonable?
The plot of .mean() looks artificial
>>> max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
>>> matplotlib.pyplot.show()
>>> min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
>>> matplotlib.pyplot.show()
QUESTION: does this look reasonable?
Can you create a plot showing the standard deviation (numpy.std()) of the inflammation data for each day across all patients?
We can put all three plots into a single figure
- Create a figure (fig) with fig = matplotlib.pyplot.figure()
- Add subplots to fig with ax = fig.add_subplot()
- Label axes with ax.set_ylabel()
- Plot data with ax.plot()
LIVE DEMO
Can you modify your script to display the three graphs on top of one another, instead of side by side?
Save your new script as exercise_05.py
for loops
word = "lead"
print(word[0])
print(word[1])
print(word[2])
print(word[3])
QUESTION: Why is this not a good approach?
LIVE DEMO
for loops
for loops perform actions for every item in a collection
>>> word = "lead"
>>> for char in word:
...     print(char)
...
l
e
a
d
LIVE DEMO
for loop syntax
for element in collection:
    <do things with element>
- The for loop statement ends in a colon, :
- The body of the for loop is indented, e.g. with a tab (\t)
Values defined outside a loop can be modified in the loop
>>> length = 0
>>> for vowel in 'aeiou':
...     length = length + 1
...
>>> print("There are", length, "vowels")
QUESTION: What output does this program give you?
LIVE DEMO
for loop variables
>>> letter = "z"
>>> print(letter)
z
>>> for letter in "abc":
...     print(letter)
...
>>> print("after the loop, letter is:", letter)
LIVE DEMO
range()
range() is a Python function that creates a sequence of numbers
It returns a range type that can be iterated over in a loop
>>> seq = range(3)
>>> print("Range is:", seq)
>>> for val in seq:
...     print(val)
>>> seq = range(2, 5)
>>> seq = range(3, 10, 3)
>>> seq = range(10, 0, -1)
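Printing a range shows only its description, not its members. A quick way to inspect the four ranges above is to expand each one with list(); this sketch uses only built-in functions:

```python
# list() expands a range so that its members can be inspected.
print(list(range(3)))          # start defaults to 0; stop is excluded
print(list(range(2, 5)))       # explicit start and stop
print(list(range(3, 10, 3)))   # start, stop, step
print(list(range(10, 0, -1)))  # a negative step counts down
```

In every case the stop value itself is excluded, as with slices.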
LIVE DEMO
Can you write a loop that takes a string, e.g. Newton, and produces a new string with the characters in reverse order, e.g. notweN?
HINTS
- Strings can be concatenated with +, e.g. "ab" + "cd"
- An empty string can be created with mystr = ""
lists
lists are a built-in Python datatype
>>> odds = [1, 3, 5, 7]
>>> print("odds are:", odds)
odds are: [1, 3, 5, 7]
>>> print('first and last:', odds[0], odds[-1])
first and last: 1 7
>>> for number in odds:
...     print(number)
LIVE DEMO
- lists, like strings, are sequences
- list elements can be changed: lists are mutable
- strings are not mutable
>>> names = ["Curie", "Darwing", "Turing"] # typo in Darwin's name
>>> print("names is originally:", names)
names is originally: ['Curie', 'Darwing', 'Turing']
>>> names[1] = 'Darwin' # correct the name
>>> print('final value of names:', names)
final value of names: ['Curie', 'Darwin', 'Turing']
>>> name = "darwin"
>>> name[0] = "D"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
LIVE DEMO
There are risks to modifying lists in-place
>>> my_list = [1, 2, 3, 4]
>>> your_list = my_list
>>> print("my list:", my_list)
my list: [1, 2, 3, 4]
>>> my_list[1] = 0
>>> print("your list:", your_list)
QUESTION: What is the value of your_list?
LIVE DEMO
list copies
Copy a list by slicing it or using the list() function
new_list = old_list[:]
>>> my_list = [1, 2, 3, 4] # original list
>>> your_list = my_list[:] # copy 1
>>> your_other_list = list(my_list) # copy 2
>>> print("my_list:", my_list)
my_list: [1, 2, 3, 4]
>>> my_list[1] = 0 # change element
>>> print("my_list:", my_list)
my_list: [1, 0, 3, 4]
>>> print("your_list:", your_list)
your_list: [1, 2, 3, 4]
>>> print("your_other_list:", your_other_list)
your_other_list: [1, 2, 3, 4]
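One way to check whether two names refer to the same list or to separate copies is the is operator, which goes slightly beyond what the slides cover. A minimal sketch; the names alias, copy_by_slice and copy_by_list are chosen here for illustration:

```python
my_list = [1, 2, 3, 4]

alias = my_list               # a second name for the same list object
copy_by_slice = my_list[:]    # a new list with the same elements
copy_by_list = list(my_list)  # another new list

# "is" tests whether two names refer to the same object;
# "==" tests whether the contents are equal.
print('alias is my_list:', alias is my_list)
print('copy_by_slice is my_list:', copy_by_slice is my_list)
print('copy_by_slice == my_list:', copy_by_slice == my_list)

my_list[1] = 0  # modify the original in place

print('alias:', alias)                  # follows the change
print('copy_by_slice:', copy_by_slice)  # unaffected
print('copy_by_list:', copy_by_list)    # unaffected
```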
LIVE DEMO
list functions
lists are Python objects and have useful functions (methods)
>>> print(odds)
[1, 3, 5, 7]
>>> odds.append(9)
>>> print("odds after adding a value:", odds)
odds after adding a value: [1, 3, 5, 7, 9]
>>> odds.reverse()
>>> print("odds after reversing the list:", odds)
odds after reversing the list: [9, 7, 5, 3, 1]
>>> odds.pop()
1
>>> print("odds after popping:", odds)
odds after popping: [9, 7, 5, 3]
LIVE DEMO
Overloading refers to an operator (e.g. +) having more than one meaning, depending on the thing it operates on.
- For numbers, + means add
- For lists, + means concatenate
>>> vowels = ['a', 'e', 'i', 'o', 'u']
>>> vowels_welsh = ['a', 'e', 'i', 'o', 'u', 'w', 'y']
>>> print(vowels + vowels_welsh)
['a', 'e', 'i', 'o', 'u', 'a', 'e', 'i', 'o', 'u', 'w', 'y']
>>> counts = [2, 4, 6, 8, 10]
>>> repeats = counts * 2
>>> print(repeats)
[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
QUESTION: What does ‘multiplication’ (*) do for lists?
LIVE DEMO
Do <something> if some condition is true
The if statement:
if <condition>:
    <executed if condition is True>
>>> num = 37
>>> if num > 100:
...     print('greater')
...
>>> num = 149
>>> if num > 100:
...     print('greater')
...
greater
LIVE DEMO
if-else statements
An if statement executes code if the condition evaluates as true
But what if the condition evaluates as false?
if <condition>:
    <executed if condition is True>
else:
    <executed if condition is not True>
>>> num = 37
>>> if num > 100:
...     print('greater')
... else:
...     print('not greater')
...
not greater
LIVE DEMO
if-elif-else
Conditions can be chained with elif (else if)
if <condition1>:
    <executed if condition1 is True>
elif <condition2>:
    <executed if condition2 is True and condition1 is not True>
else:
    <executed if no conditions True>
>>> num = -3
>>> if num > 0:
...     print(num, "is positive")
... elif num == 0:
...     print(num, "is zero")
... else:
...     print(num, "is negative")
...
-3 is negative
LIVE DEMO
Conditions can be combined using Boolean Logic
The Boolean operators are and, or and not
>>> if (4 > 0) or (2 > 0):
...     print('at least one part is true')
... else:
...     print('both parts are false')
...
at least one part is true
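The slide shows or; a short sketch of all three operators follows. The variable names in_range, extreme and is_odd are chosen here for illustration:

```python
num = 37

# and: true only if both parts are true.
in_range = (num > 0) and (num < 100)

# or: true if at least one part is true.
extreme = (num < 0) or (num > 100)

# not: inverts a condition.
is_odd = not (num % 2 == 0)

if in_range and not extreme:
    print(num, 'is between 0 and 100')
if is_odd:
    print(num, 'is odd')
```

Parentheses are not required here, but they make the grouping explicit.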
LIVE DEMO
What is the result of executing the code below?
>>> if 4 > 5:
...     print('A')
... elif 4 == 5:
...     print('B')
... elif 4 < 5:
...     print('C')
- A
- B
- C
- B and C
Two useful condition operators are == (equality) and in (membership)
>>> print(1 == 1)
True
>>> print(1 == 2)
False
>>> print('a' in 'toast')
True
>>> print('b' in 'toast')
False
>>> print(1 in [1, 2, 3])
True
>>> print(1 in range(3))
True
LIVE DEMO
We will write a new script to do this:
analyse_files.py
$ nano analyse_files.py
BUT we need to know how to interact with the filesystem to get filenames
os module
The os module allows interaction with the filesystem
import matplotlib.pyplot
import numpy as np
import os
LIVE DEMO
os.listdir()
The os.listdir() function lists the contents of a directory
- The list can be filtered with a for loop or list comprehension
- We list the contents of the data directory
# Get a list of inflammation data files
files = []
for fname in os.listdir('data'):
    if 'inflammation' in fname:
        files.append(fname)
print("Inflammation data files:", files)
$ python analyse_files.py
BUT something’s not quite right…
LIVE DEMO
os.path.join()
- The os.listdir() function only returns filenames, not the path (relative or absolute)
- os.path.join() builds a path from directory and filenames, suitable for the underlying OS
files = []
for fname in os.listdir('data'):
    if 'inflammation' in fname:
        files.append(os.path.join('data', fname))
print("Inflammation data files:", files)
$ python analyse_files.py
Inflammation data files: ['data/inflammation-05.csv', …]
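The same listing step can be tried without the course data. The sketch below creates a temporary directory holding a few hypothetical files (the file names and contents are invented for the example) and then filters and joins the names as above. It also sorts the result, because os.listdir() returns names in arbitrary order:

```python
import os
import tempfile

# Build a throwaway directory with a few hypothetical files, so the
# sketch does not depend on the course's data directory being present.
with tempfile.TemporaryDirectory() as data_dir:
    for name in ['inflammation-01.csv', 'inflammation-02.csv', 'small-01.csv']:
        with open(os.path.join(data_dir, name), 'w') as handle:
            handle.write('0,1,2\n')

    # os.listdir() returns bare names; join each one onto its directory.
    files = sorted(
        os.path.join(data_dir, fname)
        for fname in os.listdir(data_dir)
        if 'inflammation' in fname
    )

    for path in files:
        print(path, 'exists:', os.path.exists(path))
```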
LIVE DEMO
Now we have all the tools we need to
- List files with os
- Load data with np.loadtxt()
- Summarise data with np.mean(), np.max(), etc.
- Plot data with matplotlib and .add_subplot()
We’re going to build the rest of this script together
$ nano analyse_files.py
$ python analyse_files.py
Analysing data/inflammation-05.csv
Writing image to data/inflammation-05.png
Analysing data/inflammation-11.csv
Writing image to data/inflammation-11.png
[…]
LIVE DEMO
There are two suspicious features to some of the datasets
We’ll use if statements to test for these conditions and give a warning
Is day zero value 0, and day 20 value 20?
$ nano analyse_files.py
# Test for suspicious maxima
if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
    print("Suspicious-looking maxima!")
$ python analyse_files.py
LIVE DEMO
Are all the minima zero? (do they sum to zero?)
$ nano analyse_files.py
# Test for suspicious maxima
if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
    print("Suspicious-looking maxima!")
elif np.sum(data.min(axis=0)) == 0:
    print('Minima sum to zero!')
$ python analyse_files.py
LIVE DEMO
If everything’s OK, let’s be reassuring
$ nano analyse_files.py
# Test for suspicious maxima
if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
    print("Suspicious-looking maxima!")
elif np.sum(data.min(axis=0)) == 0:
    print('Minima sum to zero!')
else:
    print('Seems OK!')
$ python analyse_files.py
LIVE DEMO
- numpy.arrays, lists, strings, numbers
- Python scripts: edit-save-execute
Python (Part 2)
Analysing experimental data
We’re going to improve our code
Before we begin…
return to our neat working environment
$ cd ~/Desktop
$ cd python-novice-inflammation
Jupyter notebooks
At the command-line, start Jupyter notebook:
jupyter notebook
LIVE DEMO
Jupyter landing page
LIVE DEMO
functions)
LIVE DEMO
Jupyter documents are composed of cells
A Jupyter cell can have one of several types
Change the first cell to Markdown
LIVE DEMO
Markdown allows us to enter formatted text.
Execute a cell with Shift + Enter
LIVE DEMO
Python code can be entered directly into a code cell
Execute a cell with Shift + Enter
LIVE DEMO
BUT the code is long and complicated
SO we will package our code for reuse: FUNCTIONS
Functions in code work like mathematical functions
\[y = f(x)\]
\(y\) is the returned value, or output(s)
The output \(y\) depends in some way on the value of \(x\) - defined by \(f()\).
Not all functions in code take an input, or produce a usable output, but the principle is generally the same.
We’ll write a function fahr_to_kelvin() to convert Fahrenheit to Kelvin
\[f(x) = ((x - 32) \times \frac{5}{9}) + 273.15\]
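The formula translates almost directly into code. A minimal sketch of such a function; the parameter name temp and the docstring are assumptions made here, since the slides leave the definition to the live demo:

```python
def fahr_to_kelvin(temp):
    """Convert a temperature in degrees Fahrenheit to Kelvin."""
    return ((temp - 32) * (5 / 9)) + 273.15


print('freezing point of water:', fahr_to_kelvin(32))
print('boiling point of water:', fahr_to_kelvin(212))
```

The def line names the function and its input, and return hands the output back to the caller, mirroring y = f(x).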
LIVE DEMO
Calling fahr_to_kelvin() in the notebook is the same as calling any other function
print('freezing point of water:', fahr_to_kelvin(32))
print('boiling point of water:', fahr_to_kelvin(212))
LIVE DEMO
Create a new function in your notebook, and call it.
def kelvin_to_celsius(temp):
    return temp - 273.15
print('freezing point of water', kelvin_to_celsius(273.15))
LIVE DEMO
Composing Python functions works the same way as for mathematical functions: \(y = f(g(x))\)
We can convert F (temp_f) to C (temp_c) by executing the code:
temp_c = kelvin_to_celsius(fahr_to_kelvin(temp_f))
LIVE DEMO
We can wrap this composed function inside a new function:
fahr_to_celsius:
def fahr_to_celsius(temp_f):
    return kelvin_to_celsius(fahr_to_kelvin(temp_f))
print('freezing point of water in Celsius:', fahr_to_celsius(32.0))
This is how programs are built:
combining small bits into larger bits until the behaviour we want is obtained
LIVE DEMO
Can you write a function called outer() that:
- takes a string argument
print(outer("helium"))
hm
Variables defined within a function, including parameters, are not ‘visible’ outside the function
a = "Hello"
def my_fn(a):
    a = "Goodbye"
my_fn(a)
print(a)
LIVE DEMO
What would be printed if you ran the code below?
a, b = 3, 7
def swap(a, b):
    temp = a
    a = b
    b = temp
swap(a, b)
print(b, a)
- 7 3
- 3 7
- 3 3
- 7 7
Now we can write functions!
Let’s make the inflammation analysis easier to reuse: one function per operation
analyse_files.py notebook from the first lesson
What operations should be put into functions?
The code is divisible into two sections
detect_problems()
def detect_problems(data):
    if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
        print('Suspicious looking maxima!')
    elif np.sum(data.min(axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')
LIVE DEMO
plot_data()
We’ll write a function called plot_data() that plots the data to file
def plot_data(data, fname):
    # create figure and three axes
    fig = plt.figure(figsize=(10.0, 3.0))
    [...]
LIVE DEMO
Our code is now much more readable
The main loop now calls detect_problems() and plot_data()
# Analyse each file in turn
for fname in files:
    print("Analysing", fname)
    # load data
    data = np.loadtxt(fname=fname, delimiter=',')
    # identify problems in the data
    detect_problems(data)
    # plot image in file
    imgname = fname[:-4] + '.png'
    plot_data(data, imgname)
Why should I bother?
How can I write Python programs that will work like Unix command-line tools?
sys module
sys is a Python module for interacting with the operating system
Open a new file called sys_version.py in your editor
$ nano sys_version.py
import sys
print('version is', sys.version)
$ python sys_version.py
version is 3.6.3 |Anaconda custom (64-bit)| (default, Oct 6 2017, 12:04:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
LIVE DEMO
sys.argv
sys.argv is a variable that contains the command-line arguments used to call our script
Open a new file called sys_argv.py in your editor
$ nano sys_argv.py
import sys
print('sys.argv is', sys.argv)
$ python sys_argv.py
sys.argv is ['sys_argv.py']
$ python sys_argv.py item1 item2 somefile.txt
sys.argv is ['sys_argv.py', 'item1', 'item2', 'somefile.txt']
LIVE DEMO
We’re going to build a script that reports readings from data files
$ python readings.py mydata.csv
--min, --max, --mean
$ python readings.py --min mydata.csv
$ python readings.py --min mydata.csv myotherdata.csv
It should read from STDIN so we can use it with pipes
$ cat mydata.csv | python readings.py --min
We start with a script that doesn’t do all that
$ nano readings.py
import sys
import numpy
def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = numpy.loadtxt(filename, delimiter=',')
    for m in numpy.mean(data, axis=1):
        print(m)
LIVE DEMO
There’s a way to tell if a Python file is being run as a script
- A file can be imported as a module (import readings)
- A file can be run as a script ($ python readings.py)
- Python code has __name__ == '__main__' only when run as a script
We run main() only if the file is run as a script
if __name__ == '__main__':
    main()
Add this to readings.py and run the script
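A minimal sketch of the pattern in isolation. Unlike the main() in readings.py, this version takes the argument list as a parameter, an assumption made here so the function can also be called directly with a hand-written list:

```python
import sys


def main(argv):
    """Print the script name and each remaining command-line argument."""
    script = argv[0]
    arguments = argv[1:]
    print('script:', script)
    for arg in arguments:
        print('argument:', arg)
    return arguments


# __name__ is '__main__' only when this file is executed directly,
# so importing the file as a module does not trigger main().
if __name__ == '__main__':
    main(sys.argv)
```

Saved under a hypothetical name such as show_args.py, running python show_args.py a.csv b.csv would list both filenames, while import show_args would print nothing.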
LIVE DEMO
We want to be able to analyse multiple files with one command
NOTE: wildcards are expanded by the shell before Python sees them
$ ls data/small-*
data/small-01.csv data/small-02.csv data/small-03.csv
$ python sys_argv.py data/small-*
sys.argv is ['sys_argv.py', 'data/small-01.csv', 'data/small-02.csv', 'data/small-03.csv']
Items in sys.argv from index 1 onwards are filenames
def main():
    script = sys.argv[0]
    for filename in sys.argv[1:]:
        print(filename)
        data = numpy.loadtxt(filename, delimiter=',')
        for m in numpy.mean(data, axis=1):
            print(m)
We want to use --min, --max, --mean to tell the script what to calculate
$ python readings.py --max myfile.csv
The flag will be sys.argv[1], so filenames are sys.argv[2:]
def main():
    script = sys.argv[0]
    action = sys.argv[1]
    filenames = sys.argv[2:]
    if action not in ['--min', '--mean', '--max']:
        print('Action is not one of --min, --mean, or --max: ' + action)
        sys.exit(1)
    for f in filenames:
        process(f, action)
process()
We split the script into two functions for readability
The process() function prints the summarised data
def process(filename, action):
    data = numpy.loadtxt(filename, delimiter=',')
    if action == '--min':
        values = numpy.min(data, axis=1)
    elif action == '--mean':
        values = numpy.mean(data, axis=1)
    elif action == '--max':
        values = numpy.max(data, axis=1)
    for m in values:
        print(m)
LIVE DEMO
STDIN
The final change will let us use STDIN if no file is specified
sys.stdin catches STDIN from the operating system
    if len(filenames) == 0:
        process(sys.stdin, action)
    else:
        for f in filenames:
            process(f, action)
$ python readings.py --max < data/small-01.csv
LIVE DEMO
Testing centre()
import numpy as np
def centre(data, desired):
    return (data - np.mean(data)) + desired
LIVE DEMO
centre() on real data
Use numpy to create an artificial dataset
z = np.zeros((2, 2))
print(centre(z, 3.0))
LIVE DEMO
Try the function on real data…
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
print(centre(data, 0))
LIVE DEMO
Compare mean, min, max, std before and after centring
centred = centre(data, 0)
print('original min, mean, and max are:', numpy.min(data), numpy.mean(data), numpy.max(data))
print('min, mean, and max of centered data are:', numpy.min(centred),
      numpy.mean(centred), numpy.max(centred))
print('std dev before and after:', numpy.std(data), numpy.std(centred))
LIVE DEMO
- Writing comments in code (#) is a good thing
- Python provides for docstrings
- Docstrings are shown by Python’s help system
def centre(data, desired):
    """Returns the array in data, recentered around the desired value."""
    return (data - numpy.mean(data)) + desired
help(centre)
LIVE DEMO
So far, our centre() function requires two arguments
A default value makes the second argument optional
def centre(data, desired=0.0):
    """Returns the array in data, recentered around the desired value.
    Example: centre([1, 2, 3], 0) => [-1, 0, 1]
    """
    return (data - np.mean(data)) + desired
centre(data, 0.0)
centre(data, desired=0.0)
centre(data)
LIVE DEMO
Can you write a function called rescale() that
- takes an array as input
- returns an array with values scaled in the range [0.0, 1.0]
- has an informative docstring
HINT: if L and H are the lowest and highest values in the original array, then the replacement for a value v should be (v-L) / (H-L).
Programming n. - the process of making errors and correcting them until the code works
Python tries to tell you what has gone wrong by providing a traceback
def favourite_ice_cream():
    ice_creams = [
        "chocolate",
        "vanilla",
        "strawberry"
    ]
    print(ice_creams[3])
favourite_ice_cream()
LIVE DEMO
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-1-b0e1f9b712d6> in <module>()
8 print(ice_creams[3])
9
---> 10 favourite_ice_cream()
<ipython-input-1-b0e1f9b712d6> in favourite_ice_cream()
6 "strawberry"
7 ]
----> 8 print(ice_creams[3])
9
10 favourite_ice_cream()
IndexError: list index out of range
LIVE DEMO
def some_function()
    msg = "hello, world!"
    print(msg)
     return msg
LIVE DEMO
File "<ipython-input-3-dbf32ad5d3e8>", line 1
def some_function()
^
SyntaxError: invalid syntax
LIVE DEMO
def some_function():
    msg = "hello, world!"
    print(msg)
     return msg
LIVE DEMO
File "<ipython-input-4-e169556d667b>", line 4
return msg
^
IndentationError: unexpected indent
LIVE DEMO
NameErrors occur when a variable is not defined in scope
print(a)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-c5a4f3535135> in <module>()
----> 1 print(a)
NameError: name 'a' is not defined
LIVE DEMO
IndexError
letters = ['a', 'b']
print("Letter #1 is", letters[0])
print("Letter #2 is", letters[1])
print("Letter #3 is", letters[2])
Letter #1 is a
Letter #2 is b
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-9-62bced7460d2> in <module>()
2 print("Letter #1 is", letters[0])
3 print("Letter #2 is", letters[1])
----> 4 print("Letter #3 is", letters[2])
IndexError: list index out of range
LIVE DEMO
abbabbabba?
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) = 0:
        message = message + a
    else:
        message = message + "b"
print(message)
What does this function do?
def s(p):
    a = 0
    for v in p:
        a += v
    m = a / len(p)
    d = 0
    for v in p:
        d += (v - m) * (v - m)
    return numpy.sqrt(d / (len(p) - 1))
What does this function do?
def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value
    sample_mean = sample_sum / len(sample)
    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
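With descriptive names, the function reads as the sample standard deviation (n - 1 in the denominator). One way to gain confidence in that reading is to compare it with the standard library. The sketch below is a condensed rewrite that uses math.sqrt in place of numpy.sqrt so that it runs without numpy; the readings are hypothetical values chosen for illustration:

```python
import math
import statistics


def std_dev(sample):
    """Return the sample standard deviation (n - 1 in the denominator)."""
    sample_mean = sum(sample) / len(sample)
    sum_squared_devs = sum((value - sample_mean) ** 2 for value in sample)
    return math.sqrt(sum_squared_devs / (len(sample) - 1))


# Hypothetical readings, chosen only for illustration.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

ours = std_dev(readings)
reference = statistics.stdev(readings)  # standard-library equivalent
print('std_dev:', ours)
print('statistics.stdev:', reference)
```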
First line of defence: sensible naming, style and documentation
We’ve focused on the basics of building code: variables, loops, functions, etc.
Write code that checks its own operation
Assertions are a Pythonic way to see if code runs correctly
- [...] of the Firefox source code is checks on the rest of the code!
- We assert that a condition is True
- If it is True, the code may be correct
- If it is False, the code is not correct
assert <condition>, "Some text describing the problem"
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for n in numbers:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)
QUESTION: What does this assertion do?
LIVE DEMO
def normalise_rectangle(rect):
    """Normalises a rectangle to the origin, longest axis 1.0 units."""
    x0, y0, x1, y1 = rect
    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dx) / dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0
    return (0, 0, upper_x, upper_y)
Preconditions must be true at the start of an operation or function
Precondition: rect has four values
def normalise_rectangle(rect):
    """Normalises a rectangle to the origin, longest axis 1.0 units."""
    assert len(rect) == 4, "Rectangle must have four co-ordinates"
    x0, y0, x1, y1 = rect
    [...]
LIVE DEMO
Postconditions must be true at the end of an operation or function.
def normalise_rectangle(rect):
    """Normalises a rectangle to the origin, longest axis 1.0 units."""
    [...]
    assert 0 < upper_x <= 1.0, "Calculated upper x-coordinate invalid"
    assert 0 < upper_y <= 1.0, "Calculated upper y-coordinate invalid"
    return (0, 0, upper_x, upper_y)
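Putting the precondition and postconditions into the function from the earlier slide shows what the assertions catch. In this sketch the arithmetic is kept exactly as on the slides, and the three test rectangles are hypothetical inputs chosen for illustration:

```python
def normalise_rectangle(rect):
    """Normalises a rectangle to the origin, longest axis 1.0 units."""
    assert len(rect) == 4, "Rectangle must have four co-ordinates"
    x0, y0, x1, y1 = rect
    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dx) / dy  # kept exactly as on the slide
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0
    assert 0 < upper_x <= 1.0, "Calculated upper x-coordinate invalid"
    assert 0 < upper_y <= 1.0, "Calculated upper y-coordinate invalid"
    return (0, 0, upper_x, upper_y)


# A tall rectangle passes both postconditions.
tall = normalise_rectangle((0.0, 0.0, 1.0, 5.0))
print('tall rectangle:', tall)

# A wide rectangle trips the postcondition on upper_y.
try:
    normalise_rectangle((0.0, 0.0, 5.0, 1.0))
    message = None
except AssertionError as error:
    message = str(error)
print('wide rectangle:', message)

# A rectangle with too few values trips the precondition.
try:
    normalise_rectangle((0.0, 0.0, 1.0))
    short_message = None
except AssertionError as error:
    short_message = str(error)
print('three values:', short_message)
```

For the wide rectangle the computed upper_y is 5.0, which suggests the division in the dx > dy branch should be dy / dx. The postcondition surfaces the problem at its source instead of letting a bad value propagate.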
LIVE DEMO
Assertions help understand programs
Fail early, fail often