Commit ca1c179

add biopython example code; add course table of contents to help planning for teachers
1 parent 20801dd, commit ca1c179

2 files changed

Lines changed: 447 additions & 0 deletions

TOC.md

Lines changed: 371 additions & 0 deletions

An Introduction to Solving Biological Problems with Python
================================================================================

Divided into 4 sessions over two days.

1. DAY 1. MORNING. SESSION 1.: running the Python interpreter, variables and types, arithmetic, basic data structures
2. DAY 1. AFTERNOON. SESSION 2.: logic & flow control, loops, exceptions, importing libraries
3. DAY 2. MORNING. SESSION 3.: custom functions, variable scope, some biological examples
4. DAY 2. AFTERNOON. SESSION 4.: dealing with files, parsing file formats, introduction to BioPython

DAY 1. MORNING. SESSION 1.
--------------------------------------------------------------------------------

### Part 1. [Gabor]

INTRO: the Python programming language & the Python interpreter (command line)
Python is free, cross-platform, widely used, well documented & well supported.
Python is a simple interpreted language, with no separate compilation step.

- Getting started
- Printing values
- Using variables: variables are names for values, created by assignment. No declaration is necessary.
  A variable is just a name; it does not have a type. Values are garbage collected:
  if nothing refers to a value any longer, its memory can be recycled. You must assign a value
  to a variable before using it. Python does not assume default values for variables,
  because doing so can mask many errors.
- Simple data types: values do have types. Use functions such as int(), float() and str() to convert between types.
  - booleans
  - integers
  - floating point numbers
  - complex numbers
  - strings are sequences of characters
  - the None object
- Arithmetic: addition, subtraction, multiplication, division, exponentiation, remainder
- Saving code in files
- Comments

#### EXERCISES

```
* create a variable, print out a message
* addition operator
* calculate the mean of two variables
* [1.1] Print DNA sequence from amino acid one.
```
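A minimal Python 3 sketch of the Part 1 topics (assignment, types and conversion, arithmetic, comments); the variable names and values are illustrative only. One point worth flagging: in Python 3 `/` always returns a float, while in Python 2 (which the documentation links in this outline point to) `7 / 2` gives `3`, so the mean below uses `float()` to behave the same in both.

```python
# Variables are created by assignment; no declaration is needed.
length = 1500          # int
gc_fraction = 0.42     # float
gene = "BRCA1"         # str: a sequence of characters
found = True           # bool
z = 2 + 3j             # complex
nothing = None         # the None object

# Values have types; built-in functions convert between them.
length_text = str(length)    # '1500'
count = int("42")            # 42
ratio = float("0.5")         # 0.5

# Arithmetic operators
total = 7 + 3         # addition: 10
difference = 7 - 3    # subtraction: 4
product = 7 * 3       # multiplication: 21
quotient = 7 / 2      # division: 3.5 in Python 3, 3 in Python 2
power = 2 ** 10       # exponentiation: 1024
remainder = 7 % 3     # remainder: 1

# Mean of two variables; float() makes the division behave the same in Python 2
a = 3
b = 4
mean = (a + b) / float(2)
print("The mean of %d and %d is %.1f" % (a, b, mean))  # The mean of 3 and 4 is 3.5
```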

### Part 2. [Anne]

As well as the basic data types we introduced, Python has several ways of storing
a collection of values. We are going to see four of them: tuples, lists, sets and
dictionaries.

- Collections: compound data types
  - tuples: a tuple is an immutable sequence of Python objects. Tuples are sequences,
    just like lists. The differences are that tuples can't be changed, i.e.
    tuples are immutable, and that tuples use parentheses while lists use square brackets.
  - lists: the most popular, [value, value, value, ...]. A list is mutable: it can be
    changed after it has been created. It is heterogeneous: it can store values of many kinds.
    Appending values to a list lengthens it, deleting values shortens it. Most
    operations on lists are methods. Two that are often used incorrectly are sort() and reverse():
    they modify the list in place and return None.
  - manipulating tuples and lists

Online Python doc: https://docs.python.org/2/ Library | 5.6. Sequence Types | Mutable Sequence Types (5.6.4)

#### EXERCISES

```
* [1.2] Print a DNA sequence from a list of DNA codons
```
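A short sketch of tuples versus lists, including the `sort()` pitfall mentioned above; the codon lists are made-up examples. It also shows `sorted()`, which returns a new list, and `"".join()`, one way to build a DNA string from a list of codons.

```python
# A tuple is immutable: it cannot be changed after creation
stop_codons = ("TAA", "TAG", "TGA")
try:
    stop_codons[0] = "ATG"
except TypeError:
    print("tuples cannot be changed")

# A list is mutable and may hold values of different kinds
codons = ["ATG", "GCC", "TTT"]
codons.append("TAA")        # lengthens the list
del codons[1]               # shortens the list
print(codons)               # ['ATG', 'TTT', 'TAA']

# sort() and reverse() change the list in place and return None,
# so do NOT write: codons = codons.sort()
result = codons.sort()
print(result)               # None
print(codons)               # ['ATG', 'TAA', 'TTT']

# sorted() returns a new list and leaves the original untouched
bases = ["T", "A", "G", "C"]
ordered = sorted(bases)
print(ordered)              # ['A', 'C', 'G', 'T']

# Join a list of codons back into one DNA sequence
dna = "".join(["ATG", "GCC", "TTT", "TAA"])
print(dna)                  # ATGGCCTTTTAA
```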

- String manipulation and formatting: strings are indexed exactly like lists.
  Strings are immutable: they cannot be changed in place. Use + to concatenate strings;
  concatenation always produces a new string. Use the string % operator to format output.
  Use triple quotes for multi-line strings. Strings have methods such as capitalize(),
  upper(), lower(), count(), find() and replace().

Online Python doc: https://docs.python.org/2/ Library | 5.6. Sequence Types | 5.6.1. String Methods

Online Python doc: https://docs.python.org/2/ Library | 5.6. Sequence Types | 5.6.2. String Formatting Operations

#### EXERCISES

```
* [1.3] String manipulation using your name
```
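A sketch of the string operations listed above; the name and DNA string are arbitrary examples, and `title()` is an extra string method not in the list above.

```python
name = "rosalind franklin"

# Indexing and slicing work exactly as for lists
first_letter = name[0]        # 'r'
surname = name[-8:]           # 'franklin'

# Strings are immutable: methods return a NEW string and leave the original alone
shout = name.upper()                  # 'ROSALIND FRANKLIN'
greeting = "Hello, " + name.title()   # + concatenation builds a new string
print(greeting)                       # Hello, Rosalind Franklin

# Common string methods on a DNA string
dna = "ATGCGCGATTA"
print(dna.count("G"))         # 3
print(dna.find("GAT"))        # 6 (index of the first match, -1 if absent)
rna = dna.replace("T", "U")   # 'AUGCGCGAUUA'

# The % operator formats values into a template
print("%s has %d characters" % (name, len(name)))  # rosalind franklin has 17 characters

# Triple quotes allow multi-line strings
message = """line one
line two"""
print(len(message.splitlines()))   # 2
```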

- Sets contain unique, unordered elements. They are very similar to lists, but
  because the elements are not in order they do not have an index.

Online Python doc: https://docs.python.org/2/ Library | 5.7. Set Types

#### EXERCISES

```
* [1.4] Find the unique amino acid codes in a protein sequence
```
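A sketch of sets applied to a protein string; the sequence is arbitrary and the `hydrophobic` set is a simplified, illustrative grouping rather than a standard classification.

```python
protein = "MKTAYIAKQRQISFVKSHFSRQ"   # arbitrary example sequence

# A set keeps one copy of each element, in no particular order
unique_residues = set(protein)
print(len(unique_residues))          # 12

# Sets have no index; sort them when a reproducible order is needed
print(sorted(unique_residues))

# Membership tests and set operations
hydrophobic = set("AVILMFW")              # simplified, illustrative grouping
print("W" in unique_residues)             # False
present = unique_residues & hydrophobic   # intersection
print(sorted(present))                    # ['A', 'F', 'I', 'M', 'V']
missing = hydrophobic - unique_residues   # difference
print(sorted(missing))                    # ['L', 'W']
```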
97+
98+
- Dictionaries contain a mapping of keys to values
99+
100+
Online Python doc: https://docs.python.org/2/ Library | 5.8. Mapping Types
101+
102+
```
103+
Dictionary can be very useful when combined with string formatting e.g.
104+
format_string = "Dear %(name)s, we have sequenced %(num)d libraries. The yield is %(yield)dM reads."
105+
print format_string % {'name': 'Anne', 'num':3, 'yield': 182}
106+
```
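A sketch of a dictionary used as a lookup table from codons to amino acids. The table is deliberately incomplete (the standard genetic code has 64 codons), the `???` placeholder for unknown codons is an arbitrary choice, and the `for` loop is only introduced in Session 2.

```python
# A small, deliberately incomplete codon table
codon_table = {
    "ATG": "Met",
    "TTT": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "TAA": "Stop",
}

dna = "ATGTTTGGCAAATAA"

# Split the sequence into consecutive codons and look each one up
amino_acids = []
for i in range(0, len(dna) - 2, 3):
    codon = dna[i:i + 3]
    # get() returns a default instead of raising KeyError for unknown codons
    amino_acids.append(codon_table.get(codon, "???"))
print("-".join(amino_acids))    # Met-Phe-Gly-Lys-Stop

# Other dictionary basics
codon_table["TGG"] = "Trp"      # add a new key/value pair
print("TGG" in codon_table)     # True: 'in' tests the keys
print(len(codon_table))         # 6
```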

#### EXERCISES

```
* [1.5] Use a dictionary to map between codon sequences and the amino acids they
encode, and print out the names of the amino acids of a DNA sequence
```

```
>>> TAKE HOME MESSAGE
>>> Variables are labels that refer to data.
>>> Many variables may refer to the same piece of data.
>>> Use strings to store text.
>>> Use lists to store many related values in order.
>>> Use sets to store unique related values, without any order.
>>> Use dictionaries to store key/value pairs.
```

DAY 1. AFTERNOON. SESSION 2.
--------------------------------------------------------------------------------

### Part 1. [Gabor]

INTRO: program control and logic - code blocks: if/loops/exceptions.
The real power of programs comes from repetition and selection. Why indentation?
Because Python uses indentation to mark code blocks, and it makes the code you write clearer and easier to read.
The Python style guide (PEP 8) recommends 4 spaces per indentation level.
Loops let us do things many times. Collections let us store many values together.

- code blocks
- conditional execution
  - the if statement: use if/elif/else to make choices
  - comparisons and truth

#### EXERCISES

```
[2.0] Compare your age with other people's and print whether you are younger/older/the same age
[2.?] Check if a DNA sequence contains a stop codon
```
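A sketch of if/elif/else and comparisons using the age and stop-codon ideas above; the ages and sequence are made up, and the stop-codon test is a plain substring check that ignores the reading frame.

```python
my_age = 30
other_age = 42

# if/elif/else: exactly one branch runs
if my_age < other_age:
    comparison = "younger"
elif my_age > other_age:
    comparison = "older"
else:
    comparison = "the same age"
print("I am %s" % comparison)        # I am younger

# Comparisons and 'in' produce booleans that combine with or/and/not
dna = "ATGGCCTAGGCC"
has_stop = "TAA" in dna or "TAG" in dna or "TGA" in dna   # ignores reading frame
print(has_stop)                      # True

# Truth: empty strings/containers, 0 and None count as false
truth_values = [bool(""), bool("ATG"), bool(0), bool([]), bool(None)]
print(truth_values)                  # [False, True, False, False, False]
```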

- loops
  - the for loop: a for loop is used to access each value in turn
  - the while loop: a while loop repeats as long as a condition holds, e.g. to step through all possible indices
  - skipping and breaking loops
  - looping gotchas

#### EXERCISES

```
[2.1] Loop over a list of bases using for and while loops
```
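A sketch of `for` and `while` loops over a list of bases, plus `continue` and `break`; the sequences are arbitrary, and treating `N` as an unknown base to skip is an assumption made for the example.

```python
bases = ["A", "C", "G", "T"]

# for loop: visit each value in turn
for base in bases:
    print(base)

# while loop: repeat while a condition holds, here stepping through the indices
i = 0
while i < len(bases):
    print("%d %s" % (i, bases[i]))
    i += 1      # forgetting this line is a classic infinite-loop gotcha

# continue skips the rest of the current iteration
dna = "ACGTNNACGT"
clean = []
for base in dna:
    if base == "N":
        continue            # skip unknown bases
    clean.append(base)
print("".join(clean))       # ACGTACGT

# break leaves the loop immediately
position = -1
for index in range(len(dna)):
    if dna[index] == "N":
        position = index
        break               # stop at the first 'N'
print(position)             # 4
```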
160+
161+
- more looping
162+
- using enumerate
163+
- using zip
164+
- filtering in loops
165+
166+
#### EXERCISES
167+
168+
```
169+
[2.2] Calculate the GC content of a DNA sequence
170+
```
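A sketch of `enumerate`, `zip` and filtering inside a loop, ending with a simple GC-content calculation; the sequences are made up, and `float()` keeps the division correct under Python 2 as well.

```python
dna = "ATGCGCGATTA"

# enumerate gives (index, value) pairs
for index, base in enumerate(dna[:3]):
    print("%d %s" % (index, base))

# zip walks two sequences in step, e.g. to count mismatches
seq1 = "GATTACA"
seq2 = "GACTATA"
mismatches = 0
for a, b in zip(seq1, seq2):
    if a != b:
        mismatches += 1
print(mismatches)           # 2

# Filtering inside a loop: count only G and C
gc_count = 0
for base in dna:
    if base in "GC":
        gc_count += 1
gc_content = gc_count / float(len(dna))
print("GC content: %.2f" % gc_content)   # GC content: 0.45
```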
171+
172+
### Part 2 (after break) [Anne]
173+
174+
Python provides two very important features to handle any unexpected error in your
175+
Python programs and to add debugging capabilities in them: exceptions and assertions.
176+
177+
- exceptions: An exception is an event, which occurs during the execution of a program,
178+
that disrupts the normal flow of the program's instructions. In general, when a Python
179+
script encounters a situation that it can't cope with, it raises an exception.
180+
An exception is a Python object that represents an error.
181+
182+
#### EXERCISES
183+
184+
```
185+
[2.3] Raise an exception if the DNA sequence is not valid
186+
```
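A sketch of raising and catching an exception for invalid DNA, plus an `assert`. The `check_dna` helper is hypothetical (functions are covered in Session 3), and `ValueError` is one reasonable choice of built-in exception, not the only one.

```python
VALID_BASES = "ACGT"

def check_dna(seq):
    """Return True if seq is valid DNA, otherwise raise ValueError (hypothetical helper)."""
    for base in seq.upper():
        if base not in VALID_BASES:
            raise ValueError("invalid base %r in DNA sequence" % base)
    return True

# try/except catches the exception instead of letting the program stop
try:
    check_dna("ATGXCC")
    message = "sequence is valid"
except ValueError as error:
    message = str(error)
print(message)          # invalid base 'X' in DNA sequence

# assert stops the program with AssertionError if the condition is false
seq = "ATGCCC"
assert len(seq) % 3 == 0, "sequence length is not a multiple of 3"
```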
187+
188+
- importing modules and libraries
189+
- help(math)
190+
- import sys
191+
- print sys.version & sys.platform
192+
- print sys.path which defines the list of directories Python searches in to find modules.
193+
sys.argv: The most commonly-used element of sys is probably sys.argv, which holds the command-line arguments of the currently-executing program.
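A sketch of importing modules and inspecting `sys`; the printed version, platform and path entries depend on the machine running it.

```python
import math
import sys
from math import pi   # import a single name directly

# Names inside a module are reached with the module name as a prefix
root = math.sqrt(16)
print(root)                 # 4.0

# Information about the running interpreter
print(sys.version)          # full version string
print(sys.platform)         # e.g. 'linux', 'darwin' or 'win32'

# sys.path: the directories searched, in order, when importing a module
for directory in sys.path:
    print(directory)

# sys.argv: the command-line arguments; sys.argv[0] is the script name
print(sys.argv)

print(round(pi, 2))         # 3.14
```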

```
>>> TAKE HOME MESSAGE
>>> Use while to repeat something until something changes.
>>> Use for to do something once for each part of a larger whole.
>>> Use if and else to make choices.
```

DAY 2. MORNING. SESSION 3.
--------------------------------------------------------------------------------

### Part 1. [Anne]

INTRO: function basics and definition
A programming language should not include everything anyone might ever want.
Instead, it should make it easy for people to create what they need
to solve specific problems, by defining functions that provide higher-level operations.
In Python this is done using the keyword 'def'.

- function definition syntax

#### EXERCISES

```
[3.1a] Create a function that calculates the mean of two numbers, and then one for a list of numbers
[3.1b] Create a function to calculate the molecular weight of a DNA sequence
```
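A minimal sketch of function definitions for the mean exercise; `mean_of_two` and `mean` are hypothetical names, and the built-in `sum()` could replace the explicit loop.

```python
def mean_of_two(a, b):
    """Return the arithmetic mean of two numbers."""
    return (a + b) / 2.0

def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for value in values:     # sum(values) would do the same
        total += value
    return total / float(len(values))

print(mean_of_two(3, 4))     # 3.5
print(mean([1, 2, 3, 4]))    # 2.5
```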
221+
222+
- function arguments
223+
224+
#### EXERCISES
225+
226+
```
227+
[3.2] Extend the previous function to also calculate the weight of a RNA sequence
228+
```
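A sketch of positional arguments, keyword arguments and default values. `base_fraction` and its `bases` and `as_percent` parameters are hypothetical, chosen so that one function works for DNA (`"GC"`) and RNA (`"U"`), in the spirit of extending a function to handle RNA.

```python
def base_fraction(seq, bases="GC", as_percent=False):
    """Return the fraction of a non-empty seq made of the given bases.

    'bases' and 'as_percent' are optional: callers may omit them and get the defaults.
    """
    seq = seq.upper()
    count = 0
    for base in seq:
        if base in bases:
            count += 1
    fraction = count / float(len(seq))
    if as_percent:
        return 100 * fraction
    return fraction

print(base_fraction("ATGC"))                                # 0.5 (defaults used)
print(base_fraction("ATGC", "AT"))                          # 0.5 (positional)
print(base_fraction("AUGCUU", bases="U", as_percent=True))  # 50.0 (keywords)
```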
229+
230+
- return value
231+
232+
#### EXERCISES
233+
234+
```
235+
[3.3] Write a function that counts the number of each base found in a DNA sequence
236+
```
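A sketch of return values: returning a dictionary of base counts, returning several values as a tuple, and the implicit `None` when a function has no `return`. All function names are hypothetical.

```python
def count_bases(seq):
    """Return a dictionary mapping each base to the number of times it occurs."""
    counts = {}
    for base in seq.upper():
        counts[base] = counts.get(base, 0) + 1
    return counts

def min_max(values):
    """Return two values at once; Python packs them into a tuple."""
    return min(values), max(values)

def greet(name):
    print("Hello, %s" % name)     # no return statement

counts = count_bases("GATTACA")
print(sorted(counts.items()))     # [('A', 3), ('C', 1), ('G', 1), ('T', 2)]

low, high = min_max([4, 1, 9])    # unpack the returned tuple
print("%d %d" % (low, high))      # 1 9

result = greet("Anne")
print(result)                     # None: a function without return gives back None
```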
237+
238+
### Part 2. [Gabor]
239+
240+
- variable scope: globals vs within blocks
241+
- advanced topics: anonymous functions (lambda); functions as values; nested functions
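A sketch of the Part 2 topics: local versus global names, `if`/`for` blocks not creating a new scope, functions stored as values, `lambda`, and a nested function that remembers a value from its enclosing function (a closure). All names are illustrative.

```python
counter = 0                # global variable

def increment_local():
    counter = 100          # assignment creates a NEW local name; the global is untouched
    return counter

def increment_global():
    global counter         # explicitly refer to the module-level variable
    counter += 1

increment_local()
increment_global()
print(counter)             # 1

# Blocks such as for/if do NOT create a new scope
for i in range(3):
    last = i
print(last)                # 2: still visible after the loop

# Functions are values: they can be stored in containers and called later
def gc_count(seq):
    return seq.count("G") + seq.count("C")

scorers = {"gc": gc_count, "length": len}
print(scorers["gc"]("GGCAT"))                 # 3

# lambda creates a small anonymous function, here used as a sort key
seqs = ["ATGCG", "AT", "GGG"]
print(sorted(seqs, key=lambda s: len(s)))     # ['AT', 'GGG', 'ATGCG']

# A nested function that remembers 'base' from the enclosing call
def make_counter(base):
    def count(seq):
        return seq.count(base)
    return count

count_a = make_counter("A")
print(count_a("GATTACA"))                     # 3
```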

#### EXERCISES

```
BIO examples
- program a ribosome that translates RNA into protein
  - extra points for also taking DNA (T -> U)
  - extra points for all reading frames.

- calculate the GC content of DNA, not on the whole sequence but with a sliding window.

- calculate hydrophobicity with a sliding window.
```
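A sketch of the sliding-window GC idea from the BIO examples. `sliding_gc` and its defaults `window=4` and `step=1` are hypothetical choices (real analyses usually use much larger windows), and windows that would run past the end of the sequence are skipped.

```python
def gc_fraction(seq):
    """Fraction of G and C in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / float(len(seq))

def sliding_gc(seq, window=4, step=1):
    """Return (start, gc_fraction) for each full window along seq."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        results.append((start, gc_fraction(seq[start:start + window])))
    return results

for start, gc in sliding_gc("ATATGCGCAT", window=4, step=2):
    print("%d %.2f" % (start, gc))
```

The same loop could compute hydrophobicity by replacing `gc_fraction` with an average over a per-residue hydrophobicity scale; the scale values themselves are not given here.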

```
>>> TAKE HOME MESSAGE
>>> Define functions to break programs down into manageable pieces.
>>> Remember that a function is really just another kind of data.
```

DAY 2. AFTERNOON. SESSION 4.
--------------------------------------------------------------------------------

### Part 1. [Anne]

INTRO: In this session we cover 2 widely used ways of reading data into our
programs: via the command line and by reading files from disk.

- reading command line arguments

#### EXERCISES

```
[4.1a] Write a script that takes 2 integers from the command line using sys.argv
(from the sys module), adds the two numbers and prints out the result
[4.1b] Write a script that takes a DNA sequence from the command line and prints out
its length and GC content
```
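A sketch for adding two integers from the command line. The function name `add_from_argv` and the script name `add.py` are hypothetical; passing the argument list explicitly makes it testable without a real command line. Remember that `sys.argv` entries are always strings.

```python
import sys

def add_from_argv(argv):
    """Add the integers in argv[1] and argv[2]; argv[0] is the script name."""
    if len(argv) != 3:
        sys.exit("usage: %s INT INT" % argv[0])   # prints the message and exits
    # Command-line arguments always arrive as strings, so convert them first
    total = int(argv[1]) + int(argv[2])
    print(total)
    return total

# Simulate running `python add.py 2 3`
add_from_argv(["add.py", "2", "3"])    # prints 5

# In the real script, pass the actual command line instead:
#     if __name__ == "__main__":
#         add_from_argv(sys.argv)
```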
280+
281+
- the argparse library
282+
283+
#### EXERCISES
284+
285+
```
286+
[4.1c] Use the argparse library to do the same exercise as above
287+
```
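A sketch of the same addition task with `argparse`, which converts types, checks the number of arguments and generates `--help` automatically. The `--verbose` flag is a hypothetical extra added to show an optional argument.

```python
import argparse

def build_parser():
    """Describe the expected command-line arguments."""
    parser = argparse.ArgumentParser(description="Add two integers.")
    parser.add_argument("first", type=int, help="first integer")
    parser.add_argument("second", type=int, help="second integer")
    parser.add_argument("--verbose", action="store_true",
                        help="show the whole calculation")
    return parser

# parse_args() reads sys.argv[1:] by default; an explicit list is passed here
args = build_parser().parse_args(["2", "3", "--verbose"])
total = args.first + args.second
if args.verbose:
    print("%d + %d = %d" % (args.first, args.second, total))   # 2 + 3 = 5
else:
    print(total)
```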
288+
289+
### Part 2. [Gabor]
290+
291+
- file objects
292+
- mode modifiers
293+
- error checking
294+
- closing files
295+
- reading from files
296+
- the with statement
297+
- writing to files
298+
299+
#### EXERCISES
300+
301+
```
302+
[4.2a] Write a script that writes a list of number to a file, with each number
303+
on a separate line
304+
[4.2b] Open a file and for each line print out the line number and its length
305+
```
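A sketch covering file objects: writing with mode `"w"`, reading line by line with `enumerate`, and catching the error raised for a missing file. The scratch file in a temporary directory only keeps the example self-contained; other mode modifiers include `"a"` (append) and `"b"` (binary).

```python
import os
import tempfile

numbers = [3, 14, 15, 92]
folder = tempfile.mkdtemp()                   # scratch directory for the example
path = os.path.join(folder, "numbers.txt")

# Writing: mode "w" creates the file, or empties it if it already exists
with open(path, "w") as handle:
    for number in numbers:
        handle.write("%d\n" % number)         # write() does not add a newline
# Leaving the with block closes the file, even if an error occurred inside it

# Reading: a file object can be looped over line by line
lengths = []
with open(path) as handle:                    # default mode is "r" (read text)
    for line_number, line in enumerate(handle, 1):
        line = line.rstrip("\n")
        lengths.append(len(line))
        print("%d %d" % (line_number, len(line)))

# Error checking: opening a file that does not exist raises an exception
error_seen = False
try:
    with open(os.path.join(folder, "missing.txt")) as handle:
        pass
except IOError:                               # IOError is an alias of OSError in Python 3
    error_seen = True
    print("could not open missing.txt")
```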
306+
307+
- data formats
308+
- delimited files
309+
- reading delimited files
310+
- writing delimited files
311+
- more advanced examples
312+
- read csv file
313+
- write csv file
314+
315+
#### EXERCISES
316+
317+
```
318+
[4.3a] Read a tab separated file
319+
[4.3b] Write a csv file
320+
```
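A sketch of reading a tab-separated table and writing it back out as CSV with the standard `csv` module. It assumes Python 3; `io.StringIO` stands in for real files so the example is self-contained (with a real file you would use `open(path, newline="")`), and the gene names and values are made up.

```python
import csv
import io

# A small tab-separated table (made-up values, for illustration)
tsv_text = "gene\tlength\tgc\nabcA\t1200\t0.52\nxyzB\t800\t0.47\n"

# Reading: csv.reader splits each line on the delimiter
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
header = next(reader)
rows = []
for row in reader:
    rows.append(row)
print(header)        # ['gene', 'length', 'gc']
print(rows[0])       # ['abcA', '1200', '0.52']: values are still strings

# Writing: csv.writer adds delimiters and quoting for you
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(output.getvalue())
```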
321+
322+
- fixed format files (PDB)
323+
- XML files
324+
- python file libraries: os & os.path
325+
- more advanced examples
326+
- recursive file search
327+
- recursive delete
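A sketch of a recursive file search with `os.walk` and `os.path`. `find_files` is a hypothetical helper, and the scratch directory tree is created only so the example runs anywhere.

```python
import os
import tempfile

def find_files(top, extension):
    """Recursively collect paths under 'top' whose names end with 'extension'."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(extension):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)

# Build a small scratch directory tree to search
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "data", "fasta"))
for relative in ["a.py", "notes.txt", os.path.join("data", "fasta", "b.py")]:
    with open(os.path.join(top, relative), "w") as handle:
        handle.write("# placeholder\n")

for path in find_files(top, ".py"):
    print(os.path.relpath(path, top))
```

For the recursive delete topic, `shutil.rmtree(top)` removes a whole directory tree without asking for confirmation, so it should be used with care.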

- system calls

#### EXERCISES

```
[4.4] Write a script that executes the command 'ls' to get the list of files,
then modify your script to only print Python files
```
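A sketch of a system call with `subprocess.check_output`, then filtering the output for Python files. `run_command` is a hypothetical helper. On Linux or macOS you would call `run_command(["ls"])`; here the command runs the current Python interpreter so the example works on any platform, and the file names it prints are made up.

```python
import subprocess
import sys

def run_command(command):
    """Run a command given as a list of strings and return its output lines."""
    output = subprocess.check_output(command)   # raises CalledProcessError on failure
    return output.decode().splitlines()

# Stand-in for run_command(["ls"]): a tiny Python program that prints file names
lines = run_command([sys.executable, "-c",
                     "print('a.py'); print('b.txt'); print('c.py')"])

python_files = []
for name in lines:
    if name.endswith(".py"):
        python_files.append(name)
print(python_files)     # ['a.py', 'c.py']
```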
337+
338+
### Part 3. [Anne]
339+
340+
- using BioPython
341+
342+
Biopython is to make it as easy as possible to use Python for bioinformatics by
343+
creating high-quality, reusable modules and classes. Biopython features include
344+
parsers for various Bioinformatics file formats (BLAST, Clustalw, FASTA, Genbank,...),
345+
access to online services (NCBI, Expasy,...), interfaces to common and not-so-common programs
346+
(Clustalw, DSSP, MSMS...), a standard sequence class, various clustering modules,
347+
a KD tree data structure etc. and even documentation.
348+
349+
Basically, we just like to program in Python and want to make it as easy as possible
350+
to use Python for bioinformatics by creating high-quality, reusable modules and scripts.
351+
352+
Biopython tutorial http://biopython.org | Tutorial | 1.2 What can I find in the Biopython package
353+
354+
#### BioPython EXAMPLES
355+
356+
- more advanced examples
357+
- writing FASTA files
358+
- reading FASTA files
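A plain-Python sketch of writing and reading FASTA, to show what a parser has to do; `write_fasta` and `read_fasta` are hypothetical helpers, and unlike Biopython they keep the whole header line as the identifier. With Biopython installed, `SeqIO.parse(handle, "fasta")` and `SeqIO.write(records, handle, "fasta")` do this job and work with `SeqRecord` objects instead of tuples. `io.StringIO` (Python 3) stands in for a real file.

```python
import io

def write_fasta(records, handle, width=60):
    """Write (identifier, sequence) pairs in FASTA format, wrapping at 'width'."""
    for identifier, sequence in records:
        handle.write(">%s\n" % identifier)
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")

def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA file handle."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if identifier is not None:    # finish the previous record
                yield identifier, "".join(chunks)
            identifier, chunks = line[1:], []
        else:
            chunks.append(line)
    if identifier is not None:            # the last record in the file
        yield identifier, "".join(chunks)

buffer = io.StringIO()
write_fasta([("seq1", "ATGCGT" * 3), ("seq2", "GGCC")], buffer, width=10)
buffer.seek(0)                            # rewind before reading back
records = list(read_fasta(buffer))
print(records)
```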

```
>>> TAKE HOME MESSAGE
>>> Happy Python programming!
```

IDEAS: if you need help: http://stackoverflow.com/

IDEAS: Pylint is a tool that checks for errors in Python code, tries to enforce a coding standard and looks for bad code smells: http://www.pylint.org/

IDEAS: Any code that hasn't been tested is probably wrong: Python unit testing framework unittest

IDEAS: from http://software-carpentry.org/v4/python
