| 1 | +An Introduction to Solving Biological Problems with Python |
| 2 | +================================================================================ |
| 3 | + |
| 4 | +Divided into 4 sessions over two days. |
| 5 | + |
| 6 | +1. DAY 1. MORNING. SESSION 1.: running the Python interpreter, variables and types, arithmetic, basic data structures |
| 7 | +2. DAY 1. AFTERNOON. SESSION 2.: logic & flow control, loops, exceptions, importing libraries |
| 8 | +3. DAY 2. MORNING. SESSION 3.: custom functions, variable scope, some biological examples |
| 9 | +4. DAY 2. AFTERNOON. SESSION 4.: dealing with files, parsing file formats, introduction to BioPython |
| 10 | + |
| 11 | +DAY 1. MORNING. SESSION 1. |
| 12 | +-------------------------------------------------------------------------------- |
| 13 | + |
| 14 | +### Part 1. [Gabor] |
| 15 | + |
| 16 | +INTRO: the Python programming language & Python interpreter (command line) |
| 17 | +Python is free, cross-platform, widely used, well documented & well supported. |
| 18 | +Python is a simple interpreted language, with no separate compilation step. |
| 19 | + |
| 20 | +- Getting started |
| 21 | +- Printing values |
| 22 | +- Using variables: they are names for values and are created by assignment; no declaration is necessary. |
| 23 | +A variable is just a name and does not have a type. Values are garbage collected: |
| 24 | +once nothing refers to a piece of data, it can be recycled. A value must be assigned to a variable |
| 25 | +before it is used. Python does not assume default values for variables, |
| 26 | +because doing so can mask many errors. |
| 27 | +- Simple data types: Values do have types. Use functions to convert between types. |
| 28 | + - booleans |
| 29 | + - integers |
| 30 | + - floating point numbers |
| 31 | + - complex numbers |
| 32 | + - strings are sequences of characters |
| 33 | + - the None object |
| 34 | +- Arithmetic: addition, subtraction, multiplication, division, exponentiation, remainder |
| 35 | +- Saving code in files |
| 36 | + - Comments |
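The points above about names, types, conversion and arithmetic can be sketched as follows. This is a minimal illustration only: the variable names and values are made up, and the code is written so that it runs under both Python 2 and Python 3.

```python
# A name is created by assignment; the value, not the name, carries the type.
count = 3
print(type(count))            # an integer type
count = "three"               # the same name can later refer to a string
print(type(count))            # a string type

# Convert between types explicitly with functions such as int() and float().
length = int("42") + 1        # string -> integer, then addition
ratio = float(7) / 2          # division of a float
remainder = 7 % 2             # remainder
power = 2 ** 10               # exponentiation
print("%d %.1f %d %d" % (length, ratio, remainder, power))

# Using a name before assigning to it is an error, not a silent default.
try:
    print(undefined_name)
except NameError as error:
    print("NameError: %s" % error)
```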
| 37 | + |
| 38 | +#### EXERCISES |
| 39 | + |
| 40 | +``` |
| 41 | +* create a variable, print out a message |
| 42 | +* addition operator |
| 43 | +* calculate the mean of two variables |
| 44 | +* [1.1] Print DNA sequence from amino acid one. |
| 45 | +``` |
| 46 | + |
| 47 | +### Part 2. [Anne] |
| 48 | + |
| 49 | +As well as the basic data types we introduced, Python has several ways of storing |
| 50 | +a collection of values. We are going to see four of them: tuples, lists, sets and |
| 51 | +dictionaries. |
| 52 | + |
| 53 | +- Collections: complex data types |
| 54 | + - tuples: A tuple is an immutable sequence of Python objects. Tuples are sequences, |
| 55 | + just like lists. The differences are that tuples cannot be changed once created |
| 56 | + (they are immutable), and that tuples are written with parentheses while lists use square brackets. |
| 57 | + - lists: the most commonly used collection, written [value, value, value, ...]. A list is mutable: it can be |
| 58 | + changed after it has been created. It is heterogeneous: it can store values of many kinds. |
| 59 | + Appending values to a list lengthens it, deleting values shortens it. Most |
| 60 | + operations on lists are methods. Two that are often used incorrectly are sort() and reverse(): both change the list in place and return None. |
| 61 | + - manipulating tuples and lists |
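A short sketch of the tuple and list behaviour described above, including the in-place nature of sort(). The example data is invented for illustration.

```python
codon = ("A", "T", "G")            # tuple: fixed once created
bases = ["T", "A", "G", "C"]       # list: can grow, shrink and change

bases.append("A")                  # lengthens the list
del bases[0]                       # shortens it

# sort() changes the list in place and returns None, so do not
# write `bases = bases.sort()`.
result = bases.sort()
print(result)
print(bases)

# sorted() leaves the original untouched and returns a new list.
ordered = sorted(codon)
print(ordered)

# Trying to change a tuple raises a TypeError.
try:
    codon[0] = "C"
except TypeError as error:
    print("TypeError: %s" % error)
```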
| 62 | + |
| 63 | +Online Python doc: https://docs.python.org/2/ Library | 5.6. Sequence Types | Mutable Sequence Types (5.6.4) |
| 64 | + |
| 65 | +#### EXERCISES |
| 66 | + |
| 67 | +``` |
| 68 | +* [1.2] Print DNA sequence from a list of DNA codons |
| 69 | +``` |
| 70 | + |
| 71 | +- String manipulations and format: strings are indexed exactly like lists. |
| 72 | +Strings are immutable: they cannot be changed in place. Use + to concatenate strings. |
| 73 | +Concatenation always produces a new string. Use the % operator to format output. |
| 74 | +Use triple quotes for multi-line strings. Strings have methods such as capitalize(), |
| 75 | +upper(), lower(), count(), find() and replace(). |
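The string operations listed above might look like this in practice. The sequence is a made-up example; every method call returns a new string and leaves the original unchanged.

```python
dna = "atgcgt"

first = dna[0]                     # indexing works as for lists
middle = dna[1:4]                  # so does slicing
upper = dna.upper()                # a new string; dna itself is unchanged
longer = dna + "taa"               # concatenation also makes a new string

g_count = dna.count("g")           # number of occurrences
position = dna.find("cg")          # index of the first match, or -1
rna = upper.replace("T", "U")

# The % operator fills placeholders from a tuple of values.
message = "%s has %d bases" % (upper, len(upper))
print(message)

# Triple quotes allow a string to span several lines.
multi_line = """>example
ATGCGT"""
print(multi_line)
```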
| 76 | + |
| 77 | +Online Python doc: https://docs.python.org/2/ Library | 5.6. Sequence Types | 5.6.1. String Methods |
| 78 | + |
| 79 | +Online Python doc: https://docs.python.org/2/ Library | 5.6. Sequence Types | 5.6.2. String Formatting Operations |
| 80 | + |
| 81 | +#### EXERCISES |
| 82 | + |
| 83 | +``` |
| 84 | +* [1.3] String manipulation using your name |
| 85 | +``` |
| 86 | + |
| 87 | +- Sets contain unique unordered elements. They are similar to lists but, |
| 88 | +because the elements are not kept in order, they cannot be indexed. |
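A small sketch of how sets discard duplicates and support membership tests. The list of reads is invented for illustration.

```python
reads = ["ACGT", "TTGA", "ACGT", "CCGA"]

unique_reads = set(reads)          # duplicates are dropped
print(len(unique_reads))

# Membership tests are the typical use; there is no unique_reads[0].
has_ttga = "TTGA" in unique_reads

# Set arithmetic: elements in one set but not in the other.
missing = set("ACGT") - set("AC")

# Sets have no order, so sort them when a stable order is needed.
in_order = sorted(unique_reads)
print(in_order)
```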
| 89 | + |
| 90 | +Online Python doc: https://docs.python.org/2/ Library | 5.7. Set Types |
| 91 | + |
| 92 | +#### EXERCISES |
| 93 | + |
| 94 | +``` |
| 95 | +* [1.4] Find the unique amino acid codes in a protein sequence |
| 96 | +``` |
| 97 | + |
| 98 | +- Dictionaries contain a mapping of keys to values |
| 99 | + |
| 100 | +Online Python doc: https://docs.python.org/2/ Library | 5.8. Mapping Types |
| 101 | + |
| 102 | +``` |
| 103 | +# A dictionary can be very useful when combined with string formatting, e.g. |
| 104 | +format_string = "Dear %(name)s, we have sequenced %(num)d libraries. The yield is %(yield)dM reads." |
| 105 | +print(format_string % {'name': 'Anne', 'num': 3, 'yield': 182}) |
| 106 | +``` |
| 107 | + |
| 108 | +#### EXERCISES |
| 109 | + |
| 110 | +``` |
| 111 | +* [1.5] Use a dictionary to map between codon sequences and amino acids they |
| 112 | +encode to print out the name of the amino acids of a DNA sequence |
| 113 | +``` |
| 114 | + |
| 115 | + |
| 116 | +``` |
| 117 | +>>> TAKE HOME MESSAGE |
| 118 | +>>> Variables are labels that refer to data. |
| 119 | +>>> Many variables may refer to the same piece of data. |
| 120 | +>>> Use strings to store text. |
| 121 | +>>> Use lists to store many related values in order. |
| 122 | +>>> Use sets to store unique values when order does not matter. |
| 123 | +>>> Use dictionaries to store key/value pairs. |
| 124 | +``` |
| 125 | + |
| 126 | +DAY 1. AFTERNOON. SESSION 2. |
| 127 | +-------------------------------------------------------------------------------- |
| 128 | + |
| 129 | +### Part 1. [Gabor] |
| 130 | + |
| 131 | +INTRO: program control and logic - code blocks: if/loops/exceptions. |
| 132 | +Real power of programs comes from repetition and selection. Why indentation? |
| 133 | +Because it makes the code you write clearer and easier to read. |
| 134 | +Python style guide (PEP 8) recommends 4 spaces. |
| 135 | +Loops let us do things many times. Collections let us store many values together. |
| 136 | + |
| 137 | +- code blocks |
| 138 | +- conditional execution |
| 139 | + - the if statement: use if/elif/else to make choices |
| 140 | + - comparisons and truth |
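As a minimal sketch of conditional execution and truth testing, using invented values:

```python
temperature = 37.0

# Exactly one branch runs; the indented lines form the code block.
if temperature > 42:
    state = "too hot"
elif temperature < 30:
    state = "too cold"
else:
    state = "ok"
print(state)

# Comparisons give booleans and can be combined with and, or, not.
in_range = temperature >= 30 and temperature <= 42

# Empty strings and collections, zero and None all count as false.
sequence = ""
if not sequence:
    note = "no sequence given"
else:
    note = "sequence given"
print(note)
```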
| 141 | + |
| 142 | +#### EXERCISES |
| 143 | + |
| 144 | +``` |
| 145 | +[2.0] Compare your age with another person's and print whether you are younger, older or the same age |
| 146 | +[2.?] Check if a DNA sequence contains a stop codon |
| 147 | +``` |
| 148 | + |
| 149 | +- loops |
| 150 | + - the for loop: a for loop is used to access each value in turn |
| 151 | + - the while loop: a while loop repeats as long as a condition is true, e.g. to step through all possible indices |
| 152 | + - skipping and breaking loops |
| 153 | + - looping gotchas |
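The loop forms listed above can be sketched like this. The data is made up, and the forgotten-increment comment is just one example of a looping gotcha.

```python
bases = ["A", "C", "G", "T"]

# for: visit each value in turn.
for base in bases:
    print(base)

# while: repeat as long as the condition holds.
index = 0
while index < len(bases):
    print("%d %s" % (index, bases[index]))
    index += 1                     # forgetting this line loops forever

# continue skips to the next value; break leaves the loop altogether.
kept = []
for base in "ACGNTA":
    if base == "N":
        continue
    if base == "T":
        break
    kept.append(base)
print(kept)
```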
| 154 | + |
| 155 | +#### EXERCISES |
| 156 | + |
| 157 | +``` |
| 158 | +[2.1] Loop over a list of bases using for and while loops |
| 159 | +``` |
| 160 | + |
| 161 | +- more looping |
| 162 | + - using enumerate |
| 163 | + - using zip |
| 164 | + - filtering in loops |
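A brief sketch of enumerate, zip and filtering inside a loop, using invented lists of amino acid names and one-letter codes:

```python
names = ["alanine", "glycine", "tryptophan"]
codes = ["A", "G", "W"]

# enumerate yields (position, value) pairs, starting from 0.
labelled = []
for position, name in enumerate(names):
    labelled.append("%d %s" % (position, name))

# zip walks through several sequences in step.
pairs = []
for code, name in zip(codes, names):
    pairs.append("%s=%s" % (code, name))

# Filtering: an if inside the loop keeps only the values of interest.
long_names = []
for name in names:
    if len(name) > 7:
        long_names.append(name)

print(labelled)
print(pairs)
print(long_names)
```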
| 165 | + |
| 166 | +#### EXERCISES |
| 167 | + |
| 168 | +``` |
| 169 | +[2.2] Calculate the GC content of a DNA sequence |
| 170 | +``` |
| 171 | + |
| 172 | +### Part 2 (after break) [Anne] |
| 173 | + |
| 174 | +Python provides two very important features to handle any unexpected error in your |
| 175 | +Python programs and to add debugging capabilities in them: exceptions and assertions. |
| 176 | + |
| 177 | +- exceptions: An exception is an event, which occurs during the execution of a program, |
| 178 | +that disrupts the normal flow of the program's instructions. In general, when a Python |
| 179 | +script encounters a situation that it can't cope with, it raises an exception. |
| 180 | +An exception is a Python object that represents an error. |
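A minimal sketch of catching and raising exceptions, and of an assertion, using invented values and messages:

```python
# Catching: int() raises ValueError for text that is not a number.
text = "12a"
try:
    number = int(text)
except ValueError as error:
    number = None
    print("Could not convert: %s" % error)

# Raising: signal a problem yourself with the raise statement.
try:
    raise ValueError("length cannot be negative")
except ValueError as error:
    message = str(error)
print(message)

# Asserting: state something that must be true at this point;
# an AssertionError is raised if it is not.
length = 5
assert length >= 0, "length cannot be negative"
```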
| 181 | + |
| 182 | +#### EXERCISES |
| 183 | + |
| 184 | +``` |
| 185 | +[2.3] Raise an exception if the DNA sequence is not valid |
| 186 | +``` |
| 187 | + |
| 188 | +- importing modules and libraries |
| 189 | + - help(math) |
| 190 | + - import sys |
| 191 | + - print sys.version and sys.platform |
| 192 | + - print sys.path, which defines the list of directories Python searches to find modules |
| 193 | + - sys.argv: probably the most commonly used element of sys; it holds the command-line arguments of the currently executing program |
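The module items above could be tried out as follows. This is a sketch only; what it prints depends on the machine and on how the script is started.

```python
import math
import sys

# Names inside a module are reached as module.name.
root = math.sqrt(16)
print(root)
print(math.pi)
# In the interactive interpreter, help(math) shows the documentation.

print(sys.version)                 # version of the running interpreter
print(sys.platform)                # e.g. linux, darwin or win32

# sys.path is a list of the directories searched for modules.
for directory in sys.path[:3]:
    print(directory)

# sys.argv is a list of strings: the script name, then its arguments.
arguments = sys.argv[1:]
print(arguments)
```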
| 194 | + |
| 195 | +``` |
| 196 | +>>> TAKE HOME MESSAGE |
| 197 | +>>> Use while to repeat something until something changes. |
| 198 | +>>> Use for to do something once for each part of a larger whole. |
| 199 | +>>> Use if and else to make choices. |
| 200 | +``` |
| 201 | + |
| 202 | +DAY 2. MORNING. SESSION 3. |
| 203 | +-------------------------------------------------------------------------------- |
| 204 | + |
| 205 | +### Part 1. [Anne] |
| 206 | + |
| 207 | +INTRO: function basics and definition |
| 208 | +A programming language should not include everything anyone might ever want. |
| 209 | +Instead, it should make it easy for people to create what they need |
| 210 | +to solve specific problems by defining functions to create higher-level operations. |
| 211 | +In Python this is done using the keyword 'def'. |
| 212 | + |
| 213 | +- function definition syntax |
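As an illustration of the definition syntax, here is a small function with one argument, a docstring and a return value. The function name and the choice of example are assumptions made for this sketch; it expects upper-case DNA containing only A, C, G and T.

```python
def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    complement = []
    for base in reversed(sequence):
        complement.append(pairs[base])
    return "".join(complement)


# Calling the function: the argument goes in, the return value comes out.
print(reverse_complement("ATGC"))
```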
| 214 | + |
| 215 | +#### EXERCISES |
| 216 | + |
| 217 | +``` |
| 218 | +[3.1a] Create a function that calculates the mean of two numbers, and then one that calculates the mean of a list of numbers |
| 219 | +[3.1b] Create a function to calculate the molecular weight of a DNA sequence |
| 220 | +``` |
| 221 | + |
| 222 | +- function arguments |
| 223 | + |
| 224 | +#### EXERCISES |
| 225 | + |
| 226 | +``` |
| 227 | +[3.2] Extend the previous function to also calculate the weight of an RNA sequence |
| 228 | +``` |
| 229 | + |
| 230 | +- return value |
| 231 | + |
| 232 | +#### EXERCISES |
| 233 | + |
| 234 | +``` |
| 235 | +[3.3] Write a function that counts the number of each base found in a DNA sequence |
| 236 | +``` |
| 237 | + |
| 238 | +### Part 2. [Gabor] |
| 239 | + |
| 240 | +- variable scope: global names vs names local to a function |
| 241 | +- advanced topics: anonymous functions (lambda); functions as values; nested functions |
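The scope rules and the advanced function topics above can be sketched as follows. All names here are invented for illustration.

```python
scale = 10                         # a global name


def rescale(value):
    scaled = value * scale         # scale is read from the global scope
    return scaled                  # scaled exists only inside the function


print(rescale(3))
try:
    print(scaled)                  # not visible outside the function
except NameError as error:
    print("NameError: %s" % error)

# lambda: a small anonymous function, here used as a sort key.
by_length = sorted(["ATGCC", "AT", "GGC"], key=lambda sequence: len(sequence))

# Functions are values: they can be assigned to other names and passed around.
operation = rescale
twenty = operation(2)


# Nested functions: the inner function remembers the outer argument.
def make_multiplier(factor):
    def multiply(value):
        return value * factor
    return multiply


triple = make_multiplier(3)
print(triple(7))
```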
| 242 | + |
| 243 | +#### EXERCISES |
| 244 | + |
| 245 | +``` |
| 246 | +BIO examples |
| 247 | +- program ribosome that translates RNA into protein |
| 248 | + - extra points for also taking DNA (T -> U) |
| 249 | + - extra points for all reading frames. |
| 250 | + |
| 251 | +- calculate GC content of DNA not on whole sequence but with sliding window. |
| 252 | + |
| 253 | +- calculate hydrophobicity with sliding window. |
| 254 | +``` |
| 255 | + |
| 256 | +``` |
| 257 | +>>> TAKE HOME MESSAGE |
| 258 | +>>> Define functions to break programs down into manageable pieces. |
| 259 | +>>> Remember that a function is really just another kind of data. |
| 260 | +``` |
| 261 | + |
| 262 | +DAY 2. AFTERNOON. SESSION 4. |
| 263 | +-------------------------------------------------------------------------------- |
| 264 | + |
| 265 | +### Part 1. [Anne] |
| 266 | + |
| 267 | +INTRO: In this session we cover 2 widely used ways of reading data into our |
| 268 | +programs, via the command line and by reading files from disk. |
| 269 | + |
| 270 | +- reading command line arguments |
| 271 | + |
| 272 | +#### EXERCISES |
| 273 | + |
| 274 | +``` |
| 275 | +[4.1a] Write a script that takes 2 integers from the command line using the sys.argv |
| 276 | +list, adds the two numbers and prints out the result |
| 277 | +[4.1b] Write a script that takes a DNA sequence from the command line and prints out |
| 278 | +its length and GC content |
| 279 | +``` |
| 280 | + |
| 281 | + - the argparse library |
| 282 | + |
| 283 | +#### EXERCISES |
| 284 | + |
| 285 | +``` |
| 286 | +[4.1c] Use the argparse library to do the same exercise as above |
| 287 | +``` |
| 288 | + |
| 289 | +### Part 2. [Gabor] |
| 290 | + |
| 291 | +- file objects |
| 292 | + - mode modifiers |
| 293 | + - error checking |
| 294 | +- closing files |
| 295 | +- reading from files |
| 296 | + - the with statement |
| 297 | +- writing to files |
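A minimal sketch of the file topics above: mode modifiers, the with statement, reading line by line and basic error checking. The file name is made up and the file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sequences.txt")

# Mode "w" creates or overwrites a file. The with statement closes the
# file automatically, even if an error occurs inside the block.
with open(path, "w") as handle:
    handle.write("ATGC\n")
    handle.write("GGCC\n")

# Mode "a" appends to the end of an existing file.
with open(path, "a") as handle:
    handle.write("TTAA\n")

# The default mode "r" reads; looping over a file gives one line at a time.
sequences = []
with open(path) as handle:
    for line in handle:
        sequences.append(line.strip())
print(sequences)

# Error checking: opening a missing file for reading raises an exception.
try:
    open(path + ".missing")
except IOError as error:
    print("Could not open file: %s" % error)
```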
| 298 | + |
| 299 | +#### EXERCISES |
| 300 | + |
| 301 | +``` |
| 302 | +[4.2a] Write a script that writes a list of numbers to a file, with each number |
| 303 | +on a separate line |
| 304 | +[4.2b] Open a file and for each line print out the line number and its length |
| 305 | +``` |
| 306 | + |
| 307 | +- data formats |
| 308 | +- delimited files |
| 309 | + - reading delimited files |
| 310 | + - writing delimited files |
| 311 | +- more advanced examples |
| 312 | + - read csv file |
| 313 | + - write csv file |
| 314 | + |
| 315 | +#### EXERCISES |
| 316 | + |
| 317 | +``` |
| 318 | +[4.3a] Read a tab separated file |
| 319 | +[4.3b] Write a csv file |
| 320 | +``` |
| 321 | + |
| 322 | +- fixed format files (PDB) |
| 323 | +- XML files |
| 324 | +- python file libraries: os & os.path |
| 325 | +- more advanced examples |
| 326 | + - recursive file search |
| 327 | + - recursive delete |
| 328 | + |
| 329 | +- system calls |
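The recursive file search and recursive delete mentioned in the list above could be sketched with os.walk and shutil.rmtree. The directory layout is invented and is built in a temporary directory so that nothing real is deleted.

```python
import os
import shutil
import tempfile

# Build a small made-up directory tree to search.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "scripts", "old"))
examples = [
    "readme.txt",
    os.path.join("scripts", "run.py"),
    os.path.join("scripts", "old", "parse.py"),
]
for relative in examples:
    with open(os.path.join(root, relative), "w") as handle:
        handle.write("")

# Recursive search: os.walk visits every directory below root.
found = []
for directory, subdirectories, filenames in os.walk(root):
    for filename in filenames:
        if filename.endswith(".py"):
            full_path = os.path.join(directory, filename)
            found.append(os.path.relpath(full_path, root))
found.sort()
print(found)

# Recursive delete: removes root and everything below it, so use with care.
shutil.rmtree(root)
```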
| 330 | + |
| 331 | +#### EXERCISES |
| 332 | + |
| 333 | +``` |
| 334 | +[4.4] Write a script that executes the command 'ls' to get the list of files |
| 335 | +then modify your script to only print python files |
| 336 | +``` |
| 337 | + |
| 338 | +### Part 3. [Anne] |
| 339 | + |
| 340 | +- using BioPython |
| 341 | + |
| 342 | +The goal of Biopython is to make it as easy as possible to use Python for bioinformatics by |
| 343 | +creating high-quality, reusable modules and classes. Biopython features include |
| 344 | +parsers for various Bioinformatics file formats (BLAST, Clustalw, FASTA, Genbank,...), |
| 345 | +access to online services (NCBI, Expasy,...), interfaces to common and not-so-common programs |
| 346 | +(Clustalw, DSSP, MSMS...), a standard sequence class, various clustering modules, |
| 347 | +a KD tree data structure etc. and even documentation. |
| 348 | + |
| 349 | +Basically, we just like to program in Python and want to make it as easy as possible |
| 350 | +to use Python for bioinformatics by creating high-quality, reusable modules and scripts. |
| 351 | + |
| 352 | +Biopython tutorial http://biopython.org | Tutorial | 1.2 What can I find in the Biopython package |
| 353 | + |
| 354 | +#### BioPython EXAMPLES |
| 355 | + |
| 356 | +- more advanced examples |
| 357 | + - writing FASTA files |
| 358 | + - reading FASTA files |
| 359 | + |
| 360 | +``` |
| 361 | +>>> TAKE HOME MESSAGE |
| 362 | +>>> Happy Python programming! |
| 363 | +``` |
| 364 | + |
| 365 | +IDEAS: if you need help: http://stackoverflow.com/ |
| 366 | + |
| 367 | +IDEAS: Pylint is a tool that checks for errors in python code, tries to enforce a coding standard and looks for bad code smells: http://www.pylint.org/ |
| 368 | + |
| 369 | +IDEAS: Any code that hasn't been tested is probably wrong: Python unit testing framework unittest |
| 370 | + |
| 371 | +IDEAS: from http://software-carpentry.org/v4/python |