{ "metadata": { }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
\n\n# Python - Loops\n\nby [The Carpentries](https://training.galaxyproject.org/hall-of-fame/carpentries/), [Helena Rasche](https://training.galaxyproject.org/hall-of-fame/hexylena/), [Donny Vrins](https://training.galaxyproject.org/hall-of-fame/dirowa/), [Bazante Sanders](https://training.galaxyproject.org/hall-of-fame/bazante1/)\n\nCC-BY licensed content from the [Galaxy Training Network](https://training.galaxyproject.org/)\n\n**Questions**\n\n- How can I make a program do many things?\n\n**Objectives**\n\n- Explain what for loops are normally used for.\n- Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.\n- Write for loops that use the Accumulator pattern to aggregate values.\n\n**Time Estimation: 40M**\n
\n", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-0", "source": "

A for loop tells Python to execute some statements once for each value in a list, a character string, or some other collection: “for each thing in this group, do these operations”.

\n
\n
Comment
\n

This tutorial is significantly based on the Carpentries lessons Programming with Python and Plotting and Programming in Python, which are licensed CC-BY 4.0.

\n

Adaptations have been made to make this work better in a GTN/Galaxy environment.

\n
\n
\n
Agenda
\n

In this tutorial, we will cover:

\n
    \n
  1. For Loops
      \n
    1. Structure
    \n
    2. A for loop is made up of a collection, a loop variable, and a body.
    \n
    \n
  \n
\n
\n

For Loops

\n

Which of these would you rather write?

\n
\n
\n
Input: Manually
\n
print(2)\nprint(3)\nprint(5)\nprint(7)\nprint(11)\n
\n
\n
\n
Output: With Loops
\n
for number in [2, 3, 5, 7, 11]:\n    print(number)\n
\n
\n
\n

The benefit may be less clear here, since each number needs only one operation (print), but what if you had to do two operations, or three, or more?

\n

Structure

\n

A for loop is made up of a collection, a loop variable, and a body.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-1", "source": [ "for number in [2, 3, 5]:\n", " doubled = number * 2\n", " print(f\"{number} doubled is {doubled}\")" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-2", "source": "\n
\n
\n
Input: The loop
\n
for number in [2, 3, 5]:\n    doubled = number * 2\n    print(f\"{number} doubled is {doubled}\")\n
\n
\n
\n
Output: What's really happening internally
\n
# First iteration, number = 2\ndoubled = number * 2\nprint(f\"{number} doubled is {doubled}\")\n# Second iteration, number = 3\ndoubled = number * 2\nprint(f\"{number} doubled is {doubled}\")\n# Third iteration, number = 5\ndoubled = number * 2\nprint(f\"{number} doubled is {doubled}\")\n
\n
\n
\n

Writing loops saves us time and helps keep our code accurate: we write the loop body once, so there is no chance of accidentally introducing a typo in one of many repeated copies.

\n

Things You Can Loop Over

\n

You can loop over characters in a string:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-3", "source": [ "dna_string = 'ACTGGTCATCG'\n", "for base in dna_string:\n", " print(base)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-4", "source": "

You can loop over lists:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-5", "source": [ "cast = ['Elphaba', 'Glinda', 'Fiyero', 'Nessarose']\n", "for character in cast:\n", " print(character)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-6", "source": "
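
Strings and lists are not the only options. As a small sketch (the variable names and values below are made up for illustration and are not part of the tutorial's data), the same for syntax also walks over a tuple or over the keys of a dictionary:

```python
# Hypothetical sample data, invented for this illustration
codon = ('A', 'T', 'G')
base_counts = {'A': 3, 'C': 2, 'G': 3, 'T': 3}

seen = []
for base in codon:          # tuples are visited in order, like lists
    seen.append(base)

total_bases = 0
for base in base_counts:    # looping over a dict visits its keys
    total_bases = total_bases + base_counts[base]

print(seen)
print(total_bases)
```
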

Indentation

\n

The first line of the for loop must end with a colon, and the body must be indented. Four spaces per level is the usual convention, and whatever you choose must be used consistently within a block. Many editors do this automatically for you and even convert Tabs into 4 spaces.

\n
\n
\n

The colon at the end of the first line signals the start of a block of statements.

\n
for x in y:\n    print(x)\n
\n

or

\n
if x > 10:\n    print(x)\n
\n

or even further nesting is possible:

\n
for x in y:\n    if x > 10:\n        print(x)\n
\n
\n
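
To see how indentation decides which statements belong to which block, here is a minimal sketch. The names inside and outside are hypothetical counters chosen for this example: one statement is nested under both the for and the if, and the other sits at the top level.

```python
inside = 0
outside = 0
for x in [5, 12, 20]:
    if x > 10:
        inside = inside + 1   # indented twice: runs only when x > 10
outside = outside + 1         # not indented: runs once, after the loop ends

print(inside, outside)
```

Moving the last assignment one level to the right would place it in the loop body, and it would then run once per item instead of once overall.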

The indentation is, in fact, required. Notice how this fails:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-7", "source": [ "#Fix me!\n", "for number in [2, 3, 5]:\n", "print(number)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-8", "source": "

And, likewise, this:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-9", "source": [ "patient1 = \"z2910\"\n", " patient2 = \"y9583\"" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-10", "source": "

Variable Naming

\n

Loop variables can be called anything; i, j, and k are very common defaults due to their long history of use in other programming languages.\nAs with all variables, loop variables are created on demand, and their names are meaningless to Python; they can be anything at all.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-11", "source": [ "for kitten in [2, 3, 5]:\n", " print(kitten)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-12", "source": "

But meaningless is bad for variable names, and whenever possible we should strive to pick useful, accurate variable names that help us remember what’s going on:

\n
for sequence in sequences:\n    print()\nfor patient in clinic_patients:\n    print()\nfor nucleotide in dna_sequence:\n    print()\n
\n

Range

\n

You can use range to iterate over a sequence of numbers. This is a built-in function (check help(range)!), so it’s always available even if you don’t import anything. The range produced excludes its end point: range(N) is the numbers 0 to N-1, so the result will be exactly the length you requested.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-13", "source": [ "for number in range(10):\n", " print(number)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-14", "source": "
\n
\n

In Python, range is a special type of iterable: none of the numbers are created until we need them.

\n
print(range(5))\nprint(range(-3, 8)[0:4])\n
\n

The easiest way to see what numbers are actually in there is to convert it to a list:

\n
print(list(range(5)))\nprint(list(range(-3, 8)))\nprint(list(range(0, 10, 2)))\n
\n
\n
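
As a short sketch of how the start, stop, and step arguments behave (the variable names are only for illustration), each range below is converted to a list so the numbers can be inspected:

```python
first_five = list(range(5))          # 0 up to, but not including, 5
window = list(range(-3, 8))          # starts at -3, stops before 8
evens = list(range(0, 10, 2))        # a step of 2 skips every other number
countdown = list(range(5, 0, -1))    # a negative step counts down

print(len(first_five), len(window))
print(evens)
print(countdown)
```

In every case the stop value itself is left out, which is why range(5) has exactly five numbers.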

Accumulation

\n

In programming you’ll often want to accumulate values: counting things, summing them, or building up a result piece by piece. The accumulator pattern consists of creating a variable to store your result, running a loop over some data, and, inside that loop, updating the result variable.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-15", "source": [ "# Sum the first 10 integers.\n", "total = 0\n", "for number in range(1, 11):\n", " total = total + (number)\n", "print(f\"Final total: {total}\")" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-16", "source": "

But how did we get that result? We can add some “debugging” lines to the above code to figure out how we got to that result. Try adding the following line in the above loop

\n
print(f'Currently {number}, our total is {total}')\n
\n

You can add it before you update total, after it, or both! Compare the outputs to understand what’s happening on each line.
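
One possible version with a debugging line on both sides of the update is sketched below. The Before and After labels are an addition for this example, so the two lines are easy to tell apart in the output:

```python
# Sum the first 10 integers, printing the state around each update.
total = 0
for number in range(1, 11):
    print(f'Before: number={number}, total={total}')
    total = total + number
    print(f'After:  number={number}, total={total}')
print(f'Final total: {total}')
```

Each After line shows the same total as the Before line of the following iteration, which makes it visible that total carries over from one pass through the loop to the next.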

\n
\n
\n

There are several ways to control your loop when you need to.\nPython provides two built-in statements for this: continue and break.

\n

When Python encounters continue in your loop, it skips the rest of the loop body and goes on to the next iteration.

\n
for letter in 'Galaxy':\n    if letter == 'l':\n        continue\n    print(f'The letters are: {letter}')\n
\n

With break, Python stops the loop entirely and continues with the code after it.

\n
for letter in 'Galaxy':\n    if letter == 'l':\n        break\n    print(f'The letters are: {letter}')\nprint('Done')\n
\n
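
To compare the two statements side by side, this sketch collects the letters each loop actually processes into a list. The list names kept and before_l are hypothetical and chosen only for this example:

```python
kept = []
for letter in 'Galaxy':
    if letter == 'l':
        continue          # skip the rest of the body for this letter only
    kept.append(letter)

before_l = []
for letter in 'Galaxy':
    if letter == 'l':
        break             # leave the loop entirely
    before_l.append(letter)

print(kept)
print(before_l)
```

With continue only the l is missing, while with break everything from the l onwards is missing.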
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "# Test break and continue here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-18", "source": "

Exercises

\n
\n
Question: Tracing Execution
\n

Create a table showing the numbers of the lines that are executed when this program runs,\nand the values of the variables after each line is executed.

\n
total = 0\nfor char in \"tin\":\n    total = total + 1\n
\n
👁 View solution\n
\n
| Line | Variables |
|------|-----------|
| 1 | total = 0 |
| 2 | total = 0, char = ‘t’ |
| 3 | total = 1, char = ‘t’ |
| 2 | total = 1, char = ‘i’ |
| 3 | total = 2, char = ‘i’ |
| 2 | total = 2, char = ‘n’ |
| 3 | total = 3, char = ‘n’ |
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-19", "source": [ "#Test your code here!" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-20", "source": "
\n
Question: Reversing a String
\n

Fill in the blanks in the program below so that it prints “desserts”\n(the reverse of the original character string “stressed”).

\n
original = \"stressed\"\nresult = ____\nfor char in original:\n    result = ____\nprint(result)\n
\n
👁 View solution\n
\n
original = \"stressed\"\nresult = \"\"\nfor char in original:\n    result = char + result\nprint(result)\n
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-21", "source": [ "# Test your code here!\n", "original = \"stressed\"\n", "result = ____\n", "for char in original:\n", " result = ____\n", "print(result)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-22", "source": "
\n
Question: Practice Accumulating
\n

Fill in the blanks in each of the programs below\nto produce the indicated result.

\n
# Total length of the strings in the list: [\"red\", \"green\", \"blue\"] => 12\ntotal = 0\nfor word in [\"red\", \"green\", \"blue\"]:\n    ____ = ____ + len(word)\nprint(total)\n
\n
👁 View solution\n
\n
total = 0\nfor word in [\"red\", \"green\", \"blue\"]:\n    total = total + len(word)\nprint(total)\n
\n
\n
# List of word lengths: [\"red\", \"green\", \"blue\"] => [3, 5, 4]\nlengths = ____\nfor word in [\"red\", \"green\", \"blue\"]:\n    lengths.____(____)\nprint(lengths)\n
\n
👁 View solution\n
\n
lengths = []\nfor word in [\"red\", \"green\", \"blue\"]:\n    lengths.append(len(word))\nprint(lengths)\n
\n
\n
# Concatenate all words: [\"red\", \"green\", \"blue\"] => \"redgreenblue\"\nwords = [\"red\", \"green\", \"blue\"]\nresult = ____\nfor ____ in ____:\n    ____\nprint(result)\n
\n
👁 View solution\n
\n
words = [\"red\", \"green\", \"blue\"]\nresult = \"\"\nfor word in words:\n    result = result + word\nprint(result)\n
\n\n

Create an acronym: Starting from the list [\"red\", \"green\", \"blue\"], create the acronym \"RGB\" using\na for loop.

\n

Hint: You may need to use a string method to properly format the acronym.

\n
👁 View solution\n
\n
acronym = \"\"\nfor word in [\"red\", \"green\", \"blue\"]:\n    acronym = acronym + word[0].upper()\nprint(acronym)\n
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-23", "source": [ "#Test your code here!" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-24", "source": "
\n

Cumulative Sum

\n

Reorder and properly indent the lines of code below\nso that they print a list with the cumulative sum of data.\nThe result should be [1, 3, 5, 10].

\n
cumulative.append(total)\nfor number in data:\ncumulative = []\ntotal += number\ntotal = 0\nprint(cumulative)\ndata = [1,2,2,5]\n
\n
👁 View solution\n
\n
total = 0\ndata = [1,2,2,5]\ncumulative = []\nfor number in data:\n    total += number\n    cumulative.append(total)\nprint(cumulative)\n
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-25", "source": [ "# Test your code here!" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-26", "source": "
\n
Question: A classic programmer test: Fizz Buzz
\n

FizzBuzz is a classic “test” question that is used in some job interviews to remove candidates who really do not understand programming. Your task is this:

\n

Write a for loop that loops over the numbers 1 to 50.

\n
    \n
  • If the number is divisible by 3, write Fizz instead of the number.
  \n
  • If the number is divisible by 5, write Buzz instead of the number.
  \n
  • If the number is divisible by both 3 and 5, write FizzBuzz instead of the number.
  \n
  • Otherwise, write the number itself.
  \n
\n
👁 View solution\n
\n
for i in range(1, 51):\n    if i % 3 == 0 and i % 5 == 0:\n        print(\"FizzBuzz\")\n    elif i % 3 == 0:\n        print(\"Fizz\")\n    elif i % 5 == 0:\n        print(\"Buzz\")\n    else:\n        print(i)\n
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-27", "source": [ "# Do a FizzBuzz" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-28", "source": "
\n
Question: Identifying Item Errors
\n
    \n
  1. Read the code below and try to identify what the errors are\nwithout running it.
  \n
  2. Run the code, and read the error message. What type of error is it?
  \n
  3. Fix the error.
  \n
\n
seasons = ['Spring', 'Summer', 'Fall', 'Winter']\nprint(f'My favorite season is {seasons[4]}')\n
\n
👁 View solution\n
\n

This list has 4 elements, so the index of the last element is 3. Asking for seasons[4] raises an IndexError (list index out of range).

\n
seasons = ['Spring', 'Summer', 'Fall', 'Winter']\nprint(f'My favorite season is {seasons[3]}')\n
\n
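
If you would rather not count the elements by hand, the sketch below shows two common ways to reach the last item, and catches the error from the original code so the message can be read. The name last_index is made up for this example:

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

last_index = len(seasons) - 1      # valid indexes run from 0 to len - 1
print(seasons[last_index])
print(seasons[-1])                 # negative indexes count from the end

try:
    seasons[4]                     # one past the end of the list
except IndexError as error:
    print(f'IndexError: {error}')
```
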
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-29", "source": [ "# Fix me!\n", "seasons = ['Spring', 'Summer', 'Fall', 'Winter']\n", "print(f'My favorite season is {seasons[4]}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-30", "source": "
\n
Question: Correct the errors
\n

This code is completely missing its indentation and needs to be fixed. Can you make some guesses at how indented each line should be?

\n
data = [1, 3, 5, 9]\nacc = 0\nfor i in data:\nif i < 4:\nacc = acc + i * 2\nelse:\nacc = acc + i\nprint(f'The value at {i} is {acc}')\nprint(f'The answer is {acc}')\n
\n
👁 View solution\n
\n
data = [1, 3, 5, 9]\nacc = 0\n# There is a : character at the end of this line, so you KNOW the next line\n# must be indented.\nfor i in data:\n    # Same here, another :\n    if i < 4:\n        acc = acc + i * 2\n    # And again! Another :\n    else:\n        acc = acc + i\n# But what about these lines?\nprint(f'The value at {i} is {acc}')\nprint(f'The answer is {acc}')\n
\n

This code is actually ambiguous: we don’t know how far the two prints should be indented. This very synthetic example lacks good context, but there are three places they could go, with three different effects.

\n

There are two bits of knowledge we can use, however:

\n
    \n
  • the first print uses i, so it most likely belongs within the loop
  \n
  • the second print cannot be indented more than the first print (Why? It would require a block like for ... : or if ... : to indent further.)
  \n
\n

The first option, indenting the print one level so that it sits in the loop body, prints out the value once per iteration, which seems good:

\n
[...]\n    else:\n        acc = acc + i\n    print(f'The value at {i} is {acc}')\n
\n

The second option prints out the value only during the else case, and not otherwise.

\n
    else:\n        acc = acc + i\n        print(f'The value at {i} is {acc}')\n
\n

So that’s probably wrong, and we should take the first option. That leaves two options for the final print: no indentation, or the same level as our first print statement. We can guess that we probably want to print out the final result once the loop has finished, so it should not be indented.

\n
data = [1, 3, 5, 9]\nacc = 0\nfor i in data:\n    if i < 4:\n        acc = acc + i * 2\n    else:\n        acc = acc + i\n    print(f'The value at {i} is {acc}')\nprint(f'The answer is {acc}')\n
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-31", "source": [ "# This code accidentally lost its indentation! Can you fix it?\n", "data = [1, 3, 5, 9]\n", "acc = 0\n", "for i in data:\n", "if i < 4:\n", "acc = acc + i * 2\n", "else:\n", "acc = acc + i\n", "print(f'The value at {i} is {acc}')\n", "print(f'The answer is {acc}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-32", "source": "
\n
Question: Trimming a FASTQ string
\n

Given a FASTQ string and a list of quality scores, use break to print out just the good part of the DNA and its quality scores.

\n
# We've got a Read\nread = \"\"\"\n@SEQ_ID\nGATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n+\n55CCF>>>>>>CCCCCCC65!''*((((***+))%%%++)(%%%%).1***-+*''))**\n\"\"\".strip().split('\\n')\n\ndef quality_to_percent(q):\n    return 100 * (1 - (10 ** (q / -10)))\n\nsequence = read[1]\nquality_scores = [ord(x) - 33 for x in read[3]]\n\nfor i in ... # TODO\n
\n
👁 View solution\n
\n

There are two ways to do this, one you might be able to guess, and one that might be new:

\n
    \n
  1. Loop over a range() using len(sequence). Since len(sequence) == len(quality_scores), when we access the Nth position of either, they match up.
  \n
  2. zip(sequence, quality_scores) will loop over both of these together. It produces pairs, one per position; converted to a list, the first few look like [('G', 20), ('A', 20), ('T', 34)].
  \n
\n
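
To see what zip() hands to a loop, here is a tiny sketch using the first three bases and scores of the read. The names bases, scores, and pairs are shortened stand-ins invented for this example:

```python
bases = 'GAT'
scores = [20, 20, 34]

# zip is lazy, so wrap it in list() to look at the pairs it produces
pairs = list(zip(bases, scores))
print(pairs)

# In a for loop, each pair can be unpacked into two loop variables
for base, score in zip(bases, scores):
    print(base, score)
```
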
\n
👁 View solution\n
\n

The naïve solution is quite easy and readable:

\n
for i in range(len(sequence)):\n    if quality_scores[i] < 15:\n        break\n    print(f'Base {i} = {sequence[i]} with {quality_to_percent(quality_scores[i])}% accuracy')\n
\n

But we can make this a bit prettier using the zip() function:

\n
for base, score in zip(sequence, quality_scores):\n    if score < 15:\n        break\n    print(f'Base = {base} with {quality_to_percent(score)}% accuracy')\n
\n

But note that we don’t have the position in the list anymore, so we remove it from the print statement.
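
If the position is still wanted, one option is to wrap zip() in the built-in enumerate(), which adds a running counter to each item. The sketch below uses short made-up stand-ins for sequence and quality_scores rather than the full read, and the list good_bases is a hypothetical name for collecting the result:

```python
# Short made-up stand-ins for the tutorial's sequence and quality_scores
sequence = 'GATTTG'
quality_scores = [20, 20, 34, 34, 5, 29]

good_bases = []
for position, (base, score) in enumerate(zip(sequence, quality_scores)):
    if score < 15:
        break                      # stop at the first low-quality base
    good_bases.append(base)
    print(f'Base {position} = {base} with quality score {score}')

print(''.join(good_bases))
```

The parentheses around base and score are needed because enumerate() yields the counter together with each pair from zip().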

\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-33", "source": [ "# We've got a Read\n", "read = \"\"\"\n", "@SEQ_ID\n", "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n", "+\n", "55CCF>>>>>>CCCCCCC65!''*((((***+))%%%++)(%%%%).1***-+*''))**\n", "\"\"\".strip().split('\\n')\n", "\n", "def quality_to_percent(q):\n", " return 100 * (1 - (10 ** (q / -10)))\n", "\n", "# Extract the sequence\n", "sequence = read[1]\n", "# And the quality scores, and map those to the correct values.\n", "quality_scores = [ord(x) - 33 for x in read[3]]\n", "\n", "# Write something here\n", "# That loops over BOTH the sequence and Quality Scores.\n", "# And prints them out\n", "# If the quality scores are `<15`, then break and quit printing.\n", "for i in ..." ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "python" ], "id": "" } } }, { "id": "cell-34", "source": "\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "cell_type": "markdown", "id": "final-ending-cell", "metadata": { "editable": false, "collapsed": false }, "source": [ "# Key Points\n\n", "- A *for loop* executes commands once for each value in a collection.\n", "- A `for` loop is made up of a collection, a loop variable, and a body.\n", "- The first line of the `for` loop must end with a colon, and the body must be indented.\n", "- Indentation is always meaningful in Python.\n", "- Loop variables can be called anything (but it is strongly advised to have a meaningful name to the looping variable).\n", "- The body of a loop can contain many statements.\n", "- Use `range` to iterate over a sequence of numbers.\n", "- The Accumulator pattern turns many values into one.\n", "\n# Congratulations on successfully completing this tutorial!\n\n", "Please [fill out the feedback on the GTN 
website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-loops/tutorial.html#feedback) and check there for further resources!\n" ] } ] }