Iteration using for in Python (Part 4)

Example: breaking a string into a list of words

Suppose we wish to transform a string into a list of words, using the blank space as a marker to show the divisions. For example, consider the following string:

"This is a list of words."

We would want our function to break the string up into the following list:

["This", "is", "a", "list", "of", "words."]

Strictly speaking, these aren't all words, as the last string has a period on the end. We define as a word anything that is separated off by a blank space.

This is actually close to a built-in string function .split().

Writing the function

def break_up(entry):


    for char in entry:

Figuring out how to use iterations is easy: we look at each character in the input.

        if not char.isspace():


        else:

We then have two different courses of action, depending on what we see is a space or not.

            word = word + char

If it is not a space, we add the character onto the current word we are building.

    word = ""

We make sure to initialize the word to the empty string outside of the loop.

Otherwise, we may have come to the end of a word. I say “may” because it is possible that we have two blank spaces in a row, and hence no word to add to our list.

            if len(word) > 0:
                new = new + [word]

                word = ""

So we first check to see if the length of the word is nonzero, and if so, we concatenate the list with the list containing the new word. We then reset word to the empty string so that we are ready to start creating the new word.

    new = []

The list new needs to be initialized outside of the loop to the empty list.

    return new

Finally, we return the list.

Putting all of the code together in the correct order gives the following:

def break_up(entry):
    new = []
    word = ""
    for char in entry:
        if not char.isspace():
            word = word + char

        else:
            if len(word) > 0:
                new = new + [word]

                word = ""
    return new

def test_break_up():
    """Tests correctness of break_up
    """
    assert break_up("one") == ["one"]
    assert break_up("  ") ==  []
    assert break_up("one two   three") \
     == ["one", "two", "three"]

For our tests we include strings without any words and strings with multiple blanks in a row. We obtain the following output:

Traceback (most recent call last):
  File "<string>", line 23, in <module>
  File "<string>", line 18, in test_break_up
AssertionError

We failed on the first assertion statement assert break_up("one") == ["one"] (this was on line 18 when the code was run). Let's use print to see what is going on.

def break_up(entry):
    new = []
    word = ""
    for char in entry:
        if not char.isspace():
            word = word + char
            print(word)
        else:
            if len(word) > 0:
                new = new + [word]

                word = ""
    return new

def test_break_up():
    """Tests correctness of break_up
    """
    assert break_up("one") == ["one"]
    assert break_up("  ") ==  []
    assert break_up("one two   three") \
     == ["one", "two", "three"]

We add print(word) in the first if statement right after the assignment word = word + char. Is word being created correctly? We rerun the program to get the following:

o
on
one
Traceback (most recent call last):
  File "<string>", line 23, in <module>
  File "<string>", line 18, in test_break_up
AssertionError

Yes, word is being created correctly.

def break_up(entry):
    new = []
    word = ""
    for char in entry:
        if not char.isspace():
            word = word + char
            print(word)
        else:
            if len(word) > 0:
                new = new + [word]
                print(new)
                word = ""
    return new

def test_break_up():
    """Tests correctness of break_up
    """
    assert break_up("one") == ["one"]
    assert break_up("  ") ==  []
    assert break_up("one two   three") \
     == ["one", "two", "three"]

We add print(new) in the second if statement right after the assignment new = new + [word]. Is word being added to new correctly? We rerun the program to get the following:

o
on
one
Traceback (most recent call last):
  File "<string>", line 23, in <module>
  File "<string>", line 18, in test_break_up
AssertionError

No, word is not being added to new correctly.

And now we can figure out why: a new word is added only when we see a blank space. But that doesn't work for the last word.

def break_up(entry):
    new = []
    word = ""
    for char in entry:
        if not char.isspace():
            word = word + char
            print(word)
        else:
            if len(word) > 0:
                new = new + [word]
                print(new)
                word = ""
    if len(word) > 0:
        return new + [word]
    else:
        return new

def test_break_up():
    """Tests correctness of break_up
    """
    assert break_up("one") == ["one"]
    assert break_up("  ") ==  []
    assert break_up("one two   three") \
     == ["one", "two", "three"]

We can fix this by checking whether word is nonempty and if so, add it to the list. This updated code now produces the following output:

o
on
one
o
on
one
['one']
t
tw
two
['one', 'two']
t
th
thr
thre
three

Example: selection sort

Suppose we want to sort a list of numbers by repeatedly finding the smallest number and moving it into position. Consider the example given:

An indexed list of numbers. 2 is moved to index 0 and 3 is moved to index 5. Indices 1 to 9 still need to be sorted.

The smallest number is 2, so we swap it with 3, and now have a smaller list on which to repeat the process.

An indexed list of numbers. 3 is moved to index 1 and 12 is moved to index 5. Indices 2 to 9 still need to be sorted.

3 is the smallest number in the smaller list, so we can swap it with the first element, in the smaller list, namely 12.

An indexed list of numbers. 7 is moved to index 2 and 10 is moved to index 4. Indices 3 to 9 still need to be sorted.

7 is the next smallest number, and gets swapped with 10.

An indexed list of numbers. 10 is moved to index 3 and 50 is moved to index 4. Indices 4 to 9 still need to be sorted.

As 10 is the next smallest, we swap it with 50.

An indexed list of numbers. 12 is moved to index 4 and 50 is moved to index 5. Indices 5 to 9 still need to be sorted.

Now we swap 12 with 50.

Now 20 is the smallest, and we can continue on until the entire list is sorted.

Writing the code

def swap(sequence, first, second):
    temp = sequence[first]
    sequence[first] = sequence[second]
    sequence[second] = temp

To write our function, we'll use a helper function called swap that swaps items in two given positions, which we call first and second. This is the same technique that we used earlier, to swap the first and middle elements of a list. That is, we first assign the first element to a temporary variable, assign the second element to where the first element was, and finally place the first element, which is in the temporary variable, to where the second element was.

def test_swap():
    unsorted = [4, 7, 2, 4, 1, 3, 8]
    swap(unsorted, 0, 5)
    assert unsorted == \
    [3, 7, 2, 4, 1, 4, 8]

test_swap()

To test swap, we swap the 3 in position 0 with the 4 in position 5. Our assertion gives the changed list, with 3 replacing 4 and 4 replacing 3.

def pos_min(sequence):
    best = sequence[0]
    pos = 0
    counter = 1
    while counter < len(sequence):
        if best > sequence[counter]:
            best = sequence[counter]
            pos = counter
        counter = counter + 1
    return pos

We will also use the function pos_min for finding the position of a minimum element, which we wrote earlier.

Our function makes use of a while loop, where we consider each item in the sequence in turn, incrementing the variable counter inside the loop, and setting it to 1.

Why didn't we set counter to 0 like we usually do? This is because we initially set the minimum item so far, using the variable best to be the item in position 0 in the sequence. The position of the item is stored in the variable pos.

This means that in our loop we are always comparing the item in position counter to best, and if warranted, updating the variables best and pos.

After processing all the elements in the sequence, pos will be the position of a minimum item in the sequence.

def selection(sequence):
    for counter in range(len(sequence)):

We are now ready to write our main function selection, using iteration to consider each position in the list being sorted.

        pos = pos_min(sequence[counter:]) + counter

The function pos_min is called on a suffix of the input list, that is, the part that hasn't yet been sorted. This gives us the position of the minimum element in the suffix of the list, but not its position in the entire, original list. For example, if the minimum element is in position 0 in the suffix, it will be in position counter in the entire list. So we add counter to the position to adjust positions in the suffix to positions in the entire list.

        swap(sequence, counter, pos)

And now we can swap the first position in the unsorted part of the list, namely counter, with the position of the minimum element.

def test_selection():
    unsorted = [4, 7, 2, 4, 1, 3, 8]
    selection(unsorted)
    assert unsorted == \
    [1, 2, 3, 4, 4, 7, 8]

test_selection()

We test our function by making sure that when we start with an unsorted list and then sort it, it will be transformed into a sorted list.