Iteration using for in Python (Part 4)
Example: breaking a string into a list of words
Suppose we wish to transform a string into a list of words, using the blank space as a marker to show the divisions. For example, consider the following string:
"This is a list of words."
We would want our function to break the string up into the following list:
["This", "is", "a", "list", "of", "words."]
Strictly speaking, these aren't all words, as the last string has a period on the end. We define as a word anything that is separated off by a blank space.
This is actually close to a built-in string function .split()
.
Writing the function
def break_up(entry):
for char in entry:
Figuring out how to use iterations is easy: we look at each character in the input.
if not char.isspace():
else:
We then have two different courses of action, depending on what we see is a space or not.
word = word + char
If it is not a space, we add the character onto the current word we are building.
word = ""
We make sure to initialize the word to the empty string outside of the loop.
Otherwise, we may have come to the end of a word. I say “may” because it is possible that we have two blank spaces in a row, and hence no word to add to our list.
if len(word) > 0:
new = new + [word]
word = ""
So we first check to see if the length of the word is nonzero, and if so, we concatenate the list with the list containing the new word. We then reset word to the empty string so that we are ready to start creating the new word.
new = []
The list new needs to be initialized outside of the loop to the empty list.
return new
Finally, we return the list.
Putting all of the code together in the correct order gives the following:
def break_up(entry):
new = []
word = ""
for char in entry:
if not char.isspace():
word = word + char
else:
if len(word) > 0:
new = new + [word]
word = ""
return new
def test_break_up():
"""Tests correctness of break_up
"""
assert break_up("one") == ["one"]
assert break_up(" ") == []
assert break_up("one two three") \
== ["one", "two", "three"]
For our tests we include strings without any words and strings with multiple blanks in a row. We obtain the following output:
Traceback (most recent call last): File "<string>", line 23, in <module> File "<string>", line 18, in test_break_up AssertionError
We failed on the first assertion statement assert break_up("one") == ["one"]
(this was on line 18 when the code was run). Let's use print to see what is going on.
def break_up(entry):
new = []
word = ""
for char in entry:
if not char.isspace():
word = word + char
print(word)
else:
if len(word) > 0:
new = new + [word]
word = ""
return new
def test_break_up():
"""Tests correctness of break_up
"""
assert break_up("one") == ["one"]
assert break_up(" ") == []
assert break_up("one two three") \
== ["one", "two", "three"]
We add print(word)
in the first if
statement right after the assignment word = word + char
. Is word
being created correctly? We rerun the program to get the following:
o on one Traceback (most recent call last): File "<string>", line 23, in <module> File "<string>", line 18, in test_break_up AssertionError
Yes, word
is being created correctly.
def break_up(entry):
new = []
word = ""
for char in entry:
if not char.isspace():
word = word + char
print(word)
else:
if len(word) > 0:
new = new + [word]
print(new)
word = ""
return new
def test_break_up():
"""Tests correctness of break_up
"""
assert break_up("one") == ["one"]
assert break_up(" ") == []
assert break_up("one two three") \
== ["one", "two", "three"]
We add print(new)
in the second if
statement right after the assignment new = new + [word]
. Is word
being added to new
correctly? We rerun the program to get the following:
o on one Traceback (most recent call last): File "<string>", line 23, in <module> File "<string>", line 18, in test_break_up AssertionError
No, word
is not being added to new
correctly.
And now we can figure out why: a new word is added only when we see a blank space. But that doesn't work for the last word.
def break_up(entry):
new = []
word = ""
for char in entry:
if not char.isspace():
word = word + char
print(word)
else:
if len(word) > 0:
new = new + [word]
print(new)
word = ""
if len(word) > 0:
return new + [word]
else:
return new
def test_break_up():
"""Tests correctness of break_up
"""
assert break_up("one") == ["one"]
assert break_up(" ") == []
assert break_up("one two three") \
== ["one", "two", "three"]
We can fix this by checking whether word
is nonempty and if so, add it to the list. This updated code now produces the following output:
o on one o on one ['one'] t tw two ['one', 'two'] t th thr thre three
Example: selection sort
Suppose we want to sort a list of numbers by repeatedly finding the smallest number and moving it into position. Consider the example given:
The smallest number is 2, so we swap it with 3, and now have a smaller list on which to repeat the process.
3 is the smallest number in the smaller list, so we can swap it with the first element, in the smaller list, namely 12.
7 is the next smallest number, and gets swapped with 10.
As 10 is the next smallest, we swap it with 50.
Now we swap 12 with 50.
Now 20 is the smallest, and we can continue on until the entire list is sorted.
Writing the code
def swap(sequence, first, second):
temp = sequence[first]
sequence[first] = sequence[second]
sequence[second] = temp
To write our function, we'll use a helper function called swap
that swaps items in two given positions, which we call first
and second
. This is the same technique that we used earlier, to swap the first and middle elements of a list. That is, we first assign the first element to a temporary variable, assign the second element to where the first element was, and finally place the first element, which is in the temporary variable, to where the second element was.
def test_swap():
unsorted = [4, 7, 2, 4, 1, 3, 8]
swap(unsorted, 0, 5)
assert unsorted == \
[3, 7, 2, 4, 1, 4, 8]
test_swap()
To test swap
, we swap the 3 in position 0 with the 4 in position 5. Our assertion gives the changed list, with 3 replacing 4 and 4 replacing 3.
def pos_min(sequence):
best = sequence[0]
pos = 0
counter = 1
while counter < len(sequence):
if best > sequence[counter]:
best = sequence[counter]
pos = counter
counter = counter + 1
return pos
We will also use the function pos_min
for finding the position of a minimum element, which we wrote earlier.
Our function makes use of a while loop, where we consider each item in the sequence in turn, incrementing the variable counter
inside the loop, and setting it to 1.
Why didn't we set counter
to 0 like we usually do? This is because we initially set the minimum item so far, using the variable best
to be the item in position 0 in the sequence. The position of the item is stored in the variable pos
.
This means that in our loop we are always comparing the item in position counter
to best
, and if warranted, updating the variables best
and pos
.
After processing all the elements in the sequence, pos
will be the position of a minimum item in the sequence.
def selection(sequence):
for counter in range(len(sequence)):
We are now ready to write our main function selection
, using iteration to consider each position in the list being sorted.
pos = pos_min(sequence[counter:]) + counter
The function pos_min
is called on a suffix of the input list, that is, the part that hasn't yet been sorted. This gives us the position of the minimum element in the suffix of the list, but not its position in the entire, original list. For example, if the minimum element is in position 0 in the suffix, it will be in position counter
in the entire list. So we add counter
to the position to adjust positions in the suffix to positions in the entire list.
swap(sequence, counter, pos)
And now we can swap the first position in the unsorted part of the list, namely counter
, with the position of the minimum element.
def test_selection():
unsorted = [4, 7, 2, 4, 1, 3, 8]
selection(unsorted)
assert unsorted == \
[1, 2, 3, 4, 4, 7, 8]
test_selection()
We test our function by making sure that when we start with an unsorted list and then sort it, it will be transformed into a sorted list.