Structuring data in Python (Part 3)

Associative arrays in Python

Definition

A Python dictionary is an associative array.

Another item from our wish list is an associative array, which is called a dictionary in Python.

Caution

Computer scientists use dictionary in a slightly different way.

This is a potentially confusing term, as it has a different meaning in computer science, not to mention real life.

A dictionary stores key, value pairs.
Keys should be distinct.

It is like a dictionary in that you can look up information using a key. The key is like a word and the value is like a definition. A key should not be used for more than one entry, though a value can be reused.

Caution

Keys must be immutable. (You can use tuples.)

Unlike lists, tuples can serve as keys, since keys must be immutable.
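As a small illustration of the caution above, the sketch below uses tuples as keys and shows that a list is rejected. The grid dictionary and its contents are hypothetical, chosen only for this example; the curly-bracket syntax is explained in the next section.

```python
# Hypothetical example: map (row, column) positions to labels.
grid = {(0, 0): "start", (2, 3): "goal"}
print(grid[(2, 3)])

# A list is mutable, so trying to use one as a key raises TypeError.
try:
    bad = {[0, 0]: "start"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(list_key_allowed)
```

The tuple (2, 3) works as a key because it cannot be changed after it is created, whereas the list [0, 0] could be.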

Creating and using dictionaries

To create a dictionary, instead of using square brackets as for lists or parentheses as for tuples, we use curly brackets. Inside them we list key-value pairs separated by commas, with each key separated from its value by a colon. Using an example, we will illustrate the creation of a dictionary and how it can be used.

unicode = {"a":97, "b":100, "c":99}

We create a dictionary.

print(unicode["b"])

We can use square brackets to specify a key, which extracts the value associated with that key, here 100.

unicode["b"] = 98

Since dictionaries are mutable, we can also use the square brackets to specify a value that should be changed.

print(unicode)

This outputs {'a': 97, 'b': 98, 'c': 99}. In Python 3.7 and later, pairs are displayed in the order in which their keys were first inserted; in earlier versions the order could be arbitrary, for example {'b': 98, 'a': 97, 'c': 99}. Either way, the pairs do not have positions the way list elements do. It is only by the association between keys and values that you can extract information.

print(unicode.keys())
print(unicode.items())

There are methods to extract the keys or the key-value pairs. In Python 3 these return view objects rather than lists, with each key-value pair presented as a tuple. These two print statements give the outputs dict_keys(['a', 'b', 'c']) and dict_items([('a', 97), ('b', 98), ('c', 99)]), respectively.
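If an actual list is needed, the result of keys() or items() can be passed to list. The following sketch, which reuses the unicode dictionary from this section, also shows that a view keeps up with later changes to the dictionary while a list made from it does not. The names key_list, pair_list and keys_view are introduced only for this illustration.

```python
unicode = {"a": 97, "b": 98, "c": 99}

# Convert the views into ordinary lists.
key_list = list(unicode.keys())      # list of keys
pair_list = list(unicode.items())    # list of (key, value) tuples
print(key_list)
print(pair_list)

# A view reflects changes made to the dictionary afterwards.
keys_view = unicode.keys()
unicode["d"] = 100
print("d" in keys_view)   # the view sees the new key
print("d" in key_list)    # the earlier list does not
```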

print("a" in unicode)
print("d" not in unicode)

We can also use in and not in to determine whether or not a key exists in a dictionary. This is handy, as we'll see shortly. These two print statements both give True as outputs.
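One reason the membership test is handy: asking for a key that is not present raises a KeyError. A minimal sketch of checking first follows; the helper name code_or_none is hypothetical and not part of the original material.

```python
unicode = {"a": 97, "b": 98, "c": 99}

def code_or_none(table, letter):
    """Return the value stored for letter, or None if letter is not a key."""
    if letter in table:
        return table[letter]
    return None

print(code_or_none(unicode, "a"))
print(code_or_none(unicode, "d"))

# Without the check, looking up a missing key raises KeyError.
try:
    unicode["d"]
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
print(missing_key_raised)
```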

Here's the complete sequence of statements and their outputs:

unicode = {"a":97, "b":100, "c":99}
print(unicode["b"])
unicode["b"] = 98
print(unicode)
print(unicode.keys())
print(unicode.items())
print("a" in unicode)
print("d" not in unicode)

100
{'a': 97, 'b': 98, 'c': 99}
dict_keys(['a', 'b', 'c'])
dict_items([('a', 97), ('b', 98), ('c', 99)])
True
True

There's a special way of writing a for loop for a dictionary, which we'll demonstrate in an example.

def is_in(info, target):

Suppose we wish to write our own variant of the in operator, one that returns the value associated with a key if the key is present and the string "Not found" otherwise.

    for key, value in info.items():

Each item produced by info.items() is a tuple containing a key and its value. By writing two names after for, we assign a name to each part of the tuple and can use the key and the value separately in the loop body.

        if key == target:
            return value

We use the key to compare to our input and the value as what we return.

    return "Not found"

If we make it all the way through the loop without returning a value, we return the string "Not found".

def test_is_in():
    unicode = {"a":97, "b":100, "c":99}
    assert is_in(unicode, "a") == 97
    assert is_in(unicode, "z") == "Not found"
    assert is_in({}, "a") == "Not found"

test_is_in()

Now we can test our function, making sure to include both empty and nonempty dictionaries, and both possible outcomes.

Here's our function when all parts are put together:

def is_in(info, target):
    for key, value in info.items():
        if key == target:
            return value
    return "Not found"

def test_is_in():
    unicode = {"a":97, "b":100, "c":99}
    assert is_in(unicode, "a") == 97
    assert is_in(unicode, "z") == "Not found"
    assert is_in({}, "a") == "Not found"

test_is_in()
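For comparison, dictionaries have a built-in get method that performs the same lookup without an explicit loop: its second argument is the value to return when the key is absent. The sketch below is an alternative to the loop version, not a replacement for the exercise; the name is_in_get is hypothetical.

```python
def is_in_get(info, target):
    """Return the value for target, or "Not found" if target is not a key."""
    return info.get(target, "Not found")

def test_is_in_get():
    unicode = {"a": 97, "b": 100, "c": 99}
    assert is_in_get(unicode, "a") == 97
    assert is_in_get(unicode, "z") == "Not found"
    assert is_in_get({}, "a") == "Not found"

test_is_in_get()
```

The loop version is still worth writing, since it shows how to iterate over key-value pairs.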

Here are the functions we used to create a dictionary, access a value, or change a value:

Descriptor	Function usage
Create	`info = {"yellow":6, "red":3, "blue":4}`
Access	`info["blue"]`
Update	`info["red"] = 103`

Here are the other functions we used:

Descriptor	Function usage
List of keys	`info.keys()`
List of tuples of key, value pairs	`info.items()`
In	`"red" in info`
Not in	`"orange" not in info`
Loop	`for k, v in info.items():`

Caution

Order of display of elements may not be obvious: it follows insertion order in Python 3.7 and later, and could be arbitrary in earlier versions.

Notice that the focus is very much on the keys. This is an indication of the type of situation in which to use a dictionary. If your information is not structured around the keys, then a dictionary is not the right choice.

Example: using a dictionary to count

Here's an example of where a dictionary is very useful. We'll use keys to represent the distinct strings we encounter in a list and values to represent how many times we've seen each one so far. That allows us to process each string in turn, updating the dictionary values as we go. Then we'll figure out which string has the maximum count. We're going to assume that every string is the name of a colour, hence the function name max_colour.

def max_colour(seq):

We define the header.

    counts = {}

We have to be sure to initialize the dictionary. After all elements in the input have been processed, we have a dictionary in which the names of the colours are the keys and the values are how many times they appear.

    for item in seq:

We use a for loop to process each item in the list.

        if item in counts.keys():
            counts[item] = counts[item] + 1

There are two situations, depending on whether or not the current string is already in the dictionary. If it is, we just update the value, showing that we have seen that string yet again.

        else:
            counts[item] = 1

Otherwise we set the value to 1. Since item is not a key, this creates a new key-value pair with the key item and the value 1.

    most = 0
    best = "Empty"

We initialize the largest value to 0 and the colour to the string "Empty" for the case of an empty list.

    for colour, num in counts.items():

To figure out the maximum, we iterate through all key value pairs.

        if num > most:
            most = num
            best = colour

If the value is greater than the largest number we've seen so far, we update the value of the largest number and the colour to return.

    return best

Finally, we return a colour that appears the maximum number of times.

def test_max_colour():
    colours = ["red", "blue", "green",
               "green", "red", "yellow", "red"]
    assert max_colour(colours) == "red"
    assert max_colour([]) == "Empty"

test_max_colour()

In our tests, we consider both empty and non-empty lists as inputs.

Here's our code when all parts are put together:

def max_colour(seq):
    counts = {}
    for item in seq:
        if item in counts.keys():
            counts[item] = counts[item] + 1
        else:
            counts[item] = 1
    most = 0
    best = "Empty"
    for colour, num in counts.items():
        if num > most:
            most = num
            best = colour
    return best

def test_max_colour():
    colours = ["red", "blue", "green",
               "green", "red", "yellow", "red"]
    assert max_colour(colours) == "red"
    assert max_colour([]) == "Empty"

test_max_colour()
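The same counting idea can be written more compactly with two built-in tools: the get method, which supplies 0 for a colour not yet seen, and max with a key function, which picks the key with the largest count. This is a sketch of an alternative under those assumptions, not the version developed above; the name max_colour_short is hypothetical.

```python
def max_colour_short(seq):
    """Return a string that appears most often in seq, or "Empty" if seq is empty."""
    counts = {}
    for item in seq:
        # get returns 0 when item is not yet a key, so one line covers both cases.
        counts[item] = counts.get(item, 0) + 1
    if not counts:
        # max raises ValueError on an empty dictionary, so handle that case first.
        return "Empty"
    # Iterating over a dictionary yields its keys; counts.get maps each key to its count.
    return max(counts, key=counts.get)

def test_max_colour_short():
    colours = ["red", "blue", "green",
               "green", "red", "yellow", "red"]
    assert max_colour_short(colours) == "red"
    assert max_colour_short([]) == "Empty"

test_max_colour_short()
```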

Example: looping through a pair of lists

Our last example will not use dictionaries, but it will use a function that allows us to iterate over two sequences at the same time, just like we iterated over both keys and values. We will once again look at a list of salaries and a list of boosters, with the goal being to calculate the new salaries.

def change(salaries, boosters):

We define the header.

    result = []

We make sure that our new list is initialized.

    for item, adjust in zip(salaries, boosters):

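Here we are using the zip function to pair up corresponding elements of salaries and boosters. This allows us to iterate over both lists at the same time, with item taken from salaries and adjust taken from boosters.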

        result = result + [item * adjust]

We update our new list by concatenating a new element.

    return result

We return the output.

def test_change():
    """Tests correctness of change
    """
    assert change([],[]) == []
    assert change([2, 3], [4, 5]) == [8, 15]

test_change()

We test that our function works on both empty and non-empty pairs of lists.

Here's our function when all parts are put together:

def change(salaries, boosters):
    result = []
    for item, adjust in zip(salaries, boosters):
        result = result + [item * adjust]
    return result

def test_change():
    """Tests correctness of change
    """
    assert change([],[]) == []
    assert change([2, 3], [4, 5]) == [8, 15]

test_change()

You can loop through two sequences using

for one, two in zip(seq_one, seq_two):

Keep this in mind for functions on two lists or any other types of sequences.
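Two details about zip are worth seeing in a small example: it stops as soon as the shorter sequence runs out, and its pairs can be passed to dict to build a dictionary from a list of keys and a list of values. The lists letters and codes below are made up for this illustration.

```python
letters = ["a", "b", "c"]
codes = [97, 98, 99, 100]   # one extra element on purpose

# zip stops at the end of the shorter sequence, so 100 is ignored.
pairs = list(zip(letters, codes))
print(pairs)

# Each pair becomes a key and its value in the new dictionary.
unicode = dict(zip(letters, codes))
print(unicode)
```

Because of the first point, a function such as change silently ignores extra elements if its two lists have different lengths.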