Building better programs in Python (Part 1)
Function documentation
Most information about building better programs isn't language-specific. However, there are a few Python-specific practices worth mentioning. One relates to comments. When a comment starts at the beginning of a line, two number signs, ##, are more readable than one. It is also worth remembering that a number sign, #, inside a string is not interpreted as the start of a comment, which is probably what you would expect.
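A minimal sketch of both points. The names `price` and `label` are made up for illustration:

```python
## A full-line comment, written with two number signs.
price = 4.50  # a trailing comment after code

# The number sign inside the quotes is part of the string, not a comment.
label = "Item #7"
print(label)        # prints: Item #7
print(len(label))   # prints: 7
```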
Python convention
A docstring in Python is put right after the function header.
- Multiple lines are fine.
- Use three double quotes to start and end.
Back when we were looking at built-in functions, we printed docstrings to obtain more information. Now that we are creating our own functions, we can create docstrings for them. The string can be as long as you want and can span multiple lines. This is accomplished by using triple quotes to mark the start and end.
As with built-in functions, to see the docstring for a user-defined function, we give the name followed by a dot, followed by two underscores, "doc", and finally two underscores:
print(function_name.__doc__)
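As a small sketch, the same attribute works for a built-in function and for one of our own. The function `double` is hypothetical, and the text printed for `abs` depends on the Python version:

```python
def double(number):
    """Returns twice the value of number.
    """
    return 2 * number

# Docstring of a built-in function.
print(abs.__doc__)

# Docstring of our own function.
print(double.__doc__)
```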
Conventions for the docstring
According to Python conventions, the docstring consists of two parts, namely, the summary line and the details, separated by a blank line. There is also a convention indicating where the triple quotes should be placed. For readability, the ending triple quote appears on a line by itself:
- Summary line starting with triple quote
- Blank line
- Details
- Ending triple quote alone on a line
The summary line starting with a triple quote is a statement of what the function does and how the inputs and outputs are related to each other:
- States what the function does
- Relates input and output (if any)
The details give preconditions, parameters, and postconditions:
- Preconditions (data type of input, restrictions)
- Parameters with meaning
- Postconditions (data type of output, side effects)
The preconditions specify data types, the parameters explain meaning, and the postconditions specify the data type of the output, if any, and side effects, if any. The preconditions also give any restrictions on data, such as whether strings are nonempty, integers positive, and numbers nonzero. The following are some examples of restrictions on data:
- nonempty strings
- positive integers
- nonzero numbers
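Put together, the conventions above might look like the following sketch. The function `rectangle_area` is a made-up example, not one from this lesson:

```python
def rectangle_area(width, height):
    """Determines the area of a rectangle of size
    width x height.

    Preconditions:
    width: int or float with value > 0
    height: int or float with value > 0

    Parameters:
    width: the width of the rectangle in cm
    height: the height of the rectangle in cm

    Returns: int or float area in square cm
    """
    return width * height

print(rectangle_area(3, 4))   # prints: 12
```

The summary comes first, a blank line separates it from the details, and the ending triple quote sits on a line by itself.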
Caution
The docstring is not enforced by the computer.
Keep in mind that the docstring is like a contract between the user of the function and the writer of the function. The function is only guaranteed to work if the preconditions are met. The computer does not play the role of enforcer.
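To illustrate, here is a sketch using a hypothetical function `percent`. Python runs the body whether or not the preconditions hold. A writer who wants enforcement has to add checks explicitly, as in the hypothetical `checked_percent`; this is one possible approach and not part of the docstring convention.

```python
def percent(part, whole):
    """Determines what percentage part is of whole.

    Preconditions:
    part: int or float with 0 <= part <= whole
    whole: int or float with value > 0

    Returns: float percentage between 0 and 100
    """
    return 100.0 * part / whole


def checked_percent(part, whole):
    """Same contract as percent, but the preconditions
    are checked explicitly.
    """
    if whole <= 0 or part < 0 or part > whole:
        raise ValueError("preconditions not met")
    return percent(part, whole)


print(percent(1, 4))       # preconditions met: prints 25.0
print(percent(150, 100))   # preconditions ignored: prints 150.0, no error

try:
    checked_percent(150, 100)
except ValueError as error:
    print("Rejected:", error)   # prints: Rejected: preconditions not met
```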
Example: making sandwiches
For our first example of a docstring, we consider a very simple function. The function determines the number of sandwiches that can be made given bread of a particular width and length and a specified number of jars of sandwich spread. The spread should be 1 cm thick, and the volume of each jar is 100 cubic cm. The steps involved in writing the function are as follows:
- Compute the volume for one sandwich.
- Compute the total volume of jar contents.
- Compute the total sandwiches.
Writing the docstring
As usual, we use comments to indicate each step mentioned earlier as we write the program.
import math
THICKNESS = 1.0 ## in cm, thickness of spread
JAR_VOLUME = 100.0 ## in cubic cm, volume of a jar
We use two constants, specifying the thickness of spread desired and the volume contained in each jar.
def sandwiches(width, length, jars):
The header specifies the names of the parameters, which we'll need to write our docstring.
    """Determines the number of sandwiches that
    can be made using bread of size
    width x length and jars number of jars.
The summary mentions the parameters by name and explains how the output, namely the number of sandwiches that can be made, relates to the size of the bread and the number of jars.
    Preconditions:
    width: int or float with value > 0
    length: int or float with value > 0
    jars: int with value >= 0
As preconditions, we specify the data types and restrictions. We are not allowing bread to have a dimension of zero, but we don't care whether the width or length is an integer or a floating point number. The number of jars must be a nonnegative integer.
    Parameters:
    width: the width of a piece of bread in cm
    length: the length of a piece of bread in cm
    jars: the number of jars provided
In the parameters section, we specify what each parameter means. Since our dimensions are measured in centimeters, we specify that here. This will allow a user to understand how to use the function properly.
    Returns: int number of sandwiches
    """
Finally, we indicate that the number of sandwiches returned should be an integer.
    ## Compute volume for one sandwich
    volume_one = width * length * THICKNESS
We first compute the volume of a single sandwich. Since THICKNESS is a floating point number, volume_one will be too. More importantly, since width, length, and THICKNESS are all nonzero, so is volume_one.
    ## Compute total volume of jar contents
    volume_total = jars * JAR_VOLUME
We also compute the total volume of all the jars of spread.
    ## Compute total sandwiches
    return math.floor(volume_total / volume_one)
Then we compute the number of sandwiches. Notice that we don't have to worry about dividing by zero, provided that the user pays attention to the docstring.
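A quick sketch of what this last step does in Python 3, using made-up values: dividing two floats yields a float, and `math.floor` rounds it down to an int, so a partly spread sandwich is not counted.

```python
import math

volume_total = 300.0   # hypothetical: 3 jars of 100.0 cubic cm each
volume_one = 120.0     # hypothetical: bread of 10 cm x 12 cm, spread 1.0 cm thick

ratio = volume_total / volume_one
print(ratio)                     # prints: 2.5
print(math.floor(ratio))         # prints: 2
print(type(math.floor(ratio)))   # prints: <class 'int'>
```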
print(sandwiches(0, 0, 1))
We add a test case, one that deliberately ignores the preconditions.
Here's our function when all parts are put together:
import math

THICKNESS = 1.0 ## in cm, thickness of spread
JAR_VOLUME = 100.0 ## in cubic cm, volume of a jar

def sandwiches(width, length, jars):
    """Determines the number of sandwiches that
    can be made using bread of size
    width x length and jars number of jars.

    Preconditions:
    width: int or float with value > 0
    length: int or float with value > 0
    jars: int with value >= 0

    Parameters:
    width: the width of a piece of bread in cm
    length: the length of a piece of bread in cm
    jars: the number of jars provided

    Returns: int number of sandwiches
    """
    ## Compute volume for one sandwich
    volume_one = width * length * THICKNESS

    ## Compute total volume of jar contents
    volume_total = jars * JAR_VOLUME

    ## Compute total sandwiches
    return math.floor(volume_total / volume_one)

print(sandwiches(0, 0, 1))
When we run the code, we get the following error:
Traceback (most recent call last):
  File "<string>", line 32, in <module>
  File "<string>", line 30, in sandwiches
ZeroDivisionError: float division by zero
This error is generated because the user ignored the docstring and entered a zero. We're lucky that in this case ignoring the docstring happened to result in an error. In another situation, ignoring the docstring could result in the function running and producing an incorrect answer. Remember, the postconditions in a docstring are only guaranteed when the preconditions are met. Replacing the last print statement in the code, print(sandwiches(0, 0, 1)), with
print(sandwiches.__doc__)
and running the code, we can print our docstring:
Determines the number of sandwiches that
    can be made using bread of size
    width x length and jars number of jars.

    Preconditions:
    width: int or float with value > 0
    length: int or float with value > 0
    jars: int with value >= 0

    Parameters:
    width: the width of a piece of bread in cm
    length: the length of a piece of bread in cm
    jars: the number of jars provided

    Returns: int number of sandwiches
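For contrast with the failing call, here is a sketch of one call that respects the preconditions and one that silently breaks them. The function is repeated with an abbreviated docstring so that the block runs on its own, and the sample values are made up:

```python
import math

THICKNESS = 1.0      ## in cm, thickness of spread
JAR_VOLUME = 100.0   ## in cubic cm, volume of a jar

def sandwiches(width, length, jars):
    """Abbreviated copy of the function above.
    """
    volume_one = width * length * THICKNESS
    volume_total = jars * JAR_VOLUME
    return math.floor(volume_total / volume_one)

# Preconditions met: each sandwich needs 10 * 10 * 1.0 = 100.0 cubic cm,
# and 2 jars hold 200.0 cubic cm.
print(sandwiches(10, 10, 2))    # prints: 2

# Precondition jars >= 0 ignored: no error, but the answer is meaningless.
print(sandwiches(10, 10, -1))   # prints: -1
```

The second call is the kind of case the caution warns about: the function runs to completion and returns a value that no postcondition covers.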
Example: paper for a photo
Here is a docstring for the function paper_size, which consumes two numbers, namely the dimensions of a photo, and produces the name of the smallest size of photo paper that can be used to print a photo of that size:
def paper_size(small, big):
    """Determines the smallest size of photo paper
    for a photo of size small x big.

    Preconditions:
    small: int or float with value > 0
    big: int or float with value > 0

    Parameters:
    small: smaller dimension of photo, in pixels
    big: bigger dimension of photo, in pixels
    The two dimensions can be equal.

    Returns: string paper size or "Too big"
    """
The first two lines of the docstring give the summary, which mentions the parameters by name and indicates how they are related to the output. Next are the preconditions, which state that both small and big are positive numbers. Integers and floating point numbers are both fine. In the section on parameters, the meaning of the two parameters is given, as well as the fact that the measurements are in pixels. The names small and big indicate that small is no greater than big, and here we note that the values can be equal. Finally, the last line shows that the value returned will be a string, either the size of a paper or an indication that the photo is too big. In a moment we'll look at the names of some standard sizes of paper.
Assessing existing code
The following are some paper sizes:
Name | Size (pixels) |
---|---|
2R | 600 × 900 |
3R | 1050 × 1500 |
4R | 1200 × 1800 |
5R | 1500 × 2100 |
Here is a possible function body giving a series of conditions:
if big > 2100 or small > 1500:
    return "Too big"
elif big > 1800 or small > 1200:
    return "Use size 5R"
elif big > 1500 or small > 1050:
    return "Use size 4R"
elif big > 900 or small > 600:
    return "Use size 3R"
else:
    return "Use size 2R"
There are five possible outcomes and four conditions. Is the correlation between the conditions and outcomes clear?
Applying the checklist
Better program checklist
- Are types and meanings of all variables and constants clear? This seems fine for our code.
- Are there any “magic” numbers? Many numbers appear without any indication of their meaning. We should replace these “magic” numbers with named constants.
- Is the organization and use of blank lines clear? Since our function consists of just branching, the organization is reasonable.
- Are the steps indicated using comments? The conditions and outcomes should be short enough that comments aren't needed, provided that it is clear how they are related.
- Do functions have comments explaining the purpose, preconditions, and postconditions? Our docstring satisfies this condition.
- Are there helper functions for repeated and other tasks? There are no obvious repeated tasks in this example. One could consider using a helper function for each condition in the branching, though we'll opt not to do so, reasoning that those are not functions likely to be reused in other programs.
- Are there more parameters that could be added to a function? There is also no obvious need for additional parameters.
So we're left looking at “magic” numbers.
Improving the code by using constants
We can introduce constants for the dimensions of the four different sizes of paper:
SMALL_2R = 600 # 2R paper small side in pixels
BIG_2R = 900 # 2R paper big side in pixels
SMALL_3R = 1050 # 3R paper small side in pixels
BIG_3R = 1500 # 3R paper big side in pixels
SMALL_4R = 1200 # 4R paper small side in pixels
BIG_4R = 1800 # 4R paper big side in pixels
SMALL_5R = 1500 # 5R paper small side in pixels
BIG_5R = 2100 # 5R paper big side in pixels
Code using constants
We rewrite the code using the constants defined:
def paper_size(small, big):
    """Docstring here
    """
    if big > BIG_5R or small > SMALL_5R:
        return "Too big"
    elif big > BIG_4R or small > SMALL_4R:
        return "Use size 5R"
    elif big > BIG_3R or small > SMALL_3R:
        return "Use size 4R"
    elif big > BIG_2R or small > SMALL_2R:
        return "Use size 3R"
    else:
        return "Use size 2R"
Notice that we've left a placeholder for the docstring so that we can see all the branching steps at once. Is this clear? Well, maybe.
Improving the code by adding clarity
But if we change the order of the cases, we can use conditions for which the correlation with the cases is much clearer:
def paper_size(small, big):
    """Docstring here
    """
    if big <= BIG_2R and small <= SMALL_2R:
        return "Use size 2R"
    elif big <= BIG_3R and small <= SMALL_3R:
        return "Use size 3R"
    elif big <= BIG_4R and small <= SMALL_4R:
        return "Use size 4R"
    elif big <= BIG_5R and small <= SMALL_5R:
        return "Use size 5R"
    else:
        return "Too big"
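Since the reordering changes every condition, it is worth checking that both versions give the same answers. The following is a sketch of one way to do that, not a complete test suite. The names `paper_size_original` and `paper_size_reordered` are introduced here only so that both versions can live in one file, and the check tries values just below, at, and just above each paper dimension:

```python
SMALL_2R = 600    # 2R paper small side in pixels
BIG_2R = 900      # 2R paper big side in pixels
SMALL_3R = 1050   # 3R paper small side in pixels
BIG_3R = 1500     # 3R paper big side in pixels
SMALL_4R = 1200   # 4R paper small side in pixels
BIG_4R = 1800     # 4R paper big side in pixels
SMALL_5R = 1500   # 5R paper small side in pixels
BIG_5R = 2100     # 5R paper big side in pixels

def paper_size_original(small, big):
    """Version whose conditions test whether the photo is too large.
    """
    if big > BIG_5R or small > SMALL_5R:
        return "Too big"
    elif big > BIG_4R or small > SMALL_4R:
        return "Use size 5R"
    elif big > BIG_3R or small > SMALL_3R:
        return "Use size 4R"
    elif big > BIG_2R or small > SMALL_2R:
        return "Use size 3R"
    else:
        return "Use size 2R"

def paper_size_reordered(small, big):
    """Version whose conditions test whether the photo fits.
    """
    if big <= BIG_2R and small <= SMALL_2R:
        return "Use size 2R"
    elif big <= BIG_3R and small <= SMALL_3R:
        return "Use size 3R"
    elif big <= BIG_4R and small <= SMALL_4R:
        return "Use size 4R"
    elif big <= BIG_5R and small <= SMALL_5R:
        return "Use size 5R"
    else:
        return "Too big"

## Compare the two versions around every boundary
small_limits = [SMALL_2R, SMALL_3R, SMALL_4R, SMALL_5R]
big_limits = [BIG_2R, BIG_3R, BIG_4R, BIG_5R]

mismatches = 0
for small_limit in small_limits:
    for big_limit in big_limits:
        for small in (small_limit - 1, small_limit, small_limit + 1):
            for big in (big_limit - 1, big_limit, big_limit + 1):
                first = paper_size_original(small, big)
                second = paper_size_reordered(small, big)
                if first != second:
                    mismatches += 1

print(mismatches)                         # prints: 0
print(paper_size_reordered(1000, 1400))   # prints: Use size 3R
```

The two versions agree because each paper size is larger than the previous one in both dimensions, so a photo that fits a smaller size also fits every larger size.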