Built-in functions in Python (Part 3)
Entering and displaying strings
We've already used strings a little bit. Now it is time for a closer look. You can actually use either double or single quotation marks, as long as you make sure what you use matches. For example, the following code
print("Double quotes")
print('Single quotes')
print('Single quotes ')
gives the output
Double quotes
Single quotes
Single quotes
Whatever you choose, the string is displayed without any marks, which may make two different strings appear the same, as when there are trailing blanks. So what if you want your string to contain a quotation mark? If you are using one type of quotation mark to mark the beginning and ending of the string, you can use the other kind as a character inside the string. For example, the following code
print('"Double" quotes')
print("'Single' quotes")
gives the output
"Double" quotes
'Single' quotes
Another option is to use a special sequence of characters, called an escape character. An escape character is a way of encoding a symbol in a way that reduces the ambiguity about its meaning. The two possible meanings for quotation marks are as markers at the beginning or end of a string or as a regular character inside a string. Each time the escape character for a quotation mark is used, you are saying to the computer “Use this as a regular character inside a string, not as the marker at the beginning or end of a string.” The escape character for the single quotation mark is a backslash followed by a single quotation mark, and the escape character for a double quotation mark is a backslash followed by a double quotation mark. For example, the following code
print("\"Double\" quotes")
print('\'Single\' quotes')
gives the output
"Double" quotes
'Single' quotes
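One way to convince yourself that the two approaches are interchangeable is to compare the resulting strings directly. The short sketch below does that; the variable names nested and escaped are illustrative and do not come from the examples above.

```python
# Two ways of writing the same string: nesting one kind of quotation
# mark inside the other, or escaping the quotation mark with a backslash.
nested = '"Double" quotes'
escaped = "\"Double\" quotes"

# The comparison is True: both literals describe the same characters.
print(nested == escaped)

# The backslashes are not stored in the string, so the lengths agree.
print(len(nested), len(escaped))  # 15 15
```

Which form to use is mostly a matter of readability; nesting usually needs fewer keystrokes, while escaping works even when a string contains both kinds of quotation mark.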
There are escape characters for other symbols as well, each starting with a backslash. But then how do you put a backslash in a string? You guessed correctly: by using an escape character. The escape character for the backslash is two backslashes in a row. The code
print('A backslash \\ is here')
gives the output
A backslash \ is here
There is also an escape character for a new line. It consists of a backslash followed by the lower-case letter n, \n. When this appears, the computer starts printing on a new line. For example, the code
print('Using \'control\'\nfor a new line.')
gives the output
Using 'control'
for a new line.
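It can help to see that an escape character is typed as two symbols but stored as a single character. The sketch below checks this with len and then uses the string method splitlines, which is not covered in this section, to show that \n really does divide the text into two lines. The variable names are illustrative.

```python
# Each escape character is typed as two symbols but stored as one character.
backslash = '\\'
newline = '\n'
print(len(backslash))  # 1
print(len(newline))    # 1

# splitlines() breaks a string at its new-line characters.
text = 'Using \'control\'\nfor a new line.'
lines = text.splitlines()
print(lines)  # ["Using 'control'", 'for a new line.']
```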
To determine the length of the string, we use the function len. We've already found a few functions in which the name used in Python is a short form of the name we use in pseudocode. The following code
print(len('This is my string.'))
print(len('This is my string. '))
print(len(''))
print(len(' '))
print('This is my string'[1])
gives the output
18
19
0
1
h
Notice that putting two quotation marks right next to each other results in an empty string of length zero. This is not the same as leaving a blank space, as that string has length 1. Counting from zero, the character in position 1 is h.
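Because counting starts from zero, the last character of a string sits at position len minus 1, and asking for position len itself is an error. The sketch below illustrates this; the names s and last are illustrative, and the wording of the error message is left to Python.

```python
s = 'This is my string.'

# Positions run from 0 up to len(s) - 1.
last = len(s) - 1
print(last)     # 17
print(s[0])     # T
print(s[last])  # .

# Position len(s) is one past the end, so indexing there raises IndexError.
try:
    s[len(s)]
except IndexError as error:
    print('IndexError:', error)
```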
For concatenation, Python uses operator overloading. That is, the same symbol used to add two numbers can be used to concatenate two strings, just like in our pseudocode. For example, the code
print('rain' + 'bow')
gives the output
rainbow
What happens when we try to use a string and a number? The code
print("high" + 5)
gives the error
Traceback (most recent call last):
  File "<string>", line 2, in <module>
TypeError: Can't convert 'int' object to str implicitly
Not surprisingly, we get an error. Here, a “Type” error means that after seeing a string followed by the + symbol, the computer expected another string. Instead 5 was read, so it was considered to be the wrong type. What happens if we instead try to multiply a string and a number? This is what happened earlier when we entered a number and tried to multiply it. The code
print("high" * 5)
gives the output
highhighhighhighhigh
We are asking for 5 copies of “high” to be concatenated together.
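If the goal is to attach the digit to the word, one option is to convert the number to a string first with the built-in function str. The sketch below shows that, checks that repetition matches repeated concatenation, and catches the TypeError instead of letting the program stop. The variable names are illustrative, and the exact wording of the error message depends on the Python version.

```python
word = "high"
number = 5

# str() turns the number 5 into the string "5", so + can concatenate.
joined = word + str(number)
print(joined)  # high5

# Repetition gives the same result as concatenating that many copies.
repeated = word * 3
print(repeated == word + word + word)  # True

# Without the conversion, + raises a TypeError.
try:
    word + number
except TypeError as error:
    print("TypeError:", error)
```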
Summary
Here is a summary of what we have seen.
How strings are entered:
Single or double quotation marks.
How strings are displayed:
No quotation marks.
There is a difference between how strings are entered, namely using quotation marks, and how they are displayed, without any.
Caution
Make sure that quotation marks match.
Remember that once you start a string with a particular kind of quotation mark, the computer is trying to match it to one of the same kind.
Caution
Blank spaces may become invisible.
Caution
Don't forget the empty string. It has length zero.
Not being able to see where the string starts and ends may mean that you overlook some blank spaces, or the entire empty string.
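One way to make the ends of a string visible again is the built-in function repr, which is not covered above. It returns the string written the way it would be entered, quotation marks included. The sketch below is a small illustration with made-up variable names.

```python
with_blank = 'Single quotes '
without_blank = 'Single quotes'

# print() hides the difference between the two strings...
print(with_blank)
print(without_blank)

# ...but repr() shows the quotation marks, so the trailing blank is visible.
print(repr(with_blank))     # 'Single quotes '
print(repr(without_blank))  # 'Single quotes'

# The empty string shows up as two quotation marks with nothing between them.
print(repr(''))             # ''
```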
Escape characters
Some of the escape characters in Python are shown in the table below.
Special characters | Escape characters |
---|---|
single quotation mark | \' |
double quotation mark | \" |
backslash | \\ |
new line | \n |
Note that all start with a backslash to let the computer know that something unusual is happening. Another option is to use one kind of quotation mark inside another:
- Single quotation marks can be used inside doubles.
- Double quotation marks can be used inside singles.
Common string functions
Function | Pseudocode | Python |
---|---|---|
string length | length("string") | len("string") |
index | "string"[4] | "string"[4] |
concatenation | "rain" + "bow" | "rain" + "bow" |
repeated concatenation | | "high" * 5 |
The functions we had already seen look much the same in Python as in pseudocode, except for the length function, which is written len.
Caution
Concatenation and repeated concatenation use operator overloading.
We also looked at repeated concatenation. Like regular concatenation, it makes use of operator overloading.
Extracting parts of strings
Python uses what is called the slice operation to form prefixes, suffixes, and substrings of a string.
The slice operation specifies the slicing points for a substring. For example, consider the string “caterpillar”, which has 11 characters. The slicing points range from 0 to 11 and can be pictured as sitting between the characters. Slice point 0 is to the left of the first character, “c”; slice point 1 is between the first and second characters, “c” and “a”. Likewise, slice point 5 is located between the characters “r” and “p”, whose indices are 4 and 5, respectively. Slice point 9 is located between the characters “l” and “a”, whose indices are 8 and 9, respectively.
Then, 'caterpillar'[:5] means to extract characters starting from the first position, 0, until one before position 5, which results in the substring “cater”. 'caterpillar'[9:] means to extract characters starting from position 9 until the last position, which results in the substring “ar”. 'caterpillar'[5:9] means to extract characters starting from position 5 until one before position 9, which results in the substring “pill”.
Part | Result | Python |
---|---|---|
prefix | cater | 'caterpillar'[:5] |
suffix | ar | 'caterpillar'[9:] |
substring | pill | 'caterpillar'[5:9] |
The number to the left of the colon means “cut here and discard everything to the left of the cut”. The number to the right of the colon means “cut here and discard everything to the right of the cut”. To form a prefix, we cut on the right to chop off characters at the end of the string, and to form a suffix, we cut on the left to chop off characters at the beginning of the string.
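A consequence of this "cut and discard" picture is that cutting once at the same slice point gives a prefix and a suffix that fit back together exactly. The sketch below checks this for one cut; the names word, cut, prefix, and suffix are illustrative.

```python
word = 'caterpillar'
cut = 5

# Cut on the right to keep the beginning, cut on the left to keep the end.
prefix = word[:cut]
suffix = word[cut:]
print(prefix)  # cater
print(suffix)  # pillar

# No character is lost or duplicated at the cut.
print(prefix + suffix == word)  # True
```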
Caution
The slice [a:b] does not include the character in position b.
Pay attention to the relationship between the slicing points and the positions. The slice with cuts at positions a and b starts at position a and ends one before position b.
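Since the slice starts at position a and stops one before position b, it contains b minus a characters. The sketch below checks this on the running example; the variable names are illustrative.

```python
word = 'caterpillar'
a, b = 5, 9

piece = word[a:b]
print(piece)  # pill

# The slice holds b - a characters.
print(len(piece) == b - a)  # True

# Its last character comes from position b - 1; position b is left out.
print(piece[len(piece) - 1] == word[b - 1])  # True
print(word[b])  # a
```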
Negative indices and slice points
Let's again consider the string “caterpillar” to understand negative indices and slice points.
The string “caterpillar” has 11 characters. As we have seen earlier, starting from the first character and moving rightwards, the indices 0, 1, 2, ... , 10 refer to the characters “c”, “a”, “t”, ... , “r”, respectively. In contrast, starting from the last character and moving leftwards, the indices -1, -2, -3, ... , -11 refer to the characters “r”, “a”, “l”, ... , “c”, respectively.
The slicing points range from 0 to 11 and can be pictured as sitting between the characters. The first and last slice points, 0 and 11, are the locations to the left of the first character, “c”, and to the right of the last character, “r”, respectively. Starting from the left and moving rightwards, the slice points 1, 2, ... , 10 refer to the locations between the pairs of characters “c” and “a”, “a” and “t”, “t” and “e”, ... , “a” and “r”, respectively. In contrast, starting from the right and moving leftwards, the slice points -1, -2, -3, ... , -10 refer to the locations between the character pairs “a” and “r”, “l” and “a”, “l” and “l”, ... , “c” and “a”, respectively. The slice point -11 refers to the location to the left of the first character, “c”.
Negative indices are a convenient way to avoid subtracting from the length of the string to figure out the indices of characters near the end of the string. The last character can be extracted using index -1, the second to last using index -2, and so on. Similarly, the negative numbers are optional other names for the slice points.
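The subtraction being avoided is length minus k: index -k names the same character as index len minus k. The sketch below checks this for every valid k in the running example; the variable names are illustrative.

```python
word = 'caterpillar'
n = len(word)

# For k = 1, 2, ..., n, index -k and index n - k name the same character.
matches = [word[-k] == word[n - k] for k in range(1, n + 1)]
print(all(matches))  # True

print(word[-1], word[n - 1])    # r r
print(word[-11], word[n - 11])  # c c
```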
Mixing the names together is fine. All of these extract the same substring:
'caterpillar'[5:9]
'caterpillar'[-6:-2]
'caterpillar'[-6:9]
'caterpillar'[5:-2]
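To see why these four agree, each negative slice point can be translated to its non-negative name by adding the length of the string. The sketch below does this with a small helper, as_positive, which is a hypothetical function written for this illustration and not part of Python.

```python
word = 'caterpillar'
n = len(word)

def as_positive(point, length):
    """Return the non-negative name of a slice point (hypothetical helper)."""
    return point + length if point < 0 else point

# -6 and -2 are other names for slice points 5 and 9.
print(as_positive(-6, n), as_positive(-2, n))  # 5 9

# All four spellings extract the same substring.
results = [word[5:9], word[-6:-2], word[-6:9], word[5:-2]]
print(results)  # ['pill', 'pill', 'pill', 'pill']
```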
Caution
Negative 0 is the same as 0.
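This matters when counting backwards from the end: -0 does not mean "zero characters from the end" but simply slice point 0, at the start of the string. The sketch below shows how the pattern of suffixes breaks at zero; the variable name is illustrative.

```python
word = 'caterpillar'

# Suffixes of length 2 and 1 work as expected with negative slice points.
print(word[-2:])  # ar
print(word[-1:])  # r

# -0 is just 0, so this is the whole string, not an empty suffix.
print(word[-0:])  # caterpillar

# Likewise, cutting on the right at -0 discards everything.
print(repr(word[:-0]))  # ''

# As an index, -0 names the first character.
print(word[-0])  # c
```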