Built-in functions (Part 4)

Entering and displaying strings

Let's look at strings in more detail. Remember that the format you use to enter, display, and store data can differ. In most languages, you use single or double quotation marks to indicate to the computer that what you are typing (entering) is a string. For example, you might enter "This is a string." or 'This is a string.'

The way that the computer displays the string may differ, using a different type of quotation mark or none at all, depending on the language. Three possible examples are "This is a string.", 'This is a string.', and This is a string.

How strings are entered (often using single or double quotes):

  • "This is a string."
  • 'This is a string.'

How strings are displayed (language dependent):

  • "This is a string."
  • 'This is a string.'
  • This is a string.
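As an illustration of the difference between entering and displaying, here is a minimal sketch in Python. Python is chosen only as an example language (the lesson itself is language-independent), and the variable names are made up for this sketch.

```python
# Two ways of entering the same string in Python.
double_quoted = "This is a string."
single_quoted = 'This is a string.'

# Both ways of entering produce the same stored value.
same_value = double_quoted == single_quoted

# print() displays the characters with no quotation marks.
print(double_quoted)

# repr() gives the form that Python's interactive prompt echoes back,
# which includes quotation marks.
echoed = repr(double_quoted)
print(echoed)
```

Other languages make different choices about which quotation marks, if any, appear when a string is displayed.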


But if a quotation mark indicates the beginning or end of a string, how can you have a string that contains a quotation mark? In other words, how can you add a quotation mark to a string? Languages have special ways of encoding special characters. These are called "escape characters".

Definition

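An escape character is a character that changes how the characters following it are interpreted, which makes it possible to encode special characters such as quotation marks and newlines.

For example, in the string 'She said \'Hello\' to me.', we have used the backslash, \, as an escape character to encode the single quotes inside a string that is itself enclosed in single quotes.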
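The sketch below shows the backslash at work in Python, used here only as an example language. Escape rules differ between languages, so treat this as one illustration rather than a general rule; the variable names are hypothetical.

```python
# A double quote inside a double-quoted string must be escaped.
quote_in_double = "She said \"Hello\" to me."

# A single quote inside a single-quoted string must be escaped too.
quote_in_single = 'She said \'Hello\' to me.'

# \n encodes a newline, so this one string displays on two lines.
two_lines = "first line\nsecond line"

print(quote_in_double)
print(quote_in_single)
print(two_lines)

# The backslash is not stored: an escaped quote is a single character.
escaped_quote_length = len("\"")
```

Note that the backslash is part of how the string is entered, not part of the stored string.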


Examples of functions on strings

We saw earlier that since a string is a sequence of characters, there is typically a length function to determine the number of characters in the sequence. For example, we can get the length of the string "caterpillar" using length("caterpillar"), which gives 11. We can also specify a particular character by giving its index, that is, its position in the string.

An index is the position of a character in a string, where the position is measured using integers starting from 0.

Character:  c  a  t  e  r  p  i  l  l  a  r
Index:      0  1  2  3  4  5  6  7  8  9  10

In this example, the character in position 4 is "r". Written in index notation, "caterpillar"[4] is the letter "r". Did you notice something unusual in the way each character in the string is indexed?

Counting from zero

For historical reasons, computer scientists count from zero. So don't be surprised to find information being numbered starting from zero. There are actually quite a few advantages to this way of counting, but for now I'll just mention what might trip you up.

Caution

  • The first character is in position 0, not in position 1.
  • Consequently, the position of the last character is one less than the length.
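The two cautions above can be checked with a short Python sketch. Python is an assumption made for illustration: its length function is spelled len rather than length, and it happens to use the same square-bracket index notation as this lesson.

```python
word = "caterpillar"

length = len(word)        # number of characters in the string
first = word[0]           # the first character is at index 0
fifth = word[4]           # index 4 holds the fifth character
last = word[length - 1]   # the last index is one less than the length

# Using the length itself as an index goes one position past the end.
try:
    word[length]
    past_end_is_valid = True
except IndexError:
    past_end_is_valid = False

print(length, first, fifth, last, past_end_is_valid)
```

In Python, indexing past the end raises an error; other languages may behave differently.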

Substrings

It is also useful to be able to extract parts of strings or substrings.

Removing zero or more characters from a string (beginning, end, or both) leaves a substring. A substring that is formed by removing zero or more characters from the end of the string is called a prefix. A substring that is formed by removing zero or more characters from the beginning of the string is called a suffix.

For the string "caterpillar":

  • "cat" is a prefix and a substring,
  • "pillar" is a suffix and a substring,
  • "ill" is a substring that is neither a prefix nor a suffix,
  • "rpi" is a substring that is neither a prefix nor a suffix, and
  • "caterpillar" is a prefix, a suffix, and a substring.

Notice that a substring doesn't have to be a word in English (or any other language, for that matter). Also, notice that the definitions imply that a string is its own substring, prefix, and suffix.
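The "caterpillar" examples above can be reproduced with a sketch in Python, again used only as an example language. Here word[a:b] is Python's slice notation, which takes the characters from index a up to, but not including, index b.

```python
word = "caterpillar"

prefix = word[0:3]    # characters at indexes 0, 1, 2
suffix = word[5:]     # from index 5 to the end
inner_1 = word[6:9]   # characters at indexes 6, 7, 8
inner_2 = word[4:7]   # characters at indexes 4, 5, 6

print(prefix, suffix, inner_1, inner_2)

# Built-in checks for prefix, suffix, and substring.
cat_is_prefix = word.startswith("cat")
pillar_is_suffix = word.endswith("pillar")
ill_is_substring = "ill" in word
ill_is_prefix = word.startswith("ill")

# A string is its own prefix, suffix, and substring.
whole_is_all_three = (
    word.startswith(word) and word.endswith(word) and word in word
)
```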


Tricky parts about using strings

The following are important when using a string:

  • Characters in a string are counted starting from zero.
  • Blank spaces are counted in the length of a string.
  • There exists a string with no characters, namely the empty string.
  • Removing zero characters means that a string is its own substring, prefix, and suffix.
  • Sometimes, symbols used for mathematical operations are also used for strings, but with different meanings.
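Several of the points in this list can be seen in a few lines of Python, used here only as one example language. In Python, + applied to strings means concatenation and * means repetition; other languages may use different symbols or meanings.

```python
# Blank spaces count toward the length.
with_space = "a cat"
length_with_space = len(with_space)   # 'a', ' ', 'c', 'a', 't'

# The empty string has no characters at all.
empty = ""
length_of_empty = len(empty)

# The empty string is a prefix of every string (remove all characters).
empty_is_prefix = "caterpillar".startswith(empty)

# The same symbol means different things for numbers and for strings.
number_sum = 2 + 3                # addition
joined = "cat" + "erpillar"       # concatenation
repeated = "ab" * 3               # repetition

print(length_with_space, length_of_empty, number_sum, joined, repeated)
```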

Example: a simple code

We'll look at a series of examples in which we encode a string, that is, write it in a secret code. This first one is very simple: all we do is exchange the first character with the middle character. We can think of the string as being in four pieces, where we keep the second and fourth pieces as is, but exchange the first and third, which are the first and middle characters.

Forming the new string

We create our coded string by concatenating the four pieces in this order:

  1. middle character,
  2. substring from second to one before the middle,
  3. first character, and
  4. substring from one after the middle to the last.

Notice that every one of these pieces except the first character is described relative to the middle character, so forming the new string relies on knowing the index of the middle character.
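As a preview, the four pieces can be concatenated as in the following Python sketch. This is one possible rendering under stated assumptions, not the lesson's final version: the function name swap_first_and_middle and the parameter mid are hypothetical, and the index of the middle character is passed in by the caller because how to find it is a separate question.

```python
def swap_first_and_middle(text, mid):
    """Exchange the character at index 0 with the character at index mid.

    Assumes 0 < mid < len(text).
    """
    return (
        text[mid]          # 1. middle character
        + text[1:mid]      # 2. second character up to one before the middle
        + text[0]          # 3. first character
        + text[mid + 1:]   # 4. one after the middle up to the last
    )


coded = swap_first_and_middle("caterpillar", 5)
print(coded)

# Applying the same exchange again restores the original string.
decoded = swap_first_and_middle(coded, 5)
print(decoded)
```

When the middle character is the second character (mid equal to 1), the second piece is simply the empty string.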


Finding the middle character

Can we find the position of the middle character by dividing the length of the string by 2? For example, consider the two strings "wolf" and "cat".

Character:  w  o  l  f
Index:      0  1  2  3

If the length of the string is even, there are two characters in the middle. Dividing the length by 2 gives the position of the second of the two, which is good enough. In this example, length("wolf")/2 gives 2, which is the position of the letter "l".

Character:  c  a  t
Index:      0  1  2

If the length of the string is odd, though, dividing the length by 2 does not result in an integer. In this example, length("cat")/2 gives 1.5. Taking the floor of the result might be a good idea: the floor of 1.5 is 1, which is the position of the middle letter "a". We will finish this example when we've learned the syntax of a programming language and then try many more.
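For readers who want a preview, here is a minimal sketch of the floor idea in Python, chosen only as an example language. The function name middle_index is hypothetical. In Python, / always produces a number with a fractional part (so 4 / 2 is 2.0), while // divides and takes the floor in one step.

```python
import math


def middle_index(text):
    """Return the index of the middle character.

    For an even length this is the second of the two middle characters.
    Assumes text is not empty.
    """
    return len(text) // 2


# Ordinary division of an odd length does not give an integer.
plain_division = len("cat") / 2

# Taking the floor explicitly gives the same result as //.
floored = math.floor(len("cat") / 2)

print(plain_division, floored)
print(middle_index("wolf"), "wolf"[middle_index("wolf")])
print(middle_index("cat"), "cat"[middle_index("cat")])
```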