Structuring data (Part 1)

Structuring data

In studying sequences and objects, we've seen two ways of connecting values, in the one case by providing a grouping and an ordering and the other by providing a grouping and functions.

Game board showing its positions. — Images: Nils Prause/iStock/Thinkstock

Wish list

Store items in sequences without mutation.
Organize data using values other than numbers.
Encode information in a grid, such as an image or game board.

What other ways of connecting values might we wish to have? Using a sequence as a starting point, what modifications might we make? One modest wish might to be able to store items in a sequence that is immutable. If we have a sequence of words, we can look up a word by its position in the sequence. But what if we wanted to look up a value using information other than a number? Sequences work well for a list of data, but what if we have information that naturally appears as a grid, like the pixels in an image or positions on a game board?

Organizing data using values other than numbers

Pizza with toppings. — Images: foodandstyle/iStock/Thinkstock

The following are two examples of looking up values using data other than numbers.

Translation from words to numbers:
Instead of looking up a word using a position, which is a number, one might want to look up a number using a word.
Looking up costs of pizza toppings:
There is also the example we looked at long ago, namely of storing the costs of pizza toppings. We might want to use the names of toppings to look up prices.

Encoding information in a grid

A 3 by 4 grid with all squares in the first row, third square in the second row, and the second square of the third and forth rows coloured in black.

Looking in more depth about how to store information in a grid, we consider a very simple image made of black and white pixels, as shown in the figure. The figure consists of a 3 by 4 grid with all squares in the first row, third square in the second row, and the second square of the third and forth rows coloured in black. Each row can be viewed as a sequence of pixels. Sequences can be elements of sequences. To represent the entire image, we can use this fact that elements of sequences can themselves be sequences. So then starting with our rows as sequences, using 1 for black and 0 for white, we can represent the entire image as a sequence of row sequences, going from top to bottom. We store each row in a sequence: image ← [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0]]

We can extract a particular value by using indices twice: print(image[1][2])

Note that image[1] extracts the sequence [0, 0, 1], and then index 2, image[1][2], extracts 1.

Ways to structure data

Bundle values (sequences, objects)
Order values (sequences)
Nest bundles (objects as attributes of objects, sequences of sequences, sequences of objects)
Connect values in multiple ways
Allow changes

As we've seen, ways to structure data include bundling of values, imposing order, nesting, like in the images example where one kind of structure is nested inside another, and allowing multiple ways of values being connected with each other. We can also allow different kinds of changes to what is stored.

Associative arrays

Definition

An associative array allows you to use strings instead of numbers for indices.

One common way of structuring data is using associative arrays, which are a modification on a sequence where arbitrary strings can be used to access elements.

Sequence: index is used to access its elements. — Fig. 1

Associative list: arbitrary strings are used to access its elements. The indices associated with sequences are replaced with arbitrary strings in an associative list. — Fig. 2

For example, as shown in figure 2, consider an associative list called “table”. The associative list contains addresses with labels “zero”, “one”, “two”, “three”, and “four”, which points to addresses with stored values 0, 1, 2, 3, and 4, respectively. Here I've made the values stored in table be numbers, but they really could be any kind of data. In contrast, a sequence called "numbers" is shown in figure 1. It consists of addresses indexed 0, 1, 2, 3, and 4, each of which point to addresses containing the stored values “zero”, “one”, “two”, “three”, and “four”, respectively.

Caution

Not all languages support associative arrays.

Be careful though, not all programming languages support this way of structuring data.

Other data structures

An example of a complex data structure that we will not consider in this course.

The figure shows another example of a way data can be structured, something we won't be considering in the course. There is a lot we won't be considering as this topic is an area of active reserach.

Optional information

Data structures are a subject of research.

How much storage space is needed?
How much time is needed to access an item?
How much time is needed to change an item?

Researchers might ask how much space is needed to store information in a particular way, how much time is required to extract a value, or to change a value.

Sources for structuring data

The following are sources for structuring data.

Use built-ins:
If you have data that you wish to organize, one option is to use a built-in type of structure, like a sequence or an object.
Build your own:
You can use built-ins like building blocks to make your own.
Use libraries:
For many languages, there are publically available archives of code that structure data in many different ways.

Choosing among data structures

With all these options, how can you choose what to use?

Choosing between objects and sequences:

Is order important?
Since both objects and sequences group data, sometimes either one can be used. Ask yourself if order is important. If so, use a sequence.
Is a loop useful?
If you want to use a loop on the data, for many languages a sequence is a better choice.
Is it clearer to give names to items?
If code will be clearer when meaningful names are used for various data items, such as “radius” and “colour”, then an object is a better choice.
Are special functions needed?
Creating methods will be useful if that makes the code clearer.

More generally, there is usually a range of options, going from simple and easy to use choices with limited features to complicated choices with lots of features.

Guiding principle

Use the simplest choice that can get the job done.

You've already figured out how to choose using the simplest choice that can get the job done.