# Types

## Introduction

A data type defines a set of values and a collection of operations that may be performed.

Types are very important to programming languages. They allow us to abstract away the fact that all memory is made of bits.

The type system for a language answers the following questions:

• What types are built in?
• How are types converted?
• How are new types created?
• When are two types considered the same?

## Primitives

Primitives are simple data types consisting of only one value. They normally correspond to a single memory cell on a machine.

##### Integers

Integers come in different sizes which can store different ranges of numbers:

 Size Typical Name Signed Range 8 char or byte -128 to 128 16 short -32768 to 32767 32 int or word -2147483648 to 2147483647 64 long or double word -9223372036854775808 to 9223372036854775807

Some languages define the size of integer primitives (Java) and some do not (C and C++). Some languages have no limits on the size of integers (Python).

Some languages allow unsigned numbers which can be twice as large.

Signed numbers are normally implemented with two's complement notation.

##### Floating Point

Floating point data types model real numbers. They only store approximations.

There are different ways to store floating point numbers. The modern standard is IEEE 754

IEEE 754 floating point numbers store a number called the significand (or mantissa) and an exponent. They also store the sign of the number as one bit.

 Bits Name Significand Bits Exponent Bits 32 Single Precision 23 8 64 Double Precision 52 11
##### Fixed Point

Fixed point numbers are those with a fixed number of decimal positions that can be represented exactly in memory. They are implemented using integers.

##### Characters

Characters represent text characters such as letters, digits or symbols. Because computers can only store numbers, an encoding us used to store characters. ASCII is traditional, but lacks support for non-Western languages.

Unicode has supplanted ASCII as the standard character encoding. Many languages can use both ASCII and Unicode.

##### Booleans

The boolean data type can store one of two values: true or false. They are typically implemented using a byte of memory.

## Conversions

Languages handle type conversions in different ways:

• Automatic.
• Casts.
• Conversion functions.

Types can be coerced or converted. A coercion means that you treat the type as a different one. A conversion means that you actually change it.

## Arrays

Arrays store multiple elements of another data type. The individual elements are accessed with a numerical index.

• All elements in an array must be the same type.
• Indexing can start at 1 or 0.
• Indices can be range-checked.
• Some languages support slices.

Arrays are implemented with a contiguous block of memory. If the array is indexed from 0, the address of individual elements is calculated with:


address = start + (index * size-of-element);


Arrays of arrays, or multi-dimensional arrays, are also supported by most languages. This can be done row-major or column-major.

Some languages allow for arrays to be initialized directly:


int list [] = {1, 3, 5, 7, 9};
list = [x for x in range(0, 10) if x % 2 == 1]


Array operations include:

• Indexing.
• Assignment.
• Concatenation.
• Matrix / vector operations.

## Strings

String types represent a sequence of characters as text.

Strings can be implemented as built-in types, simple arrays of characters, or objects.

Strings can be mutable, or immutable.

Common string operations include:

• Assignment.
• Comparison.
• Concatenation.
• Substring.
• Pattern matching.

## Enumerations

An enumeration is a type where the programmer specifies all possible values. In C-based languages, this looks like:


enum Day {
Monday,
Tuesday,
Wednesday,
Thursday,
Friday,
Saturday,
Sunday
};


They are typically implemented with integers. Some languages allow them to be coerced to/from integers and some do not.

In ML based languages, enumeration types can aid pattern matching:


type day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday

match d with Monday -> // code
| Tuesday -> //code
...


This will give an error if we do not handle every enumerated value.

Many newer languages such as Python and Ruby do not include enumeration types.

Some languages, such as Pascal and Ada, have subrange types which allow us to specify ranges of integers as valid values:


subtype index is Integer range 1..1000;


When a variable of type "index" is declared, it will be checked to ensure it is within that range.

## Data Structures

• Associative Arrays

An associative array is one that maps one type of data to another. For example an associative array can map people's names to phone numbers or book ISBN's to titles etc.

Associative arrays are also called hashes, maps, or dictionaries.

• Tuples

A tuple is similar to a record except that instead of accessing the individual fields by name, they are accessed by position. Unlike arrays and lists, tuples are typically used with an exact, known number of elements. They can be used for quickly grouping multiple items into a single type.

• Lists

Lists represent linked lists. Some languages implement only arrays or linked lists natively and leave the other to libraries. Some languages (such as Ocaml) implement both natively.

## Type Checking

Type checking is the process of making sure that variables and values are used consistently with the type of data they store. Type checking is also extended to functions and expressions.

A type error is a situation where a variable or value is used with an operation that is not supported for that type. The following is a type error in C:


float x = 5.0;
char* name = "Bob";
int comb = x + name;

x.c:8:16: error: invalid operands to binary + (have float and char *)


A language is called strongly typed if it will never allow invalid operations to take place. C is weakly typed, because we can make the compiler do operations like the above:


float x = 5.0;
char* name = "Bob";
int comb = x + (float)((int)name);


Strong/weak typing is not the same as static/dynamic typing:

 Static Dynamic Strong Java, Haskell Python, Ruby Weak C, C++ Perl, Shell languages

## Type Equivalence

Type equivalence refers to the rules on whether deciding two types are equivalent. Consider this code:


struct A {
int x;
float y;
};

struct B {
int t1;
float t2;
};

void print(A a) {
printf("%d, %f\n", a.x, a.y);
}

int main() {
B b;
print(b);
}


Should this code be allowed?

This depends on the type equivalence rules of the language:

• Nominative

Nominative equivalence means that the names of the types must match. This means the above code would not be valid.

• Structural

Structural equivalence means that the structure of types must match. This means would be valid.

Most languages use nominal equivalence. Some exceptions:

• Ocaml uses structural equivalence in its object system.
• Go uses structural equivalence with its interfaces.
• C++ uses structural equivalence when expanding templates. Java generics use nominative equivalence.

C is nominally typed for most constructs (including structs), but its "typedefs" are just aliases so they are structurally typed.

## Type Systems

A languages type system is composed of the following things:

• What data types are available.
• What operations are available for each type.
• How new data types are created.
• The rules for converting and coercing types.
• Whether it is strongly or weakly typed.
• Whether it is dynamically or statically typed.
• What rules are used for type equivalence.

The type system of a language is probably the most important factor in a languages design.