Home CPSC 401

Types

Introduction

A data type defines a set of values and a collection of operations that may be performed.

Types are very important to programming languages. They allow us to abstract away the fact that all memory is made of bits.

The type system for a language answers the following questions:


Primitives

Primitives are simple data types consisting of only one value. They normally correspond to a single memory cell on a machine.

Integers

Integers come in different sizes which can store different ranges of numbers:

SizeTypical NameSigned Range
8char or byte-128 to 128
16short-32768 to 32767
32int or word-2147483648 to 2147483647
64long or double word-9223372036854775808 to 9223372036854775807

Some languages define the size of integer primitives (Java) and some do not (C and C++). Some languages have no limits on the size of integers (Python).

Some languages allow unsigned numbers which can be twice as large.

Signed numbers are normally implemented with two's complement notation.

Floating Point

Floating point data types model real numbers. They only store approximations.

There are different ways to store floating point numbers. The modern standard is IEEE 754

IEEE 754 floating point numbers store a number called the significand (or mantissa) and an exponent. They also store the sign of the number as one bit.

BitsNameSignificand BitsExponent Bits
32Single Precision238
64Double Precision5211
Fixed Point

Fixed point numbers are those with a fixed number of decimal positions that can be represented exactly in memory. They are implemented using integers.

Characters

Characters represent text characters such as letters, digits or symbols. Because computers can only store numbers, an encoding us used to store characters. ASCII is traditional, but lacks support for non-Western languages.

Unicode has supplanted ASCII as the standard character encoding. Many languages can use both ASCII and Unicode.

Booleans

The boolean data type can store one of two values: true or false. They are typically implemented using a byte of memory.


Conversions

Languages handle type conversions in different ways:

Types can be coerced or converted. A coercion means that you treat the type as a different one. A conversion means that you actually change it.


Arrays

Arrays store multiple elements of another data type. The individual elements are accessed with a numerical index.

Arrays are implemented with a contiguous block of memory. If the array is indexed from 0, the address of individual elements is calculated with:


address = start + (index * size-of-element);

Arrays of arrays, or multi-dimensional arrays, are also supported by most languages. This can be done row-major or column-major.

Some languages allow for arrays to be initialized directly:


int list [] = {1, 3, 5, 7, 9};
list = [x for x in range(0, 10) if x % 2 == 1]

Array operations include:


Strings

String types represent a sequence of characters as text.

Strings can be implemented as built-in types, simple arrays of characters, or objects.

Strings can be mutable, or immutable.

Common string operations include:


Enumerations

An enumeration is a type where the programmer specifies all possible values. In C-based languages, this looks like:


enum Day {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday
};

They are typically implemented with integers. Some languages allow them to be coerced to/from integers and some do not.

In ML based languages, enumeration types can aid pattern matching:


type day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday

match d with Monday -> // code
           | Tuesday -> //code
    ...

This will give an error if we do not handle every enumerated value.

Many newer languages such as Python and Ruby do not include enumeration types.

Some languages, such as Pascal and Ada, have subrange types which allow us to specify ranges of integers as valid values:


subtype index is Integer range 1..1000;

When a variable of type "index" is declared, it will be checked to ensure it is within that range.


Data Structures


Type Checking

Type checking is the process of making sure that variables and values are used consistently with the type of data they store. Type checking is also extended to functions and expressions.

A type error is a situation where a variable or value is used with an operation that is not supported for that type. The following is a type error in C:


float x = 5.0;
char* name = "Bob";
int comb = x + name;

x.c:8:16: error: invalid operands to binary + (have float and char *)

A language is called strongly typed if it will never allow invalid operations to take place. C is weakly typed, because we can make the compiler do operations like the above:


float x = 5.0;
char* name = "Bob";
int comb = x + (float)((int)name);

Strong/weak typing is not the same as static/dynamic typing:

 StaticDynamic
StrongJava, HaskellPython, Ruby
WeakC, C++Perl, Shell languages

Type Equivalence

Type equivalence refers to the rules on whether deciding two types are equivalent. Consider this code:


struct A {
    int x;
    float y;
};

struct B {
    int t1;
    float t2;
};

void print(A a) {
    printf("%d, %f\n", a.x, a.y);
}

int main() {
    B b;
    print(b);
}

Should this code be allowed?

This depends on the type equivalence rules of the language:

Most languages use nominal equivalence. Some exceptions:

C is nominally typed for most constructs (including structs), but its "typedefs" are just aliases so they are structurally typed.


Type Systems

A languages type system is composed of the following things:

The type system of a language is probably the most important factor in a languages design.

Copyright © 2018 Ian Finlayson | Licensed under a Creative Commons Attribution 4.0 International License.