Course notes for ESC190 Computer Algorithms and Data Structures
Variables in Python are dynamically typed, which means that it is the interpreter's job to infer the type of a variable when declaring/accessing it. We let go of this abstraction in C, where we must specify the type of the data stored inside a variable.
Below are some examples of types:
Type | Example |
---|---|
int | Integer number |
double | Double-precision floating point number |
char | ASCII character (the basic English character set) |
int * | Address of an int variable |
double * | Address of a double variable |
char * | Address of a char variable |
The last 3 types are known as pointers and are covered in more depth in a future topic. Note that we also haven't addressed strings (or lists, for that matter); these are discussed in the arrays and strings topic.
sizeof
sizeof returns the size occupied in memory by the contents of a variable. It returns a value of type `size_t`. The `size_t` type is defined in a way that guarantees it can count the elements in an array (among other use cases), so it is forced to be as large as the largest unsigned integer type. On my machine, this means that it resembles an `unsigned long long`, which is 64 bits in size. Note that the compiler will give a warning rather than an error when converting a `size_t` to an `int`; this is fine as long as the value does not exceed what 32 bits can hold (about 4 billion; nothing to worry about, hopefully).
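As a short illustration, here is a minimal sketch (the variable names are our own) of `sizeof` and the `%zu` format specifier that matches `size_t`:

```c
#include <stdio.h>

int main(void)
{
    double x = 3.14;
    size_t sz = sizeof(x); /* sizeof evaluates to a size_t */

    printf("x occupies %zu bytes\n", sz); /* %zu is the specifier for size_t */

    int n = (int)sz; /* fine as long as the value fits in an int */
    printf("as an int: %d bytes\n", n);
    return 0;
}
```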
Most computers nowadays use a 64-bit architecture, but this was not always the case! Even before the standardisation of the 32-bit architecture, different computer platforms had different architectures. Note that an architecture's size refers to the maximum register size available. In C, primitive types were introduced in part to abstract away the need to manually account for a specific platform's architecture. This concept is still applicable nowadays, although you should be able to find the same primitive type sizes as shown in the following table.
Primitive Type | Size |
---|---|
character (ASCII) | 1 byte |
integer | 4 bytes |
double | 8 bytes |
float | 4 bytes |
pointer | 8 bytes |
short integer | 2 bytes |
long integer | 4 bytes |
long long integer | 8 bytes |
long double | 8 bytes |
size_t | 8 bytes |
boolean | 1 byte |
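You can check this table on your own machine with `sizeof` (a minimal sketch; some entries genuinely vary between platforms, e.g. long is 8 bytes on 64-bit Linux and macOS but 4 bytes on 64-bit Windows):

```c
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    printf("short:     %zu bytes\n", sizeof(short));
    printf("int:       %zu bytes\n", sizeof(int));
    printf("long:      %zu bytes\n", sizeof(long));
    printf("long long: %zu bytes\n", sizeof(long long));
    printf("double:    %zu bytes\n", sizeof(double));
    printf("pointer:   %zu bytes\n", sizeof(int *));
    printf("bool:      %zu bytes\n", sizeof(bool));
    return 0;
}
```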
unsigned variants of datatypes are of the same size but have a different range of possible values. Note that floats and doubles cannot be unsigned, and that there are no long long doubles or long floats, nor short short integers. The boolean type `bool` is available from the C99 standard onwards and takes on values of either `true` or `false`.
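A minimal sketch of these variants in use (the variable names and values are our own choices for illustration):

```c
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = 4000000000u; /* fits in unsigned int; would overflow a signed 32-bit int */
    bool done = false;            /* bool, true, false come from stdbool.h (C99) */

    if (!done) {
        printf("u = %u\n", u);
        done = true;
    }
    return 0;
}
```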
A literal is a value that is used in the program, but not explicitly defined as a variable.
Integer literals and decimal literals are of type int and double, respectively. Symbols in single quotes are of char type, and strings (in double quotes) are of `const char *` type; these are discussed in the arrays and strings topic.
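For instance, in the following sketch each literal on the right-hand side has the stated type (the variable names and values are our own):

```c
#include <stdio.h>

int main(void)
{
    int i = 42;              /* integer literal: int */
    double d = 2.5;          /* decimal literal: double */
    char c = 'a';            /* character literal: char */
    const char *s = "hello"; /* string literal: const char * */

    printf("%d %f %c %s\n", i, d, c, s);
    return 0;
}
```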
A binary number is a number represented in its base 2 notation. We can interpret it as a sum of powers of 2 where the position of a digit determines its exponent.
Small note: the convention in these notes is to write binary numbers with the most significant bit (MSB), i.e. the largest power of 2, on the left, as is standard. (Endianness, the order in which bytes are stored in memory, is a separate concept: you DO NOT need to study it for ESC190.)
For example, the number \(14_{10}\) is written \(1110_2\) in binary and we can represent it as \(1110_{2} = 1\cdot2^3+1\cdot2^2+1\cdot2^1+0\cdot2^0.\) Recall this is exactly how we decompose numbers in decimal \(14=1\cdot10^1+4\cdot10^0.\)
Let’s take a look now at how we might convert between \(14_{10}\) and \(1110_2\).
Going from right to left, we repeatedly divide by 2 and record the remainders; each remainder is the next binary digit, starting from the smallest power of 2:

  * \(14 \div 2 = 7\) remainder \(0\)
  * \(7 \div 2 = 3\) remainder \(1\)
  * \(3 \div 2 = 1\) remainder \(1\)
  * \(1 \div 2 = 0\) remainder \(1\)

We thus write, from right to left: \(1110_{2}\)
This process can be done in Python too:
```python
def get_decimal_digits(d):
    digits = []
    while d != 0:
        digits = [(d % 10)] + digits  # prepend the last decimal digit
        d //= 10
    return digits

def get_binary_digits(d):
    digits = []
    while d != 0:
        digits = [(d % 2)] + digits  # prepend the remainder, i.e. the next bit
        d //= 2
    return digits

print(get_binary_digits(14))   # [1, 1, 1, 0]
print(get_decimal_digits(521)) # [5, 2, 1]
print(get_binary_digits(25))   # [1, 1, 0, 0, 1]
```
Going from left to right, we can instead subtract decreasing powers of 2: find the largest power of 2 that fits in the number, write a 1, subtract it, and repeat for each smaller power, writing a 0 whenever a power does not fit. For \(14_{10}\): \(2^3 = 8\) fits (write 1, leaving 6), \(2^2 = 4\) fits (write 1, leaving 2), \(2^1 = 2\) fits (write 1, leaving 0), and \(2^0 = 1\) does not fit (write 0), giving \(1110_{2}\). This approach is sketched in C below.
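A minimal C sketch of the left-to-right approach (the function name print_binary and the fixed 8-bit width are our own choices):

```c
#include <stdio.h>

/* Print the binary digits of n from the most significant bit (left)
 * to the least significant bit (right), using a fixed width of 8 bits. */
void print_binary(unsigned int n)
{
    for (int power = 7; power >= 0; power--) {
        unsigned int value = 1u << power; /* 2^power */
        if (n >= value) {
            printf("1");
            n -= value; /* the power of 2 fits, so subtract it */
        } else {
            printf("0");
        }
    }
    printf("\n");
}

int main(void)
{
    print_binary(14); /* prints 00001110 */
    print_binary(25); /* prints 00011001 */
    return 0;
}
```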
What about negative numbers? And floating point numbers? The numbers we dealt with above are called unsigned because we assume they are positive when looking at their binary representation. However, dealing with negative numbers necessarily means using 1 bit of information to communicate whether the number is positive or negative (after all, binary is most useful in digital logic, which is strictly a 2-state system of HIGH or LOW, so it's not practical to write a minus sign in hardware).
In 8-bit binary representations (it's common to use 8 bits because they form a byte), it's common to use the bit of highest value as a sign bit, where 0 means positive and 1 means negative. This leaves 7 bits to represent the magnitude of the number. For example, \(b_7b_6b_5b_4b_3b_2b_1b_0=\begin{cases}+b_6b_5b_4b_3b_2b_1b_0& \text{if } b_7 = 0\\ -b_6b_5b_4b_3b_2b_1b_0&\text{if }b_7=1\end{cases}\)
Note: you might notice that this naive approach would lead us to have two representations of zero (namely, 00000000 and 10000000). We avoid this redundancy with a system called two's complement, where 10000000 instead rolls over to the negative values, starting at -128.
Unsigned Decimal | Two’s Complement Decimal | Binary Representation |
---|---|---|
0 | 0 | 00000000 |
1 | 1 | 00000001 |
… | … | … |
127 | 127 | 01111111 |
128 | -128 | 10000000 |
129 | -127 | 10000001 |
… | … | … |
255 | -1 | 11111111 |
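One way to see this roll-over in practice (a minimal sketch assuming the usual 8-bit two's complement representation; converting an out-of-range value to a signed type is formally implementation-defined, but essentially all modern machines behave as shown):

```c
#include <stdio.h>

int main(void)
{
    unsigned char u = 255;          /* bit pattern 11111111 */
    signed char s = (signed char)u; /* the same 8 bits, read as two's complement */

    printf("unsigned: %d, two's complement: %d\n", u, s); /* prints 255 and -1 */
    return 0;
}
```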
There are many other schemes to represent data using binary, and these are best done through codified standards: for example, IEEE 754 for floating point numbers, and ASCII and Unicode for text.
Unicode is an interesting case where backwards compatibility with ASCII was a hard requirement. To achieve this, the UTF-8 encoding stores the non-extended ASCII characters, which only require 7 bits, in a single byte whose first (most significant) bit is 0. Every byte of a multi-byte UTF-8 character has a 1 as its first bit, so a leading 0 signals to UTF-8 readers that the character is a 1-byte ASCII character, while ASCII readers are none the wiser since the bit string matches their standard.
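A small sketch of this property (the string and its byte values are our own illustration: 'é' is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9):

```c
#include <stdio.h>

int main(void)
{
    /* UTF-8 bytes of "héllo": 'h', then 'é' (0xC3 0xA9), then "llo" */
    const unsigned char text[] = {'h', 0xC3, 0xA9, 'l', 'l', 'o', '\0'};

    for (int i = 0; text[i] != '\0'; i++) {
        if (text[i] & 0x80) /* first bit 1: part of a multi-byte character */
            printf("byte %d: 0x%X (multi-byte UTF-8)\n", i, (unsigned)text[i]);
        else                /* first bit 0: plain 1-byte ASCII */
            printf("byte %d: '%c' (ASCII)\n", i, text[i]);
    }
    return 0;
}
```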