Course notes for ESC190 Computer Algorithms and Data Structures
Motivation: We cannot dynamically add an element to an existing array or block of memory easily, as there could be no space there! The only option we have is to re-allocate everything to a new location with enough space. This is inefficient, especially for larger arrays. Instead, we can use a special data structure called a linked list.
Linked lists are in fact a data structure which implement the list abstract data type (ADT). Recall the definition of a list:
A list is an ordered collection of data items that supports the following operations:
Insert(list, i, x)
: add item x
at index i
of list
. The item previously at index i
, and all following items, are shifted up by one index to make room for x
.Remove(list, i)
: remove item at index i
of list
. The item previously at index i+1
, and all following items, are shifted down by one index to fill the gap.Get(list, i)
: return the item at index i
of list
.A linked list consists of a data structure called a node
, which consists of the following information:
data:
the actual data you need to storenext:
the address of the next nodeWe can then make an array by creating a chain of these nodes, where we can find the next node by following the next
pointer. While this uses up more space than a regular array, it has the benefit of preventing re-allocating everything.
For example, if we want to add an element to the end of the list, we can simply create a new node and set the next
pointer of the last node to point to the new node (+ a minor detail we’ll touch on in a bit).
See the lecture slides for a visual of linked lists and their operations.
We want to implement the following operations: Inserting, Removing, and Searching.
Inserting: Suppose we already have a pointer to a node, and we wish to insert a node after that. What we can do is create this new node, and set its next
pointer to point to the node that the original node pointed to. Then, we can set the original node’s next
pointer to point to the new node we just created.
Notice that we can insert nodes in constant time; this is faster than using an array implementation which would run in linear time.
Removing: Suppose we already have a pointer to a node, and we wish to delete that node. We can do so by setting the next
pointer of the node before the node we want to delete to point to the node after the node we want to delete. Finally, we free up the memory.
Just as with insertion, deletion is constant using a linked list.
Getting: Suppose we wish to find a node (i.e. its pointer) at a certain index (i.e. the i
th node) In the previous 2 operations, we assumed we already had this pointer. We can do so by iterating through the linked list by following the next
pointer until we find the node we want.
Notice here the drawback of using a linked lists versus an array: getting the ith node requires us to walk the linked list until the desired node is found, meaning we run in linear time. Contrast this to an array which gets its elements in constant time through pointers.
We can implement the linked list as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct node{
int data;
struct node *next;
} node;
typedef struct LL{
node *head;
int size;
} LL;
void create_node(node **p_n, int data)
{
*p_n = (node*)malloc(sizeof(node));
(*p_n)->data = data;
(*p_n)->next = NULL;
}
void create_LL_from_data(LL **p_LL, int *data_arr, int size)
{
(*p_LL) = (LL*)malloc(sizeof(LL));
(*p_LL)->size = 0; // initialize size to 0
// keep track of the last node of the linked list
node *tail = 0;
for(int i = 0; i < size; i++){
// n is a pointer to a node with data = data[i], next = NULL
node *n;
create_node(&n, data_arr[i]);
// If the last node is the 1st node
if(tail == 0){
(*p_LL)->head = n;
}
// If the last node is not the 1st node
else{
// append the new node to the end of the linked list
tail->next = n;
}
// update the tail
tail = n;
// update the size
(*p_LL)->size++;
}
}
int main(){
int data_arr[] = {1, 2, 3, 4, 5};
LL *p_LL;
create_LL_from_data(&p_LL, data_arr, 5);
}
Here are some observations:
create_LL_from_data
function is to keep track of the last node of the linked list (known as the tail) and constantly update it such that every time we create a new node, the tail’s next
pointer points to the new node.node *n
will cause the next
pointer to be 0
i.e. NULL
. This is how we can tell we reached the end of a linked list.LL
pointer (i.e. the memory address it points to), rather than just the value of the data it points to.We can implement insertion as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
void LL_insert(LL *my_list, int new_elem, int index)
{
// Create the new node
node *n;
create_node(&n, new_elem);
// If the list is empty
if(my_list->head == NULL){
my_list->head = n;
my_list->size++;
return;
}
// To insert at the very beginning
if(index == 0){
n->next = my_list->head;
my_list->head = n;
my_list->size++;
return;
}
// Errors
if (index < 0 || index > my_list->size-1){
fprintf(stderr, "Invalid index %d", index);
exit(1); // exits the program, returns 1 to the OS
}
// Traverse to the desired index
node *cur = my_list->head;
for(int i = 0; i < index; i++){
cur = cur->next;
}
n->next = cur->next;
cur->next = n;
my_list->size++;
}
Note that we deal with a few edge cases:
NULL
, then we proceed to make the new node the first (and only) element of the linked list.0
corresponds to inserting it at the very beginning. To do this, not only do we need to update the next
pointer of the new node, but the head
pointer of our linked list. The order here is important. If we instead ran my_list->head = n;
first, we lose information on where the original head is located.-1
or "Error"
, but how do we know that this is actually an error and their list doesn’t actually include this!fprintf(stderr, "message")
because it prints to the standard error stream, which is different from the standard output stream. Unlike printf
(which sometimes doesn’t show in terminal), this will always show, and allows regular print statements and error messages to be separated.In the other cases, we first need to traverse to the desired index. Because this is not an array, we need to follow the linked list via the next
pointers. Once we reach the desired index, we can update the next
pointers of the new node and the node before it. Similar to above, the order of updating the next
pointers matter!
The code to remove and get for linked lists can be found here and is very similar to the code for insert. For deletion, we just need to remember to update the next
pointer of the node before the one we are deleting, and freeing up the memory. Like most cases, the order here is important. Getting is actually partially implemented in insertion and deletion, as we need to find the i
th element. Error handling for all three functions are also similar.
Suppose you have a piece of code, that relies on an abstract data structure (i.e. a list). We can define this structure in a header file list.h. Recall that an ADT doesn’t include how it is implemented, just the operations.
We can now implement this ADT in a separate file. For example, we can either implement it using linkedlist.c or arraylist.c. For example, if our list.h
file looks like
1
2
3
4
5
6
7
8
9
#if !defined(LIST_H)
#define LIST_H
void create_list_from_data(void **p_list, int *data_arr, int size);
void list_append(void *list, int new_elem);
void list_insert(void *list, int new_elem, int index);
void list_delete(void *list, int index);
void list_free_all(void *list);
int list_get(void *list, int index);
#endif
Then our linkedlist.c
file needs to contain these functions. If we already have code (i.e. the earlier ones we wrote), we don’t need to modify them. Instead, we can just create a new function that calls them in order to have the right form. For example,
1
2
3
4
5
6
7
8
9
void list_insert(void *list, int new_elem, int index)
{
LL_insert((LL*)list, new_elem, index);
}
int list_get(void *list, int index)
{
return LL_get(list, index);
}
Notice that the functions in list.h
take in void pointers, because we want them to be as general as possible (i.e. could be implemented using either an array or linked list). These additional functions we write in linkedlist.c
are just wrappers. One final note is that although we can cast the void pointer to the correct type (LL*)
, it is not necessary.
We can do this for both linkedlist.c
and arraylist.c
, and when we want to compile our main program, we can choose to compile it with either linkedlist.c
or arraylist.c
. For example, we can run
1
gcc linkedlist.c -c -o linkedlist.o
to create the object file linkedlist.o
. We can then compile main.c
alongside linkedlist.o
to create an executable. We can do this by running
1
gcc main.c linkedlist.o -o main
If we instead wanted to compile using the array list implementation, we can run
1
gcc main.c arraylist.o -o main
This has several advantages:
linkedlist.o
can save compilation time. Although this is not a big problem now, compilation time can be an issue for larger and more complex projects.In this lecture, we will discuss how to implement a linked list in Python. Most of the time, you won’t need to use a linked list in Python (we’ll see an example next Monday), but it is still important to understand how linked lists work, and the work done in this lecture will be useful for implementing other data structures in Python for the rest of this course. The advantage of Python over C is pedagogical here, we want to focus on the data structure and not the specific implementation details.
If you are not familiar with classes, please see the last part of L19. Consider the Node class,
1
2
3
4
5
6
7
class Node:
def __init__(self, value):
self.value = value
self.next = None
def __str__(self):
return f"{self.value}"
By default, we have set self.next = None
to indicate that the next node is not yet defined. We can then create a linked list by creating a series of nodes and linking them together. For example,
1
2
3
4
5
6
7
if __name__ == '__main__':
n1 = Node(12)
n2 = Node(15)
n3 = Node(500)
n1.next = n2
n2.next = n3
Recall that in the Python memory model, values are not copied. Everything is passed by reference. Therefore, something like n1.next = n2
means we make n1.next
refer to the address of n2
(so they are treated as the “same” thing). In C, the analogy would be
1
2
3
node *n1 = create_node(12)
node *n2 = create_node(15)
n1->next = n2;
where the function create_node()
performs the memory allocation for you, which is what Python is doing behind the scenes. We can also make a linked list class and include some functions. For example,
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
class LinkedList:
def __init__(self):
self.head = None
def get_i(self, i):
# return the value at index i
cur = self.head
for j in range(i):
cur = cur.next
return cur.value
def append(self, value):
'''Add a new node with the value value to the end of the list'''
# Create a new node
new_node = Node(value)
print(new_node)
if self.head == None:
self.head = new_node
else:
# Find the last node
cur = self.head
while cur.next != None:
cur = cur.next
cur.next = new_node
def insert(self, value, i):
'''Insert a node with the value value at index i'''
new_node = Node(value)
if i == 0:
new_node.next = self.head
self.head = new_node
else:
cur = self.head
for j in range(i-1):
cur = cur.next
new_node.next = cur.next
cur.next = new_node
def __str__(self):
cur = self.head
s = ""
if(cur == None):
return "Empty list :("
while cur != None:
s += str(cur) + " -> "
cur = cur.next
return s[:-4] # remove last arrow
Note that we didn’t do anything new here. We took the existing code we had in L18 and transformed it into Python equivalents. The only new thing is the __str__
function, which allows us to easily print out the linked list. For this function we have two cases: if the list is empty, we print something, and if the list is non-empty we print out the values of each node, connected with the -> symbol. We can then test our code,
1
2
3
4
5
6
7
if __name__ == '__main__':
LL = LinkedList()
LL.append(3)
LL.append(50)
LL.append(100)
print(LL)