Linked Lists

Course notes for ESC190 Computer Algorithms and Data Structures

Motivation: We cannot dynamically add an element to an existing array or block of memory easily, as there could be no space there! The only option we have is to re-allocate everything to a new location with enough space. This is inefficient, especially for larger arrays. Instead, we can use a special data structure called a linked list.

Linked lists are in fact a data structure which implement the list abstract data type (ADT). Recall the definition of a list:

A list is an ordered collection of data items that supports the following operations:

A linked list consists of a data structure called a node, which consists of the following information:

We can then make an array by creating a chain of these nodes, where we can find the next node by following the next pointer. While this uses up more space than a regular array, it has the benefit of preventing re-allocating everything.
For example, if we want to add an element to the end of the list, we can simply create a new node and set the next pointer of the last node to point to the new node (+ a minor detail we’ll touch on in a bit).

See the lecture slides for a visual of linked lists and their operations.

Linked List Operations

We want to implement the following operations: Inserting, Removing, and Searching.

Insert Operation

Inserting: Suppose we already have a pointer to a node, and we wish to insert a node after that. What we can do is create this new node, and set its next pointer to point to the node that the original node pointed to. Then, we can set the original node’s next pointer to point to the new node we just created.

Notice that we can insert nodes in constant time; this is faster than using an array implementation which would run in linear time.

Remove Operation

Removing: Suppose we already have a pointer to a node, and we wish to delete that node. We can do so by setting the next pointer of the node before the node we want to delete to point to the node after the node we want to delete. Finally, we free up the memory.

Just as with insertion, deletion is constant using a linked list.

Get Operation

Getting: Suppose we wish to find a node (i.e. its pointer) at a certain index (i.e. the ith node) In the previous 2 operations, we assumed we already had this pointer. We can do so by iterating through the linked list by following the next pointer until we find the node we want.

Notice here the drawback of using a linked lists versus an array: getting the ith node requires us to walk the linked list until the desired node is found, meaning we run in linear time. Contrast this to an array which gets its elements in constant time through pointers.

Implementation in C

We can implement the linked list as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct node{
    int data;
    struct node *next;
} node;

typedef struct LL{
    node *head;
    int size;
} LL;

void create_node(node **p_n, int data)
{
    *p_n = (node*)malloc(sizeof(node));
    (*p_n)->data = data;
    (*p_n)->next = NULL;
}


void create_LL_from_data(LL **p_LL, int *data_arr, int size)
{
    (*p_LL) = (LL*)malloc(sizeof(LL));
    (*p_LL)->size = 0; // initialize size to 0
    
    // keep track of the last node of the linked list
    node *tail = 0;

    for(int i = 0; i < size; i++){

        // n is a pointer to a node with data = data[i], next = NULL
        node *n;
        create_node(&n, data_arr[i]);

        // If the last node is the 1st node
        if(tail == 0){
            (*p_LL)->head = n;
        }

        // If the last node is not the 1st node
        else{
            // append the new node to the end of the linked list
            tail->next = n;
        }

        // update the tail
        tail = n;

        // update the size
        (*p_LL)->size++;
    }
}

int main(){
    int data_arr[] = {1, 2, 3, 4, 5};
    LL *p_LL;
    create_LL_from_data(&p_LL, data_arr, 5);
}

Here are some observations:

Insert for Linked List

We can implement insertion as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
void LL_insert(LL *my_list, int new_elem, int index)
{
  // Create the new node 
  node *n;
  create_node(&n, new_elem);

  // If the list is empty
  if(my_list->head == NULL){
    my_list->head = n;
    my_list->size++;
    return;
  }

  // To insert at the very beginning
  if(index == 0){
    n->next = my_list->head;
    my_list->head = n;
    my_list->size++;
    return;
  }

  // Errors
  if (index < 0 || index > my_list->size-1){
    fprintf(stderr, "Invalid index %d", index);
    exit(1); // exits the program, returns 1 to the OS
  }

  // Traverse to the desired index
  node *cur = my_list->head;
  for(int i = 0; i < index; i++){
    cur = cur->next;
  }

  n->next = cur->next;
  cur->next = n;
  my_list->size++;
}

Note that we deal with a few edge cases:

In the other cases, we first need to traverse to the desired index. Because this is not an array, we need to follow the linked list via the next pointers. Once we reach the desired index, we can update the next pointers of the new node and the node before it. Similar to above, the order of updating the next pointers matter!

Remove and Get for Linked List

The code to remove and get for linked lists can be found here and is very similar to the code for insert. For deletion, we just need to remember to update the next pointer of the node before the one we are deleting, and freeing up the memory. Like most cases, the order here is important. Getting is actually partially implemented in insertion and deletion, as we need to find the ith element. Error handling for all three functions are also similar.

Best Practises

Suppose you have a piece of code, that relies on an abstract data structure (i.e. a list). We can define this structure in a header file list.h. Recall that an ADT doesn’t include how it is implemented, just the operations.

We can now implement this ADT in a separate file. For example, we can either implement it using linkedlist.c or arraylist.c. For example, if our list.h file looks like

1
2
3
4
5
6
7
8
9
#if !defined(LIST_H)
#define LIST_H
void create_list_from_data(void **p_list, int *data_arr, int size);
void list_append(void *list, int new_elem);
void list_insert(void *list, int new_elem, int index);
void list_delete(void *list, int index);
void list_free_all(void *list);
int list_get(void *list, int index);
#endif

Then our linkedlist.c file needs to contain these functions. If we already have code (i.e. the earlier ones we wrote), we don’t need to modify them. Instead, we can just create a new function that calls them in order to have the right form. For example,

1
2
3
4
5
6
7
8
9
void list_insert(void *list, int new_elem, int index)
{
    LL_insert((LL*)list, new_elem, index);
}

int list_get(void *list, int index)
{
    return LL_get(list, index);
}

Notice that the functions in list.h take in void pointers, because we want them to be as general as possible (i.e. could be implemented using either an array or linked list). These additional functions we write in linkedlist.c are just wrappers. One final note is that although we can cast the void pointer to the correct type (LL*), it is not necessary.

We can do this for both linkedlist.c and arraylist.c, and when we want to compile our main program, we can choose to compile it with either linkedlist.c or arraylist.c. For example, we can run

1
gcc linkedlist.c -c -o linkedlist.o

to create the object file linkedlist.o. We can then compile main.c alongside linkedlist.o to create an executable. We can do this by running

1
gcc main.c linkedlist.o -o main

If we instead wanted to compile using the array list implementation, we can run

1
gcc main.c arraylist.o -o main

This has several advantages:

Implementation in Python

In this lecture, we will discuss how to implement a linked list in Python. Most of the time, you won’t need to use a linked list in Python (we’ll see an example next Monday), but it is still important to understand how linked lists work, and the work done in this lecture will be useful for implementing other data structures in Python for the rest of this course. The advantage of Python over C is pedagogical here, we want to focus on the data structure and not the specific implementation details.

If you are not familiar with classes, please see the last part of L19. Consider the Node class,

1
2
3
4
5
6
7
class Node:
  def __init__(self, value):
      self.value = value
      self.next = None
  
  def __str__(self):
      return f"{self.value}"

By default, we have set self.next = None to indicate that the next node is not yet defined. We can then create a linked list by creating a series of nodes and linking them together. For example,

1
2
3
4
5
6
7
if __name__ == '__main__':
  n1 = Node(12)
  n2 = Node(15)
  n3 = Node(500)

  n1.next = n2
  n2.next = n3

Recall that in the Python memory model, values are not copied. Everything is passed by reference. Therefore, something like n1.next = n2 means we make n1.next refer to the address of n2 (so they are treated as the “same” thing). In C, the analogy would be

1
2
3
node *n1 = create_node(12)
node *n2 = create_node(15)
n1->next = n2;

where the function create_node() performs the memory allocation for you, which is what Python is doing behind the scenes. We can also make a linked list class and include some functions. For example,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
class LinkedList:
  def __init__(self):
      self.head = None
  
  def get_i(self, i):
      # return the value at index i
      cur = self.head
      for j in range(i):
          cur = cur.next
      return cur.value

  def append(self, value):
      '''Add a new node with the value value to the end of the list'''
      # Create a new node
      new_node = Node(value)
      print(new_node)

      if self.head == None:
          self.head = new_node
      else:
          # Find the last node
          cur = self.head
          while cur.next != None:
              cur = cur.next
          cur.next = new_node
  
  def insert(self, value, i):
      '''Insert a node with the value value at index i'''
      new_node = Node(value)

      if i == 0:
          new_node.next = self.head
          self.head = new_node
      else:
          cur = self.head
          for j in range(i-1):
              cur = cur.next
          new_node.next = cur.next
          cur.next = new_node

  def __str__(self):
      cur = self.head
      s = ""
      if(cur == None):
          return "Empty list :("
      
      while cur != None:
          s += str(cur) + " -> "
          cur = cur.next
      return s[:-4] # remove last arrow

Note that we didn’t do anything new here. We took the existing code we had in L18 and transformed it into Python equivalents. The only new thing is the __str__ function, which allows us to easily print out the linked list. For this function we have two cases: if the list is empty, we print something, and if the list is non-empty we print out the values of each node, connected with the -> symbol. We can then test our code,

1
2
3
4
5
6
7
if __name__ == '__main__':
  LL = LinkedList()
  LL.append(3)
  LL.append(50)
  LL.append(100)

  print(LL)