Dynamic Strings

We’ve seen previously how to declare strings - static, using pointers, as constants - and noticed all approaches share a similar pattern of declaring some fixed size of memory and storing characters. What if we wanted to shorten or lenghthen our string?

This is a luxury we took for granted when programming in Python! We could declare some variable and give it any string we wanted, modifying it liberally without second thought. Armed with pointers and structs, we will attempt to emulate this desired behaviour in C and implement our own improvements!

Functionality Requirements

Let’s concretely define the behaviour we are aiming for:

Implementation

Let’s define a custom string struct mystr as follows

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// my_str.h

#if !defined(MYSTR_H)
#define MYSTR_H
typedef struct mystr{
    char *block;
    int length;   // keep track of the length, we could use a size_t
    int capacity; // keep track of the size of the block
} mystr;

// initialise the string
void mystr_create(mystr **p_p_s);                           // Input: a pointer to apointer of a mystr
                                                            // Effect: set the pointer to the address of a valid mystr
// change a character at an index
void set_char(mystr *p_s, const char character, int index); // Input: a pointer of a mystr
                                                            // Effect: set the value at index to character
                                                            // sidenote - const char is used here because we don't expect
                                                            //            to change the value of character inside the 
                                                            //            function (but this doesn't necessarily mean it 
                                                            //            can't be changed outside of the function
// append to the string
void append_str(mystr *p_s, const char *src);               // Input: a pointer of a mystr and a string literal
                                                            // Effect: appropriately append src to the mystr 
                                                            //         (adjusting the length and capacity as needed)
// These are secret tools we will use later 
void mystr_cat(mystr *p_dest, const mystr *p_src);
void mystr_destroy(mystr *p_s);
mystr mystr_substr(mystr *p_s, int i, int j);

#endif

As you can see, the main difference from a regular dynamically allocated string is that our string will know its length at all times, given that we always store its correct value. We will also go ahead and declare some basic functionality for our string: allocating/freeing and concatenation. Let’s flesh out these functions!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
// mystr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mystr.h"

void mystr_create(mystr **p_p_s)
{
  // In this function, we want to allocate an address to a valid mystr
  // so we need a pointer that points to the address of a valid mystr
  // (as opposed to a pointer that points directly to a valid mystr)
  *p_p_s = (mystr *)malloc(sizeof(mystr));                            // allocate enough memory for a char pointer and 2 integers
  if(*p_p_s == NULL){
    printf("ERROR: failed to allocate block of size 'mystr'");
    exit(1);
  }
  (*p_p_s)->capacity = 100;                                           // some default capacity
  (*p_p_s)->block    = (char *)malloc(sizeof(char) * (*p_p_s)->capacity);
  if(*p_p_s == NULL){
    printf("ERROR: failed to allocate block of size 100 chars");
    exit(1);
  }
  ((*p_p_s)->block)[0] = '\0';
  (*p_p_s)->length   = 0;
}

void set_char(mystr *p_s, const char character, int index)
{
  if((index >= p_s->length) || (index < 0)){
    printf("ERROR: index %d out of bounds in mystr (length: %d)\n", index, p_s->length);
    exit(1);
  }
  (p_s->block)[index] = character;
}

void append_str(mystr *p_s, const char *src)
{
  // We can't simply use strcat because we expect mystr to manage its capacity (while strcat doesn't not offer this)
  if(p_s->capacity < p_s->sz + strlen(src)){
    // We need to increase the capacity to fit the src string
    int new_capacity = p_s->capacity + strlen(src) + 1;
    // If we are increasing the capacity often, we might want to proactively increase the size
    new_capacity *= 2;
    p_s->block = (char *)realloc(p_s->block, new_capacity * sizeof(char));
    if(p_s->block == NULL){  // error checking: did realloc return a valid address?
      printf("ERROR: failed to reallocate memory block of size %d", new_capacity);
      exit(1);
    }
    p_s->capacity = new_capacity;
    p_s->length = p_s->length + strlen(src);  // Note that we could update the size by setting it to the new capacity before multiplying by 2
  }

  strcat(p_s->block, src);
}

Testing our mystr so far

Let’s try to create our string and read its value with our current implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// testing_mystr.c
#include <stdio.h>
#include "mystr.h"

int main()
{
  mystr *p_s;
  mystr_create(&p_s);  // notice we are passing the address of our pointer!
                       // now we have an empty mystr with 100 characters of capacity

  append_str(p_s, "ESC");
  printf("The string is %s\n", p_s->block); // "ESC"

  append_str(p_s, "190");
  printf("The string is %s\n", p_s->block); // "ESC190"

  return 0;
}

We can now compile our code and test out the functionality! To do this, we need to tell the compiler what implementation of mystr.h to use first.

In VSCode, this tasks.json configuration seems to work fine

1
2
3
4
5
6
7
8
9
10
...
"args": [
    "-fdiagnostics-color=always",
    "-g",
    "mystr.c",
    "testing_mystr.c",
    "-o",
    "${fileDirname}/${fileBasenameNoExtension"
],
...

If you’re following along on your favourite terminal, the equivalent command can be run

1
gcc -fdiagnostics-color=always -g -o testing_mystr.exe mystr.c testing_mystr.c

There’s a leak in my string!

Running this code through Valgrind will reveal that there are memory leaks from the mystr allocation. This is because we haven’t freed any of the memory we allocated. To do this, we can implement the void mystr_destroy function.

1
2
3
4
5
6
// mystr.c
void mystr_destroy(mystr *p_s)
{
  free(p_s->block);
  free(p_s);
}

Now we can recompile and see that the memory is properly released before the program terminates. One subtlety to notice here is the order in which we free the memory: if we free the pointer to mystr first then we effectively have no reference to the block it contained and this causes a memory leak of its own (losing access to allocated memory locks that memory block so it counts as consuming memory without control; though modern OS’s can deal with this by destroying the process entirely). Generally, it’s good practice to think about what parts have no dependent memory blocks and free those first. Note also that the integers are not freed because they are not allocated and their memory is released when we free the pointer to mystr.