
# Pointers in C & C++

Today’s languages seem frightened of pointers – for some good reasons. But they’re not that scary!

A long (long!) time ago I answered a question on Stack Exchange, but it is still applicable today:

# Many programmers have a disconnect between a pointer and an array. Here’s an explanation.

### Here is a variable of type `char`, initialised with the value `'A'`:

```c
char c = 'A';
```

`'A'` has the ASCII value of 65, so that means that there’s a byte in memory that holds the value 65. Let’s say that byte in memory is at address `0x2000`. If you were to add `1` to that byte, it would become 66, and if you were to print it as a character, it would print `B` (ASCII 66).
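
As a rough sketch of that arithmetic, assuming an ASCII system: a `char` is just a small integer, so adding 1 to `'A'` gives `'B'`. The helper names `next_char` and `show_char` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the character one code after c. */
char next_char(char c)
{
    return (char)(c + 1);
}

/* Prints a character next to its numeric value. */
void show_char(char c)
{
    printf("%c = %d\n", c, c);
}
```

Calling `show_char(next_char('A'));` prints `B = 66` on an ASCII system.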

### Here is a pointer to a `char`, initialised with the address in memory of `c`:

```c
char *p_c = &c; // Get the address of c into p_c
```

[Note that `char *p_c = c;` would be invalid: you can’t initialise a pointer with a `char`! That’d give `p_c` the value `65` – not what you want!]

On an 8-bit Arduino, `p_c` represents two successive bytes of memory that together store the address of a `char`: in this case the value `0x2000`. (Other processors use bigger pointers, typically four or eight bytes.) Where those two bytes are doesn’t matter for this explanation – but trust that they too have an address…

If you were to change `c` into a ‘Z’, that would change `c`’s contents, but not `p_c`’s (it would still be `0x2000`). But if you were to use `p_c` to look at the byte it was pointing to (`c`), then you’d see the ‘Z’.

• If you were to do a `printf("%c", c);` you’d get a ‘Z’.

`%c` means “I’ve got a character as the next variable in the `printf()` list”.

• If you were to do a `printf("%c", *p_c);` you’d also get a ‘Z’.
• If you were to do a `printf("%c", p_c);` you’d get horrible results.
You lied to `printf()` – you didn’t give it a one-byte character, you gave it a two-byte pointer!
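
The well-behaved calls above can be pulled together into one small sketch. The function name `demo_pointer` is hypothetical, and the `%p` line is an addition of mine: it is the format that matches a pointer, so it is the honest way to print what `p_c` itself holds.

```c
#include <stdio.h>

/* Hypothetical demo: change c directly, then read it back through p_c.
   Returns the char that p_c points to at the end. */
char demo_pointer(void)
{
    char c = 'A';
    char *p_c = &c;              /* p_c holds the address of c */

    c = 'Z';                     /* changes c's contents, not p_c's */

    printf("%c\n", c);           /* prints Z: the char itself */
    printf("%c\n", *p_c);        /* prints Z: the char p_c points to */
    printf("%p\n", (void *)p_c); /* prints the address held in p_c */

    return *p_c;
}
```

The address printed by the last line will differ from machine to machine; only the two `Z` lines are predictable.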

Now let’s move on to arrays.

### Here is a variable of type `char []`, initialised with the value `"John"`.

```c
char s[] = "John";
```

`'J'` has the ASCII value 74, `'o'` 111, `'h'` 104, and `'n'` 110. That means that in memory there are four sequential bytes with the values 74, 111, 104, 110. Let’s say that those bytes are at addresses `0x3000` through `0x3003`.

### And here is another pointer to `char`, this time initialised with the address in memory of `s`.

```c
char *p_s = s; // Point to s with p_s
```

I hate that the C language allows the above. It is one of the roots of the whole array/pointer confusion. It kind-of implies that `p_s` somehow gets all of `"John"` assigned to it. Nope!

A better statement would have been:

```c
char *p_s = &s; // Initialise p_s with the address of s // ILLEGAL!
```

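That gives a MUCH better idea of what’s happening – alas, it’s not how it’s done… (If you want to be that explicit and stay legal, `char *p_s = &s[0];` takes the address of the first character.)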

But also note that neither of the above could possibly store `0x3000-0x3003` (the addresses for all of `s`), so they have to be satisfied with only storing the start of the string: `p_s` will get the value `0x3000`.

Note that `s` does NOT have the value `0x3000` in it. That is its address, like `0x2000` was `c`’s address (it too doesn’t have `0x2000` in it). But `p_s` does have `0x3000` in it – and don’t forget that `p_s` also has its own address (I won’t bother with that here).

### And this is where the “equivalency” (they’re not) between a pointer and an array starts to get people confused:

• `s[2] = 'h';` and `p_s[2] = 'h';` do the same thing – but in different ways!
• The first accesses `0x3002` directly.
• The second needs to find out that `p_s` is `0x3000` and add `2` to it before it can access `0x3002`.
• `*(p_s + 2) = 'h';` is yet a third way to do the same thing.
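
A minimal sketch of those three spellings, checking each write through the other name. The function `three_ways_agree` is made up for this illustration; it returns 1 only if every write landed on the same byte.

```c
#include <string.h>

/* Hypothetical demo: writes to the third character in three ways and
   checks each write through the other name. Returns 1 if all agreed. */
int three_ways_agree(void)
{
    char s[] = "John";
    char *p_s = s;               /* p_s holds the address of s[0] */
    int ok = 1;

    s[2] = 'x';                  /* index the array itself */
    ok = ok && (p_s[2] == 'x');

    p_s[2] = 'y';                /* index through the pointer */
    ok = ok && (s[2] == 'y');

    *(p_s + 2) = 'h';            /* explicit pointer arithmetic */
    ok = ok && (s[2] == 'h');

    return ok && strcmp(s, "John") == 0;
}
```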

So now let’s `printf()` these new variables:

• `printf("%c", *s);` would print `J`.

`s` is not a pointer, but when it is used in an expression like this it is converted to a pointer to its first element, so `*s` means the same as `s[0]`.

• `printf("%c", *p_s);` would print the character at where `p_s` was pointing.
This would print out `J`.
• `printf("%c", s[0]);` would print the character at `s[0]`.
This would print out `J`.
• `printf("%c", s);` would give horrible results.
Again, you lied to `printf()`!
• `printf("%s", s);` would work (but see below).
`%s` means “I’ve got a sequence of characters that I’d like you to print. I’ve provided the address of the first character.”
• `printf("%s", p_s);` would be identical to the previous one.
After all, the same address value was passed!
• `printf("%s", *p_s);` would give horrible results.
Again, you lied to `printf()`!

But note something: you’ve given `printf()` the beginning of the string, so it can start printing. How does `printf()` know where the end of the string is? You didn’t provide the number `4`; it’s not stored anywhere; how can it know when to stop?

That’s where a C convention for strings steps in. By convention, when the compiler sees a string `"between quotes"`, it stores the ASCII values of the characters, and then follows them with a `0`. Not an ASCII `'0'` (with value 48), but an ASCII `NUL` (with value 0). Thus if you looked at address `0x3004` (the byte directly after `John`), you’d see a 0 in that memory byte. That’s how `printf()` knows to stop printing – and yes, many a bug has resulted from forgetting to maintain a `NUL` at the end of a sequence of chars!
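
To make the convention concrete, here is a sketch of the kind of loop that anything consuming a C string has to run: keep going until a `NUL` turns up. The name `count_chars` is hypothetical, and the function assumes it is handed a properly terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the chars up to, but not including, the terminating NUL.
   Assumes str points to a properly terminated string. */
size_t count_chars(const char *str)
{
    size_t n = 0;

    assert(str != NULL);
    while (str[n] != '\0') {     /* stop at the NUL, wherever it is */
        n++;
    }
    return n;
}
```

For `"John"` the loop stops at the fifth byte and returns 4. If the `NUL` is missing, nothing stops the loop from wandering off into whatever memory follows.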

### That then leads me to the `sizeof` operator

`sizeof` looks like a function but is an operator: the compiler works out the answer at compile time, using information that it then knows about (C99 variable-length arrays are the one exception):

• `sizeof(c)` would return `1`.
• `sizeof(p_c)` would return `2` (on an 8-bit Arduino; typically `4` or `8` on larger processors).
• `sizeof(p_s)` would also return the size of a pointer.

It is after all merely a pointer too.

• `sizeof(*p_c)` would return `1`.
• `sizeof(*p_s)` would also return `1`.
It’s only pointing to a single char…
• `sizeof(s)` would return `5`.
The compiler knows what `s` is: an array of 5 bytes (including the final `NUL`).
• Given a pointer, there is no (native) way to find out how large the buffer it is pointing to can be.
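
The last point bites hardest when an array is passed to a function, because the function only receives a pointer. A sketch, with hypothetical function names:

```c
#include <stddef.h>

/* Inside a function the parameter is only a pointer, so sizeof
   reports the pointer's size, whatever buffer the caller passed. */
size_t size_seen_by_callee(char *p)
{
    return sizeof(p);
}

/* Where the array itself is visible, sizeof reports the whole array. */
size_t size_seen_by_owner(void)
{
    char s[] = "John";

    return sizeof(s);            /* four letters plus the NUL */
}
```

This is why functions that fill a buffer usually take its size as a separate argument.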

OK: so if you’ve got a `char []` you can find out its size. Note that that is its maximum size – the string inside it could be a lot shorter! If you did the following:

```c
s[3] = 0; // Overwrite the fourth character (arrays start at 0)
printf("%s", s);
```

[Note a better version of the first line would be `s[3] = '\0';` which says “store the ASCII `NUL` character (NOT `'0'`)” instead of “store the number `0`”. Both put the same byte in memory.]

you’d get `Joh` printed out. Note:

• `sizeof(s)` remains `5`;
• You can’t get the length with `sizeof(*p_s)` (that is still `1`);
• The string inside it is only 3 characters long.

So… if you’ve got a string buffer, or a pointer to one, how do you know how long it is? Count the number of characters until you reach a `NUL` – and there’s already a function to do that for you: `strlen()`.
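
A small sketch of the difference between capacity and length. `truncate_string` is a hypothetical helper, not a library function: it shortens a string in place by writing a `NUL`, then reports the new length with `strlen()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shortens the string in buf to at most keep
   chars by writing a NUL, and returns the resulting length. */
size_t truncate_string(char *buf, size_t keep)
{
    assert(buf != NULL);
    if (strlen(buf) > keep) {
        buf[keep] = '\0';        /* the buffer itself does not shrink */
    }
    return strlen(buf);
}
```

With `char s[] = "John";`, calling `truncate_string(s, 3)` returns 3 while `sizeof(s)` is still 5.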

### Which now leads to the final point about pointers: dynamic allocation of memory.

The array `s` above was sized by the compiler at compile time: it could count the characters, add one for the `NUL`, and reserve the right number of bytes. You could also have written:

```c
char s[10] = "John";
```

which would have reserved 10 bytes, filled the first 5 from the string (including the `NUL`) and set the remaining 5 to zero. The following would be an error:

```c
char s[3] = "John"; // Buffer too small!
```

Scarily, `char s[4] = "John";` works in C… (The compiler says to itself “No room for the `NUL`?! Ah well…”) Better never pass that string to `strlen()` or `printf()` then! C++ is stricter and rejects it.
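
If you can’t be sure a buffer is terminated, one defensive option is to never look further than the buffer’s capacity. The sketch below assumes you know that capacity; `bounded_length` is a made-up name for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of chars before the first NUL, or cap if no
   NUL is found within the first cap bytes of buf. */
size_t bounded_length(const char *buf, size_t cap)
{
    size_t n = 0;

    assert(buf != NULL);
    while (n < cap && buf[n] != '\0') {
        n++;
    }
    return n;
}
```

The result can be handed to `printf()` as a precision so that it stops in time: `printf("%.*s", (int)bounded_length(s, sizeof(s)), s);`.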

What if you don’t know at compile time how big a string is going to be? If you know at runtime, you can ask the system for a lump of memory from the heap – a reserved area that can have chunks taken from it and returned as required. There are two ways to do this:

```cpp
char *p_b = (char *)malloc(size); // Where size is a variable
char *buffer = new char[size];    // Grab 'size' new chars
```

The first calls `malloc()`, passing in the number of desired bytes. If `malloc()` can give those bytes it will reserve them and return a pointer to them. If it can’t it will return `NULL` (a pointer with the value `0`). You should always test the return value! If you don’t, you’ll corrupt things horribly…

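The second uses the more sophisticated `new[]` syntax, which is more object oriented. Notice how `malloc()`’s return value needed to be turned into a `(char *)`? That’s because it doesn’t know what type of pointer it’s returning, so it just returns `void *` (C converts that for you; C++ insists on the cast). `new` does know, so it returns the correct type. Be aware that in standard C++ a failed `new` throws `std::bad_alloc` rather than returning `0`; you only get a null pointer back from the `new (std::nothrow)` form, or from some embedded toolchains that are built without exceptions.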

Once you’ve got either pointer, you use them just like `p_s` before. There is nothing stopping you from accessing outside the boundaries of the buffer – just like there isn’t for `p_s`! Be careful!

Oh, and once you’ve finished with the lump of memory, you need to return it to the heap:

```cpp
free(p_b);
delete [] buffer;
```

If you used `malloc()`, return it with `free()`. If you used `new[]`, return it with `delete[]`.
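
Putting the `malloc()` half of that together, here is a minimal sketch of the whole life cycle: size the request at runtime, test the result, use the memory, and leave it to the caller to `free()` it. The helper name `copy_string` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if
   malloc() fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t size;
    char *p_b;

    assert(src != NULL);
    size = strlen(src) + 1;      /* +1 leaves room for the NUL */
    p_b = (char *)malloc(size);
    if (p_b == NULL) {
        return NULL;             /* no room: tell the caller */
    }
    memcpy(p_b, src, size);      /* copies the NUL as well */
    return p_b;
}
```

A caller would write `char *copy = copy_string("John");`, check that `copy` is not `NULL`, use it, and finish with `free(copy);`.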

### And all of the above is why they invented the `String` object.

It manages the memory; keeps track of the string’s length; lets you access and modify individual characters; lets you resize the string as needed; and gets rid of the used memory once you’re finished with it. No wonder people start to use `String` syntax when they see string pointers…

But `String`s are only useful for `char`s. Arrays and pointers can be of and to anything – including `String`s and (GASP!) pointers! Uh oh – here we go again…
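
As a small taste of that, here is a sketch that walks an array of pointers to `char`, received as a pointer to a pointer. The function `longest_name` is invented for the illustration, and it assumes every entry points to a terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: names points to the first of count pointers,
   each pointing to a NUL-terminated string. Returns the longest
   string, or NULL if count is 0. */
const char *longest_name(const char **names, size_t count)
{
    const char *best = NULL;
    size_t i;

    for (i = 0; i < count; i++) {
        if (best == NULL || strlen(names[i]) > strlen(best)) {
            best = names[i];     /* copies a pointer, not the chars */
        }
    }
    return best;
}
```

Given `const char *names[] = {"John", "Al", "Elizabeth"};`, the call `longest_name(names, 3)` returns a pointer to `"Elizabeth"`.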