How does realloc treat null bytes in strings? - c

Relatively new C programmer here. I am reviewing the following code for a tutorial for a side project I am working on to practice C. The point of the abuf struct is to create a string that can be appended to. Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct abuf {
    char* str;
    unsigned int size;
} abuf;
void abAppend(abuf *ab, const char *s, int len) {
    char *new = realloc(ab->str, ab->size + len);
    if (new == NULL) return;
    memcpy(&new[ab->size], s, len);
    ab->str = new;
    ab->size += len;
}
int main(void) {
    abuf ab = {
        NULL,
        0
    };
    char *s = "Hello";
    abAppend(&ab, s, 5);
    abAppend(&ab, ", world", 7);
    return 0;
}
Everything compiles and my tests (redacted for simplicity) show that the string "Hello" is stored in ab's str pointer, and then "Hello, world" after the second call to abAppend. However, something about this code confuses me. On the initial call to abAppend, the str pointer is null, so realloc, according to its man page, should behave like malloc and allocate 5 bytes of space to store the string. But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string, if I understand this correctly. Isn't this null byte lost if we store "Hello\0" in a malloc-ed container large enough only to store "Hello"?
On the second call to abAppend, we concatenate ", world" to str. The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for. And yet, everything works, and if I test for the null byte with a loop like for (int i = 0; ab.str[i] != '\0'; i++), the loop works fine and increments i 12 times (0 thru 11), and stops, meaning it encountered the null byte on the 13th iteration. What I don't get is why does it encounter the null byte, if we don't allocate space for it?
I tried to break this code by doing weird combinations of strings, to no avail. I also tried to allocate an extra byte in each call to abAppend and changed the function a little to account for the extra space, and it performed the exact same as this version. How the null byte gets processed is eluding me.

> How does realloc treat null bytes in strings?
The behavior of realloc is not affected by the contents of the memory it manages.
> But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string,…
The characters are copied with memcpy(&new[ab->size], s, len);, where len is 5. memcpy copies bytes without regard to whether there is a terminating null byte. Given a length of 5, it copies 5 bytes. It does not append a terminating null character to those.
> The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for.
On the second call to abAppend, 7 more bytes are copied with memcpy, after the first 5 bytes. memcpy is given a length of 7 and copies only 7 bytes.
> … it encountered the null byte on the 13th iteration.
When you tested ab.str[12], you went outside the situations for which the C standard defines the behavior: ab.str[12] is outside the allocated memory. It is possible it contained a null byte solely because nothing else in your process had used that memory for another purpose, and that is why your loop stopped. If you attempted this in the middle of a larger program that had done previous work, that byte might have contained a different value, and your test might have gone awry in a variety of ways.

You're correct that you only initially allocated space for the characters in the string "Hello" but not the terminating null byte, and that the second call only added enough bytes for the characters in the string ", world" with no null terminating byte.
So what you have is an array of characters but not a string since it's not null terminated. If you then attempt to read past the allocated bytes, you trigger undefined behavior, and one of the ways UB can manifest itself is that things appear to work properly.
So you got "lucky" that things happened to work as if you allocated space for the null byte and set it.

Related

Null termination of string not working it seems

In these two versions:
//VERSION 1
char *c=malloc(10);
c[0]='h';
c[1]='i';
c[2]='\0';
c[3]='l';
printf("%s\n",c);
I am getting the expected result i.e. hi is being printed.
Now in this one:
//VERSION 2
char *c;
size_t siz=8;
c=malloc(sizeof(char)*(siz+1)); //char size is 1 byte on system
getline(&c,&siz,stdin);
c[siz]='\0';
printf("%s\n",c);
On inputting the value 'hello world' the output is 'hello world' - I was expecting that it won't print anything after reading 9th byte (it is set to \0).
Why is there difference in the two?
Is it happening because pointer c in version 2 is made to point to stdin and '\0' modification doesn't work that way in a stream? If yes, then why is the compiler not issuing any warning or error?
As you yourself noted in the comments, getline will check the pointer and size arguments to see if it needs to reallocate (or allocate) the buffer in the event the line from the stream exceeds the given buffer's size (a NULL buffer of size 0 being a plain allocation instead of a reallocation). When this happens, both the pointer and the size arguments are changed to match the new buffer (remember, you passed in pointers to the buffer pointer and size arguments, not just the arguments themselves, references not values).
So, in your example, after allocating a buffer of 9 chars (9 bytes in your case), your c pointer is set to some memory with at least 9 bytes available, and siz is still 8. However, after you type a line that, together with its terminating null byte, does not fit in 8 bytes, like "hello world\n", the buffer is reallocated to fit the whole string "hello world\n\0", i.e. at least 13 bytes, AND siz is changed to the new buffer size. So, when getline returns, c points to this new buffer and siz holds its size (at least 13; the exact value is up to the implementation). You don't need to add a null terminator, since getline does it for you (assuming it succeeds). What you are then doing is setting c[siz] to '\0', which writes one byte past the end of the buffer; luckily for you, that didn't trigger any exceptions. The string itself still ends at the terminator getline wrote, which is why the whole line is printed.
For the results you're looking for, keep the original size aside, like in a macro:
#define SIZE 8
char* c;
size_t siz = SIZE;
c = malloc(sizeof(char) * (siz + 1));
getline(&c, &siz, stdin); // a line that doesn't fit in siz bytes (with its '\0') triggers the realloc, and siz will be changed
c[SIZE] = '\0'; // prematurely end the string after 8 characters (c has at least 9 bytes either way)
printf("%s\n", c); // now you'll get shorter strings; getline's return value still tells you the full line length

What happens when I write `char str[80];`?

What happens behind the scenes when I write: char str[80];?
I notice that I can now set str = "hello"; and also str = "hello world"; right afterwards. First time strlen(str) is 5, and second time it is 11.
But why? I thought that after str = "hello";, the char at index 5 becomes null (str[5] becomes '\0'). Doesn't that mean that str's size is now 6 and I shouldn't be able to set it to "hello world"?
And if not, then how does strlen and sizeof calculate the correct values every time?
I think you're getting confused between two different concepts: the allocated length of the array (how much total space is available), and the logical length of the string (how much space is being used).
When you write char str[80], you're getting storage space for 80 characters. You might not end up using all of that space, but regardless of what string you try storing in it, you're always going to have 80 slots into which you can place characters.
If you store the string "hello" into str, then the first six characters of str will be set to h, e, l, l, o, and a null terminating character. This doesn't change the allocated length, though - you still have 74 other slots that you can work with. If you then change it to "hello, world", you're using an extra seven characters, which fits just fine because you easily have enough allocated space to hold things. You've just changed the logical length, how much of that space is being used for meaningful data, but not the allocated length, how much space there is available.
Think of it this way. When you say char str[80], you're buying a plot of land that's, say, 80 acres. If you then put "hello" into it, you're using six acres of that available 80 acres. The rest of the land is still yours - you can build whatever you'd like there - so if you decide to tear everything down and build a longer string that uses up more acres of land, that's fine. No one is going to object.
The strlen function gives back the logical length of the string - how many characters are in the string that you're storing. It works by counting up characters until it finds a null terminator indicating the logical end of the string. The sizeof operator returns the allocated length of the array, how many slots you have. It works at compile-time and doesn't care what the array contents are.
When you declare a variable as char str[80], space for an 80 character array is allocated on the stack. This memory will be automatically released when that particular stack frame is out of scope.
When you copy the string literal "hello" into it (for example with strcpy; C does not let you assign to an array with =), each character is copied into the array, followed by a null terminator at the end of the string (str[5] == '\0'). String length and array size are two different things, which is why you can then copy "hello world" into it. String length is simply how many consecutive characters there are before the null terminator. If you instead declared str as char str[5], copying "hello world" into it would overflow the array, which is undefined behavior and may crash (even "hello" needs 6 bytes once the terminator is counted). It may be helpful to view a simple implementation of strlen:
size_t strlen(const char *str)
{
    size_t return_val = 0;
    while (str[return_val] != '\0') return_val++;
    return return_val;
}
Of course, if there is no null terminating character, the above naive implementation will read past the end of the array; that is undefined behavior and may crash.
I am assuming that you are working in C. When you compile "char str[80];", an 80-character space is allocated for you. sizeof(str) will always tell you that it is an 80-byte chunk of memory. strlen(str) counts the non-zero characters starting at str[0], stopping at the first zero byte. This is why strlen gives 5 for "Hello" and 11 for "Hello world".
I would suggest that you learn to use functions like strnlen, strncpy, strncmp, snprintf ...; this way you can prevent reading/writing beyond the end of the array, for example: strnlen(str, sizeof(str)).
Also start working through online tutorials and find an introductory C/C++ book to learn from.
When you declare an array like char str[80]; 80 chars of space are reserved on the stack for you, but they are not initialized - they get whatever was already in memory at the time. It's your job as the programmer to initialize the array.
strlen does something along these lines:
int strlen(char *s)
{
    int len = 0;
    while(*s++) len++;
    return len;
}
In other words, it returns the length of a null-terminated string in a character array, even if the length is less than the size of the total array.
sizeof returns the size of a type or expression. If your array is 80 chars long, and a char is a byte long, it will return 80, even if none of the values in the array have been initialized. If you had an array of 5 ints, and an int was 4 bytes long, sizeof would produce 20.

Weird behavior of printf() calls after usage of itoa() function

I am brushing up my C skills. I tried the following code for learning the usage of the itoa() function:
#include<stdio.h>
#include<stdlib.h>
void main(){
    int x = 9;
    char str[] = "ankush";
    char c[] = "";
    printf("%s printed on line %d\n",str,__LINE__);
    itoa(x,c,10);
    printf(c);
    printf("\n %s \n",str); //this statement is printing nothing
    printf("the current line is %d",__LINE__);
}
And I got the following output:
ankush printed on line 10
9
//here nothing is printed
the current line is 14
The thing is that if I comment out the statement itoa(x,c,10); I get the statement above printed, with the following output:
ankush printed on line 10
ankush //so i got it printed
the current line is 14
Is this a behavior of itoa(), or am I doing something wrong?
As folks pointed out in the comments, the size of the array represented by the variable c is 1. Since C requires strings to have a null terminator, you can only store a string of length 0 in c. However, when you call itoa, it has no idea that the buffer you're handing it is only 1 character long, so it will happily keep writing digits and the terminator into memory after c (which is likely to be memory that contains str).
To fix this, declare c to be large enough to hold the string you plan to put into it, plus 1 for the null terminator. The most negative 32-bit int, -2147483648, takes 11 characters (a minus sign and 10 digits), so you can use char c[12].
To further explain the memory overwriting situation above, let's suppose that c and str are allocated in contiguous regions on the stack (since they are local variables). So c might occupy memory address 1000 (because it is a zero-character string plus a null terminator), and str would occupy memory addresses 1001 through 1007 (because it has 6 characters plus the null terminator). When you try to write the string "9" into c, the digit 9 is put into memory address 1000 and the null terminator is put in memory address 1001. Since 1001 is the first address of str, str now represents a zero-length string (a null terminator before any other characters). That's why you are getting the blank.
c must be a buffer long enough to hold your number.
Write
char c[20] ;
instead of
char c[] = "";

C memory question

char buffer[10];
strcat(buffer, "hi");
printf("%s", buffer);
In the above code, it prints some weird symbol or number followed by the "hi". I know strcat is appending to buffer, and I normally zero the memory in buffer, but I'm curious why I usually have to do that.
If I do printf("%i", buffer); without the strcat, it just prints a random number. What is that number? Could anyone explain, or link to a tutorial that explains, what is in buffer before I fill it with anything?
"buffer" is a 10 byte region on the stack, and it'd contain whatever was last written to that region of memory. When you strcat, it would concatenate "hi" after the first null byte in that region (So if the first null byte is beyond 10 bytes, you'd overwrite something on the stack). When you print without zeroing, it'd print the bytes until the first 0 (null). Again, this might print beyond the 10 bytes.
When you do printf("%i", buffer), you print the address of that location interpreted as an int (strictly speaking, passing a pointer for %i is undefined behavior).
First up, you need to zero-init the buffer:
char buffer[10] = {0};
or, since strcat only needs buffer to hold an empty string:
buffer[0] = 0; /* As R.. points out, initializing all elements is excessive. */
Second, the number is the address of buffer, as a decimal. If you really want to print that, you are better off printing:
printf("%p\n", (void *)buffer); /* %p expects a void * */
You need a terminating '\0' to mark the end of the string, so use
strcpy(buffer,"hi");
strcat() tries to append to an already existing string which is assumed to be '\0' terminated. Your buffer isn't initialized.
Do a memset(buffer, 0, 10) to zero the memory first, before appending.
> The strcat() function appends the src string to the dest string, overwriting the null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result.
buffer is not guaranteed to be '\0'-terminated: it is not initialized, so we do not know what it contains. Therefore the behavior is undefined. You should first make sure the buffer is '\0'-terminated.
And the number printed is not a random number; it is the starting address of the buffer.

second memcpy() attaches previous memcpy() array to it

I have a little problem here with memcpy()
When I write this
char ipA[15], ipB[15];
size_t b = 15;
memcpy(ipA,line+15,b);
It copies b bytes from array line starting at the 15th element (fine, this is what I want)
memcpy(ipB,line+31,b);
This copies b bytes from line starting at the 31st element, but it also attaches the result of the previous command, i.e. ipA, to it.
Why? ipB's size is 15, so it shouldn't have enough space to copy anything else. What's happening here?
result for ipA is 192.168.123.123
result for ipB becomes 205.123.123.122 192.168.123.123
Where am I wrong? I don't actually know a lot about memory allocation in C.
It looks like you're not null-terminating the strings. The compiler has put the two variables next to one another in memory (judging by your output, ipA comes right after ipB), so string operations on ipB keep going past its 15 bytes into ipA, until they reach whatever 0 byte happens to come next in memory.
Try:
char ipA[16], ipB[16];
size_t b = 15;
memcpy(ipA,line+15,b);
ipA[15] = '\0';
memcpy(ipB,line+31,b);
ipB[15] = '\0';
printf("ipA: %s\nipB: %s\n", ipA, ipB);
This should confirm whether this is the problem. Obviously you could make the code a bit more elegant than my test code above. As an alternative to manually terminating, you could use printf("%.*s\n", (int)b, ipA); or similar to force printf to print the correct number of characters (the precision argument for %.* must be an int, hence the cast).
Are you checking the content of the arrays by doing printf("%s", ipA)? If so, you'll end up with the described effect, since your array is interpreted as a C string, which is not null-terminated. Do this instead: printf("%.*s", (int)sizeof(ipA), ipA)
Character strings in C require a terminating mark. It is the char value 0.
As your two character strings are contiguous in memory, if you don't terminate the first character string, then when reading it, you will continue until memory contains the end-of-string character.
