How is this null terminated? - c

I am aware when using char arrays you must initialise the size to at the length of the string + 1 to account for \0
But if I don't account for \0 how does strchr claim that \0 Is in the char array since this prints "is null terminated"
char mark[4] = "mark";
if(strchr(mark, '\0')) {
puts("Is null terminated.");
} else {
puts("Is not");
}

strchr isn't limited to, or even aware of, the length of the array you're passing to it, so will continue on throughout the rest of memory until it finds what it is looking for.
If you print the value returned by strchr you'll see that it is beyond the end of the array.

char mark[4] = "mark";
This line above can be divided in two:
char mark[4];
sprintf(mark,"mark");
The first line is reserving 4 bytes in the memory, and the mark name will return the address of the first byte.
And the second line (which is equivalent to mark = "mark", but writing during the execution) is writing the string "mark" to the memory, starting at the address named mark, the problem here is that how you are writting a string you will write 5 bytes, so you will write the 4 characters in the 4 bytes reserved to the variable AND will write a NULL (0x00) to the 5th byte, which doesn't belong to the variable!
if you have another variable allocated right after the mark variable, you probably will corrupt it when you write the string, as it will invade this variable address and overwrite it.

Related

How does realloc treat null bytes in strings?

Relatively new C programmer here. I am reviewing the following code for a tutorial for a side project I am working on to practice C. The point of the abuf struct is to create a string that can be appended to. Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct abuf {
char* str;
unsigned int size;
} abuf;
void abAppend(abuf *ab, const char *s, int len) {
char *new = realloc(ab->str, ab->size + len);
if (new == NULL) return;
memcpy(&new[ab->size], s, len);
ab->str = new;
ab->size += len;
}
int main(void) {
abuf ab = {
NULL,
0
};
char *s = "Hello";
abAppend(&ab, s, 5);
abAppend(&ab, ", world", 7);
return 0;
}
Everything compiles and my tests (redacted for simplicity) show that the string "Hello" is stored in ab's str pointer, and then "Hello, world" after the second call to abAppend. However, something about this code confuses me. On the initial call to abAppend, the str pointer is null, so realloc, according to its man page, should behave like malloc and allocate 5 bytes of space to store the string. But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string, if I understand this correctly. Isn't this null byte lost if we store "Hello\0" in a malloc-ed container large enough only to store "Hello"?
On the second call to abAppend, we concatenate ", world" to str. The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for. And yet, everything works, and if I test for the null byte with a loop like for (int i = 0; ab.str[i] != '\0'; i++), the loop works fine and increments i 12 times (0 thru 11), and stops, meaning it encountered the null byte on the 13th iteration. What I don't get is why does it encounter the null byte, if we don't allocate space for it?
I tried to break this code by doing weird combinations of strings, to no avail. I also tried to allocate an extra byte in each call to abAppend and changed the function a little to account for the extra space, and it performed the exact same as this version. How the null byte gets processed is eluding me.
How does realloc treat null bytes in strings?
The behavior of realloc is not affected by the contents of the memory it manages.
But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string,…
The characters are copied with memcpy(&new[ab->size], s, len);, where len is 5. memcpy copies characters without regard to whether there is a terminating null byte. Given length of 5, it copies 5 bytes. It does not append a terminating null character to those.
The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for.
On the second called to abAppend, 7 more bytes are copied with memcpy, after the first 5 bytes. memcpy is given a length of 7 and copies only 7 bytes.
… it encountered the null byte on the 13th iteration.
When you tested ab.str[12], you exceeded the rules for which the C standard defines the behavior. ab.str[12] is outside the allocated memory. It is possible it contained a null byte solely because nothing else in your process had used that memory for another purpose, and that is why your loop stopped. If you attempted this in the middle of a larger program that had done previous work, that byte might have contained a different value, and your test might have gone awry in a variety of ways.
You're correct that you only initially allocated space for the characters in the string "Hello" but not the terminating null byte, and that the second call only added enough bytes for the characters in tge string ", world" with no null terminating byte.
So what you have is an array of characters but not a string since it's not null terminated. If you then attempt to read past the allocated bytes, you trigger undefined behavior, and one of the ways UB can manifest itself is that things appear to work properly.
So you got "lucky" that things happened to work as if you allocated space for the null byte and set it.

Weird behavior of printf() calls after usage of itoa() function

I am brushing up my C skills.I tried the following code for learning the usage of itoa() function:
#include<stdio.h>
#include<stdlib.h>
void main(){
int x = 9;
char str[] = "ankush";
char c[] = "";
printf("%s printed on line %d\n",str,__LINE__);
itoa(x,c,10);
printf(c);
printf("\n %s \n",str); //this statement is printing nothing
printf("the current line is %d",__LINE__);
}
and i got the following output:
ankush printed on line 10
9
//here nothing is printed
the current line is 14
The thing is that if i comment the statement itoa(x,c,10); from the code i get the above mentioned statement printed and got the following output:
ankush printed on 10 line
ankush //so i got it printed
the current line is 14
Is this a behavior of itoa() or i am doing something wrong.
Regards.
As folks pointed out in the comments, the size of the array represented by the variable c is 1. Since C requires strings have a NULL terminator, you can only store a string of length 0 in c. However, when you call itoa, it has no idea that the buffer you're handing it is only 1 character long, so it will happily keep writing out digits into memory after c (which is likely to be memory that contains str).
To fix this, declare c to be of a size large enough to handle the string you plan to put into it, plus 1 for the NULL terminator. The largest value a 32-bit int can hold is 10 digits long, so you can use char c[11].
To further explain the memory overwriting situation above, let's consider that c and str are allocated in contiguous regions on the stack (since they are local variables). So c might occupy memory address 1000 (because it is a zero character string plus a NULL terminator), and str would occupy memory address 1001 through 1008 (because it has 6 characters, plus the NULL terminator). When you try to write the string "9" into c, the digit 9 is put into memory address 1000 and the NULL terminator is put in memory address 1001. Since 1001 is the first address of str, str now represents a zero-length string (NULL terminator before any other characters). That's why you are getting the blank.
c must be a buffer long enough to hold your number.
Write
char c[20] ;
instead of
char c[] = "";

Malloc adding one to initializing a char for string concat C

I found this function on stackoverflow which concates two strings together. Here is the function:
char* concatstring(char *s1,char *s2)
{
char *result = malloc(strlen(s1)+strlen(s2)+1);
strcpy(result,s1);
strcat(result,s2);
return result;
}
My question is, why do we add 1 to the malloc call?
It's because in C "strings" are stored as arrays of chars followed by a null byte. This is by convention. Consequently, null bytes may not appear inside any C string.
However, the actual string itself does not contain the null byte (which is just part of the representation of the string), and so strlen reports the number of non-null bytes in the string. To create a C string that is the result of concatenating two strings, you thus need to leave room for the null terminator.
In fact, every string operation one way or another needs to deal with the null terminator. Unfortunately, the details vary from function to function (e.g. snprintf does it right, but strncpy is dangerously different), and you should read each function's manual very carefully to understand who takes care of the null terminator and how.
You need to allocate space for the '\0' (NULL character) which is used to terminate strings in C.
i.e. the string "cat" is actually "cat\0".
If the string is "cat":
char * mystring = "cat";
Then strlen(mystring), would return 3.
But in reality it takes 4 bytes to store mystring, with one byte to store null character.
So if you have two strings, "dog" and "cat", their length will be 3 and 3 , although the number of bytes required to store them would be 4 each. The memory required to store their concatenation would be 3+3 +1 = 7.
So the 1 in malloc is to allocate extra byte to store the null character.

C memory question

char buffer[10];
strcat(buffer, "hi");
printf("%s", buffer);
In the above code, it prints some weird symbol or number followed by the "hi", I know strcat is appending to buffer. And I normally zero the memory in buffer. But i'm curious why I usually have to do that.
If i do printf("%i", buffer); without the strcat, it just prints a random number. What is that number? Could anyone explain or link to a tut that explains what is in buffer, before I fill it with anything?
"buffer" is a 10 byte region on the stack, and it'd contain whatever was last written to that region of memory. When you strcat, it would concatenate "hi" after the first null byte in that region (So if the first null byte is beyond 10 bytes, you'd overwrite something on the stack). When you print without zeroing, it'd print the bytes until the first 0 (null). Again, this might print beyond the 10 bytes.
When you printf (%I, buffer), you print the address of that location.
First up, you need to zero-init the buffer:
char buffer[10] = {0};
buffer[0] = 0; /* As R.. points out, initializing all elements is excessive. */
Second, the number is the address of buffer, as a decimal. If you really want to print that, you are better off printing:
printf("%p\n", buffer);
You need a terminating '\0' to mark the end of the string,
use
strcpy(buffer,"hi");
strcat() tries to append to an already existing string which is assumed to be '\0' terminated. Your buffer isn't initialized.
do a memset(buffer, 0, 10) to zero the memory first, before appending.
The strcat() function appends the src string to the dest
string, overwriting the null byte ('\0') at the end of dest,
and then adds a terminating null byte. The strings may not
overlap, and the dest string must have enough space for the
result.
buffer is not '\0' terminated, as it is not initialized, we do not know what it contains. Therefore it is an undefined behavior. You should first make sure the buffer is '\0' terminated.
And the number printed is not a random number it is the starting address of the buffer.

second memcpy() attaches previous memcpy() array to it

I have a little problem here with memcpy()
When I write this
char ipA[15], ipB[15];
size_t b = 15;
memcpy(ipA,line+15,b);
It copies b bytes from array line starting at 15th element (fine, this is what i want)
memcpy(ipB,line+31,b);
This copies b bytes from line starting at 31st element, but it also attaches to it the result for previous command i.e ipA.
Why? ipB size is 15, so it shouldnt have enough space to copy anything else. whats happening here?
result for ipA is 192.168.123.123
result for ipB becomes 205.123.123.122 192.168.123.123
Where am I wrong? I dont actually know alot about memory allocation in C.
It looks like you're not null-terminating the string in ipA. The compiler has put the two variables next to one another in memory, so string operations assume that the first null terminator is sometime after the second array (whenever the next 0 occurs in memory).
Try:
char ipA[16], ipB[16];
size_t b = 15;
memcpy(ipA,line+15,b);
ipA[15] = '\0';
memcpy(ipB,line+31,b);
ipB[15] = '\0';
printf("ipA: %s\nipB: %s\n", ipA, ipB)
This should confirm whether this is the problem. Obviously you could make the code a bit more elegant than my test code above. As an alternative to manually terminating, you could use printf("%.*s\n", b, ipA); or similar to force printf to print the correct number of characters.
Are you checking the content of the arrays by doing printf("%s", ipA) ? If so, you'll end up with the described effect since your array is interpreted as a C string which is not null terminated. Do this instead: printf("%.*s", sizeof(ipA), ipA)
Character strings in C require a terminating mark. It is the char value 0.
As your two character strings are contiguous in memory, if you don't terminate the first character string, then when reading it, you will continue until memory contains the end-of-string character.

Resources