C memory question - c

char buffer[10];
strcat(buffer, "hi");
printf("%s", buffer);
In the above code, it prints some weird symbol or number followed by the "hi", I know strcat is appending to buffer. And I normally zero the memory in buffer. But i'm curious why I usually have to do that.
If i do printf("%i", buffer); without the strcat, it just prints a random number. What is that number? Could anyone explain or link to a tut that explains what is in buffer, before I fill it with anything?

"buffer" is a 10 byte region on the stack, and it'd contain whatever was last written to that region of memory. When you strcat, it would concatenate "hi" after the first null byte in that region (So if the first null byte is beyond 10 bytes, you'd overwrite something on the stack). When you print without zeroing, it'd print the bytes until the first 0 (null). Again, this might print beyond the 10 bytes.
When you printf (%I, buffer), you print the address of that location.

First up, you need to zero-init the buffer:
char buffer[10] = {0};
buffer[0] = 0; /* As R.. points out, initializing all elements is excessive. */
Second, the number is the address of buffer, as a decimal. If you really want to print that, you are better off printing:
printf("%p\n", buffer);

You need a terminating '\0' to mark the end of the string,
use
strcpy(buffer,"hi");
strcat() tries to append to an already existing string which is assumed to be '\0' terminated. Your buffer isn't initialized.

do a memset(buffer, 0, 10) to zero the memory first, before appending.

The strcat() function appends the src string to the dest
string, overwriting the null byte ('\0') at the end of dest,
and then adds a terminating null byte. The strings may not
overlap, and the dest string must have enough space for the
result.
buffer is not '\0' terminated, as it is not initialized, we do not know what it contains. Therefore it is an undefined behavior. You should first make sure the buffer is '\0' terminated.
And the number printed is not a random number it is the starting address of the buffer.

Related

How does realloc treat null bytes in strings?

Relatively new C programmer here. I am reviewing the following code for a tutorial for a side project I am working on to practice C. The point of the abuf struct is to create a string that can be appended to. Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct abuf {
char* str;
unsigned int size;
} abuf;
void abAppend(abuf *ab, const char *s, int len) {
char *new = realloc(ab->str, ab->size + len);
if (new == NULL) return;
memcpy(&new[ab->size], s, len);
ab->str = new;
ab->size += len;
}
int main(void) {
abuf ab = {
NULL,
0
};
char *s = "Hello";
abAppend(&ab, s, 5);
abAppend(&ab, ", world", 7);
return 0;
}
Everything compiles and my tests (redacted for simplicity) show that the string "Hello" is stored in ab's str pointer, and then "Hello, world" after the second call to abAppend. However, something about this code confuses me. On the initial call to abAppend, the str pointer is null, so realloc, according to its man page, should behave like malloc and allocate 5 bytes of space to store the string. But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string, if I understand this correctly. Isn't this null byte lost if we store "Hello\0" in a malloc-ed container large enough only to store "Hello"?
On the second call to abAppend, we concatenate ", world" to str. The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for. And yet, everything works, and if I test for the null byte with a loop like for (int i = 0; ab.str[i] != '\0'; i++), the loop works fine and increments i 12 times (0 thru 11), and stops, meaning it encountered the null byte on the 13th iteration. What I don't get is why does it encounter the null byte, if we don't allocate space for it?
I tried to break this code by doing weird combinations of strings, to no avail. I also tried to allocate an extra byte in each call to abAppend and changed the function a little to account for the extra space, and it performed the exact same as this version. How the null byte gets processed is eluding me.
How does realloc treat null bytes in strings?
The behavior of realloc is not affected by the contents of the memory it manages.
But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string,…
The characters are copied with memcpy(&new[ab->size], s, len);, where len is 5. memcpy copies characters without regard to whether there is a terminating null byte. Given length of 5, it copies 5 bytes. It does not append a terminating null character to those.
The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for.
On the second called to abAppend, 7 more bytes are copied with memcpy, after the first 5 bytes. memcpy is given a length of 7 and copies only 7 bytes.
… it encountered the null byte on the 13th iteration.
When you tested ab.str[12], you exceeded the rules for which the C standard defines the behavior. ab.str[12] is outside the allocated memory. It is possible it contained a null byte solely because nothing else in your process had used that memory for another purpose, and that is why your loop stopped. If you attempted this in the middle of a larger program that had done previous work, that byte might have contained a different value, and your test might have gone awry in a variety of ways.
You're correct that you only initially allocated space for the characters in the string "Hello" but not the terminating null byte, and that the second call only added enough bytes for the characters in tge string ", world" with no null terminating byte.
So what you have is an array of characters but not a string since it's not null terminated. If you then attempt to read past the allocated bytes, you trigger undefined behavior, and one of the ways UB can manifest itself is that things appear to work properly.
So you got "lucky" that things happened to work as if you allocated space for the null byte and set it.

How is this null terminated?

I am aware when using char arrays you must initialise the size to at the length of the string + 1 to account for \0
But if I don't account for \0 how does strchr claim that \0 Is in the char array since this prints "is null terminated"
char mark[4] = "mark";
if(strchr(mark, '\0')) {
puts("Is null terminated.");
} else {
puts("Is not");
}
strchr isn't limited to, or even aware of, the length of the array you're passing to it, so will continue on throughout the rest of memory until it finds what it is looking for.
If you print the value returned by strchr you'll see that it is beyond the end of the array.
char mark[4] = "mark";
This line above can be divided in two:
char mark[4];
sprintf(mark,"mark");
The first line is reserving 4 bytes in the memory, and the mark name will return the address of the first byte.
And the second line (which is equivalent to mark = "mark", but writing during the execution) is writing the string "mark" to the memory, starting at the address named mark, the problem here is that how you are writting a string you will write 5 bytes, so you will write the 4 characters in the 4 bytes reserved to the variable AND will write a NULL (0x00) to the 5th byte, which doesn't belong to the variable!
if you have another variable allocated right after the mark variable, you probably will corrupt it when you write the string, as it will invade this variable address and overwrite it.

Null termination of char array

Consider following case:
#include<stdio.h>
int main()
{
char A[5];
scanf("%s",A);
printf("%s",A);
}
My question is if char A[5] contains only two characters. Say "ab", then A[0]='a', A[1]='b' and A[2]='\0'.
But if the input is say, "abcde" then where is '\0' in that case. Will A[5] contain '\0'?
If yes, why?
sizeof(A) will always return 5 as answer. Then when the array is full, is there an extra byte reserved for '\0' which sizeof() doesn't count?
If you type more than four characters then the extra characters and the null terminator will be written outside the end of the array, overwriting memory not belonging to the array. This is a buffer overflow.
C does not prevent you from clobbering memory you don't own. This results in undefined behavior. Your program could do anything—it could crash, it could silently trash other variables and cause confusing behavior, it could be harmless, or anything else. Notice that there's no guarantee that your program will either work reliably or crash reliably. You can't even depend on it crashing immediately.
This is a great example of why scanf("%s") is dangerous and should never be used. It doesn't know about the size of your array which means there is no way to use it safely. Instead, avoid scanf and use something safer, like fgets():
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
Example:
if (fgets(A, sizeof A, stdin) == NULL) {
/* error reading input */
}
Annoyingly, fgets() will leave a trailing newline character ('\n') at the end of the array. So you may also want code to remove it.
size_t length = strlen(A);
if (A[length - 1] == '\n') {
A[length - 1] = '\0';
}
Ugh. A simple (but broken) scanf("%s") has turned into a 7 line monstrosity. And that's the second lesson of the day: C is not good at I/O and string handling. It can be done, and it can be done safely, but C will kick and scream the whole time.
As already pointed out - you have to define/allocate an array of length N + 1 in order to store N chars correctly. It is possible to limit the amount of characters read by scanf. In your example it would be:
scanf("%4s", A);
in order to read max. 4 chars from stdin.
character arrays in c are merely pointers to blocks of memory. If you tell the compiler to reserve 5 bytes for characters, it does. If you try to put more then 5 bytes in there, it will just overwrite the memory past the 5 bytes you reserved.
That is why c can have serious security implementations. You have to know that you are only going to write 4 characters + a \0. C will let you overwrite memory until the program crashes.
Please don't think of char foo[5] as a string. Think of it as a spot to put 5 bytes. You can store 5 characters in there without a null, but you have to remember you need to do a memcpy(otherCharArray, foo, 5) and not use strcpy. You also have to know that the otherCharArray has enough space for those 5 bytes.
You'll end up with undefined behaviour.
As you say, the size of A will always be 5, so if you read 5 or more chars, scanf will try to write to a memory, that it's not supposed to modify.
And no, there's no reserved space/char for the \0 symbol.
Any string greater than 4 characters in length will cause scanf to write beyond the bounds of the array. The resulting behavior is undefined and, if you're lucky, will cause your program to crash.
If you're wondering why scanf doesn't stop writing strings that are too long to be stored in the array A, it's because there's no way for scanf to know sizeof(A) is 5. When you pass an array as the parameter to a C function, the array decays to a pointer pointing to the first element in the array. So, there's no way to query the size of the array within the function.
In order to limit the number of characters read into the array use
scanf("%4s", A);
There isn't a character that is reserved, so you must be careful not to fill the entire array to the point it can't be null terminated. Char functions rely on the null terminator, and you will get disastrous results from them if you find yourself in the situation you describe.
Much C code that you'll see will use the 'n' derivatives of functions such as strncpy. From that man page you can read:
The strcpy() and strncpy() functions return s1. The stpcpy() and
stpncpy() functions return a
pointer to the terminating `\0' character of s1. If stpncpy() does not terminate s1 with a NUL
character, it instead returns a pointer to s1[n] (which does not necessarily refer to a valid mem-
ory location.)
strlen also relies on the null character to determine the length of a character buffer. If and when you're missing that character, you will get incorrect results.
the null character is used for the termination of array. it is at the end of the array and shows that the array is end at that point. the array automatically make last character as null character so that the compiler can easily understand that the array is ended.
\0 is an terminator operator which terminates itself when array is full
if array is not full then \0 will be at the end of the array
when you enter a string it will read from the end of the array

C Sprintf() appends junk characters

I tried to use sprintf to append a int, string and an int.
sprintf(str,"%d:%s:%d",count-1,temp_str,start_id);
Here, the value of start_id is always the same. The value of temp_str which is a char * increases every time. I get correct output for some time and then my sprintf starts printing junk characters between temp_str and startid. So my str get corrupted.
Can anyone explain this behavior ?
example
at count 11
11:1:2:3:1:2:3:1:2:3:1:21:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2
at count 8
8:1:2:3:1:2:3:1:2:3:1:21:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1:2:3:1�:2
I don't understand why and how "�" is appended to the string
Either temp_str is not null-terminated at some point or you've blown the buffer for str and some other memory access is affecting it.
Without seeing the code, it's a little hard to tell but, if you double the size of str and the problem behaviour changes, then it's probably the latter.
1> try to memset your str buffer with 0 befor using sprintf
2> The value of temp_str which is a char * increases every time
what do u mean by this ?
this should be normal charachter pointer which will point some string and that string should be null terminated and tha will be copied to str
3> the total size by combing all three argument should not be exceed the size of str buffer
It looks like the string temp_str isn't NUL-terminated. You can either terminate it before the call to sprintf, or if you know the length you want to print, use the %.*s formatting operator like this:
int str_len = ...; // Calculate length of temp_str
sprintf(str, "%d:%.*s:%d", count-1, str_len, temp_str, start_id);
You are running off the end of temp_str. Check your bounds and make sure it's null terminated. Stop incrementing sun you get to the end.

second memcpy() attaches previous memcpy() array to it

I have a little problem here with memcpy()
When I write this
char ipA[15], ipB[15];
size_t b = 15;
memcpy(ipA,line+15,b);
It copies b bytes from array line starting at 15th element (fine, this is what i want)
memcpy(ipB,line+31,b);
This copies b bytes from line starting at 31st element, but it also attaches to it the result for previous command i.e ipA.
Why? ipB size is 15, so it shouldnt have enough space to copy anything else. whats happening here?
result for ipA is 192.168.123.123
result for ipB becomes 205.123.123.122 192.168.123.123
Where am I wrong? I dont actually know alot about memory allocation in C.
It looks like you're not null-terminating the string in ipA. The compiler has put the two variables next to one another in memory, so string operations assume that the first null terminator is sometime after the second array (whenever the next 0 occurs in memory).
Try:
char ipA[16], ipB[16];
size_t b = 15;
memcpy(ipA,line+15,b);
ipA[15] = '\0';
memcpy(ipB,line+31,b);
ipB[15] = '\0';
printf("ipA: %s\nipB: %s\n", ipA, ipB)
This should confirm whether this is the problem. Obviously you could make the code a bit more elegant than my test code above. As an alternative to manually terminating, you could use printf("%.*s\n", b, ipA); or similar to force printf to print the correct number of characters.
Are you checking the content of the arrays by doing printf("%s", ipA) ? If so, you'll end up with the described effect since your array is interpreted as a C string which is not null terminated. Do this instead: printf("%.*s", sizeof(ipA), ipA)
Character strings in C require a terminating mark. It is the char value 0.
As your two character strings are contiguous in memory, if you don't terminate the first character string, then when reading it, you will continue until memory contains the end-of-string character.

Resources