memcpy overlapping not understandable [duplicate] - c

I'm playing with memcpy to get a better feel for how it works, and I've run into things I can't understand.
I started with a very simple piece of code:
char str [] = "0123456789abcdef";
memcpy(str + 5, str, 5);
puts(str);// prints 0123401234abcdef
That is perfectly understandable to me. Then I moved on:
char str [] = "0123456789abcdef";
memcpy(str + 5, str, 6);
puts(str); // 01234012340bcdef
At first I expected the output to be 01234012345bcdef, assuming the function would take the first six characters, but it started from 0 again. OK, I thought, it probably takes characters from the already modified string. Passing 7, as in memcpy(str + 5, str, 7);, confirmed this assumption, because it produced 012340123401cdef.
But then things became less clear. If I call memcpy(str + 5, str, 8); it outputs 0123401234567def, just as I expected from the beginning.
I'm totally confused. Why does it behave this way? I can even understand 01 being printed as the 11th and 12th characters of the string (although this is not what I expected, and I would be grateful for an explanation). But why does the behaviour change when I set the length to 8?
I hope you understand what I mean. Please provide a detailed explanation. Thanks a lot in advance.

The memcpy() function shall copy n bytes from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
memcpy assumes that the source and destination don't overlap. If they do, anything can happen, and exactly what happens may depend on what compiler you use, what machine you compile or run on, how big the region you're copying is, what time of day it is, etc. If you want to copy a block of memory to a destination overlapping the original position, use memmove.

For memcpy():
[...] If the objects overlap, the behavior is undefined.
You should use memmove() instead for overlapping cases.
[...] The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array to dest.
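As a minimal sketch of the memmove advice above, the overlapping copy from the question can be wrapped like this. The helper name copy_within and its offset parameters are hypothetical and only serve the illustration; memmove itself is the standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes inside one buffer, from offset src_off
 * to offset dst_off. memmove is used because the two ranges may overlap;
 * memcpy would have undefined behavior in that case. */
void copy_within(char *buf, size_t dst_off, size_t src_off, size_t n)
{
    memmove(buf + dst_off, buf + src_off, n);
}
```

With char str[] = "0123456789abcdef";, calling copy_within(str, 5, 0, 6) leaves 01234012345bcdef in str, which is the result the question originally expected.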

Related

Can strncat adding a NUL cause the array to go out of bounds?

I have some trouble with strncat(). The book Pointers on C says that strncat() always adds a NUL at the end of the character string. To understand it better, I ran an experiment.
#include <stdio.h>
#include <string.h>
int main(void)
{
    char a[14] = "mynameiszhm";
    strncat(a, "hello", 3);
    printf("%s", a);
    return 0;
}
The result is mynameiszhmhel
In this case the array has room for 14 chars, and it originally held 11 characters, not counting the NUL. So when I add three more characters, all 14 bytes of the array are filled. When the function then appends a NUL, the NUL lands in memory outside the array. This makes the array go out of bounds, but the program above runs without any warning. Why? Will this cause something unexpected?
So when we use strncat, should we account for the NUL, in case it makes the array go out of bounds?
I also noticed that strncpy doesn't add a NUL. Why do these two string functions treat the same thing differently? And why did the designers of C choose this design?
This makes the array go out of bounds, but the program above runs without any warning. Why?
Maybe. With strncat(a,"hello",3);, the code attempts to write beyond the 14 elements of a[]. It might go out of bounds, it might not. It is undefined behavior (UB). Anything is allowed.
Will this cause something unexpected?
Maybe, the behavior is not defined. It might work just as you expect - whatever that is.
So when we use strncat, should we account for the NUL, in case it makes the array go out of bounds?
Yes. The size argument must leave room for the appended null character, otherwise the behavior is undefined.
I also noticed that strncpy doesn't add a NUL. Why do these two string functions treat the same thing differently? And why did the designers of C choose this design?
The two functions strncpy()/strncat() simply share similar names; they are not as closely paired in functionality as strcpy()/strcat().
Consider that in the early 1970s memory was far more expensive, and many design decisions can be traced back to a byte of memory costing more than an hour's wage. Uniformity of functionality and names was of lesser importance.
And there were originally 11 characters in the array except for NUL.
More like "And there were originally 11 characters in the array, followed by 3 NULs." There is no partial initialization in C.
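One way to make the size argument account for the terminator is to compute it from the destination's total size. The following is only a sketch; append_bounded is a hypothetical helper name, and it assumes dst already holds a properly terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append as much of src as fits in dst, which holds
 * dst_size bytes in total and already contains a terminated string.
 * Returns the number of characters actually appended. */
size_t append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 >= dst_size)
        return 0;                        /* no room beyond the terminator */
    size_t room = dst_size - used - 1;   /* reserve one byte for '\0' */
    size_t want = strlen(src);
    size_t n = want < room ? want : room;
    strncat(dst, src, n);                /* writes n chars plus '\0' */
    return n;
}
```

For the array in the question, append_bounded(a, sizeof a, "hello") appends only "he", so the 14 bytes hold 13 characters and the NUL, and nothing is written outside the array.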
This is not really an answer, but a counterexample.
Observe the following modification to your program:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char p[] = "***";
    char a[14] = "mynameiszhm";
    char q[] = "***";
    strncat(a, "hello", 3);
    printf("%s%s%s", p, a, q);
    return 0;
}
The results of this program depend on where p and q are located in memory relative to a. If they are not adjacent, the results are less clear, but if either p or q comes immediately after a, then your strncat will overwrite its first * with a NUL, so it is no longer printed because it is now a string of length 0.
So the results are dependent on memory layout, and it should be clear that the compiler can put the variables in memory in any order it likes. And they can be adjacent or not.
So the problem is that you are not keeping to your promise not to put more than 14 bytes into a. The compiler did what you asked, and the C standards guarantee behaviour as long as you keep to the promises.
And now you have a program that may or may not do what you wanted it to do.

How does a char array behave for longer strings?

I asked this as one of several questions here, but people asked me to post them separately. Hence this question.
Consider the code lines below:
char a[5] = "geeks"; //1
char a3[] = {'g','e','e','k','s'}; //d
printf("a:%s,%u\n",a,sizeof(a)); //5
printf("a3:%s,%u\n",a3,sizeof(a3)); //j
printf("a[5]:%d,%c\n",a[5],a[5]);
printf("a3[5]:%d,%c\n",a3[5],a3[5]);
Output:
a:geeksV,5
a3:geeks,5
a[5]:86,V
a3[5]:127,
However the output in original question was:
a:geeks,5
a3:geeksV,5
Question 1 in the original question was:
Does line #1 add a \0? Notice that sizeof prints 5 in line #5, indicating the \0 is not there. But then, why does #5 not print something like geeksU, as line #j does? I feel a \0 does indeed get added in line #1, but is not counted by sizeof, while it is considered by printf. Am I right about this?
Having noticed that the output changed (on the same online compiler) when I extracted only the code lines related to the first question, I now wonder what is going on here. I believe this is undefined behavior according to the C standard. Can someone shed more light? Possibly with another compiler?
Sorry again for asking a second question.
char a[5] = "geeks"; //1
Here, you specify the array's size as 5 and initialize it with 5 characters.
Therefore, you do not have a "C string", which by definition is terminated by a NUL (0).
printf("a:%s,%u\n",a,sizeof(a)); //5
The array itself still has a size of 5, which is correctly reported by the sizeof operator, but your call to printf is undefined behaviour and could print anything after the array's contents: it will just keep reading the next address until it finds a 0 somewhere. That could happen immediately, or it could print 1000000 garbage characters, or it could cause a segfault or some other crash.
char a3[] = {'g','e','e','k','s'}; //d
Because you don't specify the array's size, the compiler will, through the initialization syntax, determine the size of the array. However, the way you chose to initialize a3, it will still only provide 5 bytes of length.
The reason for that is that your initialization just is an initialization list, and not a "string". Therefore, your subsequent call to printf also is undefined behaviour, and it is just luck that at the position a3[5] there seems to be a 0 in your case.
Effectively, both examples have the very same error.
You could do it differently, like this:
char a3[] = "geeks";
Using a string literal for initialization of the array with unspecified size will cause the compiler to allocate enough memory to hold the string and the additional NUL-terminator, and sizeof (a3) will now yield 6.
"geeks" here is a string literal in C.
When you write "geeks", the compiler automatically adds the null character to the end. This makes it 6 characters long.
But you are initializing char a[5] with it, which leaves no room for that null character. The initialization itself is allowed; using the result as a string is what causes undefined behaviour.
As mentioned by @DavidBowling, in this case the following rule applies
(C99 standard, section 6.7.8, paragraph 14):
An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
The elements of "geeks" will be copied into the array a, but the null character will not be copied.
So in this case, when you try to print the array, printing will continue until it encounters a \0 in memory.
From the later print statements, a[5] has the value V. Presumably the byte after it on your system is \0, and printing stops there.
So, in your system, at that instance, "geeksV" is printed.
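If an array without a terminator has to be printed anyway, a precision on %s bounds how many bytes are read. The sketch below assumes a hypothetical helper named format_fixed; the "%.*s" conversion itself is standard C.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the first n bytes of buf into out as a proper
 * string. The precision in "%.*s" caps how many bytes are read, so buf does
 * not need a terminating '\0'. Returns snprintf's result. */
int format_fixed(char *out, size_t out_size, const char *buf, size_t n)
{
    return snprintf(out, out_size, "%.*s", (int)n, buf);
}
```

Called with the five-element a3 from the question and n set to sizeof(a3), this produces exactly geeks regardless of what happens to follow the array in memory.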

What happens if I strncat onto a string without a null terminator?

SET UP:
Given this sort of code:
char myString[4];
printf("%s\n", myString);
strncpy(myString, "hi", 2);
printf("%s\n", myString);
strncat(myString, "h123", 2);
printf("%s\n", myString);
This will print:
KU�
hiU�
hiU�h1
WHAT I EXPECTED:
In my mind, myString is a pointer to an allocated spot in memory that looks like this:
MEMORY: [random][random][random][\0][random][random][random]....
PRINTED: [random][random][random][\0]
It adds a null character to the memory in the fourth spot after the beginning of the string.
After strncpy:
MEMORY: [h][i][random][\0][random][random][random]...
PRINTED: [h][i][random][\0]
It changes the first 2 characters to be hi and does not add an \0
After strncat:
MEMORY: [h][i][random][h][1][2][3][\0]...
PRINTED: [h][i][random][h][1][2][3][\0]
It looks for the \0 after the beginning of the string then removes the \0 and adds its own string as well as a \0 at the end.
What I expected did not occur.
QUESTION:
What is being printed in there?
Which part of what I expected reflects an incorrect understanding?
NOTE:
Now, I understand that this is undefined behavior and that it should be avoided, but I am asking this question from the perspective of trying to understand all possible exploits that could be used on given code.
I am not looking for the proper coding practice. I am looking for an understanding of what exactly is going wrong.
EDIT 1
I do understand that the docs say it's undefined behavior, and that from a developer's perspective one must simply avoid the possibility of nasal demons.
But from an exploiter's perspective, something is happening here, and this may be not just a bug but a security flaw that can be understood deeply enough to build a consistent exploit. I am hoping for this deeper level of understanding.
In my mind, myString is a pointer to an allocated spot in memory that looks like this:
MEMORY: [random][random][random][\0][random][random][random]....
Maybe in your mind it does, in reality it will look like this:
MEMORY: [random][random][random][random][random][random][random]....
In fact, as the comments say, the characters are not random but indeterminate. Most likely they are remnants of previous stack frames that occupied the same memory, but you cannot know.
When you allocate a char array on the stack, no NUL bytes are put in. The compiler typically just adjusts the stack pointer to reserve the space, and that's it.
Edit
Sorry, I leaped in without reading the whole question.
strncpy(myString, "hi", 2);
The above line copies an h and then an i and then stops because it has copied two chars. If it were sensible, it would just copy the h and then a \0 but it isn't.
strncat is a bizarre function that should probably be consigned to the fiery pits of hell. It goes along to the end of the first string and then adds up to n characters from the second string and a terminating \0. The n has no relevance to the size of the buffer to which you are copying, which you can therefore overrun.
strncat(myString, "h123", 2);
There's no guarantee that your first string has a \0 anywhere (as already discussed), so it will copy the h and the 1 to an indeterminate memory location.
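For contrast with the code in the question, a sketch of a strncpy call that always leaves a terminated string could look like this. The helper name copy_terminated is hypothetical, and truncating the source silently is an assumption about what the caller wants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (dst_size bytes in total) and always
 * leave dst terminated, truncating src if it does not fit. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;                          /* nowhere to put even the '\0' */
    strncpy(dst, src, dst_size - 1);     /* may leave dst unterminated */
    dst[dst_size - 1] = '\0';            /* so terminate it explicitly */
}
```

With char myString[4];, copy_terminated(myString, sizeof myString, "hi") stores hi followed by NUL bytes, so a later strncat has a defined end of string to start from (it still needs its own bound to stay inside the four bytes).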

Different ways to calculate string length

A comment on one of my answers has left me a little puzzled. When trying to compute how much memory is needed to concat two strings to a new block of memory, it was said that using snprintf was preferred over strlen, as shown below:
size_t length = snprintf(0, 0, "%s%s", str1, str2);
// preferred over:
size_t length = strlen(str1) + strlen(str2);
Can I get some reasoning behind this? What is the advantage, if any, and would one ever see one result differ from the other?
I was the one who said it, and I left out the +1 in my comment which was written quickly and carelessly, so let me explain. My point was merely that you should use the pattern of using the same method to compute the length that will eventually be used to fill the string, rather than using two different methods that could potentially differ in subtle ways.
For example, if you had three strings rather than two, and two or more of them overlapped, it would be possible that strlen(str1)+strlen(str2)+strlen(str3)+1 exceeds SIZE_MAX and wraps past zero, resulting in under-allocation and truncation of the output (if snprintf is used) or extremely dangerous memory corruption (if strcpy and strcat are used).
snprintf will return -1 with errno=EOVERFLOW when the resulting string would be longer than INT_MAX, so you're protected. You do need to check the return value before using it though, and add one for the null terminator.
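A minimal sketch of that pattern, measuring with snprintf, checking the result, and adding one for the terminator, might look as follows. The function name concat_alloc is hypothetical; returning NULL on any failure is an assumption made for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'ed concatenation of a and b,
 * or NULL on failure. The caller frees the result. */
char *concat_alloc(const char *a, const char *b)
{
    int len = snprintf(NULL, 0, "%s%s", a, b);  /* measure only */
    if (len < 0)
        return NULL;                            /* error reported by snprintf */
    char *out = malloc((size_t)len + 1);        /* +1 for the terminator */
    if (out == NULL)
        return NULL;
    if (snprintf(out, (size_t)len + 1, "%s%s", a, b) != len) {
        free(out);                              /* second pass disagreed */
        return NULL;
    }
    return out;
}
```

Because the same format string is used for measuring and for filling, the two steps cannot disagree about the length unless the inputs change in between.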
If you only need to determine how big would be the concatenation of the two strings, I don't see any particular reason to prefer snprintf, since the minimum operations to determine the total length of the two strings is what the two strlen calls do. snprintf will almost surely be slower, because it has to check the parameters and parse the format string besides just walking the two strings counting the characters.
... but... it may be an intelligent move to use snprintf if you are in a scenario where you want to concatenate two strings, and have a static, not too big buffer to handle normal cases, but you can fallback to a dynamically allocated buffer in case of big strings, e.g.:
/* static buffer "big enough" for most cases */
char buffer[256];
/* pointer used in the part where work on the string is actually done */
char *outputStr = buffer;
/* try to concatenate, get the length of the resulting string */
int length = snprintf(buffer, sizeof(buffer), "%s%s", str1, str2);
if (length < 0)
{
    /* error, panic and death */
}
else if ((size_t)length > sizeof(buffer) - 1)
{
    /* buffer wasn't enough, allocate dynamically */
    outputStr = malloc((size_t)length + 1);
    if (outputStr == NULL)
    {
        /* allocation error, death and panic */
    }
    /* pass length + 1 so there is room for the terminating NUL */
    if (snprintf(outputStr, (size_t)length + 1, "%s%s", str1, str2) < 0)
    {
        /* error, the world is doomed */
    }
}
/* here do whatever you want with outputStr */
if (outputStr != buffer)
    free(outputStr);
One advantage would be that the input strings are only scanned once (inside the snprintf()) instead of twice for the strlen/strcpy solution.
Actually, on rereading this question and the comment on your previous answer, I don't see what the point is in using snprintf() just to calculate the concatenated string length. If you're actually doing the concatenation, my above paragraph applies.
You need to add 1 to the strlen() example. Remember you need to allocate space for nul terminating byte.
So snprintf( ) gives me the size a string would have been. That means I can malloc( ) space for that guy. Hugely useful.
I wanted (but did not find until now) this function of snprintf( ) because I format tons of strings for output later; but I wanted not to have to assign static bufs for the outputs because it's hard to predict how long the outputs will be. So I ended up with a lot of 4096-long char arrays :-(
But now -- using this newly-discovered (to me) snprintf( ) char-counting function, I can malloc( ) output bufs AND sleep at night, both.
Thanks again and apologies to the OP and to Matteo.
EDIT: random, mistaken nonsense removed. Did I say that?
EDIT: Matteo in his comment below is absolutely right and I was absolutely wrong.
From C99:
2 The snprintf function is equivalent to fprintf, except that the output is written into
an array (specified by argument s) rather than to a stream. If n is zero, nothing is written,
and s may be a null pointer. Otherwise, output characters beyond the n-1st are
discarded rather than being written to the array, and a null character is written at the end
of the characters actually written into the array. If copying takes place between objects
that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
Thank you, Matteo, and I apologize to the OP.
This is great news because it gives a positive answer to a question I'd asked here only three weeks ago. I can't explain why I didn't read all of the answers, which gave me what I wanted. Awesome!
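The return-value rule quoted from C99 above can be turned into a small truncation check. This is only a sketch; write_checked is a hypothetical name, and it handles a single string for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write s into buf (n bytes) and report whether the
 * whole string was stored. Per the quoted rule, that is the case if and
 * only if the result is nonnegative and less than n. */
bool write_checked(char *buf, size_t n, const char *s)
{
    int r = snprintf(buf, n, "%s", s);
    return r >= 0 && (size_t)r < n;
}
```

With a four-byte buffer, "abc" fits and the function returns true, while "abcd" is truncated to abc and the function returns false.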
The "advantage" that I can see here is that strlen(NULL) might cause a segmentation fault, while (at least glibc's) snprintf() handles NULL parameters without failing.
Hence, with glibc-snprintf() you don't need to check whether one of the strings is NULL, although length might be slightly larger than needed, because (at least on my system) printf("%s", NULL); prints "(null)" instead of nothing.
I wouldn't recommend using snprintf() instead of strlen() though. It's just not obvious. A much better solution is a wrapper for strlen() which returns 0 when the argument is NULL:
size_t my_strlen(const char *str)
{
    return str ? strlen(str) : 0;
}

C String Null Zero?

I have a basic C programming question. If I create a character array and want to treat it as a string using the %s conversion, do I have to include a null zero? Example:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
The console output for this is:
abcdef
Notice that there is not a null zero as the last element in the array, yet I am still printing this as a string.
I am new to programming, so I am reading a beginner's C book, which states that since I am not using a null zero as the last element, I cannot treat the array as a string.
The following produces the same output as above, even though I include the null zero:
char name[7] = {'a','b','c','d','e','f','\0'};
printf("%s",name);
You're just being lucky; probably after the end of that array, on the stack, there's a zero, so printf stops reading just after the last character. If your program is very short and that zone of stack is still "unexplored" - i.e. the stack hasn't grown yet up to that point - it's very easy that it's zero, since generally modern OSes give initially zeroed pages to the applications.
More formally: by not having explicitly the NUL terminator, you're going in the land of undefined behavior, which means that anything can happen; such anything may also be that your program works fine, but it's just luck - and it's the worst type of bug, since, if it generally works fine, it's difficult to spot.
TL;DR version: don't do that. Stick to what is guaranteed to work and you won't introduce sneaky bugs in your application.
The output of your first printf is not predictable, specifically because you failed to include the terminating zero character. If it appears to work in your experiment, it is only because by random chance the next byte in memory happened to be zero and worked as a zero terminator. The chances of this happening depend greatly on where you declare your name array (it is not clear from your example). For a static array the chances might be pretty high, while for a local (automatic) array you'll run into various garbage being printed pretty often.
You must include the null character at the end.
It worked without error because of luck, and luck alone. Try this:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
printf("%d",name[6]);
You'll most probably see that you can read that memory, and that there's a zero in it. But it's sheer luck.
What most likely happened is that there happened to be the value of 0 at memory location name + 6. This is not defined behavior though, you could get different output on a different system or with a different compiler.
Yes. You do. There are a few other ways to do it.
This form of initialization puts the NUL character in for you automatically.
char name[7] = "abcdef";
printf("%s",name);
Note that I added 1 to the array size, to make space for that NUL.
One can also get away with omitting the size, and letting the compiler figure it out.
char name[] = "abcdef";
printf("%s",name);
Another method is to use a pointer to char that points at a string literal. The literal must not be modified, hence the const.
const char *name = "abcdef";
printf("%s",name);
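Rather than relying on luck, whether an array actually contains a terminator can be checked within its own bounds. The sketch below uses a hypothetical helper named has_terminator built on the standard memchr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the n-byte array buf contains a '\0',
 * i.e. if it is safe to hand to "%s" or strlen. Only the n bytes of
 * the array itself are examined. */
bool has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```

For char name[6] = {'a','b','c','d','e','f'}; the check fails, while for char name[] = "abcdef"; it succeeds and sizeof(name) is 7.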
