strcpy vs direct assignment: Overflow issue - c

This is a practice question from my school, it's not a homework question:
Given the following declaration, write a snippet of C code that might
lead to strlen(arr) returning no less than 8.
char arr[4];
My attempt to this question is: impossible, there is no way to achieve this. Since strlen returns the number of chars in a char array before the first \0, I don't see any way to make strlen return 8 in this case. I think that if we force an 8-character string into this array, the behavior is not predictable.
However, the solution that our instructor gives is:
strcpy(arr, Any 8-char-long string);
For example:
strcpy(arr, "Overflow");
I don't understand why this is valid. As I understand it, this array doesn't have enough space to hold an 8-character string. Is there something I'm missing about strings in C?

"Given the following declaration, write a snippet of C code that might lead to strlen(arr) returning no less than 8."
That is not possible, since arr can only hold 3 characters and 1 null terminator.
My attempt to this question is: impossible, there is no way to achieve this
Correct.
However, the solution that our instructor gives is: strcpy(arr, Any 8-char-long string);
Your instructor is incompetent and shouldn't be teaching C. This writes past the end of the array, and anything can happen, including a crash or the program appearing to work as intended on this particular run.
I don't understand why this is valid
It is not, it invokes undefined behavior.
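To make the point concrete, here is a minimal sketch of a length-checked copy. The helper name copy_if_fits is hypothetical (it is not a standard function); it refuses to copy when the source plus its terminator would not fit. Under that rule, no well-defined program can leave more than 3 characters in char arr[4].

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if src and its
 * terminating '\0' fit in dst_size bytes. Returns true on success;
 * on failure dst is left untouched. */
bool copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus terminator */
    if (needed > dst_size)
        return false;                  /* would overflow: refuse */
    memcpy(dst, src, needed);
    return true;
}
```

With char arr[4], copy_if_fits(arr, sizeof arr, "Overflow") returns false, while "abc" is accepted and strlen(arr) is then 3.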

Related

If strncat adding NUL may cause the array go out of bound

I have some trouble with strncat(). The book Pointers On C says that strncat() always adds a NUL at the end of the character string. To understand it better, I did an experiment.
#include<stdio.h>
#include<string.h>
int main(void)
{
char a[14]="mynameiszhm";
strncat(a,"hello",3);
printf("%s",a);
return 0;
}
The result is mynameiszhmhel
In this case the array has room for 14 chars, and there were originally 11 characters in the array, not counting the NUL. So when I add three more characters, all 14 characters fill up the array's memory. When the function then adds a NUL, the NUL lands in memory outside the array. This causes the array to go out of bounds, but the program above runs without any warning. Why? Will this cause something unexpected?
So when we use strncat, should we account for the NUL, in case it causes the array to go out of bounds?
I also notice that the function strncpy doesn't add a NUL. Why do these two string functions handle the same thing differently? And why did the designers of C choose this design?
This cause the array to go out of bounds but the program above can run without any warning. Why?
Maybe. With strncat(a,"hello",3);, the code attempts to write past the 14 elements of a[]. It might go out of bounds, it might not. It is undefined behavior (UB). Anything is allowed.
Will this cause something unexpected?
Maybe, the behavior is not defined. It might work just as you expect - whatever that is.
So when we use strncat, should we consider the NUL, in case it causes the array to go out of bounds?
Yes, the size parameter needs to account for appending a null character, else UB.
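As a rough sketch of what "accounting for the null character" can look like, the helper below computes the count passed to strncat from the free space in the destination. The name append_bounded is hypothetical, and the sketch assumes dst already holds a terminated string inside its dst_size bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append as much of src as fits in dst, always
 * leaving room for the terminating '\0'. dst must already hold a
 * terminated string within its dst_size bytes. Returns the new length. */
size_t append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 < dst_size) {
        /* strncat writes up to n characters PLUS a '\0',
         * so n is the free space minus one. */
        strncat(dst, src, dst_size - used - 1);
    }
    return strlen(dst);
}
```

For char a[14]="mynameiszhm", appending "hello" this way yields "mynameiszhmhe": only two characters fit, because the fourteenth byte is reserved for the terminator.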
I also notice that the function strncpy doesn't add a NUL. Why do these two string functions handle the same thing differently? And why did the designers of C choose this design?
The two functions strncpy()/strncat() simply share similar names; they are not a closely matched pair in the way strcpy()/strcat() are.
Consider that in the early 1970s memory was far more expensive, and many design decisions can be traced back to a byte of memory costing more than an hour's wage. Uniformity of functionality and naming was of lesser importance.
And there were originally 11 characters in the array except for NUL.
More like "And there were originally 11 characters in the array, followed by 3 NULs." There is no partial initialization in C: the elements not covered by the initializer are set to zero.
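The zero-filling of the remaining elements can be checked directly. The helper below is a hypothetical illustration that counts the '\0' bytes at the end of a buffer.

```c
#include <stddef.h>

/* Hypothetical helper: count the consecutive '\0' bytes at the end
 * of a buffer of the given size. */
size_t count_trailing_nuls(const char *buf, size_t size)
{
    size_t count = 0;
    while (count < size && buf[size - 1 - count] == '\0')
        count++;
    return count;
}
```

For char a[14]="mynameiszhm" this returns 3: one terminator from the string literal plus two elements that were zero-initialized.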
This is not really an answer, but a counterexample.
Observe the following modification to your program:
#include<stdio.h>
#include<string.h>
int main(void)
{
char p[]="***";
char a[14]="mynameiszhm";
char q[]="***";
strncat(a,"hello",3);
printf("%s%s%s", p, a, q);
return 0;
}
The results of this program are dependent on where p and q are located in memory, compared to a. If they are not adjacent, the results are not so clear but if either p or q immediately comes after a, then your strncat will overwrite the first * causing one of them not to be printed anymore because that will now be a string of length 0.
So the results are dependent on memory layout, and it should be clear that the compiler can put the variables in memory in any order it likes. And they can be adjacent or not.
So the problem is that you are not keeping to your promise not to put more than 14 bytes into a. The compiler did what you asked, and the C standards guarantee behaviour as long as you keep to the promises.
And now you have a program that may or may not do what you wanted it to do.

What happens if I strncat onto a string without a null terminator?

SET UP:
Given this sort of code:
char myString[4];
printf("%s\n", myString);
strncpy(myString, "hi", 2);
printf("%s\n", myString);
strncat(myString, "h123", 2);
printf("%s\n", myString);
This will print:
KU�
hiU�
hiU�h1
WHAT I EXPECTED:
In my mind, myString is a pointer to an allocated spot in memory that looks like this:
MEMORY: [random][random][random][\0][random][random][random]....
PRINTED: [random][random][random][\0]
It adds a null character to the memory in the fourth spot after the beginning of the string
After strncpy:
MEMORY: [h][i][random][\0][random][random][random]...
PRINTED: [h][i][random][\0]
It changes the first 2 characters to be hi and does not add an \0
After strncat:
MEMORY: [h][i][random][h][1][2][3][\0]...
PRINTED: [h][i][random][h][1][2][3][\0]
It looks for the \0 after the beginning of the string then removes the \0 and adds its own string as well as a \0 at the end.
What I expected did not occur.
QUESTION:
What is being printed in there?
Which part of what I expected is an incorrect understanding?
NOTE:
Now, I understand that this is undefined behavior and it should be avoided, but I am asking this question from the perspective of trying to understand all possible exploits that could be used on given code.
I am not looking for the proper coding practice. I am looking for an understanding of what exactly is going wrong.
EDIT 1
I do understand that the docs say that it's undefined behavior and that, from a developer's perspective, one must just avoid the possibility of nasal demons.
But from an exploiter's perspective, something is happening here, and this may not just be a bug; it may be a security flaw that can be understood more deeply, such that a consistent exploit may be formed. I am hoping for this deeper level of understanding.
In my mind, myString is a pointer to an allocated spot in memory that looks like this:
MEMORY: [random][random][random][\0][random][random][random]....
Maybe in your mind it does, in reality it will look like this:
MEMORY: [random][random][random][random][random][random][random]....
In fact, as the comments say, the characters are not random but indeterminate. Most likely they will be the remnants of previous stack frames that occupied that memory, but you can't know.
When you allocate a char array on the stack, no nul bytes are put in. It just increments the stack pointer by 4 and that's it.
Edit
Sorry, I leaped in without reading the whole question.
strncpy(myString, "hi", 2);
The above line copies an h and then an i and then stops because it has copied two chars. If it were sensible, it would just copy the h and then a \0 but it isn't.
strncat is a bizarre function that should probably be consigned to the fiery pits of hell. It goes along to the end of the first string and then adds up to n characters from the second string and a terminating \0. The n has no relevance to the size of the buffer to which you are copying and which you can therefore overrun.
strncat(myString, "h123", 2);
There's no guarantee that your first string has a \0 anywhere (as already discussed), so it will copy the h and the 1 to an indeterminate memory location.
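For contrast, here is a sketch of how the same sequence of calls could be made well-defined. The function name build_string is hypothetical, and the sketch assumes the caller passes a buffer of at least one byte; it is meant to show where the terminator has to come from, not as an exploit analysis.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of a well-defined version of the question's sequence.
 * buf must have room for at least one byte (size >= 1). */
void build_string(char *buf, size_t size)
{
    memset(buf, 0, size);              /* no indeterminate bytes remain */

    strncpy(buf, "hi", size - 1);      /* copies at most size-1 characters */
    buf[size - 1] = '\0';              /* strncpy may not terminate; do it explicitly */

    /* strncat appends up to n characters plus a '\0', so n must be
     * the remaining space minus one. */
    strncat(buf, "h123", size - strlen(buf) - 1);
}
```

With a 4-byte buffer the result is "hih": after "hi" there is room for exactly one more character and the terminator.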

String behavior in C

I want to understand a number of things about strings in C:
I cannot understand why you can't change a string with a normal assignment (but only through the functions of string.h). For example, I can't do d="aa" (where d is a pointer to char or an array of char).
Can someone explain what's going on behind the scenes? The compiler lets you run such a thing, and you get a segmentation fault.
Something else, I run a program in C that contains the following lines:
char c='a',*pc=&c;
printf("Enter a string:");
scanf("%s",pc);
printf("your first char is: %c",c);
printf("your string is: %s",pc);
If I enter more than 2 letters (in scanf) I get a segmentation fault. Why is this happening?
If I enter two letters, the first letter is printed correctly, but the string is printed with a lot of extra characters (incorrect).
If I enter one letter, the letter is printed correctly, but the string is printed with a lot of extra characters and something weird at the end (a square with four numbers containing zeros and ones).
Can anyone explain what is happening behind the scenes?
Please note: I do not want the program to work, I did not ask the question to get suggestions for another program, I just want to understand what happens behind the scenes in these situations.
Strings almost do not exist in C (except as C string literals like "abc" in some C source file).
In fact, strings are mostly a convention: a C string is an array of char whose last element is the zero char '\0'.
So declaring
const char s[] = "abc";
is exactly the same as
const char s[] = {'a','b','c','\0'};
in particular, sizeof(s) is 4 (3+1) in both cases (and so is sizeof("abc")).
The standard C library contains a lot of functions (such as strlen(3) or strncpy(3)...) which obey and/or presuppose the convention that strings are zero-terminated arrays of char-s.
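To illustrate the convention, here is a hypothetical re-implementation of strlen (named my_strlen here, for illustration only). It has no way of knowing the array size; it simply scans until it meets the terminating zero.

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen, for illustration only:
 * the length is found by scanning for the terminating '\0'. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

This is why my_strlen("abc") is 3 while sizeof("abc") is 4, and why any function of this kind runs off the end of an array that lacks the terminator.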
Better code would be:
char buf[16]="a",*pc= buf;
printf("Enter a string:"); fflush(NULL);
scanf("%15s",pc);
printf("your first char is: %c",buf[0]);
printf("your string is: %s",pc);
Some comments: be afraid of buffer overflow. When reading a string, always give a bound to the read string, or else use a function like getline(3) which dynamically allocates the string in the heap. Beware of memory leaks (use a tool like valgrind ...)
When computing a string, be also aware of the maximum size. See snprintf(3) (avoid sprintf).
Often, you adopt the convention that a string is returned and dynamically allocated in the heap. You may want to use strdup(3) or asprintf(3) if your system provides it. But you should adopt the convention that the calling function (or something else, but well defined in your head) is free(3)-ing the string.
Your program can be semantically wrong and, by bad luck, still happen to work sometimes. Read carefully about undefined behavior. Avoid it absolutely (your points 1,2,3 are probable UB). Sadly, UB may happen to sometimes "work".
To explain some actual undefined behavior, you have to take into account your particular implementation: the compiler, the flags -notably optimization flags- passed to the compiler, the operating system, the kernel, the processor, the phase of the moon, etc etc... Undefined behavior is often non reproducible (e.g. because of ASLR etc...), read about heisenbugs. To explain the behavior of points 1,2,3 you need to dive into implementation details; look into the assembler code (gcc -S -fverbose-asm) produced by the compiler.
I suggest you compile your code with all warnings and debugging info (e.g. using gcc -Wall -g with GCC ...), improve the code until you get no warnings, and learn how to use the debugger (e.g. gdb) to run your code step by step.
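The heap-allocation convention mentioned above can be sketched as follows. The function join_with_space is a hypothetical example, written with malloc and memcpy so that it does not depend on strdup or asprintf being available; the important part is the ownership rule stated in the comment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated "a b" string, or NULL
 * if allocation fails. The caller owns the result and must free() it. */
char *join_with_space(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *out = malloc(len_a + 1 + len_b + 1);  /* a + ' ' + b + '\0' */
    if (out == NULL)
        return NULL;
    memcpy(out, a, len_a);
    out[len_a] = ' ';
    memcpy(out + len_a + 1, b, len_b + 1);      /* also copies b's '\0' */
    return out;
}
```

The buffer is sized from the inputs, so there is no fixed bound to overflow; the cost is that every successful call must be paired with exactly one free().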
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
Because memory is allocated for only one byte.
Look at char c, which is assigned "a": that is equivalent to 'a' followed by '\0', yet only one byte of memory is available.
If scanf() uses this memory for reading more than one byte, then this is simply undefined behavior.
char c="a"; is a wrong declaration in C, since even a single character enclosed in a pair of double quotes ("") is treated as a string: it is stored as "a\0", because all strings end with a '\0' null character.
char c="a"; is wrong, whereas char c='c'; is correct.
Also note that the memory allocated for a char is only 1 byte, so it can hold only one character.

Why I encounter a NULL terminating character at start of a string when I go backwards through it?

I found the following piece of code embedded in a C++ project. The code goes backwards through a C-style string. When I saw this I thought this should result in undefined behaviour. But it seems to work perfectly:
const char * hello = "Hello World.";
const char * helloPointPos = strchr(hello, '.');
for (const char * curchar = helloPointPos; *curchar; curchar--) {
printf("%s", curchar);
}
What I was wondering about is the part with *curchar; curchar--. This assumes that the string begins with a \0. Is this a legal assumption? Does this piece of code result in undefined behaviour? If not, why not?
I would appreciate if you could put some light on this. BTW platform is Windows and Compiler is VC++ 2010.
EDIT : Thank you all for your participation. Both answers are very good and helped me. But since I can only accept one answer I will go for paxdiablo's answer since it has more detail. Thank you!
No, it's very much not a requirement that the character before a string be \0, so that code does not have defined behaviour.
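In fact, it's doubly undefined: you're not permitted to dereference a pointer that lies outside the array, and merely forming a pointer to before the start of the array is already invalid (only a pointer one element past the end may be formed, and even that one may not be dereferenced). Since this code dereferences one byte before the array, it's invalid in that sense as well.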
It may work in some situations(a) but it's by no means good code.
In any case, the printing of the string rather than the character is going to give you strange results:
.d.ld.rld.orld.World. World.
and so on.
A better reverse iterator would be something like:
const char *curchar = hello + strlen (hello); // one byte beyond the last character
while (curchar != hello) // stop once the start has been reached
putchar (*--curchar); // step back first, then print just the character
(a) In fact, one of the most annoying things about undefined behaviour is that it sometimes does work, lulling you into a false sense of security.
I've often thought that all coders should have electrical wires hooked up to their most private parts so that undefined behaviour could deliver a short sharp shock - I suspect there would be a lot less undefined behaviour (or far fewer developers) after a while :-)
It's certainly not defined behavior, but in this case it isn't surprising that it works.
const char * hello = "Hello World."; puts the string Hello World. in a section with all other string literals. So very likely, there's a string literal before it, and it ends with \0, so there's a \0 before Hello World., and the code works.
Obviously you can't rely on it - your string might be the first in the section, or some non-string constant may be in there. Also, if the string is allocated any other way, the chances of getting a \0 before it are lower.

C String Null Zero?

I have a basic C programming question; here is the situation. If I am creating a character array and I want to treat that array as a string using the %s conversion code, do I have to include a null zero? Example:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
The console output for this is:
abcdef
Notice that there is not a null zero as the last element in the array, yet I am still printing this as a string.
I am new to programming...So I am reading a beginners C book, which states that since I am not using a null zero in the last element I cannot treat it as a string.
This is the same output as above, although I include the null zero.
char name[7] = {'a','b','c','d','e','f','\0'};
printf("%s",name);
You're just being lucky; probably after the end of that array, on the stack, there's a zero, so printf stops reading just after the last character. If your program is very short and that zone of stack is still "unexplored" - i.e. the stack hasn't grown yet up to that point - it's very likely to be zero, since modern OSes generally give initially zeroed pages to applications.
More formally: by not having explicitly the NUL terminator, you're going in the land of undefined behavior, which means that anything can happen; such anything may also be that your program works fine, but it's just luck - and it's the worst type of bug, since, if it generally works fine, it's difficult to spot.
TL;DR version: don't do that. Stick to what is guaranteed to work and you won't introduce sneaky bugs in your application.
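One way to check whether an array can be treated as a string at all is to look for the terminator within the array's own bounds. The helper name has_terminator is hypothetical; memchr is the standard function that does the bounded search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if buf contains a '\0' somewhere within
 * its first size bytes, i.e. it holds a valid C string. */
bool has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

For the 6-element array in the question this returns false, and for the 7-element array with the explicit '\0' it returns true, without ever reading past the end of either array.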
The output of your first printf is not predictable, specifically because you failed to include the terminating zero character. If it appears to work in your experiment, it is only because by random chance the next byte in memory happened to be zero and worked as a zero terminator. The chances of this happening depend greatly on where you declare your name array (it is not clear from your example). For a static array the chances might be pretty high, while for a local (automatic) array you'll run into various garbage being printed pretty often.
You must include the null character at the end.
It worked without error because of luck, and luck alone. Try this:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
printf("%d",name[6]);
You'll most probably see that you can read that memory, and that there's a zero in it. But it's sheer luck.
What most likely happened is that there happened to be the value of 0 at memory location name + 6. This is not defined behavior though, you could get different output on a different system or with a different compiler.
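If the array really has to stay without a terminator, a precision on the %s conversion keeps the read inside the array. The sketch below uses snprintf so the result can be inspected; the helper name format_chars is hypothetical, and it assumes chars holds at least n bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the first n bytes of chars into out as a
 * proper string. chars need not contain a '\0' as long as it holds at
 * least n bytes; the precision stops the read at n. */
int format_chars(char *out, size_t out_size, const char *chars, size_t n)
{
    return snprintf(out, out_size, "%.*s", (int)n, chars);
}
```

The same precision works with printf directly, e.g. printf("%.6s", name) for the 6-element array from the question.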
Yes. You do. There are a few other ways to do it.
This form of initialization puts the NUL character in for you automatically.
char name[7] = "abcdef";
printf("%s",name);
Note that I added 1 to the array size, to make space for that NUL.
One can also get away with omitting the size, and letting the compiler figure it out.
char name[] = "abcdef";
printf("%s",name);
Another method is to specify it with a pointer to a char.
char *name = "abcdef";
printf("%s",name);
