I am using gcc on Linux, and the code below compiles successfully, but it does not print the value of the variable i correctly: if characters are entered one at a time, i jumps around or drops to 0. I know I am using %d for a char in scanf (I was trying to erase the stack). Is this a case of an attempt to erase the stack, or something else? (I thought that if the stack were erased, the program would crash.)
#include <stdio.h>

int main()
{
    int i;
    char c;

    for (i = 0; i < 5; i++) {
        scanf("%d", &c);
        printf("%d ", i);
    }
    return 0;
}
Besides the arguments to main, you have an int and a char on the stack.
Let's assume sizeof(int) == 4 and only have a look at i and c.
( int i )(char c )
[ 0 ][ 1 ][ 2 ][ 3 (&i)][ 4 (&c)]
So this is actually your stack layout without argc and *argv.
With i consuming four times more memory than c in this case.
The stack grows in the opposite direction, so if you write something bigger than a char to c, it will write to [4] and further to the left, never to the right. So [5] will never get written to. Instead you overwrite [3].
For the case where you write an int to c and int is four times bigger than c, you'll actually write to [1][2][3][4], just [0] will not be overwritten, but 3/4 of the memory for the int will be corrupted.
On a big-endian system, the most significant byte of i will be stored in [3] and therefore get overwritten by this operation. On a little-endian system, the most significant byte is stored in [0] and would be preserved. Nonetheless, you corrupt your stack this way.
As ams mentions this is not always true. There could be different alignments for efficiency or because the platform only supports aligned access, leaving gaps between variables. Also a compiler is allowed to do any optimizations as long as it has no visible side-effects as stated by the as-if rule. In this case the variables could perfectly be stored in a register and never be saved on the stack at all. But a lot of other compiler optimizations and platform dependencies can make this way more complex.
So this is only the simplest case without taking platform dependencies and compiler optimizations into account and also seems to be what happens in your special case with maybe some minor differences.
With your scanf(), you are writing an int into a char. A char is usually stored in just 1 byte, so the write will overflow it, but it may or may not overwrite other values, depending on the alignment and placement of the variables.
If your compiler reserves 1 byte for the char, and the memory address of the int comes just after the address of the char (which will probably be the case), then your scanf() will overwrite the first bytes of i. If you are on a little-endian machine and you enter values smaller than 256, then i will always be 0.
But i can grow larger if you enter a bigger value. Try entering 256; i will become 1. With 512, i will become 2, and so on.
But you are not "erasing the stack", just overwriting some bytes (in fact, you are overwriting sizeof(int) bytes; one of them corresponds to the char and the others will probably be all the bytes of your int but one).
If you really want to "erase the stack", you could do something like:
#include <string.h>

int main(void) {
    char c;
    memset(&c, 0, 10000);
    return 0;
}
#foobar has given a very nice description of what one compiler on one architecture happens to do. He(?) also gives an answer to what a hypothetical big-endian compiler/system might do.
To be fair, that's probably the most useful answer, but it's still not correct.
There is no rule that says the stack must grow in one way or the other (although, much like whether to drive on the left or the right, a choice must be made, and a descending stack is most common).
There is no rule that says the compiler must layout the variables on the stack in any particular way. The C standard doesn't care. Not even the official architecture/OS-specific ABI cares a jot about that, as long as the bits necessary for unwinding work. In this case, the compiler could even choose a different scheme for every function in the system (not that it's likely).
All that is certain is that scanf will try to write something int-sized to a char, and that this is undefined. In practice there are several possible outcomes:
It works fine, nothing extra is overwritten, and you got lucky. (Perhaps int is the same size as char, or there was padding in the stack.)
It overwrites the data following the char in memory, whatever that is.
It overwrites the data just before and/or after the char. (This might happen on an architecture where an aligned store instruction disregards the bottom bits of the write address.)
It crashes with an unaligned access exception.
It detects the stack scribble, prints a message, and reports the incident.
Of course, none of this will happen because you compile with -Wall -Wextra -Werror enabled, and GCC tells you that your scanf format doesn't match your variable types.
As Kerrek SB commented, the behavior is undefined.
As you know, you pass a char * to the scanf function, but tell the function to treat it like an int *.
It might (although it is very unlikely to) overwrite something else on the stack, for example, i, the previous stack pointer, or the return address.
It might just overwrite unused bytes, for example if the compiler uses padding to align the stack.
It might cause a crash, for example if the address of c is not 4- or 8-byte aligned and the platform requires int accesses to be 4- or 8-byte aligned.
And it might do anything else.
But the answer is still: anything is possible in such a case. The behavior is simply not defined.
I have some trouble with strncat(). The book Pointers On C says that the function strncat() always adds a NUL at the end of the character string. To understand it better, I did an experiment.
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[14] = "mynameiszhm";
    strncat(a, "hello", 3);
    printf("%s", a);
    return 0;
}
The result is mynameiszhmhel
In this case the array has memory for 14 chars. And there were originally 11 characters in the array, not counting the NUL. Thus when I add three more characters, all 14 characters fill up the array's memory. So when the function wants to add a NUL, the NUL takes up memory outside the array. This causes the array to go out of bounds, but the program above runs without any warning. Why? Will this cause something unexpected?
So when we use strncat, should we account for the NUL, in case it causes the array to go out of bounds?
And I also noticed that the function strncpy doesn't add a NUL. Why do these two string functions do different things about the same thing? And why did the designers of C choose this design?
This causes the array to go out of bounds but the program above can run without any warning. Why?
Maybe. With strncat(a,"hello",3);, code attempted to write beyond the 14 of a[]. It might go out of bounds, it might not. It is undefined behavior (UB). Anything is allowed.
Will this causes something unexpected?
Maybe, the behavior is not defined. It might work just as you expect - whatever that is.
So when we use strncat, should we consider the NUL, in case it causes the array to go out of bounds?
Yes, the size parameter needs to account for appending a null character, else UB.
I also notice the function strncpy don't add NUL. Why this two string function do different things about the same thing? And why the designer of C do this design?
The two functions strncpy()/strncat() simply share similar names, not the highly similar paired functionality of strcpy()/strcat().
Consider that in the early 1970s, memory was far more expensive, and many design decisions can be traced back to a byte of memory costing more than an hour's wage. Uniformity of functionality and naming was of lesser importance.
And there were originally 11 characters in the array except for NUL.
More like: "And there were originally 11 characters in the array, plus 3 NULs." There is no partial initialization in C.
This is not really an answer, but a counterexample.
Observe the following modification to your program:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char p[] = "***";
    char a[14] = "mynameiszhm";
    char q[] = "***";
    strncat(a, "hello", 3);
    printf("%s%s%s", p, a, q);
    return 0;
}
The results of this program depend on where p and q are located in memory compared to a. If they are not adjacent, the results are not so clear, but if either p or q immediately follows a, then the terminating NUL from your strncat will overwrite that array's first *, causing it not to be printed anymore, because it is now a string of length 0.
So the results are dependent on memory layout, and it should be clear that the compiler can put the variables in memory in any order it likes. And they can be adjacent or not.
So the problem is that you are not keeping your promise not to put more than 14 bytes into a. The compiler did what you asked, and the C standard guarantees behaviour only as long as you keep your promises.
And now you have a program that may or may not do what you wanted it to do.
While writing some C code, I came across a little problem where I had to convert a character into a "string" (some memory chunk the beginning of which is given by a char* pointer).
The idea is that if some sourcestr pointer is set (not NULL), then I should use it as my "final string", otherwise I should convert a given charcode into the first character of another array, and use it instead.
For the purposes of this question, we'll assume that the types of the variables cannot be changed beforehand. In other words, I can't just store my charcode as a const char* instead of an int.
Because I tend to be lazy, I thought to myself : "hey, couldn't I just use the character's address and treat that pointer as a string?". Here's a little snippet of what I wrote (don't smash my head against the wall just yet!) :
int charcode = FOO; /* Assume this is always valid ASCII. */
char* sourcestr = "BAR"; /* Case #1 */
char* sourcestr = NULL; /* Case #2 */
char* finalstr = sourcestr ? sourcestr : (char*)&charcode;
Now of course I tried it, and as I expected, it does work. Even with a few warning flags, the compiler is still happy. However, I have this weird feeling that this is actually undefined behaviour, and that I just shouldn't be doing it.
The reason why I think this way is because char* arrays need to be null-terminated in order to be printed properly as strings (and I want mine to be!). Yet, I have no certainty that the value at &charcode + 1 will be zero, hence I might end up with some buffer overflow madness.
Is there an actual reason why it does work properly, or have I just been lucky to get zeroes in the right places when I tried?
(Note that I'm not looking for other ways to achieve the conversion. I could simply use a char tmp[2] = {0} variable, and put my character at index 0. I could also use something like sprintf or snprintf, provided I'm careful enough with buffer overflows. There's a myriad of ways to do this, I'm just interested in the behaviour of this particular cast operation.)
Edit: I've seen a few people call this hackery, and let's be clear: I completely agree with you. I'm not enough of a masochist to actual do this in released code. This is just me getting curious ;)
Your code is well-defined as you can always cast to char*. But some issues:
Note that "BAR" is a string literal. Although in C its type is char[4] rather than const char[4], attempting to modify its contents would be undefined.
Don't attempt to use (char*)&charcode as a parameter to any of the string functions in the C standard library. It will not be null-terminated. So in that sense, you cannot treat it as a string.
Pointer arithmetic on (char*)&charcode will be valid up to and including one past the scalar charcode. But don't attempt to dereference any pointer beyond charcode itself. The range of n for which the expression (char*)&charcode + n is valid depends on sizeof(int).
The cast and assignment, char* finalstr = (char*)&charcode; is defined.
Printing finalstr with printf as a string (%s) when it points to charcode is undefined behavior.
Rather than resorting to hackery and hiding string in a type int, convert the values stored in the integer to a string using a chosen conversion function. One possible example is:
char str[32] = { 0 };
snprintf(str, 32, "%d", charcode);
char* finalstr = sourcestr ? sourcestr : str;
or use whatever other (defined!) conversion you like.
Like others said, it happens to work because the internal representation of an int on your machine is little-endian and a char is smaller than an int. Also, the ASCII value of your character is below 128, or you have unsigned chars (otherwise there would be sign extension). This means that the value of the character is in the lower byte(s) of the representation of the int and the rest of the int will be all zeroes (assuming any normal representation of an int). You're not "lucky"; you have a pretty normal machine.
It is also completely undefined behavior to give that char pointer to any function that expects a string. You might get away with it now but the compiler is free to optimize that to something completely different.
For example, if you do a printf just after that assignment, the compiler is free to assume that you will always pass a valid string to printf. That means the check for sourcestr being NULL is unnecessary: if sourcestr were NULL, printf would be called with something that isn't a string, and the compiler is free to assume that undefined behavior never happens. So any check of sourcestr against NULL, before or after that assignment, is unnecessary, because the compiler already "knows" it isn't NULL. This assumption is allowed to spread to everywhere in your code.
This was rarely a thing to worry about and you could get away with tricks uglier than this until a decade ago or so when compiler writers started an arms race about how much they can follow the C standard to the letter to get away with more and more brutal optimizations. Today compilers are getting more and more aggressive and while the optimization I speculated about probably doesn't exist yet, if a compiler person sees this, they'll probably implement it just because they can.
This is absolutely undefined behavior for the following reasons:
Less probable, but worth considering when referring strictly to the standards: you can't assume the sizeof(int) on the machine/system where the code will be compiled.
Likewise, you can't assume the character set. E.g., what happens on an EBCDIC machine/system?
It is easy to say that your machine has a little-endian processor. On big-endian machines the code fails due to the big-endian memory layout.
Because on many systems char is a signed integer type, as is int, when your char holds a negative value (i.e., a character code above 127 on machines with 8-bit chars), it could fail due to sign extension if you assign the value as in the code below:
code:
char ch = FOO;
int charcode = ch;
P.S. About point 3: your string will indeed be NUL-terminated on a little-endian machine having sizeof(int) > sizeof(char) and a char with a non-negative value, because the most significant bytes of the int will be 0 and the memory layout for that endianness is LSB first.
Here is the source code for the program
#include <stdio.h>
/* filename: test.c */

int main()
{
    int local = 0;
    char buff[7];
    printf("Password: ");
    gets(buff);
    if (local)
        printf("Buff: %s, local:%d\n", buff, local);
    return 0;
}
I am using "gcc test.c -fno-stack-protector -o test" to compile this file, and when I run it, I need to enter at least 13 characters to make gets overflow into local. The way I think about it, since I only declared 7 bytes for buff, the overflow should happen as soon as the user enters at least 8 characters. But that seems not to be the case. Why?
The stack layout is implementation dependent.
There is no guarantee that local object will be after or before buff object or that both will be contiguous.
In your case, what is likely is that 5 bytes (1 + 4) of padding were inserted after the buff object. Such padding is very common and is usually inserted by the compiler for performance reasons. The size of the padding can vary between compilers, compiler versions, compiler options, or even between different source files.
To have a better idea of the layout, just print the addresses of both the buff and local objects:
printf("%p %p\n", (void *) buff, (void *) &local);
gets(buff);
There is no bounds checking by default in C. At best the program will crash; usually the overflow lands in unused memory, so it just seems to work fine. Sometimes you even overrun memory that is already in use, and may not realize it until you run into a problem.
There's no guarantee that the items on your stack will be allocated packed tightly together, in fact I'm pretty certain they're not even guaranteed to be put in the same order.
If you want to understand why it's not failing as you seem to expect, your best bet is to look at the assembler output, with something like gcc -S.
For what it's worth, gcc 4.8.3 under CygWin fails at eight characters as one might expect. It overflows at seven characters but, because the overflowing character is the NUL terminator, that still leaves local as zero.
Bottom line, undefined behaviour is exactly that. Its most annoying feature (to developers) is that it sometimes works, meaning that we sometimes miss it until the code gets out into production. If UB were somehow tied to electrodes connected to the private parts of developers, I guarantee there'd be far fewer bugs :-)
Here is my code :
#include <stdio.h>
#include <stdlib.h>

#define LEN 2

int main(void)
{
    char num1[LEN], num2[LEN]; // works fine with
    // char *num1 = malloc(LEN), *num2 = malloc(LEN);
    int number1, number2;
    int sum;

    printf("first integer to add = ");
    scanf("%s", num1);
    printf("second integer to add = ");
    scanf("%s", num2);

    // adds integers
    number1 = atoi(num1);
    number2 = atoi(num2);
    sum = number1 + number2;

    // prints sum
    printf("Sum of %d and %d = %d \n", number1, number2, sum);
    return 0;
}
Here is the output :
first integer to add = 15
second integer to add = 12
Sum of 0 and 12 = 12
Why is it taking 0 instead of the first value, 15?
I could not understand why this is happening.
It is working fine if I am using
char *num1= malloc(LEN), *num2= malloc(LEN);
instead of
char num1[LEN],num2[LEN];
But it should work fine with this.
Edited :
Yes, it worked for LEN 3, but why did it show this undefined behaviour? I mean, not working with the plain arrays yet working with malloc. Now I understand that it should not work with malloc either. But why did it work for me? Please be specific so that I can debug more accurately.
Is there any issue with my system, compiler, or IDE?
Please explain a bit more, as it will be helpful, or provide links to resources, because I don't want to be unlucky anymore.
LEN is 2, which is enough to store both digits but not the required null terminating character. You are therefore overrunning the arrays (and the heap allocations, in that version of the code!) and this causes undefined behavior. The fact that one works and the other does not is simply a byproduct of how the undefined behavior plays out on your particular system; the malloc version could indeed crash on a different system or a different compiler.
Correct results, incorrect results, crashing, or something completely different are all possibilities when you invoke undefined behavior.
Change LEN to 3 and your example input would work fine.
I would suggest indicating the size of your buffers in your scanf() line to avoid the undefined behavior. You may get incorrect results, but your program at least would not crash or have a security vulnerability:
scanf("%2s", num1);
Note that the number you use there must be one less than the size of the array -- in this example it assumes an array of size 3 (so you read a maximum of 2 characters, because you need the last character for the null terminating character).
LEN is defined as 2. You left no room for a null terminator. In the array case you would overrun the array end and damage your stack. In the malloc case you would overrun your heap and potentially damage the malloc structures.
Both are undefined behaviour. You are unlucky that your code works at all: if you were "lucky", your program would decide to crash in every case just to show you that you were triggering undefined behaviour. Unfortunately that's not how undefined behaviour works, so as a C programmer, you just have to be defensive and avoid entering into undefined behaviour situations.
Why are you using strings, anyway? Just use scanf("%d", &number1) and you can avoid all of this.
Your program does not "work fine" (and should not "work fine") with either explicitly declared arrays or malloc-ed arrays. Strings like 15 and 12 require char buffers of size 3 at least. You provided buffers of size 2. Your program overruns the buffer boundary in both cases, thus causing undefined behavior. It is just that the consequences of that undefined behavior manifest themselves differently in different versions of the code.
The malloc version has a greater chance to produce the illusion of "working", since the sizes of dynamically allocated memory blocks are typically rounded up to the nearest implementation-dependent "round" boundary (like 8 or 16 bytes). That means that your malloc calls actually allocate more memory than you ask them to. This might temporarily hide the buffer overrun problems present in your code, producing the illusion of your program "working fine".
Meanwhile, the version with explicit arrays uses local arrays. Local arrays often have precise size (as declared) and also have a greater chance to end up located next to each other in memory. This means that buffer overrun in one array can easily destroy the contents of the other array. This is exactly what happened in your case.
However, even in the malloc-based version I'd still expect a good debugging version of standard library implementation to catch the overrun problems. It is quite possible that if you attempt to actually free these malloc-ed memory blocks (something you apparently didn't bother to do), free will notice the problem and tell you that heap integrity has been violated at some point after malloc.
P.S. Don't use atoi to convert strings to integers; it gives you no way to detect conversion errors. The function that converts strings to integers is called strtol.
I want to understand a number of things about the strings on C:
I could not understand why you cannot change a string with a normal assignment (but only through the functions of string.h); for example, I can't do d = "aa" (where d is a pointer to char or an array of char).
Can someone explain to me what's going on behind the scenes? The compiler lets such a thing run, and you receive a segmentation fault error.
Something else, I run a program in C that contains the following lines:
char c = 'a', *pc = &c;
printf("Enter a string:");
scanf("%s", pc);
printf("your first char is: %c", c);
printf("your string is: %s", pc);
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
If I put two letters, the first letter is printed right! And the string is printed with a lot of spaces (incorrect).
If I put one letter, the letter is printed right! And the string is printed with a lot of spaces and at the end something weird (a square with four numbers containing zeros and ones).
Can anyone explain what is happening behind?
Please note: I do not want the program to work, I did not ask the question to get suggestions for another program, I just want to understand what happens behind the scenes in these situations.
Strings almost do not exist in C (except as C string literals like "abc" in some C source file).
In fact, strings are mostly a convention: a C string is an array of char whose last element is the zero char '\0'.
So declaring
const char s[] = "abc";
is exactly the same as
const char s[] = {'a','b','c','\0'};
in particular, sizeof(s) is 4 (3+1) in both cases (and so is sizeof("abc")).
The standard C library contains a lot of functions (such as strlen(3) or strncpy(3)...) which obey and/or presuppose the convention that strings are zero-terminated arrays of char-s.
Better code would be:
char buf[16] = "a", *pc = buf;
printf("Enter a string:"); fflush(NULL);
scanf("%15s", pc);
printf("your first char is: %c", buf[0]);
printf("your string is: %s", pc);
Some comments: be afraid of buffer overflow. When reading a string, always give a bound to the read string, or else use a function like getline(3) which dynamically allocates the string in the heap. Beware of memory leaks (use a tool like valgrind ...)
When computing a string, be also aware of the maximum size. See snprintf(3) (avoid sprintf).
Often, you adopt the convention that a string is returned and dynamically allocated in the heap. You may want to use strdup(3) or asprintf(3) if your system provides it. But you should adopt the convention that the calling function (or something else, but well defined in your head) is free(3)-ing the string.
Your program can be semantically wrong and by bad luck happening to sometimes work. Read carefully about undefined behavior. Avoid it absolutely (your points 1,2,3 are probable UB). Sadly, an UB may happen to sometimes "work".
To explain some actual undefined behavior, you have to take into account your particular implementation: the compiler, the flags -notably optimization flags- passed to the compiler, the operating system, the kernel, the processor, the phase of the moon, etc etc... Undefined behavior is often non reproducible (e.g. because of ASLR etc...), read about heisenbugs. To explain the behavior of points 1,2,3 you need to dive into implementation details; look into the assembler code (gcc -S -fverbose-asm) produced by the compiler.
I suggest you to compile your code with all warnings and debugging info (e.g. using gcc -Wall -g with GCC ...), to improve the code till you got no warning, and to learn how to use the debugger (e.g. gdb) to run your code step by step.
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
Because memory is allocated for only one byte.
You declared char c and assigned it 'a'; a single char occupies one byte of memory.
If scanf() uses this memory for storing more than one byte, then this is simply undefined behavior.
char c = "a"; is a wrong declaration in the C language, since even a single character enclosed in a pair of double quotes ("") is treated as a string in C: it becomes "a\0", because all strings end with a '\0' null character.
char c = "a"; is wrong, whereas char c = 'c'; is correct.
Also note that the memory allocated for a char is only 1 byte, so it can hold only one character.