Scanf read string over size of predefined char array - c

I'm trying to read a string via scanf as follows:
char input[8];
scanf("%s",input);
It turns out that the program could read more than 8 characters. Say I inputed 123456789012345 and strlen(input) returns 15.
However when I set input as:
char input[4];
scanf("%s",input);
Inputing "12345" will cause '16146 segmentation fault'.
Anyone knows how this happens?

Technically both cases invoke undefined behavior. That the first case happens to work on your system should not be taken to mean that your program is well-defined. Testing can only indicate the presence of bugs, not their absence.
Since you're still learning C I will take the opportunity to offer you advice for reading input from stdin: always limit the length of input that will be read to the length of the buffer it's being read in to, reserving one spot at the end for the null-terminator.
If you want to use scanf to read strings from stdin, then it is safer to prefix the string format specifier with the maximum length of the string than to use a raw "%s". For example, if I had a char buffer[20]; that was the destination of a call to scanf, I would use the format string "%19s".

Both are so called undefined behavior and should be avoided at all costs. No bugs are so tricky to find as those caused by this.
So why does this work? Well, that's the problem with undefined behavior. It may work. You have no guarantees at all.
Read more about UB here: Undefined, unspecified and implementation-defined behavior

Related

Why can i not use %s instead of %c?

The whole function the question is about is about giving a two dimensional array initialized with {0} as output and making a user able to move a 1 over the field with
char wasd;
scanf("%c", &wasd);
(the function to move by changing the value of the variable wasd is not important i think)
now my question is why using
scanf("%s", &wasd);
does only work partly(sometimes the 1 keeps being at a field and appears a 2nd time at the new place though it actually should be deleted)
and
scanf("%.1s", &wasd);
leads to the field being printed out without stop until closing the execution program. I came up with using %.1s after researching the difference between %c and %s here Why does C's printf format string have both %c and %s?? If one can figure out the answer by reading through that, i am not clever or far enough with c learning to get it.
I also found this fscanf() in C - difference between %s and %c but i do not know anything about EOF which one answer says is the cause of the problem so i would prefer getting an answer without it.
Thank you for an answer
Simple as that, %s is the conversion for a (non-empty) string. A string in C always ends with a 0 byte, so any non-empty string needs at least two bytes. If you pass a pointer to a single char variable, scanf() will just overwrite whatever is in memory after that variable -- you cause undefined behavior and anything can happen.
Side note, scanf("%s", ..), even if you give it an array of char, will always overflow the buffer if something longer is entered, therefore causing undefined behavior. You have to include a field width like
char str[10];
scanf("%9s", str);
Best is not to use scanf() at all. For your single character input, you can just use getchar() (be aware it returns an int). You might also want to read my beginners' guide away from scanf.
A char variable can hold only one byte of memory to hold a single character. But a string (array of characters) is different from a char variable as it is always ended with a null character \0 or numeric 0. So in scanf you specifically mentioned whether you are reading a character or a string so that scanf can add a null character at the end of a string. So you are not suppose to use a %s to read a value for a char variable

inputting a character string using scanf()

I started learning about inputting character strings in C. In the following source code I get a character array of length 5.
#include<stdio.h>
int main(void)
{
char s1[5];
printf("enter text:\n");
scanf("%s",s1);
printf("\n%s\n",s1);
return 0;
}
when the input is:
1234567891234567, and I've checked it's working fine up to 16 elements(which I don't understand because it is more than 5 elements).
12345678912345678, it's giving me an error segmentation fault: 11 (I gave 17 elements in this case)
123456789123456789, the error is Illegal instruction: 4 (I gave 18 elements in this case)
I don't understand why there are different errors. Is this the behavior of scanf() or character arrays in C?. The book that I am reading didn't have a clear explanation about these things. FYI I don't know anything about pointers. Any further explanation about this would be really helpful.
Is this the behavior of scanf() or character arrays in C?
TL;DR - No, you're facing the side-effects of undefined behavior.
To elaborate, in your case, against a code like
scanf("%s",s1);
where you have defined
char s1[5];
inputting anything more than 4 char will cause your program to venture into invalid memory area (past the allocated memory) which in turn invokes undefined behavior.
Once you hit UB, the behavior of the program cannot be predicted or justified in any way. It can do absolutely anything possible (or even impossible).
There is nothing inherent in the scanf() which stops you from reading overly long input and overrun the buffer, you should keep control on the input string scanning by using the field width, like
scanf("%4s",s1); //1 saved for terminating null
The scanf function when reading strings read up to the next white-space (e.g. newline, space, tab etc.), or the "end of file". It has no idea about the size of the buffer you provide it.
If the string you read is longer than the buffer provided, then it will write out of bounds, and you will have undefined behavior.
The simplest way to stop this is to provide a field length to the scanf format, as in
char s1[5];
scanf("%4s",s1);
Note that I use 4 as field length, as there needs to be space for the string terminator as well.
You can also use the "secure" scanf_s for which you need to provide the buffer size as an argument:
char s1[5];
scanf_s("%s", s1, sizeof(s1));

Why output length is coming 6?

I have written a simple program to calculate length of string in this way.
I know that there are other ways too. But I just want to know why this program is giving this output.
#include <stdio.h>
int main()
{
char str[1];
printf( "%d", printf("%s", gets(str)));
return 0;
}
OUTPUT :
(null)6
Unless you always pass empty strings from the standard input, you are invoking undefined behavior, so the output could be pretty much anything, and it could crash as well. str cannot be a well-formed C string of more than zero characters.
char str[1] allocates storage room for one single character, but that character needs to be the NUL character to satisfy C string constraints. You need to create a character array large enough to hold the string that you're writing with gets.
"(null)6" as the output could mean that gets returned NULL because it failed for some reason or that the stack was corrupted in such a way that the return value was overwritten with zeroes (per the undefined behavior explanation). 6 following "(null)" is expected, as the return value of printf is the number of characters that were printed, and "(null)" is six characters long.
There's several issues with your program.
First off, you're defining a char buffer way too short, a 1 char buffer for a string can only hold one string, the empty one. This is because you need a null at the end of the string to terminate it.
Next, you're using the gets function which is very unsafe, (as your compiler almost certainly warned you about), as it just blindly takes input and copies it into a buffer. As your buffer is 0+terminator characters long, you're going to be automatically overwriting the end of your string into other areas of memory which could and probably does contain important information, such as your rsp (your return pointer). This is the classic method of smashing the stack.
Third, you're passing the output of a printf function to another printf. printf isn't designed for formating strings and returning strings, there are other functions for that. Generally the one you will want to use is sprintf and pass it in a string.
Please read the documentation on this sort of thing, and if you're unsure about any specific thing read up on it before just trying to program it in. You seem confused on the basic usage of many important C functions.
It invokes undefined behavior. In this case you may get any thing. At least str should be of 2 bytes if you are not passing a empty string.
When you declare a variable some space is reserved to store the value.
The reserved space can be a space that was previously used by some other
code and has values. When the variable goes out of scope or is freed
the value is not erased (or it may be, anything goes.) only the programs access
to that variable is revoked.
When you read from an unitialised location you can get anything.
This is undefined behaviour and you are doing that,
Output on gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 is 0
For above program your input is "(null)", So you are getting "(null)6". Here "6" is the output from printf (number of characters successfully printed).

Problem using Scanf?

Why does scanf give a max value in case of "int" but crash the program in case of "char" when the limit is exceeded?
#include<stdio.h>
main(){
int a;
char ch[10];
scanf("%d",&a);
printf("%d",a);
scanf("%s",ch);
printf("%s",ch);
}
It crashes your program in this case because scanf() has an inherently unsafe interface for strings. It has no way of knowing that your parameter ch is an array large enough to hold a 9-character string plus its terminating nul. As a result, it is perfectly happy to keep reading characters from stdin and storing them in memory well past the end of the array.
If your program crashes, you are actually lucky. In the worst case, the attacker has used a carefully crafted string to manipulate the content of the stack in such a way that he has gained control of your computer. This is an example of a buffer overflow attack. It sounds unlikely, but it has been documented to occur on a large number of occasions.
Used for only numbers, scanf is generally safe enough, but it is not very good at handling errors in the input. As a result, it is usually a good idea to use something like fgets() to read the input (it has a buffer length parameter to control overflow) and sscanf() to parse from that buffer, testing its return values for sanity as you go.
Edit: As the comment from R points out, I overstated the dangers inherent to the scanf interface. With care to correctly use the field width specifier on all strings, scanf becomes safer. But then you take responsibility for guaranteeing that the specified width does fit within the buffer. For the example, you should write scanf("%9s",ch); because your buffer was declared to be ten bytes long and you need room for the terminating nul.
Note that you should also be testing the return value from scanf(). It returns the number of fields it successfully matched, or EOF if an I/O error occurred. It might return 0 if the user entered "abc" when it expected a number, for instance.
Because you're not reading a character, you're reading a string. And scanf does not "know" that you only have space for 10 characters (including the null). This is the joy of C programming.
You can protect yourself in this case by adding a width modifier:
%9s

difference betweent printf and gets

I am beginner for programming.I referred books of C programming,but i am confused.
1.) What's the difference betweent printf and gets?
I believe gets is simpler and doesn't have any formats?
printf
The printf function writes a formatted string to the standard output. A formatted string is the result of replacing placeholders with their values. This sounds a little complicated but it will become very clear with an example:
printf("Hello, my name is %s and I am %d years old.", "Andreas", 22);
Here %s and %d are the placeholders, that are substituted with the first and second argument. You should read on the man page (linked above) the list of placeholders and their options, but the ones you'll run into most often are %d (a number) and %s (a string).
Making sure that the placeholder arguments match their type is extremely important. For example, the following code will result in undefined behavior (meaning that anything can happen: the program may crash, it may work, it may corrupt data, etc):
printf("Hello, I'm %s years old.", 22);
Unfortunately in C there is no way to avoid these relatively common mistakes.
gets
The gets function is used for a completely different purpose: it reads a string from the standard input.
For example:
char name[512];
printf("What's your name? ");
gets(name);
This simple program will ask the user for a name and save what he or she types into name.
However, gets() should NEVER be used. It will open your application and the system it runs on to security vulnerabilities.
Quoting from the man page:
Never use gets(). Because it is
impossible to tell without knowing the
data in advance how many characters
gets() will read, and because gets()
will continue to store characters past
the end of the buffer, it is extremely
dangerous to use. It has been used to
break computer security. Use fgets()
instead.
Explained in a more simple way the problem is that if the variable you give gets (name in this case) is not big enough to hold what the user types a buffer overflow will occur, which is, gets will write past the end of the variable. This is undefined behavior and on some systems it will allow execution of arbitrary code by the attacker.
Since the variable must have a finite, static size and you can't set a limit of the amount of characters the user can type as the input, gets() is never secure and should never be used. It exists only for historical reasons.
As the manual suggested, you should use fgets instead. It has the same purpose as gets but has a size argument that specifies the size of the variable:
char *fgets(char *s, int size, FILE *stream);
So, the program above would become:
char name[512];
printf("What's your name? ");
fgets(name, sizeof(name) /* 512 */, stdin /* The standard input */);
They fundamentally perform different tasks.
printf: prints out text to a console.
gets: reads in input from the keyboard.
printf: allowing you to format a string from components (ie. taking results from variables), and when output to stdout, it does not append new line character. You have to do this by inserting '\n' in the format string.
puts: only output a string to stdout, but does append new line afterward.
scanf: scan the input fields, one character at a time, and convert them according to the given format.
gets: simply read a string from stdin, with no format consideration, the return character is replaced by string terminator '\0'.
http://en.wikipedia.org/wiki/Printf
http://en.wikipedia.org/wiki/Gets

Resources