I started learning about inputting character strings in C. In the following source code I declare a character array of length 5.
#include<stdio.h>
int main(void)
{
char s1[5];
printf("enter text:\n");
scanf("%s",s1);
printf("\n%s\n",s1);
return 0;
}
when the input is:
1234567891234567, it works fine; I've checked that it works for up to 16 characters (which I don't understand, because that is more than 5).
12345678912345678, I get the error Segmentation fault: 11 (I gave 17 characters in this case).
123456789123456789, the error is Illegal instruction: 4 (I gave 18 characters in this case).
I don't understand why there are different errors. Is this the behavior of scanf() or of character arrays in C? The book that I am reading doesn't have a clear explanation of these things. FYI, I don't know anything about pointers. Any further explanation would be really helpful.
Is this the behavior of scanf() or character arrays in C?
TL;DR - No, you're facing the side-effects of undefined behavior.
To elaborate, in your case, against a code like
scanf("%s",s1);
where you have defined
char s1[5];
inputting anything more than 4 characters will cause your program to write past the end of the array (past the allocated memory), which invokes undefined behavior.
Once you hit UB, the behavior of the program cannot be predicted or justified in any way. It can do absolutely anything.
Nothing inherent in scanf() stops it from reading overly long input and overrunning the buffer. You should control how much is scanned by using a field width, like
scanf("%4s",s1); //1 saved for terminating null
When reading strings with %s, the scanf function reads up to the next white-space character (e.g. newline, space, tab) or the end of file. It has no idea how large the buffer you provide is.
If the string you read is longer than the buffer provided, scanf will write out of bounds and you will have undefined behavior.
The simplest way to stop this is to provide a field length to the scanf format, as in
char s1[5];
scanf("%4s",s1);
Note that I use 4 as field length, as there needs to be space for the string terminator as well.
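To see what the field width does without typing at a terminal, here is a minimal sketch that uses sscanf on a fixed string instead of scanf. The helper name read_word4 is made up for this illustration; %n is a standard conversion that records how many input characters have been consumed so far.

```c
#include <stdio.h>

/* Hypothetical helper: reads one word of at most 4 characters from `input`
   into `out`, which must have room for at least 5 bytes.
   Returns the number of input characters consumed, or -1 if nothing matched. */
int read_word4(const char *input, char out[5])
{
    int consumed = 0;

    /* %4s stores at most 4 characters plus the terminating '\0'.
       %n does not count toward the return value of sscanf. */
    if (sscanf(input, "%4s%n", out, &consumed) != 1)
        return -1;
    return consumed;
}
```

Called with "1234567891234567", this stores only "1234" and reports 4 characters consumed. With scanf on stdin, the remaining characters would stay in the input stream for the next read.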
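You can also use the "secure" scanf_s, which takes the buffer size as an extra argument. Note that it belongs to the optional Annex K of C11 and is mostly available with Microsoft's compiler (whose version expects the size as an unsigned), so it is not portable: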
char s1[5];
scanf_s("%s", s1, sizeof(s1));
I understand that allocating memory for a string requires n+1 bytes because of the terminating null character. However, the question is: what if you allocate 10 chars but enter an 11-char string?
#include <stdio.h>
#include <stdlib.h>
int main(){
int n;
char *str;
printf("How long is your string? ");
scanf("%d", &n);
str = malloc(n+1);
if (str == NULL) printf("Uh oh.\n");
scanf("%s", str);
printf("Your string is: %s\n", str);
}
I tried running the program but the result is still the same as n+1.
If you allocated a char* of 10 characters but wrote 11 characters to it, you're writing to memory you haven't allocated. This has undefined behavior - it may happen to work, it may crash with a segmentation fault, and it may do something completely different. In short - don't rely on it.
If you overrun an area of memory given to you by malloc, you corrupt the heap. If you're lucky, your program will crash right away, or when you free the memory, or when it uses the chunk of memory right after the area you overran. When your program crashes, you'll notice the bug and have a chance to fix it.
If you're unlucky your code goes into production, and some cybercriminal figures out how to exploit your overrun memory to trick your program into running some malicious code or using some malicious data they fed you. If you're really unlucky, you get featured in Krebs On Security or some other information security news outlet.
Don't do this. If you're not confident of your ability to avoid doing it, don't use C. Instead use a language with a native string data type. Seriously.
what if you allocate 10 chars but enter an 11 char string?
scanf("%s", str); then has undefined behavior (UB). Anything may happen, including the program appearing to work, which is what "I tried running the program but the result is still the same as n+1" describes.
Instead, always use a width with scanf() and "%s" to stop reading once str[] is full. Example:
char str[10+1];
scanf("%10s", str);
Since n is variable here, consider instead using fgets() to read a line of input.
Note that fgets() also reads and saves a trailing '\n'.
It is better to use fgets() for user input and drop the scanf() calls altogether until you understand why scanf() is bad.
str = malloc(n+1);
if (str == NULL) { printf("Uh oh.\n"); return 1; }
if (fgets(str, n+1, stdin)) {
str[strcspn(str, "\n")] = 0; // Lop off potential trailing \n (needs <string.h>)
printf("Your string is: %s\n", str);
}
free(str);
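As a self-contained variant of the same idea, here is a minimal sketch of a helper that allocates room for n characters and reads at most that many with fgets(). The name read_bounded_line and the choice to take a FILE * (so it can be tried on something other than stdin) are assumptions made for this example, and n is assumed to be small enough to fit in an int.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of at most `n` characters from `in`
   into freshly allocated memory. Returns NULL on allocation failure or
   end of input. The caller must free() the result. */
char *read_bounded_line(FILE *in, size_t n)
{
    char *str = malloc(n + 1);            /* n characters + '\0' */
    if (str == NULL)
        return NULL;

    /* fgets stores at most (n + 1) - 1 characters, then the terminator. */
    if (fgets(str, (int)(n + 1), in) == NULL) {
        free(str);
        return NULL;
    }
    str[strcspn(str, "\n")] = '\0';       /* drop the newline, if one was stored */
    return str;
}
```

If the line is longer than n characters, the extra characters are not lost or written out of bounds; they simply remain in the stream for the next read.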
When you write 11 bytes to a 10-byte buffer, the last byte will be out-of-bounds. Depending on several factors, the program may crash, have unexpected and weird behavior, or may run just fine (i.e., what you are seeing). In other words, the behavior is undefined. You pretty much always want to avoid this, because it is unsafe and unpredictable.
Try writing a bigger string to your 10-byte buffer, such as 20 or 30 bytes. You will most likely see problems start to appear, although even that is not guaranteed.
I'm trying to read a string via scanf as follows:
char input[8];
scanf("%s",input);
It turns out that the program can read more than 8 characters. Say I input 123456789012345; then strlen(input) returns 15.
However when I set input as:
char input[4];
scanf("%s",input);
Inputting "12345" will cause '16146 segmentation fault'.
Does anyone know why this happens?
Technically both cases invoke undefined behavior. That the first case happens to work on your system should not be taken to mean that your program is well-defined. Testing can only indicate the presence of bugs, not their absence.
Since you're still learning C I will take the opportunity to offer you advice for reading input from stdin: always limit the length of input that will be read to the length of the buffer it's being read in to, reserving one spot at the end for the null-terminator.
If you want to use scanf to read strings from stdin, then it is safer to prefix the string format specifier with the maximum length of the string than to use a raw "%s". For example, if I had a char buffer[20]; that was the destination of a call to scanf, I would use the format string "%19s".
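One way to keep the width in the format string in sync with the buffer size is to derive both from the same macro. The following is a minimal sketch; BUF_LEN, STR and read_word are names invented for this example, and the two-level macro is the usual preprocessor trick to expand a macro before turning it into a string literal.

```c
#include <stdio.h>

#define BUF_LEN 19                 /* maximum characters, excluding '\0' */
#define STR_(x) #x
#define STR(x)  STR_(x)            /* expands BUF_LEN before stringifying it */

/* Hypothetical helper: reads one whitespace-delimited word from `in` into
   `buffer`. Returns 1 on success, 0 otherwise. */
int read_word(FILE *in, char buffer[BUF_LEN + 1])
{
    /* "%" STR(BUF_LEN) "s" is concatenated into "%19s" at compile time. */
    return fscanf(in, "%" STR(BUF_LEN) "s", buffer) == 1;
}
```

With this, changing BUF_LEN updates both the array size and the field width, so the two cannot drift apart.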
Both are so-called undefined behavior and should be avoided at all costs. Few bugs are as tricky to find as those caused by it.
So why does this work? Well, that's the problem with undefined behavior. It may work. You have no guarantees at all.
Read more about UB here: Undefined, unspecified and implementation-defined behavior
See the following code:
#include <stdio.h>
#include <stdio_ext.h> // __fpurge (glibc extension)
int main(void)
{
char test[3];
scanf("%s", test);
__fpurge(stdin);
printf("%s", test);
}
The program should store only 3 characters, but when I type, for example, 8 characters, the program stores all 8! This should not happen. The correct behavior would be to store 3 characters, so why does scanf do this?
scanf accepts more data than you can fit in test because you allow it to do so by using %s without a limit. This is dangerous, and must be avoided in production code.
Replace %s with %3s to fix this problem. If you want to read three characters, test must be four characters wide to accommodate the null terminator:
char test[4];
scanf("%3s", test);
When you pass test to scanf(), you are passing nothing but a pointer to the first character of your buffer, so scanf() has no idea how large your buffer is. It will happily accept as many characters as you type and store them all. So, when you type more than 2 characters, you are causing scanf() to write characters (plus the terminating null character) past the end of your buffer. What is normally to be expected in such a case is a program crash.
The fact that you did not experience a crash is largely coincidence. What is probably happening is that the compiler has allocated room for more than 3 characters on the stack due to alignment considerations, possibly room for 8 characters or more. If you type enough characters, your program will most likely crash.
For this reason, this usage of scanf() is considered completely unsafe. One should never use scanf() like that when doing any serious coding. Instead you should specify the width of your string, like this: "%2s". (Note that you must specify a number which is smaller than the size of your buffer by one, in order to account for the terminating null character that scanf() appends automatically.)
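When the buffer size is only known at run time, the width cannot be written into the format literal, and scanf has no way to take it as an argument (in scanf formats, * means "suppress assignment", not "width from argument" as in printf). A minimal sketch of one workaround is to build the format string with snprintf first. The helper name read_word_sized is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word from `in` into
   `buf`, which has room for `size` bytes, storing at most size - 1
   characters. Returns 1 on success, 0 otherwise. */
int read_word_sized(FILE *in, char *buf, size_t size)
{
    char fmt[32];

    if (size < 2)                  /* need room for one character + '\0' */
        return 0;

    /* Builds, for example, "%7s" for an 8-byte buffer. */
    snprintf(fmt, sizeof fmt, "%%%zus", size - 1);
    return fscanf(in, fmt, buf) == 1;
}
```

For a 4-byte buffer and the input 12345678, this stores "123" and leaves the rest of the input unread.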
I am writing a C program which has a 5-element array to store a string, and I am using gets() to get input. When I typed in more than 5 characters and then output the string, it just gave me all the characters I typed in. I know the string is terminated by a \0, so even if I exceed my array, it will still output the whole thing.
But what I am curious about is where exactly gets() stores the input: in some buffer, or directly in my array?
What if I type in a very long string? Will gets() try to store characters in memory that should not be touched? Would it give me a segmentation fault?
That's why gets is evil. It does not check the array bounds and often invokes undefined behavior. Never use gets; use fgets instead.
By the way, gets is no longer part of C. It was removed in the C11 standard in favor of a safer alternative, gets_s [1] (see the wiki), which belongs to the optional Annex K and is not available in every implementation. So, better to forget about gets.
[1] C11: K.3.5.4.1 The gets_s function
Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
gets() will store the characters in the 5-element buffer. If you type in more than 4 characters, the excess characters and the end-of-string character are written past the end of the buffer, and the result may not work well in any string operations in your program.
excerpt from man page on Ubuntu Linux
gets() reads a line from stdin into the buffer pointed to by s until
either a terminating newline or EOF, which it replaces with a null byte
('\0'). No check for buffer overrun is performed
The string is stored in the buffer and if it is too long it is stored in contiguous memory after the buffer. This can lead to unintended writing over of data or a SEGV fault or other problems. It is a security issue as it can be used to inject code into programs.
gets() stores the characters you type directly into your array and you can safely use/modify them. But indeed, as haccks and unxnut correctly state, gets doesn't care about the size of the array you give it to store its chars in, and when you type more characters than the array has space for you might eventually get a segmentation fault or some other weird results.
Just for the sake of completeness, gets() reads from a buffered stream called stdin, which contains the chars you typed. More specifically, it takes the chars until it reaches a newline. gets() discards that newline and stores the '\0' terminator in its place. You should, as haccks says, use fgets, which is very similar except that it keeps the newline in your array if there is room for it:
char buf[100]; // the input buffer
fgets(buf, 100, stdin); // reads until it finds a newline (your enter) but never
// more than 99 chars, using the last char for the '\0'
// you can now use and modify buf
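Building on that, here is a minimal sketch of a gets() replacement that strips the newline and also tells you when the line did not fit. The name read_line and its return convention are assumptions made for this example, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into `buf` (`size` bytes).
   Returns 1 if the whole line fit, 0 if it was truncated (the rest of the
   line is discarded), and -1 on end of input or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size < 2 || fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {        /* newline stored: the line fit */
        buf[len] = '\0';
        return 1;
    }

    /* No newline stored: either the line was too long or input ended.
       Discard whatever is left of this line. */
    int c, truncated = 0;
    while ((c = getc(in)) != EOF && c != '\n')
        truncated = 1;
    return truncated ? 0 : 1;
}
```

Discarding the remainder is a design choice: it keeps a leftover half-line from being mistaken for the next line of input. A program that needs the full text would grow the buffer instead.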
I have written a simple program to calculate length of string in this way.
I know that there are other ways too. But I just want to know why this program is giving this output.
#include <stdio.h>
int main()
{
char str[1];
printf( "%d", printf("%s", gets(str)));
return 0;
}
OUTPUT :
(null)6
Unless you always pass empty strings from the standard input, you are invoking undefined behavior, so the output could be pretty much anything, and it could crash as well. str cannot be a well-formed C string of more than zero characters.
char str[1] allocates storage room for one single character, but that character needs to be the NUL character to satisfy C string constraints. You need to create a character array large enough to hold the string that you're writing with gets.
"(null)6" as the output could mean that gets returned NULL because it failed for some reason or that the stack was corrupted in such a way that the return value was overwritten with zeroes (per the undefined behavior explanation). 6 following "(null)" is expected, as the return value of printf is the number of characters that were printed, and "(null)" is six characters long.
There are several issues with your program.
First off, you're defining a char buffer that is way too short. A 1-char buffer can only hold one string, the empty one, because you need a null at the end of the string to terminate it.
Next, you're using the gets function, which is very unsafe (as your compiler almost certainly warned you), as it just blindly takes input and copies it into a buffer. As your buffer has room for only the terminator, any non-empty input is written past the end of the array into other areas of memory, which could and probably do contain important information, such as the function's saved return address. This is the classic method of smashing the stack.
Third, you're passing the return value of one printf call to another printf. printf isn't designed for formatting strings and returning strings; there are other functions for that. Generally the one you will want is sprintf (or, better, the bounded snprintf), to which you pass a destination string.
Please read the documentation on this sort of thing, and if you're unsure about any specific thing read up on it before just trying to program it in. You seem confused on the basic usage of many important C functions.
It invokes undefined behavior. In this case you may get anything. str should be at least 2 bytes long if you are not passing an empty string.
When you declare a variable, some space is reserved to store the value.
The reserved space can be a space that was previously used by some other
code and still has values in it. When the variable goes out of scope or is freed,
the value is not erased (or it may be, anything goes); only the program's access
to that variable is revoked.
When you read from an uninitialised location you can get anything.
This is undefined behaviour, and that is what you are doing.
Output on gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 is 0
In the above program the inner printf printed "(null)", so you are getting "(null)6". Here 6 is the return value of the inner printf (the number of characters successfully printed).
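For comparison, the same "print the string, then use printf's return value as its length" idea can be done without undefined behavior. This is only a sketch: the helper name echo_and_count and the 100-byte buffer are arbitrary choices for the example, and fprintf is used so that the streams can be passed in.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in`, prints it to `out`, and
   returns the number of characters printed, or -1 if nothing could be read. */
int echo_and_count(FILE *in, FILE *out)
{
    char str[100];

    if (fgets(str, sizeof str, in) == NULL)   /* bounded, unlike gets() */
        return -1;
    str[strcspn(str, "\n")] = '\0';           /* drop the trailing newline */

    /* fprintf, like printf, returns the number of characters written. */
    return fprintf(out, "%s", str);
}
```

Calling echo_and_count(stdin, stdout) and then printing the result with "%d" gives the length of the typed line, up to 99 characters.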