I understand that allocating memory for a string requires n+1 bytes because of the terminating null character. However, what happens if you allocate 10 chars but enter an 11-char string?
#include <stdio.h>
#include <stdlib.h>
int main(){
int n;
char *str;
printf("How long is your string? ");
scanf("%d", &n);
str = malloc(n+1);
if (str == NULL) printf("Uh oh.\n");
scanf("%s", str);
printf("Your string is: %s\n", str);
}
I tried running the program but the result is still the same as n+1.
If you allocated a char* of 10 characters but wrote 11 characters to it, you're writing to memory you haven't allocated. This has undefined behavior - it may happen to work, it may crash with a segmentation fault, and it may do something completely different. In short - don't rely on it.
If you overrun an area of memory given to you by malloc, you corrupt the heap. If you're lucky your program will crash right away, or when you free the memory, or when your program uses the chunk of memory right after the area you overran. When your program crashes you'll notice the bug and have a chance to fix it.
If you're unlucky your code goes into production, and some cybercriminal figures out how to exploit your overrun memory to trick your program into running some malicious code or using some malicious data they fed you. If you're really unlucky, you get featured in Krebs On Security or some other information security news outlet.
Don't do this. If you're not confident of your ability to avoid doing it, don't use C. Instead use a language with a native string data type. Seriously.
what if you allocate 10 chars but enter an 11 char string?
scanf("%s", str); invokes undefined behavior (UB) when the input does not fit in the buffer. Anything may happen, including the program appearing to work, which is why "I tried running the program but the result is still the same as n+1" is a possible outcome.
Instead always use a width with scanf() and "%s" to stop reading once str[] is full. Example:
char str[10+1];
scanf("%10s", str);
Since n is variable here, consider instead using fgets() to read a line of input.
Note that fgets() also reads and saves a trailing '\n'.
Better to use fgets() for user input and drop scanf() call altogether until you understand why scanf() is bad.
str = malloc(n+1);
if (str == NULL) { printf("Uh oh.\n"); return 1; }
if (fgets(str, n+1, stdin)) {
    str[strcspn(str, "\n")] = 0; // Lop off potential trailing \n
}
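To make the fgets() suggestion concrete, here is a minimal sketch of a bounded line reader. The name read_line and its return convention are assumptions made for illustration, not part of the answer above. It also discards whatever did not fit, so leftover characters do not show up in the next read.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from `in` into buf (capacity `size` bytes, including the
   terminator). Returns 1 if the whole line fit, 0 if it was truncated,
   and -1 on end of file or read error. Hypothetical helper. */
int read_line(char *buf, size_t size, FILE *in)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {        /* the newline arrived, so the line fit */
        buf[len] = '\0';
        return 1;
    }

    /* No newline stored: either the line was too long or input ended. */
    int c, truncated = 0;
    while ((c = getc(in)) != EOF && c != '\n')
        truncated = 1;             /* discard the characters that did not fit */
    return truncated ? 0 : 1;
}
```

With a buffer obtained from malloc(n + 1), the call would be read_line(str, n + 1, stdin); a return value of 0 tells you the user typed more than n characters.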
When you write 11 bytes to a 10-byte buffer, the last byte will be out-of-bounds. Depending on several factors, the program may crash, have unexpected and weird behavior, or may run just fine (i.e., what you are seeing). In other words, the behavior is undefined. You pretty much always want to avoid this, because it is unsafe and unpredictable.
Try writing a much longer string to your 10-byte buffer, such as 20 or 30 bytes. You will probably see problems start to appear.
See the following code:
int main()
{
char test[3];
scanf("%s", test);
__fpurge(stdin);
printf("%s", test);
}
The program should record only 3 characters, but when I type, for example, 8 characters, the program records all 8! This should not happen. I expected it to record only 3 characters, so why does scanf do this?
scanf accepts more data than you can fit in test because you allow it to do so by using %s without a limit. This is dangerous, and must be avoided in production code.
Replace %s with %3s to fix this problem. If you want to read three characters, test must be four characters wide to accommodate the null terminator:
char test[4];
scanf("%3s", test);
When you pass test to scanf(), you are passing nothing but a pointer to the first character of your buffer, so scanf() has no idea how large your buffer is. It will happily accept as many characters as you type, and it will store them all in there. So, when you type more than 2 characters, you are causing scanf() to write characters (plus the terminating null character) past the end of your buffer. A crash is a common result in such a case, but it is not guaranteed.
The fact that you did not experience a crash is largely coincidence. What is probably happening is that the compiler has allocated room for more than 3 characters on the stack due to alignment considerations, possibly room for 8 characters or more. If you type enough characters, your program will very likely crash.
For this reason, this usage of scanf() is considered completely unsafe. One should never use scanf() like that when doing any serious coding. Instead you should specify the width of your string, like this: "%2s". (Note that you must specify a number which is smaller than the size of your buffer by one, in order to account for the terminating null character that will be automatically appended by scanf().)
I started learning about inputting character strings in C. In the following source code I get a character array of length 5.
#include<stdio.h>
int main(void)
{
char s1[5];
printf("enter text:\n");
scanf("%s",s1);
printf("\n%s\n",s1);
return 0;
}
when the input is:
1234567891234567, and I've checked it's working fine up to 16 elements(which I don't understand because it is more than 5 elements).
12345678912345678, it's giving me an error segmentation fault: 11 (I gave 17 elements in this case)
123456789123456789, the error is Illegal instruction: 4 (I gave 18 elements in this case)
I don't understand why there are different errors. Is this the behavior of scanf() or character arrays in C?. The book that I am reading didn't have a clear explanation about these things. FYI I don't know anything about pointers. Any further explanation about this would be really helpful.
Is this the behavior of scanf() or character arrays in C?
TL;DR - No, you're facing the side-effects of undefined behavior.
To elaborate, in your case, against a code like
scanf("%s",s1);
where you have defined
char s1[5];
inputting anything more than 4 chars will cause your program to venture into an invalid memory area (past the allocated memory), which in turn invokes undefined behavior.
Once you hit UB, the behavior of the program cannot be predicted or justified in any way. It can do absolutely anything possible (or even impossible).
There is nothing inherent in scanf() that stops you from reading overly long input and overrunning the buffer. You should keep control of the input string scanning by using the field width, like
scanf("%4s",s1); //1 saved for terminating null
The scanf function, when reading strings, reads up to the next white-space (e.g. newline, space, tab etc.), or the "end of file". It has no idea about the size of the buffer you provide it.
If the string you read is longer than the buffer provided, then it will write out of bounds, and you will have undefined behavior.
The simplest way to stop this is to provide a field length to the scanf format, as in
char s1[5];
scanf("%4s",s1);
Note that I use 4 as field length, as there needs to be space for the string terminator as well.
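One way to keep the field width and the array size from drifting apart is to derive both from a single macro. The sketch below is an illustration only: the names S1_LEN, STR and first_word are made up here, and it uses sscanf on a string rather than scanf on stdin so the behaviour is easy to check.

```c
#include <stdio.h>

#define S1_LEN 4                 /* longest string we accept */
#define STR_(x) #x
#define STR(x)  STR_(x)          /* expands S1_LEN first, then stringifies it */

/* Copies the first whitespace-delimited word of input into out, storing at
   most S1_LEN characters plus the terminator. Returns 1 on success, 0 if
   the input contains no word. Hypothetical helper. */
int first_word(const char *input, char out[S1_LEN + 1])
{
    /* Adjacent string literals are concatenated, giving "%4s" here. */
    return sscanf(input, "%" STR(S1_LEN) "s", out) == 1;
}
```

Given the input 1234567891234567 from the question, out receives only 1234. Changing S1_LEN adjusts both the array size and the width in the format string.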
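You can also use the "secure" scanf_s where it is available (it belongs to the optional Annex K of C11 and is provided by Microsoft's C runtime), for which you need to provide the buffer size as an argument: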
char s1[5];
scanf_s("%s", s1, sizeof(s1));
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main() {
char *input = (char *)malloc(sizeof(char));
input = "\0";
while (1){
scanf("%s\n", input);
if (strcmp(input, "0 0 0") == 0) break;
printf("%s\n",input);
}
}
I'm trying to read in a string of integers until "0 0 0" is entered in.
The program spits out bus error as soon as it executes the scanf line, and I have no clue how to fix it.
Below is the error log.
[1] 59443 bus error
You set input to point to the first element of a string literal (while leaking the recently allocated buffer):
input = "\0"; // now the malloc'd buffer is lost
Then you try to modify said literal:
scanf("%s\n", input);
That is undefined behaviour. You can't write to that location. You can fix that problem by removing the assignment input = "\0";.
Next, note that you're only allocating space for one character:
char *input = (char *)malloc(sizeof(char));
Once you fix the memory leak and the undefined behaviour, you can think about allocating more space. How much space you need is for you to say, but you need enough to contain the longest string you want to read in plus an extra character for the null terminator. For example,
char *input = malloc(257);
would allow you to read in strings up to 256 characters long.
The immediate problem, (thanks to another answer) is that you're initializing input wrong, by pointing it at read-only data, then later trying to write to it via scanf. (Yes, even the lowly literal "" is a pointer to a memory area where the empty string is stored.)
The next problem is semantic: there's no point in trying to initialize it when scanf() will soon overwrite whatever you put there. But if you wanted to, a valid way is input[0] = '\0', which would be appropriate for, say, a loop using strcat().
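As an illustration of the strcat() case mentioned above, here is a small sketch in which writing '\0' to the first element is what makes the buffer a valid empty string before anything is appended. The function name join_words and its signature are assumptions for the example.

```c
#include <string.h>

/* Joins `count` words with single spaces into out (capacity out_size bytes).
   Returns 0 on success, -1 if the result would not fit. Hypothetical helper. */
int join_words(char *out, size_t out_size, const char *const *words, size_t count)
{
    if (out_size == 0)
        return -1;
    out[0] = '\0';                      /* start from a valid empty string */

    size_t used = 0;                    /* characters stored so far */
    for (size_t i = 0; i < count; i++) {
        size_t need = strlen(words[i]) + (i > 0 ? 1 : 0);
        if (used + need + 1 > out_size) /* +1 for the terminator */
            return -1;
        if (i > 0)
            strcat(out, " ");
        strcat(out, words[i]);
        used += need;
    }
    return 0;
}
```

Without the out[0] = '\0' line, strcat() would search uninitialized memory for a terminator, which is the same class of bug as the ones discussed here.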
And finally, waiting in the wings to bite you is a deeper issue: You need to understand malloc() and sizeof() better. You're only allocating enough space for one character, then overrunning the 1-char buffer with a string of arbitrary length (up to the maximum that your terminal will allow on a line.)
A rough cut would be to allocate far more, say 256 chars, than you'll ever need, but scanf is an awful function for this reason -- makes buffer overruns painfully easy especially for novices. I'll leave it to others to suggest alternatives.
Interestingly, the type of crash can indicate something about what you did wrong. A Bus error often relates to modifying read-only memory (which is still a mapped page), such as you're trying to do, but a Segmentation Violation often indicates overrunning a buffer of a writable memory range, by hitting an unmapped page.
input = "\0";
is wrong.
'input' is a pointer, not memory.
"\0" is a string, not a char.
You are assigning the pointer a new value, which points to a segment of memory that holds constants, because "\0" is a constant string literal.
Now when you try to modify this constant memory, you get a bus error, which is expected.
In your case I assume you wanted to initialize 'input' with an empty string.
Use
input[0]='\0';
note single quotes around 0.
Next problem is malloc:
char *input = (char *)malloc(sizeof(char));
You are allocating memory for 1 character only.
When the user enters "0 0 0", which is 5 characters plus the terminating zero, you will get a buffer overflow and will probably corrupt some innocent variable.
Allocate enough memory up front to store all user input. Typical sizes are 256 or 8192 bytes; the exact value doesn't matter much.
Then,
scanf("%s\n", input);
may still overrun the buffer if the user enters a lot of text. Use fgets(buf, limit, stdin) with a limit such as 8192; that would be safer.
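Putting the fgets() advice together for this question: %s stops at whitespace, so a single %s conversion can never yield the string "0 0 0", whereas a line-based read can. The following is only a sketch; the function name echo_until_sentinel and the 256-byte line buffer are arbitrary choices, and lines longer than the buffer would be handled in several pieces.

```c
#include <stdio.h>
#include <string.h>

/* Echoes lines from in to out until a line equal to "0 0 0" (or end of
   input) is seen. Returns the number of lines echoed. Hypothetical helper. */
int echo_until_sentinel(FILE *in, FILE *out)
{
    char line[256];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
        if (strcmp(line, "0 0 0") == 0)
            break;
        fprintf(out, "%s\n", line);
        count++;
    }
    return count;
}
```

In a program like the one in the question, this would be called as echo_until_sentinel(stdin, stdout).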
The following code reads up to 10 chars from stdin and output the chars.
When I input more than 10 chars, I expect it to crash because msg has not enough room, but it does NOT! How could that be?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
char* msg = malloc(sizeof(char)*10);
if (NULL==msg)
exit(EXIT_FAILURE);
scanf("%s", msg);
printf("You said: %s\n", msg);
if (strlen(msg)<10)
free(msg);
return EXIT_SUCCESS;
}
Use fgets instead; scanf is not buffer safe. What you are seeing is undefined behavior.
If you do use scanf(), you may allocate a "safe" large buffer. For interactive user input, about 2 lines (roughly 2x80 chars) should do; for files, make it bigger.
Conclusion: scanf() is kinda quick-and-dirty stuff, don't use it in serious projects.
You can specify max size in scanf() format string
scanf("%9s", msg);
I would imagine that malloc() allocates blocks of memory aligned to word boundaries. On a 32-bit machine, that means whatever you ask for will be rounded up to the next multiple of 4, so a request for 10 bytes really gets 12. That means you might get away with a string of up to 11 characters (plus a '\0' terminator) without suffering any problems.
But don't ever assume this to be the case. Like everyone else is saying, you should always specify a safe maximum length in your format string if you want to avoid problems.
It does not crash because C is very lenient, contrary to popular belief. The program is not required to crash or even complain if a buffer is overflowed. Say you define
union {
    uint8_t a[3];
    uint32_t b;
};
then a[3] is perfectly fine memory, because the union occupies 4 bytes, and there is no reason to crash (but don't ever do this). Even a[5] or a[100] may happen to be perfectly fine.
On the other hand I may try to access a[-1] which happens to be memory the OS does not allow you to access, causing a segfault.
As to what you should do to fix this: as others have pointed out, scanf is not safe to use with buffers. Use one of their suggestions.
I have this snippet of the code:
char* receiveInput(){
char *s;
scanf("%s",s);
return s;
}
int main()
{
char *str = receiveInput();
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
I receive this output:
Your string is hellàÿ", length is 11
my input was:
helloworld!
can somebody explain why, and why this style of the coding is bad, thanks in advance
Several answers have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):
can somebody explain why, and why this style of the coding is bad
I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatible with ordinary (and easier to do correctly) read operations (like fgets and company).
First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but reads in strings in such a limited capacity?
If you'd like to read in an entire line, as you may often be wont to do, scanf provides... with "%[^\n]". What? What is that? When did this become Perl?
But the real problem is that neither of those are safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will read at most 10 characters and add a terminating null character automatically, so the array needs room for 11. So that's good... for when our array size never needs to change.
What if we want to pass the size of our array as an argument to scanf? printf can do this:
char string[] = "Hello, world!";
printf("%.*s\n", (int) sizeof string, string); // prints whole message; the precision argument must be an int
printf("%.*s\n", 6, string); // prints just "Hello,"
Want to do the same thing with scanf? Here's how:
static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
snprintf(tmp, sizeof tmp, "%%%zus", bufsize - 1); // assumes bufsize is a size_t; width leaves room for the terminator
scanf(tmp, buffer);
That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.
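For completeness, here is what that format-building approach might look like as a whole function. This is a sketch under assumptions: the name scan_word is made up, bufsize is taken to be a size_t, and a fixed 32-byte array stands in for the "log10 of SIZE_MAX" sizing (a 64-bit size_t has at most 20 decimal digits).

```c
#include <stdio.h>

/* Reads one whitespace-delimited word from in into buf, storing at most
   bufsize - 1 characters plus the terminator. Returns 1 on success, 0 if
   no word could be read or bufsize is too small. Hypothetical helper. */
int scan_word(FILE *in, char *buf, size_t bufsize)
{
    char fmt[32];          /* '%', the digits of a size_t, 's', and '\0' */

    if (bufsize < 2)
        return 0;          /* a field width of 0 is not valid */

    /* Builds e.g. "%5s" for bufsize == 6. */
    int n = snprintf(fmt, sizeof fmt, "%%%zus", bufsize - 1);
    if (n < 0 || (size_t)n >= sizeof fmt)
        return 0;

    return fscanf(in, fmt, buf) == 1;
}
```

With a 6-byte buffer and the input helloworld!, the first call stores hello and the next call continues with world, which shows both the bound working and how the rest of the word stays in the stream.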
Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:
fgets(buffer, bufsize, stdin);
Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.
Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:
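char *readline(FILE *file)
{
    size_t size = 80; // start off small
    size_t curr = 0;  // number of characters stored so far
    char *buffer = malloc(size);
    if(buffer == NULL) return NULL;
    while(fgets(buffer + curr, size - curr, file))
    {
        if(strchr(buffer + curr, '\n')) return buffer; // success
        curr += strlen(buffer + curr); // skip past what we just read
        if(curr < size - 1) continue;  // buffer not full: input ended early
        size *= 2;
        char *tmp = realloc(buffer, size);
        if(tmp == NULL) { free(buffer); return NULL; }
        buffer = tmp;
    }
    if(curr > 0) return buffer; // last line had no trailing '\n'
    free(buffer);               // end of file or read error, nothing read
    return NULL;
}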
The curr variable tracks where the next chunk goes, and it also keeps us from rechecking data we've already read (increasingly useful as we read more data). We could even use the return value of strchr to strip off the ending '\n' character if you preferred.
Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.
We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string. We could give in and just store the number, do the math as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number. Or we could limit our growth and starting position in such a way that we can manually adjust the format string (say, just increment the digits), but this could get hairy after a while and may require recursion (!) to work cleanly.
Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:
int i;
char buf[BUFSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);
If the user types "2" and presses Enter, that will read the 2, but then fgets will read an empty line because scanf didn't consume the newline! Okay, take two:
...
scanf("%i\n", &i);
...
You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away:
...
scanf("%i", &i);
getchar();
...
Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?
Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL, 10) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).
The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.
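As a rough illustration of that moral, the integer-then-line example from earlier could be handled by reading the number's line with fgets and converting it with strtol. The helper name read_int_line, the 64-byte line buffer, and the strict "nothing after the number" rule are assumptions made for this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one line from in and parses it as a decimal int.
   Returns 1 on success, 0 on end of file or malformed input.
   Hypothetical helper. */
int read_int_line(FILE *in, int *value)
{
    char line[64];
    if (fgets(line, sizeof line, in) == NULL)
        return 0;

    char *end;
    errno = 0;
    long v = strtol(line, &end, 10);
    if (end == line || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                       /* no digits, or out of range */
    if (*end != '\n' && *end != '\0')
        return 0;                       /* trailing junk after the number */

    *value = (int)v;
    return 1;
}
```

Because the whole line, newline included, is consumed by fgets, a following fgets call sees the next line exactly as typed, leading whitespace and all, and no getchar() workaround is needed.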
scanf doesn't allocate memory for you.
You need to allocate memory for the variable passed to scanf.
You could do like this:
char* receiveInput(){
char *s = (char*) malloc( 100 );
scanf("%s",s);
return s;
}
But warning:
The function that calls receiveInput takes ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving ownership away like this is usually not considered good practice.)
An easy fix is to receive the allocated memory as a parameter.
If the input string is longer than 99 characters (in my case), your program will suffer a buffer overflow (which is what is already happening).
An easy fix is to pass to scanf the length of your buffer:
scanf("%99s",s);
A fixed code could be like this:
// s must be of at least 100 chars!!!
char* receiveInput( char *s ){
scanf("%99s",s);
return s;
}
int main()
{
char str[100];
receiveInput( str );
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
You first have to allocate memory for s in your receiveInput() function. Such as:
s = (char *)calloc(50, sizeof(char));