I have this snippet of the code:
char* receiveInput(){
char *s;
scanf("%s",s);
return s;
}
int main()
{
char *str = receiveInput();
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
I receive this output:
Your string is hellàÿ", length is 11
my input was:
helloworld!
can somebody explain why, and why this style of the coding is bad, thanks in advance
Several answers have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):
can somebody explain why, and why this style of the coding is bad
I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatible with ordinary (and easier to do correctly) read operations (like fgets and company).
First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but read in strings in such a limited capacity?
If you'd like to read in an entire line, as you may often be wont to do, scanf provides... with "%[^\n]". What? What is that? When did this become Perl?
But the real problem is that neither of those is safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will read at most 10 characters and add a terminating nul-character automatically, so the buffer has to hold at least 11 bytes. So that's good... for when our array size never needs to change.
What if we want to pass the size of our array as an argument to scanf? printf can do this:
char string[] = "Hello, world!";
printf("%.*s\n", (int) sizeof string, string); // prints whole message; the precision argument must be an int
printf("%.*s\n", 6, string); // prints just "Hello,"
Want to do the same thing with scanf? Here's how:
static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
// bufsize is assumed to be a size_t; the width leaves room for the terminating nul
snprintf(tmp, sizeof tmp, "%%%zus", bufsize - 1);
scanf(tmp, buffer);
That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.
Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:
fgets(buffer, bufsize, stdin);
Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.
Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:
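char *readline(FILE *file)
{
    size_t size = 80; // start off small
    size_t curr = 0;
    char *buffer = malloc(size);
    if(buffer == NULL) return NULL;
    while(fgets(buffer + curr, size - curr, file))
    {
        if(strchr(buffer + curr, '\n')) return buffer; // success
        curr += strlen(buffer + curr); // resume at the terminating nul
        size *= 2;
        char *tmp = realloc(buffer, size);
        if(tmp == NULL) { free(buffer); return NULL; } // out of memory
        buffer = tmp;
    }
    if(curr > 0 && !ferror(file)) return buffer; // last line had no newline
    free(buffer);
    return NULL; // end of file or read error
}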
The curr variable is an optimization to prevent us from rechecking data we've already read, and is unnecessary (although useful as we read more data). We could even use the return value of strchr to strip off the ending "\n" character if you preferred.
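To illustrate the last remark, here is a minimal sketch of stripping the newline by way of strchr's return value. The name strip_newline is made up for this example; it works on any nul-terminated buffer, such as one filled by fgets.

```c
#include <string.h>

/* Hypothetical helper: cut the string at its first newline, if any.
   Returns 1 when a newline was found and removed, 0 otherwise. */
static int strip_newline(char *s)
{
    char *nl = strchr(s, '\n'); /* points at the newline, or NULL */

    if (nl == NULL)
        return 0;
    *nl = '\0';
    return 1;
}
```

A return value of 0 also tells the caller that the line was either cut short or was the last line of a file with no final newline.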
Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.
We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string; we could give in and just store the number, do the math as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number; we could limit our growth and starting position in such a way that we can manually adjust the format string (say, just increment the digits), but this could get hairy after a while and may require recursion (!) to work cleanly.
Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:
int i;
char buf[BUFSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);
That will read the integer (say the user typed "2"), but then fgets will read an empty line because scanf didn't read the newline! Okay, take two:
...
scanf("%i\n", &i);
...
You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away:
...
scanf("%i", &i);
getchar();
...
Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?
Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL, 10) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).
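As a rough sketch of that approach applied to the integer-then-line case from earlier: every read goes through fgets, and sscanf only ever sees a string. The name read_int_then_line and the 64-byte scratch buffer are assumptions made for this example, and it uses "%d" rather than "%i" so that input like 010 is read as decimal.

```c
#include <stdio.h>

/* Hypothetical helper: read an integer from one line of `in`, then the
   whole next line (newline included, leading whitespace intact) into buf.
   Returns 0 on success, -1 on end of file, read error, or a bad number. */
static int read_int_then_line(FILE *in, int *i, char *buf, int bufsize)
{
    char numline[64]; /* assumed big enough for a line holding one integer */

    if (fgets(numline, sizeof numline, in) == NULL)
        return -1;
    if (sscanf(numline, "%d", i) != 1) /* parse the string, not the stream */
        return -1;
    if (fgets(buf, bufsize, in) == NULL)
        return -1;
    return 0;
}
```

Because fgets consumed the newline after the number, the second read starts cleanly at the next line and no getchar call is needed.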
The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.
scanf doesn't allocate memory for you.
You need to allocate memory for the variable passed to scanf.
You could do like this:
char* receiveInput(){
char *s = (char*) malloc( 100 );
scanf("%s",s);
return s;
}
But warning:
the function that calls receiveInput will take the ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving the ownership away in this way is usually not considered a good practice).
An easy fix is getting the allocated memory as a parameter.
if the input string is longer than 99 characters (in my case), your program will suffer a buffer overflow (which is what is already happening).
An easy fix is to pass to scanf the length of your buffer:
scanf("%99s",s);
A fixed code could be like this:
// s must be of at least 100 chars!!!
char* receiveInput( char *s ){
scanf("%99s",s);
return s;
}
int main()
{
char str[100];
receiveInput( str );
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
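If you would rather not hard-code the 99 in the format string, one option is to pass the buffer size along and read with fgets. This is only a sketch: receiveLine is a made-up name, it reads a whole line rather than a single whitespace-delimited word, and it takes the stream as a parameter so it is not tied to stdin.

```c
#include <stdio.h>
#include <string.h>

/* Variant of receiveInput that is told how big the buffer is.
   Returns s on success, NULL on end of file or read error. */
static char *receiveLine(FILE *in, char *s, int size)
{
    if (fgets(s, size, in) == NULL)
        return NULL;
    s[strcspn(s, "\n")] = '\0'; /* drop the trailing newline, if present */
    return s;
}
```

In main you would call it as receiveLine(stdin, str, sizeof str) and check the result for NULL before using str.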
You have to first allocate memory to your s object in your receiveInput() method. Such as:
s = (char *)calloc(50, sizeof(char));
I tried reading using:
char *input1, *input2;
scanf("%s[^\n]", input1);
scanf("%s[^\n]", input2);
I am obviously doing something wrong because the second string is read as null. I know using scanf() is not recommended but I couldn't find any other simple way to do the same.
The statement:
char *input1, *input2;
allocates memory for two pointers to char. Note that this only allocates memory for those pointers — which are uninitialised and aren't pointing to anything meaningful — not for what they're pointing to.
The call to scanf() then tries to write to memory out of bounds, and results in undefined behaviour.
You could instead, declare character arrays of fixed size with automatic storage duration:
char input1[SIZE];
This will allocate memory for the array, and the call to scanf() will be valid.
Alternatively, you could allocate memory dynamically for the pointers with one of the memory allocation functions:
char *input1 = malloc (size);
This declares a pointer to char whose contents are indeterminate, but are immediately overwritten with a pointer to a chunk of memory of size size. Note that the call to malloc() may have failed. It returns NULL as an error code, so check for it.
But scanf() should not be used as a user-input interface. It does not guard against buffer overflows, and will leave a newline in the input buffer (which leads to more problems down the road).
Consider using fgets instead. It will null-terminate the buffer and read at most size - 1 characters.
The calls to scanf() can be replaced with:
fgets (buf, sizeof buf, stdin);
You can then parse the string with sscanf, strtol, et cetera.
Note that fgets() will retain the trailing newline if there was space. You could use this one-liner to remove it:
buf [strcspn (buf, "\r\n")] = '\0';
This takes care of the carriage return as well, if any.
Or if you wish to continue using scanf() (which I advise against), use a field width to limit input and check scanf()'s return value:
if (scanf ("%1023s", input1) != 1) { /* handle the error */ } /* 1023 is a placeholder: buffer size minus one */
That being said, if you wish to read a line of variable length, you need to allocate memory dynamically with malloc(), and then resize it with realloc() as necessary.
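On POSIX-compliant systems, you could use getline() to read strings of arbitrary length, but note that it's vulnerable to a denial-of-service attack: it keeps growing the buffer for as long as the line goes on, so a hostile input can exhaust memory.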
You can use the m modifier in the format specifier. Note that it is not standard C but rather a POSIX extension.
char *a = NULL, *b = NULL;
if (scanf("%m[^\n] %m[^\n]", &a, &b) == 2) {
    // use a and b
    printf("*%s*\n*%s*\n", a, b);
}
free(a); // free(NULL) is a no-op, so this is safe if a conversion failed
free(b);
There are 2 simple ways to read variable length strings from the input stream:
using fgets() with an array large enough for the maximum length:
char input1[200];
if (fgets(input1, sizeof input1, stdin)) {
/* string was read. strip the newline if present */
input1[strcspn(input1, "\n")] = '\0';
...
} else {
/* nothing was read: premature end of file? */
...
}
on POSIX compliant systems, you can use getline() to read strings of arbitrary length into arrays allocated with malloc():
char *input1 = NULL;
size_t input1_size = 0;
ssize_t input1_length = getline(&input1, &input1_size, stdin);
if (input1_length >= 0) {
/* string was read. length is input1_length */
if (input1_length > 0 && input1[input1_length - 1] == '\n') {
/* remove the newline if present */
input1[--input1_length] = '\0';
}
...
} else {
/* nothing was read: premature end of file? */
...
}
Using scanf is not recommended because it is difficult to use correctly and reading input with "%s" or "%[^\n]" without a specified maximum length is risky as any sufficiently long input will cause a buffer overflow and undefined behavior. Passing uninitialized pointers to scanf as you do in the posted code has undefined behavior.
Any simple way to read a string of variable length in C?
Unfortunately the answer is NO
The input functions (e.g. scanf, fgets, etc.) specified by the C standard all require the caller to provide the input buffer. Once the input buffer is full, the functions will (when used correctly) return. So if the input is longer than the size of the provided buffer, the functions will only read partial input. So the caller must add code to check for partial input and do additional function calls as needed.
POSIX systems have the getline and getdelim functions that can do it. So if you can accept limiting your code to POSIX-compliant systems, that's what you want to use.
If you need portable, standard compliant code, you need to write your own function. For that you need to look into functions like realloc, fgets, strcpy, memcpy, etc. It's not a simple task but it's not "rocket science" either. It's been done many, many times before... and if you search the net, it's very likely you can find an open source implementation that you can just copy (make sure to follow the rules for doing that).
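For a sense of what such a function involves, here is one possible sketch that uses only standard C. The name read_line_dyn, the starting capacity of 16, and the doubling growth are arbitrary choices for illustration, not a reference implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length from `in` into a
   malloc'd buffer, without the trailing newline. The caller frees the
   result. Returns NULL when nothing is left to read or memory runs out. */
static char *read_line_dyn(FILE *in)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL)
        return NULL;
    while ((ch = getc(in)) != EOF && ch != '\n') {
        if (len + 1 == cap) {          /* keep one byte for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {       /* end of input, nothing was read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Reading one character at a time keeps the logic short; a version built on fgets would do fewer calls per line at the cost of a little more bookkeeping.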
I am trying to read input with scanf, but I also need its length. I can't define char name[number] because I don't know the size of the input that I will get from the user...
Any ideas how to do it without using string.h?
This is what I tried to do but it is illegal since I define the length of the array:
char string1[30];
printf("Enter string1:");
scanf("%s",string1);
You can't possibly know how long the user's input string will be ahead of time. So you can only provide a buffer of some size, and hope that the input is shorter than that size; and if the input is too long, then your program will not get all of the input at once.
Suppose you are prompting the user for his/her first name. My first name is "Steve" so I only need 5 letters, but other names are longer. I just did a Google search for "longest first name" and one example I found was "Lydiakalanirowealeekundemondisha" which needs 32 letters. Okay, let's make the input buffer 128 letters long and figure it's probably long enough.
#include <stdio.h>
char first_name[128];
printf("Please enter your first name: ");
fgets(first_name, sizeof(first_name), stdin);
So here we are using fgets() to get the user input. We tell fgets() to store the input in our buffer (first_name) and we tell it the size of our buffer. NOTE: we are using the sizeof() operator here, so that the program will still be correct if someone edits the file and changes the size of the buffer.
Now, no matter what the user types in (or pastes in from the clipboard), our program will only read in the first 127 characters. If the user types in less than that, we get everything.
So now we can check to see how much we got:
if (strlen(first_name) >= sizeof(first_name) - 1)
{
printf("That input is too long, sorry.\n");
}
Here we check to see how long the string actually was. If it was too long we give an error message to the user. In theory, the user could have typed an input that exactly fills the buffer, and with this check alone we cannot tell that apart from a longer input (looking at whether the last stored character is '\n' would settle it). In practice we don't expect a first name to be anywhere near that long, so it's safe to treat this condition as an error.
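If you want to tell a line that fit from one that was cut short, one way is to look for the newline that fgets stores at the end of a complete line. A minimal sketch, with read_whole_line as a made-up name and the stream passed in as a parameter:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf and report whether it fit.
   Returns 1 if a newline was seen (and removed), 0 if none was seen (the
   line was too long, or the input ended without a newline), and -1 at
   end of file or on a read error. */
static int read_whole_line(FILE *in, char *buf, int size)
{
    size_t len;

    if (fgets(buf, size, in) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    return 0;
}
```

When it returns 0 the rest of the long line is still waiting in the stream, so the caller has to decide whether to read on or discard it.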
Now, here is how to not do it. Never do this ever.
char short_buffer[16];
printf("Please enter your first name: ");
gets(short_buffer);
The function gets() has no way to know how many characters will fit in the buffer. If the user types "Lydiakalanirowealeekundemondisha" then gets() will write off the end of the buffer and likely cause an error. Never use gets().
You probably won't get along without defining a maximum size.
What matters is not avoiding a fixed size, but knowing that size and respecting it afterwards.
The easiest way to get input from a user is fgets():
char string1[50];
fgets(string1, sizeof string1, stdin);
Of course, you should check its return value.
If you want to accept (almost) any length, you can try the solution I gave here.
This is required to prevent an overflow of the given array. In order to work with the string, you can get its length either with strlen(), or, if you are not allowed to use that or are walking through the strings anyway, by counting the characters until you hit a NUL byte.
The background of this is that strings in C are terminated by a NUL byte. They are sequences of chars, and the NUL byte (0, not '0' which would be 48) terminates this sequence.
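Counting up to the NUL byte by hand, without string.h, could look roughly like this (my_strlen is a made-up name for the sketch):

```c
#include <stddef.h>

/* Count the characters that come before the terminating NUL byte,
   which is the same result strlen() would give. */
static size_t my_strlen(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}
```

Note that after fgets() the count includes the trailing newline if one was stored.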
If your only task is to verify that the strings you read are small enough, and complain if they aren't, then just do that :-)
int main(int argc, char ** argv)
{
char string2[50]; // larger than required; in order to be able to check.
char string1[30]; // if all is ok, you have maximum length of 29, plus the NUL terminator. So 30 is ok.
char * ret = fgets(string2, sizeof string2, stdin);
if (!ret) {
fprintf(stderr, "Read error.\n");
return 1; // indicate error
}
if (strlen(string2) >= sizeof string1) { // we can take this size as a reference...
fprintf(stderr, "String 1 too long.\n");
return 1; // indicate error
}
strcpy(string1, string2); // as we have verified that this will match, it is ok.
// Otherwise, we would have to use strncpy.
// Now read the 2nd string by the same way:
ret = fgets(string2, sizeof string2, stdin);
if (!ret) {
fprintf(stderr, "Read error.\n");
return 1; // indicate error
}
if (strlen(string2) >= sizeof string1) { // we can take this size as a reference...
fprintf(stderr, "String 2 too long.\n");
return 1; // indicate error
}
// Now we know that both strings are ok in length and we can use strcmp().
int c = strcmp(string1, string2);
printf("strcmp() result: %d.\n", c);
return 0; // indicate success
}
I am not clear now if you are supposed to implement strcmp() as well. If so, I'll leave that as an exercise.
Today I've looked over some C code that was parsing data from a text file
and I've stumbled upon these lines
fgets(line,MAX,fp);
if(line[strlen(line)-1]=='\n'){
line[strlen(line)-1]='\0';
}else{
printf("Error on line length\n");
exit(1);
}
sscanf(line,"%s",records->bday);
with record being a structure
typedef struct {
char bday[11];
}record;
So my question here regards the fgets-sscanf combination to create a type/length safe stream reader:
Is there any other way to work this out beside having to combine these two readers?
What about the \n checking-removing sequence?
The combination of fgets() with sscanf() is usually good. However, you should probably be using:
if (fgets(line, sizeof(line), fp) != 0)
{
...
}
This checks for I/O errors and EOF. It also assumes that the definition of the array is visible (otherwise sizeof gives you the size of a pointer, not of the array). If the array is not in scope, you should probably pass the size of the array to the function containing this code. All that said, there are worse sins than using MAX in place of sizeof(line).
You have not checked for a zero-length birthday string; you will probably end up doing quite a lot of validation on the string that is entered, though (dates are fickle and hard to process).
Given that MAX is 60, but sizeof(records->bday) == 11, you need to protect yourself from buffer overflows in the sscanf(). One way to do that is:
if (sscanf(line, "%10s", records->bday) != 1)
...handle error...
Note that the 10 is sizeof(records->bday) - 1, but you can't provide the length as an argument to sscanf(); it has to appear in the format string literally. Here, you can probably live with the odd sizing, but if you were dealing with more generic code, you'd probably think about:
sprintf(format, "%%%zus", sizeof(records->bday) - 1);
The first %% maps to %; the %zu formats the size (z is C99 for size_t); the s is for the string conversion when the format is used.
Or you could consider using strcpy() or memcpy() or memmove() to copy the right subsection of the input string to the structure - but note that %10s skips leading blanks which strcpy() et al will not. You have to know how long the string is before you do the copying, of course, and make sure the string is null terminated.
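For the more generic case, a sketch of building the format at run time and then scanning with it might look like this. The name scan_word and the 32-byte format buffer are assumptions for the example; 32 bytes is meant to cover "%", the digits of any size_t, "s" and the terminator.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first whitespace-delimited word of `line`
   into `out`, writing at most `outsize` bytes including the terminator.
   Returns 0 on success, -1 if nothing could be converted. */
static int scan_word(const char *line, char *out, size_t outsize)
{
    char format[32];

    if (outsize < 2)
        return -1;                      /* no room for even one character */
    /* For outsize == 11 this produces the format string "%10s". */
    snprintf(format, sizeof format, "%%%zus", outsize - 1);
    return sscanf(line, format, out) == 1 ? 0 : -1;
}
```

With the structure from the question, the call would be scan_word(line, records->bday, sizeof records->bday). Some compilers warn about a format string that is not a literal, which is the price of this technique.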
I am using Linux.
I am trying to write a program in c that will print a string backward.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main (){
char string[100];
printf ("Enter string:\n");
gets (string);
int length = strlen (string)-1;
for (length = length; length>=0; length--){
puts (string[length]);
}
}
And here is the error:
a.c:10: warning: passing argument 1 of ‘puts’ makes pointer from integer without a cast
/usr/include/stdio.h:668: note: expected ‘const char *’ but argument is of type ‘char’
/tmp/cc5rpeG7.o: In function `main':
a.c:(.text+0x29): warning: the `gets' function is dangerous and should not be used.
What should I do?
Forget that the function gets() exists - it is lethal. Use fgets() instead (but note that it does not remove the newline at the end of the line).
You want to put a single character at a time: use putchar() to write it to stdout. Don't forget to add a newline to the output after the loop.
Also, for (length = length; length >= 0; length--) is not idiomatic C. Use one of:
for ( ; length >= 0; length--)
for (length = strlen(string) - 1; length >= 0; length--)
for (int length = strlen(string) - 1; length >= 0; length--)
The last alternative uses a feature added to C99 (which was available in C++ long before).
Also, we could debate whether length is the appropriate name for the variable. It would be better renamed as i or pos or something similar because, although it is initialized to the length of the input, it is actually used as an array index, not as the length of anything.
Subjective: Don't put a space between the name of a function and its parameter list. The founding fathers of C don't do that - neither should you.
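As a rough sketch of the loop with a single-character output call and an index that counts down: print_backward is a made-up name, and it takes the output stream as a parameter only so that it can be pointed at something other than stdout.

```c
#include <stdio.h>
#include <string.h>

/* Write the characters of s to `out` in reverse order, then a newline.
   With out == stdout, fputc(c, out) is equivalent to putchar(c). */
static void print_backward(const char *s, FILE *out)
{
    /* i runs from the length down to 1, so s[i - 1] is always valid
       and the unsigned index never has to go below zero. */
    for (size_t i = strlen(s); i > 0; i--)
        fputc(s[i - 1], out);
    fputc('\n', out);
}
```

If the string came from fgets(), strip its newline first, otherwise the reversed output starts with a blank line.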
Why is gets() lethal?
The first Internet worm - the Morris worm from 1988 - exploited the fingerd program that used gets() instead of fgets(). Since then, numerous programs have been crashed because they used gets() and not fgets() or another alternative.
The fundamental problem is that gets() does not know how much space is available to store the data it reads. This leads to 'buffer overflows', a term which can be searched for in your favourite search engine that will return an enormous number of entries.
If someone types 150 characters of input to the example program, then gets() will store 150 characters in the array which has length 100. This never leads to happiness - it usually leads to a core dump, but with carefully chosen inputs - often generated by a Perl or Python script - you can probably get the program to execute arbitrary other code. This really matters if the program will ever be run by a user with 'elevated privileges'.
Incidentally, gets() was removed from the Standard C library in C11 (known as C1x while in draft - see n1494 from WG14). It won't vanish from actual C libraries for a long time yet (20 years?), but it should be replaced with this implementation (or something similar):
#undef NDEBUG
#include <assert.h>
char *gets(char *buffer)
{
assert("Probability of using gets() safely" == 0);
}
One other minor detail, discussed in part under the comments to the main question.
The code shown is clearly for C99; the declaration of length part way through the function is invalid in C89. Given that, it is 'OK' for the main() function not to explicitly return a value, because the C99 standard follows the lead of the C++ standard and allows you to omit the return from main() and the effect is the same as return(0); or return 0; at the end.
As such, the program in this question cannot strictly be faulted for not having a return at the end. However, I regard that as one of the more peculiar standardizing decisions, and would much prefer it if the standards had left that provision out - or done something more radical like allowing the ubiquitous but erroneous void main() observing that when control returns from that, the result is that a success status is returned to the environment. It isn't worth fighting to get that aspect of the standard changed - sadly - but as a personal style decision, I don't take advantage of the licence granted to omit the final return from main(). If the code has to work with C89 compilers, it should have the explicit return 0; at the end (but then the declaration of length has to be fixed too).
You can also use recursion to do it. I think it looks nicer than a loop.
Just call the function with your string, and before printing the first char in the function, call the function again with the same string, minus the first char.
This will print out your string in reversed order.
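A minimal sketch of that idea, assuming a made-up function name print_reversed and an output stream parameter so it is not tied to stdout:

```c
#include <stdio.h>

/* Recursive sketch: print the rest of the string first, then this
   character. The recursion depth equals the length of the string. */
static void print_reversed(const char *s, FILE *out)
{
    if (*s == '\0')
        return;                 /* empty string: nothing to print */
    print_reversed(s + 1, out); /* everything after the first char */
    fputc(*s, out);             /* then the first char itself */
}
```

Because every character costs one stack frame, this is fine for short input lines but a loop is the safer choice for very long strings.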
First:
NEVER NEVER NEVER NEVER NEVER use gets(); it will introduce a point of failure in your code. There's no way to tell gets() how big the target buffer is, so if you pass a buffer sized to hold 10 characters and there are 100 characters in the input stream, gets() will happily store those extra 90 characters in the memory beyond the end of your buffer, potentially clobbering something important. Buffer overruns are an easy malware exploit; the Morris worm specifically exploited a gets() call in fingerd.
Use fgets() instead; it allows you to specify the maximum number of characters to read from the input stream. However, unlike gets(), fgets() will save the terminating newline character to the buffer if there's room for it, so you have to account for that:
char string[100];
char *newline;
printf("Enter a string: ");
fflush(stdout);
fgets(string, sizeof string, stdin);
newline = strchr(string, '\n'); // search for the newline character
if (newline) // if it's present
*newline = 0; // set it to zero
Now that's out of the way...
Your error is coming from the fact that puts() expects an argument of type char *, but you're passing an argument of type char, hence the "pointer from integer without cast" message (char is an integral type). To write a single character to stdout, use putchar() or fputc().
You should use putchar instead of puts
So this loop:
for (length = length; length>=0; length--){
puts (string[length]);
}
Will be:
for (length = length; length>=0; length--){
putchar (string[length]);
}
putchar will take a single char as a parameter and print it to stdout, which is what you want. puts, on the other hand, will print a whole string to stdout. So when you pass a single char to a function that expects a whole string (a pointer to a NUL-terminated char array), the compiler warns about the mismatched type.
Use putc or putchar, as puts is specified to take a char* and you are feeding it a char.
I want to know the disadvantages of scanf().
In many sites, I have read that using scanf might cause buffer overflows. What is the reason for this? Are there any other drawbacks with scanf?
Most of the answers so far seem to focus on the string buffer overflow issue. In reality, the format specifiers that can be used with scanf functions support explicit field width setting, which limits the maximum size of the input and prevents buffer overflow. This renders the popular accusations of string-buffer overflow dangers present in scanf virtually baseless. Claiming that scanf is somehow analogous to gets in this respect is completely incorrect. There's a major qualitative difference between scanf and gets: scanf does provide the user with string-buffer-overflow-preventing features, while gets doesn't.
One can argue that these scanf features are difficult to use, since the field width has to be embedded into format string (there's no way to pass it through a variadic argument, as it can be done in printf). That is actually true. scanf is indeed rather poorly designed in that regard. But nevertheless any claims that scanf is somehow hopelessly broken with regard to string-buffer-overflow safety are completely bogus and usually made by lazy programmers.
The real problem with scanf has a completely different nature, even though it is also about overflow. When a scanf function is used for converting decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If overflow happens, scanf produces undefined behavior. For this reason, the only proper way to perform the conversion in the C standard library is to use functions from the strto... family.
So, to summarize the above, the problem with scanf is that it is difficult (albeit possible) to use properly and safely with string buffers. And it is impossible to use safely for arithmetic input. The latter is the real problem. The former is just an inconvenience.
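To make the strto... alternative concrete, here is a minimal sketch of an overflow-aware conversion with strtol. The name parse_long and the choice to tolerate a trailing newline (as left by fgets) are assumptions for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a decimal string to long.
   Returns 0 on success, -1 if there are no digits, if anything other
   than a newline follows the number, or if the value does not fit. */
static int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;                      /* strtol only sets errno on failure */
    v = strtol(s, &end, 10);
    if (end == s)
        return -1;                  /* no digits at all */
    if (errno == ERANGE)
        return -1;                  /* overflow or underflow was detected */
    if (*end != '\0' && *end != '\n')
        return -1;                  /* unexpected trailing characters */
    *out = v;
    return 0;
}
```

Unlike scanf("%ld", ...), an out-of-range input here is reported through ERANGE and a well-defined return value rather than left as undefined behavior.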
P.S. The above is intended to be about the entire family of scanf functions (including also fscanf and sscanf). With scanf specifically, the obvious issue is that the very idea of using a strictly-formatted function for reading potentially interactive input is rather questionable.
The problems with scanf are (at a minimum):
using %s to get a string from the user, which leads to the possibility that the string may be longer than your buffer, causing overflow.
the possibility of a failed scan leaving your file pointer in an indeterminate location.
I very much prefer using fgets to read whole lines in so that you can limit the amount of data read. If you've got a 1K buffer, and you read a line into it with fgets you can tell if the line was too long by the fact there's no terminating newline character (last line of a file without a newline notwithstanding).
Then you can complain to the user, or allocate more space for the rest of the line (continuously if necessary until you have enough space). In either case, there's no risk of buffer overflow.
Once you've read the line in, you know that you're positioned at the next line so there's no problem there. You can then sscanf your string to your heart's content without having to save and restore the file pointer for re-reading.
Here's a snippet of code which I frequently use to ensure no buffer overflow when asking the user for information.
It could be easily adjusted to use a file other than standard input if necessary and you could also have it allocate its own buffer (and keep increasing it until it's big enough) before giving that back to the caller (although the caller would then be responsible for freeing it, of course).
#include <stdio.h>
#include <string.h>
#define OK 0
#define NO_INPUT 1
#define TOO_LONG 2
#define SMALL_BUFF 3
static int getLine (char *prmpt, char *buff, size_t sz) {
int ch, extra;
// Size zero or one cannot store enough, so don't even
// try - we need space for at least newline and terminator.
if (sz < 2)
return SMALL_BUFF;
// Output prompt.
if (prmpt != NULL) {
printf ("%s", prmpt);
fflush (stdout);
}
// Get line with buffer overrun protection.
if (fgets (buff, sz, stdin) == NULL)
return NO_INPUT;
// Catch possibility of `\0` in the input stream.
size_t len = strlen(buff);
if (len < 1)
return NO_INPUT;
// If it was too long, there'll be no newline. In that case, we flush
// to end of line so that excess doesn't affect the next call.
if (buff[len - 1] != '\n') {
extra = 0;
while (((ch = getchar()) != '\n') && (ch != EOF))
extra = 1;
return (extra == 1) ? TOO_LONG : OK;
}
// Otherwise remove newline and give string back to caller.
buff[len - 1] = '\0';
return OK;
}
And, a test driver for it:
// Test program for getLine().
int main (void) {
int rc;
char buff[10];
rc = getLine ("Enter string> ", buff, sizeof(buff));
if (rc == NO_INPUT) {
// Extra NL since my system doesn't output that on EOF.
printf ("\nNo input\n");
return 1;
}
if (rc == TOO_LONG) {
printf ("Input too long [%s]\n", buff);
return 1;
}
printf ("OK [%s]\n", buff);
return 0;
}
Finally, a test run to show it in action:
$ printf "\0" | ./tstprg # Singular NUL in input stream.
Enter string>
No input
$ ./tstprg < /dev/null # EOF in input stream.
Enter string>
No input
$ ./tstprg # A one-character string.
Enter string> a
OK [a]
$ ./tstprg # Longer string but still able to fit.
Enter string> hello
OK [hello]
$ ./tstprg # Too long for buffer.
Enter string> hello there
Input too long [hello the]
$ ./tstprg # Test limit of buffer.
Enter string> 123456789
OK [123456789]
$ ./tstprg # Test just over limit.
Enter string> 1234567890
Input too long [123456789]
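The self-allocating variant mentioned at the start of this answer could look roughly like the sketch below. The name read_line_alloc, the starting capacity of 16 and the doubling strategy are all assumptions made for illustration; they are not part of the getLine() code above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from fp into a
   malloc'd buffer and strips the trailing newline. Returns NULL if nothing
   could be read, on a read error, or if allocation fails. The caller must
   free() the result. */
char *read_line_alloc(FILE *fp) {
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each pass appends at buf + len and fills whatever space remains. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';          /* complete line: drop the newline */
            return buf;
        }
        if (cap - len < 2) {              /* buffer is full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len == 0 || ferror(fp)) {         /* EOF with no data, or an error */
        free(buf);
        return NULL;
    }
    return buf;                           /* last line had no newline */
}
```

A caller would loop on read_line_alloc(stdin) until it returns NULL, freeing each line after use. Unlike getLine(), this version never reports TOO_LONG, so a real program would probably want an upper limit on cap.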
From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?
scanf has a number of problems—see questions 12.17, 12.18a, and 12.19. Also, its %s format has the same problem that gets() has (see question 12.23)—it’s hard to guarantee that the receiving buffer won’t overflow. [footnote]
More generally, scanf is designed for relatively structured, formatted input (its name is in fact derived from “scan formatted”). If you pay attention, it will tell you whether it succeeded or failed, but it can tell you only approximately where it failed, and not at all how or why. You have very little opportunity to do any error recovery.
Yet interactive user input is the least structured input there is. A well-designed user interface will allow for the possibility of the user typing just about anything—not just letters or punctuation when digits were expected, but also more or fewer characters than were expected, or no characters at all (i.e., just the RETURN key), or premature EOF, or anything. It’s nearly impossible to deal gracefully with all of these potential problems when using scanf; it’s far easier to read entire lines (with fgets or the like), then interpret them, either using sscanf or some other techniques. (Functions like strtol, strtok, and atoi are often useful; see also questions 12.16 and 13.6.) If you do use any scanf variant, be sure to check the return value to make sure that the expected number of items were found. Also, if you use %s, be sure to guard against buffer overflow.
Note, by the way, that criticisms of scanf are not necessarily indictments of fscanf and sscanf. scanf reads from stdin, which is usually an interactive keyboard and is therefore the least constrained, leading to the most problems. When a data file has a known format, on the other hand, it may be appropriate to read it with fscanf. It’s perfectly appropriate to parse strings with sscanf (as long as the return value is checked), because it’s so easy to regain control, restart the scan, discard the input if it didn’t match, etc.
Additional links:
longer explanation by Chris Torek
longer explanation by yours truly
References: K&R2 Sec. 7.4 p. 159
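As a rough illustration of the FAQ's advice (read a whole line with fgets, then interpret it with sscanf and check the return value), here is a minimal sketch. The function name read_two_ints and the 128-byte line buffer are assumptions chosen for the example, and it does not try to handle lines longer than the buffer or integers that overflow.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from fp and expects exactly two
   integers on it. Returns 1 on success, 0 if the line is malformed,
   and EOF at end of file. */
int read_two_ints(FILE *fp, int *a, int *b) {
    char line[128];
    char extra;

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;

    /* sscanf returns the number of items assigned. Exactly 2 means two
       integers and nothing else; 3 means trailing junk was found. */
    return sscanf(line, "%d %d %c", a, b, &extra) == 2;
}
```

Because the bad line has already been consumed by fgets, a failed parse leaves nothing behind in the stream, so the caller can simply prompt again.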
It is very hard to get scanf to do the thing you want. Sure, you can, but things like scanf("%s", buf); are as dangerous as gets(buf);, as everyone has said.
As an example, what paxdiablo is doing in his function to read can be done with something like:
scanf("%10[^\n]%*[^\n]", buf);
getchar();
The above will read a line, store the first 10 non-newline characters in buf, and then discard everything till (and including) a newline. So, paxdiablo's function could be written using scanf the following way:
#include <stdio.h>
enum read_status {
OK,
NO_INPUT,
TOO_LONG
};
static int get_line(const char *prompt, char *buf, size_t sz)
{
char fmt[40];
int i;
int nscanned = 0; /* %n is never reached when the whole line fits, so start at 0 */
printf("%s", prompt);
fflush(stdout);
sprintf(fmt, "%%%zu[^\n]%%*[^\n]%%n", sz-1);
/* read at most sz-1 characters on, discarding the rest */
i = scanf(fmt, buf, &nscanned);
if (i > 0) {
getchar();
if ((size_t)nscanned >= sz) {
return TOO_LONG;
} else {
return OK;
}
} else {
return NO_INPUT;
}
}
int main(void)
{
char buf[10+1];
int rc;
while ((rc = get_line("Enter string> ", buf, sizeof buf)) != NO_INPUT) {
if (rc == TOO_LONG) {
printf("Input too long: ");
}
printf("->%s<-\n", buf);
}
return 0;
}
One of the other problems with scanf is its behavior in case of overflow. For example, when reading an int:
int i;
scanf("%d", &i);
The above cannot be used safely if the input might overflow: when the value does not fit in an int, the behavior is undefined. Even for the first case, reading a string is much simpler to do with fgets than with scanf.
Yes, you are right. There is a major security flaw in the scanf family (scanf, sscanf, fscanf, etc.), especially when reading a string: unless you specify a field width, they don't take the length of the destination buffer into account.
Example:
char buf[3];
sscanf("abcdef","%s",buf);
Clearly, buf can hold at most 3 chars (2 characters plus the terminating null), but sscanf will try to write "abcdef" and its terminator into it, causing a buffer overflow.
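The overflow in this example goes away once %s is given a maximum field width. A common preprocessor trick builds that width from a macro so it stays in sync with the array size; the names BUF_LEN, STR and read_word below are assumptions for this sketch.

```c
#include <stdio.h>

#define BUF_LEN 2                 /* maximum characters to store */
#define STR_(x) #x
#define STR(x) STR_(x)            /* expands BUF_LEN before stringifying it */

/* Hypothetical helper: copies at most BUF_LEN non-whitespace characters
   from src into dst, which must have room for BUF_LEN + 1 bytes.
   Returns the sscanf result (1 if a word was stored). */
int read_word(const char *src, char dst[BUF_LEN + 1]) {
    /* "%" STR(BUF_LEN) "s" is concatenated into "%2s" at compile time. */
    return sscanf(src, "%" STR(BUF_LEN) "s", dst);
}
```

With src set to "abcdef", only "ab" and the terminator are written. Note that the width counts characters, not bytes of storage, so the array needs one extra byte for the terminator.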
Problems I have with the *scanf() family:
Potential for buffer overflow with %s and %[ conversion specifiers. Yes, you can specify a maximum field width, but unlike with printf(), you can't make it an argument in the scanf() call; it must be hardcoded in the conversion specifier.
Potential for arithmetic overflow with %d, %i, etc.
Limited ability to detect and reject badly formed input. For example, "12w4" is not a valid integer, but scanf("%d", &value); will successfully convert and assign 12 to value, leaving the "w4" stuck in the input stream to foul up a future read. Ideally the entire input string should be rejected, but scanf() doesn't give you an easy mechanism to do that.
If you know your input is always going to be well-formed with fixed-length strings and numerical values that don't flirt with overflow, then scanf() is a great tool. If you're dealing with interactive input or input that isn't guaranteed to be well-formed, then use something else.
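One possible "something else" for integers is strtol, which reports both overflow and where parsing stopped. The sketch below is only an illustration; parse_int_strict is a hypothetical name, and it assumes the input has already been read into a string (for example with fgets).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the whole of s to an int.
   Returns 1 and stores the value in *out on success. Returns 0 if s has
   no digits, has trailing non-whitespace characters, or is out of range. */
int parse_int_strict(const char *s, int *out) {
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return 0;                          /* no digits at all */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                          /* arithmetic overflow */
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;                             /* allow trailing whitespace */
    if (*end != '\0')
        return 0;                          /* e.g. the "w4" in "12w4" */
    *out = (int)v;
    return 1;
}
```

Here "12w4" is rejected as a whole instead of yielding 12 and leaving "w4" behind, and an over-long digit string is reported as an error rather than causing undefined behavior.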
Many answers here discuss the potential overflow issues of using scanf("%s", buf), but the latest POSIX specification more-or-less resolves this issue by providing an m assignment-allocation character that can be used in format specifiers for c, s, and [ formats. This will allow scanf to allocate as much memory as necessary with malloc (so it must be freed later with free).
An example of its use:
char *buf;
scanf("%ms", &buf); // with 'm', scanf expects a pointer to pointer to char.
// use buf
free(buf);
The disadvantages of this approach are that it is a relatively recent addition to the POSIX specification and that it is not specified in the C standard at all, so it remains rather unportable for now.
The advantage of scanf is that once you learn how to use the tool, as you should always do in C, it has immensely useful use cases. You can learn how to use scanf and friends by reading and understanding the manual. If you can't get through that manual without serious comprehension issues, this would probably indicate that you don't know C very well.
scanf and friends suffered from unfortunate design choices that rendered them difficult (and occasionally impossible) to use correctly without reading the documentation, as other answers have shown. This occurs throughout C, unfortunately, so if I were to advise against using scanf then I would probably advise against using C.
One of the biggest disadvantages seems to be purely the reputation it's earned amongst the uninitiated; as with many useful features of C we should be well informed before we use it. The key is to realise that as with the rest of C, it seems succinct and idiomatic, but that can be subtly misleading. This is pervasive in C; it's easy for beginners to write code that they think makes sense and might even work for them initially, but doesn't make sense and can fail catastrophically.
For example, the uninitiated commonly expect that the %s directive would cause a line to be read, and while that might seem intuitive it isn't necessarily true. It's more appropriate to describe the field read as a word. Reading the manual is strongly advised for every function.
What would any response to this question be without mentioning its lack of safety and risk of buffer overflows? As we've already covered, C isn't a safe language, and will allow us to cut corners, possibly to apply an optimisation at the expense of correctness or more likely because we're lazy programmers. Thus, when we know the system will never receive a string larger than a fixed number of bytes, we're given the ability to declare an array that size and forgo bounds checking. I don't really see this as a downfall; it's an option. Again, reading the manual is strongly advised and would reveal this option to us.
Lazy programmers aren't the only ones stung by scanf. It's not uncommon to see people trying to read float or double values using %d, for example. They're usually mistaken in believing that the implementation will perform some kind of conversion behind the scenes, which would make sense because similar conversions happen throughout the rest of the language, but that's not the case here. As I said earlier, scanf and friends (and indeed the rest of C) are deceptive; they seem succinct and idiomatic but they aren't.
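To make the point about mismatched specifiers concrete, here is a small sketch of reading a double correctly. The wrapper name parse_double is an assumption for the example; the only important detail is that %lf is paired with a double *.

```c
#include <stdio.h>

/* Hypothetical helper: parses a floating-point value from text.
   Returns 1 on success, 0 otherwise. */
int parse_double(const char *text, double *out) {
    /* %lf matches a double *. %f would need a float *, and %d an int *;
       scanf performs no conversion between them. */
    return sscanf(text, "%lf", out) == 1;
}
```

Passing a double * where %d or %f is expected compiles, but the behavior is undefined, which is why the specifier has to be chosen to match the pointer type exactly.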
Inexperienced programmers aren't forced to consider the success of the operation. Suppose the user enters something entirely non-numeric when we've told scanf to read and convert a sequence of decimal digits using %d. The only way we can intercept such erroneous data is to check the return value, and how often do we bother checking the return value?
Much like fgets, when scanf and friends fail to read what they're told to read, the stream will be left in an unusual state:
In the case of fgets, if there isn't sufficient space to store a complete line, then the remainder of the line left unread might be erroneously treated as though it's a new line when it isn't.
In the case of scanf and friends, if a conversion fails as described above, the erroneous data is left unread on the stream and might be erroneously treated as though it's part of a different field.
It's no easier to use scanf and friends than to use fgets. If we check for success by looking for a '\n' when we're using fgets or by inspecting the return value when we use scanf and friends, and we find that we've read an incomplete line using fgets or failed to read a field using scanf, then we're faced with the same reality: We're likely to discard input (usually up until and including the next newline)! Yuuuuuuck!
Unfortunately, scanf simultaneously makes it both hard (non-intuitive) and easy (fewest keystrokes) to discard input in this way. Faced with this reality of discarding user input, some have tried scanf("%*[^\n]%*c");, not realising that the %*[^\n] directive will fail when it encounters nothing but a newline, and hence the newline will still be left on the stream.
With a slight adaptation, separating the two format directives, we see some success here: scanf("%*[^\n]"); getchar();. Try doing that with so few keystrokes using some other tool ;)
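Wrapped in a function, the two-step idiom might look like the sketch below. The name discard_line and the FILE * parameter are assumptions for illustration (the one-liner above works on stdin only).

```c
#include <stdio.h>

/* Hypothetical helper: discards the rest of the current line of fp,
   including its newline. Returns 0 normally, or EOF if end of file was
   reached before a newline. */
int discard_line(FILE *fp) {
    /* Skips everything up to, but not including, the newline. When the
       next character is already a newline this directive fails, which is
       harmless here because the newline is consumed separately below. */
    if (fscanf(fp, "%*[^\n]") == EOF)
        return EOF;

    /* Consume the newline itself. */
    return (getc(fp) == EOF) ? EOF : 0;
}
```

Keeping the newline read in a separate call is what makes an empty line work: the failed %*[^\n] no longer prevents the newline from being removed.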
There is one big problem with scanf-like functions - the lack of any type safety. That is, you can code this:
int i;
scanf("%10s", &i);
Hell, even this is "fine":
scanf("%10s", i);
It's worse than printf-like functions, because scanf expects a pointer, so crashes are more likely.
Sure, there are some format-specifier checkers out there, but those are not perfect, and they are not part of the language or the standard library.