There is obviously more to this code, but I am just curious what this line of code actually does. I understand the while loop, but I am new to fscanf().
while (fscanf(input_file, "%s", curr_word) == 1)
fscanf() returns the number of input items successfully scanned and stored.
As per the man page:
Return Value
These functions return the number of input items successfully matched and assigned, which can be fewer than provided for, or even zero in the event of an early matching failure.
In your case
while (fscanf(input_file, "%s", curr_word) == 1)
fscanf() will return a value of 1 if it is able to successfully scan a string (as per the %s format specifier) from input_file and put it into curr_word.
fscanf(input_file, "%s", curr_word) reads the input stream input_file, skips any leading whitespace, stores the next sequence of non-whitespace characters into the array pointed to by curr_word, and appends a '\0' byte. As you can see, the size of this array is not passed to fscanf. This is a classic case of potential buffer overflow, a security flaw that an attacker can exploit by placing suitably crafted contents in the input stream.
After gets, the scanf family of library functions is the best source of buffer overflow bugs one can find.
It is very difficult to use fscanf correctly. Most C programmers should avoid it.
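A minimal sketch of a bounded version of that loop, for illustration only. The helper name count_words and the 40-byte array are assumptions, not part of the original code. The field width in %39s keeps fscanf from writing past the end of the array.

```c
#include <stdio.h>

/* Hypothetical helper: counts whitespace-separated words in a stream.
   A word longer than 39 characters is read in several chunks, so it is
   counted more than once; raise the limit if that matters. */
static size_t count_words(FILE *input_file)
{
    char curr_word[40];
    size_t count = 0;

    /* %39s stores at most 39 characters plus the terminating '\0'. */
    while (fscanf(input_file, "%39s", curr_word) == 1) {
        count++;
    }
    return count;
}
```

The loop ends as soon as fscanf returns anything other than 1, which covers both end of input and a read error.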
Related
So I have a .txt file that I want to read via stdin in c11 program using scanf().
The file is essentially many lines made of one single string.
example:
hello
how
are
you
How can I know when the file is finished? I tried comparing the string I read with a string made only of the EOF character, but the code loops in error.
Any advice is much appreciated.
Linux manual says (RETURN section):
RETURN VALUE
On success, these functions return the number of input items
successfully matched and assigned; this can be fewer than
provided for, or even zero, in the event of an early matching
failure.
The value EOF is returned if the end of input is reached before
either the first successful conversion or a matching failure
occurs. EOF is also returned if a read error occurs, in which
case the error indicator for the stream (see ferror(3)) is set,
and errno is set to indicate the error.
So test whether the return value of scanf equals EOF.
You can read the file redirected from standard input using scanf(), one word at a time, testing for successful conversion, until no more words can be read from stdin.
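As a rough sketch of how the three kinds of return value can be told apart, assuming a stream of whitespace-separated integers (the helper name sum_ints and its return convention are made up for this example):

```c
#include <stdio.h>

/* Hypothetical helper: sums the integers found in a stream.
   Returns 0 if the stream ended cleanly, -1 on a matching failure
   (text that is not a number) or on a read error. */
static int sum_ints(FILE *in, long *sum)
{
    long value;
    int rc;

    *sum = 0;
    while ((rc = fscanf(in, "%ld", &value)) == 1) {
        *sum += value;
    }
    /* rc is now either EOF (no more input, or a read error)
       or 0 (the next characters do not look like a number). */
    if (rc == EOF && !ferror(in)) {
        return 0;
    }
    return -1;
}
```

Comparing against 1 ends the loop in every failure case; the checks afterwards only decide why it ended.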
Here is a simple example:
#include <stdio.h>
int main() {
char word[40];
int n = 0;
while (scanf("%39s", word) == 1) {
printf("%d: %s\n", ++n, word);
}
return 0;
}
Note that you must tell scanf() the maximum number of characters to store into the destination array before the null terminator. Otherwise, any longer word present in the input stream will cause undefined behavior, a flaw attackers can try to exploit using specially crafted input.
I have tried to use k = getchar(), but that doesn't work either.
Here is my code:
#include<stdio.h>
int main()
{
float height;
float k=0;
do
{
printf("please type a value..\n");
scanf("%f",&height);
k=height;
}while(k<0);// i assume letters and non positive numbers are below zero.
//so i want the loop to continue until one types a +ve float.
printf("%f",k);
return 0;
}
I want it so that if a user types letters, other characters, or negative numbers, he/she is prompted to type the value again until a positive number is entered.
Like Govind Parmar already suggested, it is better/easier to use fgets() to read a full line of input, rather than use scanf() et al. for human-interactive input.
The underlying reason is that the interactive standard input is line-buffered by default (and changing that is nontrivial). So, when the user starts typing their input, it is not immediately provided to your program; only when the user presses Enter.
If we do read each line of input using fgets(), we can then scan and convert it using sscanf(), which works much like scanf()/fscanf() do, except that sscanf() works on string input, rather than an input stream.
Here is a practical example:
#include <stdlib.h>
#include <stdio.h>
#define MAX_LINE_LEN 100
int main(void)
{
char buffer[MAX_LINE_LEN + 1];
char *line, dummy;
double value;
while (1) {
printf("Please type a number, or Q to exit:\n");
fflush(stdout);
line = fgets(buffer, sizeof buffer, stdin);
if (!line) {
printf("No more input; exiting.\n");
break;
}
if (sscanf(line, " %lf %c", &value, &dummy) == 1) {
printf("You typed %.6f\n", value);
continue;
}
if (line[0] == 'q' || line[0] == 'Q') {
printf("Thank you; now quitting.\n");
break;
}
printf("Sorry, I couldn't parse that.\n");
}
return EXIT_SUCCESS;
}
The fflush(stdout); is not necessary, but it does no harm either. It basically ensures that everything we have printf()'d or written to stdout will be flushed to the file or device; in this case, that it will be displayed in the terminal. (It is not necessary here, because standard output is also line buffered by default when it goes to a terminal, so the \n in the printf pattern, printing a newline, also causes the flush.)
I do like to sprinkle those fflush() calls, wherever I need to remember that at this point, it is important for all output to be actually flushed to output, and not cached by the C library. In this case, we definitely want the prompt to be visible to the user before we start waiting for their input!
(But, again, because that printf("...\n"); before it ends with a newline, \n, and we haven't changed the standard output buffering, the fflush(stdout); is not needed there.)
The line = fgets(buffer, sizeof buffer, stdin); line contains several important details:
We defined the macro MAX_LINE_LEN earlier on, because fgets() can only read a line as long as the buffer it is given, and will return the rest of that line in following calls.
(You can check whether the line read ended with a newline: if it does not, then either it was the final line in an input file that does not have a newline at the end of the last line, or the line was longer than the buffer you have, so you only received the initial part, with the rest of the line still waiting for you in the input stream.)
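One possible way to write that check, as a sketch: the helper name read_whole_line and its return codes are assumptions for this example. It discards the unread remainder of an over-long line so that the next call starts on a fresh line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (size bytes).
   Returns 1 for a complete line, 0 if the line was too long and was
   truncated, and -1 at end of input or on error.
   The trailing newline, if any, is removed. */
static int read_whole_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;
    int lost = 0;

    if (fgets(buf, (int)size, in) == NULL) {
        return -1;
    }
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    /* No newline stored: skip whatever is left of this line and
       remember whether any real characters were thrown away. */
    while ((c = getc(in)) != EOF && c != '\n') {
        lost = 1;
    }
    return lost ? 0 : 1;
}
```

A final line without a newline is still reported as complete, because nothing had to be thrown away.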
The +1 in char buffer[MAX_LINE_LEN + 1]; is because strings in C are terminated by a nul char, '\0', at end. So, if we have a buffer of 19 characters, it can hold a string with at most 18 characters.
Note that NUL, or nul with one ell, is the name of the ASCII character with code 0, '\0', and is the end-of-string marker character.
NULL (or sometimes nil), however, is the null pointer constant; in C it is commonly defined as ((void *)0), although a plain 0 is also allowed. It is the sentinel and error value we use when we want to set a pointer to a recognizable error/unused/nothing value, instead of pointing to actual data.
sizeof buffer is the number of chars, total (including the end-of-string nul char), used by the variable buffer.
In this case, we could have used MAX_LINE_LEN + 1 instead (the second parameter to fgets() being the number of characters in the buffer given to it, including the reservation for the end-of-string char).
The reason I used sizeof buffer here, is because it is so useful. (Do remember that if buffer was a pointer and not an array, it would evaluate to the size of a pointer; not the amount of data available where that pointer points to. If you use pointers, you will need to track the amount of memory available there yourself, usually in a separate variable. That is just how C works.)
And also because it is important that sizeof is not a function, but an operator: it does not evaluate its argument (variable-length arrays aside), it only considers the size (of the type) of the argument. This means that if you do something silly like sizeof (i++), you'll find that i is not incremented, and that it yields the exact same value as sizeof i.
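A small sketch of both points; the function names are invented for the example, and some compilers may warn about the unevaluated i++ (which is rather the point).

```c
#include <stddef.h>

/* Returns the value of i after taking sizeof (i++). The operand of
   sizeof is not evaluated here, so i keeps its initial value. */
static int sizeof_does_not_evaluate(void)
{
    int i = 5;
    size_t n = sizeof (i++);   /* same value as sizeof i; i is untouched */

    (void)n;
    return i;
}

/* For an array, sizeof yields the size of the whole array in bytes. */
static size_t array_size_demo(void)
{
    char buffer[100 + 1];

    return sizeof buffer;      /* sizeof (char) is 1 by definition */
}
```

Had buffer been declared as char *buffer, sizeof buffer would be the size of the pointer itself instead.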
fgets() returns a pointer to the line it stored in the buffer, or NULL if an error occurred or the end of input was reached before any characters were read.
This is also why I named the pointer line, and the storage array buffer. They describe my intent as a programmer. (That is very important when writing comments, by the way: do not describe what the code does, because we can read the code; but do describe your intent as to what the code should do, because only the programmer knows that, but it is important to know that intent if one tries to understand, modify, or fix the code.)
The scanf() family of functions returns the number of successful conversions. To detect input where the proper numeric value was followed by garbage, say 1.0 x, I asked sscanf() to ignore any whitespace after the number (whitespace means tabs, spaces, and newlines; '\t', '\n', '\v', '\f', '\r', and ' ' for the default C locale using ASCII character set), and try to convert a single additional character, dummy.
Now, if the line does contain anything besides whitespace after the number, sscanf() will store the first character of that anything in dummy, and return 2. However, because I only want lines that only contain the number and no dummy characters, I expect a return value of 1.
To detect the q or Q (but only as the first character on the line), we simply examine the first character in line, line[0].
If we included <string.h>, we could use e.g. if (strchr(line, 'q') || strchr(line, 'Q')) to see if there is a q or Q anywhere in the line supplied. The strchr(string, char) returns a pointer to the first occurrence of char in string, or NULL if none; and all pointers but NULL are considered logically true. (That is, we could equivalently write if (strchr(line, 'q') != NULL || strchr(line, 'Q') != NULL).)
Another function we could use declared in <string.h> is strstr(). It works like strchr(), but the second parameter is a string. For example, (strstr(line, "exit")) is only true if line has exit in it somewhere. (It could be brexit or exitology, though; it is just a simple substring search.)
In a loop, continue skips the rest of the loop body, and starts the next iteration of the loop body from the beginning.
In a loop, break skips the rest of the loop body, and continues execution after the loop.
EXIT_SUCCESS and EXIT_FAILURE are the standard exit status codes <stdlib.h> defines. Most prefer using 0 for EXIT_SUCCESS (because that is what it is in most operating systems), but I think spelling the success/failure out like that makes it easier to read the code.
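To get back to the original question (accept only a positive number), the sscanf() check can be wrapped in a small function. This is only a sketch; the name parse_positive is an assumption, and it is meant to be called on each line returned by fgets().

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the number in *out if line
   holds exactly one number greater than zero; otherwise returns 0. */
static int parse_positive(const char *line, double *out)
{
    double value;
    char dummy;

    /* A return value of 1 means a number was converted and nothing
       except whitespace followed it. */
    if (sscanf(line, " %lf %c", &value, &dummy) != 1) {
        return 0;
    }
    if (!(value > 0.0)) {
        return 0;   /* rejects zero, negative numbers, and NaN */
    }
    *out = value;
    return 1;
}
```

In the loop above, one could keep prompting until parse_positive(line, &value) returns 1.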
I wouldn't use scanf-family functions for reading from stdin in general.
fgets is better: you tell it the size of your buffer, which avoids buffer overflows, and it gives you the line as a string that you can later parse into the desired type (if any). For float values, strtof works.
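A sketch of the strtof part, assuming the line has already been read with fgets. The helper name parse_float is hypothetical; the end pointer is what lets you reject input such as 1.5x.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to a float.
   Returns 1 on success, 0 if text is not a single in-range number. */
static int parse_float(const char *text, float *out)
{
    char *end;
    float value;

    errno = 0;
    value = strtof(text, &end);
    if (end == text) {
        return 0;               /* nothing could be converted */
    }
    if (errno == ERANGE) {
        return 0;               /* value out of range for float */
    }
    /* Allow trailing whitespace, such as the '\n' that fgets keeps. */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r') {
        end++;
    }
    if (*end != '\0') {
        return 0;               /* trailing garbage after the number */
    }
    *out = value;
    return 1;
}
```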
However, if the specification for your deliverable or homework assignment requires the use of scanf with %f as the format specifier, what you can do is check its return value, which will contain a count of the number of format specifiers in the format string that were successfully scanned:
§ 7.21.6.2:
The [scanf] function returns the value of the macro EOF if an input failure occurs
before the first conversion (if any) has completed. Otherwise, the function returns the
number of input items assigned, which can be fewer than provided for, or even zero, in
the event of an early matching failure.
From there, you can diagnose whether the input is valid or not. Also, when scanf fails, stdin is not cleared and subsequent calls to scanf (i.e. in a loop) will continue to see whatever is in there. This question has some information about dealing with that.
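If scanf-style input is required, one way to deal with the leftover characters is to throw away the rest of the line after every attempt. The following is a sketch under that assumption; discard_line and read_positive_float are names invented for the example, and the stream is passed in so the same code works for stdin or a file.

```c
#include <stdio.h>

/* Discards the rest of the current line, including its newline. */
static void discard_line(FILE *in)
{
    int c;

    while ((c = getc(in)) != EOF && c != '\n') {
        /* nothing to do: just consume the character */
    }
}

/* Hypothetical helper: keeps reading until a positive float is found.
   Returns 1 on success, 0 if the input ends first. */
static int read_positive_float(FILE *in, float *out)
{
    float value;
    int rc;

    for (;;) {
        rc = fscanf(in, "%f", &value);
        if (rc == EOF) {
            return 0;
        }
        /* Drop the offending text, or the remainder of a good line. */
        discard_line(in);
        if (rc == 1 && value > 0.0f) {
            *out = value;
            return 1;
        }
    }
}
```

For interactive use, call read_positive_float(stdin, &height) after printing the prompt; printing the prompt again on each retry would need the prompt moved inside the loop.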
I'm making a program that counts the number of words contained within a file. My code works for certain test cases with files that have fewer than a certain number of words/characters... But when I test it with, let's say, a word like:
"loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong", (this is not random--this is an actual test case I'm required to check), it gives me this error:
*** stack smashing detected ***: ./wcount.out terminated
Abort (core dumped)
I know what the error means and that I have to implement some sort of malloc line of code to be able to allocate the right amount of memory, but I can't figure out where in my function to put it or how to do it:
int NumberOfWords(char* argv[1]) {
FILE* inFile = NULL;
char temp_word[20]; <----------------------I think this is the problem
int num_words_in_file;
int words_read = 0;
inFile = fopen(argv[1], "r");
while (!feof(inFile)) {
fscanf(inFile, "%s", temp_word);
words_read++;
}
num_words_in_file = words_read;
printf("There are %d word(s).\n", num_words_in_file - 1);
fclose(inFile);
return num_words_in_file;
}
As you've correctly identified by rendering your source code invalid (future tip: /* put your arrows in comments */), the problem is that temp_word only has enough room for 20 characters (one of which must be the terminating null character).
In addition, you should check the return value of fopen. I'll leave that as an exercise for you. I've answered this question in other questions (such as this one), but I don't think just shoving code into your face will help you.
In this case, I think it may pay to better analyse the problem you have, to see if you actually need to store words to count them. If we define a word (the kind read by scanf("%s", ...)) as a sequence of non-whitespace characters followed by a sequence of (zero or more) whitespace characters, we can see that a counting program such as yours needs to follow this procedure:
Read as much whitespace as possible
Read as much non-whitespace as possible
Increment the "word" counter if all was successful
You don't need to store the non-whitespace any more than you do the whitespace, because once you've read it you'll never revisit it. Thus you could write this as two loops embedded into one: one loop which reads as much whitespace as possible, another which reads non-whitespace, followed by your incrementation and then the outer loop repeats the whole lot... until EOF is reached...
This will be best achieved using the %*s directive, which tells scanf-related functions not to try to store the word. For example:
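size_t word_count = 0;
/* With assignment suppressed, fscanf returns 0 for each word it
   consumes and EOF once no word is left (or on a read error). */
while (fscanf(inFile, "%*s") == 0) {
    word_count++;
}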
You are limited by the size of your array. A simple solution would be to increase the size of your array. But you are always susceptible to stack smashing if someone enters a long word.
A word is delimited by spaces.
You can simply keep a counter variable initialized to zero, and a variable that records the current character you are looking at. Read one character at a time with fgetc(inFile), and increment the counter each time a non-whitespace character begins a new word, that is, when it comes first in the file or directly after whitespace.
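A sketch of this character-by-character approach; the function name count_words_by_char is an assumption. No word is ever stored, so word length cannot overflow anything.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts words by watching for the transition
   from whitespace (or start of file) to a non-whitespace character. */
static size_t count_words_by_char(FILE *inFile)
{
    size_t words = 0;
    int in_word = 0;   /* 1 while we are inside a word */
    int c;

    while ((c = fgetc(inFile)) != EOF) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

Counting word starts rather than spaces means runs of several spaces, leading whitespace, and a missing final newline do not skew the result.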
In your current code you simply want to count the words. Therefore you are not interested in the words themselves. You can suppress the assignment with the optional * character:
fscanf(inFile, "%*s");
I was wondering if it is possible to only read in particular parts of a string using scanf.
For example, since I am reading from a file, I use fscanf.
If I wanted to read the name and number (where the number is 111-2222) when they are in a string such as:
Bob Hardy:sometext:111-2222:sometext:sometext
I use this, but it's not working:
(fscanf(read, "%23[^:] %27[^:] %10[^:] %27[^:] %d\n", name,var1, number, var2, var3))
Your initial format string fails because it does not consume the : delimiters.
If you want scanf() to read a portion of the input, but you don't care what is actually read, then you should use a field descriptor with the assignment-suppression flag (*):
char nl;
fscanf(read, "%23[^:]:%*[^:]:%10[^:]%*[^\n]%c", name, number, &nl);
As a bonus, you don't need to worry about buffer overruns for fields with assignment suppressed.
You should not attempt to match a single newline via a trailing newline character in the format, because a literal newline (or space or tab) in the format will match any run of whitespace. In this particular case, it would consume not just the line terminator but also any leading whitespace on the next line.
The last field is not suppressed, even though it will almost always receive a newline, because that way you can tell from the return value if you've scanned the last line of the file and it is not newline-terminated.
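The same idea can be tried on a string with sscanf(), which is easier to experiment with. This is a simplified sketch that ignores everything after the number field; the helper name parse_record and the array sizes (taken from the widths in the question) are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the first and third ':'-separated
   fields of line. Returns 1 if both were read, 0 otherwise. */
static int parse_record(const char *line, char name[24], char number[11])
{
    /* name ':' skipped-field ':' number; the rest of the line is ignored.
       Suppressed fields (%*[^:]) do not count towards the return value. */
    return sscanf(line, "%23[^:]:%*[^:]:%10[^:]", name, number) == 2;
}
```

Note that a scanset must match at least one character, so an empty field (two adjacent ':' characters) makes this format stop early.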
Check fscanf() return value.
fscanf(read, "%23[^:] %27[^:] ... is failing because after scanning the first field with %23[^:], fscanf() encounters a ':'. The space in the format matches zero or more whitespace characters, so it consumes nothing here, and the next directive, %27[^:], cannot match the ':'. Scanning stops.
Had the code checked the value returned by fscanf(), which was certainly 1, the source of the problem might have been self-evident. The scanning needs to consume the ':', so add it to the format: "%23[^:]: %27[^:]: ...
Better to use fgets()
Using fscanf() to read data and detect both properly and improperly formatted data is very challenging. It can be made to scan expected input correctly, yet it rarely copes well with incorrectly formatted input.
Instead, simply read a line of data and then parse it. Using "%n" is an easy way to detect complete conversion, as it saves the count of characters scanned so far, provided scanning gets that far.
char buffer[200];
if (fgets(buffer, sizeof buffer, read) == NULL) {
return EOF;
}
int n = 0;
sscanf(buffer, " %23[^:]: %27[^:]: %10[^:]: %27[^:]:%d %n",
name, var1, number, var2, &var3, &n);
if (n == 0) {
return FAIL; // scan incomplete
}
if (buffer[n]) {
return FAIL; // Extra data on line
}
// Success!
Note: sample input ended with text, but original format used "%d". Unclear on OP's intent.
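For experimenting, the fragment above could be wrapped as follows. This is a sketch: the struct record, the function name parse_full_line, and the 1/0 return convention are assumptions, and it keeps the original "%d" for the last field.

```c
#include <stdio.h>

/* Hypothetical record type; sizes follow the widths in the format. */
struct record {
    char name[24];
    char var1[28];
    char number[11];
    char var2[28];
    int var3;
};

/* Returns 1 if the whole line matched the format, 0 otherwise. */
static int parse_full_line(const char *buffer, struct record *r)
{
    int n = 0;

    sscanf(buffer, " %23[^:]: %27[^:]: %10[^:]: %27[^:]:%d %n",
           r->name, r->var1, r->number, r->var2, &r->var3, &n);
    if (n == 0) {
        return 0;   /* scanning stopped before reaching %n */
    }
    if (buffer[n] != '\0') {
        return 0;   /* extra data after the last field */
    }
    return 1;
}
```

With this format, the sample line from the question is rejected because its last field is text rather than a number.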
I have been told, by most experts and by users on StackOverflow, that scanf should not be used when the user inputs a string, and that I should go for gets() instead. I never asked on StackOverflow why one should not use scanf over gets for strings. This is not the actual question, but an answer to it would be greatly appreciated.
Now coming to the actual question. I came across this type of code -
scanf("%[^\n]s",a);
This reads a string until user inputs a new line character, considering the white spaces also as string.
Is there any problem if I use
scanf("%[^\n]s",a);
instead of gets?
Is gets more optimized than scanf function as it sounds, gets is purely dedicated to handle strings. Please let me know about this.
Update
This link helped me to understand it better.
gets(3) is dangerous and should be avoided at all costs. I cannot envision a use where gets(3) is not a security flaw.
scanf(3)'s %s is also dangerous -- you must use the "field width" specifier to indicate the size of the buffer you have allocated. Without the field width, this routine is as dangerous as gets(3):
char name[64];
scanf("%63s", name);
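The GNU C library provides the a modifier to %s that allocates the buffer for you. This non-portable extension is probably less difficult to use correctly. (Because C99 also defines %a as a floating-point conversion, the extension is not available in every compilation mode; POSIX.1-2008 specifies the m modifier, as in %ms, for the same purpose.)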
The GNU C library supports a nonstandard extension that
causes the library to dynamically allocate a string of
sufficient size for input strings for the %s and %a[range]
conversion specifiers. To make use of this feature, specify
a as a length modifier (thus %as or %a[range]). The caller
must free(3) the returned string, as in the following
example:
char *p;
int n;
errno = 0;
n = scanf("%a[a-z]", &p);
if (n == 1) {
printf("read: %s\n", p);
free(p);
} else if (errno != 0) {
perror("scanf");
} else {
fprintf(stderr, "No matching characters\n");
}
As shown in the above example, it is only necessary to call
free(3) if the scanf() call successfully read a string.
Firstly, it is not clear what that s is doing in your format string. The %[^\n] part is a self-sufficient format specifier. It is not a modifier for %s format, as you seem to believe. This means that "%[^\n]s" format string will be interpreted by scanf as two independent format specifiers: %[^\n] followed by a lone s. This will direct scanf to read everything until \n is encountered (leaving \n unread), and then require that the next input character is s. This just doesn't make any sense. No input will match such self-contradictory format.
Secondly, what was apparently meant is scanf("%[^\n]", a). This is somewhat close to [no longer available] gets (or fgets), but it is not the same. scanf requires that each format specifier match at least one input character, and it will fail and abort if it cannot match any input characters for the requested format specifier. This means that scanf("%[^\n]",a) is not capable of reading empty input lines, i.e. lines that contain the \n character immediately. If you feed such a line into the above scanf, it will return 0 to indicate failure and leave a unchanged. That's very different from how typical line-based input functions work.
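This behaviour is easy to observe with sscanf(), which follows the same matching rules. A small sketch; the wrapper name scan_line and the 100-byte buffer (with the matching width 99) are assumptions added for safety.

```c
#include <stdio.h>

/* Hypothetical wrapper: returns whatever sscanf reports when asked to
   read everything up to the next newline into out. */
static int scan_line(const char *input, char out[100])
{
    return sscanf(input, "%99[^\n]", out);
}
```

Here scan_line("hello world\n", a) reports 1, scan_line("\nnext", a) reports 0 because the line is empty, and scan_line("", a) reports EOF.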
(This is a rather surprising and seemingly illogical property of the %[] format. Personally, I'd prefer %[] to be able to match empty sequences and produce empty strings, but that's not how standard scanf works.)
If you want to read the input in line-by-line fashion, fgets is your best option.
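A short sketch of line-by-line reading with fgets that handles empty lines without trouble. The function name count_empty_lines and the 256-byte buffer are assumptions; lines are assumed to be shorter than the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the empty lines in a stream.
   Assumes every line fits in the buffer (fewer than 255 characters). */
static size_t count_empty_lines(FILE *in)
{
    char line[256];
    size_t empty = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the newline, if any */
        if (line[0] == '\0') {
            empty++;
        }
    }
    return empty;
}
```

Unlike scanf("%[^\n]", a), each call to fgets consumes the newline, so an empty line is simply returned as a string holding only "\n".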