Nongreedy fscanf and buffer overflow check in c


I'm looking to have fscanf identify when a potential overflow happens, and I can't wrap my head around how best to do it.
For example, for a file containing the string
**a**bb**cccc**
I do a
char str[10];
while (fscanf(inputf, "*%10[^*]*", str) != EOF) {
}
because what is between ** and ** is usually shorter than 10 characters. But sometimes I might get a
**a**bb**cccc*
(without the last *) or even potentially a buffer overflow.
I considered using
while (fscanf(inputf, "*%10[^*]", str) != EOF) {
}
(without the last *) or even
while (fscanf(inputf, "*%10s*", str) != EOF) {
}
but that would return the entire string. I tried checking for the presence or absence of a *, but I can't get that to work. I've also seen implementations using fgets, but I'd rather not make it complicated. Any ideas?

While fscanf() seems to have been designed as a general purpose expression parser, few programmers rely on that ability. Instead, use fgets() to read a text line, and then use a parser of your choosing or design to dissect the text buffer.
Relying on the full feature set of fscanf() is dodgy across implementations: not every C library provides the full functionality, and some don't even implement it correctly.
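A minimal sketch of that approach for the **a**bb**cccc** input, assuming fields are separated by one or more '*' characters and that a field longer than 9 characters (the question's char[10] minus the terminator) only needs to be reported. split_star_fields() is a hypothetical helper, not a library function; because it works on a line already read with fgets(), the overflow check reduces to a strlen() per field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a line (already read with fgets) on '*'
   in place, count the fields, and count how many are longer than
   max_len characters. strtok() treats runs of '*' as one separator. */
size_t split_star_fields(char *line, size_t max_len, size_t *too_long)
{
    size_t count = 0;
    *too_long = 0;
    line[strcspn(line, "\n")] = '\0';   /* drop the newline fgets() kept */
    for (char *tok = strtok(line, "*"); tok != NULL; tok = strtok(NULL, "*")) {
        if (strlen(tok) > max_len)
            (*too_long)++;              /* would not fit in char[max_len + 1] */
        count++;
    }
    return count;
}
```

In a loop such as while (fgets(line, sizeof line, inputf) != NULL), each token can then be copied into str only when it is short enough. Note that strtok() keeps hidden state, so it is not safe to interleave two such loops.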

I'm not clear on exactly what you want. Is it to skip over any number of stars, and then read up to 9 non-star characters into a buffer? If so, try this:
void read_field(FILE *fin, char buf[10])
{
    int c;
    char *ptr = buf;
    while ((c = getc(fin)) == '*')
        /*continue*/;
    while (c != '*' && c != EOF && ptr < buf+9)
    {
        *ptr++ = c;
        c = getc(fin);
    }
    *ptr = '\0';
    /* skip to next star here? */
}
You will note that I am not using fscanf. That is because fscanf is nearly always more trouble than it's worth. The above is more typing, but I can be confident that it does what I described it as doing.
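If you also want the function to tell you when a field did not fit (which is what the question asks), one possible variant is sketched below. read_field_checked is a hypothetical name; it takes the buffer size instead of assuming 10, stores at most size - 1 characters, and answers the "skip to next star here?" comment one way: by discarding the rest of an over-long field up to the next star. It assumes size is at least 1.

```c
#include <stdio.h>

/* Skip leading stars, then read one field of non-star characters.
   Returns -1 at EOF with nothing read, 1 if the field was truncated
   (the rest of it is discarded), or 0 if it fit. */
int read_field_checked(FILE *fin, char *buf, size_t size)
{
    int c;
    size_t n = 0;

    while ((c = getc(fin)) == '*')
        ;                              /* skip leading stars */
    if (c == EOF)
        return -1;
    while (c != '*' && c != EOF && n + 1 < size) {
        buf[n++] = (char)c;
        c = getc(fin);
    }
    buf[n] = '\0';
    if (c != '*' && c != EOF) {        /* buffer full before the closing star */
        while ((c = getc(fin)) != '*' && c != EOF)
            ;                          /* discard the rest of the field */
        return 1;
    }
    return 0;
}
```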

Related

Have to hit enter twice with fgets() in C?

Good Morning, I'm having an issue with some C code where I am forced to hit enter twice each time input is entered if the length of the input is less than the size of 'guess'.
If the length of the input is longer than guess, I only hit enter once, and it functions normally.
I'm not sure what the issue is, but below is the function I believe is the source of the problem, along with the caller function and main for context.
Output:
Guess a number: 5555555555
Invalid guess.
Guess a number: 55555555555
Invalid guess.
Guess a number: 555
Invalid guess.
Guess a number:
Invalid guess.
Guess a number: 5555
When I remove the line:
while((ch = getchar()) != '\n' && ch != EOF); // Flush the input buffer
and I extend past the size of the buffer, I receive this output:
Guess a number: 5555555555555555555555555555555555
Invalid guess.
Guess a number: Invalid guess.
Guess a number: Invalid guess.
Guess a number: Invalid guess.
Guess a number: Invalid guess.
Function in Question
char * get_input(char * guess)
{
    print_message("guess"); // Prompt user to input a guess
    fgets(guess, sizeof (guess), stdin);
    if(feof(stdin))
    {
        printf("error");
        exit(EXIT_FAILURE);
    }
    int ch = 0;
    while((ch = getchar()) != '\n' && ch != EOF); // Flush the input buffer
    guess[strlen(guess)-1] = '\0'; // Erase new line character
    return guess;
}
Caller Function
int make_guess(int *v_guess_count)
{
    int result = 0;
    bool valid = false;
    char guess[10] = {'\0'}; // Buffer to store the guess
    do
    {
        get_input(guess); // Get the input
        if(is_valid(guess)) // Check if the input is valid
        {
            valid = true;
            *v_guess_count += 1;
        }
    }
    while (! valid); // Keep asking for input until guess is valid
    result = assign_value(guess); // Assign the guess
    return result;
}
Main
int main(int argc, char * argv[])
{
    int red = 0;
    int white = 0;
    int v_guess_count = 0;
    int target = get_target();
    bool game_won = false;
    while(game_won == false)
    {
        red = white = 0; // Reset both to zero after each guess
        int guess = make_guess(&v_guess_count); // Make a guess. If it's valid, assign it.
        printf("guess is: %d\n", guess);
        compare(guess, target, &red, &white); // Check the guess with the target number.
        print_hints(&red, &white);
        if (red == 4)
        {
            game_won = true;
        }
    }
    printf("You win! It took you %d guesses.\n", v_guess_count);
    return 0;
}

You have two somewhat-related problems.
One. In your function
char * get_input(char * guess)
your line
fgets(guess, sizeof (guess), stdin);
does not do what you think it does. You want to tell fgets how big the buffer is, that is, how much memory is pointed to by guess for fgets to read into. But in function get_input, parameter guess is a pointer, so sizeof(guess) is going to be the size of that pointer, not the size of the array it points to. That is, you're going to get a size of probably 4 or 8, not the 10 that array guess up in make_guess is declared as.
To fix this, change your input function to
char * get_input(char * guess, int guess_size)
and change the call in make_guess to
get_input(guess, sizeof(guess));
For more on this point, see this question and also this answer.
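A tiny demonstration of point one, using a hypothetical function size_inside(): inside the function, sizeof applied to the parameter yields the size of a char *, while the caller still sees the whole array.

```c
#include <stddef.h>

/* guess is a pointer here, exactly as in get_input(), so sizeof
   reports the size of the pointer, not of the caller's array. */
size_t size_inside(char *guess)
{
    return sizeof guess;
}
```

On a typical 64-bit system the caller sees sizeof guess == 10 while size_inside(guess) returns 8.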
Two. Your array guess for reading the user's guess is too small. Instead of making it size 10, make it size 500 or something. That way it will "never" overflow. Don't worry that you're wasting memory by doing that — memory is cheap.
The reason for making the input buffer huge is this: If you make the buffer small, you have to worry about the case that the user might type a too-long line and that fgets might not be able to read all of it. If you make the buffer huge, on the other hand, you can declare that the problem "won't happen" and that you therefore don't have to worry about it. And the reason you'd like to not worry about it is that worrying about it is hard, and leads to problems like the one you've had here.
To use fgets strictly correctly, while worrying about the possibility that the user's input overflows the buffer, means detecting that it happened. If fgets didn't read all the input, that means it's still sitting there on the input stream, waiting to confuse the rest of your program. In that case, yes, you have to read or "flush" or discard it. That's what your line
while((ch = getchar()) != '\n' && ch != EOF);
tries to do — but the point is that you need to do that only if fgets had the not-big-enough problem. If fgets didn't have the problem — if the buffer was big enough — you don't want to do the flush-the-input thing, because it'll gobble up the user's next line of intended input instead, as you've discovered.
Now, with this said, I have to caution you. In general, a strategy of "make your arrays huge, so you don't have to worry about the possibility that they're not big enough" is not a good strategy. In the general case, that strategy leads to insecure programs and horrible security problems due to buffer overruns.
In this case, though, the problem isn't too bad. fgets is going to do its best not to write more into the destination array than the destination array can hold. (fgets will do a perfect job of avoiding buffer overflow if you pass the size correctly, that is, if you fix problem one.) If the buffer isn't big enough, the worst problem that will happen is that the too-long part of the input line will stay on the input stream and get read by a later input function, thus confusing things.
So you do always want to think about the exceptional cases, and think about what your program is going to do under all circumstances, not just the "good" ones. And for "real" programs, you do have to strive to make the behavior correct for all cases. For a beginning program like this one, though, I think most people would agree that it's fine to just use a huge buffer, and be done with it.
If you want to go for the extra credit, and perfectly handle the case where the user typed more than the fgets input buffer will hold, you're going to first have to detect that case. The code would look something like:
if(fgets(guess, guess_size, stdin) == NULL)
{
    printf("error");
    exit(EXIT_FAILURE);
}
if(guess[strlen(guess)-1] != '\n')
{
    /* buffer wasn't big enough */
    int ch = 0;
    while((ch = getchar()) != '\n' && ch != EOF); // Flush the input buffer
    /* now print an error message or something, */
    /* and ask the user to try again with shorter input */
}
But the point is that you do the while((ch = getchar()) != '\n' && ch != EOF) thing only in the case where fgets failed to read the whole line, not in the case where it succeeded.
If you're still with me, here are two somewhat-important footnotes.
I suggested changing your get_input function to take a second parameter int guess_size, but it turns out a better type to use for the sizes of things is size_t, so a better declaration would be size_t guess_size.
I suggested the test if(guess[strlen(guess)-1] != '\n') to detect that fgets wasn't able to read a full line, but that could fail (pretty badly) in the obscure case where fgets returned an empty string. In that case strlen(guess) would be 0, and we'd end up accessing guess[-1] to see if it was the newline character, which is undefined behavior. This is rare (as long as you give fgets a buffer bigger than 1 to read into, it can only happen when the input line starts with a null byte), but it's easier to write the code in a safer way than to convince yourself that it can't happen. There are a bunch of questions elsewhere on SO about practically and efficiently detecting the case that fgets didn't read a full line successfully, but just now I can't find any of them.
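Putting both footnotes together, a safer line reader might look like the sketch below. It is only one way to do it: read_guess is a hypothetical name, it reads from a FILE * (pass stdin in the guessing game) so that it can be tested, it takes a size_t size that is assumed to be at least 2 and no larger than INT_MAX, and it reports "too long" to the caller instead of exiting. It looks for the newline with strchr(), so an empty string is harmless, and it flushes the rest of the line only when fgets did not reach a newline.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (at most size - 1 chars) and strip the newline.
   Returns 1 if the whole line fit, 0 if it was too long (the rest of the
   line is discarded), or -1 on end of file or read error. */
int read_guess(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return -1;
    char *nl = strchr(buf, '\n');
    if (nl != NULL) {              /* whole line read: erase the newline */
        *nl = '\0';
        return 1;
    }
    int ch = getc(in);             /* no newline yet: read the next character */
    if (ch == '\n' || ch == EOF)   /* the text fit exactly; nothing is left */
        return 1;
    while ((ch = getc(in)) != '\n' && ch != EOF)
        ;                          /* flush only in the too-long case */
    return 0;
}
```

In make_guess the call would become something like read_guess(stdin, guess, sizeof guess), with sizeof applied to the array itself.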

find = strchr(st, '\n'); replaced with *find = '\0';

I read a snippet of code from C Primer Plus, and tried hard to understand *find = '\0';
#include <stdio.h>
#include <string.h>
char *s_gets(char *st, int n);
struct book {
    char title[40];
    char author[40];
    float value;
};
int main(void) {
    ...
}
char *s_gets(char *st, int n) {
    char *ret_val;
    char *find;
    ret_val = fgets(st, n, stdin);
    if (ret_val) {
        find = strchr(st, '\n'); //look for newline
        if (find) // if address is not null
            *find = '\0'; //place a null character there
        else
            while (getchar() != '\n')
                continue; //dispose rest of line
    }
    return ret_val;
}
For what purpose should find = strchr(st, '\n'); be followed by *find = '\0';?
I looked up strchr and could get an idea of its function, but I found it an odd name. Does the name strchr come from "string character"?

The code using find = strchr(s, '\n') and what follows zaps the newline that was read by fgets() and included in the result string, if there is indeed a newline in the string. Often, you can use an alternative, more compact, notation:
s[strcspn(s, "\n")] = '\0';
which is written without any conditional code visible. (If there's no newline, the null byte overwrites the existing null byte.)
The overall objective seems to be to make s_gets() behave more like an antique, dangerous and no longer standard function, gets(), which reads up to and including a newline, but does not include the newline in the result. The gets() function has other design flaws which make it a function to be forgotten — never use it!
The code shown also detects when no newline was read and then goes into a dangerous loop to read the rest of the line. The loop should be:
else
{
    int c;
    while ((c = getchar()) != EOF && c != '\n')
        ;
}
It is important to detect EOF; not all files end with a newline. It is also important to detect EOF reliably, which means this code has to use int c (whereas the original flawed loop could avoid using a variable like c). If this code carelessly used char c instead of int c, it could either fail to detect EOF altogether (if plain char is an unsigned type) or it could give a false positive for EOF when the data being read contains a byte with value 0xFF (if plain char is a signed type).
Note that using strcspn() directly as shown is not an option in this code because then you can't detect whether there was a newline in the data; you merely know there is no newline in the data after the call. As Antti Haapala points out, you could capture the result of strcspn() and then decide whether a newline was found and therefore whether to read to the end of line (or end of file if there is no EOL before the EOF).
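A sketch of that variant, keeping s_gets()'s return convention. The name s_gets_from and the extra FILE * parameter are assumptions added so the function can read from any stream; call it with stdin to match the book's version.

```c
#include <stdio.h>
#include <string.h>

/* Like s_gets(), but reads from 'in' and locates the newline with a
   captured strcspn() result instead of strchr(). */
char *s_gets_from(char *st, int n, FILE *in)
{
    char *ret_val = fgets(st, n, in);
    if (ret_val) {
        size_t len = strcspn(st, "\n");  /* index of the '\n', or strlen(st) */
        if (st[len] == '\n') {
            st[len] = '\0';              /* newline found: zap it */
        } else {
            int c;                       /* int, so EOF can be detected */
            while ((c = getc(in)) != EOF && c != '\n')
                ;                        /* discard the rest of the line */
        }
    }
    return ret_val;
}
```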

Translation limit in C

I am attempting to capture input from the user via scanf:
char numStrings[5000];
printf("Enter string of numbers:\n\n");
scanf("%s", numStrings);
However, the length of the string that is inputted is 5000 characters. The translation limit in c99 is 4095 characters. Do I need to instruct the user to break their input in half or is there a better work around that I cannot think of?

You can input a string a lot larger than that. For an array with automatic storage (on the stack), the actual limit is the stack size, which is typically at least 1 MB on common operating systems and 8 MB by default on Linux. 1 MB is 1024 KB, so you could, for example, try 512 KB, which is 524288 bytes:
char string[524288];
scanf("%524287s", string);
will most likely be OK; if it's still too small, use malloc().
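If you do switch to malloc(), the buffer no longer competes with the stack size limit. A minimal sketch under the same 512 KB assumption; read_big_word is a hypothetical name, and the caller must free() the result.

```c
#include <stdio.h>
#include <stdlib.h>

#define BIG_SIZE 524288  /* 512 KB; the %524287s width below must stay BIG_SIZE - 1 */

/* Read one whitespace-delimited word into a heap buffer.
   Returns NULL if allocation fails or no word could be read. */
char *read_big_word(FILE *in)
{
    char *s = malloc(BIG_SIZE);
    if (s == NULL)
        return NULL;
    if (fscanf(in, "%524287s", s) != 1) {
        free(s);
        return NULL;
    }
    return s;
}
```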

No, you do not need to instruct the user to separate the input if it goes over a set length. The limit applies to string literals in your source code, not to strings read at run time. See the answer in this Stack Overflow thread for more information. If you don't know what a reasonable max length is, then I would recommend using getline() or getdelim() if the delimiter that you want to use is not a line break.
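For the getdelim() route, here is a minimal sketch assuming, for illustration only, that the numbers are separated by commas (change delim as needed); read_until is a hypothetical wrapper. getdelim() is POSIX, not ISO C, so it is not available everywhere (the Microsoft C runtime, for one, lacks it). The caller frees each returned string.

```c
#define _POSIX_C_SOURCE 200809L   /* getdelim() is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Read one delim-terminated field of any length from in.
   Returns a malloc'd string without the delimiter,
   or NULL at end of input or on error. */
char *read_until(FILE *in, int delim)
{
    char *buf = NULL;
    size_t cap = 0;
    ssize_t n = getdelim(&buf, &cap, delim, in);
    if (n < 0) {
        free(buf);                /* getdelim may have allocated anyway */
        return NULL;
    }
    if (n > 0 && buf[n - 1] == (char)delim)
        buf[n - 1] = '\0';        /* strip the delimiter */
    return buf;
}
```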

Do I need to instruct the user to break their input in half or is there a better work around that I cannot think of?
As far as the code you've given goes, if the input word is longer than 4999 bytes then you can expect a buffer overflow. Yes, it would be wise to let someone (e.g. the user, or the guy who maintains this code next) know that's the maximum length. It's nice that you can truncate the input by using code like this: scanf("%4999s" "%*[^ \n]", numStrings);... The %*[^ \n] directive performs the truncation, in this case.
It'd be nicer yet if you can let the user know at the time that they overflow the buffer, but scanf doesn't make that an easy task. What would be even nicer (for the user, I mean) is if you could use dynamic allocation.
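One way to get that notification with scanf alone is to look at the character right after the %4999s conversion: if it is neither whitespace nor EOF, the word was cut short. A sketch, where read_word_checked is a hypothetical name and the buffer is assumed to be exactly 5000 bytes, as in the question.

```c
#include <ctype.h>
#include <stdio.h>

/* Read one word into buf (5000 bytes). Returns 0 if it fit, 1 if it was
   truncated (the rest of the word is discarded), or -1 if nothing was read. */
int read_word_checked(FILE *in, char buf[5000])
{
    if (fscanf(in, "%4999s", buf) != 1)
        return -1;
    int c = getc(in);
    if (c == EOF || isspace(c)) {
        if (c != EOF)
            ungetc(c, in);            /* leave the separator where it was */
        return 0;
    }
    fscanf(in, "%*[^ \n]");           /* discard the rest, up to a space or newline */
    return 1;
}
```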
Ahh, the problem of dynamically sized input. If it can be avoided, then avoid it. One common method to avoid this problem is to require input in the form of argv, rather than stdin... but that's not always possible, useful or feasible.
scanf doesn't make this problem a particularly easy one to solve; in fact, it would be much easier if the functionality of %s were offered through an interface similar to fgets.
Without further ado, here's the code I wrote in this answer, adapted for reading (and simultaneously allocating) words in a way similar to %s, rather than lines in a way similar to fgets. Feel free to read that answer if you would like to know more about the inspiration behind it.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
char *get_dynamic_word(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL;
    int c;
    do {
        c = fgetc(f);
    } while (c >= 0 && isspace(c));
    do {
        if ((bytes_read & (bytes_read + 1)) == 0) {
            void *temp = realloc(bytes, bytes_read * 2 + 1);
            if (temp == NULL) {
                free(bytes);
                return NULL;
            }
            bytes = temp;
        }
        bytes[bytes_read] = c >= 0 && !isspace(c)
                          ? c
                          : '\0';
        c = fgetc(f);
    } while (bytes[bytes_read++]);
    if (c >= 0) {
        ungetc(c, f);
    }
    return bytes;
}

Dynamically allocate user inputted string

I am trying to write a function that does the following things:
Start an input loop, printing '> ' each iteration.
Take whatever the user enters (unknown length) and read it into a character array, dynamically allocating the size of the array if necessary. The user-entered line will end at a newline character.
Add a null byte, '\0', to the end of the character array.
Loop terminates when the user enters a blank line: '\n'
This is what I've currently written:
void input_loop(){
    char *str = NULL;
    printf("> ");
    while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){
        /*Add null byte to the end of str*/
        /*Do stuff to input, including traversing until the null byte is reached*/
        free(str);
        str = NULL;
    }
    free(str);
    str = NULL;
}
Now, I'm not too sure how to go about adding the null byte to the end of the string. I was thinking something like this:
last_index = strlen(str);
str[last_index] = '\0';
But I'm not sure that would work, and I can't test it because I get this message when I try to compile my code:
warning: ISO C does not support the 'a' scanf flag [-Wformat=]
So what can I do to make my code work?
EDIT: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.

First of all, scanf format strings do not use regular expressions, so I don't think something close to what you want will work. As for the error you get: in C99 and later, %a is a conversion for floating-point numbers. Written before [ or s, the a is instead a GNU extension that asks scanf to allocate the buffer for you; ISO C does not support it, which is what the warning says. (POSIX.1-2008 standardizes the same idea as the m modifier.)
But then you have a bigger problem. ISO C scanf expects you to pass it a previously allocated buffer to fill in with the input it reads. It does not malloc the string for you, so initializing str to NULL and the corresponding frees will not work with standard scanf.
The simplest thing you can do is to give up on arbitrary-length strings. Create a large buffer and forbid inputs that are longer than that.
You can then use the fgets function to populate your buffer. To check if it managed to read the full line, check if your string ends with a "\n".
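#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char str[256+1];
    while(true){
        printf("> ");
        if(!fgets(str, sizeof str, stdin)){
            //error or end of file
            break;
        }
        size_t len = strlen(str);
        if(len + 1 == sizeof str && str[len - 1] != '\n'){
            //buffer filled up before the newline:
            //user typed something too long
            exit(1);
        }
        printf("user typed %s", str);
    }
    return 0;
}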
Another alternative is to use a function outside the ISO C standard library. For example, POSIX systems such as Linux provide getline, which reads a full line of input using malloc behind the scenes.
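On POSIX.1-2008 systems (glibc included) there is also the m modifier for scanf, the standardized form of the GNU a allocation flag the question tried to use: scanf then calls malloc for you and you free the result. It is not ISO C, so GCC with -pedantic may still warn about it. A minimal sketch, with read_line_m as a hypothetical name; as the question wants, a blank line (no characters before the '\n') makes the conversion fail, which can end the input loop.

```c
#define _POSIX_C_SOURCE 200809L   /* the m modifier is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd copy of the next line without its '\n', or NULL
   on a blank line, end of file, or error. The caller frees the result. */
char *read_line_m(FILE *in)
{
    char *line = NULL;
    if (fscanf(in, "%m[^\n]%*c", &line) != 1)
        return NULL;              /* nothing matched: blank line or EOF */
    return line;
}
```

A loop such as while (printf("> "), (str = read_line_m(stdin)) != NULL) { /* use str */ free(str); } then stops at the first blank line or at end of input.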

No error checking; don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.
#include <stdio.h>
#include <stdlib.h>
char *readInfiniteString() {
    int l = 256;
    char *buf = malloc(l);
    int p = 0;
    int ch; // int, not char, so that EOF can be detected
    ch = getchar();
    while(ch != '\n' && ch != EOF) { // stop at end of line or end of input
        buf[p++] = (char)ch;
        if (p == l) {
            l += 256;
            buf = realloc(buf, l);
        }
        ch = getchar();
    }
    buf[p] = '\0';
    return buf;
}
int main(int argc, char *argv[]) {
    printf("> ");
    char *buf = readInfiniteString();
    printf("%s\n", buf);
    free(buf);
}

If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of memory allocation for you.
You can use it in a loop like this:
#define _POSIX_C_SOURCE 200809L // getline() is POSIX, not ISO C
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strcmp
#include <sys/types.h> // for ssize_t
int main(void)
{
    char *line = NULL;
    size_t nline = 0;
    for (;;) {
        ssize_t n;
        printf("> ");
        // read line, allocating as necessary
        n = getline(&line, &nline, stdin);
        if (n < 0) break;
        // remove trailing newline
        if (n && line[n - 1] == '\n') line[n - 1] = '\0';
        // do stuff
        printf("'%s'\n", line);
        if (strcmp("quit", line) == 0) break;
    }
    free(line);
    printf("\nBye\n");
    return 0;
}
The passed pointer and the length value must be consistent, so that getline can reallocate memory as required. (That means that you shouldn't change nline or the pointer line in the loop.) If the line fits, the same buffer is used in each pass through the loop, so that you have to free the line string only once, when you're done reading.

Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
what happens when the line is too large, and
what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice it to say, if your program doesn't check the return value, it won't know when EOF or an error occurs and will probably be caught in an infinite loop.
When no '\n' is present, the rest of the line has not been read yet. Note that fgets already scans the line at least once internally; when you add your own check for a '\n' on top of that, you scan the data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
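Note that fscanf(file, "%*[^\n]") stops in front of the '\n' and leaves it in the stream (and consumes nothing at all when the '\n' comes next), so the newline itself still has to be read. A small sketch, with discard_line as a hypothetical name:

```c
#include <stdio.h>

/* Discard the rest of the current line, including its '\n'.
   Returns '\n' if a newline was consumed, or EOF if the input ended first. */
int discard_line(FILE *file)
{
    fscanf(file, "%*[^\n]");  /* skip up to, but not including, the '\n' */
    return getc(file);        /* now consume the '\n' itself (or hit EOF) */
}
```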
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
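#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL, *temp;
    do {
        /* +2 rather than +1: fgets must always have room for at least one
           new character, otherwise the first call (size 1) reads nothing
           and the loop never makes progress */
        size_t alloc_size = bytes_read * 2 + 2;
        temp = realloc(bytes, alloc_size);
        if (temp == NULL) {
            free(bytes);
            return NULL;
        }
        bytes = temp;
        temp = fgets(bytes + bytes_read, alloc_size - bytes_read, f); /* Parsing data the first time */
        if (temp == NULL)
            break; /* EOF or error: keep what was read so far */
        bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
    } while (bytes[bytes_read] != '\n');
    bytes[bytes_read] = '\0';
    return bytes;
}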
Those who do manage to read the manual and come up with something correct (like this) may soon realise that an fgets solution does at least twice as much scanning as the same solution using fgetc. Using fgetc avoids the second pass over the data, so it might seem most appropriate. Alas, most C programmers also manage to use fgetc incorrectly by neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It typically returns one of 256 distinct values, between 0 and UCHAR_MAX (inclusive), or otherwise EOF, meaning there are typically 257 distinct values that fgetc (or consequently, getchar) may return. Storing those values into a char or unsigned char loses information, specifically the ability to tell end-of-file and errors apart from ordinary data. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255.)
#include <stdio.h>
#include <stdlib.h>
char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL;
    do {
        /* grow when bytes_read is 0, 1, 3, 7, 15, ... (one less than a power of two) */
        if ((bytes_read & (bytes_read + 1)) == 0) {
            void *temp = realloc(bytes, bytes_read * 2 + 1);
            if (temp == NULL) {
                free(bytes);
                return NULL;
            }
            bytes = temp;
        }
        int c = fgetc(f); /* int, so EOF stays distinguishable */
        bytes[bytes_read] = c >= 0 && c != '\n'
                          ? c
                          : '\0';
    } while (bytes[bytes_read++]);
    return bytes;
}

Why do these two methods return different things?

So...I was trying to make my own simple keylogger and this works for things typed at the shell, but if I double click the executable file it just puts a lot of these in the file: ÿ
I understand that as of now if I type a j it will end; this is for debugging:
#include<stdlib.h>
#include<stdio.h>
#include<string.h>
int main(void)
{
    FILE *fp = fopen("log", "w");
    if (fp != NULL)
    {
        int x=0;
        while (x==0)
        {
            char input=fgetc(stdin);
            if (input==*"j")
                x=1;
            else
            {
                fprintf(fp, "%c\n",input);
            }
        }
        fclose(fp);
    }
    return 0;
}

Probably because there's no input stream when you double click, or it's empty straight away. In those conditions, fgetc will return EOF continuously. I can't say that for sure, but it explains the symptoms you're seeing.
You need to compare input against EOF to see if the end of the stream has been found because, in that circumstance, you'll never get the chance to input j. Try changing:
if (input==*"j")
to:
if ((input == 'j') || (input == EOF))
(You'll notice I've also changed the rather ... unusual *"j" construct to the simpler 'j'.)
The variable receiving the return value from fgetc should also be an int, since it has to represent every possible character plus EOF.
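Putting those fixes together, the loop might look like the sketch below. log_until_j is a hypothetical name, and the streams are parameters so that main can pass stdin and the opened log file. The loop now stops at 'j' or at EOF, so if the diagnosis above is right, double-clicking should no longer fill the log with ÿ characters (EOF squeezed into a char and printed as the byte 0xFF).

```c
#include <stdio.h>

/* Copy characters from in to out, one per line, until 'j' or end of input.
   Returns the number of characters written. */
int log_until_j(FILE *in, FILE *out)
{
    int count = 0;
    int input;                 /* int, not char, so EOF is distinguishable */
    while ((input = fgetc(in)) != EOF && input != 'j') {
        fprintf(out, "%c\n", input);
        count++;
    }
    return count;
}
```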

char input=fgetc(stdin);
Please note that fgetc() requires an int (well, something larger than a char -- int is customary) for its return value: EOF is a possible return value in addition to any of the values that char might take.
if (input==*"j")
Yikes, this is awkward. :) Character comparison (if that were okay in this case) would look like:
if (input == 'j')
Knowing the difference between a '' character and a "" string is vital to being a good C programmer. It might feel stilted after the free-form 'string', "string" and """string""" behaviors of other scripting languages, but that's the way it is.
Typically, these sorts of programs are written with a different layout:
int c;
while((c = getchar()) != EOF) {
    /* do something with c */
}
Putting an assignment and test in the condition of a while might feel weird at first, but it is idiomatic. (And I've sorely missed this behavior in languages that forbid it.)
