Skipping oversized inputs when using fgets(3) - c

This is probably quite easy to figure out, and maybe I'm just looking in the wrong places, but how does one test whether fgets has read an oversized input? In the code below, I'm trying to skip further processing for empty lines and oversized ones and go straight to the next line. For empty lines it works just fine.
Printing strlen(buffer) for line lengths < maxsize gives me the expected values.
However, when I enter lines that exceed maxsize, it prints a value over 9000, which should still exceed maxsize and therefore enter the if-clause, but this doesn't happen. I've tried casting the return value of strlen to an int; that didn't work.
What am I missing here? Thanks for any replies :)
char buffer[102];
while (fgets(buffer,100,stdin)!=NULL){
size_t maxsize = 102;
printf("%ld",strlen(buffer));
if(strcmp(buffer,"\n")==0||strlen(buffer)>maxsize){
continue;
}
//further processing
}
In the code:
char buffer[102];
while (fgets(buffer,100,stdin)!=NULL){
You don't need to pass fgets a size two smaller than the buffer. The size argument of fgets can simply be sizeof applied to the buffer, as in:
char buffer[102];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
That gives you room for up to 101 characters plus the string terminator, where the 101 characters include the newline character if it fits (see below).
But, to answer your question: I understand you want to know what happens when an input line is longer than the buffer size you provided, what happens to the rest of the input, and how fgets deals with it.
fgets() reads characters until it has stored a \n or until the buffer is full (that is, full once the \0 terminator that it must append is counted). So fgets() stores at most one character fewer than the buffer size, and the rest of the line is read by the next fgets() call (or by a call to any other input function of the stdio package).
So, basically, a line too long for the buffer is split into pieces. Every piece except the last holds exactly the size you specified minus one characters and does not end in a newline. The last piece holds whatever remains (at most the specified size minus one characters) and ends with the newline.
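To make the splitting behaviour concrete, here is a minimal sketch of how the loop from the question could skip oversized lines. Because fgets() never stores more than the given size minus one characters, strlen(buffer) can never exceed the buffer size, so the sketch tests for a missing newline instead. The helper name count_usable_lines and its FILE* parameter are assumptions made for illustration; they are not part of the original code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads lines from `in` and counts those that are
 * neither empty nor too long for the buffer. Oversized lines are skipped
 * as a whole. */
size_t count_usable_lines(FILE *in)
{
    char buf[102];              /* room for 100 characters + '\n' + '\0' */
    size_t usable = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strchr(buf, '\n') == NULL && !feof(in)) {
            /* No newline stored and not at end of input: the line did not
             * fit. Discard the rest of it so that the next fgets() call
             * starts at the beginning of a new line. */
            int ch;
            while ((ch = fgetc(in)) != EOF && ch != '\n') {
                /* skip */
            }
            continue;
        }
        if (strcmp(buf, "\n") == 0) {
            continue;           /* empty line */
        }
        usable++;               /* further processing would go here */
    }
    return usable;
}
```

Calling count_usable_lines(stdin) would mirror the loop in the question. A final line that lacks a trailing newline but fits in the buffer is still counted, because feof() is set in that case.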

Related

How to print file contents to stdout without storing them in memory?

My program takes in files with arbitrarily long lines. Since I don't know how many characters will be on a line, I would like to print the whole line to stdout without malloc-ing an array to store it. Is this possible?
I am aware that it's possible to print these lines one chunk at a time-- however, the function doing the printing would be called very often, and I wish to avoid the overhead of malloc-ing arrays that hold the output, in every single call.
First of all, you can't print something that doesn't exist, which means you have to store it somewhere, either on the stack or on the heap. If you use a FILE*, libc does the buffering for you automatically.
Now, if you use a FILE*, you can use getc to get one character at a time and push each one to stdout, checking for the newline character to know where the line ends.
If you're using a file descriptor, you can read one character at a time and do exactly the same thing.
Neither approach requires you to explicitly allocate memory on the heap.
If you use mmap, you can apply a function from the strtok family and then print the resulting string to stdout.
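As a variation on the no-heap approaches above, the following sketch copies a stream through a fixed-size array on the stack, one chunk at a time, so nothing is malloc-ed per call. The helper name copy_stream and the chunk size of 4096 bytes are arbitrary choices for illustration, not something the answer prescribes.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from `in` to `out` through a
 * fixed-size stack buffer. Returns the number of bytes copied. */
size_t copy_stream(FILE *in, FILE *out)
{
    char chunk[4096];           /* arbitrary size, lives on the stack */
    size_t total = 0;
    size_t n;

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, n, out) != n) {
            break;              /* write error: stop early */
        }
        total += n;
    }
    return total;
}
```

To print a whole file, one would call copy_stream(fp, stdout). Line boundaries do not matter here because the bytes are forwarded in order regardless of where a chunk ends.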
takes in files with arbitrarily long lines ... print the whole line to stdout, without malloc-ing an array to store it. Is this possible?
In general, for arbitrary long lines: no.
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. C11dr §7.21.2 2
The length of a line is not limited to SIZE_MAX, the size of the longest array possible in C. The length of a line can exceed the memory capacity of the computer. There is just no way to read an arbitrarily long line into memory as a whole. Simple code could use the following. I doubt it will be satisfactory, yet it does print the entire contents of a file with scant memory.
// Reads one character at a time.
int ch;
while((ch = fgetc(fp)) != EOF) {
putchar(ch);
}
Instead, code should set a sane upper bound on line length and create an array or allocate memory for the line. As useful as flexible long lines are, they are also open to malicious abuse: a hacker can exploit them to consume unrestrained resources.
#define LINE_LENGTH_MAX 100000
char *line = malloc(LINE_LENGTH_MAX + 1);
if (line) {
    while (fgets(line, LINE_LENGTH_MAX + 1, fp)) {
        if (strlen(line) >= LINE_LENGTH_MAX) {
            // Buffer is full: the line may be too long
            Handle_Possible_Attack();
        }
        foo(line); // Use line
    }
    free(line);
}

how to scan line in c program not from file

How to scan total line from user input with c program?
I tried scanf("%99[^\n]",st), but it does not work when I scan something before this scan statement. It works if it is the first scan statement.
How to scan total line from user input with c program?
There are many ways to read a line of input, and your usage of the word scan suggests you're already focused on the scanf() function for the job. This is unfortunate, because, although you can (to some extent) achieve what you want with scanf(), it's definitely not the best tool for reading a line.
As already stated in the comments, your scanf() format string will stop at a newline, so the next scanf() will first find that newline and it can't match [^\n] (which means anything except newline). As a newline is just another whitespace character, adding a blank in front of your conversion will silently eat it up ;)
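A small sketch of that leading-blank trick, assuming an integer is scanned first and the line follows it. The helper name read_number_then_line and the FILE* parameter are made up for illustration; with stdin the calls would be scanf() instead of fscanf().

```c
#include <stdio.h>

/* Hypothetical helper: reads an integer and then the text up to the next
 * newline from `in`. Returns 1 on success, 0 otherwise. */
int read_number_then_line(FILE *in, int *number, char st[100])
{
    if (fscanf(in, "%d", number) != 1) {
        return 0;
    }
    /* The blank before % skips the '\n' left behind by the previous
     * conversion (and any other leading white space). */
    return fscanf(in, " %99[^\n]", st) == 1;
}
```

One side effect worth knowing: the blank also swallows empty lines and leading spaces, so this cannot return an empty line.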
But now for the better solution: Assuming you only want to use standard C functions, there's already one function for exactly the job of reading a line: fgets(). The following code snippet should explain its usage:
char line[1024];
char *str = fgets(line, 1024, stdin); // read from the standard input
if (!str)
{
// couldn't read input for some reason, handle error here
exit(1); // <- for example
}
// fgets includes the newline character that ends the line, but if the line
// is longer than 1022 characters, it will stop early here (it will never
// write more bytes than the second parameter you pass). Often you don't
// want that newline character, and the following line overwrites it with
// 0 (which is "end of string") **only** if it was there:
line[strcspn(line, "\n")] = 0;
Note that you might want to check for the newline character with strchr() instead, so you actually know whether you have the whole line or whether your input buffer was too small. In the latter case, you might want to call fgets() again.
How to scan total line from user input with c program?
scanf("%99[^\n]",st) reads a line, almost.
With the C Standard Library a line is
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
scanf("%99[^\n]",st) fails to read the end of the line, the '\n'.
That is why on the 2nd call, the '\n' remains in stdin to be read and scanf("%99[^\n]",st) will not read it.
There are ways to use scanf("%99[^\n]",st);, or a variation of it as a step in reading user input, yet they suffer from 1) Not handling a blank line "\n" correctly 2) Missing rare input errors 3) Long line issues and other nuances.
The preferred portable solution is to use fgets(). Loop example:
#define LINE_MAX_LENGTH 200
char buf[LINE_MAX_LENGTH + 1 + 1]; // +1 for long lines detection, +1 for \0
while (fgets(buf, sizeof buf, stdin)) {
size_t eol = strcspn(buf, "\n"); // ** see note below
buf[eol] = '\0'; // trim potential \n
if (eol >= LINE_MAX_LENGTH) {
// IMO, user input exceeding a sane generous threshold is a potential hack
fprintf(stderr, "Line too long\n");
// TBD : Handle excessive long line
}
// Use `buf[]`
}
Many platforms support getline() to read a line.
Shortcomings: it is not part of the C standard, and it allows a hacker to overwhelm system resources with insanely long lines.
In C, there is not a great solution. What is best depends on the various coding goals.
** I prefer size_t eol = strcspn(buf, "\n\r"); to read lines in a *nix environment that may end with "\r\n".
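The "\n\r" variant from the footnote can be wrapped in a small function, sketched here under the assumption that everything from the first '\n' or '\r' onward should be cut off. The name trim_eol is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: cuts `buf` at the first '\n' or '\r' and returns
 * the length of what remains. */
size_t trim_eol(char *buf)
{
    size_t eol = strcspn(buf, "\n\r");
    buf[eol] = '\0';            /* harmless if no line ending was present */
    return eol;
}
```

If the buffer contains no line ending at all, strcspn() returns the string length, so the assignment only rewrites the existing terminator.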
scanf() should never be used for user input. The best way to get input from the user is with fgets().
Read more: http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
char str[1024];
char *alline = fgets(str, 1024, stdin);
scanf("%[^'\n']s",alline);
I think the correct solution looks like this. It worked for me.
Hope it helps.

Reading the string with defined number of characters from the input

So I am trying to read a defined number of characters from the input. Let's say that I want to read 30 characters and put them in to a string. I managed to do this with a for loop, and I cleaned the buffer as shown below.
for(i=0;i<30;i++){
string[i]=getchar();
}
string[30]='\0';
while(c!='\n'){
c=getchar(); // c is some defined variable type char
}
And this is working for me, but I was wondering if there is another way to do this. I was researching and some of them are using sprintf() for this problem, but I didn't understand that solution. Then I found that you can use scanf with %s. And some of them use %3s when they want to read 3 characters. I tried this myself, but this command only reads the string till the first empty space. This is the code that I used:
scanf("%30s",string);
And when I run my program with this line, if I for example write: "Today is a beatiful day. It is raining, but it's okay i like rain." I thought that the first 30 characters would be saved in to the string. But when i try to read this string with puts(string); it only shows "Today".
If I use scanf("%s",string) or gets(string) that would rewrite some parts of my memory if the number of characters on input is greater than 30.
You can use scanf("%30[^\n]",s)
Here the format string specifies which characters to accept. The caret '^' denotes negation, i.e. this reads all characters except \n. The 30 limits the input to at most 30 characters, so s needs room for 31 bytes including the terminating '\0'. So, there you are.
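A quick sketch of that format string applied to the sentence from the question. It uses sscanf() on a string rather than scanf() on stdin so that it is easy to try out; the helper name first_30_chars is invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copies up to 30 characters of the first line of
 * `input` into `out`, which must have room for 31 bytes.
 * Returns 1 if something was stored, 0 otherwise. */
int first_30_chars(const char *input, char out[31])
{
    /* %30[^\n] stores at most 30 non-newline characters plus '\0'. */
    return sscanf(input, "%30[^\n]", out) == 1;
}
```

For the input "Today is a beatiful day. It is raining, but it's okay i like rain." this stores "Today is a beatiful day. It is". The conversion fails (and the function returns 0) when the input starts with a newline, because %[ does not skip white space and must match at least one character.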
The API you're looking for is fgets(). The man page describes
char *fgets(char *s, int size, FILE *stream);
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.

I/O in C Errors

I've been trying for hours to find the answer to this question I got at university. I tried running this on a file containing the two lines:
hello
world
and it reads the file perfectly, so I can't find the answer. I would appreciate your help!
A student wrote the next function for reading a text file and printing it exactly as it is.
void ReadFile(FILE *fIn)
{
char nextLine[MAX_LINE_LENGTH];
while(!feof(fIn))
{
fscanf(fIn,"%s",nextLine);
printf("%s\n",nextLine);
}
}
What are the two errors in this function?
You can assume that each line in the file is not longer than MAX_LINE_LENGTH characters, and that it is a text file that contains only alphabet characters, and that each line is terminated by '\n'.
Thanks.
It discards white space. Try adding multiple spaces and tabs.
It may evaluate a stream more than once, and if there is a read error, the loop never terminates.
See: Why is “while ( !feof (file) )” always wrong?
Reading strings via scanf with a bare %s is dangerous. There is no bounds checking, so you may write past MAX_LINE_LENGTH (and boom! Segfault).
The main error is that fscanf( fIn, "%s", nextLine ) doesn't scan a complete line.
From man page:
s
Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null byte ('\0'), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first.
Thus if you have a line "a b" the first fscanf() will scan just "a" and the second one "b" and both are printed in two different lines. You can use fgets() to read a whole line.
The second one may be that the task states "each line in the file is not longer than MAX_LINE_LENGTH characters", but nextLine can hold at most MAX_LINE_LENGTH-1 characters (+ '\0'). That problem becomes even more important if you replace fscanf() with fgets(), because then nextLine must also have the capacity to store '\n' or '\r\n' (depending on the platform you're on).
A correct way of doing that is:
void ReadFile(FILE *fIn)
{
char nextLine[MAX_LINE_LENGTH];
while(fgets(nextLine, MAX_LINE_LENGTH, fIn)) {
printf("%s", nextLine);
}
}
As some have posted, using feof to control a loop is not a good idea, and neither is using fscanf to read lines.
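The point made earlier, that fscanf() with %s splits a line such as "a b" into two pieces, can be checked with a short sketch. It also shows a loop controlled by the return value of fscanf() rather than by feof(). The helper name count_tokens and the array size of 100 are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many pieces fscanf() with %s produces.
 * The loop tests the return value of fscanf() instead of feof(). */
int count_tokens(FILE *in)
{
    char token[100];
    int count = 0;

    /* %99s bounds the write to the array and stops at any white space. */
    while (fscanf(in, "%99s", token) == 1) {
        count++;
    }
    return count;
}
```

A file containing the two lines "a b" and "hello" yields three tokens, not two lines, which is why the student's function cannot reproduce the file exactly.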

fread() size argument

I want to read some data from the file, the data will have different sizes at different times.
If I use the below code, then:
char dataStr[256];
fread(dataStr, strlen(dataStr), 1, dFd);
fread is returning 0 for the above call and not reading any thing from the file.
But, if I give size as 1 then it successfully reads one char from the file.
What should be the value of size argument to the fread() function when we do not know how much is the size of the data in the file?
strlen counts the number of characters until it hits \0.
In this case you probably hit \0 on the very first character hence strlen returns 0 as the length and nothing is read.
You should use sizeof instead of strlen.
You can't do that, obviously.
You can read until a known delimiter, often line feed, using fgets() to read a line. Or you can read a known-in-advance byte count, using that argument.
Of course, if there's an upper bound on the amount of data, you can read that always, and then somehow inspect the data to see what you got.
Also, in your example you're using strlen() on the argument that is going to be overwritten, that implies that it already contains a proper string of the exact same size as the data that is going to be read. This seems unlikely, you probably mean sizeof dataStr there.
You should use:
fread(dataStr, 1, sizeof dataStr, dFd);
to indicate that you want to read the number of bytes equal to the size of your array buffer.
The reason your code doesn't work is that strlen() finds the length of a null-terminated string, not the size of the buffer. In your case, you run it on an uninitialized buffer and simply get lucky: the first byte in the buffer happens to be '\0', so strlen(dataStr) returns 0. It is just as likely to crash or to return some random number greater than your buffer size.
Also note that fread() returns the number of items read, not the number of characters (I swapped the second and the third arguments so that each character is equivalent to one item).
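Putting these points together, a minimal sketch might look like the following. It reads with an item size of 1 so that the return value is a byte count, and it leaves one byte free for a terminator because fread() does not add one. The helper name read_up_to is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to size - 1 bytes from `in` into `buf`
 * and null-terminates the result. Returns the number of bytes read. */
size_t read_up_to(FILE *in, char *buf, size_t size)
{
    if (size == 0) {
        return 0;
    }
    /* Item size 1, so the return value is a byte count. */
    size_t n = fread(buf, 1, size - 1, in);
    buf[n] = '\0';              /* fread() does not add a terminator */
    return n;
}
```

With the array from the question this would be called as read_up_to(dFd, dataStr, sizeof dataStr). A return value smaller than size - 1 means end of file or an error was reached, which feof() and ferror() can tell apart.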
fread returns the number of blocks it read successfully.
You can do:
if( 1==fread(dataStr, 256, 1, dFd) )
puts("OK");
It always tries to read the full length you specify, so the call returns 1 only if all 256 bytes were read; fread does not stop at '\0'.
