Reading a string (char[]) until the end of the line in C

I need to read a file name, and I want my code to work for names that contain spaces.
How do I read until the end of the line from the keyboard?
My code:
#define szoveghosz 256
//....
char bemenet[szoveghosz];
fgets (bemenet,sizeof(bemenet),stdin);

Read the documentation of fgets(3) carefully (it may be available locally on your Linux computer with man fgets):
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops after
an EOF or a newline. If a newline is read, it is stored into the
buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
As documented, fgets will (when possible) keep the newline character. You probably want to remove it, so I recommend this instead:
memset(bemenet, 0, sizeof(bemenet)); // clear the buffer
if (fgets(bemenet, sizeof(bemenet), stdin)) {
    char *eol = strchr(bemenet, '\n');
    if (eol)
        *eol = '\0';
    // do appropriate things on bemenet
}
See also strchr(3) & memset(3)
But as I commented, on Linux and POSIX systems, getline(3) is preferable (because it dynamically allocates a buffer for an arbitrarily long line).
Notice that (in principle) a filename could contain a newline (but in most cases, you can forget that possibility). See also glob(3) & wordexp(3) and glob(7) and path_resolution(7).
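A minimal sketch of the getline(3) approach mentioned above, assuming a POSIX system. The helper name read_file_name is hypothetical and only for illustration; getline() itself is the POSIX function.

```c
#define _POSIX_C_SOURCE 200809L  // ask for the POSIX getline() declaration
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  // ssize_t

// Hypothetical helper: read one line from `in` and return it without the
// trailing newline. The caller must free() the result.
// Returns NULL on end of file or error.
char *read_file_name(FILE *in) {
    assert(in != NULL);
    char *line = NULL;  // getline() allocates and grows this buffer
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, in);
    if (len < 0) {
        free(line);  // the buffer may have been allocated even on failure
        return NULL;
    }
    line[strcspn(line, "\n")] = '\0';  // drop the newline, if there is one
    return line;
}
```

Calling read_file_name(stdin) would return a name such as "my file name.txt" with its spaces intact, whatever its length.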

Related

how to scan line in c program not from file

How to scan total line from user input with c program?
I tried scanf("%99[^\n]",st), but it does not work when I scan something before this scan statement. It works if this is the first scan statement.
How to scan total line from user input with c program?
There are many ways to read a line of input, and your usage of the word scan suggests you're already focused on the scanf() function for the job. This is unfortunate, because, although you can (to some extent) achieve what you want with scanf(), it's definitely not the best tool for reading a line.
As already stated in the comments, your scanf() format string will stop at a newline, so the next scanf() will first find that newline and it can't match [^\n] (which means anything except newline). As a newline is just another whitespace character, adding a blank in front of your conversion will silently eat it up ;)
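To illustrate the leading blank, here is a small sketch. The helper scan_two_lines is hypothetical, and it uses fscanf() on an arbitrary stream so it can be tried on a file as well as on stdin. Note the side effect: the blank also skips leading spaces and empty lines before the second line.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: read two consecutive lines with scanf-style
// conversions. The blank before the second % skips the newline that the
// first conversion left in the stream.
// Returns the number of lines stored (0, 1 or 2).
int scan_two_lines(FILE *in, char first[100], char second[100]) {
    assert(in != NULL && first != NULL && second != NULL);
    if (fscanf(in, "%99[^\n]", first) != 1)
        return 0;
    if (fscanf(in, " %99[^\n]", second) != 1)
        return 1;
    return 2;
}
```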
But now for the better solution: Assuming you only want to use standard C functions, there's already one function for exactly the job of reading a line: fgets(). The following code snippet should explain its usage:
char line[1024];
char *str = fgets(line, 1024, stdin); // read from the standard input
if (!str)
{
    // couldn't read input for some reason, handle error here
    exit(1); // <- for example
}
// fgets includes the newline character that ends the line, but if the line
// is longer than 1022 characters, it will stop early here (it will never
// write more bytes than the second parameter you pass). Often you don't
// want that newline character, and the following line overwrites it with
// 0 (which is "end of string") **only** if it was there:
line[strcspn(line, "\n")] = 0;
Note that you might want to check for the newline character with strchr() instead, so you actually know whether you have the whole line or whether your input buffer was too small. In the latter case, you might want to call fgets() again.
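The strchr() check could look roughly like this. The helper read_whole_line and its return codes are assumptions made for this sketch; here the rest of an over-long line is simply discarded, which is only one of several reasonable policies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: read one line into buf and strip the newline.
// Returns 1 if the whole line fit, 0 if it was too long (the rest of the
// line is read and thrown away), -1 on end of file or error.
int read_whole_line(char *buf, int size, FILE *in) {
    assert(buf != NULL && size > 1 && in != NULL);
    if (!fgets(buf, size, in))
        return -1;
    char *eol = strchr(buf, '\n');
    if (eol) {
        *eol = '\0';
        return 1;
    }
    // No newline stored: look at the next character to find out why.
    int c = getc(in);
    if (c == EOF || c == '\n')
        return 1;  // the text fit; only the newline was missing or left over
    while (c != '\n' && c != EOF)
        c = getc(in);  // discard the remainder of the long line
    return 0;
}
```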
How to scan total line from user input with c program?
scanf("%99[^\n]",st) reads a line, almost.
With the C Standard Library a line is
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
scanf("%99[^\n]",st) fails to read the end of the line, the '\n'.
That is why on the 2nd call, the '\n' remains in stdin to be read and scanf("%99[^\n]",st) will not read it.
There are ways to use scanf("%99[^\n]",st);, or a variation of it as a step in reading user input, yet they suffer from 1) Not handling a blank line "\n" correctly 2) Missing rare input errors 3) Long line issues and other nuances.
The preferred portable solution is to use fgets(). Loop example:
#define LINE_MAX_LENGTH 200
char buf[LINE_MAX_LENGTH + 1 + 1]; // +1 for long lines detection, +1 for \0
while (fgets(buf, sizeof buf, stdin)) {
    size_t eol = strcspn(buf, "\n"); // ** see the note below
    buf[eol] = '\0'; // trim potential \n
    if (eol >= LINE_MAX_LENGTH) {
        // IMO, user input exceeding a sane generous threshold is a potential hack
        fprintf(stderr, "Line too long\n");
        // TBD : Handle excessive long line
    }
    // Use `buf[]`
}
Many platforms support getline() to read a line.
Shortcomings: it is not part of standard C, and it allows a hacker to overwhelm system resources with insanely long lines.
In C, there is not a great solution. What is best depends on the various coding goals.
** I prefer size_t eol = strcspn(buf, "\n\r"); to read lines in a *nix environment that may end with "\r\n".
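As a small illustration of the "\n\r" variant, the following sketch wraps it in a hypothetical helper named trim_eol. One caveat of this approach: a stray '\r' in the middle of a line also cuts the string there.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: cut the string at the first '\r' or '\n' and
// return the new length.
size_t trim_eol(char *s) {
    assert(s != NULL);
    size_t eol = strcspn(s, "\n\r");
    s[eol] = '\0';
    return eol;
}
```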
scanf() should never be used for user input. The best way to get input from the user is with fgets().
Read more: http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
char str[1024];
char *alline = fgets(str, sizeof str, stdin); // NULL on end of file or error
if (alline)
    alline[strcspn(alline, "\n")] = '\0'; // remove the trailing newline, if any
I think the correct solution should be like this. It worked for me.
Hope it helps.

Why does fgets() store a \0 after the last character in a buffer?

I've been doing a bit of reading through the Linux programmer's manual, looking up various functions and trying to get a deeper understanding of what they are and how they work.
Looking at fgets() I read "A '\0' is stored after the last character in the buffer".
I've read through What does \0 stand for? and have a pretty solid understanding of what \0 symbolizes (a null character, right?). But what I'm struggling to grasp is its relevance to fgets(). I don't really understand why it "needs" to end with a null character.
As you already said, you are probably aware that \0 constitutes the end of all strings in C. As per the C standard, everything that is a string needs to be \0 terminated.
Since fgets() makes a string, that string, of course, will be properly null terminated.
Do note that for all string functions in C, any string you use or generate with them must be terminated with a \0 character.
Because otherwise you do not know how long the resulting string is.
One of the arguments to fgets is the maximum number of characters to read, but it's just that: a maximum. If you ask for 512 characters, but there are only 8 in the buffer, you will only get 8 characters … and a null character ('\0') in the 9th slot to mark the logical end of the C string.
Arguably, fgets could instead have been designed to return the number of characters read, but then for most purposes you'd only have to add the null byte yourself manually, and the function would have to find a way to signify an error other than returning a null pointer.
From C standards:
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
This ensures that there is no buffer overflow: the characters read, plus the terminating null character, never go beyond the provided storage.
As all the people before me said, fgets reads bytes from a file and makes them into a standard C string, which is null-terminated. The termination with the \0 byte reflects the fact that this function is text-oriented.
If you don't want to use null-termination for the data read from the file, it's not a string (not text), and also the end-of-line byte \n has no significance. In this case, you can use fread.
So C has two functions to read from file: fgets for text and fread for non-text (binary data).
BTW if the input file has a genuine zero-valued byte, fgets will do an uncomfortable thing: it will continue reading until it reads an end-of-line byte \n, and the output "string" will have two (or more) null-terminations. This doesn't make any sense as text, so it's another example of fgets being text-oriented and unsuitable for arbitrary data.
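The difference can be seen with a small experiment. This is only a sketch; text_length and raw_length are hypothetical helper names. Fed the six bytes 'a', 'b', 0, 'c', 'd', '\n', the fgets() result looks two characters long to strlen(), while fread() reports all six bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: read one "line" with fgets() and report how long
// the result looks to string functions (0 if nothing could be read).
size_t text_length(FILE *in, char *buf, int size) {
    assert(in != NULL && buf != NULL && size > 0);
    if (!fgets(buf, size, in))
        return 0;
    return strlen(buf);  // stops at the first zero byte
}

// Hypothetical helper: read raw bytes with fread(), which returns how
// many bytes were actually stored.
size_t raw_length(FILE *in, unsigned char *buf, size_t size) {
    assert(in != NULL && buf != NULL);
    return fread(buf, 1, size, in);
}
```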

Reading line by line in C

Currently to read a file line by line in C I am using:
char buffer[1024];
while (fgets(buffer, sizeof(buffer), file) != NULL) {
    // do something with each line that is now stored in buffer
}
However there is no guarantee in the file that the line will be shorter than 1024. What will happen if a line is longer than 1024? Will the rest of the line be read in the next iteration of the while loop?
And how can I read line by line without a maximum length?
Yes, the rest of the line will be read in the next iteration.
You can detect whether or not you read a whole line by inspecting the last character of the string (i.e. the one before the null terminator) to see if it is '\n' or not -- fgets passes '\n' through to you.
There is no Standard C function which will read a line whilst dynamically allocating enough memory for it, however there is a POSIX function getline() which does that. You could write your own that uses fgets or otherwise to do the reading, in a loop with realloc, of course.
From the standards §7.19.7.2,
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into the
array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
From MSDN,
fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The newline character, if read, is included in the string.
So, yes, fgets will read the rest of the line in the next iteration if it doesn't encounter the newline character within the sizeof(buffer)-1 range.
If you want to read the whole line in one shot, then it is better to go with malloc and, if needed, realloc to grow the memory as required.
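One way to combine fgets with malloc and realloc, using only standard C, is sketched below. The helper name read_long_line, the starting capacity of 64 and the doubling strategy are all assumptions made for this example, not something the answers above prescribe.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: read one line of any length. Returns a malloc'd
// string without the trailing newline (the caller frees it), or NULL on
// end of file with nothing read, read error, or allocation failure.
char *read_long_line(FILE *in) {
    assert(in != NULL);
    size_t cap = 64, len = 0;
    char *line = malloc(cap);
    if (!line)
        return NULL;
    for (;;) {
        size_t room = cap - len;
        int n = room > INT_MAX ? INT_MAX : (int)room;  // fgets takes an int
        if (!fgets(line + len, n, in))
            break;  // end of file or error
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';  // complete line: strip the newline
            return line;
        }
        if (len + 1 == cap) {  // buffer is full: double it and keep reading
            char *bigger = realloc(line, cap * 2);
            if (!bigger) {
                free(line);
                return NULL;
            }
            line = bigger;
            cap *= 2;
        }
    }
    if (len == 0 || ferror(in)) {
        free(line);
        return NULL;
    }
    return line;  // last line of the input, which had no newline
}
```

Like getline(), this has no upper bound on the line length, so a caller worried about hostile input may want to add a cap.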

About gets() in C

I am writing a C program which has a 5-element array to store a string, and I am using gets() to get input. When I typed in more than 5 characters and then output the string, it just gave me all the characters I typed in. I know the string is terminated by a \0, so even if I exceed my array, it will still output the whole thing.
But what I am curious about is where exactly gets() stores the input: in some buffer, or directly in my array?
What if I type in a very long string? Will gets() try to store characters in memory that should not be touched? Would it give me a segmentation fault?
That's why gets is evil. It does not check array bounds and often invokes undefined behavior. Never use gets; use fgets instead.
By the way, gets is no longer part of C. It was removed in the C11 standard in favor of a new safe alternative, gets_s [1] (see the wiki). So, better to forget about gets.
[1] C11: K.3.5.4.1 The gets_s function
Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
gets() will store the characters in the 5-element buffer. If you type in more than 4 characters, the end-of-string character no longer fits: gets() writes it, and any extra characters, past the end of the array. That is undefined behavior, and string operations on the result may misbehave.
excerpt from man page on Ubuntu Linux
gets() reads a line from stdin into the buffer pointed to by s until
either a terminating newline or EOF, which it replaces with a null byte
('\0'). No check for buffer overrun is performed
The string is stored in the buffer and if it is too long it is stored in contiguous memory after the buffer. This can lead to unintended writing over of data or a SEGV fault or other problems. It is a security issue as it can be used to inject code into programs.
gets() stores the characters you type directly into your array and you can safely use/modify them. But indeed, as haccks and unxnut correctly state, gets doesn't care about the size of the array you give it to store its chars in, and when you type more characters than the array has space for you might eventually get a segmentation fault or some other weird results.
Just for the sake of completeness, gets() reads from a buffered file called stdin which contains the chars you typed. More specifically, it takes the chars until it reaches a newline. That newline is discarded, and the '\0' terminator is put into your array in its place. You should, as haccks says, use fgets, which is very similar, except that it keeps the newline and takes the size of the array:
char buf[100]; // the input buffer
fgets(buf, 100, stdin); // reads until it finds a newline (your enter) but never
// more than 99 chars, using the last char for the '\0'
// you can now use and modify buf
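Applied to the 5-element array from the question, a sketch could look like this (read_into_five is a hypothetical helper name). Whatever does not fit stays in the stream and is returned by the next call instead of being written past the array.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper mirroring the question's 5-element array: store at
// most 4 characters plus '\0' in out and leave the rest in the stream.
// Returns 1 on success, 0 on end of file or error.
int read_into_five(FILE *in, char out[5]) {
    assert(in != NULL && out != NULL);
    return fgets(out, 5, in) != NULL;
}
```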

Can fgets ever read an empty string?

Assuming the FILE* is valid, consider:
char buf[128];
if (fgets(buf, sizeof buf, myFile) != NULL) {
    strlen(buf) == 0; // can this ever be true? In what cases?
}
Yes. Besides passing 1 (as noted by Ignacio), fgets doesn't do any special handling for embedded nulls. So if the next character in the FILE * is NUL, strlen will be 0. This is one of the reasons why I prefer the POSIX getline function. It returns the number of characters read so embedded nulls are not a problem.
From the fgets(3) man page:
DESCRIPTION
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an
EOF or a newline. If a newline is read, it is stored into the buffer.
A '\0' is stored after the last character in the buffer.
...
RETURN VALUE
...
gets() and fgets() return s on success, and NULL on error or when end
of file occurs while no characters have been read.
From that, it can be inferred that a size of 1 will cause it to read an empty string. Experimentation here confirms that.
Incidentally, a size of 0 appears to not modify the buffer at all, not even putting in a \0.
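The embedded-null case is easy to reproduce. In this sketch the helper fgets_gave_empty is hypothetical. If the stream starts with a zero byte followed by "abc\n", fgets() succeeds and the result has length 0. The size-of-1 case can be tried with the same helper by passing 1 as size, although, as noted above, that behavior was found by experimentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: returns 1 when fgets() succeeds but the result has
// length 0, 0 when the result is non-empty, -1 when fgets() returns NULL.
int fgets_gave_empty(FILE *in, char *buf, int size) {
    assert(in != NULL && buf != NULL && size > 0);
    if (!fgets(buf, size, in))
        return -1;
    return strlen(buf) == 0;
}
```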

Resources