Why %[^\n]s does not work in loop? - c

I was writing a code to take a space-separated string input for a multiple time in between a loop. I saw that %[^\n]s did not work in loop but %[^\n]%*c did. My question is why %[^\n]sdid not work. Here is my code:
#include<stdio.h>
main(){
while(1){
char str[10];
scanf("%[^\n]s",str);
printf("%s\n",str);
}
return 0;
}

The format specifier %[^\n] means "read a string containing any characters until a newline is found". The newline itself is not consumed. When the newline is found, it is left on the input stream for the next conversion.
The format specifier %*c means "read exactly one character and discard it".
So the combination
scanf( "%[^\n]%*c", str );
means "read a string up to the newline character, put the string into the memory that str points to, and then discard the newline".
Given the format %[^\n]s, the s is not part of the conversion specifier. That format says "read characters until a newline is found, and then the next character should be an s". But next character will never be an s, because the next character will always be the newline. So the s in that format serves no purpose.

%[^\n] does not look for space-separated strings. It reads the whole line up until (but not including) the newline.
If your line contains more than 9 characters this causes undefined behaviour by overflowing the buffer.
The buffer overflow could be prevented by writing %9[^\n] but then you have a new issue: on a longer line, %*c would discard the 10th character and the next scan would keep reading from that same line.
Another complicating factor is that if your file contains a blank line, then %[ considers it a matching failure. That means scanf stops, so it does not go on to process %*c. In this case, the newline is not consumed, and the output buffer is not written to at all.
Your code also goes into an infinite loop because you never break out of while(1).
This code shows correct use of ^[:
while (1)
{
str[0] = '\0'; // In case of matching failure
scanf("%9[^\n]", str); // read as much of the line as we can
scanf("%*[^\n]"); // discard the rest of the line
if ( getchar() == EOF ) // discard the newline
break; // exit loop when we finished the input
printf("%s\n", str);
}
If you really do want to read space-separated text then you could use %9s instead of %9[^\n] in my above example. That introduces a new difference though: %s skips over a blank line.
If that's OK then that's OK. If you want to not skip blank lines, then you could use my code above, but add in at the end:
char *p = strchr(str, ' ');
if ( p )
*p = '\0';
Robust string input is difficult in C!

Related

Difference between `"%[^\n]"` and `"%[^\n]\n"` and `"%[^\n]%*c"` in C how do they work?

I could not find the answer anywhere else.
%[^\n] - When I run this one, scanf is getting input and terminating after I press enter. ( Probably leaving \n in the input system)
%[^\n]\n - this one is getting the input but scanf is NOT terminating immediately after I press enter like the one above. I hit more enter and it makes more newlines. When I give a character and then press enter it finally terminates. Example:
int main(void)
{
char s[100];
scanf("%[^\n]\n", s);
printf("%s", s);
return 0;
}
The results:
Last one:
%[^\n]%*c - When I give some input and press enter. scanf immediately terminates.
How do those 3 work and how do they differ?
All 3 format begin with "%[^\n]".
"%[^\n]" is poor code2 that lacks a width limit and is susceptible to buffer overrun. Use a width limit like "%99[^\n]" with char s[100];.
"%[...]" does not consume leading whitespace like many other specifiers.
This specifier directs reading input until a '\n'1 is encountered. The '\n' is put back into stdin. All other characters are saved in the matching destination's array s.
If no characters were read (not counting the '\n'), the specifier fails and scanf() returns without changing s - rest of the format is not used. No null character appended.
If characters were read, they are saved and a null character is appended to s and scanning continues with the next portion of the format.
"\n" acts just like " ", "\t", "any_white_space_chracter" and reads and tosses 0 or more whitespace characters. It continues to do so until a non-white-space1 is read. That non-whitespace character is put back into stdin.
Given line buffered input, this means a line with non-whitespace following input is needed to see the next non-whitespace and allow scanf() to move on.
With scanf(), a "\n" at the end of a format is bad and caused OP's problem.
"%*c" reads 1 character1 and throws it away - even if it is not a '\n'. This specifier does not contribute to the return count due to the '*'.
A better alternative is to use fgets().
char s[100];
if (fgets(s, sizeof s, stdin)) {
s[strcspn(s, "\n")] = '\0'; // To lop off potential trailing \n
A lesser alternative is
char s[100] = { 0 };
scanf("%99[^\n]", s);
// With a separate scanf ....
scanf("%*1[\n]"); // read the next character if it is a \n and toss it.
1 ... or end-of-file or rare input error.
2 IMO, worse than gets().

What does scanf("%[^\n]%*c", &s); actually do?

I know for a fact that [^\n] is a delimiter that makes scanf to scan everything until "enter" key is hit. But I don't know what the remainder of "%[^\n]%*c" is used for. Also why do we have to mention "&s" instead of "s" in the scanf function.
I tried running this:
char s[100];
scanf("%[^\n]s",s);
scanf("%[^\n]s",&s);
Both the above scanf statements worked exactly the same for me. If there is any difference between them, What is it?
Also Why should I prefer scanf("%[^\n]%*c", &s); to the above declarations?
scanf("%[^\n]s",&s); has many troubles:
No check of return value.
No overrun protection
No consumption of '\n'
No assignment of s on '\n' only
No need for s in "%[^\n]s"
Wrong type &s with format.
scanf("%[^\n]s",s); only has one less problem (last one).
Use fgets(s, sizeof s, stdin)
if (fgets(s, sizeof s, stdin)) {
s[strcspn(s, "\n")] = 0; // Lop off potential ending \n
// Success
} else {
// failure
}
No check of return value.
Without checking the return value, success of reading is unknown and the state of s indeterminate.
No overrun protection
What happens when the 100th character is inputted? - "Very bad. Not good. Not good." Necron 99, Wizards 1977
No consumption of '\n'
'\n' remains in stdin to foul up the next read.
No assignment of s on '\n' only
If the input begin with '\n', scanf() returns and s unchanged. It is not assigned "".
No need for "s" in "%[^\n]s"
The "s" is not part of the specifier "%[^\n]". Drop it.
Wrong type &s with format.
%[...] matches a char *, not a pointer to a char[100] like &s (UB).
You can take a string as input in C using scanf(“%s”, s). But, it accepts string only until it finds the first space.
In order to take a line as input, you can use scanf("%[^\n]%*c", s); where s is defined as char s[MAX_LEN] where MAX_LEN is the maximum size of s . Here, [] is the scanset character. ^\n stands for taking input until a newline isn't encountered. Then, with this %*c, it reads the newline character and here, the used * indicates that this newline character is discarded.
Note: After inputting the character and the string, inputting the sentence by the above mentioned statement won't work. This is because, at the end of each line, a new line character (\n) is present. So, the statement: scanf("%[^\n]%*c", s); will not work because the last statement will read a newline character from the previous line. This can be handled in a variety of ways and one of them being: scanf("\n"); before the last statement.

What does %[^\n] mean in C?

What does %[^\n] mean in C?
I saw it in a program which uses scanf for taking multiple word input into a string variable. I don't understand though because I learned that scanf can't take multiple words.
Here is the code:
#include <stdio.h>
#include <stdlib.h>
int main() {
char line[100];
scanf("%[^\n]",line);
printf("Hello,World\n");
printf("%s",line);
return 0;
}
[^\n] is a kind of regular expression.
[...]: it matches a nonempty sequence of characters from the scanset (a set of characters given by ...).
^ means that the scanset is "negated": it is given by its complement.
^\n: the scanset is all characters except \n.
Furthermore fscanf (and scanf) will read the longest sequence of input characters matching the format.
So scanf("%[^\n]", s); will read all characters until you reach \n (or EOF) and put them in s. It is a common idiom to read a whole line in C.
See also §7.21.6.2 The fscanf function.
scanf("%[^\n]",line); is a problematic way to read a line. It is worse than gets().
C defines line as:
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined.
The scanf("%[^\n]", line) has the specifier "%[^\n]". It scans for unlimited number of characters that match the scan-set ^\n. If none are read, the specifier fails and scanf() returns with line unaltered. If at least one character is read, all matching characters are read and saved and a null character is appended.
The scan-set ^\n implies all character that are not (due to the '^') '\n'.
'\n' is not read
scanf("%[^\n]",.... fails to read a new line character '\n'. It remains in stdin. The entire line is not read.
Buffer overflow
The below leads to undefined behavior (UB) should more than 99 characters get read.
char line[100];
scanf("%[^\n]",line); // buffer overflow possible
Does nothing on empty line
When the line consists of only "\n", scanf("%[^\n]",line); returns a 0 without setting line[] - no null character is appended. This can readily lead to undefined behavior should subsequent code use an uninitialized line[]. The '\n' remains in stdin.
Failure to check the return value
scanf("%[^\n]",line); assumes input succeeded. Better code would check the scanf() return value.
Recommendation
Do not use scanf() and instead use fgets() to read a line of input.
#define EXPECTED_INPUT_LENGTH_MAX 49
char line[EXPECTED_INPUT_LENGTH_MAX + 1 + 1 + 1];
// \n + \0 + extra to detect overly long lines
if (fgets(line, sizeof line, stdin)) {
size_t len = strlen(line);
// Lop off potential trailing \n if desired.
if (len > 0 && line[len-1] == '\n') {
line[--len] = '\0';
}
if (len > EXPECTED_INPUT_LENGTH_MAX) {
// Handle error
// Usually includes reading rest of line if \n not found.
}
The fgets() approach has it limitations too. e.g. (reading embedded null characters).
Handling user input, possible hostile, is challenging.
scanf("%[^\n]",line);
means: scan till \n or an enter key.
scanf("%[^\n]",line);
Will read user input until enter is pressed or a newline character is added (\n) and store it into a variable named line.
Question: what is %[^\n] mean in C?
Basically the \n command prints the output in the next line, but in
case of C gives the Null data followed by the above problem only.
Because of that to remove the unwanted data or null data, need to add
Complement/negotiated symbol[^\n]. It gives all characters until the next line
and keeps the data in the defined expression.
Means it is the Complemented data or rewritten data from the trash
EX:
char number[100]; //defined a character ex: StackOverflow
scanf("%[^\n]",number); //defining the number without this statement, the
character number gives the unwanted stuff `���`
printf("HI\n"); //normaly use of printf statement
printf("%s",number); //printing the output
return 0;

What is the difference between these two scanf statements?

I am having some doubt. The doubt is
What is the difference between the following two scanf statements.
scanf("%s",buf);
scanf("%[^\n]", buf);
If I am giving the second scanf in the while loop, it is going infinitely. Because the \n is in the stdin.
But in the first statement, reads up to before the \n. It also will not read the \n.
But The first statement does not go in infinitely. Why?
Regarding the properties of the %s format specifier, quoting C11 standrad, chapter §7.21.6.2, fscanf()
s Matches a sequence of non-white-space characters.
The newline is a whitespace character, so only a newlinew won't be a match for %s.
So, in case the newline is left in the buffer, it does not scan the newline alone, and wait for the next non-whitespace input to appear on stdin.
The %s format specifier specifies that scanf() should read all characters in the standard input buffer stdin until it encounters the first whitespace character, and then stop there. The whitespace ('\n') remains in the stdin buffer until consumed by another function, like getchar().
In the second case there is no mention of stopping.
You can think of scanf as extracting words separated by whitespace from a stream of characters. Imagine reading a file which contains a table of numbers, for example, without worrying about the exact number count per line or the exact space count and nature between numbers.
Whitespace, for the record, is horizontal and vertical (these exist) tabs, carriage returns, newlines, form feeds and last not least, actual spaces.
In order to free the user from details, scanf treats all whitespace the same: It normally skips it until it hits a non-whitespace and then tries to convert the character sequence starting there according to the specified input conversion. E.g. with "%d" it expects a sequence of digits, perhaps preceded by a minus sign.
The input conversion "%s" also starts with skipping whitespace (and that's clearer documented in the opengroup's man page than in the Linux one).
After skipping leading whitespace, "%s" accepts everything until another whitespace is read (and put back in the input, because it isn't made part of the "word" being read). That sequence of non-whitespace chars -- basically a "word" -- is stored in the buffer provided. For example, scanning a string from " a bc " results in skipping 3 spaces and storing "a" in the buffer. (The next scanf would skip the intervening space and put "bc" in the buffer. The next scanf after that would skip the remaining whitespace, encounter the end of file and return EOF.) So if a user is asked to enter three words they could give three words on one line or on three lines or on any number of lines preceded or separated by any number of empty lines, i.e. any number of subsequent newlines. Scanf couldn't care less.
There are a few exceptions to the "skip leading whitespace" strategy. Both concern conversions which usually indicate that the user wants to have more control about the input conversion. One of them is "%c" which just reads the next character. The other one is the "%[" spec which details exactly which characters are considered part of the next "word" to read. The conversion specification you use, "%[^\n]", reads everything except newline. Input from the keyboard is normally passed to a program line by line, and each line is by definition terminated by a newline. The newline of the first line passed to your program will be the first character from the input stream which does not match the conversion specification. Scanf will read it, inspect it and then put it back in the input stream (with ungetc()) for somebody else to consume. Unfortunately, it will itself be the next consumer, in another loop iteration (as I assume). Now the very first character it encounters (the newline) does not match the input conversion (which demands anything but the newline). Scanf therefore gives up immediately, puts the offending character dutifully back in the input for somebody else to consume and returns 0 indicating the failure to even perfom the very first conversion in the format string. But alas, it itself will be the next consumer. Yes, machines are stupid.
First scanf("%s",buf); scan only word or string, but second scanf("%[^\n]", buf); reads a string until a user inputs is new line character.
Let's take a look at these two code snippets :
#include <stdio.h>
int main(void){
char sentence[20] = {'\0'};
scanf("%s", sentence);
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello
#include <stdio.h>
int main(void){
char sentence[20] = {'\0'};
scanf("%[^\n]", sentence);
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello, my name is Claudio.
%[^\n] is an inverted group scan and this is how I personally use it, as it allows me to input a sentece with blank spaces in it.
Common
Both expect buf to be a pointer to a character array. Both append a null character to that array if at least 1 character was saved. Both return 1 if something was saved. Both return EOF if end-of-file detected before saving anything. Both return EOF in input error is detected. Both may save buf with embedded '\0' characters in it.
scanf("%s",buf);
scanf("%[^\n]", buf);
Differences
"%s" 1) consumes and discards leading white-space including '\n', space, tab, etc. 2) then saves non-white-space to buf until 3) a white-space is detected (which is then put back into stdin). buf will not contain any white-space.
"%[^\n]" 1) does not consume and discards leading white-space. 2) it saves non-'\n' characters to buf until 3) a '\n' is detected (which is then put back into stdin). If the first character read is a '\n', then nothing is saved in buf and 0 is returned. The '\n' remains in stdin and explains OP's infinite loop.
Failure to test the return value of scanf() is a common code oversight. Better code checks the return value of scanf().
IMO: code should never use either:
Both fail to limit the number of characters read. Use fgets().
You can think of %s as %[^\n \t\f\r\v], that is, after skipping any leading whitespace, a group a non-whitespace characters.

Why fgets takes cursor to next line?

I have taken a string from the keyboard using the fgets() function. However, when I print the string using printf(), the cursor goes to a new line.
Below is the code.
#include<stdio.h>
int main()
{
char name[25];
printf("Enter your name: ");
fgets(name, 24, stdin);
printf("%s",name);
return 0;
}
And below is the output.
-bash-4.1$ ./a.out
Enter your name: NJACK1 HERO
NJACK1 HERO
-bash-4.1$
Why is the cursor going to the next line even though I have not added a \n in the printf()?
However, I have noticed that if I read a string using scanf(), and then print it using printf() (without using \n), the cursor does not go to next line.
Does fgets() append a \n in the string ? If it does, will it append \0 first then \n, or \n first and then \0?
The reason printf is outputting a newline is that you have one in your string.
fgets is not "adding" a newline --- it is simply reading it from the input as well. Reading for fgets stops just after the newline (if any).
Excerpt from the manpage, emphasis mine:
The fgets() function reads at most one less than the number of characters specified by size from the given stream and stores them in the string str. Reading stops when a newline character is found, at end-of-file or error. The newline, if any, is retained. If any characters are read and there is no error, a `\0' character is appended to end the string.
An easy way to check if there's a newline is to use the help of one of my favorite little-known functions --- strcspn():
size_t newline_pos = strcspn(name, "\r\n");
if(name[newline_pos])
{
/* we had a newline, so name is complete; do whatever you want here */
//...
/* if this is the only thing you do
you do *not* need the `if` statement above (just this line) */
name[newline_pos] = 0;
}
else
{
/* `name` was truncated (the line was longer than 24 characters) */
}
Or, as an one-liner:
// WARNING: This means you have no way of knowing if the name was truncated!
name[strcspn(name, "\r\n")] = 0;
Because if there is a '\n' in the read text it will be taken by fgets(), the following was extracted from the 1570 draft §7.21.7.2 ¶ 2
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
I highlighted by making bold the part which says that the '\n' is kept by fgets().

Resources