Details for reading strings in C? - c

Can someone explain how this works? I know that it reads a string (with spaces), but I don't really understand the mechanism behind it. Can someone explain it to me piece by piece?
scanf("%[^\n]%*c",string);

scanf("%[^\n]%*c",string);
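This says "read everything up to a newline character, then read one more character". The * (in %*c) suppresses the assignment. That is, it reads a line and consumes the newline character. Note that %[^\n] must match at least one character, so on an empty line the conversion fails and the newline is not consumed.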
From scanf():
An optional '*' assignment-suppression character: scanf() reads
input as directed by the conversion specification, but discards the
input. No corresponding pointer argument is required, and this
specification is not included in the count of successful assignments
returned by scanf().
But for reading lines, I'd use fgets() instead and strip the trailing newline afterwards.
char string[256];
if (fgets(string, sizeof string, stdin) == NULL) {
/* handle failure */
}
string[strcspn(string, "\n")] = 0; /* For removing the newline if present */
which is less error-prone and easier to understand.
If you are on a POSIX system (glibc, for example), you can use getline() as well.

From the man page:
All conversions are introduced by the % (percent sign) character
A conversion is how we match certain text strings. For example, %s matches a string, and %d matches a decimal integer. So looking at your string, we have a %[ conversion, which according to the man page:
[ Matches a nonempty sequence of characters from the specified set of accepted characters; ... the set is defined by the characters between the open bracket [ character and a close bracket ] character.
So this conversion is going to define a list of characters which will be matched and read into your string. Important to this is:
The set excludes those characters if the first character after the open bracket is a circumflex ^.
And if you look at your string "%[^\n]%*c" you've got %[^\n] which means that you're matching any character until you hit a newline character.
Next, you have %*c. The star is not a conversion itself; it is a flag that discards whatever the conversion after it matches. From the man page:
Suppresses assignment. The conversion that follows occurs as usual, but no pointer is used; the result of the conversion is simply discarded.
So if you look at your last match, you've got a c,
c Matches a sequence of width count characters (default 1);
so the %*c means that you're going to match 1 character, and then discard it (the character being matched is the newline - which the %[^\n] didn't consume because you matched everything up to that newline), it won't be stored in your string variable.
Reading the man page is your friend. I hope this helps.

Related

What does scanf("%*[\n] %[^\n]" do?

I am not able to understand the difference. I use %[^\n]s to read phrases from the user, but this did not work when I needed to read two phrases. The format above did. Please help me understand the difference.
The %[\n] directive tells scanf() to match newline characters, and the * flag signals that no assignment should be made, so %*[\n] skips over any leading newline characters (assuming there is at least one leading \n character: more on this in a moment). There is a space following this first directive, so zero or more whitespace characters are skipped before the final %[^\n] directive, which matches characters until a newline is encountered. These are stored in input_string[], and the newline character is left behind in the input stream. Subsequent calls using this format string will skip over this remaining newline character.
But, there is probably no need for the %*[\n] directive here, since \n is a whitespace character; almost the same thing could be accomplished with a leading space in the format string: " %[^\n]".
One difference between the two: "%*[\n] %[^\n]" expects there to be a newline at the beginning of the input, and without this the match fails and scanf() returns without making any assignments, while " %[^\n]" does not expect a leading newline, or even a leading whitespace character (but skips them if present).
If you used "%[^\n]" instead, as suggested in the body of the question (note that the trailing s is not a part of the scanset directive), the first call to scanf() would match characters until a newline is encountered. The matching characters would be stored in input_string[], and the newline would remain in the input stream. Then, if scanf() is called again with this format string, no characters would be matched before encountering the newline, so the match would fail without assignment.
Please note that you should always specify a maximum width when using %s or %[] in a scanf() format string to avoid buffer overflow. With either of %s or %[], scanf() automatically adds the \0 terminator, so you must be sure to allow room for this. For an array of size 100, the maximum width should be 99, so that at most 99 characters are matched and stored in the array before the null terminator is added. For example: " %99[^\n]".
In scanf(), '*' tells the function to match the input as usual but to discard it instead of storing it.
%*[\n]
This tells the function to skip one or more leading '\n' characters; the rest of the format then accepts the string.
Run the code below, press ENTER first, and then type "I am feeling great!!!".
The program prints the buffer, and you will get I am feeling great!!! as output.
Try this code snippet
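#include <stdio.h>

int main(void)
{
    char buffer[100] = "";
    printf("Enter a string:");
    /* The input must start with a newline, or %*[\n] fails to match. */
    if (scanf("%*[\n] %99[^\n]", buffer) == 1)
        printf("buffer:%s\n", buffer);
    return 0;
}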
%[^\n] is a scanset conversion for scanf() that can serve as an alternative to gets(str).
Unlike gets(str), scanf() with %s cannot read more than one word.
Using %[^\n], scanf() can read even the string with whitespace.
It will terminate receiving string input from the user when it encounters a newline character.

What is the difference between these two scanf statements?

I have a question.
What is the difference between the following two scanf statements?
scanf("%s",buf);
scanf("%[^\n]", buf);
If I put the second scanf in a while loop, it loops forever, because the \n is still in stdin.
The first statement also reads only up to the \n and does not consume it.
But the first statement does not loop forever. Why?
Regarding the properties of the %s format specifier, quoting the C11 standard, chapter §7.21.6.2, fscanf():
s Matches a sequence of non-white-space characters.
The newline is a whitespace character, so a newline alone won't be a match for %s.
So, if a newline is left in the buffer, scanf() does not scan the newline alone; it waits for the next non-whitespace input to appear on stdin.
The %s format specifier specifies that scanf() should read all characters in the standard input buffer stdin until it encounters the first whitespace character, and then stop there. The whitespace ('\n') remains in the stdin buffer until consumed by another function, like getchar().
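In the second case, %[^\n] does not skip leading whitespace, so a leftover '\n' is the first character it sees and the match fails at once without consuming anything.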
You can think of scanf as extracting words separated by whitespace from a stream of characters. Imagine reading a file which contains a table of numbers, for example, without worrying about the exact number count per line or the exact space count and nature between numbers.
Whitespace, for the record, is horizontal and vertical (these exist) tabs, carriage returns, newlines, form feeds and, last but not least, actual spaces.
In order to free the user from details, scanf treats all whitespace the same: It normally skips it until it hits a non-whitespace and then tries to convert the character sequence starting there according to the specified input conversion. E.g. with "%d" it expects a sequence of digits, perhaps preceded by a minus sign.
The input conversion "%s" also starts with skipping whitespace (and that's more clearly documented in the opengroup's man page than in the Linux one).
After skipping leading whitespace, "%s" accepts everything until another whitespace is read (and put back in the input, because it isn't made part of the "word" being read). That sequence of non-whitespace chars -- basically a "word" -- is stored in the buffer provided. For example, scanning a string from "   a bc  " results in skipping 3 spaces and storing "a" in the buffer. (The next scanf would skip the intervening space and put "bc" in the buffer. The next scanf after that would skip the remaining whitespace, encounter the end of file and return EOF.) So if a user is asked to enter three words they could give three words on one line or on three lines or on any number of lines preceded or separated by any number of empty lines, i.e. any number of subsequent newlines. Scanf couldn't care less.
There are a few exceptions to the "skip leading whitespace" strategy. Both concern conversions which usually indicate that the user wants to have more control over the input conversion. One of them is "%c" which just reads the next character. The other one is the "%[" spec which details exactly which characters are considered part of the next "word" to read. The conversion specification you use, "%[^\n]", reads everything except newline. Input from the keyboard is normally passed to a program line by line, and each line is by definition terminated by a newline. The newline of the first line passed to your program will be the first character from the input stream which does not match the conversion specification. Scanf will read it, inspect it and then put it back in the input stream (with ungetc()) for somebody else to consume. Unfortunately, it will itself be the next consumer, in another loop iteration (as I assume). Now the very first character it encounters (the newline) does not match the input conversion (which demands anything but the newline). Scanf therefore gives up immediately, puts the offending character dutifully back in the input for somebody else to consume and returns 0 indicating the failure to even perform the very first conversion in the format string. But alas, it itself will be the next consumer. Yes, machines are stupid.
The first, scanf("%s",buf);, scans only a single word, but the second, scanf("%[^\n]", buf);, reads a string until the user enters a newline character.
Let's take a look at these two code snippets :
#include <stdio.h>
int main(void){
    char sentence[20] = {'\0'};
    scanf("%19s", sentence);   /* the width keeps the word within the buffer */
    printf("\n%s\n", sentence);
    return 0;
}
Input : Hello, my name is Claudio.
Output : Hello
#include <stdio.h>
int main(void){
    char sentence[100] = {'\0'};   /* 20 bytes would overflow on the sample input */
    scanf("%99[^\n]", sentence);
    printf("\n%s\n", sentence);
    return 0;
}
Input : Hello, my name is Claudio.
Output : Hello, my name is Claudio.
%[^\n] is an inverted (negated) scanset, and this is how I personally use it, as it allows me to input a sentence with blank spaces in it.
Common
Both expect buf to be a pointer to a character array. Both append a null character to that array if at least 1 character was saved. Both return 1 if something was saved. Both return EOF if end-of-file is detected before saving anything. Both return EOF if an input error is detected. Both may save buf with embedded '\0' characters in it.
scanf("%s",buf);
scanf("%[^\n]", buf);
Differences
"%s" 1) consumes and discards leading white-space including '\n', space, tab, etc. 2) then saves non-white-space to buf until 3) a white-space is detected (which is then put back into stdin). buf will not contain any white-space.
"%[^\n]" 1) does not consume or discard leading white-space. 2) it saves non-'\n' characters to buf until 3) a '\n' is detected (which is then put back into stdin). If the first character read is a '\n', then nothing is saved in buf and 0 is returned. The '\n' remains in stdin and explains OP's infinite loop.
Failure to test the return value of scanf() is a common code oversight. Better code checks the return value of scanf().
IMO, code should never use either:
both fail to limit the number of characters read. Use fgets().
You can think of %s as %[^\n \t\f\r\v], that is, after skipping any leading whitespace, a group of non-whitespace characters.

scanf() behaviour for strings with more than one word

Well I've been programming in C for quite a while now, and there is this question about the function scanf()
here is my problem:
I know that every element in the ASCII table is a character, and I even know that %s is the format specifier for a string, which is a collection of characters.
My questions:
1.why does scanf() stops scanning after we press enter. If enter is also character why cant it be added as a component of the string that is being scanned.
2.My second question and what I require the most is why does it stops scanning after a space, when space is again a character?
Note: My question is not about how to avoid these but how does this happen
I'd be happy if this is already addressed, I'd gladly delete my question and even if I've presumed something wrong please let me know
"why does scanf() stops scanning after we press enter." is not always true.
The "%s" directs scanf() as follows
char buffer[100];
scanf("%s", buffer);
Scan and consume all white-space including '\n' generated from multiple Enters. This data is not saved.
Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier C11dr §7.21.6.2 8
Scan and save all non-white-space characters. Continue doing so until a white-space is encountered.
Matches a sequence of non-white-space characters §7.21.6.2 12
This white-space is put back into stdin for the next input function. (OP's 2nd question)
A null character is appended to buffer.
Operations may stop short if EOF occurs.
If too much data is saved in buffer, it is UB.
If some non-white-space data is saved, return 1. If EOF encountered, return EOF.
Note: stdin is usually line buffered, so no keyboard data is given to stdin until a '\n' occurs.
From my reading of your question, both of your numbered questions are the same:
Why does scanf with a format specifier of %s stop reading after encountering a space or newline.
And the answer to both of your questions is: Because that is what scanf with the %s format specifier is documented to do.
From the documentation:
%s Matches a sequence of bytes that are not white-space characters.
A space and a newline character (generated by the enter key) are white-space characters.
I made a small program that uses scanf to read several names without stopping at a space or even at Enter.
I used a while loop:
scanf("%s", text);
while (1)
{
    scanf("%s", text1);
    if (strcmp(text1, ".") == 0) { break; }
    /* here I simply append text1 to text */
}
This way I get one line, ended by the ".".
Now I use
scanf("%[^\n]",text);
It works great.

The scanf function, the specifier %s and the new line

I read this in the C11 standard:
Input white-space characters (as specified by the isspace function) are
skipped, unless the specification includes a [, c, or n specifier.
so I understand that if I use those specifiers, the input seen by the next scanf can still contain, for example, a newline.
But if I write this:
char buff[5 + 1];
printf("Input: ");
scanf("%10s", buff);
printf("Input: ");
char buff_2[5 + 1];
scanf("%[abcde]", buff_2);
and then I input, e.g., RR and then Return,
the next scanf fails because of \n.
So also %s doesn't discard a new line?
So also %s doesn't discard a new line?
%s tells scanf to discard any leading whitespace, including newlines. It will then read any non-whitespace characters, leaving any trailing whitespace in the input buffer.
So assuming your input stream looks like "\n\ntest\n", scanf("%s", buf) will discard the two leading newlines, consume the string "test", and leave the trailing newline in the input stream, so after the call the input stream looks like "\n".
Edit
Responding to xdevel2000's comment here.
Let's talk about how conversion specifiers work. Here are some relevant paragraphs from the online C 2011 standard:
7.21.6.2 The fscanf function
...
9 An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.285)
The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
12 The conversion specifiers and their meanings are:
...
c Matches a sequence of characters of exactly the number specified by the field width (1 if no field width is present in the directive).286)
...
s Matches a sequence of non-white-space characters.286)
...
[ Matches a nonempty sequence of characters from a set of expected characters (the scanset).286)
...
285) fscanf pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to strtod, strtol, etc., are unacceptable to fscanf.
286) No special provisions are made for multibyte characters in the matching rules used by the c, s, and [
conversion specifiers — the extent of the input field is determined on a byte-by-byte basis. The
resulting field is nevertheless a sequence of multibyte characters that begins in the initial shift state.
%s matches a sequence of non-whitespace characters. Here's a basic algorithm describing how it works (not taking into account end of file or other exceptional conditions):
c <- next character from input stream
while c is whitespace
    c <- next character from input stream
while c is not whitespace
    append c to target buffer
    c <- next character from input stream
push c back onto input stream
append 0 terminator to target buffer
The first whitespace character after the non-whitespace characters (if any) is pushed back onto the input stream for the next input operation to read.
By contrast, the algorithm for the %c specifier is dead simple (unless you're using a field width greater than 1, which I've never done and won't get into here):
c <- next character from input stream
write c to target
The algorithm for the %[ conversion specifier is a little different:
c <- next character from input stream
while c is in the list of characters in the scan set
    append c to target buffer
    c <- next character from input stream
append 0 to target buffer
push c back onto input stream
So, it's a mistake to describe any conversion specifier as "retaining" trailing whitespace (which would imply that the trailing whitespace is saved to the target buffer); that's not the case. Trailing whitespace is left in the input stream for the next input operation to read.
%s discards leading whitespace characters, then consumes everything up to the next whitespace character; it does not consume trailing whitespace. The %[ conversion specifier in the second scanf does not skip leading whitespace characters and therefore fails to scan because of the newline character (which is a whitespace character) left over by the first scanf.
To fix the issue, either use
int c;
while((c=getchar())!='\n' && c!=EOF);
after the first scanf to clear stdin, or add a space before the conversion specifier (%[) in the second scanf.
Your excerpt from the standard omits important context. The preceding text specifies that skipping whitespace is the first step in processing a conversion specifier for a type other than c, [, or n.
The next step, other than for an n specifier, is to read an input item, which is defined as "the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence" (quoted from C99, but equivalent applies to C2011).
An s item "[m]atches a sequence of non-white-space characters", so with the input you specify, the first scanf() reads everything up to, but not including, the newline.
The standard explicitly specifies
Trailing white space (including new-line characters) is left unread unless matched by a directive.
so the newline definitely remains unscanned at this point.
The format given to the next scanf() starts with a %[ conversion specifier, which, as you already observed, does not cause whitespace (leading or otherwise) to be skipped, though it can include whitespace in the item that is scanned. Since the next character available from the input is a newline, however, and the given scan set for your %[ does not include that character, zero characters are scanned for that item. Going back to the standard (C99, again):
If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
There are easier ways to read free-form input line by line, but you can do it with scanf() if you must. For example:
char buff[10 + 1] = {0};
printf("Input: ");
/*
* Ignore leading whitespace and scan a string of up to 10 non-whitespace
* characters. Zero-length inputs will produce a matching failure, leaving
* the buffer unchanged (and initialized to an empty string). End of
* input will produce an input error, which is ignored.
*/
scanf("%10s", buff);
/* Scan and ignore anything else up to a newline. There will
* be an (ignorable) matching failure if the next available character is a
* newline. Any input error generated by this call is also ignored.
*/
scanf("%*[^\n]");
/*
* Consume the next character, if any. If there is one, it will be a
* newline. An input error will occur if we're already at the end of stdin;
* a careful program would test for that (by comparing the return value to
* EOF) but this one doesn't.
*/
scanf("%*c");
printf("Input: ");
/* scan the second string; again, we're ignoring matching and input errors */
char buff_2[5 + 1] = {0};
scanf("%5[abcde]", buff_2);
If you're exclusively using scanf() for such a job then it is essential to read each line in three steps, as shown, because each one can produce a matching failure that would prevent any attempt to match subsequent items.
Note, too, how maximum field widths are matched to buffer sizes in that example, which your original code did not do correctly.

Reading tabs in C

I am trying to sscanf from a file. The pattern I am trying to match is the following
"%s\t%s\t%s\t%f"
Thing is that I am surprised because for an input like following:
Hello Hola Hallo 5.344434
it is reading all of the data properly...
Do you know why?
I was expecting it to require tabs, like |---|---|---|---|, not to match when the fields are separated by only one space.
Thanks
The standard reads:
A directive composed of white-space character(s) is executed by
reading input up to the first non-white-space character (which remains
unread), or until no more characters can be read.
In other words, a sequence of white-space characters (space, tab, newline, etc.; as defined by isspace()) in the format string matches any amount of white space in the input.
No way: scanf treats all white-space identically. It is used as a delimiter and otherwise just ignored. So if you really want to do something with tab characters, you should parse the input yourself.
To parse, you need to read the whole line without any parsing, unlike scanf. So, you need to use fgets.
FILE *fp = /* init.. */;
char buf[1024];
fgets(buf, 1024, fp);
// parse yourself!
If you take a look at the documentation for scanf:
C string that contains a sequence of characters that control how characters extracted from the stream are treated:
Whitespace character: the function will read and ignore any whitespace characters
encountered before the next non-whitespace character (whitespace characters include
spaces, newline and tab characters -- see isspace). A single whitespace in the format
string validates any quantity of whitespace characters extracted from the stream
(including none).
Non-whitespace character, except format specifier (%): Any character that is not
either a whitespace character (blank, newline or tab) or part of a format specifier
(which begin with a % character) causes the function to read the next character
from the stream, compare it to this non-whitespace character and if it matches,
it is discarded and the function continues with the next character of format. If the
character does not match, the function fails, returning and leaving subsequent
characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a
format specifier, which is used to specify the type and format of the data to be
retrieved from the stream and stored into the locations pointed by the additional
arguments.
You will notice that the whitespace characters get ignored.
Did you carefully read scanf(3) documentation? You need to read the entire line using getline(3) then parse that line "manually"!
