Hi, I am reading some code and this line is used:
scanf("%s%*c",dati[i].part);
What does %s%*c do and why not just use %s?
What does %s%*c do
The %s has the same meaning as anywhere else -- skip leading whitespace and scan the next sequence of non-whitespace characters into the specified character array.
The %*c means the same thing as %c -- read the next input character, whatever it is (i.e. without skipping leading whitespace) -- except that the * within means that the result should not be assigned anywhere, and therefore that no corresponding pointer argument should be expected. Also, a field with assignment suppression is not counted in scanf's return value, so the return value does not reveal whether that field was successfully scanned.
and why not just use %s?
We cannot say for sure why the author of the code in which you saw it used %s%*c, except for the unsatisfying "because that's what the author thought was appropriate." We have no context at all for making any other judgement.
Certainly the actual effect is to consume the next input character after the string, if any. If there is such a character then it will necessarily be a whitespace character, else it would have been scanned by the preceding %s directive. We might therefore speculate that the author's idea was to consume a trailing newline.
There are at least two problems with that:
The next character might not be a newline. For example, there might be trailing space characters before a newline, in which case the first of those space characters would be consumed, but the newline would remain in the stream. If that's a genuine problem then %*c does not reliably solve it.
In practice, it's not very useful. Most scanf directives are like %s in that they automatically skip leading whitespace, including newlines. The %*c serves only to confuse if the next directive that will be processed is any of those. Moreover, it is possible for a scanf format to explicitly express that a run of whitespace at a given position should be skipped, and it is clearer to make use of that in conjunction with the next directive to be processed if that next directive is one of those that don't automatically skip whitespace (and whitespace skipping is in fact desired).
That doesn't mean that assignment suppression generally or %*c specifically is useless, mind. It's just trying to use that technique to attempt to consume trailing newlines that is poorly conceived.
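To make the two cases above concrete, here is a minimal sketch. The helper name next_char_after_word and the 32-byte buffer are assumptions made for illustration; they do not come from the code in the question. It reads one word with %s%*c and reports which character is left next in the stream.

```c
#include <stdio.h>

/* Hypothetical helper: reads one word with "%31s%*c" and returns the
   character that is left next in the stream, or EOF if no word was read
   or nothing follows. `in` can be any readable stream, e.g. stdin. */
int next_char_after_word(FILE *in, char word[32])
{
    /* The suppressed %*c is not counted, so success is a return of 1. */
    if (fscanf(in, "%31s%*c", word) != 1)
        return EOF;
    return getc(in);
}
```

For the input "abc" followed directly by a newline, the %*c eats the newline. For "abc", a space, and then a newline, it eats the space and the newline is still the next character in the stream.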
The %* format specifier in a scanf call instructs the function to read data in the following format (c in your case) from the input buffer but not to store it anywhere (i.e. discard it).
In your specific case, the %*c is being used to read and discard the trailing newline character (added when the user hits the Enter key), which will otherwise remain in the input buffer, and likely upset any subsequent calls to scanf.
Here's a simple example: I have an array of 3 characters, I type some input on the terminal, and I want to check immediately what I scanned, like this:
scanf("%3s\n",array);
printf("%s",array);
What I want to know is: why, after running the program, if I type "abc" on the terminal, do I have to enter more input, such as another letter or a number, before it prints the array?
If I type "abcd" and then press ENTER, it prints immediately, but if I just type "abc" and press ENTER many times, it still doesn't go on to the next instruction (which is printf).
I know that it has to do with how \n makes scanf read the string, but I can't quite get it.
It is not the printf (as the first version of your now edited title implied) which needs the additional input, it is the scanf.
Your format string contains a "\n".
That happens to be a white space.
Any whitespace in that position tells scanf to consume any amount of whitespace following the three characters (e.g. "abc").
As long as you continue adding whitespace (including returns), scanf is not done consuming "all following whitespace".
As soon as you enter any non-whitespace (e.g. "d"), it knows "aha, all whitespace done". Then, and not before, it can complete: it leaves the non-whitespace in the input stream and returns.
Note (credits to chux) that input is usually line buffered (almost always, unless you intentionally changed that), so the non-whitespace character only reaches scanf once a return/newline follows it somewhere.
Note, as mentioned in comments, that you need 4 characters of space in your target array to also accommodate the "\0" that is always written at the end of the string. If you only have space for three characters, the terminating "\0" is written past the end of the array, causing undefined behaviour.
I am not sure how to solve your problem, because I cannot tell what the purpose of your "\n" is. But I think you should try the behaviour if you move the "\n" out of scanf() and maybe into the printf(). A following scanf() call starting with a conversion specifier (except "[", "c", and "n") consumes any leading whitespace anyway, including any remaining newlines/returns from this one.
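As an illustration of the difference, here is a small sketch that reads from an arbitrary stream rather than the terminal. The function names read_tag and read_tag_trailing_ws are made up for this example. Both use a 4-byte buffer so the terminating "\0" fits.

```c
#include <stdio.h>

/* Reads up to 3 non-whitespace characters. buf has room for 3 + '\0'.
   Whatever follows the word, newline included, stays in the stream. */
int read_tag(FILE *in, char buf[4])
{
    return fscanf(in, "%3s", buf);
}

/* Same conversion, but with the trailing "\n" from the question: after
   the word, it keeps consuming whitespace until it sees a non-whitespace
   character (or end of input), which it leaves in the stream. */
int read_tag_trailing_ws(FILE *in, char buf[4])
{
    return fscanf(in, "%3s\n", buf);
}
```

On a stream containing "abc", two newlines, two spaces and then "d", the first function stops with a newline as the next character, while the second stops only at "d". On a terminal, that is why the second form seems to wait for more input.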
When I search using the keywords 'fgets' and 'newline', there are many posts about how to remove the trailing newline character (and such removal appears to be a burden). Yet there seems to be little explanation of why fgets needs to include that newline. Also, in C++, the 'std::getline' and 'std::istream::getline' methods do not keep the newline character. So is there a reason for it?
Here is a satisfying (IMHO) explanation:
http://www.cplusplus.com/reference/cstdio/fgets/
Especially:
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str.
No, it's not necessary but if present it will be included in the returned line.
The manual page says:
Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
So that's why it behaves that way.
Note that you can't assume that the buffer ends with a newline. You must check before removing it; otherwise you risk truncating the last line if it didn't have a newline.
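A minimal sketch of that check follows. The name chomp is only an illustrative choice, not a standard function. The return value tells the caller whether a newline was actually there, which also indicates whether fgets read a complete line.

```c
#include <string.h>

/* Hypothetical helper: removes one trailing '\n' from line if present.
   Returns 1 if a newline was removed, 0 if the line had none (for
   example a last line without a newline, or a line longer than the
   buffer passed to fgets). */
int chomp(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
        return 1;
    }
    return 0;
}
```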
I am confused with this syntax. At first, I thought it was a printing error in the book. But, while programming for quite a long time I came to know that they have different meaning. Still, I'm not able to get clear vision about that syntax.
Likewise, what's difference between:
gets( str);
and
gets(str);
Does whitespace matter? If yes, then how?
When you add a space to the scanf format string, you tell scanf to read and skip whitespace. This can be useful for skipping newlines in the input, for example. Also note that some formats automatically skip whitespace anyway.
See e.g. here for a good reference of the scanf family of functions.
The difference between
gets(str);
and
gets( str );
is none at all. Actual code outside of string literals can be formatted with any amount of whitespace. You could even write the above call as
gets
(
str
)
;
It would still be the same.
Oh, and the gets function was deprecated long ago and removed from the C standard in C11. You should use fgets instead.
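As a rough sketch of what a bounded replacement might look like, the function below wraps fgets. The name read_line and its parameters are assumptions for this example. Unlike gets, it never writes more than size bytes into buf.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical gets() replacement: reads one line of at most size - 1
   characters into buf and drops the newline, if one was read.
   Returns buf, or NULL at end of input or on a read error. */
char *read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    /* strcspn gives the index of the first '\n', or of the '\0'. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A call such as read_line(stdin, str, sizeof str) would then stand in for gets(str), assuming str is an array and not a pointer.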
White space (such as blanks, tabs, or newlines) in the format string match any amount of white space, including none, in the input.
http://www.manpagez.com/man/3/scanf/
In gets, the space does not mean anything; it is ignored at compile time. A compiler has many phases, and in the first one, lexical analysis, all unnecessary whitespace is removed. The space here is unnecessary too and is removed at that stage, so there is no difference between gets(a) and gets( a).
There are two important things to learn about scanf here:
All conversion specifiers except %c, %[ and %n ignore whitespace before the scanned item.
You can explicitly invoke this behavior of ignoring all whitespace as follows:
scanf(" %c", &mychar);
scanf("\n%c", &mychar);
scanf("\t%c", &mychar);
That is, any whitespace character (including spaces) in your conversion string instructs scanf to ignore any and all whitespace up until the scanned item.
Since all conversion specifiers except %c, %[ and %n do this automatically, the answer to your original question about scanf("%s") versus scanf(" %s") is that there is no difference.
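A short sketch of the %c case, using sscanf on a string so that it is easy to try out. The function names are invented for the example.

```c
#include <stdio.h>

/* Returns the first non-whitespace character of text, or '\0' if there
   is none. The leading space in " %c" skips any run of whitespace. */
char first_nonspace(const char *text)
{
    char c = '\0';

    if (sscanf(text, " %c", &c) != 1)
        return '\0';
    return c;
}

/* Returns the very first character of text, whitespace included,
   because a bare %c does not skip anything. */
char first_char(const char *text)
{
    char c = '\0';

    if (sscanf(text, "%c", &c) != 1)
        return '\0';
    return c;
}
```

Given a string that starts with a space, a newline and a tab before the letter x, first_nonspace returns 'x' while first_char returns the leading space.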
I would recommend reading all the scanf questions at the C FAQ and writing some test programs to get a better grasp of it all:
http://c-faq.com/stdio/scanfprobs.html
I have a multiline TSV file with the following format:
Type\tBasic Name\tAttribute\tA Long Description\n
As you can see, the Basic Name and the Description can both contain some number of spaces. I am trying to read each line in and extract the elements. For now, I've narrowed it down to just extracting the basic name. My fscanf is as follows:
fscanf(file_in, "%*[^ ]s\t%128[^ ]s\t%*[^ ]s\t%[^ ]s\n", name_string, desc_string);
This doesn't work as I have hoped, and I'm having trouble narrowing down the error. Does anyone know how I could read in the lines properly?
I mostly agree with Pablo (that the scanf family don't make great parsers), but it's worth understanding how to write a scanf pattern. The pattern you're looking for is something like this:
fscanf(file_in, " %*[^\t] %128[^\t] %*[^\t] %128[^\n]", name_string, desc_string)
Notes:
%[xyz] is a directive. %[xyz]s is two directives, the second of which matches a literal s
As far as I know, there is no way to match a single literal tab character, since any whitespace in the pattern matches any amount of whitespace (including none) in the input. I used a space in my example, which will match a terminating tab, but it will also match any number of consecutive tabs, so empty fields won't be parsed correctly.
The 128-character limit does not include the terminating NUL character, so each buffer needs room for at least 129 bytes.
Also, if the scan stops because the character limit is exceeded, it won't skip the rest of the field automatically, so you'll end up out of sync with the input.
A better pattern would be:
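fscanf(file_in, " %*[^\t] %128[^\t]%*[^\t] %*[^\t] %128[^\n]%*[^\n]", name_string, desc_string)
which explicitly skips the remaining characters in the field, if necessary. Be aware that a %[ directive must match at least one character, so a skipping directive fails, and the scan stops there, whenever the field fit within the limit and nothing is left to skip. An even better solution would be to use the a modifier (a glibc extension; POSIX spells it m) and get fscanf to malloc memory for you.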
I'd rather use strtok for this. It's more accurate than fscanf, since that function family only works when the format is 100% OK; otherwise you end up missing values.
Take a look at Parallel to PHP's "explode" in C: Split char* into char* using delimiter, where I explain in more detail how to use strtok.
So, read each line with fgets and parse it with strtok.
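A minimal sketch of the strtok half of that approach, assuming the line has already been read with fgets. The function name split_tabs and its parameters are illustrative only. Note that strtok modifies the line in place and treats consecutive tabs as one separator, so empty fields are silently skipped.

```c
#include <string.h>

/* Hypothetical helper: splits line at tabs (and drops the trailing
   newline), storing pointers to the fields in fields[]. The line is
   modified in place. Returns the number of fields stored, at most
   max_fields. */
int split_tabs(char *line, char *fields[], int max_fields)
{
    int n = 0;

    for (char *tok = strtok(line, "\t\n");
         tok != NULL && n < max_fields;
         tok = strtok(NULL, "\t\n"))
        fields[n++] = tok;
    return n;
}
```

For a line in the format from the question, fields[1] would then point at the Basic Name and fields[3] at the description, spaces included.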
Firstly, as it has already been noted, the %[] is a conversion specifier by itself. There's no s after the []. The s-es that you have in your format string will not be considered parts of the conversion specifiers. You have to get rid of those s-es.
Secondly, as you said yourself, your file is TAB-separated. Which immediately means that you should extract the continuous portions of the sequence by using the %[^\t] conversion specifier (or the %[^\n] specifier for the last portion). Why did you use %[^ ] and how did you expect it to work? The %[^ ] actually stops parsing at space character, which is the opposite of what you wanted.
In your example the proper combination of specifiers would be
fscanf(file_in, "%*[^\t]\t%128[^\t]\t%*[^\t]\t%[^\n]\n", name_string, desc_string);
This format string assumes that all 4 portions of the string are guaranteed to be present and that the last portion is guaranteed to be terminated by \n.
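The same format can be tried on a line held in memory with sscanf, which makes the return value easy to check. This is only a sketch: the function name parse_tsv_line and the 129-byte buffers are assumptions, and a width of 128 has been added to the last conversion so that it cannot overflow its buffer.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the 2nd and 4th tab-separated fields of
   line. Each output buffer holds 128 characters plus the '\0'.
   Returns 1 if both fields were read, 0 otherwise. */
int parse_tsv_line(const char *line, char name[129], char desc[129])
{
    /* Each "\t" in the format matches any run of whitespace, so empty
       fields and fields with leading spaces are not handled here. */
    int assigned = sscanf(line, "%*[^\t]\t%128[^\t]\t%*[^\t]\t%128[^\n]",
                          name, desc);

    return assigned == 2;
}
```

Checking that exactly two items were assigned is what detects lines with missing fields.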
I want to know if there is a way to know when fscanf reads a whitespace or a new line.
Example:
formatting asking words italic
links returns
As fscanf reads a string until it meets a newline or other whitespace (using %s), it'll read formatting and the space after it and before a. The thing is, is there a way to know that it read a space? And after it reaches the second line, is there a way to know that it read a carriage return?
You can instruct fscanf to read whitespace into your variable instead of reading and discarding whitespace. Use something like %[ \n\r\t], but you need to include more characters in that scanset (e.g. \v and \f). Depending on the locale and some features of the runtime character set, you might want to write a separate function to compute the appropriate format string once before using it.
If you need to distinguish \n from other kinds of whitespace, you have your variable containing the whitespace that you just finished reading. You might want to count all of the \n characters in it, depending on your needs.
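One possible shape for this, as a sketch only: the function name, the 64-byte buffers and the fixed scanset are all assumptions, and the scanset covers just the whitespace characters of the "C" locale.

```c
#include <stdio.h>

/* Hypothetical helper: reads one word, then captures the run of
   whitespace that follows it. Returns the number of '\n' characters in
   that run, or -1 if no word could be read. */
int read_word_count_newlines(FILE *in, char word[64])
{
    char ws[64] = "";
    int newlines = 0;

    if (fscanf(in, "%63s", word) != 1)
        return -1;
    /* %[ fails when no whitespace follows (e.g. at end of input); in
       that case ws is reset to the empty string. */
    if (fscanf(in, "%63[ \t\r\n\v\f]", ws) != 1)
        ws[0] = '\0';
    for (const char *p = ws; *p != '\0'; ++p)
        if (*p == '\n')
            ++newlines;
    return newlines;
}
```

A result of 0 means the word was followed by spaces or tabs only (or by nothing), while a positive result means at least one line break was crossed before the next word.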