What is the purpose of : in scanf?


scanf("%d:%d:%d%s", &hh, &mm, &ss, t12);
When taking multiple inputs for the time to be displayed, the input statement is written as above, with : between the conversions. The line works fine, but can someone explain why the colon is needed and what it does in the input statement?

From the standard, C11 7.21.6.2 The fscanf function, paragraphs 3 and 6:
The format is composed of zero or more directives: one or more white-space characters, an ordinary multibyte character (neither % nor a white-space character), or a conversion specification.
A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If any of those characters differ from the ones composing the directive, the directive fails and the differing and subsequent characters remain unread.
Hence the : simply means "make sure that the next character in the stream is a colon". Nothing more, nothing less.
Your format string simply means you'll be able to scan things like 12:34:56am - without the literal colons in the format string, the scan would fail.

Related

Using regex within sscanf [duplicate]

I need to read a string until the following sequence appears, \nx\n:
(.....)\n
x\n
\n is the new line character and (.....) can be any characters that may include other \n characters.
scanf allows regular expressions as far as I know, but I can't get it to read a string up to this pattern. Can you help me with the scanf format string?
I was trying something like:
char input[50000];
scanf(" %[^(\nx\n)]", input);
but it doesn't work.

scanf allows regular expressions as far as I know
Unfortunately, it does not allow regular expressions. The syntax is misleadingly close, but nothing in the implementation of scanf is even remotely similar to regex matching. All it has is support for regex-style character classes, so %[<something>] behaves roughly like the regex [<something>]+ (a nonempty run of characters from the class). That's why your call of scanf translates into "read a string consisting of characters other than '(', ')', 'x', and '\n'". The leading space in the format first skips any whitespace.
To solve your problem, you can set up a loop that reads the input character by character. Every time you get a '\n', check:
You have at least three characters in the input that you've seen so far,
That the character immediately before '\n' is an 'x', and
That the character before the 'x' is another '\n'
If all of the above is true, you have reached the end of your anticipated input sequence; otherwise, your loop should continue.

scanf does not support regular expressions. It has limited support for character classes but that's not at all the same thing.
Never use scanf, fscanf, or sscanf, because:
Numeric overflow triggers undefined behavior. The C runtime is allowed to crash your program just because someone typed too many digits.
Some format specifiers (notably %s without a field width) are unsafe in exactly the same way gets is unsafe, i.e. they will cheerfully write past the end of the provided buffer and crash your program.
They make it extremely difficult to handle malformed input robustly.
You don't need regular expressions for this case; read a line at a time with getline and stop when the line read is just "x". However, the standard (not ISO C, but POSIX) regular expression library routines are called regcomp and regexec.

What is the difference between %s and %s%*c [duplicate]

Hi, I am reading some code that uses this line:
scanf("%s%*c",dati[i].part);
What does %s%*c do and why not just use %s?

What does %s%*c do
The %s has the same meaning as anywhere else -- skip leading whitespace and scan the next sequence of non-whitespace characters into the specified character array.
The %*c means the same thing as %c -- read the next input character, whatever it is (i.e. without skipping leading whitespace) -- except that the * within means that the result should not be assigned anywhere, and therefore that no corresponding pointer argument should be expected. Also, assignment suppression means that scanf's return value is not affected by whether that field is successfully scanned.
and why not just use %s?
We cannot say for sure why the author of the code in which you saw it used %s%*c, except for the unsatisfying "because that's what the author thought was appropriate." We have no context at all for making any other judgement.
Certainly the actual effect is to consume the next input character after the string, if any. If there is such a character then it will necessarily be a whitespace character, else it would have been scanned by the preceding %s directive. We might therefore speculate that the author's idea was to consume a trailing newline.
There are at least two problems with that:
The next character might not be a newline. For example, there might be trailing space characters before a newline, in which case the first of those space characters would be consumed, but the newline would remain in the stream. If that's a genuine problem then %*c does not reliably solve it.
In practice, it's not very useful. Most scanf directives are like %s in that they automatically skip leading whitespace, including newlines. The %*c serves only to confuse if the next directive that will be processed is any of those. Moreover, it is possible for a scanf format to explicitly express that a run of whitespace at a given position should be skipped, and it is clearer to make use of that in conjunction with the next directive to be processed if that next directive is one of those that don't automatically skip whitespace (and whitespace skipping is in fact desired).
That doesn't mean that assignment suppression in general, or %*c specifically, is useless, mind. Only the attempt to use that technique to consume trailing newlines is poorly conceived.

The * in a conversion specification such as %*c instructs scanf to read data of the given type (c in your case) from the input buffer but not to store it anywhere (i.e. discard it).
In your specific case, the %*c is being used to read and discard the trailing newline character (added when the user hits the Enter key), which will otherwise remain in the input buffer, and likely upset any subsequent calls to scanf.

What does %[^<] (and friends) mean in the formatted string family?

A comment (which should probably be submitted as an answer) has the code
sscanf(string, "<title>%[^<]</title>", extracted_string);
Running the code seems to copy the text between the <title> tags to extracted_string, but I cannot find any references to a caret in the printf family, either in the man pages or elsewhere online.
Can someone point me to a resource that explains the use of %[^<], and other similar syntax, in the sscanf() family?

From the C11 standard, §7.21.6.2, paragraph 12 (conversion specifiers):
[
Matches a nonempty sequence of characters from a set of expected characters (the scanset).
....
The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket.
A draft version of the standard, found online.

It means "match anything that is not a <". It's not a good idea to do that without specifying the maximum destination buffer length. If your destination buffer can hold, say, 100 characters, then
char extracted_string[100];
sscanf(string, "<title>%99[^<]</title>", extracted_string);
would be a better solution.
Using strstr() for this purpose instead lets you allocate extracted_string dynamically, sized to fit the text.

This link explains the use of [ and ^ in the scanf family of functions:
http://www.cdf.toronto.edu/~ajr/209/notes/printf.html
[
Matches a nonempty sequence of characters from the specified set of accepted characters; the next pointer must be a pointer to char, and there must be enough room for all the characters in the string, plus a terminating null byte. The usual skip of leading white space is suppressed. The string is to be made up of characters in (or not in) a particular set; the set is defined by the characters between the open bracket [ character and a close bracket ] character. The set excludes those characters if the first character after the open bracket is a circumflex (^). To include a close bracket in the set, make it the first character after the open bracket or the circumflex; any other position will end the set. The hyphen character - is also special; when placed between two other characters, it adds all intervening characters to the set. To include a hyphen, make it the last character before the final close bracket. For instance, [^]0-9-] means the set "everything except close bracket, zero through nine, and hyphen". The string ends with the appearance of a character not in the (or, with a circumflex, in) set or when the field width runs out.

Force fscanf to Consume Possible Whitespace

I have a multiline TSV file with the following format:
Type\tBasic Name\tAttribute\tA Long Description\n
As you can see, the Basic Name and the Description can both contain some number of spaces. I am trying to read each line in and extract the elements. For now, I've narrowed it down to just extracting the basic name. My fscanf is as follows:
fscanf(file_in, "%*[^ ]s\t%128[^ ]s\t%*[^ ]s\t%[^ ]s\n", name_string, desc_string);
This doesn't work as I have hoped, and I'm having trouble narrowing down the error. Does anyone know how I could read in the lines properly?

I mostly agree with Pablo (that the scanf family don't make great parsers), but it's worth understanding how to write a scanf pattern. The pattern you're looking for is something like this:
fscanf(file_in, " %*[^\t] %128[^\t] %*[^\t] %128[^\n]", name_string, desc_string);
Notes:
%[xyz] is a directive. %[xyz]s is two directives, the second of which matches a literal s
As far as I know, there is no way to write a literal in the pattern that matches a single tab character, since any whitespace in the pattern matches any amount of whitespace (including none) in the input. (A suppressed, width-limited scanset such as %*1[\t] does consume exactly one tab, though.) I used a space in my example. It will match a terminating tab, but it will also match any number of consecutive tabs, so empty fields won't be parsed correctly.
The 128-character limit does not include the terminating NUL character.
Also, if the scan stops because the character limit is exceeded, it won't skip the rest of the field automatically, so you'll end up out of sync with the input.
A better pattern would be:
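fscanf(file_in, " %*[^\t] %128[^\t]%*[^\t] %*[^\t] %128[^\n]%*[^\n]", name_string, desc_string);
which is meant to skip the remaining characters in the field, if necessary. Be aware, though, that a %[ conversion must match at least one character even when its assignment is suppressed. When the name fits within 128 characters, the next character is a tab, so the %*[^\t] right after %128[^\t] fails and fscanf stops there without reading desc_string. In practice, the skip therefore needs its own fscanf call. An even better solution would be to use the m modifier (POSIX.1-2008; glibc's older GNU extension spells it a) and get fscanf to malloc memory for you.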

I'd rather use strtok for this. It's more accurate than fscanf, since that function family only works when the format is 100% correct; otherwise you end up missing values.
Take a look at Parallel to PHP's "explode" in C: Split char* into char* using delimiter, where I explain in more detail how to use strtok.
So, read each line with fgets and parse it with strtok.

Firstly, as it has already been noted, the %[] is a conversion specifier by itself. There's no s after the []. The s-es that you have in your format string will not be considered parts of the conversion specifiers. You have to get rid of those s-es.
Secondly, as you said yourself, your file is TAB-separated. That immediately means you should extract the continuous portions of each line with the %[^\t] conversion specifier (or the %[^\n] specifier for the last portion). Why did you use %[^ ], and how did you expect it to work? The %[^ ] actually stops parsing at a space character, which is the opposite of what you wanted.
In your example the proper combination of specifiers would be
fscanf(file_in, "%*[^\t]\t%128[^\t]\t%*[^\t]\t%[^\n]\n", name_string, desc_string);
This format string assumes that all 4 portions of the string are guaranteed to be present and that the last portion is guaranteed to be terminated by \n.

parsing a string using fscanf using %[...] in C

I am working with the fscanf function to scan in a large string that is delimited by commas, with the last substring in the larger string separated by an asterisk (*). Here is an example:
substring1,substring2,substring3*substring4
I am able to parse the substrings separated by commas with no problem, but when it gets to the asterisk, it stalls the program, as fscanf is blocking. I am using the %[^...] format specifier in fscanf, shown below:
fscanf(fs, "%[^*,]%*c", str);
The code above is in a simple for loop that scans multiple times. As you can see, I am scanning until either an asterisk or a comma appears. However, I am afraid that I am not including the asterisk in the set properly. Can someone correct my mistake?
Thanks.

The only characters that are special in a %[ pattern are ^, -, and ].
This pattern will fail if the next character to be read is either a ',' or a '*'. So if you have two consecutive commas or asterisks, then your loop will jam and stop reading.
