What is % in format specifiers? - c

What is the proper name for the % operator in a format specifier? For example, what does the % in %d stand for? I have searched the internet but was unable to find an answer. Any help?

Maybe you can learn something from this TOPIC.
Please find time to read and understand.

% means that the characters following it form a placeholder, which is replaced by the corresponding argument.

The character % does not have a special name other than "the character %". It is not a C operator in the context of a format, but is simply a character used by two families of functions, as they read the format, to introduce a conversion specification.
Pulling verbiage from the C spec...
For both the printf() and scanf() family of functions there is a format.
The format is composed of zero or more directives.
1) one or more white-space characters (scanf)
2) characters (not %)
3) conversion specifications
Each conversion specification is introduced by the character %.
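To make the directive kinds concrete, here is a minimal sketch for the printf side. The helper name format_progress is made up for illustration; only snprintf comes from the standard library.

```c
#include <stdio.h>

/* Hypothetical helper: formats a progress line into buf.
   "Progress: " and the single space are ordinary characters and are
   copied unchanged; %d and %s are conversion specifications that
   consume one argument each; %% is a conversion specification that
   consumes no argument and writes one '%'. */
int format_progress(char *buf, size_t size, int percent, const char *label)
{
    return snprintf(buf, size, "Progress: %d%% %s", percent, label);
}
```

Calling format_progress(buf, sizeof buf, 42, "done") should leave "Progress: 42% done" in buf.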

Related

Printf format specifiers

I've been trying to cobble together the format of printf into a sort of linear format. Is the following a correct understanding of the possible printf formats?
% <justification: [-]?> <sign: [ +]?> <alternate: [#]?>
<padding: [0? num]?> <precision: [.num]?> <modifier: [h|hh|l|ll|L|z|t|j]?>
<format: [c|d(i)|e(E)|f|o|p|x(X)|u|s|g(G)]>
Is the order and meanings correct in the above? A couple examples being:
printf(" %-10.3s %-+20ld", "Hello!", 14L);
Is the following a correct understanding of the possible printf formats?
"Generally" yes, but for example you "can't" do %jg or like %0#p.
There is also %n.
Both "precision" and "padding" may be asterisks, like %*s or %.*s (but you could have defined num as ([0-9]+|\*)...).
Also . is optionally followed by a number. So it's more like <precision: [. num? ]> - if only . is specified, precision is taken as zero.
Is the order
The order of the flags -, space, +, # and 0 is irrelevant and you can repeat them, so you can write %-+020d and %+0-+++000----20d with the same meaning (and 0 is ignored when used with -, so there are corner cases too).
meanings correct in the above?
There is no explanation in the above. - is not "justification" (taken literally, a word?), it's a flag that makes the output left-justified within the field. Also, the meaning depends on context: "precision" for floats can perhaps be understood as the number of digits after the decimal point, but "precision of a string" sounds strange (there it is the maximum number of characters written). But generally, yes.
Your specification is too restrictive:
the flags +, -, #, 0 and space can appear in any order, but some combinations are meaningless, such as %+s.
width and precision can be specified as *
a and A were introduced to produce hexadecimal floating point representations
F is available and different from f for NaNs and infinities.
%% and %n should be recognised too.
Here is a regular expression to match all valid printf conversion specifications, but that will not detect invalid combinations:
%[-+ #0]*(\*|[1-9][0-9]*)?(\.(\*|[0-9]*))?(hh|h|ll|l|L|z|t|j)?[%naAcdieEfFopxXusgG]
You might refine it to reject any flags for %% and restrict other cases too, but it will become quite complicated to express as a regex.
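A few of the points raised in the answers above can be checked directly. The following is only a sketch: check_printf_rules is a made-up name, and the expected strings assume a conforming C99 library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical self-check: returns 1 if the C library behaves as the
   answers describe, 0 at the first mismatch. */
int check_printf_rules(void)
{
    char a[64], b[64];

    /* Flags may come in any order. */
    snprintf(a, sizeof a, "[%-+20d]", 42);
    snprintf(b, sizeof b, "[%+-20d]", 42);
    if (strcmp(a, b) != 0)
        return 0;

    /* '*' takes the width and the precision from int arguments. */
    snprintf(a, sizeof a, "[%*.*s]", 6, 3, "Hello!");
    if (strcmp(a, "[   Hel]") != 0)
        return 0;

    /* A bare '.' is the same as precision zero. */
    snprintf(a, sizeof a, "[%.f]", 2.75);
    snprintf(b, sizeof b, "[%.0f]", 2.75);
    if (strcmp(a, b) != 0)
        return 0;

    /* %% is a conversion specification that prints one '%'. */
    snprintf(a, sizeof a, "%d%%", 100);
    return strcmp(a, "100%") == 0;
}
```

For a string, the precision caps the number of characters written, which is why "Hello!" with precision 3 and width 6 comes out as three spaces followed by "Hel".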

Use scanf with Regular Expressions

I've been trying to use regular expressions on scanf, in order to read a string of maximum n characters and discard anything else until the New Line Character. Any spaces should be treated as regular characters, thus included in the string to be read.
I've studied a Wikipedia article about Regular Expressions, yet I can't get scanf to work properly. Here is some code I've tried:
scanf("[ ]*%ns[ ]*[\n]", string);
[ ] is supposed to go for the actual space character, * is supposed to mean one or more, n is the number of characters to read and string is a pointer allocated with malloc.
I have tried several different combinations; however I tend to get only the first word of a sentence read (stops at space character). Furthermore, * seems to discard a character instead of meaning "zero or more"...
Could anybody explain in detail how regular expressions are interpreted by scanf? What is more, is it efficient to use getc repetitively instead?
Thanks in Advance :D
The short answer: scanf does not handle regular expressions literally speaking.
If you want to use regular expressions in C, you could use the POSIX regex library (<regex.h>). See the following question for a basic example of its usage: Regular expressions in C: examples?
Now if you want to do it the scanf way you could try something like
scanf("%*[ ]%ns%*[ ]\n",str);
Replace the n in %ns by the maximal number of characters to read from input stream.
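The %*[ ] part reads and discards a run of spaces: the * suppresses assignment, so no argument is consumed. You could put a number after the * (for example %*3[ ]) to discard at most that many characters, and you could add other characters between the brackets to discard more than just spaces. Note that a scanset must match at least one character, so the scan stops early if no space is present.
This scanf will not read a whole sentence, though: %s itself skips leading whitespace and stops at the next whitespace character, so only one word is stored.
I would definitely go with a fgets call, then trim the surrounding whitespace with something like the following: How do I trim leading/trailing whitespace in a standard way?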
is it efficient to use getc repetitively instead?
Depends somewhat on the application, but YES, repeated getc() is efficient.
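Unless I read the question wrong, %[^\n] will save everything up to (but not including) the newline. Do not add quotes or a trailing s: quotes inside the brackets would be excluded characters too, and the s would be matched as a literal character.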
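Putting the suggestions from this thread together, one possible sketch of "read at most n characters of a line, spaces included, and discard the rest" is shown below. The function name read_line_prefix and the limit of 10 are assumptions made for the example; the caller's buffer must hold 11 bytes.

```c
#include <stdio.h>

/* Hypothetical helper: stores at most 10 characters of the next line
   in buf (11 bytes), keeping spaces, then discards the remainder of
   the line including its newline.
   Returns 1 on success, 0 for an empty line, EOF at end of input. */
int read_line_prefix(FILE *in, char *buf)
{
    int rc, c;

    buf[0] = '\0';
    /* Scanset: any run of characters other than newline, width 10. */
    rc = fscanf(in, "%10[^\n]", buf);
    if (rc == EOF)
        return EOF;

    /* Repeated getc to drop whatever is left, up to the newline. */
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    return rc;
}
```

The width in the format has to be kept in step with the buffer size by hand, since scanf has no equivalent of printf's * width argument.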

C printf formatting: What does "." and "|" mean in this context?

I'm taking a security course and am having trouble understanding this code due to a lack of understanding of the C programming language.
printf ("%08x.%08x.%08x.%08x|%s|");
I was told that this code should move along the stack until a pointer to a function is found.
I thought the . was just an indicator of precision of output, so I don't know what this means in this context since there are indicators of precision?
Also, I don't understand what the | means, and I can't find it in the C documentation.
The symbols have no special meaning here: they are outside of any conversion specification, so they are simply output literally. Note, however, that you haven't provided any of the five arguments that printf expects. The behavior is undefined; in practice printf typically prints four values that happen to be in registers or on the stack, then treats a fifth as a pointer to a string.
In this string the . and | characters are just output as-is. The dots act as separators between the hex values and the pipes delimit the string.
The dot is only treated as a precision indicator if it appears after the % sign and before the conversion specifier, for example %4.2f.
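For comparison, here is a small sketch of the same kind of format string with every conversion given a matching argument. The helper name format_words and the sample values are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: same separators as in the question, but with
   one argument per conversion specification. */
int format_words(char *buf, size_t size,
                 unsigned a, unsigned b, const char *s)
{
    /* '.' and '|' sit outside any conversion specification, so they
       are copied to the output unchanged. %08x pads each value with
       zeros to 8 hexadecimal digits. */
    return snprintf(buf, size, "%08x.%08x|%s|", a, b, s);
}
```

With the arguments 0xdeadbeef, 0x2a and "hi", the buffer should contain "deadbeef.0000002a|hi|".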

Force fscanf to Consume Possible Whitespace

I have a multiline TSV file with the following format:
Type\tBasic Name\tAttribute\tA Long Description\n
As you can see, the Basic Name and the Description can both contain some number of spaces. I am trying to read each line in and extract the elements. For now, I've narrowed it down to just extracting the basic name. My fscanf is as follows:
fscanf(file_in, "%*[^ ]s\t%128[^ ]s\t%*[^ ]s\t%[^ ]s\n", name_string, desc_string);
This doesn't work as I have hoped, and I'm having trouble narrowing down the error. Does anyone know how I could read in the lines properly?
I mostly agree with Pablo (that the scanf family don't make great parsers), but it's worth understanding how to write a scanf pattern. The pattern you're looking for is something like this:
fscanf(file_in, " %*[^\t] %128[^\t] %*[^\t] %128[^\n]", name_string, desc_string);
Notes:
%[xyz] is a directive. %[xyz]s is two directives, the second of which matches a literal s
As far as I know, there is no way to match a single literal tab character, since any whitespace in the pattern matches any amount of whitespace (including none) in the input. I used a space in my example, which will match a terminating tab, but it will also match any number of consecutive tabs so empty fields won't be parsed correctly.
The 128-character limit does not include the terminating NUL character, so each buffer needs at least 129 bytes.
Also, if the scan stops because the character limit is exceeded, it won't skip the rest of the field automatically, so you'll end up out of sync with the input.
A better pattern would be:
fscanf(file_in, " %*[^\t] %128[^\t]%*[^\t] %*[^\t] %128[^\n]%*[^\n]", name_string, desc_string);
which explicitly skips the remaining characters in the field, if necessary. Be aware that %*[^\t] must match at least one character, so when a field fits within the limit that directive fails and the scan stops there; check the return value. An even better solution would be to use the m modifier (POSIX.1-2008; older glibc versions spelled it a) and get fscanf to malloc memory for you.
I'd rather use strtok for this. It's more accurate than fscanf, since that function family only works when the input matches the format exactly; otherwise you end up missing values.
Take a look at Parallel to PHP's "explode" in C: Split char* into char* using delimiter, where I explain in more detail how to use strtok.
So, read each line with fgets and parse it with strtok.
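A rough sketch of that fgets-plus-strtok approach follows. The function name parse_tsv_line is made up, and the code assumes no field is empty, because strtok treats consecutive tabs as a single delimiter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits one TSV line in place and copies the
   2nd and 4th fields. The line is expected to come from fgets and is
   modified by strtok. Returns 1 on success, 0 if fewer than four
   fields were found. */
int parse_tsv_line(char *line, char *name, size_t name_size,
                   char *desc, size_t desc_size)
{
    char *fields[4];
    int i;

    line[strcspn(line, "\n")] = '\0';   /* drop the newline kept by fgets */
    for (i = 0; i < 4; i++) {
        fields[i] = strtok(i == 0 ? line : NULL, "\t");
        if (fields[i] == NULL)
            return 0;
    }
    /* snprintf truncates safely if a field is longer than its buffer. */
    snprintf(name, name_size, "%s", fields[1]);
    snprintf(desc, desc_size, "%s", fields[3]);
    return 1;
}
```

A typical caller would loop with fgets(line, sizeof line, file_in) and pass each line to this function.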
Firstly, as it has already been noted, the %[] is a conversion specifier by itself. There's no s after the []. The s-es that you have in your format string will not be considered parts of the conversion specifiers. You have to get rid of those s-es.
Secondly, as you said yourself, your file is TAB-separated. Which immediately means that you should extract the continuous portions of the sequence by using the %[^\t] conversion specifier (or the %[^\n] specifier for the last portion). Why did you use %[^ ] and how did you expect it to work? The %[^ ] actually stops parsing at space character, which is the opposite of what you wanted.
In your example the proper combination of specifiers would be
fscanf(file_in, "%*[^\t]\t%128[^\t]\t%*[^\t]\t%[^\n]\n", name_string, desc_string);
This format string assumes that all 4 portions of the string are guaranteed to be present and that the last portion is guaranteed to be terminated by \n.
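Since the format only works when all four portions are present, it seems worth checking the return value. Below is a sketch that applies a similar format to a line already read into memory; scan_tsv_line is an invented name, and a width of 128 has been added to the last field as an assumption about the buffer size.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the 2nd and 4th tab-separated fields
   of line. Both buffers must hold 129 bytes. Returns 1 only if both
   fields were assigned. */
int scan_tsv_line(const char *line, char name[129], char desc[129])
{
    /* sscanf returns the number of assignments; the two suppressed
       %*[...] conversions do not count. */
    int n = sscanf(line, "%*[^\t]\t%128[^\t]\t%*[^\t]\t%128[^\n]",
                   name, desc);
    return n == 2;
}
```

A line with fewer than four fields makes sscanf stop early and return less than 2, so the caller can skip or report it instead of using stale buffer contents.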

How to know new line character in fscanf?

How can I tell that fscanf has reached a newline (\n) in a file?
I have been using my own functions for doing that. I know I can use fgets and then sscanf for my required pattern. But my requirements are not stable: sometimes I want to get TAB-separated strings, sometimes newline-separated strings, and sometimes strings separated by some special character. So if there is any way to detect a newline from fscanf, please help me. Alternative approaches are also welcome.
Thanks in advance.
fscanf(stream, "%42[^\n]", buffer);
is roughly equivalent to fgets(buffer, 43, stream), except that it leaves the newline unread (and out of the buffer) and fails on an empty line. You can't replace the 42 by * to pass the buffer length as an argument (as you can in printf); in scanf, * means suppress the assignment. So
fscanf(stream, "%*[^\n]%*c");
reads up to and including the next newline character (if the line is empty, %*[^\n] fails and the newline is left unread).
Every conversion specifier other than [, c and n starts by skipping whitespace.
Kernighan and Pike in the excellent book 'The Practice of Programming' show how to use sprintf() to create an appropriate format specifier including the length (similar to the examples in the answer by AProgrammer), and then use that in the call to scanf(). That way, you can also control the separators. Concerns about the 'inefficiency' of this approach are probably misguided - the alternatives are harder to get right.
That said, I normally do not use the scanf() family of functions for file I/O; I get the data into a string with some sort of 'get line' routine, and then use the sscanf() family of functions to split it up, or other more specialized parsing code.
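The idea of building the format at run time can be sketched as follows. This is not the code from the book; scan_field is a made-up name, and the sketch assumes the delimiter is an ordinary character (not NUL) and that the field is non-empty.

```c
#include <stdio.h>

/* Hypothetical helper: copies the text of src up to the first
   occurrence of delim into buf, which holds size bytes.
   Returns 1 if a non-empty field was stored, 0 otherwise. */
int scan_field(const char *src, char delim, char *buf, size_t size)
{
    char fmt[32];

    if (size < 2)
        return 0;
    /* For size 16 and a tab delimiter this builds "%15[^\t]":
       %% gives the leading '%', %zu the width, %c the separator. */
    snprintf(fmt, sizeof fmt, "%%%zu[^%c]", size - 1, delim);
    return sscanf(src, fmt, buf) == 1;
}
```

Because the width is derived from the buffer size, the two cannot drift apart, and the same function serves tab-separated, newline-separated, or other single-character-separated input.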
