Scanning a number having more data than predefined value - c

#include<stdio.h>
main()
{
unsigned int num;
printf("enter the number:\n");
scanf("%u",&num);//4294967299 if i'm scanning more than 4G its not scanning
printf("after scanning num=%u\n",num);// 4294967295 why its giving same 4G
/* unsigned char ch;
printf("enter the character:\n");
scanf("%d",&ch);// if i/p=257 so its follow circulation
printf("after scanning ch=%d\n",ch);// 1 its okk why not in int ..
*/
}
Why is circulation not following while scanning input via scanf(), why is it following in case of char?

The C11 standard draft n1570 7.21.6.2 says the following
paragraph 10
[...] the input item [...] is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
Now, the word "conversion" here is used for string => result data type conversion, it cannot be understood to mean integer conversions. As the string "4294967299" converted to a decimal integer is not representable in an object of type unsigned int that is 32-bit wide, the reading of the standard says that the behaviour is undefined, i.e.
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
Thus, the answer to your question is that the C standard doesn't state the behaviour in this case, and the behaviour you see is the one exhibited by your compiler and C library implementation, and is not portable; on other platforms the possible behaviours might include:
ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

From this scanf (and family) reference for the "%u" format:
The format of the number is the same as expected by strtoul() with the value ​0​ for the base argument (base is determined by the first characters parsed)
Then we go to the strtoul function, and read about the returned value:
Integer value corresponding to the contents of str on success. If the converted value falls out of range of corresponding return type, range error occurs and ULONG_MAX or ULLONG_MAX is returned. If no conversion can be performed, ​0​ is returned.
From this we can see that if you enter a too large value for the scanf "%u" format, then the result will be ULONG_MAX converted to unsigned int. However the result will differ on systems where sizeof(unsigned long) > sizeof(unsigned int). See below for information about that.
It should be noted that on platforms with 64-bit unsigned long and 32-bit unsigned int, a value that is valid in the range of unsigned long will not be converted to e.g. UINT_MAX, instead it will be converted using modulo arithmetic as detailed here.
Lets take the value like 4294967299. It is to big to fit in a 32-bit unsigned int, but fits very well in a 64-bit unsigned long. Therefore the call to strtoul will not return ULONG_MAX, but the value 4294967299. Using the standard conversion rules (linked to above), this will result in an unsigned int value of 3.

Related

given an %hhu and an input 1025, which part is mapped to an unsigned char?

Is the output of the following programm 1 or 0, or is it undefined behaviour?
int main() {
unsigned char u = 10;
sscanf("1025","%hhu",&u);
printf("u, is it 0 or is it 1? u's value is ... %hhu", u);
}
According to fscanf conversion specifier %u with length modifier hh (i.e. %hhu), semantics is defined based on that of strtoul function and a mapping to type pointer to unsigned char:
12) The conversion specifiers and their meanings are:
"u"
Matches an optionally signed decimal integer, whose format is the same
as expected for the subject sequence of the strtoul function with the
value 10 for the base argument. The corresponding argument shall be a
pointer to unsigned integer.
11) The length modifiers and their
meanings are:
"hh" Specifies that a following d, i, o, u, x,
X, or n conversion specifier applies to an argument with type pointer
to signed char or unsigned char.
But what happens if an input sequence represents an integral value exceeding 8 bits, which part of the integral value is mapped to the 8 bits of an unsigned char? Is it defined that it has to be the least significant part, does it depend on endianess, is it unspecified, or does it even yield undefined behaviour?
I cannot believe that it is undefined or unspecified behaviour. This would mean that user input might introduce such behaviour in a program using scanf("%hhu",&u), and checking user input before every use of scanf looks absurd to me.
Undefined. See one section up:
10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

Purpose of shortening length modifiers in *printf* functions [duplicate]

Aside from %hn and %hhn (where the h or hh specifies the size of the pointed-to object), what is the point of the h and hh modifiers for printf format specifiers?
Due to default promotions which are required by the standard to be applied for variadic functions, it is impossible to pass arguments of type char or short (or any signed/unsigned variants thereof) to printf.
According to 7.19.6.1(7), the h modifier:
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
short int or unsigned short int argument (the argument will
have been promoted according to the integer promotions, but its value shall
be converted to short int or unsigned short int before printing);
or that a following n conversion specifier applies to a pointer to a short
int argument.
If the argument was actually of type short or unsigned short, then promotion to int followed by a conversion back to short or unsigned short will yield the same value as promotion to int without any conversion back. Thus, for arguments of type short or unsigned short, %d, %u, etc. should give identical results to %hd, %hu, etc. (and likewise for char types and hh).
As far as I can tell, the only situation where the h or hh modifier could possibly be useful is when the argument passed it an int outside the range of short or unsigned short, e.g.
printf("%hu", 0x10000);
but my understanding is that passing the wrong type like this results in undefined behavior anyway, so that you could not expect it to print 0.
One real world case I've seen is code like this:
char c = 0xf0;
printf("%hhx", c);
where the author expects it to print f0 despite the implementation having a plain char type that's signed (in which case, printf("%x", c) would print fffffff0 or similar). But is this expectation warranted?
(Note: What's going on is that the original type was char, which gets promoted to int and converted back to unsigned char instead of char, thus changing the value that gets printed. But does the standard specify this behavior, or is it an implementation detail that broken software might be relying on?)
One possible reason: for symmetry with the use of those modifiers in the formatted input functions? I know it wouldn't be strictly necessary, but maybe there was value seen for that?
Although they don't mention the importance of symmetry for the "h" and "hh" modifiers in the C99 Rationale document, the committee does mention it as a consideration for why the "%p" conversion specifier is supported for fscanf() (even though that wasn't new for C99 - "%p" support is in C90):
Input pointer conversion with %p was added to C89, although it is obviously risky, for symmetry with fprintf.
In the section on fprintf(), the C99 rationale document does discuss that "hh" was added, but merely refers the reader to the fscanf() section:
The %hh and %ll length modifiers were added in C99 (see §7.19.6.2).
I know it's a tenuous thread, but I'm speculating anyway, so I figured I'd give whatever argument there might be.
Also, for completeness, the "h" modifier was in the original C89 standard - presumably it would be there even if it wasn't strictly necessary because of widespread existing use, even if there might not have been a technical requirement to use the modifier.
In %...x mode, all values are interpreted as unsigned. Negative numbers are therefore printed as their unsigned conversions. In 2's complement arithmetic, which most processors use, there is no difference in bit patterns between a signed negative number and its positive unsigned equivalent, which is defined by modulus arithmetic (adding the maximum value for the field plus one to the negative number, according to the C99 standard). Lots of software- especially the debugging code most likely to use %x- makes the silent assumption that the bit representation of a signed negative value and its unsigned cast is the same, which is only true on a 2's complement machine.
The mechanics of this cast are such that hexidecimal representations of value always imply, possibly inaccurately, that a number has been rendered in 2's complement, as long as it didn't hit an edge condition of where the different integer representations have different ranges. This even holds true for arithmetic representations where the value 0 is not represented with the binary pattern of all 0s.
A negative short displayed as an unsigned long in hexidecimal will therefore, on any machine, be padded with f, due to implicit sign extension in the promotion, which printf will print. The value is the same, but it is truly visually misleading as to the size of the field, implying a significant amount of range that simply isn't present.
%hx truncates the displayed representation to avoid this padding, exactly as you concluded from your real-world use case.
The behavior of printf is undefined when passed an int outside the range of short that should be printed as a short, but the easiest implementation by far simply discards the high bit by a raw downcast, so while the spec doesn't require any specific behavior, pretty much any sane implementation is going to just perform the truncation. There're generally better ways to do that, though.
If printf isn't padding values or displaying unsigned representations of signed values, %h isn't very useful.
The only use I can think of is for passing an unsigned short or unsigned char and using the %x conversion specifier. You cannot simply use a bare %x - the value may be promoted to int rather than unsigned int, and then you have undefined behaviour.
Your alternatives are either to explicitly cast the argument to unsigned; or to use %hx / %hhx with a bare argument.
The variadic arguments to printf() et al are automatically promoted using the default conversions, so any short or char values are promoted to int when passed to the function.
In the absence of the h or hh modifiers, you would have to mask the values passed to get the correct behaviour reliably. With the modifiers, you no longer have to mask the values; the printf() implementation does the job properly.
Specifically, for the format %hx, the code inside printf() can do something like:
va_list args;
va_start(args, format);
...
int i = va_arg(args, int);
unsigned short s = (unsigned short)i;
...print s correctly, as 4 hex digits maximum
...even on a machine with 64-bit `int`!
I'm blithely assuming that short is a 16-bit quantity; the standard does not actually guarantee that, of course.
I found it useful to avoid casting when formatting unsigned chars to hex:
sprintf_s(tmpBuf, 3, "%2.2hhx", *(CEKey + i));
It's a minor coding convenience, and looks cleaner than multiple casts (IMO).
another place it's handy is snprintf size check.
gcc7 added size check when using snprintf
so this will fail
char arr[4];
char x='r';
snprintf(arr,sizeof(arr),"%d",r);
so it forces you to use bigger char when using %d when formatting a char
here is a commit that shows those fixes instead of increasing the char array size they changed %d to %h. this also give more accurate description
https://github.com/Mellanox/libvma/commit/b5cb1e34a04b40427d195b14763e462a0a705d23#diff-6258d0a11a435aa372068037fe161d24
I agree with you that it is not strictly necessary, and so by that reason alone is no good in a C library function :)
It might be "nice" for the symmetry of the different flags, but it is mostly counter-productive because it hides the "conversion to int" rule.

scanf %u negative number?

I have tried scanf("%u",&number) and I have entered negative number the problem is when I printf("%d",number) I get the negative number. I thought this will prevent me from reading negative number.
Are scanf("%d",&number) and scanf("%u",&number) the same thing really ?
or is it only for readibility.
Am I doing something called undefined behavior so ?
EDIT:
From Wikipedia I read this:
%u : Scan for decimal unsigned int (Note that in the C99 standard the
input value minus sign is optional, so if a minus sign is read, no
errors will arise and the result will be the two's complement of a
negative number, likely a very large value.
It is a little bit confusing reading SO answers and above. Can someone make it more clear ?
Yes, it's undefined behavior, either way.
Considering variable number is of type unsigned, %d in printf() expects an argument of signed int type, passing an unsigned type is UB.
OTOH, if number is signed type, using %u for scanning is UB in first place.
As you might have expected
[...] prevent me from reading negative number
format specifiers are not there to prevent improper input. If the format specifier does not match the supplied argument, it invokes undefined behavior.
Quoting C11, annex J.2, scenarios invoking UB,
The result of a conversion by one of the formatted input functions cannot be
represented in the corresponding object, or the receiving object does not have an
appropriate type
As explained in detail by Sourav Ghosh, using formats that are inconsistent with the actual types passed is a potential problem. Yet for this particular case, on current PC architectures, this is not really a problem, as neither int nor unsigned int have trap representations.
You can scan a negative number with scanf("%u", &number);. It will be negated in the destination type, namely unsigned int, with the same bitwise representation as the negative number in a signed int, for two's complement representation which is almost universal on current architectures.
scanf converts %u by matching an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
The strtol, strtoll, strtoul, and strtoull functions convert the initial portion of the string pointed to by nptr to long int, long long int, unsigned long int, and unsigned long long int representation, respectively. First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string. Then, they attempt to convert the subject sequence to an integer, and return the result.
If the value of base is between 2 and 36 (inclusive), the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign, but not including an integer suffix.
... If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).
If the type of number is unsigned int, behavior is defined and the negative value is parsed and stored into number using the unsigned negation semantics. Printing this value with printf("%d", number); is at best implementation defined, but again, on current PC architectures, will print the negative number that was originally parsed by scanf("%u", &number);
Conclusion: although it seems harmless, it is very sloppy to use int and unsigned int interchangeably and to use the wrong formats in printf and scanf. As a matter of fact, mixing signed and unsigned types in expressions, esp. in comparisons is very error prone as the C semantics for such constructions are sometimes counter-intuitive.

printf format for 1 byte signed number

Assuming the following:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?
what is the correct format for a 1 byte signed number?
%hh and the integer conversion specifier of your choice (for example, %02hhX. See the C11 standard, §7.21.6.1p5:
hh
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers think that that means that it is unnecessary to use h and hh qualifiers. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.
However, char may well be signed, and the integer promotion will preserve its value, which will make it into a signed integer. Printing the signed integer out with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.
In case that wasn't clear, a simple example (but controversial) example:
/* Read the comments thread to this post; I'll remove
this note when I edit the outcome of the discussion into
the answer
*/
#include <stdio.h>
int main(void) {
char* s = "\u00d1"; /* Ñ */
for (char* p = s; *p; ++p) printf("%02X (%02hhX)\n", *p, *p);
return 0;
}
Output:
$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)
In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.12.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."
However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char (or an unsigned char) to an unsigned int will only happen on an implementation where sizeof(int) == 1.)
So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.
Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above, emphasis added).
Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)
So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.
For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.8/p8), emphasis added:
the int argument is converted to an unsigned char, and the resulting character is written.
If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:
void f(char c) {
printf("This is a '%c'.\n", c);
}
was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.
The key part of the question is what is meant by "argument" in §7.12.6.1/p9 (and other parts of §7.12.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call" which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.
I don't think C is actually different, at least in intent. It uses wording like "the arguments&hellips; are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".
I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)

What is the purpose of the h and hh modifiers for printf?

Aside from %hn and %hhn (where the h or hh specifies the size of the pointed-to object), what is the point of the h and hh modifiers for printf format specifiers?
Due to default promotions which are required by the standard to be applied for variadic functions, it is impossible to pass arguments of type char or short (or any signed/unsigned variants thereof) to printf.
According to 7.19.6.1(7), the h modifier:
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
short int or unsigned short int argument (the argument will
have been promoted according to the integer promotions, but its value shall
be converted to short int or unsigned short int before printing);
or that a following n conversion specifier applies to a pointer to a short
int argument.
If the argument was actually of type short or unsigned short, then promotion to int followed by a conversion back to short or unsigned short will yield the same value as promotion to int without any conversion back. Thus, for arguments of type short or unsigned short, %d, %u, etc. should give identical results to %hd, %hu, etc. (and likewise for char types and hh).
As far as I can tell, the only situation where the h or hh modifier could possibly be useful is when the argument passed it an int outside the range of short or unsigned short, e.g.
printf("%hu", 0x10000);
but my understanding is that passing the wrong type like this results in undefined behavior anyway, so that you could not expect it to print 0.
One real world case I've seen is code like this:
char c = 0xf0;
printf("%hhx", c);
where the author expects it to print f0 despite the implementation having a plain char type that's signed (in which case, printf("%x", c) would print fffffff0 or similar). But is this expectation warranted?
(Note: What's going on is that the original type was char, which gets promoted to int and converted back to unsigned char instead of char, thus changing the value that gets printed. But does the standard specify this behavior, or is it an implementation detail that broken software might be relying on?)
One possible reason: for symmetry with the use of those modifiers in the formatted input functions? I know it wouldn't be strictly necessary, but maybe there was value seen for that?
Although they don't mention the importance of symmetry for the "h" and "hh" modifiers in the C99 Rationale document, the committee does mention it as a consideration for why the "%p" conversion specifier is supported for fscanf() (even though that wasn't new for C99 - "%p" support is in C90):
Input pointer conversion with %p was added to C89, although it is obviously risky, for symmetry with fprintf.
In the section on fprintf(), the C99 rationale document does discuss that "hh" was added, but merely refers the reader to the fscanf() section:
The %hh and %ll length modifiers were added in C99 (see §7.19.6.2).
I know it's a tenuous thread, but I'm speculating anyway, so I figured I'd give whatever argument there might be.
Also, for completeness, the "h" modifier was in the original C89 standard - presumably it would be there even if it wasn't strictly necessary because of widespread existing use, even if there might not have been a technical requirement to use the modifier.
In %...x mode, all values are interpreted as unsigned. Negative numbers are therefore printed as their unsigned conversions. In 2's complement arithmetic, which most processors use, there is no difference in bit patterns between a signed negative number and its positive unsigned equivalent, which is defined by modulus arithmetic (adding the maximum value for the field plus one to the negative number, according to the C99 standard). Lots of software- especially the debugging code most likely to use %x- makes the silent assumption that the bit representation of a signed negative value and its unsigned cast is the same, which is only true on a 2's complement machine.
The mechanics of this cast are such that hexidecimal representations of value always imply, possibly inaccurately, that a number has been rendered in 2's complement, as long as it didn't hit an edge condition of where the different integer representations have different ranges. This even holds true for arithmetic representations where the value 0 is not represented with the binary pattern of all 0s.
A negative short displayed as an unsigned long in hexidecimal will therefore, on any machine, be padded with f, due to implicit sign extension in the promotion, which printf will print. The value is the same, but it is truly visually misleading as to the size of the field, implying a significant amount of range that simply isn't present.
%hx truncates the displayed representation to avoid this padding, exactly as you concluded from your real-world use case.
The behavior of printf is undefined when passed an int outside the range of short that should be printed as a short, but the easiest implementation by far simply discards the high bit by a raw downcast, so while the spec doesn't require any specific behavior, pretty much any sane implementation is going to just perform the truncation. There're generally better ways to do that, though.
If printf isn't padding values or displaying unsigned representations of signed values, %h isn't very useful.
The only use I can think of is for passing an unsigned short or unsigned char and using the %x conversion specifier. You cannot simply use a bare %x - the value may be promoted to int rather than unsigned int, and then you have undefined behaviour.
Your alternatives are either to explicitly cast the argument to unsigned; or to use %hx / %hhx with a bare argument.
The variadic arguments to printf() et al are automatically promoted using the default conversions, so any short or char values are promoted to int when passed to the function.
In the absence of the h or hh modifiers, you would have to mask the values passed to get the correct behaviour reliably. With the modifiers, you no longer have to mask the values; the printf() implementation does the job properly.
Specifically, for the format %hx, the code inside printf() can do something like:
va_list args;
va_start(args, format);
...
int i = va_arg(args, int);
unsigned short s = (unsigned short)i;
...print s correctly, as 4 hex digits maximum
...even on a machine with 64-bit `int`!
I'm blithely assuming that short is a 16-bit quantity; the standard does not actually guarantee that, of course.
I found it useful to avoid casting when formatting unsigned chars to hex:
sprintf_s(tmpBuf, 3, "%2.2hhx", *(CEKey + i));
It's a minor coding convenience, and looks cleaner than multiple casts (IMO).
another place it's handy is snprintf size check.
gcc7 added size check when using snprintf
so this will fail
char arr[4];
char x='r';
snprintf(arr,sizeof(arr),"%d",r);
so it forces you to use bigger char when using %d when formatting a char
here is a commit that shows those fixes instead of increasing the char array size they changed %d to %h. this also give more accurate description
https://github.com/Mellanox/libvma/commit/b5cb1e34a04b40427d195b14763e462a0a705d23#diff-6258d0a11a435aa372068037fe161d24
I agree with you that it is not strictly necessary, and so by that reason alone is no good in a C library function :)
It might be "nice" for the symmetry of the different flags, but it is mostly counter-productive because it hides the "conversion to int" rule.

Resources