I have tried scanf("%u", &number) and entered a negative number; the problem is that when I printf("%d", number) I get the negative number back. I thought %u would prevent me from reading a negative number.
Are scanf("%d", &number) and scanf("%u", &number) really the same thing,
or is the difference only for readability?
Am I invoking something called undefined behavior here?
EDIT:
From Wikipedia I read this:
%u : Scan for decimal unsigned int (Note that in the C99 standard the input value minus sign is optional, so if a minus sign is read, no errors will arise and the result will be the two's complement of a negative number, likely a very large value.)
Reading the SO answers together with the above is a little confusing. Can someone make it clearer?
Yes, it's undefined behavior, either way.
Considering that the variable number is of type unsigned int: %d in printf() expects an argument of signed int type, so passing an unsigned type is UB.
OTOH, if number is of a signed type, using %u for scanning is UB in the first place.
Regarding what you expected:
[...] prevent me from reading negative number
format specifiers are not there to prevent improper input. If a format specifier does not match the type of the supplied argument, the call invokes undefined behavior.
Quoting C11, Annex J.2, which lists scenarios that invoke UB:
The result of a conversion by one of the formatted input functions cannot be represented in the corresponding object, or the receiving object does not have an appropriate type
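As a minimal sketch of the well-defined alternatives (only an illustration, not taken from the answer above), each specifier here matches the type of its argument:

#include <stdio.h>

int main(void)
{
    unsigned int u;
    int s;

    /* %u pairs with unsigned int *, %d pairs with int * */
    if (scanf("%u", &u) == 1)
        printf("%u\n", u);  /* print an unsigned value with %u */
    if (scanf("%d", &s) == 1)
        printf("%d\n", s);  /* print a signed value with %d */
    return 0;
}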
As explained in detail by Sourav Ghosh, using formats that are inconsistent with the actual types passed is a potential problem. Yet for this particular case, on current PC architectures, this is not really a problem, as neither int nor unsigned int has trap representations.
You can scan a negative number with scanf("%u", &number);. It will be negated in the destination type, namely unsigned int, with the same bitwise representation as the negative number in a signed int, given the two's complement representation that is almost universal on current architectures.
scanf converts %u by matching an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
The strtol, strtoll, strtoul, and strtoull functions convert the initial portion of the string pointed to by nptr to long int, long long int, unsigned long int, and unsigned long long int representation, respectively. First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string. Then, they attempt to convert the subject sequence to an integer, and return the result.
If the value of base is between 2 and 36 (inclusive), the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign, but not including an integer suffix.
... If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).
If the type of number is unsigned int, the behavior is defined and the negative value is parsed and stored into number using unsigned negation semantics. Printing this value with printf("%d", number); is at best implementation-defined, but again, on current PC architectures, it will print the negative number that was originally parsed by scanf("%u", &number);.
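A minimal sketch of this (assuming a machine where unsigned int is 32 bits with two's complement signed representation, so the exact value printed is platform-specific):

#include <stdio.h>

int main(void)
{
    unsigned int number;

    /* "-1" is accepted by %u; the value is negated in unsigned arithmetic */
    if (sscanf("-1", "%u", &number) == 1)
        printf("%u\n", number);  /* 4294967295 when UINT_MAX is 2^32 - 1 */
    return 0;
}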
Conclusion: although it seems harmless, it is very sloppy to use int and unsigned int interchangeably and to use the wrong formats in printf and scanf. As a matter of fact, mixing signed and unsigned types in expressions, especially in comparisons, is very error prone, as the C semantics for such constructions are sometimes counter-intuitive.
Related
6.4.4.4/10 ...If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
I'm having trouble understanding this paragraph. After this paragraph, the standard gives the example below:
Example 2: Consider implementations that use two's complement representation for integers and eight bits for objects that have type char. In an implementation in which type char has the same range of values as signed char, the integer character constant '\xFF' has the value −1; if type char has the same range of values as unsigned char, the character constant '\xFF' has the value +255.
What I understand from the expression "value of an object with type char" is the value we get when we interpret the object's content as type char. But looking at the example, it seems to be talking about the object's value in pure binary notation. Is my understanding wrong? Does an object's value always mean the bits in that object?
All "integer character constants" (the stuff between ' and ') have type int out of tradition and compatibility reasons. But they are mostly meant to be used together with char, so 6.4.4.4/10 needs to make a distinction between the types. Basically patch up the broken C language - we have cases such as *"\xFF" that results in type char but '\xFF' results in type int, which is very confusing.
The value '\xFF' = 255 will always fit in an int on any implementation, but not necessarily in a char, which has implementation-defined signedness (another inconsistency in the language). The behavior of the escape sequence should be as if we stored the character constant in a char, as done in my string literal example *"\xFF".
This need for consistency with char type even though the value is stored in an int is what 6.4.4.4/10 describes. That is, printf("%d", '\xFF'); should behave just as char ch = 255; printf("%d", (int)ch);
The example is describing one possible implementation, where char is either signed or unsigned and the system uses two's complement. Generally, the value of an object with integer type refers to decimal notation. char is an integer type, so it can have a negative decimal value (whether the symbol table has a matching index for the value -1 is another story). But "raw binary" cannot have a negative value; 1111 1111 can only be said to be -1 if you say that the memory cell should be interpreted as 8-bit two's complement. That is, if you know that a signed char is stored there. If you know that an unsigned char is stored there, then the value is 255.
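A small sketch to observe the distinction (the value printed for ch depends on whether char is signed on the implementation at hand):

#include <stdio.h>

int main(void)
{
    char ch = '\xFF';        /* -1 if char is signed, 255 if it is unsigned */

    printf("%d\n", '\xFF');  /* the constant itself has type int */
    printf("%d\n", (int)ch); /* goes through char first, per 6.4.4.4/10 */
    return 0;
}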
When I build a simple program that lets the user enter a number (size_t num), I don't understand why input of a negative number results in a huge number instead of an error message.
#include <stdio.h>

int main(void) {
    size_t num;
    printf("enter num:");
    scanf("%lu", &num);
    printf("%lu", num);
}
The %u format specifier will actually accept a string representation of a signed integer, with the result being converted to an unsigned integer.
Section 7.21.6.2p12 of the C standard regarding the fscanf function (and by extension, scanf) says the following about the u conversion specifier:
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
The conversion from signed to unsigned happens by logically adding one more than the maximum value the unsigned type can hold to the numeric value of the signed type until the result is in the range of the unsigned type. Note that this happens regardless of the underlying representation of the relevant integer types.
So for example, assuming size_t is a 64-bit type, the largest value it can hold is 18446744073709551615. So if you input -1, then 18446744073709551616 is added to -1 to give you 18446744073709551615, which is the result.
This conversion is documented in section 6.3.1.3:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
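A hedged sketch of rule 2 above; the exact value printed depends on the width of size_t on the platform:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    size_t num = (size_t)-1;  /* -1 + (SIZE_MAX + 1) == SIZE_MAX, by rule 2 */

    printf("%zu\n", num);     /* 18446744073709551615 when size_t is 64 bits */
    printf("%d\n", num == SIZE_MAX);  /* 1 on any conforming implementation */
    return 0;
}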
The specification for the u conversion in C 2018 7.21.6.2 12 says:
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
(The l modifier further qualifies it to be an unsigned long.)
Thus, a sign is permitted when scanning with %lu. Per paragraph 10:
… the input item … is converted to a type appropriate to the conversion specifier.
Conversions to unsigned long wrap modulo ULONG_MAX+1, so small negative values are converted to large positive values.
Incidentally, to scan a numeral into a size_t, you should use %zu. The z modifier is specifically for size_t.
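A minimal corrected version of the snippet from the question, using %zu (input validation kept to a simple return-value check):

#include <stdio.h>

int main(void)
{
    size_t num;

    printf("enter num:");
    if (scanf("%zu", &num) == 1)  /* %zu matches size_t exactly */
        printf("%zu\n", num);
    return 0;
}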
size_t is unsigned. In the binary representation of a signed number, the first bit represents the sign (for signed int), so when the computer reads the number thinking it is a size_t, it will not interpret the first bit as a negative sign but as part of the number. Since it is the first bit, i.e. the highest power of two, you get a large number. You can read more about binary representation here: https://en.wikipedia.org/wiki/Binary_number
There is no error because the computer just reads the bits indicated in memory by the variable, and these represent a valid size_t, so there is no way for the computer to know that this is wrong.
Is the output of the following program 1 or 0, or is it undefined behaviour?
#include <stdio.h>

int main(void) {
    unsigned char u = 10;
    sscanf("1025", "%hhu", &u);
    printf("u, is it 0 or is it 1? u's value is ... %hhu\n", u);
}
For the fscanf conversion specifier %u with length modifier hh (i.e. %hhu), the semantics are defined in terms of the strtoul function, with the corresponding argument being a pointer to unsigned char:
12) The conversion specifiers and their meanings are:
"u" Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
11) The length modifiers and their meanings are:
"hh" Specifies that a following d, i, o, u, x, X, or n conversion specifier applies to an argument with type pointer to signed char or unsigned char.
But what happens if an input sequence represents an integral value exceeding 8 bits, which part of the integral value is mapped to the 8 bits of an unsigned char? Is it defined that it has to be the least significant part, does it depend on endianess, is it unspecified, or does it even yield undefined behaviour?
I cannot believe that it is undefined or unspecified behaviour. This would mean that user input might introduce such behaviour in a program using scanf("%hhu",&u), and checking user input before every use of scanf looks absurd to me.
Undefined. See one section up:
10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
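One way to keep out-of-range input from ever reaching the %hhu conversion is to read the text first and range-check it yourself. This is only a sketch of that approach (the buffer size and error handling here are arbitrary choices):

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void)
{
    char buf[64];
    unsigned char u = 10;

    if (fgets(buf, sizeof buf, stdin) != NULL) {
        char *end;
        unsigned long v = strtoul(buf, &end, 10);

        if (end != buf && v <= UCHAR_MAX)  /* reject "1025" instead of narrowing it */
            u = (unsigned char)v;
        else
            fprintf(stderr, "out of range for unsigned char\n");
    }
    printf("%hhu\n", u);
    return 0;
}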
#include <stdio.h>

int main(void)
{
    unsigned int num;
    printf("enter the number:\n");
    scanf("%u", &num);  /* input 4294967299 (more than 4G): the value does not wrap */
    printf("after scanning num=%u\n", num);  /* prints 4294967295 -- why UINT_MAX? */
    /*
    unsigned char ch;
    printf("enter the character:\n");
    scanf("%d", &ch);  // input 257 wraps around ("circulation")
    printf("after scanning ch=%d\n", ch);  // prints 1 -- fine, but why not for unsigned int?
    */
}
Why does the value not wrap around ("circulation") when scanning with scanf("%u"), while it does wrap in the char case?
The C11 standard draft N1570, 7.21.6.2 paragraph 10, says the following:
[...] the input item [...] is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
Now, the word "conversion" here means the conversion from the input string to the result data type; it cannot be understood to mean integer conversions. As the string "4294967299" converted to a decimal integer is not representable in an object of type unsigned int that is 32 bits wide, this reading of the standard says that the behaviour is undefined, i.e.
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
Thus, the answer to your question is that the C standard doesn't state the behaviour in this case, and the behaviour you see is the one exhibited by your compiler and C library implementation, and is not portable; on other platforms the possible behaviours might include:
ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
From this scanf (and family) reference for the "%u" format:
The format of the number is the same as expected by strtoul() with the value 0 for the base argument (base is determined by the first characters parsed)
Then we go to the strtoul function, and read about the returned value:
Integer value corresponding to the contents of str on success. If the converted value falls out of range of corresponding return type, range error occurs and ULONG_MAX or ULLONG_MAX is returned. If no conversion can be performed, 0 is returned.
From this we can see that if you enter a too large value for the scanf "%u" format, then the result will be ULONG_MAX converted to unsigned int. However the result will differ on systems where sizeof(unsigned long) > sizeof(unsigned int). See below for information about that.
It should be noted that on platforms with 64-bit unsigned long and 32-bit unsigned int, a value that is valid in the range of unsigned long will not be converted to e.g. UINT_MAX, instead it will be converted using modulo arithmetic as detailed here.
Let's take a value like 4294967299. It is too big to fit in a 32-bit unsigned int, but fits very well in a 64-bit unsigned long. Therefore the call to strtoul will not return ULONG_MAX, but the value 4294967299. Using the standard conversion rules (linked above), this will result in an unsigned int value of 3.
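A hedged sketch of that arithmetic, assuming a platform with 64-bit unsigned long and 32-bit unsigned int:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long big = strtoul("4294967299", NULL, 10);  /* fits in 64 bits */
    unsigned int num = (unsigned int)big;  /* reduced modulo UINT_MAX + 1 */

    printf("%lu -> %u\n", big, num);  /* 4294967299 -> 3 on such a platform */
    return 0;
}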
When you cast a character to an int in C, what exactly is happening? Since characters are one byte and ints are four, how are you able to get an integer value for a character? Is it the bit pattern that is treated as a number? Take for example the character 'A'. Is the bit pattern 01000001 (i.e. 65 in binary)?
char and int are both integer types.
When you convert a value from any arithmetic (integer or floating-point) type to another arithmetic type, the conversion preserves the value whenever possible. Arithmetic conversions are always defined in terms of values, not representations (though some of the rules are designed to be simply implemented on most hardware).
In your case, you might have:
char c = 'A';
int i = c;
c is an object of type char with the value 65 (assuming an ASCII representation). The conversion from char to int yields an int with the value 65. The compiler generates whatever code is necessary to make that happen; in terms of representation, it could either sign-extend or pad with 0 bits.
This applies when the value of the source expression can be represented as a value of the target type. For a char to int conversion, that's (almost) always going to be the case. For some other conversions, there are various rules for what to do when the value won't fit:
For any conversion to or from floating-point, if the value is out of range the behavior is undefined ((int)1.0e100 may yield some arbitrary value or it can crash your program), and if it's within range but inexact it's approximated by rounding or truncation;
For conversion of a signed or unsigned integer to an unsigned integer, the result is wrapped ((unsigned)-1 == UINT_MAX);
For conversion of a signed or unsigned integer to a signed integer, the result is implementation-defined (wraparound semantics are common) -- or an implementation-defined signal can be raised.
(Floating-point conversions also have to deal with precision.)
Other than converting integers to unsigned types, you should generally avoid out-of-range conversions.
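A minimal sketch of the integer rules above (the third conversion is implementation-defined, so its comment only notes the common outcome):

#include <stdio.h>

int main(void)
{
    char c = 'A';
    int i = c;  /* value-preserving conversion: 65 stays 65 */

    printf("%d\n", i);                 /* 65 on an ASCII system */
    printf("%u\n", (unsigned)-1);      /* wraps to UINT_MAX */
    printf("%d\n", (int)3000000000u);  /* implementation-defined; commonly wraps to a negative value */
    return 0;
}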
Incidentally, though int may happen to be 4 bytes on your system, it could be any size as long as it's able to represent values from -32767 to +32767. The ranges of the various integer types, and even the number of bits in a byte, are implementation-defined (with some restrictions imposed by the standard). 8-bit bytes are almost universal. 32-bit int is very common, though older systems commonly had 16-bit int (and I've worked on systems with 64-bit int).