I'm trying to convert hexadecimal or decimal text to an unsigned int using the "%u" format specifier of sscanf. The result is not correct: for the input 0x01, sscanf produces 0.
According to C++ Reference, the definition of "%u" specifier (highlighting is mine):
i, u Integer Any number of digits, optionally preceded by a sign (+ or -).
Decimal digits assumed by default (0-9), but a 0 prefix introduces octal digits (0-7), and 0x hexadecimal digits (0-f).
According to Harbison & Steele, 3rd Edition:
The u conversion Unsigned decimal conversion is performed.
...
The format of the number read is the same as expected for the input to the strtol function with the value 10 for the base argument; that is, a sequence of decimal digits optionally preceded by - or +.
Note that one of the definitions allows "0x" to be specified in the string.
I am using IAR EW compiler with the compilation set to C99 dialect.
Which definition is correct for C99?
I am receiving two different results in the following program.
Here is a test program:
#include <stdio.h>
#include <stdlib.h> /* EXIT_SUCCESS */
int main(void)
{
const char text[] = "0x01 1";
unsigned int first_value = 0U;
unsigned int second_value = 0U;
signed int arguments_satisfied = 0;
arguments_satisfied = sscanf(text, "%u %u", &first_value, &second_value);
printf("Arguments scanned: %d, first: %u, second: %u\n",
arguments_satisfied, first_value, second_value);
return EXIT_SUCCESS;
}
According to the current C standard, C11, 7.21.6.2/12, only %i deduces the base from the context; all other specifiers fix the base:
i Matches an optionally signed integer, whose format is the same as expected
for the subject sequence of the strtol function with the value 0 for the
base argument. The corresponding argument shall be a pointer to signed
integer.
o Matches an optionally signed octal integer, whose format is the same as
expected for the subject sequence of the strtoul function with the value 8
for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
u Matches an optionally signed decimal integer, whose format is the same as
expected for the subject sequence of the strtoul function with the value 10
for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
x Matches an optionally signed hexadecimal integer, whose format is the same
as expected for the subject sequence of the strtoul function with the value
16 for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
Related
When I build a simple program that lets the user enter a number
(size_t num), I don't understand why input of a negative number
results in a huge number instead of an error message.
size_t num;
printf("enter num:");
scanf("%lu",&num);
printf("%lu",num);
The %u format specifier will actually accept a string representation of a signed integer, with the result being converted to an unsigned integer.
Section 7.21.6.2p12 of the C standard regarding the fscanf function (and by extension, scanf) says the following about the u conversion specifier:
Matches an optionally signed decimal integer, whose format is
the same as expected for the subject sequence of the strtoul
function with the value 10 for the base argument. The
corresponding argument shall be a pointer to unsigned integer.
The conversion from signed to unsigned happens by logically adding one more than the maximum value the unsigned type can hold to the numeric value of the signed type until the result is in the range of the unsigned type. Note that this happens regardless of the underlying representation of the relevant integer types.
So, for example, assuming size_t is a 64-bit type, the largest value it can hold is 18446744073709551615. If you input -1, then 18446744073709551616 is added to -1 to give 18446744073709551615, which is the result.
This conversion is documented in section 6.3.1.3:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new
type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type
until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.
The specification for the u conversion in C 2018 7.21.6.2 12 says:
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
(The l modifier further qualifies it to be an unsigned long.)
Thus, a sign is permitted when scanning with %lu. Per paragraph 10:
… the input item … is converted to a type appropriate to the conversion specifier.
Conversions to unsigned long wrap modulo ULONG_MAX+1, so small negative values are converted to large positive values.
Incidentally, to scan a numeral into a size_t, you should use %zu. The z modifier is specifically for size_t.
size_t is unsigned. In the binary representation of a signed integer, the first bit represents the sign, so when the computer reads the number as a size_t, it does not interpret the first bit as a negative sign but as part of the number. Since it is the first bit, i.e. the highest power of two, you get a large number. You can read more about binary representation here: https://en.wikipedia.org/wiki/Binary_number
There is no error because the computer just reads the bits stored in the variable, and they represent a valid size_t, so there is no way for the computer to know that this is wrong.
I'm trying to test whether an unsigned int variable holds a negative number. Is that not possible?
It skips the do-while loop and outputs a garbage number.
#include<stdio.h>
unsigned int getPositiveInteger(void);
int main(void){
unsigned int i=getPositiveInteger();
printf("The number is %u.\n", i);
return 0;
}
unsigned int getPositiveInteger(void){
int error=0;
unsigned int n=0;
do{
if(error){
printf("The number must be positive!\n");
}
error=0;
printf("What's the number?\n");
scanf("%u", &n);
if(n<1){
error=1;
}
}
while(n<1);
return n;
}
When run:
What's the number?
-1
The number is 4294967295.
When the input numeral has a minus sign, scanf with %u produces the value that results from negating the number in the unsigned type, so it always produces a non-negative result.
The specification of the %u conversion for scanf is in C 2018 7.21.6.2 12:
… u Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
For strtoul, 7.22.1.4 3 says:
… [For base value 10] the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign,…
and 7.22.1.4 5 says:
… If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).
Thus, for input characters “-1”, scanf converts “1” to an unsigned value of 1 and then applies the - operator. Arithmetic in unsigned wraps modulo UINT_MAX+1, so the mathematical negation, −1, wraps to −1+UINT_MAX+1, which is UINT_MAX.
To test whether an input is negative, you can read the individual characters and check for a “-” character. To do this, you can either accumulate characters in a temporary buffer and then use sscanf to process them, or you can skip white space characters until you see either a digit or a “-” (or some other character, which you would treat as an error). If it is a “-”, then report an error. If it is a digit, use ungetc to put it back into the input stream, then use scanf.
To store signed numbers, you should use a signed data type for the variable. That's the point of signed data types.
long int id;
printf("Enter Aircraft Id: (eg abeb11");
scanf("%x",&id);
The id has to be read as a hex value,
but I am getting the warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long int’ [-Wformat=]
In C++, we can use setbase().
But I am stuck as to how to do it in C.
You can use the %lx format specifier to read in the long hex value.
Also, scanf wants the address of the variable to read into.
scanf("%x",id); would lead to undefined behavior.
Hence, change it as below:
printf("Enter Aircraft Id (eg abeb11): ");
scanf("%lx",&id);
No, there is not. The standard input/output streams in C are much more low-level and do not support the concept of a base (nor the concept of outputting "a number"; they are character streams).
Just use printf():
const int number = 4711;
printf("%d in hex is %x; in octal it's %o\n", number, (unsigned int) number,
(unsigned int) number);
will print:
4711 in hex is 1267; in octal it's 11147
And no, there's no standard way of printing in binary, you're going to have to implement that on your own if you need it.
To input, you need to match the type of the variable with the type implied by the formatting specifier:
if(scanf("%lx", &id) == 1)
{
printf("the ID is %lu (0x%lx)\n", id, id);
}
The type of hexadecimal numbers is unsigned in the printf() and scanf() families of functions.
C++ has a type-safe conversion system for reading values from standard streams. setbase(16) is used to change the input base, not to specify the type of the target, which is handled automatically.
In C, the scanf() function uses a format string to specify both the type of the target variable and how to convert it; it has no information about the actual type of the remaining arguments. scanf() supports decimal, octal and hexadecimal conversions:
%d converts an optionally signed value expressed in decimal into an int variable
%u converts an optionally signed value expressed in decimal into an unsigned int variable
%o converts an optionally signed value expressed in octal into an unsigned int variable
%x converts an optionally signed value expressed in hexadecimal into an unsigned int variable;
%lx does the same for an unsigned long int variable. There is no way to tell scanf() that the target variable is a long int; passing the address of a long int for %lx has undefined behavior, but it is accepted and works correctly for positive values on most current systems.
%i converts an optionally signed value expressed in decimal, octal or hexadecimal into an int variable. The base is determined from the initial prefix (after optional spaces and an optional sign): 0 for octal, 0x for hexadecimal, otherwise decimal.
Note that you should test the return value of scanf() to detect invalid or missing input.
Here is a modified version:
unsigned long int id;
printf("Enter Aircraft Id (eg abeb11): ");
if (scanf("%lx", &id) != 1) {
printf("Invalid input\n");
...
}
Is the output of the following program 1 or 0, or is it undefined behaviour?
#include <stdio.h>
int main(void) {
unsigned char u = 10;
sscanf("1025","%hhu",&u);
printf("u, is it 0 or is it 1? u's value is ... %hhu\n", u);
return 0;
}
According to the specification of fscanf, the semantics of the conversion specifier %u with length modifier hh (i.e. %hhu) are defined in terms of the strtoul function and a mapping to the type pointer to unsigned char:
12) The conversion specifiers and their meanings are:
"u"
Matches an optionally signed decimal integer, whose format is the same
as expected for the subject sequence of the strtoul function with the
value 10 for the base argument. The corresponding argument shall be a
pointer to unsigned integer.
11) The length modifiers and their
meanings are:
"hh" Specifies that a following d, i, o, u, x,
X, or n conversion specifier applies to an argument with type pointer
to signed char or unsigned char.
But what happens if an input sequence represents an integral value exceeding 8 bits? Which part of the integral value is mapped to the 8 bits of an unsigned char? Is it defined that it has to be the least significant part, does it depend on endianness, is it unspecified, or does it even yield undefined behaviour?
I cannot believe that it is undefined or unspecified behaviour. This would mean that user input might introduce such behaviour in a program using scanf("%hhu",&u), and checking user input before every use of scanf looks absurd to me.
Undefined. See paragraph 10 of the same section:
10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
I am learning C from the book "C Primer Plus" by Stephen Prata. In chapter 4, the author states that in printf(), %o and %x denote unsigned octal integers and unsigned hexadecimal integers respectively, but in scanf(), %o and %x read signed octal integers and signed hexadecimal integers respectively. Why is that?
I wrote the following program in VS 2015 to check the author's statement:
#include <stdio.h>
int main(void)
#pragma warning(disable : 4996)
{
int a, b, c;
printf("Enter number: ");
scanf("%x %x", &a, &b);
c = a + b;
printf("Answer = %x\n", c);
while (getchar() != EOF)
getchar();
return 0;
}
The code proved the author's claim.
If the inputs were a pair of integers where the absolute value of the positive integer was bigger than the absolute value of the negative integer, then everything worked fine.
But if the inputs were a pair of integers where the absolute value of the positive integer was smaller than the absolute value of the negative integer, then the output was what you would expect from unsigned 2's complement arithmetic.
For example:
Enter number: -5 6
Answer = 1
and
Enter number: -6 5
Answer = ffffffff
The C standard says that for printf-like functions (7.21.6.1 fprintf):
o,u,x,X
The unsigned int argument is converted to unsigned octal
(o), unsigned decimal (u), or unsigned hexadecimal notation (x or X)
While for scanf-like functions it says (7.21.6.2 fscanf):
x
Matches an optionally signed hexadecimal integer, whose format is the same
as expected for the subject sequence of the strtoul function with the value
16 for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
So, as an extra feature, you can write a negative hex number and scanf will convert it to the corresponding unsigned number: the value wraps modulo UINT_MAX+1, which on a two's complement system gives the same bit pattern as the negative number.
For example
unsigned int x;
scanf("%x", &x); // enter -1
printf("%x", x); // will print ffffffff
Why they felt like scanf needed this mildly useful feature, I have no idea. Perhaps it is there for consistency with other conversion specifiers.
However, the book seems to be using the function incorrectly, since the standard explicitly states that you must pass a pointer to unsigned int. If you pass a pointer to a signed int, you are formally invoking undefined behavior.
Reading the C11 specification, section 7.21.6.2/12, it says for the o format:
Matches an optionally signed octal integer, whose format is the same as
expected for the subject sequence of the strtoul function with the value 8
for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
With corresponding text for the hexadecimal x format.
So on one hand the specification says the input can be signed, but it also says the format is the same as for the strtoul function which reads unsigned integers, and the result is stored in an unsigned integer.
Indeed, the author is wrong, as @Joachim Pileborg pointed out.
This is what the standard says about it:
7.21.6.2 The fscanf function [1]
12 The conversion specifiers and their meanings are:
o Matches an optionally signed octal integer, whose format is the same as
expected for the subject sequence of the strtoul function with the value 8
for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
x Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be
a pointer to unsigned integer.
As you can read above, the input is optionally signed, but the conversion certainly expects a pointer to an unsigned integer.
[1] Of course I have omitted a lot; in fact, the fscanf() section is one of the largest in the standard document.