Why doesn't scanf validate %llu? - c

I'm using scanf("%llu%c", &myUnsignedLong, &checkerNewLine) to validate a number >0 && <2^64, where checkerNewLine is used to clear the buffer if someone tries to enter a letter ( while (getchar() != '\n'); ).
But if I enter a negative number, like -541231, scanf succeeds and returns 2 (the number of items matched). The number stored in myUnsignedLong is then 2^64-541231, which is NOT my intention.
Is there a simple way to solve this? Thanks a lot.

From the specification (C99, 7.19.6.2):
u Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
Note that the input is optionally signed. This explains why the code behaves as it does.
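However, I can't offer a good explanation for why it's specified this way. For strtoul, the standard says that a value with a leading minus sign is negated in the return type, which is unsigned and therefore wraps around; that matches the 2^64-541231 you observed, so the stored value is the expected result rather than a sign of undefined behavior.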

Related

output of negative integer to %u format specifier

Consider the following code
char c=125;
c+=10;
printf("%d",c); //outputs -121 which is understood.
printf("%u",c); // outputs 4294967175.
printf("%u",-121); // outputs 4294967175
%d accepts negative numbers, therefore the output is -121 in the first case.
The output in case 2 and case 3 is 4294967175. I don't understand why.
Do the arithmetic:
2^32 - 121 = 4294967175
printf interprets the data you provide according to the % conversion specifiers (ranges shown for a 32-bit int):
%d signed integer, value from -2^31 to 2^31-1
%u unsigned integer, value from 0 to 2^32-1
In binary, both integer values (-121 and 4294967175) are (of course) identical:
`0xFFFFFF87`
See Two's complement
printf is a function with variadic arguments. In such a case the "default argument promotions" are applied to the arguments before the function is called. In your case, c is first converted from char to int and then passed to printf. The conversion does not depend on the corresponding '%' specifier in the format. The value of this int argument can be interpreted as 4294967175 or -121 depending on signedness. The corresponding parts of the C standard are:
6.5.2.2 Function call
6 - ... If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions.
7- If the expression that denotes the called function has a type that does include a prototype,
the arguments are implicitly converted, as if by assignment, to the types of the
corresponding parameters, taking the type of each parameter to be the unqualified version
of its declared type. The ellipsis notation in a function prototype declarator causes
argument type conversion to stop after the last declared parameter. The default argument
promotions are performed on trailing arguments.
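If char is signed in your compiler (which is the most likely case) and is 8 bits wide (extremely likely), then the result of c+=10 does not fit in it. The addition itself is performed in int and yields 135; converting 135 back to a signed 8-bit char gives an implementation-defined result (or raises an implementation-defined signal). The standard therefore does not guarantee -121, although that is what two's complement implementations typically produce.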
If char is unsigned (not very likely on most PC platforms), then see the other answers.
printf uses something called variadic arguments. If you do a little research on them, you'll find that a function that uses them does not know the types of the arguments you pass to it. Therefore there must be a way to tell the function how to interpret its input, and you do that with the format specifiers.
In your particular case, c is an 8-bit signed integer. If it holds the value -121, it stores the bits 10000111. Then the integer promotion mechanism converts it to an int: 11111111111111111111111110000111.
With "%d" you tell printf to interpret 11111111111111111111111110000111 as a signed integer, therefore you have -121 as output. However, with "%u" you're telling printf that 11111111111111111111111110000111 is an unsigned integer, therefore it will output 4294967175.
EDIT: As stated in the comments, the behaviour is actually undefined in C. That's because there is more than one way to encode negative numbers (sign and magnitude, ones' complement, ...) and some other aspects (such as endianness, if I'm not wrong) influence this result. So the result depends on the implementation, and you may get an output other than 4294967175. But the main concepts I explained, the different interpretations of the same string of bits and the loss of type information in variadic arguments, still hold.
Try converting the number to base 10, first as a pure binary number, then knowing that it's stored in 32-bit two's complement... you get two different results. But if I do not tell you which interpretation to use, that binary string can represent anything (a 4-char ASCII string, a number, a small 8-bit 2x2 image, your safe combination, ...).
EDIT: you can think of "%<format_string>" as a sort of "extension" for that string of bits. When you create a file, you usually give it an extension, which is actually part of the filename, to remember in which format/encoding that file has been stored. Let's suppose you have your favorite song saved as song.ogg on your PC. If you rename the file to song.txt, song.odt, song.pdf, song, song.akwardextension, that does not change the content of the file. But if you try to open it with the program usually associated with .txt or .whatever, it reads the bytes in the file, but when it tries to interpret sequences of bytes it may fail (that's why, if you open song.ogg with Emacs or Vim or any other text editor, you get something that looks like garbage; if you open it with, for instance, GIMP, GIMP cannot read it; and if you open it with VLC you hear your favorite song). The extension is just a reminder for you: it reminds you how to interpret that sequence of bits. As printf has no knowledge of that interpretation, you need to provide one, and if you tell printf that a signed integer is actually unsigned, well, it's like opening song.ogg with Emacs...

K&R 1-7 is it solvable by using putchar() instead of printf?

There are many questions about this exercise all over the internet, but I couldn't find any solution (nor any hint) on how to solve this exercise using 'putchar'.
Write a program to print the value of EOF.
I can easily get a working answer to this:
printf("%d", EOF);
I'd like to know if there are any known (or logical) answers using 'putchar' (as I guess that was the whole purpose of the exercise, being located at the end of a paragraph on 'getchar' and 'putchar')
Writing:
putchar(EOF);
or
int c;
c = EOF;
putchar(c);
the program just launches and closes itself without showing any text.
putchar converts its argument to unsigned char before it's outputted (and it's a character which is written, not the result of a decimal conversion). As EOF is negative, this conversion isn't value-preserving. Commonly, EOF has the value -1, with 8-bit char, that conversion results in 255, that is, putchar(EOF) is equivalent to putchar('\xff'), which isn't a printable character (assuming an ASCII system). Piping the output to hexdump -C would make the output visible.
Without using printf and friends, a function outputting a decimal number for an int can be written. Basically,
- print a '-' if the value is negative
- for every decimal digit, starting from the highest-valued:
  - convert the digit's value to a character (e.g. 7 should become '7')
  - try to print the result (with putchar, for example)
  - if that fails, return an error indicator
Hint: The division and modulus operators, / and %, will be useful.
The digits '0' ... '9' are ascending (and contiguous); so for the conversion from the digit's value to a character, adding '0' yields the desired result (3 + '0' is '3', for example).
Care must be taken to avoid integer overflow, even if corner cases like INT_MIN are passed to the function. -INT_MIN may result in an overflow (and in fact does on pretty much every system); a positive number, on the other hand, can always be negated. A value of 0 may need special handling.
(Alternatively, the number can be converted to a string first which then can be outputted. A char array of size 1 + sizeof(int)*CHAR_BIT/3+1 + 1 is big enough to hold that string, including minus sign and 0-terminator.)
If you get stuck, the (non-standard) itoa function does something similar; looking for example implementations may give some ideas.
EOF is not a char which can be printed as you expected. So putchar(EOF) doesn't print any values, as it prints only char.
Hence use printf("%d", EOF) which outputs integer -1.
The prototype is putchar(int c). If you pass an ASCII value to it, the corresponding character is printed on the screen, but EOF typically evaluates to -1, which is not a printable ASCII character, so you see nothing or possibly some junk.
As per the man page of putchar(),
int putchar(int c);
putchar(c); is equivalent to putc(c,stdout).
and
putc() is equivalent to fputc() .......fputc() writes the character c, cast to an unsigned char, to stream.
Thus, putchar() is supposed to output the char representation of the ASCII value supplied as its argument.
So, if the value of EOF is a non printable character [in ASCII], you won't see anything ["without showing any text"] on stdout.

Scanf is mutating my number

I have a program that grabs a user input number and adds the appropriate ending (1st, 11th, 312th, etc)
So I gave it to my friend to find bugs/ break it.
The code is
int number;
printf("\n\nNumber to end?: ");
scanf("%d",&number);
getchar();
numberEnder(number);
When 098856854345712355 was input, scanf passed 115573475 to the rest of the program.
Why? 115573475 is less than the max for an int.
INT_MAX is typically (2^31) -1 which is ~2 billion...
... if you look at 98856854345712355 & 0xffffffff you will find that it is 115573475
The C99 standard states in the fscanf section (7.19.6.2-10):
...the result of the conversion is placed in the object pointed to by
the first argument following the format argument that has not already
received a conversion result. If this object does not have an
appropriate type, or if the result of the conversion cannot be
represented in the object, the behavior is undefined [emphasis added].
This is true in C11 as well, though fscanf is in a different section (7.21.6.2-10).
You can read more about undefined behavior in this excellent answer, but it basically means the standard doesn't require anything of the implementation. Following from that, you shouldn't rely on the fact that today scanf uses the lower bits of the number because tomorrow it could use the high bits, INT_MAX, or anything else.
098856854345712355 is too big of a number for int.
long long number;
printf("\n\nNumber to end?: ");
scanf("%lld",&number);
getchar();
numberEnder(number);
The answer to why you get a specific garbage answer is GIGO. The standard doesn't specify the result for bad input and it's implementation specific. I would hazard a guess that if you apply a 32 bit mask to the input, that's the number you would get.
Just out of curiosity, I looked up an implementation of scanf.
It boils down to
…
res = strtoq(buf, (char **)NULL, base);
…
*va_arg(ap, long *) = res;
…
Which should truncate off the high-bits in the cast. That would support the bit-mask guess, but it would only be the case for this implementation.

strtol using errno

I have the following code:
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
void main(void)
{
int data;
char * tmp;
data = strtol("23ef23",&tmp,10);
printf("%d",errno);
getchar();
}
The output is 0 ...
Why?
I am using Visual Studio 2010 C++.
The code must be C89 compatible.
strtol only sets errno for overflow conditions, not to indicate parsing failures. For that purpose, you have to check the value of the end pointer, but you need to store a pointer to the original string:
char const * const str = "blah";
char * endptr;  /* strtol expects a char **, so this must not be a pointer to const char */
long n = strtol(str, &endptr, 0);  /* strtol returns long */
if (endptr == str) { /* no conversion was performed */ }
else if (*endptr == '\0') { /* the entire string was converted */ }
else { /* the unconverted rest of the string starts at endptr */ }
I think the only required error values are for underflow and overflow.
Conversely, if the entire string has been consumed in the conversion, you have *endptr == '\0', which may be an additional thing you want to check.
Your logic does not fit with the 'spec'.
see this
An invalid value does not necessarily set 'errno'.
(copy follows)
long int strtol ( const char * str, char ** endptr, int base );
Convert string to long integer
Parses the C string str interpreting its content as an integral number of the specified base, which is returned as a long int value.
The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes as many characters as possible that are valid following a syntax that depends on the base parameter, and interprets them as a numerical value. Finally, a pointer to the first character following the integer representation in str is stored in the object pointed by endptr.
If the value of base is zero, the syntax expected is similar to that of integer constants, which is formed by a succession of:
An optional plus or minus sign
An optional prefix indicating octal or hexadecimal base ("0" or "0x" respectively)
A sequence of decimal digits (if no base prefix was specified) or either octal or hexadecimal digits if a specific prefix is present
If the base value is between 2 and 36, the format expected for the integral number is a succession of the valid digits and/or letters needed to represent integers of the specified radix (starting from '0' and up to 'z'/'Z' for radix 36). The sequence may optionally be preceded by a plus or minus sign and, if base is 16, an optional "0x" or "0X" prefix.
If the first sequence of non-whitespace characters in str is not a valid integral number as defined above, or if no such sequence exists because either str is empty or it contains only whitespace characters, no conversion is performed.
Parameters
str
C string containing the representation of an integral number.
endptr
Reference to an object of type char*, whose value is set by the function to the next character in str after the numerical value.
This parameter can also be a null pointer, in which case it is not used.
Return Value
On success, the function returns the converted integral number as a long int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, LONG_MAX or
LONG_MIN is returned, and the global variable errno is set to ERANGE.
It has been 10 years since the question was first posted, but the problem does not age. The answers given are either out of date (yet true for their time) or a bit confusing because I had to search more.
I have seen this in a book and met this post while searching for its meaning, and while checking the page for strtol, I ended up in this page on cplusplus.com of errno macro.
Your question has 2 parts to answer here:
First, let's make a note of these 2 things about errno:
1- errno can be anything during the execution of a program for no function resets it (unless your own function does so)
errno is set to zero at program startup ...
any function ... can modify its value ...
no ... function sets its value back to zero
2- one has to reset it before calling a function that may use it.
should be reset ... to zero before the call ... since ... previous ... function may have altered its value
Your program is pretty small, so no other function seems to be there to change it. The only visitors of errno are the program startup, which sets it to zero, and strtol, in case of an error.
Yet your program shows that errno is 0, and this is confusing because one expects that 23ef23 would not be converted to a number, since it includes letters. However, this expectation is wrong: you actually get a number from this string, so there is no error here and no change is made to errno. This is the second part of the answer.
You will find this definition on the strtol page:
... takes as many characters as possible that are valid following a
syntax that depends on the base parameter, and interprets them as a
numerical value ... a pointer to the first character following is
stored.
Instead of a long explanation, the following print statement and its output suffice to illustrate the definition above:
printf("%d %d %s",data,errno,tmp);
23 0 ef23
If you set the base to 16, the output would be 2354979 0 . And base 2 would give 0 0 23ef23, showing that strtol will not freak out if it does not find a number. The only error it will give is ERANGE for breaching limits:
If the value read is out of the range of representable values by a
long int, the function returns LONG_MAX or LONG_MIN (defined in
<climits>), and errno is set to ERANGE.
You have to set errno to 0 before you call strtol, not after it. Otherwise you overwrite whatever value strtol set errno to.
You have to check that tmp is not the same as the pointer to the input string ("blablabla" here).
If data == 0 and tmp == "blablabla", then the input data is in the incorrect format. errno need not be set by the implementation if the input data is not in the expected format.
On strtol, strtoll, strtoul, and strtoull functions C says:
(C99, 7.20.1.4p7) If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of nptr is stored in the object pointed to by endptr, provided that endptr is not a null pointer.
(C99, 7.20.1.4p9) The strtol, strtoll, strtoul, and strtoull functions return the converted
value, if any. If no conversion could be performed, zero is returned.

C Compatibility Between Integers and Characters

How does C handle converting between integers and characters? Say you've declared an integer variable and ask the user for a number but they input a string instead. What would happen?
The user input is treated as a string that needs to be converted to an int using atoi or another conversion function. atoi will return 0 if the string cannot be interpreted as a number because it contains letters or other non-numeric characters.
You can read a bit more at the atoi documentation on MSDN - http://msdn.microsoft.com/en-us/library/yd5xkb5c(VS.80).aspx
Uh?
You always input a string. Then you parse this string to convert it to a number, with various ways (asking again, taking a default value, etc.) of handling various errors (overflow, incorrect chars, etc.).
Another thing to note is that in C, characters and integers are "compatible" to some degree. Any character can be assigned to an int. The reverse also works, but you'll lose information if the integer value doesn't fit into a char.
char foo = 'a'; // The ascii value representation for lower-case 'a' is 97
int bar = foo; // bar now contains the value 97
bar = 255; // 255 is 0x000000ff in hexadecimal
foo = bar; // foo now typically contains -1 (0xff) if char is a signed 8-bit type
unsigned char foo2 = foo; // foo2 now contains 255 (0xff)
As other people have noted, the data is normally entered as a string -- the only question is which function is used for doing the reading. If you're using a GUI, the function may already deal with conversion to integer and reporting errors and so on in an appropriate manner. If you're working with Standard C, it is generally easier to read the value into a string (perhaps with fgets()) and then convert it. Although atoi() can be used, it is seldom the best choice; the trouble is determining whether the conversion succeeded (and produced zero because the user entered a legitimate representation of zero) or not.
Generally, use strtol() or one of its relatives (strtoul(), strtoll(), strtoull()); for converting floating point numbers, use strtod() or a similar function. The advantages of the integer conversion routines include:
optional base selection (for example, base 10, or base 16 - hex, or base 8 - octal, or any of the above using standard C conventions: 007 for octal, 0x07 for hex, 7 for decimal).
optional error detection (by knowing where the conversion stopped).
The place I go for many of these function specifications (when I don't look at my copy of the actual C standard) is the POSIX web site (which includes C99 functions). It is Unix-centric rather than Windows-centric.
The program would crash, you need to call atoi function.

Resources