Scanf is mutating my number - c

I have a program that reads a number from user input and appends the appropriate ordinal ending (1st, 11th, 312th, etc.).
So I gave it to my friend to find bugs / break it.
The code is
int number;
printf("\n\nNumber to end?: ");
scanf("%d",&number);
getchar();
numberEnder(number);
When 098856854345712355 was input, scanf passed 115573475 to the rest of the program.
Why? 115573475 is less than the maximum value for an int.

INT_MAX is typically (2^31) - 1, which is ~2 billion...
... and if you look at 98856854345712355 & 0xffffffff you will find that it is 115573475.

The C99 standard states in the fscanf section (7.19.6.2-10):
...the result of the conversion is placed in the object pointed to by
the first argument following the format argument that has not already
received a conversion result. If this object does not have an
appropriate type, or if the result of the conversion cannot be
represented in the object, the behavior is undefined [emphasis added].
This is true in C11 as well, though fscanf is in a different section (7.21.6.2-10).
You can read more about undefined behavior in this excellent answer, but it basically means the standard doesn't require anything of the implementation. Following from that, you shouldn't rely on the fact that today scanf uses the lower bits of the number because tomorrow it could use the high bits, INT_MAX, or anything else.

098856854345712355 is simply too big a number for an int. Use a wider type:
long long number;
printf("\n\nNumber to end?: ");
scanf("%lld",&number);
getchar();
numberEnder(number);
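If numberEnder genuinely needs an int, the more robust fix is to parse the line yourself and range-check it before converting. Here is a minimal sketch (the buffer size and error message are my own choices, and numberEnder is assumed to come from the question):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

int main(void)
{
    char buf[64];
    printf("\n\nNumber to end?: ");
    if (!fgets(buf, sizeof buf, stdin))
        return 1;
    errno = 0;
    char *end;
    long long v = strtoll(buf, &end, 10);  /* detects overflow via ERANGE */
    if (end == buf || errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        fprintf(stderr, "number out of range for int\n");
        return 1;
    }
    /* numberEnder((int)v);  -- the conversion is now known to be safe */
    return 0;
}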
The answer to why you get this specific garbage value is GIGO (garbage in, garbage out). The standard doesn't specify the result for bad input; it's implementation-specific. I would hazard a guess that if you apply a 32-bit mask to the input, that's the number you would get.
Just out of curiosity, I looked up an implementation of scanf.
It boils down to
…
res = strtoq(buf, (char **)NULL, base);
…
*va_arg(ap, long *) = res;
…
which should truncate off the high bits in the cast. That would support the bit-mask guess, but it would only be the case for this implementation.
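To make the low-bits guess concrete, here is a minimal sketch (the value is taken from the question; the (int) conversion is implementation-defined in C99, but mainstream two's-complement compilers keep the low 32 bits, reproducing the value seen above):

#include <stdio.h>

int main(void)
{
    long long big = 98856854345712355LL;
    printf("%lld\n", big & 0xffffffffLL);  /* 115573475: the low 32 bits */
    printf("%d\n", (int)big);              /* same value on typical implementations */
    return 0;
}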

Effect of type casting on printf function

Here is a question from my book.
Actually, I don't know what the effect on the printf function will be, so I tried the statements on my own system. Here is my code:
#include <stdio.h>

int main(void) {
    int x = 4;
    printf("%hi\n", x);
    printf("%hu\n", x);
    printf("%i\n", x);
    printf("%u\n", x);
    printf("%li\n", x);
    printf("%lu\n", x);
    return 0;
}
So, the output is very simple. But is this really the solution to the above problem?
There are numerous problems in this question that make it unsuitable for teaching C.
First, to work on this problem at all, we have to assume a non-standard C implementation is used. In standard C, %x is a complete conversion specification, so %xu and %xd cannot be; the conversion specification has already ended before the u or d. And the use of z in a conversion specification interferes with its standard use for size_t.
Nonetheless, let’s assume this C variant does not have those standard conversion specifications and instead uses the ones shown in the table but that this C variant otherwise conforms to the C standard with minimal changes.
Our next problem is that, in Y num = 42;, we have a plain Y, not the signed Y or unsigned Y shown in the table. Let’s assume signed Y is intended.
Then num is a signed four-bit integer. The greatest value it can represent is 0111₂ = 7₁₀. So it cannot represent 42. Attempting to initialize it with 42 results in a conversion specified by C 2018 6.3.1.3, which says, in part:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The result is we do not know what value is in num or even whether the program continues to execute; it may trap and terminate.
Well, let’s assume this implementation just takes the low bits of the value. 42 is 101010₂, so its low four bits are 1010. So if the bits in num are 1010, it is negative. The C standard permits several methods of representation for negative numbers, but we will assume the overwhelmingly most common one, two’s complement, so the bits 1010 in num represent −6.
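There is no real 4-bit integer type in C, but a signed bit-field behaves like the hypothetical Y and lets you check this reasoning. A minimal sketch (the struct and field names are mine; the stored value is implementation-defined, though common two's-complement compilers keep the low four bits):

#include <stdio.h>

struct four_bit {
    signed int y : 4;   /* stand-in for the book's 4-bit Y */
};

int main(void)
{
    struct four_bit n;
    n.y = 42;            /* implementation-defined: 42 does not fit in 4 bits */
    printf("%d\n", n.y); /* typically prints -6 (low bits 1010 read as two's complement) */
    return 0;
}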
Now, we get to the printf statements. Except the problem text shows Printf, which is not defined by the C standard. (Are you sure this problem relates to C code at all?) Let’s assume it means printf.
In printf("%xu",num);, if the conversion specification is supposed to work like the ones in standard C, then the corresponding argument should be an unsigned X value that has been promoted to int for the function call. As a two-bit unsigned integer, an unsigned X can represent 0, 1, 2, or 3. Passing it −6 is not defined. So we do not know what the program will print. It might take just the low two bits, 10, and print “2”. Or it might use all the bits and print “-6”. Both of those would be consistent with the requirement that the printf behave as specified for values that are in the range representable by unsigned X.
In printf("%xd",num); and printf("%yu",num);, the same problem exists.
In printf("%yd",num);, we are correctly passing a signed Y value for a signed Y conversion specification, so “-6” is printed.
Then printf("%zu",num); has the same problem with the value mismatched for the type.
Finally, in printf("%zd",num);, the value is again in the correct range, and “-6” is printed.
From all the assumptions we had to make and all the points where the behavior is undefined, you can see this is a terrible exercise. You should question the quality of the book it is in and of any school using it.

Result of printf("%d", &a)

I have a code in C .
int a;
a = 10;
printf("%d", &a);
I want to know whether garbage will be printed or an error message will be shown.
Basically, I am interested in knowing how printf works. Does it keep a list of variables in some sort of table? If there is such a table, where would it take the value of &a from?
This is a beginners question and deserves to be answered properly. (I'm startled by the downvotes and the non-constructive comments)
Let me give a walk-through, line by line:
int a;
The first line declares a variable with the name a. The variable is of type integer, and with common compilers (Microsoft Visual C++ on Windows, GCC on Linux, Clang on macOS) it is usually 32 bits wide. The integer is signed because you did not specify unsigned. This means it can represent values ranging from −2,147,483,648 to 2,147,483,647.
a =10;
The second line assigns the value 10 to that variable
printf ("%d", &a);
The third line is where you get the surprising result. "%d" is the "format string"; it defines how the variables given as further arguments are formatted and subsequently printed. The format string consists of normal text (which is printed as-is) and conversion specifications. The conversion specifications start with the character % and end with a letter; the letter specifies the argument type that is expected. d in the above case expects an integer value.
The problem with your code is that you do not pass an integer value; you pass the address of an integer value (with the address-of operator &). The correct statement would be:
printf ("%d", a);
Simple.
I recommend that you read a good C book. I recommend "The C Programming Language", which is by the original authors of the language. You can find this book on Amazon, but it is also available online.
You can find the same answers by reading the standard, but to be honest, those documents are not easy reading. You can find a draft of the C11 standard here; the description of the formatting options starts on page 309. (Drafts are usually good enough for programming purposes, and they are usually available for free.)
This is undefined behaviour.
If you are new to C, this may be a surprise. However the C specification defines the behaviour of certain programs (called "correct programs"). If you do not follow the rules in the specification, then your program is not correct, and literally anything may happen. The above link goes into more detail.
Your program doesn't follow the rules because the %d format specifier for printf must have a corresponding argument of type int (after the default argument promotions), but instead you pass an argument of type int *.
Since it is undefined behaviour, the output is meaningless and it is generally not worthwhile to investigate.
It will print the address of the variable 'a'. Remember that the & operator returns the address of the operand.
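That said, passing a pointer where %d expects an int is still undefined; the portable way to print an address is the %p conversion with the argument converted to void *. A minimal sketch:

#include <stdio.h>

int main(void)
{
    int a = 10;
    printf("%d\n", a);           /* the value: 10 */
    printf("%p\n", (void *)&a);  /* the address; %p requires a void * argument */
    return 0;
}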

When can I get away with not declaring int with signed?

In C, signed integers like -1 supposedly have to be declared with the keyword signed, like so:
signed int i = -1;
However, I tried this:
signed int i = -2;
unsigned int i = -2;
int i = -2;
and all 3 cases print out -2 with printf("%d", i). Why?
Since you confirmed you are printing using:
printf("%d", i);
this is undefined behavior in the unsigned case. This is covered in the draft C99 standard, section 7.19.6.1 The fprintf function (which also covers printf's format specifiers); it says in paragraph 9:
If a conversion specification is invalid, the behavior is undefined.248)[...]
The standard defines undefined behavior in section 3.4.3 as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
and further notes:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Finally, we can see that int is the same as signed int by going to section 6.7.2 Type specifiers; paragraph 2 groups int as follows:
int, signed, or signed int
and paragraph 5 later says:
Each of the comma-separated sets designates the same type, except that for bit-field[...]
The way an integer variable is printed is subject to the format string that you pass to printf:
If you use %d, then you'll be printing it as a signed integer.
If you use %u, then you'll be printing it as an unsigned integer.
printf has no way of knowing what you pass to it. The C compiler performs the default argument promotions when passing the arguments, and then the function itself reinterprets the values in accordance with the format specifiers you pass, because it has no other information regarding the type of the value passed.
When you pass an unsigned int to printf in a position of %d, it is undefined behavior. Your program is incorrect, and it could print anything.
It happens that on hardware that represent negative numbers in two's complement representation you get the same number that you started with. However, this is not a universal rule.
unsigned int i = -2; // i actually holds 4294967294 (UINT_MAX - 1)
printf("%d", i);     // printf reinterprets those bits as an int, which is -2 on two's complement, hence the same output
You've got 2 things going on:
Signed and unsigned are different ways of interpreting the same 64 (or 32, or whatever) bits.
printf is a variadic function which accepts parameters of different types.
You passed a signed value (-2) to an unsigned variable, and then asked printf to interpret it as signed.
Remember that "signed" and "unsigned" have to do with how arithmetic is done on the numbers.
The printf family internally reinterprets whatever you pass in based on the format designators. (This is the nature of variadic functions that accept more than one type of parameter; they cannot use traditional type-safety mechanisms.)
This is all very well, but not all things will work the same.
Addition and subtraction work the same on most architectures (as long as you're not on some oddball architecture that doesn't use two's complement for representing negative numbers).
Multiplication and division may also work the same.
Inequality comparisons are the hardest to predict, and I have been bitten a number of times doing a comparison between signed and unsigned that I thought would be OK because the values were in the small signed-number range.
That's what "undefined" means. The behaviour is left to the compiler and hardware implementers and cannot be relied on to be the same between architectures, or even over time on the same architecture.
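To make the comparison pitfall concrete, here is a minimal sketch. The behaviour here is actually well defined (the usual arithmetic conversions turn -1 into UINT_MAX), but it is exactly the kind of surprise described above:

#include <stdio.h>

int main(void)
{
    int s = -1;
    unsigned int u = 1;
    if (s < u)     /* s is converted to unsigned: UINT_MAX < 1 is false */
        printf("s < u\n");
    else
        printf("s >= u (signed/unsigned surprise)\n");
    return 0;
}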

Why doesn't scanf %llu validate the input?

I'm using scanf("%llu%c", &myUnsignedLong, &checkerNewLine) to validate a number > 0 && < 2^64, where checkerNewLine is used to clear the buffer if someone tries to insert a letter (while (getchar() != '\n');).
But if I try inserting a negative number, like -541231, scanf succeeds and returns 2 (the number of parameters matched). The number stored in myUnsignedLong is then 2^64 - 541231, which is NOT my intention.
Any simple way to solve this? Thanks a lot
From the specification (7.19.6.2):
u Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
Note that the input is optionally signed. This explains why the code behaves as it does.
However, I can't offer any good explanation for why it's specified this way. Since you can't store a negative value in an unsigned variable in a well-defined way, I suspect that this may actually invoke undefined behavior.
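One common workaround is to read a whole line and parse it with strtoull, rejecting a leading minus sign explicitly, since strtoull (like %llu) would otherwise accept and negate it. A minimal sketch (the function name read_u64 and the buffer size are my own choices):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <ctype.h>

int read_u64(unsigned long long *out)
{
    char buf[64];
    if (!fgets(buf, sizeof buf, stdin))
        return 0;
    char *p = buf;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-')                    /* reject negative input up front */
        return 0;
    errno = 0;
    char *end;
    unsigned long long v = strtoull(p, &end, 10);
    if (end == p || errno == ERANGE)  /* no digits, or out of range */
        return 0;
    *out = v;
    return 1;
}

Call it in place of the scanf line; a return of 0 means the input was invalid.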

Wrong output from printf of a number

#include <stdio.h>

int main()
{
    double i = 4;
    printf("%d", i);
    return 0;
}
Can anybody tell me why this program gives output of 0?
When you create a double initialised with the value 4, its 64 bits are filled according to the IEEE-754 standard for double-precision floating-point numbers. A floating-point number is divided into three parts: a sign, an exponent, and a fraction (also known as a significand, coefficient, or mantissa). The sign is one bit and denotes whether the number is positive or negative. The sizes of the other fields depend on the overall width of the type; for a 64-bit double, the exponent is 11 bits and the fraction 52 bits. To decode the number, the following formula is used:
1.Fraction × 2^(Exponent − 1023)
In your example, the sign bit is 0 because the number is positive, the fractional part is 0 because the number is initialised as an integer, and the exponent part contains the value 1025 (2 with an offset of 1023). The result is:
1.0 × 2^2
Or, as you would expect, 4. The binary representation of the number (divided into sections) looks like this:
0 10000000001 0000000000000000000000000000000000000000000000000000
Or, in hexadecimal, 0x4010000000000000. When you pass a value to printf using the %d specifier, it attempts to read sizeof(int) bytes from the parameters you passed. In your case, sizeof(int) is 4, or 32 bits. Since the first (rightmost, on a little-endian machine) 32 bits of the 64-bit floating-point number you supply are all 0, it stands to reason that printf produces 0 as its integer output. If you were to write:
printf("%d %d", i);
Then you might get 0 1074790400, where the second number is equivalent to 0x40100000. I hope you see why this happens. Other answers have already given the fix for this: use the %f format specifier and printf will correctly accept your double.
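Incidentally, if you want to inspect those bits in a well-defined way (rather than through mismatched printf arguments), copy the double into an integer of the same size with memcpy. A minimal sketch:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    double d = 4.0;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* well-defined view of the object representation */
    printf("0x%016llx\n", (unsigned long long)bits);  /* prints 0x4010000000000000 */
    return 0;
}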
Jon Purdy gave you a wonderful explanation of why you were seeing this particular result. However, bear in mind that the behavior is explicitly undefined by the language standard:
7.19.6.1.9: If a conversion specification is invalid, the behavior is undefined.248) If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
(emphasis mine) where "undefined behavior" means
3.4.3.1: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
IOW, the compiler is under no obligation to produce a meaningful or correct result. Most importantly, you cannot rely on the result being repeatable. There's no guarantee that this program would output 0 on other platforms, or even on the same platform with different compiler settings (it probably will, but you don't want to rely on it).
%d is for integers:
#include <stdio.h>

int main()
{
    int i = 4;
    double f = 4;
    printf("%d", i);    // prints 4
    printf("%0.f", f);  // prints 4
    return 0;
}
Because the language allows you to screw up, and you happily did.
More specifically, %d is the format for an int, and therefore printf("%d") consumes as many bytes from the arguments as an int takes. But a double is much larger, so printf reads only part of it; in this case, the bytes it reads happen to be zero. Use %f (since C99, printf also accepts %lf as a synonym).
Because "%d" specifies that you want to print an int, but i is a double. Try printf("%f\n"); instead (the \n specifies a new-line character).
The simple answer to your question is, as others have said, that you're telling printf to print an integer (for example, a variable of type int) while passing it a double-precision number (your variable is of type double), which is wrong.
Here's a snippet from the printf(3) linux programmer's manual explaining the %d and %f conversion specifiers:
d, i The int argument is converted to signed decimal notation. The
precision, if any, gives the minimum number of digits that must
appear; if the converted value requires fewer digits, it is
padded on the left with zeros. The default precision is 1.
When 0 is printed with an explicit precision 0, the output is
empty.
f, F The double argument is rounded and converted to decimal notation
in the style [-]ddd.ddd, where the number of digits after the
decimal-point character is equal to the precision specification.
If the precision is missing, it is taken as 6; if the precision
is explicitly zero, no decimal-point character appears. If a
decimal point appears, at least one digit appears before it.
To make your current code work, you can do two things. The first alternative has already been suggested - substitute %d with %f.
The other thing you can do is to cast your double to an int, like this:
printf("%d", (int) i);
The more complex answer (addressing why printf acts the way it does) was given briefly by Jon Purdy. For a more in-depth explanation, have a look at the Wikipedia articles on floating-point arithmetic and double precision.
Because i is a double and you tell printf to use it as if it were an int (%d).
@jagan, regarding the sub-question: "What is the leftmost third byte? Why is it 00000001? Can somebody explain?"
10000000001 is 1025 in binary format.
