I have written a piece of code that I am using to research the behavior of different libraries and functions. In doing so, I stumbled upon some strange behavior with sscanf.
I have a piece of code that reads an input into a buffer, then tries to put that value into a numeric variable.
When I call sscanf from main using the input buffer and the format specifier %x, it yields a garbage value if the input string is shorter than the buffer. Let's say I enter 0xff: I get an arbitrarily large random number every time. But when I pass that buffer to a function, all calls to sscanf result in 255 (0xff) like I expect, regardless of type and format specifier mismatch.
My question is, why does this happen in the function main but not in the function test?
This is the code:
#include <stdio.h>
int test(char *buf){
unsigned short num;
unsigned int num2;
unsigned long long num3;
sscanf(buf, "%x", &num);
sscanf(buf, "%x", &num2);
sscanf(buf, "%x", &num3);
printf("%x", num);
printf("%x", num2);
printf("%x", num3);
return 0;
}
void main(){
char buf[16];
unsigned long long num;
printf("%s","Please enter the magic number:");
fgets(buf, sizeof(buf),stdin);
sscanf(buf, "%x", &num);
printf("%x\n", num);
test(&buf);
}
I expect the behavior to be consistent; all calls should fail, or all calls should succeed, but this is not the case.
I have tried to read the documentation and do experiments with different types, format specifiers, and so on. This behavior is present across all numeric types.
I have tried compiling on different platforms; gcc and Linux behave the same, as do Windows and msvc.
I also disassembled the binary to see if the call to sscanf differs between main() and test(), but that assembly is identical. It loads the pointer to the buffer into a register and pushes that register onto the stack, and calls sscanf.
Now just to be clear:
This happens consistently, and num in main is never equal to num, num2 or num3 in test, but num, num2 and num3 are always equal to each other.
I would expect this to cause undefined behavior and not be consistent.
Output when run - every time
./main
Please enter the magic number: 0xff
0xaf23af23423 <--- different every time
0xff <--- never different
0xff <--- never different
0xff <--- never different
The current reasoning I have is in one instance sscanf is interpreting more bytes than in the other. It seems to keep evaluating the entire buffer, getting impacted by residual data in memory.
I know I can make it behave correctly by either filling the buffer, with that last byte being a new line or using the correct format specifier to match the pointer type. "%llx" for main in this case. So that is not what I am wondering; I have made that error on purpose.
I am wondering why using the wrong format specifier works in one case but not in the other consistently when the code runs.
sscanf with %x should be used only with the address of an unsigned int. When an address of another object is passed, the behavior is not defined by the C standard.
With a pointer to a wider object, the additional bytes in the object may hold other values (possibly leftover from when the startup code prepared the process and called main). With a pointer to a narrower object, sscanf may write bytes outside of the object. With compiler optimization, a variety of additional behaviors are possible. These various possibilities may manifest as large numbers, corruption in data, program crashes, or other behaviors.
Additionally, printing with incorrect conversion specifiers is not defined by the C standard, and can cause errors in printf attempting to process the arguments passed to it.
Use %hx to scan into an unsigned short. Use %lx to scan into an unsigned long. Use %llx to scan into an unsigned long long. Also use those conversion specifiers when printing their corresponding types.
My question is, why does this happen in the function main but not in the function test?
One possibility is the startup code used a little stack space while setting up the process, and this left some non-zero data in the bytes that were later used for num in main. The bytes lower on the stack held zero values, and these bytes were later used for num3 in test.
The argument expression in this call
test(&buf);
has the type char ( * )[16] but the function expects an argument of the type char *
int test(char *buf){
There is no implicit conversion between these pointer types.
You need to call the function like
test( buf );
Also it seems there is a typo
printf("%s","Please enter the magic number:");
printf("%x\n", num);
The variable num is not initialized.
In this call
unsigned long long num;
//...
sscanf(buf, "%x", &num);
you are using the third argument of the type unsigned long long int * but the conversion specification "%x" expects an argument of the type unsigned int *. So the call has undefined behavior.
You need to write
sscanf(buf, "%llx", &num);
The same problem exists for the variable num in test, which has the type unsigned short
unsigned short num;
//...
sscanf(buf, "%x", &num);
You have to write
sscanf(buf, "%hx", &num);
You need to use the same length modifiers in calls of printf
printf("%hx", num);
printf("%x", num2);
printf("%llx", num3);
Here is a demonstration program.
#include <stdio.h>
int main( void )
{
char buf[] = "0xff\n";
unsigned short num;
unsigned int num2;
unsigned long long num3;
sscanf( buf, "%hx", &num );
sscanf( buf, "%x", &num2 );
sscanf( buf, "%llx", &num3 );
printf( "%hx\n", num );
printf( "%x\n", num2 );
printf( "%llx\n", num3 );
}
The program output is
ff
ff
ff
Why the value of the input variable is set to zero if I pass incorrectly ordered type specifier for id variable?
#include <stdio.h>
#include <stdlib.h>
#include<string.h>
#define MAX 100
int main()
{
int i=0;
int input;
char *name=(char *)malloc(sizeof(char)*MAX);
unsigned short int id;
printf("\nPlease enter input:\t");
scanf("%d", &input);
getchar();//to take \n
printf("\nEnter name.....\n");
fgets(name,MAX,stdin);
printf("\nEnter id: ");
scanf("%uh",&id);//type specifier should have been %hu
printf("%d",input);//value set to 0?
}
Why is input being overridden by scanf("%uh", &id) ?
You have what I call a memory overrun error. Microsoft (Visual Studio) calls it:
"Stack around the variable ‘id’ was corrupted"
Not all computer systems treat memory overrun errors the same way. Visual Studio/Windows catches this error and throws an exception.
OpenVMS would just execute the instruction and then continue on to the next instruction. In either case, if the program continues, its behavior will be undefined.
In plain English, you are writing more bits of data into a variable than it can contain. The extra bits are written to other memory locations not assigned to the variable. As mentioned by chux, statements that follow will have undefined behavior.
Solution:
As you already know, and as mentioned by others, change the format specifier from "uh" to "hu".
Things to note about your code:
As stated by Adrian, using "hu" as a format specifier will work.
This specifier expects a pointer to an unsigned short int variable.
When you specify “%uh”, scanf parses out the %u as the format specifier and treats the rest ('h') as literal text to be matched in the input.
The format specifier 'u' expects a pointer to an unsigned int variable (unsigned int *).
The format specifier 'hu' expects a pointer to an unsigned short variable (unsigned short int *).
Why the value of the input variable is set to zero if I pass incorrectly ordered type specifier to id variable?
unsigned short int id;
scanf("%uh",&id);//type specifier should have been %hu
scanf() first uses "%u" and looks for a matching unsigned * argument. As the code passed an unsigned short *, the result is undefined behavior (UB).
The result of 0 is not demo'd with printf("%d",input);. Perhaps OP meant printf("%d",id);?
In any case, code after the UB of scanf() is irrelevant. It might print 0 today or crash the code tomorrow - it is UB.
EDIT (After re-reading question/comments):
Mentat has nailed the problem! I'll leave some fragments of my original answer, with some additions ...
(1) The warning about using %uh should be accepted at face value: to input a short integer (signed or unsigned), you need the h modifier before the type specifier (u or d) - that's just the way scanf format specifiers work.
(2) Your code happens to work on MSVC, so I missed the point. With clang-cl I found your error: as the h size modifier is being ignored, the value read in by '%uh' is a (long or 32-bit) unsigned, which is being written to memory that you've allocated as a short (16-bit) unsigned. As it happens, that variable (id) is in memory next to input, so part of input is being overwritten by the high 16 bits of the 32-bit unsigned you are writing to id.
(3) Try entering a value of 65537 for your first input and then the value will still be modified (probably to 65536), but not cleared to zero.
(4) Recommend: Accept and upvote the answer from Mentat. Feel free to downvote my answer!
Yours, humbly!
pixel_data is a vector of char.
When I do printf(" 0x%1x ", pixel_data[0] ) I'm expecting to see 0xf5.
But I get 0xfffffff5 as though I was printing out a 4 byte integer instead of 1 byte.
Why is this? I have given printf a char to print out - it's only 1 byte, so why is printf printing 4?
NB. the printf implementation is wrapped up inside a third party API but just wondering if this is a feature of standard printf?
You're probably getting a benign form of undefined behaviour because the %x modifier expects an unsigned int parameter and a char will usually be promoted to an int when passed to a varargs function.
You should explicitly cast the char to an unsigned int to get predictable results:
printf(" 0x%1x ", (unsigned)pixel_data[0] );
Note that a field width of one is not very useful. It merely specifies the minimum number of digits to display and at least one digit will be needed in any case.
If char on your platform is signed then this conversion will convert negative char values to large unsigned int values (e.g. fffffff5). If you want to treat byte values as unsigned values and just zero extend when converting to unsigned int you should use unsigned char for pixel_data, or cast via unsigned char or use a masking operation after promotion.
e.g.
printf(" 0x%x ", (unsigned)(unsigned char)pixel_data[0] );
or
printf(" 0x%x ", (unsigned)pixel_data[0] & 0xffU );
Better to use the standard format flags
printf(" %#1x ", pixel_data[0] );
then your compiler puts the hex-prefix for you.
Use %hhx
printf("%#04hhx ", foo);
Here hh is the length modifier (the argument is converted to unsigned char before printing) and 4 is the minimum field width.
The width specifier in printf is actually a minimum width. You can do printf(" 0x%2x ", pixel_data[0] & 0xff) to print the lowest byte (notice the 2, to actually print two characters if pixel_data[0] is e.g. 0xffffff02).
I wrote code to print the sizes of different data types in C.
#include<stdio.h>
int main()
{
printf("%d", sizeof(int));//size of integer
printf("%d", sizeof(float));
printf("%d", sizeof(double));
printf("%d", sizeof(char));
}
This does not work, but if I replace %d with %ld, it works. I did not understand why I have to take long int to print a small range number.
Both of those are wrong: you must use %zu to print values of type size_t, which is what sizeof returns.
This is because different types have different sizes, and the conversion specifier must match the argument's type.
It's undefined behavior to mismatch like you do, so anything could happen.
This is because sizes mismatch. By either using %zu or using %u and casting to unsigned you may fix the problem.
Currently, your implementation is undefined behaviour.
printf("%u", (unsigned)sizeof(int));//size of integer
printf("%u", (unsigned)sizeof(float));
printf("%u", (unsigned)sizeof(double));
printf("%u", (unsigned)sizeof(char));
Since stdout is usually line buffered, don't forget to print \n at the end to get anything on the screen.
sizeof yields a value of type size_t. From the Standard,
6.5.3.4 The sizeof and _Alignof operators
5 The value of the result of both operators is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
The type behind size_t is implementation-defined. On my Linux system, size_t is defined as __SIZE_TYPE__.
In your case, it happens that size_t is implemented as unsigned long, which is wider than int.
I did not understand why I have to take long int to print a small range number.
Because size_t may represent values much larger than what an int can support; on my particular implementation, the max size value is 18446744073709551615, which definitely won't fit in an int.
Remember that the operand of sizeof may be a parenthesized type name or an object expression:
static long double really_big_array[100000000];
...
printf( "sizeof really_big_array = %zu\n", sizeof really_big_array );
size_t must be able to represent the size of the largest object the implementation allows.
You say it does not work, but you do not say what it does. The most probable reason for this unexpected behavior is:
the conversion specifier %d expects an int value. sizeof(int) has type size_t which is unsigned and, on many platforms, larger than int, causing undefined behavior.
The conversion specifier and the type of the passed argument must be consistent because different types are passed in different ways to a vararg function like printf(). If you pass a size_t and printf expects an int, it will retrieve the value from the wrong place and produce inconsistent output if at all.
You say it works if I put %ld. This conversion may work because size_t happens to have the same size as long on your platform, but that is only a coincidence: on 64-bit Windows, size_t is larger than long.
To correct the problem, you can either:
use the standard conversion specifier %zu or
cast the value as int.
The first is the correct fix but some C libraries do not support %zu, most notably Microsoft C runtime libraries prior to VS2013. Hence I recommend the second as more portable and sufficient for types and objects that obviously have a small size:
#include <stdio.h>
int main(void) {
printf("%d\n", (int)sizeof(int));
printf("%d\n", (int)sizeof(float));
printf("%d\n", (int)sizeof(double));
printf("%d\n", (int)sizeof(char));
return 0;
}
Also note that you do not output a newline: Depending on the environment, the output will not be visible to the user until a newline is output or fflush(stdout) is called. It is even possible that the output not be flushed to the console upon program exit, causing your observed behavior, but such environments are uncommon. It is recommended to output newlines at the end of meaningful pieces of output. In your case, not doing so would cause all sizes to be clumped together as a sequence of digits like 4481, which may or may not be what you expect.
I know that it is bad practice to print an integer with %lu, which is for unsigned long. In a project I was working on, I got a large number when trying to print 11 with %lu in the snprintf format (old code). I am using gcc 4.9.3.
I thought the code below would produce the wrong number, since snprintf is told to read more than the 4 bytes occupied. It doesn't, though. It works perfectly and reads everything correctly. Either it does not go past the 4 bytes into the unknown, or the extra 4 bytes in the long are full of zeros when it gets promoted to long from int.
Out of curiosity, I am wondering: when does printf print the wrong number? What conditions does it need to produce a wrong big number? There has to be garbage in the upper 4 bytes, but it seems that garbage is not set for me.
I read the answers here, but the code worked for me. I know it's a different compiler.
Printing int type with %lu - C+XINU
#include<inttypes.h>
#include<stdio.h>
int main(void){
uint32_t number1 = 11;
char sentence[40];
snprintf(sentence,40,"Small number :%lu , Big number:%lu \n",number1,285212672);
printf(sentence);
}
On OP's machine, uint32_t, unsigned long and int appear to be the same size (@R Sahu). OP's code is not portable and may produce incorrect output on another machine.
when does printf print the wrong number?
Use the matching printf() specifier for truly portable code. Using mis-matched specifiers may print the wrong number.
The output string may be well over 40 characters. Better to use a generous or right-sized buffer.
#include <inttypes.h>
#include <stdio.h>
int main(void){
uint32_t number1 = 11;
// char sentence[40];
char sentence[80];
snprintf(sentence, sizeof sentence,
"Small number :%" PRIu32 " , Big number:%d \n",
number1, 285212672);
// printf(sentence); // Best not to print a string using the printf() format parameter
fputs(sentence, stdout);
}
I have been given some code, and in my opinion there is something wrong with it.
I compile under XINU.
The following variables are relevant:
unsigned long ularray[];
int num;
char str[100];
There is a function that returns int:
int func(int i)
{
return ularray[i];
}
Now the code is like this:
num = func(i);
sprintf(str, "number = %lu\n", num);
printf(str);
The problem is that I get big numbers when printing with %lu, which is not correct.
If I change the %lu to %d, I get the correct number.
For example, with %lu I get 27654342, while with %d I get 26; the latter is correct.
The variables are given and the declaration of the function is given; I write the body, but it must return int.
My questions are:
I'm not familiar with sprintf; maybe the problem is there?
I assigned unsigned long to int and then print the int with %lu, is That correct?
How can I fix the problem?
Thanks in advance.
Thanks everyone for answering.
I just want to mention that I'm working under XINU. Well, I changed the order of compilation of the files and, what do you know, it's working and showing the same numbers with %lu and %d.
I'm well aware that assigning an unsigned long to an int and then printing with %lu is incorrect coding and may result in loss of data.
But as I said, the code is given; I couldn't change the variables or the printing command.
I had no errors or warnings, by the way.
I have no idea why changing the compilation order fixed the problem. If someone has an idea, you are more than welcome to share.
I want to thank all of you who tried to help me.
I assigned unsigned long to int and then print the int with %lu, is That correct?
No, it isn't correct at all. Think about it a bit! printf tries to access the memory represented by the variables you pass in, and in your case unsigned long is represented with more bits than int. Hence, when printf is told to print an unsigned long, it will read past your actual int into some other memory, which is probably garbage; hence the random numbers. If printf had a prototype mentioning an unsigned long explicitly, the compiler could perform an implicit conversion and fill the upper bits with zeroes, but since that's not the case, you have to use one of these solutions:
One, explicit cast your variable:
printf("%lu", (unsigned long)i);
Two, use the correct format specifier:
printf("%d", i);
Also, there are problems with assigning an unsigned long to an int: if the long contains a number that is too big, it won't fit into the int and will get truncated.
1) the misunderstanding is format specifiers in general
2) num is an int -- therefore, %d is correct when an int is what you want to print.
3) ideally
int func(int i) would be unsigned long func(size_t i)
and int num would be unsigned long num
and sprintf(str, "number = %d\n", num); would be sprintf(str, "number = %lu\n", num);
that way, there would be no narrowing and no conversions -- the type/values would be correctly preserved throughout execution.
and to be pedantic, printf should be printf("%s", str);
if you turn your warning levels way up, your compiler will warn you of some of these things. i have been programming for a long time, and i still leave the warning level obnoxiously high (by some people's standards).
If you have an int, use %d (or %u for unsigned int). If you have a long, use %ld (or %lu for unsigned long).
If you tell printf that you're giving it a long but only pass an int, you'll print random garbage. (Technically that would be undefined behavior.)
It doesn't matter if that int somehow "came from" a long. Once you've assigned it to something shorter, the extra bytes are lost. You only have a int left.
I assigned unsigned long to int and then print the int with %lu, is That correct?
No, and I suggest not casting to int first, or else simply using int as the array type. It seems senseless to store a much larger representation and only use a smaller one. Either way, the sprintf results will always be off until you properly pair the type of the variable with the format's conversion specifier. This means that if you pass an unsigned long, use %lu; if it's an int, use either %i or %d (for printf the two are equivalent; they differ only in scanf, where %d reads base 10 and %i also accepts octal and hexadecimal input with a 0 or 0x prefix).
How can I fix the problem?
Change the return type of your func and the type of num to unsigned long
unsigned long ularray[];
unsigned long num;
char str[100];
unsigned long func(int i)
{
return ularray[i];
}