Why does a single char appear as 4 bytes in C? - c

I am writing some C code in which I want to compare the hexadecimal value of one character with another. That works, but with some values like � I get a 4-byte value, which makes no sense in my comparison.
void obs_text(char* start, unsigned char debugLevel, unsigned char depth){
printf("%c -> 0x%x\n", start[0], start[0]);
}
I expected an output with two hexadecimal digits, but the actual output is ? -> 0xffffffef.
Does anyone understand what is happening? Thank you for your help.
I am compiling with gcc.
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.3)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
but I also tried on Debian and ran into the same problem.

Because %x means "display as an unsigned int (typically 4 bytes) in hexadecimal". See http://www.cplusplus.com/reference/cstdio/printf/
As noted there, you can use %hhx (added in C99) to get the behavior you were expecting (see more in this answer). The 0xffffffef itself comes from sign extension: on your platform plain char is signed, so the byte 0xEF is a negative value, and the default argument promotions widen it to int before printf ever sees it.

Use %hhx and cast the argument to unsigned char:
printf( "%c - 0x%hhx", start[0], (unsigned char) start[0] );
%x expects its corresponding argument to have type unsigned int. You need to use the hh length modifier to tell it that you're dealing with an unsigned char value.
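For instance, here is a minimal sketch of the obs_text function from the question with that fix applied (the cast and the hh modifier are the only changes):
#include <stdio.h>

void obs_text(char* start, unsigned char debugLevel, unsigned char depth){
    /* Cast to unsigned char so the byte is zero-extended instead of
       sign-extended, and use hh so %x treats it as an unsigned char. */
    printf("%c -> 0x%hhx\n", start[0], (unsigned char) start[0]);
}

int main(void){
    char s[] = "\xef";
    obs_text(s, 0, 0);  /* now prints the byte as 0xef */
    return 0;
}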

Related

What if I use %d instead of %ld in c?

I am a beginner and look at the book "C primer plus" and was confused about this saying
"To print a long value, use the %ld format specifier. If int and long are the same size on your system, just %d will suffice, but your program will not work properly when transferred to a system on which the two types are different, so use the %ld specifier for long."
I tested it myself as the following code:
int num = 2147483647;
short num_short = 32767;
long num_long = 2147483647;
printf("int: %d; short: %d; long: %d", num, num_short, num_long);
the program worked okay.
I searched online and found this question: %d with Long Int
An answer said:
it works because long int and int actually have the same numeric representation: four bytes, two's complement. On another platform (for example x86-64 Linux), that might not be the case, and you would probably see some sort of problem
My computer is 64-bit. int is 32 bits. long int is 32 bits. short int is 16 bits (I checked; it's all correct). So you can see that the int type and the short int type are different. That answer also said it would cause an error if the types had different numeric representations. So what does the author mean?
Whatever happens depends on the implementation. The standard is quite clear that this invokes undefined behavior:
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
7.21.6.1p9
Roberto explained in detail why it "works" in his answer.
What the author of your book says is correct.
Your test invokes undefined behavior: the standard doesn't specify what happens when the wrong format specifier is used, so each compiler implementation is free to decide how to handle such cases.
Nevertheless, with some common sense (and a little knowledge of how things work "in practice"), the behavior you observed can be explained.
Every format specifier tells the compiler where, in the list of parameters, the value to be printed can be found. For this reason, and this is the core of your book's assertion, passing an integer of an unexpected length makes the following format specifiers look for their parameters in the wrong place.
Example 1: a sequence of 32 bits integers
Let's start with a "normal" sequence, in order to have a base example.
int a=1, b=2;
printf("%d %d\n", a, b);
The format string tells printf that two 4-byte integers will be found after the format string. The parameters are actually placed on the stack in the expected way:
-------------------------------------------------
... format string | a (4 bytes) | b (4 bytes) |
-------------------------------------------------
Example 2: why doesn't printing a 64-bit long with %d work?
Let's consider the following printf:
long int a=1;
int b=2;
printf("%d %d\n", a, b);
The format string tells printf that two 4-byte integers will be found after the format string, but the first parameter takes 8 bytes instead of 4:
-------------------------------------------------------------
... format string | a (4 bytes) + (4 bytes) | b (4 bytes) |
-------------------------------------------------------------
                    ^                         ^
                    Compiler                  Compiler
                    expects 'a'               expects 'b'
                    here                      here
So the output would be
1 0
because b is read from where the 4 most significant bytes of a sit, and (on a little-endian machine) they are all 0s.
Example 3: why does printing a 16-bit short with %d work?
Let's consider the following printf:
short int a=1;
int b=2;
printf("%d %d\n", a, b);
The format string tells printf that two 4-byte integers will be found after the format string. The first parameter occupies only two bytes instead of 4, but... we are lucky, because on a 32-bit platform parameters are 4-byte aligned! (In fact, the default argument promotions convert a short to an int before it is passed to a variadic function, so it really does occupy a full 4 bytes.)
---------------------------------------------------------------------
... format string | a (2 bytes) | (2 bytes PADDING) | b (4 bytes) |
---------------------------------------------------------------------
                    ^                                 ^
                    Compiler                          Compiler
                    expects 'a'                       expects 'b'
                    here                              here
So the output would be
1 1
because b is read from the correct place. We would have problems with the representation of a if the alignment were done with non-zero padding, but that is not the case.
So the real difference when %d is used for a short is just the representation of signed values, since the sign bit is expected to sit in the most significant position.
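As a quick check of the book's advice, here is a minimal sketch of the same test with matching length modifiers, which stays correct however wide int and long happen to be on the platform:
#include <stdio.h>

int main(void) {
    short num_short = 32767;
    int num = 2147483647;
    long num_long = 2147483647L;

    /* %hd, %d and %ld each match their argument's (promoted) type. */
    printf("short: %hd; int: %d; long: %ld\n", num_short, num, num_long);
    return 0;
}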

Problem with using scanf function under stm32

I've come across something interesting while using sscanf() on an STM32F437:
uint8_t var;
st = sscanf(&line[off], "%x", &var);
st = sscanf(&line[off], "%hhx", &var);
When I try to compile the first line, I get a suggestion from gcc to use "%hhx" instead of "%x". But when I change it to the second line, the suggestion from gcc disappears and the result of the scan is wrong.
When &line[off] points to the string 52, the first sscanf(..."%x"...) works correctly, giving 0x52, but the second sscanf(..."%hhx"...) produces 0x34.
It seems as if sscanf(..."%hhx"...) interprets 52 as the decimal value 52 and then converts it to the hexadecimal value 0x34.
I am using arm-none-eabi-gcc version 9.2.0.
Did I miss something, or is this a bug in sscanf()?
You are linking against what is commonly referred to as "newlib-nano". The nano version of newlib has limited standard library support: it doesn't support all C99 length modifiers, such as ll or hh, in either printf or scanf.
The solution is to link against the full implementation of newlib (remove -specs=nano.specs or similar from the linker options), avoid the hh length modifier when compiling with newlib-nano, or use another method of converting a string to an integer.
%x without a length modifier before the x means scanf expects a pointer to unsigned int.
%hh is used for signed char or unsigned char.
%hhx is used for signed char or unsigned char in hex format.
"%"SCNu8 is used for scanning uint8_t.
"%"SCNx8 is used for uint8_t in hex format.
uint8_t is most likely 100% equivalent to unsigned char on any system.
This means that with "%x", &var you lie to the compiler: (assuming a 32-bit CPU) you tell it "go ahead and read a 32-bit integer", then pass a memory address where only 8 bits of valid data are stored. This is undefined behavior and anything can happen.
Speculating about why undefined behavior bugs manifest in a certain way on your specific system is rather meaningless.
Instead, do this:
#include <inttypes.h>
uint8_t var;
st = sscanf(&line[off], "%"SCNx8, &var);
Please note that sscanf is a terribly slow and dangerous function and should not be used on embedded systems. Always use strtoul instead.
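A minimal sketch of that strtoul alternative (parse_hex_u8 and its error convention are illustrative, not part of any library):
#include <stdint.h>
#include <stdlib.h>

/* Parse a hex field such as "52" into a uint8_t; returns 0 on success. */
static int parse_hex_u8(const char *s, uint8_t *out) {
    char *end;
    unsigned long v = strtoul(s, &end, 16);  /* base 16 for hex input */
    if (end == s || v > 0xFF)                /* no digits, or too large */
        return -1;
    *out = (uint8_t)v;
    return 0;
}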

Problems with the read syscall [duplicate]

pixel_data is a vector of char.
When I do printf(" 0x%1x ", pixel_data[0] ) I'm expecting to see 0xf5.
But I get 0xfffffff5 as though I was printing out a 4 byte integer instead of 1 byte.
Why is this? I have given printf a char to print out - it's only 1 byte, so why is printf printing 4?
NB. the printf implementation is wrapped up inside a third party API but just wondering if this is a feature of standard printf?
You're probably getting a benign form of undefined behaviour because the %x modifier expects an unsigned int parameter and a char will usually be promoted to an int when passed to a varargs function.
You should explicitly cast the char to an unsigned int to get predictable results:
printf(" 0x%1x ", (unsigned)pixel_data[0] );
Note that a field width of one is not very useful. It merely specifies the minimum number of digits to display and at least one digit will be needed in any case.
If char on your platform is signed then this conversion will convert negative char values to large unsigned int values (e.g. fffffff5). If you want to treat byte values as unsigned values and just zero extend when converting to unsigned int you should use unsigned char for pixel_data, or cast via unsigned char or use a masking operation after promotion.
e.g.
printf(" 0x%x ", (unsigned)(unsigned char)pixel_data[0] );
or
printf(" 0x%x ", (unsigned)pixel_data[0] & 0xffU );
Better, use the standard # format flag:
printf(" %#1x ", pixel_data[0] );
and printf adds the hex prefix for you.
Use %hhx
printf("%#04hhx ", foo);
The field width (04 here) is a minimum width; the hh length modifier makes %x match the unsigned char argument.
The width specifier in printf is actually a minimum width. You can do printf(" 0x%2x ", pixel_data[0] & 0xff) to print the lowest byte (note the 2, to actually print two characters if pixel_data[0] is e.g. 0xffffff02).
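Putting these answers together, a minimal sketch (assuming pixel_data holds the byte 0xF5, as in the question):
#include <stdio.h>

int main(void) {
    char pixel_data[] = { (char)0xF5 };

    /* Zero-extend through unsigned char before widening to unsigned. */
    printf(" 0x%x \n", (unsigned)(unsigned char)pixel_data[0]);  /* 0xf5 */
    /* Or mask off the sign-extended high bits after promotion. */
    printf(" 0x%02x \n", (unsigned)pixel_data[0] & 0xffU);       /* 0xf5 */
    return 0;
}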

C sprintf breaks with byte parameters (Keil compiler)

I have code running in 2 projects/platforms. It works in one, not the other. Code is like this:
uint8_t val = 1;
char buff[16];
sprintf(buff, "%u", val);
The expected output is "1" (gcc) but on one compiler (Keil) it returns "511", which in hex is 0x1FF. It looks like it's not promoting the byte to int with this compiler. This is confirmed by the fact that it works fine if I do this:
sprintf(buff, "%u", (int)val);
My question is this: why does one compiler do what I consider the 'right thing', and one does not? Is it my incorrect expectations/assumptions, a compiler setting, or something else?
For maximum portability, you can use these macros from inttypes.h (there are others):
PRId8, PRIx8, PRIu8
PRId16, PRIx16, PRIu16
PRId32, PRIx32, PRIu32
Normally (as I expected):
#define PRIu8 "u"
But for the Keil compiler in this case:
#define PRIu8 "bu"
e.g.,
printf("0x%"PRIx8" %"PRIu16"\n", byteValue, wordValue);
That's pretty cumbersome though. I suggest more friendly compilers.
It's amazing what you don't know about this stuff even after doing it for decades.
Your assumption may be correct or incorrect; it depends on the compiler implementation. Every modern (or should I say smart) compiler will do what you described. But with Keil, as of version 9.02, you need to specify the correct variable length for printf.
This is Keil C's way of handling the whole printf family of functions.
You need to specify exactly how long the variable is. All the regular specifiers, including %d, %x, and %u, are for 16-bit (unsigned) integers. Use the modifier 'b' for 8-bit and 'l' for 32-bit values. If you give the wrong length, you get the wrong number. Even worse, all the following variables come out wrong too.
For example, for an 8-bit 'char' you use '%bd' (or %bu and %bx), and %ld, %lu, and %lx for a 32-bit 'long'.
char c = 0xab;
printf("My char number is correctly displayed as '0x%02bx'\n", c);
Note that, likewise, reading numeric data with sscanf works the same way. The following example reads a 32-bit long variable using sscanf:
long var;
char *mynum = "12345678";
sscanf(mynum, "%ld", &var);
Variable var contains the number 12345678 after sscanf.
The list below shows the lengths used with the printf family for Keil:
%bd, %bx, %bu - should be used for 8-bit variables
%d, %x, %u - should be used for 16-bit variables, and
%ld, %lx, %lu - should be used for 32-bit variables
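For code that must build with both gcc and Keil, a minimal sketch using the inttypes.h macros from the answer above, so the platform's own headers pick the right length modifier:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint8_t val = 1;
    char buff[16];

    /* PRIu8 expands to "u" on gcc and to "bu" on Keil, so the same
       line prints "1" with either toolchain. */
    sprintf(buff, "%" PRIu8, val);
    printf("%s\n", buff);
    return 0;
}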

Types questions in ANSI C

I have a few questions about types in ANSI C:
1. What's the difference between "\x" at the beginning of a character and 0x at the beginning of a character (or in any other case, for that matter)? AFAIK, they both mean hexadecimal... so what's the difference?
2. When casting a char to (unsigned), not (unsigned char), what does that mean? Why is (unsigned)'\xFF' != 0xFF?
Thanks!
what's the difference between "\x" at the beginning of a char and 0x at the beginning of a char
The difference is that 0x12 is used for specifying an integer in hexadecimal, while "\x" is used for string literals. An example:
#include <stdio.h>
int main(){
    int ten = 0xA;
    char* tenString = "1\x30";
    printf("ten as integer: %d\n", ten);
    printf("ten as string: %s\n", tenString);
    return 0;
}
Both printf calls should output "10" (try to understand why).
when casting char to (unsigned), not (unsigned char) - what does it mean? why (unsigned)'\xFF' != 0xFF?
"unsigned" is just an abbreviation for "unsigned int". So you're casting from char to int. This will give you the numeric representation of the character in the character set your platform uses. Note that the value you get for a character is platform-dependent (typically depending on the default character encoding). For ASCII characters you will (usually) get the ASCII code, but anything beyond that will depend on platform and runtime configuration.
Understanding what a cast from one type to another does is quite complicated (and often, though not always, platform-dependent), so avoid it if you can. Sometimes it is necessary, though. See e.g. need-some-clarification-regarding-casting-in-c
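A minimal sketch of that difference, assuming a platform where plain char is signed and int is 32 bits:
#include <stdio.h>

int main(void) {
    /* Sign-extends: '\xFF' has the value -1 here, so the cast gives
       0xffffffff. */
    printf("(unsigned)'\\xFF' = 0x%x\n", (unsigned)'\xFF');
    /* Zero-extends: going through unsigned char first keeps 0xff. */
    printf("(unsigned)(unsigned char)'\\xFF' = 0x%x\n",
           (unsigned)(unsigned char)'\xFF');
    return 0;
}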
