Why is the char buffer content size 4 bytes? - c

I call recv() to receive data from a socket and print the last byte of the buffer in hex:
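char rbuff[B_BUF];
ssize_t r_n;
/* flags = 0: MSG_EOF is not one of the POSIX recv() flags */
while ((r_n = recv(sfd, rbuff, B_BUF, 0)) > 0)
{
    /* stop on 0 (peer closed) or -1 (error) before touching rbuff[r_n-1] */
    printf("r_n:%zd eob_p:%x\n", r_n, rbuff[r_n-1]);
    memset(rbuff, 0, B_BUF);
}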
The result is:
r_n:1674 eob_p:3c
r_n:1228 eob_p:76
r_n:2456 eob_p:ffffff81
r_n:1228 eob_p:4b
r_n:1228 eob_p:49
r_n:2456 eob_p:57
r_n:1417 eob_p:ffffff82
I am confused about why the result is 4 bytes.
I wrote another program to print the file that was saved from the buffer:
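#include <stdio.h>
#include <string.h>

int main(void)
{
    char buff[11686];
    memset(buff, 0, sizeof buff);
    FILE *in = fopen("web/www.sse.com.cn.html", "r");
    if (in == NULL)
        return 1;                       /* could not open the saved page */
    fread(buff, sizeof buff, 1, in);
    fclose(in);
    for (int i = 0; i < 11686; i++)
    {
        printf("buff[%d]:%x\n", i, buff[i]);
    }
    return 0;
}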
The result is:
....
buff[11684]:60
buff[11685]:ffffff82
Why is buff[11685] printed as ffffff82, i.e. 4 bytes wide, when buff is a char array?

Diagnosis
In the second example, buff is a char buffer, plain char is a signed type on your machine, and you are storing negative values in buff. When those values are converted to int in the call to printf(), they become negative integers (of small magnitude), which are then printed in hex.
ISO/IEC 9899:2018
Strictly, the quotations below come from an online HTML draft of C11, not C18. AFAIK, these details have not changed between C90, C99, C11 and C18 anyway.
The standard says that the plain char type is equivalent to either signed char or unsigned char.
§6.2.5 Types ¶15:
The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.45)
45) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
§6.3.1.1 Boolean, characters and integers ¶2,3:
2 The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.
3 The integer promotions preserve value including sign. As discussed earlier, whether a "plain" char is treated as signed is implementation-defined.
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclasses.
§6.5.2.2 Function calls ¶6,7:
6 If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined. If the function is defined with a type that does not include a prototype, and the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined, except for the following cases:
one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
both types are pointers to qualified or unqualified versions of a character type or void.
7 If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
Exegesis
Note the last two sentences of §6.5.2.2 ¶7: the char arguments after the format string are subject to the default argument promotions, so the integer promotions turn them into a (signed) int, and the negative values remain negative. Since an int has 4 bytes on your machine, and all the machines you're likely to have available use two's-complement arithmetic, the most significant 3 bytes of the value will be 0xFF each.
Prescription
To always print 2-digit hex for the characters, use %.2X (or %.2x if you prefer; you can also use either %02X or %02x) and pass either (unsigned char)rbuff[r_n-1] or rbuff[r_n-1] & 0xFF as the argument (using the variables from the first example). Or, using the variables from the second example:
printf("%.2X\n", (unsigned char)buff[i]);
printf("%.2X\n", buff[i] & 0xFF);

Related

printf a literal number (int) while expecting a shorter number

Let's say we have this line of code:
printf("%hi", 6);
Let's assume sizeof(short) == 2, and sizeof(int) == 4.
printf expects a short, but is given an int, which is wider. Is this undefined behaviour?
The same question applies to %hhi.
printf() doesn't actually expect the argument to be a short when you use %hi. When you call a variadic function, all the arguments undergo default argument promotion. In the case of integer arguments, this means integer promotions, which means that all integer types smaller than int are converted to int or unsigned int.
If the corresponding argument is a literal, all that's required is that its value fit into a short; you don't actually have to cast it to short.
Section 7.21.6.1 paragraph 7 of the standard explains it this way:
the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing

c: type casting char values into unsigned short

Starting with a pseudo-code snippet:
char a = 0x80;
unsigned short b;
b = (unsigned short)a;
printf ("0x%04x\r\n", b); // => 0xff80
To my current understanding, "char" is by definition neither a signed char nor an unsigned char, but a sort of third type of signedness.
How does it happen that 'a' is first sign-extended from a (maybe platform-dependent) 8-bit storage to the (maybe again platform-specific) 16 bits of a signed short, and then converted to an unsigned short?
Is there a C standard rule that determines the order of expansion?
Does the standard give any guidance on how to deal with this third type of signedness of a "pure" char (I once called it an X-char, X for undetermined signedness), so that results are at least deterministic?
PS: If I insert an "(unsigned char)" cast in front of the 'a' in the assignment line, the result in the printing line indeed changes to 0x0080. So only two type casts in a row provide what might be the intended result for certain purposes.
The type char is not a "third" signedness. It is either signed char or unsigned char, and which one it is is implementation defined.
This is dictated by section 6.2.5p15 of the C standard:
The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
It appears that on your implementation, char is the same as signed char, so because the value is negative and because the destination type is unsigned it must be converted.
Section 6.3.1.3 dictates how conversion between integer types occurs:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The constant 0x80 is 128, which does not fit in a signed char, so a ends up holding -128. Since -128 cannot be represented in an unsigned short, the conversion in paragraph 2 occurs.
char has implementation-defined signedness. It is either signed or unsigned, depending on the compiler. It is true, in a way, that char is a third character type: it is a type distinct from both signed char and unsigned char. Because its signedness is not portable, char should never be used for storing raw numbers.
But that doesn't matter in this case.
On your compiler, char is signed.
char a = 0x80; forces a conversion from the type of 0x80, which is int, to char, in a compiler-specific manner. Normally on 2's complement systems, that will mean that the char gets the value -128, as seems to be the case here.
b = (unsigned short)a; forces a conversion from char to unsigned short 1). C17 6.3.1.3 Signed and unsigned integers then says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
One more than the maximum value would be 65536. So you can think of this as -128 + 65536 = 65408.
The unsigned hex representation of 65408 is 0xFF80. No sign extension takes place anywhere!
1) The cast is not needed. When both operands of = are arithmetic types, as in this case, the right operand is implicitly converted to the type of the left operand (C17 6.5.16.1 §2).

Why is sizeof(c+a) giving 4 bytes instead of 3?

#include <stdio.h>
int main()
{
short int a;
char c;
printf("%d %d %d",sizeof(a),sizeof(c),sizeof(c+a));
}
Here sizeof(a) is 2 bytes and sizeof(c) is 1 byte, but when I add them up the result is 4 bytes. What is happening inside the expression to make it 4?
Adding a short int to a char results in an int, which apparently is 4 bytes on your system.
This is a case of "integer promotion". See "In a C expression where unsigned int and signed int are present, which type will be promoted to what type?" for an explanation. The rules are rather confusing, but the answers there explain them rather well.
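Per 6.3.1.8 Usual arithmetic conversions of the C standard, the integer promotions are first performed on both operands (so c and a both become int here), and then the following rules are applied to the promoted operands: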
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
The result is 4 because, as @WeatherVane noted in the comments:
5.1.2.3 para 11 EXAMPLE 2 In executing the fragment char c1, c2; /* ... */ c1 = c1 + c2; the "integer promotions" require that the abstract machine promote the value of each variable to int size and then add the two ints and truncate the sum.
In sizeof(c+a), however, nothing is truncated, because the sum is never stored into a narrower object: the expression keeps its type, int.
sizeof yields the size of the type of its operand; the operand itself is not evaluated (unless it is a variable-length array). The expression c+a has type int, which is four bytes on your system. I think what you are looking for is:
sizeof(c) + sizeof(a)
When integer types such as char, short int and _Bool are smaller than int, they are automatically promoted to int or unsigned int when an operation is performed on them.
C11 §6.3.1.1 Boolean, characters, and integers
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. 58)
So c and a are both converted to int, and the result of c+a has that common type, int.
Also, the behaviour of your code is undefined, because you have used the wrong format specifier.
So use %zu instead of %d, because sizeof yields a value of type size_t, which is unsigned.
C11 Standard: §7.21.6.1: Paragraph 9:
If a conversion specification is invalid, the behavior is undefined. 225) If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
For the mathematically inclined (and because it occurred to me to wonder when such a thing might ever be true):
The misapprehension that the OP is labouring under is that
f(x) + f(y) = f(x+y)
which is certainly not true for sizeof() for the reasons Tom points out in the comments.
The class of functions for which it is true are called Additive Maps
Typical examples include maps between rings, vector spaces, or modules that preserve the additive group.

Variadic functions and constants

How exactly do variadic functions treat numeric constants? e.g. consider the following code:
myfunc(5, 0, 1, 2, 3, 4);
The function looks like this:
void myfunc(int count, ...)
{
}
Now, in order to iterate over the single arguments with va_arg, I need to know their sizes, e.g. int, short, char, float, etc. But what size should I assume for numeric constants like I use in the code above?
Tests have shown that just assuming int for them seems to work fine so the compiler seems to push them as int even though these constants could also be represented in a single char or short each.
Nevertheless, I'm looking for an explanation for the behaviour I see. What is the standard type in C for passing numeric constants to variadic functions? Is this clearly defined or is it compiler-dependent? Is there a difference between 32-bit and 64-bit architecture?
I like Jonathan Leffler's answer, but I thought I'd pipe up with some technical details, for those who intend to write a portable library or something providing an API with variadic functions, and thus need to delve into the details.
Variadic parameters are subject to default argument promotions (C11 draft N1570, section 6.5.2.2 Function calls, paragraph 6):
.. the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument promotions.
[If] .. the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined, except for the following cases:
one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
both types are pointers to qualified or unqualified versions of a character type or void
Floating-point constants are of type double, unless they are suffixed with f or F (as in 1.0f), in which case they are of type float.
In C99 and C11, an unsuffixed decimal integer constant has type int if it fits in one; otherwise long (AKA long int) if it fits in one; otherwise long long (AKA long long int). Unsuffixed hexadecimal and octal constants may also take the corresponding unsigned types. Since an unsuffixed integer constant that does not fit in an int can look like a human error or typo, it is good practice to always include the suffix if the integer constant is not meant to be of type int.
Integer constants can also have a letter suffix to denote their type:
u or U for unsigned int
l or L for long int
lu or ul or LU or UL or lU or Lu or uL or Ul for unsigned long int
ll or LL or Ll or lL for long long int
llu or LLU (or ULL or any of their uppercase or lowercase variants) for unsigned long long int
The integer promotion rules are in section 6.3.1.1.
To summarize the default argument promotion rules for C11 (there are some additions compared to C89 and C99, but no significant changes):
float is promoted to double
All integer types whose values can be represented by an int are promoted to int. (This includes both unsigned and signed char and short, and bit-fields of types _Bool, int, and smaller unsigned int bit-fields.)
All integer types whose values can be represented by an unsigned int (but not an int) are promoted to unsigned int. (This includes unsigned int bit fields that cannot be represented by an int (of CHAR_BIT * sizeof (unsigned int) bits, in other words), and typedef'd aliases of unsigned int, but that's it, I think.)
Integer types at least as large as int are unchanged. This includes types long/long int, long long/long long int, and size_t, for example.
There is one 'gotcha' in the rules that I'd like to point out: a signed/unsigned mismatch is only guaranteed to work when the value is representable in both types.
If the argument is promoted to a signed integer type, but the function obtains the value using the corresponding unsigned integer type (or vice versa), the function obtains the correct value as long as the value is representable in both types.
If it is not (for example, a negative value obtained as unsigned), the standard leaves the behaviour undefined.
In practice, on the two's-complement machines you are likely to use, a negative value obtained as unsigned comes out as if incremented by (1 + the maximum representable value of the unsigned type), and an unsigned value too large for the signed type comes out as that value minus (1 + the maximum representable value of the unsigned type). I've heard that some strange architectures may signal integer overflow or do something similarly weird, but I have never gotten my mitts on such machines.
The printf(3) man page (courtesy of the Linux man-pages project) is quite informative, if you compare the above rules to printf specifiers. The make_message() example function at the end (C99, C11, or POSIX required for vsnprintf()) should also be interesting.
When you write 1, that is an int constant. There is no other type that the compiler is allowed to use. If there is a non-variadic prototype for the function that demands a different type, the compiler will convert the integer 1 to the appropriate type, but on its own, 1 is an int constant. So, in your example, all 6 arguments are int.
You have to know the types of the arguments somehow before the called variadic function processes them. With the printf() family of functions, the format string tells it what to expect; similarly with the scanf() family of functions.
Note that the default conversions apply to the arguments corresponding to the ellipsis of a variadic function. For example, given:
char c = '\007';
short s = 0xB0D;
float f = 3.1415927;
a call to:
int variadic_function(const char *, ...);
using:
int rc = variadic_function("c s f", c, s, f);
actually converts both c and s to int, and f to double.

Tilde operator in C

unsigned short int i = 0;
printf("%u\n",~i);
Why does this code print a 32-bit number to the console? It should be 16-bit, since short is 2 bytes.
The output is 4,294,967,295 and it should be 65,535.
%u expects an unsigned int; if you want to print an unsigned short int, use %hu.
EDIT
Lundin is correct that ~i has type int: i is promoted to int before ~ is applied, and an unsigned short argument would be promoted to int anyway when passed to a variadic function. However, printf will convert the argument back to unsigned short before printing if the %hu conversion specifier is used:
7.21.6.1 The fprintf function
...
3 The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format is composed of zero or more directives: ordinary multibyte characters (not %), which are copied unchanged to the output stream; and conversion specifications, each of which results in fetching zero or more subsequent arguments, converting them, if applicable, according to the corresponding conversion specifier, and then writing the result to the output stream.
...
7 The length modifiers and their meanings are:
...
h Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
The key part is the parenthetical about the integer promotions.
So, with %hu, the behavior is not undefined: the int value of ~i is converted to unsigned short before printing, so with a 16-bit short the output is 65535.
When you pass an argument to printf and that argument is of an integer type shorter than int, it is implicitly promoted to int by the default argument promotions (inherited from K&R C). Thus your printf call actually behaves like:
printf("%u\n", (int)~i);
Notice that this is undefined behavior, since you told printf that the argument has an unsigned type, whereas int is actually a signed type (and the value -1 is not representable as an unsigned int). Convert ~i to unsigned short and then to unsigned to resolve both the undefined behavior and your problem:
printf("%u\n", (unsigned)(unsigned short)~i);
N1570 6.5.3.3 Unary arithmetic operators p4:
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. ...
Integer types smaller than int are promoted to int. If sizeof(unsigned short) == 2 and sizeof(int) == 4, then the resulting type is int.
What's more, the printf conversion specifier %u expects an unsigned int, so the representation of the int is interpreted as an unsigned int. You are basically lying to the compiler, and this is undefined behaviour.
It's because the arguments to printf() are passed as promoted, at least int-sized values, as there is no way inside printf to know that the argument was a short. Also, by using the %u format you are merely stating that you are passing an unsigned int.

Resources