byte order conversion for signed integer - c

I have a problem translating byte order between the host (CPU-dependent) and the network (big endian). These are all the APIs (in "arpa/inet.h" on Linux) I've found that might solve my problem.
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
Except for one thing: they only handle unsigned integers (2 bytes or 4 bytes).
So is there any approach for the signed integer case? In other words, how would one implement the following functions (APIs)?
int32_t htonl(int32_t hostlong);
int16_t htons(int16_t hostshort);
int32_t ntohl(int32_t netlong);
int16_t ntohs(int16_t netshort);

Technically speaking, it doesn't matter what value is inside the variable, since you just want to borrow the functionality. When you assign a signed value to an unsigned variable, the value may change but the bit pattern stays the same, so converting the result back to signed afterwards works as intended.
Edit: As amrit said, it is a duplicate of Signed Integer Network and Host Conversion.
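A minimal sketch of that approach, assuming the standard arpa/inet.h functions are available; the wrapper names (htonl_signed and friends) are my own, not standard APIs:
#include <arpa/inet.h>
#include <stdint.h>

/* Cast to the unsigned type, call the standard routine, cast back.
 * The signed-to-unsigned conversion is defined modulo 2^N, so the bit
 * pattern survives the round trip. */
static inline int32_t htonl_signed(int32_t hostlong)
{
    return (int32_t)htonl((uint32_t)hostlong);
}

static inline int16_t htons_signed(int16_t hostshort)
{
    return (int16_t)htons((uint16_t)hostshort);
}

static inline int32_t ntohl_signed(int32_t netlong)
{
    return (int32_t)ntohl((uint32_t)netlong);
}

static inline int16_t ntohs_signed(int16_t netshort)
{
    return (int16_t)ntohs((uint16_t)netshort);
}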

Related

shifting an unsigned char by more than 8 bits

I'm a bit troubled by this code:
typedef struct _slink{
struct _slink* next;
char type;
void* data;
} _slink;
assuming what this describes is a link in a file, where data is 4 bytes long and represents either an address or an integer (depending on the type of the link).
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, so what I want to do is change the order of the bytes before writing back to the file, i.e.
for 0x01020304 I want to produce 0x04030201, so that when I write it back, its little-endian representation will look like the big-endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. Now this is one way it was implemented, and what troubles me here is that it shifts bytes by more than 8 bits. (L is of type _slink*)
int data = (((unsigned char*)&L->data)[0]<<24) + (((unsigned char*)&L->data)[1]<<16) +
(((unsigned char*)&L->data)[2]<<8) + (((unsigned char*)&L->data)[3]<<0);
Can anyone please explain why this actually works, without these bytes having been explicitly cast to integers to begin with (since they're only 1 byte each but are shifted by up to 24 bits)?
Thanks in advance.
Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
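A small illustration of that promotion (my own example, not from the answer): the left operand of << is promoted to int before the shift, so shifting an unsigned char by 24 does not lose bits. The byte value 0x12 is deliberately small here; the next answer explains why a byte of 0x80 or more would be a problem.
#include <stdio.h>

int main(void)
{
    unsigned char byte = 0x12;
    printf("%zu\n", sizeof(byte << 24));        /* sizeof(int), e.g. 4, not 1 */
    printf("0x%X\n", (unsigned)(byte << 24));   /* 0x12000000 */
    return 0;
}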
Can anyone please explain why this actually works?
The shift does not occur on an unsigned char but on a type promoted to int [1], as @dbush noted.
Reasons why the code still has issues:
32-bit int
Shifting a 1 into the sign bit's place is undefined behavior (UB). See also @Eric Postpischil.
(((unsigned char*)&L->data)[0]<<24) // UB
16-bit int
There is insufficient precision even if the type were unsigned, and as int, shifting by the bit width or more is UB like above. Perhaps OP would then have wanted only a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *)&L->data;
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                (uint32_t)p[2] << 8  | (uint32_t)p[3] << 0;
For the pedantic
Had int used a non-2's-complement representation, adding a negative value from (((unsigned char*)&L->data)[0]<<24) would have messed up the bit pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those 2 endians. It is a big-endian to native-endian conversion. When this code is run on a 32-bit little-endian machine, it is effectively a big/little swap; on a 32-bit big-endian machine, it could have been a no-op.
[1] ... or possibly unsigned int on select platforms where UCHAR_MAX > INT_MAX.
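For completeness, a self-contained version of the Alternative above (the function name read_be32 and the sample bytes are mine): it assembles four bytes as big-endian into a uint32_t regardless of host endianness.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t read_be32(const void *src)
{
    const uint8_t *p = (const uint8_t *)src;
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

int main(void)
{
    const uint8_t bytes[4] = {0x01, 0x02, 0x03, 0x04};
    printf("0x%08" PRIX32 "\n", read_be32(bytes));   /* prints 0x01020304 on any host */
    return 0;
}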

Do you have to append `u` suffix to unsigned integers?

I know the u suffix means 'unsigned'. But is it necessary in the following code?
uint32_t hash = 2166136261u;
Is it a matter of convention? Or does it have any technical significance in this case? The value should be converted to unsigned anyway, because uint32_t is unsigned.
When should I and when should I not use the u suffix for unsigned integer values?
No, it is not necessary here. Things get interesting at 2147483648, and your number is greater than that.
Note that formally 2166136261 without a suffix is of type long or long long if int has 32 bits or fewer. But either is convertible to a uint32_t in a well-defined way.
As a final point: the equivalent hex 0x811C9DC5 is an unsigned type if int has 32 bits or more. Oh joy!
Reference: https://en.cppreference.com/w/c/language/integer_constant
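A short sketch (my own, not from the answer) showing that all three spellings end up with the same uint32_t value:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 2166136261;    /* decimal, no suffix: long or long long when int is 32-bit */
    uint32_t b = 2166136261u;   /* explicitly unsigned */
    uint32_t c = 0x811C9DC5;    /* hex: unsigned int when int is 32 bits or wider */
    printf("%d %d\n", a == b, b == c);   /* prints "1 1": all three hold the same value */
    return 0;
}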

Is there a way to specify int size in C?

I'm trying to check some homework answers about overflow for 2's complement addition, subtraction, etc. and I'm wondering if I can specify the size of a data type. For instance if I want to see what happens when I try to assign -128 or -256 to a 7-bit unsigned int.
On further reading I see you wanted bit sizes that are not the normal ones, such as 7 bits, 9 bits, etc.
You can achieve this using bitfields
struct bits9
{
int x : 9;
};
Now you can use the type struct bits9, which has a single field x that is only 9 bits in size.
struct bits9 myValue;
myValue.x = 123;
For an arbitrary sized value, you can use bitfields in structs. For example for a 7-bit value:
struct something {
unsigned char field:7;
unsigned char padding:1;
};
struct something value;
value.field = -128;
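A quick demonstration of what that 7-bit field ends up holding (my own sketch; it assumes the compiler accepts unsigned char bit-fields, as the snippet above does): storing into an unsigned bit-field reduces the value modulo 2^7.
#include <stdio.h>

struct something {
    unsigned char field:7;
    unsigned char padding:1;
};

int main(void)
{
    struct something value;
    value.field = -128;                        /* -128 mod 128 == 0 */
    printf("%u\n", (unsigned)value.field);     /* prints 0 */
    value.field = -1;                          /* -1 mod 128 == 127 */
    printf("%u\n", (unsigned)value.field);     /* prints 127 */
    return 0;
}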
The smallest size you have is char, which is an 8-bit integer. You can have unsigned and signed chars. Take a look at the stdint.h header; it defines integer types for you in a platform-independent way. Also, there is no such thing as a 7-bit integer.
Using built in types you have things like:
char value1; // 8 bits
short value2; // 16 bits
long value3; // 32 bits
long long value4; // 64 bits
Note this is the case with Microsoft's compiler on Windows. The C standard does not specify exact widths other than "this one must be at least as big as this other one" etc.
If you only care about a specific platform you can print out the sizes of your types and use those once you have figured them out.
Alternatively you can use stdint.h, which is in the C99 standard. It has types with the width in the name to make it clear:
int8_t value1; // 8 bits
int16_t value2; // 16 bits
int32_t value3; // 32 bits
int64_t value4; // 64 bits
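For the original goal of checking two's complement overflow for homework, here is a small sketch (mine, with hypothetical values) using the fixed-width types: do the arithmetic in a wider type and then narrow it, since overflowing a signed type directly is undefined behavior.
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t a = 100, b = 100;
    int16_t exact = (int16_t)a + b;   /* exact result: 200 */
    int8_t wrapped = (int8_t)exact;   /* narrowing is implementation-defined; typically wraps to -56 */
    printf("%d + %d -> %d (exact %d)\n", a, b, wrapped, exact);
    return 0;
}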

Can uint8_t be a non-character type?

In this answer and the attached comments, Pavel Minaev makes the following argument that, in C, the only types to which uint8_t can be typedef'd are char and unsigned char. I'm looking at this draft of the C standard.
The presence of uint8_t implies the presence of a corresponding type int8_t (7.18.1p1).
int8_t is 8 bits wide and has no padding bits (7.18.1.1p1).
Corresponding types have the same width (6.2.5p6), so uint8_t is also 8 bits wide.
unsigned char is CHAR_BIT bits wide (5.2.4.2.1p2 and 6.2.6.1p3).
CHAR_BIT is at least 8 (5.2.4.2.1p1).
CHAR_BIT is at most 8, because either uint8_t is unsigned char, or it's a non-unsigned char, non-bit-field type whose width is a multiple of CHAR_BIT (6.2.6.1p4).
Based on this argument, I agree that, if uint8_t exists, then both it and unsigned char have identical representations: 8 value bits and 0 padding bits. That doesn't seem to force them to be the same type (e.g., 6.2.5p14).
Is it allowed that uint8_t is typedef'd to an extended unsigned integer type (6.2.5p6) with the same representation as unsigned char? Certainly it must be typedef'd (7.18.1.1p2), and it cannot be any standard unsigned integer type other than unsigned char (or char if it happens to be unsigned). This hypothetical extended type would not be a character type (6.2.5p15) and thus would not qualify for aliased access to an object of an incompatible type (6.5p7), which strikes me as the reason a compiler writer would want to do such a thing.
If uint8_t exists, the no-padding requirement implies that CHAR_BIT is 8. However, there's no fundamental reason I can find why uint8_t could not be defined with an extended integer type. Moreover there is no guarantee that the representations are the same; for example, the bits could be interpreted in the opposite order.
While this seems silly and gratuitously unusual for uint8_t, it could make a lot of sense for int8_t. If a machine natively uses one's complement or sign/magnitude, then signed char is not suitable for int8_t. However, it could use an extended signed integer type that emulates two's complement to provide int8_t.
In 6.3.1.1 (1) (of the N1570 draft of the C11 standard), we can read
The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.
So the standard explicitly allows the presence of extended integer types of the same width as a standard integer type.
There is nothing in the standard prohibiting a
typedef implementation_defined_extended_8_bit_unsigned_integer_type uint8_t;
if that extended integer type matches the specifications for uint8_t (no padding bits, width of 8 bits), as far as I can see.
So yes, if the implementation provides such an extended integer type, uint8_t may be typedef'ed to that.
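A quick way to probe a given implementation (my own sketch using C11 _Generic, not part of the answers above): on every mainstream compiler this reports that uint8_t is unsigned char, but the standard does not require it.
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t probe = 0;
    puts(_Generic(probe,
                  unsigned char: "uint8_t is unsigned char here",
                  default:       "uint8_t is a distinct (extended) type here"));
    return 0;
}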
uint8_t may exist and be a distinct type from unsigned char.
One significant implication of this is in overload resolution; it is platform-dependent whether:
uint8_t by = 0;
std::cout << by;
uses
operator<<(ostream, char)
operator<<(ostream, unsigned char) or
operator<<(ostream, int)
int8_t and uint8_t differ only in how their bits are interpreted, not in the bits themselves. int8_t uses the lower 7 bits for data and the 8th bit to represent the sign (positive or negative). Hence the range of int8_t is -128 to +127 (with 0 counted among the non-negative values).
uint8_t is also 8 bits wide, but the data contained in it is always non-negative. Hence the range of uint8_t is 0 to 255.
With that in mind, char is 8 bits wide, and unsigned char is also 8 bits wide but without the sign; similarly, short and unsigned short are both 16 bits wide.
If, however, unsigned int happened to be 8 bits wide, then, since C isn't too strict about such things, using it for uint8_t is allowed. And why would a compiler writer allow such a thing? Readability!

Does ANSI C support signed / unsigned bit fields?

Does it make sense to qualify bit fields as signed / unsigned?
The relevant portion of the standard (ISO/IEC 9899:1999) is 6.7.2.1 #4:
A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other implementation-defined
type.
Yes. An example from here:
struct {
/* field 4 bits wide */
unsigned field1 :4;
/*
* unnamed 3 bit field
* unnamed fields allow for padding
*/
unsigned :3;
/*
* one-bit field
* can only be 0 or -1 in two's complement!
*/
signed field2 :1;
/* align next field on a storage unit */
unsigned :0;
unsigned field3 :6;
}full_of_fields;
Only you know if it makes sense in your projects; typically, it does for fields with more than one bit, if the field can meaningfully be negative.
It's very important to qualify your variables as signed or unsigned. The compiler needs to know how to treat your variables during comparisons and casting. Examine the output of this code:
#include <stdio.h>
typedef struct
{
signed s : 1;
unsigned u : 1;
} BitStruct;
int main(void)
{
BitStruct x;
x.s = 1;
x.u = 1;
printf("s: %d \t u: %d\r\n", x.s, x.u);
printf("s>0: %d \t u>0: %d\r\n", x.s > 0, x.u > 0);
return 0;
}
Output:
s: -1 u: 1
s>0: 0 u>0: 1
The compiler stores the variable using a single bit, 1 or 0. For signed variables, the most significant bit determines the sign (high is treated as negative). Thus the signed variable, while it gets stored as 1 in binary, is interpreted as negative one.
Expanding on this topic, an unsigned two bit number has a range of 0 to 3, while a signed two bit number has a range of -2 to 1.
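A small sketch (mine, not the answerer's) extending the example above to 2-bit fields, to show those ranges on a two's complement implementation:
#include <stdio.h>

struct TwoBits {
    signed   s : 2;   /* range -2..1 */
    unsigned u : 2;   /* range 0..3 */
};

int main(void)
{
    struct TwoBits x;
    x.s = -2;   /* minimum of the signed range */
    x.u = 3;    /* maximum of the unsigned range */
    printf("s: %d \t u: %d\r\n", x.s, x.u);   /* prints: s: -2   u: 3 */
    return 0;
}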
Yes, it can. C bit-fields are essentially just limited-range integers. Frequently hardware interfaces pack bits together in such a way that some control can go from, say, -8 to 7, in which case you do want a signed bit-field, or from 0 to 15, in which case you want an unsigned bit-field.
I don't think Andrew is talking about single-bit bit fields. For example, 4-bit fields: 3 bits of numerical information, one bit for sign. This can entirely make sense, though I admit to not being able to come up with such a scenario off the top of my head.
Update: I'm not saying I can't think of a use for multi-bit bit fields (having used them all the time back in 2400bps modem days to compress data as much as possible for transmission), but I can't think of a use for signed bit fields, especially not a quaint, obvious one that would be an "aha" moment for readers.
Most certainly ANSI C provides for signed and unsigned bit fields. It is required. This is also part of writing debugger overlays for the IEEE-754 floating point types [[1][5][10]], [[1][8][23]], and [[1][11][52]]. This is useful for machine or network translations of such data, or for checking conversions from double (64 bits, for math) to half precision (16 bits, for compression) before sending over a link, e.g. video card textures.
// Fields need to be reordered based on machine/compiler endian orientation
typedef union _DebugFloat {
float f;
uint32_t u;   /* exactly 32 bits (needs <stdint.h>); unsigned long may be wider than float */
struct _Fields {
signed s : 1;
unsigned e : 8;
unsigned m : 23;
} fields;
} DebugFloat;
Eric
One place where signed bitfields are useful is in emulation, where the emulated machine has fewer bits than your default word.
I'm currently looking at emulating a 48-bit machine and am trying to work out if it's reasonable to use 48 bits out of a 64-bit "long long" via bitfields... the generated code would be the same as if I did all the masking, sign-extending etc explicitly but it would read a lot better...
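A minimal sketch of that idea (my own illustration, not the poster's code): a signed 48-bit bit-field inside a 64-bit member lets the compiler handle the masking and sign extension. Note that bit-field types other than _Bool, int, and unsigned int are implementation-defined, though GCC and Clang accept long long here.
#include <stdio.h>

struct Word48 {
    signed long long value : 48;   /* emulated 48-bit signed register */
};

int main(void)
{
    struct Word48 w;
    w.value = -1;                  /* stored in 48 bits */
    printf("%lld\n", w.value);     /* read back sign-extended: -1 */
    w.value = (1LL << 47) - 1;     /* largest positive 48-bit value */
    printf("%lld\n", w.value);     /* 140737488355327 */
    return 0;
}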
According to this reference, it's possible:
http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.vacpp6m.doc/language/ref/clrc03defbitf.htm
Bit masking of signed types varies from platform to platform because of how an overflow from a shift, etc., may be handled.
Any half-decent QA tool will warn about such usage.
If a 'bit' is signed, then you have a range of -1, 0, 1, which would make it a ternary digit. I don't think the standard abbreviation for that would be suitable here, but it makes for interesting conversations :)
