Hex representation in a big-endian system (C)

What is result of:
int x = 0x00000001;
int y = 0x80000000;
in a big-endian system?
My goal is to define an int that has the first (in memory) bit set, regardless of whether it is the most significant one or the least significant one. I know that with little-endian systems, x would satisfy this requirement, but is it still true in a big-endian system?
I'm pretty sure that the following will work in both systems:
char c[4] = {0x80, 0, 0, 0};
int x = (int) c;
Is that correct? Is there a more elegant method?
(I don't have a big-endian system to experiment on)

What you probably want is this:
int x = 0;
char* p = (char*)&x;
p[0] = 0x01;
The above code will set the least significant bit in the lowest-address byte of an int variable to 1:
On a Big-Endian processor, it will set the LS-bit in the MS-byte to 1 (i.e., x == 0x01000000).
On a Little-Endian processor, it will set the LS-bit in the LS-byte to 1 (i.e., x == 0x00000001).
Having said that, what is your definition of "the first bit"? IMHO it is simply the least significant one, in which case, int x = 0x00000001 is the answer regardless of the Endianness of your processor!!
The following terminology might help you to understand a little better:
Set the least significant bit in an 8-bit byte: 0x01
Set the most significant bit in an 8-bit byte: 0x80
Set the least significant byte in a 4-byte integer: 0x000000FF
Set the most significant byte in a 4-byte integer: 0xFF000000
Set the lowest-address byte in a 4-byte integer on a LE processor: 0x000000FF
Set the lowest-address byte in a 4-byte integer on a BE processor: 0xFF000000
Set the highest-address byte in a 4-byte integer on a LE processor: 0xFF000000
Set the highest-address byte in a 4-byte integer on a BE processor: 0x000000FF
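Building on the pointer technique above, here is a minimal sketch (not from the original answer) that sets the most significant bit of the lowest-address byte, which is what the question actually asked for, independent of endianness:
#include <stdio.h>
int main(void)
{
    int x = 0;
    unsigned char *p = (unsigned char *)&x;
    p[0] = 0x80; /* most significant bit of the lowest-address byte */
    /* Prints 80 on a little-endian machine and 80000000 on a big-endian one. */
    printf("%x\n", (unsigned)x);
    return 0;
}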

You can try unions
union foo
{
char array[8];
int64_t num;
};
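As one possible use of that union, here is a sketch (an assumption, not part of the original answer) that simply shows how the union exposes the byte order; reading a different union member than the one last written is permitted in C:
#include <stdio.h>
#include <stdint.h>
union foo
{
    char array[8];
    int64_t num;
};
int main(void)
{
    union foo f = { .num = 1 };
    /* On a little-endian machine the 1 lands in array[0]; on big-endian it lands in array[7]. */
    printf("%s-endian\n", f.array[0] == 1 ? "little" : "big");
    return 0;
}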

Related

Extract k bits from any side of hex notation

int X = 0x1234ABCD;
int Y = 0xcdba4321;
// a) print the lower 10 bits of X in hex notation
int output1 = X & 0xFF;
printf("%X\n", output1);
// b) print the upper 12 bits of Y in hex notation
int output2 = Y >> 20;
printf("%X\n", output2);
I want to print the lower 10 bits of X in hex notation; since each character in hex is 4 bits, FF = 8 bits, would it be right to & with 0x2FF to get the lower 10 bits in hex notation.
Also, would shifting right by 20 drop all 20 bits at the end, and keep the upper 12 bits only?
I want to print the lower 10 bits of X in hex notation; since each character in hex is 4 bits, FF = 8 bits, would it be right to & with 0x2FF to get the lower 10 bits in hex notation.
No, that would be incorrect. You'd want to use 0x3FF to get the low 10 bits. (0x2FF in binary is: 1011111111). If you're a little uncertain with hex values, an easier way to do that these days is via binary constants instead, e.g.
// mask lowest ten bits in hex
int output1 = X & 0x3FF;
// mask lowest ten bits in binary
int output1 = X & 0b1111111111;
Also, would shifting right by 20 drop all 20 bits at the end, and keep the upper 12 bits only?
In the case of LEFT shift, zeros will be shifted in from the right, and the higher bits will be dropped.
In the case of RIGHT shift, it depends on the sign of the data type you are shifting.
// unsigned right shift
unsigned U = 0x80000000;
U = U >> 20;
printf("%x\n", U); // prints: 800
// signed right shift
int S = 0x80000000;
S = S >> 20;
printf("%x\n", S); // prints: fffff800
Signed right-shift typically shifts the highest bit in from the left. Unsigned right-shift always shifts in zero.
As an aside: IIRC the C standard is a little vague with respect to signed integer shifts. I believe it is theoretically possible to have a hardware platform that shifts in zeros for signed right shift (e.g. on some microcontrollers). Most of your typical platforms (Intel/Arm) will shift in the highest bit though.
Assuming 32 bit int, then you have the following problems:
0xcdba4321 is too large to fit inside an int. The hex constant itself will actually be unsigned int in this specific case, because of an oddball type rule in C. From there you force an implicit conversion to int, likely ending up with a negative number.
Y >> 20 right shifts a negative number, which is non-portable behavior. It can either shift in ones (arithmetic shift) or zeroes (logical shift), depending on compiler. Whereas right shifting unsigned types is well-defined and always results in logical shift.
& 0xFF masks out 8 bits, not 10.
%X expects an unsigned int, not an int.
The root of all your problems is "sloppy typing" - that is, writing int all over the place when you actually need a more suitable type. You should start using the portable types from stdint.h instead, in this case uint32_t. Also make a habit of always ending your hex constants with a u or U suffix.
A fixed program:
#include <stdio.h>
#include <stdint.h>
int main (void)
{
    uint32_t X = 0x1234ABCDu;
    uint32_t Y = 0xcdba4321u;
    printf("%X\n", X & 0x3FFu);
    printf("%X\n", Y >> (32-12));
}
The 0x3FFu mask can also be written as ( (1u<<10) - 1).
(Strictly speaking you need to printf the stdint.h types using specifiers from inttypes.h, but let's not confuse the answer by introducing those at the same time.)
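For reference, a small sketch of the same print using the inttypes.h macros; PRIX32 is the standard macro for printing a uint32_t in uppercase hex:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main (void)
{
    uint32_t X = 0x1234ABCDu;
    /* PRIX32 expands to the correct conversion specifier for uint32_t. */
    printf("%" PRIX32 "\n", (uint32_t)(X & 0x3FFu));
}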
Lots of high-value answers to this question.
Here's more info that might spark curiosity...
#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t X;
    X = 0x1234ABCDu; // your first hex number
    printf( "%X\n", X );
    X &= ((1u<<12)-1)<<20; // mask 12 bits, shifting mask left
    printf( "%X\n", X );
    X = 0x1234ABCDu; // your first hex number
    X &= ~0u^(~0u>>12);
    printf( "%X\n", X );
    X = 0x0234ABCDu; // Note leading 0 printed in two styles
    printf( "%X %08X\n", X, X );
    return 0;
}
1234ABCD
12300000
12300000
234ABCD 0234ABCD
print the upper 12 bits of Y in hex notation
To handle this when the width of int is not known, first determine the width with code like sizeof(unsigned)*CHAR_BIT. (C specifies it must be at least 16-bit.)
Best to use unsigned or mask the shifted result with an unsigned.
#include <limits.h>
int output2 = Y;
printf("%X\n", (unsigned) output2 >> (sizeof(unsigned)*CHAR_BIT - 12));
// or
printf("%X\n", (output2 >> (sizeof output2 * CHAR_BIT - 12)) & 0x3FFu);
Rare non-2's complement encoded int needs additional code - not shown.
Very rare padded int needs other bit width detection - not shown.

Small bit-operation problem with unsigned int in combination with unsigned char

Hi, I have a small conceptual problem regarding bit operations.
See the code below, where I have a 4-byte unsigned int. I then access the individual bytes through an unsigned char pointer to its address.
I set the value of the last byte to one and perform a right shift on the unsigned int (the 4-byte variable). I do not understand why this operation apparently changes the content of the third byte.
See code below along with the output when I run it
#include <cstdio>

int main(int argc, char **argv){
    fprintf(stderr, "sizeof(unsigned int): %zu sizeof(unsigned char):%zu\n", sizeof(unsigned int), sizeof(unsigned char));
    unsigned int val = 0;
    unsigned char *valc = (unsigned char*) &val;
    valc[3] = 1;
    fprintf(stderr, "uint: %u, uchars: %u %u %u %u\n", val, valc[0], valc[1], valc[2], valc[3]);
    val = val >> 1;
    fprintf(stderr, "uint: %u, uchars: %u %u %u %u\n", val, valc[0], valc[1], valc[2], valc[3]);
    return 0;
}
sizeof(unsigned int): 4 sizeof(unsigned char):1
uint: 16777216, uchars: 0 0 0 1
uint: 8388608, uchars: 0 0 128 0
Thanks in advance
You've discovered that your computer doesn't always store the bytes for multi-byte data types in the order you happen to expect. valc[0] is the least significant byte (LSB) on your system. Since the LSB is stored at the lowest memory address, it is known as a "little-endian" system. At the other end, valc[3] is the most significant byte (MSB).
Your output will make more sense to you if you print valc[3],valc[2],valc[1],valc[0] instead, since humans expect the most significant values to be on the left.
Other computer architectures are "big-endian" and will store the most significant byte first.
This article also explains this concept in way more detail:
https://en.wikipedia.org/wiki/Endianness
The book "The Practice of Programming" by Brian Kernighan and Rob Pike also contains some good coverage on byte order (Section 8.6 Byte Order) and how to write portable programs that work on both big-endian and little-endian systems.
If we change the output of the int to hex (i.e. change %u to %x), what happens becomes more apparent:
uint: 1000000, uchars: 0 0 0 1
uint: 800000, uchars: 0 0 128 0
The value of val is shifted right by 1. This results in the low bit of the highest order byte getting shifted into the high bit of the next byte.

How does a 24 bit number overflow into a 32 bit type?

If I pack three unsigned chars into a 32-bit integer and the most significant byte overflows, does it spill into the upper 8 bits of the 32-bit type? Or is the MSB just reduced modulo 256, leaving the upper 8 bits of the 32-bit type untouched?
EDIT:
Packed using bit shifting on little endian architecture:
unsigned int foo = (msb << 16) | (middle << 8) | lsb;
Because your variable is a 32-bit type which happens to contain a 24-bit value, overflow of the 24-bit part will move up into the 25th bit. So it will behave as a normal 32-bit value.
For example:
uint32_t x = 0xFFFFFF;
printf("x=%08X\n", x); // prints x=00FFFFFF
x++;
printf("x=%08X\n", x); // prints x=01000000

Convert a uint16_t to char[2] to be sent over socket (unix)

I know that there are things out there roughly on this, but my brain's hurting and I can't find anything to make this work...
I am trying to send a 16-bit unsigned integer over a unix socket. To do so I need to convert a uint16_t into two chars, then read them in on the other end of the connection and convert them back into either an unsigned int or a uint16_t; at that point it doesn't matter if it uses 2 bytes or 4 bytes (I'm running 64-bit, that's why I can't use unsigned int :)
I'm doing this in C btw
Thanks
Why not just break it up into bytes with mask and shift?
uint16_t value = 12345;
char lo = value & 0xFF;
char hi = value >> 8;
(edit)
On the other end, you assemble with the reverse:
uint16_t value = lo | ((uint16_t)hi << 8);
Off the top of my head, not sure if that cast is required.
char* pUint16 = (char*)&u16;
i.e., cast the address of the uint16_t.
char c16[2];
uint16_t ui16 = 0xdead;
memcpy( c16, &ui16, 2 );
c16 now contains the 2 bytes of ui16. At the far end you can simply reverse the process.
char* pC16 = /*blah*/
uint16_t ui16;
memcpy( &ui16, pC16, 2 );
Interestingly, though there is a call to memcpy, nearly every compiler will optimise it out because it's of a fixed size.
As Steven sudt points out, you may get problems with big-endianness. To get round this you can use the htons (host-to-network short) function.
uint16_t ui16correct = htons( 0xdead );
and at the far end use ntohs (network-to-host short)
uint16_t ui16correct = ntohs( ui16 );
On a little-endian machine this will convert the short to big-endian and then at the far end convert back from big-endian. On a big-endian machine the 2 functions do nothing.
Of course if you know that the architecture of both machines on the network use the same endian-ness then you can avoid this step.
Look up ntohl and htonl for handling 32-bit integers. Some platforms also provide ntohll and htonll for 64-bit values.
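A sketch of the 32-bit round trip with those POSIX functions (the buffer handling is just illustrative):
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
int main(void)
{
    uint32_t host_value = 0xDEADBEEFu;
    unsigned char buf[4];
    /* Sender: convert to network (big-endian) byte order, then copy the bytes out. */
    uint32_t wire = htonl(host_value);
    memcpy(buf, &wire, sizeof wire);
    /* Receiver: copy the bytes back in, then convert to host byte order. */
    uint32_t back;
    memcpy(&back, buf, sizeof back);
    back = ntohl(back);
    printf("%X\n", back); /* prints DEADBEEF */
    return 0;
}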
Sounds like you need to use the bit mask and shift operators.
To split up a 16-bit number into two 8-bit numbers:
you mask the lower 8 bits using the bitwise AND operator (& in C) so that the upper 8 bits all become 0, and then assign that result to one char.
you shift the upper 8 bits to the right using the right shift operator (>> in C) so that the lower 8 bits are all pushed out of the integer, leaving only the top 8 bits, and assign that to another char.
Then when you send these two chars over the connection, you do the reverse: you shift what used to be the top 8 bits to the left by 8 bits, and then use bitwise OR to combine that with the other 8 bits.
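A minimal sketch of both directions described above (variable names are illustrative, not from the original answer):
#include <stdio.h>
#include <stdint.h>
int main(void)
{
    uint16_t original = 0xABCD;
    /* Split: mask the lower 8 bits, shift the upper 8 bits down. */
    unsigned char low  = original & 0xFF;        /* 0xCD */
    unsigned char high = (original >> 8) & 0xFF; /* 0xAB */
    /* ...send low and high over the connection... */
    /* Reassemble: shift the former upper byte back up and OR in the lower byte. */
    uint16_t rebuilt = (uint16_t)((high << 8) | low);
    printf("%X\n", (unsigned)rebuilt); /* prints ABCD */
    return 0;
}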
Basically you are sending 2 bytes over the socket, that's all the socket need to know, regardless of endianness, signedness and so on... just decompose your uint16 into 2 bytes and send them over the socket.
char byte0 = u16 & 0xFF;
char byte1 = u16 >> 8;
At the other end do the conversion in the opposite way
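A sketch of that opposite conversion for this answer's byte0/byte1 (the & 0xFF masks guard against sign extension, since plain char may be signed):
uint16_t u16_back = (uint16_t)(((byte1 & 0xFF) << 8) | (byte0 & 0xFF));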

Assign unsigned char to unsigned short with bit operators in ansi C

I know it is possible to assign an unsigned char to an unsigned short, but I would like to have more control how the bits are actually assigned to the unsigned short.
unsigned char UC_8;
unsigned short US_16;
UC_8 = 0xff;
US_16 = (unsigned char) UC_8;
The bits from UC_8 are now placed in the lower bits of US_16. I need more control over the conversion since the application I'm currently working on is safety-related. Is it possible to control the conversion with bit operators, so I can specify where the 8 bits from the unsigned char should be placed in the bigger 16-bit unsigned short variable?
My guess is that it would be possible with masking combined with some other bit-operator, maybe left/right shifting.
UC_8 = 0xff;
US_16 = (US_16 & 0x00ff) ?? UC_8; // Maybe masking?
I have tried different combinations but have not come up with a smart solution. I'm using ansi C and, as said earlier, need more control over how the bits are actually set in the larger variable.
EDIT:
My problem or concern comes from a CRC-generating function. It will and should always return an unsigned short, since it will sometimes calculate a 16-bit CRC. But sometimes it should calculate an 8-bit CRC instead, placing the 8 bits in the eight LSBs of the 16-bit return variable; the eight MSBs should then contain only zeros.
I would like to say something like:
US_16(7 downto 0) = UC_8;
US_16(15 downto 8) = 0x00;
If I just typecast it, can I guarantee that the bits will always be placed in the lower bits of the larger variable? (On all different architectures.)
What do you mean, "control"?
The C standard unambiguously defines the unsigned binary format in terms of bit positions and significance. Certain bits of a 16-bit variable are "low", by numerical definition, and they will hold the pattern from the 8-bit variable, the other bits being set to zero. There is no ambiguity, no wiggle room, and nothing else to control.
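Applied to the CRC case from the edit, that means plain assignment already gives the required layout; a small sketch (the value is chosen purely for illustration):
unsigned char crc8 = 0xAB;    /* an 8-bit CRC result */
unsigned short US_16 = crc8;  /* low 8 bits hold 0xAB, upper 8 bits are guaranteed to be zero */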
Maybe shifting bits will help you:
US_16 = (US_16 & 0x00ff) | ( UC_8 << 8 );
The result in bits will be (C = bits from UC_8, S = bits from US_16):
CCCC CCCC SSSS SSSS
where SSSS SSSS are the last 8 bits of US_16.
But if UC_8 was 1 and US_16 was 0, then US_16 will become 256. Is this what you mean?
US_16 = (US_16 & 0xff00) | ( UC_8 & 0x00ff );
US_16=~-1|UC_8;
Is this what you want?
If it is important to use ansi C, and not be restricted to a particular implementation, then you should not assume sizeof(short) == 2. And why bother to cast an unsigned char to an unsigned char (the same thing)? Although probably safe to assume char is 8 bits nowadays, even though that's not guaranteed.
uint8_t UC_8;
uint16_t US_16;
int nbits = ...# of bits to shift...;
US_16 = UC_8 << nbits;
Obviously, if you shift more than 15 bits, it may not be what you want. If you need to actually rearrange the bits, rather than just shift them to some position, you'll have to set them individually
int sourcebit = ...0 to 7...;
int destinationbit = ...0 to 15...;
// set
US_16 |= (UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit);
// clear
US_16 &= ~((UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit));
note: just wrote, didn't test. probably not optimal. blah blah blah. but something like that will work.
