Are char bytes in unions reversed?

I have a union with an int and a char array like so:
union {
int i;
char c[4];
} num;
When I set the int equal to 1, and print each char, I get this result:
1 0 0 0
...leading me to conclude that my machine is little endian.
However, when I bit-shift left by 24, I get the following:
0 0 0 1
Swapping the endianness through a custom function (by swapping the left-most byte with the right, and same for the middle two), I end up with:
0 0 0 1
Left shifting this by 24 results in:
0 0 0 0
This leads me to conclude that the char[4] in my union is represented from right to left, in which case the endianness is actually the reverse of what's represented. But from my understanding, char arrays are generally interpreted from left to right, regardless of platforms.
Are the char bytes in my union reversed?
Full code here:
#include <stdio.h>
#include <stdlib.h>
void endian_switch32(int *n)
{
int ret[4];
ret[0] = *n >> 24;
ret[1] = (*n >> 8) & (255 << 8);
ret[2] = (*n << 8) & (255 << 16);
ret[3] = *n << 24;
*n = ret[0] | ret[1] | ret[2] | ret[3];
}
int main (void) {
union {
int i;
char c[4];
} num;
num.i = 1;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
num.i <<= 24;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
num.i = 1;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
endian_switch32(&num.i);
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
num.i <<= 24;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
}
The result:
1 0 0 0
0 0 0 1
1 0 0 0
0 0 0 1
0 0 0 0

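The point is that you're printing the bytes in address order, c[0] first. On a little-endian machine c[0] holds the least significant byte, so 0x01020304 prints as 4 3 2 1, which leads to your confusion. Endianness does not affect how arrays are stored; no one stores an array in reverse.
When you shift 0x01000000 (your byte-swapped 1) left by 24, you get zero. That's fine, because the only set bit is shifted out of the 32-bit int:
00000001 00000000 00000000 00000000
->
(00000001 00000000 00000000) 00000000 00000000 00000000 00000000
->
00000000 00000000 00000000 00000000
which is exactly zero. (Strictly speaking, shifting a set bit out of a signed int like this is undefined behavior in C; use unsigned int for bit manipulation.)
When you shift 1 left by 24, you get 0x01000000, which your program prints as 0 0 0 1 because its most significant byte is stored last. The conclusion (from the output of printing the char[4]) is that your platform is little-endian.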

Left and right shifts are based on the value of the int, not on how its bytes are laid out in memory. No matter how the bytes are stored, a 32-bit int with the value 1 is logically considered to be 0x00000001, or binary
00000000 00000000 00000000 00000001
Regardless of your endianness, the bit-shifting results work on this representation, so bit-shifting isn't a good way to detect endianness. Your machine is probably little-endian (both because of these results and from base rate, given that most computers are little-endian).

Related

How is an integer stored in a C program?

Is the number 1 stored in memory as 00000001 00000000 00000000 00000000?
#include <stdio.h>
int main()
{
unsigned int a[3] = {1, 1, 0x7f7f0501};
int *p = a;
printf("%d %p\n", *p, p);
p = (long long)p + 1;
printf("%d %p\n", *p, p);
char *p3 = a;
int i;
for (i = 0; i < 12; i++, p3++)
{
printf("%x %p\n", *p3, p3);
}
return 0;
}
Why is 16777216 printed in the output?
An integer is stored in memory in different ways on different architectures. The most common ways are called little-endian and big-endian byte ordering.
See Endianness
Your memory: [0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, ...]
                    ^
                    |
                    (long long)p + 1 points at the second byte
You increment p not as a pointer but as a long long number, so it points not to the next integer but to the next byte. Reading an int from there gives the bytes 0x00, 0x00, 0x00, 0x01, which on a little-endian architecture is 0x01000000 (decimal 16777216).
Something to play with (assuming int is 32 bits wide):
#include <stdio.h>
#include <stdbool.h>
typedef union byte_rec {
struct bit_rec {
bool b0 : 1;
bool b1 : 1;
bool b2 : 1;
bool b3 : 1;
bool b4 : 1;
bool b5 : 1;
bool b6 : 1;
bool b7 : 1;
} bits;
unsigned char value;
} byte_t;
typedef union int_rec {
struct bytes_rec {
byte_t b0;
byte_t b1;
byte_t b2;
byte_t b3;
} bytes;
int value;
} int_t;
void printByte(byte_t *b)
{
printf(
"%d %d %d %d %d %d %d %d ",
b->bits.b0,
b->bits.b1,
b->bits.b2,
b->bits.b3,
b->bits.b4,
b->bits.b5,
b->bits.b6,
b->bits.b7
);
}
void printInt(int_t *i)
{
printf("%p: ", (void *)i); // %p expects a void *
printByte(&i->bytes.b0);
printByte(&i->bytes.b1);
printByte(&i->bytes.b2);
printByte(&i->bytes.b3);
putchar('\n');
}
int main()
{
int_t i1, i2;
i1.value = 0x00000001;
i2.value = 0x80000000;
printInt(&i1);
printInt(&i2);
return 0;
}
Possible output:
0x7ffea0e30920: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0x7ffea0e30924: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Addendum (based on a comment by @chqrlie):
I previously used unsigned char for the bit fields, but the C Standard only guarantees three bit-field types (int, signed int and unsigned int) and, since C99, a fourth (_Bool). Implementations may accept additional types, and gcc did accept unsigned char here, but I changed the bit fields to _Bool (spelled bool via <stdbool.h>) nevertheless.
Noteworthy: the order of bit fields within an allocation unit is not fixed by the C Standard; it is left to the implementation (on some platforms bit fields are packed left-to-right, on others right-to-left; see the Notes section in the reference).
Reference to bit fields: https://en.cppreference.com/w/c/language/bit_field
p = (long long)p + 1; is bad code: C does not specify that it works, and it can lead to undefined behavior (UB), e.g. a bus fault and a rebooted machine. The newly formed address is not guaranteed to be aligned as an int * requires.
Don't do that.
To look at the bytes of a[], use an unsigned char pointer instead:
#include <stdio.h>
#include <stdlib.h>
void dump(size_t sz, const void *ptr) {
const unsigned char *byte_ptr = (const unsigned char *) ptr;
for (size_t i = 0; i < sz; i++) {
printf("%p %02X\n", (void*) byte_ptr, *byte_ptr);
byte_ptr++;
}
}
int main(void) {
unsigned int a[3] = {1, 1, 0x7f7f0501u};
dump(sizeof a, a);
}
As this is a community wiki answer, feel free to edit it.
There are multiple instances of undefined behavior in your code:
In printf("%d %p\n", *p, p), you should cast p as (void *)p to ensure printf receives a void * as it expects. This is unlikely to pose a problem on most current targets, but some ancient systems, such as early Cray systems, had different representations for int * and void *.
In p = (long long)p + 1, you have implementation-defined behavior converting a pointer to an integer and implicitly converting the integral result of the addition back to a pointer. More importantly, this may create a pointer with incorrect alignment for accessing an int in memory, resulting in undefined behavior when you dereference p. This would cause a bus error on many systems, e.g. most RISC architectures, but by chance not on Intel processors. It would be safer to compute the pointer as p = (void *)((intptr_t)p + 1); or p = (void *)((char *)p + 1);, although this would still have undefined behavior because of alignment issues.
Is the number 1 stored in memory as 00000001 00000000 00000000 00000000?
Yes, your system seems to use little endian representation for int types. The least significant 8 bits are stored in the byte at the address of a, then the next least significant 8 bits, and so on. As can be seen in the output, 1 is stored as 01 00 00 00 and 0x7f7f0501 stored as 01 05 7f 7f.
Why is 16777216 printed in the output?
The second instance of printf("%d %p\n", *p, p) has undefined behavior. On your system, p points to the second byte of the array a and *p reads 4 bytes from this address, namely 00 00 00 01 (the last 3 bytes of 1 and the first byte of the next array element, also 1), which is the representation of the int value 16777216.
To dump the contents of the array as bytes, you should access it using a char * as you do in the last loop. Be aware that char may be signed on some systems, causing for example printf("%x\n", *p3); to output ffffff80 if p3 points to the byte with hex value 80. Using unsigned char * is recommended for consistent and portable behavior.

Creating duplicate bytes from a given function

So I have the code below, which shifts a 32-bit int (s->data) 6 bits to the left, then appends the last 6 bits of the int operand to s->data. I would like to use this code to create a function which takes an unsigned char x and copies x into the first 3 bytes of the int s->data, leaving the final byte as 0. So for example, if we had x = 255, then s->data in binary form would be 11111111 11111111 11111111 00000000. Does anyone know how this can be achieved using the code below (dataCommand)? If I can only shift left by 6 bits and append 6 bits to the end of s->data, how can I get something of the form above?
I know how to get, say, 255 into s->data: we do dataCommand(128+64+(255/64)) followed by dataCommand(128+64+(255%64)). This assumes s->data is 0 to begin with, and gives 00000000 00000000 00000000 11111111. However, I would like something of the form 11111111 11111111 11111111 00000000.
I am really lost as to how to do this, so any help would be greatly appreciated. Below is the dataCommand function. Thank you. As always, it can be assumed s->data is 0 to begin with.
void dataCommand(int operand, state *s) {
printf("DATA BEFORE IS %x\n", s->data);
// shifts bits of current data fields six positions to left
s->data = s->data << 6;
// (operand & 63) masks 6 bits off the operand
// then we combine 6 bits of data with 6 bits of operand
s->data = (s->data | (operand & 63));
printf("DATA AFTER %x\n", s->data);
}
Comments in MyFunction below explain how to do this.
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
typedef struct state { int x, y, tx, ty; unsigned char tool; unsigned int start, data; bool end;} state;
void dataCommand(int operand, state *s) {
printf("DATA BEFORE IS %x\n", s->data);
// shifts bits of current data fields six positions to left
s->data = s->data << 6;
// (operand & 63) masks 6 bits off the operand
// then we combine 6 bits of data with 6 bits of operand
s->data = (s->data | (operand & 63));
printf("DATA AFTER %x\n", s->data);
}
void MyFunction(unsigned char x, state *s)
{
/* Create the target pattern that contains 0 in byte 0 and x in bytes 1,
2, and 3.
*/
uint32_t p = 0x01010100u * x;
/* For each six bits in p, or fragment thereof, in order from high bits to
low bits, shift those six bits down to the low six bits and give that
to dataCommand to insert into s->data.
*/
dataCommand(p >> 6*5, s);
dataCommand(p >> 6*4, s);
dataCommand(p >> 6*3, s);
dataCommand(p >> 6*2, s);
dataCommand(p >> 6*1, s);
dataCommand(p >> 6*0, s);
}
int main(void)
{
state s = {0}; // the question assumes s.data starts at 0
MyFunction(0x45, &s);
printf("data = 0x%08x.\n", s.data);
}

2 to the power of N in C without pow

How can I calculate a power of 2 in C without the pow function?
For example, after keyboard input 4, the result should be 16.
I know that, for example, 2^5 can be written as 2^1 * 2^4 (I don't know if this idea can help).
To calculate 2^N in C, use 1 << N.
If this may exceed the value representable in an int, use (Type) 1 << N, where Type is the integer type you want to use, such as unsigned long or uint64_t.
<< is the left-shift operator. It moves bits “left” in the bits that represent a number. Since numbers are represented in binary, moving bits left increases the powers of 2 they represent. Thus, binary 1 represents 1, binary 10 represents 2, binary 100 represents 4, and so on, so 1 shifted left N positions represents 2^N.
Numbers can be represented in binary form. For example, if integers are stored using 32 bits, 1 is stored like this:
00000000 00000000 00000000 00000001
And the value is 1 x 2^0.
If you do a left-shift operation your value will be stored as this:
00000000 00000000 00000000 00000010
That means that the value is now 1 x 2^1.
The number of bits used to store a type is sizeof(type) * CHAR_BIT, which is sizeof(type) * 8 on the usual platforms where a byte is 8 bits.
So the best method is to use a shift:
the left shift of 1 by exp is equivalent to 2 raised to exp.
A shift cannot replace pow for negative exponents: shifting by a negative amount is undefined behavior.
Shifting a number stored in N bits by N or more positions is also undefined behavior. For a signed int, 1 << (N - 1) already overflows, which is undefined behavior too.
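#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
int exp;
printf("Please, insert exponent:\n");
if (scanf("%d", &exp) != 1) {
printf("ERROR: scanf\n");
exit(EXIT_FAILURE);
}
// 1 << exp must stay below the sign bit of int, so exp < (bits in int) - 1
if (exp < 0 || exp >= (int)(sizeof(int) * CHAR_BIT) - 1) {
printf("ERROR: exponent must be >= 0 and < %d\n", (int)(sizeof(int) * CHAR_BIT) - 1);
exit(EXIT_FAILURE);
}
printf("2^(%d) = %d\n", exp, 1 << exp);
exit(EXIT_SUCCESS);
}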
You can also do it by writing a recursive function (int) -> int:
int my_pow(int exp) {
if (exp < 0) {
return -1; // signals an invalid (negative) exponent
}
if (exp == 0) {
return 1;
}
// exp > 0: 2^exp = 2 * 2^(exp - 1)
return 2 * my_pow(exp - 1);
}
Using it in main:
#include <stdio.h>
#include <stdlib.h>
int main() {
int exp;
if (scanf("%d", &exp) != 1) {
printf("ERROR: scanf\n");
exit(EXIT_FAILURE);
}
int res = my_pow(exp);
if (res == -1) {
printf("ERROR: Exponent must be equal or bigger than 0\n");
exit(EXIT_FAILURE);
}
printf("2^(%d) = %d\n", exp, res);
return 0;
}

Why is "0" "1" sometimes printed as a character and sometimes as ASCII 48/49?

I noticed this when I was writing code.
When I XOR the elements of the character arrays, why do some results display as 0/1 and some as ASCII codes? How do I get them all to behave like the numbers 0 and 1?
In function XOR, I want to xor the elements in two arrays and store the result in another array.
In main, I do some experiments.
By the way, besides printing the results, I want to do binary operations on these 0/1 values, such as encryption and decryption.
Here is a piece of C code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int XOR(char *u, char *w, char *v)
{
for(int i = 0; i < 16; i++)
{
u[i] = w[i] ^ v[i];
}
return 0;
}
int PrintList(char *list, int n)
{
for(int i = 0; i < n; i++)
{
printf("%d", list[i]);
}
return 0;
}
int main()
{
char u[17] = "";
char w[17] = "0001001000110100";
char v[17] = "0100001100100001";
XOR(u, w, v);
PrintList(u, 16);
printf("\n");
char w2[17] = "1000110011101111";
XOR(u, w2, v);
PrintList(u, 16);
printf("\n");
char v2[17] = "1111111011001000";
XOR(u, w2, v2);
PrintList(u, 16);
printf("\n");
char x[17] = "0101101001011010";
XOR(u, x, u);
PrintList(u, 16);
printf("\n");
memcpy(w, u, 16);
XOR(u, w, v);
PrintList(u, 16);
printf("\n");
return 0;
}
The result
0101000100010101
1100111111001110
0111001000100111
48484948494848484849494949494849
0110101101011100
Changing my declarations from char to unsigned char made no difference to the printed results, maybe because of printf("%d", list[i]);. Changing to printf("%c", list[i]); prints:
0010100001111101
Character '0' is 00110000 in binary. '1' is 00110001.
'0' ^ '0' = 00000000 (0)
'0' ^ '1' = 00000001 (1)
'1' ^ '1' = 00000000 (0)
But then you reuse the u array, which now holds the values 0 and 1 rather than the characters '0' and '1':
'0' ^ 0 = 00110000 (48)
'0' ^ 1 = 00110001 (49)
'1' ^ 0 = 00110001 (49)
'1' ^ 1 = 00110000 (48)
These are strings, so you initially have the ASCII codes 48 (0011 0000) and 49 (0011 0001). The ^ operator is bitwise XOR, so XORing two operands whose values are each 48 or 49 can only give 0 or 1. When you print that result as an integer, you get 0 or 1 as expected.
If you later use the result of that operation though, you no longer have an array of ASCII codes, but an array of integers with the value 0 or 1. If you XOR that one with an array that is still an ASCII code array, for example 0011 0000 ^ 0, you will get the result 0011 0000, not 0. And so printf gives you 48 etc.

Checksum for binary numbers? Converting back to decimal?

Introduction
This program should input a number in decimal (base 10) from the user, convert that number to binary, calculate the "binary sum", then present the binary sum and binary representation of the input.
The program should go something like this:
What type of display do you want?
Enter 1 for character parity, 2 for integer checksum: 2
Enter an integer for checksum calculation: 1024
Integer: 1024, Bit representation: 00000000 00000000 00000100 00000000
Sum of the number is: 4
Checksum of the number is: 4, Bit representation: 00000100
What is a binary sum?
The "binary sum" of a number n is defined by splitting the binary representation of n into 8-bit numbers and summing the base-10 value of each. This means that for 32-bit numbers, you sum the base-10 values of the numbers represented by bits (1-8), (9-16), (17-24), and (25-32). Here is an example:
Example of binary sum of 1234567:
Step 1:
Convert 1234567 into its binary representation.
1234567 -> 100101101011010000111
Step 2:
Split the binary number into 8-bit parts, adding zeros to the left if needed to make complete 8-bit numbers.
100101101011010000111 -> 00010010 11010110 10000111
Step 3:
Convert each 8-bit long number to decimal then add their values.
00010010 -> 18 (2^1 + 2^4 => 2 + 16 = 18)
11010110 -> 214 (2^1 + 2^2 + 2^4 + 2^6 + 2^7 => 2 + 4 + 16 + 64 + 128 = 214)
10000111 -> 135 (2^0 + 2^1 + 2^2 + 2^7 => 1 + 2 + 4 + 128 = 135)
18 + 214 + 135 = 367
The binary sum of 1234567 is 367.
I have no problem showing the binary representation of the input, but I'm not sure how to calculate the binary sum. This is challenging because I'm not allowed to use strings or arrays, only basic primitive data types.
This is the code I have made so far, with comments where I am having issues:
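#include <stdio.h>
int toBinary(int userInput, int bits);
int fromBinary(char binaryValue);
int main(void) {
char endLoop = 'r'; // initialized so the first loop test reads a defined value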
int userChoice;
char choice1;
char byte;
int choice2;
while(endLoop != 'q') {
printf("\nWhat type of display do you want?");
printf("\nEnter 1 for character parity, 2 for integer checksum: ");
scanf("%d", &userChoice);
if(userChoice == 1) {
printf("Enter a character for parity calculation: ");
scanf(" %c", &choice1);
printf("Character: %c" , choice1);
printf(", Bit Representation: ");
int number1s = fromBinary(toBinary(choice1, 8));
printf("\nNumber of ones: %d", number1s);
printf("\nEven 1 parity for the character is: ");
if(number1s % 2 != 0) {
printf("1");
toBinary(choice1, 7);
} else {
toBinary(choice1, 8);
}
}
if(userChoice == 2) {
printf("Enter an integer for checksum calculation: ");
scanf("%d", &choice2);
printf("Integer: %d", choice2);
printf(", Bit Representation: " );
toBinary(choice2, 32);
printf("\nSum of number is: ");
printf("\nChecksum of number is: ");
printf(", Bit Representation: ");
}
printf("\n\nEnter r to repeat, q to quit: ");
scanf(" %c", &endLoop);
}
}
int toBinary(int userInput, int bits) {
int i;
int mask = 1 << bits - 1;
int count = 0;
for (i = 1; i <= bits; i++) {
if (userInput & mask){
count++;
putchar('1');
} else {
putchar('0');
}
userInput <<= 1;
if (! (i % 8)) {
putchar(' ');
}
}
return count;
}
int fromBinary(char binaryValue) {
// I wanted to take the binary value I get from toBinary() and
// convert it to decimal here. But am not sure how to go about it
// since I need the bit representation, and I don't store the bit
// representation, I only print it out.
// I need to convert it to decimal so that I can add the decimal
// values up to calculate the binary sum.
}
EDIT for negative inputs
You have said that you would also like to handle negative numbers. The simplest way to do this is to define your function to accept an unsigned int rather than an int. This lets you do all your normal bit operations without handling negative numbers as a separate case.
Change this line
int getSum(int n) {
to this
int getSum(unsigned int n) {
No further changes are necessary; in fact, we can now remove the if statement from getSum.
The complete, updated getSum function is below. The commented version can be found at the bottom.
Remember, if you want to print out an unsigned int, the format specifier is %u not %d.
Solution
If you have a number, and you want to add up the values of what each 8 bits of that number would be in base 10, you can do it like this:
int getSum(unsigned int n) {
int total = 0;
while(n) {
int tempCount = 0, i = 0;
for(i = 0; n && i < 8; i++) {
tempCount += (n & 1) * pow(2, i);
n >>= 1;
}
total += tempCount;
}
return total;
}
Explanation
This code will (while n > 0) grab 8 bits at a time, and add their base-10 values:
2^0 * 1 or 2^0 * 0 +
2^1 * 1 or 2^1 * 0 +
2^2 * 1 or 2^2 * 0 +
... +
2^7 * 1 or 2^7 * 0
tempCount holds the sum for each set of 8 bits, and after each 8 bits, tempCount is added to the total and is reset to 0.
The condition in the for loop, n && i < 8, stops after grabbing 8 bits, but also terminates early once n is 0.
Testing
This output:
getSum(1025) = 5
getSum(2048) = 8
getSum(1234567) = 367
getSum(2147483647) = 892
was used to verify the correctness of this code:
#include <stdio.h>
#include <math.h>
int getSum(unsigned int n) {
int total = 0;
//printf("passed in %u\n", n);
while(n) {
int tempCount = 0, i;
//printf("n starts while as %u\n", n);
// Take up to 8 bits from the right side of the number
// and add together their original values (1, 2, 4, ..., 64, 128)
for(i = 0; n && i < 8; i++) {
//printf("\t\tn in for as %u\n", n);
tempCount += (n & 1) * pow(2, i);
//printf("\t\t\tbit is %u\n", (n & 1));
n >>= 1;
}
//printf("\tAdded %u from that set of 8 bits\n", tempCount);
total += tempCount;
}
return total;
}
int main(void) {
printf("getSum(1025) = %d\n", getSum(1025));
printf("getSum(2048) = %d\n", getSum(2048));
printf("getSum(1234567) = %d\n", getSum(1234567));
printf("getSum(2147483647) = %d\n", getSum(2147483647));
return 0;
}
Of course I checked these examples by hand:
2147483647
2147483647 == 01111111 11111111 11111111 11111111
The bit sum =
01111111 + 11111111 + 11111111 + 11111111 =
127 + 255 + 255 + 255 = 892
getSum(2147483647) = 892
1025
1025 == 00000100 00000001
The bit sum =
00000100 + 00000001 =
4 + 1 = 5
getSum(1025) = 5
2048
2048 == 00001000 00000000
The bit sum =
00001000 + 00000000 =
8 + 0 = 8
getSum(2048) = 8
1234567
1234567 == 00010010 11010110 10000111
The bit sum =
00010010 + 11010110 + 10000111 =
18 + 214 + 135 = 367
getSum(1234567) = 367
-1
-1 = 11111111 11111111 11111111 11111111
The bit sum =
11111111 + 11111111 + 11111111 + 11111111 =
255 + 255 + 255 + 255 = 1020
getSum(-1) = 1020

Resources