Confused with Union in C

I can't understand how unions work.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    union {
        int a : 4;
        char b[4];
    } abc;

    abc.a = 0xF;
    printf("%zu, %d, %d, %d, %d, %d\n", sizeof(abc), abc.a, abc.b[0], abc.b[1], abc.b[2], abc.b[3]);
    return 0;
}
In the above program I declared int a : 4;, so a should take up 4 bits.
Now I store a = 0xF; // i.e. a = 1111 in binary
So when I access b[0], b[1], b[2] or b[3], why is the output not 1, 1, 1, 1?

Your union's total size will be at least 4 * sizeof(char).
Assuming the compiler you are using handles this as defined behavior, consider the following:
abc is never fully initialized, so it contains a random assortment of zeros and ones. Big problem. So, do this first: memset(&abc, 0, sizeof(abc));
The union should be the size of its largest member, so you should now have 4 zeroed-out bytes: 00000000 00000000 00000000 00000000
You are only setting 4 bits high, so your union will become something like this:
00000000 00000000 00000000 00001111 or 11110000 00000000 00000000 00000000. I'm not sure how your compiler handles this type of alignment, so this is the best I can do.
You might also consider doing a char-to-bits conversion so you can manually inspect the value of each and every bit in binary format:
Access individual bits in a char c++
Best of luck!

0xF is -1 if you look at it as a 4-bit signed value, so the output is normal. b is not even fully assigned, so its value is undefined: it's a 4-byte entity, but you only assign a 4-bit entity. So everything looks normal to me.

Because every char takes (on most platforms) 1 byte, i.e. 8 bits, all 4 bits of a fall into a single element of b[].
And besides that, it is compiler-dependent how bit-fields are stored, so it is not defined which byte of b[] the field maps into...

0xF is -1 if you defined it to be a 4-bit signed number. Check the two's-complement binary representation to understand why.
And you didn't initialize b, so it could be holding any random value.

Related

getting values of void pointer while only knowing the size of each element

I'll start by saying I've seen a bunch of posts with similar titles, but none focus on my question.
I've been tasked to write a function that receives a void* arr, an unsigned int sizeofArray and an unsigned int sizeofElement.
I managed to iterate through the array with no problem; however, when I try to print the values or do anything with them, I seem to get garbage unless I specify their type beforehand.
this is my function:
void MemoryContent(void* arr, unsigned int sizeRe, unsigned int sizeUnit)
{
    int sizeArr = sizeRe / sizeUnit;
    for (int i = 0; i < sizeArr; i++)
    {
        printf("%d\n", arr);          // this one prints garbage
        printf("%d\n", *(int*)arr);   // this one prints the expected values, given the array is of int*
        arr = arr + sizeUnit;
    }
}
The output of this with the following array (int arr[] = {1, 2, 4, 8, 16, 32, -1};) is:
-13296 1
-13292 2
-13288 4
-13284 8
-13280 16
-13276 32
-13272 -1
I realize I have to specify the type somehow. While the printf won't actually be used, as I need the binary representation of whatever value is in there (already taken care of in a different function), I'm still not sure how to get the actual value without casting, while knowing only the size of the element.
Any explanation would be highly appreciated!
Note: the compiler used is gcc, so pointer arithmetic on void* is allowed as used.
edit for clarification:
the output after formating and all that should look like this for the given array of previous example
00000000 00000000 00000000 00000001 0x00000001
00000000 00000000 00000000 00000010 0x00000002
00000000 00000000 00000000 00000100 0x00000004
00000000 00000000 00000000 00001000 0x00000008
00000000 00000000 00000000 00010000 0x00000010
00000000 00000000 00000000 00100000 0x00000020
11111111 11111111 11111111 11111111 0xFFFFFFFF
Not possible: you cannot get the values behind a void pointer while only knowing the size of each element.
Say the size is 4. Is the element an int32_t, uint32_t, float, bool, some struct, an enum, a pointer, etc.? Are any of the bits padding? The proper interpretation of the bits requires more than only knowing the size.
Code could print out the bits at void *ptr and leave the interpretation to the user.
unsigned char bytes[sizeUnit];
memcpy(bytes, ptr, sizeUnit);
for (size_t i = 0; i < sizeof bytes; i++) {
    printf(" %02X", bytes[i]);
}
Simplifications exist.
OP's code (void* arr, ... arr = arr + sizeUnit;) is not portable: adding to a void * is not defined by the C standard. Some compilers do allow it, though, treating the pointer as if it were a char pointer.

Bits of the primitive type in C

Well, I'm starting my C studies and I was left with the following question: how are the bits of the primitive types filled in? The int type, for example, has 4 bytes, that is, 32 bits, which can hold up to 4294967296 distinct values. But if I use a value that takes only 1 byte, what happens to the other bits?
#include <stdio.h>

int main(void) {
    int x = 5; // 101 -- how are the rest of the bits,
               // which were not used, filled in?
    return 0;
}
All leading bits will be set to 0; otherwise the value wouldn't be 5. A bit, in today's computers, has only two states, so if it's not 0 then it's 1, which would cause the stored value to be different. So, assuming 32 bits, you have:
5 == 0b00000000 00000000 00000000 00000101
5 == 0x00000005
The remaining bits are stored with 0.
int a = 356;
Now let us convert it to binary.
1 0 1 1 0 0 1 0 0
Now you get a 9-bit number. Since int allocates 32 bits, fill the remaining 23 bits with 0.
So the value stored in memory is
00000000 00000000 00000001 01100100
The type you have picked determines how large the integer is, not the value you store inside the variable.
If we assume that int is 32 bits on your system, then the value 5 will be expressed as a 32-bit number, which is 00000000 00000000 00000000 00000101 in binary or 0x00000005 in hex. If the other bits had any other values, it would no longer be the number 5.

C, Little and Big Endian confusion

I'm trying to understand the byte order of C programs in memory, but I'm confused.
I checked my app's output against some values on this site: www.yolinux.com/TUTORIALS/Endian-Byte-Order.html
For the 64-bit value I use in my C program:
volatile long long ll = (long long)1099511892096;
__mingw_printf("\tlong long, %zu Bytes, %zu bits,\t%lld to %lld, %lld, 0x%016llX\n", sizeof(long long), sizeof(long long)*8, LLONG_MIN, LLONG_MAX, ll, ll);
void printBits(size_t const size, void const * const ptr)
{
    const unsigned char *b = (const unsigned char *) ptr;
    unsigned char byte;
    int i, j;

    printf("\t");
    for (i = size - 1; i >= 0; i--)
    {
        for (j = 7; j >= 0; j--)
        {
            byte = b[i] & (1 << j);
            byte >>= j;
            printf("%u", byte);
        }
        printf(" ");
    }
    puts("");
}
Out
long long, 8 Bytes, 64 bits, -9223372036854775808 to 9223372036854775807, 1099511892096, 0x0000010000040880
80 08 04 00 00 01 00 00 (Little-Endian)
10000000 00001000 00000100 00000000 00000000 00000001 00000000 00000000
00 00 01 00 00 04 08 80 (Big-Endian)
00000000 00000000 00000001 00000000 00000000 00000100 00001000 10000000
Tests
0x8008040000010000, 1000000000001000000001000000000000000000000000010000000000000000 // online website hex2bin conv.
1000000000001000000001000000000000000000000000010000000000000000 // my C app
0x8008040000010000, 1000010000001000000001000000000000000100000000010000000000000000 // yolinux.com
0x0000010000040880, 0000000000000000000000010000000000000000000001000000100010000000 //online website hex2bin conv., 1099511892096 ! OK
0000000000000000000000010000000000000000000001000000100010000000 // my C app, 1099511892096 ! OK
[Convert]::ToInt64("0000000000000000000000010000000000000000000001000000100010000000", 2) // using powershell for other verif., 1099511892096 ! OK
0x0000010000040880, 0000000000000000000000010000010000000000000001000000100010000100 // yolinux.com, 1116691761284 (from powershell bin conv.) ! BAD !
Problem
The yolinux.com website announces 0x0000010000040880 as BIG ENDIAN, but my computer uses LITTLE ENDIAN, I think (Intel processor), and I get the same value 0x0000010000040880 from my C app and from another website's hex2bin converter.
__mingw_printf(...0x%016llX..., ...ll) also prints 0x0000010000040880, as you can see.
Following the yolinux website, I have inverted my "(Little-Endian)" and "(Big-Endian)" labels in my output for the moment.
Also, the sign bit must be 0 for a positive number; that is the case in my result, but also in the yolinux result (so it cannot help me be sure).
If I understand correctly, endianness only swaps bytes, not bits, and my groups of bits seem to be correctly inverted.
Is it simply an error on yolinux.com, or am I missing a step with 64-bit numbers in C programming?
When you print some "multi-byte" integer using printf (and the correct format specifier) it doesn't matter whether the system is little or big endian. The result will be the same.
The difference between little and big endian is the order that multi-byte types are stored in memory. But once data is read from memory into the core processor, there is no difference.
This code shows how an integer (4 bytes) is placed in memory on my machine.
#include <stdio.h>

int main()
{
    unsigned int u = 0x12345678;

    printf("size of int is %zu\n", sizeof u);
    printf("DEC: u=%u\n", u);
    printf("HEX: u=0x%x\n", u);
    printf("memory order:\n");

    unsigned char * p = (unsigned char *)&u;
    for (size_t i = 0; i < sizeof u; ++i)
        printf("address %p holds %x\n", (void*)&p[i], p[i]);

    return 0;
}
Output:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 78
address 0x7ffddf2c263d holds 56
address 0x7ffddf2c263e holds 34
address 0x7ffddf2c263f holds 12
So I can see that I'm on a little endian machine as the LSB (least significant byte, i.e. 78) is stored on the lowest address.
Executing the same program on a big endian machine would (assuming same address) show:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 12
address 0x7ffddf2c263d holds 34
address 0x7ffddf2c263e holds 56
address 0x7ffddf2c263f holds 78
Now it is the MSB (most significant byte, i.e. 12) that is stored at the lowest address.
The important thing to understand is that this only relates to "how multi-byte type are stored in memory". Once the integer is read from memory into a register inside the core, the register will hold the integer in the form 0x12345678 on both little and big endian machines.
There is only a single way to represent an integer in decimal, binary or hexadecimal format. For example, number 43981 is equal to 0xABCD when written as hexadecimal, or 0b1010101111001101 in binary. Any other value (0xCDAB, 0xDCBA or similar) represents a different number.
The way your compiler and cpu choose to store this value internally is irrelevant as far as C standard is concerned; the value could be stored as a 36-bit one's complement if you're particularly unlucky, as long as all operations mandated by the standard have equivalent effects.
You will rarely have to inspect your internal data representation when programming. Practically the only time you care about endianness is when working on a communication protocol, because then the binary format of the data must be precisely defined; but even then your code will not differ between architectures:
// Input value is big endian; this is defined
// by the communication protocol.
uint32_t parse_comm_value(const unsigned char * ptr)
{
    // But bit shifts in C have the same meaning regardless of the
    // endianness of your architecture. (unsigned char and the
    // uint32_t casts avoid sign extension on the way up.)
    uint32_t result = 0;
    result |= (uint32_t)(*ptr++) << 24;
    result |= (uint32_t)(*ptr++) << 16;
    result |= (uint32_t)(*ptr++) << 8;
    result |= (uint32_t)(*ptr++);
    return result;
}
Tl;dr calling a standard function like printf("0x%llx", number); always prints the correct value using the specified format. Inspecting the contents of memory by reading individual bytes gives you the representation of the data on your architecture.

How to interpret *( (char*)&a )

I've seen this program used as a way to determine the endianness of the platform, but I don't understand it:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int a = 1;
    if (*((char*)&a) == 1)
        printf("Little Endian\n");
    else
        printf("Big Endian\n");
    system("PAUSE");
    return 0;
}
What does the test do?
An int is almost always larger than a byte and often matches the word size of the architecture. For example, a 32-bit architecture will likely have 32-bit ints. So, given typical 32-bit ints, the layout of the 4 bytes might be:
00000000 00000000 00000000 00000001
or with the least significant byte first:
00000001 00000000 00000000 00000000
A char is one byte, so if we cast this address to a char* and dereference it, we'll get the first byte above: either
00000000
or
00000001
So by examining the first byte, we can determine the endianness of the architecture.
This would only work on platforms where sizeof(int) > 1. As an example, we'll assume it's 2, and that a char is 8 bits.
Basically, with little-endian, the number 1 as a 16-bit integer looks like this:
00000001 00000000
But with big-endian, it's:
00000000 00000001
So first the code sets a = 1, and then this:
*( (char*)&a ) == 1)
takes the address of a, treats it as a pointer to a char, and dereferences it. So:
If a contains a little-endian integer, you're going to get the 00000001 section, which is 1 when interpreted as a char.
If a contains a big-endian integer, you're going to get 00000000 instead. The check for == 1 will fail, and the code will assume the platform is big-endian.
You could improve this code by using int16_t and int8_t instead of int and char. Or better yet, just check if htons(1) != 1.
You can look at an integer as an array of 4 bytes (on most platforms). A little-endian integer will have the values 01 00 00 00 and a big-endian one 00 00 00 01.
By doing &a you get the address of the first element of that array.
The expression (char*)&a casts it to the address of a single byte.
And finally *( (char*)&a ) gets the value contained by that address.
take the address of a
cast it to char*
dereference this char*; this will give you the first byte of the int
check its value - if it's 1, then it's little endian. Otherwise - big.
Assume sizeof(int) == 4, then:
|........||........||........||........| <- 4bytes, 8 bits each for the int a
| byte#1 || byte#2 || byte#3 || byte#4 |
When step 1, 2 and 3 are executed, *( (char*)&a ) will give you the first byte, | byte#1 |.
Then, by checking the value of byte#1 you can understand if it's big or little endian.
The program just reinterprets the space taken up by an int as an array of chars, and assumes that 1 as an int will be stored as a series of bytes, the lowest-order of which will be a byte with value 1, the rest being 0.
So if the lowest-order byte occurs first, the platform is little endian; otherwise it's big endian.
These assumptions may not hold on every single platform in existence.
a = 00000000 00000000 00000000 00000001
^ ^
| |
&a if big endian &a if little endian
00000000 00000001
^ ^
| |
(char*)&a for BE (char*)&a for LE
*(char*)&a = 0 for BE *(char*)&a = 1 for LE
Here's how it breaks down:
a -- given the variable a
&a -- take its address; type of the expression is int *
(char *)&a -- cast the pointer expression from type int * to type char *
*((char *)&a) -- dereference the pointer expression
*((char *)&a) == 1 -- and compare it to 1
Basically, the cast (char *)&a converts the type of the expression &a from a pointer to int to a pointer to char; when we apply the dereference operator to the result, it gives us the value stored in the first byte of a.
*( (char*)&a )
In big endian, the data for int i = 1 (size 4 bytes) will be arranged in memory like this (from lower address to higher address):
00000000 -->Address 0x100
00000000 -->Address 0x101
00000000 -->Address 0x102
00000001 -->Address 0x103
While in little endian it is:
00000001 -->Address 0x100
00000000 -->Address 0x101
00000000 -->Address 0x102
00000000 -->Address 0x103
Analyzing the cast above: &a = 0x100, and so
*((char*)0x100) means: consider one byte at a time (rather than the 4 bytes loaded for an int), so only the data at 0x100 is referenced.
*((char*)&a) == 1 then becomes (the byte at 0x100) == 1, i.e. 1 == 1, which is true, implying it's little endian.

How to translate these hex numbers into bitwise values?

I'm looking through the source code of a project written in C. Here is a list of options that are defined (no these aren't the real defines...not very descriptive!)
...
#define OPTION_5 32768
#define OPTION_6 65536
#define OPTION_7 0x20000L
#define OPTION_8 0x40000L
#define OPTION_9 0x80000L
I'd like to add a new option OPTION_10 but before I do that, I'd like to understand what exactly the hex numbers represent?
Do these numbers convert to the expected decimal values of 131,072; 262,144; and 524,288? If so, why not keep the same format as the earlier options?
Do these numbers convert to the expected decimal values of 131,072
Yes. You can use Google for the conversion: search for "0x20000 in decimal".
If so, why not keep the same format as the earlier options?
I guess simply because programmers know their powers of two up to 65536 and prefer hexadecimal, where they are more recognizable, above that.
The L suffix forces the literal constant to be typed at least as a long int, but the chosen type may still be larger if that's necessary to hold the constant. It's probably unnecessary in your program, and the programmer likely used it without realizing the "at least" part. The nitty-gritty details are in 6.4.4.1, page 56 of the C99 standard.
Just a further thought to add to the existing answers, I prefer to define such flags more like this:
enum {
    OPTION_5_SHIFT = 15,
    OPTION_6_SHIFT,
    OPTION_7_SHIFT,
    OPTION_8_SHIFT,
    OPTION_9_SHIFT,
    OPTION_10_SHIFT
};

enum {
    OPTION_5  = 1L << OPTION_5_SHIFT,
    OPTION_6  = 1L << OPTION_6_SHIFT,
    OPTION_7  = 1L << OPTION_7_SHIFT,
    OPTION_8  = 1L << OPTION_8_SHIFT,
    OPTION_9  = 1L << OPTION_9_SHIFT,
    OPTION_10 = 1L << OPTION_10_SHIFT
};
This avoids having explicitly calculated constants and makes it much easier to insert/delete values, etc.
They represent the same kind of numbers; they are all powers of two. Or, to see it in a different light, they are all binary numbers with exactly one 1 (no pun intended).
One possible reason why they are written the way they are (even though the reason isn't a good one) is that many programmers know the following sequence by heart:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
This sequence corresponds to the first 17 powers of two. Beyond that, things are not as easy any more, so they probably switched to hex (being too lazy to change all the earlier numbers).
The specific values represent bit flag options, which can be combined with the bitwise OR operator |:
flags = (OPTION_5|OPTION_6);
You will see from the binary representation of these values, that each has one unique bit set, to allow combining them using bitwise OR:
0x8000L = 32768 = 00000000 00000000 10000000 00000000
0x10000L = 65536 = 00000000 00000001 00000000 00000000
0x20000L = 131072 = 00000000 00000010 00000000 00000000
0x40000L = 262144 = 00000000 00000100 00000000 00000000
0x80000L = 524288 = 00000000 00001000 00000000 00000000
0x100000L = 1048576 = 00000000 00010000 00000000 00000000
To find out if a flag has been set in the flags variable, you can use the bitwise AND operator &:
if(flags & OPTION_6)
{
/* OPTION_6 is active */
}
Each digit of a number represents a multiplication factor of the number's numerical system's base to the power of the digit's position in the number, counted from right to left, beginning with zero.
So 32768 = 8 * 10^0 + 6 * 10^1 + 7 * 10^2 + 2 * 10^3 + 3 * 10^4.
(Hint for the sake of completeness: x^0 = 1, x^1 = x.)
Hexadecimal numbers have 16 digits (0 - 9, A (~10) - F (~15)) and hence a base of 16, so 0x20 = 0 * 16^0 + 2 * 16^1.
Binary numbers have 2 digits and a base of 2, so 100b = 1 * 2^2 + 0 * 2^1 + 0 * 2^0.
Knowing that, you should be able to figure out the rest yourself: handle binary and hexadecimal numbers, understand that each number you listed is twice its predecessor, work out what decimal values the hex numbers have and what the next decimal number in the row should be, and express OPTION_10 in any numerical system, particularly binary, decimal and hexadecimal.
