I came across this program as a way to determine the endianness of the platform, but I don't understand it:
#include <stdio.h>
#include <stdlib.h> /* for system() */
int main(void)
{
int a = 1;
if( *( (char*)&a ) == 1) printf("Little Endian\n");
else printf("Big Endian\n");
system("PAUSE");
return 0;
}
What does the test do?
An int is almost always larger than a byte and often tracks the word size of the architecture. For example, a 32-bit architecture will likely have 32-bit ints. So given typical 32 bit ints, the layout of the 4 bytes might be:
00000000 00000000 00000000 00000001
or with the least significant byte first:
00000001 00000000 00000000 00000000
A char is one byte, so if we cast this address to a char* and dereference it, we'll get the first byte above, either
00000000
or
00000001
So by examining the first byte, we can determine the endianness of the architecture.
This would only work on platforms where sizeof(int) > 1. As an example, we'll assume it's 2, and that a char is 8 bits.
Basically, with little-endian, the number 1 as a 16-bit integer looks like this:
00000001 00000000
But with big-endian, it's:
00000000 00000001
So first the code sets a = 1, and then this:
*( (char*)&a ) == 1
takes the address of a, treats it as a pointer to a char, and dereferences it. So:
If a contains a little-endian integer, you're going to get the 00000001 section, which is 1 when interpreted as a char.
If a contains a big-endian integer, you're going to get 00000000 instead. The check for == 1 will fail, and the code will assume the platform is big-endian.
You could improve this code by using int16_t and int8_t instead of int and char. Or better yet, just check if htons(1) != 1.
You can look at an integer as an array of 4 bytes (on most platforms). A little-endian integer will have the values 01 00 00 00 and a big-endian one 00 00 00 01.
By doing &a you get the address of the first element of that array.
The expression (char*)&a casts it to the address of a single byte.
And finally *( (char*)&a ) gets the value contained by that address.
1. Take the address of a.
2. Cast it to char*.
3. Dereference this char*; this gives you the first byte of the int.
4. Check its value: if it's 1, the platform is little endian; otherwise it is big endian.
Assume sizeof(int) == 4, then:
|........||........||........||........| <- 4bytes, 8 bits each for the int a
| byte#1 || byte#2 || byte#3 || byte#4 |
When step 1, 2 and 3 are executed, *( (char*)&a ) will give you the first byte, | byte#1 |.
Then, by checking the value of byte#1 you can understand if it's big or little endian.
The program just reinterprets the space taken up by an int as an array of chars and assumes that 1 as an int will be stored as a series of bytes, the lowest order of which will be a byte of value 1, the rest being 0.
So if the lowest-order byte occurs first, then the platform is little endian; otherwise it's big endian.
These assumptions may not hold on every single platform in existence.
a = 00000000 00000000 00000000 00000001
    ^                          ^
    |                          |
    &a if big endian           &a if little endian
    00000000                   00000001
    ^                          ^
    |                          |
    (char*)&a for BE           (char*)&a for LE
    *(char*)&a = 0 for BE      *(char*)&a = 1 for LE
Here's how it breaks down:
a -- given the variable a
&a -- take its address; type of the expression is int *
(char *)&a -- cast the pointer expression from type int * to type char *
*((char *)&a) -- dereference the pointer expression
*((char *)&a) == 1 -- and compare it to 1
Basically, the cast (char *)&a converts the type of the expression &a from a pointer to int to a pointer to char; when we apply the dereference operator to the result, it gives us the value stored in the first byte of a.
*( (char*)&a )
On a big-endian machine, the int a = 1 (4 bytes) is laid out in memory as follows (from lower to higher address):
00000000 -->Address 0x100
00000000 -->Address 0x101
00000000 -->Address 0x102
00000001 -->Address 0x103
On a little-endian machine it is:
00000001 -->Address 0x100
00000000 -->Address 0x101
00000000 -->Address 0x102
00000000 -->Address 0x103
Analyzing the cast above:
Assume &a = 0x100.
*((char*)0x100) then reads a single byte (rather than the 4 bytes that would be loaded for an int), so only the data at address 0x100 is referred to.
With the little-endian layout, *( (char*)&a ) == 1 becomes 1 == 1, which is true, implying the platform is little endian.
I've been trying to understand how data is stored in C but I'm getting confused. I have this code:
#include <stdio.h>
int main(){
int a = 0; /* start from all-zero bytes so the printed value is well defined */
char *x;
x = (char *) &a;
x[0] = 0;
x[1] = 3;
printf("%d\n", a);
return 0;
}
I've been messing around with x[0] & x[1], trying to figure out how they work, but I just can't. For example x[1] = 3 outputs 768. Why?
I understand that there are 4 bytes (each holding 8 bits) in an int, and x[1] points to the 2nd byte. But I don't understand how making that second byte equal to 3, means a = 768.
I can visualise this in binary format:
byte 1: 00000000
byte 2: 00000011
byte 3: 00000000
byte 4: 00000000
But where does the 3 come into play? How does setting byte 2 = 3 make it 00000011, or 768?
Additional question: If I was asked to store 545 in memory, what would x[0] and x[1] be?
I know the layout in binary is:
byte 1: 00100001
byte 2: 00000010
byte 3: 00000000
byte 4: 00000000
It is not specific to C, it is how your computer is storing the data.
There are two different byte orders; which one a machine uses is called its endianness.
Little-endian: the least significant byte is stored first.
Example: 0x11223344 will be stored as 0x44 0x33 0x22 0x11
Big-endian: the least significant byte is stored last.
Example: 0x11223344 will be stored as 0x11 0x22 0x33 0x44
Most modern computers use the little-endian system.
Additional question: If I was asked to store 545 in memory
545 in hex is 0x221 so the first byte will be 0x21 and the second one 0x02 as your computer is little-endian.
Why do I use hex numbers? Because every two digits represent exactly one byte in memory.
I've been messing around with x[0] & x[1], trying to figure out how
they work, but I just can't. For example x[1] = 3 outputs 768. Why?
768 in hex is 0x300. So the byte representation is 0x00 0x03 0x00 0x00
Warning: by casting the address of an int to a char *, you leave the compiler defenseless in trying to maintain order. Casting is the programmer telling the compiler "I know what I am doing." Use it with care.
Another way to refer to the same region of memory in two different modes is to use a union. Here the compiler will allocate the space required, which is addressable as either an int or an array of char.
This might be a simpler way to experiment with setting/clearing certain bits as you come to understand how the architecture of your computer stores multi-byte datatypes.
See other responses for hints about "endian-ness".
#include <stdio.h>
int main( void ) {
union {
int i;
char c[4];
} x;
x.i = 0;
x.c[1] = 3;
printf( "%02x %02x %02x %02x %08x %d\n", x.c[0], x.c[1], x.c[2], x.c[3], x.i, x.i );
x.i = 545;
printf( "%02x %02x %02x %02x %08x %d\n", x.c[0], x.c[1], x.c[2], x.c[3], x.i, x.i );
return 0;
}
00 03 00 00 00000300 768
21 02 00 00 00000221 545
I'm trying to understand byte order in memory in C programming, but I'm confused.
I tested my program with some values from this site to verify my output: www.yolinux.com/TUTORIALS/Endian-Byte-Order.html
For the 64-bit value, I use this in my C program:
volatile long long ll = (long long)1099511892096;
__mingw_printf("\tlong long, %u Bytes, %u bits,\t%lld to %lli, %lli, 0x%016llX\n", sizeof(long long), sizeof(long long)*8, LLONG_MIN, LLONG_MAX , ll, ll);
void printBits(size_t const size, void const * const ptr)
{
unsigned char *b = (unsigned char*) ptr;
unsigned char byte;
int i, j;
printf("\t");
for (i=size-1;i>=0;i--)
{
for (j=7;j>=0;j--)
{
byte = b[i] & (1<<j);
byte >>= j;
printf("%u", byte);
}
printf(" ");
}
puts("");
}
Output
long long, 8 Bytes, 64 bits, -9223372036854775808 to 9223372036854775807, 1099511892096, 0x0000010000040880
80 08 04 00 00 01 00 00 (Little-Endian)
10000000 00001000 00000100 00000000 00000000 00000001 00000000 00000000
00 00 01 00 00 04 08 80 (Big-Endian)
00000000 00000000 00000001 00000000 00000000 00000100 00001000 10000000
Tests
0x8008040000010000, 1000000000001000000001000000000000000000000000010000000000000000 // online website hex2bin conv.
1000000000001000000001000000000000000000000000010000000000000000 // my C app
0x8008040000010000, 1000010000001000000001000000000000000100000000010000000000000000 // yolinux.com
0x0000010000040880, 0000000000000000000000010000000000000000000001000000100010000000 //online website hex2bin conv., 1099511892096 ! OK
0000000000000000000000010000000000000000000001000000100010000000 // my C app, 1099511892096 ! OK
[Convert]::ToInt64("0000000000000000000000010000000000000000000001000000100010000000", 2) // using powershell for other verif., 1099511892096 ! OK
0x0000010000040880, 0000000000000000000000010000010000000000000001000000100010000100 // yolinux.com, 1116691761284 (from powershell bin conv.) ! BAD !
Problem
The yolinux.com website lists 0x0000010000040880 as BIG ENDIAN! But I think my computer uses LITTLE ENDIAN (Intel processor),
and I get the same value 0x0000010000040880 from my C app and from another website's hex2bin converter.
__mingw_printf(...0x%016llX...,...ll) also prints 0x0000010000040880, as you can see.
Following the yolinux website, I have swapped the "(Little-Endian)" and "(Big-Endian)" labels in my output for the moment.
Also, the sign bit must be 0 for a positive number. That is the case in my result, but also in the yolinux result, so it doesn't help me decide.
If I understand endianness correctly, only bytes are swapped, not bits, and my groups of bits seem to be correctly reversed.
Is it simply an error on yolinux.com, or am I missing a step about 64-bit numbers and C programming?
When you print some "multi-byte" integer using printf (and the correct format specifier) it doesn't matter whether the system is little or big endian. The result will be the same.
The difference between little and big endian is the order that multi-byte types are stored in memory. But once data is read from memory into the core processor, there is no difference.
This code shows how an integer (4 bytes) is placed in memory on my machine.
#include <stdio.h>
int main()
{
unsigned int u = 0x12345678;
printf("size of int is %zu\n", sizeof u);
printf("DEC: u=%u\n", u);
printf("HEX: u=0x%x\n", u);
printf("memory order:\n");
unsigned char * p = (unsigned char *)&u;
for(size_t i=0; i < sizeof u; ++i) printf("address %p holds %x\n", (void*)&p[i], p[i]);
return 0;
}
Output:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 78
address 0x7ffddf2c263d holds 56
address 0x7ffddf2c263e holds 34
address 0x7ffddf2c263f holds 12
So I can see that I'm on a little endian machine as the LSB (least significant byte, i.e. 78) is stored on the lowest address.
Executing the same program on a big endian machine would (assuming same address) show:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 12
address 0x7ffddf2c263d holds 34
address 0x7ffddf2c263e holds 56
address 0x7ffddf2c263f holds 78
Now it is the MSB (most significant byte, i.e. 12) that is stored at the lowest address.
The important thing to understand is that this only relates to "how multi-byte type are stored in memory". Once the integer is read from memory into a register inside the core, the register will hold the integer in the form 0x12345678 on both little and big endian machines.
There is only a single way to represent an integer in decimal, binary or hexadecimal format. For example, number 43981 is equal to 0xABCD when written as hexadecimal, or 0b1010101111001101 in binary. Any other value (0xCDAB, 0xDCBA or similar) represents a different number.
The way your compiler and cpu choose to store this value internally is irrelevant as far as C standard is concerned; the value could be stored as a 36-bit one's complement if you're particularly unlucky, as long as all operations mandated by the standard have equivalent effects.
You will rarely have to inspect your internal data representation when programming. Practically the only time you care about endianness is when working on a communication protocol, because then the binary format of the data must be precisely defined, but even then your code will not be different regardless of the architecture:
#include <stdint.h>
// input value is big endian, this is defined
// by the communication protocol
uint32_t parse_comm_value(const unsigned char * ptr)
{
    // but bit shifts in C have the same
    // meaning regardless of the endianness
    // of your architecture
    // (unsigned char plus the uint32_t casts avoid sign
    // extension and signed overflow in the shifts)
    uint32_t result = 0;
    result |= (uint32_t)(*ptr++) << 24;
    result |= (uint32_t)(*ptr++) << 16;
    result |= (uint32_t)(*ptr++) << 8;
    result |= (uint32_t)(*ptr++);
    return result;
}
Tl;dr calling a standard function like printf("0x%llx", number); always prints the correct value using the specified format. Inspecting the contents of memory by reading individual bytes gives you the representation of the data on your architecture.
#include <stdio.h>
void set_flag(int* flag_holder, int flag_position){
*flag_holder |= (1 << flag_position);
}
void set_flag(int* flag_holder, int flag_position);
int main(int argc, char* argv[])
{
int flag_holder = 0;
int i;
set_flag(&flag_holder, 3);
set_flag(&flag_holder, 16);
set_flag(&flag_holder, 31);
I am confused about what the following does. I think it's passing the pointer that set_flag() takes, but I'm not sure whether it then sets that value to 3, then 16, then 31?
set_flag(&flag_holder, 3);
set_flag(&flag_holder, 16);
set_flag(&flag_holder, 31);
Let's get rid of the bitwise stuff and just focus on the pointers.
void set_flag(int* flag_holder, int flag_position) {
*flag_holder = flag_position;
}
The purpose of this function is to change a caller's variable. You call it like so:
int flag = 0;
set_flag(&flag, 5); // flag is now 5
& makes a pointer, * turns a pointer back into what it's pointing at.
flag_holder is a pointer to an integer, it's the integer's location in memory, some 32 or 64 bit number. flag_position is a regular integer.
If set_flag tried flag_holder = flag_position, that would say "please point flag_holder at memory location 5". It would only change the function's local copy of the pointer, and if that pointer were then dereferenced, most likely the computer would say "no, you can't do that, that's not your memory" and crash the program.
Instead it has to say "change the number that you're pointing at to equal 5" which is *flag_holder = flag_position.
The caller is passing the address of an integer. The callee is dereferencing the address passed in to assign a new value to that integer. The reason the address is passed instead of the value directly is so the subroutine can modify the integer.
The new value happens to be based on the old value plus flipping individual bits on, depending on what is passed. But this is really a separate question from the subject line of your post.
The set_flag() function is using bitwise operators to manipulate the value of flag_holder at bit-level.
The bitwise OR | operator can be used to set an individual bit. A similar technique used to set flags is:
#define flag1 0x01
#define flag2 0x02
#define flag3 0x04
#define flag4 0x08
#define flag5 0x10
... you get the idea.
We can then use the OR operator to set an individual bit:
char flags = 0;
flags |= flag1;
If you think of the values of the flags in terms of binary - imagine:
flag1 = 00000001
flag2 = 00000010
flag3 = 00000100
flag4 = 00001000
You get the idea! The OR operator will copy any bits that are set in the rvalue over to the lvalue, effectively setting a bit or flag.
So doing flags |= (flag1 | flag3) would result in our flags:
flags 00000101
We can use something like this to specify whether a particular option was specified.
Your example uses a different technique: it always applies a bit-shifted value of 1. Think of it like the example above, where 0x01 == 00000001.
If we shift 00000001 left by one place (00000001 << 1), we get 00000010.
Your set_flag() function accepts a pointer to an int, meaning it can change the value of flag_holder, and the second parameter specifies how many places to shift 1 to the left to select a particular bit.
You should read about bitwise operators; they're bags of fun.
The & gets the address of the variable it is applied to, and * gets the value at the address provided by the pointer.
In the function declaration, int * declares the argument to be a pointer, i.e. the address of an integer. Then, within the function, the value is obtained, modified, and written back to the given address.
When you pass the address of flag_holder in main, it is the value at this address that is modified inside set_flag(...).
The function set_flag takes 2 arguments:
int* - a pointer to an int
an integer value
Then it does some bitwise OR operations on the data.
So in your example you have flag_holder = 0 as the first value, and 3, 16 and 31 as second values. This results in (assuming 32bit int):
flag_holder = 00000000 00000000 00000000 00000000 = 0 in binary
(1 << 3) = 00000000 00000000 00000000 00001000 = 1 left shifted by 3
OR result = 00000000 00000000 00000000 00001000 = result of | binary operation
Another example if you would have another value for flag_holder:
flag_holder = 00000000 00010000 00001000 10000001
(1 << 3) = 00000000 00000000 00000000 00001000
OR result = 00000000 00010000 00001000 10001001
I have a char buffer like this
char *buff = "aaaa0006france";
I want to extract the bytes 4 to 7 and store it in an int.
int i;
memcpy(&i, buff+4, 4);
printf("%d ", i);
But it prints junk values.
What is wrong with this?
The string
0006
does not have the same binary representation as the integer 6. Instead, its bit representation is as four ASCII characters representing the glyph 0, the glyph 0, the glyph 0, then the glyph 6. This has hex representation
0x30303036
If you try blindly reinterpreting these bytes as a number on a little-endian system, you get back 909,127,728 (0x36303030). On a big-endian system, you'd get 808,464,438 (0x30303036).
If you want to convert a substring of your string into a number, you will need to instead look for a function that converts a string of text into a number. You might want to try something like this:
char digits[5];
/* Copy over the digits in question. */
memcpy(digits, buff + 4, 4);
digits[4] = '\0'; /* Make sure it's null-terminated! */
/* Convert the string to a number. */
int i = (int)strtol(digits, NULL, 10);
This uses the strtol function, which converts a text string into a number, to explicitly convert the text to an integer.
Hope this helps!
There are two things to note here:
How the characters are stored
The endianness of the system
Each character (letter, digit or special character) is stored as its ASCII code, one per byte. To memcpy the string (array of characters) "0006" into a 4-byte int variable, we give the address of the string as the source and the address of the int as the destination, as below.
char a[] = "0006";
int b = 0, c = 6;
memcpy(&b, a, 4);
The values of a, b and c are stored as below (before the memcpy).
a 00110110 00110000 00110000 00110000
b 00000000 00000000 00000000 00000000
c 00000000 00000000 00000000 00000110
MSB LSB
This is because the ASCII value of the character 0 is 48 and that of the character 6 is 54. memcpy copies whatever bytes are present in a into b. After the memcpy, the value of b will be as below.
a 00110110 00110000 00110000 00110000
b 00110110 00110000 00110000 00110000
c 00000000 00000000 00000000 00000110
MSB LSB
Next is endianness. Now suppose we store the value 0006 in the character buffer in another way, as a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 6;. If we now do the memcpy, we will get the value 100663296 (0x6000000), not 6, on a little-endian machine. On a big-endian machine you will get the value 6.
a 00000110 00000000 00000000 00000000
b 00000110 00000000 00000000 00000000
c 00000000 00000000 00000000 00000110
MSB LSB
So these are the two problems we need to consider when writing a function that converts numeric characters to an integer value. A simple solution is to use the existing library function atoi.
The code below might help you:
#include <stdio.h>
#include <stdlib.h> /* atoi */
#include <string.h> /* memcpy */
int main()
{
char *buff = "aaaa0006france";
char digits[5];
memcpy(digits, buff + 4, 4);
digits[4] = '\0';
int a = atoi(digits);
printf("int : %d", a);
return 0;
}
I can't understand how a union works.
#include <stdio.h>
#include <stdlib.h>
int main()
{
union {
int a:4;
char b[4];
}abc;
abc.a = 0xF;
printf(" %zu, %d, %d, %d, %d, %d\n", sizeof(abc), abc.a, abc.b[0], abc.b[1], abc.b[2], abc.b[3]);
return 0;
}
In the above program, I declared int a : 4;
so a should take 4 bits.
Now I store a = 0xF; // i.e. a = 1111 (binary form)
So when I access b[0], b[1], b[2] or b[3], why isn't the output 1, 1, 1, 1?
Your union's total size will be at least 4 * sizeof(char).
Assuming the compiler you are using handles this as defined behavior, consider the following:
abc is never fully initialized, so it contains a random assortment of zeros and ones. Big problem. So, do this first: memset(&abc, 0, sizeof(abc));
The union should be the size of its largest member, so you should now have 4 zeroed-out bytes: 00000000 00000000 00000000 00000000
You are only setting 4 bits high, so your union will become something like this:
00000000 00000000 00000000 00001111 or 11110000 00000000 00000000 00000000. I'm not sure how your compiler handles this type of alignment, so this is the best I can do.
You might also consider doing a char-to-bits conversion so you can manually inspect the value of each and every bit in binary format:
Access individual bits in a char c++
Best of luck!
0xF is -1 if you look at it as a 4-bit signed value, so the output is normal. b is not even fully assigned, so its value is undefined. It's a 4-byte entity, but you only assign a 4-bit entity. So everything looks normal to me.
Because every char takes (on most platforms) 1 byte i.e. 8 bits, so all the 4 bits of a fall into a single element of b[].
And beside that, it is compiler-dependent how the bit fields are stored, so it is not defined, into which byte of b[] that maps...
0xF is -1 if you define it to be a 4-bit signed number. Check the two's complement binary representation to understand why.
And you didn't initialize b, so it could be holding any random value.