is the number 1 stored in memory as 00000001 00000000 00000000 00000000?
#include <stdio.h>
int main()
{
unsigned int a[3] = {1, 1, 0x7f7f0501};
int *p = a;
printf("%d %p\n", *p, p);
p = (long long)p + 1;
printf("%d %p\n", *p, p);
char *p3 = a;
int i;
for (i = 0; i < 12; i++, p3++)
{
printf("%x %p\n", *p3, p3);
}
return 0;
}
Why is 16777216 printed in the output:
An integer is stored in memory in different ways on different architectures. The most common ways are called little-endian and big-endian byte ordering.
See Endianness
(long long)p+1
|
v
Your memory: [0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, ...]
You increment p not as a pointer but as a long long number, so it no longer points to the next integer but to the next byte. So you read the bytes 0x00, 0x00, 0x00, 0x01, which translates to 0x01000000 (decimal 16777216) on a little-endian architecture.
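For contrast, ordinary pointer arithmetic scales by the size of the pointed-to type. A minimal sketch (not part of the original code) showing the difference between stepping an int pointer and stepping a byte pointer:
#include <stdio.h>
int main(void)
{
    unsigned int a[2] = {1, 1};
    unsigned int *ip = a;                   /* steps in units of sizeof(unsigned int) */
    unsigned char *cp = (unsigned char *)a; /* steps one byte at a time */
    printf("ip = %p, ip + 1 = %p\n", (void *)ip, (void *)(ip + 1)); /* addresses differ by 4, typically */
    printf("cp = %p, cp + 1 = %p\n", (void *)cp, (void *)(cp + 1)); /* addresses differ by 1 */
    return 0;
}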
Something to play with (assuming int is 32 bits wide):
#include <stdio.h>
#include <stdbool.h>
typedef union byte_rec {
struct bit_rec {
bool b0 : 1;
bool b1 : 1;
bool b2 : 1;
bool b3 : 1;
bool b4 : 1;
bool b5 : 1;
bool b6 : 1;
bool b7 : 1;
} bits;
unsigned char value;
} byte_t;
typedef union int_rec {
struct bytes_rec {
byte_t b0;
byte_t b1;
byte_t b2;
byte_t b3;
} bytes;
int value;
} int_t;
void printByte(byte_t *b)
{
printf(
"%d %d %d %d %d %d %d %d ",
b->bits.b0,
b->bits.b1,
b->bits.b2,
b->bits.b3,
b->bits.b4,
b->bits.b5,
b->bits.b6,
b->bits.b7
);
}
void printInt(int_t *i)
{
printf("%p: ", i);
printByte(&i->bytes.b0);
printByte(&i->bytes.b1);
printByte(&i->bytes.b2);
printByte(&i->bytes.b3);
putchar('\n');
}
int main()
{
int_t i1, i2;
i1.value = 0x00000001;
i2.value = 0x80000000;
printInt(&i1);
printInt(&i2);
return 0;
}
Possible output:
0x7ffea0e30920: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0x7ffea0e30924: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Additional (based on the comment of @chqrlie):
I previously used the unsigned char type, but the C Standard allows only three types for bit fields (four since C99). Additional implementation-defined types may be accepted, and it seems that gcc was fine with unsigned char for a bit field, but I've changed it nevertheless to the allowed type _Bool (since C99).
Noteworthy: the order of bit fields within an allocation unit (on some platforms, bit fields are packed left-to-right, on others right-to-left) is implementation-defined (see the Notes section in the reference).
Reference to bit fields: https://en.cppreference.com/w/c/language/bit_field
p = (long long)p + 1; is bad code (undefined behavior, UB, e.g. a bus fault and a re-booted machine) as it is not specified to work in C. The newly formed address that gets assigned is not certain to satisfy the alignment requirements of int *.
Don't do that.
To look at the bytes of a[]
#include <stdio.h>
#include <stdlib.h>
void dump(size_t sz, const void *ptr) {
const unsigned char *byte_ptr = (const unsigned char *) ptr;
for (size_t i = 0; i < sz; i++) {
printf("%p %02X\n", (void*) byte_ptr, *byte_ptr);
byte_ptr++;
}
}
int main(void) {
unsigned int a[3] = {1, 1, 0x7f7f0501u};
dump(sizeof a, a);
}
As this is a wiki, feel free to edit.
There are multiple instances of undefined behavior in your code:
in printf("%d %p\n", *p, p) you should cast p as (void *)p to ensure printf receives a void * as it expects. This is unlikely to pose a problem on most current targets but some ancien systems had different representations for int * and void *, such as early Cray systems.
in p = (long long)p + 1, you have implementation-defined behavior converting a pointer to an integer and implicitly converting the integral result of the addition back to a pointer. More importantly, this may create a pointer with incorrect alignment for accessing int in memory, resulting in undefined behavior when you dereference p. This would cause a bus error on many systems, e.g. most RISC architectures, but by chance not on Intel processors. It would be safer to compute the pointer as p = (void *)((intptr_t)p + 1); or p = (void *)((char *)p + 1);, albeit this would still have undefined behavior because of the alignment issue.
is the number 1 stored in memory as 00000001 00000000 00000000 00000000?
Yes, your system seems to use little endian representation for int types. The least significant 8 bits are stored in the byte at the address of a, then the next least significant 8 bits, and so on. As can be seen in the output, 1 is stored as 01 00 00 00 and 0x7f7f0501 stored as 01 05 7f 7f.
Why is 16777216 printed in the output?
The second instance of printf("%d %p\n", *p, p) has undefined behavior. On your system, p points to the second byte of the array a and *p reads 4 bytes from this address, namely 00 00 00 01 (the last 3 bytes of the first 1 and the first byte of the next array element, also 1), which is the representation of the int value 16777216.
To dump the contents of the array as bytes, you should access it using a char * as you do in the last loop. Be aware that char may be signed on some systems, causing for example printf("%x\n", *p3); to output ffffff80 if p3 points to the byte with hex value 80. Using unsigned char * is recommended for consistent and portable behavior.
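If you really do need to read an int-sized value that starts at an arbitrary byte offset, a safer pattern than casting the pointer is to memcpy the bytes into a properly aligned object. A rough sketch (the offset of 1 mirrors the question; the rest is illustrative):
#include <stdio.h>
#include <string.h>
int main(void)
{
    unsigned int a[3] = {1, 1, 0x7f7f0501};
    unsigned int value;
    /* memcpy has no alignment requirement on its source, so this avoids
       the undefined behavior of dereferencing a misaligned int pointer. */
    memcpy(&value, (const unsigned char *)a + 1, sizeof value);
    printf("%u\n", value); /* 16777216 on a little-endian machine */
    return 0;
}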
Related
I noticed this when I was writing code.
When I XOR the elements of character arrays, why do some results display as 0/1 and others as ASCII codes? How do I get them all to behave like the numbers 0 and 1?
In the function XOR, I want to XOR the elements of two arrays and store the result in another array.
In main, I do some experiments.
And by the way, besides printing the results, I also want to do binary 0/1 operations, such as encryption and decryption.
Here is a piece of C code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int XOR(char *u, char *w, char *v)
{
for(int i = 0; i < 16; i++)
{
u[i] = w[i] ^ v[i];
}
return 0;
}
int PrintList(char *list, int n)
{
for(int i = 0; i < n; i++)
{
printf("%d", list[i]);
}
return 0;
}
int main()
{
char u[17] = "";
char w[17] = "0001001000110100";
char v[17] = "0100001100100001";
XOR(u, w, v);
PrintList(u, 16);
printf("\n");
char w2[17] = "1000110011101111";
XOR(u, w2, v);
PrintList(u, 16);
printf("\n");
char v2[17] = "1111111011001000";
XOR(u, w2, v2);
PrintList(u, 16);
printf("\n");
char x[17] = "0101101001011010";
XOR(u, x, u);
PrintList(u, 16);
printf("\n");
memcpy(w, u, 16);
XOR(u, w, v);
PrintList(u, 16);
printf("\n");
return 0;
}
The result
0101000100010101
1100111111001110
0111001000100111
48484948494848484849494949494849
0110101101011100
Process returned 0 (0x0) execution time : 0.152 s
Press any key to continue.
Well, I changed my declarations from char to unsigned char; maybe because of printf("%d", list[i]); the printed results did not change. Changing to printf("%c", list[i]); prints:
0010100001111101
Process returned 0 (0x0) execution time : 0.041 s
Press any key to continue.
Character '0' is 00110000 in binary. '1' is 00110001.
'0' ^ '0' = 00000000 (0)
'0' ^ '1' = 00000001 (1)
'1' ^ '1' = 00000000 (0)
But then you reuse the u array.
'0' ^ 0 = 00110000 (48)
'0' ^ 1 = 00110001 (49)
'1' ^ 0 = 00110001 (49)
'1' ^ 1 = 00110000 (48)
These are strings so you initially have the ASCII codes 48 (0011 0000) and 49 (0011 0001). The ^ operator is bitwise XOR so the result of two operands with the values 48 and 49 can either be 0 or 1. When you print that result as integer, you get 0 or 1 as expected.
If you later use the result of that operation though, you no longer have an array of ASCII codes, but an array of integers with the value 0 or 1. If you XOR that one with an array that is still an ASCII code array, for example 0011 0000 ^ 0, you will get the result 0011 0000, not 0. And so printf gives you 48 etc.
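If the goal is to work with the numeric values 0 and 1 throughout (for example for encryption), one option is to convert the ASCII digits to numbers with - '0' before XORing and convert back with + '0' only when printing. A sketch of that approach, with illustrative helper names that are not from the original code:
#include <stdio.h>
/* '0'/'1' characters -> numeric 0/1 */
void to_bits(unsigned char *dst, const char *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (unsigned char)(src[i] - '0');
}
/* XOR of numeric bit arrays; the result stays 0 or 1 */
void xor_bits(unsigned char *u, const unsigned char *w, const unsigned char *v, int n)
{
    for (int i = 0; i < n; i++)
        u[i] = w[i] ^ v[i];
}
int main(void)
{
    unsigned char w[16], v[16], u[16];
    to_bits(w, "0001001000110100", 16);
    to_bits(v, "0100001100100001", 16);
    xor_bits(u, w, v, 16);
    for (int i = 0; i < 16; i++)
        printf("%d", u[i]); /* or putchar(u[i] + '0') for printable characters */
    printf("\n");
    return 0;
}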
I know that to get the number of bytes used by a variable type, you use sizeof(int) for instance. How do you get the value of the individual bytes used when you store a number with that variable type? (i.e. int x = 125.)
You have to know the number of bits (often 8) in each "byte". Then you can extract each byte in turn by ANDing the int with the appropriate mask. Imagine that an int is 32 bits, then to get 4 bytes out of the_int:
int a = (the_int >> 24) & 0xff; // high-order (leftmost) byte: bits 24-31
int b = (the_int >> 16) & 0xff; // next byte, counting from left: bits 16-23
int c = (the_int >> 8) & 0xff; // next byte, bits 8-15
int d = the_int & 0xff; // low-order byte: bits 0-7
And there you have it: each byte is in the low-order 8 bits of a, b, c, and d.
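The same idea generalizes to a loop over sizeof(int) bytes; a small sketch (using an unsigned copy of the value to avoid sign-extension surprises when shifting):
#include <stdio.h>
int main(void)
{
    int the_int = 125;
    unsigned int u = (unsigned int)the_int; /* shift an unsigned copy */
    /* byte 0 is the low-order byte of the value, independent of memory layout */
    for (size_t i = 0; i < sizeof the_int; i++)
        printf("byte %zu = %u\n", i, (u >> (8 * i)) & 0xffu);
    return 0;
}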
You can get the bytes by using some pointer arithmetic:
int x = 12578329; // 0xBFEE19
for (size_t i = 0; i < sizeof(x); ++i) {
// Convert to unsigned char* because a char is 1 byte in size.
// That is guaranteed by the standard.
// Note that it is NOT required to be 8 bits in size.
unsigned char byte = *((unsigned char *)&x + i);
printf("Byte %d = %u\n", i, (unsigned)byte);
}
On my machine (Intel x86-64), the output is:
Byte 0 = 25 // 0x19
Byte 1 = 238 // 0xEE
Byte 2 = 191 // 0xBF
Byte 3 = 0 // 0x00
You could make use of a union, but keep in mind that the byte ordering is processor dependent; this is called endianness: http://en.wikipedia.org/wiki/Endianness
#include <stdio.h>
#include <stdint.h>
union my_int {
int val;
uint8_t bytes[sizeof(int)];
};
int main(int argc, char** argv) {
union my_int mi;
int idx;
mi.val = 128;
for (idx = 0; idx < sizeof(int); idx++)
printf("byte %d = %hhu\n", idx, mi.bytes[idx]);
return 0;
}
If you want to get that information, say for:
int value = -278;
(I selected that value because it isn't very interesting for 125 - the least significant byte is 125 and the other bytes are all 0!)
You first need a pointer to that value:
int* pointer = &value;
You can now cast that to a char pointer, which addresses a single byte, and get the individual bytes by indexing.
for (int i = 0; i < sizeof(value); i++) {
char thisbyte = *( ((char*) pointer) + i );
// do whatever processing you want.
}
Note that the order of bytes for ints and other data types depends on your system - look up 'big-endian' vs 'little-endian'.
This should work:
int x = 125;
unsigned char *bytes = (unsigned char *) (&x);
unsigned char byte0 = bytes[0];
unsigned char byte1 = bytes[1];
...
unsigned char byteN = bytes[sizeof(int) - 1];
But be aware that the byte order of integers is platform dependent.
I'm reading in HEX values from a file into an array.
The part of the buffer I'm using contains 4 bytes in hex -> CE EE 00 00
unsigned int fileLocationOffset = 64;
unsigned char fileSize[4]; //This is actually in a struct.
//Putting here for purposes of this question
unsigned char buff[sizeOfRoot];
fseek(fp, startOfRoot, SEEK_SET); //Seek to point in file fp
fread(buff, 1, sizeOfRoot, fp); //Save contents to a buffer
//Read in 4 Bytes backwards to put as Big-Endian
for(int z = 31; z > 27; z--){
fileSize[31 - z] = buff[fileLocationOffset + z];
}
//TEST: Print Values at each element to see if correct
for(int z = 0; z < 4; z++){
printf("%X ", fileSize[z]);
}
// Output: 0 0 EE CE <- Correct
So, I know that my fileSize array contains the correct values, but now I need to convert 0x00EECE to a decimal.
Could somebody please advise how I should go about doing this?
Ok, so you have the bytes of an int value in big-endian order and want to build the value. You could simply accumulate the bytes into a uint32_t variable:
#include <stdio.h>
#include <stdint.h>
uint32_t bytes_to_int(unsigned char *bytes) {
uint32_t val = 0;
for(int i=0; i<4; i++) {
val <<= 8; //shift the previously accumulated bytes 8 positions to the left
val |= *bytes++;
}
return val;
}
With it, the following test program:
int main() {
unsigned char foo[] = { 0, 0, 0xee, 0xce };
unsigned val = bytes_to_int(foo);
printf("%d - 0x%x\n", val, val);
return 0;
}
outputs as expected:
61134 - 0xeece
What do you do with the data?
Concepts like hex and decimal only apply when you print the data.
What is the output of
printf("%d", *(int *)filesize);
Your previous comment suggests you are on a little endian machine.
The value in your file is also little endian.
Why do you change your endianness?
See following test program:
// big endian
unsigned char fileSize[4] = { 0x0, 0x0, 0xEE, 0xCE };
// little endian
unsigned char fileSize2[4] = { 0xCE, 0xEE, 0x00, 0x00 };
int main(void)
{
int i;
// Output: 0 0 EE CE <- Correct
for (i = 0; i < sizeof fileSize; i++)
printf("%02x ", fileSize[i]);
printf("\n");
for (i = 0; i < sizeof fileSize2; i++)
printf("%02x ", fileSize2[i]);
printf("\n");
printf("%u\n", *(int *)fileSize);
printf("%u\n", *(int *)fileSize2);
}
Output (on pc):
00 00 ee ce
ce ee 00 00
3471704064
61134
You could extract the data as a single unsigned 32-bit integer and use that almost directly (depending on endianness issues of course).
Perhaps something like this:
uint32_t fileSize;
memcpy(&fileSize, &buff[fileLocationOffset + 28], 4);
Then for the endianness issue, if you're on Linux you could use be32toh (see e.g. this endian manual page) to convert big-endian to host encoding (and it does nothing if your host system is big-endian):
fileSize = be32toh(fileSize);
The closest function to this in the Windows API is ntohl, which could be used in a similar way.
It's not hard to implement byte-swapping functions, or even macros, for this:
inline uint16_t byteswap16(uint16_t value)
{
return (value & 0xff) << 8 | (value >> 8);
}
inline uint32_t byteswap32(uint32_t value)
{
return (uint32_t)byteswap16(value & 0xffff) << 16 | byteswap16(value >> 16);
}
...
fileSize = byteswap32(fileSize);
...
The answer given by Serge Ballesta is correct; however, there is another dirty but short trick to do this.
#include <stdio.h>
int main() {
    char data[] = {0xce, 0xee, 0, 0};
    int *ptr;
    ptr = (int *)data; // dirty: reinterpret the first 4 bytes as an int
    printf("%d\n", *ptr);
    //if you want to store this result
    int x = *ptr;
    printf("%d\n", x);
    return 0;
}
This way you won't even have to reverse your bytes.
Output:
61134
61134
Here the machine's native little-endian byte order takes care of the endianness for you.
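A slightly cleaner variant of the same trick, sketched below, is to memcpy the bytes into an actual integer object; this sidesteps the aliasing and alignment concerns while still relying on the host being little-endian:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main(void)
{
    unsigned char data[] = {0xce, 0xee, 0, 0};
    uint32_t x;
    memcpy(&x, data, sizeof x); /* reinterpret the bytes without an aliasing violation */
    printf("%u\n", x);          /* 61134 on a little-endian host */
    return 0;
}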
I have a union with an int and a char like so:
union {
int i;
char c[4];
} num;
When I set the int equal to 1, and print each char, I get this result:
1 0 0 0
...leading me to conclude that my machine is little endian.
However, when I bit-shift left by 24, I get the following:
0 0 0 1
Swapping the endianness through a custom function (by swapping the left-most byte with the right, and same for the middle two), I end up with:
0 0 0 1
Left shifting this by 24 results in:
0 0 0 0
This leads me to conclude that the char[4] in my union is represented from right to left, in which case the endianness is actually the reverse of what's represented. But from my understanding, char arrays are generally interpreted from left to right, regardless of platforms.
Are the char bytes in my union reversed?
Full code here:
#include <stdio.h>
#include <stdlib.h>
void endian_switch32(int *n)
{
int ret[4];
ret[0] = *n >> 24;
ret[1] = (*n >> 8) & (255 << 8);
ret[2] = (*n << 8) & (255 << 16);
ret[3] = *n << 24;
*n = ret[0] | ret[1] | ret[2] | ret[3];
}
int main (void) {
union {
int i;
char c[4];
} num;
num.i = 1;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
num.i <<= 24;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
num.i = 1;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
endian_switch32(&num.i);
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
num.i <<= 24;
printf("%d %d %d %d\n", num.c[0], num.c[1], num.c[2], num.c[3]);
}
The result:
1 0 0 0
0 0 0 1
1 0 0 0
0 0 0 1
0 0 0 0
The point is that you are printing the bytes in memory order, which on a little-endian machine is least significant byte first, so 0x01020304 would print as 4 3 2 1; that is what leads to your confusion. Endianness does not affect how arrays are stored, i.e. nobody "reverse stores" an array.
When you shift 0x01000000 left by 24, you get zero. That's fine:
00000001 00000000 00000000 00000000
->
(00000001 00000000 00000000) 00000000 00000000 00000000 00000000
->
00000000 00000000 00000000 00000000
which is exactly zero, because the only set bit is shifted out of the top.
When you shift 1 left by 24, you get 0x01000000, whose bytes are stored as 00 00 00 01 on a little-endian machine, hence the output 0 0 0 1. The conclusion (from the output of printing the char[4]) is that your platform is little-endian.
Left and right shifts are based on the value of the int, not on how its bytes are laid out in memory. No matter how the bytes are stored, a 32-bit int with the value 1 is logically considered to be 0x00000001, or binary
00000000 00000000 00000000 00000001
Regardless of your endianness, the bit-shifting results work on this representation, so bit-shifting isn't a good way to detect endianness. Your machine is probably little-endian (both because of these results and from base rate, given that most computers are little-endian).
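A more reliable check, sketched here, is to inspect the first byte of a known value through an unsigned char pointer: on a little-endian machine the least significant byte is stored first.
#include <stdio.h>
int main(void)
{
    unsigned int one = 1;
    unsigned char first = *(unsigned char *)&one; /* examining bytes via unsigned char is well defined */
    printf("%s-endian\n", first == 1 ? "little" : "big");
    return 0;
}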
If I am given a char array of size 8, where I know that the first 3 bytes are the id, the next byte is the message, and the last 3 bytes are the value, how could I use bit manipulation to extract the message?
Example: a char array contains 9990111 (one digit per position), where 999 is the id, 0 is the message, and 111 is the value.
Any tips? Thanks!
Given:
the array contains {'9','9','9','0','1','1','1'}
Then you can convert with sscanf():
char buffer[8] = { '9', '9', '9', '0', '1', '1', '1', '\0' };
//char buffer[] = "9990111"; // More conventional but equivalent notation
int id;
int message;
int value;
if (sscanf(buffer, "%3d%1d%3d", &id, &message, &value) != 3)
…conversion failed…inexplicably in this context…
assert(id == 999);
assert(message == 0);
assert(value == 111);
But there's no bit manipulation needed there.
Well, if you want bit manipulation, no matter what, here it goes:
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
int main(void) {
char arr[8] = "9997111";
int msg = 0;
msg = ((ntohl(*(uint32_t *) arr)) & 0xff) - 48;
printf("%d\n", msg);
return 0;
}
Output:
7
Just remember one thing... this does not comply with strict aliasing rules. But you can use some memcpy() stuff to solve it.
Edit #1 (parsing it all, complying with strict aliasing rules, and showing that this does not make much sense):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>
int main(void) {
char arr[8] = "9997111";
uint32_t a[2];
unsigned int id = 0, msg = 0, val = 0;
memcpy(a, arr, 4);
memcpy(&a[1], arr + 4, 4);
a[0] = ntohl(a[0]);
a[1] = ntohl(a[1]);
id = ((((a[0] & 0xff000000) >> 24) - 48) * 100) + ((((a[0] & 0xff0000) >> 16)- 48) * 10) + (((a[0] & 0xff00) >> 8)- 48);
msg = (a[0] & 0xff) - 48;
val = ((((a[1] & 0xff000000) >> 24) - 48) * 100) + ((((a[1] & 0xff0000) >> 16)- 48) * 10) + (((a[1] & 0xff00) >> 8)- 48);
printf("%d\n", id);
printf("%d\n", msg);
printf("%d\n", val);
return 0;
}
Output:
999
7
111
The usual way would be to define a structure with members which are bit fields corresponding to the segmented information in your array. (Oh, re-reading your question: is the array filled with { '9', '9', ... }? Then you'd just sscanf the values with the proper offset into the array, as sketched below.)
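For completeness, "sscanf with the proper offset" could look roughly like this (a sketch assuming the 3/1/3 digit layout from the question):
#include <stdio.h>
int main(void)
{
    char buffer[8] = "9990111";
    int id, message, value;
    /* parse each field by starting sscanf at the right offset */
    sscanf(buffer, "%3d", &id);          /* first 3 digits */
    sscanf(buffer + 3, "%1d", &message); /* next digit */
    sscanf(buffer + 4, "%3d", &value);   /* last 3 digits */
    printf("%d %d %d\n", id, message, value);
    return 0;
}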
You can use memcpy to extract the values. Here is an example:
char *info = malloc(4);   // 3 id characters + '\0'
char *info2 = malloc(2);  // 1 code character + '\0'
char *info3 = malloc(4);  // 3 value characters + '\0'
memcpy(info, msgTest, 3);
info[3] = '\0';
memcpy(info2, msgTest + 3, 1);
info2[1] = '\0';
memcpy(info3, msgTest + 4, 3);
info3[3] = '\0';
printf("%s\n", msgTest);
printf("ID is %s\n", info);
printf("Code is %s\n", info2);
printf("Val is %s\n", info3);
Let's say char msgTest[] = "0098457";
The print statements will go as follows:
ID is 009
Code is 8
Val is 457
Hope this helps, Good luck!
Here is an example in which I don't use malloc or memcpy, which suits embedded devices where the stack is limited. Note that there is no need to pack the struct because it covers only 1 byte. This is a C11 implementation. If you have, for example, 4 bytes to analyze, create another struct with 4 charbits members and assign the address to that new struct type instead. This is consistent with design-pattern concepts for embedded development.
#include <stdio.h>
// start by creating a struct for the bits
typedef struct {
unsigned int bit0:1; //this is LSB
unsigned int bit1:1; //bit 1
unsigned int bit2:1;
unsigned int bit3:1;
unsigned int bit4:1;
unsigned int bit5:1;
unsigned int bit6:1;
unsigned int bit7:1;
}charbits;
int main()
{
// now assume we have a char to be converted into its bits
char a = 'a'; //ASCII of 'a' is 97
charbits *x; //this is the character bits to be converted to
// first convert the char a to void pointer
void* p; //this is a void pointer
p=&a; // put the address of a into p
//now convert the void pointer to the struct pointer
x=(charbits *) p;
// now print the contents of the struct
printf("b0 %d b1 %d b2 %d b3 %d b4 %d b5 %d b6 %d b7 %d", x->bit0,x->bit1, x->bit2,x->bit3, x->bit4, x->bit5, x->bit6, x->bit7, x->bit8);
// 97 has bits like this 01100001
//b0 1 b1 0 b2 0 b3 0 b4 0 b5 1 b6 1 b7 0
// now we see that bit 0 is the LSB which is the first one in the struct
return 0;
}
// thank you and i hope this helps