How does the char array in a union work? - c

#include <stdio.h>
static int i = 2;
union U {
int a, b;
char c[3];
}u;
int main(){
u.b = 0x6;
for(;i; u.b++)
u.b = u.a << i--;
printf("%d %o %s", u.a, u.b, u.c);
return 0;
}
This code prints 3 for the character array. I know that this code involves undefined behaviour in several places, especially where I store into one member and read from another, but just as an experiment, can anybody explain why u.c has the value 3?
Note: an explanation in terms of the internal memory layout would help me understand this.

After the for loop the union u contains the bits:
0x00000033
which split into chars are
0x33 0x00 0x00
so
c[0]=0x33
c[1]=0x00
c[2]=0x00
and 0x33 happens to be the ASCII code for the digit '3'.

u.a, u.b and c's bytes all occupy the same memory. Since u.a and u.b have the same type, they are essentially the same variable. The loop
int i=2;
u.b = 6;
for(;i; u.b++)
u.b = u.a << i--;
can be written (just using u.b for clarity) as:
u.b = 6;
u.b = u.b << 2; // u.b is now 24 (one bit shift left is multiplying by 2)
u.b++; // u.b is now 25
u.b = u.b << 1; // u.b is now 50
u.b++; // u.b is now 51.
Now the memory layout of a 32-bit integer on a PC, with the low byte first, is, byte-wise, 51-00-00-00 (in decimal).
Interpreting these bytes as a string, as you told printf to do with the %s conversion, means that 51 is taken as an ASCII value, which denotes the character '3'. Fortunately the next byte is indeed 0, because the integer is small, so the string is terminated. printf will print 3.
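The step-by-step trace above can be checked with a small sketch. The function name replay_loop is made up for illustration; it works on a plain int instead of the union, so it sidesteps the aliasing question and only reproduces the arithmetic.

```c
/* Replays the loop from the question on a plain int.
   Each pass mirrors: u.b = u.a << i--;  followed by the u.b++ in the for header.
   (replay_loop is a hypothetical helper, not part of the original code.) */
int replay_loop(int value, int i)
{
    while (i != 0) {
        value = value << i;  /* shift left by the current i */
        i--;                 /* the post-decrement of i */
        value++;             /* the increment from the for header */
    }
    return value;
}
```

Calling replay_loop(6, 2) goes 6, 24, 25, 50, 51, and 51 is 0x33, the ASCII code of '3'.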

You can simply test it by printing the hex code of a with:
printf("\n%X\n", u.a);
The output will be 33, which is the ASCII code (in hex) of the character '3'.
The for loop does the following:
Start with b=0x06
Then left shift by 2 => b=0x18
Inc b => b= 0x19
Then left shift b by 1 => b=0x32
Inc b => b= 0x33
You defined a union, so a coincides with b.
The first 3 bytes of a and b are also accessible through c.
BTW, the printf output depends on the endianness of the data.
In your case, little-endian, printf prints ASCII 3, because c is:
c[0]=0x33
c[1]=0x00
c[2]=0x00
In the big-endian case (with a 32-bit int) printf prints nothing, because c is:
c[0]=0x00
c[1]=0x00
c[2]=0x00
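To make the two layouts concrete, here is a minimal sketch that builds both byte orders by hand, so it behaves the same on any machine. The helper names store_le32, store_be32 and visible_length are assumptions introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v into out[0..3], least significant byte first (little-endian layout). */
void store_le32(uint32_t v, unsigned char out[4])
{
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(v >> (8 * k));
}

/* Write v into out[0..3], most significant byte first (big-endian layout). */
void store_be32(uint32_t v, unsigned char out[4])
{
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(v >> (8 * (3 - k)));
}

/* Number of characters %s would print if it started at bytes[0]:
   it stops at the first zero byte (or after 4 bytes in this sketch). */
size_t visible_length(const unsigned char bytes[4])
{
    size_t n = 0;
    while (n < 4 && bytes[n] != 0)
        n++;
    return n;
}
```

For the value 0x33, the little-endian layout starts with 0x33 followed by a zero byte, so one character is visible; the big-endian layout starts with a zero byte, so the string is empty.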

Related

c and bit shifting in a char

I am new to C and having a hard time understanding why the code below prints out ffffffff when binary 11111111 should equal hex ff.
int i;
char num[8] = "11111111";
unsigned char result = 0;
for ( i = 0; i < 8; ++i ) {
result |= (num[i] == '1') << (7 - i);
}
printf("%X", bytedata);
You print bytedata which may be uninitialized.
Replace
printf("%X", bytedata);
with
printf("%X", result);
Your code then runs fine.
Although it is legal in C, it is good practice to change
char num[8] = "11111111";
to
char num[9] = "11111111";
because a string literal always has a terminating null character ('\0') appended, and an array of 8 chars has no room for it. The original declaration also would not compile as C++ with g++.
EDIT
To answer your question
If I use char the result is FFFFFFFF but if I use unsigned char the result is FF.
Answer:
Case 1:
In C, sizeof(char) is always 1, and a char has 8 bits on most implementations. If it is unsigned, all 8 bits hold the value, so the maximum is 11111111 in binary, FF in hex (decimal 255). When you pass it to printf("%X", result);, the value is implicitly promoted to int, still 255, which prints as FF.
Case 2: When you use a signed char, the most significant bit is the sign bit, so only 7 bits are left for the magnitude and the range is -128 to 127. Storing FF (255 in decimal) does not fit: the conversion is implementation-defined, and on typical two's-complement machines the result is -1. When that -1 is passed to printf, it is promoted to int by sign extension, which is why %X prints FFFFFFFF.
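The difference between the two cases comes down to how a narrow value is widened. A minimal sketch, with hypothetical helper names (widen_signed, widen_unsigned, low_byte) and assuming a two's-complement machine:

```c
/* Mirrors what happens to a signed char passed to printf("%X", ...):
   it is promoted to int (keeping its value, so -1 stays -1),
   and the bit pattern of that int is what %X shows. */
unsigned widen_signed(signed char c)
{
    return (unsigned)c;
}

/* An unsigned char keeps its value 0..255 when promoted. */
unsigned widen_unsigned(unsigned char c)
{
    return (unsigned)c;
}

/* Converting to unsigned char first keeps only the low 8 bits. */
unsigned low_byte(signed char c)
{
    return (unsigned char)c;
}
```

With a 32-bit unsigned, widen_signed(-1) has all bits set (FFFFFFFF), while widen_unsigned(0xFF) and low_byte(-1) are both FF.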

fetch 32bit instruction from binary file in C

I need to read 32bit instructions from a binary file.
so what i have right now is:
unsigned char buffer[4];
fread(buffer,sizeof(buffer),1,file);
which will put 4 bytes in an array
how should I approach that to connect those 4 bytes together in order to process 32bit instruction later?
Or should I even start in a different way and not use fread?
my weird method right now is to create an array of ints of size 32 and then fill it with bits from the buffer array
The answer depends on how the 32-bit integer is stored in the binary file. (I'll assume that the integer is unsigned, because it really is an id, and use the type uint32_t from <stdint.h>.)
Native byte order: The data was written out as an integer on this machine. Just read the integer with fread:
uint32_t op;
fread(&op, sizeof(op), 1, file);
Rationale: fread reads the raw representation of the integer into memory. The matching fwrite does the reverse: it writes the raw representation to the file. If you don't need to exchange the file between platforms, this is a good method to store and read data.
Little-endian byte order: The data is stored as four bytes, least significant byte first:
uint32_t op = 0u;
op |= (uint32_t)getc(file); // 0x000000AA
op |= (uint32_t)getc(file) << 8; // 0x0000BBaa
op |= (uint32_t)getc(file) << 16; // 0x00CCbbaa
op |= (uint32_t)getc(file) << 24; // 0xDDccbbaa (the cast avoids shifting into the sign bit of an int)
Rationale: getc reads a char and returns an integer between 0 and 255. (The case where the stream runs out and getc returns the negative value EOF is not considered here for brevity, viz laziness.) Build your integer by shifting each byte you read by multiples of 8 and or them with the existing value. The comments sketch how it works. The capital letters are being read, the lower-case letters were already there. Zeros have not yet been assigned.
Big-endian byte order: The data is stored as four bytes, least significant byte last:
uint32_t op = 0u;
op |= (uint32_t)getc(file) << 24; // 0xAA000000 (the cast avoids shifting into the sign bit of an int)
op |= (uint32_t)getc(file) << 16; // 0xaaBB0000
op |= (uint32_t)getc(file) << 8; // 0xaabbCC00
op |= (uint32_t)getc(file); // 0xaabbccDD
Rationale: Pretty much the same as above, only that you shift the bytes in another order.
You can imagine little-endian and big-endian as writing the number one hundred and twenty-three (CXXIII) as either 321 or 123. The bit-shifting is similar to shifting decimal digits when dividing by or multiplying with powers of 10, only that you shift by 8 bits to multiply with 2^8 = 256 here.
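The snippets above leave out the end-of-file case. A possible sketch that handles it for the little-endian layout follows; the function name read_u32_le and its return convention are assumptions made for this example, not an existing API.

```c
#include <stdint.h>
#include <stdio.h>

/* Reads four bytes, least significant byte first, into *out.
   Returns 1 on success and 0 if the stream ends (or fails) before
   four bytes were read; *out is left untouched in that case.
   (read_u32_le is a hypothetical helper.) */
int read_u32_le(FILE *file, uint32_t *out)
{
    uint32_t value = 0;
    for (int k = 0; k < 4; k++) {
        int c = getc(file);              /* 0..255, or EOF */
        if (c == EOF)
            return 0;
        value |= (uint32_t)c << (8 * k); /* cast before shifting */
    }
    *out = value;
    return 1;
}
```

A typical caller would loop with while (read_u32_le(file, &op)) { ... } and process one instruction per iteration. The big-endian variant only changes the shift amount to 8 * (3 - k).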
Add
unsigned int instruction;
memcpy(&instruction,buffer,4);
to your code (memcpy is declared in <string.h>). This will copy the 4 bytes of buffer to a single 32-bit variable, in the machine's native byte order. Hence you will get connected 4 bytes :)
If you know that the int in the file is the same endian as the machine the program's running on, then you can read straight into the int. No need for a char buffer.
unsigned int instruction;
fread(&instruction,sizeof(instruction),1,file);
If you know the endianness of the int in the file, but not the machine the program's running on, then you'll need to add and shift the bytes together.
unsigned char buffer[4];
unsigned int instruction;
fread(buffer,sizeof(buffer),1,file);
//big-endian
instruction = ((unsigned int)buffer[0]<<24) + ((unsigned int)buffer[1]<<16) + ((unsigned int)buffer[2]<<8) + buffer[3];
//little-endian
instruction = ((unsigned int)buffer[3]<<24) + ((unsigned int)buffer[2]<<16) + ((unsigned int)buffer[1]<<8) + buffer[0];
Another way to think of this is that it's a positional number system in base 256. You combine the bytes just like you combine digits in base 10.
257
= 2*100 + 5*10 + 7
= 2*10^2 + 5*10^1 + 7*10^0
So you can also combine them using Horner's rule.
//big-endian
instruction = (((unsigned int)buffer[0]*256 + buffer[1])*256 + buffer[2])*256 + buffer[3];
//little-endian
instruction = (((unsigned int)buffer[3]*256 + buffer[2])*256 + buffer[1])*256 + buffer[0];
@luser droog
There are two bugs in your code.
The size of the variable "instruction" need not be 4 bytes: for example, Turbo C has sizeof(int) equal to 2, and your program obviously fails in that case. But what is much more important and less obvious: your program can also produce a wrong result when the destination variable is wider than 4 bytes! To understand this, consider the following example:
#include <stdio.h>
int main()
{ const unsigned char a[4] = {0x21,0x43,0x65,0x87};
const unsigned char* p = a;
unsigned long x = (((((p[3] << 8) + p[2]) << 8) + p[1]) << 8) + p[0];
printf("%08lX\n", x);
return 0;
}
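This program prints "FFFFFFFF87654321" on amd64, because an unsigned char operand is promoted to SIGNED int before it is used in the shift. The high bit ends up in the sign bit of the int (strictly speaking, the overflowing shift is undefined behaviour) and is sign-extended when the result is converted to unsigned long. So changing the type of the variable "instruction" from "int" to "long" does not solve the problem.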
One way to avoid this is to write something like:
unsigned long instruction;
instruction = 0;
for (int i = 3; i >= 0; i--) {
instruction <<= 8;
instruction += buffer[i];
}
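An alternative sketch that avoids the sign-extension problem is to cast each byte to a fixed-width unsigned type before shifting. The names load_le32 and load_be32 are hypothetical; uint32_t comes from <stdint.h>.

```c
#include <stdint.h>

/* Combine four bytes, least significant byte first.
   Each byte is widened to uint32_t before the shift, so no
   intermediate value is ever a signed int. */
uint32_t load_le32(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

/* Same, most significant byte first. */
uint32_t load_be32(const unsigned char b[4])
{
    return (uint32_t)b[0] << 24
         | (uint32_t)b[1] << 16
         | (uint32_t)b[2] << 8
         | (uint32_t)b[3];
}
```

With the bytes {0x21, 0x43, 0x65, 0x87}, load_le32 gives 0x87654321, and assigning that to an unsigned long keeps the upper bits zero.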

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints out 7 2 3 0 I expected it to print out 1 9 7 1. Can anyone explain why it is printing 7230?:
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a binary number and not a series of single digits separately in decimal
197127 (base 10) = 0x00030207 (base 16) = 0011 0000 0010 0000 0111 (base 2)
Suppose your system uses little endian, 0x00030207 would be stored in memory as 0x07 0x02 0x03 0x00 which is printed out as (7 2 3 0) as expected when you print out each byte
Because with your method you print out the internal representation of the unsigned and not its decimal representation.
Integers or any other data are represented as bytes internally. unsigned char is just another term for "byte" in this context. If you would have represented your integer as decimal inside a string
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the representation of the characters as numbers.
The binary representation of 197127 is 00110000001000000111.
The bytes look like 00000111 (7 in decimal), 00000010 (2) and 0011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value of e, 197127, is not a string representation. It is stored as a 16/32-bit integer (depending on platform). So, in memory, e is allocated, say, 4 bytes on the stack, and holds the value 0x30207 (hex) at that memory location. In binary, it looks like 110000001000000111. Note that on a little-endian machine the bytes are actually stored in reverse order. So, when you point f to &e, you are referencing the first byte of the numeric value. If you want to represent a number as a string, you should have
char *e = "197127";
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi byte integer is least significant, while the last byte is most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while(e > 0) { digits[c++] = e % 10; e /= 10; }
while(c > 0) { printf("%d\n", digits[--c]); }
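The same digit-splitting idea can be wrapped in a function. This is only a sketch; the name decimal_digits and its capacity parameter are assumptions added here so that a value of 0 and a too-small array are both handled.

```c
#include <stddef.h>

/* Stores the decimal digits of v in digits[], most significant first,
   and returns how many were stored. Returns 0 if capacity is too small.
   (decimal_digits is a hypothetical helper.) */
size_t decimal_digits(unsigned v, int digits[], size_t capacity)
{
    size_t n = 0;
    do {                              /* do-while so that 0 yields one digit */
        if (n == capacity)
            return 0;
        digits[n++] = (int)(v % 10);  /* least significant digit first */
        v /= 10;
    } while (v > 0);

    /* Reverse in place so the most significant digit comes first. */
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        int tmp = digits[i];
        digits[i] = digits[j];
        digits[j] = tmp;
    }
    return n;
}
```

For 197127 this fills the array with 1, 9, 7, 1, 2, 7, which is the output the question expected from the byte walk.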
As you know, an int often takes up four bytes. That means 197127 is represented as 00000000 00000011 00000010 00000111. From the result, your machine is little-endian, which means the low byte 00000111 is stored at the lowest address, then 00000010 and 00000011, and finally 00000000. So when you output *f first, you obtain 7. After f++, f points to 00000010 and the output is 2. The rest follows by analogy.
The underlying representation of the number e is in binary and if we convert the value to hex we can see that the value would be(assuming 32 bit unsigned int):
0x00030207
so when you iterate over the contents you are reading byte by byte through the unsigned char *. Each byte contains two 4-bit hex digits, and the byte order (endianness) of the number is little-endian, since the least significant byte (0x07) comes first, so in memory the contents are laid out like so:
0x07020300
  ^ ^ ^ ^- Fourth byte
  | | |-Third byte
  | |-Second byte
  |-First byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
                    ^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To get the result you expect, you should change the declaration and assignment of e to:
unsigned char *e = "197127";
unsigned char *f = e;
Or, convert the integer value to a string (using sprintf()) and have f point to that instead :
char s[1000];
sprintf(s,"%u",e);
unsigned char *f = (unsigned char *) s;
Or, use mathematical operation to get single digit from your integer and print those out.
Or, ...

Assigning address of an int to a char pointer

In the following program, suppose a is stored at address 1000 and an int takes up 4bytes of storage. Now, c will point to the base address ie, 1000 and incrementing it by 3 will make it point to address 1003. Now, printing the character pointed to by c must give me the character corresponding to ascii 65. But it prints nothing!
#include<stdio.h>
#include<stdlib.h>
int main(){
int a = 65;
char *c = &a;
printf("%c\n", *(c+3));
}
What is wrong in my reasoning?
You didn't take endianness into account. On a little-endian system, the 'A' (or if the encoding isn't ASCII compatible, whatever 65 is) will be in the first byte, and the other bytes are 0. Passing a 0 byte to printf("%c\n",_); prints out nothing but the newline.
printing the character pointed to by c must give me the character corresponding to ascii 65
It should and if your machine is little-endian, it will, indeed. However, you're printing the 4th byte of your int using *(c + 3), that's still 0. Perhaps you meant *c instead?
The best would be not aliasing stuff through pointers, however. Integers have the nice property that they can be operated on by bitwise operators.
uint32_t i = 0x12345678;
uint8_t b0 = (i >> 0) & 0xff;
uint8_t b1 = (i >> 8) & 0xff;
uint8_t b2 = (i >> 16) & 0xff;
uint8_t b3 = (i >> 24) & 0xff;
This will give you access to the bytes correctly regardless of the endianness of your architecture.
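To contrast the two views, here is a small sketch with two hypothetical helpers: byte_by_value uses shifts and gives the same answer everywhere, while byte_in_memory copies the object representation and therefore depends on the machine's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* k-th byte of the value, counting from the least significant (k in 0..3).
   Independent of how the machine stores the integer. */
unsigned char byte_by_value(uint32_t v, unsigned k)
{
    return (unsigned char)((v >> (8 * k)) & 0xFFu);
}

/* k-th byte as it sits in memory, counting from the lowest address
   (k in 0..3). The result depends on the machine's endianness. */
unsigned char byte_in_memory(uint32_t v, size_t k)
{
    unsigned char raw[sizeof v];
    memcpy(raw, &v, sizeof v);  /* copying bytes out is always allowed */
    return raw[k];
}
```

For the value 65, byte_by_value(65, 0) is always 65, whereas byte_in_memory(65, 0) is 65 on a little-endian machine and byte_in_memory(65, 3) is 65 on a big-endian one. That is why *(c+3) printed a zero byte in the question.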

C unsigned int array and bit shifts

If i have an array of short unsigned ints.
Would shifting array[k+1] left by 8 bits, put 8 bits into the lower half of array[k+1]?
Or do they simply drop off as they have gone outside of the allocated space for the element?
They drop off. You can't affect the other bits this way. Try it:
#include <stdio.h>
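void print_a (unsigned short * a)
{
int i;
for (i = 0; i < 3; i++)
printf ("%d:%X\n", i, (unsigned) a[i]);
}
int main ()
{
/* unsigned short, as in the question; shifting a negative value left is undefined */
unsigned short a[3] = {1, 0xFFFF, 3};
print_a (a);
a[1] <<= 8;
print_a (a);
return 0;
}
Output (with a 16-bit short) is
0:1
1:FFFF
2:3
0:1
1:FF00
2:3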
They drop off the data type totally, not carrying over to the next array element.
If you want that sort of behavior, you have to code it yourself with something like (left shifting the entire array by eight bits):
#include <stdio.h>
int main(void) {
int i;
unsigned short int a[4] = {0xdead,0x1234,0x5678,0xbeef};
// Output "before" variables.
for (i = 0; i < sizeof(a)/sizeof(*a); i++)
printf ("before %d: 0x%04x\n", i, a[i]);
printf ("\n");
// This left-shifts the array by left-shifting the current
// element and bringing in the top byte of the next element.
// It is in a loop for all but the last element.
// Then it just left-shifts the last element (no more data
// to shift into that one).
for (i = 0; i < sizeof(a)/sizeof(*a)-1; i++)
a[i] = (a[i] << 8) | (a[i+1] >> 8);
a[i] = (a[i] << 8);
// Print the "after" variables.
for (i = 0; i < sizeof(a)/sizeof(*a); i++)
printf ("after %d: 0x%04x\n", i, a[i]);
return 0;
}
This outputs:
before 0: 0xdead
before 1: 0x1234
before 2: 0x5678
before 3: 0xbeef
after 0: 0xad12
after 1: 0x3456
after 2: 0x78be
after 3: 0xef00
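The same idea can be generalized to any shift count below the element width. This is a sketch under the assumption that the elements are exactly 16 bits wide (uint16_t); the function name shift_array_left is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Shifts the whole array left by n bits (0 < n < 16), treating it as one
   long bit string with a[0] as the most significant element.
   Bits shifted out of a[0] are lost; zeros enter at the end of a[len-1].
   (shift_array_left is a hypothetical helper.) */
void shift_array_left(uint16_t *a, size_t len, unsigned n)
{
    if (len == 0 || n == 0 || n >= 16)
        return;  /* nothing to do, or out of range for this sketch */

    for (size_t i = 0; i + 1 < len; i++) {
        /* keep the low bits of a[i], bring in the top n bits of a[i+1] */
        a[i] = (uint16_t)(((uint32_t)a[i] << n)
                        | ((uint32_t)a[i + 1] >> (16 - n)));
    }
    a[len - 1] = (uint16_t)((uint32_t)a[len - 1] << n);
}
```

With n = 8 on {0xdead, 0x1234, 0x5678, 0xbeef} it reproduces the output above; with n = 4 the result is {0xead1, 0x2345, 0x678b, 0xeef0}.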
The way to think about this is that in C (and for most programming languages) the implementation for array[k] << 8 involves loading array[k] into a register, shifting the register, and then storing the register back into array[k]. Thus array[k+1] will remain untouched.
As an example, foo.c:
unsigned short array[5];
int main(void) {
array[3] <<= 8;
}
Will generate the following instructions:
movzwl array+6(%rip), %eax
sall $8, %eax
movw %ax, array+6(%rip)
This loads array[3] into %eax, modifies it, and stores it back.
Shifting an unsigned int left by 8 bits will fill the lower 8 bits with zeros. The top 8 bits will be discarded, it doesn't matter that they are in an array.
Incidentally, whether 8 bits is half of an unsigned int depends on your system, but on 32-bit systems, 8 bits is typically a quarter of an unsigned int.
unsigned int x = 0x12345678;
// x is 0x12345678
x <<= 8;
// x is 0x34567800
Be aware that the C definition of the int data type does not specify how many bits it contains and is system dependent. An int was originally intended to be the "natural" word size of the processor, but this isn't always so and you could find int contains 16, 32, 64 or even some odd number like 24 bits.
The only thing you are guaranteed is an unsigned int can hold all the values between 0 and UINT_MAX inclusive, where UINT_MAX must be at least 65535 - so the int types must contain at least 16 bits to hold the required range of values.
So shifting an array of integers by 8 bits will change each int individually, but be aware that 8 bits is not necessarily half of an element.
