Why is the output of this C program like this?

The output of the following code comes out as 512 0 2, but I expected it to be 512 0 0. Can somebody please help!
#include <stdio.h>

int main()
{
    union a
    {
        int i;
        char ch[2];
    };
    union a z = { 512 };
    printf("%d %d %d\n", z.i, z.ch[0], z.ch[1]);
    return 0;
}

You have built a union of two bytes. Now you assign 512 decimal (0x0200) to the union.
First byte = 0x00
Second byte = 0x02
The integer i (16 bits here) and your array ch[2] use the same memory!

Let's assume for simplicity that int is 2 bytes. In this case the memory for your union will be 2 bytes. Let's also assume that the union is located at address 0x0000. From the results you are getting, I can tell you're using a little-endian machine: address 0x0000 holds the value 0x00 and address 0x0001 holds the value 0x02.
z.i prints 512 correctly.
z.ch[0] gets the value from address 0x0000, which is 0.
z.ch[1] gets the value from address 0x0001, which is 2.
Big Endian Byte Order: The most significant byte (the "big end") of
the data is placed at the byte with the lowest address. The rest of
the data is placed in order in the next three bytes in memory.
Little Endian Byte Order: The least significant byte (the "little
end") of the data is placed at the byte with the lowest address. The
rest of the data is placed in order in the next three bytes in memory.
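For a quick way to see which convention your own machine uses, here is a minimal sketch (my addition, not from the answers above; it assumes a multi-byte int):
#include <stdio.h>

int main(void)
{
    unsigned int x = 1;
    unsigned char *p = (unsigned char *)&x;  /* inspect the lowest-addressed byte */

    /* On little endian the least significant byte (1) comes first;
       on big endian the first byte is 0. */
    if (p[0] == 1)
        printf("little endian\n");
    else
        printf("big endian\n");
    return 0;
}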

I think you have some confusion between a struct and a union.
A union uses the same memory for all of its members, while a struct has separate memory for each member.
See the following extension of your code (at IDEone):
#include <stdio.h>

int main()
{
    union a
    {
        int i;
        char ch[2];
    };
    union a aa = { 512 };
    printf("Union: %d %d %d\n", aa.i, aa.ch[0], aa.ch[1]);

    struct b
    {
        int i;
        char ch[2];
    };
    struct b bb = { 512 };
    printf("Struct: %d %d %d\n", bb.i, bb.ch[0], bb.ch[1]);
    return 0;
}
Output:
Union: 512 0 2
Struct: 512 0 0

Related

Union data type field behavior

I am having trouble figuring out how this piece of code works.
Mainly, I am confused by how does x.br get the value of 516 after x.str.a and x.str.b get their values of 4 and 2, respectively.
I am new to unions, so maybe there is something I am missing, but shouldn't there only be 1 active field in a union at any given time?
#include <stdio.h>

void f(short num, short *res) {
    if (num) {
        *res = *res * 10 + num % 10;
        f(num / 10, res);
    }
}

typedef union {
    short br;
    struct {
        char a, b;
    } str;
} un;

void main() {
    short res = 0;
    un x;
    x.str.a = 4;
    x.str.b = 2;
    f(x.br, &res);
    x.br = res;
    printf("%d %d %d\n", x.br, x.str.a, x.str.b);
}
I would be very thankful if somebody cleared this up for me, thank you!
To add to @Deepstop's answer, and to correct an important point in your understanding:
shouldn't there only be 1 active field in a union at any given time?
There's no such thing as an active field in unions. All the different fields refer to the same exact piece of memory (except for misaligned data).
You can look at the different fields as different ways to interpret the same data, i.e. you can read your union as two fields of 8 bits or as one field of 16 bits. But both will always "work" at the same time.
OK, short is likely a 16-bit integer, and char a, b are each 8-bit chars.
So you are using the same 16-bit memory location for both.
0000 0010 0000 0100 is the 16 bit representation of 516
0000 0010 is the 8 bit representation of 2
0000 0100 is the 8 bit representation of 4
The CPU you are running this on is 'little-endian' so the low-order byte of a 16 bit integer comes first, which is the 2, and the high order byte, the 4, comes second.
So by writing 2 then 4 into consecutive bytes, and reading them back as a 16 bit integer, you get 516, which is 2 * 256 + 4. If you wrote 3 then 5, you'd get 3 * 256 + 5, which is 783.
The point is that union puts two data structures in exactly the same memory location.
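A minimal sketch of that arithmetic, reusing the union from the question (this assumes a 16-bit short and a little-endian machine, as in the explanation above):
#include <stdio.h>

int main(void)
{
    union {
        short br;
        struct { char a, b; } str;
    } x;

    x.str.a = 4;           /* lowest-addressed byte = least significant */
    x.str.b = 2;
    printf("%d\n", x.br);  /* prints 516 = 2 * 256 + 4 */

    x.str.a = 5;
    x.str.b = 3;
    printf("%d\n", x.br);  /* prints 783 = 3 * 256 + 5 */
    return 0;
}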
how does x.br get the value of 516 after x.str.a and x.str.b get their
values of 4 and 2
Your union definition
typedef union {
    short br;
    struct {
        char a, b;
    } str;
} un;
specifies that un.br shares the same memory address as un.str. This is the whole point of a union. It means that when you modify the value of un.br, you are also modifying the values of un.str.a and un.str.b.
I am new to unions, so maybe there is something I am missing, but
shouldn't there only be 1 active field in a union at any given time?
Not sure what you mean by "only be 1 active field", but the members of a union are all mapped to the same memory address, so any time you write a value to one member you write it to the same address as the other members. If you want the members mapped to different memory addresses, so that writing one member only modifies that specific member, then you should use a struct and not a union.
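To make the difference visible, this sketch prints the member addresses of a union and of an equivalent struct (a demonstration of the point above, not part of the original answer):
#include <stdio.h>

union u  { short br; char ch[2]; };
struct s { short br; char ch[2]; };

int main(void)
{
    union u x;
    struct s y;

    /* The union members overlap: both addresses are the same. */
    printf("union:  br at %p, ch at %p\n", (void *)&x.br, (void *)x.ch);
    /* The struct members get separate storage: the addresses differ. */
    printf("struct: br at %p, ch at %p\n", (void *)&y.br, (void *)y.ch);
    return 0;
}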

Reading successive 3 bytes in a 32-bit word memory

I have a 32-bit word-addressable memory and my data section can start and end at any byte within the memory.
Let's suppose my data section starts at byte 2 (0 being the lowest byte) of word 0x3.
Then I have to read data from bytes 2 and 3 of word 0x3 and byte 0 of word 0x4. After this, I must read bytes 1, 2 and 3 of word 0x4, and so on, stopping only when all 3 bytes are zero OR my data section ends. The next section has an extensible boundary and can move into my data section, so the ending word or byte is not fixed.
Do you have any suggestions for the best possible algorithm to tackle this? I have come up with a way of creating two masks of 24 bits in total which I move across words, but that seems overkill and gives me large code. I'm trying to solve it in the minimum possible C instructions. Looking forward to your advice.
Your statement implies you can only read 32-bit words at a time, no 16- or 8-bit reads, which also implies you don't have to think about alignment.
Just like the processors that do support byte-addressable memory, you can implement it the same way. If you have the address 0x1002 and some 24-bit item, then 0x1002 = 0b0001000000000010: the lower two bits describe the byte within the word and the upper bits the word number/address, 0b00010000000000 and 0b10, so word address 0x400 starting at byte 2 (endianness is of course a factor; assuming little endian). You also know that 4 - 0b10 = 2 means there are two bytes left in this word, and if you need a third you start at offset 0 in the next word.
so you could do something like this untested code:
unsigned int get24 ( unsigned int x )
{
    unsigned int ra;
    unsigned int ret;
    unsigned int wa;
    unsigned int shift;
    unsigned int rb;

    ret = 0;
    wa = x >> 2;            /* word address */
    shift = (x & 3) << 3;   /* bit offset of the starting byte */
    rb = load32(wa);
    for(ra = 0; ra < 3; ra++)
    {
        ret <<= 8;
        ret |= (rb >> shift) & 0xFF;
        shift += 8;
        if(shift >= 32)
        {
            wa++;
            rb = load32(wa);
            shift = 0;
        }
    }
    return ret;
}
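To try the routine out, load32 (left undefined in the answer) can be modeled as a plain lookup into an array of words. This harness is an assumption for testing, not part of the original; in a full program, declare load32 before get24:
#include <stdio.h>

/* Hypothetical word-addressable memory, little endian within each word. */
static unsigned int mem[2] = { 0x44332211, 0x88776655 };

unsigned int load32 ( unsigned int word_address )
{
    return mem[word_address];
}

/* ... get24 as above ... */

int main ( void )
{
    /* Byte address 2 = word 0, offset 2: reads bytes 0x33, 0x44, then
       0x55 from the next word; get24 puts the first byte highest. */
    printf("%06X\n", get24(2)); /* prints 334455 */
    return 0;
}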
You can take the byte-based approach in another answer, but you have to make sure the compiler is aware of your word-based memory limitations: it can't be allowed to do byte reads (well, depends on the architecture), nor unaligned accesses (also depends on the architecture).
You could try table based:
//0x00000000 0x00FFFFFF
//0x00000000 0xFFFFFF00
//0x000000FF 0xFFFF0000
//0x0000FFFF 0xFF000000
unsigned int al[4] = {0, 0, 24, 16};
unsigned int ar[4] = {0, 0, 8, 8};
unsigned int bl[4] = {8, 0, 16, 24};
unsigned int br[4] = {8, 8, 16, 24};
unsigned int wa;
unsigned int off;
unsigned int ra;
unsigned int rb;
unsigned int ret;

wa = byte_address >> 2;
off = byte_address & 3;
rb = load32(wa);
ret = (rb << bl[off]) >> br[off];
//ret = (rb << bl[off]) >> (off << 3);
if(off & 2)
{
    ra = load32(wa + 1);
    //ret |= (ra << al[off]) >> ar[off];
    ret |= (ra << al[off]) >> 8;
}
or jump table based:
wa = byte_address >> 2;
rb = load32(wa);
//if(byte_address & 0xC)
ra = load32(wa + 1);
switch(byte_address & 3)
{
    case 0:
        ret = (rb << 8) >> 8;
        break;
    case 1:
        ret = rb >> 8;
        break;
    case 2:
        ret = (rb << 16) >> 16;
        ret |= (ra << 24) >> 8;
        break;
    case 3:
        ret = (rb << 24) >> 24;
        ret |= (ra << 16) >> 8;
        break;
}
I don't really understand what you mean by "next section", "extensible boundary" and "it can move". Ignoring that, the code below should work for reading the data:
int start_word = 3;
int start_byte = 2;
int end_word = 100;

uint8_t *p = data + start_word * 4 + start_byte;
uint8_t *end = data + end_word * 4;
while (p + 3 <= end) { // 3 more bytes must be available
    uint8_t b0 = *p++;
    uint8_t b1 = *p++;
    uint8_t b2 = *p++;
    if (b0 == 0 && b1 == 0 && b2 == 0)
        break;
    // do something with the 3 bytes
}
The idea is not to care too much about words and work byte-wise.

Memory layout of struct having bitfields

I have this C struct: (representing an IP datagram)
struct ip_dgram
{
unsigned int ver : 4;
unsigned int hlen : 4;
unsigned int stype : 8;
unsigned int tlen : 16;
unsigned int fid : 16;
unsigned int flags : 3;
unsigned int foff : 13;
unsigned int ttl : 8;
unsigned int pcol : 8;
unsigned int chksm : 16;
unsigned int src : 32;
unsigned int des : 32;
unsigned char opt[40];
};
I'm assigning values to it, and then printing its memory layout in 16-bit words like this:
//prints 16 bits at a time
void print_dgram(struct ip_dgram dgram)
{
    unsigned short int *ptr = (unsigned short int *)&dgram;
    int i, j;
    //print only 10 words
    for(i = 0; i < 10; i++)
    {
        for(j = 15; j >= 0; j--)
        {
            if( (*ptr) & (1 << j) ) printf("1");
            else printf("0");
            if(j % 8 == 0) printf(" ");
        }
        ptr++;
        printf("\n");
    }
}
int main()
{
    struct ip_dgram dgram;
    dgram.ver = 4;
    dgram.hlen = 5;
    dgram.stype = 0;
    dgram.tlen = 28;
    dgram.fid = 1;
    dgram.flags = 0;
    dgram.foff = 0;
    dgram.ttl = 4;
    dgram.pcol = 17;
    dgram.chksm = 0;
    dgram.src = (unsigned int)htonl(inet_addr("10.12.14.5"));
    dgram.des = (unsigned int)htonl(inet_addr("12.6.7.9"));
    print_dgram(dgram);
    return 0;
}
I get this output:
00000000 01010100
00000000 00011100
00000000 00000001
00000000 00000000
00010001 00000100
00000000 00000000
00001110 00000101
00001010 00001100
00000111 00001001
00001100 00000110
But I expected the fields to appear in the order of the standard IP datagram layout (the diagram I was comparing against is not reproduced here).
The output is partially correct; somewhere, the bytes and nibbles seem to be interchanged. Is there some endianness issue here? Are bit-fields not good for this purpose? I really don't know. Any help? Thanks in advance!
No, bitfields are not good for this purpose. The layout is compiler-dependent.
It's generally not a good idea to use bitfields for data where you want to control the resulting layout, unless you have (compiler-specific) means, such as #pragmas, to do so.
The best way is probably to implement this without bitfields, i.e. by doing the needed bitwise operations yourself. This is annoying, but way easier than somehow digging up a way to fix this. Also, it's platform-independent.
Define the header as just an array of 16-bit words, and then you can compute the checksum easily enough.
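As a sketch of that approach (the field positions follow the usual IP header diagram and the checksum is the standard one's-complement sum; treat the exact layout here as an assumption, not tested protocol code):
#include <stdint.h>

/* Header as an array of 16-bit words; fields are placed by hand with
   shifts and masks, so the layout no longer depends on the compiler. */
uint16_t hdr[10];

void set_ver_hlen_stype(unsigned ver, unsigned hlen, unsigned stype)
{
    hdr[0] = (uint16_t)((ver << 12) | ((hlen & 0xF) << 8) | (stype & 0xFF));
}

uint16_t checksum(void)
{
    uint32_t sum = 0;
    for (int i = 0; i < 10; i++)
        sum += hdr[i];
    while (sum >> 16)                 /* fold carries back into the low word */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}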
The C11 standard says:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
I'm pretty sure this is undesirable, as it means there might be padding between your fields, and that you can't control the order of your fields. Not just that, but you're at the whim of the implementation in terms of network byte order. Additionally, imagine if an unsigned int is only 16 bits, and you're asking to fit a 32-bit bitfield into it:
The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted.
I suggest using an array of unsigned chars instead of a struct. This way you're guaranteed control over padding, and you can handle network byte order yourself. Start off with the size in bits that you want your structure to be in total. I'll assume you declare this in a constant such as IP_PACKET_BITCOUNT:
typedef unsigned char ip_packet[(IP_PACKET_BITCOUNT / CHAR_BIT) + (IP_PACKET_BITCOUNT % CHAR_BIT > 0)];
Write a function, void set_bits(ip_packet p, size_t bitfield_offset, size_t bitfield_width, unsigned char *value) { ... }, which allows you to set the bits starting at p[bitfield_offset / CHAR_BIT], bit bitfield_offset % CHAR_BIT, to the bits found in value, up to bitfield_width bits in length. This will be the most complicated part of your task.
Then you could define identifiers VER_OFFSET 0 and VER_WIDTH 4, HLEN_OFFSET 4 and HLEN_WIDTH 4, etc. to make modification of the array less painful.
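A hedged sketch of such a set_bits, written one bit at a time for clarity rather than speed (the convention that value is packed most-significant-bit first is my assumption, not the answer's):
#include <limits.h>
#include <stddef.h>

/* Copy 'width' bits from 'value' into 'p', starting at bit 'offset'
   from the start of the packet. */
void set_bits(unsigned char *p, size_t offset, size_t width,
              const unsigned char *value)
{
    for (size_t i = 0; i < width; i++) {
        /* Bit i of the source, counting from the most significant bit. */
        int bit = (value[i / CHAR_BIT] >> (CHAR_BIT - 1 - i % CHAR_BIT)) & 1;

        size_t dst = offset + i;
        size_t byte = dst / CHAR_BIT;
        unsigned shift = (unsigned)(CHAR_BIT - 1 - dst % CHAR_BIT);

        if (bit)
            p[byte] |= (unsigned char)(1u << shift);
        else
            p[byte] &= (unsigned char)~(1u << shift);
    }
}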
Although the question was asked a long time back, there's no answer explaining your result, so I'll give one; hopefully it'll be useful to someone.
I'll illustrate the bug using the first 16 bits of your data structure.
Please note: this explanation is guaranteed to be true only for your particular processor and compiler. If either of these changes, the behaviour may change.
Fields:
unsigned int ver : 4;
unsigned int hlen : 4;
unsigned int stype : 8;
Assigned to:
dgram.ver = 4;
dgram.hlen = 5;
dgram.stype = 0;
The compiler starts assigning bit fields at offset 0. This means the first byte of your data structure is stored in memory as:
Bit offset: 7 4 0
-------------
| 5 | 4 |
-------------
First 16 bits after assignment look like this:
Bit offset: 15 12 8 4 0
-------------------------
| 5 | 4 | 0 | 0 |
-------------------------
Memory Address: 100 101
You are using an unsigned 16-bit pointer to dereference memory address 100. As a result, address 100 is treated as the LSB of a 16-bit number, and 101 as the MSB.
If you print *ptr in hex you'll see this:
*ptr = 0x0054
Your loop is running on this 16 bit value and hence you get:
00000000 0101 0100
-------- ---- ----
0 5 4
Solution:
Change order of elements to
unsigned int hlen : 4;
unsigned int ver : 4;
unsigned int stype : 8;
And use unsigned char * pointer to traverse and print values.
It should work.
Please note, as others have said, this behavior is platform- and compiler-specific. If either of these changes, you need to verify that the memory layout of your data structure is correct.
For Chinese users, you can refer to this blog for more details; it's really good.
In summary, due to endianness there is byte order as well as bit order. Bit order is the order in which the individual bits of one byte are stored in memory, and it follows the same rule as byte order with respect to endianness.
Your picture is designed in network order, which is big endian, so your struct definition is actually for a big-endian machine. Per your output, your PC is little endian, so you need to change the struct field order for it to work.
The way you print the bits is also incorrect, because when you fetch the data byte by byte, the bit order has to be translated from machine order (little endian in your case) to the normal order we humans read. You may change it as follows, per the referenced blog:
void dump_native_bits_storage_layout(unsigned char *p, int bytes_num)
{
    union flag_t {
        unsigned char c;
        struct base_flag_t {
            unsigned int p7:1, p6:1, p5:1, p4:1,
                         p3:1, p2:1, p1:1, p0:1;
        } base;
    } f;

    for (int i = 0; i < bytes_num; i++) {
        f.c = *(p + i);
        printf("%d%d%d%d %d%d%d%d ",
               f.base.p7, f.base.p6, f.base.p5, f.base.p4,
               f.base.p3, f.base.p2, f.base.p1, f.base.p0);
    }
    printf("\n");
}
//prints 8 bits (one byte) at a time
void print_dgram(struct ip_dgram dgram)
{
    unsigned char *ptr = (unsigned char *)&dgram;
    int i;
    //print only the first 10 bytes
    for(i = 0; i < 10; i++)
    {
        dump_native_bits_storage_layout(ptr, 1);
        ptr++;
    }
}
@unwind
A typical use case of bit fields is interpreting/emulating byte code or CPU instructions with a given layout. "Don't use it, because you cannot control it" is an answer for children.
@Bruce
For Intel/GCC I see a packed LITTLE ENDIAN bit layout, i.e. in struct ip_dgram the field ver is represented by bits 0..3 and the field hlen by bits 4..7.
For correctness of operation, it is required to verify the memory layout against your design at runtime.
struct ModelIndicator
{
    int a:4;
    int b:4;
    int c:4;
};

union UModelIndicator
{
    ModelIndicator i;
    int v;
};

// test packed little endian
static bool verifyLayoutModel()
{
    UModelIndicator um;
    um.v = 0;
    um.i.a = 2; // bits 0..3
    um.i.b = 3; // bits 4..7
    um.i.c = 9; // bits 8..11
    return um.v == (9 << 8) + (3 << 4) + 2;
}

int main()
{
    if (!verifyLayoutModel())
    {
        std::cerr << "Invalid memory layout" << std::endl;
        return -1;
    }
    // ...
}
At the latest when the above test fails, you need to consider compiler pragmas, or adjust your structures and verifyLayoutModel() accordingly.
I agree with what unwind said: bit fields are compiler-dependent.
If you need the bits to be in a specific order, pack the data into a character array by hand. Increment the buffer by the size of the element just packed, then pack the next element.
pack( char **buffer )
{
    if ( buffer && *buffer )
    {
        // pack ver: assign the first 4 bits the value 4
        *((UInt4 *) *buffer) = 4;
        *buffer += sizeof(UInt4);
        // pack hlen: assign the next 4 bits the value 5
        *((UInt4 *) *buffer) = 5;
        *buffer += sizeof(UInt4);
        // ... continue packing
    }
}
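Since C cannot address half a byte, UInt4 above is only notation; a workable byte-level variant of the same idea (a sketch, combining the two 4-bit fields before the store) might look like this:
#include <stdint.h>

void pack(uint8_t **buffer)
{
    if (buffer && *buffer)
    {
        /* ver = 4 in the high nibble, hlen = 5 in the low nibble */
        **buffer = (uint8_t)((4 << 4) | 5);
        *buffer += 1;
        /* ... continue packing whole bytes ... */
    }
}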
Compiler-dependent or not, it depends on whether you want to write a very fast program or one that works with different compilers. To write a fast, compact application in C, use a struct with bit fields. If you want a slow, general-purpose program, code it the long way.

Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n = 260;
    int *p = &n;
    char *pp = (char *)p;
    *pp = 0;
    printf("n = %d\n", n);
    system("PAUSE");
    return 0;
}
The output of the program is n = 256.
I think I understand why, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.
The int 260 (= 256 * 1 + 4) will look like this in memory (note that this depends on the endianness of the machine; this is for a little-endian, 32-bit (4-byte) int):
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).
You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.
In C, a pointer references a block of bytes based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0, it only changes the first byte of the integer value; because of how numbers are stored in memory on modern little-endian machines (effectively in reverse order from how you would write them), you overwrite the least significant byte (which was 4) and are left with 256 as the value.
You can see exactly what happens by changing the value:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n = 260;
    int *p = &n;
    char *pp = (char *)p;
    *pp = 20;
    printf("pp = %d\n", (int)*pp);
    printf("n = %d\n", (int)n);
    system("PAUSE");
    return 0;
}
The output values are
20
and
276
So basically the problem is not that you have data loss; it's that the char pointer points only to the first byte of the int and so changes only that byte. The other bytes are not changed, and that's why you get those values. (If you are on an Intel processor, the first byte is the least significant, which is why you change the "smallest" part of the number.)
Your problem is the assignment
*pp = 0;
You're dereferencing pp, which points to n, and changing n.
However, pp is a char pointer, so it doesn't change all of n, which is an int. This causes the binary complications described in the other answers.
In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little-endian, two's-complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bits would still be implementation-dependent).
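A small sketch of that safer variant, using uint32_t (no padding bits or trap representations) and memcpy to touch the representation; the printed result still depends on endianness:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t n = 260;
    unsigned char bytes[4];

    memcpy(bytes, &n, sizeof n);   /* copy out the representation */
    bytes[0] = 0;                  /* overwrite the lowest-addressed byte */
    memcpy(&n, bytes, sizeof n);   /* copy it back */

    printf("n = %u\n", n);         /* 256 on a little-endian machine */
    return 0;
}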
Considering 32-bit systems, 260 will be represented like this:
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000100 (Byte-0)
Now when p is typecast to a char pointer, the label on the pointer changes, but the memory contents don't. It means that earlier p could access 4 bytes, as it was an integer pointer, but now it can only access 1 byte as it is a char pointer. So only the LSB gets changed to zero, not all 4 bytes.
And it becomes:
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000000 (Byte-0)
Hence, the output is 256.

Storing an unsigned integer into a struct

I've written this piece of code where I've assigned an unsigned integer to two different structs. In fact they're the same but one of them has the __attribute__((packed)).
#include <stdio.h>
#include <stdlib.h>

struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));

struct st2 {
    unsigned char opcode[3];
    unsigned int target;
};

void proc(void *addr) {
    struct st1 *varst1 = (struct st1 *)addr;
    struct st2 *varst2 = (struct st2 *)addr;
    printf("opcode in varst1: %c,%c,%c\n", varst1->opcode[0], varst1->opcode[1], varst1->opcode[2]);
    printf("opcode in varst2: %c,%c,%c\n", varst2->opcode[0], varst2->opcode[1], varst2->opcode[2]);
    printf("target in varst1: %d\n", varst1->target);
    printf("target in varst2: %d\n", varst2->target);
}

int main(int argc, char *argv[]) {
    unsigned int *var;
    var = (unsigned int *)malloc(sizeof(unsigned int));
    *var = 0x11334433;
    proc((void *)var);
    return 0;
}
The output is:
opcode in varst1: 3,D,3
opcode in varst2: 3,D,3
target in varst1: 17
target in varst2: 0
Given that I'm storing this number
0x11334433 == 00010001001100110100010000110011
I'd like to know why that is the output I get.
This is to do with data alignment. Most compilers will align data on address boundaries that help with general performance, so in st2, the struct without the packed attribute, there is an extra padding byte between the char[3] and the int to align the int on a four-byte boundary. In the packed version (st1) that padding byte is missing.
byte : 0         1         2         3         4  5  6  7
st1  : opcode[0] opcode[1] opcode[2] |-----------int----------|
st2  : opcode[0] opcode[1] opcode[2] padding |-------int-------|
You allocate a single unsigned int and pass that to the function:
byte  : 0         1         2         3         4  5  6  7
alloc : |---------------int---------------| |---unallocated---|
st1   : opcode[0] opcode[1] opcode[2] |-----------int----------|
st2   : opcode[0] opcode[1] opcode[2] padding |-------int-------|
If you're using a little-endian system then the lowest eight bits (rightmost) are stored at byte 0 (0x33), byte 1 has 0x44, byte 2 has 0x33 and byte 3 has 0x11. In the st2 structure the int is mapped entirely to memory beyond the end of the allocated amount, while in the st1 version the lowest byte of the int is mapped to byte 3, 0x11. So st2 produces 0 and st1 produces 0x11, i.e. 17.
You are lucky that the unallocated memory is zero and that you have no memory range checking going on. Writing to the ints in st1 and st2 in this case could corrupt memory at worst, generate memory guard errors, or do nothing. It is undefined and dependent on the runtime implementation of the memory manager.
In general, avoid void *.
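One way to confirm the two layouts on your own compiler is to print the offsets and sizes (a quick sketch; the exact numbers are implementation-dependent):
#include <stdio.h>
#include <stddef.h>

struct st1 { unsigned char opcode[3]; unsigned int target; } __attribute__((packed));
struct st2 { unsigned char opcode[3]; unsigned int target; };

int main(void)
{
    /* Typically prints offset 3 / size 7 for the packed struct
       and offset 4 / size 8 for the padded one. */
    printf("st1: target at offset %zu, size %zu\n",
           offsetof(struct st1, target), sizeof(struct st1));
    printf("st2: target at offset %zu, size %zu\n",
           offsetof(struct st2, target), sizeof(struct st2));
    return 0;
}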
Your bytes look like this:
00010001 00110011 01000100 00110011
But since your machine is little endian, in memory they're actually laid out like this:
00110011 01000100 00110011 00010001
If your struct is packed, then the first three bytes are associated with opcode and the 4th is target; that's why the packed struct has a target of 17, which is 00010001 in binary.
In the unpacked struct, target sits after a padding byte, in memory that happens to be zero, which is why target in varst2 is zero.
%c interprets the argument as the ASCII code of a character and prints the character:
'3' has ASCII code 0x33
'D' has ASCII code 0x44
and 17 is 0x11.
An int is stored little endian or big endian depending on the processor architecture; you can't depend on the bytes mapping onto your struct's fields in order.
The target in the unpacked version lies past the end of the allocated int, so it stays 0.
