Unexpected output in C code with a union

I don't understand the output in the following C code:
#include <stdio.h>

int main()
{
    union U
    {
        int i;
        char s[3];
    } u;
    u.i = 0x3132;
    printf("%s", u.s);
    return 0;
}
Initial memory is 32 bits and is the binary value of 0x3132 which is
0000 0000 0000 0000 0011 0001 0011 0010.
If the last three bytes of 0x3132 are the value of s (without leading zeroes), then s[0] = 0011, s[1] = 0001, s[2] = 0011.
This gives s = 0011 0001 0011 = 787.
Question: why is the output 21 and not 787?

The value 0x3132 is represented in memory as the bytes 0x32, 0x31, 0x00, 0x00, because the byte order is little-endian.
The printf call prints the string designated by the union member s, byte by byte: first 0x32 and then 0x31, which are the ASCII codes for the characters '2' and '1'. Printing then stops, because the third element is the null character 0x00.
Note that the representation of int is implementation-defined: it may not consist of 4 bytes and may contain padding. In that case the union member s might not hold a null-terminated string, and calling printf with the %s specifier would cause undefined behavior.
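A portable way to double-check this layout, without going through the union at all, is to inspect the object through an unsigned char pointer; the following is just a small sketch of that idea:
#include <stdio.h>

int main(void)
{
    int i = 0x3132;
    const unsigned char *p = (const unsigned char *)&i;

    /* Print each byte of the int in memory order; on a little-endian
       machine with a 4-byte int this typically shows: 32 31 00 00 */
    for (size_t k = 0; k < sizeof i; k++)
        printf("%02x ", p[k]);
    printf("\n");
    return 0;
}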

First, see this code sample:
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

int main()
{
    union {
        int32_t i32;
        uint32_t u32;
        int16_t i16[2];
        uint16_t u16[2];
        int8_t i8[4];
        uint8_t u8[4];
    } u;
    u.u8[3] = 52;
    u.u8[2] = 51;
    u.u8[1] = 50;
    u.u8[0] = 49;
    printf(" %d %d %d %d \n", u.u8[3], u.u8[2], u.u8[1], u.u8[0]); // 52 51 50 49
    printf(" %x %x %x %x \n", u.u8[3], u.u8[2], u.u8[1], u.u8[0]); // 34 33 32 31
    printf(" 0x%x \n", u.i32); // 0x34333231
    return 0;
}
The union here just lets you access the memory of u in six different ways.
you may use u.i32 to read or write as int32_t or
you may use u.u32 to read or write as uint32_t or
you may use u.i16[0] or u.i16[1] to read or write as int16_t or
you may use u.u16[0] or u.u16[1] to read or write as uint16_t or
or like this to write as uint8_t:
u.u8[3] = 52;
u.u8[2] = 51;
u.u8[1] = 50;
u.u8[0] = 49;
and read it back like this as uint8_t:
printf(" %d %d %d %d \n", u.u8[3], u.u8[2], u.u8[1], u.u8[0]);
then output is:
52 51 50 49
and read as int32_t:
printf(" 0x%x \n", u.i32);
then output is:
0x34333231
So, as you can see in this sample code, a union shares one memory location under several names/types.
In your sample code, u.i = 0x3132; writes 0x3132 into the memory of u.i, in the byte order of your system, which here is little-endian. You then call printf("%s", u.s);. u.s is an array of char, which decays to a pointer to its first element, so printf reads u.s[0] and prints it to stdout, then u.s[1], and so on, until it reads a byte that is zero.
That is what your code does; if none of the bytes of the union were zero, memory beyond the union would be read until a zero byte was found or a memory fault occurred.
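As a side note, the same union trick is often used to test byte order at run time; here is a minimal sketch (the member names are only illustrative):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    union {
        uint32_t u32;
        uint8_t u8[4];
    } u = { 0x31323334u };   /* initializes the first member, u32 */

    /* On a little-endian machine the least significant byte (0x34)
       sits at the lowest address, so it shows up in u8[0]. */
    if (u.u8[0] == 0x34)
        printf("little-endian\n");
    else
        printf("big-endian\n");
    return 0;
}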

It means that your machine is little-endian, so the bytes are stored in the opposite order, like this:
32 31 00 00
So: s[0] = 0x32, s[1] = 0x31, s[2] = 0x00.
Even though, in theory, printing a char array with "%s" could be undefined behaviour (if it were not null-terminated), here it works: it prints 0x32 (character '2'), then 0x31 (character '1'), and then stops at 0x00.

If you write your code like this:
#include <stdio.h>

int main( void )
{
    union U
    {
        int i;
        char s[3];
    } u;
    u.i = 0x3132;
    printf("%s", u.s);
    printf( "%8x\n", (unsigned)u.i);
}
Then you would see that u.i holds the value 0x00003132 (assuming a 4-byte int), which is actually stored in memory as the byte sequence 32 31 00 00 due to endianness.
The second printf() prints the value rather than the memory bytes, and "%8x" pads with spaces rather than zeros, so its output is <blank><blank><blank><blank>3132, as you would expect.
The ASCII character '1' is 0x31 and '2' is 0x32, and the first 0x00 byte stops the %s conversion, so the first printf() outputs 21.
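For completeness, the only difference between "%8x" and "%08x" is the pad character (spaces versus zeros); a quick sketch:
#include <stdio.h>

int main(void)
{
    unsigned v = 0x3132;
    printf("[%8x]\n", v);    /* prints [    3132]: padded with spaces */
    printf("[%08x]\n", v);   /* prints [00003132]: padded with zeros  */
    return 0;
}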

Related

What happens when we do an AND operation of a 4-byte number with a 2-byte number

I am trying to write a piece of code to extract every single byte of data out of a 4-byte integer.
After a little bit of searching I found code that achieves this, but I am curious about why it behaves the way it does. Below is the code:
#include <stdio.h>

int getByte(int x, int n);

void main()
{
    int x = 0xAABBCCDD;
    int n;
    for (n=0; n<=3; n++) {
        printf("byte %d of 0x%X is 0x%X\n",n,x,getByte(x,n));
    }
}

// extract byte n from word x
// bytes numbered from 0 (LSByte) to 3 (MSByte)
int getByte(int x, int n)
{
    return (x >> (n << 3)) & 0xFF;
}
Output:
byte 0 of 0xAABBCCDD is 0xDD
byte 1 of 0xAABBCCDD is 0xCC
byte 2 of 0xAABBCCDD is 0xBB
byte 3 of 0xAABBCCDD is 0xAA
The result is self-explanatory, but why didn't the compiler convert 0xFF into 0x000000FF? Since it has to perform the AND operation with a 4-byte number, why does the output show only 1 byte of data and not 4 bytes?
The compiler did convert 0xFF to 0x000000FF. Since the value is an integer constant, 0xFF gets converted internally to a 32-bit value (assuming that your platform has a 32-bit int).
Note that the values you get back, i.e. 0xDD, 0xCC, 0xBB, and 0xAA, are also ints, so they have leading zeros as well, so you actually get 0x000000DD, 0x000000CC, and so on. Leading zeros do not get printed automatically, though, so if you wish to see them, you'd need to change the format string to request leading zeros to be included:
for (n=0; n<=3; n++) {
    printf("byte %d of 0x%08X is 0x%08X\n", n, x, getByte(x,n));
}
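A self-contained sketch of that suggestion (using unsigned types to avoid the implementation-defined behaviour of shifting a negative int, and assuming a 32-bit int):
#include <stdio.h>

/* extract byte n (0 = least significant) from x */
static unsigned getByte(unsigned x, int n)
{
    return (x >> (n * 8)) & 0xFFu;
}

int main(void)
{
    unsigned x = 0xAABBCCDDu;
    int n;
    for (n = 0; n <= 3; n++)
        printf("byte %d of 0x%08X is 0x%08X\n", n, x, getByte(x, n));
    return 0;
}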
The output is a 4-byte number, but printf("%X") doesn't print any leading zero digits.
If you do printf("foo 0x%X bar\n", 0x0000AA) you'll get foo 0xAA bar as output.
0xFF and 0x000000FF are exactly the same thing. By default the formatting drops the leading 0's; if you want to print them in your output, you just need to specify it:
printf("byte %d of 0x%X is 0x%08X\n",n,x,getByte(x,n));
But since you are printing bytes, I'm not quite sure why you would expect the leading 0's.

How does memory get shared in a union

For the following code:
#include <stdio.h>
#include <string.h>

union share
{
    int num;
    char str[3];
} share1;

int main()
{
    strcpy(share1.str, "ab");
    printf("str is %s and num is %d", share1.str, share1.num);
    return 0;
}
I get the output "str is ab and num is 25185".
str is printed as expected, but how do I get 25185?
Unions overlap in memory. That means the bytes representing your int share the same memory location as the chars (bytes) of your string. Changing the characters automatically changes the int, since by definition they are the SAME bytes, just being treated differently: you can access those bytes as chars OR as an int.
a -> 0x61
b -> 0x62
25185 -> 0x6261  (low byte 0x61 = 'a', high byte 0x62 = 'b')
The actual byte encoding of "ab" is 0x61 0x62 0x00.
Of importance is that the instance of the union is at file scope, so its memory is 'pre-set' to all 0x00.
Depending on the architecture (little- or big-endian), that value "ab" will be read as either 0x00006261 or 0x61620000.
Given the small magnitude of the printed number, it is obvious that the integer representation is 0x00006261 (little-endian).
0x00006261 (hex) is 25185 (decimal).
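You can verify the arithmetic with a small sketch (the union mirrors the one in the question; the check assumes a little-endian machine):
#include <stdio.h>
#include <string.h>

union share
{
    int num;
    char str[3];
} share1;   /* file scope, so the whole union starts out zero-filled */

int main(void)
{
    strcpy(share1.str, "ab");
    /* 'a' = 0x61, 'b' = 0x62; little-endian: 0x62 * 256 + 0x61 = 25185 */
    printf("%d == %d\n", share1.num, 'b' * 256 + 'a');
    return 0;
}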

C Printing Hexadecimal

Okay, so I am trying to print the hexadecimal values of a struct. My print function does the following:
int len = sizeof(someStruct);
unsigned char *buffer = (unsigned char*)&someStruct;
int count;
for(count = 0; count < len; count++) {
    fprintf(stderr, "%02x ", buffer[count]);
}
fprintf(stderr, "\n");
Here is the definition of the struct:
struct someStruct {
    unsigned char a;
    short myShort;
} __attribute__((packed)) someStruct;
The length of this struct printed out as expected is (output on console):
sizeof(someStruct): 3 bytes
The issue I am encountering is the following. There is a short which I set to a value:
someStruct.myShort = 0x08;
This short is 2 bytes long. When it is printed to the console, however, it does not show the most significant 0x00. Here is the output I get:
stderr: 00 08
I would like the following output, however (3 bytes long):
stderr: 00 00 08
If I fill the short with 0xFFFF, then I do get the 2-byte output; however, whenever there is a leading 0x00 byte, it is not written to the console.
Any ideas on what I am doing wrong? It is probably something small that I am overlooking.
After you provided more info: your code is OK for me. It prints the output:
00 08 00
The first 00 is from unsigned char a; the following bytes 08 00 are from the short. They are swapped because of how the platform stores data in memory (little-endian).
If you want the bytes of the short in the other order, you could print them from the short directly:
fprintf(stderr, "%02x %02x", (someStruct.myShort >> 8) & 0xFF, someStruct.myShort & 0xFF);
I don't see a problem with your code. However, I get 08 00, which makes sense on my little-endian Intel machine.
The problem is in the printf format:
%02x
%02x means that the result will be printed as a hex value (x), with a minimum length of 2 characters (2), padded with zeros (0).
Try with:
fprintf(stderr, "%04x ", buffer[count]);
The width specifier in the format string (2 in your case) refers to the minimum number of characters in the text output, not the number of bytes to print. Try using "%04x " as your format string instead.
As for the digit grouping (00 08 as opposed to 0008): Plain old printf doesn't support that, but POSIX printf does. Info here: Digit grouping in C's printf
Take care not to shift in a sign bit, should buffer be signed. Use "hh" to print only 1 byte's worth of data ("hh" is available since C99). See What is the purpose of the h and hh modifiers for printf?
fprintf(stderr, "%02hhx %02hhx", buffer[count] >> 8, buffer[count]);
[Edit: the OP's latest edit wants to see 3 bytes] This will print all the fields' contents. Each field is in the endian order of the machine.
size_t len = sizeof(someStruct);
const unsigned char *buffer = (unsigned char*)&someStruct;
size_t count;
for(count = 0; count < len; count++) {
    fprintf(stderr, "%02x ", buffer[count]);
}
fprintf(stderr, "\n");
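Put together as a minimal self-contained program (the struct and field names follow the question; __attribute__((packed)) is a GCC extension), this prints 00 08 00 on a typical little-endian machine:
#include <stdio.h>

struct someStruct {
    unsigned char a;
    short myShort;
} __attribute__((packed)) someStruct;   /* packed: no padding, sizeof == 3 */

int main(void)
{
    someStruct.a = 0x00;
    someStruct.myShort = 0x08;

    const unsigned char *buffer = (const unsigned char *)&someStruct;
    size_t count;
    for (count = 0; count < sizeof someStruct; count++)
        fprintf(stderr, "%02x ", buffer[count]);   /* 00 08 00 on little-endian */
    fprintf(stderr, "\n");
    return 0;
}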

Unsigned char pointing to an unsigned integer

I don't understand why the following code prints out 7 2 3 0; I expected it to print out 1 9 7 1. Can anyone explain why it is printing 7 2 3 0?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a binary number, not as a series of separate decimal digits.
197127 (decimal) = 0x00030207 (hex) = 0000 0000 0000 0011 0000 0010 0000 0111 (binary)
Assuming your system is little-endian, 0x00030207 is stored in memory as the bytes 0x07 0x02 0x03 0x00, which is printed out as 7 2 3 0, as expected, when you print each byte.
Because with your method you print out the internal representation of the unsigned and not its decimal representation.
Integers (and any other data) are represented as bytes internally; unsigned char is just another term for "byte" in this context. If you had represented your integer as decimal digits inside a string,
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the numeric codes of the characters.
The binary representation of 197127 is 0011 0000 0010 0000 0111.
The bytes look like 00000111 (7 decimal), 00000010 (2), and 00000011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value 197127 in e is not a string representation. It is stored as a 16- or 32-bit integer (depending on the platform). So, in memory, e is allocated, say, 4 bytes on the stack, and is represented as 0x30207 (hex) at that memory location. In binary it looks like 110000001000000111. Note that the byte order in memory is actually backwards because of endianness. So, when you point f at &e, you are referencing the first byte of the numeric value. If you want to represent the number as a string, you should instead have:
char *e = "197127";
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi byte integer is least significant, while the last byte is most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
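The same reconstruction can be written as a loop over the bytes; this sketch assumes little-endian storage, i.e. that the byte at the lowest address is the least significant:
#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    const unsigned char *f = (const unsigned char *)&e;
    unsigned int v = 0;
    size_t k;

    /* Reassemble the value from its bytes, least significant byte first */
    for (k = 0; k < sizeof e; k++)
        v |= (unsigned int)f[k] << (8 * k);

    printf("%u\n", v);   /* prints 197127 */
    return 0;
}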
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while(e > 0) { digits[c++] = e % 10; e /= 10; }
while(c > 0) { printf("%u\n", digits[--c]); }
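Wrapped into a complete program, that digit-extraction approach prints 1 9 7 1 2 7, one digit per line (the array size is just a generous bound):
#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    int digits[100];
    int c = 0;

    /* Peel off decimal digits from the right, then print them back in order */
    while (e > 0) { digits[c++] = e % 10; e /= 10; }
    while (c > 0) { printf("%d\n", digits[--c]); }
    return 0;
}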
You know that an int typically occupies four bytes. That means 197127 is represented in memory as 00000000 00000011 00000010 00000111. From the result, your machine is little-endian, which means the low byte 00000111 is stored at the lowest address, then 00000010, then 00000011, and finally 00000000. So when you print *f the first time, you get 7. After f++, f points to 00000010, so the output is 2. The rest follows by analogy.
The underlying representation of the number e is binary, and if we convert the value to hex we can see what it would be (assuming a 32-bit unsigned int):
0x00030207
So when you iterate over the contents, you are reading it byte by byte through the unsigned char *. Each byte contains two 4-bit hex digits, and the byte order of the number is little-endian, since the least significant byte (0x07) comes first. In memory the contents look like:
07 02 03 00
^  ^  ^  ^-- fourth byte
|  |  +----- third byte
|  +-------- second byte
+----------- first byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
                    ^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To have the result you expect, you should change the declaration and assignment of e to:
unsigned char *e = "197127";
unsigned char *f = e;
Or, convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s,"%d",e);
unsigned char *f = s;
Or, use mathematical operations to get single digits out of your integer and print those.
Or, ...
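A sketch of the sprintf() route mentioned above; it yields the digit characters '1' '9' '7' '1' '2' '7' instead of the raw bytes:
#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    char s[16];
    const char *f;

    sprintf(s, "%u", e);     /* s now holds the text "197127" */
    for (f = s; *f != '\0'; f++)
        printf("%c ", *f);   /* prints: 1 9 7 1 2 7 */
    printf("\n");
    return 0;
}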

s(n)printf prints more characters than format specifier specifies

I am encountering a curious issue with sprintf on an embedded system (Libelium Waspmote, similar to Arduino) where sprintf is outputting more characters than given by the format specifier. In this particular instance I am using %02X to output the hexadecimal value of bytes in an array. However on some bytes, instead of writing 2 characters, 4 are being written, with FF being prefixed before the actual byte value. snprintf behaves similarly, except that it respects the buffer size specified and just prints the prefix.
For reference, here is the code snippet printing the array contents:
char *pduChars = (char *) malloc(17*sizeof(char));
pduData.toChar(pduChars);
for (int i = 0; i < 17; i++) {
    char asciiCharsS[5];
    char asciiCharsSN[3];
    int printedS = sprintf(asciiCharsS, "%02X", pduChars[i]);
    int printedSN = snprintf(asciiCharsSN, 3, "%02X", pduChars[i]);
    USB.print(printedS);
    USB.print(" ");
    USB.print(printedSN);
    USB.print(" ");
    USB.print(asciiCharsS);
    USB.print(" ");
    USB.print(asciiCharsSN);
    USB.println(" ");
}
And the output from that snippet (abridged to only the erroneous bytes):
The actual byte sequence should be 0x00 0xFC 0xFF 0xFF 0x48 0xA5 0x33 0x51
sprintf  snprintf  sprintf Buffer  snprintf Buffer
…
2        2         00              00
4        4         FFFC            FF
4        4         FFFF            FF
4        4         FFFF            FF
2        2         48              48
4        4         FFA5            FF
2        2         33              33
2        2         51              51
Am I overlooking something here or might this be a platform-specific issue relating to the implementation of s(n)printf?
I'm guessing your implementation is using signed chars. The format "%X" expects unsigned values. Cast or use unsigned char instead.
/* cast */
int printedS = sprintf(asciiCharsS, "%02X", (unsigned char)pduChars[i]);
int printedSN = snprintf(asciiCharsSN, 3, "%02X", (unsigned char)pduChars[i]);
or
/* use unsigned char */
unsigned char *pduChars = malloc(17); /* cast is, at best, redundant */
/* sizeof (char) is, by definition, 1 */
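A tiny sketch of the effect: when plain char is signed, the byte 0xFC becomes a negative value after promotion, and "%X" then shows it sign-extended (FFFC with a 16-bit int such as on the AVR-based Waspmote, FFFFFFFC with a 32-bit int); casting to unsigned char first gives just FC:
#include <stdio.h>

int main(void)
{
    char c = (char)0xFC;   /* negative if plain char is signed */

    printf("%02X\n", c);                 /* e.g. FFFC or FFFFFFFC: sign-extended */
    printf("%02X\n", (unsigned char)c);  /* FC */
    return 0;
}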
The format specifier modifiers you are using only control padding. If the value needs more characters than the specified width, the whole value is printed anyway.
%02X is for padding; it will not truncate. So if your value is wider than the specified width, all of its digits are printed.
