Output Explanation of this program in C?

I have this program in C:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 300;
    char *ptr = (char *)&i;   /* cast needed: &i is an int * */
    *++ptr = 2;
    printf("%d", i);
    return 0;
}
The output is 556 on little endian.
I tried to understand the output; here is my explanation.
My question is: will the answer remain the same on a big-endian machine?
i = 300;
=> i = 100101100 //in binary in word format => B B Hb 0001 00101100 where B = Byte and Hb = Half Byte
(A) => in memory (assuming it is little endian):
0x12345678 - 1100 - 0010 (is this correct for little endian?)
0x12345679 - 0001 - 0000
0x1234567a - 0000 - 0000
0x1234567b - 0000 - 0000
0x1234567c - location of the next integer (i.e. where ptr + 1 would point if ptr were an int pointer; since it would be of type int *, ++ptr would advance by 4 bytes, the size of int)
when
(B) we do char *ptr = &i;
ptr is of type char *, so ++ptr advances it by 1 byte (the size of char),
so on doing ++ptr it will point to location 0x12345679 (which has 0001 - 0000)
now we are doing
*++ptr = 2
=> 0x12345679 will be overwritten by 2 => 0x12345679 will have 0010 - 0000 instead of 0001 - 0000
so the new memory content will look like this :
(C)
0x12345678 - 1100 - 0010
0x12345679 - 0010 - 0000
0x1234567a - 0000 - 0000
0x1234567b - 0000 - 0000
which is equivalent to => B B Hb 0010 00101100 where B = Byte and Hb = Half Byte
Is my reasoning correct? Is there any other, shorter method for this?
Rgds,
Softy

In a little-endian 32-bit system, the int 300 (0x012c) is typically(*) stored as 4 sequential bytes, lowest first: 2C 01 00 00. When you increment the char pointer that was formerly the int pointer &i, you're pointing at the second byte of that sequence, and setting it to 2 makes the sequence 2C 02 00 00 -- which, when turned back into an int, is 0x22c or 556.
(As for your understanding of the bit sequence...it seems a bit off. Endianness affects byte order in memory, as the byte is the smallest addressable unit. The bits within the byte don't get reversed; the low-order byte will be 2C (00101100) whether the system is little-endian or big-endian. (Even if the system did reverse the bits of a byte, it'd reverse them again to present them to you as a number, so you wouldn't notice a difference.) The big difference is where that byte appears in the sequence. The only places where bit order matters, is in hardware and drivers and such where you can receive less than a byte at a time.)
In a big-endian system, the int is typically(*) represented by the byte sequence 00 00 01 2C (differing from the little-endian representation solely in the byte order -- highest byte comes first). You're still modifying the second byte of the sequence, though...making 00 02 01 2C, which as an int is 0x02012c or 131372.
(*) Lots of things come into play here, including two's complement (which almost all systems use these days...but C doesn't require it), the value of sizeof(int), alignment/padding, and whether the system is truly big- or little-endian or a half-assed implementation of it. This is a big part of why mucking around with the bytes of a bigger type so often leads to undefined or implementation-specific behavior.
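If you want to check this on your own machine, here is a minimal sketch (not from the original post; it assumes a 4-byte int and uses memcpy to sidestep the pointer-aliasing question) that dumps the bytes of i and then repeats the experiment:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int i = 300;                        /* 0x0000012c with a 4-byte int */
    unsigned char bytes[sizeof i];

    memcpy(bytes, &i, sizeof i);        /* copy out the object representation */
    for (size_t k = 0; k < sizeof i; k++)
        printf("%02x ", bytes[k]);      /* 2c 01 00 00 little endian, 00 00 01 2c big endian */
    printf("\n");

    bytes[1] = 2;                       /* overwrite the second byte, as the question does */
    memcpy(&i, bytes, sizeof i);
    printf("%d\n", i);                  /* 556 on little endian, 131372 on big endian */
    return 0;
}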

This is implementation defined. The internal representation of an int is not known according to the standard, so what you're doing is not portable. See section 6.2.6.2 in the C standard.
However, as most implementations use two's complement representation of signed ints, the endianness will affect the result as described in cHao's answer.

This is your int:
int i = 300;
And this is what the memory contains at &i: 2c 01 00 00
With the next instruction you assign the address of i to ptr, and then you move to the next byte with ++ptr and change its value to 2:
char *ptr = &i;
*++ptr = 2;
So now the memory contains: 2c 02 00 00 (i.e. 556).
The difference is that in a big-endian system in the address of i you would have seen 00 00 01 2C, and after the change: 00 02 01 2C.
Even if the internal representation of an int is implementation-defined:
For signed integer types, the bits of the object representation shall
be divided into three groups: value bits, padding bits, and the sign
bit. There need not be any padding bits; signed char shall not have
any padding bits. There shall be exactly one sign bit. Each bit that
is a value bit shall have the same value as the same bit in the object
representation of the corresponding unsigned type (if there are M
value bits in the signed type and N in the unsigned type, then M ≤ N).
If the sign bit is zero, it shall not affect the resulting value. If
the sign bit is one, the value shall be modified in one of the
following ways: — the corresponding value with sign bit 0 is negated (sign and magnitude); — the sign bit has the value −(2^M) (two's complement); — the sign bit has the value −(2^M − 1) (ones' complement). Which of these applies is implementation-defined, as
is whether the value with sign bit 1 and all value bits zero (for the
first two), or with sign bit and all value bits 1 (for ones’
complement), is a trap representation or a normal value. In the case
of sign and magnitude and ones’ complement, if this representation is
a normal value it is called a negative zero.

I like experiments and that's the reason for having the PowerPC G5.
stacktest.c:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 300;
    char *ptr = (char *)&i;
    *++ptr = 2;
    /* Added the Hex dump */
    printf("%d or %x\n", i, i);
    return 0;
}
Build command:
powerpc-apple-darwin9-gcc-4.2.1 -o stacktest stacktest.c
Output:
131372 or 2012c
To sum up: cHao's answer is complete, and in case you're in doubt, here is the experimental evidence.

Related

C, Little and Big Endian confusion

I am trying to understand byte order in C, but I'm confused.
I tried my app with some values from this site to verify my output: www.yolinux.com/TUTORIALS/Endian-Byte-Order.html
For the 64-bit value I use in my C program:
volatile long long ll = (long long)1099511892096;
__mingw_printf("\tlong long, %u Bytes, %u bits,\t%lld to %lli, %lli, 0x%016llX\n", sizeof(long long), sizeof(long long)*8, LLONG_MIN, LLONG_MAX , ll, ll);
void printBits(size_t const size, void const * const ptr)
{
    unsigned char *b = (unsigned char *) ptr;
    unsigned char byte;
    int i, j;
    printf("\t");
    for (i = size - 1; i >= 0; i--)
    {
        for (j = 7; j >= 0; j--)
        {
            byte = b[i] & (1 << j);
            byte >>= j;
            printf("%u", byte);
        }
        printf(" ");
    }
    puts("");
}
Out
long long, 8 Bytes, 64 bits, -9223372036854775808 to 9223372036854775807, 1099511892096, 0x0000010000040880
80 08 04 00 00 01 00 00 (Little-Endian)
10000000 00001000 00000100 00000000 00000000 00000001 00000000 00000000
00 00 01 00 00 04 08 80 (Big-Endian)
00000000 00000000 00000001 00000000 00000000 00000100 00001000 10000000
Tests
0x8008040000010000, 1000000000001000000001000000000000000000000000010000000000000000 // online website hex2bin conv.
1000000000001000000001000000000000000000000000010000000000000000 // my C app
0x8008040000010000, 1000010000001000000001000000000000000100000000010000000000000000 // yolinux.com
0x0000010000040880, 0000000000000000000000010000000000000000000001000000100010000000 //online website hex2bin conv., 1099511892096 ! OK
0000000000000000000000010000000000000000000001000000100010000000 // my C app, 1099511892096 ! OK
[Convert]::ToInt64("0000000000000000000000010000000000000000000001000000100010000000", 2) // using powershell for other verif., 1099511892096 ! OK
0x0000010000040880, 0000000000000000000000010000010000000000000001000000100010000100 // yolinux.com, 1116691761284 (from powershell bin conv.) ! BAD !
Problem
The yolinux.com website announces 0x0000010000040880 as BIG ENDIAN! But my computer uses LITTLE ENDIAN, I think (Intel processor),
and I get the same value 0x0000010000040880 from my C app and from another website's hex2bin converter.
__mingw_printf(...0x%016llX...,...ll) also prints 0x0000010000040880, as you can see.
Following the yolinux website, I have swapped my "(Little-Endian)" and "(Big-Endian)" labels in my output for the moment.
Also, the sign bit must be 0 for a positive number; that is the case in my result but also in the yolinux result (so it cannot help me decide).
If I understand endianness correctly, only bytes are swapped, not bits, and my groups of bits seem to be correctly reversed.
Is it simply an error on yolinux.com, or am I missing a step with 64-bit numbers in C?
When you print some "multi-byte" integer using printf (and the correct format specifier) it doesn't matter whether the system is little or big endian. The result will be the same.
The difference between little and big endian is the order that multi-byte types are stored in memory. But once data is read from memory into the core processor, there is no difference.
This code shows how an integer (4 bytes) is placed in memory on my machine.
#include <stdio.h>

int main()
{
    unsigned int u = 0x12345678;
    printf("size of int is %zu\n", sizeof u);
    printf("DEC: u=%u\n", u);
    printf("HEX: u=0x%x\n", u);
    printf("memory order:\n");
    unsigned char * p = (unsigned char *)&u;
    for (int i = 0; i < sizeof u; ++i)
        printf("address %p holds %x\n", (void*)&p[i], p[i]);
    return 0;
}
Output:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 78
address 0x7ffddf2c263d holds 56
address 0x7ffddf2c263e holds 34
address 0x7ffddf2c263f holds 12
So I can see that I'm on a little endian machine as the LSB (least significant byte, i.e. 78) is stored on the lowest address.
Executing the same program on a big endian machine would (assuming same address) show:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 12
address 0x7ffddf2c263d holds 34
address 0x7ffddf2c263e holds 56
address 0x7ffddf2c263f holds 78
Now it is the MSB (most significant byte, i.e. 12) that is stored at the lowest address.
The important thing to understand is that this only relates to "how multi-byte type are stored in memory". Once the integer is read from memory into a register inside the core, the register will hold the integer in the form 0x12345678 on both little and big endian machines.
There is only a single way to represent an integer in decimal, binary or hexadecimal format. For example, number 43981 is equal to 0xABCD when written as hexadecimal, or 0b1010101111001101 in binary. Any other value (0xCDAB, 0xDCBA or similar) represents a different number.
The way your compiler and cpu choose to store this value internally is irrelevant as far as C standard is concerned; the value could be stored as a 36-bit one's complement if you're particularly unlucky, as long as all operations mandated by the standard have equivalent effects.
You will rarely have to inspect your internal data representation when programming. Practically the only time you care about endianness is when working on a communication protocol, because then the binary format of the data must be precisely defined, but even then your code will not be different regardless of the architecture:
#include <stdint.h>

// input value is big endian, this is defined
// by the communication protocol
uint32_t parse_comm_value(const char * ptr)
{
    // bit shifts in C have the same meaning regardless of the
    // endianness of your architecture; go through unsigned char
    // so sign extension of a plain char cannot corrupt the result
    uint32_t result = 0;
    result |= (uint32_t)(unsigned char)(*ptr++) << 24;
    result |= (uint32_t)(unsigned char)(*ptr++) << 16;
    result |= (uint32_t)(unsigned char)(*ptr++) << 8;
    result |= (uint32_t)(unsigned char)(*ptr++);
    return result;
}
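For illustration, here is a hypothetical caller for the function above (the wire[] data is made up); it should print the same value on a little-endian and a big-endian host:

#include <stdio.h>
#include <stdint.h>

uint32_t parse_comm_value(const char *ptr);    /* the function defined above */

int main(void)
{
    /* four bytes as they might arrive on the wire, most significant first */
    const unsigned char wire[4] = { 0x12, 0x34, 0x56, 0x78 };

    uint32_t v = parse_comm_value((const char *)wire);
    printf("0x%08x\n", (unsigned)v);           /* 0x12345678 on any host */
    return 0;
}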
Tl;dr calling a standard function like printf("0x%llx", number); always prints the correct value using the specified format. Inspecting the contents of memory by reading individual bytes gives you the representation of the data on your architecture.

distinguishes between signed and unsigned in machine code

I was reading a text book saying:
It is important to note how machine code distinguishes between signed
and unsigned values. Unlike in C, it does not associate a data type
with each program value. Instead, it mostly uses the same
(assembly)instructions for the two cases, because many arithmetic
operations have the same bit-level behavior for unsigned and
two’s-complement arithmetic.
I don't understand what it means, could anyone provide me an example?
For example, this code:
int main() {
    int i = -1;
    if (i < 9)
        i++;

    unsigned u = -1; // Wraps around to UINT_MAX value
    if (u < 9)
        u++;
}
gives following output on x86 GCC:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], -1 ; i = -1
cmp DWORD PTR [rbp-4], 8 ; i comparison
jg .L2 ; i comparison
add DWORD PTR [rbp-4], 1 ; i addition
.L2:
mov DWORD PTR [rbp-8], -1 ; u = -1
cmp DWORD PTR [rbp-8], 8 ; u comparison
ja .L3 ; u comparison
add DWORD PTR [rbp-8], 1 ; u addition
.L3:
mov eax, 0
pop rbp
ret
Notice how it uses the same instructions for initialization (mov) and increment (add) for variables i and u. This is because the bit pattern changes identically for unsigned and 2's complement.
Comparison also uses the same instruction cmp, but the jump decision has to be different, because values where the highest bit is set compare differently between the types: jg (jump if greater) for signed, and ja (jump if above) for unsigned.
What instructions are chosen, depends on the architecture and the compiler.
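If you prefer to see the effect without reading assembly, a small C sketch along the same lines (names chosen for illustration) makes the difference visible at run time:

#include <stdio.h>

int main(void)
{
    int i = -1;
    unsigned u = -1u;                 /* wraps around to UINT_MAX */

    /* the compiler emits a signed comparison (jg/jl family) for i and an
       unsigned one (ja/jb family) for u, so the two results differ */
    printf("i < 9 : %d\n", i < 9);    /* prints 1 */
    printf("u < 9 : %d\n", u < 9);    /* prints 0 */
    return 0;
}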
On Intel Processors (x86 family) and others that have FLAGS, you get bits in those FLAGS that tell you how the last operation worked. The name of the FLAGS vary a little between processors, but in general you have two important ones in regard to arithmetic: CF and OF.
CF is the Carry bit (often called C on other processors).
OF is the Overflow bit (often called V on other processors).
More or less, CF represents an unsigned overflow and OF represents a signed overflow. When the processor does the ADD operation, it has one extra bit, which is CF. So, if you add two 64-bit numbers, the result without wrapping may need 65 bits. That is the carry. The OF flag is computed from the highest bit (bit 63 in a 64-bit number), using a few logical operations on that bit in the two sources and the destination.
Here is an example of how CF works with 4-bit registers:
R1 = 1010
R2 = 1101
R3 = R1 + R2 = 1 0111
               ^
               +---- carry (CF)
The extra 1 doesn't fit in R3 so it gets put in the CF bit instead. As a side note, the MIPS processor does not have any FLAGS. It's up to you to determine whether a carry is generated (which you can do using XOR and such on the two sources and the destination).
However, in C (and C++), there is no verification of overflow on your integer types (at least not by default.) So in other words, the CF and OF flags are ignored for all your operations except the four compare operators (<, <=, >, >=).
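As a rough sketch (not from the original answer), this is how you can compute the equivalent of CF and OF yourself in portable C; the values are chosen only so that both conditions fire:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* unsigned "carry" (what CF records): the sum wrapped around */
    uint32_t a = 0xFFFFFFF0u, b = 0x20u;
    uint32_t sum = a + b;             /* wraps modulo 2^32 */
    int carry = sum < a;              /* 1 exactly when the addition carried out */

    /* signed "overflow" (what OF records): test before adding, because
       overflowing a signed add in C is undefined behaviour */
    int32_t sa = INT32_MAX, sb = 1;
    int overflow = (sb > 0) && (sa > INT32_MAX - sb);

    printf("carry=%d overflow=%d\n", carry, overflow);   /* carry=1 overflow=1 */
    return 0;
}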
As shown in the example presented by @user694733, the difference is whether jg or ja is used. Each of the 16 jump instructions tests various flags to decide whether to jump or not. That combination is really what makes the difference.
Another interesting aspect is the difference between ADC and ADD. In one case you add with the carry and the other you don't. It's probably not used as much now that we have 64 bit computers, but to add two 64 bit numbers with a 32 bit processor, it would add the lower 32 bits as unsigned 32 bit numbers and then add the upper 32 bit numbers (signed or unsigned as may be the case) plus the carry from the first operation.
Say you have two 64 bit numbers in 32 bit registers (ECX:EAX and EDX:EBX), you would add them like this:
ADD EAX, EBX
ADC ECX, EDX
Here the EDX and the carry are added to ECX if EAX + EBX had an unsigned overflow (carry--meaning that adding EAX and EBX properly should be represented by 33 bits now because the result doesn't fit 32 bits, the CF flag is that 33rd bit).
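A small C sketch of that ADD/ADC idea, splitting a 64-bit addition into 32-bit halves with an explicit carry (the values are made up for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a_lo = 0xFFFFFFFFu, a_hi = 0x00000001u;   /* a = 0x1FFFFFFFF */
    uint32_t b_lo = 0x00000001u, b_hi = 0x00000002u;   /* b = 0x200000001 */

    uint32_t lo    = a_lo + b_lo;          /* ADD: may wrap */
    uint32_t carry = lo < a_lo;            /* the CF bit the hardware would set */
    uint32_t hi    = a_hi + b_hi + carry;  /* ADC: upper halves plus the carry */

    printf("0x%08x%08x\n", hi, lo);        /* 0x0000000400000000 */
    return 0;
}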
To be noted, the Intel processors have:
A Zero bit: ZF (whether the result is zero or not),
CF is called "Borrow" when subtracting (for SBC, SBB), and
also the AF bit, which is used for "decimal number operations" (which no one in their right mind uses). That AF bit tells you that there is an overflow in the decimal operation, something like that. I never used that one; I find their use too complicated/cumbersome. Also, the bit is still there in amd64, but the instructions setting it were removed (see DAA for example).
The beauty of two's complement is that addition (and, as a result, subtraction, since that uses an adder, again part of the beauty of two's complement) does not care about signed vs unsigned: the same bit patterns added together produce the same result. 0xFE + 0x01 = 0xFF, which is -2 + 1 = -1 and also 254 + 1 = 255. Same input bits, same result pattern.
Two's complement only helps for a percentage of operations, though, not all: add/subtract, but not necessarily multiply and divide. Bitwise operations, of course, are just bits. But right shifts want a distinction, and does C deliver?
The comparisons are very sensitive. Equal and not equal, zero and not zero: those are single flag tests and work either way. But unsigned less than and signed less than do not use the same set of flags. Less than and greater than, with or without equal, do not work the same way for unsigned vs signed. Likewise, signed overflow and unsigned overflow (often just called the carry bit) are computed differently from each other. And in some instruction sets the carry bit is inverted when the operation is a subtract, but not always, so for comparisons you need to know whether it is a borrow bit on subtract or always just the carry out unmodified.
Multiplication, and likely division, are "it depends". An N bit times N bit equals N bit result works the same signed and unsigned, but N bit times N bit equals 2*N bit (the only really useful hardware multiply) requires separate signed and unsigned versions if the hardware/instruction is to do all the work; otherwise you have to break the operands up into parts. A simple paper-and-pencil, grade-school multiplication shows why; that is left to the reader.
You don't need us at all: you can easily write your own examples and see from the compiler output when there is a difference and when there isn't.
int32_t fun0 ( int32_t a, int32_t b ) { return a+b; }
int32_t fun1 ( int32_t a, int32_t b ) { return a*b; }
int32_t fun2 ( int32_t a, int32_t b ) { return a^b; }
uint32_t fun3 ( uint32_t a, uint32_t b ) { return a+b; }
uint32_t fun4 ( uint32_t a, uint32_t b ) { return a*b; }
uint32_t fun5 ( uint32_t a, uint32_t b ) { return a^b; }
uint32_t fun6 ( uint64_t a, uint64_t b ) { return a+b; }
uint32_t fun7 ( uint64_t a, uint64_t b ) { return a*b; }
uint32_t fun8 ( uint64_t a, uint64_t b ) { return a^b; }
uint64_t fun9 ( uint64_t a, uint64_t b ) { return a*b; }
int64_t fun10 ( int64_t a, int64_t b ) { return a*b; }
uint64_t fun11 ( uint32_t a, uint32_t b ) { return a*b; }
int64_t fun12 ( int32_t a, int32_t b ) { return a*b; }
int32_t comp0 ( int32_t a, int32_t b ) { return a<b; }
uint32_t comp1 ( uint32_t a, uint32_t b ) { return a<b; }
plus other operators and combinations.
EDIT
Okay the real answer...rather than making you do the work.
I want to add -2 and +1:

   11111110
 + 00000001
 ==========
finish it:

   00000000    (the carries)
   11111110
 + 00000001
 ==========
   11111111

-2 + 1 = -1
What about the same bits read as unsigned, 254 + 1?

   00000000    (the carries)
   11111110
 + 00000001
 ==========
   11111111
hmmm...same bits in same bits out, but how I interpret those bits as a programmer varies widely.
You can try as many legal values as you want (ones that don't overflow the result) and you will see that the addition result does not know nor care about signed vs unsigned operands. Part of the beauty of two's complement.
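A tiny C sketch of that point, using 8-bit values (the variable names are just for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t u = 0xFE;                 /* 254 when read as unsigned */
    int8_t  s = -2;                   /* the same bit pattern read as signed */

    uint8_t ur = (uint8_t)(u + 1);
    int8_t  sr = (int8_t)(s + 1);

    /* same bits out either way: 0xFF, i.e. 255 unsigned or -1 signed */
    printf("unsigned: 0x%02X (%u)\n", (unsigned)ur, (unsigned)ur);
    printf("signed:   0x%02X (%d)\n", (unsigned)(uint8_t)sr, sr);
    return 0;
}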
Subtraction is just addition in logic. Some may have learned "invert and add one": want to know what the bit pattern 11111111 is? Invert it to get 00000000, add 1 to get 00000001, so 11111111 is -1. But how does addition really work with two operands as shown above? You really need a three-input adder: three bits in (the carry in plus the two operand bits) and two bits out (the result bit and the carry out). What if we go back to grade school as well...
-32 - 3 = (-32) + (-3); apply the invert-and-add-one to the -3 and we get (-32) + (~3) + 1:

          1    (carry in)
   11100000
 + 11111100
 ==========
 1 11011101    (carry out 1, result -35)

and that's how a computer does that math: it inverts the carry in and the second operand. SOME invert the carry out, because a 1 on the carry out when the adder is used as a subtractor means no borrow, while a 0 means a borrow happened. So some instruction sets will invert the carry out and some will not; this is hugely important for this topic.
Likewise, the carry out bit is computed from the addition of the msbits of the operands and the carry in to that bit position; it is the carry out of that addition.

ab
 dxxxxxxx
+exxxxxxx
=========
 fxxxxxxx

Here a is the carry out when adding bits b + d + e (b being the carry in to the msbit). This is also known as the unsigned overflow flag when this is an addition operation and the operands are considered to be unsigned values. The signed overflow flag, on the other hand, is determined by whether b and a are equal or not equal.
In what situations does this happen?
bde af
000 00
001 01
010 01
011 10 <--
100 01 <--
101 10
110 10
111 11
So you can read from that: if the carry in to the msbit is not equal to the carry out, there is a signed overflow. Equivalently, if the msbits of the operands are equal and the msbit of the result is not equal to those operand bits, then signed overflow is true. If you generate a table of signed numbers, their results, and which ones overflow, this will start to become clear. You don't have to do all 256 * 256 8-bit combinations; take 3- or 4-bit numbers, synthesize your own addition routine for 3 or 4 bits, and that smaller number of combinations will be enough.
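Here is a minimal C sketch of that rule for a 4-bit add, deriving C from the carry out of the msbit and V from carry-in-vs-carry-out of the msbit (operand values chosen so that signed overflow occurs):

#include <stdio.h>

int main(void)
{
    unsigned a = 0x7, b = 0x1;                 /* +7 + +1 overflows 4-bit signed */
    unsigned sum = a + b;

    unsigned carry_in_msb  = (((a & 7) + (b & 7)) >> 3) & 1;  /* carry into bit 3 */
    unsigned carry_out_msb = (sum >> 4) & 1;                  /* carry out of bit 3 */
    unsigned v = carry_in_msb ^ carry_out_msb;                /* signed overflow */
    unsigned c = carry_out_msb;                               /* unsigned overflow */

    printf("sum=0x%X C=%u V=%u\n", sum & 0xF, c, v);          /* sum=0x8 C=0 V=1 */
    return 0;
}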
So while addition and subtraction themselves, as far as the result bits go, do not know signed from unsigned, the flags (if you have a processor that uses them), the C or carry flag and the V or overflow flag, each have a signedness-based use case. The carry flag itself can have one of two definitions when produced by a subtract, depending on the instruction set, and since comparisons are generally done with a subtraction, that carry definition matters to how the flags are then used.
Greater than and less than use a subtract to determine the result; the subtraction itself is not affected by signedness, but how the flags are interpreted very much is.
Take some four bit positive numbers.
1101 - 1100 (13 - 12)
1100 - 1100 (12 - 12)
1011 - 1100 (11 - 12)
 11111
  1101
+ 0011
======
  0001
carry out 1, zero flag 0, v = 0, n = 0

 11111
  1100
+ 0011
======
  0000
carry out 1, zero flag 1, v = 0, n = 0

 00111
  1011
+ 0011
======
  1111
carry out 0, zero flag 0, v = 0, n = 1
(n is the msbit of the result, the sign bit 1 means signed negative number, zero means signed positive number)
cz
10 greater than but not equal
11 equal
00 less than but not equal
same bit patterns
1101 - 1100 (-3 - -4)
1100 - 1100 (-4 - -4)
1011 - 1100 (-5 - -4)
cz
10 greater than but not equal
11 equal
00 less than but not equal
so far nothing changed.
but if I examine all the combinations
#include <stdio.h>

int main ( void )
{
    unsigned int ra;
    unsigned int rb;
    unsigned int rc;
    unsigned int rx;
    unsigned int v;
    unsigned int n;
    int sa, sb;

    for (ra = 0; ra < 0x10; ra++)
    for (rb = 0; rb < 0x10; rb++)
    {
        for (rx = 8; rx; rx >>= 1) if (rx & ra) printf("1"); else printf("0");
        printf(" - ");
        for (rx = 8; rx; rx >>= 1) if (rx & rb) printf("1"); else printf("0");
        rc = ra - rb;
        printf(" = ");
        /* print the result */
        for (rx = 8; rx; rx >>= 1) if (rx & rc) printf("1"); else printf("0");
        printf(" c=%u", (rc >> 4) & 1);
        printf(" n=%u", (rc >> 3) & 1);
        n = (rc >> 3) & 1;
        if ((rc & 0xF) == 0) printf(" z=1"); else printf(" z=0");
        v = 0;
        if ((ra & 8) == (rb & 8))
        {
            if ((ra & 8) == (rc & 8)) v = 1;
        }
        printf(" v=%u", v);
        printf(" (%2u - %2u)", ra, rb);
        sa = ra;
        if (sa & 8) sa |= 0xFFFFFFF0;
        sb = rb;
        if (sb & 8) sb |= 0xFFFFFFF0;
        printf(" (%+2d - %+2d)", sa, sb);
        if (rc & 0x10) printf(" C ");
        if (n == v) printf(" NV ");
        printf("\n");
    }
}
You can find fragments within the output that show the problem:
0000 - 0110 = 1010 c=1 n=1 z=0 v=0 ( 0 -  6) (+0 - +6)  C
0000 - 0111 = 1001 c=1 n=1 z=0 v=0 ( 0 -  7) (+0 - +7)  C
0000 - 1000 = 1000 c=1 n=1 z=0 v=0 ( 0 -  8) (+0 - -8)  C
0000 - 1001 = 0111 c=1 n=0 z=0 v=0 ( 0 -  9) (+0 - -7)  C  NV
0000 - 1010 = 0110 c=1 n=0 z=0 v=0 ( 0 - 10) (+0 - -6)  C  NV
0000 - 1011 = 0101 c=1 n=0 z=0 v=0 ( 0 - 11) (+0 - -5)  C  NV
For unsigned, 0 is less than 6, 7, 8, 9, and so on, yet the carry out is set, which per the earlier table would mean greater than. But with the same bit patterns read as signed, 0 is less than 6 and 7 but greater than -8, -7, -6, and so on.
What is not necessarily obvious until you stare at it a lot, or just cheat and look at ARM's documentation: for signed comparisons, if N == V it is a signed greater than or equal, and if N != V it is a signed less than; you don't need to examine the carry out. In particular, the signed bit patterns 0000 and 1000 are the problem cases that don't work with the carry the way the other bit patterns do.
Hmm, I wrote this all up in other questions before. Anyway, multiply both does and doesn't care about unsigned and signed.
Using your calculator, 0xF * 0xF = 0xE1. The biggest 4-bit number times the biggest 4-bit number gives an 8-bit number; we need twice as many bits to cover all the possible result patterns.
        1111
      * 1111
    ========
        1111
       1111
      1111
   + 1111
    ========
    11100001
So we see the addition that results needs at least 2n-1 bits; if you end up with a carry off that last column, you end up with 2n bits.
But what is -1 * -1? It's equal to 1, right? What are we missing?
unsigned has implied zeros

       00001111
     *     1111
   ============
       00001111
      00001111
     00001111
  + 00001111
   ============
    00011100001
but signed the sign is extended

       11111111
     *     1111
   ============
       11111111
      11111111
     11111111
  + 11111111
   ============
    00000000001
So does sign matter with multiply? 0xC * 0x3 is 0xF4 (signed: -4 * 3 = -12) or 0x24 (unsigned: 12 * 3 = 36).
#include <stdio.h>

int main ( void )
{
    unsigned int ra;
    unsigned int rb;
    unsigned int rc;
    unsigned int rx;
    int sa;
    int sb;
    int sc;

    for (ra = 0; ra < 0x10; ra++)
    for (rb = 0; rb < 0x10; rb++)
    {
        sa = ra;
        if (ra & 8) sa |= 0xFFFFFFF0;
        sb = rb;
        if (rb & 8) sb |= 0xFFFFFFF0;
        rc = ra * rb;
        sc = sa * sb;
        if ((rc & 0xF) != (sc & 0xF))
        {
            for (rx = 8; rx; rx >>= 1) if (rx & ra) printf("1"); else printf("0");
            printf(" ");
            for (rx = 8; rx; rx >>= 1) if (rx & rb) printf("1"); else printf("0");
            printf("\n");
        }
    }
}
And there is no output, as expected. Now look at the bits abcd * 1111:
abcd
1111
===============
aaaaabcd
aaaaabcd
aaaaabcd
aaaaabcd
================
Four bits in on each operand; if I only care about the lower four bits out:
abcd
1111
===============
abcd
bcd
cd
d
================
how the operand sign-extends does not matter as far as that truncated result is concerned.
Now, knowing that a significant portion of the possible combinations of an N-bit-times-N-bit-equals-N-bit multiply overflow, it doesn't help you much to do such a thing in any code you want to be useful:
int a,b,c;
c = a * b;
not very useful except for smaller numbers.
But the reality is that, as far as multiply goes, if the result is the same size as the operands then signed vs unsigned does not matter; if the result is the proper twice-the-size of the operands then you need a separate signed multiply instruction/operation and an unsigned one. You can certainly cascade/synthesize the N*N=2N multiply from an N*N=N instruction, as you will see in some instruction sets.
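A short C sketch of that distinction, reusing the 0xC * 0x3 example from above; the masking to 8 bits stands in for a 4-bit-by-4-bit widening multiply:

#include <stdio.h>

int main(void)
{
    unsigned ua = 0xC, ub = 0x3;    /* 12 * 3 as unsigned nibbles */
    int      sa = -4,  sb = 3;      /* the same bit patterns read as signed nibbles */

    /* the low 4 bits agree (0x4); the widened 8-bit results do not */
    printf("unsigned 8-bit result: 0x%02X\n", (ua * ub) & 0xFFu);            /* 0x24 */
    printf("signed   8-bit result: 0x%02X\n", (unsigned)(sa * sb) & 0xFFu);  /* 0xF4 */
    return 0;
}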
Bitwise operations (xor, or, and) are just bitwise; they don't and can't care about sign.
Shift left: start with abcd, shift one gives bcd0, shift two gives cd00, and so on; not very interesting. Shift right, though, wants two flavours, arithmetic and logical: an arithmetic shift right duplicates the msbit as the shifted-in bit, while a logical shift shifts in a zero. Arithmetic: abcd, aabc, aaab, aaaa. Logical: abcd, 0abc, 00ab, 000a, 0000.
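In C terms, a small sketch of how that difference shows up; note that what >> does to a negative signed value is implementation-defined, though most compilers perform an arithmetic shift:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t u = 0xF0;                    /* 1111 0000 */
    int8_t  s = -16;                     /* the same bit pattern, 0xF0, read as signed */

    /* logical shift: zeros come in from the left */
    printf("unsigned >> 2 : 0x%02X\n", (unsigned)(u >> 2));            /* 0x3C */

    /* typically an arithmetic shift: the sign bit is duplicated */
    printf("signed   >> 2 : 0x%02X\n", (unsigned)(uint8_t)(s >> 2));   /* commonly 0xFC */
    return 0;
}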
But we don't have two kinds of shift-right operator in C. When doing addition and subtraction directly, bits is bits, the beauty of two's complement. But when doing a comparison, which is a subtract, the flags used differ between signed and unsigned for a number of the comparisons; get the older ARM Architectural Reference Manual (I think they call it the ARMv5 one, even though it goes back to ARMv4 and up to ARMv6).
There is a section called "The condition field" with a table; it very nicely shows, at least for the ARM flags, the flag combinations for both the unsigned and the signed flavor of each comparison, while the ones that don't care about signedness (equal, not equal, etc.) won't say either.
Understand/remember that some instruction sets not only invert the carry-in bit and the second operand on a subtract but will also invert the carry-out bit; so if a carry bit produced by a subtract is being used, it may be inverted. In the material above I tried to use the term carry out instead of carry flag: on some other instruction sets the carry flag would be inverted and the unsigned greater-than/less-than table flips over.
Division is not as easy to show; you have to do long division, etc. I will leave that one to the reader.
Not all documentation is as good as the table I am referring to in ARM's docs. Other processor documentation may or may not make the unsigned vs signed distinction; it might just say "jump if greater than" and you may have to figure out experimentally what that means. Now that you know all of this, you may have already figured out that you don't, for example, need a separate branch-if-unsigned-greater-than-or-equal; that just means branch if not less than, so you can do
cmp r0,r1
or
cmp r1,r0
and just use branch if carry to cover the unsigned less than, unsigned less than or equal, unsigned greater than, unsigned greater than or equal cases. Although you might upset some programmers doing that because you were trying to save some bits in the instruction.
Saying ALL of that, the processor never distinguishes signed from unsigned. These are concepts that only mean something to the programmer; processors are very stupid. Bits is bits: the processor doesn't know if these bits are an address, a variable, a character in a string, or a floating point number (being implemented with a soft float library in fixed point); these interpretations are only meaningful to the programmer, not the processor. The processor does not "distinguish between unsigned and signed in machine code"; the programmer has to properly place bits that are meaningful to the programmer and then select the right instructions, and sequences of instructions, to perform the task the programmer wants performed. Some 32-bit number in a register is only an address when those bits are used to address something with a load or store; during that one clock cycle where they are sampled to be delivered to an address bus they are an address, and before and after that they are just bits. When you increment that pointer in your program they are not an address, they are just bits you are adding some other bits to. You can certainly build a MIPS-like instruction set with no flags, with only N-bit-to-N-bit multiplies, with only a jump-if-two-registers-are-equal-or-not-equal instruction and no other greater-than or less-than type instructions, and still be able to make useful programs, just like instruction sets that go overboard with those things: an unsigned this flag and a signed that flag, an unsigned this instruction and a signed that instruction.
A not-so-popular alternative, but one sometimes talked about in school (and there may have been real instruction sets that did this), is a non-two's-complement solution, which pretty much means sign and magnitude: a sign bit plus an unsigned value, so +3 is 0011 and -3 is 1011 in a four-bit register, burning one bit for the sign when doing signed math. You then, as with two's complement, have to sit down with pencil and paper and work through the math operations, grade-school style, then implement those in logic. Does this result in separate unsigned and signed adds? With two's-complement 4-bit registers we can do 0 to 15 unsigned and -8 to +7 signed; with sign-magnitude we can declare unsigned is 0 to 15 but signed is -7 to +7. An exercise for the reader; the question/quote had to do with two's complement.
Check out two's complement and its arithmetic operations; it is how signed numbers are represented in binary.
Two's complement is the most common method of representing signed integers on computers. In this scheme, if the binary number 010 (base 2) encodes the signed integer 2 (base 10), then its two's complement, 110 (base 2), encodes the inverse: -2 (base 10). In other words, to reverse the sign of any integer in this scheme, you can take the two's complement of its binary representation.
That way it is possible to have arithmetic operations between positive and negative binary values.
Two's Complement Python code snippet:
def twos_complement(input_value, num_bits):
    '''Calculates a two's complement integer from the given input value's bits'''
    mask = 2**(num_bits - 1)
    return -(input_value & mask) + (input_value & ~mask)
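For example, twos_complement(0b110, 3) evaluates to -2, matching the 010 -> 2 / 110 -> -2 example in the quoted description above.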

Initialization of a union in C

I came across this objective question on the C programming language. The output for the following code is supposed to be 0 2, but I don't understand why.
Please explain the initialization process. Here's the code:
#include <stdio.h>

int main()
{
    union a
    {
        int x;
        char y[2];
    };
    union a z = {512};
    printf("\n%d %d", z.y[0], z.y[1]);
    return 0;
}
I am going to assume that you use a little endian system where sizeof int is 4 bytes (32 bits) and sizeof a char is 1 byte (8 bits), and one in which integers are represented in two's complement form. A union only has the size of its largest member, and all the members point to this exact piece of memory.
Now, you are writing to this memory an integer value of 512.
512 in binary is 1000000000.
or in 32 bit two's complement form:
00000000 00000000 00000010 00000000.
Now convert this to its little endian representation and you'll get:
00000000 00000010 00000000 00000000
|______| |______|
| |
y[0] y[1]
Now see the above what happens when you access it using indices of a char array.
Thus, y[0] is 00000000 which is 0,
and y[1] is 00000010 which is 2.
The memory allocated for the union is the size of the largest type in the union, which is int in this case. Let's say the size of int on your system is 2 bytes; then
512 will be 0x200.
The representation looks like:

0000 0010   0000 0000
---------   ---------
  Byte 1      Byte 0
So the first byte is 0 and the second one is 2 (on little-endian systems).
char is one byte on all systems.
So the accesses z.y[0] and z.y[1] are per-byte accesses.
z.y[0] = 0000 0000 = 0
z.y[1] = 0000 0010 = 2
I am just showing you how the memory is allocated and the value stored. You need to consider the points below, since the output depends on them.
Points to be noted:
The output is completely system dependent.
The endianness and the sizeof(int) matter, and they vary across systems.
PS: The memory occupied by both members is the same in a union.
The standard says that
6.2.5 Types:
A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
The compiler allocates only enough space for the largest of the members, which overlay each other within this space. In your case, memory is allocated for int data type (assuming 4-bytes). The line
union a z = {512};
will initialize the first member of union z, i.e. x becomes 512. In binary it is represented as 0000 0000 0000 0000 0000 0010 0000 0000 on a 32-bit machine.
Memory representation for this would depend on the machine architecture. On a 32-bit machine it either will be like (store the least significant byte in the smallest address-- Little Endian)
Address Value
0x1000 0000 0000
0x1001 0000 0010
0x1002 0000 0000
0x1003 0000 0000
or like (store the most significant byte in the smallest address -- Big Endian)
Address Value
0x1000 0000 0000
0x1001 0000 0000
0x1002 0000 0010
0x1003 0000 0000
z.y[0] will access the content at address 0x1000 and z.y[1] will access the content at address 0x1001, and those contents will depend on the above representation.
It seems that your machine supports Little Endian representation and therefore z.y[0] = 0 and z.y[1] = 2 and output would be 0 2.
But, you should note that footnote 95 of section 6.5.2.3 states that
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
The size of the union is the maximum size needed to hold a single member of it. So, here it is the size of int.
Assuming 4 bytes per int and 1 byte per char, we can say: sizeof(union a) = 4 bytes.
Now, let's see how it is actually stored in memory:
For example, an instance of the union, a, is stored at 2000-2003:
2000 -> last(4th / least significant / rightmost) byte of int x, y[0]
2001 -> 3rd byte of int x, y[1]
2002 -> 2nd byte of int x
2003 -> 1st byte of int x (most significant)
Now, when you say z.x = 512 (which is what the initializer does):
since z.x = 0x00000200,
M[2000] = 0x00
M[2001] = 0x02
M[2002] = 0x00
M[2003] = 0x00
So, when you print y[0] and y[1], it prints the data at M[2000] and M[2001], which is 0 and 2 in decimal respectively.
For automatic (non-static) members, the initialization is identical to assignment:
union a z;
z.x = 512;
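As a side note, if the intent is to set the bytes of y directly rather than rely on how int is laid out, a C99 designated initializer can initialize a member other than the first; a minimal sketch:

#include <stdio.h>

union a { int x; char y[2]; };

int main(void)
{
    /* C99 designated initializer: initialize y instead of the first member x,
       so 0 and 2 land in y[0] and y[1] regardless of endianness */
    union a z = { .y = {0, 2} };
    printf("%d %d\n", z.y[0], z.y[1]);   /* 0 2 on any machine */
    return 0;
}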

using C Pointer with char array

int i=512;
char *c = (char *)&i;
c[0] =1;
printf("%d",i);
this displays "513", it adds 1 to i.
int i=512;
char *c = (char *)&i;
c[1] =1;
printf("%d",i);
whereas this displays 256; it divides it by 2.
Can someone please explain why? Thanks a lot.
Binary
The 32-bit number 512 expressed in binary, is just:
00000000000000000000001000000000
because 2 to the power of 9 is 512. Conventionally, you read the bits from right-to-left.
Here are some other decimal numbers in binary:
0001 = 1
0010 = 2
0011 = 3
0100 = 4
The Cast: Reinterpreting the Int as an Array of Bytes
When you do this:
int i = 512;
char *c = (char *)&i;
you are interpreting the 4-byte integer as an array of characters (8-bit bytes), as you probably know. If not, here's what's going on:
&i
takes the address of the variable i.
(char *)&i
reinterprets it (or casts it) to a pointer to char type. This means it can now be used like an array. Since you know an int is at least 32 bits on your machine, you can access its bytes using c[0], c[1], c[2], c[3].
Depending on the endianness of the system, the bytes of the number might be laid out: most significant byte first (big endian), or least significant byte first (little endian). x86 processors are little endian. This basically means the number 512 is laid out as in the example above, i.e.:
00000000 00000000 00000010 00000000
c[3] c[2] c[1] c[0]
I've grouped the bits into separate 8-bit chunks (bytes) corresponding to the way they are laid out in memory. Note, you also read them right-to-left here, so we can keep with conventions for the binary number system.
Consequences
Now setting c[0] = 1 has this effect:
00000000 00000000 00000010 00000001
c[3] c[2] c[1] c[0]
which is 2^9 + 2^0 == 513 in decimal.
Setting c[1] = 1 has this effect:
00000000 00000000 00000001 00000000
c[3] c[2] c[1] c[0]
which is 2^8 == 256 in decimal, because you've overwritten the second byte 00000010 with 00000001.
Do note that on a big-endian system the bytes would be stored in the reverse order from a little-endian system. This would mean you'd get totally different results from the ones you got if you ran it on one of those machines.
Remember char is 8 bits. 512's bit representation is
512 = 10 0000 0000
When you do char *c = (char *)&i; you make (on a little-endian machine):
c[1] = 10
c[0] = 0000 0000
When you do c[0] = 1,
you make it 10 0000 0001, which is 513.
When you do c[1] = 1, you make it 01 0000 0000, which is 256.
Before you wonder why what you're seeing is "odd", consider the platform you're running your code on, and the endianness therein.
Then consider the following
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 512;
    printf("%d : ", i);
    unsigned char *p = (unsigned char *)&i;
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    char *c = (char *)&i;
    c[0] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    i = 512;
    c[1] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");
    return 0;
}
On my platform (Macbook Air, OS X 10.8, Intel x64 Arch)
512 : 00020000
513 : 01020000
256 : 00010000
Couple what you see above with what you have hopefully read about endianness, and you can clearly see my platform is little endian. So what's yours?
Since you are aliasing an int through a char pointer, and a char is 8 bits wide (a byte), the assignment:
c[1] = 1;
will set the second byte of i to 00000001. Bytes 1, 3 and 4 (if sizeof(int) == 4) will stay unmodified. Previously, that second byte was 00000010 (since I assume you're on an x86-based computer, which is a little-endian architecture). So basically, you shifted the only bit that was set one position to the right. That's a division by 2.
On a little-endian machine and a compiler with 32-bit int, you originally had these four bytes in i:
c[0] c[1] c[2] c[3]
00000000 00000010 00000000 00000000
After the assignment, i was set to:
c[0] c[1] c[2] c[3]
00000000 00000001 00000000 00000000
and therefore it went from 512 to 256.
Now you should understand why c[0] = 1 results in 513 :-) Think about which byte is set to 1 and that the assignment doesn't change the other bytes at all.
It's because your machine is little endian, meaning the least-significant byte is stored first in memory.
You said int i=512;. 512 is 0x00000200 in hex (assuming a 32-bit int for simplicity). Let's look at how i would be stored in memory as hexadecimal bytes:
00 02 00 00 // 4 bytes, least-significant byte first
Now we interpret that same memory location as a character array by doing char *c = (char *)&i; - same memory, different interpretation:
00 02 00 00
c[0][1][2][3]
Now we change c[0] with c[0] =1; and the memory looks like
01 02 00 00
Which means if we look at it as a little endian int again (by doing printf("%d",i);), it's hex 0x00000201, which is 513 decimal.
Now if we go back and change c[1] with c[1] =1;, your memory now becomes:
00 01 00 00
Now we go back and interpret it as a little endian int, it's hex 0x00000100, which is 256 decimal.
Whether the data is stored little endian or big endian depends on the machine; for more, read about endianness.
The C language doesn't guarantee anything about it.
512 in binary :
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0000 ==>512
=============================================
12 34 56 78
(0x12345678 suppose address of this int)
char *c = (char *)&i; now c[0] refers either to the byte marked 78 (the least significant byte) or to the byte marked 12 (the most significant byte), depending on endianness.
Modifying the value through c[0] results in 513 if it refers to the least significant byte:
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0001 ==> 513
=============================================
or, can be
=============================================
0000 0001 | 0000 0000 | 0000 0010 | 0000 0000 ==>2^24+512
=============================================
Similarly for 256: c[1] has the address of the 2nd byte from the right,
as in the figure below:
=============================================
0000 0000 | 0000 0000 | 0000 0001 | 0000 0000 ==>256
=============================================
So it comes down to how your system represents numbers in memory.

int to char casting

int i = 259; /* 03010000 in Little Endian ; 00000103 in Big Endian */
char c = (char)i; /* returns 03 in both Little and Big Endian?? */
On my computer it assigns 03 to char c, and I have little endian, but I don't know whether the char cast reads the least significant byte or the byte at the address of i.
Endianness doesn't actually change anything here. It doesn't try to store one of the bytes (MSB, LSB etc).
If char is unsigned it will wrap around. Assuming 8-bit char 259 % 256 = 3
If char is signed the result is implementation defined. Thank you pmg: 6.3.1.3/3 in the C99 Standard
Since you're casting from a larger integer type to a smaller one, it takes the least significant part regardless of endianness. If you were casting pointers instead, though, it would take the byte at the address, which would depend on endianness.
So c = (char)i assigns the least-significant byte to c, but c = *((char *)(&i)) would assign the first byte at the address of i to c, which would be the same thing on little-endian systems only.
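A small sketch of that difference (the variable names are illustrative, and the by_pointer result depends on the machine's byte order):

#include <stdio.h>

int main(void)
{
    int i = 259;                        /* 0x00000103 with a 4-byte int */

    char by_value   = (char)i;          /* conversion keeps the low-order byte: 3 */
    char by_pointer = *(char *)&i;      /* the byte at i's lowest address: 3 on
                                           little endian, 0 on big endian */

    printf("%d %d\n", by_value, by_pointer);
    return 0;
}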
If you want to test for little/big endian, you can use a union:
int isBigEndian (void)
{
    union foo {
        size_t i;
        char cp[sizeof(size_t)];
    } u;
    u.i = 1;
    return *u.cp != 1;
}
It works because in little endian, it would look like 01 00 ... 00, but in big endian, it would be 00 ... 00 01 (the ... is made up of zeros). So if the first byte is 0, the test returns true. Otherwise it returns false. Beware, however, that there also exist mixed endian machines that store data differently (some can switch endianness; others just store the data differently). The PDP-11 stored a 32-bit int as two 16-bit words, except the order of the words was reversed (e.g. 0x01234567 was 4567 0123).
When casting from int (4 bytes) to char (1 byte), it preserves the lowest-order byte.
E.g.:
int x = 0x3F1;                       // 0x3F1 = 0000 0011 1111 0001
char y = (char)x;                    // 1111 0001 --> -15 in decimal (two's complement, signed char)
unsigned char z = (unsigned char)x;  // 1111 0001 --> 241 in decimal
