Take a look at the following example:
int a = 130;
char *ptr;
ptr = (char *) &a;
printf("%d", *ptr);
I expected to see 0 printed on the screen, but to my surprise it's -126. I concluded that since char is 8 bits, the int might be getting truncated somehow.
Until now I thought memory was filled so that the MSB comes first (on the left). But now everything seems mixed up. How exactly is the value laid out in memory?
In your case a is (probably) a 4-byte little-endian value, and 130 is 10000010 in binary.
int a = 130; // stored in memory as 10000010 00000000 00000000 00000000 (little endian)
You then point at the first byte with a char*:
char *ptr = (char *)&a; // 10000010
and print it with the %d format, which prints the signed integer value of the byte 10000010, which is -126 (see: two's complement).
Your output is a hint that your system is little endian (Least Significant Byte has lowest memory address).
In hexadecimal (exactly 2 digits per byte), 130 is written 0x82. Assuming 4 bytes for an int, on a little-endian system the integer will be stored as 0x82, 0, 0, 0. So *ptr will be (char)0x82.
But you use printf to display its value. As all parameters past the first have no declared type, the char value is promoted to an int. Assuming a two's-complement representation (by far the most common one), you will get either 130 if char is unsigned, or -126 if it is signed.
TL;DR: the output is normal on a little-endian system with a two's-complement integer representation where the char type is signed.
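To see this for yourself, here is a minimal sketch (assuming a 4-byte int; the byte order in the dump depends on your machine):

#include <stdio.h>

int main(void)
{
    int a = 130;                            /* 0x82 in the low-order byte */
    unsigned char *p = (unsigned char *)&a;
    size_t i;

    /* Dump the bytes in memory order: "82 00 00 00" on a little-endian
       machine, "00 00 00 82" on a big-endian one.                        */
    for (i = 0; i < sizeof a; i++)
        printf("%02x ", p[i]);
    printf("\n");

    /* Reading through unsigned char avoids the sign extension that
       turned 0x82 into -126 in the original example.                     */
    printf("%d\n", *p);                     /* 130 on a little-endian machine */
    return 0;
}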
Related
#include <stdio.h>

int main()
{
    unsigned char a = -1;
    printf("%d", a);
    printf("%u", a);
}
When I executed the above program I got 255 255 as the answer.
We know negative numbers are stored in 2's complement.
Since it is 2's complement, the representation would be
1111 1111 --> 2's complement.
But in the above we are printing with %d (int), and an integer is four bytes.
My assumption is that even though it is a character, we are forcing the compiler to treat it as an integer,
so internally it uses the sign-extension concept:
1111 1111 1111 1111 1111 1111 1111 1111.
According to the above representation it has to be -1 in the first case, since %d is signed,
and in the second case it has to print (2^32 - 1), but it is printing 255 and 255.
Why is it printing 255 in both cases?
Tell me if my assumption is wrong and give me the real interpretation.
Your assumption is wrong; the character will "roll over" to 255, then be padded to the size of an integer. Assuming a 32-bit integer:
11111111
would be padded to:
00000000 00000000 00000000 11111111
Up to the representation of a, you are correct. However, the %d and %u conversions of the printf() function both take an int as an argument. That is, your code is the same as if you had written
int main() {
    unsigned char a = -1;
    printf("%d", (int)a);
    printf("%u", (int)a);
}
The moment you assigned -1 to a, you lost the information that it once was a signed value; the logical value of a is 255. When an unsigned char is converted to an int, the compiler preserves that logical value, so the code prints 255.
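For comparison, a short sketch (assuming an 8-bit char) showing the promoted values for both an unsigned and a signed char:

#include <stdio.h>

int main(void)
{
    unsigned char a = -1;     /* stored as 255: the value wraps modulo 256   */
    signed char   b = -1;     /* stays -1                                    */

    /* Both arguments are promoted to int before printf ever sees them.     */
    printf("%d %u\n", a, (unsigned)a);    /* 255 255                         */
    printf("%d\n", b);                    /* -1: the promotion sign-extends  */

    /* The hh length modifier (C99) tells printf the argument started
       life as a char.                                                       */
    printf("%hhu %hhd\n", a, b);          /* 255 -1                          */
    return 0;
}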
The compiler doesn't know what type the extra parameters to printf should be, since the only thing that says they should be treated as a 4-byte int is the format string, which is irrelevant at compile time.
What actually happens behind the scenes is that printf reads the raw bytes of each variadic argument and interprets them according to the type implied by the format string.
Roughly the same result as this:
char a = -1;
int * p = (int*)&a; // BAD CAST
int numberToPrint = *p; // Accesses 3 extra bytes from somewhere on the stack
Since you're likely running on a little endian CPU, the 4-byte int 0x12345678 is arranged in memory as | 0x78 | 0x56 | 0x34 | 0x12 |
If the 3 bytes on the stack following a are all 0x00 (they probably are due to stack alignment, but it's NOT GUARANTEED), the memory looks like this:
&a: | 0xFF |
(int*)&a: | 0xFF | 0x00 | 0x00 | 0x00 |
which evaluates to *(int*)&a == 0x000000FF.
unsigned char runs from 0 to 255, so the negative number -1 will print as 255, -2 as 254, and so on.
signed char runs from -128 to +127, so you would get -1 from the same printf(), which is not the case with unsigned char.
Once you assign the value to an unsigned char, widening it back to an int only zero-pads the upper bits, so your assumption of 2^32 - 1 is wrong.
The negative number is represented using 2's complement (implementation dependent).
So
1 = 0000 0001
and to get -1 we take the 2's complement:
2's complement = 1111 1111, which read as an unsigned char is 255.
It is printing 255 simply because that is the intent of ISO/IEC 9899:
H.2.2 Integer types
1 The signed C integer types int, long int, long long int, and the corresponding
unsigned types are compatible with LIA−1. If an implementation adds support for the
LIA−1 exceptional values ‘‘integer_overflow’’ and ‘‘undefined’’, then those types are
LIA−1 conformant types. C’s unsigned integer types are ‘‘modulo’’ in the LIA−1 sense
in that overflows or out-of-bounds results silently wrap. An implementation that defines
signed integer types as also being modulo need not detect integer overflow, in which case,
only integer divide-by-zero need be detected.
Given this, printing 255 is exactly what LIA-1 would expect.
Otherwise, if your implementation doesn't support C99's LIA-1 annex, the conversion to an unsigned type is still defined by the core standard to wrap modulo 2^N, so the result is the same.
Number 4 represented as a 32-bit unsigned integer would be
on a big endian machine:
00000000 00000000 00000000 00000100 (most significant byte first)
on a little endian machine:
00000100 00000000 00000000 00000000 (most significant byte last)
As a 8-bit unsigned integer it is represented as
00000100 on both machines.
Now, when casting an 8-bit uint to a 32-bit one, I always thought that on a big-endian machine it means sticking 24 zeros in front of the existing byte, and appending 24 zeros after it if the machine is little endian. However, someone pointed out that in both cases zeros are prepended rather than appended. But wouldn't that mean that on a little-endian machine 00000100 becomes the most significant byte, which would result in a very large number? Please explain where I am wrong.
Zeroes are prepended if you consider the mathematical value (which just happens to also be the big-endian representation).
Casts in C always strive to preserve the value, not the representation. That's how, for example, (int)1.25 results in 1 (see the note below), as opposed to something which makes much less sense.
As discussed in the comments, the same holds for bit-shifts (and other bitwise operations, for that matter). 50 >> 1 == 25, regardless of endianness.
(Note: the float-to-integer conversion in C truncates toward zero, regardless of the current rounding mode.)
In short: Operators in C operate on the mathematical value, regardless of representation. One exception is when you cast a pointer to the value (as in (char*)&foo), since then it is essentially a different "view" to the same data.
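A quick sketch of this value-based behaviour; the results below are the same regardless of endianness:

#include <stdio.h>

int main(void)
{
    /* Arithmetic, shifts, and casts act on values, not on the byte
       order in memory.                                                  */
    printf("%d\n", 50 >> 1);            /* 25                            */
    printf("%d\n", (int)1.25);          /* 1: truncated toward zero      */

    unsigned char small = 4;
    unsigned int  wide  = small;        /* value-preserving conversion   */
    printf("%u\n", wide);               /* 4                             */
    return 0;
}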
Not sure if it answers your question, but will give it a try:
If you take a char variable and cast it to an int variable, then you get the exact same result on both architectures:
char c = 0x12;
int i = (int)c; // i == 0x12 on both architectures
If you take an int variable and cast it to a char variable, then you get the exact same result (possibly truncated) on both architectures:
int i = 0x12345678;
char c = (char)i; // c == 0x78 on both architectures
But if you take an int variable and read it using a char* pointer, then you get a different result on each architecture:
int i = 0x12345678;
char c = *(char*)&i; // c == 0x12 on BE architecture and 0x78 on LE architecture
The example above assumes that sizeof(int) == 4 (may be different on some compilers).
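Putting those fragments into a runnable program (a sketch, assuming sizeof(int) == 4 and CHAR_BIT == 8):

#include <stdio.h>

int main(void)
{
    int i = 0x12345678;

    /* Value conversion: keeps the low-order byte on any architecture.  */
    char c1 = (char)i;
    printf("%02x\n", (unsigned char)c1);   /* 78 everywhere             */

    /* Reinterpreting memory: the result depends on the byte order.     */
    char c2 = *(char *)&i;
    printf("%02x\n", (unsigned char)c2);   /* 78 on LE, 12 on BE        */
    return 0;
}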
Loosely speaking, "endianness" is the property of how a processor stores data in memory. It means that once a particular piece of data has been brought into the CPU, every processor sees it the same way.
For example:
int a = 0x01020304;
Irrespective of whether it is a little- or big-endian machine, it will always have 04 as the least significant and 01 as the most significant byte when the value is stored in its register.
The problem arises when this variable/data has to be stored in memory, which is "byte addressable". Should 01 (Most Significant Byte) go into the lowest memory address (Big Endian) or the highest memory address (Little Endian).
In your particular example, what you have shown is the value representation, the way the processor sees it, from most significant byte to least significant byte.
So technically speaking, both little and big endian machines would have:
00000000 00000000 00000000 00000100
in its 32 bit wide register. Assuming of course what you have in memory is 32 bit wide integer representing 4. How this 4 is stored in/retrieved from memory is what endianness is all about.
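A small sketch that makes the memory layout visible (assuming a 4-byte int):

#include <stdio.h>

int main(void)
{
    int a = 0x01020304;
    unsigned char *p = (unsigned char *)&a;

    /* In a register the value is always 0x01020304; only the order of
       the bytes in memory differs between architectures.                */
    printf("bytes in memory: %02x %02x %02x %02x\n", p[0], p[1], p[2], p[3]);
    if (p[0] == 0x04)
        printf("little endian\n");      /* LSB at the lowest address */
    else
        printf("big endian\n");         /* MSB at the lowest address */
    return 0;
}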
I'm trying to convert the 8 bits of a char to the least significant bits of an int. I know that the conversion from char to int is easy doable via
int var = (int) var2;
where var2 is of type char (or even without putting (int)).
But I wonder, if I write the code above, are the remaining highest significant (32-8=) 24 bits of the int just random or are they set to 0?
Example:
Let var2 be 00001001, if I write the code above, is var then 00000000 00000000 00000000 00001001?
C11 (n1570), § 6.5.4 Cast operators
Preceding an expression by a parenthesized type name converts the value of the
expression to the named type.
Therefore, the remaining bits of var are set to 0 (assuming var2 holds a non-negative value).
By the way, the explicit cast is unnecessary. The conversion from char to int is implicit.
C11 (n1570), § 6.3.1.1 Boolean, characters, and integers
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
A C/C++ compiler can choose for the char type to be either signed or unsigned.
If your compiler defines char to be signed, the upper bits will be sign-extended when it is cast to an int. That is, they will either be all zeros or all ones depending on the value of the sign bit of var2. For example, if var2 has the value -1 (in hex, that's 0xff), var will also be -1 after the assignment (represented as 0xffffffff on a 32-bit machine).
If your compiler defines char to be unsigned, the upper bits will be all zero. For example, if var2 has the value 255 (again 0xff), the value of var will be 255 (0x000000ff).
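A short sketch of the difference (assuming CHAR_BIT == 8, a 32-bit int, and two's complement):

#include <stdio.h>

int main(void)
{
    signed char   sc = -1;     /* bit pattern 0xff                         */
    unsigned char uc = 0xff;   /* value 255                                */

    /* Sign extension: the upper bits become copies of the sign bit.       */
    printf("%08x\n", (unsigned)(int)sc);   /* ffffffff */

    /* Zero extension: the upper bits are zero.                            */
    printf("%08x\n", (unsigned)(int)uc);   /* 000000ff */
    return 0;
}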
I'm trying to convert the 8 bits of a char to the least significant bits of an int
A char isn't guaranteed to be 8 bits. It might be more. Furthermore, as others have mentioned, it could be a signed integer type. Negative char values will convert to negative int values, in the following code.
int var = (int) var2;
The sign bit is considered to be the most significant, so this code doesn't do what you want it to. Perhaps you mean to convert from char to unsigned char (to make it positive), and then to int (by implicit conversion):
int var = (unsigned char) var2;
If you foresee CHAR_BIT exceeding 8 in your use case scenarios, you might want to consider using the modulo operator to reduce it:
int var = (unsigned char) var2 % 256;
But I wonder, if I write the code above, are the remaining highest significant (32-8=) 24 bits of the int just random or are they set to 0?
Of course an assignment will assign the entire value, not just part of it.
Example: Let var2 be 00001001, if I write the code above, is var then 00000000 00000000 00000000 00001001?
Semantically, yes. The C standard only requires that an int be able to store values in the range -32767 to 32767. Your implementation may choose to represent a larger range, but that's not required. Technically, an int is at least 16 bits. Just keep that in mind.
For values of var2 that are negative, however (e.g. 10000001 in binary notation), the sign bit is extended: with a 16-bit int, var would end up as 11111111 10000001 (in binary notation).
For a (signed) char variable named c cast to an int:
if c is positive, such as 24, then c = 00011000 and after the cast the upper bits are filled with zeros:
00000000 00000000 00000000 00011000
if c is negative, such as -24, then c = 11101000 and after the cast the upper bits are filled with ones:
11111111 11111111 11111111 11101000
For the following code:
#include <stdio.h>

int main(void)
{
    int i;
    float a = 5.2;
    char *ptr;
    ptr = (char *)&a;
    for (i = 0; i <= 3; i++)
        printf("%d ", *ptr++);
    return 0;
}
I am getting the output 102 102 -90 64. Why? How does the character pointer treat the MSB of each byte?
Whether char is signed or unsigned is implementation defined. Clearly the char data type in your system is signed. So the MSB is the sign bit.
In your case it apparently treats the most significant bit as a sign bit; in other words, in your implementation char is a signed integer type (with two's complement representation, incidentally).
If you convert the 5.2 floating point value into binary format you get:
5.2 = 01000000 (=64) 10100110 (=166) 01100110 (=102) 01100110 (= 102)
If you take the 3rd byte (166) and convert it into a signed char value (within [-128, 127]) then you obtain -90.
Compile your program with -funsigned-char to obtain 102 102 166 64 as output.
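Equivalently, reading the bytes through an unsigned char pointer avoids the sign issue; a sketch, assuming a 4-byte IEEE-754 float:

#include <stdio.h>

int main(void)
{
    float a = 5.2f;                     /* 0x40a66666 as an IEEE-754 float   */
    unsigned char *ptr = (unsigned char *)&a;
    int i;

    for (i = 0; i <= 3; i++)
        printf("%d ", *ptr++);          /* 102 102 166 64 on little endian   */
    printf("\n");
    return 0;
}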
In your case, char uses a signed representation. As far as the byte values are concerned, their order depends on the endianness of the system you are working on.
#include <stdio.h>
int main(void)
{
    int i = 258;
    char ch = i;
    printf("%d", ch);
}
The output is 2!
How does the range of a variable work? What are the ranges of the different data types in the C language?
When assigning to a smaller type, the value is either
truncated (i.e. 258 % 256) if the new type is unsigned, or
modified in an implementation-defined fashion if the new type is signed.
The standard (C11 § 6.3.1.3) puts it like this:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
So all that fancy "adding or subtracting" means it is assigned as if you said:
ch = i % 256;
char is 8 bits long, while 258 requires nine bits to represent. Converting to char chops off the most significant bit of 258, which is 100000010 in binary, leaving 00000010, which is 2.
When you pass char to printf, it gets promoted to int, which is then picked up by the %d format specifier, and printed as 2.
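A sketch of the same truncation (assuming 8-bit chars):

#include <stdio.h>

int main(void)
{
    int i = 258;                 /* binary 1 0000 0010                       */

    /* Conversion to unsigned char keeps the low 8 bits, i.e. the value
       modulo 256.                                                          */
    unsigned char u = i;
    printf("%d\n", u);           /* 2 */

    /* When plain char is signed the result is implementation-defined,
       but common implementations keep the low byte as well.                */
    char c = i;
    printf("%d\n", c);           /* 2 on typical systems                     */
    return 0;
}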
#include <stdio.h>
int main(void)
{
    int i = 258;
    char ch = i;
    printf("%d", ch);
}
At the machine level, i is 00000001 00000010. ch takes 1 byte, so it keeps the last 8 bits, 00000010, which is 2.
In order to find out how long the various types are in C you should refer to limits.h (or climits in C++). char is not guaranteed to be 8 bits long. It is just:
the smallest addressable unit of the machine that can contain the basic character set. It is an integer type. The actual type can be either signed or unsigned, depending on the implementation.
Similarly vague definitions are given for the other types.
Alternatively, you can use the sizeof operator to find out the size of a type in bytes.
You may not assume exact ranges for the native C data types. The standard places only minimal restrictions, so you can only say, for example, that an unsigned short can hold at least 65536 different values; the upper limit can be higher.
Refer to Wikipedia for further reading.
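A small sketch that prints what your own implementation provides (the exact numbers will vary):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT    = %d\n", CHAR_BIT);
    printf("CHAR_MIN    = %d, CHAR_MAX = %d\n", CHAR_MIN, CHAR_MAX);
    printf("SHRT_MIN    = %d, SHRT_MAX = %d\n", SHRT_MIN, SHRT_MAX);
    printf("INT_MIN     = %d, INT_MAX  = %d\n", INT_MIN, INT_MAX);
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    return 0;
}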
char is 8 bits wide, so when you assign an integer to a char on a 32-bit machine (int is 32 bits), the variable i is:
00000000 00000000 00000001 00000010 = 258 (in binary)
When you make a char from this int, only the last 8 bits are kept (char is 8 bits), so you get:
00000010, which means 2 in decimal; this is why you see this output.
This is a narrowing conversion, not an overflow in the usual sense; the result is implementation-defined if char is signed, and well-defined ("wrap-around" modulo 256) if it is unsigned.
The binary representation of 258 is
00000000 00000000 00000001 00000010
When assigning the integer to a char, only 8 bits of data are copied to the char, i.e. the least significant byte of the value.
Here only 00000010, i.e. 0x02, is copied to the char.
Note that this conversion works on the value, not on the bytes in memory, so endianness does not change the result; the same code also prints 2 on a big-endian machine.