#include <stdio.h>
#include <conio.h>   /* non-standard header; provides getch() */
int main()
{
    int i=21;
    char *p;
    p=(char*)&i;
    printf("%d",*p);
    getch();
    return 0;
}
The printf statement gave me the right answer, but I think it shouldn't have. Since p is a character pointer, it can hold the base address of i, but an int takes up two bytes. So *p shouldn't be able to give me the integer value: p points to some address, say X, but the int is stored in two bytes, so the value needs to be collected from addresses X and X+1. Yet when I ran this code it gave me the value. Or do I have the wrong insight on this?
p=(char*)&i;
This points p to the lowest address in i. Whether that is the address of the low order byte or the high order byte depends on the endianness of your system. (It could even be an internal byte ... PDP-11's are little-endian but longs (32 bits) were stored with the high order 16-bit word first, so the byte order was 2,3,0,1.) Likely you're running on a little-endian machine (x86's are) so it points to the low order byte.
*p
Given little-endianness, this fetches the low order byte of i, which is (char)21, and then does the default conversion to an int, giving (int)21, and prints 21. If i contained a value > 255, you would get the "wrong" result. Also if it contained a value > 127 and < 256 and char is signed on your system -- it would print a negative value.
Since the result depends on the endianness of the machine and is implementation-defined and thus is not portable, you should not do this sort of thing unless your specific goal is to determine the endianness of your machine. Beginning programmers should spend a lot less time trying to understand why bad code sometimes "works" and instead learn how to write good code. A general rule (with plenty of exceptions): code with casts is bad code.
Does a cast change how the memory is accessed, or does it only tell the compiler to treat the variable as the given type?
Example:
#include <stdio.h>
int main()
{
    char a;
    a = 123456789;   /* the constant does not fit in a char */
    printf("ans is %d\n",(int)a);
    return 0;
}
Output:
overflow in implicit constant conversion a= 123456789.
ans is 21.
Here I know why it's causing overflow. But I want to know how memory is accessed when an overflow occurs.
This is kind of simple: Since char typically only holds one byte, only a single byte of 123456789 will be copied to a. Exactly how depends on if char is signed or unsigned (it's implementation-specific which one it is). For the exact details see e.g. this integer conversion reference.
What typically happens (I haven't seen any compiler do any different) is that the last byte of the value is copied, unmodified, into a.
For 123456789, if you view the hexadecimal representation of the value it will be 0x75bcd15. Here you can easily see that the last byte is 0x15 which is 21 in decimal.
What happens with the casting to int when you print the value is actually nothing that wouldn't happen anyway... When using variable-argument functions like printf values of a smaller type than int will be promoted to an int. Your printf call is exactly equal to
printf("ans is %d\n",a);
Here is something weird I found:
1. When I have a char* s of three elements and assign it "21", the printed short int value of s is 12594, which is 00110001 00110010 in binary, or 49 50 as separate chars. But according to the ASCII chart, the value of '2' is 50 and '1' is 49.
2. When I shift the value right, *(short*)s >>= 8, the result agrees with (1.): it is '1', or 49. But after I assigned *s = '1', the printed string of s is also "1", where I had expected "11".
I am kind of confused about how bits stored in a char now, hope someone can explain this.
Following is the code I use:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    printf("%zu,%zu\n",sizeof(char), sizeof(short));   /* %zu is the format for size_t */
    char* s = malloc(sizeof(char)*3);
    *s = '2', *(s+1) = '1', *(s+2) = '\0';
    printf("%s\n",s);
    printf("%d\n",*(short int*)s);
    *(short*)s >>= 8;
    printf("%s\n",s);
    printf("%d\n",*(short int*)s);
    *s = '1';
    printf("%s\n",s);
    return 0;
}
And the output is:
1,2
21
12594
1
49
1
This program is compiled on macOS with gcc.
You need some understanding of the concept of "endianness" here: values can be represented as "little endian" or "big endian".
I am going to skip the discussion of how legal this is and of the undefined behaviour involved.
(Here is however a relevant link, provided by Lundin, credits:
What is the strict aliasing rule?)
But let's look at a pair of bytes in memory, of which the lower-addressed contains a 50 and the higher-addressed contains a 49:
50 49
You introduce them exactly this way, by explicitly setting lower byte and higher byte (via char type).
Then you read them, forcing the compiler to consider it a short, which is a two byte sized type on your system.
Compilers and hardware can be created with different "opinions" on what is a good representation of two-byte values in two consecutive bytes. This is called "endianness".
Two compilers, both of which are perfectly standard-conforming, can act in either of these ways to produce the short to be returned:
take the value from the lower address, multiply it by 256, add the value from the higher address (big endian)
take the value from the higher address, multiply it by 256, add the value from the lower address (little endian)
They do not actually do so; a much more efficient mechanism is implemented in hardware. The point is that even the hardware implementation implicitly does one or the other.
You are re-interpreting representations by aliasing types in a way that is not allowed by the standard: you can process a short value as if it were a char array, but not the opposite. Doing that can cause weird errors with optimizing compilers that could assume that the value has never been initialized, or could optimize out a full branch of code that contains Undefined Behaviour.
Then the answer to your question is called endianness. In a big endian representation, the most significant byte has the lowest address (258 or 0x102 will be represented as the two bytes 0x01, 0x02 in that order), while in a little endian representation the least significant byte has the lowest address (0x102 is represented as 0x02, 0x01 in that order).
Your system happens to be a little endian one.
In my course for intro to operating systems, our task is to determine if a system is big or little endian. There's plenty of results I've found on how to do it, and I've done my best to reconstruct my own version of a code. I suspect it's not the best way of doing it, but it seems to work:
#include <stdio.h>
int main() {
    int a = 0x1234;
    unsigned char *start = (unsigned char*) &a;
    int len = sizeof( int );
    if( start[0] > start[ len - 1 ] ) {
        //biggest in front (Little Endian)
        printf("1");
    } else if( start[0] < start[ len - 1 ] ) {
        //smallest in front (Big Endian)
        printf("0");
    } else {
        //unable to determine with set value
        printf( "Please try a different integer (non-zero). " );
    }
}
I've seen this line of code (or some version of) in almost all answers I've seen:
unsigned char *start = (unsigned char*) &a;
What is happening here? I understand casting in general, but what happens if you cast an int to a char pointer? I know:
unsigned int *p = &a;
assigns the memory address of a to p, and that you can affect the value of a by dereferencing p. But I'm totally lost about what's happening with the char and, more importantly, not sure why my code works.
Thanks for helping me with my first SO post. :)
When you cast between pointers of different types, the result is generally implementation-defined (it depends on the system and the compiler). There are no guarantees that you can access the data through the resulting pointer or that it is correctly aligned, etc.
But for the special case when you cast to a pointer to character, the standard actually guarantees that you get a pointer to the lowest addressed byte of the object (C11 6.3.2.3 §7).
So the compiler will implement the code you have posted in such a way that you get a pointer to the lowest-addressed byte of the int. As we can tell from your code, that byte may contain different values depending on endianness.
If you have a 16-bit CPU, the char pointer will point at memory containing 0x12 in case of big endian, or 0x34 in case of little endian.
For a 32-bit CPU, the int would contain 0x00001234, so you would get 0x00 in case of big endian and 0x34 in case of little endian.
If you dereference an integer pointer, you will get 4 bytes of data (this depends on the compiler; assuming gcc). But if you want only one byte, cast that pointer to a character pointer and dereference it. You will get one byte of data. Casting means you are telling the compiler to read that many bytes instead of the original data type's size.
Values stored in memory are a set of '1's and '0's which by themselves do not mean anything. Datatypes are used for recognizing and interpreting what the values mean. So let's say that at a particular memory location the data stored is the following set of bits, continuing indefinitely: 01001010 ..... By itself this data is meaningless.
A pointer (other than a void pointer) contains 2 pieces of information. It contains the starting position of a set of bytes, and the way in which the set of bits are to be interpreted. For details, you can see: http://en.wikipedia.org/wiki/C_data_types and references therein.
So if you have
a char *c,
a short int *i,
and a float *f
which look at the bits mentioned above, c, i, and f are the same, but *c takes the first 8 bits and interprets them in a certain way. So you can do things like printf("The character is %c", *c). On the other hand, *i takes the first 16 bits and interprets them in a certain way. In this case, it will be meaningful to say printf("The integer is %d", *i). Again, for *f, printf("The float is %f", *f) is meaningful.
The real differences come when you do math with these. For example,
c++ advances the pointer by 1 byte,
i++ advances it by sizeof(short) bytes (typically 2),
and f++ advances it by sizeof(float) bytes (typically 4).
More importantly, for
(*c)++, (*i)++, and (*f)++ the algorithm used for doing the addition is totally different.
In your question, when you cast from one pointer type to another, you already know that the algorithm you are going to use for manipulating the bits at that location will be easier if you interpret those bits as an unsigned char rather than an unsigned int. The same operators +, -, etc. will act differently depending on what datatype they are looking at. If you have worked on Physics problems in which a coordinate transformation made the solution very simple, then this is the closest analog to that operation. You are transforming one problem into another that is easier to solve.
I have the following code:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char tmp[3]= "AB";
    short k;
    memcpy(&k,tmp,2);
    printf("%x\n", k);
    return 0;
}
In ASCII, the hex value of char 'A' is 41 and the hex value of char 'B' is 42. Why is the result of this program 4241? I think the correct result is 4142.
You are apparently running this on a "little-endian" machine, where the least significant byte comes first. See http://en.wikipedia.org/wiki/Endianness.
Your platform stores less significant bytes of a number at smaller memory addresses, and more significant bytes at higher memory addresses. Such platforms are called little-endian platforms.
However, when you print a number the more significant digits are printed first while the less significant digits are printed later (which is how our everyday numeric notation works). For this reason the result looks "reversed" compared to the way it is stored in memory on a little-endian platform.
If you compile and run the same program on a big-endian platform, the output should be 4142 (assuming a platform with 2-byte short).
P.S. One can argue that the "problem" in this case is the "weirdness" of our everyday numerical notation: we write numbers so that the significance of their digits increases from right to left. This appears to be inconsistent in the context of societies that write and read from left to right. In other words, it is not the little-endian memory that is reversed. It is the way we write numbers that is reversed.
Your system is little-endian. That means that a short (16-bit integer) is stored with the least significant byte first, followed by the most significant byte.
The same goes for larger integers. The following code would result in "44434241".
#include <stdio.h>
#include <string.h>
int main(void)
{
    char tmp[5]= "ABCD";
    int k;
    memcpy(&k,tmp,4);
    printf("%x\n", k);
    return 0;
}
I wrote a small program which reverses a string and prints it to screen:
#include <stdio.h>
#include <string.h>
void ReverseString(char *String)
{
    char *Begin = String;
    char *End = String + strlen(String) - 1;
    char TempChar = '\0';
    while (Begin < End)
    {
        TempChar = *Begin;
        *Begin = *End;
        *End = TempChar;
        Begin++;
        End--;
    }
    printf("%s",String);
}
It works perfectly in Dev C++ on Windows (little endian).
But I have a sudden doubt of its efficiency. If you look at this line:
while (Begin < End)
I am comparing the address of the beginning and end. Is this the correct way?
Does this code work on a big endian OS like Mac OS X ?
Or am I thinking the wrong way ?
I have got several doubts which I mentioned above.
Can anyone please clarify ?
Your code has no endianness-related issues. There's also nothing wrong with the way you're comparing the two pointers. In short, your code's fine.
Endianness is defined as the order of significance of the bytes in a multi-byte primitive type. So if your int is big-endian, that means the first byte (i.e. the one with the lowest address) of an int in memory contains the most significant bits of the int, and so on to the last/least significant. That's all it means. When we say a system is big-endian, that generally means that all of its pointer and arithmetic types are big-endian, although there are some odd special cases out there. Endian-ness doesn't affect pointer arithmetic or comparison, or the order in which strings are stored in memory.
Your code does not use any multi-byte primitive types[*], so endian-ness is irrelevant. In general, endian-ness only becomes relevant if you somehow access the individual bytes of such an object (for example by casting a pointer to unsigned char*, writing the memory to a file or over the network, and the like).
Supposing a caller did something like this:
int x = 0x00010203; // assuming sizeof(int) == 4 and CHAR_BIT == 8
ReverseString((char *)&x);
Then their code would be endian-dependent. On a big-endian system, they would pass you an empty string, since the first byte would be 0, so your code would leave x unchanged. On a little-endian system they would pass you a three-byte string, since the first three bytes would be 0x03, 0x02, 0x01 and the fourth byte 0, so your code would change x to 0x00030201
[*] well, the pointers are multi-byte, on OSX and on pretty much every C implementation. But you don't inspect their storage representations, you just use them as values, so there's no opportunity for behavior to differ according to endianness.
As far as I know, endianness does not affect a char * as each character is a single byte and forms an array of characters. Have a look at http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs-
The effect will be seen in multi byte data types like int.
As long as you manipulate whole type T objects (which is what you do with type T being char) you just can't run into endianness problems.
You could run into them if you for example tried to manipulate separate bytes within a larger type (an int for example) but you don't do anything like that. This is why endianness problems are impossible in your code, period.