From which endpoint and how does C read the variables?

I was messing around with pointers in C and was trying to read values from the same address using different types of pointers. First I created a double variable and assigned the number 26 to it.
double g = 26;
like so. Then I assigned g's address to a void pointer: void *vptr = &g;. After that, I tried to read the value at the address of g as a float by type-casting:
float *atr = (float*) (vptr);. When I tried to print the value of *atr it gave me 0.000000. Then I used a pointer to a character, since characters are 1 byte, and tried to see the values of those 8 bytes one by one.
char *t;
t = (char*) vptr;
for (int i = 0; i < 8; i++) {
    printf("%x t[%d]: %d\n", t + i, i, t[i]);
}
It gave me this output:
ffffcbe8 t[0]: 0
ffffcbe9 t[1]: 0
ffffcbea t[2]: 0
ffffcbeb t[3]: 0
ffffcbec t[4]: 0
ffffcbed t[5]: 0
ffffcbee t[6]: 58
ffffcbef t[7]: 64
Then I checked the binary representation of g, which is 01000000 00111010 00000000 00000000 00000000 00000000 00000000 00000000, using this website.
When I convert every byte to decimal individually, the first byte becomes 64 and the second 58.
So it was basically reversed. Then I tried to read as a float again, but this time I shifted the address:
atr = (float*) (vptr+4);. I didn't know how many bytes it would shift, but I coincidentally discovered that it shifts by one byte, just like char pointers.
This time I printed it as printf("%f\n",*atr); and now it gave me 2.906250.
When I checked its binary representation it was 01000000 00111010 00000000 00000000, which is the first half of the variable g. So I am kind of confused about how C is reading values from addresses, since it looks like C reads the values from the right end, and when I add positive numbers to addresses it shifts towards the left end. I am sorry for any spelling or grammatical mistakes.

The order in which the bytes of a scalar object are stored in C is implementation-defined, per C 2018 6.2.6.1 2. (Array elements are of course stored in ascending order by index, and members of structures are in order of declaration, possibly with padding between them.)
The behavior of using *atr after float *atr = (float*) (vptr); is not defined by the C standard, due to the aliasing rules in C 2018 6.5 7. It is defined to examine the bytes through a char lvalue, as you did with t[i], although which bytes are which is implementation-defined per above.
A proper way to reinterpret some bytes of a double as a float is to copy them byte by byte, which you can do with manual code using a char * or simply float f; memcpy(&f, &g, sizeof f);. (memcpy is specified to work as if by copying bytes, per C 2018 7.24.2.1 2.) This will of course only reinterpret the low-addressed bytes of the double as a float, which has two problems:
The low-address bytes may not be the ones that contain the most significant bytes of the double.
float and double commonly use different formats, and the difference is not simply that float has fewer bits in the significand (the fraction portion). The exponent field is also a different width and has a different encoding bias. So reinterpreting the double this way generally will not give you a float that has about the same value as a double.
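For illustration, here is a minimal sketch of the memcpy approach that demonstrates both problems; the printed values assume a little-endian machine with IEEE 754 float and double:

#include <stdio.h>
#include <string.h>

int main(void)
{
    double g = 26;
    float f;

    /* Reinterpret the low-addressed bytes of g as a float. This is defined
       behavior, but the result depends on byte order and the formats. */
    memcpy(&f, &g, sizeof f);
    printf("reinterpreted: %f\n", f);   /* 0.000000 on a little-endian machine */

    /* A value-preserving conversion is just an ordinary assignment. */
    f = (float) g;
    printf("converted:     %f\n", f);   /* 26.000000 */
}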
I didn't know how many bytes it would shift, but I coincidentally discovered that it shifts by one byte, just like char pointers.
Supporting arithmetic on void * is a GCC extension that is, as far as I know, needless. When offsets are added to or subtracted from a void *, GCC does the arithmetic as if it were a char *. This appears to be needless because one can do the desired arithmetic simply by using a char * instead of a void *, so the extension does not provide any new functionality.
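For the record, a minimal fragment (not a complete program) showing the portable equivalent of the vptr+4 arithmetic from the question:

double g = 26;
void *vptr = &g;
/* Portable equivalent of the GCC extension (vptr + 4): cast to char * first. */
float *atr = (float *) ((char *) vptr + 4);
/* Note: dereferencing atr still runs afoul of the aliasing rules described
   above; use memcpy to read the bytes instead. */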

Related

How values are changed in C unions?

#include <stdio.h>

int main()
{
    typedef union {
        int a;
        char c;
        float f;
    } myu;

    myu sam;
    sam.a = 10;
    sam.f = (float)5.99;
    sam.c = 'H';
    printf("%d\n %c\n %f\n", sam.a, sam.c, sam.f);
    return 0;
}
Output
1086303816
H
5.990025
How come the value of the integer has changed so drastically while the float is almost the same?
The fields of a union all share the same starting memory address. This means that writing to one member will overwrite the contents of another.
When you write one member and then read a different member, the representation of the written member (i.e. how it is laid out in memory) is reinterpreted as the representation of the read member. Integers and floating point types have very different representations, so it makes sense that reading a float as though it were an int can vary greatly.
Things become even more complicated if the two types are not the same size. If a smaller field is written and a larger field is read, the excess bytes might not have even been initialized.
In your example, you first write the value 10 to the int member. Then you write the value 5.99 to the float member. Assuming int and float are both 4 bytes in length, all of the bytes used by the int member are overwritten by the float member.
When you then change the char member, this only changes the first byte. Assuming a float is represented in little-endian IEEE754, this changes just the low-order byte of the mantissa, so only the digits furthest to the right are affected.
Try this: instead of using printf (which will mainly output nonsense), show the raw memory after each modification.
The code below assumes that int and float are 32 bit types and that your compiler does not add padding bytes in this union.
#include <string.h>
#include <stdio.h>
#include <assert.h>

void showmemory(void* myu)
{
    unsigned char memory[4];
    memcpy(memory, myu, 4);
    for (int i = 0; i < 4; i++)
    {
        printf("%02x ", memory[i]);
    }
    printf("\n");
}
int main()
{
    typedef union {
        int a;
        char c;
        float f;
    } myu;

    assert(sizeof(myu) == 4); // assume size of the union is 4 bytes

    myu sam;
    sam.a = 10;
    showmemory(&sam);
    sam.f = (float)5.99;
    showmemory(&sam);
    sam.c = 'H';
    showmemory(&sam);
}
Possible output on a little endian system:
0a 00 00 00 // 0a is 10 in hexadecimal
14 ae bf 40 // 5.99 in float
48 ae bf 40 // 48 is 'H'
How come the value of the integer has changed so drastically while the float is almost the same?
That is just a coincidence. Your union will be stored in 4 bytes. When you assign the field a to 10, the binary representation of the union is 0x0000000A. Then, when you assign the field f to 5.99, it becomes 0x40bfae14. Finally, when you set c to 'H' (0x48 in hex), it overwrites the first byte, which corresponds to the low-order end of the mantissa of the float value. Thus, the float part changes only slightly. For more information about floating-point encoding, you can check this handy website out.
In "traditional" C, any object that are not bitfields and do not have a register storage class will represent an association between a sequence of consecutive bytes somewhere in memory and a means of reading or writing values of the object's type. Storing a value into an object of type T will convert the value into a a pattern of sizeof (T) * CHAR_BIT bits, and store that pattern into the associated memory. Reading an object of type T will read the sizeof (T) * CHAR_BIT bits from the object's associated storage and convert that bit pattern into a value of type T, without regard for how the underlying storage came to hold that bit pattern.
A union object serves to reserve space for the largest member, and then creates an association between each member and a region of storage that begins at the start of the union object. Any write to a member of a union will affect the appropriate part of the underlying storage, and any read of a union member will interpret whatever happens to be in the underlying storage as a value of its type. This will be true whether the member is accessed directly, or via pointer or array syntax.
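For illustration, a minimal sketch of that model; the printed value assumes a typical little-endian IEEE 754 platform:

#include <stdio.h>

int main(void)
{
    union { int a; float f; } u;
    union { int a; float f; } *p = &u;

    u.f = 5.99f;          /* writes the bit pattern of 5.99f into the shared storage */
    printf("%d\n", u.a);  /* direct access: reads that pattern as an int, e.g. 1086303764 */
    printf("%d\n", p->a); /* pointer syntax: same storage, same result */
}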
The "traditional C" model is really quite simple. The C Standard, however, is much more complicated because the authors wanted to allow implementations to deviate from that behavior when doing so wouldn't interfere with whatever their customers need to do. This in turn has been interpreted by some compiler writers as an invitation to deviate from the traditional behavior without any regard for whether the traditional behavior might be useful in more circumstances than the bare minimums mandated by the Standard.

Using incorrect format specifier in printf()

I am trying to solve the following problem:
printf("%d", 1.0f); // Output is 0
So, I really do not know why this happens. The number 1.0 (32-bit, in IEEE 754) has the following binary representation:
00111111 10000000 00000000 00000000
If we interpret this bit pattern as an integer, we get:
1 065 353 216
So, sizeof(int) == sizeof(float) == 4 bytes.
I know a float number in C will be converted into a double by the compiler, but I use the f suffix for a float constant.
I tried different values and counted the binary numbers, but I do not understand it. That is insanity.
I want to see 1 065 353 216 in my console.
When you use the incorrect format specifier to printf, you invoke undefined behavior, meaning you can't accurately predict what will happen.
That being said, floating-point values are typically passed to functions in floating-point registers, while integer values are typically passed in general-purpose registers or on the stack. So the value you're seeing is whatever happened to be sitting in the location printf reads an int argument from.
As an example, if I put that line by itself in a main function, it prints a different value every time I run it.
If you want to print the representation of a float, you can use a union:
union {
    float f;
    unsigned int i;
} u;

u.f = 1.0f;
printf("%u\n", u.i);

Printing actual bit representation of integers in C

I wanted to print the actual bit representation of integers in C. These are the two approaches that I found.
First:
union int_char {
    int val;
    unsigned char c[sizeof(int)];
} data;

data.val = n1;
// printf("Integer: %p\nFirst char: %p\nLast char: %p\n", &data.f, &data.c[0], &data.c[sizeof(int)-1]);
for (int i = 0; i < sizeof(int); i++)
    printf("%.2x", data.c[i]);
printf("\n");
Second:
for(int i = 0; i < 8*sizeof(int); i++) {
int j = 8 * sizeof(int) - 1 - i;
printf("%d", (val >> j) & 1);
}
printf("\n");
With the two approaches, the outputs are 00000002 and 02000000. I also tried other numbers, and it seems that the bytes are swapped between the two. Which one is correct?
Welcome to the exotic world of endian-ness.
Because we write numbers most significant digit first, you might imagine the most significant byte is stored at the lower address.
The electrical engineers who build computers are more imaginative.
Sometimes they store the most significant byte first, but on your platform it's the least significant.
There are even platforms where it's all a bit mixed up - but you'll rarely encounter those in practice.
So we talk about big-endian and little-endian for the most part. It's a joke about Gulliver's Travels, where there's a pointless war about which end of a boiled egg to start at, which is itself a satire of some disputes in the Christian Church. But I digress.
Because your first snippet looks at the value as a series of bytes, it encounters them in endian order.
But because >> is defined as operating on the value's bits, it works 'logically', without regard to the byte order in memory.
C is right not to define the byte order: hardware not matching whatever model C chose would be burdened with the overhead of shuffling bytes around endlessly and pointlessly.
There sadly isn't a built-in identifier telling you what the model is, though code that detects it can be found; a minimal sketch follows below.
It will become relevant to you if (a) as above, you want to break integer types down into bytes and manipulate them, or (b) you receive files from other platforms containing multi-byte structures.
Unicode offers something called a BOM (Byte Order Mark) in UTF-16 and UTF-32.
In fact a good reason (among many) for using UTF-8 is that the problem goes away, because each code unit is a single byte.
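Here is the promised sketch of endianness-detection code; it inspects the first byte of an integer's representation (and assumes no exotic mixed-endian layout):

#include <stdio.h>

int main(void)
{
    unsigned int x = 1;
    /* If the first byte in memory is 1, the least significant byte
       comes first: little-endian. Otherwise: big-endian. */
    unsigned char first = *(unsigned char *)&x;
    printf("%s-endian\n", first ? "little" : "big");
}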
Footnote:
It's been pointed out quite fairly in the comments that I haven't told the whole story.
The C language specification admits more than one representation of integers, and particularly of signed integers: specifically sign-magnitude, two's complement and ones' complement.
It also permits 'padding bits' that don't represent part of the value.
So in principle along with tackling endian-ness we need to consider representation.
In principle, anyway. All modern computers use two's complement, and extant machines that use anything else are very rare. Unless you have a genuine requirement to support such platforms, I recommend assuming you're on a two's-complement system.
The correct hex representation as a string is 00000002, as if you had declared the integer with a hex representation:
int n = 0x00000002; // n=2
or as you would get when printing the integer as hex, as in:
printf("%08x", n);
But when printing the integer's bytes one after the other, you must also consider endianness, which is the byte order within multi-byte integers:
On a big-endian system (some UNIX systems use it), the 4 bytes will be ordered in memory as:
00 00 00 02
While on a little-endian system (most common desktop hardware), the bytes will be ordered in memory as:
02 00 00 00
The first prints the bytes that represent the integer in the order they appear in memory. Platforms with different endianness will print different results, as they store integers in different ways.
The second prints the bits that make up the integer value, most significant bit first. This result is independent of endianness. The result is also independent of how the >> operator is implemented for signed ints, as it does not look at the bits that may be influenced by the implementation.
The second is a better match to the question "Printing actual bit representation of integers in C", although there is a lot of ambiguity.
It depends on your definition of "correct".
The first one will print the data exactly as it's laid out in memory, so I bet that's the one you're getting the maybe unexpected 02000000 for. *) IMHO, that's the correct one. It could be done more simply by just aliasing with unsigned char * directly (char pointers are always allowed to alias any other pointers; in fact, accessing representations is a use case for char pointers mentioned in the standard):
int x = 2;
unsigned char *rep = (unsigned char *)&x;
for (int i = 0; i < sizeof x; ++i) printf("0x%hhx ", rep[i]);
The second one will print only the value bits **) and take them in order from the most significant byte to the least significant one. I wouldn't call it correct, because it also assumes that bytes have 8 bits and because the shifting used is implementation-defined for negative numbers. ***) Furthermore, just ignoring padding bits doesn't seem correct either if you really want to see the representation.
edit: As Gerhardh commented in the meantime, this second code doesn't print byte by byte but bit by bit. So, the output you claim to see isn't possible. Still, it's the same principle; it only prints value bits and starts at the most significant one.
*) You're on a "little endian" machine. On these machines, the least significant byte is stored first in memory. Read more about endianness on Wikipedia.
**) Representations of types in C may also have padding bits. Some types aren't allowed to include padding (like char), but int is allowed to have them. This second option doesn't alias to char, so the padding bits remain invisible.
***) A correct version of this code (for printing all the value bits) must a) correctly determine the number of value bits (8 * sizeof(int) is wrong because bytes (char) can have more than 8 bits; even CHAR_BIT * sizeof(int) is wrong, because this would also count padding bits if present) and b) avoid the implementation-defined shifting behavior by first converting to unsigned. It could look, for example, like this:
#include <stdio.h>

#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
        + (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))

int main(void)
{
    int x = 2;
    for (unsigned mask = 1U << (IMAX_BITS((unsigned)-1) - 1); mask; mask >>= 1)
    {
        putchar((unsigned) x & mask ? '1' : '0');
    }
    puts("");
}
See this answer for an explanation of this strange macro.

Void Pointers In C

I'm a beginner in C programming and am now learning the concepts of pointers. Here's my code:
#include <stdio.h>

int main()
{
    char t = 's';
    int a = 10;
    float s = 89;
    void *ptr;

    ptr = &s;
    printf("%c\t", *((char*)ptr));
    printf("%d\t", *((int*)ptr));
    printf("%f\t", *((float*)ptr));
    return 0;
}
My question is: when I dereference a void pointer that points to a floating-point number as a char, why is the output a blank space, and why is it 1118961664 for an integer? I wish to know what's going on at the byte level. Does it depend on the byte layout and the architecture?
Often, the float s variable is 4 bytes long and its value (89) is represented in IEEE 754 format. For simplicity, let's assume this.
The first printf (char) will use the first byte of the original float variable s (because one C char = 1 byte). Here the first byte means the byte located at the lowest memory position of the 4 bytes of float s. Yes, this is dependent on the machine's byte order (endianness). The output is blank most likely because the first byte corresponds to an ASCII control character (for example 0).
The second printf, assuming that an int is 4 bytes long on your machine/compiler, will take the same 4 bytes as the third printf but will print them as an integer (note the difference between integers and IEEE 754 floating-point numbers). It turns out that the IEEE 754 representation of 89 corresponds to the integer 1118961664. This is also dependent on byte order.
The third printf is doing the right thing: it will use the bytes where s's value (89) is stored and interpret them as a floating-point number. It should print 89.000000. This does not depend on byte order.
If the size or representation of floats were different, the details would change (how many bytes come from where, and what number is printed by the second printf), but the behavior would be similar. Note also that in principle the first two printf calls have undefined behavior.
float numbers (that is, of type float) occupy 4 bytes. In the expression *((char*)ptr) the first byte of s is interpreted as a character. In the expression *((int*)ptr) all four bytes of s are interpreted as an integer, provided that sizeof( int ) is equal to 4.
As the internal representations of float and int are different, you get different results.
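To see the bytes involved, here is a minimal sketch that prints the representation of float s = 89; via memcpy; the shown output assumes 4-byte little-endian IEEE 754 floats and 4-byte ints:

#include <stdio.h>
#include <string.h>

int main(void)
{
    float s = 89;
    unsigned char bytes[sizeof s];
    unsigned int as_int;

    /* Examine the representation byte by byte. */
    memcpy(bytes, &s, sizeof s);
    for (size_t i = 0; i < sizeof s; i++)
        printf("%02x ", bytes[i]); /* e.g. "00 00 b2 42" on little-endian */
    printf("\n");

    /* Reinterpret all four bytes as an integer. */
    memcpy(&as_int, &s, sizeof as_int);
    printf("%u\n", as_int); /* e.g. 1118961664 */
}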

A small program for understanding unions in C [duplicate]

Suppose I define a union like this:
#include <stdio.h>

int main() {
    union u {
        int i;
        float f;
    };
    union u tst;
    tst.f = 23.45;
    printf("%d\n", tst.i);
    return 0;
}
Can somebody tell me what the memory where tst is stored will look like?
I am trying to understand the output 1102813594 that this program produces.
It depends on the implementation (compiler, OS, etc.) but you can use the debugger to actually see the memory contents if you want.
For example, in my MSVC 2008:
0x00415748 9a 99 bb 41
is the memory contents. Read with the LSB on the left (Intel, a little-endian machine), this is 0x41bb999a, or indeed 1102813594.
Generally, however, the integer and float are stored in the same bytes. Depending on how you access the union, you get the integer or floating point interpretation of those bytes. The size of the memory space, again, depends on the implementation, although it's usually the largest of its constituents aligned to some fixed boundary.
Why is the value what it is in your (or my) case? You should read about floating-point number representation for that (look up IEEE 754).
The result depends on the compiler implementation, but for most x86 compilers, float and int will be the same size. Wikipedia has a pretty good diagram of the layout of a 32-bit float, http://en.wikipedia.org/wiki/Single_precision_floating-point_format, that can help to explain 1102813594.
If you print out the int as a hex value, it will be easier to figure out.
printf("%x\n", tst.i);
With a union, both variables are stored starting at the same memory location. A float is stored in IEEE format (as pointed out by others, IEEE 754): a sign bit, a biased exponent, and a normalized significand (mantissa) whose value is always between 1 and 2. It is a sign-magnitude format, not two's complement.
You are taking the 4 bytes of that representation and reading them as an int (you can look up which bits go where in the 32 bits that a float takes up). So the result basically means nothing and isn't useful as an int. That is, unless you know why you would want to do something like that; usually, a float and int combo isn't very useful.
And strictly speaking, the format is implementation-defined: the C standard does not mandate IEEE 754 (that is only required by the optional Annex F), although virtually every modern platform uses it.
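To make the layout concrete, here is a minimal sketch that splits 23.45f into the IEEE 754 fields, assuming a 32-bit float and a 32-bit unsigned int:

#include <stdio.h>

int main(void)
{
    union { float f; unsigned int i; } u = { .f = 23.45f };

    unsigned int sign     = u.i >> 31;          /* 1 sign bit */
    unsigned int exponent = (u.i >> 23) & 0xFF; /* 8 exponent bits, biased by 127 */
    unsigned int mantissa = u.i & 0x7FFFFF;     /* 23 fraction bits */

    printf("raw: 0x%08x (%u)\n", u.i, u.i); /* 0x41bb999a (1102813594) */
    printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06x\n",
           sign, exponent, (int)exponent - 127, mantissa);
}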
In a union, the members share the same memory, so we can read the float value as an integer value.
The floating-point number format is different from integer storage, so we can see the difference using the union.
For example:
If I store the integer value 12 (in 32 bits), we can read that value back in floating-point format.
The float will be stored as a sign (1 bit), exponent (8 bits) and significand precision (23 bits).
I wrote a little program that shows what happens when you preserve the bit pattern of a 32-bit float into a 32-bit integer. It gives you the exact same output you are experiencing:
#include <iostream>

int main()
{
    float f = 23.45;
    int x = *reinterpret_cast<int*>(&f);
    std::cout << x; // 1102813594
}
