Casting int pointer to char pointer causes loss of data in C? - c

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 0;
printf("n = %d\n", n);
system("PAUSE");
return 0;
}
The output put of the program is n = 256.
I may understand why it is, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.

The int 260 (= 256 * 1 + 4) will look like this in memory - note that this depends on the endianness of the machine - also, this is for a 32-bit (4 byte) int:
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).

You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.

In C a pointer references a block of bytes based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0 it only changes the first byte of the integer value, but because of how numbers are stored in memory on modern machines (effectively in reverse order from how you would write it) you are overwritting the least significant byte (which was 4) you are left w/ 256 as the value

I understood what exactly happens by changing value:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 20;
printf("pp = %d\n", (int)*pp);
printf("n = %d\n", (int)n);
system("PAUSE");
return 0;
}
The output value are
20
and
276
So basically the problem is not that you have data loss, is that the char pointer points only to the first byte of the int and so it changes only that, the other bytes are not changed and that's why those weird value (if you are on an INTEL processor the first byte is the least significant, that's why you change the "smallest" part of the number

Your problem is the assignment
*pp = 0;
You're dereferencing pp which points to n, and changing n.
However, pp is a char pointer so it doesn't change all of n
which is an int. This causes the binary complications in the other answers.

In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little endian, twos complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bits would still be implementation-dependent).

Considering 32 bit systems,
256 will be represented in like this.
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000100(Byte-0)
Now when p is typecast-ed to a char pointer, the label on the pointer changes, but the memory contents don't. It means earlier p could have access 4 bytes, as it was an integer pointer, but now it can only access 1 byte as it is a char pointer. So, only the LSB gets changes to zero, not all the 4 bytes.
And it becomes
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000000(Byte-0)
Hence, the o/p is 256.

Related

declaring string using pointer to int

I am trying to initialize a string using pointer to int
#include <stdio.h>
int main()
{
int *ptr = "AAAA";
printf("%d\n",ptr[0]);
return 0;
}
the result of this code is 1094795585
could any body explain this behavior and why the code gave this answers ?
I am trying to initialize a string using pointer to int
The string literal "AAAA" is of type char[5], that is array of five elements of type char.
When you assign:
int *ptr = "AAAA";
you actually must use explicit cast (as types don't match):
int *ptr = (int *) "AAAA";
But, still it's potentially invalid, as int and char objects may have different alignment requirements. In other words:
alignof(char) != alignof(int)
may hold. Also, in this line:
printf("%d\n", ptr[0]);
you are invoking undefined behavior (so it might print "Hello from Mars" if compiler likes so), as ptr[0] dereferences ptr, thus violating strict aliasing rule.
Note that it is valid to make transition int * ---> char * and read object as char *, but not the opposite.
the result of this code is 1094795585
The result makes sense, but for that, you need to rewrite your program in valid form. It might look as:
#include <stdio.h>
#include <string.h>
union StringInt {
char s[sizeof("AAAA")];
int n[1];
};
int main(void)
{
union StringInt si;
strcpy(si.s, "AAAA");
printf("%d\n", si.n[0]);
return 0;
}
To decipher it, you need to make some assumptions, depending on your implementation. For instance, if
int type takes four bytes (i.e. sizeof(int) == 4)
CPU has little-endian byte ordering (though it's not really matter, since every letter is the same)
default character set is ASCII (the letter 'A' is represented as 0x41, that is 65 in decimal)
implementation uses two's complement representation of signed integers
then, you may deduce, that si.n[0] holds in memory:
0x41 0x41 0x41 0x41
that is in binary:
01000001 ...
The sign (most-significant) bit is unset, hence it is just equal to:
65 * 2^24 + 65 * 2^16 + 65 * 2^8 + 65 =
65 * (2^24 + 2^16 + 2^8 + 1) = 65 * 16843009 = 1094795585
1094795585 is correct.
'A' has the ASCII value 65, i.e. 0x41 in hexadecimal.
Four of them makes 0x41414141 which is equal to 1094795585 in decimal.
You got the value 65656565 by doing 65*100^0 + 65*100^1 + 65*100^2 + 65*100^3 but that's wrong since a byte1 can contain 256 different values, not 100.
So the correct calculation would be 65*256^0 + 65*256^1 + 65*256^2 + 65*256^3, which gives 1094795585.
It's easier to think of memory in hexadecimal because one hexadecimal digit directly corresponds to half a byte1, so two hex digits is one full byte1 (cf. 0x41). Whereas in decimal, 255 fits in a single byte1, but 256 does not.
1 assuming CHAR_BIT == 8
65656565 this is a wrong representation of the value of "AAAA" you are seprately representing each character and "AAAA" is stored as array.Its converting into 1094795585 because %d identifier prints decimal value. Run this in gdb with following command:
x/8xb (pointer) //this will show you the memory hex value
x/d (pointer) //this will show you the converted decimal value
#zenith gave you the answer you expected, but your code invokes UB. Anyway, you could demonstrate the same in an almost correct way :
#include <stdio.h>
int main()
{
int i, val;
char *pt = (char *) &val; // cast a pointer to any to a pointer to char : valid
for (i=0; i<sizeof(int); i++) pt[i] = 'A'; // assigning bytes of int : UB in general case
printf("%d 0x%x\n",val, val);
return 0;
}
Assigning bytes of an int is UB in the general case because C standard says that [for] signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. And a remark adds Some combinations of padding bits might generate trap representations, for example, if one padding
bit is a parity bit.
But in common architectures, there are no padding bits and all bits values correspond to valid numbers, so the operation is valid (but implementation dependant) on all common systems. It is still implementation dependant because size of int is not fixed by standard, nor is endianness.
So : on a 32 bit system using no padding bits, above code will produce
1094795585 0x41414141
indepentantly of endianness.

Use struct pointer to assign array

I have the following code
typedef struct
{
int a;
int b;
}DATA;
int _tmain(int argc, _TCHAR* argv[])
{
DATA *N = NULL;
unsigned char buff[65536];
N = (DATA*)&buff;
N->a = 1000;
N->b = 50000;
for(int i =0; i < 8; i ++)
printf("buff[%d]: %d\n", i, buff[i]);
return 0;
}
Which renders the following output:
buff[0]: 232
buff[1]: 3
buff[2]: 0
buff[3]: 0
buff[4]: 80
buff[5]: 195
buff[6]: 0
buff[7]: 0
Can anybody tell me in which way the buff Array got assigned ?
You can think of pointers as type-size specifiers. A pointer is roughly an address associated with a size to consider when dereferencing. When declaring int c = 2; int *p = &2; you are associating the size of an int (say 4bytes) with an address the compiler will define. When you actually dereference p, int x = *p, the compiler knows you are trying to access 4bytes at address p.
Now, if you succeed in thinking of it this way, why your buffer gets filled this way will be as clear as ever. Your unsigned char array is a buffer of 65536 bytes. When you cast the address 'buff' into a (DATA *), you are specifying a new size (the size of type DATA) for dereferencing. Therefore, 1000 will be written on the 4bytes of N->a and 50000 on the 4bytes of N->b.
Now, 1000 in decimal is 3E8 in hex; since 4bytes need to be written, it will be sign extended and will be written bytes 00, 00, 03, E8, which in decimal is, as you would expect, bytes 0, 0, 3, 232. Now you might think they were written in reverse order, but this is because of endianess. Endianess is the way the processor actually writes bytes into memory, and in your case, your processor writes and reads bytes in reverse order, hence they are put in memory in this order. Same goes for N->b, as 50000 in decimal equals to the decimal bytes, 0, 0, 195, 80.
You wrote two 4-byte integers into the buffer in little-endian format. The program is behaving exactly as expected.

Data stored with pointers

void *memory;
unsigned int b=65535; //1111 1111 1111 1111 in binary
int i=0;
memory= &b;
for(i=0;i<100;i++){
printf("%d, %d, d\n", (char*)memory+i, *((unsigned int * )((char *) memory + i)));
}
I am trying to understand one thing.
(char*)memory+i - print out adress in range 2686636 - 2686735.
and when i store 65535 with memory= &b this should store this number at adress 2686636 and 2686637
because every adress is just one byte so 8 binary characters so when i print it out
*((unsigned int * )((char *) memory + i)) this should print 2686636, 255 and 2686637, 255
instead of it it prints 2686636, 65535 and 2686637, random number
I am trying to implement memory allocation. It is school project. This should represent memory. One adress should be one byte so header will be 2686636-2586639 (4 bytes for size of block) and 2586640 (1 byte char for free or used memory flag). Can someone explain it to me thanks.
Thanks for answers.
void *memory;
void *abc;
abc=memory;
for(i=0;i<100;i++){
*(int*)abc=0;
abc++;
}
*(int*)memory=16777215;
for(i=0;i<100;i++){
printf("%p, %c, %d\n", (char*)memory+i, *((char *)memory +i), *((char *)memory +i));
}
output is
0028FF94,  , -1
0028FF95,  , -1
0028FF96,  , -1
0028FF97, , 0
0028FF98, , 0
0028FF99, , 0
0028FF9A, , 0
0028FF9B, , 0
i think it works. 255 only one -1, 65535 2 times -1 and 16777215 3 times -1.
In your program it seems that address of b is 2686636 and when you will write (char*)memory+i or (char*)&b+i it means this pointer is pointing to char so when you add one to it will jump to only one memory address i.e2686637 and so on till 2686735(i.e.(char*)2686636+99).
now when you are dereferencing i.e.*((unsigned int * )((char *) memory + i))) you are going to get the value at that memory address but you have given value to b only (whose address is 2686636).all other memory address have garbage values which you are printing.
so first you have to store some data at the rest of the addresses(2686637 to 2686735)
good luck..
i hope this will help
I did not mention this in my comments yesterday but it is obvious that your for loop from 0 to 100 overruns the size of an unsigned integer.
I simply ignored some of the obvious issues in the code and tried to give hints on the actual question you asked (difficult to do more than that on a handy :-)). Unfortunately I did not have time to complete this yesterday. So, with one day delay my hints for you.
Try to avoid making assumptions about how big a certain type is (like 2 bytes or 4 bytes). Even if your assumption holds true now, it might change if you switch the compiler or switch to another platform. So use sizeof(type) consequently throughout the code. For a longer discussion on this you might want to take a look at: size of int, long a.s.o. on Stack Overflow. The standard mandates only the ranges a certain type should be able to hold (0-65535 for unsigned int) so a minimal size for types only. This means that the size of int might (and tipically is) bigger than 2 bytes. Beyond primitive types sizeof helps you also with computing the size of structures where due to memory alignment && packing the size of a structure might be different from what you would "expect" by simply looking at its attributes. So the sizeof operator is your friend.
Make sure you use the correct formatting in printf.
Be carefull with pointer arithmetic and casting since the result depends on the type of the pointer (and obviously on the value of the integer you add with).
I.e.
(unsigned int*)memory + 1 != (unsigned char*)memory + 1
(unsigned int*)memory + 1 == (unsigned char*)memory + 1 * sizeof(unsigned int)
Below is how I would write the code:
//check how big is int on our platform for illustrative purposes
printf("Sizeof int: %d bytes\n", sizeof(unsigned int));
//we initialize b with maximum representable value for unsigned int
//include <limits.h> for UINT_MAX
unsigned int b = UINT_MAX; //0xffffffff (if sizeof(unsigned int) is 4)
//we print out the value and its hexadecimal representation
printf("B=%u 0x%X\n", b, b);
//we take the address of b and store it in a void pointer
void* memory= &b;
int i = 0;
//we loop the unsigned chars starting at the address of b up to the sizeof(b)
//(in our case b is unsigned int) using sizeof(b) is better since if we change the type of b
//we do not have to remember to change the sizeof in the for loop. The loop works just the same
for(i=0; i<sizeof(b); ++i)
{
//here we kept %d for formating the individual bytes to represent their value as numbers
//we cast to unsigned char since char might be signed (so from -128 to 127) on a particular
//platform and we want to illustrate that the expected (all bytes 1 -> printed value 255) occurs.
printf("%p, %d\n", (unsigned char *)memory + i, *((unsigned char *) memory + i));
}
I hope you will find this helpfull. And good luck with your school assignment, I hope you learned something you can use now and in the future :-).

how to create pointer to a bit in c-language

As we know a in c-language char pointer traverse memory byte by byte i.e. 1 byte each time,
and integer pointer 4 byte each time(in gcc compiler), 2 byte each time(in TC compiler).
for example:
char *cptr; // if this points to 0x100
cptr++; // now it points to 0x101
int *iptr; // if this points to 0x100
iptr++; // now it points to 0x104
My question is:
How to create a bit pointer in c which on incrementing traverse memory bit by bit?
The char is the 'smallest addressable unit' in C. You can't point directly at something smaller than that (such as a bit).
You can't. Using pointers, it's not possible to manipulate bits directly. (Do you really expect poor hypothetical bit *p = 1; p++ to return 1.125?)
However, you can use bitwise operators, such as <<, >>, | and & to access a specific bit within a byte.
Conceptually, a "bit pointer" is not a single scalar, but an ordered pair consisting of a byte pointer and a bit index within that byte. You can represent this with a structure containing both, or with two separate objects. Performing arithmetic on them requires some modular reduction on your part; for example, if you want to access the bit 10 bits past a given bit, you have to add 10 to the bit index, then reduce it modulo 8, and increment the byte pointer part appropriately.
Incidentally, on historical systems that only had word-addressable memory, not byte-addressable, char * consisted of a word pointer and a byte index within the word. This is the exact same concept. The difference is that, while C provides char * even on machines without byte-addressable memory, it does not provide any built-in "bit pointer" type. You have to create it yourself if you want it.
No, but you can write a function to read the bits one by one:
int readBit(char *byteData, int bitOffset)
{
const int wholeBytes = bitOffset / 8;
const int remainingBits = bitOffset % 8;
return (byteData[wholeBytes] >> remainingBits) & 1;
//or if you want most significant bit to be 0
//return (byteData[wholeBytes] >> (7-remainingBits)) & 1;
}
Usage:
char *data = any memory you like.
int bitPointer=0;
int bit0 = readBit(data, bitPointer);
bitPointer++;
int bit1 = readBit(data, bitPointer);
bitPointer++;
int bit2 = readBit(data, bitPointer);
Of course if this kind of function had general value it would probably already exist. Operating bit-by-bit is just so inefficient compared to using bit masks, and shifts etc.
I don't think that is possible since modern computers are byte addressable which means that there is one address for each byte. So a bit has no address and as such a pointer cant point to it. You could use a char * and bitwise operations to determine the value of individual bits.
If you really want it you could write a class that uses a char* to keep track of the address in memory, a char(or short/int however the value would never need to be higher than 0000 0111 so a char would reduce the memory footprint) to keep track of which bit in that byte you are at and then overload the operators so that it functions as you want it to.
I am not sure what you are asking is possible. You need to do some magic with bit shifting to traverse through all the bits of a byte pointed by the pointer.
You could always cast your pointer to integer, that is at least 3 bits bigger in size than byte pointer used at the system. Then just shift the pointer after the cast left by 3 bits. Then store the bit information on the least significant 3 bits.
This integer "bitpointer" can then be incremented with normal arithmetic.
Something like this:
#include <stdio.h>
#define bitptr long long
#define create_bitptr(pointer,bit) ((((bitptr)pointer)<<3)|bit) ;
#define get_bit(bptr) ((bptr)&7)
#define get_value(bptr) (*((char*)((bptr)>>3)))
#define set_bit(bptr) get_value(bptr) |= 1<<get_bit(bptr)
#define clear_bit(bptr) get_value(bptr) &= (~(1<<get_bit(bptr)))
int main(void)
{
char variable=0;
bitptr p ;
p=create_bitptr(&variable,0) ;
set_bit(p) ; p++ ; //1
clear_bit(p) ; p++ ; //0
set_bit(p) ; p++ ; //1
clear_bit(p) ; p++ ; //0
clear_bit(p) ; p++ ; //0
clear_bit(p) ; p++ ; //0
clear_bit(p) ; p++ ; //0
clear_bit(p) ; p++ ; //0
printf("%d\n",variable) ;
return 0;
}
With pointers it does not look like possible.But to write or read any bit of the data you can try this one.
unsigned char data;
struct _p
{
unsigned char B0:1;
unsigned char B1:1;
unsigned char B2:1;
unsigned char B3:1;
unsigned char B4:1;
unsigned char B5:1;
unsigned char B6:1;
unsigned char B7:1;
}
int main()
{
data = 15;
_p * point = ( _p * ) & data;
//you can read and write any bit of the byte with point->BX; ( Ex: printf( "%d" , point->B0;point->B5 = 1;
}

Storing an unsigned integer into a struct

I've written this piece of code where I've assigned an unsigned integer to two different structs. In fact they're the same but one of them has the __attribute__((packed)).
#include
#include
struct st1{
unsigned char opcode[3];
unsigned int target;
}__attribute__((packed));
struct st2{
unsigned char opcode[3];
unsigned int target;
};
void proc(void* addr) {
struct st1* varst1 = (struct st1*)addr;
struct st2* varst2 = (struct st2*)addr;
printf("opcode in varst1: %c,%c, %c\n",varst1->opcode[0],varst1->opcode[1],varst1->opcode[2]);
printf("opcode in varst2: %c,%c,%c\n",varst2->opcode[0],varst2->opcode[1],varst2->opcode[2]);
printf("target in varst1: %d\n",varst1->target);
printf("target in varst2: %d\n",varst2->target);
};
int main(int argc,char* argv[]) {
unsigned int* var;
var =(unsigned int*) malloc(sizeof(unsigned int));
*var = 0x11334433;
proc((void*)var);
return 0;
}
The output is:
opcode in varst1: 3,D,3
opcode in varst2: 3,D,3
target in varst1: 17
target in varst2: 0
Given that I'm storing this number
0x11334433 == 00010001001100110100010000110011
I'd like to know why that is the output I get.
This is to do with data alignment. Most compilers will align data on address boundaries that help with general performance. So, in the first case, the struct with the packed attribute, there is an extra byte between the char [3] and the int to align the int on a four byte boundary. In the packed version that padding byte is missing.
byte : 0 1 2 3 4 5 6 7
st1 : opcode[0] opcode[1] opcode[2] padding |----int------|
st2 : opcode[0] opcode[1] opcode[2] |-------int--------|
You allocate an unsigned int and pass that to the function:
byte : 0 1 2 3 4 5 6 7
alloc : |-----------int------------------| |---unallocated---|
st1 : opcode[0] opcode[1] opcode[2] padding |----int------|
st2 : opcode[0] opcode[1] opcode[2] |-------int--------|
If you're using a little endian system then the lowest eight bits (right most) are stored at byte 0 (0x33), byte 1 has 0x44, byte 2 has 0x33 and byte 4 has 0x11. In the st1 structure the int value is mapped to memory beyond the end of the allocated amount and the st2 version the lowest byte of the int is mapped to the byte 4, 0x11. So st1 produces 0 and st2 produces 0x11.
You are lucky that the unallocated memory is zero and that you have no memory range checking going on. Writing to the ints in st1 and st2 in this case could corrupt memory at worst, generate memory guard errors or do nothing. It is undefined and dependant on the runtime implementation of the memory manager.
In general, avoid void *.
Your bytes look like this:
00010001 00110011 01000100 00110011
Though obviously your endianness is wrong and in fact they're like this:
00110011 01000100 00110011 00010001
If your struct is packed then the first three bytes are associated with opcode, and the 4th is target - thats why the packed array has atarget of 17 - 0001001 in binary.
The unpacked array is padded with zeros, which is why target in varst2 is zero.
%c interprets the argument as the ascii code of a character and prints the character
3's ascii code is 0x33
D's ascii code is 0x44
17 is 0x11
an int is stored little endian or big endian depending on the processor architecture -- you can't depend on it going into your struct's fields in order.
The int target in the unpacked version is past the position of the int, so it stays 0.

Resources