How to create a pointer to a bit in C

As we know, in C a char pointer traverses memory byte by byte, i.e. 1 byte per increment,
while an integer pointer advances sizeof(int) bytes per increment: 4 bytes with gcc, 2 bytes with the old Turbo C compiler.
for example:
char *cptr; // if this points to 0x100
cptr++; // now it points to 0x101
int *iptr; // if this points to 0x100
iptr++; // now it points to 0x104
My question is:
How to create a bit pointer in c which on incrementing traverse memory bit by bit?

The char is the 'smallest addressable unit' in C. You can't point directly at something smaller than that (such as a bit).

You can't. Using pointers, it's not possible to manipulate bits directly. (Do you really expect poor hypothetical bit *p = 1; p++ to return 1.125?)
However, you can use bitwise operators, such as <<, >>, | and & to access a specific bit within a byte.
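For instance, a quick sketch (variable names are mine, not from the answer) of testing, setting, and clearing bit n of a byte:

unsigned char byte = 0x5A;             /* 0101 1010 */
int n = 3;
int bit = (byte >> n) & 1;             /* test bit n (bit 0 = LSB): here 1 */
byte |= (unsigned char)(1u << n);      /* set bit n */
byte &= (unsigned char)~(1u << n);     /* clear bit n */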

Conceptually, a "bit pointer" is not a single scalar, but an ordered pair consisting of a byte pointer and a bit index within that byte. You can represent this with a structure containing both, or with two separate objects. Performing arithmetic on them requires some modular reduction on your part; for example, if you want to access the bit 10 bits past a given bit, you have to add 10 to the bit index, then reduce it modulo 8, and increment the byte pointer part appropriately.
Incidentally, on historical systems that only had word-addressable memory, not byte-addressable, char * consisted of a word pointer and a byte index within the word. This is the exact same concept. The difference is that, while C provides char * even on machines without byte-addressable memory, it does not provide any built-in "bit pointer" type. You have to create it yourself if you want it.
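A minimal sketch of such a self-made bit pointer (the struct and function names are my own invention, nothing standard):

struct bitptr {
    unsigned char *byte;  /* which byte we are in */
    unsigned bit;         /* 0..7: which bit within that byte */
};

/* advance a bit pointer by n bits, reducing the bit index modulo 8 */
struct bitptr bitptr_add(struct bitptr p, unsigned n)
{
    n += p.bit;
    p.byte += n / 8;  /* whole bytes to skip */
    p.bit = n % 8;    /* remaining bit offset */
    return p;
}

int bitptr_read(struct bitptr p)
{
    return (*p.byte >> p.bit) & 1;
}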

No, but you can write a function to read the bits one by one:
int readBit(char *byteData, int bitOffset)
{
const int wholeBytes = bitOffset / 8;
const int remainingBits = bitOffset % 8;
return (byteData[wholeBytes] >> remainingBits) & 1;
//or if you want most significant bit to be 0
//return (byteData[wholeBytes] >> (7-remainingBits)) & 1;
}
Usage:
char *data = ...; /* any memory you like */
int bitPointer=0;
int bit0 = readBit(data, bitPointer);
bitPointer++;
int bit1 = readBit(data, bitPointer);
bitPointer++;
int bit2 = readBit(data, bitPointer);
Of course, if this kind of function had general value it would probably already exist. Operating bit-by-bit is simply inefficient compared to using bit masks, shifts, etc.
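For symmetry, a matching write function under the same LSB-first convention (my addition, not part of the original answer):

void writeBit(char *byteData, int bitOffset, int value)
{
    const int wholeBytes = bitOffset / 8;
    const int remainingBits = bitOffset % 8;
    if (value)
        byteData[wholeBytes] |= (char)(1 << remainingBits);   /* set the bit */
    else
        byteData[wholeBytes] &= (char)~(1 << remainingBits);  /* clear the bit */
}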

I don't think that is possible, since modern computers are byte-addressable, which means there is one address for each byte. So a bit has no address, and as such a pointer can't point to it. You could use a char * and bitwise operations to determine the value of individual bits.
If you really want this, you could write a class (in C++) that uses a char * to keep track of the byte address in memory, plus a small integer to track which bit within that byte you are at (a char is enough, since the bit index never exceeds 7, so it keeps the memory footprint down), and then overload the operators so that it behaves like a pointer.

I am not sure that what you are asking for is possible. You need to do some bit-shifting magic to traverse all the bits of a byte pointed to by the pointer.

You could always cast your pointer to an integer type that is at least 3 bits wider than a byte pointer on your system. Then shift the cast pointer left by 3 bits and store the bit index in the 3 least significant bits.
This integer "bit pointer" can then be incremented with normal arithmetic.
Something like this:
#include <stdio.h>
#include <stdint.h>

typedef uintptr_t bitptr; /* unsigned and wide enough to hold a shifted pointer */
#define create_bitptr(pointer,bit) ((((bitptr)(pointer))<<3)|(bit))
#define get_bit(bptr)   ((bptr)&7)
#define get_value(bptr) (*((char*)((bptr)>>3)))
#define set_bit(bptr)   (get_value(bptr) |=  (char)(1<<get_bit(bptr)))
#define clear_bit(bptr) (get_value(bptr) &= (char)~(1<<get_bit(bptr)))

int main(void)
{
    char variable = 0;
    bitptr p;

    p = create_bitptr(&variable, 0);
    set_bit(p);   p++; //1
    clear_bit(p); p++; //0
    set_bit(p);   p++; //1
    clear_bit(p); p++; //0
    clear_bit(p); p++; //0
    clear_bit(p); p++; //0
    clear_bit(p); p++; //0
    clear_bit(p); p++; //0

    printf("%d\n", variable); //prints 5 (binary 00000101, bit 0 written first)
    return 0;
}

With pointers it does not look possible, but to write or read any bit of a byte you can try this one:
#include <stdio.h>

unsigned char data;

struct _p
{
    unsigned char B0:1;
    unsigned char B1:1;
    unsigned char B2:1;
    unsigned char B3:1;
    unsigned char B4:1;
    unsigned char B5:1;
    unsigned char B6:1;
    unsigned char B7:1;
};

int main(void)
{
    data = 15;
    struct _p *point = (struct _p *)&data;
    /* you can read and write any bit of the byte with point->BX
       (which bit B0 maps to is implementation-defined), e.g.: */
    printf("%d\n", point->B0);
    point->B5 = 1;
    return 0;
}

Related

Why does a reference to an int return only one memory address?

Example Program:
#include <stdio.h>
int main() {
int x = 0;
printf("%p", &x);
return 0;
}
I have read that most machines are byte-addressable, meaning that each memory address stores a single
byte (e.g. the address 0xf4829cba stores the value 01101011). Assuming that x is a 32-bit integer, shouldn't taking its address return four memory addresses, instead of one?
Please ELI5, as I am very confused right now.
Thank you so much for your time.
-Matt
The address (it's not a "reference") you're given is to the beginning of the memory where the variable is stored. The variable will then take as many bytes as needed according to its type. So if int is 32 bits in your target architecture, the address you get is of the first of four bytes used to store that int.
            +--------+
address --> | byte 0 |
            | byte 1 |
            | byte 2 |
            | byte 3 |
            +--------+
It may help to think in terms of objects [1] rather than bytes. Most useful data types in C take up more than a single byte.
As for an expression like &x evaluating to multiple addresses, think of it like the address of your house - you don't specify a distinct address for every room in the house, do you? No, for the purpose of telling other people where your house is, you only need to specify one address. For the purpose of knowing where an int or double or struct humongous object is, we only need to know the address of the first byte.
You can access and manipulate individual bytes in a larger object in several different ways. You can use bit masking operations like
int x = some_value;
unsigned char aByte = (x & 0xFF000000) >> 24; // isolate the MSB
or you can map the object onto an array of unsigned char using a union:
union {
int x;
unsigned char b[sizeof (int)];
} u;
u.x = some_value;
aByte = u.b[0]; // access the initial byte - depending on byte ordering, this
// may be the MSB or the LSB.
or by creating a pointer to the first byte:
int x = some_value;
unsigned char *b = (unsigned char *) &x;
unsigned char aByte = b[0];
Byte ordering is a thing - some architectures store multi-byte values starting at the most significant byte, others starting at the least significant byte. For any address A:

                 A+0 A+1 A+2 A+3   (big endian)
                +---+---+---+---+
                |MSB|   |   |LSB|
                +---+---+---+---+
                 A+3 A+2 A+1 A+0   (little endian)
The M68K chips that powered the original Macintosh were big-endian, while x86 is little-endian.
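A quick runtime check (a minimal sketch of mine, not from the answer) makes the ordering visible:

#include <stdio.h>

int main(void)
{
    unsigned int x = 1;
    unsigned char *b = (unsigned char *)&x;
    /* on little endian, the LSB is stored first, so b[0] == 1 */
    printf("%s endian\n", b[0] == 1 ? "little" : "big");
    return 0;
}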
Bitwise operators like & and | take byte ordering into account - x & 0xFF000000 will always isolate the MSB [2]. When you map an object onto an array of unsigned char, the first element may map to the MSB, or it may map to the LSB, or it may map to something else (the old VAX architecture used a "middle-endian" ordering for 32-bit floats that either went 2301 or 1032, can't remember which offhand).
[1] In the C sense of a region of storage that may be used to hold a value, not the OOP sense of an instance of a class.
[2] Assuming 32-bit int and 8-bit bytes, anyway.

How can I split integers into bytes without using arithmetic in C?

I am implementing the four basic arithmetic functions (add, sub, division, multiplication) in C.
The basic structure of these functions, as I imagined it, is:
the program gets two operands from the user using scanf,
then splits these values into bytes and computes!
I've completed addition and subtraction,
but I forgot that I shouldn't use arithmetic operators,
so when splitting an integer into single bytes,
I wrote code like
while(quotient!=0){
bin[i]=quotient%2;
quotient=quotient/2;
i++;
}
but since those are arithmetic operators that I shouldn't use,
I have to rewrite that splitting part,
and I really have no idea how to split an integer into single bytes without using
% or /.
To access the bytes of a variable, type punning can be used.
According to the C Standard (C99 and C11), only unsigned char is guaranteed to make this operation safe.
This could be done in the following way:
typedef unsigned int myint_t;
myint_t x = 1234;
union {
myint_t val;
unsigned char byte[sizeof(myint_t)];
} u;
Now, you can of course access to the bytes of x in this way:
u.val = x;
for (int j = 0; j < sizeof(myint_t); j++)
printf("%d ",u.byte[j]);
However, as WhozCrag has pointed out, there are issues with endianness.
It cannot be assumed that the bytes are in any particular order.
So, before doing any computation with bytes, your program needs to check how the endianness works.
#include <limits.h> /* To use UCHAR_MAX */
unsigned long int ByteFactor = 1u + UCHAR_MAX; /* 256 almost everywhere */
u.val = 0;
for (int j = sizeof(myint_t) - 1; j >= 0 ; j--)
u.val = u.val * ByteFactor + j;
Now, when you print the values of u.byte[], you will see the order in which the bytes are arranged for the type myint_t.
The least significant byte will have the value 0.
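Assembled into one runnable program (my consolidation of the fragments above, under the same assumptions):

#include <stdio.h>
#include <limits.h>  /* UCHAR_MAX */

typedef unsigned int myint_t;

int main(void)
{
    union {
        myint_t val;
        unsigned char byte[sizeof(myint_t)];
    } u;

    unsigned long ByteFactor = 1u + UCHAR_MAX;  /* 256 almost everywhere */

    /* give each byte a value equal to its significance rank: the LSB gets 0 */
    u.val = 0;
    for (int j = sizeof(myint_t) - 1; j >= 0; j--)
        u.val = u.val * (myint_t)ByteFactor + (myint_t)j;

    /* printing in memory order reveals the endianness of myint_t */
    for (size_t j = 0; j < sizeof(myint_t); j++)
        printf("%d ", u.byte[j]);
    printf("\n");
    return 0;
}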
I assume 32-bit integers (if that is not the case, just change the sizes). There are several approaches:
BYTE pointer
#include <stdio.h>

typedef unsigned char BYTE;

int main(void)
{
    int x;                 // your integer or whatever else data type
    BYTE *p = (BYTE*)&x;

    x = 0x11223344;
    printf("%x\n", p[0]);
    printf("%x\n", p[1]);
    printf("%x\n", p[2]);
    printf("%x\n", p[3]);
    return 0;
}

Just take the address of your data as a BYTE pointer
and access the bytes directly via a 1D array.
union
#include <stdio.h>

typedef unsigned char BYTE;

union
{
    int x;      // your integer or whatever else data type
    BYTE p[4];
} a;

int main(void)
{
    a.x = 0x11223344;
    printf("%x\n", a.p[0]);
    printf("%x\n", a.p[1]);
    printf("%x\n", a.p[2]);
    printf("%x\n", a.p[3]);
    return 0;
}

and access the bytes directly via a 1D array.
[notes]
if you do not have BYTE defined then typedef it as unsigned char (as done above)
with the ALU you can use not only % and / but also >> and &, which is far faster, though it still counts as arithmetic
depending on the platform endianness the output can be 11,22,33,44 or 44,33,22,11, so you need to take that into account (especially for code used on multiple platforms)
you need to handle the sign of the number; for unsigned integers there is no problem,
but for signed ones C uses two's complement, so it is better to separate the sign before splitting, like:
int s;
if (x<0) { s=-1; x=-x; } else s=+1;
// now split ...
[edit2] logical/bit operations
x<<n, x>>n - shift x left or right by n bits
x&y - bitwise AND (performs logical AND on each pair of bits separately)
so when you have, for example, a 32-bit unsigned int (called DWORD here) you can split it into BYTEs like this:
typedef unsigned int DWORD;  // assuming a 32-bit unsigned int

DWORD x;           // input 32 bit unsigned int
BYTE a0,a1,a2,a3;  // output BYTEs: a0 is the least significant, a3 the most significant
x=0x11223344;
a0=(BYTE)((x    )&255); // should be 0x44
a1=(BYTE)((x>> 8)&255); // should be 0x33
a2=(BYTE)((x>>16)&255); // should be 0x22
a3=(BYTE)((x>>24)&255); // should be 0x11
this approach is not affected by endianness
but it uses the ALU
the point is to shift the bits you want into positions 0..7 and mask out the rest
the &255 and the (BYTE) cast are not needed on all compilers, but some do weird things without them, especially with signed variables like char or int
x>>n is the same as x/pow(2,n) = x/(1<<n)
x&((1<<n)-1) is the same as x%pow(2,n) = x%(1<<n)
so (x>>8) == x/256 and (x&255) == x%256
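A tiny check of those identities (my own sketch):

#include <stdio.h>
#include <assert.h>

int main(void)
{
    unsigned x = 0x11223344;
    assert((x >> 8) == x / 256);   /* shift right by 8 == divide by 2^8 */
    assert((x & 255) == x % 256);  /* mask low 8 bits == modulo 2^8 */
    printf("ok\n");
    return 0;
}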

Copying a 4 element character array into an integer in C

A char is 1 byte and an integer is 4 bytes. I want to copy byte-by-byte from a char[4] into an integer. I thought of different methods but I'm getting different answers.
char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);
Output is
6513249 1633837824 6513249
Which one is correct? What is going wrong?
It's an endianness issue. When you interpret the char* as an int*, the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86, which is little endian), while with the manual conversion the first byte becomes the most significant.
To put this into pictures, this is the source array:
a b c \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 | <---- bytes in memory
+------+------+------+------+
When these bytes are interpreted as an integer in a little endian architecture the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300 -- decimal 1633837824.
Of course treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion. There is however a way to achieve the same result, which is called type punning:
union {
char str[4];
unsigned int ui;
} u;
strcpy(u.str, "abc");
printf("%u\n", u.ui);
Neither of the first two is correct.
The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:
unsigned int a; memcpy(&a, &str, sizeof a);
(Presuming the size of an unsigned int and the size of str are the same.)
The second may fail with integer overflow because str[0] is promoted to an int, so str[0]<<24 has type int, but the value required by the shift may be larger than is representable in an int. To remedy this, use:
unsigned int b = (unsigned int) str[0] << 24 | …;
This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
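Combining both corrected approaches into one runnable program (my consolidation of the above, assuming a 4-byte unsigned int):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[4] = "abc";

    /* host byte order: copy the object representation */
    unsigned int a;
    memcpy(&a, str, sizeof a);

    /* big-endian interpretation, independent of host byte order */
    unsigned int b = (unsigned int)(unsigned char)str[0] << 24
                   | (unsigned int)(unsigned char)str[1] << 16
                   | (unsigned int)(unsigned char)str[2] << 8
                   | (unsigned int)(unsigned char)str[3];

    printf("%u %u\n", a, b);  /* e.g. 6513249 1633837824 on x86 */
    return 0;
}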
unsigned int a = *(unsigned int*)str;
This initialization is not correct and invokes undefined behavior. It violates C aliasing rules and potentially violates processor alignment requirements.
You said you want to copy byte-by-byte.
That means the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk).
It just needs some tweaking:
char * str ="abc";
int i;
unsigned a;
char * c = (char * )&a;
for(i = 0; i < sizeof(unsigned); i++){
c[i] = str[i];
}
printf("%d\n", a);
Bear in mind, the data you're reading may not share the same endianness as the machine you're reading from. This might help:
#include <stdint.h>

void
changeEndian32(void * data)
{
uint8_t * cp = (uint8_t *) data;
union
{
uint32_t word;
uint8_t bytes[4];
}temp;
temp.bytes[0] = cp[3];
temp.bytes[1] = cp[2];
temp.bytes[2] = cp[1];
temp.bytes[3] = cp[0];
*((uint32_t *)data) = temp.word;
}
Both are correct in a way:
Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.
Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.
What is correct depends on how the original data (array of char) is meant to be interpreted.
E.g. Java code (class files) always use big endian byte order (no matter what the CPU is using). So if you want to read ints from a Java class file you have to use the second way. In other cases you might want to use the CPU dependent way (I think Matlab writes ints in native byte order into files, c.f. this question).
If you're using the CVI (National Instruments) compiler, you can use the Scan function to do this:
unsigned int a;
For big endian:
Scan(str,"%1i[b4uzi1o3210]>%i",&a);
For little endian:
Scan(str,"%1i[b4uzi1o0123]>%i",&a);
The o modifier specifies the byte order.
i inside the square brackets indicates where to start in the str array.

Memory layout of struct having bitfields

I have this C struct: (representing an IP datagram)
struct ip_dgram
{
unsigned int ver : 4;
unsigned int hlen : 4;
unsigned int stype : 8;
unsigned int tlen : 16;
unsigned int fid : 16;
unsigned int flags : 3;
unsigned int foff : 13;
unsigned int ttl : 8;
unsigned int pcol : 8;
unsigned int chksm : 16;
unsigned int src : 32;
unsigned int des : 32;
unsigned char opt[40];
};
I'm assigning values to it, and then printing its memory layout in 16-bit words like this:
//prints 16 bits at a time
void print_dgram(struct ip_dgram dgram)
{
unsigned short int* ptr = (unsigned short int*)&dgram;
int i,j;
//print only 10 words
for(i=0 ; i<10 ; i++)
{
for(j=15 ; j>=0 ; j--)
{
if( (*ptr) & (1<<j) ) printf("1");
else printf("0");
if(j%8==0)printf(" ");
}
ptr++;
printf("\n");
}
}
int main()
{
struct ip_dgram dgram;
dgram.ver = 4;
dgram.hlen = 5;
dgram.stype = 0;
dgram.tlen = 28;
dgram.fid = 1;
dgram.flags = 0;
dgram.foff = 0;
dgram.ttl = 4;
dgram.pcol = 17;
dgram.chksm = 0;
dgram.src = (unsigned int)htonl(inet_addr("10.12.14.5"));
dgram.des = (unsigned int)htonl(inet_addr("12.6.7.9"));
print_dgram(dgram);
return 0;
}
I get this output:
00000000 01010100
00000000 00011100
00000000 00000001
00000000 00000000
00010001 00000100
00000000 00000000
00001110 00000101
00001010 00001100
00000111 00001001
00001100 00000110
But I expect this (the question's image of the standard IP header layout is omitted here):
The output is partially correct; somewhere, the bytes and nibbles seem to be interchanged. Is there some endianness issue here? Are bit-fields not good for this purpose? I really don't know. Any help? Thanks in advance!
No, bitfields are not good for this purpose. The layout is compiler-dependent.
It's generally not a good idea to use bitfields for data where you want to control the resulting layout, unless you have (compiler-specific) means, such as #pragmas, to do so.
The best way is probably to implement this without bitfields, i.e. by doing the needed bitwise operations yourself. This is annoying, but way easier than somehow digging up a way to fix this. Also, it's platform-independent.
Define the header as just an array of 16-bit words, and then you can compute the checksum easily enough.
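For instance, a sketch of the classic ones'-complement checksum over such an array (the folding algorithm is the standard RFC 1071 approach; this particular code is mine):

#include <stdint.h>
#include <stddef.h>

/* one's-complement sum of 16-bit words, carry folded back in */
uint16_t ip_checksum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    while (count--) {
        sum += *words++;
        sum = (sum & 0xFFFF) + (sum >> 16);  /* fold any carry */
    }
    return (uint16_t)~sum;
}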
The C11 standard says:
An implementation may allocate any addressable storage unit large
enough to hold a bitfield. If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined. The order of
allocation of bit-fields within a unit (high-order to low-order or
low-order to high-order) is implementation-defined.
I'm pretty sure this is undesirable, as it means there might be padding between your fields, and that you can't control the order of your fields. Not just that, but you're at the whim of the implementation in terms of network byte order. Additionally, imagine if an unsigned int is only 16 bits, and you're asking to fit a 32-bit bitfield into it:
The expression that specifies the width of a bit-field shall be an
integer constant expression with a nonnegative value that does not
exceed the width of an object of the type that would be specified were
the colon and expression omitted.
I suggest using an array of unsigned chars instead of a struct. This way you're guaranteed control over padding and network byte order. Start off with the size in bits that you want your structure to be, in total. I'll assume you're declaring this in a constant such as IP_PACKET_BITCOUNT:

typedef unsigned char ip_packet[(IP_PACKET_BITCOUNT / CHAR_BIT) + (IP_PACKET_BITCOUNT % CHAR_BIT > 0)];
Write a function, void set_bits(ip_packet p, size_t bitfield_offset, size_t bitfield_width, unsigned char *value) { ... }, which allows you to set the bits starting at byte p[bitfield_offset / CHAR_BIT], bit bitfield_offset % CHAR_BIT, to the bits found in value, up to bitfield_width bits in length. This will be the most complicated part of your task.
Then you could define identifiers such as VER_OFFSET 0 and VER_WIDTH 4, HLEN_OFFSET 4 and HLEN_WIDTH 4, etc. to make modification of the array less painful.
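A hedged sketch of one possible set_bits (writing MSB-first within each byte, which is a design choice the answer leaves open; I also pass the value as an unsigned long rather than the unsigned char * suggested above, for simplicity):

#include <limits.h>
#include <stddef.h>

/* write the low bitfield_width bits of value into p, starting at bit
   bitfield_offset from the start of the packet, MSB-first in each byte */
void set_bits(unsigned char *p, size_t bitfield_offset,
              size_t bitfield_width, unsigned long value)
{
    for (size_t i = 0; i < bitfield_width; i++) {
        size_t pos = bitfield_offset + i;
        size_t byte = pos / CHAR_BIT;
        unsigned shift = (unsigned)(CHAR_BIT - 1 - pos % CHAR_BIT);
        unsigned bit = (unsigned)((value >> (bitfield_width - 1 - i)) & 1u);
        if (bit)
            p[byte] |= (unsigned char)(1u << shift);
        else
            p[byte] &= (unsigned char)~(1u << shift);
    }
}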
Although the question was asked a long time back, there's no answer explaining your result, so I'll provide one; hopefully it'll be useful to someone.
I'll illustrate the bug using the first 16 bits of your data structure.
Please note: this explanation is guaranteed to be true only for your particular processor and compiler. If either changes, the behaviour may change.
Fields:
unsigned int ver : 4;
unsigned int hlen : 4;
unsigned int stype : 8;
Assigned to:
dgram.ver = 4;
dgram.hlen = 5;
dgram.stype = 0;
The compiler starts assigning bit-fields at bit offset 0. This means the first byte of your data structure is stored in memory as:
Bit offset: 7 4 0
-------------
| 5 | 4 |
-------------
First 16 bits after assignment look like this:
Bit offset: 15 12 8 4 0
-------------------------
| 5 | 4 | 0 | 0 |
-------------------------
Memory Address: 100 101
You are using an unsigned 16-bit pointer to dereference memory address 100. As a result, address 100 is treated as the LSB of a 16-bit number, and address 101 as the MSB.
If you print *ptr in hex you'll see this:
*ptr = 0x0054
Your loop is running on this 16 bit value and hence you get:
00000000 0101 0100
-------- ---- ----
0 5 4
Solution:
Change order of elements to
unsigned int hlen : 4;
unsigned int ver : 4;
unsigned int stype : 8;
And use an unsigned char * pointer to traverse and print the values.
It should work.
Please note, as others have said, this behaviour is platform- and compiler-specific. If either changes, you need to verify that the memory layout of your data structure is still correct.
For Chinese readers, you can refer to this blog for more details; it's really good.
In summary, due to endianness there is byte order as well as bit order. Bit order is the order in which the bits of one byte are saved in memory; it follows the same endianness rules as byte order.
For your picture, the layout is designed in network order, which is big endian, so your struct definition is actually for big-endian machines. Per your output, your PC is little endian, so you need to change the struct field order when using it.
The way the bits are shown is also incorrect, since when you fetch a char at a time, the bit order has already changed from machine order (little endian in your case) to the normal order we humans use. You may change it as follows, per the referenced blog.
void
dump_native_bits_storage_layout(unsigned char *p, int bytes_num)
{
union flag_t {
unsigned char c;
struct base_flag_t {
unsigned int p7:1,
p6:1,
p5:1,
p4:1,
p3:1,
p2:1,
p1:1,
p0:1;
} base;
} f;
for (int i = 0; i < bytes_num; i++) {
f.c = *(p + i);
printf("%d%d%d%d %d%d%d%d ",
f.base.p7,
f.base.p6,
f.base.p5,
f.base.p4,
f.base.p3,
f.base.p2,
f.base.p1,
f.base.p0);
}
printf("\n");
}
//prints 8 bits at a time
void print_dgram(struct ip_dgram dgram)
{
unsigned char* ptr = (unsigned char *)&dgram;
int i,j;
//print only 10 bytes
for(i=0 ; i<10 ; i++)
{
dump_native_bits_storage_layout(ptr, 1);
/* for(j=7 ; j>=0 ; j--)
{
if( (*ptr) & (1<<j) ) printf("1");
else printf("0");
if(j%8==0)printf(" ");
}*/
ptr++;
//printf("\n");
}
}
#unwind
A typical use case of bit fields is interpreting/emulating byte code or CPU instructions with a given layout. "Don't use it, because you cannot control it" is an answer for children.
#Bruce
For Intel/GCC I see a packed LITTLE ENDIAN bit layout, i.e. in struct ip_dgram the field ver is represented by bits 0..3 and the field hlen by bits 4..7 ...
For correctness of operation, it is required to verify the memory layout against your design at runtime.
#include <iostream>

struct ModelIndicator
{
int a:4;
int b:4;
int c:4;
};
union UModelIndicator
{
ModelIndicator i;
int v;
};
// test packed little endian
static bool verifyLayoutModel()
{
UModelIndicator um;
um.v = 0;
um.i.a = 2; // 0..3
um.i.b = 3; // 4..7
um.i.c = 9; // 8..11
return um.v == (9 << 8) + (3 << 4) + 2;
}
int main()
{
if (!verifyLayoutModel())
{
std::cerr << "Invalid memory layout" << std::endl;
return -1;
}
// ...
}
When the above test fails, you need to consider compiler pragmas, or adjust your structures and verifyLayoutModel() accordingly.
I agree with what unwind said. Bit fields are compiler dependent.
If you need the bits to be in a specific order, pack the data into a pointer to a character array. Increment the buffer the size of the element being packed. Pack the next element.
pack( char** buffer )
{
    if ( buffer && *buffer )
    {
        //pack ver
        //assign first 4 bits to 4
        *((UInt4*) *buffer ) = 4;
        *buffer += sizeof(UInt4);
        //assign next 4 bits to 5
        *((UInt4*) *buffer ) = 5;
        *buffer += sizeof(UInt4);
        //... continue packing
    }
}
(Note this is pseudocode: C has no addressable 4-bit type like UInt4, so a real implementation would pack two nibbles into each byte with shifts and masks.)
Compiler-dependent or not, it depends on whether you want to write a very fast program or one that works with different compilers. To write a fast, compact application in C, use a struct with bit fields. If you want a slow, general-purpose program, write it out the long way.

Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 0;
printf("n = %d\n", n);
system("PAUSE");
return 0;
}
The output of the program is n = 256.
I may understand why it is, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.
The int 260 (= 256 * 1 + 4) will look like this in memory - note that this depends on the endianness of the machine - also, this is for a 32-bit (4 byte) int:
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).
You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.
In C, a pointer references a block of bytes based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0, it only changes the first byte of the integer value, but because of how numbers are stored in memory on modern machines (effectively in reverse order from how you would write them), you are overwriting the least significant byte (which was 4) and are left with 256 as the value.
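To see this directly, a small sketch (mine, not from the answer) that dumps the bytes of n before and after the store:

#include <stdio.h>

int main(void)
{
    int n = 260;
    unsigned char *pp = (unsigned char *)&n;
    size_t i;

    for (i = 0; i < sizeof n; i++)
        printf("%02x ", pp[i]);  /* little endian: 04 01 00 00 */
    printf("\n");

    *pp = 0;  /* clear the byte stored first */

    for (i = 0; i < sizeof n; i++)
        printf("%02x ", pp[i]);  /* now: 00 01 00 00, i.e. n == 256 */
    printf("\n");
    return 0;
}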
I understood exactly what happens by changing the value:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 20;
printf("pp = %d\n", (int)*pp);
printf("n = %d\n", (int)n);
system("PAUSE");
return 0;
}
The output values are
20
and
276
So basically the problem is not that you have data loss; it's that the char pointer points only to the first byte of the int and so it changes only that byte. The other bytes are not changed, and that's why you get those values (if you are on an Intel processor, the first byte is the least significant, which is why you change the "smallest" part of the number).
Your problem is the assignment
*pp = 0;
You're dereferencing pp which points to n, and changing n.
However, pp is a char pointer so it doesn't change all of n
which is an int. This causes the binary complications in the other answers.
In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little endian, twos complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bytes would still be implementation-dependent).
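A sketch of that safer variant (my own, assuming <stdint.h> is available):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t n = 260;  /* exact-width type: no padding bits, no traps */
    unsigned char *pp = (unsigned char *)&n;

    *pp = 0;  /* clears whichever byte is stored first */
    /* prints 256 on little endian; on big endian the first byte of 260
       was already 0, so it still prints 260 */
    printf("n = %u\n", (unsigned)n);
    return 0;
}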
Considering 32-bit systems,
260 will be represented like this:
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000100 (Byte-0)
Now when p is typecast to a char pointer, the label on the pointer changes, but the memory contents don't. It means that earlier p could access 4 bytes, as it was an integer pointer, but now it can only access 1 byte, as it is a char pointer. So only the LSB gets changed to zero, not all 4 bytes.
And it becomes
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000000 (Byte-0)
Hence, the output is 256.
