Data stored with pointers - c

void *memory;
unsigned int b=65535; //1111 1111 1111 1111 in binary
int i=0;
memory= &b;
for(i=0;i<100;i++){
printf("%d, %d, d\n", (char*)memory+i, *((unsigned int * )((char *) memory + i)));
}
I am trying to understand one thing.
(char*)memory+i - print out adress in range 2686636 - 2686735.
and when i store 65535 with memory= &b this should store this number at adress 2686636 and 2686637
because every adress is just one byte so 8 binary characters so when i print it out
*((unsigned int * )((char *) memory + i)) this should print 2686636, 255 and 2686637, 255
instead of it it prints 2686636, 65535 and 2686637, random number
I am trying to implement memory allocation. It is school project. This should represent memory. One adress should be one byte so header will be 2686636-2586639 (4 bytes for size of block) and 2586640 (1 byte char for free or used memory flag). Can someone explain it to me thanks.
Thanks for answers.
void *memory;
void *abc;
abc=memory;
for(i=0;i<100;i++){
*(int*)abc=0;
abc++;
}
*(int*)memory=16777215;
for(i=0;i<100;i++){
printf("%p, %c, %d\n", (char*)memory+i, *((char *)memory +i), *((char *)memory +i));
}
output is
0028FF94,  , -1
0028FF95,  , -1
0028FF96,  , -1
0028FF97, , 0
0028FF98, , 0
0028FF99, , 0
0028FF9A, , 0
0028FF9B, , 0
i think it works. 255 only one -1, 65535 2 times -1 and 16777215 3 times -1.

In your program it seems that address of b is 2686636 and when you will write (char*)memory+i or (char*)&b+i it means this pointer is pointing to char so when you add one to it will jump to only one memory address i.e2686637 and so on till 2686735(i.e.(char*)2686636+99).
now when you are dereferencing i.e.*((unsigned int * )((char *) memory + i))) you are going to get the value at that memory address but you have given value to b only (whose address is 2686636).all other memory address have garbage values which you are printing.
so first you have to store some data at the rest of the addresses(2686637 to 2686735)
good luck..
i hope this will help

I did not mention this in my comments yesterday but it is obvious that your for loop from 0 to 100 overruns the size of an unsigned integer.
I simply ignored some of the obvious issues in the code and tried to give hints on the actual question you asked (difficult to do more than that on a handy :-)). Unfortunately I did not have time to complete this yesterday. So, with one day delay my hints for you.
Try to avoid making assumptions about how big a certain type is (like 2 bytes or 4 bytes). Even if your assumption holds true now, it might change if you switch the compiler or switch to another platform. So use sizeof(type) consequently throughout the code. For a longer discussion on this you might want to take a look at: size of int, long a.s.o. on Stack Overflow. The standard mandates only the ranges a certain type should be able to hold (0-65535 for unsigned int) so a minimal size for types only. This means that the size of int might (and tipically is) bigger than 2 bytes. Beyond primitive types sizeof helps you also with computing the size of structures where due to memory alignment && packing the size of a structure might be different from what you would "expect" by simply looking at its attributes. So the sizeof operator is your friend.
Make sure you use the correct formatting in printf.
Be carefull with pointer arithmetic and casting since the result depends on the type of the pointer (and obviously on the value of the integer you add with).
I.e.
(unsigned int*)memory + 1 != (unsigned char*)memory + 1
(unsigned int*)memory + 1 == (unsigned char*)memory + 1 * sizeof(unsigned int)
Below is how I would write the code:
//check how big is int on our platform for illustrative purposes
printf("Sizeof int: %d bytes\n", sizeof(unsigned int));
//we initialize b with maximum representable value for unsigned int
//include <limits.h> for UINT_MAX
unsigned int b = UINT_MAX; //0xffffffff (if sizeof(unsigned int) is 4)
//we print out the value and its hexadecimal representation
printf("B=%u 0x%X\n", b, b);
//we take the address of b and store it in a void pointer
void* memory= &b;
int i = 0;
//we loop the unsigned chars starting at the address of b up to the sizeof(b)
//(in our case b is unsigned int) using sizeof(b) is better since if we change the type of b
//we do not have to remember to change the sizeof in the for loop. The loop works just the same
for(i=0; i<sizeof(b); ++i)
{
//here we kept %d for formating the individual bytes to represent their value as numbers
//we cast to unsigned char since char might be signed (so from -128 to 127) on a particular
//platform and we want to illustrate that the expected (all bytes 1 -> printed value 255) occurs.
printf("%p, %d\n", (unsigned char *)memory + i, *((unsigned char *) memory + i));
}
I hope you will find this helpfull. And good luck with your school assignment, I hope you learned something you can use now and in the future :-).

Related

Does memcpy copy bytes in reverse order?

I am little bit confused on usage of memcpy. I though memcpy can be used to copy chunks of binary data to address we desire. I was trying to implement a small logic to directyl convert 2 bytes of hex to 16 bit signed integer without using union.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main()
{ uint8_t message[2] = {0xfd,0x58};
// int16_t roll = message[0]<<8;
// roll|=message[1];
int16_t roll = 0;
memcpy((void *)&roll,(void *)&message,2);
printf("%x",roll);
return 0;
}
This return 58fd instead of fd58
No, memcpy did not reverse the bytes as it copied them. That would be a strange and wrong thing for memcpy to do.
The reason the bytes seem to be in the "wrong" order in the program you wrote is that that's the order they're actually in! There's probably a canonical answer on this somewhere, but here's what you need to understand about byte order, or "endianness".
When you declare a string, it's laid out in memory just about exactly as you expect. Suppose I write this little code fragment:
#include <stdio.h>
char string[] = "Hello";
printf("address of string: %p\n", (void *)&string);
printf("address of 1st char: %p\n", (void *)&string[0]);
printf("address of 5th char: %p\n", (void *)&string[4]);
If I compile and run it, I get something like this:
address of string: 0xe90a49c2
address of 1st char: 0xe90a49c2
address of 5th char: 0xe90a49c6
This tells me that the bytes of the string are laid out in memory like this:
0xe90a49c2 H
0xe90a49c3 e
0xe90a49c4 l
0xe90a49c5 l
0xe90a49c6 o
0xe90a49c7 \0
Here I've shown the string vertically, but if we laid it out horizontally, with addresses increasing from left to right, we would see the characters of the string "Hello" laid out from left to right also, just as we would expect.
But that's for strings, which are arrays of char. But integers of various sizes are not really built out of characters, and it turns out that the individual bytes of an integer are not necessarily laid out in memory in "left-to-right" order as we might expect. In fact, on the vast majority of machines today, the bytes within an integer are laid out in the opposite order. Let's take a closer look at how that works.
Suppose I write this code:
int16_t i2 = 0x1234;
printf("address of short: %p\n", (void *)&i2);
unsigned char *p = &i2;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
This initializes a 16-bit (or "short") integer to the hex value 0x1234, and then uses a pointer to print the two bytes of the integer in "left-to-right" order, that is, with the lower-addressed byte first, followed by the higher-addressed byte.
On my machine, the result is something like:
address of short: 0xe68c99c8
0xe68c99c8: 34
0xe68c99c9: 12
You can clearly see that the byte that's stored at the "front" of the two-byte region in memory is 34, followed by 12. The least-significant byte is stored first. This is referred to as "little endian" byte order, because the "little end" of the integer — its least-significant byte, or LSB — comes first.
Larger integers work the same way:
int32_t i4 = 0x5678abcd;
printf("address of long: %p\n", (void *)&i4);
p = &i4;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
This prints:
address of long: 0xe68c99bc
0xe68c99bc: cd
0xe68c99bd: ab
0xe68c99be: 78
0xe68c99bf: 56
There are machines that lay the byes out in the other order, with the most-significant byte (MSB) first. Those are called "big endian" machines, but for reasons I won't go into they're not as popular.
How do you construct an integer value out of individual bytes if you don't know your machine's byte order? The best way is to do it "mathematically", based on the properties of the numbers. For example, let's go back to your original array of bytes:
uint8_t message[2] = {0xfd, 0x58};
Now, you know, because you wrote it, that 0xfd is supposed to be the MSB and 0xf8 is supposed to be the LSB. So one good way of combining them together into an integer is like this:
int16_t roll = message[0] << 8; /* MSB */
roll |= message[1]; /* LSB */
The nice thing about this code is that it works correctly on machines of either endianness. I called this technique "mathematical" because it's equivalent to doing it this other way:
int16_t roll = message[0] * 256; /* MSB */
roll += message[1]; /* LSB */
And, in fact, this suggestion of mine involving roll = message[0] << 8 is very close to something you already tried, but had commented out in the code you posted. The difference is that you don't want to think about it in terms of two bytes next to each other in memory; you want to think about it in terms of the most- and least-significant byte. When you say << 8, you're obviously thinking about the most-significant byte, so that should be message[0].
Does memcpy copy bytes in reverse order?
memcpy does not reverse the order bytes.
This return 58fd instead of fd58
Yes, your computer is little endian, so bytes 0xfd,0x58 in order are interpreted by your computer as the value 0x58fd.

Output of the following C code

What will be the output of the following C code. Assuming it runs on Little endian machine, where short int takes 2 Bytes and char takes 1 Byte.
#include<stdio.h>
int main() {
short int c[5];
int i = 0;
for(i = 0; i < 5; i++)
c[i] = 400 + i;
char *b = (char *)c;
printf("%d", *(b+8));
return 0;
}
In my machine it gave
-108
I don't know if my machine is Little endian or big endian. I found somewhere that it should give
148
as the output. Because low order 8 bits of 404(i.e. element c[4]) is 148. But I think that due to "%d", it should read 2 Bytes from memory starting from the address of c[4].
The code gives different outputs on different computers because on some platforms the char type is signed by default and on others it's unsigned by default. That has nothing to do with endianness. Try this:
char *b = (char *)c;
printf("%d\n", (unsigned char)*(b+8)); // always prints 148
printf("%d\n", (signed char)*(b+8)); // always prints -108 (=-256 +148)
The default value is dependent on the platform and compiler settings. You can control the default behavior with GCC options -fsigned-char and -funsigned-char.
c[4] stores 404. In a two-byte little-endian representation, that means two bytes of 0x94 0x01, or (in decimal) 148 1.
b+8 addresses the memory of c[4]. b is a pointer to char, so the 8 means adding 8 bytes (which is 4 two-byte shorts). In other words, b+8 points to the first byte of c[4], which contains 148.
*(b+8) (which could also be written as b[8]) dereferences the pointer and thus gives you the value 148 as a char. What this does is implementation-defined: On many common platforms char is a signed type (with a range of -128 .. 127), so it can't actually be 148. But if it is an unsigned type (with a range of 0 .. 255), then 148 is fine.
The bit pattern for 148 in binary is 10010100. Interpreting this as a two's complement number gives you -108.
This char value (of either 148 or -108) is then automatically converted to int because it appears in the argument list of a variable-argument function (printf). This doesn't change the value.
Finally, "%d" tells printf to take the int argument and format it as a decimal number.
So, to recap: Assuming you have a machine where
a byte is 8 bits
negative numbers use two's complement
short int is 2 bytes
... then this program will output either -108 (if char is a signed type) or 148 (if char is an unsigned type).
To see what sizes types have in your system:
printf("char = %u\n", sizeof(char));
printf("short = %u\n", sizeof(short));
printf("int = %u\n", sizeof(int));
printf("long = %u\n", sizeof(long));
printf("long long = %u\n", sizeof(long long));
Change the lines in your program
unsigned char *b = (unsigned char *)c;
printf("%d\n", *(b + 8));
And simple test (I know that it is not guaranteed but all C compilers I know do it this way and I do not care about old CDC or UNISYS machines which had different addresses and pointers to different types of data
printf(" endianes test: %s\n", (*b + (unsigned)*(b + 1) * 0x100) == 400? "little" : "big");
Another remark: it is only because in your program c[0] == 400

C programming why does the address of char array increment from 0012FF74 to 0012FF75?

Heres the code:
char chararray[] = {68, 97, 114, 105, 110};
/* 1 byte each*/
int i;
printf("chararray intarray\n");
printf("-------------------\n");
for(i = 0; i < 5; i++)
printf("%p\n", (chararray + i));
Output:
chararray
---------
0012FF74
0012FF75
0012FF76
0012FF77
Now im trying to understand this in terms of hexadecimal, bits and bytes.
I understand that a char is 1 byte and its supposed to increment by 1 byte which is 8 bits.
But I dont understand how its only increasing by 1 in hex? 1 hexadecimal only represents 4 bits correct? so Im kind of confused, it seems like its only incrementing by 4 bits.
Any help on clearing this up is greatly appreciated thanks!
It's true that if you represent a byte in hexa then it is made out of 2 hexa digits where each one stands for 4 bits.
However, the addresses you are seeing are addresses of bytes, and not the content of them. Each byte receives its own address, and the addresses are sequential, just like if we gave each byte a number: byte 0, byte 1, byte 2, byte 3,....
The address in a pointer points to a byte, not to a bit. Your pointer is of type char *, so when it is incremented, the address increases by sizeof(char). If, however, you used a different type, such as int, your pointer would increase by sizeof(int) on each increment, even if it is pointing to a char [] array.
On my machine, sizeof(int)==4, for example.
I wrote this code:
#include <stdio.h>
int main()
{
char str[] = "ACBDEFGHIJKLMNOPQRSTUVWXYZ";
int *a = str;
printf("Char\tAddr\n");
while(a <= &str[25])
{
printf("%c\t%p\n", *a, (void *)a);
a++;
}
return 0;
}
Output:
Char Addr
A 00D5F9BC
E 00D5F9C0
I 00D5F9C4
M 00D5F9C8
Q 00D5F9CC
U 00D5F9D0
Y 00D5F9D4
Every fourth character in the string is outputted.
First, pointer arithmetics like (chararray + i), where chararray points to a char (i.e. is of type char*) increases the value of pointer chararray by i * sizeof(char). Note that sizeof(char) is 1 by definition.
Second, a pointer represents a memory address, which is represented by an integral value that indicates a position in an (absolutely or relatively) addressed memory block, e.g. on the heap, on the stack, on some other data segment, ... . Confer, for example, the following statement in this online C standard draft:
6.3.2.3 Pointers
(5) An integer may be converted to any pointer type. ...
(6) Any pointer type may be converted to an integer type. ...
So when viewing the value of a pointer, we can think of an integral value, just like 256 or 1024 (when "viewed" in decimal format), or 0x100 or 0x400 (when viewed in hexadecimal format). Note that 256 in decmial is equivalent to 100 in hexadecimal, and this has nothing to do with bits and bytes.
Adding 1 to an integral value of 256 (or 0x100) gives 257 (or 0x101), regardless of whether this value stands for a position in a memory block or for oranges sold in the department store. So it's all about "outputting" integral values in hex format.
See the following code illustrating this:
int main()
{
char chararray[] = {68, 97, 114, 105, 110};
for(int i = 0; i < 5; i++) {
char *ptr = (chararray + i);
unsigned long ptrAsIntegralVal = (unsigned long)ptr;
printf("ptr: %p; in decmial format: %lu\n", ptr, ptrAsIntegralVal);
}
}
Output:
ptr: 0x7fff5fbff767; in decmial format: 140734799804263
ptr: 0x7fff5fbff768; in decmial format: 140734799804264
ptr: 0x7fff5fbff769; in decmial format: 140734799804265
ptr: 0x7fff5fbff76a; in decmial format: 140734799804266
ptr: 0x7fff5fbff76b; in decmial format: 140734799804267
Using hexadecimal numbers is just another way of representing any number. It has nothing to do with bits and bytes. One byte is 8 bits, no matter if you represent it as hexadecimal number or decimal number. So it just increases by one = 1 Byte = 8 Bits.

Copying a 4 element character array into an integer in C

A char is 1 byte and an integer is 4 bytes. I want to copy byte-by-byte from a char[4] into an integer. I thought of different methods but I'm getting different answers.
char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);
Output is
6513249 1633837824 6513249
Which one is correct? What is going wrong?
It's an endianness issue. When you interpret the char* as an int* the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86 which is little endian), while with the manual conversion the first byte becomes the most significant.
To put this into pictures, this is the source array:
a b c \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 | <---- bytes in memory
+------+------+------+------+
When these bytes are interpreted as an integer in a little endian architecture the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300 -- decimal 1633837824.
Of course treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion. There is however a way to achieve the same result, which is called type punning:
union {
char str[4];
unsigned int ui;
} u;
strcpy(u.str, "abc");
printf("%u\n", u.ui);
Neither of the first two is correct.
The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:
unsigned int a; memcpy(&a, &str, sizeof a);
(Presuming the size of an unsigned int and the size of str are the same.)
The second may fail with integer overflow because str[0] is promoted to an int, so str[0]<<24 has type int, but the value required by the shift may be larger than is representable in an int. To remedy this, use:
unsigned int b = (unsigned int) str[0] << 24 | …;
This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
unsigned int a = *(unsigned int*)str;
This initialization is not correct and invokes undefined behavior. It violates C aliasing rules an potentially violates processor alignment.
You said you want to copy byte-by-byte.
That means the the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk.
It just needs some tweaking:
char * str ="abc";
int i;
unsigned a;
char * c = (char * )&a;
for(i = 0; i < sizeof(unsigned); i++){
c[i] = str[i];
}
printf("%d\n", a);
Bear in mind, the data you're reading may not share the same endianness as the machine you're reading from. This might help:
void
changeEndian32(void * data)
{
uint8_t * cp = (uint8_t *) data;
union
{
uint32_t word;
uint8_t bytes[4];
}temp;
temp.bytes[0] = cp[3];
temp.bytes[1] = cp[2];
temp.bytes[2] = cp[1];
temp.bytes[3] = cp[0];
*((uint32_t *)data) = temp.word;
}
Both are correct in a way:
Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.
Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.
What is correct depends on how the original data (array of char) is meant to be interpreted.
E.g. Java code (class files) always use big endian byte order (no matter what the CPU is using). So if you want to read ints from a Java class file you have to use the second way. In other cases you might want to use the CPU dependent way (I think Matlab writes ints in native byte order into files, c.f. this question).
If your using CVI (National Instruments) compiler you can use the function Scan to do this:
unsigned int a;
For big endian:
Scan(str,"%1i[b4uzi1o3210]>%i",&a);
For little endian:
Scan(str,"%1i[b4uzi1o0123]>%i",&a);
The o modifier specifies the byte order.
i inside the square brackets indicates where to start in the str array.

Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 0;
printf("n = %d\n", n);
system("PAUSE");
return 0;
}
The output put of the program is n = 256.
I may understand why it is, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.
The int 260 (= 256 * 1 + 4) will look like this in memory - note that this depends on the endianness of the machine - also, this is for a 32-bit (4 byte) int:
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).
You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.
In C a pointer references a block of bytes based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0 it only changes the first byte of the integer value, but because of how numbers are stored in memory on modern machines (effectively in reverse order from how you would write it) you are overwritting the least significant byte (which was 4) you are left w/ 256 as the value
I understood what exactly happens by changing value:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 20;
printf("pp = %d\n", (int)*pp);
printf("n = %d\n", (int)n);
system("PAUSE");
return 0;
}
The output value are
20
and
276
So basically the problem is not that you have data loss, is that the char pointer points only to the first byte of the int and so it changes only that, the other bytes are not changed and that's why those weird value (if you are on an INTEL processor the first byte is the least significant, that's why you change the "smallest" part of the number
Your problem is the assignment
*pp = 0;
You're dereferencing pp which points to n, and changing n.
However, pp is a char pointer so it doesn't change all of n
which is an int. This causes the binary complications in the other answers.
In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little endian, twos complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bits would still be implementation-dependent).
Considering 32 bit systems,
256 will be represented in like this.
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000100(Byte-0)
Now when p is typecast-ed to a char pointer, the label on the pointer changes, but the memory contents don't. It means earlier p could have access 4 bytes, as it was an integer pointer, but now it can only access 1 byte as it is a char pointer. So, only the LSB gets changes to zero, not all the 4 bytes.
And it becomes
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000000(Byte-0)
Hence, the o/p is 256.

Resources