I'm trying to prepend a 2-byte message length after getting the length in a 4-byte int. I use memcpy to copy 2 bytes of the int. When I look at the second byte I copied, it is as expected, but accessing the first byte actually prints 4 bytes.
I would expect dest[0] and dest[1] to each contain 1 byte of the int. Whether it's the significant byte or the order is switched doesn't matter; I can throw in an offset on the memcpy or reverse 0 and 1. It does not have to be portable, I would just like it to work.
The same error happens on Windows with LoadRunner and on Ubuntu with GCC, so I have at least tried to rule out portability as a cause.
I'm not sure where I'm going wrong. I suspect it's related to my not having used pointers recently. Is there a better approach to cast an int to a short and then put it in the first 2 bytes of a buffer?
char* src;
char* dest;
int len = 2753; // Hex - AC1
src=(char*)malloc(len);
dest=(char*)malloc(len+2);
memcpy(dest, &len, 2);
memcpy(dest+2, src, len);
printf("dest[0]: %02x", dest[0]);
// expected result: c1
// actual result: ffffffc1
printf("dest[1]: %02x", dest[1]);
// expected result: 0a
// actual result: 0a
You cannot just take a random two bytes out of a four byte object and call it a cast to short.
You will need to copy your int into a two byte int before doing your memcpy.
But actually, that isn't the best way to do it either, because you have no control over the byte order of an integer.
Your code should look like this:
dest[0] = ((unsigned)len >> 8) & 0xFF;
dest[1] = ((unsigned)len) & 0xFF;
That should write it out in network byte order aka big endian. All of the standard network protocols use this byte order.
And I'd add something like:
assert( ((unsigned)len & 0xFFFF0000) == 0 ); // should be nothing in the high bytes
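Putting it together, here is a minimal sketch of the whole framing step under those rules (frame_message and payload are just illustrative names, not anything from your code):
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: build a message with a 2-byte big-endian length prefix.
   Caller owns the returned buffer. */
unsigned char *frame_message(const unsigned char *payload, int len)
{
    assert(((unsigned)len & 0xFFFF0000u) == 0); /* length must fit in 16 bits */

    unsigned char *dest = malloc((size_t)len + 2);
    if (dest == NULL)
        return NULL;

    dest[0] = ((unsigned)len >> 8) & 0xFF; /* high byte first: network byte order */
    dest[1] = ((unsigned)len) & 0xFF;      /* low byte second */
    memcpy(dest + 2, payload, (size_t)len);
    return dest;
}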
Firstly, you are using printf incorrectly. This
printf("dest[0]: %02x", dest[0]);
uses x format specifier in printf. x format specifier requires an argument of type unsigned int. Not char, but unsigned int and only unsigned int (or alternatively an int with non-negative value).
The immediate argument you supplied has type char, which is probably signed on your platform. This means that your dest[0] contains -63. A variadic argument of type char is automatically promoted to type int, which turns 0xc1 into 0xffffffc1 (as a signed representation of -63 in type int). Since printf expects an unsigned int value and you are passing a negative int value instead, the behavior is undefined. The printout that you see is nothing more than a manifestation of that undefined behavior. It is meaningless.
One proper way to print dest[0] in this case would be
printf("dest[0]: %02x", (unsigned) dest[0]);
I'm pretty sure the output will still be ffffffc1, but in this case 0xffffffc1 is the perfectly expected result of integer conversion from the negative -63 value to unsigned int type. Nothing unusual here.
Alternatively you can do
printf("dest[0]: %02x", (unsigned char) dest[0]);
which should give you your desired c1 output. Note that the conversion to int takes place in this case as well, but since the original value is positive (193), the result of the conversion to int is positive too and printf works properly.
Finally, if you want to work with raw memory directly, the proper type to use would be unsigned char from the very beginning. Not char, but unsigned char.
Secondly, an object of type int may easily occupy more than two 8-bit bytes. Depending on the platform, the 0xA and 0xC1 values might end up in completely different portions of the memory region occupied by that int object. You should not expect that copying the first two bytes of an int object will copy the 0xAC1 portion specifically.
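As a quick self-contained check of both points (a sketch, assuming an 8-bit char that happens to be signed on your platform):
#include <stdio.h>

int main(void)
{
    char c = (char)0xC1;                /* implementation-defined; typically -63 when char is signed */

    printf("%02x\n", (unsigned)c);      /* prints ffffffc1: -63 converted to unsigned int */
    printf("%02x\n", (unsigned char)c); /* prints c1: 193, promoted to a non-negative int */
    return 0;
}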
You make the assumption that an "int" is two bytes. What justification do you have for that? Your code is highly unportable.
You make another assumption that "char" is unsigned. What justification do you have for that? Again, your code is highly unportable.
You make another assumption about the ordering of bytes in an int. What justification do you have for that? Again, your code is highly unportable.
Instead of the literal 2, use sizeof(int). Never hard-code the size of a type.
If this code should be portable, you should not use int, but a fixed size datatype.
If you need 16 bit, you could use int16_t.
Also, printing the chars needs a cast to unsigned. As it stands, the char is promoted to an int and the sign is extended, which gives the leading FFFFs.
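A rough sketch of what that might look like with a fixed-width type and unsigned printing (the variable names are only illustrative):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int len = 2753;                          /* 0x0AC1 */
    uint16_t len16 = (uint16_t)len;          /* explicit 16-bit value, no guessing about int's size */
    unsigned char dest[2];

    dest[0] = (unsigned char)(len16 >> 8);   /* big endian: 0x0A */
    dest[1] = (unsigned char)(len16 & 0xFF); /* 0xC1 */
    printf("%02x %02x\n", (unsigned)dest[0], (unsigned)dest[1]);
    return 0;
}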
Related
What is the reinterpret_cast of (char) doing here?
unsigned int aNumber = 258; // 4 bytes in allocated memory [02][01][00][00]
printf("\n is printing out the first byte %02i",(char)aNumber); // Outputs the first byte[02]
Why am I getting the first byte without pointing to it, such as with (char*)&aNumber?
Is the %02i doing this: (char)*&aNumber?
Or is the reinterpret_cast of (char) cutting off the remaining 3 bytes, since a char occupies only one of those 4 bytes?
First, reinterpret_cast is a C++ operator. What you've shown is not that but a C-style cast.
The cast is converting a value of type unsigned int to a value of type char. Conversion of an out-of-range value is implementation defined, but in most implementations you're likely to come across, this is implemented as reinterpreting the lower order bytes as the converted value.
In this particular case, the low order byte of aNumber has the value 0x02, so that's what the result is when casted to a char.
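For example (a sketch; the exact result of the out-of-range conversion is implementation-defined, but keeping the low-order byte is the common behaviour):
#include <stdio.h>

int main(void)
{
    unsigned int aNumber = 258;                    /* 0x00000102 */

    printf("%02i\n", (char)aNumber);               /* typically prints 02: the low-order byte */
    printf("%02x\n", (unsigned)(aNumber & 0xFFu)); /* portable way to look at the low byte: 02 */
    return 0;
}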
Say I have a bigger type.
uint32_t big = 0x01234567;
Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?
Is that an undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?
Is that an undefined behavior to both shift and edit (char*)&big+1? Like the example below. I think this example should be undefined behavior, because after casting to (char*) we have limited our view to a char-type pointer, and we ought not to access, let alone change, a value outside that scope.
uint32_t big = 0x01234567;
*((char*)&big + 1) = 0xff;
printf("%02x\n\n\n", *((char*)&big+1));
printf("%02x\n\n\n", big);
(This passes my Visual C++ compiler. By the way, I want to ask a forked question: why does the first printf in this example give ffffffff? Shouldn't it be ff?)
I have seen code like this. This is what I usually do when I need to achieve a similar task. Is this UB or not? Why or why not? What is the standard way to achieve this?
uint8_t catcher[8] = { 0 };
uint64_t big = 0x1234567812345678;
memcpy(catcher, (uint8_t*)&big, sizeof(uint64_t));
Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?
If a char is eight bits, which it is in most modern C implementations, then there are four bytes in the uint32_t big, and you can do arithmetic on the address from (char *) &big + 0 to (char *) &big + 4. You can also read and write the bytes from (char *) &big + 0 to (char *) &big + 3, and those will access individual bytes in the representation of big. Although arithmetic is defined to work up to (char *) &big + 4, that is only an endpoint. There is no defined byte there, and you should not use that address to read or write anything.
Is that an undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?
These are additions, not shifts, and the syntax is (char *) &big + 1, not (char*&big)+1. Arithmetic is defined for the offsets from +0 to +4.
Is that an undefined behavior to both shift and edit (char*)&big+1?
It is allowed to read and write the bytes in big using a pointer to char. This is a special rule for character types. Generally, the bytes of an object should not be accessed using an unrelated type. For example, a float object could not be accessed using an int type. However, the character types are special; you may access the bytes of any object using a character type.
However, it is preferable to use unsigned char for this, as it avoids complications with signed values.
I have seen code like this.
It is allowed to read or write the bytes of an object using memcpy. memcpy is defined to work as if by copying characters.
Note that, while accessing the bytes of an object is defined by the C standard, how bytes represent values is partly implementation-defined. Different C implementations may use different orders for the bytes within an object, and there can be other differences.
By the way, I want to ask a forked question: why does the first printf in this example give ffffffff? Shouldn't it be ff?
In your C implementation, char is signed and can represent values from −128 to +127. In *((char*)&big + 1) = 0xff;, 0xff is 255 and is too big to fit into a char. It is converted to a char value in an implementation-defined way. Your C implementation converts it to −1. (The eight-bit two’s complement representation of −1, bits 11111111, uses the same bits as the binary representation of 255, again bits 11111111.)
Then printf("%02x\n\n\n", *((char*)&big+1)); passes this value, −1, to printf. Since it is a char, it is promoted to int to be passed to printf. This produces the same value, −1, but it has 32 bits, 11111111111111111111111111111111. Then you are passing an int, but printf expects an unsigned int for %02x. The behavior of this is not defined by the C standard, but your C implementation reads the 32 bits as if they were an unsigned int. As an unsigned int, the 32 bits 11111111111111111111111111111111 represent the value 4,294,967,295 or 0xffffffff, so that is what printf prints.
You can print the correct value by using printf("%02hhx\n\n\n", * ((unsigned char *) &big + 1));. As an unsigned char, the bits 11111111 represent 255 or 0xff, and converting that to an int produces 255 or 0x000000ff.
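A minimal sketch of inspecting and editing the bytes through unsigned char, which sidesteps the sign-extension surprise entirely (the byte order you see is still implementation-defined):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t big = 0x01234567;
    unsigned char *p = (unsigned char *)&big;  /* character types may alias any object */

    for (size_t i = 0; i < sizeof big; i++)
        printf("byte %zu: %02x\n", i, p[i]);   /* which value sits at which offset depends on endianness */

    p[1] = 0xff;                               /* writing one byte of the representation is allowed */
    printf("big is now %08x\n", (unsigned)big);
    return 0;
}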
For variadic functions (like printf) all arguments undergo default argument promotion, which promotes smaller integer types to int.
This conversion includes sign extension if the smaller type is signed, so the value is preserved.
So if char is a signed type (which is implementation-defined) with a value of -1, it will be promoted to the int value -1, which is what you see.
If you want to print a smaller type, first cast to the correct type (unsigned char) and then use the proper format (like %hhx for printing unsigned char values).
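A small sketch of the difference (assuming char is signed and 8 bits wide on the platform):
#include <stdio.h>

int main(void)
{
    char c = -1;                        /* bit pattern 0xff on an 8-bit two's complement char */

    printf("%x\n", c);                  /* promoted to int -1; undefined for %x, typically prints ffffffff */
    printf("%hhx\n", (unsigned char)c); /* converted to unsigned char first: prints ff */
    return 0;
}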
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
unsigned int i;
i = -12;
printf("%d\n" , i);
system("pause");
return 0;
}
I ran the above code in Visual Studio 2012. Because I know unsigned refers to non-negative numbers, I expected the program to report an error. Why does it still run smoothly and print the output?
As 200_success alluded to, there are two things going on here that are combining to produce the correct output, despite the obvious problems of mixing unsigned and signed integer values.
First, the line i = -12 is implicitly converting the (signed) int literal value -12 to an unsigned int. The bits being stored in memory don't change. It's still 0xfffffff4, which is the two's complement representation of -12. An unsigned int, however, ignores the sign bit (the uppermost bit) and instead treats it as part of the value, so as an unsigned int, this value (0xfffffff4) is interpreted as the number 4294967284. The bottom line here is that C has very loose rules about implicit conversion between signed and unsigned values, especially between integers of the same size. You can verify this by doing:
printf("%u\n", i);
This will print 4294967284.
The second thing that's going on here is that printf doesn't know anything about the arguments you've passed it other than what you tell it via the format string. This is essentially true for all functions in C that are defined with variable argument lists (e.g., int printf(const char *fmt, ...); ) This is because it is impossible for the compiler to know exactly what types of arguments might get passed into this function, so when the compiler generates the assembly code for calling such a function, it can't do type-checking. All it can do is determine the size of each argument, and push the appropriate number of bytes onto the stack. So when you do printf("%d\n", i);, the compiler is just pushing sizeof(unsigned int) bytes onto the stack. It can't do type checking because the function prototype for printf doesn't have any information about the types of any of the arguments, except for the first argument (fmt), which it knows is a const char *. Any subsequent arguments are just copied as generic blobs of a certain number of bytes.
Then, when printf gets called, it just looks at the first sizeof(unsigned int) bytes on the stack, and interprets them how you told it to. Namely, as a signed int value. And since the value stored in those bytes is still just 0xfffffff4, it prints -12.
Edit: Note that by stating that the value in memory is 0xfffffff4, I'm assuming that sizeof(unsigned int) on your machine is 4 bytes. It's possible that unsigned int is defined to be some other size on your machine. However, the same principles still apply, whether the value is 0xfff4 or 0xfffffffffffffff4, or whatever it may be.
This question is similar to this Objective C question. The short answer is, two wrongs make a right.
i = -12 is wrong, in that you are trying to store a negative number in an unsigned int.
printf("%d\n", i) is wrong, in that you are asking printf to interpret an unsigned int as a signed int.
Both of those statements should have resulted in compiler warnings. However, C will happily let you abuse the unsigned int as just a place to store some bits, which is what you've done.
i = -12; is well-defined. When you assign an out-of-range value to an unsigned int, the value is adjusted modulo UINT_MAX + 1 until it comes within range of the unsigned int.
For example if UINT_MAX is 65535, then i = -12 results in i having the value of 65536 - 12 which is 65524.
It is undefined behaviour to mismatch the argument types to printf. When you say %d you must supply an int (or a smaller type that promotes to int under the default argument promotions).
In practice what will usually happen is that the system interprets the bits used to represent the unsigned int as if they were bits used to represent a signed int; of course since it is UB this is not guaranteed to work or even be attempted.
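To see just the wrap-around in isolation (a sketch; the printed number assumes a 32-bit unsigned int):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int i = -12;          /* well-defined: reduced modulo UINT_MAX + 1 */

    printf("%u\n", i);             /* 4294967284 when unsigned int is 32 bits */
    printf("%u\n", UINT_MAX - 11); /* the same value, computed explicitly */
    return 0;
}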
You are indeed saving the -12 as an integer and telling printf (by using %d) that it is a normal int, so it interprets the contents of said variable as an int and prints a -12.
If you used %u in printf you would see what you're really storing when interpreting the contents as an unsigned int.
Printing may work (as explained by others above), but avoid using i in your code for further calculation: it will not retain the sign.
I have a buffer structure with a field
char inc_factor;
which is the amount of bytes to increment in a character array. The problem is that it must be able to hold a value up to 255. Obviously the easiest solution is to change it to unsigned char, but I'm not able to change the supplied structure definition. The function:
Buffer * b_create(short init_capacity, char inc_factor, char o_mode)
Takes in those parameters and return a pointer to a buffer. I was wondering how I would be able to fit the number 255 in a signed char.
You can convert the type:
unsigned char n = inc_factor;
Signed-to-unsigned conversion is well defined and does what you want, since all three char types are required to have the same width.
You may need to be careful on the calling end (or when you store the char in your structure) and do something like f(n - (UCHAR_MAX + 1)) or so (and if char happens to be unsigned on that platform, the negative value converts straight back to 255, so all is still well).
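A small round-trip sketch of that idea (assuming an 8-bit char; converting 255 into a signed char is implementation-defined but yields -1 on two's complement machines):
#include <stdio.h>

int main(void)
{
    char inc_factor = (char)255;     /* stored bit pattern 0xff; value typically -1 if char is signed */
    unsigned char n = inc_factor;    /* well-defined conversion back: n == 255 */

    printf("n = %u\n", (unsigned)n); /* prints 255 */
    return 0;
}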
Let's use the term "byte" to represent 8 bits of storage in memory.
A byte with the value of "0xff" can be accessed either as an unsigned character or as a signed character.
unsigned char byte = 0xff; /* one 8-bit byte of storage */
unsigned char* uc = (unsigned char*)&byte;
signed char* sc = (signed char*)&byte; // same as "char", the "signed" is a default.
printf("uc = %u, sc = %d\n", *uc, *sc);
(I chose to use pointers because I want to demonstrate that the underlying value stored in memory is the same).
Will output
uc = 255, sc = -1
"signed" numbers use the same storage space (number of bits) as unsigned, but they use the upper-most bit as a flag to tell the cpu whether to treat them as negative or not.
The bit pattern that represents 255 (11111111) unsigned is the same bit pattern that represents -1 signed. The bit pattern 10000000 is either 128 or -128.
So you can store the number 255 in a signed char by storing -1 and then reading it back as an unsigned char.
EDIT:
In case you're wondering: negative numbers start "at the top" (i.e. 0xff/255) for computational convenience. Remember that the underlying storage is a byte, so if you take 0xff and add 1, using normal unsigned CPU math, it produces the value 0x00. That is the correct value for "i + 1" when "i = -1". It would be equally odd if negative numbers started with -1 having the value 0x80/128.
You COULD write to it through a cast (a cast on the left-hand side of an assignment is not valid C, so go through a pointer):
*(unsigned char *)&inc_factor = 250;
And then you could read it back also with a cast :
if( (unsigned char)inc_factor == 250 ) {...}
However, that's really not best practices, It'll confuse anyone who has to maintain the code.
In addition, it's not going to help you if you're passing inc_factor into a function that expects a signed char.
There's no way to read that value as a signed char and get a value above 127.
I want to print the value of b[0xFFFC] as below,
short var = 0xFFFC;
printf("%d\n", b[var]);
But it actually prints the value of b[0xFFFFFFFC].
Why does it happen ?
My computer runs Windows XP on a 32-bit architecture.
short is a signed type. It's 16 bits on your implementation. 0xFFFC represents the integer constant 65,532, but when converted to a 16-bit signed value, the result is -4.
So, your line short var = 0xFFFC; sets var to -4 (on your implementation).
0xFFFFFFFC is a 32 bit representation of -4. All that's happening is that your value is being converted from one type to a larger type, in order to use it as an array index. It retains its value, which is -4.
If you actually want to access the 65,533rd element of your array, then you should either:
use a larger type for var. int will suffice on 32 bit Windows, but in general size_t is an unsigned type which is guaranteed big enough for non-negative array indexes.
use an unsigned short, which just gives you enough room for this example, but will go wrong if you want to get another 4 steps forward.
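A sketch of both fixes side by side (b and its size here are purely illustrative stand-ins for your array):
#include <stddef.h>
#include <stdio.h>

static int b[0x10000]; /* illustrative: big enough for index 0xFFFC */

int main(void)
{
    size_t idx1 = 0xFFFC;         /* unsigned and wide enough for any valid array index */
    unsigned short idx2 = 0xFFFC; /* also fine here, but limited to 0xFFFF */

    printf("%d %d\n", b[idx1], b[idx2]);
    return 0;
}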
With current compilers you don't really get to work with a bare 16-bit short in an expression like this; the value is widened to 32 bits.
For example, compiling the same code with gcc 4 on 32-bit Ubuntu Linux:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv)
{
short var = 0xFFFC;
printf("%x\n", var);
printf("%d\n", var);
return (EXIT_SUCCESS);
}
and the output is:
fffffffc
-4
You can see that the short is converted to 32 bits and sign-extended, as expected for two's complement.
As a refresher on C's available data types, have a look here.
There is a rule in C that smaller data types are promoted to an integral type (int) when used in expressions, for instance:
char ch = '2';
int j = ch + 1;
Now look at the RHS (Right Hand Side) of the expression and notice that ch will automatically get promoted to an int in order to produce the desired result on the LHS (Left Hand Side) of the expression. What would the value of j be? The ASCII code for '2' is 50 decimal or 0x32 hexadecimal; add 1 to it and the value of j is 51 decimal or 0x33 hexadecimal.
It is important to understand that rule and that explains why a data type would be 'promoted' to another data type.
What is b? That is an array, I presume, with at least 65,533 elements, correct?
Anyway, the %d format specifier is for type int. The short value var is promoted to an int (an array subscript is an int-type expression as well), and since an int is 4 bytes, you are seeing the sign-extended value 0xFFFFFFFC.
This is where casting comes in, to tell the compiler to convert one data type to another, as explained in conjunction with Gregory Pakosz's answer above.
Hope this helps,
Best regards,
Tom.
Use %hx or %hd instead to indicate that you have a short variable, e.g.:
printf("short hex: %hx\n", var); /* tell printf that var is short and print out as hex */
EDIT: Oops, I got the question wrong; it was not about printf() as I thought, so this answer might be a bit off-topic.
New: Because you are using var as an index to an array you should declare it as unsigned short (instead of short):
unsigned short var = 0xFFFC;
printf("%d\n", b[var]);
The 'short var' could be interpreted as a negative number.
To be more precise:
You are "underflowing" into the negative value range: Values in the range from 0x0000 upto 0x7FFF will be OK. But values from 0x8000 upto 0xFFFF will be negative.
Here are some examples of var used as an index to array b[]:
short var=0x0000; // leads to b[0] => OK
short var=0x0001; // leads to b[1] => OK
short var=0x7FFF; // leads to b[32767] => OK
short var=0x8000; // leads to b[-32768] => Wrong
short var=0xFFFC; // leads to b[-4] => Wrong
short var=32767; // leads to the same as b[0x7FFF] => OK
short var=32768; // compile warning or error => value does not fit in a 16-bit short
You were expecting to work with just a 16-bit value, but as soon as it is used in an expression (such as an array subscript) it is converted to a full 32-bit int.
The extra FFFF comes from the fact that short is a signed type, and when it is converted to int to be used as an index, it gets sign-extended. When a two's complement value is extended from 16 to 32 bits, the sign bit is replicated into the new upper bits. Of course, you did not intend that.
In this case you're interested in absolute array positions, so you should declare your index as unsigned.
In the subject of your question you have already guessed what is happening here: yes, a value of type short is "automatically extended" to a value of type int. This process is called integral promotion. That's how it always works in C language: every time you use an integral value smaller than int that value is always implicitly promoted to a value of type int (unsigned values can be promoted to unsigned int). The value itself does not change, of course, only the type of the value is changed. In your above example the 16-bit short value represented by pattern 0xFFFC is the same as 32-bit int value represented by pattern 0xFFFFFFFC, which is -4 in decimals. This, BTW, makes the rest of your question sound rather strange: promotion or not, your code is trying to access b[-4]. The promotion to int doesn't change anything.
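A tiny sketch of that promotion in isolation (assuming a 16-bit two's complement short):
#include <stdio.h>

int main(void)
{
    short var = (short)0xFFFC; /* implementation-defined conversion; -4 on a two's complement short */
    int promoted = var;        /* integral promotion: still -4, just 32 bits wide */

    printf("%d %d\n", var, promoted); /* prints: -4 -4 */
    return 0;
}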