Extra bytes with 2s complement - c

I'm getting a lot of extra bytes when I find the twos complement of a byte.
for example "eb" turns into "ffffff15". When I printf this, it's -235, not -21 like I would expect it to be.
//unsigned char a[] holds bytes
int b=(int)a[i];
bit1=(b & 0x80 ? 1 : 0);
if (bit1==1){
b=((~b)+1);
}
printf("b: %02x",b);
this prints ffffff15. (%d prints -235).

You did not post the complete code, but it appears a must be an unsigned char array, an unsigned char pointer, or a char array on a platform where char is unsigned. Casting a[i] to (int) does not change the value, so it evaluates to 235.
The formula b=((~b)+1); does not extend the most significant bit; it merely computes b = -b. Hence the result -235.
To replicate the most significant bit, you could write:
b = (a[i] & 0xFF) | (-bit1 & ~0xFF);
Here is a complete example:
#include <stdio.h>
int main() {
// a is a byte array
unsigned char a[] = "0a\xeb";
for (size_t i = 0; i < sizeof a; i++) {
int b = (int)a[i];
int s = (b & 0xFF) | ((b & 0x80) ? ~0xFF : 0);
printf("a[%zu] = 0x%hhx, b: 0x%x, %d, s: 0x%x, %d\n",
i, a[i], b, b, s, s);
}
return 0;
}
It prints:
a[0] = 0x30, b: 0x30, 48, s: 0x30, 48
a[1] = 0x61, b: 0x61, 97, s: 0x61, 97
a[2] = 0xeb, b: 0xeb, 235, s: 0xffffffeb, -21
a[3] = 0x0, b: 0x0, 0, s: 0x0, 0

Trying to understand the problem: "eb turns into ffffff15. When I printf this, it's -235, not -21".
So it seems you expect ffffff15 to represent -21 in decimal, which suggests a (very common) misunderstanding of how negative integral values are represented: making a number negative does not just set a single "negative" bit; it inverts all the bits and adds 1. Let's assume an 8-bit example:
1..00000001 -1..11111111
2..00000010 -2..11111110
3..00000011 -3..11111101
...
As you can see, negating an integral number x turns its bit pattern into UINT_MAX - x + 1 (which is the same as ~x + 1), not into x | 0x80000000. The other way round, 0xffffff15 represents 0xffffffff - 0xeb + 1, which means -235, not -21.

Related

How is an integer stored in C program?

is the number 1 stored in memory as 00000001 00000000 00000000 00000000?
#include <stdio.h>
int main()
{
unsigned int a[3] = {1, 1, 0x7f7f0501};
int *p = a;
printf("%d %p\n", *p, p);
p = (long long)p + 1;
printf("%d %p\n", *p, p);
char *p3 = a;
int i;
for (i = 0; i < 12; i++, p3++)
{
printf("%x %p\n", *p3, p3);
}
return 0;
}
Why is 16777216 printed in the output:
An integer is stored in memory in different ways on different architectures. The most common ways are called little-endian and big-endian byte ordering.
See Endianness
(long long)p+1
|
v
Your memory: [0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, ...]
You increment p not as a pointer but as a long long number, so it does not point to the next integer but to the next byte. So you will get 0x00, 0x00, 0x00, 0x01, which translates to 0x1000000 (decimal 16777216) on a little-endian architecture.
Something to play with (assuming int is 32 bits wide):
#include <stdio.h>
#include <stdbool.h>
typedef union byte_rec {
struct bit_rec {
bool b0 : 1;
bool b1 : 1;
bool b2 : 1;
bool b3 : 1;
bool b4 : 1;
bool b5 : 1;
bool b6 : 1;
bool b7 : 1;
} bits;
unsigned char value;
} byte_t;
typedef union int_rec {
struct bytes_rec {
byte_t b0;
byte_t b1;
byte_t b2;
byte_t b3;
} bytes;
int value;
} int_t;
void printByte(byte_t *b)
{
printf(
"%d %d %d %d %d %d %d %d ",
b->bits.b0,
b->bits.b1,
b->bits.b2,
b->bits.b3,
b->bits.b4,
b->bits.b5,
b->bits.b6,
b->bits.b7
);
}
void printInt(int_t *i)
{
printf("%p: ", (void *)i);
printByte(&i->bytes.b0);
printByte(&i->bytes.b1);
printByte(&i->bytes.b2);
printByte(&i->bytes.b3);
putchar('\n');
}
int main()
{
int_t i1, i2;
i1.value = 0x00000001;
i2.value = 0x80000000;
printInt(&i1);
printInt(&i2);
return 0;
}
Possible output:
0x7ffea0e30920: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0x7ffea0e30924: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Addendum (based on the comment by @chqrlie):
I previously used the unsigned char type for the bit-fields, but the C Standard only guarantees support for three types (int, signed int, unsigned int) and, since C99, a fourth (_Bool). Additional implementation-defined types may be accepted, and gcc seemed fine with unsigned char for the bit-field, but I have nevertheless changed it to the allowed type _Bool (since C99).
Noteworthy: the order of bit-fields within an allocation unit is implementation-defined (on some platforms bit-fields are packed left-to-right, on others right-to-left); see the Notes section in the reference.
Reference to bit fields: https://en.cppreference.com/w/c/language/bit_field
p = (long long)p + 1; is bad code: it is not specified to work in C. The newly formed address is not certain to meet the alignment requirement of int *, so using it is undefined behavior (UB), e.g. a bus fault and a re-booted machine.
Don't do that.
To look at the bytes of a[]:
#include <stdio.h>
#include <stdlib.h>
void dump(size_t sz, const void *ptr) {
const unsigned char *byte_ptr = (const unsigned char *) ptr;
for (size_t i = 0; i < sz; i++) {
printf("%p %02X\n", (void*) byte_ptr, *byte_ptr);
byte_ptr++;
}
}
int main(void) {
unsigned int a[3] = {1, 1, 0x7f7f0501u};
dump(sizeof a, a);
}
As this is a wiki, feel free to edit.
There are multiple instances of undefined behavior in your code:
in printf("%d %p\n", *p, p) you should cast p as (void *)p to ensure printf receives a void * as it expects. This is unlikely to pose a problem on most current targets, but some ancient systems had different representations for int * and void *, such as early Cray systems.
in p = (long long)p + 1, you have implementation-defined behavior converting a pointer to an integer and implicitly converting the integral result of the addition back to a pointer. More importantly, this may create a pointer with incorrect alignment for accessing int in memory, resulting in undefined behavior when you dereference p. This would cause a bus error on many systems, e.g. most RISC architectures, but by chance not on Intel processors. It would be safer to compute the pointer as p = (void *)((intptr_t)p + 1); or p = (void *)((char *)p + 1); although this would still have undefined behavior because of alignment issues.
is the number 1 stored in memory as 00000001 00000000 00000000 00000000?
Yes, your system seems to use little endian representation for int types. The least significant 8 bits are stored in the byte at the address of a, then the next least significant 8 bits, and so on. As can be seen in the output, 1 is stored as 01 00 00 00 and 0x7f7f0501 stored as 01 05 7f 7f.
Why is 16777216 printed in the output?
The second instance of printf("%d %p\n", *p, p) has undefined behavior. On your system, p points to the second byte of the array a and *p reads 4 bytes from this address, namely 00 00 00 01 (the last 3 bytes of 1 and the first byte of the next array element, also 1), which is the representation of the int value 16777216.
To dump the contents of the array as bytes, you should access it using a char * as you do in the last loop. Be aware that char may be signed on some systems, causing for example printf("%x\n", *p3); to output ffffff80 if p3 points to the byte with hex value 80. Using unsigned char * is recommended for consistent and portable behavior.

How to convert hexadecimal array to decimal and back again in C?

I have got a hexadecimal array, which I want to convert to decimal, do a modulus operation, and then convert it back to hexadecimal.
int main()
{
char num[] = {0x02,0x03,0x04};
long n = strtol(num, NULL, 16);
printf("n=%ld\n", n);
}
I am getting "0" here while I am expecting "262914".
EDIT:
I know that putting char num[] = "0x040302" will give me the expected output, but it needs to be within the {} like {0x02,0x03,0x04}
I know that putting char num[] = "0x040302" will give me the expected
output, but it needs to be within the {} like {0x02,0x03,0x04}
"0x040302" is the equivalent of { '0', 'x', '0', '4', '0', '3', '0', '2', 0 },
or { 0x30, 0x78, 0x30, 0x34, 0x30, 0x33, 0x30, 0x32, 0x00 } assuming ASCII.
The answer is simply num[0] + (num[1] << 8) + (num[2] << 16). In general, if num is n bytes long, then the answer is:
long i, r = 0;
for (i = 0; i < n; ++i)
{
r += (long)(num[i]) << (8*i);
}
Of course make sure that num fits in a long (generally 4 or 8 bytes depending on your system).

Combining two unsigned bytes to a single integer value using left-shift and bitwise-or

I'm reading 2 bytes which together build up an unsigned short value, from 0 to 65536. I want to combine them to a single value, so here what I've done:
int32_t temp;
uint8_t buffer[2];
.............
temp = (buffer[1] << 8) /* [MSByte]*/| (buffer[0]/* [LSByte]*/);
printf (" %d" ,temp) ;
I still get an overflow at 32767. Any idea why?
Cast byte to int before shifting, i.e.:
((int32_t)buffer[1] << 8) | buffer[0]
P.S. 2 bytes can store an unsigned integer value in range of [0, 65535]; the value of 65536 you've mentioned is out of range.
Complete test program — try different byte values in buffer:
#include <stdio.h>
#include <stdint.h>
int main()
{
//uint8_t buffer[2] = {255, 0 }; // prints 255
//uint8_t buffer[2] = {255, 127}; // prints 32767
uint8_t buffer[2] = {255, 255}; // prints 65535
int32_t temp = ((int32_t)buffer[1] << 8) | buffer[0];
printf("%d", temp);
}

simple bit manipulation fails

I am learning bit manipulation in C and I have written a simple program. However the program fails. Can someone please look into this code?
Basically I want to split a 4-byte 'long' variable into its individual bytes and reassemble it again. Here is my code:
printf("sizeof char= %d\n", sizeof(char));
printf("sizeof unsigned char= %d\n", sizeof(unsigned char));
printf("sizeof int= %d\n", sizeof(int));
printf("sizeof long= %d\n", sizeof(long));
printf("sizeof unsigned long long= %d\n", sizeof(unsigned long long));
long val = 2;
int k = 0;
size_t len = sizeof(val);
printf("val = %ld\n", val);
printf("len = %d\n", len);
char *ptr;
ptr = (char *)malloc(sizeof(len));
//converting 'val' to char array
//val = b3b2b1b0 //where 'b is 1 byte. Since 'long' is made of 4 bytes, and char is 1 byte, extracting byte by byte of long into char
//do{
//val++;
for(k = 0; k<len; k++){
ptr[k] = ((val >> (k*len)) && 0xFF);
printf("ptr[%d] = %02X\n", k,ptr[k]);
}
//}while(val < 12);
//reassembling the bytes from char and converting them to long
long xx = 0;
int m = 0;
for(m = 0; m< len; m++){
xx = xx |(ptr[m]<<(m*8));
}
printf("xx= %ld\n", xx);
Why don't I see xx returning 2?? Also, irrespective of the value of 'val', the ptr[0] seems to store 1 :(
Please help
Thanks in advance
ptr[k] = ((val >> (k*len)) && 0xFF);
Should be
ptr[k] = ((val >> (k*8)) & 0xFF);
&& is the logical AND used in conditional statements; & is the bitwise AND.
Also, as you're splitting the value up into chars, on each iteration of the loop you want to shift by as many bits as there are in a byte. This is almost always 8 but can be something else. The header file limits.h provides that number as CHAR_BIT.
A few things I notice:
You're using the boolean && operator instead of bitwise &
You're shifting by "k*len" instead of "k*8"
You're allocating an array with "sizeof(len)", instead of just "len"
You're using "char" instead of "unsigned char". This will make the "(ptr[m]<<(m*8))" expression sometimes give you a negative number.
So a fixed version of your code would be:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    // sizeof yields a size_t, which is printed with %zu
    printf("sizeof char= %zu\n", sizeof(char));
    printf("sizeof unsigned char= %zu\n", sizeof(unsigned char));
    printf("sizeof int= %zu\n", sizeof(int));
    printf("sizeof long= %zu\n", sizeof(long));
    printf("sizeof unsigned long long= %zu\n", sizeof(unsigned long long));
    long val = 2;
    size_t len = sizeof(val);
    printf("val = %ld\n", val);
    printf("len = %zu\n", len);
    unsigned char *ptr = malloc(len);
    if (ptr == NULL) {
        return 1;
    }
    // converting 'val' to an unsigned char array, least significant byte first
    for (size_t k = 0; k < len; k++) {
        ptr[k] = (unsigned char)((val >> (k * 8)) & 0xFF);
        printf("ptr[%zu] = %02X\n", k, ptr[k]);
    }
    // reassembling the bytes and converting them back to long
    unsigned long xx = 0;
    for (size_t m = 0; m < len; m++) {
        // widen before shifting so bytes above the width of int are not lost
        xx |= (unsigned long)ptr[m] << (m * 8);
    }
    printf("xx= %ld\n", (long)xx);
    free(ptr);
    return 0;
}
Also, in the future, questions like this would be better suited to https://codereview.stackexchange.com/
As others have by now mentioned, I'm not sure if ptr[k] = ((val >> (k*len)) && 0xFF); does what you want it to. The && operator is a boolean operator. If (val >> (k*len)) is some non-zero value, and 0xFF is some non-zero value, then the value stored into ptr[k] will be one. That's the way boolean operators work. Perhaps you meant to use & instead of &&.
Additionally, you've chosen to use shift operators, which is appropriate for unsigned types, but has a variety of non-portable aspects for signed types. xx = xx |(ptr[m]<<(m*8)); potentially invokes undefined behaviour, for example, because it looks like it could result in signed integer overflow.
In C, sizeof (char) is always 1, because the sizeof operator tells you how many chars are used to represent a type. eg. sizeof (int) tells you how many chars are used to represent ints. It's CHAR_BIT that changes. Thus, your code shouldn't rely upon the sizeof a type.
In fact, if you want your code to be portable, then you shouldn't be expecting to be able to store values greater than 32767 or less than -32767 in an int, for example. This is regardless of size, because padding bits might exist. To summarise: the sizeof a type doesn't necessarily reflect the set of values it can store!
Choose the types of your variables for their application, portably. If your application doesn't need values beyond that range, then int will do fine. Otherwise, you might want to think about using a long int, which can store values between (and including) -2147483647 and 2147483647, portably. If you need values beyond that, use a long long int, which will give you the guaranteed range consisting of at least the values between -9223372036854775807 and 9223372036854775807. Anything beyond that probably deserves a multi-precision arithmetic library such as GMP.
When you don't expect to use negative values, you should use unsigned types.
With consideration given to your portable choice of integer type, it now makes sense that you can devise a portable way to write those integers into files, and read those integers from files. You'll want to extract the sign and absolute value into unsigned int:
unsigned int sign = val < 0; /* conventionally 1 for negative, 0 for positive */
unsigned int abs_val = val;
if (val < 0) { abs_val = -abs_val; }
... and then construct an array of 8-bit chunks of abs_val and sign, merged together. We've already decided using portable decision-making that our int can only store 16 bits, because we're only ever storing values between -32767 and 32767 in it. As a result, there is no need for a loop, or bitwise shifts. We can use multiplication to move our sign bit, and division/modulo to reduce our absolute value. Consider that the sign conventionally goes with the most significant bit, which is either at the start (big endian) or the end (little endian) of our array.
unsigned char big_endian[] = { sign * 0x80 + abs_val / 0x100,
abs_val % 0x100 };
unsigned char lil_endian[] = { abs_val % 0x100,
sign * 0x80 + abs_val / 0x100 };
To reverse this process, we apply the inverse operations in the opposite order: division and modulo extract the sign bit and the high bits of the magnitude, and multiplication and addition rebuild the value:
unsigned int big_endian_sign = array[0] / 0x80;
int big_endian_val = big_endian_sign
? -((array[0] % 0x80) * 0x100 + array[1])
: ((array[0] % 0x80) * 0x100 + array[1]);
unsigned int lil_endian_sign = array[1] / 0x80;
int lil_endian_val = lil_endian_sign
? -((array[1] % 0x80) * 0x100 + array[0])
: ((array[1] % 0x80) * 0x100 + array[0]);
The code gets a little more complex for long, and it becomes worthwhile to use binary operators. The extraction of sign and absolute value remains essentially the same, with the only changes being the type of the variables. We still don't need loops, because we made a decision that we only care about values representable portably. Here's how I'd convert from a long val to an unsigned char[4]:
unsigned long sign = val < 0; /* conventionally 1 for negative, 0 for positive */
unsigned long abs_val = val;
if (val < 0) { abs_val = -abs_val; }
unsigned char big_endian[] = { (sign << 7) | ((abs_val >> 24) & 0xFF),
(abs_val >> 16) & 0xFF,
(abs_val >> 8) & 0xFF,
abs_val & 0xFF };
unsigned char lil_endian[] = { abs_val & 0xFF,
(abs_val >> 8) & 0xFF,
(abs_val >> 16) & 0xFF,
(sign << 7) | ((abs_val >> 24) & 0xFF) };
... and here's how I'd convert back to the signed value:
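/* The whole sum is parenthesized so that the unary minus negates all four terms;
   the casts to long keep the shifts valid even where int is only 16 bits wide. */
unsigned int big_endian_sign = array[0] >> 7;
long big_endian_val = big_endian_sign
? -(((long)(array[0] & 0x7F) << 24) + ((long)array[1] << 16) + ((long)array[2] << 8) + array[3])
: (((long)(array[0] & 0x7F) << 24) + ((long)array[1] << 16) + ((long)array[2] << 8) + array[3]);
unsigned int lil_endian_sign = array[3] >> 7;
long lil_endian_val = lil_endian_sign
? -(((long)(array[3] & 0x7F) << 24) + ((long)array[2] << 16) + ((long)array[1] << 8) + array[0])
: (((long)(array[3] & 0x7F) << 24) + ((long)array[2] << 16) + ((long)array[1] << 8) + array[0]);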
I'll leave you to devise a scheme for unsigned and long long types... and open up the floor for comments:

Array to Array arithmetic

I have 24-bit data in a 3-element array a[0], a[1], a[2] and need to multiply and divide it by some constants, with the result again stored in a 3-element array.
For example, data = 999900h is stored as a[0] = 99, a[1] = 99, a[2] = 00
[(999900h/64)*15000]/157286 << **process???**
The result will be 3A97h, stored as b[0] = 00, b[1] = 3A, b[2] = 97
My questions are:
1.) How do I write fast code for this process? Are pointers fast, and how would I use a pointer here?
2.) Is it possible to avoid a conversion step such as array to integer and integer to array?
Here's the easiest "solution":
uint32_t data = 0x00999900;
unsigned char const * a = (unsigned char const *)&data;
Now you have a[0], ..., a[3]. The order depends on the endianness of your system.
The endianness-independent solution works algebraically:
uint32_t data = 0x3A97;
unsigned char b[sizeof data] = { data >> 24 & 0xFF, // b[0]
(data >> 16) & 0xFF, // b[1]
(data >> 8) & 0xFF, // b[2]
data & 0xFF // b[3]
};
You can also reconstitute a value from your array. Here's the endianness-dependent way:
uint32_t data;
unsigned char * p = (unsigned char *)&data;
p[0] = 0x00;
p[1] = 0x99;
p[2] = 0x99;
p[3] = 0x00;
// now "data" is 0x00999900
And here's the algebraic way:
uint32_t data = (uint32_t)a[0] * 256 * 256 * 256 + (uint32_t)a[1] * 256 * 256 + (uint32_t)a[2] * 256 + a[3];
I like to use unions in this case:
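#include <stdint.h>
#include <stdio.h>
union array_int {
    unsigned char a[4]; // unsigned, so that 0x99 fits in each element
    uint32_t num;
} data = {.a = {0x00, 0x99, 0x99, 0x00}};
int main(void) {
    // this byte pattern reads the same in both byte orders
    printf("%lx\n", (unsigned long)data.num);
    return 0;
}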
Please take endianness into account. Use htonl if you put in your bytes from most significant to least significant but are on a little-endian system. If you don't want to mess around with endianness, then I suggest you use one of the algebraic approaches suggested above.

Resources