Small bit operation problem with unsigned int in combination with unsigned char - C

Hi, I have a small conceptual problem regarding bit operations.
See the code below, where I have a 4-byte unsigned int. I access its individual bytes by pointing an unsigned char pointer at its address.
I then set the value of the last byte to one and perform a right shift on the unsigned int (the 4-byte variable). I do not understand why this operation apparently changes the content of the third byte.
See code below along with the output when I run it
#include <cstdio>

int main(int argc,char **argv){
    fprintf(stderr,"sizeof(unsigned int): %lu sizeof(unsigned char):%lu\n",sizeof(unsigned int),sizeof(unsigned char));
    unsigned int val=0;
    unsigned char *valc =(unsigned char*) &val;
    valc[3] = 1;
    fprintf(stderr,"uint: %u, uchars: %u %u %u %u\n",val,valc[0],valc[1],valc[2],valc[3]);
    val = val >>1;
    fprintf(stderr,"uint: %u, uchars: %u %u %u %u\n",val,valc[0],valc[1],valc[2],valc[3]);
    return 0;
}
sizeof(unsigned int): 4 sizeof(unsigned char):1
uint: 16777216, uchars: 0 0 0 1
uint: 8388608, uchars: 0 0 128 0
Thanks in advance

You've discovered that your computer doesn't always store the bytes for multi-byte data types in the order you happen to expect. valc[0] is the least significant byte (LSB) on your system. Since the LSB is stored at the lowest memory address, it is known as a "little-endian" system. At the other end, valc[3] is the most significant byte (MSB).
Your output will make more sense to you if you print valc[3],valc[2],valc[1],valc[0] instead, since humans expect the most significant values to be on the left.
Other computer architectures are "big-endian" and will store the most significant byte first.
This article also explains this concept in way more detail:
https://en.wikipedia.org/wiki/Endianness
The book "The Practice of Programming" by Brian Kernighan and Rob Pike also contains some good coverage on byte order (Section 8.6 Byte Order) and how to write portable programs that work on both big-endian and little-endian systems.

If we change the output of the int to hex (i.e. change %u to %x), what happens becomes more apparent:
uint: 1000000, uchars: 0 0 0 1
uint: 800000, uchars: 0 0 128 0
The value of val is shifted right by 1: 0x01000000 becomes 0x00800000. The single set bit moves from the low bit of the highest-order byte (valc[3]) into the high bit of the next byte down (valc[2]), which is why that byte now reads 128 (0x80).

Related

How to analyze bytes of a variable's value in C

Is it possible to divide, for example, an integer into n bits?
For example, since an int variable has a size of 32 bits (4 bytes), is it possible to divide the number into 4 "pieces" of 8 bits and put them into 4 other variables that have a size of 8 bits?
I solved it using an unsigned char * pointer pointing to the variable whose bytes I want to analyze, something like this:
int x = 10;
unsigned char *p = (unsigned char *) &x;
//Since my cpu is little endian I'll print bytes from the end
for(int i = sizeof(int) - 1; i >= 0; i--)
    //print hexadecimal bytes
    printf("%.2x ", p[i]);
Yes, of course it is. But generally we just use bit operations directly on the value (often called bitops), using the bitwise operators that are defined for all integer types.
For instance, if you need to test the 5th least significant bit you can use x &= 1 << 4, which leaves x with just that bit (if it was set) and all other bits cleared. Then you can use if (x) to test whether it was set; a condition in C doesn't require a boolean type, it simply treats zero as false and any other value as true. If you store 1 << 4 in a constant then you have created a "(bit) mask" for that particular bit.
If you need a value of 0 or 1 then you can shift the other way and use x = (x >> 4) & 1. This is all covered in most C books, so I'd encourage you to read about these bit operations there.
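A minimal sketch of both forms, testing with a mask and extracting a 0/1 value (the variable names are just for illustration):
#include <stdio.h>

int main(void)
{
    unsigned x = 0x35;              /* binary 0011 0101 */
    unsigned mask = 1u << 4;        /* mask for the 5th least significant bit */

    if (x & mask)                   /* non-zero means that bit is set */
        printf("bit 5 is set\n");

    unsigned bit = (x >> 4) & 1u;   /* shift down and mask: yields 0 or 1 */
    printf("bit 5 as 0/1: %u\n", bit);
    return 0;
}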
There are many Q&As here about how to split integers into bytes, see e.g. here. In principle you can store each piece in a char, but if you need integer operations on them then you can also split the int into multiple small integer values. One caveat is that an int is only required to hold values from -32768 to 32767, which means an int can be 2 bytes or larger.
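A portable way to do the split, independent of endianness, is to use shifts and masks; a minimal sketch (assuming a 32-bit unsigned int):
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x12345678u;          /* assumes unsigned int is 32 bits */
    unsigned char b0 = x & 0xFF;           /* least significant byte: 0x78 */
    unsigned char b1 = (x >> 8) & 0xFF;    /* 0x56 */
    unsigned char b2 = (x >> 16) & 0xFF;   /* 0x34 */
    unsigned char b3 = (x >> 24) & 0xFF;   /* most significant byte: 0x12 */
    printf("%.2x %.2x %.2x %.2x\n", b3, b2, b1, b0);
    return 0;
}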
In principle it is also possible to use bit fields but I'd be hesitant to use those. With an int you will at least know that the bits will be stored in the least significant bits.

Usage of uint_8, uint_16 and uint_32

Please explain each case in detail, what is happening under the hood and why I am getting 55551 and -520103681 specifically.
typedef uint_8 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :255
typedef uint_16 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :55551
typedef uint_32 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :-520103681
I am reading from a file whose first four bytes are 255 216 255 244.
In all 3 cases, before the printf statement, the first 4 bytes of the arr array are (in hexadecimal) FF D8 FF E0, which corresponds to 255 216 255 224.
Here are explanations of each case:
arr[0] has uint8_t type, so its single byte is 0xFF; printf("%i", arr[0]); therefore prints 255, the value of 0xFF after promotion to the 4-byte signed int that %i expects.
arr[0] has uint16_t type, so it is made of the two bytes 0xFF 0xD8. Because of little-endianness the byte 0xD8 is the MSB, giving the value 0xD8FF, so printf("%i", arr[0]); prints 55551.
arr[0] has uint32_t type, so it is made of the four bytes 0xFF 0xD8 0xFF 0xE0. Because of little-endianness the byte 0xE0 is the MSB, giving the value 0xE0FFD8FF; printed with %i as a 4-byte signed integer, that bit pattern comes out as -520103681.
Note: I deliberately changed "255 216 255 244" from your post to "255 216 255 224". I think you made a typo.
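For illustration, a small sketch that rebuilds those three values from the four bytes, assuming the little-endian layout described above:
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint8_t bytes[4] = {0xFF, 0xD8, 0xFF, 0xE0};              /* 255 216 255 224 */
    uint16_t v16 = (uint16_t)(bytes[0] | bytes[1] << 8);      /* 0xD8FF = 55551 */
    uint32_t v32 = (uint32_t)bytes[0]       | (uint32_t)bytes[1] << 8
                 | (uint32_t)bytes[2] << 16 | (uint32_t)bytes[3] << 24;  /* 0xE0FFD8FF */

    printf("%d\n", bytes[0]);               /* 255 */
    printf("%d\n", v16);                    /* 55551 */
    /* converting the out-of-range unsigned value to int32_t is
       implementation-defined; on typical two's-complement systems: -520103681 */
    printf("%" PRId32 "\n", (int32_t)v32);
    return 0;
}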
You seem to be mixing up a few things. Three out of four lines in your examples are questionable, so let's dissect them.
typedef uint_8 BYTE; — why do you have this typedef, and where does uint_8 come from? I suggest you just use uint8_t from stdint.h, and for a minimal example you could skip the typedef to BYTE completely.
The documentation of fread tells us that:
the second parameter is the size of each element to be read
the third parameter is the number of elements to be read
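So a call that reads the 512 bytes as 512 one-byte elements could look like the sketch below ("file.bin" is a placeholder name and error handling is minimal); for a byte array it reads the same data as the fread(arr, 512, 1, infile) in the question:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t arr[512];
    FILE *infile = fopen("file.bin", "rb");     /* placeholder file name */
    if (infile == NULL)
        return 1;
    /* second parameter: size of each element (1 byte); third: number of elements */
    size_t got = fread(arr, sizeof arr[0], 512, infile);
    printf("read %zu elements, first byte: %u\n", got, (unsigned)arr[0]);
    fclose(infile);
    return 0;
}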
There are other ways to get the values into memory, and to make the program reproducible by copy and paste we can simply write the corresponding values into memory with an initializer. If you have problems with fread, that would be a different question.
So it would be one of these lines (your last value has to be 224, not 244, to get -520103681):
uint8_t arr[512] ={0xFF, 0xD8, 0xFF, 0xE0}; //{255, 216, 255, 224}
uint16_t arr[512] = {0xD8FF, 0xE0FF}; //{216<<8 + 255, 224<<8 + 255} reversed because of the endianness
uint32_t arr[512] = {0xE0FFD8FF}; //{224<<24 + 255<<16 + 216<<8 + 255}
Now you can see that the arrays have different sizes, and 16/32-bit elements hardly qualify as a BYTE.
In the last line you use printf() wrongly. If you look up the output and length specifiers for printf() you can see that i is used for int (which is probably 32 bits).
Basically you tell it to read arr[0] (of whatever type) as a signed int.
This results in the values you see above. The (nearly) correct specifiers would be %hhu for unsigned char, %hu for unsigned short and %u for unsigned int.
But since you use fixed-size types, it would be better to include inttypes.h and use the specifiers
PRIu8, PRIu16 and PRIu32 accordingly, like this:
printf("%"PRIu8,arr[0]);
putting them all together yields:
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t arr[512] = {0xE0FFD8FF};
    printf("%"PRIu32, arr[0]);
    return 0;
}
If you eliminate all these problems in the code, we get closer to the actual issue.
Problem 1 might be that you forgot about endianness, so you may have expected the bytes in a different order.
Also, if you use a signed specifier for printf and the MSB is 1, you get a negative value; that won't happen if you use the correct specifier for unsigned values.

32-bits as hex number in C with simple function call

I would like to know if I could get away with using printf to print 32 bits of incoming binary data from a microcontroller as a hexadecimal number. I have already collected the bits into a large integer variable and I'm trying the "%x" option in printf, but all I seem to get are 8-bit values, although I can't tell whether that's a limitation of printf or whether my microcontroller is actually returning that value.
Here's my code to receive data from the microcontroller:
printf("Receiving...\n");
unsigned int n=0,b=0;
unsigned long lnum=0;
b=iolpt(1); //call to tell micro we want to read 32 bits
for (n=0;n<32;n++){
    b=iolpt(1); //read bit one at a time
    printf("Bit %d of 32 = %d\n",n,b);
    lnum<<1; //shift bits in our big number left by 1 position
    lnum+=b; //and add new value
}
printf("\n Data returned: %x\n",lnum); //always returns 8-bits
The iolpt() function always returns the bit read from the microcontroller and the value returned is a 0 or 1.
Is my idea of using %x acceptable for a 32-bit hexadecimal number, or should I attempt something like "%lx" instead of "%x" to print a long in hex, even though I can't find it documented anywhere? Or is printf simply the wrong function for 32-bit hex? If it is the wrong function, is there one that works, or am I forced to break up my long number into four 8-bit numbers first?
printf("Receiving...\n");
iolpt(1); // Tell micro we want to read 32 bits.
/* Is this correct? It looks pretty simple to be
initiating a read. It is the same as the calls
below, iolpt(1), so what makes it different?
Just because it is first?
*/
unsigned long lnum = 0;
for (unsigned n = 0; n < 32; n++)
{
unsigned b = iolpt(1); // Read bits one at a time.
printf("Bit %u of 32 = %u.\n", n, b);
lnum <<= 1; // Shift bits in our big number left by 1 position.
// Note this was changed to "lnum <<= 1" from "lnum << 1".
lnum += b; // And add new value.
}
printf("\n Data returned: %08lx\n", lnum);
/* Use:
0 to request leading zeros (instead of the default spaces).
8 to request a field width of 8.
l to specify long.
x to specify unsigned and hexadecimal.
*/
Fixed:
lnum<<1; to lnum <<= 1;.
%x in final printf to %08lx.
%d in printf in loop to %u, in two places.
Also, cleaned up:
Removed b= in initial b=iolpt(1); since it is unused.
Moved definition of b inside loop to limit its scope.
Moved definition of n into for to limit its scope.
Used proper capitalization and punctuation in comments to improve clarity and aesthetics.
Would something like this work for you?
printf("Receiving...\n");
unsigned int n=0,b=0;
unsigned long lnum=0;
b=iolpt(1); //call to tell micro we want to read 32 bits
for (n=0;n<32;n++){
    b=iolpt(1); //read bit one at a time
    printf("Bit %d of 32 = %d\n",n,b);
    lnum <<= 1; //shift bits in our big number left by 1 position
    lnum+=b; //and add new value
}
printf("\n Data returned: %#010lx\n",lnum); //now prints the full 32-bit value

Output of the following C code

What will be the output of the following C code, assuming it runs on a little-endian machine where short int takes 2 bytes and char takes 1 byte?
#include <stdio.h>

int main() {
    short int c[5];
    int i = 0;
    for(i = 0; i < 5; i++)
        c[i] = 400 + i;
    char *b = (char *)c;
    printf("%d", *(b+8));
    return 0;
}
On my machine it gave
-108
I don't know whether my machine is little-endian or big-endian. I found somewhere that it should give
148
as the output, because the low-order 8 bits of 404 (i.e. element c[4]) are 148. But I thought that, because of "%d", it would read 2 bytes from memory starting at the address of c[4].
The code gives different outputs on different computers because on some platforms the char type is signed by default and on others it's unsigned by default. That has nothing to do with endianness. Try this:
char *b = (char *)c;
printf("%d\n", (unsigned char)*(b+8)); // always prints 148
printf("%d\n", (signed char)*(b+8)); // always prints -108 (=-256 +148)
The default value is dependent on the platform and compiler settings. You can control the default behavior with GCC options -fsigned-char and -funsigned-char.
c[4] stores 404. In a two-byte little-endian representation, that means two bytes of 0x94 0x01, or (in decimal) 148 1.
b+8 addresses the memory of c[4]. b is a pointer to char, so the 8 means adding 8 bytes (which is 4 two-byte shorts). In other words, b+8 points to the first byte of c[4], which contains 148.
*(b+8) (which could also be written as b[8]) dereferences the pointer and thus gives you the value 148 as a char. What this does is implementation-defined: On many common platforms char is a signed type (with a range of -128 .. 127), so it can't actually be 148. But if it is an unsigned type (with a range of 0 .. 255), then 148 is fine.
The bit pattern for 148 in binary is 10010100. Interpreting this as a two's complement number gives you -108.
This char value (of either 148 or -108) is then automatically converted to int because it appears in the argument list of a variable-argument function (printf). This doesn't change the value.
Finally, "%d" tells printf to take the int argument and format it as a decimal number.
So, to recap: Assuming you have a machine where
a byte is 8 bits
negative numbers use two's complement
short int is 2 bytes
... then this program will output either -108 (if char is a signed type) or 148 (if char is an unsigned type).
To see what sizes types have on your system (sizeof yields a size_t, so %zu is the matching specifier):
printf("char = %zu\n", sizeof(char));
printf("short = %zu\n", sizeof(short));
printf("int = %zu\n", sizeof(int));
printf("long = %zu\n", sizeof(long));
printf("long long = %zu\n", sizeof(long long));
Change the lines in your program
unsigned char *b = (unsigned char *)c;
printf("%d\n", *(b + 8));
And a simple test (I know it is not guaranteed, but all C compilers I know of do it this way, and I do not care about old CDC or UNISYS machines which had different addresses and pointers for different types of data):
printf(" endianness test: %s\n", (*b + (unsigned)*(b + 1) * 0x100) == 400 ? "little" : "big");
Another remark: this test only works because in your program c[0] == 400.
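A self-contained variant of that test, not tied to the array in the question, is sketched below (it assumes only that short is at least 2 bytes wide):
#include <stdio.h>

int main(void)
{
    unsigned short v = 0x0102;
    unsigned char *p = (unsigned char *)&v;
    /* On a little-endian machine the low-order byte 0x02 is stored first. */
    printf("endianness test: %s\n", p[0] == 0x02 ? "little" : "big");
    return 0;
}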

c, hex representation in big-endian system

What is result of:
int x = 0x00000001;
int y = 0x80000000;
in a big-endian system?
My goal is to define an int that has the first (in memory) bit set, regardless of whether it is the most significant one or the least significant one. I know that with little-endian systems, x would satisfy this requirement, but is it still true in a big-endian system?
I'm pretty sure that the following will work in both systems:
char c[4] = {0x80, 0, 0, 0};
int x = (int) c;
Is that correct? Is there a more elegant method?
(I don't have a big-endian system to experiment on)
What you probably want is this:
int x = 0;
char* p = (char*)&x;
p[0] = 0x01;
The above code will set the least significant bit in the lowest-address byte of an int variable to 1:
On a big-endian processor, it will set the LS-bit in the MS-byte to 1 (i.e., x == 0x01000000).
On a little-endian processor, it will set the LS-bit in the LS-byte to 1 (i.e., x == 0x00000001).
Having said that, what is your definition of "the first bit"? IMHO it is simply the least significant one, in which case, int x = 0x00000001 is the answer regardless of the Endianness of your processor!!
The following terminology might help you to understand a little better:
Set the least significant bit in an 8-bit byte: 0x01
Set the most significant bit in an 8-bit byte: 0x80
Set the least significant byte in a 4-byte integer: 0x000000FF
Set the most significant byte in a 4-byte integer: 0xFF000000
Set the lowest-address byte in a 4-byte integer on a LE processor: 0x000000FF
Set the lowest-address byte in a 4-byte integer on a BE processor: 0xFF000000
Set the highest-address byte in a 4-byte integer on a LE processor: 0xFF000000
Set the highest-address byte in a 4-byte integer on a BE processor: 0x000000FF
You can try unions
union foo
{
    char array[8];
    int64_t num;
};
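A minimal sketch of how such a union might be used to check the byte order (in C, reading a union member other than the one last written simply reinterprets the bytes; in C++ that is not allowed):
#include <stdio.h>
#include <stdint.h>
#include <string.h>

union foo
{
    char array[8];
    int64_t num;
};

int main(void)
{
    union foo u;
    memset(&u, 0, sizeof u);
    u.array[0] = 0x01;   /* set the lowest-address byte */
    /* On a little-endian machine this byte is the LSB, so num == 1;
       on a big-endian machine it would be the MSB of the 8-byte value. */
    printf("num = %lld\n", (long long)u.num);
    printf("%s-endian\n", u.num == 1 ? "little" : "big");
    return 0;
}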
