C how to read middle bits?

I have the following which I can't change:
unsigned long addr=142;
u16 offset_low, offset_middle;
u32 offset_high;
I want to set offset_low for low 16 bits, offset_middle for mid 16 bits and offset_high for higher 32 bits of addr.
So I wrote:
offset_low = addr & 0xFFFF;
offset_middle = addr & 0xFFFF0000;
offset_high = addr & 0xFFFFFFFF0000;
Is this right? Is there any clearer way to do it instead of writing so many Fs?
Why I think it's not right?
I am working with little endian, so when doing addr & 0xFFFF0000; I will get the mid bits but with zeros and it may load the zeros instead of non-zeroes.

For your purpose you must shift the masked values:
unsigned long addr = 142; // 64-bit on the target system
uint16_t offset_low = addr & 0xFFFF;
uint16_t offset_middle = (addr & 0xFFFF0000) >> 16;
uint32_t offset_high = (addr & 0xFFFFFFFF00000000) >> 32;
Note that since you extract exactly 16 and 32 bits to variables with the same size, masking can be omitted:
uint64_t addr = 142;
uint16_t offset_low = addr;
uint16_t offset_middle = addr >> 16;
uint32_t offset_high = addr >> 32;
The order of bytes in memory (little endian vs big endian) is irrelevant for this question. You could use that knowledge to read the specific parts directly from memory, taking the first 2 bytes for offset_low, the next 2 for offset_middle and the next 4 for offset_high on a little-endian machine, but extracting from the full 64-bit value works the same on both architectures.
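For illustration, here is a minimal sketch (a hypothetical main, not part of the question) showing that the shift-based extraction is independent of byte order, whereas reading raw bytes from memory is not; the memcpy line only yields the low 16 bits on a little-endian machine:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint64_t addr = 0x1122334455667788ULL;

    /* Value-based extraction: identical result regardless of endianness. */
    uint16_t low  = (uint16_t)addr;         /* 0x7788 */
    uint16_t mid  = (uint16_t)(addr >> 16); /* 0x5566 */
    uint32_t high = (uint32_t)(addr >> 32); /* 0x11223344 */

    /* Memory-based extraction: the first two bytes of addr hold the low
       16 bits only on a little-endian machine. */
    uint16_t low_from_memory;
    memcpy(&low_from_memory, &addr, sizeof low_from_memory);

    printf("low=%04" PRIx16 " mid=%04" PRIx16 " high=%08" PRIx32
           " low_from_memory=%04" PRIx16 "\n",
           low, mid, high, low_from_memory);
    return 0;
}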

Shifting 1 left by the desired number of bits and then subtracting 1 gives a mask of that many consecutive 1 bits, unless the mask would need to include the most significant bit of the integer type (in which case the shift itself would overflow).
Assuming that unsigned long in the environment is at least 33 bits wide, it can be written like this:
offset_low = addr & ((1UL << 16) - 1);
offset_middle = (addr >> 16) & ((1UL << 16) - 1);
offset_high = (addr >> 32) & ((1UL << 32) - 1);
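If you would rather not count F digits at all, you can give the mask a name with a small helper macro. This is only a sketch (MASK_BITS and split_addr are made-up names), assuming a 64-bit unsigned long as in the question's environment:
#include <stdint.h>

/* Mask of the n lowest bits; 1ULL keeps the shift defined for n up to 63. */
#define MASK_BITS(n) ((1ULL << (n)) - 1)

void split_addr(unsigned long addr,  /* assumed to be 64 bits wide here */
                uint16_t *low, uint16_t *middle, uint32_t *high)
{
    *low    = addr & MASK_BITS(16);
    *middle = (addr >> 16) & MASK_BITS(16);
    *high   = (addr >> 32) & MASK_BITS(32);
}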

Is this right?
Not quite, these would be correct:
offset_low = addr & 0xFFFF;
offset_middle = (addr >> 16) & 0xFFFF;
offset_high = addr >> 32;
You didn't shift your results to the right (and your high was just wrong).
I am working with little endian, so when doing addr & 0xFFFF0000; I will get the mid bits but with zeros and it may load the zeros instead of non-zeroes.
The endianness doesn't matter in code, because code runs on the same machine. It only matters during serialization, where you write to a stream on one machine and read from another machine of another endianness, thus getting garbage.
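When you do serialize, the portable approach is to write the bytes in one fixed, documented order using shifts, so the stream looks the same no matter which machine produced it. A small sketch (put_u32_le is a made-up helper name):
#include <stdint.h>

/* Write a 32-bit value into a buffer in little-endian byte order,
   regardless of the host's endianness. */
static void put_u32_le(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
    out[3] = (uint8_t)((v >> 24) & 0xFF);
}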

Related

Copy low-order bytes of an integer whilst preserving endianness

I need to write a function that copies the specified number of low-order bytes of a given integer into an address in memory, whilst preserving their order.
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val)
I expect the usage to look like this:
uint8 dest[3];
lo_bytes(dest, 3, 0x44332211);
// Big-endian: dest = 33 22 11
// Little-endian: dest = 11 22 33
I've tried to implement the function using bit-shifts, memcpy, and iterating over each byte of val with a for-loop, but all of my attempts failed to work on either one or the other endianness.
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
I've tried to implement the function using bit-shifts, memcpy, and iterating over each byte of val with a for-loop, but all of my attempts failed to work on either one or the other endianness.
All arithmetic, including bitwise arithmetic, is defined in terms of the values of the operands, not their representations. This cannot be sufficient for you because you want to obtain a result that differs depending on details of the representation style for type uint32_t.
You can operate on object representations via various approaches, but you still need to know which bytes to operate upon. That calls for some form of detection. If big-endian and little-endian are the only byte orders you're concerned with supporting, then I favor an approach similar to that given in #P__J__'s answer:
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val) {
    static const union { uint32_t i; uint8_t a[4]; } ubytes = { 1 };
    memcpy(dest, (uint8_t *) &val + (1 - ubytes.a[0]) * (4 - no_bytes), no_bytes);
}
The expression (1 - ubytes.a[0]) evaluates to 1 if the representation of uint32_t is big endian, in which case the high-order bytes occur at the beginning of the representation of val. In that case, we want to skip the first 4 - no_bytes of the representation and copy the rest. If uint32_t has a little-endian representation, on the other hand, (1 - ubytes.a[0]) will evaluate to 0, with the result that the memcpy starts at the beginning of the representation. In every case, whichever bytes are copied from the representation of val, their order is maintained. That's what memcpy() does.
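A quick usage sketch matching the example from the question (the main function here is hypothetical):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* lo_bytes as defined above */

int main(void) {
    uint8_t dest[3];
    lo_bytes(dest, 3, 0x44332211);
    /* Prints "33 22 11" on a big-endian host and "11 22 33" on a
       little-endian host, matching the expected output. */
    printf("%02x %02x %02x\n", dest[0], dest[1], dest[2]);
    return 0;
}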
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
No, that doesn't even make sense. Anything that cares about a specific characteristic of a platform (e.g. endianness) can't be platform independent.
Example 1 (platform independent):
// Copy the 3 least significant bytes to dest[]
dest[0] = value & 0xFF; dest[1] = (value >> 8) & 0xFF; dest[2] = (value >> 16) & 0xFF;
Example 2 (platform independent):
// Copy the 3 most significant bytes to dest[]
dest[0] = (value >> 8) & 0xFF; dest[1] = (value >> 16) & 0xFF; dest[2] = (value >> 24) & 0xFF;
Example 3 (platform dependent):
// I want the least significant bytes on some platforms and the most significant bytes on other platforms
#ifdef PLATFORM_TYPE_A
dest[0] = value & 0xFF; dest[1] = (value >> 8) & 0xFF; dest[2] = (value >> 16) & 0xFF;
#endif
#ifdef PLATFORM_TYPE_B
dest[0] = (value >> 8) & 0xFF; dest[1] = (value >> 16) & 0xFF; dest[2] = (value >> 24) & 0xFF;
#endif
Note that it makes no real difference what the cause of the platform dependence is (if it's endianness or something else), as soon as you have a platform dependence you can't have platform independence.
int detect_endianess(void) // 1 if little endian, 0 if big endian
{
    union
    {
        uint16_t u16;
        uint8_t u8[2];
    } val = {.u16 = 0x1122};
    return val.u8[0] == 0x22;
}

void lo_bytes(void *dest, uint8_t no_bytes, uint32_t val)
{
    if(detect_endianess())
    {
        memcpy(dest, &val, no_bytes);
    }
    else
    {
        memcpy(dest, (uint8_t *)(&val) + sizeof(val) - no_bytes, no_bytes);
    }
}

C - Increment 18 bits in C 8051

I have been programming the 8051 for about two months now and am somewhat of a newbie to the C language. I am currently working with flash memory in order to read, write, erase, and analyze it. I am working on the write phase at the moment, and one of the tasks that I need to do is specify an address location, fill that location with data, then increment to the next location and fill it with complementary data, and so on until I reach the end.
My dilemma is that I have 18 address bits to play with and currently have three bytes allocated for those 18 bits. Is there any way that I could combine those 18 bits into an int or unsigned int and increment like that? Or is my only option to increment the first byte, then when that byte rolls over to 0x00 increment the next byte, and when that one rolls over, increment the next?
I currently have:
void inc_address(void)
{
    P6=address_byte1;
    P7=address_byte2;
    P2=address_byte3;
    P5=data_byte;
    while(1)
    {
        P6++;
        if(P6==0x00){P7++;}
        else if(P7==0x00){P2++;}
        else if(P2 < 0x94){break;} // hex 9 is for values dealing with flash chip
        P5=~data_byte;
    }
}
Where address is uint32_t:
void inc_address(void)
{
    // Increment address
    address = (address + 1) & 0x0003ffff ;

    // Assert address A0 to A15
    P6 = (address & 0xff) ;
    P7 = (address >> 8) & 0xff ;

    // Set least significant two bits of P2 to A16,A17
    // without modifying other bits in P2
    P2 &= 0xFC ;                    // xxxxxx00
    P2 |= (address >> 16) & 0x03 ;  // xxxxxxAA

    // Set data
    P5 = ~data_byte ;
}
However, it is not clear why the function is called inc_address yet also assigns P5 with ~data_byte, which presumably drives the data bus. It seems to do more than just increment an address, so it is poorly and confusingly named. I also suggest that the function take the address and data as parameters rather than using global data, for example as sketched below.
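A minimal sketch of that parameterised version (write_next is a made-up name; it assumes the same port assignments as above and that the 8051 SFRs P2, P5, P6 and P7 are declared by the compiler's header):
#include <stdint.h>

// Hypothetical sketch: same logic as above, but taking the address and data
// as parameters instead of using globals.
void write_next(uint32_t *address, uint8_t data)
{
    *address = (*address + 1) & 0x0003ffffUL ;  // wrap at 18 bits

    P6 = *address & 0xff ;                      // A0..A7
    P7 = (*address >> 8) & 0xff ;               // A8..A15

    P2 &= 0xFC ;                                // clear the A16,A17 bits
    P2 |= (*address >> 16) & 0x03 ;             // set A16,A17

    P5 = data ;                                 // drive the data bus
}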
Is there any way that I could combine those 18 bits into an int or unsigned int and increment like that?
Sure. Supposing that int and unsigned int are at least 18 bits wide on your system, you can do this:
unsigned int next_address = (hi_byte << 16) + (mid_byte << 8) + low_byte + 1;
hi_byte = next_address >> 16;
mid_byte = (next_address >> 8) & 0xff;
low_byte = next_address & 0xff;
The << and >> are bitwise shift operators, and the binary & is the bitwise "and" operator.
It would be a bit safer and more portable to not make assumptions about the sizes of your types, however. To avoid that, include stdint.h, and use type uint_least32_t instead of unsigned int:
uint_least32_t next_address = ((uint_least32_t) hi_byte << 16)
+ ((uint_least32_t) mid_byte << 8)
+ (uint_least32_t) low_byte
+ 1;
// ...

What is the mathematical meaning of swap high part and low part of a uint64_t?

The code is from an open source SHA-256 project:
uint64_t swapE64(uint64_t val) {
    uint64_t x = val;
    x = (x & 0xffffffff00000000) >> 32 | (x & 0x00000000ffffffff) << 32;
    x = (x & 0xffff0000ffff0000) >> 16 | (x & 0x0000ffff0000ffff) << 16;
    x = (x & 0xff00ff00ff00ff00) >> 8 | (x & 0x00ff00ff00ff00ff) << 8;
    return x;
}
The function is not complex, but I don't know its mathematical meaning and usage.
My fault, I didn't ask the question very clearly. Between environments that use different endian representations it is clear: this function keeps the meaning of the data the same. But under the same endian representation, what does it mean?
It absolutely will change the meaning of the data, or is there some other reason to swap it?
In the pseudocode for SHA256 on Wikipedia it says:
Pre-processing: append the bit '1' to the message; append k bits '0', where k is the minimum number >= 0 such that the resulting message length (modulo 512 in bits) is 448; append the length of the message (without the '1' bit or padding), in bits, as a 64-bit big-endian integer (this will make the entire post-processed length a multiple of 512 bits).
x86/x86_64 Linux and Unix are little endian.
It's converting the length of the message to big endian to add it to the end of the message, which it does in the source at L105 of sha256.c, and that section of the code is the only place where the swapE64 function is called:
https://github.com/noryb009/sha256/blob/77a185c837417ea3fc502289215738766a8f8046/sha256.c#L100
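To see the effect concretely (a hypothetical test, not part of the project): the three steps swap the 32-bit halves, then the 16-bit halves, then the bytes, which amounts to reversing all eight bytes, i.e. converting between the little-endian and big-endian representations of the 64-bit length:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* swapE64 as defined above */

int main(void) {
    uint64_t before = 0x0102030405060708ULL;
    uint64_t after  = swapE64(before);
    /* after == 0x0807060504030201: the byte order is fully reversed. */
    printf("%016" PRIx64 " -> %016" PRIx64 "\n", before, after);
    return 0;
}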

what does a[0] = addr & 0xff?

I'm currently learning from the book "The Shellcoder's Handbook". I have a strong understanding of C, but recently I came across a piece of code that I can't grasp.
Here is the piece of code:
char a[4];
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff;
a[1] = (addr & 0xff00) >> 8;
a[2] = (addr & 0xff0000) >> 16;
a[3] = (addr) >> 24;
So the question is: what does this code do? What is addr & 0xff (and the three lines below it), and what does >> 8 do to it (I know that it divides by 2 eight times)?
The variable addr is 32 bits of data, while each element in the array a is 8 bits. What the code does is copy the 32 bits of addr into the array a, one byte at a time.
Let's take this line:
a[1] = (addr & 0xff00) >> 8;
And then do it step by step.
addr & 0xff00: this gets bits 8 to 15 of the value in addr; the result after the operation is 0x0000d300.
>> 8: this shifts the bits to the right, so 0x0000d300 becomes 0x000000d3.
Assign the resulting value of the mask and shift to a[1].
The code is trying to enforce endianness on the data input. Specifically, it is trying to enforce little endian behavior on the data. Here is the explanation:
a[0] = addr & 0xff; /* gets the LSB 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* gets the 2nd LSB 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* gets 2nd MSB 0x06 */
a[3] = (addr) >> 24; /* gets the MSB 0x08 */
So basically, the code is masking and separating out every byte of data and storing it in the array "a" in the little endian format.
unsigned char a[4]; /* I think using unsigned char is better in this case */
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff; /* get the least significant byte 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* get the second least significant byte 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* get the second most significant byte 0x06 */
a[3] = (addr) >> 24; /* get the most significant byte 0x08 */
Apparently, the code isolates the individual bytes from addr to store them in the array a so they can be indexed. The first line
a[0] = addr & 0xff;
masks out the lowest-value byte by using 0xff as a bit mask; the subsequent lines do the same, but in addition shift the result to the rightmost position. Finally, in the last line
a[3] = (addr) >> 24;
no masking is necessary anymore, as all unnecessary information has been discarded by the shift.
The code is effectively storing a 32-bit address in an array of 4 chars. As you may know, a char is one byte (8 bits). It first copies the first byte of the address, then shifts, copies the second byte, then shifts, and so on. You get the gist.
It enforces endianness, and stores the integer in little-endian format in a.
See the illustration on wikipedia.
Also, why not visualize the bit-shifting results:
char a[4];
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff;
a[1] = (addr & 0xff00) >> 8;
a[2] = (addr & 0xff0000) >> 16;
a[3] = (addr) >> 24;
int i = 0;
for( ; i < 4; i++ )
{
    printf( "a[%d] = %02x\t", i, (unsigned char)a[i] );
}
printf("\n" );
Output:
a[0] = b0 a[1] = d3 a[2] = 06 a[3] = 08
In addition to the multiple answers given, the code has some flaws that need to be fixed to make it portable. In particular, the char type is very dangerous to use for storing values, because of its implementation-defined signedness. A very classic C bug. If the code was taken from a book, then you should read that book sceptically.
While we are at it, we can also tidy up the code, make it overly explicit to avoid potential future maintenance bugs, remove some implicit type promotions of integer literals etc.
#include <stdint.h>
uint8_t a[4];
uint32_t addr = 0x0806d3b0UL;
a[0] = addr & 0xFFu;
a[1] = (addr >> 8) & 0xFFu;
a[2] = (addr >> 16) & 0xFFu;
a[3] = (addr >> 24) & 0xFFu;
The masks & 0xFFu are strictly speaking not needed, but they might save you from some false positive compiler warnings about wrong integer types. Alternatively, each shift result could be cast to uint8_t and that would have been fine too.
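For comparison, the cast variant mentioned above would look like this (just a sketch, equivalent in effect):
a[0] = (uint8_t)addr;
a[1] = (uint8_t)(addr >> 8);
a[2] = (uint8_t)(addr >> 16);
a[3] = (uint8_t)(addr >> 24);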

Mask or not mask when converting int to byte array?

Say you have an integer and you want to convert it to a byte array. After searching various places I've seen two ways of doing this: one with shift only, and one with shift then mask. I understand the shifting part, but why masking?
For example, scenario 1:
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = someInt >> 24;
someByteArray[1] = someInt >> 16;
someByteArray[2] = someInt >> 8;
someByteArray[3] = someInt;
Scenario 2:
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = (someInt >> 24) & 0xFF;
someByteArray[1] = (someInt >> 16) & 0xFF;
someByteArray[2] = (someInt >> 8) & 0xFF;
someByteArray[3] = someInt & 0xFF;
Is there a reason for choosing one over the other?
uint8 and uint32 are not standard types in C. I assume they represent 8-bit and 32-bit unsigned integral types, respectively (such as supported by Microsoft compilers as a vendor-specific extension).
Anyways ....
The masking is more general - it ensures the result is between 0 and 0xFF regardless of the actual type of elements someByteArray or of someInt.
In this particular case, it makes no difference, since the conversion of uint32 to uint8 is guaranteed to use modulo arithmetic (modulo 0xFF + 0x01 which is equal to 0x100 or 256 in decimal). However, if your code is changed to use variables or arrays of different types, the masking is necessary to ensure the result is between 0 and 255 (inclusive).
With some compilers the masking stops compiler warnings (it effectively tells the compiler that the expression produces a value between 0 and 0xFF, which can be stored in a 8 bit unsigned). However, some other compilers complain about the act of converting a larger type to an 8 bit type. Because of that, you will sometimes see a third variant, which truly demonstrates a "belts and suspenders" mindset.
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = (uint8)((someInt >> 24) & 0xFF);
someByteArray[1] = (uint8)((someInt >> 16) & 0xFF);
someByteArray[2] = (uint8)((someInt >> 8) & 0xFF);
someByteArray[3] = (uint8)(someInt & 0xFF);
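To see a case where the mask genuinely matters (a hypothetical variation, not from the question): if the destination elements are wider than 8 bits, there is no implicit modulo-256 truncation, and only the mask keeps each element in the 0 to 0xFF range:
#include <stdint.h>

uint32_t someInt = 0x44332211;
uint16_t wideArray[4];   /* wider than a byte, so assignment does not truncate */

void split_wide(void)
{
    wideArray[0] = (someInt >> 24) & 0xFF; /* 0x0044 */
    wideArray[1] = (someInt >> 16) & 0xFF; /* 0x0033; without the mask: 0x4433 */
    wideArray[2] = (someInt >> 8)  & 0xFF; /* 0x0022; without the mask: 0x3322 */
    wideArray[3] =  someInt        & 0xFF; /* 0x0011; without the mask: 0x2211 */
}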
