Reinterpreting memory/pointers - c

Just a quick question concerning the Rust programming language.
Assume you had the following in C:
uint8_t *someblockofdata; /* has a certain length divisible by 4 */
uint32_t *anotherway = (uint32_t *) someblockofdata;
Regardless of the code not being all that useful and rather ugly, how would I go about doing that in Rust? Say you have a &[u8] with a length divisible by 4, how would you "convert" it to a &[u32] and back (preferably avoiding unsafe code as much as possible and retaining as much speed as possible)?
Just to be complete, the case where I would want to do that is an application which reads u8s from a file and then manipulates them.

Reinterpret casting a pointer is defined between pointers to objects of alignment-compatible types, and it may be valid in some implementations, but it's non-portable. For one thing, the result depends on the endianness (byte order) of your data, so you may lose performance anyway through byte-swapping.
First rewrite your C as follows, verify that it does what you expect, and then translate it to Rust.
// If the bytes in the file are little endian (10 32 means 0x3210), do this:
uint32_t value = someblockofdata[0] | (someblockofdata[1] << 8)
    | (someblockofdata[2] << 16) | ((uint32_t)someblockofdata[3] << 24);
// If the bytes in the file are big endian (32 10 means 0x3210), do this:
uint32_t value = someblockofdata[3] | (someblockofdata[2] << 8)
    | (someblockofdata[1] << 16) | ((uint32_t)someblockofdata[0] << 24);
// Middle endian is left as an exercise for the reader.
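Putting it together for the whole buffer, here is a minimal C sketch of the conversion the question describes; the function name is illustrative, and it assumes len is divisible by 4 and out has room for len / 4 values:
#include <stdint.h>
#include <stddef.h>

void decode_le32(const uint8_t *in, size_t len, uint32_t *out)
{
    /* assemble each group of 4 little-endian bytes into one uint32_t */
    for (size_t i = 0; i < len; i += 4) {
        out[i / 4] = (uint32_t)in[i]
                   | ((uint32_t)in[i + 1] << 8)
                   | ((uint32_t)in[i + 2] << 16)
                   | ((uint32_t)in[i + 3] << 24);
    }
}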

Related

Change MSB and leave remaining bits unchanged

The question is as follows:
Write a single line of C-code that sets the four MSB in DDRD to 1011 and leave the rest unchanged.
The best I can do is:
DDRD = (DDRD & 0xF) | (1<<7) | (1<<5) | (1<<4);
or just
DDRD = (DDRD & 0xF) | (0b1011 << 4);
It gets the job done, but it's certainly not the cleanest. Is there a better solution I'm not seeing?
The most readable and conventional form ought to be something like:
#define DDRD7 (1u << 7)
#define DDRD6 (1u << 6)
...
DDRD = (DDRD & 0x0Fu) | DDRD7 | DDRD5 | DDRD4;
Alternatively the bit masks could also be named something application-specific like LED2 or whatever. Naming signals the same way in the firmware as in the schematic is very good practice.
"Magic numbers" should be avoided, except 0xF which is used for bit-masking purposes so it's fine to use and self-documenting. DDRD & (DDRD0|DDRD1|DDRD2|DDRD3) would be less readable.
1<< should generally be avoided, since left-shifting a signed integer (1 has type int) into the sign bit is undefined behavior. Use 1u<<.
Binary constants should be avoided, since they were not standard before C23 and may not be supported. Plus, they are very hard to read when numbers get large. Serious beginner programmers are expected to understand hex before writing their first line of code, so why more experienced programmers would ever need to use binary, I don't know.
Regarding 0xFu vs 0x0Fu, they are identical, but the latter is self-documenting code for "I am aware that I'm dealing with an 8 bit register".
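As a sketch of the application-specific naming suggested above (the LED names here are invented for illustration, matching an imaginary schematic):
#define LED_STATUS (1u << 7)  /* hypothetical signal names, named the */
#define LED_ERROR  (1u << 5)  /* same way as in the schematic */
#define LED_POWER  (1u << 4)

DDRD = (DDRD & 0x0Fu) | LED_STATUS | LED_ERROR | LED_POWER;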

Structure for an array of bits in C

It has come to my attention that there is no builtin structure for a single bit in C. There is (unsigned) char, which is 8 bits (one byte), and larger types such as int, long, uint64_t, bool, and so on.
I came across this while coding up a Huffman tree: the encodings for certain characters were not necessarily exactly 8 bits long (like 00101), so there was no efficient way to store them. I had to fall back on makeshift solutions such as strings or boolean arrays, but these take far more memory.
But anyways, my question is more general: is there a good way to store an array of bits, or some sort of user-defined struct? I scoured the web for one but the smallest structure seems to be 8 bits (one byte). I tried things such as int a : 1 but it didn't work. I read about bit fields but they do not simply achieve exactly what I want to do. I know questions have already been asked about this in C++ and if there is a struct for a single bit, but mostly I want to know specifically what would be the most memory-efficient way to store an encoding such as 00101 in C.
If you're mainly interested in accessing a single bit at a time, you can take an array of unsigned char and treat it as a bit array. For example:
unsigned char array[125];
Assuming 8 bits per byte, this can be treated as an array of 1000 bits. The first 16 bits logically look like this:
----------------------------------------------------------------------------
byte |               0               |                  1                  |
----------------------------------------------------------------------------
bit  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
----------------------------------------------------------------------------
Let's say you want to work with bit b. You can then do the following:
Read bit b:
value = (array[b/8] & (1 << (b%8))) != 0;
Set bit b:
array[b/8] |= (1 << (b%8));
Clear bit b:
array[b/8] &= ~(1 << (b%8));
Dividing the bit number by 8 gets you the relevant byte. Similarly, taking the bit number modulo 8 gives you the relevant bit inside that byte. You then left-shift the value 1 by that bit position to give you the necessary bit mask.
While there is integer division and modulus at work here, the divisor is a power of 2, so any decent compiler should replace them with bit shifting/masking.
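For example, the read above is equivalent to the pure shift/mask form a compiler will typically generate (a sketch using the same array and bit index b):
value = (array[b >> 3] >> (b & 7)) & 1; /* same as (array[b/8] >> (b%8)) & 1 */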
It has come to my attention that there is no builtin structure for a single bit in C.
That is true, and it makes sense, because essentially no machines have bit-addressable memory.
But anyways, my question is more general: is there a good way to store an array of bits, or some sort of user-defined struct?
One generally uses an unsigned char or another unsigned integer type, or an array of such. Along with that you need some masking and shifting to set or read the values of individual bits.
I scoured the web for one but the smallest structure seems to be 8 bits (one byte).
Technically, the smallest addressable storage unit ([[un]signed] char) could be larger than 8 bits, though you're unlikely ever to see that.
I tried things such as int a : 1 but it didn't work. I read about bit fields but they do not simply achieve exactly what I want to do.
Bit fields can appear only as structure members. A structure object containing such a bitfield will still have a size that is a multiple of the size of a char, so that doesn't map very well onto a bit array or any part of one.
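To illustrate that point (a sketch; the struct name is made up):
struct one_bit {
    unsigned int a : 1;  /* a single 1-bit field */
};
/* sizeof(struct one_bit) is still at least 1, typically sizeof(unsigned int) */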
I know questions have already been asked about this in C++ and if there is a struct for a single bit, but mostly I want to know specifically what would be the most memory-efficient way to store an encoding such as 00101 in C.
If you need a bit pattern and a separate bit count -- such as if some of the bits available in the bit-storage object are not actually significant -- then you need a separate datum for the significant-bit count. If you want a data structure for a small but variable number of bits, then you might go with something along these lines:
struct bit_array_small {
    unsigned char bits;
    unsigned char num_bits;
};
Of course, you can make that larger by choosing a different data type for the bits member and, maybe, the num_bits member. I'm sure you can see how you might extend the concept to handling arbitrary-length bit arrays if you should happen to need that.
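For instance, here is a hedged sketch of such an extension; the names bit_array, bits, and num_bits are illustrative, not from the original:
#include <stddef.h>

struct bit_array {
    unsigned char *bits;     /* (num_bits + 7) / 8 bytes of storage */
    size_t         num_bits; /* number of significant bits stored */
};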
If you really want the most memory efficiency, you can encode the Huffman tree itself as a stream of bits. See, for example:
https://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html
Then just encode those bits as an array of bytes, with a possible waste of 7 bits.
But that would be a horrible idea. For the structure in memory to be useful, it must be easy to access. You can still do that very efficiently. Let's say you want to encode up to 12-bit codes. Use a 16-bit integer and bitfields:
struct huffcode {
    uint16_t length : 4,
             value  : 12;
};
C will store this as a single 16-bit value, and allow you to access the length and value fields separately. The complete Huffman node would also contain the input code value, and tree pointers (which, if you want further compactness, can be integer indices into an array).
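As a hedged sketch of what such a node might look like (the field names, and the use of int16_t indices in place of pointers, are illustrative):
#include <stdint.h>

struct huffnode {
    struct huffcode code;   /* packed length + value, as defined above */
    unsigned char   symbol; /* the input byte this code encodes */
    int16_t         left;   /* index of left child in a node array, -1 if none */
    int16_t         right;  /* index of right child, -1 if none */
};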
You can make your own bit array in no time.
#define ba_set(ptr, bit)   { (ptr)[(bit) >> 3] |= (unsigned char)(1u << ((bit) & 7)); }
#define ba_clear(ptr, bit) { (ptr)[(bit) >> 3] &= (unsigned char)~(1u << ((bit) & 7)); }
#define ba_get(ptr, bit)   ( ((ptr)[(bit) >> 3] & (1u << ((bit) & 7))) ? 1 : 0 )
#define ba_setbit(ptr, bit, value) { if (value) { ba_set((ptr), (bit)) } else { ba_clear((ptr), (bit)); } }
#include <string.h>

#define BITARRAY_BITS (120)

int main(void)
{
    char mybits[(BITARRAY_BITS + 7) / 8];
    memset(mybits, 0, sizeof(mybits));

    ba_setbit(mybits, 33, 1);
    if (!ba_get(mybits, 33))
        return 1;
    return 0;
}

C code that reads a 4-byte little-endian number from a buffer

I encountered this piece of existing C code and I am struggling to understand it.
It supposedly reads a 4-byte unsigned value, passed in a buffer (in little-endian format), into a variable of type "long".
This code runs on a 64-bit word size, little-endian x86 machine, where sizeof(long) is 8 bytes.
My guess is that this code is also intended to run on a 32-bit x86 machine, which is why a variable of type long is used instead of int to store the four bytes of input data.
I have some doubts and have put comments in the code to express what I understand, or what I don't :-)
Please answer the questions below in that context.
void read_Value_From_Four_Byte_Buff(char *input)
{
    /* use long so that on a 32 bit machine it can still accommodate 4 bytes */
    long intValueOfInput;

    /* Bitwise and of input buffer's byte 0 with 0xFF gives MSB or LSB? */
    /* This code seems to assume that assignment will store in the rightmost byte - is that true on an x86 machine? */
    intValueOfInput = 0xFF & input[0];

    /* left shift byte 1 eight times; bitwise "or" places it in the 2nd byte from the right */
    intValueOfInput |= ((0xFF & input[1]) << 8);

    /* similar left shifts in multiples of 8 and bitwise "or" for the next two bytes */
    intValueOfInput |= ((0xFF & input[2]) << 16);
    intValueOfInput |= ((0xFF & input[3]) << 24);
}
My questions
1) The input buffer is expected to be in "little endian". But from the code it looks like the assumption is that byte 0 = MSB and byte 3 = LSB. I thought so because the code reads bytes starting from byte 0, and subsequent bytes (1 onwards) are placed in the target variable after left shifting. Is that how it is, or am I getting it wrong?
2) I feel this is a convoluted way of doing things - is there a simpler alternative to copy the value from a 4-byte buffer into a long variable?
3) Does the assumption that this code will run on a 64-bit machine have any bearing on how easily I can do this alternatively? I mean, is all this trouble to keep it agnostic to word size (I assume it is agnostic to word size now, though I'm not sure)?
Thanks for your enlightenment :-)
You have it backwards. When you left shift, you're putting bits into more significant positions. So ((0xFF & input[3]) << 24) puts byte 3 into the MSB.
This is the way to do it in standard C. POSIX has the function ntohl(), which converts from network byte order (big-endian) to a native 32-bit integer, so that is usually used in Unix/Linux applications when the data is big-endian.
This will not work exactly the same on a 64-bit machine unless you cast to an unsigned type before shifting. As currently written, the highest bit of input[3] is shifted into the sign bit of an int (assuming a two's-complement machine), so you can get negative results; and because a negative int sign-extends when it is widened to a 64-bit long, the upper 32 bits of the result can end up set as well.
The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
I don't think you can get much simpler than this unless you use an external function, make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances it would work to just write uint32_t x = *(uint32_t *)input;, but this assumes your machine is little-endian and the buffer is suitably aligned, and it violates the strict aliasing rule, so it is undefined behavior according to the C standard.
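For what it's worth, a hedged variant of that idea uses memcpy, which avoids the alignment and aliasing problems of the pointer cast but still assumes a little-endian host (the function name is illustrative):
#include <stdint.h>
#include <string.h>

uint32_t read_le32_nonportable(const char *input)
{
    uint32_t x;
    memcpy(&x, input, sizeof x); /* copies the 4 bytes in host byte order */
    return x;                    /* correct only on little-endian machines */
}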
No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.
The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:
((0xFF & input[3]) << 24)
The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.
A solution is to convert it from an int to a uint32_t before shifting it, using a cast.
Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?
Taking all this into account, I would rewrite the function like this:
uint32_t read_value_from_four_byte_buff(char *input)
{
    uint32_t x;
    x = 0xFF & input[0];
    x |= (0xFF & input[1]) << 8;
    x |= (0xFF & input[2]) << 16;
    x |= (uint32_t)(0xFF & input[3]) << 24;
    return x;
}
From the code, byte 0 is the LSB and byte 3 is the MSB.
You can make the code shorter by dropping the 0xFF masks and declaring the parameter as unsigned char * instead.
To make the code shorter still, you can do:
long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
    intValueOfInput |= (uint32_t)(unsigned char)input[i] << shift;

How to handle multibyte numbers?

I'm trying to read binary data from a file. At bytes 10-13 there is a little-endian binary-encoded number, and I'm trying to parse it using only the information that the offset is 10 and the "size" is 4.
I've figured out that I will have to do some binary shifting operations, but I'm not sure which byte goes where or how far each one should be shifted.
If you know for certain the data is little endian, you can do something like:
uint32_t value = data[10] | (data[11] << 8) | (data[12] << 16) | ((uint32_t)data[13] << 24);
This is portable: it gives the same result whether your code runs on a little- or big-endian machine, because it depends only on the order of the bytes in the buffer.
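If you need the same pattern at other offsets, it generalizes to a small helper; a minimal sketch, assuming the buffer is unsigned char and holds at least offset + 4 bytes (the function name is illustrative):
#include <stdint.h>
#include <stddef.h>

uint32_t read_le32_at(const unsigned char *data, size_t offset)
{
    return (uint32_t)data[offset]
         | ((uint32_t)data[offset + 1] << 8)
         | ((uint32_t)data[offset + 2] << 16)
         | ((uint32_t)data[offset + 3] << 24);
}
For the question's case, the call would be read_le32_at(buf, 10).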

Convert two 8-bit uint to one 12-bit uint

I'm reading two registers from a microcontroller. One holds the 4 MSBs (the first 4 bits of that register hold some other things) and the other holds the 8 LSBs. I want to combine them into one 12-bit unsigned int (16-bit to be precise). So far I've done it like this:
UINT16 x;
UINT8 RegValue1 = 0;
UINT8 RegValue2 = 0;

ReadRegister(Register01, &RegValue1);
ReadRegister(Register02, &RegValue2);

x = RegValue1 & 0x000F;
x = x << 8;
x = x | (RegValue2 & 0x00FF);
Is there any better way to do that?
/* To be more precise: ReadRegister is I2C communication with another ADC. Register01 and Register02 are different addresses. RegValue1 is 8 bits, but only its 4 LSBs are needed, concatenated with RegValue2 (the 4 LSBs of RegValue1 followed by all 8 bits of RegValue2). */
If you know the endianness of your machine, you can read the bytes directly into x like this:
ReadRegister(Register01, (UINT8 *)&x + 1);
ReadRegister(Register02, (UINT8 *)&x);
x &= 0xfff;
Note that this is not portable (it assumes little-endian byte order) and the performance gain, if any, will likely be small.
The RegValue2 & 0x00FF mask is unnecessary, since RegValue2 is already 8 bits wide.
Breaking it down into three statements may be good for clarity, but this expression is probably simple enough to implement in one statement:
x = ((RegValue1 & 0x0Fu) << 8u) | RegValue2;
The use of an unsigned literal (0x0Fu) makes little difference, but it emphasises that we are dealing with unsigned 8-bit data. It is in fact an unsigned int even with only two digits; writing it with two digits is purely stylistic rather than semantic, again signalling to the reader that only 8 bits are involved. In C there is no 8-bit literal constant type (though in C++ '\x0f' has type char). You can force better type agreement as follows:
#define LS4BITMASK ((UINT8)0x0Fu)
x = ((RegValue1 & LS4BITMASK) << 8u) | RegValue2;
The macro merely avoids repetition and clutter in the expression.
None of the above is necessarily "better" than your original code in terms of performance or actual generated code, and is largely a matter of preference or local coding standards or practices.
If the registers are adjacent to each other, they will most likely also be in the correct order with respect to the target's endianness. That being the case, they can be read as a single 16-bit register and masked accordingly, assuming that Register01 is the lower address value:
ReadRegister16(Register01, &x);
x &= 0x0fffu;
Of course, I have invented the ReadRegister16() function here, but if the registers are memory mapped and Register01 is simply an address, then this may simply be:
UINT16 x = *Register01;
x &= 0x0fffu;
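For completeness, a hedged sketch of that memory-mapped variant; the name ADC_RESULT and the address 0x4000 are made up for illustration, and real code would take the address from the device header:
/* A volatile-qualified pointer is the conventional way to read a
   hardware register in C; the address below is invented. */
#define ADC_RESULT (*(volatile UINT16 *)0x4000u)

UINT16 x = ADC_RESULT & 0x0FFFu;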
