Short to Char Casting

I have seen this done both ways. Are there any advantages or disadvantages to doing it either way?
short x = 0x9D6C;
char cx[2];
First way:
cx[0] = x &0xff;
cx[1] = (x >> 8) & 0xff;
vs.
Second way:
memcpy(cx, (char*)&x, 2);
Any thoughts?

Are there any advantages or disadvantages to doing it either way?
Yes. Because the functionality can differ, the code should do what is functionally needed.
cx[0] = x & 0xff; cx[1] = (x >> 8) & 0xff; moves the least significant byte of the value into cx[0] and the next more significant byte into cx[1].
memcpy(cx, (char*)&x, 2); moves the lowest-addressed byte of x into cx[0] and the next-addressed byte of x into cx[1].
The two approaches have the same functionality only on a little-endian platform with the common 2-byte short.
What is best depends on the larger code and not this narrow snippet.
In terms of performance, this falls under micro optimization. A good compiler can analyze memcpy() and emit efficient code without a function call. A programmer's time is better spent dealing with higher levels of code to improve performance.
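There is no need for the cast in memcpy(cx, (char*)&x, 2): memcpy() takes void * parameters, and any object pointer converts to void * implicitly.
Be aware that short is not always stored in a particular endianness. Also, on uncommon platforms, short is not 2 chars wide.
For x >> 8, the standard says: "If E1 has a signed type and a negative value, the resulting value is implementation-defined". Here x is likely negative: 0x9D6C (40300) does not fit in a 16-bit short, so the initialization itself is an implementation-defined conversion that typically yields a negative value.
Better code would use unsigned types and avoid these subtle issues.
Note that, despite the title, there is no short-to-char casting here.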
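As an illustration of the unsigned-type advice, here is a minimal sketch that assumes uint16_t in place of short and unsigned char in place of char (both substitutions are assumptions, not from the question). The helper names split_lsb_first and copy_host_order are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value-based split: cx[0] always receives the least significant byte,
   whatever the host's endianness. Using unsigned types avoids the
   implementation-defined right shift of a negative value. */
static void split_lsb_first(uint16_t x, unsigned char cx[2])
{
    cx[0] = (unsigned char)(x & 0xFFu);
    cx[1] = (unsigned char)((x >> 8) & 0xFFu);
}

/* Representation-based copy: the byte order follows the host's memory
   layout, so it matches split_lsb_first only on little-endian hosts. */
static void copy_host_order(uint16_t x, unsigned char cx[2])
{
    memcpy(cx, &x, sizeof x);   /* no cast needed: memcpy takes void * */
}
```

Which one is "right" depends on whether the bytes must come out in a fixed order (for example, a wire format) or simply mirror the in-memory object.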

Related

AVR uint8_t doesn't get correct value

I have a uint8_t that should contain the result of a bitwise calculation. The debugger says the variable is set correctly, but when I check the memory, the var is always 0. The code proceeds as if the var is 0, no matter what the debugger tells me. Here's the code:
temp = (path_table & (1 << current_bit)) >> current_bit;
//temp is always 0, debugger shows correct value
if (temp > 0) {
DS18B20_send_bit(pin, 0x01);
} else {
DS18B20_send_bit(pin, 0x00);
}
Temp's a uint8_t, path_table's a uint64_t and current_bit's a uint8_t. I've tried to make them all uint64_t but nothing changed. I've also tried using unsigned long long int instead. Nothing again.
The code always enters the else clause.
The chip is an ATmega4809, and the code uses uint64_t in other parts with no issues.
Note - If anyone knows a more efficient/compact way to extract a single bit from a variable, I would really appreciate it if you could share ^^

1 is an integer constant of type int. The expression 1 << current_bit also has type int, and with a 16-bit int, the result of that expression is undefined when current_bit is larger than 14. Because the behavior is undefined in your case, it is plausible that your debugger presents results for the overall expression that seem inconsistent with the observed behavior. Using an unsigned int constant instead, i.e. 1u, would make current_bit == 15 well defined, but it would not help beyond that: shifting by a count greater than or equal to the width of the (promoted) left operand is undefined even for unsigned types, so 1u << current_bit is still undefined for current_bit of 16 or more.
Solve this problem by performing the computation in a type wide enough to hold the result. Here's a compact, correct, and pretty clear way to correct your code to do that:
DS18B20_send_bit(pin, (path_table & (((uint64_t) 1) << current_bit)) != 0);
Or if path_table has an unsigned type then I prefer this, though it's more of a departure from your original:
DS18B20_send_bit(pin, (path_table >> current_bit) & 1);
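The shift-then-mask form can be wrapped in a small helper. This is only a sketch: the name bit_at and the assert on the bit index are additions for illustration, and on an 8-bit AVR the 64-bit shift will still be comparatively expensive.

```c
#include <assert.h>
#include <stdint.h>

/* Return bit number `bit` (0..63) of `value` as 0 or 1.
   The shift happens in uint64_t, so every bit position up to 63 is valid. */
static uint8_t bit_at(uint64_t value, uint8_t bit)
{
    assert(bit < 64);   /* shifting a uint64_t by 64 or more is undefined */
    return (uint8_t)((value >> bit) & 1u);
}
```

With it, the call in the question becomes DS18B20_send_bit(pin, bit_at(path_table, current_bit)).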

Realization #1 here is that the AVR is a 1980s-1990s technology core. It is not an x64 PC that chews 64-bit numbers for breakfast, but an extremely inefficient 8-bit MCU. As such:
It likes 8 bit arithmetic.
It will struggle with 16 bit arithmetic, by doing tricks with 16 bit index registers, double accumulators or whatever 8 bit core tricks it prefers to do.
It will literally take ages to execute 32 bit arithmetic, by invoking software libraries inline.
It will probably melt through the floor if attempting 64 bit arithmetic.
Before you do anything else, you need to get rid of all 64-bit arithmetic and radically minimize the use of 32-bit arithmetic. Period. There should not be a single uint64_t variable in your code, or you are doing it very, very wrong.
This revelation also brings another: C compilers for 8-bit MCUs practically always use a 16-bit int, since C requires int to be at least 16 bits wide.
In the code 1<<current_bit, the integer constant 1 is of type int. This means that if current_bit is 15, the 1 is shifted into the sign bit of this temporary int, and if current_bit is 16 or larger, it is shifted past the width of int altogether. This is always a bug; strictly speaking, it is undefined behavior. In practice, you might end up with random sign changes in your numbers.
To avoid this, never use any form of bitwise operators on signed numbers. When mixing integer constants such as 1 with bitwise operators, change them to 1u to avoid bugs like the one mentioned. For bit positions of 16 or more, 1u is still too narrow on a 16-bit int target; use a constant of a wide enough type, such as (uint32_t)1.
If anyone knows a more efficient/compact way to extract a single bit from a variable, I would really appreciate it if you could share
The most efficient way in C is: uint8_t variable; ... if(variable & (1u << bits)). This should translate to the relevant "branch if bit set" instruction.
My general advice would be to find your toolchain's disassembler and see what machine code the C code actually generated. You don't have to be an assembler guru to read it; peeking at the instruction set should be enough.
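One way to follow the advice to drop 64-bit arithmetic, assuming you are free to change how the path is stored, is to keep it as an array of eight bytes and test bits there. The layout (byte k holds bits 8k to 8k+7) and the name path_bit are hypothetical, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage: a 64-bit path kept as 8 bytes, least significant first.
   Returns bit number `bit` (0..63) as 0 or 1 using only 8/16-bit arithmetic. */
static uint8_t path_bit(const uint8_t path[8], uint8_t bit)
{
    assert(bit < 64);
    /* bit >> 3 selects the byte, bit & 7 selects the position inside it;
       the shift count is at most 7, well within the width of int. */
    return (uint8_t)((path[bit >> 3] >> (bit & 7u)) & 1u);
}
```

Whether this actually beats a 64-bit shift on your chip is best confirmed with the disassembler.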

Efficient tiny boolean matrix multiplication

I have some unsigned 16-bit integer s which I'd like to map to an unsigned 32-bit integer r in such a way that each flipped bit in s flips at most one (given) bit in r; that is, simply a mapping from the 16 bit positions of s to the 32 bit positions of r. So we can see this as a matrix equation
Ps = r
where P is a 32 x 16 boolean matrix, s is a 16 x 1 boolean vector and r is 32 x 1 boolean vector. I have a gut feeling there exists some super simple hack that I'm missing. Important note: the target machine is a 16 bit mcu!
Here's the best I can do:
static u16 P[32]; /* filled in elsewhere */
u32 FsiPermutationHack(u16 s) {
u32 r = 0; /* must start cleared, since bits are ORed into it */
for (u16 i = 0; i < 32; i++)
{
r |= ((u32)((P[i] & s) > 0) << i);
}
return r;
}
The rationale is this: the i:th bit of r is 1 if and only if (P[i] & s) != 0x0000. I am too stupid to disassemble stuff, but I am guessing this would be like ~100 instructions IF we didn't have to do that stupid u32 cast. But then again, perhaps the compiler auto-splits the loop in two for us in which case it's looking pretty good for us.
Apologies for the tangent, just thought I'd share my attempted solution -- do you have a better one?

Inasmuch as you say,
I am guessing this would be like ~100 instructions IF we didn't have
to do that stupid u32 cast. But then again, perhaps the compiler
auto-splits the loop in two for us in which case it's looking pretty
good for us.
and
I have a gut feeling there exists some super simple hack that I'm missing
, I will interpret you to be asking how to minimize the use of 32-bit arithmetic in this code intended for a 16-bit processor.
You really ought to learn how to disassemble and check the compiled result to see whether the compiler does automatically split the loop as you hypothesize, but supposing that it does not, I don't see why you couldn't do the same manually:
static u16 P[32]; /* value assigned elsewhere */
u32 FsiPermutationHack(u16 s) {
u16 *P_hi = P + 16;
u16 r_lo = 0;
u16 r_hi = 0;
for (u16 i = 0; i < 16; i++) {
r_lo |= (u16)((P[i] & s) != 0) << i;
r_hi |= (u16)((P_hi[i] & s) != 0) << i;
}
return ((u32) r_hi << 16) + r_lo;
}
That supposes u16 and u32 to be unsigned 16-bit and 32-bit (respectively) integers with no padding bits.
Note also that the idea that performing arithmetic with type u16 instead of u32 should be an improvement assumes that type u32 has a higher integer promotion rank than unsigned int. Roughly speaking, that comes down to the implementation's unsigned int being a 16-bit type. That's entirely plausible for an implementation for a 16-bit processor. On a system whose int and unsigned int are instead 32-bit types, however, all narrower integer arithmetic arguments would be promoted to 32 bits anyway.
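To check the split version against the original 32-iteration loop, here is a self-contained sketch. It assumes u16 and u32 are typedefs for uint16_t and uint32_t, passes P as a parameter instead of using a global, and uses the hypothetical names map_naive and map_split.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;   /* assumed: exact-width unsigned 16-bit */
typedef uint32_t u32;   /* assumed: exact-width unsigned 32-bit */

/* Original idea: bit i of r is 1 iff row P[i] shares a set bit with s. */
static u32 map_naive(const u16 P[32], u16 s)
{
    u32 r = 0;
    for (unsigned i = 0; i < 32; i++)
        r |= (u32)((P[i] & s) != 0) << i;
    return r;
}

/* Split version: two 16-bit accumulators filled in one 16-iteration loop. */
static u32 map_split(const u16 P[32], u16 s)
{
    const u16 *P_hi = P + 16;
    u16 r_lo = 0, r_hi = 0;
    for (unsigned i = 0; i < 16; i++) {
        r_lo |= (u16)((u16)((P[i] & s) != 0) << i);
        r_hi |= (u16)((u16)((P_hi[i] & s) != 0) << i);
    }
    return ((u32)r_hi << 16) | r_lo;   /* only 32-bit step: final assembly */
}
```

Comparing both functions over a range of inputs is a cheap way to gain confidence before looking at the generated code.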
Update:
As far as the possibility of a better alternative algorithm, I observe that each bit of the result is computed from a different element of array P, that the whole value of each element is used, and that the element size is the same as the target machine's native word size. There seems then no scope for performing fewer 16-bit bitwise AND operations than there are array elements (but see below).
If we accept that each array element must be processed separately, then the provided implementation does a pretty good job of approaching it efficiently:
It performs only 16-bit computations until the time comes to assemble the final result;
It computes both the upper and lower halves of the result in the same loop, thus incurring only 16 iterations' worth of loop overhead instead of 32
It largely removes the extra indexing arithmetic that would otherwise have been required, by creating P_hi for accessing the upper half of the array
It would be possible to manually unroll the loop to possibly save a few more cycles, but that's the kind of optimization that you absolutely should rely on your compiler to perform for you.
As far as "bit twiddling hacks", the only scope I see for anything of that nature would be processing adjacent pairs of 16-bit array elements as 32-bit unsigned integers. That would allow performing one 32-bit bitwise AND in place of each two 16-bit ANDs. That would be coupled with two 32-bit comparisons (vs. two 16-bit comparisons in the above code). The 16-bit shift and bitwise OR operations of the above approach could be retained. Aside from that having formally undefined behavior as a result of violating the strict aliasing rule, that would involve 32-bit arithmetic, which presumably is about half as fast as 16-bit arithmetic on your 16-bit machine. Performance is better measured than predicted, but I don't see any reason to expect a significant win from that approach.

In C, How do I calculate the signed difference between two 48-bit unsigned integers?

I've got two values from an unsigned 48-bit nanosecond counter, which may wrap.
I need the difference, in nanoseconds, of the two times.
I think I can assume that the readings were taken at roughly the same time, so of the two possible answers I think I'm safe taking the smallest.
They're both stored as uint64_t, because I don't think I can have 48-bit types.
I'd like to calculate the difference between them, as a signed integer (presumably int64_t), accounting for the wrapping.
so e.g. if I start out with
x=5
y=3
then the result of x-y is 2, and will stay so if I increment both x and y, even as they wrap over the top of the max value 0xffffffffffff
Similarly if x=3, y=5, then x-y is -2, and will stay so whenever x and y are incremented simultaneously.
If I could declare x,y as uint48_t, and the difference as int48_t, then I think
int48_t diff = x - y;
would just work.
How do I simulate this behaviour with the 64-bit arithmetic I've got available?
(I think any computer this is likely to run on will use 2's complement arithmetic)
P.S. I can probably hack this out, but I wonder if there's a nice neat standard way to do this sort of thing, which the next person to read my code will be able to understand.
P.P.S Also, this code is going to end up in the tightest of tight loops, so something that will compile efficiently would be nice, so that if there has to be a choice, speed trumps readability.

You can simulate a 48-bit unsigned integer type by just masking off the top 16 bits of a uint64_t after any arithmetic operation. So, for example, to take the difference between those two times, you could do:
uint64_t diff = (after - before) & 0xffffffffffff;
You will get the right value even if the counter wrapped around during the procedure. If the counter didn't wrap around, the masking is not needed but not harmful either.
Now if you want this difference to be recognized as a signed integer by your compiler, you have to sign extend the 48th bit. That means that if the 48th bit is set, the number is negative, and you want to set the 49th through the 64th bit of your 64-bit integer. I think a simple way to do that is:
int64_t diff_signed = (int64_t)(diff << 16) >> 16;
Warning: You should probably test this to make sure it works, and also beware that there is implementation-defined behavior when I cast the uint64_t to an int64_t, and I think there is implementation-defined behavior when I shift a signed negative number to the right. I'm sure a C language lawyer could come up with something more robust.
Update: The OP points out that if you combine the operation of taking the difference and doing the sign extension, there is no need for masking. That would look like this:
int64_t diff = (int64_t)((x - y) << 16) >> 16;
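If the implementation-defined conversion and right shift are a concern, one sketch that avoids both is to flip bit 47 before converting and subtract its weight afterwards, a common sign-extension idiom rather than something taken from the answers here. The names diff48, MASK48, and SIGN48 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MASK48 ((UINT64_C(1) << 48) - 1)   /* low 48 bits set */
#define SIGN48 (UINT64_C(1) << 47)         /* the 48-bit "sign" bit */

/* Signed difference x - y of two wrapping 48-bit counter readings. */
static int64_t diff48(uint64_t x, uint64_t y)
{
    uint64_t d = (x - y) & MASK48;   /* wrapped 48-bit difference, 0..2^48-1 */
    /* d ^ SIGN48 stays below 2^48, so the conversion to int64_t is exact;
       subtracting 2^47 then maps 0..2^48-1 onto -2^47..2^47-1. */
    return (int64_t)(d ^ SIGN48) - (int64_t)SIGN48;
}
```

All operations here are fully defined by the C standard, and compilers usually reduce them to a few simple instructions, though that is worth confirming for the tight loop in question.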

struct Nanosecond48{
unsigned long long u48 : 48;
// int res : 16; // just for clarity, don't need this one really
};
Here we just give the field an explicit width of 48 bits, and with that (admittedly somewhat awkward) type you leave it up to your compiler to handle different architectures/platforms/whatnot properly.
Like the following (note that this snippet is C++: it uses auto and refers to the struct without the struct keyword):
Nanosecond48 u1, u2, overflow;
overflow.u48 = -1L;
u1.u48 = 3;
u2.u48 = 5;
const auto diff = (u2.u48 + (overflow.u48 + 1) - u1.u48) & 0x0000FFFFFFFFFFFF;
Of course in the last statement you can just do the remainder operation with % (overflow.u48 + 1) if you prefer.

Do you know which was the earlier reading and which was later? If so:
diff = (earlier <= later) ? later - earlier : WRAPVAL - earlier + later;
where WRAPVAL is ((uint64_t)1 << 48) (a plain 1 << 48 would overflow int), is pretty easy to read.

How to copy MSB to rest of the byte?

In an interrupt subroutine (called every 5 µs), I need to check the MSB of a byte and copy it to the rest of the byte.
I need to do something like:
if(MSB == 1){byte = 0b11111111}
else{byte = 0b00000000}
I need it to be fast, as it runs in an interrupt subroutine that also contains more code, so efficiency matters.
Therefore, I don't want to use any if, switch, or select statements, nor >> operators, as I have the feeling that they would slow down the process. If I'm wrong, then I'll go the "easy" way.
What I've tried:
byte = byte & 0b10000000;
This gives me 0b10000000 or 0b00000000.
But I need the first to be 0b11111111.
I think I'm missing an OR somewhere (plus other gates). I don't know; my gut is telling me that this should be easy, but it isn't for me at this moment.

The trick is to use a signed type, such as int8_t, for your byte variable, and take advantage of sign extension feature of the shift-right operation:
byte = byte >> 7;
Shifting right is very fast - a single instruction on most modern (and even not so modern) CPUs.
The reason this works is that >> on signed operands inserts the sign bit on the left to preserve the sign of its operand. This is called sign extension.
Note: Technically, this behavior is implementation-defined, and therefore is not universally portable. Thanks, Eugene Sh., for a comment and a reference.

EDIT: My answer has confused people because I did not specify unsigned bytes. Here my assumption is that B is of type unsigned char. As one comment notes below, I can omit the &1. This is not as fast as the signed byte solution that the other poster put up, but this code should be portable (once it is understood that B is unsigned type).
B = -((B >> 7)&1)
Negative numbers are our friends. Shifting bits should be fast, by the way.
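A minimal sketch of the unsigned variant as a function, assuming uint8_t for the byte; smear_msb is a hypothetical name. The & 1 is omitted because b >> 7 on an 8-bit unsigned value can only be 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* Copy bit 7 of b into all eight bits, with no branches and no signed shift. */
static uint8_t smear_msb(uint8_t b)
{
    /* b is promoted to int (value 0..255), so b >> 7 is 0 or 1;
       negating gives 0 or -1, and converting -1 to uint8_t yields 0xFF. */
    return (uint8_t)-(b >> 7);
}
```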

The MSB of a signed byte is its sign bit: it is 1 for a negative number and 0 for a non-negative one.
so the simplest way to do that is
if(byte < 0)
{
byte = -1;
}
else
{
byte = 0;
}
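because -1 is 11111111 in 8-bit two's complement.
If the byte is an unsigned integer, simply cast it to a signed type such as int8_t and then compare it as shown above. (Converting values above 127 to int8_t is implementation-defined, although common compilers such as GCC wrap it modulo 256.)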

Does C hold the carry-out bit from a << or a >> bit shift?

I read that C holds the carry-out from shifts and it can be found in processor-specific .h.
Is this true, and should I use it? Or should I work out the carry-out bit myself?

There is no standard way to access the carry bit(s) of primitive operations in C.
You will either need to perform the shift in a larger data type:
uint16_t foo = ...;
uint32_t tmp = (uint32_t)foo << shift;
uint16_t result = (uint16_t)tmp;
uint16_t carry = (uint16_t)(tmp >> 16);
or perform the opposite shift:
uint16_t result = foo << shift;
uint16_t carry = foo >> (16 - shift);
Note that this second method can invoke undefined behaviour if shift == 0: where int is 16 bits wide, foo >> 16 shifts by the full width of the type. For portable code, handle that case separately.
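Both ideas can be packaged as helpers. This sketch (function names hypothetical) uses the widening approach for a left shift, and for a right shift it masks off the low bits before shifting, which sidesteps the shift == 0 problem.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift by 0..16; *carry receives the bits pushed out of the top. */
static uint16_t shl16_with_carry(uint16_t foo, unsigned shift, uint16_t *carry)
{
    assert(shift <= 16);
    uint32_t tmp = (uint32_t)foo << shift;   /* widen first so nothing is lost */
    *carry = (uint16_t)(tmp >> 16);          /* bits that left the top of foo */
    return (uint16_t)tmp;                    /* low 16 bits: the shifted value */
}

/* Right shift by 0..15; *carry receives the bits that fell off the bottom. */
static uint16_t shr16_with_carry(uint16_t foo, unsigned shift, uint16_t *carry)
{
    assert(shift < 16);                      /* keeps 1u << shift defined */
    *carry = (uint16_t)(foo & ((1u << shift) - 1u));
    return (uint16_t)(foo >> shift);
}
```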

Standard C does not provide access to the carry-out from shifts.
Some, not all, C implementations have processor-specific.h files or other extensions that do allow access.
You should avoid using functionality from processor-specific .h files or extensions where practical. But if obliged to use them, consider also writing a Standard C solution, at least as part of the documentation. See the solution recommended by Oli Charlesworth.
In general, to create effective portable code, one may need to view the problem at a higher level and not use carry outs. On the other hand, if this is for a narrow range of machines, go with what works for you (or those that pay your wage).
Various weaknesses were pointed out in my earlier posted examples. I now see these as implementation-dependent, so they have been deleted.