ARM Multiply Negates MNEG - arm

Signed - SMNEGL,
Unsigned - UMNEGL,
?? - MNEG
Ok so what is the third multiply-negate (MNEG)? There's a few operations like this one. The descriptions are the exact same. Is it just the same as unsigned or signed? I noticed MNEG has a 32-bit version while UMNEGL doesn't.

Let's break this down a bit.
First of all, multiplication by itself. It comes in four flavors:
32 x 32 = 32 (i.e. multiply two 32-bit inputs and get a 32-bit output, with high bits truncated): e.g. mul w0, w1, w2. There is no signed/unsigned version, because non-widening multiplication is bitwise the same operation for either signed or unsigned operands - a fact from modular arithmetic.
64 x 64 = 64, e.g. mul x0, x1, x2. Same deal, no signed/unsigned distinction.
32 x 32 = 64 "long multiply", e.g. smull x0, w1, w2. Here it does make a difference whether the operation is to work for signed or unsigned values, so we have separate smull and umull instructions. For instance, 0xffffffff * 0xffffffff should be 0x1 if treated as signed, or 0xfffffffe00000001 if treated as unsigned. (But as noted above, the low 32 bits are the same for both.)
64 * 64 = 128. This uses mul / smulh or mul / umulh depending on whether it should be signed or unsigned. It's not directly relevant to your question.
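A rough C model of the first three flavors may make the signed/unsigned point concrete. The function names here are made up for illustration, and the idea that a compiler lowers these casts to mul, smull and umull on AArch64 is an expectation, not a guarantee.

```c
#include <stdint.h>

/* 32 x 32 = 32: the truncated product has the same bits whether the
   inputs are viewed as signed or unsigned. */
uint32_t mul32(uint32_t a, uint32_t b) {
    return a * b;
}

/* 32 x 32 = 64, signed: operands are sign-extended before multiplying. */
int64_t smull_model(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}

/* 32 x 32 = 64, unsigned: operands are zero-extended before multiplying. */
uint64_t umull_model(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}
```

With the bit pattern 0xffffffff in both operands, smull_model(-1, -1) gives 1 and umull_model(0xffffffff, 0xffffffff) gives 0xfffffffe00000001, while the low 32 bits of both match mul32's result of 1.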
Now, mul and smull/umull are also available as multiply-add and multiply-subtract operations, where for instance madd w0, w1, w2, w3 computes w0 = (w1 * w2) + w3. Likewise smaddl x0, w1, w2, x3 computes x0 = (w1 * w2) + x3, with the multiplication done as signed 32x32=64, and umaddl x0, w1, w2, x3 similar for unsigned. Indeed, mul w0, w1, w2 is just an alias of madd w0, w1, w2, wzr, and ditto for smull and umull.
As for the multiply-subtract versions, for instance msub w0, w1, w2, w3 computes w0 = w3 - (w1 * w2). If the fourth operand is the zero register, then msub w0, w1, w2, wzr would compute w0 = -(w1*w2). For convenience, this instruction is aliased as mneg w0, w1, w2. As above, it makes no difference whether you think of it as signed or unsigned, because it's the same binary operation either way.
The long versions likewise have a multiply-subtract, which becomes multiply-negate with the zero register. smnegl x0, w1, w2 computes x0 = -(w1 * w2), signed widening, alias of smsubl x0, w1, w2, xzr. Likewise umnegl x0, w1, w2 computes x0 = -(w1 * w2), unsigned widening. You may think negation looks odd on an unsigned value, but it's all mod 2^64 so you can also think of it as x0 = 2^64 - (w1 * w2).
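The multiply-subtract and multiply-negate forms can be modeled in C along the same lines. This is only a sketch of the arithmetic; the *_model names are hypothetical and results are returned as unsigned bit patterns so that the wraparound is well defined.

```c
#include <stdint.h>

/* msub w0, w1, w2, w3  ->  w0 = w3 - (w1 * w2), all mod 2^32 */
uint32_t msub_model(uint32_t n, uint32_t m, uint32_t a) {
    return a - n * m;
}

/* mneg w0, w1, w2 is msub with the zero register as the accumulator */
uint32_t mneg_model(uint32_t n, uint32_t m) {
    return msub_model(n, m, 0);
}

/* smnegl x0, w1, w2: signed widening multiply, then negate mod 2^64 */
uint64_t smnegl_model(int32_t n, int32_t m) {
    return 0 - (uint64_t)((int64_t)n * (int64_t)m);
}

/* umnegl x0, w1, w2: unsigned widening multiply, then negate mod 2^64 */
uint64_t umnegl_model(uint32_t n, uint32_t m) {
    return 0 - (uint64_t)n * (uint64_t)m;
}
```

For example, umnegl_model(2, 3) yields 0xFFFFFFFFFFFFFFFA, which is 2^64 - 6, and mneg_model(3, 5) yields 0xFFFFFFF1 regardless of whether the caller thinks of that as -15 or as 2^32 - 15.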
There may not be much particular value in the multiply-negate instructions, but they come for free in the hardware as special cases of multiply-subtract when you use the zero register, so the ARM folks figured they might as well provide an alias in the assembler, just in case somebody found it useful.

Related

How can I quickly get the value 2^64 divided by random integer in C lang? [duplicate]

How to compute the integer division, 2^64/n? Assuming:
unsigned long is 64-bit
We use a 64-bit CPU
1 < n < 2^64
If we do 18446744073709551616ul / n, we get warning: integer constant is too large for its type at compile time. This is because we cannot express 2^64 in a 64-bit CPU. Another way is the following:
#define IS_POWER_OF_TWO(x) ((x & (x - 1)) == 0)
unsigned long q = 18446744073709551615ul / n;
if (IS_POWER_OF_TWO(n))
return q + 1;
else
return q;
Is there any faster (CPU cycle) or cleaner (coding) implementation?
I'll use uint64_t here (which needs the <stdint.h> include) so as not to require your assumption about the size of unsigned long.
phuclv's idea of using -n is clever, but can be made much simpler. As unsigned 64-bit integers, we have -n = 2^64-n, then (-n)/n = 2^64/n - 1, and we can simply add back the 1.
uint64_t divide_two_to_the_64(uint64_t n) {
return (-n)/n + 1;
}
The generated code is just what you would expect (gcc 8.3 on x86-64 via godbolt):
mov rax, rdi
xor edx, edx
neg rax
div rdi
add rax, 1
ret
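One way to gain confidence in the (-n)/n + 1 trick is to compare it with a reference that uses a real 128-bit numerator. The sketch below assumes a compiler with the unsigned __int128 extension (GCC and Clang on 64-bit targets); the reference function and the sample list are illustrative additions, not part of the answer.

```c
#include <stdint.h>

/* The answer's trick: (-n)/n + 1 with 64-bit wraparound. Requires n != 0. */
uint64_t divide_two_to_the_64(uint64_t n) {
    return (-n) / n + 1;
}

/* Reference with a 128-bit numerator (compiler extension). Valid for n > 1. */
uint64_t divide_two_to_the_64_ref(uint64_t n) {
    unsigned __int128 numerator = (unsigned __int128)1 << 64;
    return (uint64_t)(numerator / n);
}

/* Returns 1 if both functions agree on a handful of edge cases, else 0. */
int divide_two_to_the_64_agrees(void) {
    static const uint64_t samples[] = {
        2, 3, 7, 10, 1ULL << 32,
        (1ULL << 63) - 1, 1ULL << 63, (1ULL << 63) + 1, UINT64_MAX
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        if (divide_two_to_the_64(samples[i]) !=
            divide_two_to_the_64_ref(samples[i])) {
            return 0;
        }
    }
    return 1;
}
```

The samples deliberately include powers of two and values on both sides of 2^63, since those are the places where the alternative formulas in this thread need special handling.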
I've come up with another solution which was inspired by this question. From there we know that, with truncating integer division,
(a_1 + a_2 + a_3 + ... + a_k)/n =
(a_1/n + a_2/n + a_3/n + ... + a_k/n) + (a_1 % n + a_2 % n + a_3 % n + ... + a_k % n)/n
By choosing k = n + 1 terms with a_1 = a_2 = ... = a_n = 1 and a_{n+1} = 2^64 - n (for n > 1, so that 1/n = 0 and 1 % n = 1) we'll have
(a_1 + a_2 + ... + a_{n+1})/n = (1 + 1 + ... + 1 + (2^64 - n))/n = 2^64/n
= [n*(1/n) + (2^64 - n)/n] + [n*(1 % n) + (2^64 - n) % n]/n
= (2^64 - n)/n + (n + (2^64 - n) % n)/n
2^64 - n is the 2's complement of n, which is -n, or we can also write it as ~0 - n + 1. So the final solution would be
uint64_t twoPow64div(uint64_t n)
{
return (-n)/n + (n + (-n) % n)/n + (n > 1ULL << 63);
}
The last term corrects the result because we deal with unsigned integers instead of signed ones like in the other question: for n > 2^63 the sum n + (-n) % n wraps around to 0. I checked both 32- and 64-bit versions on my PC and the results match your solution.
On MSVC, however, there's an intrinsic for 128-bit division, so you can use it like this
uint64_t remainder;
return _udiv128(1, 0, n, &remainder);
which results in the cleanest output
mov edx, 1
xor eax, eax
div rcx
ret 0
Here's the demo
On most x86 compilers (one notable exception is MSVC) long double also has 64 bits of precision, so you can use any of these
(uint64_t)(powl(2, 64)/n)
(uint64_t)(((long double)~0ULL)/n)
(uint64_t)(18446744073709551616.0L/n)
although the performance would probably be worse. Note that the second form divides 2^64 - 1 rather than 2^64, so it comes out one too low when n is a power of two. This can also be applied to any implementation where long double has more than 63 bits of significand, like PowerPC with its double-double implementation
There's a related question about calculating ((UINT_MAX + 1)/x)*x - 1: Integer arithmetic: Add 1 to UINT_MAX and divide by n without overflow, which also has clever solutions. Based on that we have
2^64/n = (2^64 - n + n)/n = (2^64 - n)/n + 1 = (-n)/n + 1
which is essentially just another way to get Nate Eldredge's answer
Here's some demo for other compilers on godbolt
See also:
Trick to divide a constant (power of two) by an integer
Efficient computation of 2**64 / divisor via fast floating-point reciprocal
We use a 64-bit CPU
Which 64-bit CPU?
In general, if you multiply a number with N bits by another number that has M bits, the result will have up to N+M bits. For integer division it's similar - if a number with N bits is divided by a number with M bits, the result will have up to N-M+1 bits.
Because multiplication is naturally "widening" (the result has more digits than either of the source numbers) and integer division is naturally "narrowing" (the result has fewer digits), some CPUs support "widening multiplication" and "narrowing division".
In other words, some 64-bit CPUs support dividing a 128-bit number by a 64-bit number to get a 64-bit result. For example, on 80x86 it's a single DIV instruction.
Unfortunately, C doesn't support "widening multiplication" or "narrowing division". It only supports "result is same size as source operands".
Ironically (for unsigned 64-bit divisors on 64-bit 80x86) there is no other choice and the compiler must use the DIV instruction that will divide a 128-bit number by a 64-bit number. This means that the C language forces you to use a 64-bit numerator, then the code generated by the compiler extends your 64 bit numerator to 128 bits and divides it by a 64 bit number to get a 64 bit result; and then you write extra code to work around the fact that the language prevented you from using a 128-bit numerator to begin with.
Hopefully you can see how this situation might be considered "less than ideal".
What I'd want is a way to trick the compiler into supporting "narrowing division". For example, maybe by abusing casts and hoping that the optimiser is smart enough, like this:
__uint128_t numerator = (__uint128_t)1 << 64;
if(n > 1) {
return (uint64_t)(numerator/n);
}
I tested this for the latest versions of GCC, CLANG and ICC (using https://godbolt.org/ ) and found that (for 64-bit 80x86) none of the compilers are smart enough to realise that a single DIV instruction is all that is needed (they all generated code that does a call __udivti3, which is an expensive function to get a 128 bit result). The compilers will only use DIV when the (128-bit) numerator is 64 bits (and it will be preceded by an XOR RDX,RDX to set the highest half of the 128-bit numerator to zeros).
In other words, it's likely that the only way to get ideal code (the DIV instruction by itself on 64-bit 80x86) is to resort to inline assembly.
For example, the best code you'll get without inline assembly (from Nate Eldredge's answer) will be:
mov rax, rdi
xor edx, edx
neg rax
div rdi
add rax, 1
ret
...and the best code that's possible is:
mov edx, 1
xor rax, rax
div rdi
ret
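For reference, an inline-assembly version along those lines might look like the sketch below. It assumes GCC or Clang extended asm on x86-64 and falls back to the (-n)/n + 1 form elsewhere; the function name is hypothetical, and whether the compiler emits exactly the four-instruction sequence above is not guaranteed.

```c
#include <stdint.h>

/* Computes floor(2^64 / n). Requires n > 1: for n <= 1 the quotient does
   not fit in 64 bits and the x86 div instruction raises a divide error. */
uint64_t two_to_the_64_div(uint64_t n) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t quotient, remainder;
    /* Dividend is RDX:RAX = 1:0, i.e. 2^64. div writes the quotient to
       RAX and the remainder to RDX, so both are outputs as well. */
    __asm__("divq %[divisor]"
            : "=a"(quotient), "=d"(remainder)
            : "0"((uint64_t)0), "1"((uint64_t)1), [divisor] "rm"(n)
            : "cc");
    (void)remainder;
    return quotient;
#else
    /* Portable fallback: the wraparound trick from the earlier answer. */
    return (-n) / n + 1;
#endif
}
```

Declaring RDX as an output matters even though the remainder is discarded, because otherwise the compiler may assume the register still holds the value 1 after the instruction.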
Your way is pretty good. It might be better to write it like this:
return 18446744073709551615ul / n + ((n&(n-1)) ? 0:1);
The hope is to make sure the compiler notices that it can do a conditional move instead of a branch.
Compile and disassemble.

Dividing a number represented by two words by a number represented by one? [duplicate]

This question already has answers here:
how to calculate (a times b) divided by c only using 32-bit integer types even if a times b would not fit such a type
I have two numbers, X and Y.
Y is a single unsigned integer primitive, e.g. long unsigned int. (In this case, there is no larger primitive to upcast to before performing the operation.)
X is represented by two primitives: X0 is the same type as Y and represents the low bits of X, and X1 is the same type and represents the high bits of X.
X / Y will always be representable using the same type as Y, i.e. the operation can be assumed not to overflow. (Because X is incidentally the product of two values of the same type as Y, one of which is less than or equal to Y.)
What is an efficient way to determine the result of this division?
You haven't specified the platform, which is crucial for the answer.
X / Y will always be representable using the same type as Y, i.e. the operation can be assumed not to overflow. (Because X is incidentally the product of two values of the same type as Y, one of which is less than or equal to Y.)
On the x86-64 architecture, you can take advantage of that fact by dividing the RDX:RAX register pair, which behaves like one "glued" 128-bit register holding the dividend. Beware, though, that if the above invariant does not always hold, you will get a division exception from the CPU.
That said, one implementation is to use inline assembly, e.g.:
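/* divides x1:x0 pair by y, assumes that quotient <= UINT64_MAX */
uint64_t udiv128_64_unsafe(uint64_t x0, uint64_t x1, uint64_t y)
{
/* divq leaves the quotient in RAX and the remainder in RDX, so both
   registers are read-write operands; the flags are clobbered as well */
__asm__ (
"divq\t%2"
: "+a" (x0), "+d" (x1)
: "rm" (y)
: "cc"
);
return x0;
}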
which GCC 6.3.0 translates nicely (at -O1):
udiv128_64_unsafe:
mov rcx, rdx ; place the y (divisor) in RCX
mov rax, rdi ; low part of the dividend (x0)
mov rdx, rsi ; high part of the dividend (x1)
divq rcx ; RAX = RDX:RAX / RCX
ret ; RAX is return value
For instance, for X = 65454567423355465643444545, Y = 86439334393432232:
#include <stdio.h>
#include <inttypes.h>
uint64_t udiv128_64_unsafe(uint64_t x0, uint64_t x1, uint64_t y) { ... }
int main(void)
{
printf("%" PRIu64 "\n", udiv128_64_unsafe(0x35c0ecb3fea1c941ULL, 0x36248bULL,
86439334393432232ULL));
return 0;
}
the given test driver program yields:
757231275
gcc has __int128 and unsigned __int128 for x86 architectures. I have successfully used them in the past to perform the kind of operation you describe. I am sure all major compilers have equivalents.
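A minimal sketch of that approach, assuming a compiler that provides unsigned __int128 (GCC and Clang do on 64-bit targets). The function name is made up to mirror the inline-assembly version above and keeps its argument order.

```c
#include <stdint.h>

/* Divides the 128-bit value x1:x0 by y. Assumes y != 0 and that the
   quotient fits in 64 bits (x1 < y), as the question guarantees. */
uint64_t udiv128_64_int128(uint64_t x0, uint64_t x1, uint64_t y) {
    unsigned __int128 x = ((unsigned __int128)x1 << 64) | x0;
    return (uint64_t)(x / y);
}
```

This is portable across the targets that support the extension, but the compiler may implement the division with a library call instead of a single divide instruction, so it is not necessarily as fast as the assembly version.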
The "divide a 2 digit number by 1 digit, giving 1 digit quotient and remainder" is the basic primitive you need to synthesize larger divisions. If you don't have it (with digit == unsigned long int) available in your hardware, you need to use smaller digits.
In your case, split Y into 2 half-sized integers and X into 4 half-sized integers, and do the division that way.
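Taking the "smaller digits" idea to its extreme, a digit can be a single bit, which gives plain shift-and-subtract long division. The sketch below is that simpler variant, not the half-word scheme described above; it is slower (64 iterations) but needs nothing beyond standard C. The function name is hypothetical.

```c
#include <stdint.h>

/* Divides the 128-bit value x1:x0 by y, one bit per iteration.
   Assumes x1 < y, so y != 0 and the quotient fits in 64 bits. */
uint64_t udiv128_64_portable(uint64_t x0, uint64_t x1, uint64_t y) {
    uint64_t rem = x1;   /* running remainder, always < y between steps */
    uint64_t quot = 0;
    for (int i = 63; i >= 0; --i) {
        uint64_t carry = rem >> 63;            /* bit about to be shifted out */
        rem = (rem << 1) | ((x0 >> i) & 1);    /* bring down the next bit */
        quot <<= 1;
        if (carry || rem >= y) {
            /* The true value is >= y; the wrapped subtraction is still
               exact because the true difference is below y. */
            rem -= y;
            quot |= 1;
        }
    }
    return quot;
}
```

The carry check is what lets the remainder live in a single 64-bit variable even when y is close to 2^64.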

Pointer subtraction, 32-bit ARM, negative distance reported as positive

When performing subtraction of pointers and the first pointer is less than the second, I'm getting an underflow error with the ARM processor.
Example code:
#include <stdint.h>
#include <stdbool.h>
uint8_t * p_formatted_data_end;
uint8_t formatted_text_buffer[10240];
static _Bool
Flush_Buffer_No_Checksum(void)
{
_Bool system_failure_occurred = false;
p_formatted_data_end = 0; // For demonstration purposes.
const signed int length =
p_formatted_data_end - &formatted_text_buffer[0];
if (length < 0)
{
system_failure_occurred = true;
}
//...
return true;
}
The assembly code generated by the IAR compiler is:
807 static _Bool
808 Flush_Buffer_No_Checksum(void)
809 {
\ Flush_Buffer_No_Checksum:
\ 00000000 0xE92D4070 PUSH {R4-R6,LR}
\ 00000004 0xE24DD008 SUB SP,SP,#+8
810 _Bool system_failure_occurred = false;
\ 00000008 0xE3A04000 MOV R4,#+0
811 p_formatted_data_end = 0; // For demonstration purposes.
\ 0000000C 0xE3A00000 MOV R0,#+0
\ 00000010 0x........ LDR R1,??DataTable3_7
\ 00000014 0xE5810000 STR R0,[R1, #+0]
812 const signed int length =
813 p_formatted_data_end - &formatted_text_buffer[0];
\ 00000018 0x........ LDR R0,??DataTable3_7
\ 0000001C 0xE5900000 LDR R0,[R0, #+0]
\ 00000020 0x........ LDR R1,??DataTable7_7
\ 00000024 0xE0505001 SUBS R5,R0,R1
814 if (length < 0)
\ 00000028 0xE3550000 CMP R5,#+0
\ 0000002C 0x5A000009 BPL ??Flush_Buffer_No_Checksum_0
815 {
816 system_failure_occurred = true;
\ 00000030 0xE3A00001 MOV R0,#+1
\ 00000034 0xE1B04000 MOVS R4,R0
The subtraction instruction SUBS R5,R0,R1 is equivalent to:
R5 = R0 - R1
The N bit in the CPSR register will be set if the result is negative.
Ref: Section A4.1.106 SUB of ARM Architecture Reference Manual
Let:
R0 == 0x00000000
R1 == 0x802AC6A5
Register R5 will have the value 0x7FD5395B.
The N bit of the CPSR register is 0, indicating the result is not negative.
The Windows 7 Calculator application is reporting negative, but only when expressed as 64-bits: FFFFFFFF7FD5395B.
As an experiment, I used the ptrdiff_t type for the length, and the same assembly language was generated.
Questions:
Is this valid behavior, to have the result of pointer subtraction to underflow?
What is the recommended data type to view the distance as negative?
Platform:
Target Processor: ARM Cortex A8 (TI AM3358)
Compiler: IAR 7.40
Development platform: Windows 7.
Is this valid behavior, to have the result of pointer subtraction to underflow?
Yes, because the behavior in your case is undefined. Any behavior is valid there. As was observed in comments, the difference between two pointers is defined only for pointers that point to elements of the same array object, or one past the last element of the array object (C2011, 6.5.6/9).
What is the recommended data type to view the distance as negative?
Where it is defined, the result of subtracting two pointers is specified to be of type ptrdiff_t, a signed integer type of implementation-defined size. If you evaluate p1 - p2, where p1 points to an array element and p2 points to a later element of the same array, then the result will be a negative number representable as a ptrdiff_t.
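As a small illustration of the defined case, the sketch below subtracts two pointers into the same array. The buffer and function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t demo_buffer[128];

/* Distance from p2 to p1 in elements. Both pointers must point into
   (or one past the end of) the same array for this to be defined. */
ptrdiff_t distance_in_elements(const uint8_t *p1, const uint8_t *p2) {
    return p1 - p2;
}

/* p1 is the first element, p2 a later element of the same array. */
ptrdiff_t demo_negative_distance(void) {
    return distance_in_elements(&demo_buffer[0], &demo_buffer[100]);
}
```

Here demo_negative_distance() returns -100, because the first operand points to an earlier element than the second.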
Although this is UB as stated in the other answer, most C implementations will simply subtract these pointers anyway, using ptrdiff_t-sized arithmetic (or possibly arithmetic appropriate to their word size, which might also differ if the operands are near/far/huge pointers). The result should fit inside ptrdiff_t, which is usually a typedef for int on ARM:
typedef int ptrdiff_t;
So the issue with your code in this particular case will simply be that you are treating an unsigned int value as signed, and it doesn't fit. As specified in your question, the address of formatted_text_buffer is 0x802AC6A5, which fits inside unsigned int, but (int)0x802AC6A5 in two's complement form is actually a negative number (-0x7FD5395B). So subtracting a negative number from 0 will return a positive int as expected.
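The arithmetic can be reproduced with fixed-width integers, without touching any real pointers. The helper names below are hypothetical; they only model what a 32-bit subtraction does to the raw address values from the question.

```c
#include <stdint.h>

/* 32-bit wraparound subtraction, as performed on the raw addresses. */
uint32_t sub32(uint32_t a, uint32_t b) {
    return a - b;
}

/* 1 if bit 31 of a 32-bit result is set (the value the N flag takes). */
int n_flag(uint32_t result) {
    return (int)(result >> 31);
}

/* The same subtraction with enough width to hold the true difference. */
int64_t sub_wide(uint32_t a, uint32_t b) {
    return (int64_t)a - (int64_t)b;
}
```

sub32(0, 0x802AC6A5) evaluates to 0x7FD5395B with bit 31 clear, so a 32-bit signed interpretation sees a positive number, while sub_wide(0, 0x802AC6A5) is -0x802AC6A5, the mathematically expected negative distance.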
Signed 32-bit integer subtraction will work correctly if both operands are less than 0x7FFFFFFF apart, and it's reasonable to expect your arrays to be smaller than that:
// this will work
const int length = &formatted_text_buffer[0] - &formatted_text_buffer[100];
Or, if you really need to subtract pointers whose difference doesn't fit into a signed 32-bit int, use long long instead:
// ...but I doubt you really want this
// (going through uintptr_t avoids possible sign extension of the addresses)
const long long length = (long long)(uintptr_t)p_formatted_data_end -
(long long)(uintptr_t)&formatted_text_buffer[0];

Code to obtain quarter of a given value X is implemented as (x*64)/255 instead of (x*1)/4

I found an implementation wherein in order to obtain a quarter of a value X, following code was employed to run on ARM 32-bit processor.
//global typedef
typedef unsigned char uint8;
//Function definition
uint8 FindQuarter(void)
{
uint8 value, OneQuarter,ThreeQuarter;
value = 100;
OneQuarter = (value * 64) / 255;// why not "(value * 1)/4" or "value/4"
ThreeQuarter = (value *192) /255; //why not "(value * 3)/4"
return 1;
}
Why would somebody use 64/255 or 192/255 instead of 1/4 or 3/4, although both give approximately the same result?
Note: Accuracy of the calculation is not of prime importance here. It is allowed to deviate a little.
What did the original author mean by 'Quarter'? When calculating some values a pattern emerges:
f(0) = 0x00 (0)
f(1) = 0xC0 (192)
f(2) = 0x80 (128)
f(3) = 0x40 (64)
f(4) = 0x00 (0)
f(5) = 0xC0 (192)
...
f(254) = 0x80 (128)
f(255) = 0x40 (64)
This could be interpreted as assigning a 'quarter' of the domain of a byte to each input value, where each quarter is defined by the lowest value in that subset. (It certainly isn't a divide by four.)
EDIT
The answer above is inaccurate (at best) or, more probably, incorrect.
Following up on a comment, it appears that my implementation of f() was incorrect, with the / 255 part of the calculation being 'optimised' and performed as a signed division by -1.
#include <stdio.h>
typedef unsigned char uint8;
int main (void)
{
short ui16Value;
for(ui16Value = 0; ui16Value < 0x100; ui16Value++)
{
uint8 value = (uint8)ui16Value;
// definition of f():
uint8 quarter = (value * 64) / 255;
printf("f(%d) = 0x%2.2X (%u)\r\n", value, quarter, quarter);
}
return 0;
}
Disassembly:
! uint8 quarter = (value * 64) / 255;
0xF48: MOV.B [W14+2], W5
0xF4A: MOV #0xC0, W4 <-- this looks odd.
0xF4C: MUL.SS W5, W4, W4
0xF4E: MOV.B W4, [W14+3]
The line highlighted above looks like an operator precedence issue, with 64 having been divided by 255 == 0xFF == -1, before the multiplication in the brackets.
Explicitly stating the value as being unsigned results in the values all being zero.
! uint8 quarter = (value * 64) / (uint8)255;
0xF48: MOV.B [W14+2], W4
0xF4A: ZE W4, W4
0xF4C: SL W4, #6, W5
0xF4E: MOV #0xFF, W4
0xF50: REPEAT #0x11
0xF52: DIV.SW W5, W4
0xF54: MOV W0, W4
0xF56: MOV.B W4, [W14+3]
I may have to look at the errata for the version of the compiler I'm using, or try compiling for the target microcontroller instead of the simulator. Either way, it's something interesting for me to look into.
For some reason the original programmer decided the value 255 should be a special case.
255 probably means 100% of something in your software, so he felt that treating it as if it were the un-representable value 256 was the best way of handling the value.
Your code is almost equivalent to...
OneQuarter = value == 255 ? 64 : value / 4;
ThreeQuarter = value == 255 ? 192 : value * 3 / 4;
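How close "almost equivalent" is can be checked exhaustively over all 256 byte values. The sketch below does that for the one-quarter case and also defines the three-quarter pair for comparison; all function names are illustrative.

```c
#include <stdint.h>

/* The forms from the question: scale by 64/255 or 192/255. */
uint8_t quarter_scaled(uint8_t value) {
    return (uint8_t)((value * 64) / 255);
}
uint8_t three_quarter_scaled(uint8_t value) {
    return (uint8_t)((value * 192) / 255);
}

/* The plain forms: multiply by 1/4 or 3/4 with truncation. */
uint8_t quarter_plain(uint8_t value) {
    return (uint8_t)(value / 4);
}
uint8_t three_quarter_plain(uint8_t value) {
    return (uint8_t)((value * 3) / 4);
}

/* Counts the inputs in 0..255 where the two one-quarter forms disagree. */
int count_quarter_mismatches(void) {
    int mismatches = 0;
    for (int v = 0; v <= 255; ++v) {
        if (quarter_scaled((uint8_t)v) != quarter_plain((uint8_t)v)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For the one-quarter case the only disagreement is at 255 (64 versus 63), so the conditional expression above matches it exactly. The three-quarter forms drift apart earlier: at 85, for instance, the scaled form gives 64 and the plain form gives 63.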
I'd assume this code deals with something like colours/pixels; where 0x00 is minimum and 0xFF is maximum.
If that's the case, the value 255 actually represents "255/255 of maximum", and a quarter really should be calculated as value = (value/255.0)/4 * 255.0 (or more generally, value = (value/MAX)/4 * MAX). Of course floating point is often slower, and value = (value * 64)/255 is a good "integer only" approximation.
Note that this is entirely about rounding correctly. Integer division truncates, and dividing by 4 doesn't give you the closest answer (e.g. 3/4 == 0 and doesn't give you "0.75 rounded up to 1").
I've made some tests, and the only difference that looks important is that for value = 255, the total would be OneQuarter + ThreeQuarter = 256.
But if precision is important, then this technique will give better values.
If you validate the result with the check Z == X(1/4) + Y(3/4), for the above technique the result will match for 128 values between 1 and 255. For a simple division, the result will not be valid for 192 cases.
