VMINNM instruction in ARM

I would like to know how vminnm works. Since the pseudo-code is a bit unclear, I am not able to understand the exact function of this instruction.
vminnm.f32 d3, d5, d13
where
d5 = 0xffd5432100000000
d13 = 0x7ff0056000000000
Result:
d3 = 0x7fc0000000000000
How do we arrive at this result?

This is the definition on the ARM Reference Manual for that instruction.
(NaN is Not a Number, that means that the value is not a valid floating point number.)
VMINNM
This instruction determines the floating point minimum number.
It handles NaNs in consistence with the IEEE754-2008 specification. It returns the numerical operand when one
operand is numerical and the other is a quiet NaN, but otherwise the result is identical to floating-point VMIN .
This instruction is not conditional.
vminnm.f32 d3, d5, d13
In your example, the values in d5 and d13 are compared lane by lane and the result of the comparison is stored in d3. Take into consideration that you are dealing with vectors: each register holds two elements, each a 32-bit floating-point value.
The value 0xffd5432100000000 is a valid 64-bit double, but viewed as two 32-bit floats it is something else entirely: 0xffd54321 is a NaN and 0x00000000 is 0. So when you compare these values you need to be aware of the width of the elements you are comparing. (You can check floating-point bit patterns with an online IEEE-754 converter.)
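To make the lane contents visible, here is a small host-side C sketch (plain C on any machine, not the NEON operation itself; the bit patterns come straight from the question) that reinterprets each 32-bit half of d5 and d13 as a float. The low lanes are both 0.0, so VMINNM stores 0 there; the high lanes are both quiet NaNs, so that lane's result is a NaN as well, and the 0x7fc00000 you see is the single-precision default NaN (what the FPU produces for NaN results when its default-NaN behaviour applies), hence d3 = 0x7fc0000000000000.

#include <stdio.h>
#include <string.h>

int main(void) {
    /* Each 64-bit D register holds two 32-bit lanes: { low, high }. */
    unsigned int d5[2]  = { 0x00000000u, 0xffd54321u };
    unsigned int d13[2] = { 0x00000000u, 0x7ff00560u };

    for (int lane = 0; lane < 2; lane++) {
        float a, b;
        memcpy(&a, &d5[lane],  sizeof a);
        memcpy(&b, &d13[lane], sizeof b);
        /* lane 0 prints "0 vs 0"; lane 1 prints "nan vs nan" (the first one negative) */
        printf("lane %d: %g vs %g\n", lane, a, b);
    }
    return 0;
}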

Related

Why doesn't this cause overflow?

#include <stdio.h>

int main() {
    double b = 3.14;
    double c = -1e20;
    c = -1e20 + b;
    return 0;
}
As far as I know, the type double has 52 bits of fraction. To align 3.14's exponent with -1e20, 3.14's fraction part would need more than 60 bits, which never fits in 52 bits.
In my understanding, the rest of the fraction bits beyond the 52, roughly 14 bits, would invade unassigned memory space, like this:
[rough drawing of the assumed memory layout]
So I examined the memory map in debug mode (gdb), suspecting that the bits next to the variable b or c would be corrupted. But I couldn't see any changes. What am I missing here?
You mix up 2 very different things:
Buffer overflow/overrun
Your added image shows what happens when you overflow a buffer, like defining a char[100] and writing to index 150. There the memory layout is important, as you might corrupt neighboring variables.
Overflow in values of a data type
What your code shows can only be an overflow of values.
If you do int a = INT_MAX; a++; you get an integer overflow.
This only affects the resulting value.
It does not cause the variable to grow in size. An int will always stay an int.
You do not invade any memory area outside your data type.
Depending on data type and architecture the overflowing bits could just be chopped off or some saturation could be applied to set the value to maximum/minimum representable value.
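A minimal sketch of a value overflow (purely illustrative; note that overflowing a signed int is formally undefined behaviour in C, though it commonly wraps): the value changes, but the variable keeps its size and no neighbouring memory is touched.

#include <limits.h>
#include <stdio.h>

int main(void) {
    int a = INT_MAX;
    printf("before: %d, sizeof a = %zu\n", a, sizeof a);
    a = a + 1;   /* value overflow: undefined for signed int, typically wraps to INT_MIN */
    printf("after:  %d, sizeof a = %zu\n", a, sizeof a);   /* same size as before */
    return 0;
}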
I did not check your values, but without inspecting the value of c in a debugger or printing it, you cannot tell anything about the overflow there.
Floating-point arithmetic is not defined to work by writing out all the bits of the operands, performing the arithmetic using all the bits involved, and storing those bits in memory. Rather, the way elementary floating-point operations work is that each operation is performed “as if it first produced an intermediate result correct to infinite precision and with unbounded range” and then rounded to a result that is representable in the floating-point format. That “as if” is important. It means that when computer processor designers are designing the floating-point arithmetic instructions, they figure out how to compute what the final rounded result would be. The processor does not always need to “write out” all the bits to do that.
Consider an example using decimal floating-point with four significant digits. If we add 6.543•10^20 and 1.037•10^17 (equal to 0.001037•10^20), the infinite-precision result would be 6.544037•10^20, and then rounding that to the nearest number representable in the four-significant-digit format would give 6.544•10^20. But we do not have to write out the infinite-precision result to compute that. We can compute that the result is 6.544•10^20 plus a tiny fraction, and then we can discard that fraction without actually writing out its digits. This is what processor designers do. The add, multiply, and other instructions compute the main part of a result, and they carefully manage information about the other parts to determine whether they would cause the result to round upward or downward in its last digit.
The resulting behavior is that, given any two operands in the format used for double, the computer always produces a result in that same format. It does not produce any extra bits.
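To see this with the values from the question (assuming the common case that double is IEEE-754 binary64): adding 3.14 to -1e20 rounds the sum back into the 53-bit format, so the 3.14 simply disappears in rounding and nothing is written outside the variable.

#include <stdio.h>

int main(void) {
    double b = 3.14;
    double c = -1e20 + b;
    printf("c        = %.17g\n", c);         /* prints -1e+20: b was lost to rounding */
    printf("c + 1e20 = %.17g\n", c + 1e20);  /* prints 0: c is exactly -1e20 */
    return 0;
}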
Supplement
There are 53 bits in the fraction portion of the format commonly used for double. (This is the IEEE-754 binary64 format, also called double precision.) The fraction portion is called the significand. (You may see it referred to as a mantissa, but that is an old term for the fraction portion of a logarithm. The preferred term is “significand.” Significands are linear; mantissas are logarithmic.) You may see some people describe there being 52 bits for the significand, but that refers to a part of the encoding of the floating-point value, and it is only part of it.
Mathematically, a floating-point representation is defined to be s•f•b^e, where b is a fixed numeric base, s provides a sign (+1 or −1), f is a number with a fixed number of digits p in base b, and e is an exponent within fixed limits. p is called the precision of the format, and it is 53 for the binary64 format. When this number is encoded into bits, the last 52 bits of f are stored in the significand field, which is where the 52 comes from. However, the first bit is also encoded, by way of the exponent field. Whenever the stored exponent field is not zero (or the special value of all one bits), it means the first bit of f is 1. When the stored exponent field is zero, it means the first bit of f is 0. So there are 53 bits present in the encoding.
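As an illustration of that encoding, here is a sketch (assuming double is IEEE-754 binary64 on the platform) that pulls a double apart into its stored fields and reattaches the implicit leading bit for normal numbers:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double x = 3.14;
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* view the same 64 bits as an integer */

    uint64_t sign     = bits >> 63;
    uint64_t exponent = (bits >> 52) & 0x7FF;         /* 11-bit biased exponent field */
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL;    /* the 52 explicitly stored bits */

    /* For a normal number (exponent field neither 0 nor 0x7FF), the leading
       significand bit is an implicit 1, giving p = 53 bits in total. */
    uint64_t significand = (exponent != 0 && exponent != 0x7FF)
                               ? (1ULL << 52) | fraction
                               : fraction;

    printf("sign=%llu exponent=%llu significand=0x%llx\n",
           (unsigned long long)sign,
           (unsigned long long)exponent,
           (unsigned long long)significand);
    return 0;
}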

SIMD - AVX - masking with non-zero value instead of highest bit

I have AVX (no AVX2 or AVX-512). I have a vector of 32-bit values (only the 4 lowest bits are used; the rest are always zero):
[ 1010, 0000, 0000, 0000, 0000, 1010, 1010, 0000]
Internally, I keep the vector as __m256 because of bitwise operations, and the bits represent "float numbers". I need to export a single 8-bit number from the vector, which will contain 1 for each non-zero element and 0 for each zero element.
So for above example, I need 8-bit number: 10000110
My idea is to use _mm256_cmp_ps and then _mm256_movemask_ps. However, I don't know whether the cmp will work correctly if the numbers are not proper floats and can be any "junk". In this case, which predicate should I use for the cmp?
Or is there any other solution?
Conceptually, what you're doing should work. Floats with the upper 24 bits zero are valid floats. However, they are denormal.
While it should work, there are two potential problems:
If the FP mode is set to flush denormals to zero, then they will all be treated as zero (thus, breaking that approach; see the run-time check sketched after this list).
Because these are denormals, you may end up taking massive performance penalties depending on whether the hardware can handle them natively.
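If you want to check at run time whether those modes are in effect before relying on the denormal inputs, something along these lines (a sketch using the standard MXCSR intrinsics) will tell you:

#include <pmmintrin.h>   /* MXCSR denormal-control intrinsics (also pulls in xmmintrin.h) */
#include <stdio.h>

int main(void) {
    /* FTZ flushes denormal results to zero; DAZ treats denormal inputs as zero.
       DAZ is the one that would make the compare-against-denormals idea break. */
    int ftz = _MM_GET_FLUSH_ZERO_MODE()     == _MM_FLUSH_ZERO_ON;
    int daz = _MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON;
    printf("FTZ: %s, DAZ: %s\n", ftz ? "on" : "off", daz ? "on" : "off");
    return 0;
}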
Alternative Approach:
Since the upper 24 bits are zero, you can normalize them. Then do the floating-point comparison.
(Warning: untested code)
#include <immintrin.h>

int to_mask(__m256 data) {
    const __m256 MASK = _mm256_set1_ps(8388608.);   // 2^23
    data = _mm256_or_ps(data, MASK);
    data = _mm256_cmp_ps(data, MASK, _CMP_NEQ_UQ);
    return _mm256_movemask_ps(data);
}
Here, data is your input where the upper 24 bits of each "float" are zero. Let's call each of these 8-bit integers x.
OR'ing with 2^23 sets the mantissa of the float such that it becomes a normalized float with value 2^23 + x.
Then you compare against 2^23 as float - which will give a 1 only if the x is non-zero.
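A hypothetical usage with the example vector from the question (untested; it calls the to_mask function defined above, and loads the raw 32-bit patterns through an integer load): lanes 0, 5 and 6 are non-zero, so the mask comes out as 0x61. Note that _mm256_movemask_ps puts lane 0 in the least significant bit, so the bit order is the reverse of the 10000110 written in the question.

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* 0xA == 0b1010, matching the example vector in the question */
    const unsigned int raw[8] = { 0xA, 0, 0, 0, 0, 0xA, 0xA, 0 };
    __m256 v = _mm256_castsi256_ps(_mm256_loadu_si256((const __m256i *)raw));
    printf("mask = 0x%02x\n", to_mask(v));   /* expected: 0x61 (bits 0, 5, 6 set) */
    return 0;
}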
Alternate answer, for future readers that do have AVX2
You can cast to __m256i and use SIMD integer compares.
This avoids any problems with DAZ treating these small-integer bit patterns as exactly zero, or microcode assists for denormal (aka subnormal) inputs.
There might be 1 extra cycle of bypass latency between vpcmpeqd and vmovmskps on some CPUs, but you still come out ahead because the integer compare is lower latency than an FP compare.
int nonzero_positions_avx2(__m256 v)
{
    __m256i vi = _mm256_castps_si256(v);
    vi = _mm256_cmpeq_epi32(vi, _mm256_setzero_si256());        // all-ones in lanes that are zero
    int zero_mask = _mm256_movemask_ps(_mm256_castsi256_ps(vi));
    return zero_mask ^ 0xFF;   // invert: we want a 1 bit for every non-zero lane
}

How to avoid floating point round off error in unit tests?

I'm trying to write unit tests for some simple vector math functions that operate on arrays of single precision floating point numbers. The functions use SSE intrinsics and I'm getting false positives (at least I think) when running the tests on a 32-bit system (the tests pass on 64-bit). As the operation runs through the array, I accumulate more and more round off error. Here is a snippet of unit test code and output (my actual question(s) follow):
Test Setup:
/* input, ainput, output and expected are global float pointers, allocated here. */
static const int N = 1024;
static const float MSCALAR = 42.42f;

static void setup(void) {
    input = _mm_malloc(sizeof(*input) * N, 16);
    ainput = _mm_malloc(sizeof(*ainput) * N, 16);
    output = _mm_malloc(sizeof(*output) * N, 16);
    expected = _mm_malloc(sizeof(*expected) * N, 16);
    memset(output, 0, sizeof(*output) * N);
    for (int i = 0; i < N; i++) {
        input[i] = i * 0.4f;
        ainput[i] = i * 2.1f;
        expected[i] = (input[i] * MSCALAR) + ainput[i];
    }
}
My main test code then calls the function to be tested (which does the same calculation used to generate the expected array) and checks its output against the expected array generated above. The check is for closeness (within 0.0001) not equality.
Sample output:
0.000000 0.000000 delta: 0.000000
44.419998 44.419998 delta: 0.000000
...snip 100 or so lines...
2043.319946 2043.319946 delta: 0.000000
2087.739746 2087.739990 delta: 0.000244
...snip 100 or so lines...
4086.639893 4086.639893 delta: 0.000000
4131.059570 4131.060059 delta: 0.000488
4175.479492 4175.479980 delta: 0.000488
...etc, etc...
I know I have two problems:
On 32-bit machines, differences between 387 and SSE floating point arithmetic units. I believe 387 uses more bits for intermediate values.
Non-exact representation of my 42.42 value that I'm using to generate expected values.
So my question is, what is the proper way to write meaningful and portable unit tests for math operations on floating point data?
*By portable I mean it should pass on both 32-bit and 64-bit architectures.
Per a comment, we see that the function being tested is essentially:
for (int i = 0; i < N; ++i)
    D[i] = A[i] * b + C[i];
where A[i], b, C[i], and D[i] all have type float. When referring to the data of a single iteration, I will use a, c, and d for A[i], C[i], and D[i].
Below is an analysis of what we could use for an error tolerance when testing this function. First, though, I want to point out that we can design the test so that there is no error. We can choose the values of A[i], b, C[i], and D[i] so that all the results, both final and intermediate results, are exactly representable and there is no rounding error. Obviously, this will not test the floating-point arithmetic, but that is not the goal. The goal is to test the code of the function: Does it execute instructions that compute the desired function? Simply choosing values that would reveal any failures to use the right data, to add, to multiply, or to store to the right location will suffice to reveal bugs in the function. We trust that the hardware performs floating-point correctly and are not testing that; we just want to test that the function was written correctly. To accomplish this, we could, for example, set b to a power of two, A[i] to various small integers, and C[i] to various small integers multiplied by b. I could detail limits on these values more precisely if desired. Then all results would be exact, and any need to allow for a tolerance in comparison would vanish.
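For concreteness, here is one way such exact test data could be generated (a sketch; the names mirror the question's setup, and the particular ranges are just one choice that keeps every product and sum a small integer, well below 2^24 and therefore exactly representable in float):

#include <stddef.h>

static const float B = 64.0f;   /* b: a power of two */

static void setup_exact(float *A, float *C, float *expected, size_t n) {
    for (size_t i = 0; i < n; i++) {
        A[i] = (float)(i % 512);         /* small integers: exact in float          */
        C[i] = (float)(i % 256) * B;     /* small integers times b: exact in float  */
        expected[i] = A[i] * B + C[i];   /* at most 511*64 + 255*64 < 2^24: exact   */
    }
}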
That aside, let us proceed to error analysis.
The goal is to find bugs in the implementation of the function. To do this, we can ignore small errors in the floating-point arithmetic, because the kinds of bugs we are seeking almost always cause large errors: The wrong operation is used, the wrong data is used, or the result is not stored in the desired location, so the actual result is almost always very different from the expected result.
Now the question is how much error should we tolerate? Because bugs will generally cause large errors, we can set the tolerance quite high. However, in floating-point, “high” is still relative; an error of one million is small compared to values in the trillions, but it is too high to discover errors when the input values are in the ones. So we ought to do at least some analysis to decide the level.
The function being tested will use SSE intrinsics. This means it will, for each i in the loop above, either perform a floating-point multiply and a floating-point add or will perform a fused floating-point multiply-add. The potential errors in the latter are a subset of the former, so I will use the former. The floating-point operations for a*b+c do some rounding so that they calculate a result that is approximately a•b+c (interpreted as an exact mathematical expression, not floating-point). We can write the exact value calculated as (a•b•(1+e0)+c)•(1+e1) for some errors e0 and e1 with magnitudes at most 2^-24, provided all the values are in the normal range of the floating-point format. (2^-24 is the maximum relative error that can occur in any correctly rounded elementary floating-point operation in round-to-nearest mode in the IEEE-754 32-bit binary floating-point format. Rounding in round-to-nearest mode changes the mathematical value by at most half the value of the least significant bit in the significand, which is 23 bits below the most significant bit.)
Next, we consider what value the test program produces for its expected value. It uses the C code d = a*b + c;. (I have converted the long names in the question to shorter names.) Ideally, this would also calculate a multiply and an add in IEEE-754 32-bit binary floating-point. If it did, then the result would be identical to the function being tested, and there would be no need to allow for any tolerance in comparison. However, the C standard allows implementations some flexibility in performing floating-point arithmetic, and there are non-conforming implementations that take more liberties than the standard allows.
A common behavior is for an expression to be computed with more precision than its nominal type. Some compilers may calculate a*b + c using double or long double arithmetic. The C standard requires that results be converted to the nominal type in casts or assignments; extra precision must be discarded. If the C implementation is using extra precision, then the calculation proceeds: a*b is calculated with extra precision, yielding exactly a•b, because double and long double have enough precision to exactly represent the product of any two float values. A C implementation might then round this result to float. This is unlikely, but I allow for it anyway. However, I also dismiss it because it moves the expected result to be closer to the result of the function being tested, and we just need to know the maximum error that can occur. So I will continue, with the worse (more distant) case, that the result so far is a•b. Then c is added, yielding (a•b+c)•(1+e2) for some e2 with magnitude at most 2^-53 (the maximum relative error of normal numbers in the 64-bit binary format). Finally, this value is converted to float for assignment to d, yielding (a•b+c)•(1+e2)•(1+e3) for some e3 with magnitude at most 2^-24.
Now we have expressions for the exact result computed by a correctly operating function, (a•b•(1+e0)+c)•(1+e1), and for the exact result computed by the test code, (a•b+c)•(1+e2)•(1+e3), and we can calculate a bound on how much they can differ. Simple algebra tells us the exact difference is a•b•(e0+e1+e0•e1-e2-e3-e2•e3)+c•(e1-e2-e3-e2•e3). This is a simple function of e0, e1, e2, and e3, and we can see its extremes occur at endpoints of the potential values for e0, e1, e2, and e3. There are some complications due to interactions between possibilities for the signs of the values, but we can simply allow some extra error for the worst case. A bound on the maximum magnitude of the difference is |a•b|•(3•2^-24 + 2^-53 + 2^-48) + |c|•(2•2^-24 + 2^-53 + 2^-77).
Because we have plenty of room, we can simplify that, as long as we do it in the direction of making the values larger. E.g., it might be convenient to use |a•b|•3.001•2^-24 + |c|•2.001•2^-24. This expression should suffice to allow for rounding in floating-point calculations while detecting nearly all implementation errors.
Note that the expression is not proportional to the final value, a*b+c, as calculated either by the function being tested or by the test program. This means that, in general, tests using a tolerance relative to the final values calculated by the function being tested or by the test program are wrong. The proper form of a test should be something like this:
double tolerance = fabs(input[i] * MSCALAR) * 0x3.001p-24 + fabs(ainput[i]) * 0x2.001p-24;
double difference = fabs(output[i] - expected[i]);
if (! (difference < tolerance))
// Report error here.
In summary, this gives us a tolerance that is larger than any possible differences due to floating-point rounding, so it should never give us a false positive (report the test function is broken when it is not). However, it is very small compared to the errors caused by the bugs we want to detect, so it should rarely give us a false negative (fail to report an actual bug).
(Note that there are also rounding errors computing the tolerance, but they are smaller than the slop I have allowed for in using .001 in the coefficients, so we can ignore them.)
(Also note that ! (difference < tolerance) is not equivalent to difference >= tolerance. If the function produces a NaN, due to a bug, any comparison yields false: both difference < tolerance and difference >= tolerance yield false, but ! (difference < tolerance) yields true.)
On 32-bit machines, differences between 387 and SSE floating point arithmetic units. I believe 387 uses more bits for intermediate values.
If you are using GCC as a 32-bit compiler, you can tell it to generate SSE2 code with the options -msse2 -mfpmath=sse. Clang can be told to do the same thing with one of the two options and ignores the other one (I forget which). In both cases the binary program should implement strict IEEE 754 semantics and compute the same result as a 64-bit program that also uses SSE2 instructions to implement strict IEEE 754 semantics.
Non-exact representation of my 42.42 value that I'm using to generate expected values.
The C standard says that a literal such as 42.42f must be converted to either the floating-point number immediately above or immediately below the number represented in decimal. Moreover, if the literal is representable exactly as a floating-point number of the intended format, then this value must be used. However, a quality compiler (such as GCC) will give you(*) the nearest representable floating-point number, of which there is only one, so again, this is not a real portability issue as long as you are using a quality compiler (or at the very least, the same compiler).
Should this turn out to be a problem, a solution is to write an exact representation of the constants you intend. Such an exact representation can be very long in decimal format (up to 750 decimal digits for the exact representation of a double) but is always quite compact in C99's hexadecimal format: 0x1.535c28p+5 for the exact representation of the float nearest to 42.42. A recent version of the static analysis platform for C programs Frama-C can provide the hexadecimal representation of all inexact decimal floating-point constants with option -warn-decimal-float:all.
(*) barring a few conversion bugs in older GCC versions. See Rick Regan's blog for details.

Is there any way to get correct rounding with the i387 fsqrt instruction?

Is there any way to get correct rounding with the i387 fsqrt instruction?...
...aside from changing the precision mode in the x87 control word - I know that's possible, but it's not a reasonable solution because it has nasty reentrancy-type issues where the precision mode will be wrong if the sqrt operation is interrupted.
The issue I'm dealing with is as follows: the x87 fsqrt opcode performs a correctly-rounded (per IEEE 754) square root operation in the precision of the fpu registers, which I'll assume is extended (80-bit) precision. However, I want to use it to implement efficient single and double precision square root functions with the results correctly rounded (per the current rounding mode). Since the result has excess precision, the second step of converting the result to single or double precision rounds again, possibly leaving a not-correctly-rounded result.
With some operations it's possible to work around this with biases. For instance, I can avoid excess precision in the results of addition by adding a bias in the form of a power of two that forces the 52 significant bits of a double precision value into the last 52 bits of the 63-bit extended-precision mantissa. But I don't see any obvious way to do such a trick with square root.
Any clever ideas?
(Also tagged C because the intended application is implementation of the C sqrt and sqrtf functions.)
First, let's get the obvious out of the way: you should be using SSE instead of x87. The SSE sqrtss and sqrtsd instructions do exactly what you want, are supported on all modern x86 systems, and are significantly faster as well.
Now, if you insist on using x87, I'll start with the good news: you don't need to do anything for float. You need 2p + 2 bits to compute a correctly rounded square root in a p-bit floating-point format. Because the x87 extended format has a 64-bit significand and 64 > 2*24 + 2, the additional rounding to single precision will always round correctly, and you have a correctly rounded square root.
Now the bad news: 64 < 2*53 + 2, so no such luck for double precision. I can suggest several workarounds; here's a nice easy one off the top of my head.
let y = round_to_double(x87_square_root(x));
use a Dekker (head-tail) product to compute a and b such that y*y = a + b exactly.
compute the residual r = x - a - b.
if (r == 0) return y
if (r > 0), let y1 = y + 1 ulp, and compute a1, b1 s.t. y1*y1 = a1 + b1. Compare r1 = x - a1 - b1 to r, and return either y or y1, depending on which has the smaller residual (or the one with zero low-order bit, if the residuals are equal in magnitude).
if (r < 0), do the same thing for y1 = y - 1 ulp.
This procedure only handles the default rounding mode; however, in the directed rounding modes, simply rounding to the destination format does the right thing.
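For the Dekker (head-tail) product step, a rough C sketch looks like this (assuming round-to-nearest double arithmetic and no contraction of the expressions into FMAs, e.g. compile with -ffp-contract=off; this is only the "y*y = a + b exactly" part, not the full correction logic):

/* Veltkamp split of y into a high part (top ~26 bits) and a low part,
   then Dekker's trick to recover the exact rounding error of y*y. */
static void square_exact(double y, double *a, double *b) {
    const double splitter = 0x1p27 + 1.0;          /* 2^27 + 1 */
    double t  = splitter * y;
    double hi = t - (t - y);                       /* high ~26 bits of y */
    double lo = y - hi;                            /* remaining bits     */

    *a = y * y;                                    /* rounded square                  */
    *b = ((hi * hi - *a) + 2.0 * hi * lo) + lo * lo;   /* exact error: y*y == *a + *b */
}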
OK, I think I have a better solution:
Compute y=sqrt(x) in extended precision (fsqrt).
If the last 11 bits are not 0x400, simply convert to double precision and return.
Add 0x100-(fpu_status_word&0x200) to the low word of the extended precision representation.
Convert to double precision and return.
Step 3 is based on the fact that the C1 bit (0x200) of the status word is 1 if and only if fsqrt's result was rounded up. This is valid because, due to the test in step 2, x was not a perfect square; if it were a perfect square, y would have no bits beyond double precision.
It may be faster to perform step 3 with a conditional floating-point operation rather than working on the bit representation and reloading.
Here's the code (seems to work in all cases):
sqrt:
    fldl 4(%esp)
    fsqrt
    fstsw %ax
    sub $12,%esp
    fld %st(0)
    fstpt (%esp)
    mov (%esp),%ecx
    and $0x7ff,%ecx
    cmp $0x400,%ecx
    jnz 1f
    and $0x200,%eax
    sub $0x100,%eax
    sub %eax,(%esp)
    fstp %st(0)
    fldt (%esp)
1:  add $12,%esp
    fstpl 4(%esp)
    fldl 4(%esp)
    ret
It may not be what you want, as it doesn't take advantage of the 387 fsqrt instruction, but there's a surprisingly efficient sqrtf(float) in glibc implemented with 32-bit integer arithmetic. It even handles NaNs, Infs, subnormals correctly - it might be possible to eliminate some of these checks with real x87 instructions / FP control word flags. see: glibc-2.14/sysdeps/ieee754/flt-32/e_sqrtf.c
The dbl-64/e_sqrt.c code is not so friendly. It's hard to tell what assumptions are being made at a glance. Curiously, the library's i386 sqrt[f|l] implementations just call fsqrt, but load the value differently. flds for SP, fldl for DP.

What is the result of divide by zero?

To be clear, I am not looking for NaN or infinity, or asking what the answer to x/0 should be. What I'm looking for is this:
Based on how division is performed in hardware (I do not know how it is done), if division were to be performed with a divisor of 0, and the processor just chugged along happily through the operation, what would come out of it?
I realize this is highly dependent on the dividend, so for a concrete answer I ask this: What would a computer spit out if it followed its standard division operation on 42 / 0?
Update:
I'll try to be a little clearer. I'm asking about the actual operations done with the numbers at the bit level to reach a solution. The result of the operation is just bits. NaN and errors/exceptions come into play when the divisor is discovered to be zero. If the division actually happened, what bits would come out?
It might just not halt. Integer division can be carried out in linear time through repeated subtraction: for 7/2, you can subtract 2 from 7 a total of 3 times, so that's the quotient, and the remainder (modulus) is 1. If you were to supply a divisor of 0 to an algorithm like that, unless there were a mechanism in place to prevent it, the algorithm would not halt: you can subtract 0 from 42 an infinite number of times without ever getting anywhere.
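A toy version of that algorithm, as a C sketch (the divisor check is only there so the example is safe to run; the point is that without it, a divisor of 0 would make the loop subtract 0 forever):

/* Division by repeated subtraction: naive_div(7, 2, &r) returns 3 with r == 1. */
static unsigned naive_div(unsigned dividend, unsigned divisor, unsigned *remainder) {
    unsigned quotient = 0;
    while (divisor != 0 && dividend >= divisor) {   /* drop "divisor != 0" and 42/0 never halts */
        dividend -= divisor;
        quotient++;
    }
    *remainder = dividend;
    return quotient;
}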
From a type perspective, this should be intuitive. The result of an undefined computation or a non-halting one is ⊥ (“bottom”), the undefined value inhabiting every type. Division by zero is not defined on the integers, so it should rightfully produce ⊥ by raising an error or failing to terminate. The former is probably preferable. ;)
Other, more efficient (logarithmic time) division algorithms rely on series that converge to the quotient; with a divisor of 0, as far as I can tell, these will either fail to converge (i.e., fail to terminate) or produce 0. See Division on Wikipedia.
Floating-point division similarly needs a special case: to divide two floats, subtract their exponents and integer-divide their significands. Same underlying algorithm, same problem. That’s why there are representations in IEEE-754 for positive and negative infinity, as well as signed zero and NaN (for 0/0).
For processors that have an internal "divide" instruction, such as the x86 with div, the CPU raises an exception (interrupt 0 on x86) if one attempts to divide by zero. This exception is usually caught by the operating system or language runtime and translated into an appropriate "divide by zero" exception.
Hardware dividers typically use a pipelined long division structure.
Assuming we're talking about integer division for now (as opposed to floating-point): the first step in long division is to align the most-significant ones (before attempting to subtract the divisor from the dividend). Clearly, this is undefined in the case of 0, so who knows what the hardware would do. If we assume it does something sane, the next step is to perform n subtractions (where n is the number of bit positions). For every subtraction that does not produce a negative result, a 1 is set in the output word. So the output from this step would be an all-1s word.
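A software model of that shift-and-subtract structure (an 8-bit restoring-division sketch, not a description of any particular hardware) shows where the all-1s word would come from: with a divisor of 0, every trial subtraction succeeds, so every quotient bit gets set.

#include <stdint.h>

static uint8_t long_div8(uint8_t dividend, uint8_t divisor, uint8_t *rem) {
    uint16_t r = 0;   /* partial remainder */
    uint8_t  q = 0;   /* quotient          */
    for (int i = 7; i >= 0; i--) {
        r = (uint16_t)((r << 1) | ((dividend >> i) & 1));   /* bring down the next bit */
        if (r >= divisor) {        /* trial subtraction; always succeeds when divisor == 0 */
            r -= divisor;
            q |= (uint8_t)(1u << i);
        }
    }
    *rem = (uint8_t)r;
    return q;   /* long_div8(42, 0, &r) -> 0xFF (all ones), with r == 42 */
}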
Floating-point division requires three steps:
Taking the difference of the exponents
Fixed-point division of the mantissas
Handling special cases
0 is represented by all-0s (both the mantissa and the exponent). However, there's always an implied leading 1 in the mantissa, so if we weren't treating this representation as a special case, it would just look and act like an extremely small power of 2.
It depends on the implementation. IEEE standard 754 floating point [1] defines signed infinity values, so in theory that should be the result of a divide by zero. The hardware simply sets a flag if the denominator is zero in a division operation. There is no magic to it.
Some erroneous (read: x86) architectures throw a trap if they hit an integer divide by zero, which, from a mathematical point of view, is a cop-out.
[1] http://en.wikipedia.org/wiki/IEEE_754-2008
It would be an infinite loop. Typically, division is done through continuous subtraction, just like multiplication is done via continual addition.
So, zero is special cased since we all know what the answer is anyway.
It would actually spit out an exception. Mathematically, 42 / 0 is undefined, so computers won't spit out a specific value for these inputs. I know that division can be done in hardware, but well-designed hardware will have some sort of flag or interrupt to tell you that whatever value is in the registers that are supposed to contain the result is not valid. Many computers raise an exception for this.
On x86, interrupt 0 occurs and the output registers are unchanged
Minimal 16-bit real mode example (to be added to a bootloader for example):
    movw $handler, 0x00
    movw %cs, 0x02
    mov $0, %ax
    div %ax
    /* After iret, we come here. */
    hlt

handler:
    /* After div, we come here. */
    iret
The Intel documentation for the DIV instruction does not say that the regular output registers (ax == quotient, dx == remainder) are modified, so I think this implies they stay unchanged.
Linux then handles that interrupt by sending a SIGFPE to the process that did the division, which will kill it if the signal is not handled.
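A small Linux-side sketch of that path (the handler sticks to async-signal-safe calls; note that simply returning from a SIGFPE handler would re-execute the faulting division, so the sketch exits instead):

#include <signal.h>
#include <unistd.h>

static void on_fpe(int sig) {
    (void)sig;
    static const char msg[] = "caught SIGFPE\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);
    _exit(0);
}

int main(void) {
    signal(SIGFPE, on_fpe);
    volatile int x = 42, y = 0;   /* volatile keeps the compiler from folding the division away */
    volatile int z = x / y;       /* #DE -> kernel -> SIGFPE -> on_fpe */
    (void)z;
    return 1;                     /* not reached */
}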
X / 0, where X is an element of the real numbers and is greater than or equal to 1; therefore the answer of X / 0 = infinity.
Division method (C#):
int counter = 0;   /* used to keep track of the division */
int x = 42;        /* number  */
int y = 0;         /* divisor */

while (x > 0) {
    x = x - y;
    counter++;
}

int answer = counter;
