Minimum of signed/unsigned integers using AVX - c

I was looking through the AVX instruction guide and though there are load, store and permute operations for 32-bit integer values, other operations such as determining minimum or maximum values, or shuffle operations are present only for floats and doubles.
So, if I wanted to use these operations on 32-bit integers, do I need to cast them to floats and then cast the result back, or is there some other instruction that I'm missing?
Also, if I wanted to use shuffles on 32-bit integers, do the shuffle masks remain the same as they were for floats?

The bulk of the integer operations for 32B vectors are in the AVX2 extension (not the initial AVX extension, which is almost entirely floating-point operations). Intel's most recent AVX Programming Reference has the complete details; you may also want to look at Intel's blog post announcing some of the details.
Unfortunately, you cannot use the floating-point min or max operations to simulate those operations on integer data. A significant number of integer bit patterns map to NaN values when interpreted as floating-point data, and the semantics of NaN comparisons don't do what you would want for integer comparisons. You would also need to deal with the fact that floating-point encodings are sign-magnitude, so the ordering of negative values is "reversed", and that +0 and -0 compare equal.

Related

Is there a fixed point representation available in C or Assembly

As far as I know, representing a fraction in C relies on floats and doubles which are in floating point representation.
Assume I'm trying to represent 1.5 which is a fixed point number (only one digit to the right of the radix point). Is there a way to represent such number in C or even assembly using a fixed point data type?
Are there even any fixed point instructions on x86 (or other architectures) which would operate on such type?
Every integral type can be used as a fixed point type. A favorite of mine is to use int64_t with an implied 8 digit shift, e.g. you store 1.5 as 150000000 (1.5e8). You'll have to analyze your use case to decide on an underlying type and how many digits to shift (that is, assuming you use base-10 scaling, which most people do). But 64 bits scaled by 10^8 is a pretty reasonable starting point with a broad range of uses.
While some C compilers offer special fixed-point types as an extension (not part of the standard C language), there's really very little use for them. Fixed point is just integers, interpreted with a different unit. For example, fixed point currency in typical cent denominations is just using integers that represent cents instead of dollars (or whatever the whole currency unit is) for your unit. Likewise, you can think of 8-bit RGB as having units of 1/256 or 1/255 "full intensity".
Adding and subtracting fixed point values with the same unit is just adding and subtracting integers. This is just like arithmetic with units in the physical sciences. The only value in having the language track that they're "fixed point" would be ensuring that you can only add/subtract values with matching units.
For multiplication and division, the result will not have the same units as the operands, so you have to either treat the result as a different fixed-point type or renormalize. For example, if you multiply two values representing 1/16 units, the result will have 1/256 units. You can then either keep the result in those finer units or scale the value down by a factor of 16 (rounding in whatever way is appropriate) to get back to a value with 1/16 units.
If the issue here is representing decimal values as fixed point, there's probably a C library for this; try a web search. You could also create your own BCD fixed-point library in assembly, using the BCD-related instructions: AAA (adjust after addition), AAS (adjust after subtraction) and AAM (adjust after multiplication). However, these instructions are invalid in x86-64 (64-bit) mode, so you'll need to use a 32-bit program, which should still be runnable on a 64-bit OS.
Financial institutions in the USA and other countries are required by law to perform decimal based math on currency values, to avoid decimal -> binary -> decimal conversion issues.

How is NaN saved during run time?

I had a small function where at one point I divided by 0 and created my first NaN. After looking on the internet I found out that NaN means "not a number" and that NaN != NaN.
My questions are:
During run time, how is NaN stored, and how does the controller know that a variable has the NaN value? (I am working with small microcontrollers in C; is the mechanism different in programs running on a PC, in C# and other OOP languages?)
Is Inf similar to NaN?
In C, the types of values are determined statically by your source code. For named objects (“variables”), you explicitly declare the types. For constants, the syntax of them (e.g., 3 versus 3.) determines the type. In typical C implementations that compile to machine code on common processors, the processors have different instructions for working with integers and floating-point. The compiler uses integer instructions for integers and floating-point instructions for floating-point values. The floating-point instructions are designed in hardware to work with encodings of floating-point values.
In IEEE-754 binary floating-point, floating-point data is encoded with a sign bit, an exponent field, and a significand field. If the exponent field is all ones and the significand field is not all zeros, the datum represents a NaN. In common modern processors, this is built into the hardware.
Infinity is not largely similar to a NaN. They might both be considered special in that they are not normal numbers and are processed somewhat differently from normal numbers. However, in IEEE-754 arithmetic, infinity is a number and participates in arithmetic. NaN is not a number.

Converting 32-bit number to 16 bits or less

On my mbed LPC1768 I have an ADC on a pin which, when polled, returns a 16-bit short number normalised to a floating point value between 0 and 1. Document here.
Because it converts the value to a floating point number, does that mean it's 32 bits? The number I have is given to six decimal places. Data Types here
I'm running Autocorrelation and I want to reduce the time it takes to complete the analysis.
Is it correct that the floating point numbers are 32 bits long, and if so, is it correct that multiplying two 32-bit floating point numbers will take a lot longer than multiplying two 16-bit short (non-decimal) values together?
I am working with C to program the mbed.
Cheers.
I should be able to comment on this quite accurately. I used to do DSP processing work where we would "integerize" code, which effectively meant we'd take a signal/audio/video algorithm and replace all the floating point logic with fixed point arithmetic (i.e., Qm.n notation, etc.).
On most modern systems, you'll usually get better performance using integer arithmetic, compared to floating point arithmetic, at the expense of more complicated code you have to write.
The chip you are using (a Cortex-M3) doesn't have a dedicated hardware FPU: floating point operations are emulated in software, so they are going to be expensive (take a lot of time).
In your case, you could just read the 16-bit value via read_u16(), and shift the value right 4 times, and you're done. If you're working with audio data, you might consider looking into companding algorithms (a-law, u-law), which will give a better subjective performance than simply chopping off the 4 LSBs to get a 12-bit number from a 16-bit number.
Yes, a float on that system is 32bit, and is likely represented in IEEE754 format. Multiplying a pair of 32-bit values versus a pair of 16-bit values may very well take the same amount of time, depending on the chip in use and the presence of an FPU and ALU. On your chip, multiplying two floats will be horrendously expensive in terms of time. Also, if you multiply two 32-bit integers, they could potentially overflow, so there is one potential reason to go with floating point logic if you don't want to implement a fixed-point algorithm.
It is correct to assume that multiplying two 32-bit floating point numbers will take longer than multiplying two 16-bit short values if special hardware (a floating point unit) is not present in the processor.

Floats and Longs

I used sizeof to check the sizes of long and float on my 64-bit AMD Opteron machine. Both show up as 4.
When I check limits.h and float.h for maximum float and long values these are the values I get:
Max value of Float:340282346638528859811704183484516925440.000000
Max value of long:9223372036854775807
Since they both are of the same size, how can a float store such a huge value when compared to the long?
I assume that they have a different storage representation for float. If so, does this impact performance: i.e., is using longs faster than using floats?
It is a tradeoff.
A 32 bit signed integer can express every integer between -2^31 and +2^31 - 1.
A 32 bit float uses exponential notation and can express a much wider range of numbers, but would be unable to express all of the numbers in the range -- not even all of the integers. It uses some of the bits to represent a fraction, and the rest to represent an exponent. It is effectively the binary equivalent of a notation like 6.023x10^23 or what have you, with the distance between representable numbers quite large at the ends of the range.
For more information, I would read this article, "What Every Computer Scientist Should Know About Floating Point Arithmetic" by David Goldberg: http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf
By the way, on your platform, I would expect a float to be a 32 bit quantity and a long to be a 64 bit quantity, but that isn't really germane to the overall point.
Performance is kind of hard to define here. Floating point operations may or may not take significantly longer than integer operations depending on the nature of the operations and whether hardware acceleration is used for them. Typically, operations like addition and subtraction are much faster in integer -- multiplication and division less so. At one point, people trying to bum every cycle out when doing computation would represent real numbers as "fixed point" arithmetic and use integers to represent them, but that sort of trick is much rarer now. (On an Opteron, such as you are using, floating point arithmetic is indeed hardware accelerated.)
Almost all platforms that C runs on have distinct "float" and "double" representations, with "double" floats being double precision, that is, a representation that occupies twice as many bits. In addition to the space tradeoff, operations on these are often somewhat slower, and again, people highly concerned about performance will try to use floats if the precision of their calculation does not demand doubles.
It's unlikely to matter whether operations on long are faster than operations on float, or vice versa.
If you only need to represent whole number values, use an integer type. Which type you should use depends on what you're using it for (signed vs. unsigned, short vs. int vs. long vs. long long, or one of the exact-width types in <stdint.h>).
If you need to represent real numbers, use one of the floating-point types: float, double, or long double. (float is actually not used much unless memory space is at a premium; double has better precision and often is no slower than float.)
In short, choose a type whose semantics match what you need, and worry about performance later. There's no great advantage in getting wrong answers quickly.
As for storage representation, the other answers have pretty much covered that. Typically unsigned integers use all their bits to represent the value, signed integers devote one bit to representing the sign (though usually not directly), and floating-point types devote one bit for the sign, a few bits for an exponent, and the rest for the value. (That's a gross oversimplification.)
Floating point maths is a subject all to itself, but yes: int types are typically faster than float types.
One trick to remember is that not all values can be expressed as a float.
e.g. the closest you may be able to get to 1.9 is 1.899999999. This leads to fun bugs where you write if (v == 1.9) and things behave unexpectedly!
If so, does this impact performance: ie, is using longs faster than using floats?
Yes, arithmetic with longs will be faster than with floats.
I assume that they have a different storage representation for float.
Yes. The float types are in IEEE 754 (single precision) format.
Since they both are of the same size, how can a float store such a huge value when compared to the long?
It's optimized to store numbers at a few points (near 0 for example), but it's not optimized to be accurate. For example you could add 1 to 1000000000. With the float, there probably won't be any difference in the sum (1000000000 instead of 1000000001), but with the long there will be.

SSE ints vs. floats practice

When dealing with both ints and floats in SSE (AVX) is it a good practice to convert all ints to floats and work only with floats?
Because we need only a few SIMD instructions after that, and all we need are addition and comparison instructions (<, <=, ==), which, I hope, this conversion preserves completely.
Expanding my comments into an answer:
Basically, you're weighing the following trade-off:
Stick with integer:
Integer SSE is low-latency, high throughput. (dual issue on Sandy Bridge)
Limited to 128-bit SIMD width.
Convert to floating-point:
Benefit from 256-bit AVX.
Higher latencies, and only single-issue addition/subtraction (on Sandy Bridge)
Incurs initial conversion overhead.
Restricts input to those that fit into a float without precision loss.
I'd say stick with integer for now, unless you want to avoid duplicating code with the float versions; then it's your call.
The only times I've seen where emulating integers with floating-point becomes faster are when you have to do divisions.
Note that I've made no mention of readability as diving into manual vectorization probably implies that performance is more important.
