Using SIMD to right shift 32-bit packed negative numbers - c

I'm writing some SSE/AVX code and I need to divide packed signed 32-bit integers by a power of 2 using a right shift. When the values are positive the shift works fine, but it produces wrong results for negative values because the sign bit gets shifted.
Is there any SIMD operation that lets me shift while preserving the sign bit? Thanks

SSE2/AVX2 has a choice of arithmetic¹ vs. logical right shifts for 16- and 32-bit element sizes. (For 64-bit elements, only logical is available until AVX512.)
Use _mm_srai_epi32 (psrad) instead of _mm_srli_epi32 (psrld).
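For example, a minimal sketch (plain SSE2; the constant -8 is arbitrary) showing the difference between the two intrinsics on a negative element:

    #include <emmintrin.h>   /* SSE2 */
    #include <stdio.h>

    int main(void) {
        __m128i v = _mm_set1_epi32(-8);        /* four copies of -8 (0xFFFFFFF8) */

        __m128i arith = _mm_srai_epi32(v, 2);  /* psrad: shifts in sign bits -> -2 */
        __m128i logic = _mm_srli_epi32(v, 2);  /* psrld: shifts in zeros -> 0x3FFFFFFE */

        int a[4], l[4];
        _mm_storeu_si128((__m128i *)a, arith);
        _mm_storeu_si128((__m128i *)l, logic);
        printf("arithmetic: %d   logical: %d\n", a[0], l[0]);
        return 0;
    }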
See Intel's intrinsics guide, and other links in the SSE tag wiki https://stackoverflow.com/tags/sse/info. (Filter it to exclude AVX512 if you want, because it's pretty cluttered these days with all the masked versions for all 3 sizes...)
Or just look at the asm instruction-set reference, which includes intrinsics for instructions that have them. Searching for "arithmetic" in http://felixcloutier.com/x86/index.html finds the shifts you want.
Note that the intrinsic names mark a=arithmetic vs. l=logical in the operation name itself, instead of using the usual epi32/epu32 signed-vs-unsigned suffix scheme. The asm mnemonics are simple and consistent (e.g. Packed Shift Right Arithmetic Dword = psrad).
Arithmetic right shifts are also available as AVX2 per-element variable shifts (vpsravd), and in the form of the immediate shifts that instead takes one count (for all elements) from a vector register.
Footnote 1:
Arithmetic right shifts shift in copies of the sign bit, instead of zero.
This correctly implements 2's complement signed division by powers of 2 with rounding towards negative infinity, unlike the truncation toward zero you get from C signed division. Look at the asm output for int foo(int a){return a/4;} to see how compilers implement C's signed-division semantics in terms of shifts.
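A small scalar sketch of that footnote (the value -7 is arbitrary): a / 4 truncates toward zero, while a >> 2 rounds toward negative infinity, so compilers add a bias to negative inputs before shifting when they implement / 4. In ISO C the result of right-shifting a negative value is implementation-defined; the comments assume the usual arithmetic-shift behavior.

    #include <stdio.h>

    int main(void) {
        int a = -7;
        /* C signed division truncates toward zero: -7 / 4 == -1 */
        printf("-7 / 4  = %d\n", a / 4);
        /* Arithmetic right shift rounds toward -infinity: -7 >> 2 == -2
           (implementation-defined in C, but arithmetic on mainstream compilers) */
        printf("-7 >> 2 = %d\n", a >> 2);
        /* What compilers typically emit for a/4: bias negative inputs, then shift */
        printf("biased  = %d\n", (a + ((a >> 31) & 3)) >> 2);   /* -1, matches a/4 */
        return 0;
    }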

Related

Integer memory representation

As has been explained before, negative numbers are represented using 2's complement, while unsigned integers don't reserve a bit for the sign. An integer can be either signed or unsigned, so how does the computer figure out which encoding scheme to use for a given integer?
Some operations (such as addition) work identically on both signed and unsigned integers.
But that's not the case for all operations. When right-shifting, we shift in zeroes for unsigned integers, and we shift in the sign bit for signed integers.
In these cases, the processor provides the means to achieve both operations. It might offer two different instructions, or two variations of one.
But whatever the case, there is no decision making on the processor's part. The processor just executes the instructions selected by the compiler. It's up to the compiler to emit instructions that achieve the desired result based on the type of the values involved.
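A small sketch of that point (the bit pattern 0xFFFFFFF0 is arbitrary): addition gives the same result bits whether the value is declared signed or unsigned, but right shift does not, so the compiler has to pick the instruction from the declared type. The signed-shift result shown is implementation-defined in C, but arithmetic on mainstream compilers.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t bits = 0xFFFFFFF0u;    /* one bit pattern ... */
        int32_t  s = (int32_t)bits;     /* ... read as signed:   -16 */
        uint32_t u = bits;              /* ... read as unsigned: 4294967280 */

        /* Addition: identical result bits either way. */
        printf("s + 1  -> %08X\n", (unsigned)(s + 1));   /* FFFFFFF1 */
        printf("u + 1  -> %08X\n", (unsigned)(u + 1));   /* FFFFFFF1 */

        /* Right shift: arithmetic for the signed type, logical for the unsigned. */
        printf("s >> 1 -> %08X\n", (unsigned)(s >> 1));  /* FFFFFFF8 (typically) */
        printf("u >> 1 -> %08X\n", (unsigned)(u >> 1));  /* 7FFFFFF8 */
        return 0;
    }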

What is Biased Notation?

I have read:
"Like an unsigned int, but offset by −(2^(n−1) − 1), where n is the number of bits in the numeral. Aside:
Technically we could choose any bias we please, but the choice presented here is extraordinarily common." - http://inst.eecs.berkeley.edu/~cs61c/sp14/disc/00/Disc0.pdf
However, I don't get what the point is. Can someone explain this to me with examples? Also, when should I use it, given other options like one's complement, sign and magnitude, and two's complement?
Biased notation is a way of storing a range of values that doesn't start with zero.
Put simply, you take an existing representation that goes from zero to N, and then add a bias B to each number so it now goes from B to N+B.
Floating-point exponents are stored with a bias to keep the dynamic range of the type "centered" on 1.
Excess-three encoding is a technique for simplifying decimal arithmetic using a bias of three.
Two's complement notation could be considered as biased notation with a bias of INT_MIN and the most-significant bit flipped.
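A minimal sketch of the idea for 8-bit storage, using an assumed bias of 127 (the 2^(n−1) − 1 choice from the quoted handout): encoding is just "add the bias", decoding is "subtract it", and the stored codes sort in the same order as the values they represent.

    #include <stdio.h>
    #include <stdint.h>

    #define BIAS 127   /* covers the range -127 .. +128 in one byte */

    static uint8_t encode_biased(int value)    { return (uint8_t)(value + BIAS); }
    static int     decode_biased(uint8_t code) { return (int)code - BIAS; }

    int main(void) {
        for (int v = -3; v <= 3; v++)
            printf("value %+d -> stored code %3u\n", v, encode_biased(v));

        /* Useful property: comparing the stored codes as plain unsigned numbers
           orders the original values correctly, which is one reason IEEE 754
           stores exponents this way. */
        printf("decode(130) = %d\n", decode_biased(130));   /* prints 3 */
        return 0;
    }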
A "representation" is a way of encoding information so that it easy to extract details or inferences from the encoded information.
Most modern CPUs "represent" numbers using "twos complement notation". They do this because it is easy to design digital circuits that can do what amounts to arithmetic on these values quickly (add, subtract, multiply, divide, ...). Twos complement also has the nice property that one can interpret the most significant bit as either a power-of-two (giving "unsigned numbers") or as a sign bit (giving signed numbers) without changing essentially any of the hardware used to implement the arithmetic.
Older machines used other bases; e.g., quite common in the 60s were machines that represented numbers as sets of binary-coded-decimal digits stored in 4-bit addressable nibbles (the IBM 1620 and 1401 are examples of this). So, you can represent the same concept or value in different ways.
A bias just means that whatever representation you chose (for numbers), you have added a constant bias to that value. Presumably that is done to enable something to be done more effectively. I can't speak to "−(2^(n−1) − 1)" being "an extraordinarily common (bias)"; I do lots of assembly and C coding and pretty much never find a need to "bias" values.
However, there is a common example. Modern CPUs largely implement IEEE floating point, which stores floating-point numbers with a sign, an exponent, and a mantissa. The exponent is a power of two, roughly symmetric around zero, stored biased by 2^(N−1) − 1 for an N-bit exponent (127 for single precision).
This bias allows floating-point values with the same sign to be compared for equal/less/greater using the standard machine twos-complement integer instructions rather than special floating-point instructions, which means that sometimes actual floating-point compares can be avoided. (See http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm for dark-corner details.) [Thanks to @PotatoSwatter for noting the inaccuracy of my initial answer here, and making me go dig this out.]
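As a concrete illustration, here is a small sketch that pulls the biased exponent field out of a single-precision float (assuming the usual 32-bit IEEE 754 layout, with an exponent bias of 127):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        float f = 6.0f;                  /* 6.0 = 1.5 * 2^2 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);  /* assumes float is 32-bit IEEE 754 */

        uint32_t biased_exp = (bits >> 23) & 0xFF;   /* stored exponent field */
        int real_exp = (int)biased_exp - 127;        /* remove the bias */

        printf("stored (biased) exponent: %u\n", (unsigned)biased_exp);  /* 129 */
        printf("actual exponent:          %d\n", real_exp);              /* 2 */
        return 0;
    }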

Minimum of signed/unsigned integers using AVX

I was looking through the AVX instruction guide and though there are load, store and permute operations for 32-bit integer values, other operations such as determining minimum or maximum values, or shuffle operations are present only for floats and doubles.
So, if I wanted to use these operations for 32-bit integers, do I need to typecast it to floats, and then typecast it back or is there some other instruction that I'm missing?
Also, do the shuffle masks remain the same, as they were for floats, if I wanted to use it on 32-bit integers?
The bulk of the integer operations for 32B vectors are in the AVX2 extension (not the initial AVX extension, which is almost entirely floating-point operations). Intel's most recent AVX Programming Reference has the complete details; you may also want to look at Intel's blog post announcing some of the details.
Unfortunately, you cannot use the floating-point min or max operations to simulate those operations on integer data. A significant number of integers map to NaN values when interpreted as floating-point data, and the semantics of NaN comparisons don't do what you would want for integer comparisons. You would also need to deal with the fact that floating-point encodings are sign-magnitude, so the ordering of negative values is "reversed", and that +0 and -0 compare equal.
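For completeness, once AVX2 is available the signed and unsigned 32-bit minimums exist as dedicated instructions, so no float round trip is needed. A minimal sketch (compile with -mavx2; the constants are arbitrary):

    #include <immintrin.h>   /* AVX2 */
    #include <stdio.h>

    int main(void) {
        __m256i a = _mm256_set1_epi32(-5);
        __m256i b = _mm256_set1_epi32(3);

        __m256i smin = _mm256_min_epi32(a, b);   /* vpminsd: signed min   -> -5 */
        __m256i umin = _mm256_min_epu32(a, b);   /* vpminud: unsigned min ->  3,
                                                    since -5 reads as a huge unsigned value */
        int s[8], u[8];
        _mm256_storeu_si256((__m256i *)s, smin);
        _mm256_storeu_si256((__m256i *)u, umin);
        printf("signed min: %d   unsigned min: %d\n", s[0], u[0]);
        return 0;
    }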

Are the platforms covered by the C standard still in use? [duplicate]

Possible duplicate of: list of platforms supported by the C standard
The C standard leaves integer representation very loosely defined:
- it covers two's complement, ones' complement, signed magnitude
- integers can be of various width, with padding bits
- certain bit patterns may not represent valid values.
There is an obvious downside to this: it makes portable code harder to write. Does anyone know of platforms that still see active development work, but which:
- are not 2's complement, or
- have an integer width that is not 32 bits or 64 bits, or
- have padding bits in some integer types, or
- (if 2's complement) treat the bit pattern with sign bit 1 and all value bits zero as not a valid negative number, or
- do not convert between signed and unsigned integers by verbatim copying of bit patterns, or
- do not implement right shift of a signed integer as an arithmetic shift, or
- have an unsigned type whose number of value bits is not the number of value bits in the corresponding signed type + 1, or
- do not convert from a wider integer type to a narrower one by truncating the leftmost bits that don't fit?
Yes... such platforms are still used in embedded systems and in microcontrollers.
They are also used for educational purposes.
Yes, we see this all the time when working with customizable microcontrollers and DSPs for things like audio processing.

Bit shifts in C

If the bit pattern corresponding to a signed integer is shifted to the right then
1. the vacant bit will be filled by the sign bit
2. the vacant bit will be filled by 0
3. the outcome is implementation dependent
4. none of the above
The answer to this question is the 3rd option. Can anybody explain this?
Also, give some basic idea about the theory behind the left shift and right shift operators in C programming. E.g., what is filled into the vacant bit when either operation is performed? I checked and noticed that left shifting fills the vacant bit with 0 and right shifting fills it with 1. Please clarify the logic...
I'd have to check the spec for the question of what precisely is implementation dependent.
However, every implementation I've used in (mumble) years of embedded systems projects has been sensible:
Left shifts always shift in a 0 at the low bit. No other value makes sense.
Right shifts depend on the data type. A right shift of a signed integer duplicates the high bit as it shifts the rest to the right. This is called an "arithmetic shift", and has the nice property (in twos complement arithmetic, at least) that it divides the value by two while preserving the sign of the original number.
A right shift of an unsigned integer shifts a 0 into the high bit, and is usually known as a "logical shift".
It makes sense for an implementation to provide both kinds of shifts because both are useful, and using signed/unsigned to select which is meant is a sensible choice.
Edit: At least one thing that absolutely is implementation dependent is that the C standard does not (completely) specify the underlying implementation of integers and their storage. For instance, it is possible to build a compliant C compiler for a machine that uses one's complement arithmetic. It would also be possible (I think) to build a compliant compiler for a machine whose native storage was signed magnitude BCD. (Nope, I was wrong, see below.)
In practice, the world is pretty much settled on two's complement binary for the CPU and some of the pedantry is mooted.
So part of the question really is: how do you define the meaning of the << and >> operators in a way that is stable regardless of the underlying arithmetic system used.
IIRC, the definition of n<<1 is effectively n*2, and n>>1 is effectively n/2, with a natural extension to shifts by more than 1 (but not more than 31... there be undefined dragons there...) and with the notion that the >> operator will preserve the sign if operating on a signed value.
Edit 2: Pete Kirkham points out in his fine answer that the C standard does specifically disallow the scary case of a BCD representation of integers, whether it is signed magnitude or ten's complement. I'm sure that is a good thing, even if Knuth did use a (optionally) BCD machine for his sample code in early editions of The Art of Computer Programming.
In those rare use cases where BCD is the right answer, storing the digits in an unsigned long (8 digits, ten's complement) or an unsigned 64-bit integer (room for 16 digits ten's complement, or 15 digits plus sign and flags) and using a carefully crafted arithmetic library to manipulate them makes sense.
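As a rough sketch of what "storing decimal digits in an unsigned integer" can look like (packed BCD, one digit per 4-bit nibble; the arithmetic library around it is left out), using a hypothetical helper named to_packed_bcd:

    #include <stdio.h>
    #include <stdint.h>

    /* Pack the decimal digits of n (n < 100000000) into a uint32_t,
       one digit per nibble: 1234 -> 0x00001234. */
    static uint32_t to_packed_bcd(uint32_t n) {
        uint32_t bcd = 0;
        for (int shift = 0; n != 0; shift += 4) {
            bcd |= (n % 10) << shift;
            n /= 10;
        }
        return bcd;
    }

    int main(void) {
        printf("%08X\n", (unsigned)to_packed_bcd(1234));   /* prints 00001234 */
        return 0;
    }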
In practice, of course, C implementations map the operators as directly as the standard allows onto the native machine instructions of the CPU. The folks who wrote the standard were very mindful of the existence of many ways to implement even simple things like the representation of an integral value, and the C standard reflects that by allowing just enough implementation-defined behavior to let the operators be implemented efficiently on each machine.
The alternative leads swiftly to a world where all math operations are completely specified, and cannot be efficiently implemented on any machine.
C does not guarantee much about the bit-level representation of signed integers, nor how right-shifting a negative value behaves; that is why.
For two's complement, you will typically see the sign bit being shifted in, but that is up to the implementation.
ISO C99 requires a sign bit somewhere in the representation, but gives the option between various complement / sign-and-magnitude schemes and allows padding bits, all of which affect the operation of >>.
Section 6.2.6.2 (Integer types)
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign
bit. There need not be any padding bits; there shall be exactly one
sign bit.
and
Section 6.5.7 (Bitwise shift operators)
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
It doesn't specify which of one's complement, two's complement, or sign and magnitude is used, nor whether the sign bit is to the left or right of the value bits, or where any padding is, all of which would affect the output of the >> operator on signed negatives.
In answer to RBerteig's query, C99 precludes a BCD representation of integers:
Section 6.2.6.2 (Integer types)
If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation.
The C language implementations tend to map bit shifting operations directly onto the corresponding machine code instructions. Since different hardware architectures have historically done different things, the C specification tends to leave things implementation defined so that C implementations can take advantage of whatever the hardware offers.
The outcome is implementation dependent. However, in practice, every single x86, PPC, and MIPS compiler I have ever worked with has followed this rule for shifting right:
- If the operand is a signed integer, the vacant bit is filled with the sign bit (really the most significant bit).
- If the operand is an unsigned integer, the vacant bit is filled with zero.
As RBerteig says, this is so that for signed integers, n >> 1 = n/2 (rounded down) for both positive and negative n, and for unsigned integers, n >> 1 = n/2 even for n > 2^31 (on a 32-bit architecture).
The corresponding hardware instructions are arithmetic (sign-extending) and logical (not sign-extending) shift; the compiler chooses between them based on whether the operand is signed or unsigned.
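As a concrete (typical, not guaranteed) illustration of that last point, compiling these two functions for x86-64 with a mainstream compiler generally yields a sar instruction for the signed version and a shr for the unsigned one:

    #include <stdint.h>

    int32_t shift_signed(int32_t n) {
        return n >> 3;    /* typically compiles to: sarl $3 (sign-extending) */
    }

    uint32_t shift_unsigned(uint32_t n) {
        return n >> 3;    /* typically compiles to: shrl $3 (zero-filling) */
    }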

Resources