How can I reorder a 128 bit vector using Intel intrinsics? - c

I have a 128-bit vector of 4 floats that have been calculated, and I want to change the order of this vector like so:
Vector A before reordering
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
Vector A after reordering
+---+---+---+---+
| b | a | c | d |
+---+---+---+---+
As I said, the vector is the result of earlier computations, so I can't simply build it with _mm_set_ps(). Does anyone have a clue how this can be done?

You're looking for the SHUFPS instruction (shuffle packed single-precision floats).
The corresponding intrinsic is _mm_shuffle_ps:
__m128 _mm_shuffle_ps(__m128 a, __m128 b, unsigned int imm8);
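The third parameter, an 8-bit immediate, is the permutation. It indicates how you want the values to be shuffled. To write it readably, use the _MM_SHUFFLE macro. _MM_SHUFFLE(z, y, x, w) packs four 2-bit selectors into the immediate as (z << 6) | (y << 4) | (x << 2) | w: w and x choose which elements of the first operand become result elements 0 and 1, and y and z choose which elements of the second operand become result elements 2 and 3.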
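As a minimal sketch of the reordering asked about, assuming the leftmost box in the question's diagram is element 0 (the lowest float in the register), passing the same vector as both operands turns SHUFPS into a plain permutation. The helper names swap_low_pair and swap_low_pair_array are made up for illustration.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE, load/store */

/* Swap elements 0 and 1 of v and leave elements 2 and 3 where they are. */
static inline __m128 swap_low_pair(__m128 v)
{
    /* _MM_SHUFFLE lists selectors from result element 3 down to element 0:
       result[3] = v[3], result[2] = v[2], result[1] = v[0], result[0] = v[1]. */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 2, 0, 1));
}

/* Hypothetical convenience wrapper working on four floats in memory. */
void swap_low_pair_array(const float in[4], float out[4])
{
    _mm_storeu_ps(out, swap_low_pair(_mm_loadu_ps(in)));
}
```

If the diagram is instead meant with element 3 on the left, the selector would be _MM_SHUFFLE(2, 3, 1, 0).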


Why am I getting this output from a C union with bitfields in my code?

Sorry for the non descriptive title - I wasn't sure how to pose this in one line.
I have a data structure, where I have two values: one 14-bit, one 10-bit. I want to be able to access them as bytes in a union. I have the following:
struct test
{
    union
    {
        struct
        {
            unsigned int a : 14;
            unsigned int b : 10;
        } fields;
        struct
        {
            unsigned char i0;
            unsigned char i1;
            unsigned char i2;
        } bytes;
    } id;
};
Now, when I assign 1 to bytes.i2, I would expect fields.b to also take the value 1. But the value in fields.b is actually bytes.i2 shifted left by 2 bits.
#include <stdio.h>

int main()
{
    struct test x = {0};  /* zero all bytes so the other bits of b are defined */
    x.id.bytes.i2 = 1;
    printf("%d", x.id.fields.b); // OUTPUTS 4
    return 0;
}
I must be missing some basic principle here, any insight would be helpful!
In little endian, packed structs:
fields  a              |b
bytes   i0      |i1    :  |i2
BITS    00000000|000000|00|10000000    i2 = 1;  b = 4
BITS    00000000|000000|10|00000000    i1 = 64; b = 1
INDEX   01234567|890123|45|67890123
        0          1           2
(Bits are listed least significant first.) As you can see, with i2 = 1, b = 0b00000100 (4).
The exact layout and ordering of bitfields in a struct is entirely up to the implementation.
On a little endian machine, the layout of the union most likely looks like this:
|a              |b   a          |b              |
|7 6 5 4 3 2 1 0|1 0 d c b a 9 8|9 8 7 6 5 4 3 2|
|      i0       |      i1       |      i2       |
-------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------
In this layout, we can see that the 8 low order bits of a are in the first byte, then the 6 high order bits of a and the 2 low order bits of b in the second byte, followed by the high order 8 bits of b in the third byte. This explains the result you're seeing.
Little-endian machines will typically also order the bits within each byte in little-endian fashion, so if you reverse the order of the bits in each byte above, reflecting the physical representation instead of the logical representation, you can see that the bits of each bitfield are contiguous.
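Because the bit-field layout is implementation-defined, one way to get a predictable byte mapping is to do the packing with shifts and masks. The following is a minimal sketch that reproduces the little-endian layout drawn above (a in the low 14 bits, b in the next 10); pack_fields and unpack_fields are hypothetical helper names, not part of the original code.

```c
#include <stdint.h>

/* Pack a 14-bit value a and a 10-bit value b into three bytes,
   least significant byte first, independent of compiler bit-field layout. */
void pack_fields(unsigned a, unsigned b, uint8_t bytes[3])
{
    uint32_t v = (uint32_t)(a & 0x3FFFu) | ((uint32_t)(b & 0x3FFu) << 14);
    bytes[0] = (uint8_t)(v & 0xFFu);          /* low 8 bits of a            */
    bytes[1] = (uint8_t)((v >> 8) & 0xFFu);   /* high 6 of a, low 2 of b    */
    bytes[2] = (uint8_t)((v >> 16) & 0xFFu);  /* high 8 bits of b           */
}

/* Inverse of pack_fields. */
void unpack_fields(const uint8_t bytes[3], unsigned *a, unsigned *b)
{
    uint32_t v = (uint32_t)bytes[0]
               | ((uint32_t)bytes[1] << 8)
               | ((uint32_t)bytes[2] << 16);
    *a = (unsigned)(v & 0x3FFFu);
    *b = (unsigned)((v >> 14) & 0x3FFu);
}
```

With this mapping, bytes {0, 0, 1} decode to b = 4, and bytes {0, 64, 0} decode to b = 1, matching the behaviour described in the question.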

struct padding extra byte after member [duplicate]

#include <stdio.h>
#include <stdint.h>

typedef struct s {
    uint8_t a[1];
    uint16_t b;
    uint8_t c[1];
} s_t;

int main()
{
    printf("size = %zu", sizeof(s_t));  /* %zu is the conversion for size_t */
    return 0;
}
I am not sure why the output of this program is 6 bytes and not 5. Why does the compiler pad an extra byte after the last member? It also seems that if you make the last member an array of length 3, the padding makes the size 8. I am unable to explain this, since it does not happen when the struct contains only the two arrays.
Here is an illustration of the alignment that the compiler generates:
Bytes:
+-----+---------------+
|  0  | a[1]          |
+-----+---------------+
|  1  | N/A (padding) |
+-----+---------------+
|  2  | b             |
+-----+---------------+
|  3  | b             |
+-----+---------------+
|  4  | c             |
+-----+---------------+
As 16-bit quantities:
+---+------+----+
| 0 | a[1] |    |
+---+------+----+
| 2 |     b     |
+---+------+----+
| 4 | c    |    |
+---+------+----+
Processors like to fetch 16-bit quantities from even addresses.
When they are on odd addresses, the computer may have to make 2 16-bit fetches, and extract the unaligned data out of them.
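The easy way to eliminate this extra fetch is to add padding bytes so that 16-bit quantities align to even addresses. The byte after c exists for the same reason: in an array of s_t, every element must start on an even address so that its b stays aligned, so the size is rounded up from 5 to the next multiple of 2, which is 6.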
A rule of thumb is to place the larger items first, then the smaller.
Applying this rule:
+---+------+
| 0 | b    |
+---+------+
| 2 | a[1] |
+---+------+
| 3 | c    |
+---+------+
The rule eliminates the need for an extra padding byte.
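The two layouts can be checked with sizeof and offsetof. This is a small sketch, assuming a typical ABI in which uint16_t has 2-byte alignment; the type name s_reordered_t and the helper padding_bytes are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout from the question: 1 byte, 1 padding byte, 2 bytes, 1 byte, 1 trailing padding byte. */
typedef struct {
    uint8_t  a[1];
    uint16_t b;
    uint8_t  c[1];
} s_t;

/* Larger member first: 2 bytes, then two single bytes, no padding needed. */
typedef struct {
    uint16_t b;
    uint8_t  a[1];
    uint8_t  c[1];
} s_reordered_t;

/* Padding in a struct = its size minus the sum of its members' sizes
   (1 + 2 + 1 bytes for both structs above). */
size_t padding_bytes(size_t struct_size)
{
    return struct_size - (sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint8_t));
}
```

Under the stated assumption, padding_bytes(sizeof(s_t)) is 2 and padding_bytes(sizeof(s_reordered_t)) is 0.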

C, Multidimensional arrays: array whose elements are one-dimensional arrays?

Does this statement make sense, from the book C Programming: A Modern Approach, 2nd Edition on page 269
Just as the name of a one-dimensional array can be used as a pointer, so can the name of any array, regardless of how many dimensions it has. Some care is required, though. Consider the following array:
int a[NUM_ROWS][NUM_COLS];
a is not a pointer to a[0][0]; instead, it's a pointer to a[0]. This makes more sense if we look at it from the standpoint of C, which regards a not as a two-dimensional array but as a one-dimensional array whose elements are one-dimensional arrays. When used as a pointer, a has type int (*) [NUM_COLS] (pointer to an integer array of length NUM_COLS).
I'm confused because when I read "array whose elements are one-dimensional arrays" I think of a jagged array, but that's not what's going on here. Is this more like a macro with pointer arithmetic?
Is this in reference to the type system and how it treats multidimensional arrays? Could anyone explain this?
Yes, it makes sense, and no, it's not even talking about "ragged" or "jagged" arrays. It's simply that when we say
int a[NUM_ROWS][NUM_COLS];
what we're creating is an array a, and what it's an array of is... other arrays. You could think of it like this:
        +---------------------------------------+
        | +--------+--------+--------+--------+ |
a: [0]: | |        |        |        |        | |
        | +--------+--------+--------+--------+ |
        +                                       +
        | +--------+--------+--------+--------+ |
   [1]: | |        |        |        |        | |
        | +--------+--------+--------+--------+ |
        +                                       +
        | +--------+--------+--------+--------+ |
   [2]: | |        |        |        |        | |
        | +--------+--------+--------+--------+ |
        +---------------------------------------+
(Here NUM_COLS is evidently 4, and NUM_ROWS is 3.)
A two- (or more) dimensional array is 100% analogous to a simple, single-dimensional array -- you just have to be careful thinking about the analogies. If a is an array, then any mention of a in an expression where its value is needed results in a pointer to the array's first element, &a[0]. So given the two-dimensional array a we're talking about, a's value is &a[0] and is a pointer to an array of NUM_COLS integers.
It has to work this way, if multidimensional array subscripts are to work correctly. If we write a[i][j], that's interpreted as (a[i])[j]. a turns into a pointer to the array's first element, as usual, but a[i] is equivalent to *(a + i), where the pointer arithmetic ends up being scaled by the size of the pointed-to element -- that is, under the hood, in terms of raw byte addresses, it's more like *(a + i * sizeof(*a)). So sizeof(*a) has to be sizeof(int [NUM_COLS]), or NUM_COLS * sizeof(int). That way a[i] gets you the i'th subarray, and then j can select one of the cells -- the int-sized cells -- of the subarray.
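The scaling can be observed directly. Here is a short sketch, with NUM_ROWS and NUM_COLS fixed at 3 and 4 purely for illustration and with made-up function names, that measures the byte distance between a and a + 1 and spells out a[i][j] as explicit pointer arithmetic.

```c
#include <stddef.h>

enum { NUM_ROWS = 3, NUM_COLS = 4 };  /* illustrative sizes */

/* Byte distance between a + 1 and a: one whole row, not one int. */
size_t row_stride_bytes(void)
{
    int a[NUM_ROWS][NUM_COLS];
    /* Only addresses are used here, so the contents of a do not matter. */
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* a[i][j] written out: a + i steps over i rows, *(a + i) is row i
   (which decays to int *), and + j steps over j ints within that row. */
int get_cell(int (*a)[NUM_COLS], int i, int j)
{
    return *(*(a + i) + j);
}
```

Note that the parameter type of get_cell is exactly the int (*)[NUM_COLS] that a two-dimensional array argument turns into.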
One final note: I've talked colloquially about "multi-dimensional arrays", but strictly speaking, and as many of the regulars here are fond of pointing out, C has no multidimensional arrays; it has only single-dimensional arrays, and what we think of as a two-dimensional array is actually, as we've seen here, a single-dimensional array whose elements happen to be other single-dimensional arrays. (If C had true multi-dimensional arrays, the subscripts would probably look like a[i,j] instead of a[i][j].)
Addendum: Despite your mention of pointer arithmetic, and my mention of pointer arithmetic, it's important to realize that there are no pointers involved in a's definition. Pointers arise only when we try to "take the value of" a, or explain how a[i] is equivalent to *(a + i).
For a data structure that does involve pointers, we could contrast the situation described by the code
int *a2[NUM_ROWS];
for(i = 0; i < NUM_ROWS; i++)
    a2[i] = malloc(NUM_COLS * sizeof(int));
This gives us a very different memory layout:
    +-----+
a2: |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
And this is what's usually called a "ragged" or "jagged" array, since it's obviously not necessary that all the rows in this case be the same length. Nevertheless, almost magically, the cells in the "ragged" array can also be accessed using the a2[i][j] notation. And for full dynamism, we could use
int **a3 = malloc(NUM_ROWS * sizeof(int *));
for(i = 0; i < NUM_ROWS; i++)
    a3[i] = malloc(NUM_COLS * sizeof(int));
resulting in this memory layout:
    +-----+
a3: |     |
    |  *  |
    |  |  |
    +--|--+
       |
       |
       V
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
And a3[i][j] works here, too.
(Of course, in real code constructing "dynamic arrays" like a2 and a3, we'd have to check to make sure that malloc didn't return NULL.)
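For illustration, the fully dynamic a3-style layout with those NULL checks might look like the sketch below. The function names alloc_rows and free_rows are hypothetical, and the code assumes rows and cols are both greater than zero (a zero-size allocation is allowed to return NULL, which this sketch would treat as failure).

```c
#include <stdlib.h>

/* Allocate an array of `rows` row pointers, each pointing at `cols`
   zero-initialised ints. Returns NULL if any allocation fails, after
   releasing whatever had already been allocated. */
int **alloc_rows(size_t rows, size_t cols)
{
    int **a = malloc(rows * sizeof *a);
    if (a == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++) {
        a[i] = calloc(cols, sizeof **a);
        if (a[i] == NULL) {
            while (i > 0)          /* undo the rows allocated so far */
                free(a[--i]);
            free(a);
            return NULL;
        }
    }
    return a;
}

/* Release everything obtained from alloc_rows. */
void free_rows(int **a, size_t rows)
{
    if (a == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(a[i]);
    free(a);
}
```

The result is used with the same a3[i][j] notation as above, and each row could be given a different length if a ragged shape is wanted.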
Another way to look at it...
For any type T, we create an array as
T arr[N];
where T can be int, char, double, struct foo, whatever, and reads as “N-element array of T”. It can also be another array type. So, instead of just int, suppose T is an M-element array of int, which we’d write as
int arr[N][M];
This reads as “arr is an N-element array of M-element arrays of int”. This isn’t a jagged array - all the “rows” are the same size. But it’s not exactly a 2-dimensional array, either - it is an array of arrays. The expression arr[i] has an array type (int [M]).
This view helps us figure out pointer to array types as well. Except when it is the operand of the sizeof or unary & operator, or is a string literal used to initialize a character array in a declaration, an expression of type “N-element array of T” (T [N]) will be converted (“decay”) to an expression of type “pointer to T” (T *). Again, if you replace T with an array type int [M], then you have an expression of type “N-element array of M-element arrays of int” (int [N][M]), which “decays” to type “pointer to M-element array of int” (int (*)[M]).
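These type rules can be checked at compile time with C11's _Generic. The sketch below is only an illustration: N and M are arbitrary sizes, the function names are invented, and "+ 0" is used to force the array-to-pointer conversion explicitly before the type is inspected.

```c
#include <stddef.h>

enum { N = 3, M = 4 };  /* arbitrary illustrative sizes */

/* arr used in an expression becomes "pointer to M-element array of int". */
int decays_to_pointer_to_row(void)
{
    int arr[N][M] = {{0}};
    return _Generic(arr + 0, int (*)[M]: 1, default: 0);
}

/* arr[i] has type int [M], which in turn becomes "pointer to int". */
int row_decays_to_pointer_to_int(void)
{
    int arr[N][M] = {{0}};
    return _Generic(arr[0] + 0, int *: 1, default: 0);
}

/* As the operand of unary &, no conversion happens: the result points
   at the whole N-by-M array. */
int address_of_keeps_array_type(void)
{
    int arr[N][M] = {{0}};
    return _Generic(&arr, int (*)[N][M]: 1, default: 0);
}

/* As the operand of sizeof, no conversion happens either. */
size_t size_of_whole_array(void)
{
    int arr[N][M];
    return sizeof arr;
}
```

Each of the first three functions returns 1 when the type is as described, and the last returns N * M * sizeof(int) rather than the size of a pointer.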

Pretty print graphs?

I was wondering if there is a preferred way to print a nice visual representation of a Graph, like the one seen here:
       +------+
 In1 ~>|      |~> Out1
       | bidi |
Out2 <~|      |<~ In2
       +------+

Efficient algorithm for looping over all neighbor pairs (2 point cliques) in 2-D array

I need to loop over all (unordered) pairs of pixels in an image that are neighbors of each other without repetition. I am using an 8 point neighborhood. For example:
x,y| 0   1   2   3   4
---+---+---+---+---+---+
 0 |   |   |   |   |   |
   +---+---+---+---+---+
 1 | a | b | c | d |   |
   +---+---+---+---+---+
 2 | e | f | g | h |   |
   +---+---+---+---+---+
 3 | i | j | k | l |   |
   +---+---+---+---+---+
 4 |   |   |   |   |   |
   +---+---+---+---+---+
The neighbors of pixel f are in the 3x3 square around it. Thus, g, for example, forms a 2 point clique with f. If I were to loop over all the rows and columns of the image, this clique would be counted twice, once when f is the center pixel and once when g is the center pixel. Similar inefficiencies would occur with the rest of the cliques.
So what I would like to do, is loop over all the cliques, rather than each pixel. If I were familiar with graph theory, I think some of the answers already given to similar questions would suffice, but as I am not, I would really appreciate any help that you can give with an efficient algorithm in layman's terms. Thanks in advance!
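Loop the first point over all points. In an inner loop, pair it with only its right, lower-left, lower, and lower-right neighbors (if they exist). Every unordered pair is then visited exactly once, because the other four neighbors (left, upper-left, upper, upper-right) are covered when they take the role of the first point.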
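A minimal sketch of that loop follows, assuming pixels are addressed by integer (x, y) with x as the column and y as the row. The callback type pair_visitor and the function names are hypothetical, chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical callback: receives both pixels of one neighbor pair. */
typedef void (*pair_visitor)(int x1, int y1, int x2, int y2, void *ctx);

/* Visit every unordered pair of 8-connected neighbors exactly once. */
void for_each_neighbor_pair(int width, int height, pair_visitor visit, void *ctx)
{
    /* The "forward" half of the 8-neighborhood:
       right, lower-left, lower, lower-right. */
    static const int dx[4] = { 1, -1, 0, 1 };
    static const int dy[4] = { 0,  1, 1, 1 };

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            for (int k = 0; k < 4; k++) {
                int nx = x + dx[k];
                int ny = y + dy[k];
                /* ny is never negative, so only three bounds need checking. */
                if (nx >= 0 && nx < width && ny < height)
                    visit(x, y, nx, ny, ctx);
            }
        }
    }
}

/* Example visitor that just counts the pairs. */
static void count_visit(int x1, int y1, int x2, int y2, void *ctx)
{
    (void)x1; (void)y1; (void)x2; (void)y2;
    ++*(long *)ctx;
}

long count_neighbor_pairs(int width, int height)
{
    long n = 0;
    for_each_neighbor_pair(width, height, count_visit, &n);
    return n;
}
```

For the 5x5 grid in the question this visits 72 pairs: 20 horizontal, 20 vertical, and 32 diagonal.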
