I need to create a constant array of constant arrays of records, where I can reference each element of the outer array by a number.
I've tried:
A : constant array (0 .. 3) of B := (B1, B2, B3, B4)
where B is an array of records and B1,B2,B3,B4 are constant arrays of type B.
But when I do this I get the error:
"Unconstrained element type in array declaration"
type C is record
a : Integer := 0;
b : Integer := 0;
c : Integer := 0;
d : Integer := 0;
end record;
type B is array (Natural range <>) of C;
B1 : constant B := (0, 0, 0, 0);
B2 : constant B := (2, 0, 2, 0);
B3 : constant B := (0, 1, 0, 1);
B4 : constant B := (2, 1, 2, 1);
A : constant array (0 .. 3) of B := (B1, B2, B3, B4);
I was hoping to use A to be able to reference B1,B2,B3,B4 numerically like so:
A (1) returns B1
A (2) returns B2
and so on...
(I apologize if the terms I use are wrong. I'm kinda new to Ada and have been learning by trial and error...)
Your problem is that B is an unconstrained array:
type B is array (Natural range <>) of C;
This is fine for B1 : constant B := (0, 0, 0, 0);, as the constant takes its constraint (its index range) from the value on the right-hand side.
It is, however, not fine for A. The compiler needs to know the size of the array elements, but cannot when the element type (B in this case) is unconstrained. In addition, the constraints ('First, 'Last, etc.) must be the same for all of the elements.
You can change your definition of B to be constrained:
type B is array (Natural range 1..4) of C;
This will force all your B1, B2, etc to always have four elements, which is what you already have in your example.
Also, if you want A(1) to return B1, you should change the range of A to start at 1:
A : constant array (1 .. 4) of B := (B1, B2, B3, B4);
What is obvious to you, namely that all your B's have four elements, is not obvious to the compiler.
To access an element such as A(3)(2), the compiler (or rather the Ada language) wants to be able to use very simple arithmetic: (2 + 3 * 4) * (size of an element). An array of B (which is unconstrained) would make this computation far more complicated: the produced machine code would need to add up the sizes of A(0), A(1), and A(2) just to get to A(3)(0).
You can imagine how long that would take for much larger array lengths, for instance just to access element A(1234)(5678).
This is why the designers of Ada wisely required array element types to always be constrained. For your problem, you can solve it by declaring subtype BL4 is B (0 .. 3); and using BL4 instead of B for B1, B2, B3, B4 and A.
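Just to make that address arithmetic concrete, here is the same row-major computation written out in C (purely an illustration with a plain int matrix, not the Ada record type):

#include <stdio.h>

int main(void)
{
    /* Stand-in for the constrained case: 4 rows of 4 elements each. */
    int A[4][4] = { {0} };

    /* A(3)(2): with a fixed row length of 4, the offset is a constant expression. */
    size_t offset = (2 + 3 * 4) * sizeof(int);

    printf("computed byte offset of A[3][2]: %zu\n", offset);
    printf("actual byte offset of A[3][2]:   %td\n",
           (char *)&A[3][2] - (char *)&A[0][0]);
    return 0;
}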
My question is about some puzzling behavior of gfortran (gcc version 8.1.0 (GCC)) in treating temporary arrays. The code below is compiled with -O3 -Warray-temporaries.
Consider the following possibility to construct sum of direct products of matrices:
program Pexam1
use Maux, only: kron
double precision:: A(3, 3), B(3, 3)
double precision:: C(5, 5), D(5, 5)
double precision:: rAC(15, 15), rBD(15, 15), r1(15, 15), r2(15, 15)
r1 = -kron(3, A, 5, C) - kron(3, B, 5, D)
! 1
! Warning: Creating array temporary at (1)
rAC = -kron(3, A, 5, C)
! 1
! Warning: Creating array temporary at (1)
r2 = rAC - kron(3, B, 5, D)
end program Pexam1
There are two warnings, but I did not expect the second one (for rAC = -kron(3, A, 5, C)). Why? After all, in the first line, r1 = -kron(3, A, 5, C) - kron(3, B, 5, D), the identical first part is evaluated just fine without a temporary array. What is the reason for this inconsistency, and what should one keep in mind to avoid unnecessary creation of temporary arrays?
The module Maux is as follows
module Maux
contains
pure function kron(nA, A, nB, B) result (C)
! // direct product of matrices
integer, intent(in) :: nA, nB
double precision, intent(in) :: A(nA, nA), B(nB, nB)
double precision :: C(nA*nB, nA*nB)
integer :: iA, jA, iB, jB
forall (iA=1:nA, jA=1:nA, iB=1:nB, jB=1:nB)&
C(iA + (iB-1) * nA, jA + (jB-1) * nA) = A(iA, jA) * B(iB, jB)
end function kron
end module Maux
If you build this program, you'll notice an array temporary is only created in case you're evaluating kron with the minus sign:
program Pexam1
use Maux, only: kron
double precision:: A(3, 3), B(3, 3)
double precision:: C(5, 5), D(5, 5)
double precision:: rAC(15, 15), rBD(15, 15), r1(15, 15), r2(15, 15)
rAC = kron(3, A, 5, C) ! no temporary
r2 = -kron(3, B, 5, D) ! temporary
end program Pexam1
Don't forget that every assignment in Fortran (=) is a separate operation from whatever happens on the right-hand side of the equals sign.
So when the result of kron is negated, the compiler needs to:
1. evaluate kron(3,B,5,D) and put it in a temporary (let's call it tmp)
2. evaluate -tmp
3. assign the result of this temporary operation (which could have any type/kind) to r2
Apparently, gfortran is good enough that when you only have step 1 (a plain assignment without negation), the result of the function is copied directly to rAC without any temporaries (I guess because rAC has the same rank, type and shape as the result of the call to kron).
In C programming, if a 2-D array is given (e.g. int a[5][3]) and the base address and the address of a particular element (cell) are also given, can we find the index of that element (its row and column number)? If yes, how?
I know the formula for finding the address is like this:
int a[R][C];
address(a[i][j]) = ba + size * (C*i + j);
If ba, R, C, size and address(a[i][j]) are given, how do I find the values of i and j?
To find the values of two variables we need two equations, but I'm not able to find the second equation.
The specific address minus the base address gives you the size in bytes, from the base to the specific address.
If you divide that size in bytes with sizeof(ba[0][0]) (or sizeof(int)), you get the number of items.
items / C gives you the first dimension and items % C gives you the second dimension.
Thus:
int ba[R][C];
uintptr_t address = (uintptr_t)&ba[3][2]; // some random item
size_t items = (address - (uintptr_t)ba) / sizeof(ba[0][0]);
size_t i = items / C;
size_t j = items % C;
It is important to carry out the arithmetic with some type that has well-defined behavior, therefore uintptr_t.
If I had done int* address then address - ba would be nonsense, since ba decays into an array pointer of type int(*)[3]. They aren't compatible types.
Use integer division and remainder operators.
If you have the base and a pointer to an element, elt, then there are two things:
In "pure math" terms, you'll have to divide by the size of the elements in the array.
In "C" terms, when you subtract pointers this division is performed for you.
For example:
int a[2];
ptrdiff_t a0 = (ptrdiff_t)&a[0];
ptrdiff_t a1 = (ptrdiff_t)&a[1];
a1 - a0; // likely 4 or 8.
This will likely be 4 or 8 because that's the likely size of int on whatever machine you're using, and because we performed a "pure math" subtraction of two numbers.
But if you let C get involved, it tries to do the math for you:
int a[2];
int * a0 = &a[0];
int * a1 = &a[1];
a1 - a0; // 1
Because C knows the type, and because it's the law, the subtracted numbers get divided by the size of the type automatically, converting the pointer difference into an array-like index or offset.
This is important because it will affect how you do the math.
Now, if you know that the address of elt is base + SIZE * (C * i + j), you can find the answer with integer division (which may be performed automatically for you), subtraction, more integer division, and either modulus or multiply-and-subtract:
offset or number = elt - base. This will either give you an index (C style) or a numeric (pure math) difference, depending on how you do the computation.
offset = number / SIZE. This will finish the job, if you need it.
i = offset / C. Integer division here - just throw away the remainder.
j = offset - (i*C) OR j = offset % C. Pick whichever operation you want to use: multiply & subtract, or modulus.
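Putting those steps into code, a minimal sketch using the question's a[5][3] (the pointer subtraction already performs the division by the element size for you):

#include <stdio.h>
#include <stddef.h>

#define R 5
#define C 3

int main(void)
{
    int a[R][C];
    int *base = &a[0][0];
    int *elt  = &a[4][1];               /* the "given" element address */

    ptrdiff_t offset = elt - base;      /* C divides by sizeof(int) for us */
    ptrdiff_t i = offset / C;           /* integer division: row    */
    ptrdiff_t j = offset % C;           /* remainder:        column */

    printf("i = %td, j = %td\n", i, j); /* prints i = 4, j = 1 */
    return 0;
}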
I have two arrays {Ai} and {Bi} of natural numbers. The sums of all elements are equal.
I need to split each element of the two arrays into three natural numbers:
Ai = A1i + A2i + A3i
Bi = B1i + B2i + B3i
such that the sum of all elements of A1 is equal to the sum of all elements of B1 and the same for all the other pairs.
The important part I initially forgot about:
Each element A1j, A2j, A3j should be between Aj/3 - 2 and Aj/3 + 2, inclusive.
Each element B1j, B2j, B3j should be between Bj/3 - 2 and Bj/3 + 2, inclusive.
So the elements of the arrays must be split into almost equal parts.
I am looking for a more elegant solution than just calculating all possible variants for both arrays.
It should be possible to divide them so that the sums of A1, A2 and A3 are each near to a third of the sum of A, and the same for B. It would be easy to just make every value an exact third, but that's not possible with natural numbers. So we have to floor the results (trivial) and distribute the remainders uniformly over the three arrays (manageable).
I don't know whether it's the only solution, but it works in O(n), and my intuition says it will satisfy your invariants (though I didn't prove it):
n = 3
for j = 0 to n-1
    sub[j] = {}                    // sub[0], sub[1], sub[2] will become A1, A2, A3
x = 0                              // rotating pointer for the next subarray
for each index i of A
    part = floor(A[i] / n)
    rest = A[i] mod n
    for j = 0 to n-1
        sub[j][i] = part
    // distribute the rest over the subarrays, and rotate the pointer
    for j = 1 to rest
        sub[x][i] = sub[x][i] + 1
        x = (x + 1) mod n
/* Do the same for B */
One could also formulate the loop without the division, distributing the single units (1) of each A[i] over the sub[x][i]s one at a time:
n = 3
for j = 0 to n-1
    sub[j] = {}
    for k = 0 to |A|-1
        sub[j][k] = 0
x = 0                              // rotating pointer for the next subarray
for each index i of A
    // distribute A[i] one unit at a time over the subarrays, and rotate the pointer
    for j = 1 to A[i]
        sub[x][i] = sub[x][i] + 1
        x = (x + 1) mod n
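A rough C translation of the first variant (my own sketch; sub[0], sub[1], sub[2] play the roles of A1, A2, A3, and the rotating pointer x does the balancing):

#include <stdio.h>

enum { N = 3 };                             /* number of subarrays */

static void split_round_robin(const int *A, int len, int *sub[N])
{
    int x = 0;                              /* rotating pointer to the next subarray */
    for (int i = 0; i < len; i++) {
        int part = A[i] / N;                /* floor(A[i] / 3)  */
        int rest = A[i] % N;                /* leftover units   */
        for (int j = 0; j < N; j++)
            sub[j][i] = part;               /* every subarray gets the floored share */
        for (int j = 0; j < rest; j++) {
            sub[x][i]++;                    /* hand out one leftover unit ...        */
            x = (x + 1) % N;                /* ... and rotate the pointer            */
        }
    }
}

int main(void)
{
    int A[] = { 7, 5, 9 };
    int A1[3], A2[3], A3[3];
    int *sub[N] = { A1, A2, A3 };

    split_round_robin(A, 3, sub);           /* B would be processed the same way */
    for (int i = 0; i < 3; i++)
        printf("%d = %d + %d + %d\n", A[i], A1[i], A2[i], A3[i]);
    return 0;
}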
You should look up the principle of dynamic programming.
In this case, it seems to be similar to some coin change problems.
As for finding A1_i, A2_i, A3_i you should do it recursively:
def find_numbers(n, a, arr):
    # a is the partial list of parts chosen so far for n
    if a is empty and arr[n] is not empty:
        return                           # splits of n are already cached
    if a.size() == 3:
        if sum of a == n:
            arr[n].append(a)             # a is a valid split of n into three parts
        return
    t = n
    for each element of a:
        t -= element                     # t is what is still left to distribute
    for i = 0 to t:
        find_numbers(n, append(a, i), arr)
We use arr so that we do not have to compute the possible combinations for each number more than once. If you look at the call tree, after a while this function returns the cached combinations from arr instead of computing them again.
In your main call:
arr = []
for each n in A:
    find_numbers(n, [], arr)
for each n in B:
    find_numbers(n, [], arr)
Now you have all the combinations for each n in arr[n].
I know it is only a subpart of the problem, but finding the right combinations for each A_i, B_i from arr is very similar to the coin change problems mentioned above. It is very important to read up on dynamic programming so that you understand the underlying theory.
I add the stipulation that A1, A2, and A3 must be calculated from A without knowledge of B, and, similarly, B1, B2, and B3 must be calculated without knowledge of A.
The requirement that each A1i, A2i, A3i must be in [Ai/3 - 2, Ai/3 + 2] implies that the sums of the elements of A1, A2, and A3 must each be roughly one-third that of A. The added stipulation compels us to pin those sums down completely, so that they depend only on the total of the array.
We will construct the arrays in any serial order (e.g., from element 0 to the last element). As we do so, we will ensure the arrays remain nearly balanced.
Let x be the next element of A to be processed. Let a be round(x/3). To account for x, we must append a total of 3•a+r to the arrays A1, A2, and A3, where r is –1, 0, or +1.
Let d be sum(A1) – sum(A)/3, where the sums are of the elements processed so far. Initially, d is zero, since no elements have been processed. By design, we will ensure d is –2/3, 0, or +2/3 at each step.
Append three values as shown below to A1, A2, and A3, respectively:
If r is –1 and d is –2/3, append a+1, a–1, a–1. This changes d to +2/3.
If r is –1 and d is 0, append a–1, a, a. This changes d to –2/3.
If r is –1 and d is +2/3, append a–1, a, a. This changes d to 0.
If r is 0, append a, a, a. This leaves d unchanged.
If r is +1 and d is –2/3, append a+1, a, a. This changes d to 0.
If r is +1 and d is 0, append a+1, a, a. This changes d to +2/3.
If r is +1 and d is +2/3, append a–1, a+1, a+1. This changes d to –2/3.
At the end, the sums of A1, A2, and A3 are uniquely determined by the sum of A modulo three. The sum of A1 is (sum(A)-2)/3, sum(A)/3, or (sum(A)+2)/3 according to whether the sum of A is congruent to -1, 0, or +1 modulo three, respectively.
Completing the demonstration:
In any case, a–1, a, or a+1 is appended to an array. a is round(x/3), so it differs from x/3 by less than 1, so a–1, a, and a+1 each differ from x/3 by less than 2, satisfying the constraint that the values must be in [Ai/3–2, Ai/3+2].
When B1, B2, and B3 are prepared in the same way as shown above for A1, A2, and A3, their sums are likewise determined by the sum of B. Since the sum of A equals the sum of B, the sums of A1, A2, and A3 equal the sums of B1, B2, and B3, respectively.
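For completeness, here is a compact C sketch of the case analysis above (my own illustration; d is tracked as the integer d3 = 3*d so everything stays in integer arithmetic):

#include <stdio.h>

/* Split A[0..n-1] into A1, A2, A3 following the case table above,
 * keeping d = sum(A1) - sum(A)/3 in {-2/3, 0, +2/3} (stored as d3 = 3*d). */
static void split_balanced(const int *A, int n, int *A1, int *A2, int *A3)
{
    int d3 = 0;                             /* 3 * d, always -2, 0 or +2 */
    for (int i = 0; i < n; i++) {
        int x = A[i];
        int a = (x + 1) / 3;                /* round(x/3) for natural x  */
        int r = x - 3 * a;                  /* -1, 0 or +1               */

        int v1 = a, v2 = a, v3 = a;         /* the r == 0 row of the table */
        if (r == -1) {
            if (d3 == -2) { v1 = a + 1; v2 = a - 1; v3 = a - 1; }
            else          { v1 = a - 1; }   /* d is 0 or +2/3 */
        } else if (r == +1) {
            if (d3 == +2) { v1 = a - 1; v2 = a + 1; v3 = a + 1; }
            else          { v1 = a + 1; }   /* d is -2/3 or 0 */
        }
        A1[i] = v1; A2[i] = v2; A3[i] = v3;
        d3 += 3 * v1 - x;                   /* d changes by v1 - x/3 */
    }
}

int main(void)
{
    int A[] = { 7, 5, 9, 4 };
    int A1[4], A2[4], A3[4];

    split_balanced(A, 4, A1, A2, A3);       /* B would be processed the same way */
    for (int i = 0; i < 4; i++)
        printf("%d = %d + %d + %d\n", A[i], A1[i], A2[i], A3[i]);
    return 0;
}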
Suppose you have a __m128 variable holding 4 single-precision values, and you want the minimum one. Is there any intrinsic function available, or anything other than a naive linear comparison among the values?
Right now my solution is the following (suppose the input __m128 variable is x):
x = _mm_min_ps(x, (__m128)_mm_srli_si128((__m128i)x, 4));
min = _mm_min_ss(x, (__m128)_mm_srli_si128((__m128i)x, 8))[0];
Which is quite horrible, but it works. (By the way, is there anything like _mm_srli_si128 but for the __m128 type?)
There is no single instruction/intrinsic but you can do it with two shuffles and two mins:
__m128 _mm_hmin_ps(__m128 v)
{
v = _mm_min_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 1, 0, 3)));
v = _mm_min_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 0, 3, 2)));
return v;
}
The output vector will contain the min of all the elements in the input vector, replicated throughout the output vector.
Paul R's answer is great! (Paul R, if you read this: thank you!) I just wanted to try to explain how it actually works, for anyone new to SSE stuff like me. Of course I might be wrong somewhere, so any corrections are welcome!
How does _mm_shuffle_ps work?
First of all, SSE registers have indexes that go in reverse to what you might expect, like this:
[6, 9, 8, 5] // values
3 2 1 0 // indexes
This order of indexing makes vector left-shifts move data from low to high indices, just like left-shifting the bits in an integer. The most-significant element is at the left.
_mm_shuffle_ps can mix the contents of two registers:
// __m128 a : (a3, a2, a1, a0)
// __m128 b : (b3, b2, b1, b0)
__m128 two_from_a_and_two_from_b = _mm_shuffle_ps(b, a, _MM_SHUFFLE(3, 2, 1, 0));
// In _MM_SHUFFLE(3, 2, 1, 0): the first two indexes (3, 2) select from the
// second operand (a), and the last two (1, 0) select from the first operand (b).
// two_from_a_and_two_from_b : (a3, a2, b1, b0)
Here, we only want to shuffle the values of one register, not two. We can do that by passing v as both parameters, like this (you can see this in Paul R's function):
// __m128 v : (v3, v2, v1, v0)
__m128 v_rotated_left_by_1 = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 1, 0, 3));
// v_rotated_left_by_1 : (v2, v1, v0, v3) // i.e. move all elements left by 1 with wraparound
I'm going to wrap it in a macro for readability though:
#define mm_shuffle_one(v, pattern) _mm_shuffle_ps(v, v, pattern)
(It can't be a function because the pattern argument to _mm_shuffle_ps must be constant at compile time.)
Here's a slightly modified version of the actual function – I added intermediate names for readability, as the compiler optimizes them out anyway:
inline __m128 _mm_hmin_ps(__m128 v){
__m128 v_rotated_left_by_1 = mm_shuffle_one(v, _MM_SHUFFLE(2, 1, 0, 3));
__m128 v2 = _mm_min_ps(v, v_rotated_left_by_1);
__m128 v2_rotated_left_by_2 = mm_shuffle_one(v2, _MM_SHUFFLE(1, 0, 3, 2));
__m128 v3 = _mm_min_ps(v2, v2_rotated_left_by_2);
return v3;
}
Why are we shuffling the elements the way we are? And how do we find the smallest of four elements with just two min operations?
I had some trouble following how you can min 4 floats with just two vectorized min operations, but I understood it when I manually followed which values are min'd together, step by step. (Though it's likely more fun to do it on your own than read it)
Say we've got v:
[7,6,9,5] v
First, we min the values of v and v_rotated_left_by_1:
[7,6,9,5] v
3 2 1 0 // (just the indices of the elements)
[6,9,5,7] v_rotated_left_by_1
2 1 0 3 // (the indexes refer to v, and we rotated it left by 1, so the indices are shifted)
--------- min
[6,6,5,5] v2
3 2 1 0 // (explained
2 1 0 3 // below )
Each column under an element of v2 tracks which indexes of v were min'd together to get that element.
So, going column-wise left to right:
v2[3] == 6 == min(v[3], v[2])
v2[2] == 6 == min(v[2], v[1])
v2[1] == 5 == min(v[1], v[0])
v2[0] == 5 == min(v[0], v[3])
Now the second min:
[6,6,5,5] v2
3 2 1 0
2 1 0 3
[5,5,6,6] v2_rotated_left_by_2
1 0 3 2
0 3 2 1
--------- min
[5,5,5,5] v3
3 2 1 0
2 1 0 3
1 0 3 2
0 3 2 1
Voila! Each column under v3 contains (3,2,1,0): each element of v3 has been min'd with all the elements of v, so each element contains the minimum of the whole vector v.
After using the function, you can extract the minimum value with float _mm_cvtss_f32(__m128):
__m128 min_vector = _mm_hmin_ps(my_vector);
float minval = _mm_cvtss_f32(min_vector);
***
This is just a tangential thought, but what I found interesting is that this approach can be extended to sequences of arbitrary length, rotating the result of the previous step by 1, 2, 4, 8, ..., up to 2**(ceil(log2(len(v))) - 1) (I think) at each step.
That's cool from a theoretical perspective: if you can compare two sequences element-wise simultaneously, you can find the minimum/maximum1 of a sequence in logarithmic time!
1 This extends to all horizontal folds/reductions, like sum. Same shuffles, different vertical operation.
However, AVX (256-bit vectors) makes 128-bit boundaries special, and harder to shuffle across. If you only want a scalar result, extract the high half so every step narrows the vector width in half. (Like in Fastest way to do horizontal float vector sum on x86, which has more efficient shuffles than 2x shufps for 128-bit vectors, avoiding some movaps instructions when compiling without AVX.)
But if you want the result broadcast to every element like Paul R's answer, you'd want to do in-lane shuffles (i.e. rotate within the 4 elements of every lane), then swap halves or rotate 128-bit lanes.
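For the scalar-result case mentioned above, a possible SSE1-only sketch (my own, following the narrowing idea from the linked question) could look like this:

#include <xmmintrin.h>

/* Horizontal min of a __m128, returned as a scalar.  Each step halves the
 * number of elements still in play instead of keeping a broadcast vector. */
static inline float hmin_ps_scalar(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);            /* (v3, v2, v3, v2)             */
    __m128 min2 = _mm_min_ps(v, hi);              /* lanes 0,1 hold pairwise mins */
    __m128 min1 = _mm_shuffle_ps(min2, min2, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_min_ss(min2, min1)); /* lane 0: min of all four      */
}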
So I found some similarities between arrays and set notation while learning about sets and sequences in precalc, e.g. set notation: {a | cond} = {a1, a2, a3, a4, ..., an}, where n comes from the domain (or index) of the array, a subset of the natural numbers (or unsigned integers). Most programming languages provide methods on arrays similar to those applied to sets, e.g. upper bounds and lower bounds; possibly suprema and infima too.
Where did arrays come from?
Python's list comprehensions, in this respect, are as good as it gets:
[x for x in someset if x < 5]
Very "set-like". The for x in <...> part specifies which set the elements are selected from, and if x... specifies the condition.