Deriving arrays in mathematics - arrays

While learning about sets and sequences in precalculus, I noticed some similarities between arrays and set notation. For example, set-builder notation gives {a | cond} = {a1, a2, a3, a4, ..., an}, where the index n ranges over the array's domain, a subset of the natural numbers (or unsigned integers). Most programming languages provide array methods similar to operations on sets, e.g. upper and lower bounds, and possibly suprema and infima too.
Where did arrays come from?

Python's list comprehensions, in this respect, are as good as it gets:
[x for x in someset if x < 5]
Very "set-like". The for x in <...> part specifies which set the elements are selected from, and if x... specifies the condition.

Related

Pairwise comparisons of a large amount of sorted arrays

Suppose I have n sorted integer arrays (a_1, ..., a_n, there may be duplicated elements in a single array), and T is a threshold value between 0 and 1. I would like to find all pairs of arrays the similarity of which is larger than T. The similarity of array a_j w.r.t. array a_i is defined as follows:
sim(i, j) = intersection(i, j) / length(i)
where intersection(i, j) returns the number of elements shared in a_i and a_j, and length(i) returns the length of array a_i.
I can enumerate all pairs of arrays and compute the similarity value, but this takes too much time for a large n (say n=10^5). Is there any data structure, pruning strategy, or other techniques that can reduce the time cost of this procedure? I'm using Java so the technique should be easily applicable in Java.
There are (n^2 - n)/2 pairs of arrays. If n=10^5, then you have to compute the similarity of 5 billion pairs of arrays. That's going to take some time.
One potential optimization is to shortcut your evaluation of two arrays if it becomes clear that you won't reach T. For example, if T is 0.5 and you've examined more than half of the array without finding any intersections, then that pair of arrays clearly cannot meet the threshold. I don't expect this optimization to gain you much.
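The early-exit idea can be sketched as a two-pointer merge over two sorted arrays. This is only an illustration in Python rather than Java, and it assumes that duplicates are counted as a multiset intersection, which the question does not specify. The function name `similarity_exceeds` is hypothetical.

```python
def similarity_exceeds(a, b, threshold):
    """Return True if intersection(a, b) / len(a) > threshold.

    a and b are sorted lists of integers. Assumption: duplicates are
    matched one-to-one (multiset intersection).
    """
    needed = threshold * len(a)  # the match count must strictly exceed this
    i = j = matches = 0
    while i < len(a) and j < len(b):
        # Upper bound on the final match count: every remaining element
        # of the shorter remainder would have to match.
        if matches + min(len(a) - i, len(b) - j) <= needed:
            return False  # the threshold can no longer be reached
        if a[i] == b[j]:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches > needed
```

The bound check costs a constant amount of work per step, so it never makes a comparison asymptotically slower, but it does not reduce the number of pairs that have to be visited.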
It might be possible to make some inferences based on prior results. That is, if sim(1,2) = X and sim(1,3) < T, there's probably a value of X (likely would have to be very high) at which you can say definitively that sim(2,3) < T.

Efficiently store an N-dimensional array of mostly zeros in Matlab

I implemented a finite differences algorithm to solve a PDE.
The grid is a structured 2D domain of size [Nx, Nz], solved Nt times.
I pre-allocate the object containing all solutions:
sol = zeros(Nx, Nz, Nt, 'single') ;
This very easily becomes too large, and I get an 'out of memory' error.
Unfortunately sparse doesn't work for N-dimensional arrays.
The exact values are not important for the question; it goes without saying that RAM usage grows rapidly as the grid spacing decreases and the simulation time increases.
I am aware that I do not need to store each time instant for the purpose of the advancement of the solution. It would be sufficient to just store the previous two time steps. However, for post-processing reasons I need to access the solution at all time-steps (or at least at a submultiple of the total number). It might help to specify that, even after the solution, the grid remains predominantly populated by zeros.
Am I fighting a lost battle or is there a more efficient way to proceed (other type of objects, vectorization...)?
Thank you.
You could store the array in sparse, linear form; that is, a column vector with length equal to the product of dimensions:
sol = sparse([], [], [], Nx*Nz*Nt, 1); % sparse column vector containing zeros
Then, instead of indexing normally,
sol(x, z, t),
you need to translate the indices x, z, t into the corresponding linear index:
For scalar indices you use
sol(x + Nx*(z-1) + Nx*Nz*(t-1))
You can define a helper function for convenience:
ind = @(sol, x, z, t) sol(x + Nx*(z-1) + Nx*Nz*(t-1));
so the indexing becomes more readable:
ind(sol, x, z, t)
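The scalar index formula can also be checked outside MATLAB. The following Python sketch is only an illustration: `linear_index` and `SparseGrid3D` are hypothetical names, and a plain dictionary stands in for the sparse column vector. Subscripts are 1-based and column-major, as in MATLAB.

```python
def linear_index(x, z, t, nx, nz):
    """1-based column-major linear index for 1-based subscripts (x, z, t)."""
    return x + nx * (z - 1) + nx * nz * (t - 1)


class SparseGrid3D:
    """Stores only the nonzero entries of an nx-by-nz-by-nt grid."""

    def __init__(self, nx, nz, nt):
        self.shape = (nx, nz, nt)
        self._data = {}  # linear index -> nonzero value

    def _index(self, key):
        x, z, t = key
        nx, nz, _ = self.shape
        return linear_index(x, z, t, nx, nz)

    def __getitem__(self, key):
        # Entries that were never stored read back as zero.
        return self._data.get(self._index(key), 0.0)

    def __setitem__(self, key, value):
        idx = self._index(key)
        if value == 0:
            self._data.pop(idx, None)  # keep the store sparse
        else:
            self._data[idx] = value


grid = SparseGrid3D(15, 18, 11)
grid[5, 7, 4] = 2.5
```

Memory use here scales with the number of nonzero entries rather than with Nx*Nz*Nt.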
For general (array) indices you need to reshape the indices along different dimensions so that implicit expansion produces the appropriate linear index:
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1))
which of course could also be encapsulated into a function.
Check that the conversion to linear indexing works (general case, using non-sparse array to compare with normal indexing):
Nx = 15; Nz = 18; Nt = 11;
sol = randi(9, Nx, Nz, Nt);
x = [5 6; 7 8]; z = 7; t = [4 9 1];
isequal(sol(x, z, t), ...
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1)))
gives
ans =
logical
1
You can create a cell array of sparse matrices to store the results. If working with a full matrix is faster than working with a sparse one, perform the computations on full matrices, then convert each full matrix to a sparse matrix and place it in the cell.

Masking vector A for elements that does not match vector B

Suppose I have a vector v = {10,9,8} and a vector y = {10,5,7}. How can I write an expression that results in a vector x = {1,0,0}? In other words, set ones where elements match and zeros where they do not. How would one write this mathematically, or using functional-language terms such as filter or map?
Although the question might be considered off-topic, the Kronecker delta comes to mind. If n is a nonnegative integer and v, y are in R^n, one can define the desired vector as x := {x_1, ..., x_n}, where x_i = delta_{v_i, y_i} for each i in {1, ..., n}. Here delta_{a,b} equals 1 if a = b and 0 otherwise.
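In functional terms this is a map over the element pairs of the two vectors. A minimal Python sketch, with the hypothetical name `match_mask`:

```python
def match_mask(v, y):
    """Element-wise Kronecker delta: 1 where v[i] == y[i], else 0."""
    if len(v) != len(y):
        raise ValueError("vectors must have the same length")
    return [1 if a == b else 0 for a, b in zip(v, y)]


print(match_mask([10, 9, 8], [10, 5, 7]))  # prints [1, 0, 0]
```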

Split arrays of natural numbers according to a requirement

I have two arrays {Ai} and {Bi} of natural numbers. The sums of all elements are equal.
I need to split each element of the two arrays into three natural numbers:
Ai = A1i + A2i + A3i
Bi = B1i + B2i + B3i
such that the sum of all elements of A1 is equal to the sum of all elements of B1 and the same for all the other pairs.
The important part I initially forgot about:
Each element from A1j, A2j, A3j should be between Aj/3-2 and Aj/3+2 or at least equal to one of these numbers
Each element from B1j, B2j, B3j should be between Bj/3-2 and Bj/3+2 or at least equal to one of these numbers
So the elements of arrays must be split in almost equal parts
I am looking for a more elegant solution than just calculating all possible variants for both arrays.
It should be possible to divide them so that the sums of A1, A2 and A3 are near to a third of A, and the same for B. It would be easy to just make all values an exact third, but that’s not possible with natural numbers. So we have to floor the results (trivial) and distribute the remainders uniformly over the three arrays (manageable).
I don't know whether it's the only solution, but it works in O(n) and my intuition says it will hold your invariants (though I didn't prove it):
n = 3
for j = 0 to n-1
    A[j] = {}            // A[0..n-1] are the sub-arrays A1, A2, A3
x = 0                    // rotating pointer for the next sub-array
for each index i of A
    part = floor(A[i] / n)
    rest = A[i] % n
    for j = 0 to n-1
        A[j][i] = part
    // distribute the rest over the sub-arrays, and rotate the pointer
    for k = 1 to rest
        A[x][i]++
        x = (x + 1) % n  // wrap around after the last sub-array
/* Do the same for B */
One could also formulate the loop without the division, only distributing the single units (1) of an A[i] over the A[x][i]s:
n = 3
for j = 0 to n-1
    A[j] = {}
    for i = 0 to |A|-1
        A[j][i] = 0
x = 0 // rotating pointer for the next sub-array
for each index i of A
    // distribute the single units over the sub-arrays, and rotate the pointer
    for k = 1 to A[i]
        A[x][i]++
        x = (x + 1) % n  // wrap around after the last sub-array
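The first variant (floor plus round-robin remainders) can be sketched in Python as follows. The function name `split_round_robin` is hypothetical, and the pointer is assumed to wrap around modulo n, which the pseudocode leaves implicit.

```python
def split_round_robin(values, n=3):
    """Split every value into n near-equal parts.

    Remainder units are handed out by one pointer that keeps rotating
    across the whole array, so the column sums stay balanced.
    """
    parts = [[] for _ in range(n)]
    x = 0  # rotating pointer for the next sub-array
    for v in values:
        base, rest = divmod(v, n)
        row = [base] * n
        for _ in range(rest):
            row[x] += 1
            x = (x + 1) % n
        for j in range(n):
            parts[j].append(row[j])
    return parts


A = [10, 7, 4]
B = [5, 5, 11]  # same total as A
sums_a = [sum(p) for p in split_round_robin(A)]
sums_b = [sum(p) for p in split_round_robin(B)]
```

For these two inputs the three sums agree between A and B. Each part is either floor(v/3) or floor(v/3)+1, so it stays well inside the required range around v/3.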
You should look up the principle of dynamic programming.
In this case, it seems to be similar to some coin change problems.
As for finding A1_i, A2_i, A3_i you should do it recursively:
def find_numbers(n, a, arr):
    if arr[n] not empty:
        return
    if n == 0:
        arr[n].append(a)
        return
    if a.size() > 2:
        return
    t = n
    for each element of a:
        t -= element
    for i = 0 to :
        find_numbers(n, append(a, i), arr)
We use arr as a cache so that the possible combinations for each number are not computed multiple times. If you look at the call tree, after a while this function returns the combinations from arr instead of computing them again.
In your main call:
arr = []
for each n in A:
    find_numbers(n, [], arr)
for each n in B:
    find_numbers(n, [], arr)
Now you have all the combinations for each n in arr[n].
I know it is a subpart of the problem, but finding the right combinations for each A_i, B_i from arr is something really similar to this. It is very important to read the links I gave you so that you understand the underlying theory.
I add the stipulation that A1, A2, and A3 must be calculated from A without knowledge of B, and, similarly, B1, B2, and B3 must be calculated without knowledge of A.
The requirement that each A1i, A2i, A3i must be in [Ai/3–2, Ai/3+2] implies that the sums of the elements of A1, A2, and A3 must each be roughly one-third that of A. The stipulation compels us to define this completely.
We will construct the arrays in any serial order (e.g., from element 0 to the last element). As we do so, we will ensure the arrays remain nearly balanced.
Let x be the next element of A to be processed. Let a be round(x/3). To account for x, we must append a total of 3•a+r to the arrays A1, A2, and A3, where r is –1, 0, or +1.
Let d be sum(A1) – sum(A)/3, where the sums are of the elements processed so far. Initially, d is zero, since no elements have been processed. By design, we will ensure d is –2/3, 0, or +2/3 at each step.
Append three values as shown below to A1, A2, and A3, respectively:
If r is –1 and d is –2/3, append a+1, a–1, a–1. This changes d to +2/3.
If r is –1 and d is 0, append a–1, a, a. This changes d to –2/3.
If r is –1 and d is +2/3, append a–1, a, a. This changes d to 0.
If r is 0, append a, a, a. This leaves d unchanged.
If r is +1 and d is –2/3, append a+1, a, a. This changes d to 0.
If r is +1 and d is 0, append a+1, a, a. This changes d to +2/3.
If r is +1 and d is +2/3, append a–1, a+1, a+1. This changes d to –2/3.
At the end, the sums of A1, A2, and A3 are uniquely determined by the sum of A modulo three. The sum of A1 is (sum(A)–2)/3, sum(A)/3, or (sum(A)+2)/3 according to whether the sum of A is congruent to –1, 0, or +1 modulo three, respectively.
Completing the demonstration:
In any case, a–1, a, or a+1 is appended to an array. a is round(x/3), so it differs from x/3 by less than 1, so a–1, a, and a+1 each differ from x/3 by less than 2, satisfying the constraint that the values must be in [Ai/3–2, Ai/3+2].
When B1, B2, and B3 are prepared in the same way as shown above for A1, A2, and A3, their sums are determined by the sum of B. Since the sum of A equals the sum of B, the sums of A1, A2, and A3 equal the sums of B1, B2, and B3, respectively.
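The case table above can be sketched in Python. This is an illustration only: the name `split_balanced` is hypothetical, inputs are assumed to be non-negative integers, and d is tracked as three times its value (`d3`) so that it stays an integer.

```python
def split_balanced(values):
    """Split each x into three parts, following the case table above."""
    first, second, third = [], [], []
    d3 = 0  # 3 * d, i.e. 3 * sum(first) - sum(values so far): -2, 0, or +2
    for x in values:
        a = (x + 1) // 3  # round(x / 3) for non-negative integers
        r = x - 3 * a     # -1, 0, or +1
        if r == 0:
            parts = (a, a, a)
        elif r == -1:
            parts = (a + 1, a - 1, a - 1) if d3 == -2 else (a - 1, a, a)
        else:  # r == +1
            parts = (a - 1, a + 1, a + 1) if d3 == 2 else (a + 1, a, a)
        d3 += 3 * parts[0] - x  # update the running imbalance
        first.append(parts[0])
        second.append(parts[1])
        third.append(parts[2])
    return first, second, third


A = [10, 7, 4]
B = [5, 5, 11]  # same total as A
sums_a = [sum(p) for p in split_balanced(A)]
sums_b = [sum(p) for p in split_balanced(B)]
```

For these inputs the three sums agree between A and B. Note that a–1 can be zero or negative when x is very small (for example x = 1 while d is +2/3), so inputs with tiny elements may need separate handling if the parts must be natural numbers.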

How to define Xor in Coq and prove its properties

This should be an easy question. I'm new with Coq.
I want to define the exclusive or in Coq (which to the best of my knowledge is not predefined). The important part is to allow for multiple propositions (e.g. Xor A B C D).
I also need the two properties:
(Xor A1 A2 ... An)/\~A1 -> Xor A2... An
(Xor A1 A2 ... An)/\A1 -> ~A2/\.../\~An
I'm currently having trouble defining the function for an undefined number of variables. I tried to define it by hand for two, three, four and five variables (that's how many I need). But then proving the properties is a pain and seems very inefficient.
Given your second property, I assume that your definition of exclusive or at higher arities is “exactly one of these propositions is true” (and not “an odd number of these propositions is true” or “at least one of these propositions is true and at least one is false”, which are other possible generalizations).
This exclusive or is not an associative operation. This means you can't just define higher-arity xor as xor(A1,…,An)=xor(A1,xor(A2,…)). You need a global definition, and this means that the type constructor must take a list of arguments (or some other data structure, but a list is the most obvious choice).
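A brute-force truth-table check makes the difference concrete. The sketch below is in Python rather than Coq and proves nothing formally; `exactly_one` and `nested_xor` are hypothetical helper names. It compares the "exactly one is true" reading with nested binary xor, and checks the two requested properties for up to five propositions.

```python
from itertools import product


def exactly_one(props):
    """The intended n-ary xor: exactly one proposition is true."""
    return sum(1 for p in props if p) == 1


def nested_xor(props):
    """xor(A1, xor(A2, ...)) built from binary xor; this is parity."""
    result = False
    for p in props:
        result = result != bool(p)
    return result


# The two readings disagree as soon as three propositions are all true.
disagree = exactly_one([True, True, True]) != nested_xor([True, True, True])

# Check both requested properties over every assignment for n = 1..5.
properties_hold = all(
    (not (exactly_one(vals) and not vals[0]) or exactly_one(vals[1:]))
    and (not (exactly_one(vals) and vals[0]) or not any(vals[1:]))
    for n in range(1, 6)
    for vals in product([False, True], repeat=n)
)
```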
Inductive xor : list Prop -> Prop := …
You now have two reasonable choices: build your definition of xor inductively from first principles, or invoke a list predicate. The list predicate would be “there is a unique element in the list matching this predicate”. Since the standard list library does not define this predicate, and defining it is slightly harder than defining xor, we'll build xor inductively.
The argument is a list, so let's break down the cases:
xor of an empty list is always false;
xor of the list (cons A L) is true iff either of these two conditions is met:
A is true and none of the elements of L are true;
A is false and exactly one of the elements of L is true.
This means we need to define an auxiliary predicate on lists of propositions, nand, characterizing the lists of false propositions. There are many possibilities here: fold the /\ operator, induct by hand, or call a list predicate (again, not in the standard list library). I'll induct by hand, but folding /\ is another reasonable choice.
Require Import List.
Inductive nand : list Prop -> Prop :=
| nand_nil : nand nil
| nand_cons : forall (A:Prop) L, ~A -> nand L -> nand (A::L).
Inductive xor : list Prop -> Prop :=
| xor_t : forall (A:Prop) L, A -> nand L -> xor (A::L)
| xor_f : forall (A:Prop) L, ~A -> xor L -> xor (A::L).
Hint Constructors nand xor.
The properties you want to prove are simple corollaries of inversion properties: given a constructed type, break down the possibilities (if you have a xor, it's either a xor_t or a xor_f). Here's a manual proof of the first; the second is very similar.
Lemma xor_tail : forall A L, xor (A::L) -> ~A -> xor L.
Proof.
intros. inversion_clear H.
contradiction.
assumption.
Qed.
Another set of properties you're likely to want is the equivalences between nand and the built-in conjunction. As an example, here's a proof that nand (A::nil) is equivalent to ~A. Proving that nand (A::B::nil) is equivalent to ~A/\~B and so on are merely more of the same. In the forward direction, this is once more an inversion property (analyse the possible constructors of the nand type). In the backward direction, this is a simple application of the constructors.
Lemma nand1 : forall A, nand (A::nil) <-> ~A.
Proof.
split; intros.
inversion_clear H. assumption.
constructor. assumption. constructor.
Qed.
You're also likely to need substitution and rearrangement properties at some point. Here are a few key lemmas that you may want to prove (these shouldn't be very difficult, just induct on the right stuff):
forall A1 A2 L, (A1<->A2) -> (xor (A1::L) <-> xor (A2::L))
forall K L1 L2, (xor L1 <-> xor L2) -> (xor (K++L1) <-> xor (K++L2))
forall K A B L, xor (K++A::B::L) <-> xor (K++B::A::L)
forall K L M N, xor (K++L++M++N) <-> xor (K++M++L++N)
Well, I suggest you start with Xor for 2 arguments and prove its properties.
Then if you want to generalize it you can define Xor taking a list of arguments -- you should
be able to define it and prove its properties using your 2-argument Xor.
I could give some more details but I think it's more fun to do it on your own, let me know how it goes :).

Resources