Masking vector A for elements that does not match vector B

Masking vector A for elements that does not match vector B - arrays

If I have a vector v = {10,9,8}, and a vector y = {10,5,7}. How can I write this so that it results in a vector x = {1,0,0}. In other Words, set ones where elements match, and zeroes if not? How would one write this in a mathematical way, or by using functional language terms like filter, map or such.

Although the question might be considered off-topic, the Kronecker delta comes to mind. If n is a nonnegative integer, and v,y in R^n, one can define the desired vector as x:={x_1,...,x_n} where x_i = delta_v_i,y_i for each i in {1,...n}.

Related

Pairwise comparisons of a large amount of sorted arrays

Suppose I have n sorted integer arrays (a_1, ..., a_n, there may be duplicated elements in a single array), and T is a threshold value between 0 and 1. I would like to find all pairs of arrays the similarity of which is larger than T. The similarity of array a_j w.r.t. array a_i is defined as follows:
sim(i, j) = intersection(i, j) / length(i)
where intersection(i, j) returns the number of elements shared in a_i and a_j, and length(i) returns the length of array a_i.
I can enumerate all pairs of arrays and compute the similarity value, but this takes too much time for a large n (say n=10^5). Is there any data structure, pruning strategy, or other techniques that can reduce the time cost of this procedure? I'm using Java so the technique should be easily applicable in Java.

There are (n^2 - n)/2 pairs of arrays. If n=10^5, then you have to compute the similarity of 5 billion pairs of arrays. That's going to take some time.
One potential optimization is to shortcut your evaluation of two arrays if it becomes clear that you won't reach T. For example, if T is 0.5, you've examined more than half of the array and haven't found any intersections, then it's clear that that pair of arrays won't meet the threshold. I don't expect this optimization to gain you much.
It might be possible to make some inferences based on prior results. That is, if sim(1,2) = X and sim(1,3) < T, there's probably a value of X (likely would have to be very high) at which you can say definitively that sim(2,3) < T.

Efficiently store an N-dimensional array of mostly zeros in Matlab

I implemented a finite differences algorithm to solve a PDE.
The grid is a structured 2D domain of size [Nx, Nz], solved Nt times.
I pre-allocate the object containing all solutions:
sol = zeros(Nx, Nz, Nt, 'single') ;
This becomes very easily too large and I get a 'out of memory' error.
Unfortunately sparse doesn't work for N-dimensional arrays.
For the sake of the question it's not important to know the values, it goes without saying that the RAM usage grows exponentially with decreasing the grid spacing and increasing the simulation time.
I am aware that I do not need to store each time instant for the purpose of the advancement of the solution. It would be sufficient to just store the previous two time steps. However, for post-processing reasons I need to access the solution at all time-steps (or at least at a submultiple of the total number).It might help to specify that, even after the solution, the grid remains predominantly populated by zeros.
Am I fighting a lost battle or is there a more efficient way to proceed (other type of objects, vectorization...)?
Thank you.

You could store the array in sparse, linear form; that is, a column vector with length equal to the product of dimensions:
sol = sparse([], [], [], Nx*Nz*Nt, 1); % sparse column vector containing zeros
Then, instead of indexing normally,
sol(x, z, t),
you need to translate the indices x, z, t into the corresponding linear index:
For scalar indices you use
sol(x + Nx*(z-1) + Nx*Nz*(t-1))
You can define a helper function for convenience:
ind = #(sol, x, y, t) sol(x + Nx*(z-1) + Nx*Nz*(t-1))
so the indexing becomes more readable:
ind(sol, x, z, t)
For general (array) indices you need to reshape the indices along different dimensions so that implicit expansion produces the appropriate linear index:
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1))
which of course could also be encapsulated into a function.
Check that the conversion to linear indexing works (general case, using non-sparse array to compare with normal indexing):
Nx = 15; Nz = 18; Nt = 11;
sol = randi(9, Nx, Nz, Nt);
x = [5 6; 7 8]; z = 7; t = [4 9 1];
isequal(sol(x, z, t), ...
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1)))
gives
ans =
logical
1

You can create a a cell array of sparse matrices to store the results. However computations can be performed on full matrices if working with a full matrix is faster than sparse matrix and convert the full matrix to sparse matrix and place it in the cell.

Is sparse tensor multiplication implemented in TensorFlow?

Multiplication of sparse tensors with themselves or with dense tensors does not seem to work in TensorFlow. The following example
from __future__ import print_function
import tensorflow as tf
x = tf.constant([[1.0,2.0],
[3.0,4.0]])
y = tf.SparseTensor(indices=[[0,0],[1,1]], values=[1.0,1.0], shape=[2,2])
z = tf.matmul(x,y)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(sess.run([x, y, z]))
fails with the error message
TypeError: Input 'b' of 'MatMul' Op has type string that does not match type
float32 of argument 'a'
Both tensors have values of type float32 as seen by evaluating them without the multiplication op. Multiplication of y with itself returns a similar error message. Multipication of x with itself works fine.

General-purpose multiplication for tf.SparseTensor is not currently implemented in TensorFlow. However, there are three partial solutions, and the right one to choose will depend on the characteristics of your data:
If you have a tf.SparseTensor and a tf.Tensor, you can use tf.sparse_tensor_dense_matmul() to multiply them. This is more efficient than the next approach if one of the tensors is too large to fit in memory when densified: the documentation has more guidance about how to decide between these two methods. Note that it accepts a tf.SparseTensor as the first argument, so to solve your exact problem you will need to use the adjoint_a and adjoint_b arguments, and transpose the result.
If you have two sparse tensors and need to multiply them, the simplest (if not the most performant) way is to convert them to dense and use tf.matmul:
a = tf.SparseTensor(...)
b = tf.SparseTensor(...)
c = tf.matmul(tf.sparse_tensor_to_dense(a, 0.0),
tf.sparse_tensor_to_dense(b, 0.0),
a_is_sparse=True, b_is_sparse=True)
Note that the optional a_is_sparse and b_is_sparse arguments mean that "a (or b) has a dense representation but a large number of its entries are zero", which triggers the use of a different multiplication algorithm.
For the special case of sparse vector by (potentially large and sharded) dense matrix multiplication, and the values in the vector are 0 or 1, the tf.nn.embedding_lookup operator may be more appropriate. This tutorial discusses when you might use embeddings and how to invoke the operator in more detail.
For the special case of sparse matrix by (potentially large and sharded) dense matrix, tf.nn.embedding_lookup_sparse() may be appropriate. This function accepts one or two tf.SparseTensor objects, with sp_ids representing the non-zero values, and the optional sp_weights representing their values (which otherwise default to one).

Recently, tf.sparse_tensor_dense_matmul(...) was added that allows multiplying a sparse matrix by a dense matrix.
https://www.tensorflow.org/versions/r0.9/api_docs/python/sparse_ops.html#sparse_tensor_dense_matmul
https://github.com/tensorflow/tensorflow/issues/1241

In TF2.4.1 you can use the methods in tensorflow.python.ops.linalg.sparse.sparse_csr_matrix_ops to multiply to arbitrary SparseTensor (I think up to 3 dimensions).
Something like the following should be used (in general you turn the sparse tensors into a CSR representation)
import tensorflow as tf
from tensorflow.python.ops.linalg.sparse import sparse_csr_matrix_ops
def tf_multiply(a: tf.SparseTensor, b: tf.SparseTensor):
a_sm = sparse_csr_matrix_ops.sparse_tensor_to_csr_sparse_matrix(
a.indices, a.values, a.dense_shape
)
b_sm = sparse_csr_matrix_ops.sparse_tensor_to_csr_sparse_matrix(
b.indices, b.values, b.dense_shape
)
c_sm = sparse_csr_matrix_ops.sparse_matrix_sparse_mat_mul(
a=a_sm, b=b_sm, type=tf.float32
)
c = sparse_csr_matrix_ops.csr_sparse_matrix_to_sparse_tensor(
c_sm, tf.float32
)
return tf.SparseTensor(
c.indices, c.values, dense_shape=c.dense_shape
)
For a while I was prefering scipy multiplication (via a py_function) because this multiplication in TF (2.3 and 2.4) was not performing as well as scipy. I tried again recently and, either I changed something on my code, or there was some fix in 2.4.1 that makes the TF sparse multiplication faster that using scipy, in both CPU and GPU.

It seems that
tf.sparse_matmul(
a,
b,
transpose_a=None,
transpose_b=None,
a_is_sparse=None,
b_is_sparse=None,
name=None
)
is not for multiplication of two SparseTensors.
a and b are Tensors not SparseTensors. And I have tried that, it is not working with SparseTensors.

tf.sparse_matmul is for multiplying two dense tensor not sparse type of data structure. That function is just an optimized version of tensor multiplication if the given matrix (or both two matrixes) have many zero value. Again it does not accept sparse tensor data type. It accepts dense tensor data type. It might fasten your calculations if values are mostly zero.
as far as I know there is no implementation of two sparse type tensor muliplication. but just one sparse one dense which is tf.sparse_tensor_dense_matmul(x, y) !

To make the answer more complete:
tf.sparse_matmul(
a,
b,
transpose_a=None,
transpose_b=None,
a_is_sparse=None,
b_is_sparse=None,
name=None
)
exists as well:
https://www.tensorflow.org/api_docs/python/tf/sparse_matmul

fast algorithm of finding sums in array

I am looking for a fast algorithm:
I have a int array of size n, the goal is to find all patterns in the array that
x1, x2, x3 are different elements in the array, such that x1+x2 = x3
For example I know there's a int array of size 3 is [1, 2, 3] then there's only one possibility: 1+2 = 3 (consider 1+2 = 2+1)
I am thinking about implementing Pairs and Hashmaps to make the algorithm fast. (the fastest one I got now is still O(n^2))
Please share your idea for this problem, thank you

Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you're not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).
This is basically the 3SUM problem. Without potentially unboundedly large elements, the best known algorithms are approximately O(n^2), but we've only proved that it can't be faster than O(n lg n) for most models of computation.
If the integer elements lie in the range [u, v], you can do a slightly different version of this in O(n + (v-u) lg (v-u)) with an FFT. I'm going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.
The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence a, b, c with c - b = b - a, or equivalently, a + c = 2b.
Unfortunately, the last step of the transformation back isn't as fast as I'd like, but I'll talk about that when we get there.
Let's call your original array X, which contains integers x_1, ..., x_n. We want to find indices i, j, k such that x_i + x_j = x_k.
Find the minimum u and maximum v of X in O(n) time. Let u' be min(u, u*2) and v' be max(v, v*2).
Construct a binary array (bitstring) Z of length v' - u' + 1; Z[i] will be true if either X or its double [x_1*2, ..., x_n*2] contains u' + i. This is O(n) to initialize; just walk over each element of X and set the two corresponding elements of Z.
As we're building this array, we can save the indices of any duplicates we find into an auxiliary list Y. Once Z is complete, we just check for 2 * x_i for each x_i in Y. If any are present, we're done; otherwise the duplicates are irrelevant, and we can forget about Y. (The only situation slightly more complicated is if 0 is repeated; then we need three distinct copies of it to get a solution.)
Now, a solution to your problem, i.e. x_i + x_j = x_k, will appear in Z as three evenly-spaced ones, since some simple algebraic manipulations give us 2*x_j - x_k = x_k - 2*x_i. Note that the elements on the ends are our special doubled entries (from 2X) and the one in the middle is a regular entry (from X).
Consider Z as a representation of a polynomial p, where the coefficient for the term of degree i is Z[i]. If X is [1, 2, 3, 5], then Z is 1111110001 (because we have 1, 2, 3, 4, 5, 6, and 10); p is then 1 + x + x2 + x3 + x4 + x5 + x9.
Now, remember from high school algebra that the coefficient of xc in the product of two polynomials is the sum over all a, b with a + b = c of the first polynomial's coefficient for xa times the second's coefficient for xb. So, if we consider q = p2, the coefficient of x2j (for a j with Z[j] = 1) will be the sum over all i of Z[i] * Z[2*j - i]. But since Z is binary, that's exactly the number of triplets i,j,k which are evenly-spaced ones in Z. Note that (j, j, j) is always such a triplet, so we only care about ones with values > 1.
We can then use a Fast Fourier Transform to find p2 in O(|Z| log |Z|) time, where |Z| is v' - u' + 1. We get out another array of coefficients; call it W.
Loop over each x_k in X. (Recall that our desired evenly-spaced ones are all centered on an element of X, not 2*X.) If the corresponding W for twice this element, i.e. W[2*(x_k - u')], is 1, we know it's not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)
Otherwise, it might be the center of a progression that we want (so we need to find i and j). But, unfortunately, it might also be the center of a progression that doesn't have our desired form. So we need to check. Loop over the other elements x_i of X, and check if there's a triple with 2*x_i, x_k, 2*x_j for some j (by checking Z[2*(x_k - x_j) - u']). If so, we have an answer; if we make it through all of X without a hit, then the FFT found only spurious answers, and we have to check another element of W.
This last step is therefore O(n * 1 + (number of x_k with W[2*(x_k - u')] > 1 that aren't actually solutions)), which is maybe possibly O(n^2), which is obviously not okay. There should be a way to avoid generating these spurious answers in the output W; if we knew that any appropriate W coefficient definitely had an answer, this last step would be O(n) and all would be well.
I think it's possible to use a somewhat different polynomial to do this, but I haven't gotten it to actually work. I'll think about it some more....
Partially based on this answer.

It has to be at least O(n^2) as there are n(n-1)/2 different sums possible to check for other members. You have to compute all those, because any pair summed may be any other member (start with one example and permute all the elements to convince yourself that all must be checked). Or look at fibonacci for something concrete.
So calculating that and looking up members in a hash table gives amortised O(n^2). Or use an ordered tree if you need best worst-case.

You essentially need to find all the different sums of value pairs so I don't think you're going to do any better than O(n2). But you can optimize by sorting the list and reducing duplicate values, then only pairing a value with anything equal or greater, and stopping when the sum exceeds the maximum value in the list.

How to map a long integer number to a N-dimensional vector of smaller integers (and fast inverse)?

Given a N-dimensional vector of small integers is there any simple way to map it with one-to-one correspondence to a large integer number?
Say, we have N=3 vector space. Can we represent a vector X=[(int16)x1,(int16)x2,(int16)x3] using an integer (int48)y? The obvious answer is "Yes, we can". But the question is: "What is the fastest way to do this and its inverse operation?"
Will this new 1-dimensional space possess some very special useful properties?

For the above example you have 3 * 32 = 96 bits of information, so without any a priori knowledge you need 96 bits for the equivalent long integer.
However, if you know that your x1, x2, x3, values will always fit within, say, 16 bits each, then you can pack them all into a 48 bit integer.
In either case the technique is very simple you just use shift, mask and bitwise or operations to pack/unpack the values.

Just to make this concrete, if you have a 3-dimensional vector of 8-bit numbers, like this:
uint8_t vector[3] = { 1, 2, 3 };
then you can join them into a single (24-bit number) like so:
uint32_t all = (vector[0] << 16) | (vector[1] << 8) | vector[2];
This number would, if printed using this statement:
printf("the vector was packed into %06x", (unsigned int) all);
produce the output
the vector was packed into 010203
The reverse operation would look like this:
uint8_t v2[3];
v2[0] = (all >> 16) & 0xff;
v2[1] = (all >> 8) & 0xff;
v2[2] = all & 0xff;
Of course this all depends on the size of the individual numbers in the vector and the length of the vector together not exceeding the size of an available integer type, otherwise you can't represent the "packed" vector as a single number.

If you have sets Si, i=1..n of size Ci = |Si|, then the cartesian product set S = S1 x S2 x ... x Sn has size C = C1 * C2 * ... * Cn.
This motivates an obvious way to do the packing one-to-one. If you have elements e1,...,en from each set, each in the range 0 to Ci-1, then you give the element e=(e1,...,en) the value e1+C1*(e2 + C2*(e3 + C3*(...Cn*en...))).
You can do any permutation of this packing if you feel like it, but unless the values are perfectly correlated, the size of the full set must be the product of the sizes of the component sets.
In the particular case of three 32 bit integers, if they can take on any value, you should treat them as one 96 bit integer.
If you particularly want to, you can map small values to small values through any number of means (e.g. filling out spheres with the L1 norm), but you have to specify what properties you want to have.
(For example, one can map (n,m) to (max(n,m)-1)^2 + k where k=n if n<=m and k=n+m if n>m--you can draw this as a picture of filling in a square like so:
1 2 5 | draw along the edge of the square this way
4 3 6 v
8 7
if you start counting from 1 and only worry about positive values; for integers, you can spiral around the origin.)

I'm writing this without having time to check details, but I suspect the best way is to represent your long integer via modular arithmetic, using k different integers which are mutually prime. The original integer can then be reconstructed using the Chinese remainder theorem. Sorry this is a bit sketchy, but hope it helps.

To expand on Rex Kerr's generalised form, in C you can pack the numbers like so:
X = e[n];
X *= MAX_E[n-1] + 1;
X += e[n-1];
/* ... */
X *= MAX_E[0] + 1;
X += e[0];
And unpack them with:
e[0] = X % (MAX_E[0] + 1);
X /= (MAX_E[0] + 1);
e[1] = X % (MAX_E[1] + 1);
X /= (MAX_E[1] + 1);
/* ... */
e[n] = X;
(Where MAX_E[n] is the greatest value that e[n] can have). Note that these maximum values are likely to be constants, and may be the same for every e, which will simplify things a little.
The shifting / masking implementations given in the other answers are a generalisation of this, for cases where the MAX_E + 1 values are powers of 2 (and thus the multiplication and division can be done with a shift, the addition with a bitwise-or and the modulus with a bitwise-and).

There is some totally non portable ways to make this real fast using packed unions and direct accesses to memory. That you really need this kind of speed is suspicious. Methods using shifts and masks should be fast enough for most purposes. If not, consider using specialized processors like GPU for wich vector support is optimized (parallel).
This naive storage does not possess any usefull property than I can foresee, except you can perform some computations (add, sub, logical bitwise operators) on the three coordinates at once as long as you use positive integers only and you don't overflow for add and sub.
You'd better be quite sure you won't overflow (or won't go negative for sub) or the vector will become garbage.

#include <stdint.h> // for uint8_t
long x;
uint8_t * p = &x;
or
union X {
long L;
uint8_t A[sizeof(long)/sizeof(uint8_t)];
};
works if you don't care about the endian. In my experience compilers generate better code with the union because it doesn't set of their "you took the address of this, so I must keep it in RAM" rules as quick. These rules will get set off if you try to index the array with stuff that the compiler can't optimize away.
If you do care about the endian then you need to mask and shift.

I think what you want can be solved using multi-dimensional space filling curves. The link gives a lot of references on this, which in turn give different methods and insights. Here's a specific example of an invertible mapping. It works for any dimension N.
As for useful properties, these mappings are related to Gray codes.
Hard to say whether this was what you were looking for, or whether the "pack 3 16-bit ints into a 48-bit int" does the trick for you.