I have a lookup table in the form of a 2D array and a list of indices (in the form of two 1D arrays xs, ys) at which I would like to evaluate the lookup table. How can I do this quickly?
It looks like a standard problem; however, I found nothing in the docs about looking up array values at a general list of indices (i.e. not a Cartesian product). I tried
result = zeros((10^6,))
for i in [1:10^6]
x = xs[i]
y = ys[i]
result[i] = lookup[x, y]
end
Besides looking a bit cumbersome, this code is also 10 times slower than the equivalent NumPy code.
So what would be a fast alternative to the above code?
You can try broadcast_getindex (see http://julia.readthedocs.org/en/latest/stdlib/arrays/#Base.broadcast_getindex).
Otherwise, your code should already be pretty efficient if you just change [1:10^6] to 1:10^6, so that the loop iterates over the range directly instead of over an array built from it.
Here is the updated link for Base.getindex: https://docs.julialang.org/en/v1/base/collections/#Base.getindex. Note that broadcast_getindex has since been removed from Julia; the broadcasted equivalent is getindex.(Ref(lookup), xs, ys).
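The question asks for pointwise lookup, which pairs xs[i] with ys[i] instead of taking their Cartesian product. As a language-neutral illustration, here is a minimal Python sketch of what that amounts to. It assumes the table is stored column-major in one flat buffer, as Julia stores its arrays, and the helper lookup_pairs and its arguments are hypothetical. Each (x, y) pair becomes a single linear offset, which is the per-element work that NumPy's lookup[xs, ys] or a broadcasted getindex performs for you in one call.

```python
from typing import Sequence


def lookup_pairs(table: Sequence[float], nrows: int,
                 xs: Sequence[int], ys: Sequence[int]) -> list:
    """Return [table[x, y] for each pair (x, y)], using 1-based indices.

    `table` holds an nrows-by-ncols matrix flattened in column-major
    order (column 1 first), the layout Julia uses for its arrays.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Element (x, y) lives at offset (x - 1) + (y - 1) * nrows.
    return [table[(x - 1) + (y - 1) * nrows] for x, y in zip(xs, ys)]


# The 2x3 table [[1, 2, 3], [4, 5, 6]] stored column by column.
flat = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
print(lookup_pairs(flat, 2, [1, 2, 2], [1, 3, 2]))  # [1.0, 6.0, 5.0]
```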
I have a 2D array, and have computed necessary updates along a given dimension of it using a 1D array (said updates can't be computed in place as earlier calculations would override values needed in later calculations). I thus want to copy the updates into my 2D array. The most obvious way to do this would, at first glance, appear to be to use Array slicing and Array.blit.
I have tried the approach of extracting the relevant dimension using array slicing, and then blitting across to that, but that doesn't update the values inside the 2D array. I think what is happening is that a new, separate, 1D array is being created when I make the slice, and the values are being blitted into that new array, which of course is dropped a moment later when it goes back out of scope.
I suppose you could say that I was expecting the slicing to return a view into the 2D array which would work for the blit function call, but instead the slicing actually returns a new array with the values copied into it (which, thinking about it, is what slicing does otherwise, I believe).
Currently I am using a workaround whereby I create a 2D array where one of the dimensions is only 1 element wide (thus effectively re-creating a 1D array), and then use Array2D.blit. I would prefer to do it directly, though. I find this ugly, and it would be quite useful elsewhere in my program where I can't just declare a 1D array as 2D.
My first approach:
let srcArray = Array.zeroCreate srcArrayLength
... // do relevant computation
srcArray.[index] <- result
... // finish computation
Array.blit srcArray 0 destArray.[index, *] 0 srcArrayLength
My current approach:
let srcArray = Array2D.zeroCreate 1 srcArrayLength
... // do relevant computation
srcArray.[0,index] <- result
... // finish computation
Array2D.blit srcArray 0 0 destArray index 0 1 srcArrayLength
The former approach has no effect on my destination 2D array. The latter approach works where I use it, but as I said above it isn't nice, and cannot be used in another situation, where I have a jagged 2D array (i.e. 'a[][]) that I would like to blit across from.
How might I go about achieving my aim? I thought of Span/Memory, but it wasn't clear to me if and how they could be used here. Alternatively, if you can spot a better way to do this that doesn't involve blit, I'm all virtual ears.
I figured out a fairly good solution to this, with the help of someone over in the F# Foundation Slack. Since nobody else has posted an answer, I'll put this one up.
Both Array.Copy (note that this is the .NET Array.Copy method, not the F#-specific Array.copy) and Buffer.BlockCopy were suggested to me. Array.Copy still rejects the call because the source and destination arrays have different ranks. Buffer.BlockCopy, on the other hand, ignores the dimensionality of the supplied arrays and simply copies the specified number of bytes from one location to another (it only accepts arrays of primitive types such as int or float). Using this, and relying on the fact that 2D arrays are really stored as a single block in row-major order (the same as C, I believe), it is quite possible to overwrite the last dimension of a multi-dimensional array reasonably cleanly.
I updated the code from the 'current approach' in my question to the below:
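open System // Buffer lives in the System namespace
let srcArray = Array.zeroCreate srcArrayLength
... //do relevant computation
srcArray.[index] <- result
... //finish computation
// Row firstDimIndex starts at byte firstDimIndex * lengthOfSecondDim * sizeof<'a>; copy one row's worth of bytes
Buffer.BlockCopy(srcArray, 0, destArray, firstDimIndex * lengthOfSecondDim * sizeof<'a>, lengthOfSecondDim * sizeof<'a>)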
Not only does it do the job in a way I personally find a bit tidier, it also has a side benefit: it is noticeably faster than the second approach described in the question. I haven't yet run a benchmark to quantify the difference, though.
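The offset arithmetic in that Buffer.BlockCopy call can be sketched in Python with the standard array and memoryview types. This is only an analogue, not .NET code. The function name copy_row_bytes is hypothetical, and the matrix is assumed to be stored as one flat row-major buffer, as the answer describes for .NET multidimensional arrays.

```python
from array import array


def copy_row_bytes(src: array, dest: array, row: int, ncols: int) -> None:
    """Overwrite row `row` (0-based) of a row-major matrix stored flat in `dest`.

    Like Buffer.BlockCopy, this copies raw bytes: it starts at byte offset
    row * ncols * itemsize in `dest` and copies ncols * itemsize bytes.
    """
    if src.typecode != dest.typecode:
        raise TypeError("src and dest must hold the same element type")
    if row < 0 or len(src) < ncols or (row + 1) * ncols > len(dest):
        raise IndexError("row or ncols out of range")
    size = dest.itemsize
    start = row * ncols * size   # byte offset of the target row
    nbytes = ncols * size        # bytes in one row
    dest_bytes = memoryview(dest).cast("B")
    src_bytes = memoryview(src).cast("B")
    dest_bytes[start:start + nbytes] = src_bytes[:nbytes]


ncols = 3
dest = array("d", [0.0] * (2 * ncols))  # 2x3 matrix of zeros, row-major
src = array("d", [7.0, 8.0, 9.0])
copy_row_bytes(src, dest, 1, ncols)     # overwrite the second row
print(dest.tolist())  # [0.0, 0.0, 0.0, 7.0, 8.0, 9.0]
```

As with Buffer.BlockCopy, nothing here knows about rows. Correctness rests entirely on the row-major layout and on computing byte offsets from the element size.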
After reading this I wrote a naive attempt to produce this
col1
---------
1
4
7
from this
ARRAY[[1,2,3], [4,5,6], [7,8,9]]
This works
SELECT unnest((ARRAY[[1,2,3], [4,5,6], [7,8,9]])[1:3][1:1]);
But in my case, I don't know the length of the outer array.
So is there a way to hack together the slice "string" to take into account this variability?
Here was my attempt. I know, it's a bit funny
_ids := _ids_2D[('1:' || array_length(_ids_2D, 1)::text)::int][1:1];
As you can see, I just want to create the effect of [1:n]. Obviously '1:3' ain't going to parse nicely into what the array slice needs.
I could obviously use something like the unnest_2d_1d function Erwin mentions in the answer linked above, but I'm hoping for something more elegant.
If you are trying to get the first element of every nested (second-dimension) array inside the outer (first-dimension) array, use
array_upper(anyarray, 1)
to get the upper bound of the first dimension, and plug it into the slice. To get the element at a given position of every inner array:
anyarray[1:array_upper(anyarray, 1)][<index>:<index>]
e.g. to get the first element of every inner array:
anyarray[1:array_upper(anyarray, 1)][1:1]
The slice is still two-dimensional ({{1},{4},{7}} for the array in the question), so wrap it in unnest(...) as in your own query to get one row per value. Please refer to the PostgreSQL manual section on Arrays for more information.
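To make the bound arithmetic concrete, here is a small Python model of the slice. It is not PostgreSQL code, and pg_slice_2d and unnest are hypothetical helpers. It shows why the result of [1:n][1:1] keeps both dimensions and needs unnest to become the single column in the question.

```python
def pg_slice_2d(arr, lo1, hi1, lo2, hi2):
    """Emulate PostgreSQL's arr[lo1:hi1][lo2:hi2] on a nested list.

    Bounds are 1-based and inclusive, and are assumed to lie within the
    array (PostgreSQL itself clamps out-of-range bounds). The result keeps
    both dimensions, just as a PostgreSQL slice does.
    """
    return [row[lo2 - 1:hi2] for row in arr[lo1 - 1:hi1]]


def unnest(arr):
    """Flatten a nested list in storage order, like PostgreSQL's unnest()."""
    return [x for row in arr for x in row]


ids_2d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n = len(ids_2d)  # plays the role of array_upper(ids_2d, 1)
first_col = pg_slice_2d(ids_2d, 1, n, 1, 1)
print(first_col)          # [[1], [4], [7]]
print(unnest(first_col))  # [1, 4, 7]
```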
I am trying to iterate over the rows of a DataFrame in Julia to generate a new column for the data frame. I haven't come across a clear example of how to do this. In R this type of thing is vectorized, but from my understanding not all of Julia's operations are vectorized, so I need to loop over the rows. I know I can do this with indexing, but I believe there must be a better way. I want to be able to reference the column values by name. Here is what I have:
test_df = DataFrame( A = [1,2,3,4,5], B = [2,3,4,5,6])
test_df["C"] = [ test_df[i,"A"] * test_df[i,"B"] for i in 1:size(test_df,1)]
Is this the Julia/DataFrames way of doing this? Is there a more Julia-esque way? Thanks for any feedback.
You'd be better off doing test_df[!, "A"] .* test_df[!, "B"], which multiplies whole columns at once instead of indexing element by element. In general, Julia uses a dot prefix on operators (as in .*) to indicate elementwise operations, and these elementwise operations are vectorized.
You also don't want to use an Array comprehension since you probably want a DataArray as your output. There are no DataArray comprehensions for now since comprehensions are built into the Julia parser, which makes them hard to override in libraries like DataArrays.jl.
The better, already vectorized way to do what you want in your example would be
test_df[!, "C"] = test_df[!, "A"] .* test_df[!, "B"]
Now, if for some reason you can't vectorize your operations and you really want to loop over rows (unlikely...), you can do as follows:
for row in eachrow( test_df )
# do something with row which is of type DataFrameRow
end
If you need the row index do
for (i, row) in enumerate( eachrow( test_df ) )
# do something with row and i
end
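The two styles in this answer have direct counterparts in plain Python. This sketch uses a dict of equal-length lists as a stand-in for a DataFrame, which is an assumption for illustration only. Here eachrow is a hypothetical helper, not the DataFrames.jl function. The sketch contrasts computing a column in one elementwise pass with looping over named rows.

```python
from typing import Dict, Iterator, List

Frame = Dict[str, List[int]]


def eachrow(df: Frame) -> Iterator[Dict[str, int]]:
    """Yield each row as a {column name: value} dict (hypothetical helper)."""
    names = list(df)
    for values in zip(*(df[name] for name in names)):
        yield dict(zip(names, values))


test_df: Frame = {"A": [1, 2, 3, 4, 5], "B": [2, 3, 4, 5, 6]}

# Column-wise, like test_df.A .* test_df.B: pair up the two columns once.
test_df["C"] = [a * b for a, b in zip(test_df["A"], test_df["B"])]
print(test_df["C"])  # [2, 6, 12, 20, 30]

# Row-wise with an index, like enumerate(eachrow(test_df)) in Julia.
for i, row in enumerate(eachrow(test_df), start=1):
    print(i, row["A"], row["B"], row["C"])
```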
Is there a way to work with C-ordered or non-contiguous arrays natively in Julia?
For example, when using NumPy, C-ordered arrays are the default, but I can initialize a Fortran ordered array and do computations with that as well.
One easy way to do this was to take the Transpose of a matrix.
I can also work with non-contiguous arrays that are made via slicing.
I have looked through the documentation, etc. and can't find a way to make, declare, or work with a C-ordered array in Julia.
The transpose appears to return a copy.
Does Julia allow a user to work with C-ordered and non-contiguous arrays?
Is there currently any way to get a transpose or a slice without taking a copy?
Edit: I have found how to do slicing.
Currently it is available as a different type called a SubArray.
As an example, I could do the following to get the first row of a 100x100 array A (in current Julia the equivalent is view(A, 1, 1:100)):
sub(A, 1, 1:100)
It looks like there are plans to improve this, as can be seen in https://github.com/JuliaLang/julia/issues/5513
This still leaves open the question of C-ordered arrays.
Is there an interface for C-ordered arrays?
Is there a way to do a transpose via a view instead of a copy?
Naturally, there's nothing that prevents you from working with row-major arrays as a chunk of memory, and certain packages (like Images.jl) support arbitrary ordering of arbitrary-dimensional arrays.
Presumably the main issue you're wondering about is linear algebra. Currently I don't know of anything out-of-the-box, but note that matrix multiplication in Julia is implemented through a series of functions with names like A_mul_B, At_mul_B, Ac_mul_Bc, etc, where t means transpose and c means conjugate. The parser replaces expressions like A'*b with Ac_mul_B(A, b) without actually taking the transpose.
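Consequently, you could implement a RowMajorMatrix <: AbstractArray type yourself and set up special multiplication rules. The memory of a row-major m-by-n matrix is exactly the column-major layout of its n-by-m transpose. So if the type keeps that memory as an ordinary column-major Matrix in a field (called data here, a hypothetical name), each product can be forwarded to the corresponding transposed kernel on the raw storage:
A_mul_B(A::RowMajorMatrix, B::RowMajorMatrix) = At_mul_Bt(A.data, B.data)
A_mul_B(A::RowMajorMatrix, B::AbstractArray) = At_mul_B(A.data, B)
A_mul_B(A::AbstractArray, B::RowMajorMatrix) = A_mul_Bt(A, B.data)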
etc. In addition to these two-argument versions, there are 3-argument versions (like A_mul_B!) that store the result in a pre-allocated output; you'd need to implement those, too. Finally, you'd also have to set up appropriate show methods (to display them appropriately), size methods, etc.
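The layout fact such a type relies on is easy to see with a strided view. A row-major m-by-n buffer reads as the column-major n-by-m transpose, and a transpose can be a view rather than a copy. The Python class below is a hypothetical illustration (StridedMatrix is not part of any library), similar in spirit to how NumPy describes arrays by strides. Both layouts and the no-copy transpose come from the same flat buffer and differ only in their strides.

```python
class StridedMatrix:
    """A 2-D view over a flat list, addressed through (row, col) strides.

    Column-major (Julia/Fortran) and row-major (C) layouts differ only in
    their strides, so a transpose can be a new view with the shape and
    strides swapped, sharing the same buffer instead of copying it.
    """

    def __init__(self, data, nrows, ncols, row_stride, col_stride):
        self.data = data
        self.shape = (nrows, ncols)
        self.strides = (row_stride, col_stride)

    @classmethod
    def column_major(cls, data, nrows, ncols):
        return cls(data, nrows, ncols, 1, nrows)

    @classmethod
    def row_major(cls, data, nrows, ncols):
        return cls(data, nrows, ncols, ncols, 1)

    def __getitem__(self, ij):
        i, j = ij  # 0-based indices
        if not (0 <= i < self.shape[0] and 0 <= j < self.shape[1]):
            raise IndexError(ij)
        return self.data[i * self.strides[0] + j * self.strides[1]]

    def transpose(self):
        # No copy: same buffer, swapped shape and strides.
        return StridedMatrix(self.data, self.shape[1], self.shape[0],
                             self.strides[1], self.strides[0])

    def to_lists(self):
        return [[self[i, j] for j in range(self.shape[1])]
                for i in range(self.shape[0])]


buf = [1, 2, 3, 4, 5, 6]
C = StridedMatrix.row_major(buf, 2, 3)     # read as 2x3, row-major
F = StridedMatrix.column_major(buf, 3, 2)  # same memory as 3x2, column-major
print(C.to_lists())              # [[1, 2, 3], [4, 5, 6]]
print(F.to_lists())              # [[1, 4], [2, 5], [3, 6]]
print(C.transpose().to_lists())  # [[1, 4], [2, 5], [3, 6]]
```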
Finally, Julia's transpose function has been implemented in a cache-friendly manner, so it's quite a lot faster than the naive
for j = 1:n, i = 1:m
At[j,i] = A[i,j]
end
Consequently there are occasions where it's not worth worrying about creating custom implementations of algorithms, and you can just call transpose.
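For comparison with the naive loop above, here is a sketch of the tiling idea commonly used for cache-friendly transposes, written in Python for readability. It is an illustration under stated assumptions, not Julia's actual implementation. It assumes flat column-major storage, uses a hypothetical transpose_blocked helper, and picks an arbitrary default block size of 32.

```python
def transpose_blocked(a, m, n, block=32):
    """Transpose an m-by-n column-major matrix stored in the flat list `a`.

    Returns the n-by-m transpose, also column-major. The loops walk over
    block-by-block tiles so that, in a compiled language, both the reads
    from `a` and the writes to the result stay within a small, cache-sized
    region; in pure Python this shows the access pattern, not the speedup.
    """
    at = [None] * (m * n)
    for jb in range(0, n, block):
        for ib in range(0, m, block):
            for j in range(jb, min(jb + block, n)):
                for i in range(ib, min(ib + block, m)):
                    # a[i, j] sits at i + j*m; at[j, i] sits at j + i*n.
                    at[j + i * n] = a[i + j * m]
    return at


# [[1, 2, 3], [4, 5, 6]] stored column-major is [1, 4, 2, 5, 3, 6];
# its transpose [[1, 4], [2, 5], [3, 6]] stored column-major is 1..6.
print(transpose_blocked([1, 4, 2, 5, 3, 6], 2, 3, block=2))  # [1, 2, 3, 4, 5, 6]
```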
If you implement something like this, I'd encourage you to contribute it as a package, as it's likely that others may be interested.
I'm testing an arbitrarily large, arbitrarily dimensioned array of logicals, and I'd like to find out whether any one or more of them are true. any() only works along a single dimension at a time, as does sum(). I know that I could test the number of dimensions and repeat any() until I get a single answer, but I'd like a quicker and, frankly, more elegant approach.
Ideas?
I'm running 2009a (R17, in the old parlance, I think).
If your data is in a matrix A, try this:
anyAreTrue = any(A(:));
EDIT: To explain a bit more for anyone not familiar with the syntax, A(:) uses the colon operator to take the entire contents of the array A, no matter what the dimensions, and reshape them into a single column vector (of size numel(A)-by-1). Only one call to ANY is needed to operate on the resulting column vector.
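Flattening first and then reducing once is not MATLAB-specific. Here is a minimal Python sketch of the same idea, assuming the logical array is represented as nested lists. The helpers flatten and any_true are hypothetical. The traversal order differs from MATLAB's column-major A(:), which does not matter for any.

```python
def flatten(a):
    """Yield the scalar elements of an arbitrarily nested list, depth first."""
    if isinstance(a, (list, tuple)):
        for x in a:
            yield from flatten(x)
    else:
        yield a


def any_true(a) -> bool:
    """One call to any() over all elements, whatever the nesting depth."""
    # any() stops consuming the generator at the first true element.
    return any(flatten(a))


cube = [[[False, False], [False, False]],
        [[False, True], [False, False]]]  # a 2x2x2 logical array
print(any_true(cube))                      # True
print(any_true([[0, 0], [0, 0]]))          # False
```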
As pointed out, the correct solution is to reshape the array into a vector first. Then any will give the desired result. Thus,
any(A(:))
gives the global result, true if any of numel(A) elements were true. You could also have used
any(reshape(A,[],1))
which uses the reshape operator explicitly. If you don't wish to do the extra step of converting your matrices into vectors to apply any, then another approach is to write a function of your own. For example, here is a function that would do it for you:
======================
function result = myany(A)
% determines if any element at all in A was non-zero
result = any(A(:));
======================
Save this as an m-file on your search path. The beauty of MATLAB (as of any programming language) is that it is fully extensible. If there is some capability you wish it had, just write a little idiom that does it. If you do this often enough, you will have customized the environment to fit your needs.