Merge two sorted arrays in Julia

Is there a neat function in Julia that will merge two sorted arrays and return the merged, sorted array for me? I have written:
# tc and tp are the two pre-sorted input vectors
c = 1
p = 1
i = 1
n = length(tc) + length(tp)
t = Vector{Float64}(undef, n)
while c <= length(tc) && p <= length(tp)
    if tp[p] < tc[c]
        t[i] = tp[p]
        p += 1
        i += 1
    else
        t[i] = tc[c]
        c += 1
        i += 1
    end
end
while p <= length(tp)
    t[i] = tp[p]
    i += 1
    p += 1
end
while c <= length(tc)
    t[i] = tc[c]
    i += 1
    c += 1
end
But is there no native function in base Julia to do this?

Contrary to the other answers, there is in fact a method to do this in base Julia. BUT, it only works for arrays of integers, AND it will only work if the arrays are unique (in the sense that no integer is repeated in either array). Simply use the IntSet type as follows:
a = [2, 3, 4, 8]
b = [1, 5]
union(IntSet(a), IntSet(b))
If you run the above code, you'll note that the union function removes duplicates from the output, which is why I stated initially that your arrays must be unique (or else you must be happy to have duplicates removed in the output). You'll also notice that the union operation on the IntSet works much faster than union on a sorted Vector{Int}, since the former exploits the fact that an IntSet is pre-sorted.
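If you need the result back as a plain sorted vector rather than a set, you can simply collect it; an IntSet iterates in ascending order, so no extra sort is needed. Note also that on Julia 0.7 and later the type is called BitSet. A minimal sketch:
a = [2, 3, 4, 8]
b = [1, 5]
merged = collect(union(IntSet(a), IntSet(b)))   # [1, 2, 3, 4, 5, 8]
# On Julia 0.7+ use BitSet instead:
# merged = collect(union(BitSet(a), BitSet(b)))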
Of course, the IntSet approach above is not really in the spirit of the question, which is more about a solution for any type for which the less-than comparison (isless) is defined, as well as one allowing for duplicates.
Here is a function that efficiently finds the union of two pre-sorted unique vectors. I've never had a need for the non-unique case myself so have not written a function that covers that case I'm afraid:
"union <- Return the union of the inputs as a new sorted vector"
function union_vec(x::Vector{T}, y::Vector{T})::Vector{T} where {T}
(nx, ny) = (1, 1)
z = T[]
while nx <= length(x) && ny <= length(y)
if x[nx] < y[ny]
push!(z, x[nx])
nx += 1
elseif y[ny] < x[nx]
push!(z, y[ny])
ny += 1
else
push!(z, x[nx])
nx += 1
ny += 1
end
end
if nx <= length(x)
[ push!(z, x[n]) for n = nx:length(x) ]
elseif ny <= length(y)
[ push!(z, y[n]) for n = ny:length(y) ]
end
return z
end
Another option is to look at sorted dictionaries, available in the DataStructures.jl package. I haven't done it myself, but a method that just inserts all observations into a sorted dictionary (checking for key duplication as you go) and then iterates over (keys, values) should also be a fairly efficient way to attack this problem.
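For illustration, here is a minimal sketch of that sorted-dictionary idea using SortedDict from DataStructures.jl (the helper name merge_via_sorteddict is made up for this example; like the IntSet approach it drops duplicates, since equal values collapse onto the same key):
using DataStructures

function merge_via_sorteddict(x::Vector{T}, y::Vector{T}) where {T}
    d = SortedDict{T,Nothing}()
    for v in x
        d[v] = nothing   # insertion keeps the keys ordered
    end
    for v in y
        d[v] = nothing   # duplicate keys are simply overwritten
    end
    return collect(keys(d))   # keys come back in sorted order
end

merge_via_sorteddict([2, 3, 4, 8], [1, 5])   # [1, 2, 3, 4, 5, 8]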

Although an explicit function to merge two sorted vectors seems to be missing, one can be constructed easily from the existing building blocks (the question actually demonstrated this, but it doesn't define a function).
The following method tries to leverage the existing sort code and still remain efficient.
In code:
mergesorted(a,b) = sort!(vcat(a,b))
The following is an example:
julia> a = [1:2:11...];
julia> b = [2:3:20...];
julia> show(a)
[1,3,5,7,9,11]
julia> show(b)
[2,5,8,11,14,17,20]
julia> show(mergesorted(a,b))
[1,2,3,5,5,7,8,9,11,11,14,17,20]
I didn't benchmark the function, but QuickSort (the default sorting algorithm) usually performs well on pre-sorted arrays, so it should be OK, and the allocation of a result vector is required in any implementation anyway.
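If you want to check this on your own data, a quick comparison with BenchmarkTools.jl (assuming it is installed) might look like the following; the exact numbers will of course depend on the input sizes:
using BenchmarkTools

a = sort!(rand(10_000));
b = sort!(rand(10_000));

@btime sort!(vcat($a, $b));                  # default algorithm
@btime sort!(vcat($a, $b), alg=MergeSort);   # often does well on nearly-sorted input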

I keep coming across this in different projects, so I made a package MergeSorted (https://github.com/vvjn/MergeSorted.jl). You can use it as follows.
using MergeSorted
a = sort!(rand(1000))
b = sort!(rand(1000))
c = mergesorted(a,b)
sort!(vcat(a,b)) == c
Or without allocating new memory.
mergesorted!(c, a, b)
You can also use all of the sort options.
a = sort!(rand(1000), order=Base.Reverse)
b = sort!(rand(1000), order=Base.Reverse)
c = mergesorted(a,b, order=Base.Reverse)
sort!(vcat(a,b), order=Base.Reverse) == c
It's around 4-6 times faster than sort!(vcat(a,b)), which uses QuickSort by default, and twice as fast as sort!(vcat(a,b), alg=MergeSort), though MergeSort uses more memory.

No, such a function does not exist in base Julia, and actually I have not seen a language that has such a function out of the box.
To do this, you have to maintain a pointer into each of the arrays, compare the values, and advance the pointer with the smaller value (from what I see, this is exactly what you do).
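For completeness, here is a minimal sketch of that two-pointer merge wrapped in a function (the name merge_sorted is arbitrary); unlike the set-based answers it keeps duplicates, and it only assumes that < is defined for the element type:
function merge_sorted(x::AbstractVector{T}, y::AbstractVector{T}) where {T}
    out = Vector{T}(undef, length(x) + length(y))
    i, j, k = 1, 1, 1
    # walk both inputs, always copying the smaller front element
    while i <= length(x) && j <= length(y)
        if y[j] < x[i]
            out[k] = y[j]; j += 1
        else
            out[k] = x[i]; i += 1
        end
        k += 1
    end
    # drain whichever input still has elements left
    while i <= length(x)
        out[k] = x[i]; i += 1; k += 1
    end
    while j <= length(y)
        out[k] = y[j]; j += 1; k += 1
    end
    return out
end

merge_sorted([1, 3, 5], [2, 3, 4])   # [1, 2, 3, 3, 4, 5]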

Related

How do I split results into separate variables in matlab?

I'm pretty new to MATLAB, so I'm guessing there is some shortcut way to do this, but I can't seem to find it.
results = eqs\soltns;
A = results(1);
B = results(2);
C = results(3);
D = results(4);
E = results(5);
F = results(6);
soltns is a 6x1 vector and eqs is a 6x6 matrix, and I want the results of the operation in their own separate variables. It didn't let me save it like
[A, B, C, D, E, F] = eqs\soltns;
Which I feel like would make sense, but it doesn't work.
Up to now, I have never come across a MATLAB function doing this directly (but maybe I'm missing something?). So, my solution would be to write a function distribute on my own.
E.g. as follows:
result = [ 1 2 3 4 5 6 ];
[A,B,C,D,E,F] = distribute( result );
function varargout = distribute( vals )
    assert( nargout <= numel( vals ), 'Too many output arguments' )
    varargout = arrayfun( @(X) {X}, vals(:) );
end
Explanation:
nargout is a special variable in MATLAB function calls. Its value is equal to the number of output parameters that distribute is called with. So the check nargout <= numel( vals ) verifies that enough elements are given in vals to distribute them to the output variables, and raises an assertion error otherwise.
arrayfun( #(X) {X}, vals(:) ) converts vals to a cell array. The conversion is necessary as varargout is also a special variable in MATLAB's function calls, which must be a cell array.
The special thing about varargout is that MATLAB assigns the individual cells of varargout to the individual output parameters, i.e. in the above call to [A,B,C,D,E,F] as desired.
Note:
In general, I think such expanding of variables is seldom useful. MATLAB is optimized for processing of arrays; separating them into individual variables often only complicates things.
Note 2:
If result is a cell array, i.e. result = {1,2,3,4,5,6}, MATLAB actually allows you to split its cells with [A,B,C,D,E,F] = result{:};
One way as long as you know the size of results in advance:
results = num2cell(eqs\soltns);
[A,B,C,D,E,F] = results{:};
This has to be done in two steps because MATLAB does not allow directly indexing the result of a function call.
But note that this method is hard to generalize for arbitrary sizes. If the size of results is unknown in advance, it would probably be best to leave results as a vector in your downstream code.

Array of sets in Matlab

Is there a way to create an array of sets in MATLAB?
Eg: I have:
a = ones(10,1);
b = zeros(10,1);
I need c such that c = [(1,0); (1,0); ...], i.e. each set in c has its first element from a and its second element from b at the corresponding index.
Also, is there some way I can check whether an unknown set (x,y) is in c?
Can you all please help me out? I am a MATLAB noob. Thanks!
There are no sets in the sense you mean in MATLAB (I assume that you are thinking of tuples in Python...). But there are cells in MATLAB. That is a data type that can store pretty much anything (you may think of pointers if you are familiar with the concept). It is indicated by using { }.
Knowing this, you could come up with a cell of arrays and check them using cellfun
% create a cell of numeric arrays
C = {[1,0],[0,2],[99,-1]}
% check which input is equal to the array [1,0]
lg = cellfun(@(x) isequal(x,[1,0]), C)
Note that you access the address of a cell with () and the content of a cell with {}. [] always indicates arrays of something. We will come to this in a moment.
OK, this was the part that you asked for; now there is a bonus:
The fact that you use the term set makes me think that they always have the same size. So why not create an array of arrays (or better, an array of vectors, which is a matrix) and check this matrix column-wise?
% array of vectors (there is a way with less brackets but this way is clearer):
M = [[1;0],[0;2],[99;-1]]
% check at which column (direction ",1") all rows are equal to the proposed vector:
lg = all(M == [0;2],1)
This way is a bit clearer, better in terms of memory and faster.
Note that both variables lg are arrays of logicals. You can use them directly to index the original variable, i.e. M(:,lg) and C{lg} returns the set that you are looking for.
If you would like to get a logical value indicating whether p is in C, you can try the code below:
any(sum((C-p).^2,2)==0)
or
any(all(C-p==0,2))
Example
C = [1,2;
3,-1;
1,1;
-2,5];
p1 = [1,2];
p2 = [1,-2];
>> any(sum((C-p1).^2,2)==0)   % indicating p1 is in C
ans = 1
>> any(sum((C-p2).^2,2)==0)   % indicating p2 is not in C
ans = 0
>> any(all(C-p1==0,2))
ans = 1
>> any(all(C-p2==0,2))
ans = 0

Fastest method to multiply every element of an array by a number in Scala

I have a large array and I want to multiply every element of the array by a given number N.
I can do this in the following way:
val arr = Array.fill(100000)(math.random)
val N = 5.0
val newArr = arr.map ( _ * N )
This will return a new array, as I want. Another way could be:
def demo(arr: Array[Double], value: Double): Array[Double] = {
  var res: Array[Double] = Array()
  if (arr.length == 1)
    res = Array(arr.head + value)
  else
    res = demo(arr.slice(0, arr.length / 2), value) ++
          demo(arr.slice(arr.length / 2, arr.length), value)
  res
}
In my case I have a larger array and I have to perform this operation for thousands of iterations. Is there any faster way to get the same output? Will tail recursion increase the speed? Or any other technique?
Presumably you mean arr.head * value.
Neither of these is "faster" in big-O terms; they're both O(n) in the length of the array, which makes sense because it's right there in the description: you need to multiply "every" number by some constant.
The only difference is that in the second case you spend a bunch of time slicing and concatenating arrays. So the first one is likely going to be faster. Tail recursion isn't going to help you because this isn't a recursive problem. All you need to do is loop once through the array.
If you have a really huge number of numbers in your array, you could parallelize the multiplication across the available processors (if more than one) by using arr.par.map.
Tail recursion will be better than regular recursion if you're writing your own function to recursively loop over the list, since tail recursion doesn't run into stack-overflow issues the way regular recursion does.
arr.map(_ * N) should be fine for you though.
Whatever you do, try not to use var. Mutable variables are a code smell in Scala.
Also, when you're dealing with thousands of values it might be worth looking into different collection types like Vector over Array; different collections are efficient at different things. For more information on the performance of collections, check out the official Scala Collections Performance Characteristics page.
For multiplying the elements of an array by a number, the time complexity is linear in the array length (O(n)), as you have to visit every element of the array.
arr.map ( _ * N )
IMHO the above code is the optimal solution, evaluating the result in O(n). But if your array is very large, I would recommend converting it to a stream and then performing your transformation.
For example
arr.toStream.map{ _ * 2}

Julia: three dimensional arrays (performance)

Going through Julia's performance tips, I haven't found any suggestions regarding how to speed up code that uses three-dimensional arrays.
From my understanding, a d-element Array{Array{Float64,2},1} would perform best when d (the third dimension) is small. However, I am not sure whether this is the case when d is large.
Is there any tutorial on this topic for Julia?
Example 1a (d=50)
x = [zeros(100, 10) for d=1:50];
@time for d=1:50
    x[d] = rand(100,10);
end
0.000100 seconds (50 allocations: 396.875 KB)
Example 1b (d=50)
y=zeros(100, 10, 50);
@time for d=1:50
    y[:,:,d] = rand(100,10);
end
0.000257 seconds (200 allocations: 400.781 KB)
Example 2a (d=50000)
x = [zeros(100, 10) for d=1:50000];
@time for d=1:50000
    x[d] = rand(100,10);
end
0.410813 seconds (99.49 k allocations: 388.328 MB, 81.88% gc time)
Example 2b (d=50000)
y=zeros(100, 10, 50000);
@time for d=1:50000
    y[:,:,d] = rand(100,10);
end
0.185929 seconds (298.98 k allocations: 392.898 MB, 6.83% gc time)
From my understanding, a d-element Array{Array{Float64,2},1} would perform best when d (the third dimension) is small. However, I am not sure whether this is the case when d is large.
No, it's more about how you use it. A = Array{Array{Float64,2},1} is an array of pointers to matrices. What the array stores is the pointer, i.e. the reference. Thus A[i] returns a reference, i.e. it's cheap. A2 = Array{Float64,3} is a contiguous array of floats. It's really just an indexing setup over a linear slab of memory (and has a linear index A2[i] which runs through the whole thing using that linear form).
The latter has some advantages because it is contiguous. There's no indirection, so looping over all of A2's values will be faster. A has to dereference two pointers to get a value, so a simple 3D loop will be slower if you don't know to dereference each internal matrix only once. Also, you can get views to the matrices via @view A2[:,:,1] etc., but you have to take note that A2[:,:,1] by itself will make a copy of the matrix. A[1] is naturally a view because it returns the reference to the matrix, and if you want a copy you'd have to explicitly do copy(A[1]). Because A is just a linear array of pointers, push!ing a new matrix onto it is cheap, since it's just growing a relatively small array (and push! is automatically amortized) to add a new pointer on the end (this is why things like DifferentialEquations.jl use arrays of arrays to build timeseries instead of the more traditional matrix).
So they are different tools with different advantages and disadvantages.
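To make the distinction concrete, here is a small sketch of the access patterns described above (names chosen just for this example):
A  = [rand(100, 10) for d in 1:50]   # vector of matrices (array of references)
A2 = rand(100, 10, 50)               # one contiguous 3D array

A[1]               # returns a reference to the first matrix, no copy
A2[:, :, 1]        # slicing copies the first 100x10 block
@view A2[:, :, 1]  # a view into the same memory, no copy

push!(A, zeros(100, 10))   # cheap: appends one more pointer to A
# growing A2 along the third dimension would require allocating a new array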
As for your timings, you're doing two different things. x[d] = rand(100,10) is creating a new matrix and adding its reference to x. y[:,:,d] = rand(100,10) is creating a new matrix and then looping through the corresponding slice of y to overwrite its values. You can see why that's slower. But what you're leaving out is the allocation-free case:
function f2()
    y = zeros(100, 10, 50);
    @time for i in eachindex(y)
        y[i] = rand()
    end
    y
end
In the small case this matches the array creation. You can't naively do this on case one, but as I said, if you dereference the pointer for the matrix once you do really well:
function f()
    x = [zeros(100, 10) for d=1:50];
    @time @inbounds for d=1:50
        xd = x[d]
        for i in eachindex(xd)
            xd[i] = rand()
        end
    end
    x
end
So arrays of arrays can be great data structures in the right cases. The library RecursiveArrayTools.jl was created to take better advantage of this. For example, A3 = VectorOfArray(A) gives A3 the same indexing structure as A2 by lazily transforming A3[i,j,k] to A[k][i,j]. However, it keeps the advantages of A, but will automatically make sure to broadcast in the correct way, like f. Another tool like this is the ArrayPartition, which allows for heterogeneous typing in a broadcast-performant way.
So yeah, it's not always the right tool, but these heterogeneous and recursive arrays are great tools when used correctly.
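A short usage sketch of that wrapper, assuming VectorOfArray and its array conversion behave as described above:
using RecursiveArrayTools

A  = [rand(100, 10) for d in 1:50]
VA = VectorOfArray(A)          # wraps A without copying the matrices

VA[3, 7, 20] == A[20][3, 7]    # 3D-style indexing forwarded to the inner matrices
B = convert(Array, VA)         # materialize a contiguous 100x10x50 Array if needed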

In MATLAB: How should nested fields of a struct be converted to a cell array?

In MATLAB, I would like to extract a nested field for each index of a 1 x n struct (a nonscalar struct) and receive the output as a 1 x n cell array. As a simple example, suppose I start with the following struct s:
s(1).f1.fa = 'foo';
s(2).f1.fa = 'yedd';
s(1).f1.fb = 'raf';
s(2).f1.fb = 'da';
s(1).f2 = 'bok';
s(2).f2 = 'kemb';
I can produce my desired 1 x 2 cell array c using a for-loop:
n = length(s);
c = cell(1,n);
for k = 1:n
c{k} = s(k).f1.fa;
end
If I wanted to do the analogous operation for a non-nested field, for example f2, then I could "vectorize" the operation (see this question), writing simply:
c = {s.f2};
However the same approach does not appear to work for nested fields. What then are possible ways to vectorize the above for-loop?
You cannot really vectorize it. The problem is that MATLAB does not allow most forms of nested indexing, including accessing a field of a field across a nonscalar struct in a single expression.
The most concise / readable option would be to concatenate the s.f1 results into a structure array using [...], and then index into the new structure array with a separate call:
x = [s.f1]; c = {x.fa};
If you have the Mapping Toolbox, you can use extractfield to perform the second indexing in one expression:
c = extractfield([s.f1], 'fa');
Alternatively, you could write a one-liner using arrayfun - here are a couple of options:
c = arrayfun(@(x) x.f1.fa, s, 'uni', false);
c = arrayfun(@(x) x.fa, [s.f1], 'uni', false);
Note that arrayfun and similar functions are generally slower than explicit for loops. So if the performance is critical, time / profile your code, before making a decision to get rid of the loop.
