I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.
For each i in I[2:end], I wish to sort the subarray a[I[i-1],I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]. My attempt at this is:
function sort!(a,b)
    p=sortperm(a);
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end]
    sort!( a[I[i-1], I[i]-1], b[I[i-1], I[i]-1] )
end
However, already on a small example, I see that sort! does not alter the view of a subarray:
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b); println(a,"\n",b) # works like it should
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]); println(a,"\n",b) # does nothing!!!
Any help on how to create such a function sort! (as efficient as possible) is welcome.
Background: I am dealing with data coming from sparse arrays:
using SparseArrays, Random # Random provides randperm
n=10^6; x=sprand(n,n,1000/n); # random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); # changing entries to rationals
U = randperm(n) # permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr;
Thus these a, b, I serve as a good example of the problem I posted. What I am trying to do is sort the row indices (and corresponding matrix values) of the entries in each column.
Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.
The problem is that a[1:5] is not a view; it is a copy. Instead, create the views explicitly:
function sort!(a,b)
    p=sortperm(a);
    a[:], b[:] = a[p], b[p]
end
# i must index I, so iterate over 2:length(I) rather than over the values I[2:end]
Threads.@threads for i in 2:length(I)
    sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1))
end
This is what you are looking for.
P.S. The @view a[2:3], @view(a[2:3]) or the @views macro can help make things more readable.
First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).
Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing will happen when you sort them, whether or not you use views. And sorting arrays in a moving-window way like this is not correct.
What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i+block_size-1) repeatedly in a loop, choosing a block_size that fits into memory, and modify both a and b according to p, then continue to the remaining part of a and find next p and repeat until nothing remains in a to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.
I am very new to Haskell (and functional programming in general) and I am trying to write a function called
"profileDistance m1 m2" that takes two matrices as parameters and needs to calculate the sum of the differences between each element in each matrix... I might not have explained that very well. Let me show it instead.
The matrices have the type [[(Char,Int)]]
where each matrix might look something like this:
m1 = [[('A',1),('A',2)],
      [('B',3),('B',4)],
      [('C',5),('C',6)]]
m2 = [[('A',7),('A',8)],
      [('B',9),('B',10)],
      [('C',11),('C',12)]]
(Note: I wrote the numbers in order in this example but they can be ANY numbers in any order. The chars in each row in each matrix will however match like shown in the example.)
The result (in the case above) would look something like this (pseudocode):
result = ((snd m1['A'][0])-(snd m2['A'][0]))+((snd m1['A'][1])-(snd m2['A'][1]))+((snd m1['B'][0])-(snd m2['B'][0]))+((snd m1['B'][1])-(snd m2['B'][1]))+((snd m1['C'][0])-(snd m2['C'][0]))+((snd m1['C'][1])-(snd m2['C'][1]))
This would be easy to do in any non-functional language that has for-loops, but I have no idea how to do this in Haskell. I have a feeling that functions like map, fold or sum would help me here (admittedly I am not 100% sure how fold works). I hope there is an easy way to do this... please help.
Here is a proposal:
solution m1 m2 = sum $ zipWith diffSnd flatM1 flatM2
  where
    diffSnd t1 t2 = snd t1 - snd t2
    flatM1 = concat m1
    flatM2 = concat m2
I wrote it so that it's easier to understand the building blocks.
The basic idea is to iterate simultaneously over our two lists of pairs using zipWith. Here is its type:
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
It means it takes a function of type a -> b -> c, a list of a's and a list of b's, and it returns a list of c's. In other words, zipWith takes care of the iteration; you just have to specify what you want to do with every item the iteration yields, which in your case will be a pair of pairs (one from the first matrix, another one from the second).
The function passed to zipWith takes the snd element from each pair and computes the difference. Looking back at the zipWith signature, you can deduce that it will return a list of numbers. So the last thing we need to do is sum them, using the function sum.
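As a rough illustration of that step, here is a minimal sketch that applies zipWith and sum to two already-flat lists of pairs. The names row1, row2 and differences are made up for this example, and the sample values are taken from the first rows of the question's matrices.

```haskell
-- Difference of the second components of two pairs (same helper as above).
diffSnd :: (Char, Int) -> (Char, Int) -> Int
diffSnd t1 t2 = snd t1 - snd t2

-- Hypothetical sample data: two flat lists of pairs.
row1, row2 :: [(Char, Int)]
row1 = [('A', 1), ('A', 2)]
row2 = [('A', 7), ('A', 8)]

-- zipWith walks both lists in lockstep and applies diffSnd to each pair of items.
differences :: [Int]
differences = zipWith diffSnd row1 row2

main :: IO ()
main = do
  print differences        -- prints [-6,-6]
  print (sum differences)  -- prints -12
```

Note that zipWith stops at the end of the shorter list, so this only gives the intended result when both lists have the same length.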
There's one last problem. We do not actually have two lists of pairs to pass to zipWith, but two matrices. We need to "flatten" each of them into a list, preserving the order of the elements. That's exactly what concat does, hence the calls to that function in the definitions of flatM1 and flatM2.
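To make the flattening concrete, the following sketch runs concat and the whole computation on the matrices from the question. The type alias Matrix is an assumption introduced only for readability, and the function is named profileDistance as in the question; treat it as one possible way to put the pieces together.

```haskell
-- Assumed alias for the matrix type described in the question.
type Matrix = [[(Char, Int)]]

m1, m2 :: Matrix
m1 = [ [('A', 1), ('A', 2)]
     , [('B', 3), ('B', 4)]
     , [('C', 5), ('C', 6)] ]
m2 = [ [('A', 7), ('A', 8)]
     , [('B', 9), ('B', 10)]
     , [('C', 11), ('C', 12)] ]

-- Flatten both matrices row by row, pair up their elements, and sum the differences.
profileDistance :: Matrix -> Matrix -> Int
profileDistance a b = sum (zipWith diffSnd (concat a) (concat b))
  where
    diffSnd t1 t2 = snd t1 - snd t2

main :: IO ()
main = do
  print (concat m1)              -- prints [('A',1),('A',2),('B',3),('B',4),('C',5),('C',6)]
  print (profileDistance m1 m2)  -- prints -36
```

Every element of m2 here is 6 larger than its counterpart in m1, which is why the six differences add up to -36.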
I suggest you look into the implementation of every function I mentioned to get a better grasp of how iteration is expressed by means of recursion. HTH
I have to solve Sudoku puzzles given as a vector containing 9 vectors (of length 9 each). Seeing as vectors are linked lists in Prolog, I figured the search would go faster if I transformed the puzzles into a 2D array format first.
Example puzzle:
puzzle(P) :- P =
    [[_,_,8,7,_,_,_,_,6],
     [4,_,_,_,_,9,_,_,_],
     [_,_,_,5,4,6,9,_,_],
     [_,_,_,_,_,3,_,5,_],
     [_,_,3,_,_,7,6,_,_],
     [_,_,_,_,_,_,_,8,9],
     [_,7,_,4,_,2,_,_,5],
     [8,_,_,9,_,5,_,2,3],
     [2,_,9,3,_,8,7,6,_]].
I'm using ECLiPSe CLP to implement a solver. The best I've come up with so far is to write a domain like this:
domain(P) :-
    dim(P,[9,9]),
    P[1..9,1..9] :: 1..9.
and a converter for the puzzle (parameter P is the given puzzle and Sudoku is the new defined grid with the 2D array). But I'm having trouble linking the values from the given initial puzzle to my 2D array.
convertVectorsToArray(Sudoku,P) :-
    ( for(I,1,9),
      param(Sudoku,P)
    do
        ( for(J,1,9),
          param(Sudoku,P,I)
        do
            Sudoku[I,J] is P[I,J]
        )
    ).
Before this, I tried using array_list (http://eclipseclp.org/doc/bips/kernel/termmanip/array_list-2.html), but I kept getting type errors. How I did it before:
convertVectorsToArray(Sudoku,P) :-
    ( for(I,1,9),
      param(Sudoku,P)
    do
        ( for(J,1,9),
          param(Sudoku,P,I)
        do
            A is Sudoku[I],
            array_list(A,P[I])
        )
    ).
When my Sudoku finally outputs the example puzzle P in the following format:
Sudoku = []([](_Var1, _Var2, 8, 7, ..., 6), [](4, ...), ...)
then I'll be happy.
update
I tried again with the array_list; it almost works with the following code:
convertVectorsToArray(Sudoku,P) :-
    ( for(I,1,9),
      param(Sudoku,P)
    do
        X is Sudoku[I],
        Y is P[I],
        write(I),nl,
        write(X),nl,
        write(Y),nl,
        array_list(X, Y)
    ).
The writes are there to show what the vectors/arrays look like. For some reason, it stops at the second iteration (instead of running 9 times) and outputs the rest of the example puzzle as a vector of vectors. Only the first vector gets assigned correctly.
update2
While I'm sure the answer given by jschimpf is correct, I also figured out my own implementation:
convertVectorsToArray(Sudoku,[],_).
convertVectorsToArray(Sudoku,[Y|Rest],Count) :-
    X is Sudoku[Count],
    array_list(X, Y),
    NewCount is Count + 1,
    convertVectorsToArray(Sudoku,Rest,NewCount).
Thanks for the added explanation on why it didn't work before though!
The easiest solution is to avoid the conversion altogether by writing your puzzle specification directly as a 2-D array. An ECLiPSe "array" is simply a structure with the functor '[]'/N, so you can write:
puzzle(P) :- P = [](
    [](_,_,8,7,_,_,_,_,6),
    [](4,_,_,_,_,9,_,_,_),
    [](_,_,_,5,4,6,9,_,_),
    [](_,_,_,_,_,3,_,5,_),
    [](_,_,3,_,_,7,6,_,_),
    [](_,_,_,_,_,_,_,8,9),
    [](_,7,_,4,_,2,_,_,5),
    [](8,_,_,9,_,5,_,2,3),
    [](2,_,9,3,_,8,7,6,_)).
You can then use this 2-D array directly as the container for your domain variables:
sudoku(P) :-
    puzzle(P),
    P[1..9,1..9] :: 1..9,
    ...
However, if you want to keep your list-of-lists puzzle specification, and convert that to an array-of-arrays format, you can use array_list/2. But since that only works for 1-D arrays, you have to convert the nesting levels individually:
listoflists_to_matrix(Xss, Xzz) :-
    % list of lists to list of arrays
    ( foreach(Xs,Xss), foreach(Xz,Xzs) do
        array_list(Xz, Xs)
    ),
    % list of arrays to array of arrays
    array_list(Xzz, Xzs).
As for the reason your own code didn't work: this is due to the subscript notation P[I]. This notation requires P to be an array (you were using it on lists), and it works only in contexts where an arithmetic expression is expected, e.g. the right hand side of is/2, in arithmetic constraints, etc.