I have an array of some particular ints, and I want to find all the unique combinations (by addition) of these ints. I'm sure there's a way to do this functionally; I'm trying to avoid the iterative way of shoving for loops inside of for loops. I'm using Rust in this case, but this question is more generally a, "functional programming, how do?" take.
My first thought is that I should just zip every entry in v with every other entry, reduce each pair by addition to a single element, and filter duplicates. This is O(|v|^2), which feels bad, meaning I'm probably missing something fairly obvious to the functional whiz kids. Also, I'm not even sure how to do it; I'd probably use a for loop to construct my new massive array.
My first pass: note that v holds all the numbers I care about.
let mut massive_arr = Vec::new();
for &elem in v.iter() {
    for &elem2 in v.iter() {
        massive_arr.push((elem, elem2));
    }
}
let mut single_massive = Vec::new();
for &tuple in massive_arr.iter() {
    single_massive.push(tuple.0 + tuple.1);
}
single_massive.sort(); // dedup only removes *consecutive* duplicates, so sort first
single_massive.dedup();
let summand: usize = single_massive.iter().sum();
println!("The sum of all that junk is {:?}", summand);
Help me baptize my depraved iterations in the pure light of functional programming.
Edited: I threw up an example before, as I was still figuring out an implementation that actually worked; the question was more of a "how do I do this better" question. The thing above now actually works (but is still ugly!).
You can possibly use itertools (I do not have a compiler at hand, but you probably get the idea):
use itertools::{iproduct, Itertools};
iproduct!(v.iter(), v.iter()) // construct all pairs
    .map(|tuple| tuple.0 + tuple.1) // sum each pair
    .unique() // de-duplicate (uses a HashMap internally)
    .sum() // sum up
All this is still O(n^2) which is -- as far as I see -- asymptotically optimal, because all pairs of numbers might be needed.
To avoid the obvious duplicates, you can use tuple_combinations:
v.iter()
    .tuple_combinations()
    .map(|(a, b)| a + b)
    .unique()
    .sum()
Improving on @phimuemue's answer, you can avoid the obvious duplicates like this:
v.iter()
    .enumerate()
    .flat_map(|(i, a)| v[i+1..].iter().map(move |b| a + b))
    .unique() // may not be needed or what you really want, see the note below
    .sum()
Note however that this may not give you the answer you really want if multiple pairs of numbers have the same sum. For example, given vec![1, 2, 3, 4] as input, which do you expect:
(1+2) + (1+3) + (1+4) + (2+3) + (2+4) + (3+4) = 30
or 3 + 4 + 5 + 6 + 7 = 25 because 1+4 == 2+3 == 5 is only counted once?
I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.
For each i in 2:length(I), I wish to sort the subarray a[I[i-1]:I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1]:I[i]-1]. My attempt at this is:
function sort!(a, b)
    p = sortperm(a)
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in 2:length(I)
    sort!(a[I[i-1], I[i]-1], b[I[i-1], I[i]-1])
end
However, already on a small example, I see that sort! does not alter the view of a subarray:
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b); println(a,"\n",b) # works like it should
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]); println(a,"\n",b) # does nothing!!!
Any help on how to create such a sort! function (as efficient as possible) is welcome.
Background: I am dealing with data coming from sparse arrays:
using SparseArrays, Random
n = 10^6; x = sprand(n, n, 1000/n); # random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n, n, x.colptr, x.rowval, rand(-99:99, nnz(x)) .// 1); # changing entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr;
Thus these a, b, I serve as a good example of my posted problem. What I am trying to do is sort the row indices (and the corresponding matrix values) of the entries in each column.
Note: I already asked this question on the Julia discourse here, but received no replies or comments. If I can improve the quality of the question, don't hesitate to tell me.
The problem is that a[1:5] is not a view, it's just a copy. Instead, make a view, like this:
function sort!(a, b)
    p = sortperm(a)
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in 2:length(I)
    sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1))
end
This is what you are looking for.
P.S. The @view a[2:3] or @view(a[2:3]) forms, or the @views macro, can help make things more readable.
First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).
Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing should happen if you sort them, whether through views or not. And sorting arrays in a moving-window way like this is not correct.
What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i+block_size-1) repeatedly in a loop, choosing a block_size that fits into memory; modify both a and b according to p, then continue with the remaining part of a, find the next p, and repeat until nothing remains in a to be sorted. I'll leave the full Julia implementation as an exercise (you can come back if you get stuck on something), but a rough sketch of the block-wise idea follows.
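To illustrate the block-wise idea only (a Python/NumPy sketch rather than Julia, purely for exposition; block_size is a placeholder, and a real implementation would avoid the temporary copies):

import numpy as np

def blockwise_sort(a, b, block_size):
    # Repeatedly extract the block_size smallest remaining elements of a
    # (like partialsortperm), place them next in order, and apply the
    # same reordering to b.
    n = len(a)
    i = 0
    while i < n:
        k = min(block_size, n - i)
        tail = a[i:]
        p = np.argpartition(tail, k - 1)[:k]   # k smallest of the tail, unordered
        p = p[np.argsort(tail[p])]             # ...now in ascending order
        rest = np.setdiff1d(np.arange(n - i), p, assume_unique=True)
        order = np.concatenate([p, rest])      # chosen block first, the rest after
        a[i:] = tail[order]
        b[i:] = b[i:][order]
        i += k

After the loop, a is fully sorted and b has been permuted identically, but only one block's worth of sorted indices was ever computed at a time.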
I wrote an algorithm for ranking an array.
let rankFun array =
    let arrayNew = List.toArray array
    let arrayLength = Array.length arrayNew
    let rankedArray = Array.create arrayLength 1.0
    for i in 0 .. arrayLength - 2 do
        for j in (i+1) .. arrayLength - 1 do
            if arrayNew.[i] > arrayNew.[j] then
                rankedArray.[i] <- rankedArray.[i] + 1.0
            elif arrayNew.[i] < arrayNew.[j] then
                rankedArray.[j] <- rankedArray.[j] + 1.0
            else
                rankedArray.[i] <- rankedArray.[i] + 0.0
    rankedArray
I wanted to ask what you think about the performance. I used for loops, and I was wondering if you think there's a better way than this one. Before getting to this, I was sorting my array while keeping the original indexes, ranking, and only afterwards resorting each rank to its original position, which was reeeeeally bad in terms of performance. Now I've got to this improved version and was looking for some feedback. Any ideas?
Edit: Duplicated elements should have same rank. ;)
Thank you very much in advance. :)
I'm assuming that ranks can be taken from sorting the inputs, since you commented that the question's behavior on duplicates is a bug. It's surprising that the solution with sorting you described ran slower than the code shown in the question. It should be a lot faster.
A simple way to solve this via sorting is to build an ordered set from all values. Sets in F# are always ordered and contain no duplicates, so they can be used to create the ranking.
From the set, create a map from each value to its index in the set (plus one, to keep the ranking that starts with 1). With this, you can look up the rank of each value to fill the output array.
let getRankings array =
    let rankTable =
        Set.ofArray array
        |> Seq.mapi (fun i n -> n, i + 1)
        |> Map.ofSeq
    array |> Array.map (fun n -> rankTable.[n])
This takes an array rather than a list, because the input parameter in the question was called array. It also uses integers for the ranks, as this is the normal data type for that purpose. For example, getRankings [|10; 20; 10; 30|] returns [|1; 2; 1; 3|]: duplicates share a rank, because the set contains each value only once.
This is much faster than the original algorithm, since all operations are at most O(n*log(n)), while the nested for-loops in the question are O(n^2). (See also: Wikipedia on Big O notation.) For only 10000 elements, the sorting-based solution already runs over 100 times faster on my computer.
(BTW, the statement else rankedArray.[i] <- rankedArray.[i] + 0.0 appears to do nothing. Unless you're doing some sort of black magic with the optimizer, you can just remove it.)
I am a little confused by QuickSort in Scala. As per the specification, quickSort can be applied only to an Array, not to an ArrayBuffer. And quickSort sorts in place, i.e. it changes the original Array.
val intArray = Array(7,1,4,6,9) //> intArray : Array[Int] = Array(7, 1, 4, 6, 9)
Sorting.quickSort(intArray)
intArray.mkString("<"," and ",">") //> res4: String = <1 and 4 and 6 and 7 and 9>
Now I am not able to understand why I can't do the same with an ArrayBuffer. Is there any reason behind this? And if I want to sort an ArrayBuffer with the QuickSort algorithm, what option does Scala provide? Thanks for all the help.
I believe the answer to your question can be found by examining the Scala source code, to determine just what behavior you get from the built-in functions that come with ArrayBuffer: sortWith, sortBy, etc.
These functions come from the trait SeqLike (you can examine the source code by browsing to that class in the ScalaDoc and clicking the source link at the top of the documentation).
Anyhow, most of the code related to sorting just gets pushed off to this function:
def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
  val len = this.length
  val arr = new ArraySeq[A](len)
  var i = 0
  for (x <- this.seq) {
    arr(i) = x
    i += 1
  }
  java.util.Arrays.sort(arr.array, ord.asInstanceOf[Ordering[Object]])
  val b = newBuilder
  b.sizeHint(len)
  for (x <- arr) b += x
  b.result()
}
This basically makes a copy of the array and sorts it using java.util.Arrays.sort. So the next question becomes, "What is Java doing here?" I was curious, and the API describes it as follows:
Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
So I think the basic answer is that Scala relies heavily on Java's built-in quicksort, and you can use any Seq method (sorted, sortWith, sortBy) on an ArrayBuffer, e.g. ArrayBuffer(7, 1, 4, 6, 9).sorted, with performance comparable to the Array quicksort functions. You may take a bit of a performance hit in the "copy" phase, but the elements are just pointers (it's not cloning the objects), so you're not losing much unless you have millions of items.
I have the following (imperative) algorithm that I want to implement in Haskell:
Given a sequence of pairs [(e0,s0), (e1,s1), (e2,s2), ..., (en,sn)], where both the "e" and "s" parts are natural numbers (not necessarily distinct), at each time step one element of this sequence is selected at random, say (ei,si), and based on the values of ei and si, a new element is built and appended to the sequence.
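In imperative terms, one step looks roughly like this (a Python sketch; the rule for building the new pair is just a placeholder):

import random

def step(pairs):
    # pick a random existing pair: this needs O(1) random access
    e, s = random.choice(pairs)
    # build a new element from (e, s) and append it: this needs cheap appends
    # (the construction rule below is only a placeholder)
    pairs.append((e + s, s + 1))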
How can I implement this efficiently in Haskell? The need for random access would make it bad for lists, while the need for appending one element at a time would make it bad for arrays, as far as I know.
Thanks in advance.
I suggest using either Data.Set or Data.Sequence, depending on what you're needing it for. The latter in particular provides you with logarithmic index lookup (as opposed to linear for lists) and O(1) appending on either end.
"while the need for appending one element at a time would make it bad for arrays" Algorithmically, it seems like you want a dynamic array (aka vector, array list, etc.), which has amortized O(1) time to append an element. I don't know of a Haskell implementation of it off-hand, and it is not a very "functional" data structure, but it is definitely possible to implement it in Haskell in some kind of state monad.
If you know approximately how many elements you will need in total, you can create an array of that size, "sparse" at first, and then fill in elements as needed.
Something like below can be used to represent this new array:
data MyArray = MyArray (Array Int Int) Int
(where the last Int represents how many elements of the array are in use)
If you really need stop-and-start resizing, you could think about using the simple-rope package along with a StringLike instance for something like Vector. In particular, this might accommodate scenarios where you start out with a large array and are interested in relatively small additions.
That said, adding individual elements into the chunks of the rope may still induce a lot of copying. You will need to try out your specific case, but you should be prepared to use a mutable vector as you may not need pure intermediate results.
If you can build your array in one shot and just need the indexing behavior you describe, something like the following may suffice:
import Data.Array.IArray

test :: Array Int (Int,Int)
test = accumArray (flip const) (0,0) (0,20) [(i, f i) | i <- [0..19]]
  where f 0 = (1,0)
        f i = let (e,s) = test ! (i `div` 2) in (e*2, s+1)
Taking a note from ivanm, I think Sets are the way to go for this.
import Data.Set as Set
import System.Random (RandomGen, getStdGen)

startSet :: Set (Int, Int)
startSet = Set.fromList [(1,2), (3,4)] -- etc. Whatever the initial set is

-- grow the set by randomly producing "n" elements.
growSet :: (RandomGen g) => g -> Set (Int, Int) -> Int -> (Set (Int, Int), g)
growSet g s n | n <= 0    = (s, g)
              | otherwise = growSet g'' s' (n-1)
  where s'           = Set.insert (x,y) s
        ((x,_), g')  = randElem s g
        ((_,y), g'') = randElem s g'

randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem = undefined

main = do
    g <- getStdGen
    let (grownSet,_) = growSet g startSet 2
    print $ grownSet -- or whatever you want to do with it
This assumes that randElem is an efficient, definable method for selecting a random element from a Set. (I asked this SO question regarding efficient implementations of such a method). One thing I realized upon writing up this implementation is that it may not suit your needs, since Sets cannot contain duplicate elements, and my algorithm has no way to give extra weight to pairings that appear multiple times in the list.
I have always been interested in algorithms, sort, crypto, binary trees, data compression, memory operations, etc.
I read Mark Nelson's article about permutations in C++ with the STL function next_permutation(), very interesting and useful. After that I wrote a class method to get the next permutation in Delphi, since that is the tool I presently use most. This function works in lexicographic order; I got the idea for the algorithm from an answer in another topic here on Stack Overflow. But now I have a big problem: I'm working with permutations of repeated elements in a vector, and there are lots of permutations that I don't need. For example, I have this first permutation of 7 elements in lexicographic order:
6667778 (6 = 3 times consecutively, 7 = 3 times consecutively)
For my work I consider valid only those permutations with at most 2 identical elements repeated consecutively, like this:
6676778 (6 = 2 times consecutively, 7 = 2 times consecutively)
In short, I need a function that returns only permutations that have at most N consecutive repetitions, according to the parameter received.
Does anyone know if there is some algorithm that already does this?
Sorry for any mistakes in the text, I still don't speak English very well.
Thank you so much,
Carlos
My approach is a recursive generator that doesn't follow branches that contain illegal sequences.
Here's the Python 3 code:
def perm_maxlen(elements, prefix = "", maxlen = 2):
    if not elements:
        yield prefix + elements
        return
    used = set()
    for i in range(len(elements)):
        element = elements[i]
        if element in used:
            # already searched this path
            continue
        used.add(element)
        suffix = prefix[-maxlen:] + element
        if len(suffix) > maxlen and len(set(suffix)) == 1:
            # would exceed the maximum run length
            continue
        sub_elements = elements[:i] + elements[i+1:]
        for perm in perm_maxlen(sub_elements, prefix + element, maxlen):
            yield perm

for perm in perm_maxlen("6667778"):
    print(perm)
The implementation is written for readability, not speed, but the algorithm should be much faster than naively filtering all permutations.
For example, it runs this in milliseconds, where the naive filtering solution would take millennia or something:
print(len(list(perm_maxlen("a"*100 + "b"*100, "", 1))))
So, in the homework-assistance kind of way, I can think of two approaches.
Work out all permutations that contain 3 or more consecutive repetitions (which you can do by treating the three-in-a-row as just one pseudo-digit and feeding it to a normal permutation-generation algorithm). Make a lookup table of all of these. Now generate all permutations of your original string, and check them against the lookup table before adding them to the result; a sketch of this follows after the next approach.
Use a recursive permutation-generating algorithm (select each possibility for the first digit in turn, then recurse to generate permutations of the remaining digits), but in each recursion pass along the last two digits generated so far. Then, in the recursively called function, if the two values passed in are the same, don't allow the first digit to be the same as those.
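A small Python sketch of the first approach (the function names are mine; the pseudo-digit trick is the part to note):

from itertools import permutations

def perms_with_run(s, d, run):
    # all orderings of s in which digit d occurs at least `run` times in a
    # row: glue `run` copies of d into one pseudo-digit, then permute that
    # token together with the remaining characters
    rest = list(s)
    for _ in range(run):
        rest.remove(d)
    return {"".join(p) for p in permutations(rest + [d * run])}

def valid_perms(s, run=3):
    # lookup table of every permutation containing a too-long run
    bad = set()
    for d in set(s):
        if s.count(d) >= run:
            bad |= perms_with_run(s, d, run)
    return {"".join(p) for p in permutations(s)} - bad

For "6667778", valid_perms keeps exactly the permutations in which neither 6 nor 7 appears 3 or more times in a row.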
Why not just make a wrapper around the normal permutation function that skips values that have more than N consecutive repetitions? Something like:
(pseudocode)
function custom_perm(int max_rep)
    do
        p := next_perm()
    while count_max_reps(p) > max_rep
    return p
Krusty, I'm already doing that at the end of the function, but it doesn't solve the problem, because it still needs to generate all the permutations and check each one.
consecutive := 1;
IsValid := True;
for n := 0 to len - 2 do
begin
  if anyVector[n] = anyVector[n + 1] then
    consecutive := consecutive + 1
  else
    consecutive := 1;
  if consecutive > MaxConsecutiveRepeats then
  begin
    IsValid := False;
    Break;
  end;
end;
Since I start from the first permutation in lexicographic order, this way ends up generating a lot of unnecessary perms.
This is easy to make, but rather hard to make efficient.
If you need to build a single piece of code that only considers valid outputs, and thus doesn't bother walking over the entire combination space, then you're going to have some thinking to do.
On the other hand, if you can live with the code internally producing all combinations, valid or not, then it should be simple.
Make a new enumerator, one which you can call that next_perm method on, and have this internally use the other enumerator, the one that produces every combination.
Then simply make the outer enumerator run in a while loop asking the inner one for more permutations until you find one that is valid, then produce that.
Pseudo-code for this:

generator1:
    when called, yield the next combination

generator2:
    internally keep a generator1 object
    when called, keep asking generator1 for a new combination
    check the combination
    if valid, then yield it
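In runnable form, the same two-layer idea might look like this (a Python sketch; itertools.permutations plays the role of generator1, and the names are illustrative):

from itertools import permutations

def max_run(p):
    # length of the longest run of identical consecutive elements
    longest = run = 1
    for prev, cur in zip(p, p[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def custom_perm(s, max_rep=2):
    # outer generator: keep asking the inner generator for permutations
    # and yield only those whose longest run is within the limit
    for p in permutations(s):   # inner generator: every permutation
        if max_run(p) <= max_rep:
            yield "".join(p)

Note that permutations() treats equal characters as distinct, so with repeated digits the same string can be yielded more than once; wrap the results in a set if that matters.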