Get disjoint elements from two tables - loops

I am trying to get disjoint elements from two tables. My tables currently are defined as:
local t1={elem5=true, elem2=true, ...}
local t2={elem2=true, elem5=true, ...}
However it would not be much of a problem to change the structure to:
local t1={elem5, elem2, ...}
local t2={elem2, elem5, ...}
How could I get disjoint elements from both tables efficienlty? Also I need to know which table the elements where originally part of.
What first came to mind was to loop over both tables:
local fromt1={}
for k, v in pairs(t1) do
if not t2[k] then
fromt1[#fromt1+1]=v
end
end
local fromt2={}
for k, v in pairs(t2) do
if not t1[k] then
fromt2[#fromt2+1]=v
end
end
But these are two loops, so I looked some more and found a function for iterating two tables in one loop (link):
function pairs2(t, ...)
local i, a, k, v = 1, {...}
return
function()
repeat
k, v = next(t, k)
if k == nil then
i, t = i + 1, a[i]
end
until k ~= nil or not t
return k, v
end
end
local fromt1, fromt2={}, {}
for k, v in pairs2(t1, t2) do
if not t2[k] then
fromt1[#fromt1+1]=v
end
if not t1[k] then
fromt2[#fromt2+1]=v
end
end
Any more efficient/cleaner way to get disjoint elements from two tables in Lua?

There's nothing wrong with the first aproach.
1) You have to iterate over both tables one way or another; whether you do it in two loops or one is irrelevant.
2) You need at least two additional tables for the two result sets.
The one optimization you could do:
the # operator on tables is somewhat expensive, so you can sometimes improve perofmance by keeping a number variable and increasing it manually with each insertion. But please don't just implement this because I told you. Benchmark your code and only use this optimization if you find that your code actually runs faster.
EDIT: I just noticed that I kida skipped one possible implementation because I assumed you don't want to change either of the original tables. If however, one of the two is a throwaway table and you don't mind it changing, consider this:
local function remove_first_from_second(first, second)
for key in pairs(first) do
second[key] = nil
end
return second
end
Running this both ways won't work:
remove_first_from_second(fromt1, fromt2) -- Removes shared keys from fromt2
remove_first_from_second(fromt2, fromt1) -- Removes nothing from fromt1
Because at the time you call it the second time, fromt2 already contains only the keys that fromt1 doesn't have.
However, since this problem only affects the second call, you can get away with just one in-between table (assuming both original tables can be mutated)

Related

Julia: How to efficiently sort subarrays of 2 large arrays in parallel?

I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.
For each i in I[2:end], I wish to sort the subarray a[I[i-1],I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]. My attempt at this is:
function sort!(a,b)
p=sortperm(a);
a[:], b[:] = a[p], b[p]
end
Threads.#threads for i in I[2:end]
sort!( a[I[i-1], I[i]-1], b[I[i-1], I[i]-1] )
end
However, already on a small example, I see that sort! does not alter the view of a subarray:
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b); println(a,"\n",b) # works like it should
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]); println(a,"\n",b) # does nothing!!!
Any help on how to create such function sort! (as efficient as possible) are welcome.
Background: I am dealing with data coming from sparse arrays:
using SparseArrays
n=10^6; x=sprand(n,n,1000/n); #random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); #chnging entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr;
Thus these a,b,I serve as good examples to my posted problem. What I am trying to do is sort the row indices (and corresponding matrix values) of entries in each column.
Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.
The problem is that a[1:5] is not a view, it's just a copy. instead make the view like
function sort!(a,b)
p=sortperm(a);
a[:], b[:] = a[p], b[p]
end
Threads.#threads for i in I[2:end]
sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1))
end
is what you are looking for
ps.
the #view a[2:3], #view(a[2:3]) or the #views macro can help making thins more readable.
First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).
Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing should happen if you sort them either with views or not. And sorting arrays in a moving-window way like this is not correct.
What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i+block_size-1) repeatedly in a loop, choosing a block_size that fits into memory, and modify both a and b according to p, then continue to the remaining part of a and find next p and repeat until nothing remains in a to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.

Looping over array values in Lua

I have a variable as follows
local armies = {
[1] = "ARMY_1",
[2] = "ARMY_3",
[3] = "ARMY_6",
[4] = "ARMY_7",
}
Now I want to do an action for each value. What is the best way to loop over the values? The typical thing I'm finding on the internet is this:
for i, armyName in pairs(armies) do
doStuffWithArmyName(armyName)
end
I don't like that as it results in an unused variable i. The following approach avoids that and is what I am currently using:
for i in pairs(armies) do
doStuffWithArmyName(armies[i])
end
However this is still not as readable and simple as I'd like, since this is iterating over the keys and then getting the value using the key (rather imperatively). Another boon I have with both approaches is that pairs is needed. The value being looped over here is one I have control over, and I'd prefer that it can be looped over as easily as possible.
Is there a better way to do such a loop if I only care about the values? Is there a way to address the concerns I listed?
I'm using Lua 5.0 (and am quite new to the language)
The idiomatic way to iterate over an array is:
for _, armyName in ipairs(armies) do
doStuffWithArmyName(armyName)
end
Note that:
Use ipairs over pairs for arrays
If the key isn't what you are interested, use _ as placeholder.
If, for some reason, that _ placeholder still concerns you, make your own iterator. Programming in Lua provides it as an example:
function values(t)
local i = 0
return function() i = i + 1; return t[i] end
end
Usage:
for v in values(armies) do
print(v)
end

Post-Condition Violation with Feature in Eiffel

This is part of the class. This class is called BAG[G -> {HASHABLE, COMPARABLE}]
it inherits from ADT_BAG which has deferred features such as count, extend, remove, remove_all, add_all... more, and domain to be re-implemented.
domain returns ARRAY[G] which is a sorted array list of G
i always get Post-condition violation "value_semantics" which is something to do with object comparison but I checked and there is no code for object comparison which is very weird.
I tried to remake the code for domain feature several times and it ALWAYS ends up with a post-condition violation or a fail.
When I check the debugger the array "a" that is returned from domain always has count 0 but this does not make sense because i move keys from table to "a" but count is still 0.
Maybe I am transferring the keys wrong to the array?
code:
count: INTEGER
-- cardinality of the domain
do
result := domain.count -- has to be domain.count because loop invariant: consistent: count = domain.count
end
domain: ARRAY[G]
-- sorted domain of bag
local
tmp: G
a: ARRAY[G]
do
create a.make_empty
across 1 |..| (a.count) as i -- MOVING keys from table to array
loop
across table as t
loop
if not a.has (t.key) then
a.enter (t.key, i.item)
i.forth
end
end
end
across 1 |..| (a.count-1) as i -- SORTING
loop
if a[i.item] > a[i.item+1] then
tmp := a[i.item]
a[i.item] := a[i.item+1]
a[i.item+1] := tmp
end
end
Result := a
ensure then
value_semantics: Result.object_comparison -- VIOLATION THROWN HERE
correct_items: across 1 |..| Result.count as j all
has(Result[j.item]) end
sorted: across 1 |..| (Result.count-1) as j all
Result[j.item] <= Result[j.item+1] end
end
test code:
t3: BOOLEAN
local
sorted_domain: ARRAY[STRING]
do
comment("t3:test sorted domain")
sorted_domain := <<"bolts", "hammers", "nuts">>
sorted_domain.compare_objects
Result := bag2.domain ~ sorted_domain -- fails here
check Result end
end
The first loop across 1 |..| (a.count) as i is not going to make a single iteration because a is empty (has no elements) at the beginning. Indeed, it has been just created with create a.make_empty.
Also, because keys in the table are unique it is useless to check whether a key has been added to the resulting array: the test not a.has (t.key) will always succeed.
Therefore the first loop should go over keys of a table and add them into the resulting array. The feature {ARRAY}.force may be of interest in this case. The addition of the new elements should not make any "holes" in the array though. One way to achieve this is to add a new element right after the current upper bound of the array.
The sorting loop is also incorrect. Here the situation is reversed compared to the previous one: sorting cannot be done in a single loop, at least two nested loops are required. The template seems to be using Insertion sort, its algorithm can be found elsewhere.
EDIT: the original answer referred to {ARRAY}.extend instead of {ARRAY}.force. Unfortunately {ARRAY}.extend is not generally available, but a.extend (x) would have the same effect as a.force (x, a.upper + 1).

Growing arrays in Haskell

I have the following (imperative) algorithm that I want to implement in Haskell:
Given a sequence of pairs [(e0,s0), (e1,s1), (e2,s2),...,(en,sn)], where both "e" and "s" parts are natural numbers not necessarily different, at each time step one element of this sequence is randomly selected, let's say (ei,si), and based in the values of (ei,si), a new element is built and added to the sequence.
How can I implement this efficiently in Haskell? The need for random access would make it bad for lists, while the need for appending one element at a time would make it bad for arrays, as far as I know.
Thanks in advance.
I suggest using either Data.Set or Data.Sequence, depending on what you're needing it for. The latter in particular provides you with logarithmic index lookup (as opposed to linear for lists) and O(1) appending on either end.
"while the need for appending one element at a time would make it bad for arrays" Algorithmically, it seems like you want a dynamic array (aka vector, array list, etc.), which has amortized O(1) time to append an element. I don't know of a Haskell implementation of it off-hand, and it is not a very "functional" data structure, but it is definitely possible to implement it in Haskell in some kind of state monad.
If you know approx how much total elements you will need then you can create an array of such size which is "sparse" at first and then as need you can put elements in it.
Something like below can be used to represent this new array:
data MyArray = MyArray (Array Int Int) Int
(where the last Int represent how many elements are used in the array)
If you really need stop-and-start resizing, you could think about using the simple-rope package along with a StringLike instance for something like Vector. In particular, this might accommodate scenarios where you start out with a large array and are interested in relatively small additions.
That said, adding individual elements into the chunks of the rope may still induce a lot of copying. You will need to try out your specific case, but you should be prepared to use a mutable vector as you may not need pure intermediate results.
If you can build your array in one shot and just need the indexing behavior you describe, something like the following may suffice,
import Data.Array.IArray
test :: Array Int (Int,Int)
test = accumArray (flip const) (0,0) (0,20) [(i, f i) | i <- [0..19]]
where f 0 = (1,0)
f i = let (e,s) = test ! (i `div` 2) in (e*2,s+1)
Taking a note from ivanm, I think Sets are the way to go for this.
import Data.Set as Set
import System.Random (RandomGen, getStdGen)
startSet :: Set (Int, Int)
startSet = Set.fromList [(1,2), (3,4)] -- etc. Whatever the initial set is
-- grow the set by randomly producing "n" elements.
growSet :: (RandomGen g) => g -> Set (Int, Int) -> Int -> (Set (Int, Int), g)
growSet g s n | n <= 0 = (s, g)
| otherwise = growSet g'' s' (n-1)
where s' = Set.insert (x,y) s
((x,_), g') = randElem s g
((_,y), g'') = randElem s g'
randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem = undefined
main = do
g <- getStdGen
let (grownSet,_) = growSet g startSet 2
print $ grownSet -- or whatever you want to do with it
This assumes that randElem is an efficient, definable method for selecting a random element from a Set. (I asked this SO question regarding efficient implementations of such a method). One thing I realized upon writing up this implementation is that it may not suit your needs, since Sets cannot contain duplicate elements, and my algorithm has no way to give extra weight to pairings that appear multiple times in the list.

Help with a special case of permutations algorithm (not the usual)

I have always been interested in algorithms, sort, crypto, binary trees, data compression, memory operations, etc.
I read Mark Nelson's article about permutations in C++ with the STL function next_perm(), very interesting and useful, after that I wrote one class method to get the next permutation in Delphi, since that is the tool I presently use most. This function works on lexographic order, I got the algo idea from a answer in another topic here on stackoverflow, but now I have a big problem. I'm working with permutations with repeated elements in a vector and there are lot of permutations that I don't need. For example, I have this first permutation for 7 elements in lexographic order:
6667778 (6 = 3 times consecutively, 7 = 3 times consecutively)
For my work I consider valid perm only those with at most 2 elements repeated consecutively, like this:
6676778 (6 = 2 times consecutively, 7 = 2 times consecutively)
In short, I need a function that returns only permutations that have at most N consecutive repetitions, according to the parameter received.
Does anyone know if there is some algorithm that already does this?
Sorry for any mistakes in the text, I still don't speak English very well.
Thank you so much,
Carlos
My approach is a recursive generator that doesn't follow branches that contain illegal sequences.
Here's the python 3 code:
def perm_maxlen(elements, prefix = "", maxlen = 2):
if not elements:
yield prefix + elements
return
used = set()
for i in range(len(elements)):
element = elements[i]
if element in used:
#already searched this path
continue
used.add(element)
suffix = prefix[-maxlen:] + element
if len(suffix) > maxlen and len(set(suffix)) == 1:
#would exceed maximum run length
continue
sub_elements = elements[:i] + elements[i+1:]
for perm in perm_maxlen(sub_elements, prefix + element, maxlen):
yield perm
for perm in perm_maxlen("6667778"):
print(perm)
The implentation is written for readability, not speed, but the algorithm should be much faster than naively filtering all permutations.
print(len(perm_maxlen("a"*100 + "b"*100, "", 1)))
For example, it runs this in milliseconds, where the naive filtering solution would take millenia or something.
So, in the homework-assistance kind of way, I can think of two approaches.
Work out all permutations that contain 3 or more consecutive repetitions (which you can do by treating the three-in-a-row as just one psuedo-digit and feeding it to a normal permutation generation algorithm). Make a lookup table of all of these. Now generate all permutations of your original string, and look them up in lookup table before adding them to the result.
Use a recursive permutation generating algorthm (select each possibility for the first digit in turn, recurse to generate permutations of the remaining digits), but in each recursion pass along the last two digits generated so far. Then in the recursively called function, if the two values passed in are the same, don't allow the first digit to be the same as those.
Why not just make a wrapper around the normal permutation function that skips values that have N consecutive repetitions? something like:
(pseudocode)
funciton custom_perm(int max_rep)
do
p := next_perm()
while count_max_rerps(p) < max_rep
return p
Krusty, I'm already doing that at the end of function, but not solves the problem, because is need to generate all permutations and check them each one.
consecutive := 1;
IsValid := True;
for n := 0 to len - 2 do
begin
if anyVector[n] = anyVector[n + 1] then
consecutive := consecutive + 1
else
consecutive := 1;
if consecutive > MaxConsecutiveRepeats then
begin
IsValid := False;
Break;
end;
end;
Since I do get started with the first in lexographic order, ends up being necessary by this way generate a lot of unnecessary perms.
This is easy to make, but rather hard to make efficient.
If you need to build a single piece of code that only considers valid outputs, and thus doesn't bother walking over the entire combination space, then you're going to have some thinking to do.
On the other hand, if you can live with the code internally producing all combinations, valid or not, then it should be simple.
Make a new enumerator, one which you can call that next_perm method on, and have this internally use the other enumerator, the one that produces every combination.
Then simply make the outer enumerator run in a while loop asking the inner one for more permutations until you find one that is valid, then produce that.
Pseudo-code for this:
generator1:
when called, yield the next combination
generator2:
internally keep a generator1 object
when called, keep asking generator1 for a new combination
check the combination
if valid, then yield it

Resources