Find the permutations where no element stays in place - permutation

I'm working with permutations where each element is different from its original location. I would like an algorithm that given {an input length, row and digit}, will give me the output number. Here's an example:
If the input length is four, then all the permutations of 0123 are:
0123,0132,0213,0231,0312,0321,
1023,1032,1203,1230,1302,1320,
2013,2031,2103,2130,2301,2310,
3012,3021,3102,3120,3201,3210
The permutations in which no digit is in the same place (every digit has moved):
1032,1230,1302,
2031,2301,2310,
3012,3201,3210
Numbering starts at 0 so if the input to the function is {4,0,0}, the output should be the 0th (leftmost) digit of the 0th (first) permutation. First digit of 1032 is 1.
If the input is {4,1,1} then the output is the the second digit of 1230, which is 2.
The row number might be greater the nubmer of permutations. In that case, take the remainder modulo the number of permutations (in the above case, row modulo 9).
In the c language would be great.
(It's not homework, it's for work. Cuckoo hashing if you must know. I'd like to randomly select the swaps that I'll be making at each stage to see if it's better than BFS when the number of tables is greater than two.)

Why not just build a tree and iterate through it?
For example, if you have the digits 0123, then you know that the left most digit can be only from the set {1,2,3}. This would act as your first level in your tree.
Then, if you go down the path beginning with 1, you only have three options, {0, 2, 3}. If you go down the path beginning with 2 in the first level, you only have two options {0,3} (since you can't use 1 in the second digit from the left and the 2 was already used (you could pop the 2 from your list of choices)), etc.
The thing to watch out for in this approach is if you get to the end of a branch with 3 being your only option, in which case, you would just delete it.
For n > 10 generating all permutations and then filtering can become problematic. I think building out the tree would trim this significantly.
You can build the tree on the fly if need be. Your order can be defined by how you traverse the tree.

Brute-force approach in Python (you may use it to test your C implementation):
#!/usr/bin/env python
from itertools import ifilter, islice, permutations
def f(length, row, digit):
"""
>>> f(4, 0, 0)
1
>>> f(4, 1, 1)
2
"""
# 1. enumerate all permutations of range (range(3) -> [0,1,2], ..)
# 2. filter out permutations that have digits inplace
# 3. get n-th permutation (n -> row)
# 4. get n-th digit of the permutation (n -> digit)
return nth(ifilter(not_inplace, permutations(range(length))), row)[digit]
def not_inplace(indexes):
"""Return True if all indexes are not on their places.
"""
return all(i != d for i, d in enumerate(indexes))
def nth(iterable, n, default=None):
"""Return the nth item or a default value.
http://docs.python.org/library/itertools.html#recipes
"""
return next(islice(iterable, n, None), default)
if __name__=="__main__":
import doctest; doctest.testmod()

Related

Search specific permutation of permutationsubset with constraints

Iam searching one permutation P consisting of p1...pn of following subset S.
S is defined of the Labels L.
L1...Lk. Where a L contains pi...pj.
Where the inverse of P has at most k-1 decreasing adjecent Elements. k <= n.
Example:
n := 4
k := 2
L1 := 1,2
L2 := 3,4
L := L1,L2,L1,L2
S := 1324,1423,2314,2413
one solution would be P := 1342
no solution would be P := 3142 because decreasing adjecent elements are 2 but only max1 ist allowed because k =2.
Exists therefor an algorithm to find P of S defined by L?
Currently I use bruteforce to figure one permutation P, but its getting very fast unusable slow.
So each of L1, ..., Lk is a consecutive set of elements. At each place we see Li, Lj in the definition of L, one of three things is true:
i < j in which case it is ascending.
i = j in which case it could be ascending or descending.
i > j in which case it must be descending.
By counting the number of places where case 3 is true, we get a minimum number of descending elements already in the definition of L.
Next, for each Li we have a pattern we can write down with len(Li)-1 ; and , where a ; means that there are elements of other Ljs between two members of Li, and , means that Li elements are adjacent and so the order of the elements may result in a descent. We want to know, "For each possible number of descents within Li, how many permutations of Li have that number of descents?"
We will think of building the permutations as follows:
The first element goes at position 0.
The second element goes to position 0 or 1. (If at 0, the first element is moved.)
The third element goes to position 0, 1, or 2.
etc
A descent is when the next element is smaller than the previous, at a transition matching a ,.
We actually will want the following data structure for later use:
cache[Li] gives:
by how many elements are chosen:
by the last element chosen:
by the number of descents we will add:
how many ways of finishing this permutation
So we can write a recursive function that takes:
The pattern for Li.
How many elements have been chosen.
What index was last chosen.
It then returns a dictionary mapping descents to count of ways to finish the permutation for Li.
Memoize that and we get our desired data structure.
Now we'll repeat the idea. We want:
cache2[i] gives:
by number of descents to use:
how many permutations of L[i], L[i+1], ..., L[k] meet it.
Again we can write a recursive function using cache to calculate this, and we can memoize it to get cache2.
And NOW we can reverse the process.
We know how many descents came from the definition of L.
We know the distribution of remaining descents from cache2[1], so we can randomly pick how many descents there will be meeting our condition among L1...Lk.
For L1...Lk we can look at cache[L1][1][0] and cache2[i+1] to figure out how many descents there will be within Li with the correct probability.
For each Li we can look at how many descents we want to wind up with, its pattern, and cache2[Li] to figure out a random sequence of inserts winding up with the right pattern. The first insert is always at 0. After that you always know the size, and where the last insert was, and how many descents are left. So for each possiblenext insert you figure out if it counts as a descent (look at both pattern, and whether it is before the last insert), and the number of ways to finish from there. Then you can choose the next insert randomly with the right possibility.
For each Li we can turn the pattern of inserts into the list of values in order. (I will explain this step more.)
We can now follow the pattern of L and fill in all of the values.
Now for step 5, let's illustrate with your example from the chat. Suppose that L2 = [4, 5, 6] and the pattern of inserts we came up with was [0, 1, 0]. How do we figure out the arrangement of values?
Well first we do our inserts:
[1]
[1, 2]
[3, 1, 2]
This says that the first element (4) goes to the third place, the second (5) to the first, and the third (6) to the second. So our permutation for L2 is [5, 6, 4].
This will be a lot of code to write. But it will be polynomial. Specifically if m is the count of the most common label, cache will have total size at most O(k m^2). Thanks to memoization, each entry takes O(m) to calculate. Everything else is small relative to that. So total space is O(k m^2) and time is O(k m^3).

How to find Longest non-decreasing Subsequence containing duplicates in O(n) or O(nlogn)?

We know about an algorithm that will find the Longest Increasing subsequence in O(nlogn). I was wondering whether we can find the Longest non-decreasing subsequence with similar time complexity?
For example, consider an array : (4,10,4,8,9).
The longest increasing subsequence is (4,8,9).
And a longest non-decreasing subsequence would be (4,4,8,9).
First, here’s a “black box” approach that will let you find the longest nondecreasing subsequence using an off-the-shelf solver for longest increasing subsequences. Let’s take your sample array:
4, 10, 4, 8, 9
Now, imagine we transformed this array as follows by adding a tiny fraction to each number:
4.0, 10.1, 4.2, 8.3, 9.4
Changing the numbers this way will not change the results of any comparisons between two different integers, since the integer components have a larger magnitude difference than the values after the decimal point. However, if you compare the two 4s now, the latter 4 compares bigger than the previous one. If you now find the longest nondecreasing subsequence, you get back [4.0, 4.2, 8.3, 9.4], which you can then map back to [4, 4, 8, 9].
More generally, if you’re working with an array of n integer values, you can add i / n to each of the numbers, where i is its index, and you’ll be left with a sequence of distinct numbers. From there running a regular LIS algorithm will do the trick.
If you can’t work with fractions this way, you could alternatively multiply each number by n and then add in i, which also works.
On the other hand, suppose you have the code for a solver for LIS and want to convert it to one that solves the longest nondecreasing subsequence problem. The reasoning above shows that if you treat later copies of numbers as being “larger” than earlier copies, then you can just use a regular LIS. Given that, just read over the code for LIS and find spots where comparisons are made. When a comparison is made between two equal values, break the tie by considering the later appearance to be bigger than the earlier one.
I think the following will work in O(nlogn):
Scan the array from right to left, and for each element solve a subproblem of finding a longest subsequence starting from the given element of the array. E.g. if your array has indices from 0 to 4, then you start with the subarray [4,4] and check what's the longest sequence starting from 4, then you check subarray [3,4] and what's the longest subsequence starting from 3, next [2,4], and so on, until [0,4]. Finally, you choose the longest subsequence established in either of the steps.
For the last element (so subarray [4,4]) the longest sequence is always of length 1.
When in the next iteration you consider another element to the left (e.g., in the second step you consider the subarray [3,4], so the new element is element with the index 3 in the original array) you check if that element is not greater than some of the elements to its right. If so, you can take the result for some element from the right and add one.
For instance:
[4,4] -> longest sequence of length 1 (9)
[3,4] -> longest sequence of length 2 (8,9) 1+1 (you take the longest sequence from above which starts with 9 and add one to its length)
[2,4] -> longest sequence of length 3 (4,8,9) 2+1 (you take the longest sequence from above, i.e. (8,9), and add one to its length)
[1,4] -> longest sequence of length 1 (10) nothing to add to (10 is greater than all the elements to its right)
[0,4] -> longest sequence of length 4 (4,4,8,9) 3+1 (you take the longest sequence above, i.e. (4,8,9), and add one to its length)
The main issue is how to browse all the candidates to the right in logarithmic time. For that you keep a sorted map (a balanced binary tree). The keys are the already visited elements of the array. The values are the longest sequence lengths obtainable from that element. No need to store duplicates - among duplicate keys store the entry with largest value.

algorithm which finds the numbers in a sequence which appear 3 times or more, and prints their indexes

Suppose I input a sequence of numbers which ends with -1.
I want to print all the values of the sequence that occur in it 3 times or more, and also print their indexes in the sequence.
For example , if the input is : 2 3 4 2 2 5 2 4 3 4 2 -1
so the expected output in that case is :
2: 0 3 4 6 10
4: 2 7 9
First I thought of using quick-sort , but then I realized that as a result I will lose the original indexes of the sequence. I also have been thinking of using count, but that sequence has no given range of numbers - so maybe count will be no good in that case.
Now I wonder if I might use an array of pointers (but how?)
Do you have any suggestions or tips for an algorithm with time complexity O(nlogn) for that ? It would be very appreciated.
Keep it simple!
The easiest way would be to scan the sequence and count the number of occurrence of each element, put the elements that match the condition in an auxiliary array.
Then, for each element in the auxiliary array, scan the sequence again and print out the indices.
First of all, sorry for my bad english (It's not my language) I'll try my best.
So similar to what #vvigilante told, here is an algorithm implemented in python (it is in python because is more similar to pseudo code, so you can translate it to any language you want, and moreover I add a lot of comment... hope you get it!)
from typing import Dict, List
def three_or_more( input_arr:int ) -> None:
indexes: Dict[int, List[int]] = {}
#scan the array
i:int
for i in range(0, len(input_arr)-1):
#create list for the number in position i
# (if it doesn't exist)
#and append the number
indexes.setdefault(input_arr[i],[]).append(i)
#for each key in the dictionary
n:int
for n in indexes.keys():
#if the number of element for that key is >= 3
if len(indexes[n]) >= 3:
#print the key
print("%d: "%(n), end='')
#print each element int the current key
el:int
for el in indexes[n]:
print("%d,"%(el), end='')
#new line
print("\n", end='')
#call the function
three_or_more([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])
Complexity:
The first loop scan the input array = O(N).
The second one check for any number (digit) in the array,
since they are <= N (you can not have more number than element), so it is O(numbers) the complexity is O(N).
The loop inside the loop go through all indexes corresponding to the current number...
the complexity seem to be O(N) int the worst case (but it is not)
So the complexity would be O(N) + O(N)*O(N) = O(N^2)
but remember that the two nest loop can at least print all N indexes, and since the indexes are not repeated the complexity of them is O(N)...
So O(N)+O(N) ~= O(N)
Speaking about memory it is O(N) for the input array + O(N) for the dictionary (because it contain all N indexes) ~= O(N).
Well if you do it in c++ remember that maps are way slower than array, so if N is small, you should use an array of array (or std::vector> ), else you can also try an unordered map that use hashes
P.S. Remember that get the size of a vector is O(1) time because it is a difference of pointers!
Starting with a sorted list is a good idea.
You could create a second array of original indices and duplicate all of the memory moves for the sort on the indices array. Then checking for triplicates is trivial and only requires sort + 1 traversal.

Generating also non-unique (duplicated) permutations

I've written a basic permutation program in C.
The user types a number, and it prints all the permutations of that number.
Basically, this is how it works (the main algorithm is the one used to find the next higher permutation):
int currentPerm = toAscending(num);
int lastPerm = toDescending(num);
int counter = 1;
printf("%d", currentPerm);
while (currentPerm != lastPerm)
{
counter++;
currentPerm = nextHigherPerm(currentPerm);
printf("%d", currentPerm);
}
However, when the number input includes repeated digits - duplicates - some permutations are not being generated, since they're duplicates. The counter shows a different number than it's supposed to - Instead of showing the factorial of the number of digits in the number, it shows a smaller number, of only unique permutations.
For example:
num = 1234567
counter = 5040 (!7 - all unique)
num = 1123456
counter = 2520
num = 1112345
counter = 840
I want to it to treat repeated/duplicated digits as if they were different - I don't want to generate only unique permutations - but rather generate all the permutations, regardless of whether they're repeated and duplicates of others.
Uhm... why not just calculate the factorial of the length of the input string then? ;)
I want to it to treat repeated/duplicated digits as if they were
different - I don't want to calculate only the number of unique
permutations.
If the only information that nextHigherPerm() uses is the number that's passed in, you're out of luck. Consider nextHigherPerm(122). How can the function know how many versions of 122 it has already seen? Should nextHigherPerm(122) return 122 or 212? There's no way to know unless you keep track of the current state of the generator separately.
When you have 3 letters for example ABC, you can make: ABC, ACB, BAC, BCA, CAB, CBA, 6 combinations (6!). If 2 of those letters repeat like AAB, you can make: AAB, ABA, BAA, IT IS NOT 3! so What is it? From where does it comes from? The real way to calculate it when a digit or letter is repeated is with combinations -> ( n k ) = n! / ( n! * ( n! - k! ) )
Let's make another illustrative example: AAAB, then the possible combinations are AAAB, AABA, ABAA, BAAA only four combinations, and if you calcualte them by the formula 4C3 = 4.
How is the correct procedure to generate all these lists:
Store the digits in an array. Example ABCD.
Set the 0 element of the array as the pivot element, and exclude it from the temp array. A {BCD}
Then as you want all the combinations (Even the repeated), move the elements of the temporal array to the right or left (However you like) until you reach the n element.
A{BCD}------------A{CDB}------------A{DBC}
Do the second step again but with the temp array.
A{B{CD}}------------A{C{DB}}------------A{D{BC}}
Do the third step again but inside the second temp array.
A{B{CD}}------------A{C{DB}}------------A{D{BC}}
A{B{DC}}------------A{C{BD}}------------A{D{CB}}
Go to the first array and move the array, BCDA, set B as pivot, and do this until you find all combinations.
Why not convert it to a string then treat your program like an anagram generator?

Algorithm to find "most common elements" in different arrays

I have for example 5 arrays with some inserted elements (numbers):
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
I need to find most common elements in those arrays and every element should go all the way till the end (see example below). In this example that would be the bold combination (or the same one but with "30" on the end, it's the "same") because it contains the smallest number of different elements (only two, 4 and 2/30).
This combination (see below) isn't good because if I have for ex. "4" it must "go" till it ends (next array mustn't contain "4" at all). So combination must go all the way till the end.
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
EDIT2: OR
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
OR anything else is NOT good.
Is there some algorithm to speed this thing up (if I have thousands of arrays with hundreds of elements in each one)?
To make it clear - solution must contain lowest number of different elements and the groups (of the same numbers) must be grouped from first - larger ones to the last - smallest ones. So in upper example 4,4,4,2 is better then 4,2,2,2 because in first example group of 4's is larger than group of 2's.
EDIT: To be more specific. Solution must contain the smallest number of different elements and those elements must be grouped from first to last. So if I have three arrrays like
1,2,3
1,4,5
4,5,6
Solution is 1,1,4 or 1,1,5 or 1,1,6 NOT 2,5,5 because 1's have larger group (two of them) than 2's (only one).
Thanks.
EDIT3: I can't be more specific :(
EDIT4: #spintheblack 1,1,1,2,4 is the correct solution because number used first time (let's say at position 1) can't be used later (except it's in the SAME group of 1's). I would say that grouping has the "priority"? Also, I didn't mention it (sorry about that) but the numbers in arrays are NOT sorted in any way, I typed it that way in this post because it was easier for me to follow.
Here is the approach you want to take, if arrays is an array that contains each individual array.
Starting at i = 0
current = arrays[i]
Loop i from i+1 to len(arrays)-1
new = current & arrays[i] (set intersection, finds common elements)
If there are any elements in new, do step 6, otherwise skip to 7
current = new, return to step 3 (continue loop)
print or yield an element from current, current = arrays[i], return to step 3 (continue loop)
Here is a Python implementation:
def mce(arrays):
count = 1
current = set(arrays[0])
for i in range(1, len(arrays)):
new = current & set(arrays[i])
if new:
count += 1
current = new
else:
print " ".join([str(current.pop())] * count),
count = 1
current = set(arrays[i])
print " ".join([str(current.pop())] * count)
>>> mce([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]])
4 4 4 2
If all are number lists, and are all sorted, then,
Convert to array of bitmaps.
Keep 'AND'ing the bitmaps till you hit zero. The position of the 1 in the previous value indicates the first element.
Restart step 2 from the next element
This has now turned into a graphing problem with a twist.
The problem is a directed acyclic graph of connections between stops, and the goal is to minimize the number of lines switches when riding on a train/tram.
ie. this list of sets:
1,4,8,10 <-- stop A
1,2,3,4,11,15 <-- stop B
2,4,20,21 <-- stop C
2,30 <-- stop D, destination
He needs to pick lines that are available at his exit stop, and his arrival stop, so for instance, he can't pick 10 from stop A, because 10 does not go to stop B.
So, this is the set of available lines and the stops they stop on:
A B C D
line 1 -----X-----X-----------------
line 2 -----------X-----X-----X-----
line 3 -----------X-----------------
line 4 -----X-----X-----X-----------
line 8 -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
If we consider that a line under consideration must go between at least 2 consecutive stops, let me highlight the possible choices of lines with equal signs:
A B C D
line 1 -----X=====X-----------------
line 2 -----------X=====X=====X-----
line 3 -----------X-----------------
line 4 -----X=====X=====X-----------
line 8 -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
He then needs to pick a way that transports him from A to D, with the minimal number of line switches.
Since he explained that he wants the longest rides first, the following sequence seems the best solution:
take line 4 from stop A to stop C, then switch to line 2 from C to D
Code example:
stops = [
[1, 4, 8, 10],
[1,2,3,4,11,15],
[2,4,20,21],
[2,30],
]
def calculate_possible_exit_lines(stops):
"""
only return lines that are available at both exit
and arrival stops, discard the rest.
"""
result = []
for index in range(0, len(stops) - 1):
lines = []
for value in stops[index]:
if value in stops[index + 1]:
lines.append(value)
result.append(lines)
return result
def all_combinations(lines):
"""
produce all combinations which travel from one end
of the journey to the other, across available lines.
"""
if not lines:
yield []
else:
for line in lines[0]:
for rest_combination in all_combinations(lines[1:]):
yield [line] + rest_combination
def reduce(combination):
"""
reduce a combination by returning the number of
times each value appear consecutively, ie.
[1,1,4,4,3] would return [2,2,1] since
the 1's appear twice, the 4's appear twice, and
the 3 only appear once.
"""
result = []
while combination:
count = 1
value = combination[0]
combination = combination[1:]
while combination and combination[0] == value:
combination = combination[1:]
count += 1
result.append(count)
return tuple(result)
def calculate_best_choice(lines):
"""
find the best choice by reducing each available
combination down to the number of stops you can
sit on a single line before having to switch,
and then picking the one that has the most stops
first, and then so on.
"""
available = []
for combination in all_combinations(lines):
count_stops = reduce(combination)
available.append((count_stops, combination))
available = [k for k in reversed(sorted(available))]
return available[0][1]
possible_lines = calculate_possible_exit_lines(stops)
print("possible lines: %s" % (str(possible_lines), ))
best_choice = calculate_best_choice(possible_lines)
print("best choice: %s" % (str(best_choice), ))
This code prints:
possible lines: [[1, 4], [2, 4], [2]]
best choice: [4, 4, 2]
Since, as I said, I list lines between stops, and the above solution can either count as lines you have to exit from each stop or lines you have to arrive on into the next stop.
So the route is:
Hop onto line 4 at stop A and ride on that to stop B, then to stop C
Hop onto line 2 at stop C and ride on that to stop D
There are probably edge-cases here that the above code doesn't work for.
However, I'm not bothering more with this question. The OP has demonstrated a complete incapability in communicating his question in a clear and concise manner, and I fear that any corrections to the above text and/or code to accommodate the latest comments will only provoke more comments, which leads to yet another version of the question, and so on ad infinitum. The OP has gone to extraordinary lengths to avoid answering direct questions or to explain the problem.
I am assuming that "distinct elements" do not have to actually be distinct, they can repeat in the final solution. That is if presented with [1], [2], [1] that the obvious answer [1, 2, 1] is allowed. But we'd count this as having 3 distinct elements.
If so, then here is a Python solution:
def find_best_run (first_array, *argv):
# initialize data structures.
this_array_best_run = {}
for x in first_array:
this_array_best_run[x] = (1, (1,), (x,))
for this_array in argv:
# find the best runs ending at each value in this_array
last_array_best_run = this_array_best_run
this_array_best_run = {}
for x in this_array:
for (y, pattern) in last_array_best_run.iteritems():
(distinct_count, lengths, elements) = pattern
if x == y:
lengths = tuple(lengths[:-1] + (lengths[-1] + 1,))
else :
distinct_count += 1
lengths = tuple(lengths + (1,))
elements = tuple(elements + (x,))
if x not in this_array_best_run:
this_array_best_run[x] = (distinct_count, lengths, elements)
else:
(prev_count, prev_lengths, prev_elements) = this_array_best_run[x]
if distinct_count < prev_count or prev_lengths < lengths:
this_array_best_run[x] = (distinct_count, lengths, elements)
# find the best overall run
best_count = len(argv) + 10 # Needs to be bigger than any possible answer.
for (distinct_count, lengths, elements) in this_array_best_run.itervalues():
if distinct_count < best_count:
best_count = distinct_count
best_lengths = lengths
best_elements = elements
elif distinct_count == best_count and best_lengths < lengths:
best_count = distinct_count
best_lengths = lengths
best_elements = elements
# convert it into a more normal representation.
answer = []
for (length, element) in zip(best_lengths, elements):
answer.extend([element] * length)
return answer
# example
print find_best_run(
[1,4,8,10],
[1,2,3,4,11,15],
[2,4,20,21],
[2,30]) # prints [4, 4, 4, 30]
Here is an explanation. The ...this_run dictionaries have keys which are elements in the current array, and they have values which are tuples (distinct_count, lengths, elements). We are trying to minimize distinct_count, then maximize lengths (lengths is a tuple, so this will prefer the element with the largest value in the first spot) and are tracking elements for the end. At each step I construct all possible runs which are a combination of a run up to the previous array with this element next in sequence, and find which ones are best to the current. When I get to the end I pick the best possible overall run, then turn it into a conventional representation and return it.
If you have N arrays of length M, this should take O(N*M*M) time to run.
I'm going to take a crack here based on the comments, please feel free to comment further to clarify.
We have N arrays and we are trying to find the 'most common' value over all arrays when one value is picked from each array. There are several constraints 1) We want the smallest number of distinct values 2) The most common is the maximal grouping of similar letters (changing from above for clarity). Thus, 4 t's and 1 p beats 3 x's 2 y's
I don't think either problem can be solved greedily - here's a counterexample [[1,4],[1,2],[1,2],[2],[3,4]] - a greedy algorithm would pick [1,1,1,2,4] (3 distinct numbers) [4,2,2,2,4] (two distinct numbers)
This looks like a bipartite matching problem, but I'm still coming up with the formulation..
EDIT : ignore; This is a different problem, but if anyone can figure it out, I'd be really interested
EDIT 2 : For anyone that's interested, the problem that I misinterpreted can be formulated as an instance of the Hitting Set problem, see http://en.wikipedia.org/wiki/Vertex_cover#Hitting_set_and_set_cover. Basically the left hand side of the bipartite graph would be the arrays and the right hand side would be the numbers, edges would be drawn between arrays that contain each number. Unfortunately, this is NP complete, but the greedy solutions described above are essentially the best approximation.

Resources