list in python, loop for, array - arrays

as a green hand python programmer, I have a little problem, I'll appreciate if somebody can help!!
I have two lists
a list of random repeated numbers(unknow size) like:
number = [44,198,57,48,658,24,7,44,44,44..]
for n in number
I want to give these numbers to a list of people in order, one number for one person. If a number repeat, the program will find out the person whom got this number when the first time it shows up. It means I want to print a list like
people = [1,2,3,4,5,6,7,1,1,1...]
print people

Store the mapping from numbers to indexes in a dict, and update it as you go.
numbers = [44,198,57,48,658,24,7,44,44,44.]
index_mapping = {}
indexes = []
next_index = 1
for number in numbers:
if number in index_mapping:
# already seen
indexes.append(index_mapping[number])
else:
# a new one
indexes.append(next_index)
index_mapping[number] = next_index
next_index += 1
print indexes

Related

Best way to efficiently manipulate lists within 3d array?

I am given n number of lists within a 3d array. These lists represent time. So for example one of these lists may be (1,4) and this means 'busy' for 1:00, 2:00, 3:00, 4:00. So whats an efficient way of turning (1,4) into (1,2,3,4) for all n number of lists within a 3d array. Keep in mind n could be up to 10000. I'm probably being a moron here but thanks for help.
///requests is the array being given. eg [(1,4),(2,9),(4,5)]
numberOfRequests=len(requests)
mostTaxi=1
talArray=[]
//Very ineffiecient way of solving current problem
for x in range(0,numberOfRequests):
for y in range((requests[x][0]),(requests[x][1])+1):
talArray.append(y)
//
busiestTime=max(set(talArray), key = talArray.count)
mostTaxi=talArray.count(busiestTime)
return mostTaxi
'''
If the tuples represent (start, end) time of a task, all You need to do is just get end - start from each tuple (it will represent the number of hours spend) and just get the max.
times = [
[(1,4),(2,9),(4,5)],
[(1,4),(1,20),(4,11)]
]
def calculate_time(t):
start, end = t
return end - start
longest = max((max(map(calculate_time, each)) for each in times))
print("The longest task took", longest , "hours")

Finding equal rows between multiple arrays of different sizes by voting

Let's say I have 5 arrays of different sizes. All of the arrays have the same number of columns, in my case 2, but a different number of rows. I need to find the elements of the rows that appear in at least 3 of such arrays.
Right now, I compare two arrays using ismember, then compare that result with the third array and then save the row values which occur in all the three arrays. I do this for every possible combination of 3 arrays; basically in my case, I have 10 of such operations in total. It's like choosing three out of 5 without repetitions.
It works, but I am looking for a more efficient implementation. In particular, I was looking for any implementation that can perform this by voting. It's like having sets of different sizes and trying to find the elements that appear in the majority of sets, in my case 3 arrays out of 5.
Not sure what you mean by voting but I think does what you are looking for.
It creates 1 big unique matrix of all the rows from the arrays. The does an ismember by rows of the unique array with each individual array. The ismember is summed together to get a count of how many times each unique row exists across your set of arrays.
You can then use that count to return a new array that has at least minNum occurrences.
You would call it like this:
>> [outRows, uRows, memCount]= getRowDuplicates(3,a,b,c,d,e)
Where a,b,c,d,e are you arrays and 3 is the minimum number of occurrences
function [outRows, uRows, memCount]= getRowDuplicates(minNum,varargin)
uRows = unique(vertcat(varargin{:}),'rows');
memCount = false(size(uRows,1),1);
for j = 1:nargin-1
memCount = memCount + ismember(uRows,varargin{j},'rows');
end
rowIdx = memCount >= minNum;
outRows = uRows(rowIdx,:);
Thanks to Aero Engy for the solution! This is my own attempt, after rewriting my initial, not very efficient, implementation. I thought someone might find it useful:
function [majorityPts] = MajorityPointSelection(Mat, numMajority)
result = vertcat(Mat{:});
allUniq = unique(result(:,1:2),'rows');
numUniq = size(allUniq,1);
allUniq = [allUniq zeros(numUniq,1); zeros(size(result,1)-numUniq, 3)];
for i = 1:numUniq
sumNumRow = sum(result(:,1:2) == allUniq(i,1:2),1);
allUniq(i,3) = sumNumRow(1);
end
allUniq(numUniq+1:end,:) = [];
majorityPts = allUniq(allUniq(:,3)>=numMajority,1:2);
end
Here Mat is a cell that contains all of the arrays that I want to compare in order to find the rows that appear in at least numMajority of them, where in my case numMajority = 3. Basically, I first dump all of the arrays in one big matrix (result), and then find the unique rows of this matrix. Finally, I count the number of each unique row and return the points that appear in the majority number of arrays as majorityPts.

How do I check to see if two (or more) elements of an array/vector are the same?

For one of my homework problems, we had to write a function that creates an array containing n random numbers between 1 and 365. (Done). Then, check if any of these n birthdays are identical. Is there a shorter way to do this than doing several loops or several logical expressions?
Thank you!
CODE SO FAR, NOT DONE YET!!
function = [prob] bdayprob(N,n)
N = input('Please enter the number of experiments performed: N = ');
n = input('Please enter the sample size: n = ');
count = 0;
for(i=1:n)
x(i) = randi(365);
if(x(i)== x)
count = count + 1
end
return
If I'm interpreting your question properly, you want to check to see if generating n integers or days results in n unique numbers. Given your current knowledge in MATLAB, it's as simple as doing:
n = 30; %// Define sample size
N = 10; %// Define number of trials
%// Define logical array where each location tells you whether
%// birthdays were repeated for a trial
check = false(1, N);
%// For each trial...
for idx = 1 : N
%// Generate sample size random numbers
days = randi(365, n, 1);
%// Check to see if the total number of unique birthdays
%// are equal to the sample size
check(idx) = numel(unique(days)) == n;
end
Woah! Let's go through the code slowly shall we? We first define the sample size and the number of trials. We then specify a logical array where each location tells you whether or not there were repeated birthdays generated for that trial. Now, we start with a loop where for each trial, we generate random numbers from 1 to 365 that is of n or sample size long. We then use unique and figure out all unique integers that were generated from this random generation. If all of the birthdays are unique, then the total number of unique birthdays generated should equal the sample size. If we don't, then we have repeats. For example, if we generated a sample of [1 1 1 2 2], the output of unique would be [1 2], and the total number of unique elements is 2. Since this doesn't equal 5 or the sample size, then we know that the birthdays generated weren't unique. However, if we had [1 3 4 6 7], unique would give the same output, and since the output length is the same as the sample size, we know that all of the days are unique.
So, we check to see if this number is equal to the sample size for each iteration. If it is, then we output true. If not, we output false. When I run this code on my end, this is what I get for check. I set the sample size to 30 and the number of trials to be 10.
check =
0 0 1 1 0 0 0 0 1 0
Take note that if you increase the sample size, there is a higher probability that you will get duplicates, because randi can be considered as sampling with replacement. Therefore, the larger the sample size, the higher the chance of getting duplicate values. I made the sample size small on purpose so that we can see that it's possible to get unique days. However, if you set it to something like 100, or 200, you will most likely get check to be all false as there will most likely be duplicates per trial.
Here are some more approaches that avoid loops. Let
n = 20; %// define sample size
x = randi(365,n,1); %// generate n values between 1 and 365
Any of the following code snippets returns true (or 1) if there are two identical values in x, and false (or 0) otherwise:
Sort and then check if any two consecutive elements are the same:
result = any(diff(sort(x))==0);
Do all pairwise comparisons manually; remove self-pairs and duplicate pairs; and check if any of the remaining comparisons is true:
result = nnz(tril(bsxfun(#eq, x, x.'),-1))>0;
Compute the distance between distinct values, considering each pair just once, and then check if any distance is 0:
result = any(pdist(x(:))==0);
Find the number of occurrences of the most common value (mode):
[~, occurs] = mode(x);
result = occurs>1;
I don't know if I'm supposed to solve the problem for you, but perhaps a few hints may lead you in the right direction (besides I'm not a matlab expert so it will be in general terms):
Maybe not, but you have to ask yourself what they expect of you. The solution you propose requires you to loop through the array in two nested loops which will mean n*(n-1)/2 times through the loop (ie quadratic time complexity).
There are a number of ways you can improve the time complexity of the problem. The most straightforward would be to have a 365 element table where you can keep track if a particular number has been seen yet - which would require only a single loop (ie linear time complexity), but perhaps that's not what they're looking for either. But maybe that solution is a little bit ad-hoc? What we're basically looking for is a fast lookup if a particular number has been seen before - there exists more memory efficient structures that allows look up in O(1) time and O(log n) time (if you know these you have an arsenal of tools to use).
Then of course you could use the pidgeonhole principle to provide the answer much faster in some special cases (remember that you only asked to determine whether two or more numbers are equal or not).

Generating also non-unique (duplicated) permutations

I've written a basic permutation program in C.
The user types a number, and it prints all the permutations of that number.
Basically, this is how it works (the main algorithm is the one used to find the next higher permutation):
int currentPerm = toAscending(num);
int lastPerm = toDescending(num);
int counter = 1;
printf("%d", currentPerm);
while (currentPerm != lastPerm)
{
counter++;
currentPerm = nextHigherPerm(currentPerm);
printf("%d", currentPerm);
}
However, when the number input includes repeated digits - duplicates - some permutations are not being generated, since they're duplicates. The counter shows a different number than it's supposed to - Instead of showing the factorial of the number of digits in the number, it shows a smaller number, of only unique permutations.
For example:
num = 1234567
counter = 5040 (!7 - all unique)
num = 1123456
counter = 2520
num = 1112345
counter = 840
I want to it to treat repeated/duplicated digits as if they were different - I don't want to generate only unique permutations - but rather generate all the permutations, regardless of whether they're repeated and duplicates of others.
Uhm... why not just calculate the factorial of the length of the input string then? ;)
I want to it to treat repeated/duplicated digits as if they were
different - I don't want to calculate only the number of unique
permutations.
If the only information that nextHigherPerm() uses is the number that's passed in, you're out of luck. Consider nextHigherPerm(122). How can the function know how many versions of 122 it has already seen? Should nextHigherPerm(122) return 122 or 212? There's no way to know unless you keep track of the current state of the generator separately.
When you have 3 letters for example ABC, you can make: ABC, ACB, BAC, BCA, CAB, CBA, 6 combinations (6!). If 2 of those letters repeat like AAB, you can make: AAB, ABA, BAA, IT IS NOT 3! so What is it? From where does it comes from? The real way to calculate it when a digit or letter is repeated is with combinations -> ( n k ) = n! / ( n! * ( n! - k! ) )
Let's make another illustrative example: AAAB, then the possible combinations are AAAB, AABA, ABAA, BAAA only four combinations, and if you calcualte them by the formula 4C3 = 4.
How is the correct procedure to generate all these lists:
Store the digits in an array. Example ABCD.
Set the 0 element of the array as the pivot element, and exclude it from the temp array. A {BCD}
Then as you want all the combinations (Even the repeated), move the elements of the temporal array to the right or left (However you like) until you reach the n element.
A{BCD}------------A{CDB}------------A{DBC}
Do the second step again but with the temp array.
A{B{CD}}------------A{C{DB}}------------A{D{BC}}
Do the third step again but inside the second temp array.
A{B{CD}}------------A{C{DB}}------------A{D{BC}}
A{B{DC}}------------A{C{BD}}------------A{D{CB}}
Go to the first array and move the array, BCDA, set B as pivot, and do this until you find all combinations.
Why not convert it to a string then treat your program like an anagram generator?

Help with a special case of permutations algorithm (not the usual)

I have always been interested in algorithms, sort, crypto, binary trees, data compression, memory operations, etc.
I read Mark Nelson's article about permutations in C++ with the STL function next_perm(), very interesting and useful, after that I wrote one class method to get the next permutation in Delphi, since that is the tool I presently use most. This function works on lexographic order, I got the algo idea from a answer in another topic here on stackoverflow, but now I have a big problem. I'm working with permutations with repeated elements in a vector and there are lot of permutations that I don't need. For example, I have this first permutation for 7 elements in lexographic order:
6667778 (6 = 3 times consecutively, 7 = 3 times consecutively)
For my work I consider valid perm only those with at most 2 elements repeated consecutively, like this:
6676778 (6 = 2 times consecutively, 7 = 2 times consecutively)
In short, I need a function that returns only permutations that have at most N consecutive repetitions, according to the parameter received.
Does anyone know if there is some algorithm that already does this?
Sorry for any mistakes in the text, I still don't speak English very well.
Thank you so much,
Carlos
My approach is a recursive generator that doesn't follow branches that contain illegal sequences.
Here's the python 3 code:
def perm_maxlen(elements, prefix = "", maxlen = 2):
if not elements:
yield prefix + elements
return
used = set()
for i in range(len(elements)):
element = elements[i]
if element in used:
#already searched this path
continue
used.add(element)
suffix = prefix[-maxlen:] + element
if len(suffix) > maxlen and len(set(suffix)) == 1:
#would exceed maximum run length
continue
sub_elements = elements[:i] + elements[i+1:]
for perm in perm_maxlen(sub_elements, prefix + element, maxlen):
yield perm
for perm in perm_maxlen("6667778"):
print(perm)
The implentation is written for readability, not speed, but the algorithm should be much faster than naively filtering all permutations.
print(len(perm_maxlen("a"*100 + "b"*100, "", 1)))
For example, it runs this in milliseconds, where the naive filtering solution would take millenia or something.
So, in the homework-assistance kind of way, I can think of two approaches.
Work out all permutations that contain 3 or more consecutive repetitions (which you can do by treating the three-in-a-row as just one psuedo-digit and feeding it to a normal permutation generation algorithm). Make a lookup table of all of these. Now generate all permutations of your original string, and look them up in lookup table before adding them to the result.
Use a recursive permutation generating algorthm (select each possibility for the first digit in turn, recurse to generate permutations of the remaining digits), but in each recursion pass along the last two digits generated so far. Then in the recursively called function, if the two values passed in are the same, don't allow the first digit to be the same as those.
Why not just make a wrapper around the normal permutation function that skips values that have N consecutive repetitions? something like:
(pseudocode)
funciton custom_perm(int max_rep)
do
p := next_perm()
while count_max_rerps(p) < max_rep
return p
Krusty, I'm already doing that at the end of function, but not solves the problem, because is need to generate all permutations and check them each one.
consecutive := 1;
IsValid := True;
for n := 0 to len - 2 do
begin
if anyVector[n] = anyVector[n + 1] then
consecutive := consecutive + 1
else
consecutive := 1;
if consecutive > MaxConsecutiveRepeats then
begin
IsValid := False;
Break;
end;
end;
Since I do get started with the first in lexographic order, ends up being necessary by this way generate a lot of unnecessary perms.
This is easy to make, but rather hard to make efficient.
If you need to build a single piece of code that only considers valid outputs, and thus doesn't bother walking over the entire combination space, then you're going to have some thinking to do.
On the other hand, if you can live with the code internally producing all combinations, valid or not, then it should be simple.
Make a new enumerator, one which you can call that next_perm method on, and have this internally use the other enumerator, the one that produces every combination.
Then simply make the outer enumerator run in a while loop asking the inner one for more permutations until you find one that is valid, then produce that.
Pseudo-code for this:
generator1:
when called, yield the next combination
generator2:
internally keep a generator1 object
when called, keep asking generator1 for a new combination
check the combination
if valid, then yield it

Resources