How to convert two associated arrays so that elements are evenly distributed?

There are two arrays: an array of images and an array of the corresponding labels (e.g. pictures of figures and their values).
The occurrences in the labels are unevenly distributed.
What I want is to cut both arrays in such a way that the labels are evenly distributed, e.g. every label occurs 2 times.
To test it, I created two 1D arrays, and it worked:
import numpy as np

labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1,])
images = np.array(['A','B','C','C','A','B','A','C','A','C','A',])
x, y = zip(*sorted(zip(images, labels)))
label = list(set(y))
new_images = []
new_labels = []
amount = 2
for i in label:
    start = y.index(i)
    stop = start + amount
    new_images = np.append(new_images, x[start: stop])
    new_labels = np.append(new_labels, y[start: stop])
What I get/want is this:
new_labels: [ 1. 1. 2. 2. 3. 3.]
new_images: ['A' 'A' 'B' 'B' 'C' 'C']
(The arrays do not need to be sorted.)
But when I tried it with the real data (images.shape = (35000, 32, 32, 3), labels.shape = (35000,)), I got an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The existing question "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" does not help me much.
I think my solution is quite dirty anyway. Is there a proper way to do it?
Thank you very much in advance!

Python compares tuples element by element, and in zip(images, labels) the image is the first element of every tuple, so sorted has to compare images with each other. In your 1D test the images are strings, which compare fine. In your real data each image is an array, and comparing two arrays does not produce a single True/False, so sorted raises this error.
Let me explain in more detail:
x, y = zip(*sorted(zip(images, labels)))
First, you zip your images and labels. This creates tuples of corresponding elements: the first image paired with the first label, the second image with the second label, and so on.
In your real data, each label is paired with an array of shape (32, 32, 3).
Second, you sort all those tuples. Tuples are compared element by element: the first elements are compared first, and only if they are equal does the comparison move on to the second elements. Here the first element of each tuple is an image array, and two arrays cannot be reduced to a single True/False, so an error is thrown.
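A minimal reproduction of the failure, using two made-up arrays a and b in place of real images (the data is an assumption for illustration): comparing two NumPy arrays yields an elementwise boolean array, so any tuple comparison that has to compare two image arrays raises the ValueError.

```python
import numpy as np

# Two hypothetical "images" with the same shape as the real data
a = np.zeros((32, 32, 3))
b = np.ones((32, 32, 3))

# Comparing two arrays gives an elementwise boolean array, not one True/False
print((a == b).shape)  # (32, 32, 3)

# Tuple comparison needs a single True/False for the first elements,
# so sorting (image, label) pairs fails as soon as two images are compared
try:
    sorted([(a, 1), (b, 0)])
except ValueError as err:
    print("ValueError:", err)
```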
You can solve this by explicitly telling sorted to sort only on the label, which is the second tuple element (index 1):
x, y = zip(*sorted(zip(images, labels), key=lambda pair: pair[1]))
If performance matters, operator.itemgetter is typically a little faster than a lambda:
from operator import itemgetter
x, y = zip(*sorted(zip(images, labels), key=itemgetter(1)))
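As a cleaner alternative to sorting Python tuples, the selection can also be done with NumPy index arrays, so the images are never compared at all. This is a sketch, not the answer's method: balance_by_label is a hypothetical helper name, it keeps the first amount samples of each label in their original order, and a label with fewer than amount occurrences simply contributes all of its samples.

```python
import numpy as np

def balance_by_label(images, labels, amount):
    """Keep the first `amount` samples of every label (hypothetical helper)."""
    labels = np.asarray(labels)
    keep = []
    for lab in np.unique(labels):
        # Positions where this label occurs, in original order
        idx = np.flatnonzero(labels == lab)
        keep.append(idx[:amount])
    keep = np.concatenate(keep)
    # Index both arrays with the same positions so image/label pairs stay aligned
    return images[keep], labels[keep]

labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1])
images = np.array(['A', 'B', 'C', 'C', 'A', 'B', 'A', 'C', 'A', 'C', 'A'])
new_images, new_labels = balance_by_label(images, labels, 2)
print(new_labels)  # [1 1 2 2 3 3]
print(new_images)  # ['A' 'A' 'B' 'B' 'C' 'C']
```

Because the indexing is along the first axis only, the same call works unchanged for an images array of shape (35000, 32, 32, 3).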

Related

Count elements in 1st array less than or equal to elements in 2nd array python

I have an array A of 21381120 elements ranging over [0, 1]. I need to construct a new array B in which element i contains the number of elements in A less than or equal to A[i].
My attempt:
A = np.random.random(10000) # for reproducibility
g = np.sort(A)
B = [np.sum(g<=element) for element in A]
I am still using a loop, which takes too much time. Since I have to do this several times, I was wondering whether there is a better way to do it.
EDIT
I gave an example of the array A for reproducibility. This does what is expected, but I need it to be faster (for arrays with 2e9 elements).
For instance if:
A = [0.1,0.01,0.3,0.5,1]
I expect the output to be
B = [2, 1, 3, 4, 5]
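You could use binary search to speed up searching in a sorted array. In NumPy, binary search is np.searchsorted. Pass side='right' so that elements equal to A[i] are counted too (the default side='left' counts only the strictly smaller ones):
A = np.random.rand(10000) # for reproducibility
g = np.sort(A)
B = [np.searchsorted(g, element, side='right') for element in A]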
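np.searchsorted also accepts a whole array of query values, so the Python-level loop can be dropped entirely. A sketch, with count_less_equal as a hypothetical helper name; it is one sort plus n binary searches, all in compiled code:

```python
import numpy as np

def count_less_equal(A):
    """For each A[i], count how many elements of A are <= A[i]."""
    A = np.asarray(A)
    g = np.sort(A)
    # side='right' returns the insertion point after any equal values,
    # which is exactly the number of elements in g that are <= the query
    return np.searchsorted(g, A, side='right')

print(count_less_equal([0.1, 0.01, 0.3, 0.5, 1]))  # [2 1 3 4 5]
```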
Sorting looks like the way to go, because in a sorted array A without repeated values, the number of elements less than or equal to A[i] is exactly i + 1 (with 0-based indexing).
However, if an element is repeated, you'll have to look at the nearest element that's to the right of A[i]:
A = [1,2,3,4,4,4,5,6]   # A[3] == A[4] == A[5]
Here, the number of elements <= A[3] is 3 + <number of repeated 4's>. Maybe you could roll your own sorting algorithm that would keep track of such repetitions. Or count the repetitions before sorting the array.
Then the final formula would be (0-based, with k the index of the first occurrence of the value A[k]):
N(<= A[k]) = k + <number of elements equal to A[k]>
So the speed of your code would mainly depend on the speed of the sorting algorithm.
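The "index of the first occurrence plus number of equal elements" idea does not need a custom sorting algorithm: np.unique already sorts the data and can return how often each distinct value occurs. A running total of those counts gives, for each distinct value, how many elements are less than or equal to it, and return_inverse maps the totals back to the original positions. A sketch, with count_less_equal_unique as a hypothetical helper name:

```python
import numpy as np

def count_less_equal_unique(A):
    """Number of elements <= A[i] for each i, using distinct values and their counts."""
    A = np.asarray(A)
    # inverse maps each A[i] to its position among the sorted distinct values;
    # counts[j] is how many times the j-th distinct value occurs
    _, inverse, counts = np.unique(A, return_inverse=True, return_counts=True)
    # Running total: number of elements <= each distinct value
    cum = np.cumsum(counts)
    return cum[inverse]

print(count_less_equal_unique([1, 2, 3, 4, 4, 4, 5, 6]))  # [1 2 3 6 6 6 7 8]
```

For the repeated 4's this gives 3 + 3 = 6, matching the formula above.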

Swift For loop Enumeration in Sort differs

I'm trying to manually sort the array below.
The issue is that the result differs when I read the item from the for-loop enumeration (marked //(2)) versus reading it via a subscript (marked //(1)). It could be a minor issue I'm overlooking. I appreciate your time.
var mySortArray : Array<Int> = []
mySortArray = [1,5,3,3,21,11,2]
for (itemX,X) in mySortArray.enumerated() {
    for (itemY,Y) in mySortArray.enumerated() {
        // if mySortArray[itemX] < mySortArray[itemY] // (1)
        if X < Y // (2)
        {
            //Swap the position of item in the array
            mySortArray.swapAt(itemX, itemY)
        }
    }
}
print(mySortArray)
// Prints [1, 2, 3, 3, 5, 11, 21] ( for condition // (1))
// Prints [2, 1, 3, 5, 11, 3, 21] ( for condition // (2))
mySortArray = [1,5,3,3,21,11,2]
print("Actual Sort Order : \(mySortArray.sorted())")
// Prints Actual Sort Order : [1, 2, 3, 3, 5, 11, 21]
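The problem here is that .enumerated() returns a new sequence and iterates that. Because Swift arrays are value types, that sequence works on a copy of the array as it was when the loop started. Think of it as a new array.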
So, you are working with 3 different arrays here.
You have an unsorted array that you want to fix. Let's call it w (the "working array"). Then you have the arrays x and y that the two loops iterate over.
So, w is [1,5,3,3,21,11,2], x and y are effectively the same as w at the beginning.
Now suppose you reach two values that need to swap: valueX is 5, at index 1 of x, and valueY is 3, at index 2 of y.
And you swap them... in w.
So now w is [1,3,5,3,21,11,2] but x and y are unchanged.
So now your indexes are thrown off: you are comparing items in x with items in y and then swapping them in w, which is a completely different array.
You need to work with one array the whole time.
Of course, there is also the issue that this approach is slow: it is O(n^2), and there are much more efficient ways of sorting.
If you are doing this as an exercise in writing sort algorithms, keep going. If not, you should really use the .sort() method.
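What you really want is to not use .enumerated() at all. Just use integer indices to read (and swap) values in w, for example:
var w = [1, 5, 3, 3, 21, 11, 2]
for indexX in 0..<w.count {
    for indexY in indexX..<w.count {
        // Compare the current values in w itself, not in a copy
        if w[indexY] < w[indexX] {
            // Swap in the same array that is being compared
            w.swapAt(indexX, indexY)
        }
    }
}
print(w) // [1, 2, 3, 3, 5, 11, 21]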

An array of arrays of different sizes

I'm learning R and I'd like to make an "array of arrays" (not sure if the expression is correct) by inserting, for example, these values
N_seq = c(10,50,100,500,1000)
inside this function (not correct):
x = rnorm(N_seq,3.2,1)
The desired result should be an object made of five arrays (as length(N_seq) = 5), where each one is the result of x for the corresponding value of N_seq: x[1] has the values of rnorm(N_seq[1], 3.2, 1) with length 10, x[2] has the values of rnorm(N_seq[2], 3.2, 1) with length 50, etc.
For a ragged array, use a "list". This is a special type of "vector" in R. A list can hold not only vectors of different lengths in its elements, but also a different type of object in each element.
The lapply function for "list apply" is frequently used to process a list and / or return a list. For your task, you can do:
lapply(N_seq, FUN = rnorm, mean = 3.2, sd = 1)
lapply applies the function FUN to each element of N_seq; mean = 3.2 and sd = 1 are additional arguments passed on to FUN, which is rnorm here. If you store the result as x, x[[1]] is the vector of length 10 (x[1] would be a sub-list containing it).

Finding an element of a structure based on a field value

I have a 1x10 structure array with plenty of fields and I would like to remove from the struct array the element with a specific value on one of the field variables.
I know the value I'm looking for and the field I should look in, and I also know how to delete the element from the struct array once I find it. The question is how (if possible) to identify it elegantly, without a brute-force solution, i.e. a for-loop that goes through the elements of the struct array comparing each with the value I'm looking for.
Sample setup: buyers is a 1x10 struct array with the fields
id, n, Budget
and the value to find among the id values is, e.g., id_test = 12.
You can use the fact that if you have an array of structs, and you use the dot referencing, this creates a comma-separated list. If you enclose this in [] it will attempt to create an array and if you enclose it in {} it will be coerced into a cell array.
a(1).value = 1;
a(2).value = 2;
a(3).value = 3;
% Into an array
[a.value]
% 1 2 3
% Into a cell array
{a.value}
% [1] [2] [3]
So to do your comparison, you can convert the field you care about into either an array or a cell array. The comparison then yields a logical array, which you can use to index into the original structure.
For example
% Some example data
s = struct('id', {1, 2, 3}, 'n', {'a', 'b', 'c'}, 'Budget', {100, 200, 300});
% Remove all entries with id == 2
s = s([s.id] ~= 2);
% Remove entries that have an id of 2 or 3
s = s(~ismember([s.id], [2 3]));
% Find ones with an `n` of 'a' (uses a cell array since it's strings)
s = s(ismember({s.n}, 'a'));

How do concatenation and indexing differ for cells and arrays in MATLAB?

I am a little confused about the usage of cells and arrays in MATLAB and would like some clarification on a few points. Here are my observations:
An array can dynamically adjust its own memory to allow for a dynamic number of elements, while cells seem to not act in the same way:
a=[]; a=[a 1]; b={}; b={b 1};
Several elements can be retrieved from cells, but it doesn't seem like they can be from arrays:
a={'1' '2'}; figure; plot(...); hold on; plot(...); legend(a{1:2});
b=['1' '2']; figure; plot(...); hold on; plot(...); legend(b(1:2));
%# b(1:2) is an array, not its elements, so it is wrong with legend.
Are these correct? What are some other differences in usage between cells and arrays?
Cell arrays can be a little tricky since you can use the [], (), and {} syntaxes in various ways for creating, concatenating, and indexing them, although they each do different things. Addressing your two points:
To grow a cell array, you can use one of the following syntaxes:
b = [b {1}]; % Make a cell with 1 in it, and append it to the existing
% cell array b using []
b = {b{:} 1}; % Get the contents of the cell array as a comma-separated
% list, then regroup them into a cell array along with a
% new value 1
b{end+1} = 1; % Append a new cell to the end of b using {}
b(end+1) = {1}; % Append a new cell to the end of b using ()
When you index a cell array with (), it returns a subset of cells in a cell array. When you index a cell array with {}, it returns a comma-separated list of the cell contents. For example:
b = {1 2 3 4 5}; % A 1-by-5 cell array
c = b(2:4); % A 1-by-3 cell array, equivalent to {2 3 4}
d = [b{2:4}]; % A 1-by-3 numeric array, equivalent to [2 3 4]
For d, the {} syntax extracts the contents of cells 2, 3, and 4 as a comma-separated list, then uses [] to collect these values into a numeric array. Therefore, b{2:4} is equivalent to writing b{2}, b{3}, b{4}, or 2, 3, 4.
With respect to your call to legend, the syntax legend(a{1:2}) is equivalent to legend(a{1}, a{2}), or legend('1', '2'). Thus two arguments (two separate characters) are passed to legend. The syntax legend(b(1:2)) passes a single argument, which is a 1-by-2 string '12'.
Every cell array is an array! From this answer:
[] is an array-related operator. An array can be of any type - array of numbers, char array (string), struct array or cell array. All elements in an array must be of the same type!
Example: [1,2,3,4]
{} is a type. Imagine you want to put items of different type into an array - a number and a string. This is possible with a trick - first put each item into a container {} and then make an array with these containers - cell array.
Example: [{1},{'Hallo'}] with shorthand notation {1, 'Hallo'}
