How to generate random pairs of strings contained in a numpy array

Suppose I have the following numpy array of strings:
a = numpy.array(['apples', 'foobar', 'bananas', 'cowboy'])
I would like to generate len(a)/2 random pairs of strings from that array, collected in another numpy array, such that no element repeats within a pair and no element appears in more than one pair. I also need to fix the random number generator so the pairs are always the same no matter how many times the algorithm is run. Is it possible to do that?

numpy.random.choice
The random.choice method is probably going to achieve what you're after.
import numpy as np
n_samples = 2  # number of strings per pair
# Set the random state to the same value each time,
# this ensures the pseudorandom array that's generated is the same each time.
random_state = 42
np.random.seed(random_state)
a = np.array(['apples', 'foobar', 'bananas', 'cowboy'])
new_a = np.random.choice(a, (len(a) // 2, n_samples), replace=False)
Passing replace=False ensures that each element is used only once, which produces the following output:
array([['foobar', 'cowboy'],
       ['bananas', 'apples']], dtype='<U7')
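On NumPy 1.17+, the same idea can also be written with the newer Generator API, which keeps the seeding local instead of mutating global state. A minimal sketch (shuffling once and reshaping into pairs, which likewise guarantees no element repeats):
import numpy as np

a = np.array(['apples', 'foobar', 'bananas', 'cowboy'])
# a seeded Generator gives reproducible draws without touching the global RNG
rng = np.random.default_rng(42)
# shuffle all elements once, then cut the result into len(a)//2 pairs
pairs = rng.permutation(a).reshape(len(a) // 2, 2)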

Related

How to count for 2 different arrays how many times the elements are repeated, in MATLAB?

I have arrays A (44x1) and B (41x1), and I want to count, for both arrays, how many times each element is repeated. If a repeated value is present in both arrays, I want the counts divided (for instance: the value 0.5 appears 500 times in A and 350 times in B, so divide 500 by 350).
I have to do this for bigger arrays as well, so I was thinking about using a loop (but I have no idea how to do it in MATLAB).
I got what I want in Python:
import pandas as pd
data1 = pd.read_excel('C:/Users/Desktop/Python/data1.xlsx')
data2 = pd.read_excel('C:/Users/Desktop/Python/data2.xlsx')
for i in data1['Mag'].value_counts() & data2['Mag'].value_counts():
    a = data1['Mag'].value_counts() / data2['Mag'].value_counts()
    print(a)
    break
Any idea of how to do the same on MATLAB? Thanks!
Since you can enumerate all valid earthquake magnitude values, you could use:
% Make up some data
A=randi([2 58],[100 1])/10;
B=randi([2 58],[20 1])/10;
% Round data to nearest tenth
%A=round(A,1); %uncomment if necessary
%B=round(B,1); %same
% Enumerate all valid magnitudes
validmags = 0.2:0.1:5.8;
% Count occurrences of each valid magnitude. This relies on implicit expansion:
% A must be a column vector and validmags a row vector. The dimension argument
% to sum() is only a reminder, and double() is not strictly needed.
Afreqs = sum(double(abs(A - validmags) < 1e-6), 1);
Bfreqs = sum(double(abs(B - validmags) < 1e-6), 1); % same for B
% Divide the frequencies
Bfreqs./Afreqs
% For a fancier, labelled version:
% [{'Magnitude'} num2cell(validmags) ; {'Freq(B)/Freq(A)'} num2cell(Bfreqs./Afreqs)].'
The division produces NaN for 0/0, +Inf for n/0, and 0 for 0/n (n nonzero).
You could also use uniquetol, align the unique values of each vector, and divide the respective absolute frequencies. But I think the above approach is cleaner and easier to understand.
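For comparison, the division the asker's Python loop was reaching for needs no loop at all: dividing two pandas value_counts results aligns them on the magnitude values, much like the MATLAB code above. A minimal sketch, assuming the data1 and data2 frames from the question:
counts_a = data1['Mag'].value_counts()
counts_b = data2['Mag'].value_counts()
# division aligns on the index; the result is NaN for magnitudes present in only one array
ratio = counts_a / counts_b
print(ratio)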

Match each element of one array with elements of other array without loops

I want to match each element of one array (lessnum) with elements of another array (cc), and then multiply by a number from a third array (gl). I am doing this with loops, but the arrays are very large, so it takes a couple of hours. Is it possible to do this without loops, or otherwise make it faster? Here is the code I am using:
uniquec = sort(unique(cc));
maxc = max(uniquec);
c35p = 0.35*maxc;
lessnum = uniquec(uniquec <= c35p);
greaternum = uniquec(uniquec > c35p);
gl = linspace(1, 2, length(lessnum));
gr = linspace(2, 1, length(greaternum));
newC = zeros(size(cc));
for i = 1:length(gl)
    newC(cc == lessnum(i)) = cc(cc == lessnum(i)).*gl(i);
end
for i = 1:length(gr)
    newC(cc == greaternum(i)) = cc(cc == greaternum(i)).*gr(i);
end
Instead of storing the values that are less than or greater than c35p in lessnum and greaternum, store logical indices of those values. That way, you can index newC directly with them and multiply by your linearly spaced values in one vectorized step.
Further modifications are explained in the code itself. If anything is unclear, read the help for unique.
Here is the modified code (I assume that cc is a one-dimensional array):
% randomly generate a cc vector
cc = randi(100, 1, 10);
% modified code below
% 'sorted' makes the built-in sorting of unique explicit; the third output
% gives the indices needed to rebuild the original array from uniquec
[uniquec, ~, induniquec] = unique(cc, 'sorted');
maxc = max(uniquec);
c35p = 0.35*maxc;
lessnum = uniquec <= c35p;   % logical mask, instead of lessnum=uniquec(uniquec<=c35p)
greaternum = uniquec > c35p; % logical mask, instead of greaternum=uniquec(uniquec>c35p)
gl = linspace(1, 2, sum(lessnum));
gr = linspace(2, 1, sum(greaternum));
% Now there is no need for 'for' loops: first scale the unique values,
% then regenerate the full-size array using the indices obtained above
newC = uniquec;
newC(lessnum) = newC(lessnum) .* gl;
newC(greaternum) = newC(greaternum) .* gr;
newC = newC(induniquec);
This new code will run much faster than the original, but it can be much more memory intensive, depending on the number of unique values in your original array.
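For readers coming from the numpy questions in this collection, the same trick translates almost line for line: np.unique with return_inverse plays the role of the third output of MATLAB's unique. A rough sketch with made-up data:
import numpy as np

cc = np.random.randint(1, 101, size=10).astype(float)
uniquec, induniquec = np.unique(cc, return_inverse=True)  # sorted unique values + inverse indices
c35p = 0.35 * uniquec.max()
less = uniquec <= c35p
newC = uniquec.copy()
newC[less] *= np.linspace(1, 2, less.sum())
newC[~less] *= np.linspace(2, 1, (~less).sum())
newC = newC[induniquec]  # expand back to the original shape of cc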

Get specific cells from a cell array

I have a numeric array of size 1000x1 containing values 0 and 1, called conditionArray.
I have a cell array called netNames of the same size (1000x1) whose cells contain string values (names of some circuit nets).
I want to extract from netNames the net names whose corresponding condition bit in conditionArray is 1.
E.g. if conditionArray(100) is equal to 1, extract its net name from netNames{100}.
The output of this process can be stored in a string array or cell array.
Is there a way to do this operation element-wise, or should I use a for loop?
You should check out cellfun in MATLAB any time you want to manipulate each element inside a cell array without using a for loop.
As I understand, you have:
N = 1000;
% an array with 0s and 1s (this generates random 0s and 1s):
conditionArray = randi([0, 1], N, 1);
% a cell array with strings (this generates random 5-character strings):
netNames = cell(N, 1);
netNames = cellfun(@(c) char(randi([double('a'), double('z')], 1, 5)), netNames, 'UniformOutput', false);
To extract the elements from netNames where conditionArray is 1, you can do:
netNames(conditionArray==1)
This uses logical indexing into the cell array.
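For completeness, the numpy analogue is the same one-liner with a boolean mask (the names below are made up for illustration):
import numpy as np

condition_array = np.array([1, 0, 1, 1, 0])
net_names = np.array(['net_a', 'net_b', 'net_c', 'net_d', 'net_e'])
selected = net_names[condition_array == 1]  # boolean indexing, no loop needed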

How to structure multiple python arrays for sorting

A fourier analysis I'm doing outputs 5 data fields, each of which I've collected into 1-d numpy arrays: freq bin #, amplitude, wavelength, normalized amplitude, %power.
How best to structure the data so I can sort by descending amplitude?
When testing with just one data field, I was able to use a dict as follows:
fourier_tuples = zip(range(len(fourier)), fourier)
fourier_map = dict(fourier_tuples)
import operator
fourier_sorted = sorted(fourier_map.items(), key=operator.itemgetter(1))
fourier_sorted = np.argsort(-fourier)[:3]
My intent was to add the other arrays to line 1, but this doesn't work, since a dict entry can only hold a key and a single value. (That's why this post doesn't solve my issue.)
Stepping back, is this a reasonable approach, or are there better ways to combine & sort separate arrays? Ultimately, I want to take the data values from the top 3 freqs and associated other data, and write them to an output data file.
Here's a snippet of my data:
fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0., 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751, 0.010445983875848858, 0.086141014686372669])  # % of total power
The last 4 arrays are other fields corresponding to 'fourier'. (Actual array lengths are 42, but pared down to 5 for simplicity.)
You appear to be using numpy, so here is the numpy way of doing this. You have the right function np.argsort in your post, but you don't seem to use it correctly:
order = np.argsort(amplitudes)
This is similar to your dictionary trick only it computes the inverse shuffling compared to your procedure. Btw. why go through a dictionary and not simply a list of tuples?
The contents of order are now indices into amplitudes: the first cell of order contains the position of the smallest element of amplitudes, the second cell contains the position of the next smallest, and so on. Therefore
top5 = order[:-6:-1]
Provided your data are 1-d numpy arrays, you can use top5 to extract the elements corresponding to the top 5 amplitudes by using advanced indexing:
freq_bin[top5]
amplitudes[top5]
wavelength[top5]
If you want you can group them together in columns and apply top5 to the resulting n-by-5 array:
np.c_[freq_bin, amplitudes, wavelength, ...][top5, :]
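Putting this together for the asker's actual goal, the top 3 frequencies, a minimal end-to-end sketch using the arrays defined in the question might look like:
import numpy as np

top3 = np.argsort(fourier)[::-1][:3]  # indices of the 3 largest values in fourier
for name, arr in [('freqs', freqs), ('fourier', fourier), ('wavelengths', wavelengths), ('amps', amps), ('powers', powers)]:
    print(name, arr[top3])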
If I understand correctly, you have 5 separate lists of the same length and you are trying to sort all of them based on one of them. To do that you can use either numpy or vanilla Python. Here are two examples off the top of my head (sorting is based on the 2nd list).
a = [11,13,10,14,15]
b = [2,4,1,0,3]
c = [22,20,23,25,24]
#numpy solution
import numpy as np
my_array = np.array([a,b,c])
my_sorted_array = my_array[:,my_array[1,:].argsort()]
#vanilla python solution
from operator import itemgetter
my_list = zip(a,b,c)
my_sorted_list = sorted(my_list,key=itemgetter(1))
You can then flip the array with my_sorted_array = np.fliplr(my_sorted_array) if you wish, or, if you are working with lists, reverse in place with my_sorted_list.reverse().
EDIT:
To get only the first n values, simply slice the array, similarly to what @Paul is suggesting. Slicing works like classic list slicing, with start:stop:step arguments (the step can be omitted). In your case, the top 5 columns would be [:,-5:]. So in the example above you can take the top 2 columns from each row like this:
my_sliced_sorted_array = my_sorted_array[:,-2:]
The result will be:
array([[15, 13],
       [ 3,  4],
       [24, 20]])
Hope it helps.
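A third option, closer to the question's title, is a numpy structured array: all five fields live in one array, so a single sort reorders everything together. A sketch with hypothetical field names, again assuming the arrays from the question:
import numpy as np

dt = np.dtype([('freq', float), ('fourier', float), ('wavelength', float), ('amp', float), ('power', float)])
table = np.zeros(len(fourier), dtype=dt)
table['freq'], table['fourier'], table['wavelength'] = freqs, fourier, wavelengths
table['amp'], table['power'] = amps, powers
top3 = np.sort(table, order='fourier')[::-1][:3]  # rows with the 3 largest fourier values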

Fast Random Permutation of Binary Array

For my project, I wish to quickly generate random permutations of a binary array of fixed length and a given number of 1s and 0s. Given these random permutations, I wish to add them elementwise.
I am currently using numpy's ndarray object, which is convenient for adding elementwise. My current code is as follows:
# n is the length of the array. I want to run this across a range of
# n=100 to n=1000.
row = np.zeros(n)
# m_list is a given list of integers. I am iterating over many possible
# combinations of possible values for m in m_list. For example, m_list
# could equal [5, 100, 201], for n = 500.
for m in m_list:
    row += np.random.permutation(np.concatenate([np.ones(m), np.zeros(n - m)]))
My question is: is there any faster way to do this? According to timeit, 1000000 calls of "np.random.permutation(np.concatenate([np.ones(m), np.zeros(n - m)]))" take 49.6 seconds. For my program's purposes, I'd like to decrease this by an order of magnitude. Can anyone suggest a faster way to do this?
Thank you!
For me, the version with the arrays allocated outside the loop was faster, but not by much: 8% or so, measured with cProfile:
row = np.zeros(n, dtype=np.float64)
wrk = np.zeros(n, dtype=np.float64)
for m in m_list:
    wrk[0:m] = 1.0
    wrk[m:n] = 0.0
    row += np.random.permutation(wrk)
You might try np.random.shuffle(wrk) to shuffle in place instead of having permutation return a new array, but for me the difference was negligible.
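Another direction worth timing (whether it wins depends on m, n, and the RNG used): since only the elementwise sum is needed, each draw is equivalent to choosing m distinct random positions and incrementing them, which avoids building and permuting the 0/1 array entirely. A sketch, assuming n and m_list as in the question:
import numpy as np

rng = np.random.default_rng()
row = np.zeros(n)
for m in m_list:
    # m distinct uniformly chosen positions have the same distribution as the
    # locations of the ones in a random permutation containing m ones
    idx = rng.choice(n, size=m, replace=False)
    row[idx] += 1.0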
