Adding random numbers to each number in a column of a table - arrays

I want to add random numbers to each element within the column of a table. This is is what I've been doing, but my approach adds same random number to all elements in that specific column.
NewEdge(:,2) = NewEdge(:,2)+ randi(3);
How can I add a different random number to each element?

NewEdge(:,2) = NewEdge(:,2)+ randi(3,size(NewEdge(:,2)); % Looks pretty
NewEdge(:,2) = NewEdge(:,2)+ randi(3,size(NewEdge,1),1); % Probably faster
randi(3) is a single scalar. Random, but still one number. You want to add a vector of random numbers, so call randi(imax,sz1,sz2), where imax is your maximum allowable integer, 3 in your case, and sz1,sz2 the sizes of your desired matrix, in this case you want as many rows as contained in NewEdge, and only a single column.

Related

How to count for 2 different arrays how many times the elements are repeated, in MATLAB?

I have array A (44x1) and B (41x1), and I want to count for both arrays how many times the elements are repeated. And if the repeated values are present in both arrays, I want their counting to be divided (for instance: value 0.5 appears 500 times in A and 350 times in B, so now divide 500 by 350).
I have to do this for bigger arrays as well, so I was thinking about using a looping (but no idea how to do it on MATLAB).
I got what I want on python:
import pandas as pd
data1 = pd.read_excel('C:/Users/Desktop/Python/data1.xlsx')
data2 = pd.read_excel('C:/Users/Desktop/Python/data2.xlsx')
for i in data1['Mag'].value_counts() & data2['Mag'].value_counts():
a = data1['Mag'].value_counts()/data2['Mag'].value_counts()
print(a)
break
Any idea of how to do the same on MATLAB? Thanks!
Since you can enumerate all valid earthquake magnitude values, you could use:
% Make up some data
A=randi([2 58],[100 1])/10;
B=randi([2 58],[20 1])/10;
% Round data to nearest tenth
%A=round(A,1); %uncomment if necessary
%B=round(B,1); %same
% Divide frequencies
validmags=0.2:0.1:5.8;
Afreqs=sum(double( abs(A-validmags)<1e-6 ),1); %relies on implicit expansion; A must be a column vector and validmags must be a row vector; dimension argument to sum() only to remind user; double() not really needed
Bfreqs=sum(double( abs(B-validmags)<1e-6 ),1); %same
Bfreqs./Afreqs, %for a fancier version: [{'Magnitude'} num2cell(validmags) ; {'Freq(B)/Freq(A)'} num2cell(Bfreqs./Afreqs)].'
The last line will produce NaN for 0/0, +Inf for nn/0, and 0 for 0/nn.
You could also use uniquetol, align the unique values of each vector, and divide the respective absolute frequencies. But I think the above approach is cleaner and easier to understand.

Split matrix into several depending on value in Matlab

I have a cell array that I need to split into several matrices so that I can take the sum of subsets of the data. This is a sample of what I have:
A = {'M00.300', '1644.07';...
'M00.300', '9745.42'; ...
'M00.300', '2232.88'; ...
'M00.600', '13180.82'; ...
'M00.600', '2755.19'; ...
'M00.600', '15800.38'; ...
'M00.900', '18088.11'; ...
'M00.900', '1666.61'};
I want the sum of the second columns for each of 'M00.300', 'M00.600', and 'M00.900'. For example, to correspond to 'M00.300' I would want 1644.07 + 9745.42 + 2232.88.
I don't want to just hard code it because each data set is different, so I need the code to work for different size cell arrays.
I'm not sure of the best way to do this, I was going to begin by looping through A and comparing the strings in the first column and creating matrices within that loop, but that sounded messy and not efficient.
Is there a simpler way to do this?
Classic use of accumarray. You would use the first column as an index and the second column as the values associated with each index. accumarray works where you group values that belong to the same index together and you apply a function to those values. In your case, you'd use the default behaviour and sum things together.
However, you'll need to convert the first column into numeric labels. The third output of unique will help you do this. You'll also need to convert the second column into a numeric array and so str2double is a perfect way to do this.
Without further ado:
[val,~,id] = unique(A(:,1)); %// Get unique values and indices
out = accumarray(id, str2double(A(:,2))); %// Aggregate the groups and sum
format long g; %// For better display of precision
T = table(val, out) %// Display on a nice table
I get this:
>> T = table(val, out)
T =
val out
_________ ________
'M00.300' 13622.37
'M00.600' 31736.39
'M00.900' 19754.72
The above uses the table class that is available from R2013b and onwards. If you don't have this, you can perhaps use a for loop and print out each cell and value separately:
for idx = 1 : numel(out)
fprintf('%s: %f\n', val{idx}, out(idx));
end
We get:
M00.300: 13622.370000
M00.600: 31736.390000
M00.900: 19754.720000

Randomize matrix elements between two values while keeping row and column sums fixed (MATLAB)

I have a bit of a technical issue, but I feel like it should be possible with MATLAB's powerful toolset.
What I have is a random n by n matrix of 0's and w's, say generated with
A=w*(rand(n,n)<p);
A typical value of w would be 3000, but that should not matter too much.
Now, this matrix has two important quantities, the vectors
c = sum(A,1);
r = sum(A,2)';
These are two row vectors, the first denotes the sum of each column and the second the sum of each row.
What I want to do next is randomize each value of w, for example between 0.5 and 2. This I would do as
rand_M = (0.5-2).*rand(n,n) + 0.5
A_rand = rand_M.*A;
However, I don't want to just pick these random numbers: I want them to be such that for every column and row, the sums are still equal to the elements of c and r. So to clean up the notation a bit, say we define
A_rand_c = sum(A_rand,1);
A_rand_r = sum(A_rand,2)';
I want that for all j = 1:n, A_rand_c(j) = c(j) and A_rand_r(j) = r(j).
What I'm looking for is a way to redraw the elements of rand_M in a sort of algorithmic fashion I suppose, so that these demands are finally satisfied.
Now of course, unless I have infinite amounts of time this might not really happen. I therefore accept these quantities to fall into a specific range: A_rand_c(j) has to be an element of [(1-e)*c(j),(1+e)*c(j)] and A_rand_r(j) of [(1-e)*r(j),(1+e)*r(j)]. This e I define beforehand, say like 0.001 or something.
Would anyone be able to help me in the process of finding a way to do this? I've tried an approach where I just randomly repick the numbers, but this really isn't getting me anywhere. It does not have to be crazy efficient either, I just need it to work in finite time for networks of size, say, n = 50.
To be clear, the final output is the matrix A_rand that satisfies these constraints.
Edit:
Alright, so after thinking a bit I suppose it might be doable with some while statement, that goes through every element of the matrix. The difficult part is that there are four possibilities: if you are in a specific element A_rand(i,j), it could be that A_rand_c(j) and A_rand_r(i) are both too small, both too large, or opposite. The first two cases are good, because then you can just redraw the random number until it is smaller than the current value and improve the situation. But the other two cases are problematic, as you will improve one situation but not the other. I guess it would have to look at which criteria is less satisfied, so that it tries to fix the one that is worse. But this is not trivial I would say..
You can take advantage of the fact that rows/columns with a single non-zero entry in A automatically give you results for that same entry in A_rand. If A(2,5) = w and it is the only non-zero entry in its column, then A_rand(2,5) = w as well. What else could it be?
You can alternate between finding these single-entry rows/cols, and assigning random numbers to entries where the value doesn't matter.
Here's a skeleton for the process:
A_rand=zeros(size(A)) is the matrix you are going to fill
entries_left = A>0 is a binary matrix showing which entries in A_rand you still need to fill
col_totals=sum(A,1) is the amount you still need to add in every column of A_rand
row_totals=sum(A,2) is the amount you still need to add in every row of A_rand
while sum( entries_left(:) ) > 0
% STEP 1:
% function to fill entries in A_rand if entries_left has rows/cols with one nonzero entry
% you will need to keep looping over this function until nothing changes
% update() A_rand, entries_left, row_totals, col_totals every time you loop
% STEP 2:
% let (i,j) be the indeces of the next non-zero entry in entries_left
% assign a random number to A_rand(i,j) <= col_totals(j) and <= row_totals(i)
% update() A_rand, entries_left, row_totals, col_totals
end
update()
A_rand(i,j) = random_value;
entries_left(i,j) = 0;
col_totals(j) = col_totals(j) - random_value;
row_totals(i) = row_totals(i) - random_value;
end
Picking the range for random_value might be a little tricky. The best I can think of is to draw it from a relatively narrow distribution centered around N*w*p where p is the probability of an entry in A being nonzero (this would be the average value of row/column totals).
This doesn't scale well to large matrices as it will grow with n^2 complexity. I tested it for a 200 by 200 matrix and it worked in about 20 seconds.

MATLAB Efficiently find the row that contains two of three elements in a large matrix

I have a large matrix, let's call it A, which has dimension Mx3, e.g. M=4000 rows x 3 columns. Each row in the matrix contains three numbers, eg. [241 112 478]. Out of these three numbers, we can construct three pairs, eg. [241 112], [112 478], [241 478]. Of the other 3999 rows:
For each of the three pairs, exactly one row of M (only one) will contain the same pair. However, the order of the numbers could be scrambled. For example, exactly one row will read: [333 478 112]. No other row will have both 478 and 112. I am interested in finding the index of that row, for each of the three pairs. The output should then be another matrix, call it B, with same dimensions 4000x3, where each row has the indices of the rows in the original matrix A that share a pair of numbers.
No other row will contain the same three numbers.
Other rows may contain none of the numbers or one of the numbers.
Here's a function that accomplishes this, but is very slow - I'd like to know if there is a more efficient way. Thanks in advance!
M=size(A,1); % no elements
B=zeros(M,3);
for j=1:M
l=1;
k=1;
while l<4 % there cant be more than 3
if k~=j
s=sum(ismember(A(j,:),A(k,:)));
if s==2
B(j,l)=k;
l=l+1;
end
end
k=k+1;
end
For loops are not needed, just use ismember the following way:
row_id1=find(sum(ismember(M,[241 112]),2)>1);
row_id2=find(sum(ismember(M,[478 112]),2)>1);
row_id3=find(sum(ismember(M,[478 241]),2)>1);
each row_id will give you the row index of the pair in that line regardless of the order at which it appears.
The only assumption made here is that one of the pair numbers you look for will not appear twice in a row, (i.e. [112 333 112]). If that assumption is wrong, this can be addressed using unique.

Django: Creating model instances with a random integer field value that average up to a specified number

I have a model like so:
class RunnerStat(models.Model):
id_card= models.CharField(max_length=32)
miles = models.PositiveSmallIntegerField()
last_modified = models.DateField(auto_now=True)
I want to create several RunnerStat instances with random miles values that average up to a specific number.
I know this might involve some statistics (distribution, etc.). Does anyone have any pointers? Or done something similar and could share some code?
Example: Create 100 RunnerStat objects with random miles values that average out to 10.
If the average has to be around the the given number but not exactly it you can use the random.gauss(mu, sigma) from the random module. This will create a more natural random set of of values that have a mean (average) around the given value for mu with a standard deviation of sigma. The more runners you create the closer the mean will get to the desired value.
import random
avg = 10
stddev = 5
n = random.gauss(avg,stddev)
for r in range(100):
r = RunnerStat(miles=avg+n)
r.save()
If you need the average to be the exact number then you could always create a runner (or more reasonably a few runners) that counter balance what ever your current difference from the mean is.
Well for something to average to a specific number, they need to add up to a specific number. For instance, if you want 100 items to average to 10, the 100 items need to add up to 1000 since 1000/100 = 10.
One way to do this, which isn't completely random is to generate a random number, then both subtract and add that to your average, generating two RunnerStat items.
So you do something like this (note this is from my head and untested):
import random
avg = 10
n = random.randint(0,5)
r1 = RunnerStat(miles=avg-n)
r2 = RunnerStat(miles=avg+n)
r1.save()
r2.save()
Of course fill in the other fields too. I just put the miles in the RunnerStats. The downside is that your RunnerStats must be an even number. You could write it to pass in a number and if it is odd the last one must be exactly the number you want the average to be.

Resources