Shuffle, then find and replace duplicates in two dimensional array - without sorting - c

I'm looking for an efficient algorithm (or any at all) for this tricky problem. I'll simplify it here; in my application the array is about 10000 times bigger :)
I have a 2D array like this:
0 2 1 3 4
1 2 0 4 3
0 2 1 3 4
4 1 2 3 0
Yes, every row contains the values 0 to 4, but in a different order. The order matters! I can't just sort it and solve this the easy way :)
Then I shuffle it by choosing random indexes and swapping them, a couple of times. Example result:
0 1 1 1 4
1 2 2 4 3
0 2 3 3 4
4 2 0 3 0
I see duplicates in the rows, and that's not good. The algorithm should find these duplicates and replace each one with a value that will not be another duplicate in that row, for example:
0 1 2 3 4
1 2 0 4 3
0 2 3 1 4
4 2 0 3 1
Can you share your ideas? Maybe there is already a well-known algorithm for this problem? I'd be grateful for any hint.
EDIT
Clarification for T_G: After the shuffle, a row can't exchange values with other rows. The algorithm needs to find the duplicates in a row and replace each one with any value that is still available, i.e. one that is not itself a duplicate.
After shuffling:
0 1 1 1 4
1 2 2 4 3
0 2 3 3 4
4 2 0 3 0
Steps:
I have 0; I don't see any other zeros. Next.
I have 1; I see another 1; I should change it (the second one); there is no 2 in this row, so let's change this duplicate 1 to 2.
I have 1; I see another 1; I should change it (the second one); there is no 3 in this row, so let's change this duplicate 1 to 3. etc...
So if you input this row:
0 0 0 0 0 0 0 0 0
You should get:
0 1 2 3 4 5 6 7 8

Try something like this:
// Needs <stdint.h> and <string.h>
// Iterate over the matrix line by line
for (uint32_t line_no = 0; line_no < max_line_num; line_no++) {
    // Counters for each symbol 0-4; index is the symbol, value is its count
    uint8_t counters[5];
    // Clear counters before use (memset takes the destination first)
    memset(counters, 0, sizeof(counters));
    // Compute counters
    for (int i = 0; i < 5; i++)
        counters[matrix[line_no][i]]++;
    // Index of a possibly unused symbol; starts at 4
    int j = 4;
    // Iterate over the line in reverse order
    for (int i = 4; i >= 0; i--)
        if (counters[matrix[line_no][i]] > 1) { // found a duplicate
            while (counters[j] != 0) // find an unused symbol "j"
                j--;
            counters[matrix[line_no][i]]--; // decrease the duplicate's counter
            matrix[line_no][i] = j; // substitute the duplicate with symbol j
            counters[j]++; // symbol j is now used
        } // for + if
} // for lines
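For reference, here is the same idea as a self-contained C function, written as a sketch rather than a drop-in solution. The names fix_row and MAX_SYMBOLS are made up for this example, and it assumes every row holds only values in 0..n-1 with n no larger than MAX_SYMBOLS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 256 /* assumed upper bound on the row length */

/*
 * Turn a row of n values, each in 0..n-1, back into a permutation.
 * The first occurrence of every value is kept; later duplicates are
 * replaced by symbols that do not occur in the row yet.
 */
void fix_row(int *row, size_t n)
{
    size_t counters[MAX_SYMBOLS];

    assert(n <= MAX_SYMBOLS);
    if (n == 0)
        return;

    /* Count how often each symbol occurs in the row. */
    memset(counters, 0, sizeof counters);
    for (size_t i = 0; i < n; i++) {
        assert(row[i] >= 0 && (size_t)row[i] < n);
        counters[row[i]]++;
    }

    /* j walks downwards over the symbols and stops at unused ones.
     * A duplicate implies that at least one unused symbol exists, and
     * j never skips an unused symbol, so it cannot run below 0. */
    size_t j = n - 1;
    for (size_t i = n; i-- > 0; ) {        /* scan the row right to left */
        if (counters[row[i]] > 1) {        /* row[i] is a duplicate */
            while (counters[j] != 0)
                j--;
            counters[row[i]]--;
            row[i] = (int)j;
            counters[j] = 1;
        }
    }
}
```

Called on the shuffled rows from the question, fix_row(row, 5) turns 0 1 1 1 4 into 0 1 2 3 4 and 4 2 0 3 0 into 4 2 0 3 1. Scanning from the right keeps the first occurrence of each value, and the largest free symbols are handed out first.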

Related

Transforming a data file into matrix with columns of identical elements?

I have a very large data file which has a format like below:
1 2 3 4 6 7 8
1 2 3 4 6
1 2 3 5 4 6
1 2 3 4 6
1 2 3 4 6
1 2 3 4 6 8
I am trying to load this data into Matlab. My aim is to create a matrix which has identical elements per one column and if some value is missing fill it with zero. So the output will be something like below:
1 2 3 4 0 6 7 8
1 2 3 4 0 6 0 0
1 2 3 4 5 6 0 0
1 2 3 4 0 6 0 0
1 2 3 4 0 6 0 0
1 2 3 4 0 6 0 8
Can someone give me any idea/code-snippets/links to realize this?
OK. Here is how I did it (test.dat is the name of the file with the input data):
%// The first section reads the dat file and fills missing entries in columns with zeros
fid = fopen('test.dat');
textLine = fgets(fid); % Read first line.
lineCounter = 1;
while ischar(textLine)
% get into numbers array.
numbers = sscanf(textLine, '%f ');
% Put numbers into a cell array IF and only if
% you need them after the loop has exited.
% First method - each number in one cell.
for k = 1 : length(numbers)
ca{lineCounter, k} = numbers(k);
end
% Alternate way where the whole array is in one cell.
ca2{lineCounter} = numbers;
% Read the next line.
textLine = fgets(fid);
lineCounter = lineCounter + 1;
end
fclose(fid);
emptyIndex = cellfun(@isempty,ca); %# Find indices of empty cells
ca(emptyIndex) = {0}; %# Fill empty cells with 0
A=cell2mat(ca);
%// The second section will create a new matrix AA from the A matrix
%// which will have a single unique entry in each column, with missing entries as zero
uniq=unique(A);
row=size(A);
row=row(1);
%not considering zero
AA=zeros(row,uniq(end));
AA_idx=[];
for x=uniq(2):uniq(end)
AA_idxr=mod(find(A==x),row);
AA_idxr(AA_idxr==0)=row;
AA_idxc=x*ones(length(AA_idxr),1);
% AA_idxc(AA_idxc==0)=uniq(end)
c=[AA_idxr AA_idxc];
AA_idx=cat(1,AA_idx,c);
c=[];
end
for i=1:length(AA_idx)
index=AA_idx(i,:);
a=index(1);
b=index(2);
AA(a,b)=b;
end
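If you ever need the core step outside MATLAB, the placement rule used above (value v goes into column v, everything else is zero) is small enough to sketch in plain C. This is only an illustration: spread_row is a made-up name, and it assumes the values are positive integers no larger than a maximum that is known in advance.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write the n_in positive integers of `in` into `out`, which has max_val
 * slots, so that value v lands in slot v-1. All other slots are set to 0.
 */
void spread_row(const int *in, size_t n_in, int *out, size_t max_val)
{
    for (size_t c = 0; c < max_val; c++)
        out[c] = 0;

    for (size_t k = 0; k < n_in; k++) {
        assert(in[k] >= 1 && (size_t)in[k] <= max_val);
        out[in[k] - 1] = in[k];
    }
}
```

With max_val = 8, the input line 1 2 3 5 4 6 becomes 1 2 3 4 5 6 0 0, which is the third row of the desired output.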

Counting the occurance of a unique number in an array - MATLAB

I have an array that looks something like...
1 0 0 1 2 2 1 1 2 1 0
2 1 0 0 0 1 1 0 0 2 1
1 2 2 1 1 1 2 0 0 1 0
0 0 0 1 2 1 1 2 0 1 2
however my real array is (50x50).
I am relatively new to MATLAB and need to be able to count how many times each unique value occurs in each row and column; for example, there are four '1's in row-2 and three '0's in column-3. I need to be able to do this with my real array.
It would help even more if these quantities of unique values were in arrays of their own also.
PLEASE use simple language, or else i will get lost, for example if representing an array, don't call it x, but perhaps column_occurances_array... for me please :)
What I would do is iterate over each row of your matrix and calculate a histogram of occurrences for each row. Use histc to calculate the occurrences of each row. The thing that is nice about histc is that you are able to specify where the bins are to start accumulating. These correspond to the unique entries for each row of your matrix. As such, use unique to compute these unique entries.
Now, I would use arrayfun to iterate over all of your rows in your matrix, and this will produce a cell array. Each element in this cell array will give you the counts for each unique value for each row. Therefore, assuming your matrix of values is stored in A, you would simply do:
vals = arrayfun(@(x) [unique(A(x,:)); histc(A(x,:), unique(A(x,:)))], 1:size(A,1), 'uni', 0);
Now, if we want to display all of our counts, use celldisp. Using your example, and with the above code combined with celldisp, this is what I get:
vals{1} =
0 1 2
3 5 3
vals{2} =
0 1 2
5 4 2
vals{3} =
0 1 2
3 5 3
vals{4} =
0 1 2
4 4 3
What the above display is saying is that for the first row, you have 3 zeros, 5 ones and 3 twos. The second row has 5 zeros, 4 ones and 2 twos and so on. These are just for the rows. If you want to do these for columns, you have to modify your code slightly to operate along columns:
vals = arrayfun(@(x) [unique(A(:,x)) histc(A(:,x), unique(A(:,x)))].', 1:size(A,2), 'uni', 0);
By using celldisp, this is what we get:
vals{1} =
0 1 2
1 2 1
vals{2} =
0 1 2
2 1 1
vals{3} =
0 2
3 1
vals{4} =
0 1
1 3
vals{5} =
0 1 2
1 1 2
vals{6} =
1 2
3 1
vals{7} =
1 2
3 1
vals{8} =
0 1 2
2 1 1
vals{9} =
0 2
3 1
vals{10} =
1 2
3 1
vals{11} =
0 1 2
2 1 1
This means that in the first column, we see 1 zero, 2 ones and 1 two, etc. etc.
I absolutely agree with rayryeng! However, here is some code which might be easier to understand for you as a beginner. It is without cell arrays or arrayfuns and quite self-explanatory:
%% initialize your array randomly for demonstration:
numRows = 50;
numCols = 50;
yourArray = round(10*rand(numRows,numCols));
%% do some stuff of what you are asking for
% find all occurring numbers in yourArray
occVals = unique(yourArray(:));
% unique already returns them sorted, so this sort is just for convenience
occVals = sort(occVals);
% now we could create a matrix occMat_row of dimension |occVals| x numRows
% where occMat_row(i,j) represents how often the ith value occurs in the
% jth row, analogously occMat_col:
occMat_row = zeros(length(occVals),numRows);
occMat_col = zeros(length(occVals),numCols);
for k = 1:length(occVals)
occMat_row(k,:) = sum(yourArray == occVals(k),2)';
occMat_col(k,:) = sum(yourArray == occVals(k),1);
end
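The same bookkeeping can also be written with plain loops. Below is a rough C sketch of it, not a translation of either answer: count_occurrences is a made-up name, the matrix is assumed to be stored row by row in one flat array, and the values are assumed to be the integers 0..n_vals-1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how often each value 0..n_vals-1 occurs in every row and every
 * column of the rows x cols matrix `a` (stored row by row).
 *   row_counts[r * n_vals + v] = occurrences of v in row r
 *   col_counts[c * n_vals + v] = occurrences of v in column c
 */
void count_occurrences(const int *a, size_t rows, size_t cols, size_t n_vals,
                       size_t *row_counts, size_t *col_counts)
{
    for (size_t k = 0; k < rows * n_vals; k++)
        row_counts[k] = 0;
    for (size_t k = 0; k < cols * n_vals; k++)
        col_counts[k] = 0;

    /* One pass over the matrix updates both tables. */
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            int v = a[r * cols + c];
            assert(v >= 0 && (size_t)v < n_vals);
            row_counts[r * n_vals + (size_t)v]++;
            col_counts[c * n_vals + (size_t)v]++;
        }
    }
}
```

For the 4x11 example from the question with n_vals = 3, the entry for value 1 in the second row is 4 and the entry for value 0 in the third column is 3, which are the two counts mentioned in the question.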

Matlab - removing rows and columns from a matrix that contain 0's

I'm working on a problem involving beam deflections (it's not too fun :P)
I need to reduce the global stiffness matrix into the structure stiffness matrix, I do this by removing any rows and columns from the original matrix that contain a 0.
So if I have a matrix like so (let's call it K):
0 0 5 3 0 0
0 0 7 8 0 0
7 1 2 6 2 1
3 8 6 9 5 3
0 0 4 5 0 0
0 0 1 8 0 0
The reduced matrix (let's call it S) would be just
2 6
6 9
Here's what I have written so far to reduce global matrix K to stiffness matrix S
S = K;
for i = 1:length(S(:,1))
for j = 1:length(S(1,:))
if S(i,j) == 0
S(i,:) = [];
S(:,j) = [];
break;
end
end
end
However I get "Index exceeds matrix dimensions" on the line containing the "if" statement, and I'm not sure my thinking is correct on the best way to remove all rows and columns. Appreciate any feedback!
Easy:
S = K(all(K,2), all(K,1));
For an n-by-n matrix, you can alternatively try a matrix-multiplication-based approach -
K=[
0 0 5 3 2 0
0 0 7 8 7 0
7 1 6 6 2 1
3 8 6 8 5 3
0 0 4 5 5 0
5 3 7 8 1 6] %// Slightly different than the one in question
K1 = double(K~=0)
K2 = K1*K1==size(K,1)
K3 = K(K2)
S = reshape(K3,max(sum(K2,1)),max(sum(K2,2)))
Output -
S =
6 6 2
6 8 5
7 8 1
The problem is that when you remove a row or column you should not increase i or j, but MATLAB's for loop updates them automatically. Also, your algorithm cannot handle cases like:
0 1 0
1 1 1
1 1 1
It will only remove the first row and the first column because of the break, so the zero in the third column is never seen; you need to remove the break but handle the indexes properly somehow. Another approach is to first take the product of each row and each column, then check those products and remove the corresponding rows and columns wherever a product is zero. An example implementation in MATLAB might look like:
function [S] = stiff(K)
S = K;
% product of each row, rows(k) == 0 if there is a 0 in row k
rows = prod(S,2);
% product of each column, cols(k) == 0 if there is a 0 in column k
cols = prod(S,1);
Here we compute the product of each row and each column
% firstly eliminate the rows
% row numbers in the new matrix
ii=1;
for i = 1:size(S,1),
if rows(i) == 0,
S(ii, :) = []; % delete the row
else
ii = ii + 1; % skip the row
end
end
Here we remove rows that contain zeros by updating the index manually (notice ii).
% handle the columns now
ii = 1;
for i = 1:size(S,2),
if cols(i) == 0,
S(:, ii) = []; % delete the column
else
ii = ii + 1; % skip the column
end
end
end
Here we apply same operation to remaining columns.
Another method I can suggest is converting the matrix K into a logical matrix where anything that is non-zero is 1 and everything else is 0. You would then do a column sum on this matrix and flag any columns that don't sum to the number of rows you have, and do a row sum on the same logical matrix and flag any rows that don't sum to the number of columns you have. Both sets of flags come from the original matrix, so removing the flagged columns and then the flagged rows leaves your final matrix. As such:
Kbool = K ~= 0;
colsToRemove = sum(Kbool,1) ~= size(Kbool,1);
rowsToRemove = sum(Kbool,2) ~= size(Kbool,2);
K(:,colsToRemove) = [];
K(rowsToRemove,:) = [];
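The answers above all boil down to two steps: mark every row and column that contains a zero, then keep only the entries whose row and column are both unmarked. Here is a hedged C sketch of those two steps. The name reduce_matrix and the MAX_DIM limit are assumptions made for this example, and the matrix is assumed to be stored row by row in a flat array of ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DIM 64 /* assumed upper bound on rows and columns */

/*
 * Copy into `out` the entries of the rows x cols matrix `k` whose row and
 * column contain no zero. The reduced size is returned through out_rows
 * and out_cols; `out` must have room for rows * cols entries.
 */
void reduce_matrix(const int *k, size_t rows, size_t cols,
                   int *out, size_t *out_rows, size_t *out_cols)
{
    bool keep_row[MAX_DIM], keep_col[MAX_DIM];

    assert(rows <= MAX_DIM && cols <= MAX_DIM);
    for (size_t r = 0; r < rows; r++) keep_row[r] = true;
    for (size_t c = 0; c < cols; c++) keep_col[c] = true;

    /* Step 1: a zero disqualifies both its row and its column. */
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            if (k[r * cols + c] == 0) {
                keep_row[r] = false;
                keep_col[c] = false;
            }

    /* Step 2: copy the surviving entries, row by row. */
    size_t nc = 0;
    for (size_t c = 0; c < cols; c++)
        if (keep_col[c]) nc++;

    size_t nr = 0;
    for (size_t r = 0; r < rows; r++) {
        if (!keep_row[r]) continue;
        size_t oc = 0;
        for (size_t c = 0; c < cols; c++)
            if (keep_col[c])
                out[nr * nc + oc++] = k[r * cols + c];
        nr++;
    }
    *out_rows = nr;
    *out_cols = nc;
}
```

For the 6x6 matrix K from the question this leaves a 2x2 result holding 2 6 / 6 9. Because the marks are computed before anything is removed, there is no index shifting to worry about.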

Matlab counting elements in array

Hey guys I just have a quick question regarding counting elements in an array.
the array is something like this
B = [1 0 1 0 0 -1; 1 1 1 0 -1 -1; 0 1 -1 0 0 1]
From this array I want to create an array structure called column counts, and another called row counts. I really do want to create an array structure, even if it is a less efficient process.
Basically I want to go through the array and, for each column and each row, total the number of times these values occur. For instance, for the first row I want the following output.
Row Counts
-1 0 1
1 3 2
thanks in advance
You can use the hist function to do this.
fprintf('Row counts\n');
disp([-1 0 1])
fprintf('\n')
for row = 1:3
disp(hist(B(row,:),3));
end
yields
Row counts
-1 0 1
1 3 2
2 1 3
1 3 2
I don't fully understand your question, but if you want to count the occurrences of an element in a Matlab array you can do something like:
% Find value 3 in array A
A =[ 1 4 5 3 3 1 2 4 2 3 ];
count = sum( A == 3 )
When evaluating A==3, Matlab produces an array of 0s and 1s, where a 1 means that the element at that position in A equals the value you were looking for. So you can count the occurrences by summing the values in the array A==3
Edit: you can access the different dimensions like that:
A = [ 1 2 3 4; 1 2 3 4; 1 2 3 4 ]; % 3rows x 4columns matrix
count1 = sum( A(:,1) == 2 ); % count occurrences in the first column
count2 = sum( A(:,3) == 2 ); % ' ' third column
count3 = sum( A(2,:) == 2 ); % ' ' second row
You always access given rows or columns like that.

Longest subsequence with alternating increasing and decreasing values

Given an array, we need to find the length of the longest subsequence with alternating increasing and decreasing values.
For example, if the array is
7 4 8 9 3 5 2 1 then L = 6 for 7,4,8,3,5,2 or 7,4,9,3,5,1, etc.
It could also be the case that we first have a small and then a big element.
What would be the most efficient solution for this? I had a DP solution in mind. And if we were to do it using brute force, how would we do it (O(n^3)?)?
And it's not a homework problem.
You can indeed use a dynamic programming approach here. For the sake of simplicity, assume we only need to find the maximal length of such a sequence in seq (it is easy to tweak the solution to find the sequence itself).
For each index we will store 2 values:
maximal length of alternating sequence ending at that element where last step was increasing (say, incr[i])
maximal length of alternating sequence ending at that element where last step was decreasing (say, decr[i])
also by definition we assume incr[0] = decr[0] = 1
then each incr[i] can be found recursively:
incr[i] = max(decr[j])+1, where j < i and seq[j] < seq[i]
decr[i] = max(incr[j])+1, where j < i and seq[j] > seq[i]
The required length of the sequence is the maximum value across both arrays. The complexity of this approach is O(N*N), and it requires 2N of extra memory (where N is the length of the initial sequence)
simple example in c:
int seq[N]; // initial sequence
int incr[N], decr[N];
... // Init sequences, fill incr and decr with 1's as initial values
for (int i = 1; i < N; ++i) {
    for (int j = 0; j < i; ++j) {
        if (seq[j] < seq[i])
        {
            // handle "increasing" step - need to check previous "decreasing" value
            if (decr[j]+1 > incr[i]) incr[i] = decr[j] + 1;
        }
        if (seq[j] > seq[i])
        {
            // handle "decreasing" step - need to check previous "increasing" value
            if (incr[j]+1 > decr[i]) decr[i] = incr[j] + 1;
        }
    }
}
... // Now all arrays are filled, iterate over them and find maximum value
How algorithm will work:
step 0 (initial values):
seq = 7 4 8 9 3 5 2 1
incr = 1 1 1 1 1 1 1 1
decr = 1 1 1 1 1 1 1 1
step 1. take the value at index 1 ('4') and check the previous values. 7 > 4, so we make a "decreasing" step from index 0 to index 1; the new values are:
incr = 1 1 1 1 1 1 1 1
decr = 1 2 1 1 1 1 1 1
step 2. take value 8 and iterate over the previous values:
7 < 8, make increasing step: incr[2] = MAX(incr[2], decr[0]+1):
incr = 1 1 2 1 1 1 1 1
decr = 1 2 1 1 1 1 1 1
4 < 8, make increasing step: incr[2] = MAX(incr[2], decr[1]+1):
incr = 1 1 3 1 1 1 1 1
decr = 1 2 1 1 1 1 1 1
etc...
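Putting the recurrence and the final maximum search together, a complete version could look like the sketch below. The function name longest_alternating and the MAX_LEN limit are assumptions made for this example; the incr/decr arrays follow the definitions given above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEN 1024 /* assumed upper bound on the sequence length */

/*
 * Length of the longest subsequence of seq[0..n-1] whose steps alternate
 * between increasing and decreasing. O(n*n) time, 2n extra memory.
 */
size_t longest_alternating(const int *seq, size_t n)
{
    size_t incr[MAX_LEN]; /* best length ending at i with a rising last step */
    size_t decr[MAX_LEN]; /* best length ending at i with a falling last step */
    size_t best = 0;

    assert(n <= MAX_LEN);
    for (size_t i = 0; i < n; i++) {
        incr[i] = decr[i] = 1; /* a single element is a valid sequence */
        for (size_t j = 0; j < i; j++) {
            /* rising step j -> i must follow a falling step into j */
            if (seq[j] < seq[i] && decr[j] + 1 > incr[i])
                incr[i] = decr[j] + 1;
            /* falling step j -> i must follow a rising step into j */
            if (seq[j] > seq[i] && incr[j] + 1 > decr[i])
                decr[i] = incr[j] + 1;
        }
        if (incr[i] > best) best = incr[i];
        if (decr[i] > best) best = decr[i];
    }
    return best;
}
```

For the sequence 7 4 8 9 3 5 2 1 from the question this returns 6. Equal neighbouring values count as neither an increasing nor a decreasing step, so they never extend a sequence.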
