A unique key for a two dimensional array of letters - c

I have a two-dimensional array of letters. Each letter is drawn from a fixed alphabet.
I want to compute a unique key for this array from the letters and their positions.
For example, if the array is 3 * 3 and the alphabet is {0, a, b, c, *}, the array could look like this:
0 b c
b * a
a a 0
I have tried Key = sum(code(letter)*(r*3+c)) over all r and c, where r and c are the row and column indices, but it still gives me the same key for different arrays.
What am I missing?
P.S. code(letter) is a mapping function to convert the letter into a value.

You need to take the size of the alphabet into account. If the codes and indices are all zero-based, the key is:
key = Sum(code(letter)*pow(L, r*C+c))
where L is the number of letters in the alphabet and C is the number of columns. This treats the array as a number written in base L with one digit per cell, so different arrays get different keys. However, watch out for numeric overflow. For larger alphabets or matrices you need to use one of the following:
- Relax the requirement that keys be unique and use a hash (hash combiner).
- Use a larger number type for the key, or even an arbitrary-precision type such as those in the GMP library.
- Use compression such as arithmetic coding if the distribution of letters is not even. However, you still run the risk that a specific matrix cannot be compressed enough to fit into the key.
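As a rough illustration of the formula above, here is a minimal Python sketch (the question is about C, so take it as pseudocode that happens to run). The names ALPHABET, CODE, naive_key, positional_key and decode_key are made up for this example. Python integers are arbitrary precision, so the overflow concern does not show up here; in C the key for a 3 * 3 array over 5 letters is below 5^9 and still fits in 32 bits.

```python
# Hypothetical alphabet and code() mapping, zero-based as the answer assumes.
ALPHABET = ["0", "a", "b", "c", "*"]
CODE = {letter: i for i, letter in enumerate(ALPHABET)}


def naive_key(grid):
    """The formula from the question: weights are r*C+c, so cell (0,0) has weight 0."""
    cols = len(grid[0])
    return sum(CODE[x] * (r * cols + c)
               for r, row in enumerate(grid)
               for c, x in enumerate(row))


def positional_key(grid):
    """The formula from the answer: each cell is one digit of a base-L number."""
    base = len(ALPHABET)
    cols = len(grid[0])
    return sum(CODE[x] * base ** (r * cols + c)
               for r, row in enumerate(grid)
               for c, x in enumerate(row))


def decode_key(key, rows, cols):
    """Recover the grid from the key, which shows that the key is unique."""
    base = len(ALPHABET)
    cells = []
    for _ in range(rows * cols):
        key, digit = divmod(key, base)
        cells.append(ALPHABET[digit])
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]
```

With the question's formula, two arrays that differ only in the top-left cell always collide, because that cell is multiplied by 0. With the base-L weights every cell contributes its own digit, and decode_key can reverse the mapping.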

Related

Match each element of one array with elements of other array without loops

I want to match each element of one array (lessnum) against the elements of another array (cc), then multiply the matches by a number from a third array (gl). I am currently doing this with loops. The arrays are very large, so it takes a couple of hours. Is it possible to do this without loops, or to make it faster? Here is the code I am using:
uniquec=sort(unique(cc));
maxc=max(uniquec);
c35p=0.35*maxc;
lessnum=uniquec(uniquec<=c35p);
greaternum=uniquec(uniquec>c35p);
gl=linspace(1,2,length(lessnum));
gr=linspace(2,1,length(greaternum));
newC=zeros(size(cc));
for i=1:length(gl)
newC(cc==lessnum(i))= cc(cc==lessnum(i)).*gl(i);
end
for i=1:length(gr)
newC(cc==greaternum(i))= cc(cc==greaternum(i)).*gr(i);
end
What you need to do is instead of storing the values that are less than or greater than c35p in lessnum and greaternum, respectively, you should store the indices of these numbers. That way, you can directly access the newC variable using these indices and then multiply your linearly generated values.
Further modifications are explained in the code itself. If anything is unclear, you can read the help for unique.
Here is the modified code (I assume that cc is a one-dimensional array):
%randomly generate a cc vector
cc = randi(100, 1, 10);
% modified code below
[uniquec, ~, induniquec]=unique(cc, 'sorted'); % explicitly request unique's built-in sorting and also return, for each element of cc, the index of its value in uniquec
maxc=max(uniquec);
c35p=0.35*maxc;
lessnum=uniquec<=c35p; % instead of lessnum=uniquec(uniquec<=c35p);
greaternum=uniquec>c35p; % instead of greaternum=uniquec(uniquec>c35p);
gl=linspace(1,2,sum(lessnum));
gr=linspace(2,1,sum(greaternum));
% now there is no need for 'for' loops. We first modify the unique values as specified and then regenerate the required matrix using the indices obtained previously
newC=uniquec;
newC(lessnum) = newC(lessnum) .* gl;
newC(greaternum) = newC(greaternum) .* gr;
newC = newC(induniquec);
This new code should run much faster than the original one, at the cost of extra memory that grows with the number of unique values in your original array.
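For readers who do not use MATLAB, the same idea (scale each unique value once, then look the result up for every element) can be sketched in plain Python. This is only an illustration under the same assumption that cc is one-dimensional; linspace and rescale are helper names invented here, not library functions.

```python
def linspace(start, stop, n):
    """Evenly spaced values from start to stop, like MATLAB's linspace."""
    if n == 1:
        return [float(stop)]  # MATLAB returns the end point when n is 1
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def rescale(cc):
    """Scale every element of cc by the factor assigned to its unique value."""
    uniquec = sorted(set(cc))
    c35p = 0.35 * max(uniquec)
    lessnum = [v for v in uniquec if v <= c35p]
    greaternum = [v for v in uniquec if v > c35p]
    gl = linspace(1, 2, len(lessnum))
    gr = linspace(2, 1, len(greaternum))

    # Compute the scaled value once per unique value ...
    scaled = {}
    for value, factor in zip(lessnum, gl):
        scaled[value] = value * factor
    for value, factor in zip(greaternum, gr):
        scaled[value] = value * factor

    # ... then rebuild the full array with one lookup per element.
    return [scaled[v] for v in cc]
```

The dictionary plays the role of induniquec: the work per element is a single lookup instead of one full comparison pass over cc for every unique value.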

Excel: Fill a range of cells with a value or formula depending on only one cell

We have a project for a math subject, and I am done with the computations; they work just fine. The task is this: given a system of linear equations with a certain number of unknowns, you input the number of unknowns, fill in the values, and use matrix computations to find the values of all the unknowns.
To keep this short: I have already finished the "find the values of the unknowns" part along with the computation, I checked it, and it seems fine. I can enter 6 as the number of unknowns, and it automatically computes the inverse of a 6x6 matrix and returns the 6 unknown values using INDEX, each one INDIVIDUALLY.
(Note: We aren't allowed to use VBA or macros since we haven't discussed them yet.)
The problem is that I don't know how to automatically fill a RANGE of cells with a VALUE or a FORMULA based on a SINGLE cell value.
For example, in cell A1 I will input 5 (the number of unknowns). After I enter this and press Enter, the range A2 to A6 (5 cells) should be automatically filled with consecutive letters: A2 -> A ; A3 -> B ; ... A6 -> E. These letters stand for the 5 unknowns.
PROBLEM 2.
As a follow-up question, say I again input 5 in A1, which again stands for the number of unknowns. Beside the column of variables A, B, C, D, E (5 unknowns), I want column B to be filled automatically with the corresponding values from an array.
This is the same as my first problem, but this time, instead of consecutive letters, it would be the INDEX function with an incremented row number.
For example: I input 5
*Column A will automatically be filled with the variables/letters
*Column B will automatically be filled with the values from an array that is computed by a formula but is not shown separately in cells.
I already have the formula
INDEX(Formula I created, Row number of the answer from the Formula I created , Column number of the answer from the formula I created)
The result of my formula is itself an array with "n" rows and 1 column. If I put the INDEX formula in a SINGLE cell, it returns the value at the specified row of the array that my formula computes.
What I want is, for example, for 5 unknowns:
Row | A | B
1   | 5 |
2   | A | Some Value 1
3   | B | Some Value 2
4   | C | Some Value 3
5   | D | Some Value 4
6   | E | Some Value 5
Here "Some Value" is the array result of my formula, and "1,2,3,4,5" is the row number within that array.
All of this should happen after entering the matrix values and entering the number of unknowns "n" in A1: the range A2 to A"n" is automatically filled with the letters from A up to the letter that "n" corresponds to, and the range B2 to B"n" is automatically filled with my formula, with the row number incremented for every row in the INDEX(Formula, Row number, Column number) function.
Note: I hope there's a way to do this using Excel functions only, since we haven't discussed VBA or macros yet and can't use them, and even if we could, I have no knowledge of them. haha. :D
THANK YOU THANK YOU THANK YOU SO MUCH IN ADVANCE! Cheers. :D
Here's a formula for column A (enter it in cell A2 and drag it down):
=IF(ROW()-1<=$A$1,CHAR(ROW()+63),"")

Matlab cell array to string vector - unique

Going nuts with cell array, because I just can't get rid of it... However, it will be an easy one for you guys out here.
So here is why:
I have a dataset (data) which contains two variables: A (Numbers) and B (cell array).
Unfortunately I can't even reproduce the problem here; nevertheless, my imported table looks like this:
data=dataset;
data.A = [1;1;3;3;3];
data.B = ['A';'A';'BUU';'BUU';'A'];
where data.B is of type 5x1 cell, which I can't reconstruct here.
All I want now is the unique rows, like
ans= [1 A;3 BUU;3 A]
The result should be a dataset or just two vectors whose rows correspond.
But unique([data.A data.B],'rows') can't handle cell arrays, and I can't find anywhere on the web how to simply convert the cell array B to a vector of strings (does such a thing exist?).
cell2mat() didn't work for me because of the different word lengths ('A' vs 'BUU').
So there are two things I would love to learn: how to turn a 5x1 cell into a string vector,
and how to find unique rows out of numbers and strings (or cells).
Thank you very much!
Cheers Dominik
The problem is that the A and B fields are of a different type. Although they could be concatenated into a cell array, unique can't handle that. A general trick for cases like this is to "translate" elements of each field (column) to unique identifiers, i.e. numbers. This translation can be done applying unique to each field separately and getting its third output. The obtained identifiers can now be concatenated into a matrix, so that each row of this matrix is a "composite identifier". Finally, unique with 'rows' option can be applied to this matrix.
So, in your case:
[~, ~, kA] = unique(data.A);
[~, ~, kB] = unique(data.B);
[~, jR] = unique([kA kB], 'rows');
Now build the result as (same format as data)
result.A = data.A(jR);
result.B = data.B(jR);
or as (2D cell array)
result = cat(2, mat2cell(data.A(jR), ones(1,numel(jR))), data.B(jR));
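The "composite identifier" trick is not specific to MATLAB. Here is a small Python sketch of the same steps, with unique_ids standing in for the third output of unique; both function names are invented for this illustration. (In Python you could also just put the (A, B) pairs in a set, so this is only meant to show the idea.)

```python
def unique_ids(values):
    """Map each value to the index of that value among the sorted distinct values."""
    distinct = sorted(set(values))
    index = {v: i for i, v in enumerate(distinct)}
    return [index[v] for v in values]


def unique_rows(col_a, col_b):
    """Return the distinct (a, b) rows, ordered by their composite identifier."""
    k_a = unique_ids(col_a)
    k_b = unique_ids(col_b)

    # Each row is now a pair of integers; remember the first row for each pair.
    first_row = {}
    for row, key in enumerate(zip(k_a, k_b)):
        first_row.setdefault(key, row)

    rows = [first_row[key] for key in sorted(first_row)]
    return [(col_a[i], col_b[i]) for i in rows]
```

For the data in the question, unique_rows([1, 1, 3, 3, 3], ['A', 'A', 'BUU', 'BUU', 'A']) gives the three rows (1, 'A'), (3, 'A') and (3, 'BUU').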
Here is my clumsy solution:
tt.A = [1;1;3;3;3];
tt.B = {'A';'A';'BUU';'BUU';'A'};
Convert integers to characters, then merge and find unique strings
tt.C = cellstr(num2str(tt.A));
tt.D = cellfun(@(x,y) [x y],tt.C,tt.B,'UniformOutput',0);
[tt.F,tt.E] = unique(tt.D);
Display results
tt.F

Order insensitive hash function for an array

I'm looking for a hash-function which will produce the same result for unordered sequences containing same elements.
For example:
Array_1: [a, b, c]
Array_2: [b, a, c]
Array_3: [c, b, a]
The hash-function should return the same result for each of these arrays.
How to achieve this?
The most popular answer is to sort elements by some rule, then concatenate, then take hash.
Is there any other method?
If a, b, c are numbers, you could sum them and then build a hash on the sum.
You may multiply them, too.
But take care with zeros!
XOR-ing the numbers is also an approach.
For very small numbers you may consider setting the bit indexed by each number. Building a long (64 bits) as input for the hash this way only allows element values in the range 0-63.
The more elements you have, the more collisions you will get.
In the end you map n elements of m bits each (a range of 2^(m*n) values) to a hash value of k bits.
Usually m and k are constant but n varies.
Be aware that any lookup by hash requires a check that you actually got the correct element. In general a hash is NOT unique.
Otherwise, sort the elements and then compute the hash as proposed.
Regarding the comment from CodesInChaos:
In order to be able to omit that check, the number of bits in the hash should be much greater than the total number of bits in the elements, say at least 64 bits more. In general this is not the case.
One common case of a secure hash/unique id is a GUID, which effectively means 128 bits.
A random sequence of text characters reaches this number of bits within 20-25 characters.
Longer texts are very likely to produce collisions. It depends on the use case whether this is still acceptable.
Concatenate several order-insensitive aggregates: XOR | Sum | Sum of squares | ...
where | denotes concatenation.
Or take the XOR of the hashes of the elements.
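To make the "XOR | Sum | Sum of squares" idea concrete, here is a minimal Python sketch. It is one possible reading of that suggestion, not a vetted hash function: element_hash (SHA-256 of the element's repr, truncated to 64 bits) and the 64-bit width of each part are choices made for this example only.

```python
import hashlib

MASK64 = (1 << 64) - 1


def element_hash(x):
    """Per-element hash; truncated SHA-256 of repr(x) is an arbitrary choice here."""
    digest = hashlib.sha256(repr(x).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def unordered_hash(items):
    """Combine element hashes with operations that ignore element order."""
    xor_acc = 0
    sum_acc = 0
    squares_acc = 0
    for item in items:
        h = element_hash(item)
        xor_acc ^= h
        sum_acc = (sum_acc + h) & MASK64
        squares_acc = (squares_acc + h * h) & MASK64
    # Concatenate the three 64-bit parts into one 192-bit value.
    return (xor_acc << 128) | (sum_acc << 64) | squares_acc
```

XOR alone cancels out pairs of equal elements (so [a, a] would hash like the empty list), which is one reason to concatenate it with a sum. As the answer above says, the result is still a hash and not a unique key, so collisions remain possible.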

Generating also non-unique (duplicated) permutations

I've written a basic permutation program in C.
The user types a number, and it prints all the permutations of that number.
Basically, this is how it works (the main algorithm is the one used to find the next higher permutation):
int currentPerm = toAscending(num);
int lastPerm = toDescending(num);
int counter = 1;
printf("%d", currentPerm);
while (currentPerm != lastPerm)
{
counter++;
currentPerm = nextHigherPerm(currentPerm);
printf("%d", currentPerm);
}
However, when the number input includes repeated digits - duplicates - some permutations are not being generated, since they're duplicates. The counter shows a different number than it's supposed to - Instead of showing the factorial of the number of digits in the number, it shows a smaller number, of only unique permutations.
For example:
num = 1234567
counter = 5040 (7! - all unique)
num = 1123456
counter = 2520
num = 1112345
counter = 840
I want it to treat repeated/duplicated digits as if they were different - I don't want to generate only unique permutations - but rather generate all the permutations, regardless of whether they're repeated and duplicates of others.
Uhm... why not just calculate the factorial of the length of the input string then? ;)
"I want to it to treat repeated/duplicated digits as if they were different - I don't want to calculate only the number of unique permutations."
If the only information that nextHigherPerm() uses is the number that's passed in, you're out of luck. Consider nextHigherPerm(122). How can the function know how many versions of 122 it has already seen? Should nextHigherPerm(122) return 122 or 212? There's no way to know unless you keep track of the current state of the generator separately.
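One way to keep that state separately is to run the next-higher-permutation step on the positions 0..n-1, which are always distinct, and only map positions back to digits when printing. The following Python sketch shows the idea; it is not the asker's C code, and next_permutation and all_permutations are names chosen for this example.

```python
def next_permutation(seq):
    """Rearrange seq in place into the next higher permutation.

    Returns False (leaving seq unchanged) if seq is already the highest one.
    """
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i < 0:
        return False
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    seq[i + 1:] = reversed(seq[i + 1:])
    return True


def all_permutations(digits):
    """All n! arrangements of digits, treating equal digits as different."""
    positions = list(range(len(digits)))  # the generator state, always distinct
    result = []
    while True:
        result.append("".join(digits[p] for p in positions))
        if not next_permutation(positions):
            break
    return result
```

For the input "122" this produces 6 strings, of which only 3 are distinct, so a counter incremented once per step reaches 3! as the question asks.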
When you have 3 distinct letters, for example ABC, you can make ABC, ACB, BAC, BCA, CAB, CBA: 6 arrangements (3! = 6). If 2 of those letters repeat, as in AAB, you can make only AAB, ABA, BAA. That is not 3!, so what is it, and where does it come from? When there are only two distinct symbols, with k copies of one and n - k copies of the other, the count is given by combinations -> ( n k ) = n! / ( k! * ( n - k )! ). For AAB that is 3C2 = 3. More generally, with several repeated symbols the count is n! divided by the factorial of each symbol's repeat count, which matches the 2520 = 7!/2! and 840 = 7!/3! in the question.
Here is another illustrative example: AAAB. The possible arrangements are AAAB, AABA, ABAA, BAAA, only four, and the formula gives 4C3 = 4.
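The counts are easy to check with a few lines of Python. This sketch uses the general rule (n! divided by the factorial of each symbol's repeat count), of which the two-symbol combination formula is a special case; count_distinct_permutations is a name made up for this example.

```python
from collections import Counter
from math import factorial


def count_distinct_permutations(symbols):
    """Number of distinct arrangements of symbols when equal symbols are identical."""
    total = factorial(len(symbols))
    for repeats in Counter(symbols).values():
        total //= factorial(repeats)  # each group of equal symbols divides the count
    return total
```

It reproduces the counter values from the question (5040, 2520 and 840) as well as the 3 and 4 arrangements of AAB and AAAB.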
The procedure to generate all of these lists is as follows:
1. Store the digits in an array. Example: ABCD.
2. Set element 0 of the array as the pivot element and exclude it from the temp array: A {BCD}
3. Since you want all the arrangements (even the repeated ones), rotate the elements of the temp array to the right or left (whichever you like) until you reach the nth element.
   A{BCD}------------A{CDB}------------A{DBC}
4. Do the second step again, but with the temp array.
   A{B{CD}}------------A{C{DB}}------------A{D{BC}}
5. Do the third step again, but inside the second temp array.
   A{B{CD}}------------A{C{DB}}------------A{D{BC}}
   A{B{DC}}------------A{C{BD}}------------A{D{CB}}
6. Go back to the first array and rotate it to BCDA, set B as the pivot, and repeat until you have found all the arrangements.
Why not convert it to a string then treat your program like an anagram generator?
