comparing multiple column files using python3 - file

input_file1:
a 1 33
a 34 67
a 68 78
b 1 99
b 100 140
c 1 70
c 71 100
c 101 190
input file2:
a 5 23
a 30 72
a 76 78
b 5 30
c 23 88
c 92 98
I want to compare these two files such that for every value of 'a' in file2 the two integers (boundary) fall in the range (boundaries) of 'a' in file1 or between two ranges.

Instead of storing values like this 'a 1 33', you can make one structure (like 'a:1:33') for your data while writing into file. So that it will become easy to read data also.
Then, you can read each line and can split it based on ':' separator and you can compare with another file easily.

Related

What is a correct name for an operation that turns 3-column long table into a compact "2D" table with variable number of columns?

For example, from this table
row
col
val
0
A
32
0
B
31
0
C
35
1
A
30
1
B
29
1
C
29
2
A
15
2
B
14
2
D
18
3
A
34
3
B
39
3
C
34
3
D
35
it should produce this table:
A
B
C
D
0
32
31
35
1
30
29
29
2
15
14
18
3
34
39
34
35
Is there some official, canonical (or at least popular specific unambiguous) term for such operation (or its reverse)?
I am trying to find (or implement & publish) a tool that transforms CSV this way, but am unsure what to search for (or how to name it).
The term is pivot.
Some databases have native support for pivot, eg SQL Server's PIVOT (and even UNPIVOT) keywords.
For most databases you must craft a query that does the job.

SockMerchant Challenge Ruby Array#count not counting?

So, i'm doing a beginners challenge on HackerHank and, a strange behavior of ruby is boggling my mind.
The challenge is: find and count how many pairs there are in the array. (sock pairs)
Here's my code.
n = 100
ar = %w(50 49 38 49 78 36 25 96 10 67 78 58 98 8 53 1 4 7 29 6 59 93 74 3 67 47 12 85 84 40 81 85 89 70 33 66 6 9 13 67 75 42 24 73 49 28 25 5 86 53 10 44 45 35 47 11 81 10 47 16 49 79 52 89 100 36 6 57 96 18 23 71 11 99 95 12 78 19 16 64 23 77 7 19 11 5 81 43 14 27 11 63 57 62 3 56 50 9 13 45)
def sockMerchant(n, ar)
counter = 0
ar.each do |item|
if ar.count(item) >= 2
counter += ar.count(item)/2
ar.delete(item)
end
end
counter
end
print sockMerchant(n, ar)
The problem is, it doesn't count well. after running the function, in it's internal array ar still have countable pairs, and i prove it by running it again.
There's more. If you sort the array, it behaves differently.
it doesnt make sense to me.
you can check the behavior on this link
https://repl.it/repls/HuskyFrighteningNaturallanguage
You're deleting items from a collection while iterating over it - expect bad stuff to happen. In short, don't do that if you don't want to have such problems, see:
> arr = [1,2,1]
# => [1, 2, 1]
> arr.each {|x| puts x; arr.delete(x) }
# 1
# => [2]
We never get the 2 in our iteration.
A simple solution, that is a small variation of your code, could look as follows:
def sock_merchant(ar)
ar.uniq.sum do |item|
ar.count(item) / 2
end
end
Which is basically finding all unique socks, and then counting pairs for each of them.
Note that its complexity is n^2 as for each unique element n of the array, you have to go through the whole array in order to find all elements that are equal to n.
An alternative, first group all socks, then check how many pairs of each type we have:
ar.group_by(&:itself).sum { |k,v| v.size / 2 }
As ar.group_by(&:itself), short for ar.group_by { |x| x.itself } will loop through the array and create a hash looking like this:
{"50"=>["50", "50"], "49"=>["49", "49", "49", "49"], "38"=>["38"], ...}
And by calling sum on it, we'll iterate over it, summing the number of found elements (/2).

How to get the 3rd column in an array by using 1st and 2nd column values as index in Matlab

I have an array with three columns like this:
A B C
10 75 20
30 67 50
85 12 30
98 49 70
I have A and B values, and I want to get the corresponding C value.
For example if I enter (30,67) it should display 50.
Does Matlab have any trick for getting C value?
(my dataset is very large, and I need a fast way)
you can use ismember:
ABC = [10 75 20
30 67 50
85 12 30
98 49 70];
q = [30 67
85 12];
[~, locb] = ismember( q, ABC(:,1:2), 'rows' );
C = ABC(locb,3);
The result you get is
C =
50
30
Note that the code assume all pairs in q can be found in ABC.
Let your input data be defined as
data = [ 10 75 20
30 67 50
85 12 30
98 49 70];
values = [ 30 67];
This should be pretty fast:
index = data(:,1)==values(1) & data(:,2)==values(2); %// logical index to matching rows
result = data(index,3); %// third-column value for those rows
This gives all third-column values that match, should there be more than one.
If you want to specify several pairs of values at once, and obtain all matching results:
index = any(bsxfun(#eq, data(:,1).', values(:,1)), 1) & ...
any(bsxfun(#eq, data(:,2).', values(:,2)), 1);
result = data(index,3);
For example, given
data = [ 10 75 20
30 67 50
85 12 30
98 49 70
30 67 80 ];
values = [ 30 67
98 49];
the result would be
result =
50
70
80
You can create a sparse matrix. This solution only works if C does not contain any zeros and A and B are integers larger 0
A = [10 30 85 98]';
B = [75 67 12 49]';
C = [20 50 30 70]';
S = sparse(A,B,C);
S(10,75) % returns corresponding C-Value if found, 0 otherwise.
Try accumarray:
YourMatrix = accumarray([A B],C,[],#mean,true);
This way YourMatrix will be a matrix of size [max(A) max(B)], with the values of C at YourMatrix(A(ind),B(ind)), with ind the desired index of A and B:
A = [10 30 85 98]';
B = [75 67 12 49]';
C = [20 50 30 70]';
YourMatrix = accumarray([A B],C,[],#mean,true);
ind = 2;
YourMatrix(A(ind),B(ind))
ans =
50
This way, when there is a repetition in A B, it will return the corresponding C value, provided each unique pair of A B has the same C value. The true flag makes accumarray output a sparse matrix as opposed to a full matrix.

Retrieving specific elements in matlab from arrays in MATLAB

I have an array in MATLAB
For example
a = 1:100;
I want to select the first 4 element in every successive 10 elements.
In this example I want to b will be
b = [1,2,3,4,11,12,13,14, ...]
can I do it without for loop?
I read in the internet that i can select the element for each step:
b = a(1:10:end);
but this is not working for me.
Can you help me?
With reshape
%// reshaping your matrix to nx10 so that it has successive 10 elements in each row
temp = reshape(a,10,[]).'; %//'
%// taking first 4 columns and reshaping them back to a row vector
b = reshape(temp(:,1:4).',1,[]); %//'
Sample Run for smaller size (although this works for your actual dimensions)
a = 1:20;
>> b
b =
1 2 3 4 11 12 13 14
To vectorize the operation you must generate the indices you wish to extract:
a = 1:100;
b = a(reshape(bsxfun(#plus,(1:4)',0:10:length(a)-1),[],1));
Let's break down how this works. First, the bsxfun function. This performs a function, here it is addition (#plus) on each element of a vector. Since you want elements 1:4 we make this one dimension and the other dimension increases by tens. this will lead a Nx4 matrix where N is the number of groups of 4 we wish to extract.
The reshape function simply vectorizes this matrix so that we can use it to index the vector a. To better understand this line, try taking a look at the output of each function.
Sample Output:
>> b = a(reshape(bsxfun(#plus,(1:4)',0:10:length(a)-1),[],1))
b =
Columns 1 through 19
1 2 3 4 11 12 13 14 21 22 23 24 31 32 33 34 41 42 43
Columns 20 through 38
44 51 52 53 54 61 62 63 64 71 72 73 74 81 82 83 84 91 92
Columns 39 through 40
93 94

How can I create an array of ratios inside a for loop in MATLAB?

I would like to create an array or vector of musical notes using a for loop. Every musical note, A, A#, B, C...etc is a 2^(1/12) ratio of the previous/next. E.G the note A is 440Hz, and A# is 440 * 2^(1/12) Hz = 446.16Hz.
Starting from 27.5Hz (A0), I want a loop that iterates 88 times to create an array of each notes frequency up to 4186Hz, so that will look like
f= [27.5 29.14 30.87 ... 4186.01]
So far, I've understood this much:
f = [];
for i=1:87,
%what goes here
% f = [27.5 * 2^(i/12)]; ?
end
return;
There is no need to do a loop for this in matlab, you can simply do:
f = 27.5 * 2.^((0:87)/12)
The answer:
f =
Columns 1 through 13
27.5 29.135 30.868 32.703 34.648 36.708 38.891 41.203 43.654 46.249 48.999 51.913 55
Columns 14 through 26
58.27 61.735 65.406 69.296 73.416 77.782 82.407 87.307 92.499 97.999 103.83 110 116.54
Columns 27 through 39
123.47 130.81 138.59 146.83 155.56 164.81 174.61 185 196 207.65 220 233.08 246.94
Columns 40 through 52
261.63 277.18 293.66 311.13 329.63 349.23 369.99 392 415.3 440 466.16 493.88 523.25
Columns 53 through 65
554.37 587.33 622.25 659.26 698.46 739.99 783.99 830.61 880 932.33 987.77 1046.5 1108.7
Columns 66 through 78
1174.7 1244.5 1318.5 1396.9 1480 1568 1661.2 1760 1864.7 1975.5 2093 2217.5 2349.3
Columns 79 through 88
2489 2637 2793.8 2960 3136 3322.4 3520 3729.3 3951.1 4186
maxind = 87;
f = zeros(1, maxind); % preallocate, better performance and avoids mlint warnings
for ii=1:maxind
f(ii) = 27.5 * 2^(ii/12);
end
The reason I named the loop variable ii is because i is the name of a builtin function. So it's considered bad practice to use that as a variable name.
Also, in your description you said you want to iterate 88 times, but the above loop only iterates 1 through 87 (both inclusive). If you want to iterate 88 times change maxind to 88.

Resources