I have 3 text files: s1.txt, s2.txt and s3.txt. Each has the same format and the same number of rows. I want to combine only the second column of each of the 3 files into one file. Before combining the data, I sort each file by its 1st column:
Unsorted files:
s1.txt   s2.txt   s3.txt
1 23     2 33     3 22
4 32     4 32     2 11
5 22     1 10     5 28
2 55     8 11     7 11
Sorted files:
s1.txt   s2.txt   s3.txt
1 23     1 10     2 11
2 55     2 33     3 22
4 32     4 32     5 28
5 22     8 11     7 11
Here is the code I have so far:
BaseFile ='s'
n=3
fid=fopen('RT.txt','w');
for i=1:n
%Open each file consecutively
d(i)=fopen([BaseFile num2str(i) '.txt']);
%read data from file
A=textscan(d(i),'%f%f')
a=A{1}
b=A{2}
ab=[a,b];
%sort the data according to the 1st column
B=sortrows(ab,1);
%delete the 1st column after being sorted
B(:,1)=[]
%write to a new file
fprintf(fid,'%d\n',B');
%close (d(i));
end
fclose(fid);
How can I get the output in the new txt file in this format?
23 10 11
55 33 22
32 32 28
22 11 11
instead of this format?
23
55
32
22
10
33
32
11
11
22
28
11
Create the output matrix first, then write it to the file.
Here is the new code:
BaseFile ='s';
n=3;
for i=1:n % using i or j as a variable is not recommended, since they also denote the imaginary unit, but I'll leave that up to you
% Open each file consecutively
d=fopen([BaseFile num2str(i) '.txt']);
% read data from file
A=textscan(d,'%f%f', 'CollectOutput',1);
% sort the data according to the 1st column
B=sortrows(A{:},1);
% Instead of deleting a column, create a new matrix
if(i==1)
C = zeros(size(B,1),n);
end
% Check input file and save the 2nd column
if size(B,1) ~= size(C,1)
error('Input files have different number of rows');
end
C(:,i) = B(:,2);
% don't write yet
fclose (d);
end
% write to a new file
fid=fopen('RT.txt','w');
for k=1:size(C,1)
fprintf(fid, [repmat('%d\t',1,n-1) '%d\n'], C(k,:));
end
fclose(fid);
EDIT:
Actually to write only numbers to a file you don't need FPRINTF. Use DLMWRITE instead:
dlmwrite('RT.txt',C,'\t')
I need to find the size of the bins with the maximum and minimum number of elements. I am using the histc function in MATLAB.
Here is what I am doing:
A=[1 2 3 11 22 3 4 55 6 7 2 33 44 5 22]
edges = [10 inf];
N = histc(A,edges)
This gives N=[6,0], which means there are 6 elements with values greater than 10. Now I want to find the maximum count in a bin for my condition.
Here it should be 2, because there are two places where two consecutive integers satisfy my condition: 11 22 and 33 44.
How can I count this in MATLAB?
Here you go;
A=[1 2 3 11 22 3 4 55 6 7 2 33 44 5 22]
arr=diff([0 (find(~(A>10))) numel(A)+1]) -1;
arr(find(arr(1,:)==0))=[];
largest=max(arr); % longest run of consecutive numbers > 10
smallest=min(arr); % shortest run of consecutive numbers > 10
Cheers!!
Given a 5 x 5 grid of tiles numbered from 1 to 25 and a set of 5 start-end point pairs, find a path from the start point to the end point for each pair.
The paths should meet the conditions below:
a) Only horizontal and vertical moves are allowed.
b) No two paths may overlap.
c) The paths must cover the entire grid.
The input consists of 5 lines.
Each line contains two space-separated integers: the starting and ending point.
Output: Print 5 lines, each consisting of space-separated integers giving the path for the corresponding start-end pair. Assume that such a path always exists. If there are multiple solutions, print any one of them.
Sample Input
1 22
4 17
5 18
9 13
20 23
Sample Output
1 6 11 16 21 22
4 3 2 7 12 17
5 10 15 14 19 18
9 8 13
20 25 24 23
I think there should be a restriction on the input, or some information about the start and end points is missing,
because with the following input it is not possible to cover the whole grid:
1 22,
6 7,
11 12,
16 17,
8 9
I have extracted 23 sentences from a text file. They are split up and shown one per line, and each sentence is given a number in ascending order {1,2,3,...}. The code I used for this is as follows:
sentences = regexp(F,'\S.*?[\.\!\?]','match')
char(sentences)
I then did some processing and got a filtered answer that lists a subset of the sentences, as shown below:
result = 1 4 5 9 11 14 16 17
The code I used to get result is as follows:
result = unique([OccursTogether{:}]);
display(result)
What I want to do now is show the sentences that are not present in the result variable. For example, the result I need is as follows:
result2 = 2 3 6 7 8 10 12 13 15 18 19 20 21 22 23
Remember that sentences is a [1*N] cell array, whereas result is a plain array of integers.
The function you are looking for is setdiff:
%// Create an array containing the indices of all the sentences
AllSentences = 1:23;
%// Indices of sentences present
result = [1 4 5 9 11 14 16 17]
%// And not present
NotPresent = setdiff(AllSentences,result)
NotPresent =
Columns 1 through 13
2 3 6 7 8 10 12 13 15 18 19 20 21
Columns 14 through 15
22 23
I'm not sure I understand which of your variables is a cell array and which is not, but you can convert a cell array to a numeric array using cell2mat and apply the same approach.
E.g.:
AllSentences = {1:23};
NotPresent = setdiff(cell2mat(AllSentences),result)
I need to loop through column 1 of a matrix and return the index (i) at which I have come across ALL of the elements of another vector, which I can predefine.
check_vector = [1:43] %% I don't actually need to predefine this - I know I am looking for the numbers 1 to 43.
Column 1 of matrix_a (which is the only column I am interested in) looks like this, for example:
1
4
3
5
6
7
8
9
10
11
12
13
14
16
15
18
17
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
1
3
4
2
6
7
8
We want to loop through matrix_a and return the value of (i) at which we have hit all of the numbers in the range 1 to 43.
In the above example we are looking for all the numbers from 1 to 43, and the iteration will end round about position 47 in matrix_a, because that is where we hit the number '2', the last number needed to complete the sequence 1 to 43.
It doesn't matter if we hit the same number several times on the way; we count all of those. We just want to know when we have reached all the numbers from the check vector, or in this example the sequence 1 to 43.
I've tried something like:
completed = []
for i = 1:43
complete(i) = find(matrix_a(:,1) == i,1,'first')
end
but it is not working.
Assuming A as the input column vector, two approaches could be suggested here.
Approach #1
With arrayfun -
check_vector = [1:43]
idx = find(arrayfun(@(n) all(ismember(check_vector,A(1:n))),1:numel(A)),1)+1
gives -
idx =
47
Approach #2
With customary bsxfun -
check_vector = [1:43]
idx = find(all(cumsum(bsxfun(@eq,A(:),check_vector),1)~=0,2),1)+1
To find the first entry at which all unique values of matrix_a have already appeared (that is, if check_vector consists of all unique values of matrix_a): the unique function almost gives the answer:
[~, ind] = unique(matrix_a, 'first');
result = max(ind);
Someone might have a more compact answer, but is this what you're after?
maxIndex = 0;
for ii=1:length(a)
[f,index] = ismember(ii,a);
maxIndex=max(maxIndex,max(index));
end
maxIndex
Here is one solution without a loop and without any conditions on the vectors to be compared. Given two vectors a and b, this code will find the smallest index idx where a(1:idx) contains all elements of b. idx will be 0 when b is not contained in a.
a = [ 1 4 3 5 6 7 8 9 10 11 12 13 14 16 15 18 17 19 20 21 22 23 24 25 26 ...
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 1 3 4 2 6 7 8 50];
b = 1:43;
[~, Loca] = ismember(b,a);
idx = max(Loca) * all(Loca);
Some details:
ismember(b,a) checks if all elements of b can be found in a and the output Loca lists the indices of these elements within a. The index will be 0, if the element cannot be found in a.
idx = max(Loca) then is the highest index in this list of indices, so the smallest one where all elements of b are found within a(1:idx).
all(Loca) finally checks if all indices in Loca are nonzero, i.e. if all elements of b have been found in a.
I have multiple folders Case-1, Case-2....Case-N and they all have a file named PPD. I want to extract all 2nd columns and put them into one file named 123.dat.
It seems that I cannot use awk in a for loop.
case=$1
for (( i = 1; i <= $case ; i ++ ))
do
file=Case-$i
cp $file/PPD temp$i.dat
awk 'FNR==1{f++}{a[f,FNR]=$2}
END{for(x=1;x<=FNR;x++)
{for(y=1;y<ARGC;y++)
printf("%s ",a[y,x]);print ""} }' temp$i.dat >> 123.dat
done
Now 123.dat only has the data of the last PPD, the one in Case-N.
I know I can use join (I have used that command before) if every PPD file has at least one column in common, but it turns out to be extremely slow when I have lots of Case folders.
Maybe
eval paste $(printf ' <(cut -f2 %s)' Case-*/PPD)
There is probably a limit to how many process substitutions you can perform in one go. I did this with 20 columns and it was fine. Process substitutions are a Bash feature, so not portable to other Bourne-compatible shells in general.
The wildcard will be expanded in alphabetical order. If you want the cases in numerical order, maybe use Case-[1-9]/PPD Case-[1-9][0-9]/PPD Case-[1-9][0-9][0-9]/PPD to force the expansion to list the single digits first, then the double digits, etc.
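If process substitution is not available, the same idea can be sketched in plain POSIX sh with temporary files. This is only a sketch under some assumptions: the function name paste_second_columns is made up for illustration, the folders are named Case-1 ... Case-N as in the question, and every PPD file has the same number of rows. It uses awk rather than cut -f2 so that columns separated by spaces (not only tabs) are handled, and it walks a counter so the columns come out in numerical order.

```shell
# Hypothetical helper: print column 2 of Case-1/PPD ... Case-N/PPD side by side.
# The body runs in a subshell, so the variables and "set --" below do not leak.
paste_second_columns() (
    n=$1
    tmpdir=$(mktemp -d) || exit 1
    set --                      # reuse the positional parameters as a file list
    i=1
    while [ "$i" -le "$n" ]; do
        # awk splits on any run of whitespace; cut -f2 would require tabs
        awk '{ print $2 }' "Case-$i/PPD" > "$tmpdir/col$i" || exit 1
        set -- "$@" "$tmpdir/col$i"
        i=$((i + 1))
    done
    paste "$@"                  # one paste call, columns joined by tabs
    rm -rf "$tmpdir"
)
```

It could then be called as paste_second_columns 20 > 123.dat. Each PPD file is read once, so the cost should grow linearly with the number of Case folders.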
The interaction between the outer shell script and the inner awk invocation isn't working the way you expect.
Every time through the loop, the shell script calls awk a new time, which means that f will be unset, and then that first clause will set it to 1. It will never become 2. That is, you are starting a new awk process for each iteration through the outer loop, and awk is starting from scratch each time.
There are other ways to structure your code, but as a minimal tweak, you can pass in the number $i to the awk invocation using the -v option, e.g. awk -v i="$i" ....
Note that there are better ways to structure your overall solution, as other answerers have already suggested; I meant this response as an answer to the question "Why doesn't this work?" and not to "Please rewrite this code."
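The effect described above can be checked with a small experiment. This is a minimal sketch; count_files_seen is a made-up name, and it keeps only the FNR==1{f++} part of the question's script so that the final value of f can be printed.

```shell
# Hypothetical helper: print how many input files a single awk process has seen.
# f is bumped on the first line of each file, exactly as in the question.
count_files_seen() {
    awk 'FNR == 1 { f++ } END { print f + 0 }' "$@"
}
```

Called once with all files, for example count_files_seen temp1.dat temp2.dat temp3.dat, it reports the number of non-empty files. Called inside the loop with one file at a time, it reports 1 on every iteration, which is why a[f,FNR] in the question never gets a second column.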
The AWK program below can help you. Note that BEGINFILE and ERRNO are GNU awk (gawk) extensions, so it needs gawk.
#!/usr/bin/awk -f
BEGIN {
# Defaults
nrecord=1
nfiles=0
}
BEGINFILE {
# Check if the input file is accessible,
# if not skip the file and print error.
if (ERRNO != "") {
print("Error: ",FILENAME, ERRNO)
nextfile
}
}
{
# Check if the file is accessed for the first time
# if so then increment nfiles. This is to keep count of
# number of files processed.
if ( FNR == 1 ) {
nfiles++
} else if (FNR > nrecord) {
# Fetching the maximum size of the record processed so far.
nrecord=FNR
}
# Fetch the second column from the file.
array[nfiles,FNR]=$2
}
END {
# Iterate through the array and print the records.
for (i=1; i<=nrecord; i++) {
for (j=1; j<=nfiles; j++) {
printf("%5s", array[j,i])
}
print ""
}
}
Output:
$ ./get.awk Case-*/PPD
1 11 21
2 12 22
3 13 23
4 14 24
5 15 25
6 16 26
7 17 27
8 18 28
9 19 29
10 20 30
Here Case-*/PPD expands to Case-1/PPD, Case-2/PPD, Case-3/PPD and so on. Below are the source files from which the output was generated.
$ cat Case-1/PPD
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
$ cat Case-2/PPD
11 11 11 11
12 12 12 12
13 13 13 13
14 14 14 14
15 15 15 15
16 16 16 16
17 17 17 17
18 18 18 18
19 19 19 19
20 20 20 20
$ cat Case-3/PPD
21 21 21 21
22 22 22 22
23 23 23 23
24 24 24 24
25 25 25 25
26 26 26 26
27 27 27 27
28 28 28 28
29 29 29 29
30 30 30 30