Assume we have two files named file1 and file2.
File1:
a=b
c=d
e=f
File2:
a=p
c=o
e=f
g=h
i=j
Here file2 has the same keys as file1 but different values, apart from some extra key-value pairs of its own.
Compare the keys of the two files and, on a key match, replace file2's value with file1's value. Retain the new entries in file2.
So, my final output should be :
File2:
a=b
c=d
e=f
g=h
i=j
Thanks In Advance.
The quickest way without using scripts is the tool called "meld".
I can give one way of approaching the problem (though not the best):
1. Read the first file line by line.
2. Split each line on "=".
3. Store the two parts as key and value, building an array of all key-value pairs.
4. Read the second file and repeat the procedure, then compare the two arrays: for matching keys take the value from the first file, and keep only the second file's entries whose keys are not in the first array.
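The steps above can be collapsed into a single awk pass; a sketch, assuming keys never contain "=" (the demo files mirror the example from the question):

```shell
# Recreate the sample files so the sketch is self-contained
printf 'a=b\nc=d\ne=f\n' > file1
printf 'a=p\nc=o\ne=f\ng=h\ni=j\n' > file2

# First pass (NR==FNR) remembers file1's values; second pass rewrites file2
awk -F= 'NR==FNR { map[$1] = $2; next }         # build key->value map from file1
         $1 in map { print $1 "=" map[$1]; next } # key match: use file1 value
         { print }                                # new key: keep file2 line as-is
        ' file1 file2 > file2.merged

cat file2.merged
```

This prints a=b, c=d, e=f, g=h, i=j, matching the desired output.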
In this specific case you can use the "cut" command in the shell to select fields.
I personally prefer Perl script for file operations like this :)
Related
I have a file with pipe delimiter and one record has more columns than expected.
For example:
File NPS.txt
1|a|10
2|b|20
3|c|30
4|d|40|old
The last line has more columns than expected, and I want to know the line number to understand what the problem is.
I found this command:
awk -F'|' '{print NF}' NPS.txt | sort | uniq -c
With this command I know that one line has an extra column, but I do not know which line it is.
I would use a bash script
a) Define a counter variable, starting at 0,
b) iterate over each line in your file, adding +1 to the counter at the beginning of each loop,
c) split each line into an array based on the "|" delimiter, logging the counter # if the array contains more than 3 elements. you can log to console or write to a file.
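A minimal sketch of the loop described in a) to c), with the sample NPS.txt recreated so it runs on its own:

```shell
# Demo input from the question
printf '1|a|10\n2|b|20\n3|c|30\n4|d|40|old\n' > NPS.txt

count=0
while IFS= read -r line; do
  count=$((count+1))                    # a)/b) counter, +1 per line
  IFS='|' read -r -a fields <<< "$line" # c) split the line on "|"
  if [ "${#fields[@]}" -gt 3 ]; then
    echo "line $count: $line"           # log the offending line number
  fi
done < NPS.txt > bad_lines.txt

cat bad_lines.txt
```

Here the log goes to bad_lines.txt; drop the redirection to print to the console instead.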
It's been a while since I've scripted in Linux, but these references might help:
Intro:
https://www.guru99.com/introduction-to-shell-scripting.html
For Loops:
https://www.cyberciti.biz/faq/bash-for-loop/
Bash String Splitting:
How do I split a string on a delimiter in Bash?
Making Scripts Executable:
https://www.andrewcbancroft.com/blog/musings/make-bash-script-executable/
There may be a good one-liner out there, but it's not a difficult script to write.
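For the record, such a one-liner exists: awk with the "|" delimiter prints the line number (NR) of every record whose field count (NF) is not 3. A sketch with the sample data:

```shell
# Demo input from the question
printf '1|a|10\n2|b|20\n3|c|30\n4|d|40|old\n' > NPS.txt

# Print line number and content of every record with an unexpected field count
awk -F'|' 'NF != 3 {print NR ": " $0}' NPS.txt
```

This prints `4: 4|d|40|old`, identifying the problem line directly.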
I am trying to run TreSpex analysis on a series of trees, which are saved in newick format as .fasta.txt files in a folder.
I have a list of Taxa names saved in a .txt file
I enter:
perl TreSpEx.v1.pl -fun e -ipt *fasta.txt -tf Taxa_List.txt
But it won't run. I tried writing a loop for each file within the folder but am not very good with them and my line of
for i in treefile/; do perl TreSpEx.v1.1.pl -fun e -ipt *.fasta.txt -tf Taxa_List.txt; done
won't work because -ipt apparently needs a name that starts with a letter or number
In your second example you are actually doing the same thing as in the first (but possibly several times).
I'm not familiar with TreSpEx or know Bash very well for that matter (which it seems you are using), but you might try something like below.
for i in treefile/*.fasta.txt ; do
perl TreSpEx.v1.1.pl -fun e -ipt "$i" -tf Taxa_List.txt;
done
Basically, you need to use the for loop's variable (i) to pass the name of each file to the command.
So, I would like to get some help creating a shell script that will allow me to submit an array job where each individual job has multiple input files. An example of how I run array jobs that have one input per job is as follows:
DIR=/WhereMyFilesAre
LIST=($DIR/*fastq) #files I want to process
INDEX=$((SGE_TASK_ID-1))
INPUT_FILE=${LIST[$INDEX]}
bwa aln ${DIR}/referencegenome.fasta $INPUT_FILE > ${INPUT_FILE%.fastq}.sai
So, basically what I want to do is something similar, except if I had 2 or more lists of files instead of one. And those files need to be paired properly. For instance, if I had File1_A.txt, File1_B.txt, File2_A.txt, File2_B.txt, and something that looked generically like
program input1 input2 > output
I would want the resulting jobs to have lines that look like
program File1_A.txt File1_B.txt > File1.txt
program File2_A.txt File2_B.txt > File2.txt
As you specify, if the two input files follow a fixed naming scheme that differs only in the index, you can just use SGE_TASK_ID as the index in your job script:
program File${SGE_TASK_ID}_A.txt File${SGE_TASK_ID}_B.txt > File${SGE_TASK_ID}.txt
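If the names are not a fixed pattern, the single-list approach from the question extends to two parallel arrays paired by sorted glob order. A sketch; the demo directory, the stub program function, and the manual SGE_TASK_ID are assumptions so it runs outside the grid engine (which normally sets SGE_TASK_ID itself):

```shell
# Demo setup (hypothetical data; replace with your real directory and program)
DIR=$(mktemp -d)
echo a1 > "$DIR/File1_A.txt"; echo b1 > "$DIR/File1_B.txt"
echo a2 > "$DIR/File2_A.txt"; echo b2 > "$DIR/File2_B.txt"
program() { cat "$@"; }   # stand-in for the real program
SGE_TASK_ID=1             # set by SGE in a real array job (-t 1-2)

# Two parallel lists; globs expand in sorted order, so pairs line up
A_LIST=("$DIR"/*_A.txt)
B_LIST=("$DIR"/*_B.txt)
INDEX=$((SGE_TASK_ID-1))
A=${A_LIST[$INDEX]}
B=${B_LIST[$INDEX]}

# Output name derived from the _A input: File1_A.txt -> File1.txt
program "$A" "$B" > "${A%_A.txt}.txt"
```

This relies on every _A file having a matching _B file; if the pairing can be irregular, deriving B from A (B=${A%_A.txt}_B.txt) is safer than parallel globs.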
I have two files in Unix Box, both have around 10 million rows.
File1 (Only one column)
ASD123
AFG234
File2 (Only one column)
ASD456
AFG234
Now I want to compare the records from File 1 to File 2 and output those that appear in both. How can I achieve this?
I have tried a while loop and grep, seems it is way too slow, any ideas will be appreciated.
If you want to find all the rows from file A which are also in file B, you can use grep's built-in -f option:
grep -Ff fileA.txt fileB.txt
This should be faster than putting it inside any kind of loop (although given the size of your files, it may still take some time).
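Two refinements worth knowing: -x restricts matches to whole lines (so ASD123 cannot match a longer line that merely contains it), and on sorted inputs comm is another fast option for files this large. A sketch with the sample data:

```shell
# Demo files from the question
printf 'ASD123\nAFG234\n' > fileA.txt
printf 'ASD456\nAFG234\n' > fileB.txt

# -F: fixed strings, -x: whole-line match, -f: patterns from a file
grep -Fxf fileA.txt fileB.txt

# Same result via comparison of sorted files; -12 suppresses lines
# unique to either file, leaving only the common ones
comm -12 <(sort fileA.txt) <(sort fileB.txt)
```

Both commands print AFG234, the only record present in both files.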
Quick question, can i do this?:
while IFS=: read menu script
do
echo "$x. $menu"
command[x]="$script"
let x++
done < file.txt
Read two strings per line from a file, print one and save the other to an array.
file.txt looks like this:
File Operations:~/scripts/project/File_Operations.sh
Directory Operations:~/scripts/project/Directory_Operations.sh
Process Management:~/scripts/project/Process_Management.sh
Search Operations:~/scripts/project/Search_Operations.sh
Looks right, but a couple of things:
You need to initialise x: x=0.
When you use x as a subscript it needs the $, i.e. command[$x]="$script".
And don't forget the {}s when referencing the command array, e.g. ${command[0]}.
What shell are you using? Works for me in bash, I just prepended the following two lines to the script:
#!/bin/bash
x=0
Without setting x to 0, the user is presented with:
. File Operations
1. Directory Operations
2. Process Management
3. Search Operations
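Putting both fixes together (x initialised, $x in the subscript), a runnable sketch with a shortened demo file.txt standing in for the original:

```shell
#!/bin/bash
# Demo version of file.txt (first two entries from the question)
cat > file.txt <<'EOF'
File Operations:~/scripts/project/File_Operations.sh
Directory Operations:~/scripts/project/Directory_Operations.sh
EOF

x=0
while IFS=: read -r menu script; do
  echo "$x. $menu"        # print the numbered menu label
  command[$x]="$script"   # save the script path for later dispatch
  x=$((x+1))
done < file.txt

echo "${command[0]}"      # the path saved for menu entry 0
```

Note the redirection into the loop (done < file.txt) rather than a pipe: a pipe would run the loop in a subshell and the command array would be lost afterwards.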