Cat command does not merge files in script

I am not sure what I am doing wrong in the following situation. I want to merge multiple files into one, but there are a lot of them, so I want to drive the merging from an *input file* that contains lines like these:
file1.txt >> mergedfiles1.txt
file2.txt >> mergedfiles1.txt
file3.txt >> mergedfiles2.txt
file4.txt >> mergedfiles2.txt
...
If I try to use a simple script, as I usually do:
cat *input file* | while read i
do
    cat $i
done
it actually doesn't merge the files, but writes
*content* of file1.txt
cat: >> : file does not exist
cat: mergedfiles1.txt : file does not exist
I have tried putting the cat command right at the beginning of each line of the input file, but that did not work either.
I guess it is a simple mistake, but I am not able to find a solution.
Thanks for any help.

You can merge your three files using cat this way:
cat file1 file2 file3 > merged_files

You need to use it like this:
cat input_file > output_file

That's because bash treats whitespace as a word separator when it expands $(cat inputFile), so you have to deal with that yourself.
Actually, you can remove the >> from your input file and do something like this:
k=0
for line in $(cat inputFile); do
    if [ $k -eq 0 ]; then
        src=$line
        let k=$k+1
    else
        cat "$src" >> "$line"
        k=0
    fi
done
It's been a while since my last bash script, but the logic is pretty simple.
Since bash uses spaces as word separators, you have to keep a counter to know when the line is really over.
So we're using k.
Having k = 0 means that we're in the first half of the line, so we need to store the filename in a var (src).
When k is 1, it means that we're in the second half of the line, so we can actually execute the cat command.
My code will work if your input file looks like:
file1.txt mergedfiles1.txt
file2.txt mergedfiles1.txt
file3.txt mergedfiles2.txt
file4.txt mergedfiles2.txt
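
For what it's worth, read can also do the splitting for you, which avoids the counter entirely; a minimal sketch, assuming your input file is named inputFile and uses the two-column format above:
while read -r src dest; do
    cat "$src" >> "$dest"    # append each source file to its destination
done < inputFile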

I wasn't quite sure what you wanted to do, but thought this example might help:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for file in *
do
echo "$file"
done
IFS=$SAVEIFS
The manipulation of IFS is so you can pick up and process file names containing spaces, which, at least in my experience, are more common coming out of Windows reports than Linux (Unix).
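As a minimal illustration (the file names here are made up), the loop prints names containing spaces intact:
touch "Sales Report Q1.txt" "Sales Report Q2.txt"   # hypothetical Windows-style names
for file in *
do
    echo "$file"    # each name prints on its own line, spaces preserved
done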

Related

In Shell, how can I make an array of files and how do I use it to then compare it to another file?

How do I make an array of files in shell? Is it just
files = (file1.txt file2.txt file3.txt)
or is it
files = ("file1.txt" "file2.txt" "file3.txt)
If later on in the program I want to refer to this array and pick a specific file to read, how would I do that? I am asking because I would like to have a conditional that would read that file and compare it to an output file.
Would I just do this (I know I can't do this):
if grep -q ${outputs[i]} output; then
#etc.
But what should I do instead then?
No spaces around = in assignments; arrays are no exception.
files=( file1.txt "file2.txt" )
Quotes around file names are needed if file names contain special characters, e.g. whitespace, parentheses, etc.
You can use a for loop to walk the array:
for file in "${array[#]}" ; do
if grep -q "$file" output ; then
Are you sure you want to use $file as the regular expression? Then you need to handle regex characters specially, i.e. dot, caret, dollar sign, etc.
If you want to compare contents of the two files, use diff, not grep.
if diff -q "$file" output > /dev/null ; then
    echo Same
else
    echo Different
fi
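Putting the pieces together, a minimal sketch (the file names and the comparison target output are placeholders from the question):
#!/bin/bash
files=( "file1.txt" "file2.txt" "file3.txt" )   # hypothetical input files

for file in "${files[@]}"; do
    if diff -q "$file" output > /dev/null; then
        echo "$file is the same as output"
    else
        echo "$file differs from output"
    fi
done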

Creating variables from command line in BASH

I'm a very new user to bash, so bear with me.
I'm trying to run a bash script that will take inputs from the command line, and then run a C program which pipes its output to other C programs.
For example, at command line I would enter as follows:
$ ./script.sh -flag file1 < file2
Then within the script I would have:
./c_program -flag file1 < file2 | other_c_program
The problem is, -flag, file1 and file2 need to be variable.
Now I know that for -flag and file1 this is fairly simple; I can do:
FLAG=$1
FILE1=$2
./c_program $FLAG $FILE1
My problem is: is there a way to assign a variable within the script to file2?
EDIT:
It's a requirement of the program that the script is called as
$ ./script.sh -flag file1 < file2
You can simply run ./c_program exactly as you have it, and it will inherit stdin from the parent script. It will read from wherever its parent process is reading from.
FLAG=$1
FILE1=$2
./c_program "$FLAG" "$FILE1" | other_c_program # `< file2' is implicit
Also, it's a good idea to quote variable expansions. That way if $FILE1 contains whitespace or other tricky characters the script will still work.
There is no simple way to do what you are asking. This is because when you run this:
$ ./script.sh -flag file1 < file2
The shell which interprets the command will open file2 for reading and attach its contents to script.sh's standard input. Your script will never know what the file's name was, and therefore cannot store that name in a variable. However, you could invoke your script this way:
$ ./script.sh -flag file1 file2
Then it is quite straightforward--you already know how to get file1, and file2 is the same.
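A minimal sketch of that alternative, reusing the names from the question (other_c_program stands in for the downstream program):
#!/bin/bash
FLAG=$1
FILE1=$2
FILE2=$3    # the name is now visible to the script

./c_program "$FLAG" "$FILE1" < "$FILE2" | other_c_program
It would then be invoked as ./script.sh -flag file1 file2.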

Unique file names in a directory in unix

I have a capture file in a directory to which some logs are being written:
word.cap
There is a script that, when the file's size reaches exactly 1.6 GB, clears it and creates files of the below format in the same directory:
word.cap.COB2T_1389889231
word.cap.COB2T_1389958275
word.cap.COB2T_1390035286
word.cap.COB2T_1390132825
word.cap.COB2T_1390213719
Now I want to pick up all of these files one by one in a script and perform some actions on them.
My script is:
today=`date +%d_%m_%y`
grep -E '^IPaddress|^Node' /var/rawcap/word.cap.COB2T* | awk '{print $3}' >> snmp$today.txt
sort -u snmp$today.txt > snmp_final_$today.txt
So, what should I write to pick up all file names of the above-mentioned format, one by one? I will place this script in crontab, but I don't want to read the main word.cap file, as that is still being written to.
As per your comment:
Thanks, this is working, but I have a small issue. There are some files which are bzipped, i.e. word.cap.COB2T_1390213719.bz2, and I don't want these files in the list, so what should be done?
You could add a condition inside the loop:
for file in word.cap.COB2T*; do
    if [[ "$file" != *.bz2 ]]; then
        # Do something here
        echo "$file"
    fi
done
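Combining that with the commands from your question, the whole cron job might look like this (a sketch; the path and patterns are taken from your script):
#!/bin/bash
today=$(date +%d_%m_%y)

for file in /var/rawcap/word.cap.COB2T*; do
    [[ "$file" == *.bz2 ]] && continue    # skip the bzipped rotations
    grep -E '^IPaddress|^Node' "$file" | awk '{print $3}' >> "snmp$today.txt"
done

sort -u "snmp$today.txt" > "snmp_final_$today.txt"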

How to find duplicate lines across 2 different files? Unix

From the unix terminal, we can use diff file1 file2 to find the difference between two files. Is there a similar command to show the similarity across 2 files? (Many pipes allowed if necessary.)
Each file contains one string sentence per line; they are sorted and duplicate lines removed with sort file1 | uniq.
file1: http://pastebin.com/taRcegVn
file2: http://pastebin.com/2fXeMrHQ
And the output should contain the lines that appear in both files.
output: http://pastebin.com/FnjXFshs
I am able to use Python to do it like this, but I think it's a little too much to put into the terminal:
x = set([i.strip() for i in open('wn-rb.dic')])
y = set([i.strip() for i in open('wn-s.dic')])
z = x.intersection(y)
outfile = open('reverse-diff.out', 'w')
for i in z:
    print>>outfile, i
If you want to get a list of repeated lines without resorting to AWK, you can use the -d flag of uniq:
sort file1 file2 | uniq -d
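For example, with two made-up input files:
printf 'apple\nbanana\n' > file1
printf 'banana\ncherry\n' > file2
sort file1 file2 | uniq -d    # prints: banana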
As @tjameson mentioned it may be solved in another thread.
Just would like to post another solution:
sort file1 file2 | awk 'dup[$0]++ == 1'
Refer to an awk guide to get some awk basics: when the pattern of a line evaluates to true, that line is printed.
dup[$0] is a hash table in which each key is a line of the input; the value starts at 0 and increments each time the line occurs. When the line occurs again, the value is 1, so dup[$0]++ == 1 is true, and the line is printed.
Note that this only works when there are no duplicates in either file, as was specified in the question.
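The same one-liner, spelled out with comments (it relies on each file having been deduplicated beforehand, as the question states):
sort file1 file2 | awk '
    # dup[$0] is 0 the first time a line is seen, so the pattern is false;
    # on the second occurrence it evaluates to 1, the pattern is true,
    # and the line (which must appear in both files) is printed.
    dup[$0]++ == 1
'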

bash - variable storing multiple lines of file

This is my code:
grep $to_check $forbidden >${dir}variants_of_interest;
cat ${dir}variants_of_interest | (while read line; do
    #process ${line} and echo result
done;
)
Thanks to grep I get lines of data that I then process separately in a loop. I would like to use a variable instead of the file variants_of_interest.
The reason for this is that I am afraid that writing to a file thousands of times (and consequently reading from it) rapidly slows down the computation, so I am hoping that avoiding the file could help. What do you think?
I have to run thousands of grep commands, and variants_of_interest contains only up to 10 lines.
Thanks for your suggestions.
You can just make grep pass its output directly to the loop:
grep "$to_check" "$forbidden" | while read line; do
    #process "${line}" and echo result
done
I removed the explicit subshell in your example, since it is already in a separate one due to the piping. Also don't forget to quote the $line variable to prevent whitespace expansion on use.
You don't have to write a file. Simply iterate over the result of grep:
grep $to_check $forbidden | (while read line; do
    #process ${line} and echo result
done;
)
This might work for you:
OIFS="$IFS"; IFS=$'\n'; lines=($(grep $to_check $forbidden)); IFS="$OIFS"
for line in "${lines[#]}"; do echo $(process ${line}); done
The first line places the results of the grep into the array variable lines.
The second line processes the array lines, placing each line into the variable line.
