How to display line numbers in a side-by-side diff in Unix?

The scenario is that I have two files which I want to diff side by side, with line numbers, using the following commands:
diff -y file1.txt file2.txt
and
sdiff file1.txt file2.txt
The above commands just print the side-by-side diff but don't display the line numbers. Is there any way to do it? I searched a lot but couldn't find any solution. FYI, I can't use third-party tools. Any ideas?
Update:
I want the line numbers that are present in the files themselves, not line numbers generated by piping to cat -n etc. Let's say I am doing the diff with --suppress-common-lines; then the line numbers of the suppressed lines should be omitted too.

The code below can be used to display the uncommon fields in two files, side by side:
sdiff -l file1 file2 | cat -n | grep -v -e '($'
The code below will display the common fields as well, along with line numbers in the output:
diff -y file1 file2 | cat -n | grep -v -e '($'
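For context (my note, not part of the original answer): with -l (--left-column), sdiff prints common lines in the left column only and marks them with a trailing (, which is exactly what grep -v -e '($' filters out. A minimal sketch, with file contents invented for the demo and output spacing abbreviated:
printf 'same\nold\n' > file1
printf 'same\nnew\n' > file2
sdiff -l file1 file2
# same        (
# old       | new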

sdiff -s <(cat -n file1.txt) <(cat -n file2.txt)
This gives you side-by-side output with line numbers taken from the source files.
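One caveat (my observation, not from the original answer): cat -n makes the line numbers part of the compared text, so a single inserted line shifts every following number, and otherwise-identical trailing lines will then show up as differences. A quick demo with invented contents:
printf 'alpha\nbeta\ngamma\n' > file1.txt
printf 'alpha\nBETA\ngamma\n' > file2.txt
sdiff -s <(cat -n file1.txt) <(cat -n file2.txt)
#      2  beta      |      2  BETA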

The following command will display the side-by-side output prepended with line numbers (numbering the sdiff output lines), with identical lines removed:
sdiff -l file1.txt file2.txt | cat -n | grep -v -e '($'

I had the same issue and ended up using a graphical tool (Diffuse) under Fedora 28.

Related

Find tmux session that a PID belongs to

I am using htop to see what processes are taking up a lot of memory so I can kill them. I have a lot of tmux sessions and lots of similar processes. How can I check which tmux pane a PID is in, so I can be sure I am killing the things I want to kill?
Given that PID in the lines below stands for the target PID number:
$ tmux list-panes -a -F "#{pane_pid} #{pane_id}" | grep ^PID
The above will identify the pane where the PID is running. The output will be two strings: the first number should be the same as PID, and the second one (with a percent sign) is the tmux pane ID. Example output:
2345 %30
Now you can use the tmux pane ID to kill the pane without "manually" searching for it:
$ tmux kill-pane -t %30
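If you'd rather not eyeball the grep output, the lookup and the kill can be combined into one step; a sketch (PID is a placeholder as above, and the awk exact match avoids grep ^PID also matching longer PIDs that merely start with the same digits):
$ tmux kill-pane -t "$(tmux list-panes -a -F '#{pane_pid} #{pane_id}' | awk -v pid=PID '$1 == pid {print $2}')"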
To answer your question completely: in order to find the *tmux session* that a PID belongs to, this command can be used:
$ tmux list-panes -a -F "#{pane_pid} #{session_name}" | grep ^PID
# example output: 2345 development
Here's another possibly useful one-liner:
$ tmux list-panes -a -F "#{pane_pid} #{session_name}:#{window_index}:#{pane_index}" | grep ^PID
# example output: 2345 development:2:0
The descriptions of all the interpolation strings (e.g. #{pane_pid}) can be looked up in the tmux man page, in the FORMATS section.
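To check what any of these format strings expands to for the current pane, tmux can print it directly; for example:
$ tmux display-message -p '#{session_name}:#{window_index}:#{pane_index} pid=#{pane_pid}'
# example output: development:2:0 pid=2345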
The answers above give you the PIDs of the shells running in the panes; you'll be out of luck if you want to find something running inside those shells.
try:
https://gist.github.com/nkh/0dfa8bf165a53832a4b5b17ee0d7ab12
This script gives you all the PIDs as well as the files the processes have opened. I never know in which session, window, or pane (attached or not) I have a file open; this helps.
I haven't tried it on another machine; tell me if you encounter any problems.
lsof needs to be installed.
If you just want PIDs, pstree is useful; you can modify the script to use it (it's already there, commented out).
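If all you want is a quick look one level below each pane's shell, without the full lsof machinery, a minimal sketch (assumes GNU ps for the --ppid option, and session names without spaces):
tmux list-panes -a -F '#{session_name}:#{window_index}.#{pane_index} #{pane_pid}' |
while read -r pane pane_pid; do
    echo "== $pane"
    ps --ppid "$pane_pid" -o pid=,comm=    # direct children of the pane's shell
done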
The following script displays the tree of processes in each window (or pane). It takes a list of PIDs as its single parameter (one PID per line). The specified processes are underlined. It automatically pipes to less unless it is part of some other pipe. Example:
$ ./tmux-processes.sh "$(pgrep ruby)"
-- session-name-1 window-index-1 window-name-1
7184 7170 bash bash --rcfile /dev/fd/63 -i
7204 7184 vim vim ...
-- session-name-2 window-index-2 window-name-2
7186 7170 bash bash --rcfile /dev/fd/63 -i
10771 7186 bash bash ./manage.sh runserver
10775 10771 django-admi /srv/www/s1/env/bin/python /srv/www/s1/env/bin/...
5761 10775 python /srv/www/s1/env/bin/python /srv/www/s1/env/bin/...
...
tmux-processes.sh:
#!/usr/bin/env bash
set -eu

pids=$1
my_pid=$$

# Recursively print a PID and all of its descendants, skipping this script itself.
subtree_pids() {
    local pid=$1 level=${2:-0}
    if [ "$pid" = "$my_pid" ]; then
        return
    fi
    echo "$pid"
    ps --ppid "$pid" -o pid= | while read -r pid; do
        subtree_pids "$pid" $((level + 1))
    done
}

# server_pid=$(tmux display-message -p '#{pid}')
underline=$(tput smul)
# reset=$(tput sgr0) # produces extra symbols in less (^O), TERM=screen-256color (under tmux)
reset=$(echo -e '\033[m')
re=$(echo "$pids" | paste -sd'|')
tmux list-panes -aF '#{session_name} #{window_index} #{window_name} #{pane_pid}' \
    | while read -r session_name window_index window_name pane_pid; do
        echo "-- $session_name $window_index $window_name"
        # Show the process tree of this pane, underlining the requested PIDs.
        ps -p "$(subtree_pids "$pane_pid" | paste -sd,)" -Ho pid=,ppid=,comm=,args= \
            | sed -E 's/^/ /' \
            | awk \
                -v re="$re" -v underline="$underline" -v reset="$reset" '
                $1 ~ re {print underline $0 reset}
                $1 !~ re {print $0}
            '
    done | {
        # Page the output only when writing to a terminal.
        [ -t 1 ] && less -S || cat
    }
Details regarding listing tmux processes can be found here.
To underline lines I use ANSI escape sequences. To show the idea separately, here's a script that displays a list of processes and underlines some of them (those whose PIDs are passed as an argument):
#!/usr/bin/env bash
set -eu

pids=$1

bold=$(tput bold)
# reset=$(tput sgr0) # produces extra symbols in less (^O), TERM=xterm-256color
reset=$(echo -e '\033[m')
underline=$(tput smul)
re=$(echo "$pids" | paste -sd'|')
ps -eHo pid,ppid,comm,args | awk \
    -v re="$re" -v bold="$bold" -v reset="$reset" -v underline="$underline" '
    $1 ~ re {print underline $0 reset}
    $1 !~ re {print $0}
'
Usage:
$ ./ps.sh "$(pgrep ruby)"
Details regarding less and $(tput sgr0) can be found here.

Faster grep function for big (27GB) files

I have a file (5 MB) containing specific strings, and I have to grep those same strings (and other information) from a big file (27 GB).
To speed up the analysis I split the 27 GB file into 1 GB files and then applied the following script (with the help of some people here). However, it is not very efficient: producing a 180 KB file takes 30 hours!
Here's the script. Is there a more appropriate tool than grep, or a more efficient way to use grep?
#!/bin/bash
NR_CPUS=4
count=0
for z in `echo {a..z}`; do
    for x in `echo {a..z}`; do
        for y in `echo {a..z}`; do
            for ids in $(cat input.sam | awk '{print $1}'); do
                grep $ids sample_"$z""$x""$y" | awk '{print $1" "$10" "$11}' >> output.txt &
                let count+=1
                [[ $((count%NR_CPUS)) -eq 0 ]] && wait
            done
        done
    done
done
A few things you can try:
1) You are reading input.sam multiple times. It only needs to be read once before your first loop starts. Save the ids to a temporary file which will be read by grep.
2) Prefix your grep command with LC_ALL=C to use the C locale instead of UTF-8. This will speed up grep.
3) Use fgrep because you're searching for a fixed string, not a regular expression.
4) Use -f to make grep read patterns from a file, rather than using a loop.
5) Don't write to the output file from multiple processes as you may end up with lines interleaving and a corrupt file.
After making those changes, this is what your script would become:
awk '{print $1}' input.sam > idsFile.txt
for z in {a..z}; do
    for x in {a..z}; do
        for y in {a..z}; do
            LC_ALL=C fgrep -f idsFile.txt sample_"$z""$x""$y" | awk '{print $1,$10,$11}'
        done
    done
done >> output.txt
Also, check out GNU Parallel which is designed to help you run jobs in parallel.
My initial thought is that you're repeatedly spawning grep. Spawning processes is very expensive (relatively speaking), and I think you'd be better off with some sort of scripted solution (e.g. Perl) that doesn't require the continual process creation.
E.g., in each inner loop you're kicking off cat and awk (you won't need cat, since awk can read files, and in fact doesn't this cat/awk combination return the same thing each time?) and then grep. Then you wait for four greps to finish and you go around again.
If you have to use grep, you can use
grep -f filename
to specify the set of patterns to match in filename, rather than a single pattern on the command line. I suspect from the above that you can pre-generate such a list.
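For instance, building on the file names from the question, pre-generating the list once might look like this (a sketch, not tested against your data):
awk '{print $1}' input.sam > patterns.txt        # extract the ids once
LC_ALL=C fgrep -f patterns.txt sample_aaa        # reuse the list for each chunk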
OK, I have a test file containing 4-character strings, i.e. aaaa, aaab, aaac, etc.
ls -lh test.txt
-rw-r--r-- 1 root pete 1.9G Jan 30 11:55 test.txt
time grep -e aaa -e bbb test.txt
<output>
real 0m19.250s
user 0m8.578s
sys 0m1.254s
time grep --mmap -e aaa -e bbb test.txt
<output>
real 0m18.087s
user 0m8.709s
sys 0m1.198s
So using the --mmap option shows a clear improvement on a 2 GB file with two search patterns. If you take @BrianAgnew's advice and use a single invocation of grep, try the --mmap option.
Though it should be noted that mmap can be a bit quirky if the source file changes during the search.
From man grep:
--mmap
If possible, use the mmap(2) system call to read input, instead of the default read(2) system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.
Using GNU Parallel it would look like this:
awk '{print $1}' input.sam > idsFile.txt
doit() {
    LC_ALL=C fgrep -f idsFile.txt sample_"$1" | awk '{print $1,$10,$11}'
}
export -f doit
parallel doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt
If the order of the lines is not important this will be a bit faster:
parallel --line-buffer doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt

Bash executing a subset of lines in script

I have a file which contains commands similar to:
cat /home/ptay89/test/01.out
cat /home/ptay89/testing/02.out
...
But I only want a few of them executed. For example, if I only want to see the output files ending in 1.out, I can do this:
cat commands | grep 1.out | sh
However, I get the following output for each of the lines in the commands file:
: cannot be loaded - no such file or directoryst/01.out
When I copy and paste the commands I want directly from the file, it works fine. Are there better ways of doing this?
You probably have spurious carriage returns in your file (created under Windows?). Use tr instead of cat to remove them:
tr -d '\015' <commands | grep 1.out | sh
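To confirm that carriage returns really are the culprit before stripping them, you can make them visible first; with GNU cat, DOS line endings show up as ^M before the $ end-of-line marker:
cat -A commands | head -3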
Try doing a
grep -e '^cat.*out' commands | grep 1.out | sh
That should ignore any weird characters and take only the ones you need.

Multiple grep keywords on same line?

I'm using the grep command three times on the same line, like this:
ls -1F ./ | grep / | grep -v 0_*.* | grep -v undesired_result
Is there a way to combine them into one command instead of piping three times?
There's no way to do both a positive search (grep <something>) and a negative search (grep -v <something>) in one command, but if your grep supports -E (alternatively, egrep), you could do
ls -1F ./ | grep / | grep -E -v '0_*.*|undesired_result'
to reduce the sub-process count by one. To go beyond that, you'd have to come up with a specific regular expression that matches either exactly what you want or everything you don't want.
Actually, I guess that first sentence isn't entirely true if you have egrep, but building a regular expression that correctly combines the positive and negative parts and covers all possible orderings of the parts might be more frustrating than it's worth...
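As an alternative (my addition, not from the original answer), a single awk process can express the positive and negative conditions together, using the same patterns as the question:
ls -1F ./ | awk '/\// && !/0_*.*/ && !/undesired_result/'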

How to find and remove line from file in Unix?

I have one file (for example: test.txt) that contains some lines, and one line is, for example: abcd=11
But it could be, for example: abcd=12
The number differs but abcd= is the same in every case, so could anybody give me a command for finding this line and removing it?
I have tried sed -e \"/$abcd=/d\" /test.txt > /test.txt, but it removes all lines from my file. I have also tried sed -e \"/$abcd=/d\" /test.txt > /testNew.txt, but that doesn't delete the line from test.txt; it only creates a new file (testNew.txt) with my line removed. That is not what I want.
Based on the description in your text, here is a cleaned-up version of your sed script that should work.
Assuming a Linux GNU sed:
sed -i '/abcd=/d' /test.txt
If you're using OS X, then you need:
sed -i "" '/abcd=/d' /test.txt
If these don't work, then use old-school sed with a conditional mv to manage your temp files:
sed '/abcd=/d' /test.txt > /test.txt.$$ && /bin/mv /test.txt.$$ /test.txt
Notes:
Not sure why you're doing \"/$abcd=/d\"; you don't need to escape the " chars unless you're doing more with this code than you indicate (like using eval). Just write it as "/$abcd=/d".
Normally you don't need -e.
If you really want to use $abcd, then you need to give it a value, AND since you're matching the string abcd=, you can do:
abcd='abcd='
sed -i "/${abcd}/d" /test.txt
I hope this helps.
Here's a solution using grep:
$ grep -v '^\$abcd=' test.txt
Proof of concept:
$ cat test.txt
a
b
ab
ac
$abcd=1
$abcd=2
$abcd
ab
a
$abcd=3
x
$ grep -v '^\$abcd=' test.txt
a
b
ab
ac
$abcd
ab
a
x
As far as I know, this command can be used to create another file in which the matching lines are deleted. Now that we have another file, we can rename it and delete the original file if we want.
You will just have to do this:
grep -v '^\$abcd=' test.txt > tmp.txt
Now tmp.txt will have the contents:
a
b
ab
ac
$abcd
ab
a
x
If you want, you may rename tmp.txt to test.txt after deleting the original test.txt.
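In one line, the whole replace-in-place step might look like this (a sketch; note that grep -v exits non-zero when it outputs nothing, in which case the && leaves the original file untouched):
grep -v '^\$abcd=' test.txt > tmp.txt && mv tmp.txt test.txt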
