Trim txt file by certain line number

I have a txt file containing, let's say, 1000 lines. I would like to trim it down to a file with 100 lines, composed of lines 0, 10, 20, 30, etc. of the original file.
Is that possible with grep or something? Thanks.

It can easily be done with an awk or sed one-liner:
awk
awk '!(NR%10)' file
sed (note: the first~step address form is a GNU sed extension):
sed -n '0~10p' file
or
sed '0~10!d' file
See the example below (the sed one-liners give the same output).
Print the first 10 selected lines:
kent$ seq 1000|awk '!(NR%10)'|head -10
10
20
30
40
50
60
70
80
90
100
Total lines:
kent$ seq 1000|awk '!(NR%10)'|wc -l
100
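A small caveat: NR starts at 1, so the one-liners above select lines 10, 20, 30, and so on. If you want the very first line as well (lines 1, 11, 21, ...), a slight variation works (a sketch; the sed form again assumes GNU sed):
awk 'NR%10==1' file
sed -n '1~10p' file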

Related

shell insert a line every n lines

I have two files and I am trying to insert a line from file2 into file1 every 4 lines, starting at the beginning of file1. So for example:
file1:
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
file2:
50
43
21
output I am trying to get:
50
line 1
line 2
line 3
line 4
43
line 5
line 6
line 7
line 8
21
line 9
line 10
The code I have:
while read line
do
sed '0~4 s/$/$line/g' < file1.txt > file2.txt
done < file1.txt
I am getting the following error:
sed: 1: "0~4 s/$/$line/g": invalid command code ~
The following steps through both files without loading either one into an array in memory:
awk '(NR-1)%4==0{getline this<"file2";print this} 1' file1
This might be preferable if your actual file2 is larger than what you want to hold in memory.
This breaks down as follows:
(NR-1)%4==0 - a condition which matches every 4th line, starting with the first (NR = 1, 5, 9, ...)
getline this<"file2" - gets a line from "file2" and stores it in the variable this
print this - prints ... this.
1 - shorthand for "print the current line", which in this case comes from file1 (awk's normal input)
It is easy to do this using awk:
awk 'FNR==NR{a[i++]=$0; next} !((FNR-1) % 4){print a[j++]} 1' file2 file1
50
line 1
line 2
line 3
line 4
43
line 5
line 6
line 7
line 8
21
line 9
line 10
While processing the first input file, i.e. file2, we store each line in an array, keyed by an incrementing number starting at 0.
While processing the second input file, i.e. file1, we check whether the current record number minus one is divisible by 4 using modulo arithmetic; if it is, we insert a line from file2 and increment the index counter.
Finally, the action 1 prints each line from file1.
This might work for you (GNU sed); each R command queues one line read from file1 and appends it after the current line of file2, so four R commands interleave four lines of file1 after every line of file2:
sed -e 'Rfile1' -e 'Rfile1' -e 'Rfile1' -e 'Rfile1' file2
or just use cat and paste; here paste joins one line of file2 with four lines read from standard input (supplied by cat file1), using newline as the delimiter:
cat file1 | paste -d\\n file2 - - - -
Another alternative with the Unix toolchain; pr -4ats reflows file1 into four tab-separated columns, filling across, so each output line holds four input lines:
$ paste file2 <(pr -4ats file1) | tr '\t' '\n'
50
line 1
line 2
line 3
line 4
43
line 5
line 6
line 7
line 8
21
line 9
line 10
Here's a goofy way to do it with paste and tr:
paste file2 <(paste - - - - <file1) | tr '\t' '\n'
The inner paste - - - - joins every four lines of file1 with tabs. This assumes you don't have any actual tabs in your input files.

Bash Indented Output for Multiple Variables

I have a script that loops over every text file in a directory and stores their content in variables. The content can be anywhere from 1-50 characters long. The number of text files is unknown. I would like to print the content in such a way that each variable falls into a clean column.
for file in $LIBPATH/*.txt; do
name=$( awk 'FNR == 1 {print $0}' $file )
height=$( awk 'FNR == 2 {print $0}' $file )
weight=$( awk 'FNR == 3 {print $0}' $file )
echo $name $height $weight
done
This code produces the output:
Avril Stewart 99 54
Sally Kinghorn 170 60
John Young 195 120
While the desired output is:
Avril Stewart   99  54
Sally Kinghorn 170  60
John Young     195 120
Thanks!
Use printf:
printf '%-20s %3s %3s\n' "$name" "$height" "$weight"
%3s pads each field to at least three characters (right-aligned); %-20s does the same for 20 characters, but the - in front makes the output left-aligned.
If you want to limit the output to e.g. 20 characters, you can use
printf '%-20.20s %3s %3s\n' "$name" "$height" "$weight"
This gives you a left-aligned field with both a minimum and a maximum width of 20 characters; in other words, the name column is always exactly 20 characters wide.
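For instance, with the first record from the question (a quick sketch):
$ printf '%-20.20s %3s %3s\n' "Avril Stewart" 99 54
Avril Stewart         99  54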

Getting output of shell command in bash array

I have uniq -c output of about 7-10 lines, with the count of each repeated pattern for each unique line. I want to store the output of uniq -c file.txt into a bash array. Right now all I can do is store the output in a variable and print it. However, bash currently treats the entire output as just one big string.
How does bash recognize delimiters? How do you store UNIX shell command output as Bash arrays?
Here is my current code:
proVar=`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`
echo $proVar
And current output I get:
587 chr1 578 chr2 359 chr3 412 chr4 495 chr5 362 chr6 287 chr7 408 chr8 285 chr9 287 chr10 305 chr11 446 chr12 247 chr13 307 chr14 308 chr15 365 chr16 342 chr17 245 chr18 252 chr19 210 chr20 193 chr21 173 chr22 145 chrX 58 chrY
Here is what I want:
proVar[1] = 2051
proVar[2] = 1243
proVar[3] = 1068
...
proVar[22] = 814
proVar[X] = 72
proVar[Y] = 13
In the long run, I'm hoping to make a barplot based on the counts for each index, where every 50 counts equals one "=" sign. It will hopefully look like the below
chr1 ===========
chr2 ===========
chr3 =======
chr4 =========
...
chrX ==
chrY =
Any help, guys?
To build the associative array, try this:
declare -A proVar
while read -r val key; do
  proVar[${key#chr}]=$val
done < <(awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c)
Note: This assumes that your command's output is composed of multiple lines, each containing one key-value pair; the single-line output shown in your question comes from passing $proVar to echo without double quotes.
Uses a while loop to read each output line from a process substitution (<(...)).
The value for each assoc. array entry is the first whitespace-separated token on each input line (the count from uniq -c), whereas the key is formed by stripping the prefix chr from the rest of the line (after the separating space).
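To sanity-check the result, you can dump the array; with the sample counts from the question it would contain entries along these lines (abridged, and in arbitrary order, since associative arrays are unordered):
$ declare -p proVar
declare -A proVar=([1]="587" [2]="578" [3]="359" [X]="145" [Y]="58" )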
To then create the bar plot, use:
while IFS= read -r key; do
  echo "chr${key} $(printf '=%.s' $(seq $(( ${proVar[$key]} / 50 ))))"
done < <(printf '%s\n' "${!proVar[@]}" | sort -n)
Note: Using sort -n to sort the keys will put non-numeric keys such as X and Y before numeric ones in the output.
$(( ${proVar[$key]} / 50 )) calculates the number of = chars. to display, using integer division in an arithmetic expansion.
The purpose of $(seq ...) is to simply create as many tokens (arguments) as = chars. should be displayed (the tokens created are numbers, but their content doesn't matter).
printf '=%.s' ... is a trick that effectively prints as many = chars. as there are arguments following the format string.
printf '%s\n' "${!proVar[@]}" | sort -n sorts the keys of the assoc. array numerically, and its output is fed via a process substitution to the while loop, which therefore iterates over the keys in sorted order.
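The printf trick is easy to verify in isolation (a quick sketch):
$ printf '=%.s' 1 2 3 4 5; echo
=====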
You can create an array in an assignment using parentheses:
proVar=(`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`)
There's no built-in way to create an associative array directly from input. For that you'll need an additional loop.
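Note that the parenthesized assignment above splits on all whitespace, so counts and labels land in alternating elements. If you want one array element per output line instead, mapfile (a bash 4+ builtin, also known as readarray) is an option; a sketch using the same pipeline:
mapfile -t proVar < <(awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c)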

Delete lines in a file containing argument passed on command line

I'm trying to delete specific lines based on the argument passed in.
My data.txt file contains
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
My del.sh contains
myvar=$1
sed '/$myvar/d' data.txt > temp.txt
mv temp.txt data.txt
but it just prints every line in temp.txt to data.txt....however
sed '/64/d' data.txt > temp.txt
will do the correct data transfer (but I don't want to hardcode 64). I feel like there's some kind of syntax error with the argument. Any input, please?
It's because of the single quotes, change them to double quotes. Variables inside single quotes are not interpolated, so you are sending the literal string $myvar to sed, instead of the value of $myvar.
Change:
sed '/$myvar/d' data.txt
to:
sed "/$myvar/d" data.txt
Note: You will run into issues when $myvar contains regular expression metacharacters or forward slashes, as pointed out in the response from Ed Morton below. If you are not in complete control of your input, you will need to find another avenue to accomplish this.
Assuming this is undesirable behavior:
$ cat file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
$ myvar=6
$ sed "/$myvar/d" file
Monitor 22 42 50
$ myvar=/
$ sed "/$myvar/d" file
sed: -e expression #1, char 3: unknown command: `/'
$ myvar=.
$ sed "/$myvar/d" file
$
Try this instead:
$ myvar=6
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Monitor 22 42 50
Game 32 64 128
$ myvar=/
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
$ myvar=.
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
And if you think you can just escape the /s and use sed, you can't, because you might be adding a second backslash to one already present:
$ foo='\/'
$ myvar=${foo//\//\\\/}
$ sed "/$myvar/d" file
sed: -e expression #1, char 5: unknown command: `/'
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
This is simply NOT a job you can, in general, do with sed, due to its syntax and its restriction of only allowing REs in its search.
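If a literal (non-regex) match is acceptable, grep -F is another possible avenue; a sketch, with the caveat that it matches substrings anywhere in the line (so e.g. myvar=6 would also delete lines containing 64):
grep -vF -- "$myvar" data.txt > temp.txt && mv temp.txt data.txt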
You can also use awk to do the same,
awk '!/'$myvar'/' data.txt > temp.txt && mv temp.txt data.txt
Use the -i option in addition to what @SeanBright proposed. Then you won't need > temp.txt and mv temp.txt data.txt.
sed -i "/$myvar/d" data.txt
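Note that -i is not portable: BSD/macOS sed requires an explicit (possibly empty) backup-suffix argument, e.g.:
sed -i '' "/$myvar/d" data.txt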

How to split a large file into small ones by line number

I am trying to split my large file into small pieces using line numbers. For example, my file has 30,000,000 lines and I would like to divide it into small files, each of which has 10,000 lines (equivalent to 3,000 small files).
I used 'split' in Unix, but it seems that it is limited to only 100 files.
Is there a way of overcoming this limitation of 100 files?
If there is another way of doing this, please advise as well.
Thanks.
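As an aside: the 100-file ceiling usually comes from the default two-character numeric suffix, not from a hard limit. With GNU split, raising the suffix length should comfortably cover 3,000 output files (a sketch; -l sets lines per file, -d uses numeric suffixes, -a 4 allows up to 10,000 files):
split -l 10000 -d -a 4 bigfile.txt small_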
Using GNU awk
gawk '
BEGIN {
i=1
}
{
print $0 > "small"i".txt"
}
NR%10==0 {
close("file"i".txt"); i++
}' bigfile.txt
Test:
[jaypal:~/temp] seq 100 > bigfile.txt
[jaypal:~/temp] gawk 'BEGIN {i=1} {print $0 > "small"i".txt" } NR%10==0 { close("small"i".txt"); i++ }' bigfile.txt
[jaypal:~/temp] ls small*
small1.txt small10.txt small2.txt small3.txt small4.txt small5.txt small6.txt small7.txt small8.txt small9.txt
[jaypal:~/temp] cat small1.txt
1
2
3
4
5
6
7
8
9
10
[jaypal:~/temp] cat small10.txt
91
92
93
94
95
96
97
98
99
100
Not an answer; just adding a way to do the renaming part, as requested in a comment:
$ touch 000{1..5}.txt
$ ls
0001.txt 0002.txt 0003.txt 0004.txt 0005.txt
$ rename 's/^0*//' *.txt
$ ls
1.txt 2.txt 3.txt 4.txt 5.txt
I also tried the above with 3000 files without any problems.
