Shell script: cut the beginning and the end of a file

So I have a file and I'd like to cut the first 33 lines and the last 6 lines of it. What I am trying to do is get the whole file with a cat command (cat file) and then use the "head" and "tail" commands to remove those parts, but I don't know how to combine them.
E.g. (this is just the idea):
cat file - head -n 33 file - tail -n 6 file
How am I supposed to do this? Is it possible to do it with "sed" (how)? Thanks in advance.

This is probably what you want:
$ tail -n +34 file | head -n -6
See the tail
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth
and head
-n, --lines=[-]K
print the first K lines instead of the first 10; with the leading '-', print all but the last K lines of each file
man pages.
Example:
$ cat file
one
two
three
four
five
six
seven
eight
$ tail -n +4 file | head -n -2
four
five
six
Notice that you don't need the cat (see UUOC).
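Note that negative counts for head -n are a GNU extension; if your head doesn't support them, a rough equivalent that keeps a six-line buffer in awk (just a sketch) is:
tail -n +34 file | awk 'NR>6{print buf[(NR-6)%6]} {buf[NR%6]=$0}'
The buffer always holds the six most recent lines, so the last six are never printed.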

First count the total lines, then print the middle part (reads the file twice):
l=$(wc -l < file)
awk -v l="$l" 'NR>33 && NR<=l-6' file
Or load the file into an array, then print the lines you need (reads the file once):
awk '{a[NR]=$0} END{for(i=34;i<=NR-6;i++) print a[i]}' file
Or pipe awk into head, so you don't have to think so hard:
awk 'NR>33' file | head -n -6

sed -n '1,33b; 34{N;N;N;N;N};N;P;D' file
This works too: lines 1-33 are skipped, line 34 pulls five more lines into the pattern space, and from then on each cycle appends one line, prints the oldest and drops it, so six lines always stay buffered and the last six are never printed.

This might work for you (GNU sed):
sed '1,33d;:a;$d;N;s/\n/&/6;Ta;P;D' file
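Roughly: the first 33 lines are deleted; the loop at :a keeps appending lines with N until the pattern space holds seven lines (s/\n/&/6 only succeeds once there are six newlines, and T loops back until then); then P;D print the oldest line and drop it, sliding the window forward one line per cycle; $d discards the pattern space once the last line has been read, so the final six lines are never printed.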

Related

Sed: Better way to address the n-th line where n are elements of an array

We know that the sed command loops over each line of a file and, for each line, runs the given list of commands. But when the file is extremely large, the time and resource cost of this repeated work can be terrible.
Suppose that I have an array of line numbers which I want to use as addresses to delete or print with the sed command (e.g. A=(20000 30000 50000 90000)) and a VERY LARGE file to operate on.
The easiest way may be:
(Remark by @John1024: careful, the line numbers change on each pass)
( for NL in ${A[@]}; do sed "$NL d" $very_large_file; done; )>.temp_file;
cp .temp_file $very_large_file; rm .temp_file
The problem with the code above is that, for each line number in the array, it loops over the whole file.
To avoid this, one can:
#COMM=`echo "${A[@]}" | sed 's/\s/d;/g;s/$/d/'`;
#sed -i "$COMM" $very_large_file;
#Edited: Better with direct parameter expansion:
sed -i "${A[*]/%/d;}" $very_large_file;
It first prints the array and replaces each space and the end of line with sed's d command, so that the string looks like "20000d;30000d;50000d;90000d"; the second line then uses this string as the command list for sed. The result is that this code loops over the file only once.
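To see what the generated command string looks like, for example:
A=(20000 30000 50000 90000)
echo "${A[@]}" | sed 's/\s/d;/g;s/$/d/'   # -> 20000d;30000d;50000d;90000d
echo "${A[*]/%/d;}"                       # -> 20000d; 30000d; 50000d; 90000d; (sed accepts the spaces)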
Moreover, for in-place operation (the -i argument), one cannot quit early using q with sed even though the greatest line number of interest has passed, because if so, the lines after that line (e.g. 90001+) will disappear (it seems that the in-place operation just overwrites the file with stdout).
Better ideas?
(Reply to @user unknown:) I think it could be even more efficient if we manage to "quit" the loop once all indexed lines have passed. We can't, using sed -i, for the aforementioned reason. Printing each line to a file costs more time than copying a file (e.g. cat file1 > file2 or cp file1 file2). We may benefit from this fact using other methods or tools. This is what I expect.
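One way to apply that "quit early" idea without -i, as a rough sketch (assuming GNU sed, whose Q command quits without printing the current line, and the example array above):
{ sed '20000d;30000d;50000d;90000Q' "$very_large_file"
  tail -n +90001 "$very_large_file"; } > .temp_file &&
mv .temp_file "$very_large_file"
sed stops reading at line 90000 and tail copies the untouched remainder in one go, so nothing after the last line of interest is processed line by line.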
PS: The points of this question are "Lines location" and "Efficiency"; the "delete lines" operation is just an example. For real tasks, there are much more - append/insert/substituting, field separating, cases judgement followed by read from/write to files, calculations etc.
In other words, it may involve all kinds of operations, creating sub-shells or not, caring about variable passing, ... so, the tools to use should allow me to do line processing, and the problem is how to get myself onto the lines of interest and do all kinds of operations there.
Any comments are appreciated.
First make a copy to a testfile for checking the results.
You want to sort the line numbers, highest first:
echo "${a[@]}" | sed 's/\s/\n/g' | sort -rn
You can feed commands into ed using printf:
printf "%s\n" "command1" "command2" w q testfile | ed -s testfile
Combine these
printf "%s\n" $(echo "${a[#]}" | sed 's/\s/\n/g' | sort -rn | sed 's/$/d/') w q |
ed -s testfile
Edit (thanks @Ed_Morton):
This can be written in fewer steps with
printf "%s\n" $(printf '%sd\n' "${a[@]}" | sort -rn ) w q | ed -s testfile
I cannot remove the sort, because each delete instruction counts line numbers from 1, so deleting from the highest number down keeps the remaining addresses valid.
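For example, borrowing the array and input from the awk answer below:
$ a=( 3 7 8 )
$ seq 10 > testfile
$ printf "%s\n" $(printf '%sd\n' "${a[@]}" | sort -rn) w q | ed -s testfile
$ cat testfile
1
2
4
5
6
9
10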
I tried to find a command for editing the file without redirecting to another, but I started with the remark that you should make a copy anyway. I have no choice: I have to upvote the straightforward awk solution that doesn't need a sort.
sed is for doing s/old/new, that is all, and when you add a shell loop to the mix you've really gone off the rails (see https://unix.stackexchange.com/q/169716/133219). To delete lines whose numbers are stored in an array (using seq to generate input, since no sample input/output was provided in the question):
$ a=( 3 7 8 )
$ seq 10 |
awk -v a="${a[*]}" 'BEGIN{split(a,tmp); for (i in tmp) nrs[tmp[i]]} !(NR in nrs)'
1
2
4
5
6
9
10
and if you wanted to stop processing with awk once the last target line has been deleted and let tail finish the job then you could figure out the max value in the array up front and then do awk on just the part up to that last target line:
max=$( printf '%s\n' "${a[@]}" | sort -rn | head -1 )
head -n "$max" file | awk '...' > out
tail -n +"$((max+1))" file >> out
idk if that'd really be any faster than just letting awk process the whole file since awk is very efficient, especially when you're not referencing any fields and so it doesn't do any field splitting, but you could give it a try.
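Putting that together with the awk command from above (an untested sketch along those lines):
a=( 3 7 8 )
max=$( printf '%s\n' "${a[@]}" | sort -rn | head -1 )
head -n "$max" file |
  awk -v a="${a[*]}" 'BEGIN{split(a,tmp); for (i in tmp) nrs[tmp[i]]} !(NR in nrs)' > out
tail -n +"$((max+1))" file >> out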
You could generate an intermediate sed command file from your lines.
printf '%s\n' "${A[@]}" | sort -n > lines_to_delete
min=$(head -1 lines_to_delete)
max=$(tail -1 lines_to_delete)
# head skips to the first target line and tail takes over after the last one,
# so drop those two entries and turn the rest into "d" commands
sed -i -e 1d -e '$d' -e 's#$#d#' lines_to_delete
head -n $((min-1)) input > output
sed -n -f lines_to_delete -e "$((min+1)),$((max-1))p" input >> output
tail -n +$((max+1)) input >> output
mv output input

How to concatenate two files (removing the first line of the second file) in a single line of unix command?

I am trying to create a new file (say file3) by stacking two files (file1 and file2) together. I need the entirety of file1 but want to exclude the first row of file2. What I have now is a two step process using head and sed.
cat file1.csv file2.csv > file3.csv
sed -i 1001d file3.csv
The last step requires me to find out the length of file1 beforehand so that the line I remove corresponds to the first line of file2. How do I combine these two lines into a single line of code? I tried this and it failed:
cat file1.csv sed 1d file2.csv > file3.csv
You can use a compound statement like this:
{ cat file1.csv; sed '1d' file2.csv; } > file3.csv
You can use process substitution, like:
cat file1.csv <(tail -n +2 file2.csv) > file3.csv
One way is cat file1.csv > file3.csv; tail -n +2 file2.csv >> file3.csv.
Another one is tail -n +2 file2.csv | cat file1.csv - > file3.csv
You can use your second try, but with process substitution:
cat file1.csv <(sed '1d' file2.csv) > file3.csv
Your command failed because sed was treated as a filename that cat tried to access.
If your shell doesn't support process substitution, you can use this instead:
sed '1d' file2.csv | cat file1.csv - > file3.csv
Alternatively, you can use awk:
awk 'FNR!=NR && FNR==1 {next} 1' file{1,2}.csv > file3.csv
FNR!=NR is true if we're not in the first file (the per-file record number differs from the overall record number), and FNR==1 is true on the first line of each file. Together, the condition is true on the first line of every file but the first; next skips that line. 1 gets all other lines printed.
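For example, with two small hypothetical files:
$ printf 'a\nb\n' > file1.csv
$ printf 'header\nc\nd\n' > file2.csv
$ awk 'FNR!=NR && FNR==1 {next} 1' file{1,2}.csv
a
b
c
d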

Strange Memory Behavior handling TSV

I have a .tsv and I need to figure out the frequencies of the values in a specific column and organize that data in descending order. I run a script in c which downloads a buffer and saves it to a .tsv file with a date stamp for a name, in the same directory as my code. I then open my Terminal and run the following command, per this awesome SO answer:
cat 2016-09-06T10:15:35Z.tsv | awk -F '\t' '{print $1}' * | LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -nr > tst.tsv
To break this apart by pipes, what this does is:
cat the .tsv file to get its contents into the pipe
awk -F '\t' '{print $1}' * breaks the file's contents up by tab and pushes the contents of the first column into the pipe
LC_ALL=C sort takes the contents of the pipe and sorts them to have like-values next to one another, then pushes that back into the pipe
LC_ALL=C uniq -c takes the stuff in the pipe and figures out how many times each value occurs and then pushes that back into the pipe (e.g., Max 3, if the name Max shows up 3 times)
Finally, LC_ALL=C sort -nr sorts the stuff in the pipe again to be in descending order, and then prints it to stdout, which I pipe into a file.
Here is where things get interesting. If I do all of this in the same directory as the c code which downloaded my .tsv file to begin with, I get super wacky results which appear to be a mix of my actual .tsv file, some random corrupted garbage, and the contents of the c code which got it in the first place. Here is an example:
( count ) ( value )
1 fprintf(f, " %s; out meta qt; rel %s; out meta qt; way %s; out meta qt; >; out meta qt;", box_line, box_line, box_line);
1 fclose(f);
1 char* out_file = request_osm("cmd_tmp.txt", true);
1 bag_delete(lines_of_request);
1
1
1
1
1
1??g?
1??g?
1?
1?LXg$E
... etc. Now if you scroll up in that, you also find some correct values, from the .tsv I was parsing:
( count ) ( value )
1 312639
1 3065411
1 3065376
1 300459
1 2946076
... etc. And if I move my .tsv into its own folder, and then cd into that folder and run that same command again, it works perfectly.
( count ) ( value )
419362 452999
115770 136420
114149 1380953
72850 93290
51180 587015
45833 209668
31973 64756
31216 97928
30586 1812906
Obviously I have a functional answer to my problem - just put the file in its own folder before parsing it. But I think this memory corruption suggests there may be some larger issue at hand that I should fix now, and I'd rather get on top of it than kick it down the road with a temporary symptomatic patch, so to speak.
I should mention that my c code does use system(cmd) sometimes.
The second command is the problem:
awk -F '\t' '{print $1}' *
See the asterisk at the end? It tells awk to process all files in the current directory. Instead, you want it to process just its standard input (the pipe output).
Just remove the asterisk and it should work.
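That is, something like this should behave as intended (the cat is unnecessary too, since awk can read the file directly):
awk -F '\t' '{print $1}' 2016-09-06T10:15:35Z.tsv | LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -nr > tst.tsv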

What is the shell script instruction to divide a file with sorted lines to small files?

I have a large text file with the following format:
1 2327544589
1 3554547564
1 2323444333
2 3235434544
2 3534532222
2 4645644333
3 3424324322
3 5323243333
...
The output should be text files whose names are suffixed with the number from the first column of the original file, each keeping the numbers from the second column in the corresponding output file, as follows:
file1.txt:
2327544589
3554547564
2323444333
file2.txt:
3235434544
3534532222
4645644333
file3.txt:
3424324322
5323243333
...
The script should run on Solaris, but I'm also having trouble with awk and with options of other commands, such as -c with cut; the versions there are very limited, so I am looking for commands commonly available on Solaris. I am not allowed to change or install anything on the system. Using a loop is not very efficient because the script takes too long with large files. So, aside from the awk command and loops, any suggestions?
Something like this perhaps:
$ awk 'NF>1{print $2 > "file"$1".txt"}' input
$ cat file1.txt
2327544589
3554547564
2323444333
or if you have bash available, try this:
#!/bin/bash
while read a b
do
[ -z "$a" ] && continue
echo "$b" >> "file$a.txt"
done < input
output:
$ paste file{1..3}.txt
2327544589 3235434544 3424324322
3554547564 3534532222 5323243333
2323444333 4645644333
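If the default /usr/bin/awk on Solaris rejects the awk one-liner above, the same thing usually works with nawk or the POSIX awk (assuming /usr/xpg4/bin/awk is present on your system); the redirection target is parenthesized here because some older awks don't accept an unparenthesized concatenation after >:
/usr/xpg4/bin/awk 'NF>1{print $2 > ("file" $1 ".txt")}' input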

replacing lines of a text file with text of another file using sed or awk

I have a text file, e.g. File1.txt, and I want to replace a few of its lines with new lines available in another text file, e.g. File2.txt. The format of File1.txt is as below; it has START and END pointers.
START
line 1
line 2
line 3
line 4
line 5
END
I want to replace line 1 through line 5 with the lines available in File2.txt. The numbers of lines in File1.txt and File2.txt are not equal; File2.txt may have more or fewer lines than File1.txt.
I need input from someone. Thanks in anticipation.
If the parts of File1.txt that you want to preserve are fixed,
you only need to print the second file and include those parts:
printf 'START\n\n%s\n\nEND\n' "$(<File2.txt)"
If that's not the case (substitute START/END with the patterns
that match the parts that you want to preserve):
awk 'NR == FNR {
f2 = f2 ? f2 RS $0 : $0
next
}
/START|END/ || !NF {
print; next
}
NF && !c++ {
print f2
}' File2.txt File1.txt
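For example, if File2.txt contains two replacement lines (hypothetical content), the awk command above produces:
$ cat File2.txt
new line A
new line B
$ awk '...' File2.txt File1.txt
START
new line A
new line B
END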
This GNU sed one liner might work:
sed -re '/^START/,/^END/{/^START/{p;r File2.txt' -e '};/^END/p;d}' File1.txt
This inserts File2.txt between START and END, but doesn't preserve empty lines after the START line or before the END line.
This tries to preserve empty lines:
sed -re '/^START/,/^END/{//!{/^$/{p;d};x;/./{x;d};x;h;r File2.txt' -e ';d};x;s/.*//;x}' File1.txt
