This question has been discussed here on Meta and my answer gives links to a test system for answering it.
The question of whether to use gawk or mawk or C or some other language often comes up due to performance, so let's create a canonical question/answer for a trivial and typical awk program.
The result of this will be an answer that provides a comparison of the performance of different tools performing the basic text processing tasks of regexp matching and field splitting on a simple input file. If tool X is twice as fast as every other tool for this task then that is useful information. If all the tools take about the same amount of time then that is useful information too.
Here's how this will work: over the next couple of days, many people will contribute "answers", which are the programs to be tested; then one person (volunteers?) will test all of them on one platform (or a few people will test some subset on their platforms so we can compare), and all of the results will be collected into a single answer.
Given a 10-million-line input file created by this script:
$ awk 'BEGIN{for (i=1;i<=10000000;i++) print (i%5?"miss":"hit"),i," third\t \tfourth"}' > file
$ wc -l file
10000000 file
$ head -10 file
miss 1 third fourth
miss 2 third fourth
miss 3 third fourth
miss 4 third fourth
hit 5 third fourth
miss 6 third fourth
miss 7 third fourth
miss 8 third fourth
miss 9 third fourth
hit 10 third fourth
and given this awk script which prints the 4th then 1st then 3rd field of every line that starts with "hit" followed by an even number:
$ cat tst.awk
/hit [[:digit:]]*0 / { print $4, $1, $3 }
Here are the first 5 lines of expected output:
$ awk -f tst.awk file | head -5
fourth hit third
fourth hit third
fourth hit third
fourth hit third
fourth hit third
and here is the result when piped to a 2nd awk script to verify that the main script above is actually functioning exactly as intended:
$ awk -f tst.awk file |
awk '!seen[$0]++{unq++;r=$0} END{print ((unq==1) && (seen[r]==1000000) && (r=="fourth hit third")) ? "PASS" : "FAIL"}'
PASS
Here are the timing results of the 3rd execution of gawk 4.1.1 running in bash 4.3.33 on cygwin64:
$ time awk -f tst.awk file > /dev/null
real 0m4.711s
user 0m4.555s
sys 0m0.108s
Note the above is the 3rd execution to remove caching differences.
Can anyone provide the equivalent C, Perl, Python, or other code for this:
$ cat tst.awk
/hit [[:digit:]]*0 / { print $4, $1, $3 }
i.e. find THAT REGEXP on a line (we're not looking for some other solution that works around the need for a regexp), split the line at each series of contiguous whitespace, and print the 4th, then 1st, then 3rd fields separated by a single blank char?
If so we can test them all on one platform to see/record the performance differences.
The code contributed so far:
AWK (can be tested against gawk, etc., but mawk, nawk, and perhaps others will require [0-9] instead of [[:digit:]])
awk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file
PHP
php -R 'if(preg_match("/hit \d*0 /", $argn)){$f=preg_split("/\s+/", $argn); echo $f[3]." ".$f[0]." ".$f[2];}' < file
shell
egrep 'hit [[:digit:]]*0 ' file | awk '{print $4, $1, $3}'
grep --mmap -E "^hit [[:digit:]]*0 " file | awk '{print $4, $1, $3 }'
Ruby
$ cat tst.rb
File.open("file").readlines.each do |line|
line.gsub(/(hit)\s[0-9]*0\s+(.*?)\s+(.*)/) { puts "#{$3} #{$1} #{$2}" }
end
$ ruby tst.rb
Perl
$ cat tst.pl
#!/usr/bin/perl -nl
# A solution much like the Ruby one but with atomic grouping
print "$4 $1 $3" if /^(hit)(?>\s+)(\d*0)(?>\s+)((?>[^\s]+))(?>\s+)(?>([^\s]+))$/
$ perl tst.pl file
Python
none yet
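(No entry has been contributed yet; as a starting point, here is a minimal, untimed sketch. Note that Python's re module lacks POSIX character classes, so [0-9] stands in for [[:digit:]].)
import re

# Same task: match the regexp, split on runs of whitespace,
# print the 4th, 1st and 3rd fields separated by single blanks.
pat = re.compile(r'hit [0-9]*0 ')
with open('file') as f:
    for line in f:
        if pat.search(line):
            fields = line.split()
            print(fields[3], fields[0], fields[2])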
C
none yet
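(No entry here yet either; below is a minimal sketch using POSIX regcomp()/regexec() and strtok() for the awk-style field split. It is untimed and assumes lines fit in a fixed buffer.)
#include <stdio.h>
#include <string.h>
#include <regex.h>

int main(void)
{
    FILE *fp = fopen("file", "r");
    if (!fp) { perror("file"); return 1; }

    regex_t re;
    /* same POSIX ERE as the awk script */
    if (regcomp(&re, "hit [[:digit:]]*0 ", REG_EXTENDED | REG_NOSUB) != 0) {
        fprintf(stderr, "regcomp failed\n");
        return 1;
    }

    char line[4096];
    while (fgets(line, sizeof line, fp)) {
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            /* split on runs of blanks/tabs/newline, like awk's default FS */
            char *f[4];
            int n = 0;
            for (char *p = strtok(line, " \t\n"); p && n < 4; p = strtok(NULL, " \t\n"))
                f[n++] = p;
            if (n == 4)
                printf("%s %s %s\n", f[3], f[0], f[2]);
        }
    }
    regfree(&re);
    fclose(fp);
    return 0;
}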
Applying egrep before awk gives a great speedup:
paul@home ~ % wc -l file
10000000 file
paul@home ~ % for i in {1..5}; do time egrep 'hit [[:digit:]]*0 ' file | awk '{print $4, $1, $3}' | wc -l ; done
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.63s user 0.02s system 85% cpu 0.759 total
awk '{print $4, $1, $3}' 0.70s user 0.01s system 93% cpu 0.760 total
wc -l 0.00s user 0.02s system 2% cpu 0.760 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.65s user 0.01s system 85% cpu 0.770 total
awk '{print $4, $1, $3}' 0.71s user 0.01s system 93% cpu 0.771 total
wc -l 0.00s user 0.02s system 2% cpu 0.771 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.64s user 0.02s system 82% cpu 0.806 total
awk '{print $4, $1, $3}' 0.73s user 0.01s system 91% cpu 0.807 total
wc -l 0.02s user 0.00s system 2% cpu 0.807 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.63s user 0.02s system 86% cpu 0.745 total
awk '{print $4, $1, $3}' 0.69s user 0.01s system 92% cpu 0.746 total
wc -l 0.00s user 0.02s system 2% cpu 0.746 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.62s user 0.02s system 88% cpu 0.727 total
awk '{print $4, $1, $3}' 0.67s user 0.01s system 93% cpu 0.728 total
wc -l 0.00s user 0.02s system 2% cpu 0.728 total
versus:
paul@home ~ % for i in {1..5}; do time gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null; done
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.46s user 0.04s system 97% cpu 2.548 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.43s user 0.03s system 98% cpu 2.508 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.40s user 0.04s system 98% cpu 2.489 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.38s user 0.04s system 98% cpu 2.463 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.39s user 0.03s system 98% cpu 2.465 total
'nawk' is even slower!
paul@home ~ % for i in {1..5}; do time nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null; done
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 6.05s user 0.06s system 92% cpu 6.606 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 6.11s user 0.05s system 96% cpu 6.401 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 5.78s user 0.04s system 97% cpu 5.975 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 5.71s user 0.04s system 98% cpu 5.857 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 6.34s user 0.05s system 93% cpu 6.855 total
On OSX Yosemite
time bash -c 'grep --mmap -E "^hit [[:digit:]]*0 " file | awk '\''{print $4, $1, $3 }'\''' >/dev/null
real 0m5.741s
user 0m6.668s
sys 0m0.112s
First idea
File.open("file").readlines.each do |line|
line.gsub(/(hit)\s[0-9]*0\s+(.*?)\s+(.*)/) { puts "#{$3} #{$1} #{$2}" }
end
Second idea
File.read("file").scan(/(hit)\s[[:digit:]]*0\s+(.*?)\s+(.*)/) { |f,s,t| puts "#{t} #{f} #{s}" }
Trying to get something able to compare the answers, I ended up creating a GitHub repo here. Each push to this repo triggers a build on Travis CI, which composes a markdown file that is pushed in turn to the gh-pages branch to update a web page with a view of the build results.
Anyone wishing to participate can fork the GitHub repo, add tests, and open a pull request, which I'll merge ASAP as long as it does not break the other tests.
mawk is slightly faster than gawk.
$ time bash -c 'mawk '\''/hit [[:digit:]]*0 / { print $4, $1, $3 }'\'' file | wc -l'
0
real 0m1.160s
user 0m0.484s
sys 0m0.052s
$ time bash -c 'gawk '\''/hit [[:digit:]]*0 / { print $4, $1, $3 }'\'' file | wc -l'
100000
real 0m1.648s
user 0m0.996s
sys 0m0.060s
(Only 1,000,000 lines in my input file. Best results of many runs displayed, though they were quite consistent. Also note the 0 from wc -l on the mawk run: as mentioned above, mawk does not support [[:digit:]], so the pattern would need [0-9] for it to match anything.)
Here is an equivalent in PHP:
$ time php -R 'if(preg_match("/hit \d*0 /", $argn)){$f=preg_split("/\s+/", $argn); echo $f[3]." ".$f[0]." ".$f[2];}' < file > /dev/null
real 2m42.407s
user 2m41.934s
sys 0m0.355s
compared to your awk:
$ time awk -f tst.awk file > /dev/null
real 0m3.271s
user 0m3.165s
sys 0m0.104s
I tried a different approach in PHP where I iterate through the file manually; this makes things a lot faster, but I'm still not impressed:
tst.php
<?php
$fd=fopen('file', 'r');
while($line = fgets($fd)){
if(preg_match("/hit \d*0 /", $line)){
$f=preg_split("/\s+/", $line);
echo $f[3]." ".$f[0]." ".$f[2]."\n";
}
}
fclose($fd);
Results:
$ time php tst.php > /dev/null
real 0m27.354s
user 0m27.042s
sys 0m0.296s
Related
I am optimizing my code for an MCU, and I want to see an overview of the sizes of all functions in my C program, including all libraries, like, say:
$ objdump some arguments that I do not know | pipe through something | clean the result somehow
_main: 300 bytes
alloc: 200 bytes
do_something: 1111 bytes
etc...
nm -S -t d a.out | grep function_name | awk '{sub(/^0+/, "", $2); print $2}'
Or print a list of sorted sizes for each symbol:
nm -S --size-sort -t d a.out | awk '{sub(/^0+/, "", $2); print $4 " " $2}'
nm lists symbols from object files. We use -S to print the size of each defined symbol in the second column, and -t d to write the values in decimal.
$ objdump a.out -t|grep "F .text"|cut -f 2-3
00000000000000b1 __slow_div
00000000000000d8 __array_compact_and_grow
00000000000000dc __log
00000000000000de __to_utf
00000000000000e9 __string_compact
00000000000000fe __gc_release
000000000000001d __gc_check
000000000000001f array_compact
00000000000001e8 __string_split
000000000000002a c_roots_clear
000000000000002f f
000000000000002f _start
000000000000002f wpr
000000000000003d array_get
Not perfect (the sizes are in hex), but it's easy to write a script that converts them to decimal and sorts however you like.
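For example, something like this (a sketch assuming GNU awk, whose strtonum() understands a 0x prefix) converts the sizes to decimal and sorts by them numerically:
$ objdump a.out -t | grep "F .text" | cut -f 2-3 | awk '{ printf "%d %s\n", strtonum("0x" $1), $2 }' | sort -n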
/stu/class/sub/med/science score: 100 name: A status: Pass Roll: 12
/stu/class/sub/med/hist score: 75 name: B status: Pass Roll: 13
/stu/class/sub/med/comp score: 96 name: C status: Pass Roll: 14
/stu/class/sub/med/geo score: 40 name: D status: Fail Roll:15
/stu/class/sub/med/mat score: 100 name: D status: Pass Roll:16
I have the above details in a file, say "input.txt", and I want to print the subject from $1 together with the score for those who scored greater than 95. In this case, I should get the following output.
science score: 100
comp score: 96
mat score: 100
Can someone help me code this? I am trying to use awk with an array. So far I have done this, but it is not yet complete.
declare -A list
i=1
while read line; do
list[$i]=$(echo $line | awk '{print $3}')
((i++));
done < input.txt
for i in "${list[#]}"; do
echo "$i";
if ...
done
It is as simple as this:
awk '$3 > 95 {sub(/.*\//,"",$1); print $1, $2, $3}' test.txt
awk separates fields on whitespace by default. The third field is the actual score, so this is where you check for a value greater than 95. To get the output you want, you have to print the first three fields, not only the first (which would be the path only).
Update: Thanks to @jaypal for the comment on how to remove the path.
Note, however, that this only works if you don't have any spaces in your path and file names.
$ awk -F'[/ ]+' '$8>95{print $6, $7, $8}' file
science score: 100
comp score: 96
mat score: 100
Try this:
awk '{if ($3 > 95) print $1, $2, $3;}' < input.txt
Redirection is optional; therefore, the following will work as well.
awk '{if ($3 > 95) print $1, $2, $3}' input.txt
I've written a shell script to get the PIDs of specific process names (e.g. pgrep python, pgrep java) and then use top to get the current CPU and Memory usage of those PIDs.
I am using top with the '-p' option to give it a list of comma-separated PID values. When using it in this mode, you can only query 20 PIDs at once, so I've had to come up with a way of handling scenarios where I have more than 20 PIDs to query. I'm splitting up the list of PIDs passed to the function below and "despatching" multiple top commands to query the resources:
# $1 = List of PIDs to query
jobID=0
for pid in $1; do
if [ -z $pidsToQuery ]; then
pidsToQuery="$pid"
else
pidsToQuery="$pidsToQuery,$pid"
fi
pidsProcessed=$(($pidsProcessed+1))
if [ $(($pidsProcessed%20)) -eq 0 ]; then
debugLog "DESPATCHED QUERY ($jobID): top -bn 1 -p $pidsToQuery | grep \"^ \" | awk '{print \$9,\$10}' | grep -o '.*[0-9].*' | sed ':a;N;\$!ba;s/\n/ /g'"
resourceUsage[$jobID]=`top -bn 1 -p "$pidsToQuery" | grep "^ " | awk '{print $9,$10}' | grep -o '.*[0-9].*' | sed ':a;N;$!ba;s/\n/ /g'`
jobID=$(($jobID+1))
pidsToQuery=""
fi
done
resourceUsage[$jobID]=`top -bn 1 -p "$pidsToQuery" | grep "^ " | awk '{print $9,$10}' | grep -o '.*[0-9].*' | sed ':a;N;$!ba;s/\n/ /g'`
The top command will return the CPU and memory usage for each PID in the format (CPU, MEM, CPU, MEM, etc.):
13 31.5 23 22.4 55 10.1
The problem is with the resourceUsage array. Say, I have 25 PIDs I want to process, the code above will place the results of the first 20 PIDs in to $resourceUsage[0] and the last 5 in to $resourceUsage[1]. I have tested this out and I can see that each array element has the list of values returned from top.
The next bit is where I'm having difficulty. Any time I've ever wanted to print out or use an entire array's set of values, I use ${resourceUsage[@]}. Whenever I use that in the context of this script, I only get element 0's data. I've separated this functionality out into the script below to try to debug, and I'm seeing the same issue here too (data is output to debug.log in the same dir as the script):
#!/bin/bash
pidList="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25"
function quickTest() {
for ((i=0; i<=1; i++)); do
resourceUsage[$i]=`echo "$i"`
done
echo "${resourceUsage[0]}"
echo "${resourceUsage[1]}"
echo "${resourceUsage[#]}"
}
function debugLog() {
debugLogging=1
if [ $debugLogging -eq 1 ]; then
currentTime=$(getCurrentTime 1)
echo "$currentTime - $1" >> debug.log
fi
}
function getCurrentTime() {
if [ $1 -eq 0 ]; then
echo `date +%s`
elif [ $1 -eq 1 ]; then
echo `date`
fi
}
jobID=0
for pid in $pidList; do
if [ -z $pidsToQuery ]; then
pidsToQuery="$pid"
else
pidsToQuery="$pidsToQuery,$pid"
fi
pidsProcessed=$(($pidsProcessed+1))
if [ $(($pidsProcessed%20)) -eq 0 ]; then
debugLog "DESPATCHED QUERY ($jobID): top -bn 1 -p $pidsToQuery | grep \"^ \" | awk '{print \$9,\$10}' | grep -o '.*[0-9].*' | sed ':a;N;\$!ba;s/\n/ /g'"
resourceUsage[$jobID]=`echo "10 10.5 11 11.5 12 12.5 13 13.5"`
debugLog "Resource Usage [$jobID]: ${resourceUsage[$jobID]}"
jobID=$(($jobID+1))
pidsToQuery=""
fi
done
#echo "Dispatched job: $pidsToQuery"
debugLog "DESPATCHED QUERY ($jobID): top -bn 1 -p $pidsToQuery | grep \"^ \" | awk '{print \$9,\$10}' | grep -o '.*[0-9].*' | sed ':a;N;\$!ba;s/\n/ /g'"
resourceUsage[$jobID]=`echo "14 14.5 15 15.5"`
debugLog "Resource Usage [$jobID]: ${resourceUsage[$jobID]}"
memUsageInt=0
memUsageDec=0
cpuUsage=0
i=1
debugLog "Row 0: ${resourceUsage[0]}"
debugLog "Row 1: ${resourceUsage[1]}"
debugLog "All resource usage results: ${resourceUsage[#]}"
for val in ${resourceUsage[#]}; do
resourceType=$(($i%2))
if [ $resourceType -eq 0 ]; then
debugLog "MEM RAW: $val"
memUsageInt=$(($memUsageInt+$(echo $val | cut -d '.' -f 1)))
memUsageDec=$(($memUsageDec+$(echo $val | cut -d '.' -f 2)))
debugLog " MEM INT: $memUsageInt"
debugLog " MEM DEC: $memUsageDec"
elif [ $resourceType -ne 0 ]; then
debugLog "CPU RAW: $val"
cpuUsage=$(($cpuUsage+$val))
debugLog "CPU TOT: $cpuUsage"
fi
i=$(($i+1))
done
debugLog "$MEM DEC FINAL: $memUsageDec (pre)"
memUsageDec=$(($memUsageDec/10))
debugLog "$MEM DEC FINAL: $memUsageDec (post)"
memUsage=$(($memUsageDec+$memUsageInt))
debugLog "MEM USAGE: $memUsage"
debugLog "CPU USAGE: $cpuUsage"
debugLog "MEM USAGE: $memUsage"
debugLog "PROCESSED VALS: $cpuUsage,$memUsage"
echo "$cpuUsage,$memUsage"
I'm really stuck here as I've printed out entire arrays before in Bash Shell with no problem. I've even repeated this in the shell console with a few lines and it works fine there:
listOfValues[0]="1 2 3 4"
listOfValues[1]="5 6 7 8"
echo "${listOfValues[#]}"
Am I missing something totally obvious? Any help would be greatly appreciated!
Thanks in advance! :)
Welcome to StackOverflow, and thanks for providing a test case! The bash tag wiki has additional suggestions for creating small, simplified test cases. Here's a minimal version that shows your problem:
log() {
echo "$1"
}
array=(foo bar)
log "Values: ${array[#]}"
Expected: Values: foo bar. Actual: Values: foo.
This happens because ${array[@]} is magic in quotes, and turns into multiple arguments. The same is true for $@, and for brevity, let's consider that:
Let's say $1 is foo and $2 is bar.
The single parameter "$#" (in quotes) is equivalent to the two arguments "foo" "bar".
"Values: $#" is equivalent to the two parameters "Values: foo" "bar"
Since your log statement ignores all arguments after the first one, none of them show up. echo does not ignore them, and instead prints all arguments space separated, which is why it appeared to work interactively.
This is as opposed to ${array[*]} and $*, which are exactly like ${array[@]} and $@ except not magic in quotes: they do not turn into multiple arguments.
"$*" is equivalent to "foo bar"
"Values: $*" is equivalent to "Values: foo bar"
In other words: if you want to join the elements of an array into a single string, use *. If you want to add all the elements of an array as separate strings, use @.
Here is a fixed version of the test case:
log() {
echo "$1"
}
array=(foo bar)
log "Values: ${array[*]}"
Which outputs Values: foo bar
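As an aside, if you instead want each element on its own line, a common idiom is printf, which repeats its format string for each remaining argument:
printf '%s\n' "${array[@]}"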
I would use ps, not top, to get the desired information. Regardless, you probably want to put the data for each process in a separate element of the array, not one batch of 20 per element. You can do this using a while loop and a process substitution. I use a few array techniques to simplify the process ID handling.
pid_array=(1 2 3 4 5 6 7 8 9 ... )
while (( ${#pid_array[@]} > 0 )); do
printf -v pidsToQuery "%s," "${pid_array[@]:0:20}"
pid_array=( "${pid_array[@]:20}" )
while read cpu mem; do
resourceUsage+=( "$cpu $mem" )
done < <( top -bn -1 -p "${pidsToQuery%,}" ... )
done
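If you go the ps route suggested above, the process substitution might look like this sketch instead, replacing the top line (pcpu and pmem are standard POSIX output format specifiers, and the trailing = suppresses the header):
done < <( ps -o pcpu=,pmem= -p "${pidsToQuery%,}" )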
I'm trying to delete specific lines based on the argument passed in.
My data.txt file contains
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
My del.sh contains
myvar=$1
sed '/$myvar/d' data.txt > temp.txt
mv temp.txt data.txt
but it just passes every line through to data.txt unchanged. However,
sed '/64/d' data.txt > temp.txt
will do the correct data transfer (but I don't want to hardcode 64). I feel like there's some kind of syntax error with the argument. Any input, please?
It's because of the single quotes; change them to double quotes. Variables inside single quotes are not interpolated, so you are sending the literal string $myvar to sed, instead of the value of $myvar.
Change:
sed '/$myvar/d' data.txt
to:
sed "/$myvar/d" data.txt
Note: You will run into issues when $myvar contains regular expression metacharacters or forward slashes, as pointed out in this response from Ed Morton. If you are not in complete control of your input, you will need to find another avenue to accomplish this.
Assuming this is undesirable behavior:
$ cat file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
$ myvar=6
$ sed "/$myvar/d" file
Monitor 22 42 50
$ myvar=/
$ sed "/$myvar/d" file
sed: -e expression #1, char 3: unknown command: `/'
$ myvar=.
$ sed "/$myvar/d" file
$
Try this instead:
$ myvar=6
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Monitor 22 42 50
Game 32 64 128
$ myvar=/
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
$ myvar=.
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
And if you think you can just escape the /s and use sed, you can't, because you might be adding a 2nd backslash to one already present:
$ foo='\/'
$ myvar=${foo//\//\\\/}
$ sed "/$myvar/d" file
sed: -e expression #1, char 5: unknown command: `/'
$ awk -v myvar="$myvar" '{for (i=1; i<=NF;i++) if ($i == myvar) next }1' file
Cpu 500 64 6
Monitor 22 42 50
Game 32 64 128
This is simply NOT a job you can, in general, do with sed, due to its syntax and its restriction of only allowing REs in its search.
You can also use awk to do the same,
awk '!/'$myvar'/' data.txt > temp.txt && mv temp.txt data.txt
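A more robust variant (a sketch) passes the value in with -v rather than splicing it into the program text, which avoids the quoting problems above, though the value is still treated as a regexp rather than a literal string:
awk -v myvar="$myvar" '$0 !~ myvar' data.txt > temp.txt && mv temp.txt data.txt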
Use the -i option in addition to what @SeanBright proposed. Then you won't need > temp.txt and mv temp.txt data.txt.
sed -i "/$myvar/d" data.txt
File Example
I have 3-10 files with:
- a different number of columns
- the same number of rows
- inconsistent spacing (sometimes one space, sometimes tabs, sometimes many spaces) within the very files, like below
> 0 55.4 9.556E+09 33
> 1 1.3 5.345E+03 1
> ........
> 33 134.4 5.345E+04 932
>
........
I need to get column (say) 1 from file1, column 3 from file2, column 7 from file3 and column 1 from file4 and combine them into a single file, side by side.
Trial 1: not working
paste <(cut -d[see below] -f1 file1) <(cut -d[see below] -f3 file2) [...]
where the delimiter was ' ' or empty.
Trial 2: working with 2 files but not with many files
awk '{
a1=$1;b1=$4;
getline <"D2/file1.txt";
print a1,$1,b1,$4
}' D1/file1.txt >D3/file1.txt
Now, the more general question:
How can I extract different columns from many different files?
In your paste / cut attempt, replace cut by awk:
$ paste <(awk '{print $1}' file1 ) <(awk '{print $3}' file2 ) <(awk '{print $7}' file3) <(awk '{print $1}' file4)
Assuming each of your files has the same number of rows, here's one way using GNU awk. Run like:
awk -f script.awk file1.txt file2.txt file3.txt file4.txt
Contents of script.awk:
FILENAME == ARGV[1] { one[FNR]=$1 }
FILENAME == ARGV[2] { two[FNR]=$3 }
FILENAME == ARGV[3] { three[FNR]=$7 }
FILENAME == ARGV[4] { four[FNR]=$1 }
END {
for (i=1; i<=length(one); i++) {
print one[i], two[i], three[i], four[i]
}
}
Note:
By default, awk separates columns on whitespace. This includes tab characters and spaces, in any amount, which makes awk ideal for files with inconsistent spacing. You can also expand the above code to include more files if you wish; a sketch of one way to generalize it is shown below.
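For example, something like this sketch (assuming GNU awk for ARGIND; cols lists the wanted column for each file, in argument order):
awk -v cols='1 3 7 1' '
BEGIN { split(cols, col, " ") }
{ row[FNR] = (ARGIND == 1 ? "" : row[FNR] OFS) $(col[ARGIND]) }
END { for (i = 1; i <= FNR; i++) print row[i] }
' file1.txt file2.txt file3.txt file4.txt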
The combination of cut and paste should work:
$ cat f1
foo
bar
baz
$ cat f2
1 2 3
4 5 6
7 8 9
$ cat f3
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -d' ' -f3 f3)
foo 2 c
bar 5 g
baz 8 k
Edit: This works with tabs, too:
$ cat f4
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -f3 f4)
foo 2 c
bar 5 g
baz 8 k