Terminal: diff two files and print all lines that are in the 2nd file but not in the 1st

In the following samples, every line can be empty or contain arbitrary characters (not only digits). Lines can also contain line feeds and tabs.
The following works partly, but fails with more complex content:
file1.txt
1
2
3
5
file2.txt
1
4
5
This works with the simple sample above:
comm -1 -3 file1.txt file2.txt
Output, which is correct:
4
A more complex sample, where it fails:
file1.txt
0
2
3
4
5
6
7
8
10
file2.txt
1
4
6
7
8
9
10
Wrong output (10 should not appear in the output for this sample):
1
9
10

If you sort the contents of file1.txt and file2.txt the same way before running your command, it works fine.
You can do that as follows:
sort file1.txt > file1_sorted.txt
sort file2.txt > file2_sorted.txt
Then use the sorted files with your command:
comm -1 -3 file1_sorted.txt file2_sorted.txt
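If you'd rather not create the sorted temporary files, bash process substitution can feed comm directly. A minimal sketch, recreating the complex sample from the question:

```shell
# Recreate the complex sample from the question
printf '%s\n' 0 2 3 4 5 6 7 8 10 > file1.txt
printf '%s\n' 1 4 6 7 8 9 10 > file2.txt

# Sort on the fly with process substitution; -13 keeps lines unique to file2
comm -13 <(sort file1.txt) <(sort file2.txt) > only_in_file2.txt
cat only_in_file2.txt
```

comm requires both inputs to be sorted with the same collation; here both go through the same `sort`, so that holds.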

SWI-Prolog, Read the file and sum the numbers in the file

I am learning Prolog and have the following problem: read an input file line by line, then write the sum of each line to an output file.
Given an input.txt input file of the following form:
1 2 7 4 5
3 1 0 7 9
Each line consists of integers separated by spaces.
?- calculate('input.txt', 'output.txt').
Here is the content of the output.txt file:
19
20
I have tried many ways, but it still doesn't work. I hope someone can help me.
go :-
    setup_call_cleanup(
        open('output.txt', write, Out),
        forall(file_line_sum('input.txt', Sum), writeln(Out, Sum)),
        close(Out)
    ).

file_line_sum(File, Sum) :-
    file_line(File, Line),
    line_sum(Line, Sum).

line_sum(Line, Sum) :-
    split_string(Line, " ", "", NumsStr),
    maplist(string_number, NumsStr, Nums),
    sum_list(Nums, Sum).

string_number(Str, Num) :-
    number_string(Num, Str).

file_line(File, Line) :-
    setup_call_cleanup(
        open(File, read, In),
        stream_line(In, Line),
        close(In)
    ).

stream_line(In, Line) :-
    repeat,
    read_line_to_string(In, Line1),
    ( Line1 == end_of_file -> !, fail ; Line = Line1 ).
Contents of input.txt:
1 2 7 4 5
3 1 0 7 9
123 456 7890
Result in swi-prolog:
?- time(go).
% 115 inferences, 0.001 CPU in 0.001 seconds (90% CPU, 195568 Lips)
Generated output.txt:
19
20
8469
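For comparison, the same line-sum transformation can be sketched in plain awk, assuming whitespace-separated integers as in the question (the `printf` just recreates the sample input.txt):

```shell
# Recreate the question's input.txt
printf '%s\n' '1 2 7 4 5' '3 1 0 7 9' '123 456 7890' > input.txt

# Sum the whitespace-separated fields of each line
awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }' input.txt > output.txt
cat output.txt
```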

Modify IFS in bash while building an array

I'm trying to build an array from 4 different arrays in bash with a custom IFS; can you lend me a hand, please?
#!/bin/bash
arr1=(1 2 3 4)
arr2=(1 2 3 4)
arr3=(1 2 3 4)
arr4=(1 2 3 4)
arr5=()
oldIFS=$IFS
IFS=\;
for i in ${!arr1[@]}; do
arr5+=($(echo ${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]}))
done
IFS=$oldIFS
echo ${arr5[@]}
I want the output to be:
1 1 1 1;2 2 2 2;3 3 3 3;4 4 4 4
But it doesn't work; the output is joined with a normal ' ':
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4
Any ideas?
I tried setting IFS in different places:
1) In the for loop
2) Before arr5=()
I tested it in the for loop, and IFS does change to ";", but it doesn't take effect during the array creation.
IFS is used during the expansion of ${arr5[*]}, not while creating arr5.
arr1=(1 2 3 4)
arr2=(1 2 3 4)
arr3=(1 2 3 4)
arr4=(1 2 3 4)
arr5=()
for i in "${!arr1[@]}"; do
arr5+=("${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]}")
done
(IFS=";"; echo "${arr5[*]}")
Where possible, it's simpler to just change IFS in a subshell rather than try to save and restore its value manually. (Your attempt fails in the rare but possible case that IFS was unset to begin with.)
That said, if you just want the ;-delimited string and arr5 was a way to get there, just build the string directly:
for i in "${!arr1[@]}"; do
s+="${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]};"
done
s=${s%;} # Remove the last extraneous semicolon
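Putting the pieces together, here is a minimal self-contained sketch (assuming bash) that produces the ;-joined string without touching the parent shell's IFS:

```shell
arr1=(1 2 3 4); arr2=(1 2 3 4); arr3=(1 2 3 4); arr4=(1 2 3 4)

# Build one space-joined element per index
arr5=()
for i in "${!arr1[@]}"; do
  arr5+=("${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]}")
done

# Join the elements with ";" inside the command substitution's subshell,
# so the parent IFS is never modified
joined=$(IFS=';'; echo "${arr5[*]}")
echo "$joined"
```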

Accurate AWK array searching

Can anybody offer some help getting this AWK to search correctly?
I need to search the "sample.txt" file for all 6 patterns listed in the "combinations" file. However, the search must restart at every single character position, instead of skipping past each match the way an ordinary text-editor search box does, so that overlapping occurrences are counted. For example, in the string "AAAAA" the combination "AAA" occurs 3 times, not 1 time. See my previous post about this: BASH: Search a string and exactly display the exact number of times a substring happens inside it
The sample.txt file is:
AAAAAHHHAAHH
The combinations file is:
AA
HH
AAA
HHH
AAH
HHA
How do I get the script
#!/bin/bash
awk 'NR==FNR {data=$0; next} {printf "%s %d \n",$1,gsub($1,$1,data)}' 'sample.txt' combinations > searchoutput
to output the desired output:
AA 5
HH 3
AAA 3
HHH 1
AAH 2
HHA 1
instead of what it is currently outputting:
AA 3
HH 2
AAA 1
HHH 1
AAH 2
HHA 1
?
As we can see, the script only finds the combinations the way a text editor would. I need it to restart the search at every character so that it produces the desired output.
How do I have the AWK output the desired output instead? Can't thank you enough.
There may be a faster way that finds the first match and carries forward from that index, but this might be simpler:
$ awk 'NR==1{content=$0;next}
{c=0; len1=length($1);
for(i=1;i<=length(content)-len1+1;i++)
c+=substr(content,i,len1)==$1;
print $1,c}' file combs
AA 5
HH 3
AAA 3
HHH 1
AAH 2
HHA 1
You might try this:
$ awk '{x="AAAAAHHHAAHH"; n=0}{
while(t=index(x,$0)){n++; x=substr(x,t+1) }
print $0,n
}' combinations.txt
AA 5
HH 3
AAA 3
HHH 1
AAH 2
HHA 1
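The second answer hardcodes the haystack string. A variant of the same index()/substr() idea (a sketch, not from the original answers) that reads the haystack from sample.txt instead:

```shell
# Recreate the question's files
printf 'AAAAAHHHAAHH\n' > sample.txt
printf '%s\n' AA HH AAA HHH AAH HHA > combinations

awk 'NR==FNR { hay = $0; next }   # first file: remember the haystack
{
  x = hay; n = 0
  # advance one char past each hit, so overlapping matches are counted
  while (t = index(x, $1)) { n++; x = substr(x, t + 1) }
  print $1, n
}' sample.txt combinations > searchoutput
cat searchoutput
```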

Shell: insert a line every n lines

I have two files, and I am trying to insert a line from file2 into file1 every 4 lines, starting at the beginning of file1. For example:
file1:
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
file2:
50
43
21
output I am trying to get:
50
line 1
line 2
line 3
line 4
43
line 5
line 6
line 7
line 8
21
line 9
line 10
The code I have:
while read line
do
sed '0~4 s/$/$line/g' < file1.txt > file2.txt
done < file1.txt
I am getting the following error:
sed: 1: "0~4 s/$/$line/g": invalid command code ~
The following steps through both files without loading either one into an array in memory:
awk '(NR-1)%4==0{getline this<"file2";print this} 1' file1
This might be preferable if your actual file2 is larger than what you want to hold in memory.
This breaks down as follows:
(NR-1)%4==0 - a condition which matches every 4th line starting at 0
getline this<"file2" - gets a line from "file2" and stores it in the variable this
print this - prints ... this.
1 - shorthand for "print the current line", which in this case comes from file1 (awk's normal input)
It is easy to do this using awk:
awk 'FNR==NR{a[i++]=$0; next} !((FNR-1) % 4){print a[j++]} 1' file2 file1
50
line 1
line 2
line 3
line 4
43
line 5
line 6
line 7
line 8
21
line 9
line 10
While processing the first file in the input (file2), we store each line in an array keyed by an incrementing number starting at 0.
While processing the second file (file1), we check whether the current record number satisfies (FNR-1) % 4 == 0; if so, we insert the next stored line from file2 and increment the index counter.
Finally, the action 1 prints each line from file1.
This might work for you (GNU sed):
sed -e 'Rfile1' -e 'Rfile1' -e 'Rfile1' -e 'Rfile1' file2
or just use cat and paste:
cat file1 | paste -d\\n file2 - - - -
another alternative with unix toolchain
$ paste file2 <(pr -4ats file1) | tr '\t' '\n'
50
line 1
line 2
line 3
line 4
43
line 5
line 6
line 7
line 8
21
line 9
line 10
Here's a goofy way to do it with paste and tr
paste file2 <(paste - - - - <file1) | tr '\t' '\n'
Assumes you don't have any actual tabs in your input files.

Converting continuous streaming text into comma separated, multi-line file

I'm trying to convert a continuous stream of (random) data into comma-separated, line-separated values. I convert the continuous data to CSV, and after a certain number of columns (say 80), I need to insert a newline and repeat the process until the stream ends.
Here's what I did for csv:
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp
'tmp' is the file with following data:
"ZaOAkHEnOsBmD5yZk8cNLC26rIFGSLpzuGHtZgb4VUP4x1Pd21bukeK6wUYNueQQMglvExbnjEaHuoxU0b7Dcne5Y4JP332RzgiI3ZDgHOzm0gjDLVat8au7uckM3t60nqFX0Cy93jXZ5T0IaQ4fw2JfdNF1PbqxDxXv7UGiyysFJ8z16TmYQ9zfBRCZvZirIyRboHNEGgMUFZ18y8XXCGrbpeL0WLstzpSuXetmo47G2xPkDLDcFA6cdM4WAFNpoC2ztspY7YyVsoMZdU7D3u3Lm6dDcKuJKdTV6600GkbLuvAamKGyzMtoqW3liI3ybdTNR9KLz2l7KTjUiGgc3Eci5wnhIosAUMkcSQVxFrZdJ9MVyj6duXAk0CJoRvHYuyfdAr7vjlwjkLkYPtFvAZp6wK3dfetoh3ZmhJhUxqzuxOLDQ9FYcvz64iuIUbgXVZoRnpRoNGw7j3fCwyaqCi..."
I'm generating the continuous sequence from /dev/urandom. What I can't figure out is how to repeat the gawk after some number of columns, adding a newline character after the last column.
Actually, I got it: a simple for loop did the job.
Here's my whole code:
for i in $(seq 10)
do
tr -dc A-Za-z0-9 < /dev/urandom | head -c 100 > tmp
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp >> tmp1
done
Any optimizations would be appreciated.
