Concatenate within a loop using sed

I have two types of files (A.n and B). The A.n files are named A.1, A.2, A.3, etc.
The B file contains a word like 'Element.txt' which I want to replace with 'Element_n.txt', leaving everything else unchanged, and then append below the corresponding A.n file. Each combined A.n and B file should be named
final.n, i.e. final.1, final.2, final.3, etc.
So the final.1 file should look like this:
contents of A.1
contents of B, with Element.txt replaced by Element_1.txt
I tried this code, but it failed:
for f in A.*;
sed 's/Element.txt/Element_$f.txt/g' B >> tt;
cat A.$f tt >> final_$f.txt ;
done

Your code appends more and more data to the file tt, and it puts the output from the B file before, not after, the text from the A file. Furthermore, the single quotes prevent the shell from expanding $f, so sed never sees the variable's value. If I understand your question correctly, you are simply looking for
for f in A.*; do
n=${f#A.}
( cat "$f"
sed "s/Element.txt/Element_$n.txt/g" B ) >"final_$n".txt
done
The parentheses group the two commands so that their output can be redirected together at once. The file name in $f contains the A. part, so we chop it off with a parameter expansion and store that in $n. The argument to cat should obviously be the file name itself.
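For example, a quick sketch of how that prefix-stripping expansion behaves:
f=A.3
echo "${f#A.}"    # removes the shortest leading match of "A.", printing just: 3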

Related

Printing in tabular format in Tcl/Perl

I have a Tcl script in which a variable collects data in every loop iteration and appends it to a file. Suppose in loop 1,
$var = {xy} {ty} {po} {iu} {ii}
and in loop 2
$var = {a} {b} {c} {d1} {d2}
The variable is then dumped to a file f.txt with puts $file $var, and in the file it looks like this:
Line number 1: {xy} {ty} {po} {iu} {ii}
Line number 2: {a} {b} {c} {d1} {d2}
Finally, I want to print them to a file in tabular format, like below:
xy a
ty b
po c
iu d1
ii d2
First, read the file in and extract the words on the first two lines:
set f [open "f.txt"]
set words1 [regexp -all -inline {\S+} [gets $f]]
set words2 [regexp -all -inline {\S+} [gets $f]]
close $f
The trick here is that regexp -all -inline returns all matching substrings, and \S+ selects non-whitespace character sequences.
Then, because we're producing tabular output, we need to measure the maximum size of the items in the first list. We might as well measure the second list at the same time.
set len1 [tcl::mathfunc::max {*}[lmap w $words1 {string length $w}]]
set len2 [tcl::mathfunc::max {*}[lmap w $words2 {string length $w}]]
The lmap applies string length to each word, and then we take the maximum of the results. {*} expands the list (of word lengths) into multiple arguments.
Now, we can iterate over the two lists and produce formatted output:
foreach w1 $words1 w2 $words2 {
puts [format "%-*s %-*s" $len1 $w1 $len2 $w2]
}
The format sequence %-*s consumes two arguments: the width of the field and the string to put in that field. It left-aligns the value within the field, padding on the right with spaces; without the - it would right-align, which is more useful for integers. You could instead separate the columns with tab characters, which usually works well if the words are short, but not so well once the lengths vary more widely.
If you're looking to produce an actual Tab-Separated Values file, the csv package in Tcllib will generate those fine with the right (obvious!) options.
Try this:
$ perl -anE 'push @{$vars[$_]}, ($F[$_] =~ s/^[{]|[}]$//gr) for 0 .. $#F; END {say join "\t", @$_ for @vars}' f.txt
xy a
ty b
po c
iu d1
ii d2
Command-line switches:
-a : turn on autosplit on whitespace into the @F array.
-n : loop over the lines of the input file (with -a, each line is split into @F).
-E : execute the following argument as a one-liner.
Removing the surrounding braces from each word:
$F[$_] =~ s/^[{]|[}]$//gr
g : global substitution (we want to remove both { and }).
r : non-destructive operation; it returns the result of the substitution instead of modifying @F.

What's the unix command to copy specific lines from one file to another file?

I searched the web for hours; please excuse me if I overlooked something, I'm a beginner. I want to copy lines that contain a certain string from file1 to file2. These lines from file1 have to be inserted into file2, but only in place of specific lines that contain another string.
(It's about the entire lines with the timecode.)
Content of file1:
1
00:00:16,520 --> 00:00:23,200
Some text
2
00:00:25,800 --> 00:00:32,600
Some more text
Content of file2:
1
00: 00: 16,520 -> 00: 00: 23,200
Different text
2
00: 00: 25,720 -> 00: 00: 32,520
More different text
awk '/ --> /' file1 lists the lines I need from file1. But what do I have to add to take these awk results and copy them only into the lines of file2 that contain ' -> '?
Thanks a lot for your support!!!
Result in file2 should be:
1
00:00:16,520 --> 00:00:23,200
Different text
2
00:00:25,800 --> 00:00:32,600
More different text
Note: below is for GNU awk
So you want to replace the timecode lines of the subtitles, right?
Given that they are identically indexed, i.e. the numbers above the timecodes are the same, you can try this:
awk 'ARGIND==1 && /^[0-9]+$/{getline timeline; tl[$0]=timeline} ARGIND==2 && /^[0-9]+$/{getline tmp2drop; print $0 ORS tl[$0]} ARGIND==2 && !/^[0-9]+$/' file1 file2
Note that /^[0-9]+$/ is the criterion; it matches only lines consisting of a number, and the final clause passes every other line of file2 through unchanged.
But if some subtitle text happens to consist of just a number, this will lead to a wrong replacement.
Another way is to use the line number (FNR) as the index:
awk 'ARGIND==1 && /-->/{tl[FNR]=$0} ARGIND==2 {if (/->/) print tl[FNR]; else print $0} ' file1 file2
But if the line numbers are not the same between the two files, for example when some subtitle texts span multiple lines, it will still replace the wrong lines.
Given that the occurrences appear in the same relative order, we can maintain an index of our own:
awk 'ARGIND==1 && /-->/{tl[i++]=$0} ARGIND==2 {if (/->/) print tl[j++]; else print $0} ' file1 file2
None of these are perfect, but they should give you an idea of how you could approach this.
Choose depending on your situation, and improve the code yourself :)
Note: these just print to the console. If you want to replace the file, you can use > or >> to write the output to a temp file and later rename it to file2.
For example:
awk 'ARGIND==1 && /-->/{tl[i++]=$0} ARGIND==2 {if (/->/) print tl[j++]; else print $0} ' file1 file2 >> tmpFile2check
If you are not using GNU awk, ARGIND==1 won't work; use this instead:
awk 'NR==FNR && /-->/{tl[i++]=$0} NR>FNR {if (/->/) print tl[j++]; else print $0} ' file1 file2 >> tmpFile2check
NR is the total number of records read so far; FNR is the number of records read from the current file. If they are equal, the script is processing the first file; if NR>FNR, it is processing a later file.
Note that if file1 is or could be empty, this mechanism fails; in that case switch to FILENAME=="file1" or another way of identifying the current file to avoid processing errors.
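A sketch of that FILENAME-based variant, assuming the files really are named file1 and file2:
awk 'FILENAME=="file1" && /-->/{tl[i++]=$0} FILENAME=="file2"{if (/->/) print tl[j++]; else print $0}' file1 file2 >> tmpFile2check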

Strange Memory Behavior handling TSV

I have a .tsv file and I need to figure out the frequencies of the values in a specific column and organize that data in descending order. I run a script in C which downloads a buffer and saves it to a .tsv file, named with a date stamp, in the same directory as my code. I then open my terminal and run the following command, per this awesome SO answer:
cat 2016-09-06T10:15:35Z.tsv | awk -F '\t' '{print $1}' * | LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -nr > tst.tsv
To break this apart by pipes, what this does is:
cat the .tsv file to get its contents into the pipe
awk -F '\t' '{print $1}' * breaks the file's contents up by tab and pushes the contents of the first column into the pipe
LC_ALL=C sort takes the contents of the pipe and sorts them to have like-values next to one another, then pushes that back into the pipe
LC_ALL=C uniq -c takes the stuff in the pipe and figures out how many times each value occurs, then pushes that back into the pipe (e.g., Max 3, if the name Max shows up 3 times)
Finally, LC_ALL=C sort -nr sorts the stuff in the pipe again to be in descending order, and then prints it to stdout, which I pipe into a file.
Here is where things get interesting. If I do all of this in the same directory as the C code which downloaded my .tsv file to begin with, I get super wacky results which appear to be a mix of my actual .tsv file, some random corrupted garbage, and the contents of the C code which fetched it in the first place. Here is an example:
( count ) ( value )
1 fprintf(f, " %s; out meta qt; rel %s; out meta qt; way %s; out meta qt; >; out meta qt;", box_line, box_line, box_line);
1 fclose(f);
1 char* out_file = request_osm("cmd_tmp.txt", true);
1 bag_delete(lines_of_request);
1
1
1
1
1
1??g?
1??g?
1?
1?LXg$E
... etc. Now if you scroll up in that, you also find some correct values, from the .tsv I was parsing:
( count ) ( value )
1 312639
1 3065411
1 3065376
1 300459
1 2946076
... etc. And if I move my .tsv into its own folder, and then cd into that folder and run that same command again, it works perfectly.
( count ) ( value )
419362 452999
115770 136420
114149 1380953
72850 93290
51180 587015
45833 209668
31973 64756
31216 97928
30586 1812906
Obviously I have a functional answer to my problem: just put the file in its own folder before parsing it. But I think this apparent memory corruption suggests there may be some larger issue at hand I should fix now, and I'd rather get on top of it than kick it down the road with a temporary symptomatic patch, so to speak.
I should mention that my C code does use system(cmd) sometimes.
The second command is the problem:
awk -F '\t' '{print $1}' *
See the asterisk at the end? The shell expands it to every file name in the current directory, so awk processes all of those files and ignores its standard input (the pipe output).
Just remove the asterisk and it should work.
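For example, the same pipeline from the question without the stray glob:
cat 2016-09-06T10:15:35Z.tsv | awk -F '\t' '{print $1}' | LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -nr > tst.tsv
(You could also drop the cat and pass the file name directly to awk.)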

Iteration with ls and spaces in file names

I use bash on Ubuntu and I have some files in a folder, some with spaces in their names, others without.
I would like an array of the file names.
Example: [foo.txt, I am a file.txt, bar.jpg, etc.]
My code:
for x in "$(ls -1 test/)"; do
fileList+=($x)
done
I get: [foo.txt, I, am, a, file.txt, bar.jpg, etc.]
If I write fileList+=("$x") I get a one-element array: [foo.txt I am a file.txt bar.jpg etc.].
How can I get what I want?
Thank you.
Why not use shell globs? E.g.
for x in test/*; do
...
or
filelist=( test/* )
EDIT:
shopt -s nullglob
shopt -s dotglob
might also be wanted.
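Putting those together, a minimal sketch using the test/ directory from the question:
shopt -s nullglob dotglob          # expand to nothing if no match; include dotfiles
fileList=( test/* )                # each glob match becomes one array element
printf '%s\n' "${fileList[@]}"     # print one name per line to verify
Each element of fileList is a whole file name, spaces included, because glob expansion results are never word-split.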
Try using read, like this:
ls | while read f ; do
echo "$f"
done
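Note that a while loop on the receiving end of a pipe runs in a subshell, so appending to an array inside it would not survive the loop. If the goal is still to fill fileList this way, one sketch (assuming bash and file names without embedded newlines) is:
fileList=()
while IFS= read -r f; do           # IFS= and -r keep whitespace and backslashes intact
    fileList+=("$f")
done < <(ls test/)                 # process substitution avoids the pipe's subshell
printf '%s\n' "${fileList[@]}"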

What is the shell script instruction to divide a file with sorted lines to small files?

I have a large text file in the following format:
1 2327544589
1 3554547564
1 2323444333
2 3235434544
2 3534532222
2 4645644333
3 3424324322
3 5323243333
...
The output should be text files whose names carry a suffix taken from the first column of the original file, each keeping the corresponding numbers from the second column, as follows:
file1.txt:
2327544589
3554547564
2323444333
file2.txt:
3235434544
3534532222
4645644333
file3.txt:
3424324322
5323243333
...
The script has to run on Solaris, but I'm also having trouble with awk and with options of other commands, such as -c with cut; the tooling is very limited, so I am looking for commands that are commonly available on Solaris. I am not allowed to change or install anything on the system. Using a loop is not very efficient because the script takes too long with large files. So, aside from awk and loops, any suggestions?
Something like this perhaps:
$ awk 'NF>1{print $2 > "file"$1".txt"}' input
$ cat file1.txt
2327544589
3554547564
2323444333
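A caveat (this is an assumption on my part, not tested on Solaris): the stock /usr/bin/awk there is the old awk, which may reject a concatenated redirection target, so you may need nawk or /usr/xpg4/bin/awk and parentheses around the file name expression:
nawk 'NF>1 {print $2 > ("file" $1 ".txt")}' input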
or if you have bash available, try this:
#!/bin/bash
while read a b
do
[ -z $a ] && continue
echo $b >> "file"$a".txt"
done < input
output:
$ paste file{1..3}.txt
2327544589 3235434544 3424324322
3554547564 3534532222 5323243333
2323444333 4645644333
