Converting continuous streaming text into a comma-separated, multi-line file

I'm trying to convert a continuous stream of random data into comma-separated values split across lines. I convert the data to CSV, and after a fixed number of columns (let's say 80) I need to insert a newline and repeat the process.
Here's what I did for the CSV part:
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp
'tmp' is the file with the following data:
"ZaOAkHEnOsBmD5yZk8cNLC26rIFGSLpzuGHtZgb4VUP4x1Pd21bukeK6wUYNueQQMglvExbnjEaHuoxU0b7Dcne5Y4JP332RzgiI3ZDgHOzm0gjDLVat8au7uckM3t60nqFX0Cy93jXZ5T0IaQ4fw2JfdNF1PbqxDxXv7UGiyysFJ8z16TmYQ9zfBRCZvZirIyRboHNEGgMUFZ18y8XXCGrbpeL0WLstzpSuXetmo47G2xPkDLDcFA6cdM4WAFNpoC2ztspY7YyVsoMZdU7D3u3Lm6dDcKuJKdTV6600GkbLuvAamKGyzMtoqW3liI3ybdTNR9KLz2l7KTjUiGgc3Eci5wnhIosAUMkcSQVxFrZdJ9MVyj6duXAk0CJoRvHYuyfdAr7vjlwjkLkYPtFvAZp6wK3dfetoh3ZmhJhUxqzuxOLDQ9FYcvz64iuIUbgXVZoRnpRoNGw7j3fCwyaqCi..."
I'm generating the continuous sequence from /dev/urandom. I can't work out how to make gawk start a new line after the last column and then repeat the same split on the rest of the stream.

I got it actually. A simple for loop did that.
Here's my whole code:
for i in $(seq 10)
do
tr -dc A-Za-z0-9 < /dev/urandom | head -c 100 > tmp
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp >> tmp1
done
Any optimizations would be appreciated.
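One possible optimization, offered as a sketch rather than a drop-in replacement: read all rows from a single pipeline so that no temporary file is needed. The sketch below swaps gawk's FIELDWIDTHS for plain bash substring expansion, and the helper names split_fixed and random_rows are made up for illustration. Note that the widths in the post add up to 110 while the loop reads 100 characters per row; the sketch assumes each row should be as long as the sum of the widths.

```shell
#!/usr/bin/env bash
# Sketch only: split_fixed and random_rows are hypothetical helper names.

# Field widths copied from the post; they add up to 110 characters.
widths=(4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1)

# Print one line cut into comma-separated fields of the given widths.
split_fixed() {
    local line=$1 pos=0 out='' sep='' w
    shift
    for w in "$@"; do
        out+="$sep${line:pos:w}"
        sep=,
        pos=$((pos + w))
    done
    printf '%s\n' "$out"
}

# Emit n random rows from one pipeline, with no temporary file.
random_rows() {
    local n=$1 w len=0 line
    # Assumption: a row is as long as the sum of the field widths.
    for w in "${widths[@]}"; do len=$((len + w)); done
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$((n * len))" | fold -w "$len" |
        # fold may leave the last row without a trailing newline, hence the || test.
        while IFS= read -r line || [[ -n $line ]]; do
            split_fixed "$line" "${widths[@]}"
        done
}

split_fixed "ZaOAkHEnOs" 4 5 1   # prints ZaOA,kHEnO,s
```

With these definitions, random_rows 10 > tmp1 would play the role of the loop above.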

Related

Terminal: diff of two files, print all lines that are in the 2nd file and not in the 1st file

In the following samples, every line can be empty or can contain some characters. The characters are not necessarily digits. Lines can contain line feeds and tabs too.
The following works for the simple sample, but not for more complex content:
file1.txt
1
2
3
5
file2.txt
1
4
5
This works with the simple sample above:
comm -1 -3 file1.txt file2.txt
Output, which is fine:
4
A more complex sample, where it fails:
file1.txt
0
2
3
4
5
6
7
8
10
file2.txt
1
4
6
7
8
9
10
Wrong output (the 10 should not appear in the output for this sample):
1
9
10
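If you sort the contents of file1.txt and file2.txt the same way before running your command, it works fine. comm expects sorted input, and in sort's default lexical order 10 comes before 2, so the numerically ordered sample is not sorted from comm's point of view.
You can do it as follows: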
sort file1.txt > file1_sorted.txt
sort file2.txt > file2_sorted.txt
After that, use the sorted files in your command:
comm -1 -3 file1_sorted.txt file2_sorted.txt
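The same steps can be sketched without intermediate files by using bash process substitution. The helper name only_in_second is made up for illustration, and pinning LC_ALL=C is an assumption made here so that sort and comm agree on the ordering.

```shell
#!/usr/bin/env bash
# Sketch only: only_in_second is a hypothetical helper name, not part of comm.

# Print the lines that occur in the second file but not in the first.
only_in_second() {
    # comm needs both inputs sorted the same way; LC_ALL=C pins the collation.
    LC_ALL=C comm -13 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}

# Demo with the more complex sample from the question.
dir=$(mktemp -d)
printf '%s\n' 0 2 3 4 5 6 7 8 10 > "$dir/file1.txt"
printf '%s\n' 1 4 6 7 8 9 10 > "$dir/file2.txt"
only_in_second "$dir/file1.txt" "$dir/file2.txt"   # prints 1 and 9, one per line
rm -r "$dir"
```

The output comes back in sorted order rather than in the original order of file2.txt.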

How to use a nested for loop to create a 1D array of integer pairs on a range?

In bash, I am trying to create a 1-D array that contains all possible integer pairs on a range from a low value to a high value (e.g. 1 to 2).
I've tried using a nested for loop; however, the output I get is an array of the correct size, but all values are the high value (in this case 2).
I've tried nested for loops, however the array I am creating is not the correct size nor does it contain the correct combinations.
for (( i=$low; i<=$high; i++ ))
do
range_array[i]=$i
done
range=${#range_array[@]}
range_squared=$(( $range*$range))
new_range=$(( 2*$range_squared))
for (( i = $low; i <= $high; i++ ));do
for (( j = 1; j <= $new_range; j++ )) do
combo_array[j]=$i
done
done
echo "the following is the combo array"
echo ${combo_array[@]}
I expect the combo_array to be:
1 1 1 2 2 1 2 2
instead it is
2 2 2 2 2 2 2 2
That's too much work for such a trivial task. Here is a simple and working one:
combo_array=()
for ((i=low; i<=high; ++i)); do
for ((j=low; j<=high; ++j)); do
combo_array+=("$i" "$j")
done
done
echo "${combo_array[@]}"
Given low=6 and high=9, it outputs
6 6 6 7 6 8 6 9 7 6 7 7 7 8 7 9 8 6 8 7 8 8 8 9 9 6 9 7 9 8 9 9
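As a quick check against the question's own example, the same nested loop can be wrapped in a function and run with low=1 and high=2. This is only a sketch, and the function name make_pairs is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: make_pairs is a hypothetical name.

# Fill the global combo_array with every ordered pair (i, j) on low..high.
make_pairs() {
    local low=$1 high=$2 i j
    combo_array=()
    for ((i = low; i <= high; ++i)); do
        for ((j = low; j <= high; ++j)); do
            # Append both members of the pair instead of assigning by index.
            combo_array+=("$i" "$j")
        done
    done
}

make_pairs 1 2
echo "${combo_array[@]}"    # 1 1 1 2 2 1 2 2
echo "${#combo_array[@]}"   # 8
```

The loop in the question instead assigns $i to every index j on each outer pass, so the last pass (i equal to high) overwrites everything written before it.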

Modify IFS in bash while building an array

I'm trying to build an array from 4 different arrays in bash with a custom IFS. Can you lend me a hand, please?
#!/bin/bash
arr1=(1 2 3 4)
arr2=(1 2 3 4)
arr3=(1 2 3 4)
arr4=(1 2 3 4)
arr5=()
oldIFS=$IFS
IFS=\;
for i in ${!arr1[@]}; do
arr5+=($(echo ${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]}))
done
IFS=$oldIFS
echo ${arr5[@]}
I want the output to be:
1 1 1 1;2 2 2 2;3 3 3 3;4 4 4 4 4 4
But it doesn't work; the output is separated by ordinary spaces:
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 4 4
Any ideas?
I tried IFS in different places:
1) In the for loop
2) Before arr5()
I tested it in the for loop, and afterwards IFS does change to ";", but it has no effect on the array creation.
IFS is used during the expansion of ${arr5[*]}, not while creating arr5.
arr1=(1 2 3 4)
arr2=(1 2 3 4)
arr3=(1 2 3 4)
arr4=(1 2 3 4)
arr5=()
for i in ${!arr1[@]}; do
arr5+=("${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]}")  # one element per group, so ";" lands only between groups
done
(IFS=";"; echo "${arr5[*]}")
Where possible, it's simpler to just change IFS in a subshell rather than try to save and restore its value manually. (Your attempt fails in the rare but possible case that IFS was unset to begin with.)
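The unset-IFS case mentioned above can be seen in a small sketch. The helper count_words is made up for illustration; it simply reports how many words its argument splits into under the current IFS.

```shell
#!/usr/bin/env bash
# Sketch only: count_words is a hypothetical helper.

# Print how many words $1 splits into under the current IFS.
count_words() {
    set -- $1    # unquoted on purpose: relies on IFS word splitting
    echo $#
}

unset IFS                            # unset IFS splits like the default (space, tab, newline)
count_words 'a b c'                  # 3

oldIFS=$IFS; IFS=';'; IFS=$oldIFS    # manual save and restore, as in the question
count_words 'a b c'                  # 1: IFS is now the empty string, so nothing is split

unset IFS
(IFS=';'; :)                         # subshell: the change never reaches the parent shell
count_words 'a b c'                  # 3
```

Saving an unset IFS yields an empty string, and assigning that back leaves IFS set but empty, which disables word splitting instead of restoring the default.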
That said, if you just want the ;-delimited string and arr5 was a way to get there, just build the string directly:
for i in ${!arr1[@]}; do
s+="${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]};"
done
s=${s%;} # Remove the last extraneous semicolon

Combine two arrays in Bash line by line

I have 2 arrays in Bash and I want to combine them line by line, i.e.
arr1=( 1 2 3 4 )
arr2=( 5 6 7 8 )
Simply appending one array to the other gives 1 2 3 4 5 6 7 8, but I want the combined output to be 1 5 2 6 3 7 4 8 (interleaved line by line).
Any advice?
arr1=( 1 2 3 4 )
arr2=( 5 6 7 8 )
declare -a result
resultIndex=0
for index in ${!arr1[*]}; do
result[$resultIndex]=${arr1[$index]}
let "resultIndex++"
result[$resultIndex]=${arr2[$index]}
let "resultIndex++"
done
echo "${result[@]}"
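A slightly shorter variant of the same idea, as a sketch: appending with += removes the need for a separate resultIndex counter. It assumes, like the answer above, that arr2 has an element at every index of arr1.

```shell
#!/usr/bin/env bash
arr1=(1 2 3 4)
arr2=(5 6 7 8)

result=()
for index in "${!arr1[@]}"; do
    # Append the element of each array at this index, in order.
    result+=("${arr1[index]}" "${arr2[index]}")
done

echo "${result[@]}"   # 1 5 2 6 3 7 4 8
```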

How to import data with markers - but excluding those markers?

When I import a matrix of data, there is a marker at the start of the first column each time new data is acquired, and this marker interferes with how MATLAB imports the data.
Is there a way to strip it out in code?
For example:
>1 6 1 1 -0.00161
1 6 1 2 -0.00140
1 6 1 3 -0.00145
1 6 1 4 -0.00153
1 6 1 5 -0.00120
1 6 1 6 -0.00076
I would prefer not to remove the > from the data manually, as there will potentially be thousands of them.
If you're on a *nix system or have Cygwin, you can get rid of these > characters by piping the output through sed. For instance:
user@host $ cat out.txt
>0 5 3 4
0 6 4 3
>1 5 3 6
1 2 4 5
user@host $ cat out.txt |sed 's/>//g'
If you need to store this new output to a file:
user@host $ cat out.txt
>0 5 3 4
0 6 4 3
>1 5 3 6
1 2 4 5
user@host $ cat out.txt |sed 's/>//g' > out_without_unneeded_symbols.txt
user@host $ cat out_without_unneeded_symbols.txt
0 5 3 4
0 6 4 3
1 5 3 6
1 2 4 5
If this output comes from a program in the current directory:
user@host $ ./some_program |sed 's/>//g'
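A slightly narrower variant, as a sketch: anchoring the pattern with ^ removes only a marker at the very start of a line, so a > appearing anywhere else in the data is left alone. The helper name strip_markers is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: strip_markers is a hypothetical helper name.

# Remove a single leading ">" from each line of the given files (or stdin).
strip_markers() {
    sed 's/^>//' "$@"
}

# Demo on the sample rows used above.
printf '%s\n' '>0 5 3 4' '0 6 4 3' '>1 5 3 6' '1 2 4 5' | strip_markers
```

It can be used the same way as the commands above, for example strip_markers out.txt > out_without_unneeded_symbols.txt.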
Here is one possible implementation in MATLAB:
% read file lines as a cell array of strings
fid = fopen('file.dat', 'rt');
C = textscan(fid, '%s', 'Delimiter','');
C = C{1};
fclose(fid);
% find marker locations
markers = strncmp('>', C, 1);
% remove markers
C = regexprep(C, '^>', '');
% parse numbers into a numeric matrix
X = regexp(C, '\s+', 'split');
X = str2double(vertcat(X{:}));
The result:
% the full matrix
>> X
X =
0 5 3 4
0 6 4 3
1 5 3 6
1 2 4 5
% only the marked rows
>> X(markers,:)
ans =
0 5 3 4
1 5 3 6
