bash - variable storing multiple lines of file

This is my code:
grep $to_check $forbidden >${dir}variants_of_interest;
cat ${dir}variants_of_interest | (while read line; do
#process ${line} and echo result
done;
)
Thanks to grep I get lines of data that I then process separately in a loop. I would like to use a variable instead of the file variants_of_interest.
The reason is that I am afraid that writing to a file thousands of times (and then reading from it) will rapidly slow down the computation, so I am hoping that avoiding the file could help. What do you think?
I have to run thousands of grep commands, and variants_of_interest contains at most 10 lines.
Thanks for your suggestions.

You can just make grep pass its output directly to the loop:
grep "$to_check" "$forbidden" | while read line; do
#process "${line}" and echo result
done
I removed the explicit subshell from your example, since the loop already runs in a separate one due to the piping. Also don't forget to quote $line wherever you use it, to prevent word splitting and globbing.
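One caveat with that pipe: because the loop body runs in a subshell, any variables set inside it are lost when the loop ends. A small sketch of the difference (process substitution is one way around it; the sample data here is illustrative):

```shell
#!/usr/bin/env bash
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))            # increments a subshell copy
done
echo "after pipe: $count"           # still 0

count=0
while read -r line; do
    count=$((count + 1))            # runs in the current shell
done < <(printf 'a\nb\nc\n')
echo "after process substitution: $count"   # 3
```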

You don't have to write a file. Simply iterate over the result of grep:
grep "$to_check" "$forbidden" | (while read -r line; do
#process "${line}" and echo result
done;
)

This might work for you:
OIFS="$IFS"; IFS=$'\n'; lines=($(grep "$to_check" "$forbidden")); IFS="$OIFS"
for line in "${lines[@]}"; do echo "$(process "${line}")"; done
The first line places the results of the grep into the array variable lines.
The second line iterates over the array, placing each element into the variable line.

Related

Make a list of all files in two folders then iterate through the combined list randomly

I have two directories with photos that I want to manipulate to output a random order of the files each time a script is run. How would I create such a list?
d1=/home/Photos/*.jpg
d2=/mnt/JillsPC/home/Photos/*.jpg
# somehow make a combined list, files = d1 + d2
# somehow randomise the file order
# during execution of the for;do;done loop, no file should be repeated
for f in $files; do
echo $f # full path to each file
done
I wouldn't use variables if you don't have to. It's more natural if you chain a couple of commands together with pipes or process substitution. That way everything operates on streams of data without loading the entire list of names into memory all at once.
You can use shuf to randomly permute input lines, and find to list files one per line. Or, to be maximally safe, let's use \0 separators. Finally, a while loop with process substitution reads line by line into a variable.
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(find /home/Photos/ /mnt/JillsPC/home/Photos/ -name '*.jpg' -print0 | shuf -z)
That said, if you do want to use some variables then you should use arrays. Arrays handle file names with whitespace and other special characters correctly, whereas regular string variables muck them all up.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[@]}" "${d2[@]}")
Iterating in order would be easy:
for file in "${files[@]}"; do
echo "$file"
done
Shuffling is tricky though. shuf is still the best tool but it works best on a stream of data. We can use printf to print each file name with the trailing \0 we need to make shuf -z happy.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[@]}" "${d2[@]}")
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(printf '%s\0' "${files[@]}" | shuf -z)
Further reading:
How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
How can I find and safely handle file names containing newlines, spaces or both?
I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
How can I randomize (shuffle) the order of lines in a file? Or select a random line from a file, or select a random file from a directory?
I came up with this solution after some more reading:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
printf '%s\n' "${files[@]}" | sort -R
Edit: updated with John's improvements from comments.
You can add any number of directories into an array declaration (though see caveat with complex names in comments).
sort -R seems to shuffle, judging from its man page (note it actually sorts by a random hash of each key, so identical lines end up adjacent rather than being independently shuffled).
This was the original, which works, but is not as robust as the above:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
(IFS=$'\n'; echo "${files[*]}") | sort -R
With IFS=$'\n', echoing the array will display it line by line ($'…' is bash syntax for string literals with escape sequences, so unlike '\n', $'\n' is a real line break). IFS is not needed with the printf method above.
echo ${files[*]} prints all array elements at once, joined by the first character of IFS.
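A quick check of that joining behavior (the array contents here are illustrative):

```shell
#!/usr/bin/env bash
files=(one two three)

# $'\n' is a real newline, so the elements are joined one per line:
(IFS=$'\n'; echo "${files[*]}")

# '\n' is two literal characters, backslash and n; only the first
# character (the backslash) is used as the join character:
(IFS='\n'; echo "${files[*]}")    # one\two\three
```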

While loop reading only last line?

I want to print all the data of the array foo line by line, but this loop only prints the last line of the array variable. Why doesn't it print all the lines?
foo=( $(grep name emp.txt) )
while read -r line ; do echo "$line"
done <<< ${foo[@]}
While David C. Rankin presented a working alternative, he chose to not explain why the original approach didn't work. See the Bash Reference Manual: Word Splitting:
The shell scans the results of parameter expansion, command
substitution, and arithmetic expansion that did not occur within
double quotes for word splitting.
So, you can make your approach work by using double quotes around the command substitution as well as the parameter expansion:
foo=("$(grep name emp.txt)")
while read -r line; do echo "$line"
done <<<"${foo[@]}"
Note that this assigns the whole grep output to the sole array element ${foo[0]}, i. e., we don't need an array at all and could use a simple variable foo just as well.
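A sketch of that simpler form, with a plain variable and a here-string (emp.txt and the pattern are taken from the question):

```shell
#!/usr/bin/env bash
# The whole grep output goes into one string; the here-string then
# feeds it to read line by line.
foo=$(grep name emp.txt)
while read -r line; do
    echo "$line"
done <<<"$foo"
```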
If you do want to read the grep output lines into an array with one line per element, then there's the Bash Builtin Command readarray:
< <(grep name emp.txt) readarray foo
This uses the expansion Process Substitution.
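With readarray, each grep match becomes its own array element; a short sketch of iterating over the result (emp.txt from the question; -t added here to strip the trailing newline from each element):

```shell
#!/usr/bin/env bash
# readarray -t stores one line per element, without trailing newlines.
readarray -t foo < <(grep name emp.txt)

echo "matched ${#foo[@]} lines"
for line in "${foo[@]}"; do
    echo "$line"
done
```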
I want to replace some text; can I use a sed command on echo "$line"?
Of course you can use echo "$line" | sed ….

Saving in arrays and comparing an argument to an array in bash

I do not know why this code stopped working. I tested it a couple of times and it was running great.
What I am trying to do here is place the first and second fields in 2 different arrays,
and then compare argument $2 ($comment) to the array varA; if it is already in the array, I do not want to store it in the text file $file.
comment=$2
dueD=$3
x=0
hasData()
{
declare -a varA varB
cat $file | while IFS=$'\t' read -r num first second;do
varA+=("$first")
varB+=("$second")
done
if [[ ${varA[@]} =~ $comment ]]; then
echo "already in the Todo list"
else
x=$(cat $file | wc -l)
x=$(($x+1))
echo -e "$x\t$comment\t$dueD" >> $file
fi
}
I think I am storing the values wrong in the array because when I try
echo ${varA[@]}
nothing gets printed
Moreover, I think my if statement is not accurate enough; this is the 4th time I have edited it, and it works for a while but then stops working.
Any assistance would be appreciated.
Your pipeline creates a sub-shell. Therefore your assignments to varA and varB happen in the sub-shell and are lost as soon as the sub-shell exits. See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)? for how to do this without a sub-shell. – Etan Reisner
Look at the solutions there. See how they don't use a pipe? That's the solution: Don't use a pipe. Use one of the other input redirection options. – Etan Reisner
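Following that advice, here is a sketch of hasData with the pipe replaced by input redirection, so the array assignments survive the loop (variable names are from the question; the whole-word containment test is one plausible reading of the intended check, not the original code):

```shell
#!/usr/bin/env bash
hasData() {
    declare -a varA varB
    # Redirect the file into the loop instead of piping `cat` into it,
    # so varA/varB are set in the current shell, not a subshell.
    while IFS=$'\t' read -r num first second; do
        varA+=("$first")
        varB+=("$second")
    done < "$file"

    # Match $comment as a whole word among the collected first fields.
    if [[ " ${varA[*]} " == *" $comment "* ]]; then
        echo "already in the Todo list"
    else
        x=$(( $(wc -l < "$file") + 1 ))
        printf '%s\t%s\t%s\n' "$x" "$comment" "$dueD" >> "$file"
    fi
}
```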

Bash: read lines into an array *without* touching IFS

I'm trying to read the lines of output from a subshell into an array, and I'm not willing to set IFS because it's global. I don't want one part of the script to affect the following parts, because that's poor practice and I refuse to do it. Reverting IFS after the command is not an option because it's too much trouble to keep the reversion in the right place after editing the script. How can I explain to bash that I want each array element to contain an entire line, without having to set any global variables that will destroy future commands?
Here's an example showing the unwanted stickiness of IFS:
lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines without IFS"
IFS=$'\r\n' lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines with IFS"
lines=($(egrep "^-o" speccmds.cmd))
echo "${#lines[@]} lines without IFS?"
The output is:
42 lines without IFS
6 lines with IFS
6 lines without IFS?
This question is probably based on a misconception.
IFS=foo read does not change IFS outside of the read operation itself.
Thus, this would have side effects, and should be avoided:
IFS=
declare -a array
while read -r; do
array+=( "$REPLY" )
done < <(your-subshell-here)
...but this is perfectly side-effect free:
declare -a array
while IFS= read -r; do
array+=( "$REPLY" )
done < <(your-subshell-here)
With bash 4.0 or newer, there's also the option of readarray or mapfile (synonyms for the same operation):
mapfile -t array < <(your-subshell-here)
In examples later added to your answer, you have code along the lines of:
lines=($(egrep "^-o" speccmds.cmd))
The better way to write this is:
mapfile -t lines < <(egrep "^-o" speccmds.cmd)
Are you trying to store the lines of the output in an array, or the words of each line?
lines
mapfile -t arrayname < <(your subshell)
This does not use IFS at all.
words
(your subshell) | while IFS=: read -ra words; do ...
The form var=value command args... puts the var variable into the environment of the command, and does not affect the current shell's environment.
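A minimal check of that scoping rule (the input values are illustrative):

```shell
#!/usr/bin/env bash
before=$IFS

# IFS applies only to this one read invocation:
IFS=: read -r a b c <<< 'x:y:z'
echo "$b"                              # y

# The shell's own IFS is untouched afterwards:
[ "$IFS" = "$before" ] && echo "IFS unchanged"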

How to work with block in loop by using sed?

I have file with some blocks, like this:
<start> test var=3333
<g>test=000000000000 tst <s>
<end>
...
<start> var=564735628
<title>somethink<\title>
<end>
...
And I need to get the block between the <start> and <end> tags in a loop.
And then I need to extract some symbols from the current block.
I tried this:
for block in $(cat $file | sed -n '/<start>/,/<end>/p;'); do
echo $block
done
Result is:
<start>
instead of
<start> test 1
<g>test=000000000000 tst <s>
<end>
How can I get the entire block for further processing?
OK, I will try to explain.
The source is:
<start> test var=3333
<g>test=000000000000 tst <s>
<end>
The result of your code is not a block. It is just a string.
The string is <end>t> test var=3333tst <s>
As you can see, it overlaps the strings of the block on each other.
One suggestion: do not use sed here.
Use a language like Perl or Python, which provides modules for parsing HTML and XML.
You could do something like:
block=""
cat "$file" | sed -n '/<start>/,/<end>/p;' | while read -r line; do
if [ -z "$block" ]; then
block="$line"
else
block=$(printf "%s\\n%s" "$block" "$line")
fi
if printf "%s\\n" "$line" | grep "<end>" > /dev/null; then
echo "$block"
block=""
fi
done
As choroba said in his answer, your for loop uses the IFS variable to split sed's output into separate fields, so the block variable contains only a single field at a time (i.e., block will contain <start>, then test, then var=3333, and so on).
A solution is to force it to read line by line, by piping the output of sed into the loop and reading each line with the read command. The -r flag forces read not to interpret backslashes as escape characters. Now we have a variable $line with our line, but not the block. To get the block, simply concatenate the lines together until we find the <end> string.
If the $block variable is empty, we can simply assign $line to it. Otherwise, we use printf to build a new string containing the previous value of $block, a newline character, and the contents of $line. The newline prevents the block from collapsing into a single line.
To test whether we found the last line, we print the current value of the line and see if grep finds <end> in it. I used printf because it is safer than echo when the string we want to print starts with a variable (we cannot guarantee that the variable does not start with a hyphen, which echo could interpret as an option). We must also remember to clear the block variable once we have printed a block, to prepare it for the next one.
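The hyphen hazard mentioned above is easy to demonstrate; -e here is the problematic value (chosen for illustration):

```shell
#!/usr/bin/env bash
line='-e'

# bash's echo swallows -e as an option and prints only an empty line:
echo "$line"

# printf treats it as plain data:
printf '%s\n' "$line"               # -e
```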
Word splitting is applied to the output of your sed command. You can set IFS to an empty value to prevent word splitting on the sed output, but it will make the whole output of sed into one "block". I would rather switch to a more powerful language like Perl.
This might work for you (GNU sed and bash):
OIFS=$IFS; IFS=$'\n'; block=($(sed '/<start>/,/<end>/!d' file)); IFS=$OIFS
for x in "${!block[@]}"; do echo "${block[x]}"; done
Slurp the sed command output into an array block and loop through the array.
By changing IFS and inserting a delimiter character in between your blocks, you can iterate through each block.
For example, use : as the delimiter
OLDIFS=$IFS; IFS=':'
blocks=$(sed -n '/start/,/end/ {/start/ s/^/:/; p}' file)
for block in ${blocks#:}; do
echo "This is block $((count++))"
echo "$block"
done
IFS=$OLDIFS
Note:
Blocks are 'separated' by inserting : before <start> and setting IFS to :
${blocks#:} removes the leading :, otherwise :block1:block2... would be interpreted as emptyblock:block1:block2..., i.e. the loop would iterate over a spurious empty first block (which exists only because of where : is placed)
Alternatively, : can be placed behind <end> but then the last line of the block would become <end>:\n so there would be an extra newline before the start of next block.
