How to work with a block in a loop using sed? - loops

I have a file with some blocks, like this:
<start> test var=3333
<g>test=000000000000 tst <s>
<end>
...
<start> var=564735628
<title>somethink<\title>
<end>
...
And I need to get the block between the <start> and <end> sections in a loop.
And then I need to extract some symbols from the current block.
I tried this:
for block in $(cat $file | sed -n '/<start>/,/<end>/p;'); do
echo $block
done
Result is:
<start>
instead of:
<start> test 1
<g>test=000000000000 tst <s>
<end>
How can I get the entire block for further processing?
OK, I'll try to explain.
The source is:
<start> test var=3333
<g>test=000000000000 tst <s>
<end>
The result of your code is not a block. It is just a string.
The string is <end>t> test var=3333tst <s>
As you can see, the lines of the block overlap each other.

One suggestion: do not use sed here.
Use languages like Perl or Python, which provide modules for parsing HTML and XML.

You could do something like:
block=""
sed -n '/<start>/,/<end>/p' "$file" | while read -r line; do
if [ -z "$block" ]; then
block="$line"
else
block=$(printf '%s\n%s' "$block" "$line")
fi
if printf '%s\n' "$line" | grep '<end>' > /dev/null; then
echo "$block"
block=""
fi
done
As choroba said in his answer, your for loop uses the IFS variable to split sed's output into separate fields, and the block variable will contain only a single field. (I.e., block will contain <start>, then test, then var=3333, and so on.)
A solution is to force it to read line by line, by piping the output of sed into the loop and reading each line with the read command. The -r flag for the read command forces it not to interpret the backslash as an escape character. Now we have a variable $line with our line, but not the block. To get the block, we simply concatenate the lines together until we find the <end> string.
If the $block variable is empty, we can simply assign $line to it. Otherwise, we use the printf command to generate a new string containing the previous value of $block concatenated with a newline character and the contents of $line. This newline character prevents the block from collapsing into a single line.
To test whether we have found the last line, we print the current line and see if grep finds <end> in it. I used printf because it's safer than echo when the string we want to print starts with a variable (we can't guarantee that the variable doesn't start with a hyphen, which echo could interpret as an option). We must also remember to clear the block variable once we have printed a block, to prepare it for the next one.

Word splitting is applied to the output of your sed command. You can set IFS to an empty value to prevent word splitting on the sed output, but it will make the whole output of sed into one "block". I would rather switch to a more powerful language like Perl.
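To see that splitting concretely, here is a minimal sketch (the sample file is invented for the demo; the temp-file path comes from mktemp):

```shell
# Build a sample file standing in for $file (contents invented for the demo)
file=$(mktemp)
cat > "$file" <<'EOF'
<start> test var=3333
<g>test=000000000000 tst <s>
<end>
EOF

# Default IFS: every whitespace-separated word becomes its own iteration
words=0
for w in $(sed -n '/<start>/,/<end>/p' "$file"); do
  words=$((words + 1))
done

# Empty IFS: no splitting at all, so the entire sed output is one chunk
chunks=0
oldifs=$IFS
IFS=
for c in $(sed -n '/<start>/,/<end>/p' "$file"); do
  chunks=$((chunks + 1))
done
IFS=$oldifs

echo "words=$words chunks=$chunks"
rm -f "$file"
```

With the default IFS the loop runs once per word (seven times for this sample); with an empty IFS it runs exactly once over the whole output. Neither gives per-block iteration, which is the point of both answers above.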

This might work for you (GNU sed and bash):
OIFS=$IFS; IFS=$'\n'; block=($(sed '/<start>/,/<end>/!d' file)); IFS=$OIFS
for x in "${!block[@]}"; do echo "${block[x]}"; done
Slurp the sed command output into an array block and loop through the array.

By changing IFS and inserting a delimiter character in between your blocks, you can iterate through each block.
For example, use : as the delimiter
OLDIFS=$IFS; IFS=':'
blocks=$(sed -n '/start/,/end/ {/start/ s/^/:/; p}' file)
for block in ${blocks#:}; do
echo "This is block $((count++))"
echo "$block"
done
IFS=$OLDIFS
Note:
Blocks are 'separated' by inserting : before <start> and setting IFS to :
${blocks#:} removes the first :, otherwise :block1:block2... is interpreted as emptyblock:block1:block2..., i.e. the loop iterates over the non-existent first block (which is empty and exists due to how : is placed)
Alternatively, : can be placed behind <end> but then the last line of the block would become <end>:\n so there would be an extra newline before the start of next block.
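For reference, a self-contained version of the delimiter trick above, with sample input invented for the demo and count initialised so the numbering starts at 1:

```shell
# Sample input standing in for 'file' (contents invented for the demo)
file=$(mktemp)
cat > "$file" <<'EOF'
<start> test var=3333
<g>test=000000000000 tst <s>
<end>
<start> var=564735628
<title>somethink<\title>
<end>
EOF

OLDIFS=$IFS; IFS=':'
# Prefix every <start> line with the : delimiter, then split on it
blocks=$(sed -n '/start/,/end/ {/start/ s/^/:/; p}' "$file")
count=1
found=0
for block in ${blocks#:}; do
  echo "This is block $((count++))"
  echo "$block"
  found=$((found + 1))
done
IFS=$OLDIFS
rm -f "$file"
```

For this two-block sample the loop body runs twice, once per block, with each block arriving as a single multi-line string.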

Related

Make a list of all files in two folders then iterate through the combined list randomly

I have two directories with photos that I want to manipulate to output a random order of the files each time a script is run. How would I create such a list?
d1=/home/Photos/*.jpg
d2=/mnt/JillsPC/home/Photos/*.jpg
# somehow make a combined list, files = d1 + d2
# somehow randomise the file order
# during execution of the for;do;done loop, no file should be repeated
for f in $files; do
echo $f # full path to each file
done
I wouldn't use variables if you don't have to. It's more natural if you chain a couple of commands together with pipes or process substitution. That way everything operates on streams of data without loading the entire list of names into memory all at once.
You can use shuf to randomly permute input lines, and find to list files one per line. Or, to be maximally safe, let's use \0 separators. Finally, a while loop with process substitution reads line by line into a variable.
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(find /home/Photos/ /mnt/JillsPC/home/Photos/ -name '*.jpg' -print0 | shuf -z)
That said, if you do want to use some variables then you should use arrays. Arrays handle file names with whitespace and other special characters correctly, whereas regular string variables muck them all up.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[@]}" "${d2[@]}")
Iterating in order would be easy:
for file in "${files[@]}"; do
echo "$file"
done
Shuffling is tricky though. shuf is still the best tool but it works best on a stream of data. We can use printf to print each file name with the trailing \0 we need to make shuf -z happy.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[@]}" "${d2[@]}")
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(printf '%s\0' "${files[@]}" | shuf -z)
Further reading:
How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
How can I find and safely handle file names containing newlines, spaces or both?
I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
How can I randomize (shuffle) the order of lines in a file? Or select a random line from a file, or select a random file from a directory?
I came up with this solution after some more reading:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
printf '%s\n' "${files[@]}" | sort -R
Edit: updated with John's improvements from comments.
You can add any number of directories into an array declaration (though see caveat with complex names in comments).
According to its man page, sort -R sorts by a random hash of the keys, so unlike shuf it keeps identical lines grouped together.
This was the original, which works, but is not as robust as the above:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
(IFS=$'\n'; echo "${files[*]}") | sort -R
With IFS=$'\n', echoing the array displays it line by line ($'...' is bash syntax for string literals with escape sequences, so unlike '\n', $'\n' is the correct way to set IFS to a line break). IFS is not needed when using the printf method above.
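A quick way to see the difference between the two quoting forms (bash):

```shell
a='\n'      # two characters: a backslash and the letter n
b=$'\n'     # one character: an actual newline
echo "${#a} ${#b}"
```

Setting IFS='\n' would therefore split on backslashes and on the letter n, which is almost never what you want.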
echo "${files[*]}" prints all array elements at once, joined using the IFS defined in the subshell.

While loop reading only last line?

I want to print all the data in the foo array line by line, but this loop
prints only the last line of the array variable. Please help.
foo=( $(grep name emp.txt) )
while read -r line ; do echo "$line"
done <<< ${foo[@]}
While David C. Rankin presented a working alternative, he chose to not explain why the original approach didn't work. See the Bash Reference Manual: Word Splitting:
The shell scans the results of parameter expansion, command
substitution, and arithmetic expansion that did not occur within
double quotes for word splitting.
So, you can make your approach work by using double quotes around the command substitution as well as the parameter expansion:
foo=("$(grep name emp.txt)")
while read -r line; do echo "$line"
done <<<"${foo[@]}"
Note that this assigns the whole grep output to the sole array element ${foo[0]}, i. e., we don't need an array at all and could use a simple variable foo just as well.
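In other words, the quoted version could use a plain variable. A sketch with invented sample data standing in for emp.txt:

```shell
# Stand-in for emp.txt (contents invented for the demo)
emp=$(mktemp)
printf '%s\n' 'name alice' 'age 30' 'name bob' > "$emp"

# A plain variable holds the whole grep output, just like ${foo[0]} did
foo=$(grep name "$emp")

while read -r line; do
  echo "$line"
done <<<"$foo"
rm -f "$emp"
```

The here-string feeds both matching lines through the loop one at a time, with no array involved.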
If you do want to read the grep output lines into an array with one line per element, then there's the Bash Builtin Command readarray:
< <(grep name emp.txt) readarray foo
This uses the expansion Process Substitution.
I want to replace some text; can I use a sed command on echo "$line"?
Of course you can use echo "$line" | sed ….

Creating an array from lines of output in bash

I am trying to make an array/list from the output of a bash command and then loop over it with a for loop. I keep getting Syntax error: "(" unexpected (expecting "done"). To put it in Python terms, I want to split the string on \n and then loop over it.
IFS=$'\n'
DELETE = ($($MEGACOPY --dryrun --reload --download --local $LOCALDIR --remote $REMOTEDIR | sed 's|F '$LOCALDIR'|'$REMOTEDIR'|g'))
unset IFS
# And remove it
for i in $DELETE; do
$MEGARM $i
done
First, shell is not python. Spaces around equal signs don't work:
DELETE = ($($MEGACOPY --dryrun --reload --download --local $LOCALDIR --remote $REMOTEDIR | sed 's|F '$LOCALDIR'|'$REMOTEDIR'|g'))
When the shell sees the above, it interprets DELETE as a program name and = as its first argument. The error that you see is because the shell was unable to parse the second argument.
Replace the above with:
DELETE=($("$MEGACOPY" --dryrun --reload --download --local "$LOCALDIR" --remote "$REMOTEDIR" | sed 's|F '"$LOCALDIR"'|'"$REMOTEDIR"'|g'))
Second, regarding the for loop, DELETE is an array and arrays have special syntax:
for i in "${DELETE[@]}"; do
"$MEGARM" "$i"
done
Notes:
Unless you want word splitting and pathname expansion, all shell variables should be inside double-quotes.
It is best practice to use lower or mixed case for variable names. The system uses all-uppercase variable names for its own purposes, and you don't want to accidentally overwrite one of them.

How to create an array from the lines of a command's output

I have a file called failedfiles.txt with the following content:
failed1
failed2
failed3
I need to use grep to return the content on each line in that file, and save the output in a list to be accessed. So I want something like this:
temp_list=$(grep "[a-z]" failedfiles.txt)
However, the problem with this is that when I type
echo ${temp_list[0]}
I get the following output:
failed1 failed2 failed3
But what I want is when I do:
echo ${temp_list[0]}
to print
failed1
and when I do:
echo ${temp_list[1]}
to print
failed2
Thanks.
@devnull's helpful answer explains why your code didn't work as expected: command substitution always returns a single string (possibly composed of multiple lines).
However, simply putting (...) around a command substitution to create an array of lines will only work as expected if the lines output by the command do not have embedded spaces - otherwise, each individual (whitespace-separated) word will become its own array element.
Capturing command output lines at once, in an array:
To capture the lines output by an arbitrary command in an array, use the following:
bash < 4 (e.g., the stock bash on OS X, as of 10.9.2): use read -a
IFS=$'\n' read -rd '' -a linesArray <<<"$(grep "[a-z]" failedfiles.txt)"
bash >= 4: use readarray:
readarray -t linesArray <<<"$(grep "[a-z]" failedfiles.txt)"
Note:
<<< initiates a so-called here-string, which pipes the string to its right (which happens to be the result of a command substitution here) into the command on the left via stdin.
While command <<< string is functionally equivalent to echo string | command in principle, the crucial difference is that the latter creates subshells, which make variable assignments in command pointless - they are localized to each subshell.
An alternative to combining here-strings with command substitution is [input] process substitution - <(...) - which, simply put, allows using a command's output as if it were an input file; the equivalent of <<<"$(command)" is < <(command).
read: -a reads into an array, and IFS=$'\n' ensures that every line is considered a separate field and thus read into its own array element; -d '' ensures that ALL lines are read at once (before breaking them into fields); -r turns off interpretation of escape sequences in the input.
readarray (also callable as mapfile) directly breaks input lines into an array of lines; -t ensures that the terminating \n is NOT included in the array elements.
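The subshell point mentioned above can be demonstrated directly (bash):

```shell
# Pipeline: the while loop runs in a subshell, so the assignment is lost
n=0
printf 'a\nb\nc\n' | while read -r line; do n=$((n + 1)); done
echo "after pipeline: n=$n"    # n is still 0 here

# Process substitution: the loop runs in the current shell and n survives
m=0
while read -r line; do m=$((m + 1)); done < <(printf 'a\nb\nc\n')
echo "after process substitution: m=$m"
```

This is exactly why the here-string and process-substitution forms are preferred when the loop needs to set variables that outlive it.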
Looping over command output lines:
If there is no need to capture all lines in an array at once and looping over a command's output line by line is sufficient, use the following:
while IFS= read -r line; do
# ...
done < <(grep "[a-z]" failedfiles.txt)
IFS= ensures that each line is read unmodified in terms of whitespace; remove it to have leading and trailing whitespace trimmed.
-r ensures that the lines are read 'raw' in that substrings in the input that look like escape sequences - e.g., \t - are NOT interpreted as such.
Note the use of [input] process substitution (explained above) to provide the command output as input to the read loop.
You did not create an array. What you did was command substitution, which simply puts the output of a command into a variable.
In order to create an array, say:
temp_list=( $(grep "[a-z]" failedfiles.txt) )
You might also want to refer to Guide on Arrays.
The proper and portable way to loop over lines in a file is simply
while read -r line; do
... something with "$line"
done <failedfiles.txt

bash - variable storing multiple lines of file

This is my code:
grep $to_check $forbidden >${dir}variants_of_interest;
cat ${dir}variants_of_interest | (while read line; do
#process ${line} and echo result
done;
)
Thanks to grep, I get lines of data that I then process separately in a loop. I would like to use a variable instead of the file variants_of_interest.
Reason for this is that I am afraid that writing to file thousands of time (and consequently reading from it) rapidly slows down computation, so I am hoping that avoiding writing to file could help. What do you think?
I have to do thousands of grep commands and variants_of_interest contains up to 10 lines only.
Thanks for your suggestions.
You can just make grep pass its output directly to the loop:
grep "$to_check" "$forbidden" | while read line; do
#process "${line}" and echo result
done
I removed the explicit subshell from your example, since the loop already runs in a separate one due to the piping. Also don't forget to quote the $line variable to prevent word splitting when it is used.
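To see what that quoting protects against, a tiny demo:

```shell
line='a   b'               # three spaces between a and b
unquoted=$(echo $line)     # word splitting collapses the run of spaces
quoted=$(echo "$line")     # quoting preserves the line exactly
echo "[$unquoted] [$quoted]"
```

The unquoted form also performs pathname expansion, so a line containing * could silently turn into a list of file names.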
You don't have to write to a file. Simply iterate over the result of grep:
grep $to_check $forbidden | (while read line; do
#process ${line} and echo result
done;
)
This might work for you:
OIFS="$IFS"; IFS=$'\n'; lines=($(grep $to_check $forbidden)); IFS="$OIFS"
for line in "${lines[@]}"; do echo $(process ${line}); done
The first line places the results of the grep into the array lines.
The second line iterates over the array, placing each element into the variable line and processing it.
