Append to an array variable from a pipeline command - arrays

I am writing a bash function to get all git repositories, but I have met a problem when I want to store all the git repository pathnames to the array patharray. Here is the code:
gitrepo() {
local opt
declare -a patharray
locate -b '\.git' | \
while read pathname
do
pathname="$(dirname ${pathname})"
if [[ "${pathname}" != *.* ]]; then
# Note: how to add an element to an existing Bash Array
patharray=("${patharray[#]}" '\n' "${pathname}")
# echo -e ${patharray[#]}
fi
done
echo -e ${patharray[#]}
}
I want to save all the repository paths to the patharray array, but I can't get it outside the pipeline which is comprised of locate and while command.
But I can get the array in the pipeline command, the commented command # echo -e ${patharray[#]} works well if uncommented, so how can I solve the problem?
And I have tried the export command, however it seems that it can't pass the patharray to the pipeline.

Bash runs all commands of a pipeline in separate SubShells. When a subshell containing a while loop ends, all changes you made to the patharray variable are lost.
You can simply group the while loop and the echo statement together so they are both contained within the same subshell:
gitrepo() {
local pathname dir
local -a patharray
locate -b '\.git' | { # the grouping begins here
while read pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done
printf "%s\n" "${patharray[#]}" # all those quotes are needed
} # the grouping ends here
}
Alternately, you can structure your code to not need a pipe: use ProcessSubstitution
( Also see the Bash manual for details - man bash | less +/Process\ Substitution):
gitrepo() {
local pathname dir
local -a patharray
while read pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done < <(locate -b '\.git')
printf "%s\n" "${patharray[#]}" # all those quotes are needed
}

First of all, appending to an array variable is better done with array[${#array[*]}]="value" or array+=("value1" "value2" "etc") unless you wish to transform the entire array (which you don't).
Now, since pipeline commands are run in subprocesses, changes made to a variable inside a pipeline command will not propagate to outside it. There are a few options to get around this (most are listed in Greg's BashFAQ/024):
pass the result through stdout instead
the simplest; you'll need to do that anyway to get the value from the function (although there are ways to return a proper variable)
any special characters in paths can be handled reliably by using \0 as a separator (see Capturing output of find . -print0 into a bash array for reading \0-separated lists)
locate -b0 '\.git' | while read -r -d '' pathname; do dirname -z "$pathname"; done
or simply
locate -b0 '\.git' | xargs -0 dirname -z
avoid running the loop in a subprocess
avoid pipeline at all
temporary file/FIFO (bad: requires manual cleanup, accessible to others)
temporary variable (mediocre: unnecessary memory overhead)
process substitution (a special, syntax-supported case of FIFO, doesn't require manual cleanup; code adapted from Greg's BashFAQ/020):
i=0 #`unset i` will error on `i' usage if the `nounset` option is set
while IFS= read -r -d $'\0' file; do
patharray[i++]="$(dirname "$file")" # or however you want to process each file
done < <(locate -b0 '\.git')
use the lastpipe option (new in Bash 4.2) - doesn't run the last command of a pipeline in a subprocess (mediocre: has global effect)

Related

Bash: how to print and run a cmd array which has the pipe operator, |, in it

This is a follow-up to my question here: How to write bash function to print and run command when the command has arguments with spaces or things to be expanded
Suppose I have this function to print and run a command stored in an array:
# Print and run the cmd stored in the passed-in array
print_and_run() {
echo "Running cmd: $*"
# run the command by calling all elements of the command array at once
"$#"
}
This works fine:
cmd_array=(ls -a /)
print_and_run "${cmd_array[#]}"
But this does NOT work:
cmd_array=(ls -a / | grep "home")
print_and_run "${cmd_array[#]}"
Error: syntax error near unexpected token `|':
eRCaGuy_hello_world/bash$ ./print_and_run.sh
./print_and_run.sh: line 55: syntax error near unexpected token `|'
./print_and_run.sh: line 55: `cmd_array=(ls -a / | grep "home")'
How can I get this concept to work with the pipe operator (|) in the command?
If you want to treat an array element containing only | as an instruction to generate a pipeline, you can do that. I don't recommend it -- it means you have security risk if you don't verify that variables into your string can't consist only of a single pipe character -- but it's possible.
Below, we create a random single-use "$pipe" sigil to make that attack harder. If you're unwilling to do that, change [[ $arg = "$pipe" ]] to [[ $arg = "|" ]].
# generate something random to make an attacker's job harder
pipe=$(uuidgen)
# use that randomly-generated sigil in place of | in our array
cmd_array=(
ls -a /
"$pipe" grep "home"
)
exec_array_pipe() {
local arg cmd_q
local -a cmd=( )
while (( $# )); do
arg=$1; shift
if [[ $arg = "$pipe" ]]; then
# log an eval-safe copy of what we're about to run
printf -v cmd_q '%q ' "${cmd[#]}"
echo "Starting pipeline component: $cmd_q" >&2
# Recurse into a new copy of ourselves as a child process
"${cmd[#]}" | exec_array_pipe "$#"
return
fi
cmd+=( "$arg" )
done
printf -v cmd_q '%q ' "${cmd[#]}"
echo "Starting pipeline component: $cmd_q" >&2
"${cmd[#]}"
}
exec_array_pipe "${cmd_array[#]}"
See this running in an online sandbox at https://ideone.com/IWOTfO
Do this instead. It works.
print_and_run() {
echo "Running cmd: $1"
eval "$1"
}
Example usage:
cmd='ls -a / | grep -C 9999 --color=always "home"'
print_and_run "$cmd"
Output:
Running cmd: ls -a / | grep -C 9999 --color=always "home"
(rest of output here, with the word "home" highlighted in red)
The general direction is that you don't. You do not store the whole command line to be printed later, and this is not the direction you should take.
The "bad" solution is to use eval.
The "good" solution is to store the literal '|' character inside the array (or some better representation of it) and parse the array, extract the pipe parts and execute them. This is presented by Charles in the other amazing answer. It is just rewriting the parser that already exists in the shell. It requires significant work, and expanding it will require significant work.
The end result is, is that you are reimplementing parts of shell inside shell. Basically writing a shell interpreter in shell. At this point, you can just consider taking Bash sources and implementing a new shopt -o print_the_command_before_executing option in the sources, which might just be simpler.
However, I believe the end goal is to give users a way to see what is being executed. I would propose to approach it like .gitlab-ci.yml does with script: statements. If you want to invent your own language with "debug" support, do just that instead of half-measures. Consider the following YAML file:
- ls -a / | grep "home"
- echo other commands
- for i in "stuff"; do
echo "$i";
done
- |
for i in "stuff"; do
echo "$i"
done
Then the following "runner":
import yaml
import shlex
import os
import sys
script = []
input = yaml.safe_load(open(sys.argv[1], "r"))
for line in input:
script += [
"echo + " + shlex.quote(line).replace("\n", "<newline>"), # some unicode like ␤ would look nice
line,
]
os.execvp("bash", ["bash", "-c", "\n".join(script)])
Executing the runner results in:
+ ls -a / | grep "home"
home
+ echo other commands
other commands
+ for i in "stuff"; do echo "$i"; done
stuff
+ for i in "stuff"; do<newline> echo "$i"<newline>done<newline>
stuff
This offers greater flexibility and is rather simple, supports any shell construct with ease. You can try gitlab-ci/cd on their repository and read the docs.
The YAML format is only an example of the input format. Using special comments like # --- cut --- between parts and extracting each part with the parser will allow running shellcheck over the script. Instead of generating a script with echo statements, you could run Bash interactively, print the part to be executed and then "feed" the part to be executed to interactive Bash. This will alow to preserve $?.
Either way - with a "good" solution, you end up with a custom parser.
Instead of passing an array, you can pass the whole function and use the output of declare -f with some custom parsing:
print_and_run() {
echo "+ $(
declare -f "$1" |
# Remove `f() {` and `}`. Remove indentation.
sed '1d;2d;$d;s/^ *//' |
# Replace newlines with <newline>.
sed -z 's/\n*$//;s/\n/<newline>/'
)"
"$#"
}
cmd() { ls -a / | grep "home"; }
print_and_run cmd
Results in:
+ ls --color -F -a / | grep "home"
home/
It will allow for supporting any shell construct and still allow you to check it with shellcheck and doesn't require that much work.

Creating an array from JSON data of file/directory locations; bash not registering if element is a directory

I have a JSON file which holds data like this: "path/to/git/directory/location": "path/to/local/location". A minimum example of the file might be this:
{
"${HOME}/dotfiles/.bashrc": "${HOME}/.bashrc",
"${HOME}/dotfiles/.atom/": "${HOME}/.atom/"
}
I have a script that systematically reads the above JSON (called locations.json) and creates an array, and then prints elements of the array that are directories. MWE:
#!/usr/bin/env bash
unset sysarray
declare -A sysarray
while IFS=: read -r field data
do
sysarray["${field}"]="${data}"
done <<< $(sed '/^[{}]$/d;s/\s*"/"/g;s/,$//' locations.json)
for file in "${sysarray[#]}"
do
if [ -d "${file}" ]
then
echo "${file}"
fi
done
However, this does not print the directory (i.e., ${HOME}/.atom).
I don't understand why this is happening, because
I have tried creating an array manually (i.e., not from a JSON) and checking if its elements are directories, and that works fine.
I have tried echoing each element in the array into a temporary file and reading each line in the file to see if it was a product of how the information was stored in the array, but no luck.
I have tried adding | tr -d "[:blank:]" | tr -d '\"' after using sed on the JSON (to see if it was a product of unintended whitespace or quotes), but no luck.
I have tried simply running [ -d "${HOME}/.atom/" ] && echo '.atom is a directory', and that works (so indeed it is a directory). I'm unsure what might be causing this.
Help on this would be great!
You could use a tool to process json files properly, which will deal with any valid json.
#!/usr/bin/env bash
unset sysarray
declare -A sysarray
while IFS="=" read -r field data
do
sysarray["${field}"]=$(eval echo "${data}")
done <<< $(jq -r 'keys[] as $k | "\($k)=\(.[$k])"' locations.json)
for file in "${sysarray[#]}"
do
if [ -d "${file}" ]
then
echo "${file}"
fi
done
Another problem is that, once the extra quote signs are properly processed, we have a literal ${HOME} that is not expanded. The only solution I came is using eval to force the expansion. It is not the nicest way, but right now I cannot find a better solution.

Saving egrep output containing '*' to bash array

I would like to save egrep output to a bash array:
arr=( $(egrep -Rn 'regex') )
If there so happens to be a '*' in the egrep result, it appears like bash is expanding the '*' to be all files in current directory. And the expansion plus results of egrep are then saved into arr.
How do I fix this? I want the '*' in the grep results to be unaltered.
To use the idiom attempted in the question "correctly" might look something like this:
# DON'T DO THIS.
set -f # turn off globbing
IFS=$'\n' # word-split only on newlines
arr=( $(...) ) # populate array
unset IFS # return IFS to defaults (assuming it was there before)
set +f # turn globbing back on
Obviously, there's a lot of room to get this wrong and leave your shell in a state other than the way it started (What if your script had a different initial IFS value? What if this code is sourced from a script that wants globbing to be disabled to work correctly?). Don't do it.
One approach, compatible with bash 3.x, is to use read -a (reading into an array) with IFS (used to separate fields) containing a newline, and -d (used to separate records) set to a NUL:
IFS=$'\n' read -r -d '' -a arr < <(egrep -Rn 'regex' && printf '\0')
The trailing NUL added to the input is present to ensure that read exits successfully; otherwise, this could trigger an abrupt exit if using set -e.
A longer but more explicit approach is to do the iteration yourself:
arr=( )
while IFS= read -r; do
arr+=( "$REPLY" )
done < <(egrep -Rn 'regex')
Another, using bash 4.x features (readarray, AKA mapfile):
readarray -t arr < <(egrep -Rn 'regex')

How can I store the "find" command results as an array in Bash

I am trying to save the result from find as arrays.
Here is my code:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect '2' as result of ${len}. However, it prints 1.
The reason is that it takes all result of find as one elements.
How can I fix this?
P.S
I found several solutions on StackOverFlow about a similar problem. However, they are a little bit different so I can't apply in my case. I need to store the results in a variable before the loop. Thanks again.
Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on Mac OS, which —last I checked— still used bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, new lines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0 which prints the file names separated with a null character. This would not be much of an inconvenience if bash's readarray/mapfile functions supported null-separated strings but they don't. Bash's read does and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and command substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
array=()
find . -name "${input}" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above the output of find is stored in a temporary file and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name ${input} -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
array=()
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (not the background). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash now supports a -d option so that the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.
Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
-d
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
The following appears to work for both Bash and Z Shell on macOS.
#! /bin/sh
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[#]}"
If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, the pattern (after parameter expansion) is treated literally, so with no matches you would have an array with a single string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
you can try something like
array=(`find . -type f | sort -r | head -2`) , and in order to print the array values , you can try something like echo "${array[*]}"
None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while read line; do
[[ ! -z "$line" ]] && array+=("$line")
done; <<< $(find . -name ${input} -print)
len=${#array[#]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
You could do like this:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name '*'${input}'*'`)
for i in "${array[#]}"
do :
echo $i
done
In bash, $(<any_shell_cmd>) helps to run a command and capture the output. Passing this to IFS with \n as delimiter helps to convert that to an array.
IFS='\n' read -r -a txt_files <<< $(find /path/to/dir -name "*.txt")

Problem with appending to bash arrays

I'm trying to create an alias that will get all "Modified" files and run the php syntax check on them...
function gitphpcheck () {
filearray=()
git diff --name-status | while read line; do
if [[ $line =~ ^M ]]
then
filename="`echo $line | awk '{ print $2 }'`"
echo "$filename" # correct output
filearray+=($filename)
fi
done
echo "--------------FILES"
echo ${filearray[#]}
# will do php check here, but echo of array is blank
}
As Wrikken says, the while body runs in a subshell, so all changes to the filearray array will disappear when the subshell ends. A couple of different solutions come to mind:
Process substitution (less readable but does not require a subshell)
while read line; do
:
done < <(git diff --name-status)
echo "${filearray[#]}"
Use the modified variable in the subshell using command grouping
git diff --name-status | {
while read line; do
:
done
echo "${filearray[#]}"
}
# filearray is empty here
You've piped | things to while, which is essentially another process, so the filearray variable is a different one (not the same scope).

Resources