Bash arrays: appending and prepending to each element in array

I'm trying to build a long command involving find. I have an array of directories that I want to ignore, and I want to format this array into the command.
Basically, I want to transform this array:
declare -a ignore=(archive crl cfg)
into this:
-o -path "$dir/archive" -prune -o -path "$dir/crl" -prune -o -path "$dir/cfg" -prune
This way, I can simply add directories to the array, and the find command will adjust accordingly.
So far, I figured out how to prepend or append using
${ignore[@]/#/-o -path \"\$dir/}
${ignore[@]/%/\" -prune}
But I don't know how to combine these and simultaneously prepend and append to each element of an array.

You cannot do it simultaneously easily. Fortunately, you do not need to:
ignore=( archive crl cfg )
ignore=( "${ignore[@]/%/\" -prune}" )
ignore=( "${ignore[@]/#/-o -path \"\$dir/}" )
echo ${ignore[@]}
Note the parentheses and double quotes - they make sure the array contains three elements after each substitution, even if there are spaces involved.
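As a quick check of the claim about element count, here is a minimal sketch. The array name `items` and the strings `pre-` and `-post` are placeholders for illustration, not part of the question. It applies the two substitutions in sequence and shows that an element containing a space stays a single element.

```bash
#!/usr/bin/env bash
# Hypothetical sample data; the second element contains a space on purpose.
items=( one "two words" three )

# Step 1: append to every element (% anchors the empty pattern at the end).
items=( "${items[@]/%/-post}" )
# Step 2: prepend to every element (# anchors the empty pattern at the start).
items=( "${items[@]/#/pre-}" )

printf '%s\n' "${items[@]}"
echo "element count: ${#items[@]}"   # element count: 3
```

Without the double quotes around each expansion, the second element would be split at its space and the array would grow to four elements.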

Have a look at printf, which does the job as well:
printf -- '-o -path "$dir/%s" -prune ' "${ignore[@]}"

In general, you should strive to always treat each variable in the quoted form (e.g. "${ignore[@]}") instead of trying to insert quotation marks yourself (just as you should use parameterized statements instead of escaping the input in SQL) because it's hard to be perfect by manual escaping; for example, suppose a variable contains a quotation mark.
In this regard, I would aim at crafting an array where each argument word for find becomes an element: ("-o" "-path" "$dir/archive" "-prune" "-o" "-path" "$dir/crl" "-prune" "-o" "-path" "$dir/cfg" "-prune") (a 12-element array).
Unfortunately, Bash doesn't seem to support a form of parameter expansion where each element expands to multiple words. (p{1,2,3}q expands to p1q p2q p3q, but with a=(1 2 3), p"${a[@]}"q expands to p1 2 3q.) So you need to resort to a loop:
declare -a args=()
for i in "${ignore[@]}"
do
    args+=(-o -path "$dir/$i" -prune) # I'm not sure if you want to have
                                      # $dir expanded at this point;
                                      # otherwise, just use "\$dir/$i".
done
find ... "${args[@]}" ...
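To make the elided find call concrete, here is a sketch that runs the loop against a throwaway directory tree. The tree layout, the `keep` directory and the file names are assumptions made up for the demonstration. The leading `-o` is dropped with `"${args[@]:1}"` because nothing precedes it in this particular command.

```bash
#!/usr/bin/env bash
# Hypothetical test tree: three directories to ignore and one to keep.
dir=$(mktemp -d)
mkdir -p "$dir/archive" "$dir/crl" "$dir/cfg" "$dir/keep"
touch "$dir/archive/a.txt" "$dir/keep/b.txt"

ignore=(archive crl cfg)

# One array element per argument word, as described above.
args=()
for i in "${ignore[@]}"; do
    args+=(-o -path "$dir/$i" -prune)
done

# Skip the first element (the leading -o) since no expression comes before it.
found=$(find "$dir" "${args[@]:1}" -o -type f -print)
printf '%s\n' "$found"

rm -rf "$dir"
```

Only the file under `keep` is printed, because the three listed directories are pruned before find descends into them.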

If I understand right,
declare -a ignore=(archive crl cfg)
a=$(echo ${ignore[@]} | xargs -n1 -I% echo -o -path '"$dir/%"' -prune)
echo $a
prints
-o -path "$dir/archive" -prune -o -path "$dir/crl" -prune -o -path "$dir/cfg" -prune
This works only with an xargs that has the following switches:
-I replstr
Execute utility for each input line, replacing one or more occurrences of replstr in up to replacements
(or 5 if no -R flag is specified) arguments to utility with the entire line of input. The resulting
arguments, after replacement is done, will not be allowed to grow beyond 255 bytes; this is implemented
by concatenating as much of the argument containing replstr as possible, to the constructed arguments to
utility, up to 255 bytes. The 255 byte limit does not apply to arguments to utility which do not contain
replstr, and furthermore, no replacement will be done on utility itself. Implies -x.
-J replstr
If this option is specified, xargs will use the data read from standard input to replace the first
occurrence of replstr instead of appending that data after all other arguments. This option will not affect
how many arguments will be read from input (-n), or the size of the command(s) xargs will generate (-s).
The option just moves where those arguments will be placed in the command(s) that are executed. The
replstr must show up as a distinct argument to xargs. It will not be recognized if, for instance, it is
in the middle of a quoted string. Furthermore, only the first occurrence of the replstr will be
replaced. For example, the following command will copy the list of files and directories which start
with an uppercase letter in the current directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -rp % destdir

Related

bash array count always returns 1

I searched all over for this, but the terms are apparently too general. I'm writing a script to search a group of folders for .mp3 files. Some folders don't have mp3's so they have to be excluded.
I created an array to hold the uniq'd folder names. This find command will get the folders I need.
Folders=$(sudo find /my/music/ -type f -name "*.mp3" | cut -d'/' -f7 | sort -u)
When I try to count the number of folders in the array, I always get 1:
echo ${#Folders[@]}
echo ${Folders[@]} prints them out on separate lines, so I thought they were separate array elements. Can anyone explain what is going on? You might have to jiggle the field number in the cut command to reproduce locally.

Folders is not an array but a variable.
You need:
Folders=( $(sudo find /my/music/ -type f -name "*.mp3" | cut -d'/' -f7 | sort -u) )
i.e. enclose the command substitution in (). Now ${#Folders[@]} gives you the number of elements of the array Folders.
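The difference between the two assignments can be seen in a small sketch. The genre names below are made-up sample output standing in for the find pipeline.

```bash
#!/usr/bin/env bash
# A plain variable: the whole multi-line output is one string.
listing=$(printf '%s\n' rock jazz blues)
echo "${#listing[@]}"    # prints 1

# An array: the outer parentheses split the output into elements.
folders=( $(printf '%s\n' rock jazz blues) )
echo "${#folders[@]}"    # prints 3
```

Note that this form splits on any whitespace, so a folder name containing a space would end up as two elements.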

Or do :
sudo find /my/music/ -type f -name "*.mp3" | cut -d'/' -f7 | sort -u | wc -l
Note
wc -l prints the number of lines, which in this case is the number of unique folder names
to make things a bit more explicit, use -printf "%p\n" option with find where %p specifier prints the file with full path.

Assuming bash 4 or later, don't use find here; use the globstar operator.
shopt -s globstar
folders=( /my/music/**/*.mp3 )
Also assuming that cut -d/ -f7 is supposed to extract the filename alone, follow this up with
folders=( "${folders[@]##*/}" )
Other methods for populating the array must take more care to accommodate files containing whitespace or characters like ?, *, or [. File names containing newlines (rare, but not illegal) are much more difficult to handle correctly. Pathname expansion is done inside the shell, so you don't need to worry about any such special characters.
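A sketch of the globstar approach on a temporary tree follows. The directory and file names are invented for the example and include spaces to show that no splitting occurs.

```bash
#!/usr/bin/env bash
shopt -s globstar nullglob

# Hypothetical music tree with spaces in a directory and a file name.
root=$(mktemp -d)
mkdir -p "$root/a/b" "$root/c d"
touch "$root/a/b/one.mp3" "$root/c d/two words.mp3" "$root/a/skip.txt"

# Each matching path becomes exactly one array element.
files=( "$root"/**/*.mp3 )

# Strip everything up to the last slash from every element.
names=( "${files[@]##*/}" )

printf '%s\n' "${names[@]}"
echo "count: ${#names[@]}"   # count: 2

rm -rf "$root"
```

The parentheses and quotes around `"${files[@]##*/}"` matter: without them the result would be a single string assigned to element 0.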

Read filenames with embedded whitespace into an array in a shell script

Basically I'm searching for a multi-word file which is present in many directories using find command and the output is stored on to a variable vari
vari = `find -name "multi word file.xml"`
When I try to delete the files by iterating over the result with a for loop,
for file in ${vari[@]}
the execution fails with:
rm: cannot remove `/abc/xyz/multi':: No such file or directory
Could you guys please help me with this scenario??

If you really need to capture all file paths in an array up front (assumes bash, primarily due to use of arrays and process substitution (<(...))[1]; a POSIX-compliant solution would be more cumbersome[2]; also note that this is a line-based solution, so it won't handle filenames with embedded newlines correctly, but that's very rare in practice):
# Read matches into array `vari` - safely: no word splitting, no
# globbing. The only caveat is that filenames with *embedded* newlines
# won't be handled correctly, but that's rarely a concern.
# bash 4+:
readarray -t vari < <(find . -name "multi word file.xml")
# bash 3:
IFS=$'\n' read -r -d '' -a vari < <(find . -name "multi word file.xml")
# Invoke `rm` with all array elements:
rm "${vari[@]}" # !! The double quotes are crucial.
Otherwise, let find perform the deletion directly (these solutions also handle filenames with embedded newlines correctly):
find . -name "multi word file.xml" -delete
# If your `find` implementation doesn't support `-delete`:
find . -name "multi word file.xml" -exec rm {} +
As for what you tried:
vari=`find -name "multi word file.xml"` (I've removed the spaces around =, which would result in a syntax error) does not create an array; such a command substitution returns the stdout output from the enclosed command as a single string (with trailing newlines stripped).
By enclosing the command substitution in ( ... ), you could create an array:
vari=( `find -name "multi word file.xml"` ),
but that would perform word splitting on the find's output and not properly preserve filenames with spaces.
While this could be addressed with IFS=$'\n' so as to only split at line boundaries, the resulting tokens are still subject to pathname expansion (globbing), which can inadvertently alter the file paths.
While this could also be addressed with a shell option, you now have 2 settings you need to perform ahead of time and restore to their original value; thus, using readarray or read as demonstrated above is the simpler choice.
Even if you did manage to collect the file paths correctly in $vari as an array, referencing that array as ${vari[@]} - without double quotes - would break, because the resulting strings are again subject to word splitting, and also pathname expansion (globbing).
To safely expand an array to its elements without any interpretation of its elements, double-quote it: "${vari[@]}"
[1]
Process substitution rather than a pipeline is used so as to ensure that readarray / read is executed in the current shell rather than in a subshell.
As eckes points out in a comment, if you were to try find ... | IFS=$'\n' read ... instead, read would run in a subshell, which means that the variables it creates will disappear (go out of scope) when the command returns and cannot be used later.
[2]
The POSIX shell spec. supports neither arrays nor process substitution (nor readarray, nor any read options other than -r); you'd have to implement line-by-line processing as follows:
while IFS='
' read -r vari; do
printf '%s\n' "$vari" # process each path here
done <<EOF
$(find . -name "multi word file.xml")
EOF
Note the required actual newline between IFS=' and ' in order to assign a newline, given that the $'\n' syntax is not available.
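The array-based approach from this answer can be tried safely on a scratch directory. The sketch below assumes bash 4 or later for readarray; the directory names are made up and one of them contains a space.

```bash
#!/usr/bin/env bash
# Hypothetical scratch tree containing the multi-word file twice.
work=$(mktemp -d)
mkdir -p "$work/x" "$work/y z"
touch "$work/x/multi word file.xml" "$work/y z/multi word file.xml"

# Read one path per line, without word splitting or globbing.
readarray -t vari < <(find "$work" -name "multi word file.xml")
echo "matches: ${#vari[@]}"   # matches: 2

# The double quotes keep each path intact; -- guards against odd leading characters.
rm -- "${vari[@]}"

remaining=$(find "$work" -name "multi word file.xml" | wc -l)
echo "remaining: $((remaining))"   # remaining: 0

rm -rf "$work"
```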

Here are a few approaches:
# change the input field separator to a newline to ignore spaces
IFS=$'\n'
for file in $(find . -name '* *.xml'); do
ls "$file"
done
# pipe find result lines to a while loop
IFS=
find . -name '* *.xml' | while read -r file; do
ls "$file"
done
# feed the while loop with process substitution
IFS=
while read -r file; do
ls "$file"
done < <(find . -name '* *.xml')
When you're satisfied with the results, replace ls with rm.

The solutions are all line-based solutions. There is a test environment at bottom for which there is no known solution.
As already written, the file could be removed with this tested command:
$ find . -name "multi word file".xml -exec rm {} +
I did not manage to use rm command with a variable when the path or filename contains \n.
Test environment:
$ mkdir "$(printf "\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47testdir" "")"
$ touch "multi word file".xml
$ mv *xml *testdir/
$ touch "2nd multi word file".xml ; mv *xml *testdir
$ ls -b
\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\ !"#$%&'testdir
$ ls -b *testdir
2nd\ multi\ word\ file.xml multi\ word\ file.xml

How do I store the output from a find command in an array? + bash

I have the following find command with the following output:
$ find -name '*.jpg'
./public_html/github/screencasts-gh-pages/reactiveDataVis/presentation/images/telescope.jpg
./public_html/github/screencasts-gh-pages/introToBackbone/presentation/images/telescope.jpg
./public_html/github/StarCraft-master/img/Maps/(6)Thin Ice.jpg
./public_html/github/StarCraft-master/img/Maps/Snapshot.jpg
./public_html/github/StarCraft-master/img/Maps/Map_Grass.jpg
./public_html/github/StarCraft-master/img/Maps/(8)TheHunters.jpg
./public_html/github/StarCraft-master/img/Maps/(2)Volcanis.jpg
./public_html/github/StarCraft-master/img/Maps/(3)Trench wars.jpg
./public_html/github/StarCraft-master/img/Maps/(8)BigGameHunters.jpg
./public_html/github/StarCraft-master/img/Maps/(8)Turbo.jpg
./public_html/github/StarCraft-master/img/Maps/(4)Blood Bath.jpg
./public_html/github/StarCraft-master/img/Maps/(2)Switchback.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(6)Thin Ice.jpg
./public_html/github/StarCraft-master/img/Maps/Original/Map_Grass.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)TheHunters.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(2)Volcanis.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(3)Trench wars.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)BigGameHunters.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)Turbo.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(4)Blood Bath.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(2)Switchback.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(4)Orbital Relay.jpg
./public_html/github/StarCraft-master/img/Maps/(4)Orbital Relay.jpg
./public_html/github/StarCraft-master/img/Bg/GameLose.jpg
./public_html/github/StarCraft-master/img/Bg/GameWin.jpg
./public_html/github/StarCraft-master/img/Bg/GameStart.jpg
./public_html/github/StarCraft-master/img/Bg/GamePlay.jpg
./public_html/github/StarCraft-master/img/Demo/Demo.jpg
./public_html/github/flot/examples/image/hs-2004-27-a-large-web.jpg
./public_html/github/minicourse-ajax-project/other/GameLose.jpg
How do I store this output in an array? I want it to handle filenames with spaces
I have tried arrayname=($(find -name '*.jpg')), but this seems to store just the first element:
$ arrayname=($(find -name '*.jpg'))
$ echo "$arrayname"
./public_html/github/screencasts-gh-pages/reactiveDataVis/presentation/images/telescope.jpg
$
I have tried here but again this just stores the 1st element
Other similar Qs
How do I capture the output from the ls or find command to store all file names in an array?
How do i store the output of a bash command in a variable?

If you know with certainty that your filenames will not contain newlines, then
mapfile -t arrayname < <(find ...)
If you want to be able to handle any file
arrayname=()
while IFS= read -d '' -r filename; do
arrayname+=("$filename")
done < <(find ... -print0)
echo "$arrayname" will only show the first element of the array. It is equivalent to echo "${arrayname[0]}". To dump an array:
printf "%s\n" "${arrayname[@]}"
# ............^^^^^^^^^^^^^^^^^ must use exactly this form, with the quotes.
arrayname=($(find ...)) is still wrong. It will store the file ./file with spaces.txt as 3 separate elements in the array.
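The difference between the line-based and the NUL-delimited variant shows up as soon as a name contains a newline. The sketch below uses invented file names in a temporary directory and assumes a find that supports -print0.

```bash
#!/usr/bin/env bash
# Hypothetical files: one plain, one with a space, one with an embedded newline.
work=$(mktemp -d)
touch "$work/plain.jpg" "$work/with space.jpg" "$work/with"$'\n'"newline.jpg"

# Line-based: the newline inside the third name produces an extra element.
mapfile -t by_line < <(find "$work" -name '*.jpg')

# NUL-delimited: every file is exactly one element.
arrayname=()
while IFS= read -d '' -r filename; do
    arrayname+=("$filename")
done < <(find "$work" -name '*.jpg' -print0)

echo "line-based: ${#by_line[@]}"    # line-based: 4
echo "NUL-based:  ${#arrayname[@]}"  # NUL-based:  3

rm -rf "$work"
```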

If you have a sufficiently recent version of bash, you can save yourself a lot of trouble by just using a ** glob.
shopt -s globstar
files=(**/*.jpg)
The first line enables the feature. Once enabled, ** in a glob pattern will match any number (including 0) of directories in the path.
Using the glob in the array definition makes sure that whitespace is handled correctly.
To view an array in a form which could be used to define the array, use the -p (print) option to the declare builtin:
declare -p files

How can I store the "find" command results as an array in Bash

I am trying to save the result from find as arrays.
Here is my code:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect '2' as result of ${len}. However, it prints 1.
The reason is that it takes all result of find as one elements.
How can I fix this?
P.S
I found several solutions on StackOverFlow about a similar problem. However, they are a little bit different so I can't apply in my case. I need to store the results in a variable before the loop. Thanks again.

Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on Mac OS, which —last I checked— still used bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, new lines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0 which prints the file names separated with a null character. This would not be much of an inconvenience if bash's readarray/mapfile functions supported null-separated strings but they don't. Bash's read does and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and process substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
array=()
find . -name "${input}" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above the output of find is stored in a temporary file and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name ${input} -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
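One practical reason to prefer process substitution over a pipeline here is variable scope. The following sketch, with made-up counter names, contrasts the two under bash's default settings (lastpipe off).

```bash
#!/usr/bin/env bash
shopt -u lastpipe   # make sure the default pipeline behaviour is in effect

# Pipeline: the while loop runs in a subshell, so its changes are lost.
piped=0
printf '%s\n' a b c | while IFS= read -r line; do piped=$((piped + 1)); done

# Process substitution: the loop runs in the current shell.
direct=0
while IFS= read -r line; do direct=$((direct + 1)); done < <(printf '%s\n' a b c)

echo "after pipeline: $piped"                # after pipeline: 0
echo "after process substitution: $direct"   # after process substitution: 3
```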
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
array=()
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (not the background). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
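A self-contained variant of the lastpipe approach, with printf standing in for find so that it runs anywhere, might look like this. The two sample names are invented; lastpipe needs bash 4.2 or later.

```bash
#!/usr/bin/env bash
set +m              # job control off (already the default in scripts)
shopt -s lastpipe   # run the last pipeline stage in the current shell

array=()
# printf emits two NUL-terminated names, as find -print0 would.
printf '%s\0' "first file" "second" | while IFS= read -r -d ''; do
    array+=("$REPLY")
done

declare -p array
echo "count: ${#array[@]}"   # count: 2
```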
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash now supports a -d option so that the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.

Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
-d
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
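As an illustration, the following sketch runs the readarray form against a temporary directory with awkward names. It assumes bash 4.4 or later and a find with -print0; the file names are invented, and -t is added here to drop the delimiter explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical files, including a space and an embedded newline.
work=$(mktemp -d)
touch "$work/a.txt" "$work/b c.txt" "$work/d"$'\n'"e.txt"

# An empty delimiter string makes readarray split on NUL bytes.
readarray -d '' -t array < <(find "$work" -name '*.txt' -print0)

echo "count: ${#array[@]}"   # count: 3

rm -rf "$work"
```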

The following appears to work for both Bash and Z Shell on macOS.
#! /bin/sh
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[@]}"

If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, the pattern (after parameter expansion) is treated literally, so with no matches you would have an array with a single string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
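The effect of nullglob can be checked in an empty directory. In this sketch the value of `input` is a made-up name chosen so that nothing matches.

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1   # start in an empty scratch directory
input="no-such-name"          # hypothetical pattern with no matches

shopt -s globstar

# Without nullglob the unmatched pattern is kept as a literal string.
shopt -u nullglob
without=( **/*"$input"* )

# With nullglob the unmatched pattern expands to nothing.
shopt -s nullglob
with=( **/*"$input"* )

echo "without nullglob: ${#without[@]}"   # without nullglob: 1
echo "with nullglob:    ${#with[@]}"      # with nullglob:    0
```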

you can try something like
array=(`find . -type f | sort -r | head -2`) , and in order to print the array values , you can try something like echo "${array[*]}"

None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while read line; do
[[ ! -z "$line" ]] && array+=("$line")
done <<< "$(find . -name "${input}" -print)"
len=${#array[@]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done

You could do like this:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name '*'${input}'*'`)
for i in "${array[@]}"
do :
echo $i
done

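In bash, $(<any_shell_cmd>) runs a command and captures its output. Feeding that output to read -a, with IFS set to a newline and -d '' so that read consumes every line rather than just the first, converts it to an array. (read returns a non-zero status here because it reaches the end of input without finding a NUL; the array is still filled.)
IFS=$'\n' read -r -d '' -a txt_files <<< "$(find /path/to/dir -name "*.txt")"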

script for getting extensions of a file

I need to get all the file extension types in a folder. For instance, if the directory's ls gives the following:
a.t
b.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
I should get this by running the script
.t
.t.pg
.bin
.old
.txt
I have a bash shell.
Thanks a lot!

See the BashFAQ entry on ParsingLS for a description of why many of these answers are evil.
The following approach avoids this pitfall (and, by the way, completely ignores files with no extension):
shopt -s nullglob
for f in *.*; do
printf '%s\n' ".${f#*.}"
done | sort -u
Among the advantages:
Correctness: ls behaves inconsistently and can result in inappropriate results. See the link at the top.
Efficiency: Minimizes the number of subprocess invoked (only one, sort -u, and that could be removed also if we wanted to use Bash 4's associative arrays to store results)
Things that still could be improved:
Correctness: this will correctly discard newlines in filenames before the first . (which some other answers won't) -- but filenames with newlines after the first . will be treated as separate entries by sort. This could be fixed by using nulls as the delimiter, or by the aforementioned bash 4 associative-array storage approach.
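The associative-array variant mentioned above could be sketched as follows, assuming bash 4 or later. The names `seen` and `exts` are made up, and the script creates the sample files from the question in a temporary directory so it can be run as is.

```bash
#!/usr/bin/env bash
shopt -s nullglob

# Recreate the sample directory from the question in a scratch location.
cd "$(mktemp -d)" || exit 1
touch a.t b.t.pg c.bin d.bin e.old f.txt g.txt

declare -A seen=()   # extensions already recorded
exts=()              # unique extensions, in first-seen order

for f in *.*; do
    ext=".${f#*.}"
    # Record each extension only the first time it appears.
    if [[ -z ${seen[$ext]+x} ]]; then
        seen[$ext]=1
        exts+=("$ext")
    fi
done

printf '%s\n' "${exts[@]}"
```

No external process is started, and because results never pass through a line-oriented tool, a newline inside an extension stays within a single array element.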

try this:
ls -1 | sed 's/^[^.]*\(\..*\)$/\1/' | sort -u
ls lists files in your folder, one file per line
sed magic extracts extensions
sort -u sorts extensions and removes duplicates
sed magic reads as:
s/ / /: substitutes whatever is between first and second / by whatever is between second and third /
^: match beginning of line
[^.]: match any character that is not a dot
*: match it as many times as possible
\( and \): remember whatever is matched between these two parentheses
\.: match a dot
.: match any character
*: match it as many times as possible
$: match end of line
\1: this is what has been matched between parentheses
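To try the expression without depending on ls, the sample names from the question can be fed in directly. This is only a sketch of the same sed command on fixed input; the array `names` is an assumption for the demonstration.

```bash
#!/usr/bin/env bash
# Sample file names from the question, supplied directly instead of via ls.
names=( a.t b.t.pg c.bin d.bin e.old f.txt g.txt )

# Strip the part before the first dot, then sort and de-duplicate.
extensions=$(printf '%s\n' "${names[@]}" | sed 's/^[^.]*\(\..*\)$/\1/' | sort -u)

printf '%s\n' "$extensions"
```

A name without any dot does not match the pattern, so sed passes it through unchanged and it appears in the output as is.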

People are really over-complicating this - particularly the regex:
ls | grep -o "\..*" | uniq
ls - get all the files
grep -o "\..*" - -o only show the match; "\..*" match at the first "." & everything after it
uniq - don't print duplicates but keep the same order
you can also sort if you like, but sorting doesn't match the example
This is what happens when you run it:
> ls -1
a.t
a.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
> ls | grep -o "\..*" | uniq
.t
.t.pg
.bin
.old
.txt
