Read filenames with embedded whitespace into an array in a shell script


Basically I'm searching for a multi-word file, which is present in many directories, using the find command, and the output is stored in a variable vari:
vari = `find -name "multi word file.xml"`
When I try to delete the file using a for loop to iterate through it:
for file in ${vari[@]}
the execution fails, saying:
rm: cannot remove `/abc/xyz/multi': No such file or directory
Could you please help me with this scenario?

If you really need to capture all file paths in an array up front (this assumes bash, primarily due to its use of arrays and process substitution (<(...))[1]; a POSIX-compliant solution would be more cumbersome[2]; also note that this is a line-based solution, so it won't handle filenames with embedded newlines correctly, but those are very rare in practice):
# Read matches into array `vari` - safely: no word splitting, no
# globbing. The only caveat is that filenames with *embedded* newlines
# won't be handled correctly, but that's rarely a concern.
# bash 4+:
readarray -t vari < <(find . -name "multi word file.xml")
# bash 3:
IFS=$'\n' read -r -d '' -a vari < <(find . -name "multi word file.xml")
# Invoke `rm` with all array elements:
rm "${vari[@]}" # !! The double quotes are crucial.
Otherwise, let find perform the deletion directly (these solutions also handle filenames with embedded newlines correctly):
find . -name "multi word file.xml" -delete
# If your `find` implementation doesn't support `-delete`:
find . -name "multi word file.xml" -exec rm {} +
As for what you tried:
vari=`find -name "multi word file.xml"` (I've removed the spaces around =, which would result in a syntax error) does not create an array; such a command substitution returns the stdout output from the enclosed command as a single string (with trailing newlines stripped).
By enclosing the command substitution in ( ... ), you could create an array:
vari=( `find -name "multi word file.xml"` ),
but that would perform word splitting on find's output and not properly preserve filenames with spaces.
While this could be addressed with IFS=$'\n' so as to only split at line boundaries, the resulting tokens are still subject to pathname expansion (globbing), which can inadvertently alter the file paths.
While this could also be addressed with a shell option (set -f), you now have two settings that you need to change ahead of time and restore to their original values afterwards; thus, using readarray or read as demonstrated above is the simpler choice.
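As a sketch of that two-setting approach (assuming bash; the scratch directory and file names are invented for the demo), note how much save/restore bookkeeping it needs compared to a single readarray call:

```bash
#!/usr/bin/env bash
# Sketch: split find's output only at newlines (IFS) with globbing disabled
# (set -f), then restore both settings. Assumes bash, and that IFS was set
# (not unset) beforehand.
tmp=$(mktemp -d)                               # scratch directory for the demo
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/multi word file.xml" "$tmp/b/multi word file.xml"

old_ifs=$IFS                                   # setting 1: save IFS
case $- in *f*) noglob_was_on=1 ;; *) noglob_was_on=0 ;; esac  # setting 2: save -f state
IFS=$'\n'                                      # split only at newlines
set -f                                         # no pathname expansion of the words
vari=( $(find "$tmp" -name "multi word file.xml") )
IFS=$old_ifs                                   # restore IFS
(( noglob_was_on )) || set +f                  # re-enable globbing if we disabled it

printf '%s\n' "${vari[@]}"                     # two paths, spaces intact
rm -rf -- "$tmp"                               # clean up the scratch directory
```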
Even if you did manage to collect the file paths correctly in $vari as an array, referencing that array as ${vari[@]} (without double quotes) would break, because the resulting strings are again subject to word splitting, and also pathname expansion (globbing).
To safely expand an array to its elements without any interpretation of its elements, double-quote it: "${vari[@]}"
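A small check of the quoting rule, assuming bash; count_args is a hypothetical helper that only reports how many arguments it received:

```bash
#!/usr/bin/env bash
# count_args (hypothetical helper) prints the number of arguments passed to it.
count_args() { echo "$#"; }

vari=( "./abc/xyz/multi word file.xml" )   # one element containing two spaces

unquoted=$(count_args ${vari[@]})          # word splitting turns it into 3 arguments
quoted=$(count_args "${vari[@]}")          # exactly one argument per element
echo "unquoted: $unquoted, quoted: $quoted"
```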
[1]
Process substitution rather than a pipeline is used so as to ensure that readarray / read is executed in the current shell rather than in a subshell.
As eckes points out in a comment, if you were to try find ... | IFS=$'\n' read ... instead, read would run in a subshell, which means that the variables it creates will disappear (go out of scope) when the command returns and cannot be used later.
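The difference can be seen in a short sketch, assuming bash with the lastpipe option off (the default in scripts); printf stands in for find's output:

```bash
#!/usr/bin/env bash
# Pipeline: the while loop runs in a subshell, so its array changes are lost.
piped=()
printf '%s\n' "a b" "c d" | while IFS= read -r line; do piped+=("$line"); done
echo "after pipeline: ${#piped[@]} elements"

# Process substitution: the loop runs in the current shell, so changes persist.
substituted=()
while IFS= read -r line; do substituted+=("$line"); done < <(printf '%s\n' "a b" "c d")
echo "after process substitution: ${#substituted[@]} elements"
```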
[2]
The POSIX shell spec supports neither arrays nor process substitution (nor readarray, nor any read options other than -r); you'd have to implement line-by-line processing as follows:
while IFS='
' read -r vari; do
printf '%s\n' "$vari" # process each path here, e.g.: rm -- "$vari"
done <<EOF
$(find . -name "multi word file.xml")
EOF
Note the required actual newline between IFS=' and ', which assigns a newline, given that the $'\n' syntax is not available.
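A different POSIX approach, not the one this footnote describes, is to let find hand the paths to an inline sh script as arguments, so no delimiter is involved and embedded newlines survive. This is a sketch: the scratch directory is invented, and mktemp is assumed to be available even though older POSIX editions don't require it:

```sh
#!/bin/sh
# Scratch directory whose name contains a literal newline.
tmp=$(mktemp -d)
dir="$tmp/dir
with newline"
mkdir "$dir"
touch "$dir/multi word file.xml"

# find passes whole paths as arguments ($@) to the inline script; nothing is
# split on spaces or newlines. The "sh" after the script becomes $0.
find "$tmp" -name "multi word file.xml" -exec sh -c '
  for f in "$@"; do
    printf "removing: %s\n" "$f"
    rm -- "$f"
  done
' sh {} +
```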

Here are a few approaches:
# change the input field separator to a newline to ignore spaces
# (the resulting words are still subject to globbing)
IFS=$'\n'
for file in $(find . -name '* *.xml'); do
ls "$file"
done
unset IFS # restore default word splitting
# pipe find result lines to a while loop (the loop runs in a subshell)
find . -name '* *.xml' | while IFS= read -r file; do
ls "$file"
done
# feed the while loop with process substitution
while IFS= read -r file; do
ls "$file"
done < <(find . -name '* *.xml')
When you're satisfied with the results, replace ls with rm.
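Before swapping ls for rm, one cautious option (a sketch, assuming bash and made-up file names) is a dry run that prints each command with %q quoting, so embedded spaces are visible:

```bash
#!/usr/bin/env bash
# Scratch files: two match '* *.xml', one does not.
tmp=$(mktemp -d)
touch "$tmp/a b.xml" "$tmp/c d.xml" "$tmp/plain.xml"

targets=()
while IFS= read -r file; do
  targets+=("$file")
done < <(find "$tmp" -name '* *.xml')

printf 'rm -- %q\n' "${targets[@]}"   # inspect this output first
# rm -- "${targets[@]}"               # then run the real command
rm -rf -- "$tmp"                      # clean up the scratch directory
```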

The solutions are all line-based. The test environment at the bottom contains a directory name with an embedded newline, for which I know of no line-based solution.
As already written, the file can be removed with this tested command:
$ find . -name "multi word file".xml -exec rm {} +
I did not manage to use the rm command with a variable when the path or filename contains \n.
Test environment:
$ mkdir "$(printf "\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47testdir" "")"
$ touch "multi word file".xml
$ mv *xml *testdir/
$ touch "2nd multi word file".xml ; mv *xml *testdir
$ ls -b
\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\ !"#$%&'testdir
$ ls -b *testdir
2nd\ multi\ word\ file.xml multi\ word\ file.xml
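For comparison, the NUL-delimited reading used in other answers on this page does cope with a directory name containing a newline. A sketch, assuming bash and a find that supports -print0 (GNU and BSD find both do); the directory name is a simplified stand-in for the test environment above:

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
dir="$tmp"/$'line1\nline2 testdir'        # directory name with an embedded newline
mkdir "$dir"
touch "$dir/multi word file.xml" "$dir/2nd multi word file.xml"

files=()
while IFS= read -r -d '' f; do             # read up to each NUL, not each newline
  files+=( "$f" )
done < <(find "$tmp" -name '*multi word file.xml' -print0)

printf 'removing %q\n' "${files[@]}"       # %q makes the newline visible as $'...'
rm -- "${files[@]}"
```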

Related

Assign output from find without word splitting

While running the bash command
myarray="(`find -type d -printf '%d\t%P\n' | cut -f2`)"
on my present working directory, and then output the contents of myarray,
tLen=${#myarray[@]}
for (( i=0; i<${tLen}; i++ ))
do
echo "${myarray[$i]}"
done
directory names with whitespace get split, i.e. the spaces in the directory name 'My tax documents' aren't automatically escaped, and the name ends up as three entries in the array, 'My' 'tax' 'documents', rather than just one. However, running
find -type d -printf '%d\t%P\n' | cut -f2
from the command line works just fine. How do I prevent word splitting when assigning the output of find into an array?

On Doing It Right
You can't safely use a newline as the trailing delimiter after an arbitrary filename: Filenames can contain newlines.
The below uses an unambiguous delimiter, and a read mechanism that works correctly with all possible filenames:
myarray=( )
while IFS= read -r -d $'\t' depth && IFS= read -r -d '' filename; do
printf 'Found filename %q at depth %d\n' "$filename" "$depth" >&2
myarray+=( "$filename" )
done < <(find . -type d -printf '%d\t%P\0')
# and to demonstrate reading from the array:
echo "Reiterating that list of filenames:" >&2
printf -- '- %q\n' "${myarray[@]}"
Note that we're calling read twice: once to read up to the first tab after the depth, and once to read up to the following NUL. One could get almost this effect with IFS=$'\t' read -r -d '' depth filename, but leading and trailing tabs in filenames could get lost.
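A quick way to exercise the loop, assuming bash and GNU find (for -printf), is to feed it a directory name containing a tab and a newline plus one with spaces. This sketch adds -mindepth 1 (not in the original) so the starting directory, whose %P is empty, is skipped:

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkdir "$tmp"/$'foo\tbar\nbaz\tqux' "$tmp/My tax documents"

myarray=( )
# First read: the depth, up to the tab that -printf emits after %d.
# Second read: the relative path, up to the NUL terminator.
while IFS= read -r -d $'\t' depth && IFS= read -r -d '' filename; do
  myarray+=( "$filename" )
done < <(find "$tmp" -mindepth 1 -type d -printf '%d\t%P\0')

printf -- '- %q\n' "${myarray[@]}"
```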
References:
Using Find
BashFAQ #1
On What Went Wrong
find -type d -printf '%d\t%P\n' | cut -f2 doesn't create a correct list of filenames in the first place. Try creating a directory with mkdir $'foo\tbar\nbaz\tqux' to have a particularly fun time here: the literal newline in the name will be emitted by the %P format specifier, causing baz to be in the position otherwise containing the depth integer, and qux to show up as part of what looks like a completely separate filename.
By default, IFS contains space, tab, and newline, so all three are used for word splitting.
The syntax
foo="(`...`)"
...does not actually create an array at all; it creates a string which starts with ( as its first character and ends with ).
Word splitting is followed by glob expansion, so if you have a directory named * (created with, e.g., mkdir '*'), that name would be replaced with a list of files in the current directory (thus causing other names to be represented twice).
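To see that the quoted-parentheses form yields a plain string, a short bash sketch:

```bash
#!/usr/bin/env bash
# The parentheses are inside double quotes, so they become literal characters.
foo="($(printf '%s\n' one two))"
declare -p foo        # prints a scalar declaration (declare --), not declare -a
echo "elements: ${#foo[@]}, first: ${foo:0:1}, last: ${foo: -1}"
```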

Append to an array variable from a pipeline command

I am writing a bash function to get all git repositories, but I have met a problem when I want to store all the git repository pathnames to the array patharray. Here is the code:
gitrepo() {
local opt
declare -a patharray
locate -b '\.git' | \
while read pathname
do
pathname="$(dirname ${pathname})"
if [[ "${pathname}" != *.* ]]; then
# Note: how to add an element to an existing Bash Array
patharray=("${patharray[@]}" '\n' "${pathname}")
# echo -e ${patharray[@]}
fi
done
echo -e ${patharray[@]}
}
I want to save all the repository paths to the patharray array, but I can't get it outside the pipeline, which consists of the locate and while commands.
I can get the array inside the pipeline: the commented-out command # echo -e ${patharray[@]} works well if uncommented. So how can I solve the problem?
I have also tried the export command, but it seems that it can't pass patharray to the pipeline.

Bash runs all commands of a pipeline in separate subshells. When the subshell containing the while loop ends, all changes you made to the patharray variable inside it are lost.
You can simply group the while loop and the echo statement together so they are both contained within the same subshell:
gitrepo() {
local pathname dir
local -a patharray
locate -b '\.git' | { # the grouping begins here
while IFS= read -r pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done
printf "%s\n" "${patharray[@]}" # all those quotes are needed
} # the grouping ends here
}
Alternatively, you can structure your code so it doesn't need a pipe, by using process substitution
(see also the Bash manual for details: man bash | less +/Process\ Substitution):
gitrepo() {
local pathname dir
local -a patharray
while IFS= read -r pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done < <(locate -b '\.git')
printf "%s\n" "${patharray[@]}" # all those quotes are needed
}

First of all, appending to an array variable is better done with array[${#array[*]}]="value" or array+=("value1" "value2" "etc") unless you wish to transform the entire array (which you don't).
Now, since pipeline commands are run in subprocesses, changes made to a variable inside a pipeline command will not propagate to outside it. There are a few options to get around this (most are listed in Greg's BashFAQ/024):
pass the result through stdout instead
the simplest; you'll need to do that anyway to get the value from the function (although there are ways to return a proper variable)
any special characters in paths can be handled reliably by using \0 as a separator (see Capturing output of find . -print0 into a bash array for reading \0-separated lists)
locate -b0 '\.git' | while read -r -d '' pathname; do dirname -z "$pathname"; done
or simply
locate -b0 '\.git' | xargs -0 dirname -z
avoid running the loop in a subprocess
avoid pipeline at all
temporary file/FIFO (bad: requires manual cleanup, accessible to others)
temporary variable (mediocre: unnecessary memory overhead)
process substitution (a special, syntax-supported case of FIFO, doesn't require manual cleanup; code adapted from Greg's BashFAQ/020):
i=0 # not `unset i`: using an unset i is an error if the `nounset` option is set
while IFS= read -r -d $'\0' file; do
patharray[i++]="$(dirname "$file")" # or however you want to process each file
done < <(locate -b0 '\.git')
use the lastpipe option (new in Bash 4.2), which runs the last command of a pipeline in the current shell rather than in a subprocess (mediocre: has a global effect, and only takes effect when job control is off)
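The "ways to return a proper variable" mentioned above include namerefs. A minimal sketch, assuming bash 4.3 or later (for local -n); collect_dirs is a hypothetical function name, and the printf input stands in for locate -b0 '\.git':

```bash
#!/usr/bin/env bash
# collect_dirs (hypothetical) fills the array whose name is passed as $1.
# Caveat: the caller's array must not itself be named _out.
collect_dirs() {
  local -n _out=$1              # _out is an alias for the caller's array
  _out=()
  local file
  while IFS= read -r -d '' file; do
    _out+=( "$(dirname -- "$file")" )
  done < <(printf '%s\0' '/srv/repo one/.git' '/home/me/proj/.git')
}

declare -a patharray
collect_dirs patharray          # no pipeline, no subshell, no stdout parsing
printf '%s\n' "${patharray[@]}"
```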

How do I store the output from a find command in an array? + bash

I have the following find command with the following output:
$ find -name '*.jpg'
./public_html/github/screencasts-gh-pages/reactiveDataVis/presentation/images/telescope.jpg
./public_html/github/screencasts-gh-pages/introToBackbone/presentation/images/telescope.jpg
./public_html/github/StarCraft-master/img/Maps/(6)Thin Ice.jpg
./public_html/github/StarCraft-master/img/Maps/Snapshot.jpg
./public_html/github/StarCraft-master/img/Maps/Map_Grass.jpg
./public_html/github/StarCraft-master/img/Maps/(8)TheHunters.jpg
./public_html/github/StarCraft-master/img/Maps/(2)Volcanis.jpg
./public_html/github/StarCraft-master/img/Maps/(3)Trench wars.jpg
./public_html/github/StarCraft-master/img/Maps/(8)BigGameHunters.jpg
./public_html/github/StarCraft-master/img/Maps/(8)Turbo.jpg
./public_html/github/StarCraft-master/img/Maps/(4)Blood Bath.jpg
./public_html/github/StarCraft-master/img/Maps/(2)Switchback.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(6)Thin Ice.jpg
./public_html/github/StarCraft-master/img/Maps/Original/Map_Grass.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)TheHunters.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(2)Volcanis.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(3)Trench wars.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)BigGameHunters.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)Turbo.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(4)Blood Bath.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(2)Switchback.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(4)Orbital Relay.jpg
./public_html/github/StarCraft-master/img/Maps/(4)Orbital Relay.jpg
./public_html/github/StarCraft-master/img/Bg/GameLose.jpg
./public_html/github/StarCraft-master/img/Bg/GameWin.jpg
./public_html/github/StarCraft-master/img/Bg/GameStart.jpg
./public_html/github/StarCraft-master/img/Bg/GamePlay.jpg
./public_html/github/StarCraft-master/img/Demo/Demo.jpg
./public_html/github/flot/examples/image/hs-2004-27-a-large-web.jpg
./public_html/github/minicourse-ajax-project/other/GameLose.jpg
How do I store this output in an array? I want it to handle filenames with spaces
I have tried arrayname=($(find -name '*.jpg')), but it seems to store just the first element. This is what I am doing:
$ arrayname=($(find -name '*.jpg'))
$ echo "$arrayname"
./public_html/github/screencasts-gh-pages/reactiveDataVis/presentation/images/telescope.jpg
$
I have also tried the approach here, but again it just stores the first element.
Other similar Qs
How do I capture the output from the ls or find command to store all file names in an array?
How do i store the output of a bash command in a variable?

If you know with certainty that your filenames will not contain newlines, then
mapfile -t arrayname < <(find ...)
If you want to be able to handle any file
arrayname=()
while IFS= read -d '' -r filename; do
arrayname+=("$filename")
done < <(find ... -print0)
echo "$arrayname" will only show the first element of the array. It is equivalent to echo "${arrayname[0]}". To dump an array:
printf "%s\n" "${arrayname[@]}"
# ............^^^^^^^^^^^^^^^^^ must use exactly this form, with the quotes.
arrayname=($(find ...)) is still wrong. It will store the file ./file with spaces.txt as 3 separate elements in the array.
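A small bash sketch of the difference between echoing the array name and dumping every element (the scratch files are invented; mapfile requires bash 4+):

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/(6)Thin Ice.jpg" "$tmp/Snapshot.jpg"

mapfile -t arrayname < <(find "$tmp" -name '*.jpg')   # one element per line
echo "$arrayname"                   # same as "${arrayname[0]}": first element only
printf '%s\n' "${arrayname[@]}"     # every element, one per line
echo "count: ${#arrayname[@]}"
rm -rf -- "$tmp"
```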

If you have a sufficiently recent version of bash (4.0 or later), you can save yourself a lot of trouble by just using a ** glob.
shopt -s globstar
files=(**/*.jpg)
The first line enables the feature. Once enabled, ** in a glob pattern will match any number (including 0) of directories in the path.
Using the glob in the array definition makes sure that whitespace is handled correctly.
To view an array in a form which could be used to define the array, use the -p (print) option to the declare builtin:
declare -p files

How can I store the "find" command results as an array in Bash

I am trying to save the result from find as arrays.
Here is my code:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect ${len} to be 2. However, it prints 1.
The reason is that it treats the whole result of find as one element.
How can I fix this?
P.S.
I found several solutions on Stack Overflow for a similar problem. However, they are a little bit different, so I can't apply them in my case. I need to store the results in a variable before the loop. Thanks again.

Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on macOS, which (last I checked) still ships bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, newlines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0, which prints the file names separated by a null character. This would not be much of an inconvenience if the readarray/mapfile builtins of these older bash versions supported null-separated strings, but they don't. Bash's read does, and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and process substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
array=()
find . -name "${input}" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above, the output of find is stored in a temporary file, and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name "${input}" -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
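As an illustration of the <(...) form only, a bash sketch that compares two command outputs with diff, which expects file names, without writing temporary files:

```bash
#!/usr/bin/env bash
# Each <(...) is replaced by a file name from which the command's output can be read.
if diff <(printf '%s\n' a b c) <(printf '%s\n' a b c) > /dev/null; then
  same=yes
else
  same=no
fi
echo "outputs identical: $same"

fdpath=$(echo <(true))      # the generated name itself, often under /dev/fd
echo "process substitution path: $fdpath"
```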
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
array=()
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (rather than in a subshell). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash's mapfile supports a -d option, so the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.

Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
-d
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
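A sketch checking that claim, assuming bash 4.4+ and a find with -print0; the file names are invented to include a blank, a newline, and a glob character:

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/with blank.txt" "$tmp"/$'with\nnewline.txt' "$tmp/star*.txt"

# -d '' makes readarray split on NUL, matching find's -print0 output.
readarray -d '' array < <(find "$tmp" -name '*.txt' -print0)
echo "found: ${#array[@]}"
printf '%q\n' "${array[@]}"
```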

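The following appears to work for both Bash and Z Shell on macOS, as long as no filename contains a newline (in bash, the resulting words are also still subject to globbing). It needs a shell with arrays, so run it with bash or zsh rather than a plain POSIX sh:
#!/usr/bin/env bash
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[@]}"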

If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, a pattern that matches nothing is left as-is (after parameter expansion), so with no matches you would have an array containing that single literal string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
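A bash 4+ sketch of the effect of each option; the directory layout and input=notes are made up for the demo:

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p "a/b c" .hidden
touch "a/b c/my notes.txt" ".hidden/secret notes.txt"
input=notes

shopt -s globstar nullglob
visible=( **/*"$input"* )          # hidden directory skipped
shopt -s dotglob
everything=( **/*"$input"* )       # now .hidden is traversed too
none=( **/*"no such name"* )       # nullglob: empty array, not a literal pattern
shopt -u globstar nullglob dotglob

declare -p visible everything none
```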

You can try something like
array=(`find . -type f | sort -r | head -2`)
and, to print the array values, something like echo "${array[*]}". Note that this splits any filename containing whitespace into several elements.

None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while IFS= read -r line; do
[[ ! -z "$line" ]] && array+=("$line")
done <<< "$(find . -name "${input}" -print)"
len=${#array[@]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo "${array[$i]}"
let i++
done

You could do like this:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name "*${input}*"`)
for i in "${array[@]}"
do
echo "$i"
done

In bash, $(<any_shell_cmd>) runs a command and captures its output. Setting IFS to a newline and reading with -d '' (read to the end of input rather than stopping at the first line) splits that output into an array with one element per line:
IFS=$'\n' read -r -d '' -a txt_files <<< "$(find /path/to/dir -name "*.txt")"

remove blank first line script

I have this script which is printing out the files that have the first line blank:
for f in `find . -regex ".*\.php"`; do
for t in head; do
$t -1 $f |egrep '^[ ]*$' >/dev/null && echo "blank line at the $t of $f";
done;
done
How can I improve this to actually remove the blank line too, or at least copy all the files with the blank first line somewhere else?
I tried copying using this, which is good because it copies while preserving the directory structure, but it was copying every php file; I needed to capture the positive output of egrep and only copy those files.
rsync -R $f ../DavidSiteBlankFirst/

I would use sed personally
find ./ -type f -regex '.*\.php' -exec sed -i -e '1{/^[[:blank:]]*$/d;}' '{}' \;
This finds all the regular files ending in .php and executes the sed command, which works on the first line only: it checks whether that line is blank and deletes it if it is; other blank lines in the file remain unaffected.

Just using find and sed:
find . -type f -name "*.php" -exec sed -i '1{/^\s*$/d;}' {} \;
The -type f option only finds files, not directories; not that I expect you would name folders with a .php suffix, but it's good practice. Using -regex '.*\.php' is overkill and messier than just globbing with -name "*.php". Use find's -exec instead of a shell loop; the sed script will operate on each matching file passed by find.
The sed script applies the operations inside {} to the first line only (address 1). It checks whether the line is blank (/^\s*$/, where \s is a GNU sed extension) and, if it matches, deletes it (d). Don't add q to stop early: with -i, sed writes its output back to the file, so quitting after the first line would truncate every file whose first line isn't blank. The -i option saves the change back to the file, since the default behaviour of sed is to print to stdout. If you want backup files, use -i~ instead; this will create a backup file~ for each file.
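A small check of the first-line deletion, assuming GNU sed (which accepts -i without a suffix and treats each file separately under -i, so address 1 is each file's own first line); the file contents are invented:

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '\n<?php\necho 1;\n' > "$tmp/blank_first.php"   # first line blank
printf '<?php\n\necho 2;\n' > "$tmp/normal.php"        # blank line only on line 2

# Delete line 1 only when it is empty or whitespace; no q, so files stay whole.
find "$tmp" -type f -name '*.php' -exec sed -i '1{/^[[:blank:]]*$/d;}' {} +

cat "$tmp/blank_first.php" "$tmp/normal.php"
```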

Develop Reference
