I am trying to read a directory with "ls" and do operations on it
directory example:
$ ls -1
x x
y y
z z
script file:
files=(`ls -1`);
for ((i=0; i<"${#files[@]}"; i+=1 )); do
    echo "${files[$i]}"
done
however, the output is
yet if I define "files" in the following way
$ files=("x x" "y y" "z z")
$ for ((i=0; i<"${#files[@]}"; i+=1 )); do echo "${files[$i]}"; done
x x
y y
z z
How can I preserve the spaces in "files=(`ls -1`)"?

BashPitfalls #1
If at all possible, use a shell glob instead.
That is to say:
files=( * )
If you need to represent filenames as a stream of text, use NUL delimiters.
That is to say, either:
printf '%s\0' *
...or:
find . -mindepth 1 -maxdepth 1 -print0
will emit a NUL-delimited stream, which you can load into a shell array safely using (in bash 4.4 or later):
readarray -d '' array < <(find . -mindepth 1 -maxdepth 1 -print0)
...or, to support bash 3.x:
array=( )
while IFS= read -r -d '' name; do
array+=( "$name" )
done < <(find . -mindepth 1 -maxdepth 1 -print0)
In either of the above, the find command could be on the other side of a FIFO, network stream, or other remoting layer (assuming there's some complexity of that sort stopping you from using a native shell glob).
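To make the difference concrete, here is a minimal sketch, assuming bash and a scratch directory created with mktemp. The names workdir, from_ls and from_glob are made up for the illustration. It recreates the three example files and fills an array both ways:

```shell
#!/usr/bin/env bash
# Scratch directory so the demo does not touch real files (hypothetical setup).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch "x x" "y y" "z z"

# Unquoted command substitution is word-split, so every name breaks at its space.
from_ls=( $(ls -1) )

# A glob expands to exactly one array element per file.
from_glob=( * )

echo "ls:   ${#from_ls[@]} elements"
echo "glob: ${#from_glob[@]} elements"
printf '<%s>\n' "${from_glob[@]}"
```

The glob version never turns the names into a text stream, so there is nothing to split or re-parse.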

It seems the main conclusion is not to use ls. Back in the Pleistocene age of Unix programming, people used ls; these days, ls is best restricted to producing human-readable displays. A script that is robust against anything that can be thrown at it (newlines, whitespace, Chinese characters mixed with Hebrew and French, or whatever) is best achieved by some form of globbing (as recommended by others here; see BashPitfalls).
for file in ./*; do
    [ -e "${file}" ] || continue
    # do some task, for example, test if it is a directory.
    if [ -d "${file}" ]; then
        echo "${file}"
    fi
done
The ./ may not be absolutely necessary, but it helps if a file name begins with a "-", makes it clearer which file name contains a newline (or several), and likely guards against some other nasty buggers. This is also a useful template for specific files (e.g., ./*.pdf). For example, suppose the following files are somehow in your directory: "-t" and "<CR>t". Then (revealing other issues with ls when using nonstandard characters)
$ ls
-t ?t
$ for file in *; do ls "${file}"; done
-t ?t
$ for file in ./*; do ls "${file}"; done
$ for file in ./*; do echo "${file}"; done
A workaround with POSIX commands can be achieved by --
$ for file in *; do ls -- "${file}"; done # work around

Try this:
eval files=($(ls -Q))
Option -Q enables quoting of filenames.
Option -1 is implied (not needed), if the output is not a tty.


Bash: how to print and run a cmd array which has the pipe operator, |, in it

This is a follow-up to my question here: How to write bash function to print and run command when the command has arguments with spaces or things to be expanded
Suppose I have this function to print and run a command stored in an array:
# Print and run the cmd stored in the passed-in array
print_and_run() {
    echo "Running cmd: $*"
    # run the command by calling all elements of the command array at once
    "$@"
}
This works fine:
cmd_array=(ls -a /)
print_and_run "${cmd_array[@]}"
But this does NOT work:
cmd_array=(ls -a / | grep "home")
print_and_run "${cmd_array[@]}"
Error: syntax error near unexpected token `|':
eRCaGuy_hello_world/bash$ ./
./ line 55: syntax error near unexpected token `|'
./ line 55: `cmd_array=(ls -a / | grep "home")'
How can I get this concept to work with the pipe operator (|) in the command?
If you want to treat an array element containing only | as an instruction to generate a pipeline, you can do that. I don't recommend it -- it means you have a security risk if you don't verify that variables expanded into your command can't consist only of a single pipe character -- but it's possible.
Below, we create a random single-use "$pipe" sigil to make that attack harder. If you're unwilling to do that, change [[ $arg = "$pipe" ]] to [[ $arg = "|" ]].
# generate something random to make an attacker's job harder
# (assumes the uuidgen utility is available)
pipe=$(uuidgen)
# use that randomly-generated sigil in place of | in our array
cmd_array=(
    ls -a /
    "$pipe" grep "home"
)
exec_array_pipe() {
    local arg cmd_q
    local -a cmd=( )
    while (( $# )); do
        arg=$1; shift
        if [[ $arg = "$pipe" ]]; then
            # log an eval-safe copy of what we're about to run
            printf -v cmd_q '%q ' "${cmd[@]}"
            echo "Starting pipeline component: $cmd_q" >&2
            # Recurse into a new copy of ourselves as a child process
            "${cmd[@]}" | exec_array_pipe "$@"
            return
        fi
        cmd+=( "$arg" )
    done
    # no more sigils: log and run the final pipeline component
    printf -v cmd_q '%q ' "${cmd[@]}"
    echo "Starting pipeline component: $cmd_q" >&2
    "${cmd[@]}"
}
exec_array_pipe "${cmd_array[@]}"
Do this instead. It works.
print_and_run() {
    echo "Running cmd: $1"
    eval "$1"
}
Example usage:
cmd='ls -a / | grep -C 9999 --color=always "home"'
print_and_run "$cmd"
Running cmd: ls -a / | grep -C 9999 --color=always "home"
(rest of output here, with the word "home" highlighted in red)
The general direction is that you don't. You do not store the whole command line to be printed later, and this is not the direction you should take.
The "bad" solution is to use eval.
The "good" solution is to store the literal '|' character inside the array (or some better representation of it) and parse the array, extract the pipe parts and execute them. This is presented by Charles in the other amazing answer. It is just rewriting the parser that already exists in the shell. It requires significant work, and expanding it will require significant work.
The end result is that you are reimplementing parts of the shell inside the shell, basically writing a shell interpreter in shell. At this point, you can just consider taking the Bash sources and implementing a new shopt -o print_the_command_before_executing option in the sources, which might just be simpler.
However, I believe the end goal is to give users a way to see what is being executed. I would propose to approach it like .gitlab-ci.yml does with script: statements. If you want to invent your own language with "debug" support, do just that instead of half-measures. Consider the following YAML file:
- ls -a / | grep "home"
- echo other commands
- for i in "stuff"; do
    echo "$i";
  done
- |
  for i in "stuff"; do
    echo "$i"
  done
Then the following "runner":
import yaml
import shlex
import os
import sys

script = []
input = yaml.safe_load(open(sys.argv[1], "r"))
for line in input:
    # For each entry, first echo a quoted copy of it, then run it.
    script += [
        "echo + " + shlex.quote(line).replace("\n", "<newline>"),  # some unicode like ␤ would look nice
        line,
    ]
# Replace this process with a bash that runs the generated script.
os.execvp("bash", ["bash", "-c", "\n".join(script)])
Executing the runner results in:
+ ls -a / | grep "home"
+ echo other commands
other commands
+ for i in "stuff"; do echo "$i"; done
+ for i in "stuff"; do<newline> echo "$i"<newline>done<newline>
This offers greater flexibility and is rather simple, supports any shell construct with ease. You can try gitlab-ci/cd on their repository and read the docs.
The YAML format is only an example of the input format. Using special comments like # --- cut --- between parts and extracting each part with the parser will allow running shellcheck over the script. Instead of generating a script with echo statements, you could run Bash interactively, print the part to be executed and then "feed" the part to be executed to interactive Bash. This will allow $? to be preserved.
Either way - with a "good" solution, you end up with a custom parser.
Instead of passing an array, you can pass the whole function and use the output of declare -f with some custom parsing:
print_and_run() {
    echo "+ $(
        declare -f "$1" |
        # Remove `f() {` and `}`. Remove indentation.
        sed '1d;2d;$d;s/^ *//' |
        # Replace newlines with <newline>.
        sed -z 's/\n*$//;s/\n/<newline>/g'
    )"
    # Run the function that was passed in.
    "$@"
}
cmd() { ls -a / | grep "home"; }
print_and_run cmd
Results in:
+ ls --color -F -a / | grep "home"
It will allow for supporting any shell construct and still allow you to check it with shellcheck and doesn't require that much work.

Putting files in directory into array variable

I'm writing bash code that will search for specific files in the directory it is run in and add them into an array variable. The problem I am having is formatting the results. I need to find all the compressed files in the current directory and display both the names and sizes of the files in order of last modified. I want to take the results of that command and put them into an array variable with each line element containing the file's name and corresponding size but I don't know how to do that. I'm not sure if I should be using command "find" instead of "ls" but here is what I have so far:
find_files="$(ls -1st --block-size=MB)"
arr=( ($find_files) )
I'm not sure exactly what format you want the array to be in, but here is a snippet that creates an associative array keyed by filename with the size as the value:
$ ls -l test.{zip,bz2}
-rw-rw-r-- 1 user group 0 Sep 10 13:27 test.bz2
-rw-rw-r-- 1 user group 0 Sep 10 13:26
$ declare -A sizes; while read SIZE FILENAME ; do sizes["$FILENAME"]="$SIZE"; done < <(find * -prune -name '*.zip' -o -name *.bz2 | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[@]@A}"
declare -A sizes=(["''"]="0" ["'test.bz2'"]="0" )
And if you just want an array of literally "filename size" entries, that's even easier:
$ while read SIZE FILENAME ; do sizes+=("$FILENAME $SIZE"); done < <(find * -prune -name '*.zip' -o -name *.bz2 | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[@]@A}"
declare -a sizes=([0]="'' 0" [1]="'test.bz2' 0")
Both of these solutions work, and were tested via copy paste from this post.
The first is fairly slow. One problem is external program invocations within a loop - date for example, is invoked for every file. You could make it quicker by not including the date in the output array (see Notes below). Particularly for method 2 - that would result in no external command invocations inside the while loop. But method 1 is really the problem - orders of magnitude slower.
Also, somebody probably knows how to convert an epoch date to another format in awk for example, which could be faster. Maybe you could do the sort in awk too. Perhaps just keep the epoch date?
These solutions are bash / GNU heavy and not portable to other environments (bash here strings, find -printf). OP tagged linux and bash though, so GNU can be assumed.
Solution 1 - capture any compressed file - using file to match (slow)
The criteria for 'compressed' is if file output contains the word compress
Reliable enough, but perhaps there is a conflict with some other file type description?
file -l | grep compress (file 5.38, Ubuntu 20.04, WSL) indicates for me there are no conflicts at all (all files listed are compression formats)
I couldn't find a way of classifying any compressed file other than this
I ran this on a directory containing 1664 files - time (real) was 40 seconds
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file(1)
# output to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
# Make the array
# Process substitution (< <(...)) must be used, to keep array in the global environment
while IFS= read -r -d '' path; do
    [[ "$(file --brief "${path%% *}")" == *compress* ]] &&
        compressed_files[c++]="${path% *} $(date -d @${path##* })"
done < \
    <(
        find "$TARGET" -type f -printf '%p %s %T@\0' |
            awk '{$2 = ($2 / 1024); print}' |
            sort -n -k 3
    )
# Print results - to test
printf '%s\n' "${compressed_files[@]}"
Solution 2 - use file extensions - orders of magnitude faster
If you know exactly what extensions you are looking for, you can
compose them in a find command
This is a lot faster
On the same directory as above, containing 1664 files - time (real) was 200 milliseconds
This example looks for .gz, .zip, and .7z (gzip, zip and 7zip respectively)
I'm not sure if -type f -and -regex '.*[.]\(gz\|zip\|7z\)' -and -printf may be faster again, now I think of it. I started with globs because I assumed that was quicker
That may also allow for storing the extension list in a variable..
This method avoids a file analysis on every file in your target
It also makes the while loop shorter - you're only iterating matches
Note the repetition of -printf here, this is due to the logic that
find uses: -printf is 'True'. If it were included by itself, it would
act as a 'match' and print all files
It has to be used as a result of a name match being true (using -and)
Perhaps somebody has a better composition?
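One possible answer to that last question, offered as a sketch rather than a tested replacement for the script in this post: group the -name tests in escaped parentheses so that a single -printf applies to whichever one matched. It assumes GNU find (for -printf), GNU sort (for -z) and bash 4.4+ (for readarray -d ''). The scratch directory, the file names and the records array are made up for the demo, and the record layout (mtime, size, path) differs from the one used in the script so that paths containing spaces survive.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/a.gz" "$workdir/b b.zip" "$workdir/c.7z" "$workdir/notes.txt"

# \( ... \) groups the alternatives, so -printf runs once per matching file.
# Each record is "<mtime> <size> <path>" terminated by NUL.
readarray -d '' records < <(
    find "$workdir" -type f \
        \( -name '*.gz' -o -name '*.zip' -o -name '*.7z' \) \
        -printf '%T@ %s %p\0' |
    sort -z -n
)

compressed_files=()
for rec in "${records[@]}"; do
    rest=${rec#* }      # drop the mtime, which was only needed for sorting
    size=${rest%% *}    # size in bytes
    path=${rest#* }     # everything after the size, spaces intact
    compressed_files+=( "$path $size" )
done

printf '%s\n' "${compressed_files[@]}"
```

Putting the path last in each record means the two numeric fields can be peeled off the front without caring what characters the path contains.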
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
while IFS= read -r -d '' path; do
    compressed_files[c++]="${path% *} $(date -d @${path##* })"
done < \
    <(
        find "$TARGET" \
            -type f -and -name '*.gz' -and -printf '%p %s %T@\0' -or \
            -type f -and -name '*.zip' -and -printf '%p %s %T@\0' -or \
            -type f -and -name '*.7z' -and -printf '%p %s %T@\0' |
            awk '{$2 = ($2 / 1024); print}' |
            sort -n -k 3
    )
# Print results - for testing
printf '%s\n' "${compressed_files[@]}"
Sample output (of either method):
$ comp-find.bash /tmp
/tmp/comptest/websters_english_dictionary.tmp.tar.gz 265.148 Thu Sep 10 07:53:37 AEST 2020
/tmp/comptest/What_is_Systems_Architecture_PART_1.tar.gz 1357.06 Thu Sep 10 08:17:47 AEST 2020
You can add a literal K to indicate the block size / units (kilobytes)
If you want to print the path only from this array, you can use suffix removal: printf '%s\n' "${compressed_files[@]%% *}"
For no date in the array (it's used to sort, but then its job may be done), simply remove $(date -d @${path##* }) (incl. the space).
Kind of tangential, but to use different date formats, replace $(date -d @${path##* }) with:
$(date -I -d @${path##* }) ISO format - note that short opts style: date -Id @[date] did not work for me
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S) like ISO, but w/ seconds
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S-%N) same again, but w/ nanoseconds (find gives you nanoseconds)
Sorry for the long post, hopefully it's informative.

Script to group numbered files into folders

I have around a million files in one folder in the form xxxx_description.jpg where xxxx is a number ranging from 100 to an unknown upper limit.
The list is similar to this:
To get the number of files in that folder down, I'd like to put them all into folders grouped by the number at the start.
I was thinking to try and use command line: find | awk {} | mv command or maybe write a script, but I'm not sure how to do this most efficiently.
If you really are dealing with millions of files, I suspect that a glob (*.jpg or [0-9]*_*.jpg) may fail because it makes a command line that's too long for the shell. If that's the case, you can still use find. Something like this might work:
find /path -name "[0-9]*_*.jpg" -exec sh -c 'f="{}"; mkdir -p "/target/${f%_*}"; mv "$f" "/target/${f%_*}/"' \;
Broken out for easier reading, this is what we're doing:
find /path - run find, with /path as a starting point,
-name "[0-9]*_*.jpg" - match files that match this filespec in all directories,
-exec sh -c execute the following on each file...
'f="{}"; - put the filename into a variable...
mkdir -p "/target/${f%_*}"; - make a target directory based on that variable (read mkdir's man page about the -p option)
mv "$f" "/target/${f%_*}/"' - move the file into the directory.
\; - end the -exec expression
On the up side, it can handle any number of files that find can handle (i.e. limited only by your OS). On the down side, it's launching a separate shell for each file to be handled.
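A variation worth considering, sketched here under the assumption of a POSIX find and sh: pass the file names to sh as positional parameters instead of embedding {} in the script text, and end -exec with + so that one shell handles a whole batch of files. The sketch also strips the leading directory before taking the numeric prefix. The src and target directories are placeholders created for the demo.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/src" "$workdir/target"
touch "$workdir/src/100_a.jpg" "$workdir/src/100_b b.jpg" "$workdir/src/2001_c.jpg"

# Inside the inline script, $1 is the target directory; the remaining
# arguments are the file names that find substitutes for {}.
find "$workdir/src" -type f -name '[0-9]*_*.jpg' -exec sh -c '
    target=$1; shift
    for f do
        name=${f##*/}      # strip the directory part
        num=${name%%_*}    # leading number, up to the first underscore
        mkdir -p "$target/$num" && mv "$f" "$target/$num/"
    done
' sh "$workdir/target" {} +

find "$workdir/target" -type f | sort
```

Because the names arrive as arguments rather than as part of the script source, quotes or dollar signs in a file name are never interpreted by the shell.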
Note that the above answer is for Bourne/POSIX/Bash. If you're using CSH or TCSH as your shell, the following might work instead:
foreach f (*_*.jpg)
    set split = ($f:as/_/ /)
    mkdir -p "$split[1]"
    mv "$f" "$split[1]/"
end
This assumes that the filespec will fit in tcsh's glob buffer. I've tested with 40000 files (894KB) on one command line and not had a problem using /bin/sh or /bin/csh in FreeBSD.
Like the Bourne/POSIX/Bash parameter expansion solution above, this avoids unnecessary calls to external programs. I haven't tested it at the scale of a million files, and would recommend the find solution even though it's slower.
You can use this script:
for i in [0-9]*_*.jpg; do
    p=`echo "$i" | sed 's/^\([0-9]*\)_.*/\1/'`
    mkdir -p "$p"
    mv "$i" "$p"
done
Using grep
for file in *.jpg; do
    dirName=$(echo "$file" | grep -oE '^[0-9]+')
    [[ -d $dirName ]] || mkdir "$dirName"
    mv "$file" "$dirName"
done
grep -oE '^[0-9]+' extracts the starting digits in the filename as the directory name
[[ -d $dirName ]] succeeds (exit status 0) if the directory exists
[[ -d $dirName ]] || mkdir "$dirName" ensures that mkdir runs only if the test [[ -d $dirName ]] fails, that is, the directory does not exist
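The same grouping can be done with shell parameter expansion alone, which avoids starting sed or grep once per file. This is a minimal sketch, assuming bash and that every matching name begins with digits followed by an underscore. The scratch directory is only there to make the example self-contained.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch 100_a.jpg 100_b.jpg 2001_c.jpg

shopt -s nullglob            # an unmatched glob expands to nothing
for file in [0-9]*_*.jpg; do
    dir=${file%%_*}          # text before the first underscore
    mkdir -p -- "$dir"
    mv -- "$file" "$dir/"
done
shopt -u nullglob

find . -type f | sort
```

With a million files the loop still works in bash, since the glob is expanded inside the shell and never passed to an external command as one long argument list.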

How can I store the "find" command results as an array in Bash

I am trying to save the result from find as arrays.
Here is my code:
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect '2' as result of ${len}. However, it prints 1.
The reason is that it takes all result of find as one elements.
How can I fix this?
I found several solutions on StackOverFlow about a similar problem. However, they are a little bit different so I can't apply in my case. I need to store the results in a variable before the loop. Thanks again.
Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on Mac OS, which —last I checked— still used bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
    array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, new lines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0 which prints the file names separated with a null character. This would not be much of an inconvenience if bash's readarray/mapfile functions supported null-separated strings but they don't. Bash's read does and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and process substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
find . -name "${input}" -print0 >tmpfile
array=()
while IFS= read -r -d $'\0'; do
    array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above, the output of find is stored in a temporary file and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name "${input}" -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
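The practical reason the loop reads from process substitution rather than from a plain pipe can be seen in a short experiment. This is a sketch assuming bash with its default settings (lastpipe off). The scratch directory and the array names piped and kept are illustrative.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/a.txt" "$workdir/b b.txt"

# Pipe: the while loop runs in a subshell, so its copy of the array is lost.
piped=()
find "$workdir" -name '*.txt' -print0 |
    while IFS= read -r -d ''; do piped+=("$REPLY"); done

# Process substitution: the loop runs in the current shell and the array survives.
kept=()
while IFS= read -r -d ''; do
    kept+=("$REPLY")
done < <(find "$workdir" -name '*.txt' -print0)

echo "pipe: ${#piped[@]} element(s)"
echo "process substitution: ${#kept[@]} element(s)"
```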
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (not in a subshell). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash now supports a -d option so that the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.
Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
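If a script has to run on both old and new versions of bash, one option is to check the version and fall back to the read loop. The following is a sketch and not part of the answers above: the function name find_to_array is hypothetical, and it stores its results in a global array named found.

```shell
#!/usr/bin/env bash
# find_to_array DIR PATTERN: fill the global array "found" (hypothetical helper).
find_to_array() {
    local dir=$1 pattern=$2
    found=()
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
        # bash 4.4+: readarray understands NUL delimiters directly.
        readarray -d '' found < <(find "$dir" -name "$pattern" -print0)
    else
        # Older bash: read one NUL-terminated name at a time.
        while IFS= read -r -d ''; do
            found+=("$REPLY")
        done < <(find "$dir" -name "$pattern" -print0)
    fi
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/one.txt" "$workdir/two words.txt" "$workdir/skip.log"

find_to_array "$workdir" '*.txt'
printf 'found: %s\n' "${found[@]}"
```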
The following appears to work for both Bash and Z Shell on macOS.
#! /bin/sh
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[@]}"
If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, the pattern (after parameter expansion) is treated literally, so with no matches you would have an array with a single string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
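A small self-contained check of that behaviour, as a sketch assuming bash 4 or later. The directory layout and the value of input are invented for the demo.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/sub/deeper"
touch "$workdir/top.txt" "$workdir/sub/mid.txt" "$workdir/sub/deeper/low.txt"
cd "$workdir" || exit 1

input=.txt                      # stand-in for the pattern the user typed
shopt -s globstar nullglob
array=( **/*"$input"* )         # matches at any depth, including the top level
none=( **/*no-such-name* )      # with nullglob this is an empty array

echo "matches: ${#array[@]}"
echo "non-matches: ${#none[@]}"
```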
You can try something like
array=(`find . -type f | sort -r | head -2`)
and, to print the array values, something like echo "${array[*]}"
None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while read -r line; do
    [[ ! -z "$line" ]] && array+=("$line")
done < <(find . -name "${input}" -print)
len=${#array[@]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
    echo "${array[$i]}"
    let i++
done
You could do like this:
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name "*${input}*"`)
for i in "${array[@]}"
do
    echo "$i"
done
In bash, $(<any_shell_cmd>) runs a command and captures its output. Reading that output with IFS set to a newline splits it into array elements, one per line (this still breaks on file names that contain newlines).
IFS=$'\n' read -r -d '' -a txt_files <<< "$(find /path/to/dir -name "*.txt")"

How do I capture the output from the ls or find command to store all file names in an array?

Need to process files in current directory one at a time. I am looking for a way to take the output of ls or find and store the resulting value as elements of an array. This way I can manipulate the array elements as needed.
To answer your exact question, use the following:
arr=( $(find /path/to/toplevel/dir -type f) )
$ find . -type f
$ arr=( $(find . -type f) )
$ echo ${#arr[@]}
$ echo ${arr[@]}
./test1.txt ./test2.txt ./test3.txt
$ echo ${arr[0]}
However, if you just want to process files one at a time, you can either use find's -exec option if the script is somewhat simple, or you can do a loop over what find returns like so:
while IFS= read -r -d $'\0' file; do
# stuff with "$file" here
done < <(find /path/to/toplevel/dir -type f -print0)
for i in `ls`; do echo $i; done;
can't get simpler than that!
edit: hmm - as per Dennis Williamson's comment, it seems you can!
edit 2: although the OP specifically asks how to parse the output of ls, I just wanted to point out that, as the commentators below have said, the correct answer is "you don't". Use for i in * or similar instead.
You actually don't need to use ls/find for files in current directory.
Just use a for loop:
for files in *; do
    if [ -f "$files" ]; then
        # do something
        :
    fi
done
And if you want to process hidden files too, you can set the relevant option:
shopt -s dotglob
This last command works in bash only.
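Put together, the loop and the option look like this. It is a sketch assuming bash; the scratch directory and the regular array are only there so the result can be inspected.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch visible.txt .hidden
mkdir subdir

shopt -s dotglob nullglob       # include dotfiles; an empty directory gives no iterations
regular=()
for entry in *; do
    [ -f "$entry" ] || continue # skip directories and other non-regular files
    regular+=("$entry")
done

printf '%s\n' "${regular[@]}"
```

Even with dotglob set, the glob never returns the special entries . and .., so no extra filtering is needed for them.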
Depending on what you want to do, you could use xargs:
ls directory | xargs -I{} cp -v directory/{} dir2
For example. xargs will act on each item returned.
