Bash script - how to fill array?

Let's say I have this directory structure:
DIRECTORY:
.........a
.........b
.........c
.........d
What I want to do is: I want to store elements of a directory in an array
something like : array = ls /home/user/DIRECTORY
so that array[0] contains name of first file (that is 'a')
array[1] == 'b' etc.
Thanks for help

You can't simply do array = ls /home/user/DIRECTORY, because, even with proper syntax, it wouldn't give you an array but a string that you would have to parse, and parsing ls is punishable by law (its output cannot be split reliably when file names contain whitespace or other special characters). You can, however, use built-in Bash constructs to achieve what you want:
#!/usr/bin/env bash
readonly YOUR_DIR="/home/daniel"
if [[ ! -d $YOUR_DIR ]]; then
echo >&2 "$YOUR_DIR does not exist or is not a directory"
exit 1
fi
OLD_PWD=$PWD
cd "$YOUR_DIR"
i=0
for file in *
do
if [[ -f $file ]]; then
array[$i]=$file
i=$(($i+1))
fi
done
cd "$OLD_PWD"
exit 0
This small script saves the names of all the regular files (which means no directories, sockets, and such; note that -f follows symbolic links, so links pointing to regular files are included) that can be found in $YOUR_DIR to the array called array.
Hope this helps.

Option 1, a manual loop:
dirtolist=/home/user/DIRECTORY
shopt -s nullglob # In case there aren't any files
contentsarray=()
for filepath in "$dirtolist"/*; do
contentsarray+=("$(basename "$filepath")")
done
shopt -u nullglob # Optional, restore default behavior for unmatched file globs
Option 2, using bash array trickery:
dirtolist=/home/user/DIRECTORY
shopt -s nullglob
contentspaths=("$dirtolist"/*) # This makes an array of paths to the files
contentsarray=("${contentspaths[@]##*/}") # This strips off the path portions, leaving just the filenames
shopt -u nullglob # Optional, restore default behavior for unmatched file globs

array=($(ls /home/user/DIRECTORY))
Then
echo ${array[0]}
will print the name of the first file in that directory, provided that no file name contains whitespace or glob characters.
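A quick way to see why the ls-based form is fragile is to compare it with a glob. This is only a minimal sketch: the scratch directory comes from mktemp and the file names are made up for the illustration. Command substitution is word-split on whitespace, so a name containing a space ends up as two elements, while the glob keeps it whole.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory, used only for this demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b c"

# Unquoted command substitution: the output of ls is split on whitespace.
from_ls=($(ls "$demo_dir"))

# Globbing: each matching file becomes exactly one element.
from_glob=("$demo_dir"/*)
from_glob=("${from_glob[@]##*/}")   # strip the directory part

echo "ls:   ${#from_ls[@]} elements"
echo "glob: ${#from_glob[@]} elements"

rm -rf "$demo_dir"
```

The element counts differ because "b c" is split into "b" and "c" in the first array.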

Related

bash: make an array for the files in the same directory

I am working with a set of mol2 files located in the same directory.
structure36S.mol2 structure30S.mol2 structure21.mol2
structure36R.mol2 structure30R.mol2 Structure20R.mol2
structure35S.mol2 structure29R.mol2 Structure19R.mol2
structure35R.mol2 structure28R.mol2 Structure13R.mol2
structure34S.mol2 structure27R.mol2
structure34R.mol2 structure26.mol2 jacks18.mol2
structure33S.mol2 structure25.mol2 5p9.mol2
structure33R.mol2 structure24.mol2 Y6J.mol2
structure32R.mol2 structure23.mol2 06I.mol2
structure31R.mol2 structure22.mol2
From this data I need to make an associative array with the names of the files (without the .mol2 extension) as keys and a single value (7LMF) shared between all elements:
dataset=( [structure36S]=7LMF [structure36R]=7LMF [structure35S]=7LMF ...[06I]=7LMF [Y6J]=7LMF )
We may start from the following script:
for file in ./*.mol2; do
file_name=$(basename "$file" .mol2)
#some command to add the file into the array
done
How could this script be completed to create the array?
I would recommend turning on nullglob; otherwise the pattern is left as a literal string when there is no match.
Use parameter expansion to remove the file extension.
If the leading './' is included, it will need to be stripped with another expansion.
#!/usr/bin/env bash
shopt -s nullglob
declare -A dataset
for file in *.mol2; do
dataset+=([${file%.*}]=7LMF)
done
for key in "${!dataset[@]}"; do echo "dataset[$key]: ${dataset[$key]}"; done
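The two expansions mentioned above (one for a leading './', one for the extension) can be combined as follows. This is a sketch under assumptions: the scratch directory and the three file names are placeholders, and 7LMF is the shared value from the question.

```shell
#!/usr/bin/env bash
shopt -s nullglob
demo_dir=$(mktemp -d)            # hypothetical scratch directory
touch "$demo_dir"/{structure36S,Y6J,06I}.mol2

declare -A dataset=()
cd "$demo_dir" || exit 1
for file in ./*.mol2; do
    name=${file#./}              # strip the leading "./"
    name=${name%.mol2}           # strip the extension
    dataset[$name]=7LMF
done
cd - >/dev/null || exit 1

declare -p dataset               # show the resulting associative array
rm -rf "$demo_dir"
```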
Try this Shellcheck-clean code:
#! /bin/bash -p
shopt -s nullglob
declare -A dataset
for file in *.mol2; do
file_name=${file%.mol2}
dataset[$file_name]=7LMF
done
# Show the contents of 'dataset'
declare -p dataset

Populate an array with list of directories existing in a given path in Bash

I have a directory path where there are multiple files and directories.
I want to use basic bash script to create an array containing only list of directories.
Suppose I have a directory path:
/my/directory/path/
$ ls /my/directory/path/
a.txt dirX b.txt dirY dirZ
Now I want to populate array named arr[] with only directories, i.e. dirX, dirY and dirZ.
I found one post, but it's not really relevant to my requirement.
Any help will be appreciated!
Try this:
#!/bin/bash
arr=(/my/directory/path/*/) # This creates an array of the full paths to all subdirs
arr=("${arr[@]%/}") # This removes the trailing slash on each item
arr=("${arr[@]##*/}") # This removes the path prefix, leaving just the dir names
Unlike the ls-based answer, this will not get confused by directory names that contain spaces, wildcards, etc.
Try:
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s dotglob # Expanded globs include names that start with '.'
arr=()
for dir in /my/directory/path/*/ ; do
dir2=${dir%/} # Remove the trailing /
dir3=${dir2##*/} # Remove everything up to, and including, the last /
arr+=( "$dir3" )
done
Try:
baseDir="/my/directory/path/"
readarray -d '' arr < <(find "${baseDir}" -mindepth 1 -maxdepth 1 -type d -print0)
Here the find command outputs all directories within baseDir, and the readarray command then puts them into an array named arr.
You can then work over the array with:
for directory in "${arr[@]}"; do
echo "${directory}"
done
Note: This only works with bash version 4.4-alpha and above. (See this answer for more.)
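On older Bash versions without readarray -d, a read loop over the same NUL-delimited find output is one possible fallback. A minimal sketch, assuming a find that supports -print0, -mindepth and -maxdepth; the scratch directory and its contents are invented for the example.

```shell
#!/usr/bin/env bash
baseDir=$(mktemp -d)             # hypothetical stand-in for /my/directory/path/
mkdir "$baseDir/dirX" "$baseDir/dir Y"
touch "$baseDir/a.txt"

arr=()
# read -d '' splits on NUL bytes, so names with spaces or newlines survive.
while IFS= read -r -d '' dir; do
    arr+=("${dir##*/}")          # keep only the directory name
done < <(find "$baseDir" -mindepth 1 -maxdepth 1 -type d -print0)

printf '%s\n' "${arr[@]}"
rm -rf "$baseDir"
```

The order of the elements follows whatever order find reports, which is not guaranteed to be sorted.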

Array of all files in a directory, except one

Trying to figure out how to include all .txt files except one called manifest.txt.
FILES=(path/to/*.txt)
You can use extended glob patterns for this:
shopt -s extglob
files=(path/to/!(manifest).txt)
The !(pattern-list) pattern matches "anything except one of the given patterns".
Note that this exactly excludes manifest.txt and nothing else; mmanifest.txt, for example, would still go in to the array.
As a side note: a glob that matches nothing at all expands to itself (see the manual and this question). This behaviour can be changed using the nullglob (expand to nothing) and failglob (print error message) shell options.
You can build the array one file at a time, avoiding the file you do not want:
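The effect of nullglob on an unmatched pattern can be checked with a short experiment. This is a sketch that assumes an empty scratch directory (created here with mktemp), so the pattern is guaranteed to match nothing; the variable names are arbitrary.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)              # hypothetical empty directory

shopt -u nullglob
literal=("$demo_dir"/*.txt)        # unmatched glob stays as the literal pattern

shopt -s nullglob
with_nullglob=("$demo_dir"/*.txt)  # unmatched glob expands to nothing
shopt -u nullglob

echo "default:  ${#literal[@]} element(s)"
echo "nullglob: ${#with_nullglob[@]} element(s)"
rmdir "$demo_dir"
```

Without nullglob the array holds one element, the pattern text itself, which is why loops over globs often need an existence test.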
declare -a files=()
for file in /path/to/files/*
do
! [[ -e "$file" ]] || [[ "$file" = */manifest.txt ]] || files+=("$file")
done
Please note that globbing in the for statement does not cause problems with whitespace (even newlines) in filenames.
EDIT
I added a test for file existence to handle the case where the glob fails and the nullglob option is not set.
I think this is best handled with an associative array even if just one element.
Consider:
$ touch f{1..6}.txt manifest.txt
$ ls *.txt
f1.txt f3.txt f5.txt manifest.txt
f2.txt f4.txt f6.txt
You can create an associative array for the names you wish to exclude:
declare -A exclude
for f in f1.txt f5.txt manifest.txt; do
exclude[$f]=1
done
Then add files to an array that are not in the associative array:
files=()
for fn in *.txt; do
[[ ${exclude[$fn]} ]] && continue
files+=("$fn")
done
$ echo "${files[@]}"
f2.txt f3.txt f4.txt f6.txt
This approach allows any number of exclusions from the list of files.
FILES=($(ls /path/to/*.txt | grep -v '/manifest\.txt$')) # ls prints the full path, so match on the trailing /manifest.txt

Append to an array variable from a pipeline command

I am writing a bash function to get all git repositories, but I have met a problem when I want to store all the git repository pathnames to the array patharray. Here is the code:
gitrepo() {
local opt
declare -a patharray
locate -b '\.git' | \
while read pathname
do
pathname="$(dirname ${pathname})"
if [[ "${pathname}" != *.* ]]; then
# Note: how to add an element to an existing Bash Array
patharray=("${patharray[@]}" '\n' "${pathname}")
# echo -e ${patharray[@]}
fi
done
echo -e ${patharray[@]}
}
I want to save all the repository paths to the patharray array, but I can't get it outside the pipeline which is comprised of locate and while command.
But I can access the array inside the pipeline: the commented-out command # echo -e ${patharray[@]} works if uncommented. How can I solve this?
And I have tried the export command, however it seems that it can't pass the patharray to the pipeline.
Bash runs all commands of a pipeline in separate subshells. When the subshell containing the while loop ends, all changes you made to the patharray variable are lost.
You can simply group the while loop and the echo statement together so they are both contained within the same subshell:
gitrepo() {
local pathname dir
local -a patharray
locate -b '\.git' | { # the grouping begins here
while read pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done
printf "%s\n" "${patharray[@]}" # all those quotes are needed
} # the grouping ends here
}
Alternately, you can structure your code to not need a pipe by using process substitution
(also see the Bash manual for details: man bash | less +/Process\ Substitution):
gitrepo() {
local pathname dir
local -a patharray
while read pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done < <(locate -b '\.git')
printf "%s\n" "${patharray[@]}" # all those quotes are needed
}
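The difference between the pipeline form and the process-substitution form can be reproduced without locate. In this sketch a small function named list_repos (hypothetical, as are the two paths it prints) stands in for the locate output.

```shell
#!/usr/bin/env bash
# Stand-in for `locate -b '\.git'`; the paths are made up.
list_repos() { printf '%s\n' '/home/user/proj a/.git' '/home/user/proj-b/.git'; }

piped=()
list_repos | while IFS= read -r p; do
    piped+=("$(dirname "$p")")         # runs in a subshell, so the change is lost
done

substituted=()
while IFS= read -r p; do
    substituted+=("$(dirname "$p")")   # runs in the current shell
done < <(list_repos)

echo "pipeline:             ${#piped[@]} element(s)"
echo "process substitution: ${#substituted[@]} element(s)"
```

With default shell options the first array stays empty, while the second one holds both directories.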
First of all, appending to an array variable is better done with array[${#array[*]}]="value" or array+=("value1" "value2" "etc") unless you wish to transform the entire array (which you don't).
Now, since pipeline commands are run in subprocesses, changes made to a variable inside a pipeline command will not propagate to outside it. There are a few options to get around this (most are listed in Greg's BashFAQ/024):
- pass the result through stdout instead
    - the simplest; you'll need to do that anyway to get the value from the function (although there are ways to return a proper variable)
    - any special characters in paths can be handled reliably by using \0 as a separator (see Capturing output of find . -print0 into a bash array for reading \0-separated lists)
      locate -b0 '\.git' | while read -r -d '' pathname; do dirname -z "$pathname"; done
      or simply
      locate -b0 '\.git' | xargs -0 dirname -z
- avoid running the loop in a subprocess
    - avoid pipeline at all
        - temporary file/FIFO (bad: requires manual cleanup, accessible to others)
        - temporary variable (mediocre: unnecessary memory overhead)
        - process substitution (a special, syntax-supported case of FIFO, doesn't require manual cleanup; code adapted from Greg's BashFAQ/020):
          i=0 #`unset i` will error on `i' usage if the `nounset` option is set
          while IFS= read -r -d $'\0' file; do
          patharray[i++]="$(dirname "$file")" # or however you want to process each file
          done < <(locate -b0 '\.git')
    - use the lastpipe option (new in Bash 4.2) - doesn't run the last command of a pipeline in a subprocess (mediocre: has global effect)
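For the lastpipe route, a minimal sketch follows. It assumes a non-interactive script, since lastpipe only takes effect when job control is off; printf stands in for locate -b0, and the paths are hypothetical.

```shell
#!/usr/bin/env bash
shopt -s lastpipe                # Bash 4.2+; effective only without job control

patharray=()
# Stand-in for `locate -b0 '\.git'`: NUL-terminated, made-up paths.
printf '%s\0' '/srv/repo one/.git' '/srv/repo-two/.git' |
while IFS= read -r -d '' pathname; do
    patharray+=("$(dirname "$pathname")")   # last pipeline stage runs in this shell
done

shopt -u lastpipe                # limit the option's global effect
printf '%s\n' "${patharray[@]}"
```

Turning the option off again right after the loop is one way to contain the global effect mentioned above.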

Store all subdirectories of /Volumes in an array (BASH)

I need a script that will get all of the directories within the /Volumes directory on a Mac and store them in an array. The problem that I am running into is that it is very common for there to be a space in a directory name, and that really messes things up.
Here is what I've got so far:
LOCATION="/Volumes"
COUNTER=0
cd $LOCATION
OIFS=$IFS
IFS=$'\n'
for folder in *; do
[ -d "$folder" ] || continue
(( DRIVES[$COUNTER] = ${folder} ))
(( COUNTER = COUNTER + 1 ))
done
IFS=$OIFS
Here is the error that I am getting:
./getDrives.sh: line 17: DRIVES[0] = Macintosh HD : syntax error in expression (error token is "HD ")
I guess the simplest is just:
array=( /Volumes/*/ )
Notes:
use this with nullglob or failglob set
if you also want hidden directories (but not . nor ..), set dotglob
if you want all the directories and subdirectories (recursively), set globstar and use
array=( /Volumes/**/ )
instead.
When I say set nullglob or failglob or dotglob or globstar I mean the shell options, that can be set with, e.g.:
shopt -s nullglob
and unset with, e.g.:
shopt -u nullglob
More about these in The Shopt Builtin section of the Bash Reference Manual.
To answer your comment: you only want the basename of the directories, not the full path? easy, just do
cd /Volumes
array=( */ )
That's all. In fact, I'm suggesting you replace 6 lines of inefficient code with just one, much more efficient, line.
More generally, if you don't want to cd into /Volumes, you can easily get rid of the leading /Volumes/ like so
array=( /Volumes/*/ )
array=( "${array[@]/#\/Volumes\//}" )
Or, even better, put the leading /Volumes/ in a variable and proceed as:
location="/Volumes/"
array=( "$location"* )
array=( "${array[@]/#"$location"/}" )
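Putting the pieces together for names that contain spaces: the sketch below uses a scratch directory in place of /Volumes (the directory, the volume names and the drives variable are invented), globs only directories by means of the trailing slash, then strips both the slash and the leading path.

```shell
#!/usr/bin/env bash
shopt -s nullglob
location=$(mktemp -d)/           # hypothetical stand-in for "/Volumes/"
mkdir "${location}Macintosh HD" "${location}Backup"
touch "${location}notes.txt"     # a plain file, which */ must not match

drives=( "$location"*/ )         # directories only, full paths with trailing /
drives=( "${drives[@]%/}" )      # drop the trailing slash
drives=( "${drives[@]##*/}" )    # drop the leading path

printf '%s\n' "${drives[@]}"
rm -rf "$location"
```

No IFS juggling or counter variable is needed, because glob results are never word-split.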
cd /Volumes
cnt=0
for d in *; do
[ -d "$d" ] || continue
drv[$cnt]="$d"
((++cnt))
done
for d in "${drv[@]}"; do
echo "$d"
done
