BASH: Sort array of files with crazy names - arrays

Problem:
Need to sort the array before operating on its members with a function.
First, the array is loaded with files:
unset a i
counter=1
while IFS= read -r -d $'\0' file; do
a[i++]="$file"
done < <(find $DIR -type f -print0)
Next, each member of array is sent to function
for f in "${a[#]}"
do
func_hash "$f"
[ $(expr $counter % 20) -eq 0 ] && printf "="
counter=$((counter + 1))
done
Somehow a sort needs to be thrown into the above for loop. Have looked
through the SO posts on sorting arrays but somehow my crazy file names
cause issues when I try to tack on a sort.
Ideas?
Thanks!
Bubnoff
UPDATE: Here's code with sort:
while IFS= read -r -d $'\0' file; do
func_hash "$file"
[ $(expr $counter % 20) -eq 0 ] && printf "="
counter=$((counter + 1))
done < <(find $DIR -type f -print0 | sort -z +1 -1)
It's sorting by full path rather than file name. Any ideas on how to
sort by file name given that the path is needed for the function?
UPDATE 2: Decided to compromise.
My main goal was to avoid temp files when sorting. GNU sort can write back to the original
file with its '-o' option, so now I can:
sort -o $OUT -t',' -k 1 $OUT
Anyone have a more 'elegant' solution (whatever that means)?
SOLVED See jw013's answer below. Thanks man!

EDIT
while IFS= read -r -d/ && read -r -d '' file; do
a[i++]="$file"
done < <(find "$DIR" -type f -printf '%f/%p\0' | sort -z -t/ -k1 )
Rationale:
I make the assumption that / is never a legal character within a file name (which seems reasonable on most *nix filesystems since it is the path separator).
The -printf is used to print the file name without leading directories, then the full file name with path, separated by /. The sort takes place on the first field separated by /, which should be the file name without path.
The read is modified to first use / as a delimiter to throw out the pathless file name.
side note
Any POSIX shell should support the modulo operator as part of its arithmetic expansion. You can replace the line with the call to the external command expr in the second loop with
[ $(( counter % 20 )) -eq 0 ] ...
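Putting the pieces together, a minimal sketch of how the sorted, NUL-delimited read and the progress counter might combine; func_hash here is only a hypothetical stand-in for the asker's own hashing function, and hashes.txt is an assumed output file:
#!/usr/bin/env bash
# Sketch only: hash files in file-name order, printing "=" every 20 files.
# func_hash is a hypothetical stand-in for the real hashing function.
func_hash() { sha1sum "$1" >> hashes.txt; }

DIR=${1:-.}
counter=1
while IFS= read -r -d/ && IFS= read -r -d '' file; do
    func_hash "$file"
    [ $(( counter % 20 )) -eq 0 ] && printf '='
    counter=$(( counter + 1 ))
done < <(find "$DIR" -type f -printf '%f/%p\0' | sort -z -t/ -k1)
printf '\n'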

Related

Renaming Files with "Invalid Characters"

I have a script that removes invalid characters from file names due to OneDrive restrictions. It works except for files that have * { } in them. I have included * { } but that is not working (it ignores those files). The script is below. Not sure what I am doing wrong here.
#Renames FOLDERS with space at the end
IFS=$'\n'
for file in $(find -d . -name "* ")
do
target_name=$(echo "$file" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
#end Folder rename
#Renames FILES
declare -a arrayGrep=(\? \* \, \# \; \: \& \# \+ \< \> \% \$ \~ \% \: \< \> )
echo "array: ${arrayGrep[#]}"
for i in "${arrayGrep[#]}"
do
for file in $(find . | grep $i )
do
target_name=$(echo "$file" | sed 's/\'$i'/-/g' )
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
done
There are some issues with your code:
You've got some duplicates in arrayGrep
You have some quoting issues: $i in the grep and sed commands must be protected from the shell
some of the disallowed characters, even if quoted to protect from the shell, could be mis-parsed by grep as meta-characters
You loop rather a lot when all substitutions could happen simultaneously
for file in $(find ...) has to read the entire list into memory, which might be problematic for large lists. It also breaks with some filenames due to word-splitting. Piping into read is better
find can do path filtering without grep (at least for this set of disallowed characters)
sed is fine to use but tr is a little neater
A possible rewrite is:
badchars='?*,#;:&#+<>%$~'
find . -name "*[$badchars]*" | while read -r file
do
target_name=$(echo "$file" | tr "$badchars" - )
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
If using bash, you can even do parameter expansion directly,
although you have to embed the list:
target_name="${file//[?*,#;:&#+<>%$~]/-}"
an enhancement idea
If you choose a suitable character (e.g. =), you can rename file names reversibly (but only if the length of the new name wouldn't exceed the maximum allowed length for a filename). One possible algorithm:
replace all = in the filename with =3D
replace all the other disallowed characters with =hh where hh is the appropriate ASCII code in hex
You can reverse the renaming by:
replace all =hh (except =3D) with the character corresponding to ASCII code hh
replace all =3D with =
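Sketched out in bash (untested, and assuming roughly the same set of disallowed characters as above), the encoding and its inverse might look like this; encode_name, decode_name, and badchars are illustrative names, not part of the original script:
# Rough sketch of the reversible =hh renaming idea.
badchars='?*,#;:&+<>%$~'

encode_name() {
    local name=$1 out='' ch i
    for (( i = 0; i < ${#name}; i++ )); do
        ch=${name:i:1}
        if [[ $ch == '=' || $badchars == *"$ch"* ]]; then
            printf -v ch '=%02X' "'$ch"    # =hh, hh = hex ASCII code
        fi
        out+=$ch
    done
    printf '%s\n' "$out"
}

decode_name() {
    local name=$1 out='' hex
    while [[ $name == *=* ]]; do
        out+=${name%%=*}          # copy text before the next '='
        name=${name#*=}
        hex=${name:0:2}           # two hex digits follow the '='
        name=${name:2}
        printf -v out '%s\x'"$hex" "$out"
    done
    printf '%s\n' "$out$name"
}

# encode_name 'a?b*.txt'      ->  a=3Fb=2A.txt
# decode_name 'a=3Fb=2A.txt'  ->  a?b*.txt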

Bash Add elements to an array does not work [duplicate]

Why isn't this bash array populating? I believe I've done them like this in the past. Echoing ${#XECOMMAND[@]} shows no data..
DIR=$1
TEMPFILE=/tmp/dir.tmp
ls -l $DIR | tail -n +2 | sed 's/\s\+/ /g' | cut -d" " -f5,9 > $TEMPFILE
i=0
cat $TEMPFILE | while read line ;do
if [[ $(echo $line | cut -d" " -f1) == 0 ]]; then
XECOMMAND[$i]="$(echo "$line" | cut -d" " -f2)"
(( i++ ))
fi
done
When you run the while loop like
somecommand | while read ...
then the while loop is executed in a sub-shell, i.e. a different process than the main script. Thus, all variable assignments that happen in the loop will not be reflected in the main process. The workaround is to use input redirection and/or process substitution, so that the loop executes in the current process. For example, if you want to read from a file you do
while read ....
do
# do stuff
done < "$filename"
or if you want the output of a process you can do
while read ....
do
# do stuff
done < <(some command)
Finally, in bash 4.2 and above, you can set shopt -s lastpipe, which causes the last command in the pipeline to be executed in the current process.
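For illustration, a small sketch of the lastpipe variant (lastpipe only takes effect when job control is off, which is the normal state in a non-interactive script); some_command is just a placeholder for whatever produces the lines:
#!/usr/bin/env bash
# Sketch: let the last element of a pipeline run in the current shell.
shopt -s lastpipe        # bash 4.2+, needs job control off (non-interactive)

XECOMMAND=()
some_command | while IFS= read -r line; do   # some_command is a placeholder
    XECOMMAND+=( "$line" )
done

echo "collected ${#XECOMMAND[@]} elements"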
I think you're trying to construct an array consisting of the names of all zero-length files and directories in $DIR. If so, you can do it like this:
mapfile -t ZERO_LENGTH < <(find "$DIR" -maxdepth 1 -size 0)
(Add -type f to the find command if you're only interested in regular files.)
This sort of solution is almost always better than trying to parse ls output.
The use of process substitution (< <(...)) rather than piping (... |) is important, because it means that the shell variable will be set in the current shell, not in an ephemeral subshell.
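As a quick usage sketch, the array filled this way is available for the rest of the script:
# Usage sketch: ZERO_LENGTH is populated in the current shell.
mapfile -t ZERO_LENGTH < <(find "$DIR" -maxdepth 1 -size 0)
echo "found ${#ZERO_LENGTH[@]} zero-length entries"
for f in "${ZERO_LENGTH[@]}"; do
    printf '  %s\n' "$f"
done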

echo a variable and then grep to see if value exist in a file is not returning anything. Unix Shell Scripting

I'm trying to figure out how to determine if a variable contains a value from a file using grep. This is not returning anything, so I'm going to explain it.
I have my code that is this:
MyFiles="MyFile-I-20160606_141_Employees.txt"
DirFiles="/dev/fs/C/Users/salasfri/Desktop/MyFiles.txt"
for OutFile in $(cat $DirFiles); do
if [[ $( echo $MyFiles | grep -c $OutFile ) -gt 0 ]]; then
print "The file $OutFile exist!!"
fi
done
and the file in /dev/fs/C/Users/salasfri/Desktop/MyFiles.txt contains the following values:
MyFile-I-*_141_Employees.txt
MyFile-I-*_141_Products.txt
MyFile-I-*_141_Deparments.txt
The idea is to verify whether the variable "MyFiles" is found in the MyFiles.txt file. As you can see, the entries use the pattern "*" because that part is a date and will change.
That solution is not returning any count of files. Is there something that I'm doing wrong?
You can try to change the search string before searching.
An example with three teststrings:
for teststring in MyFile-I-20160606_141_Employees.txt MyFile-I-20160606_142_Employees.txt MyFile-I-20160606_141_Others.txt
do
grepstr=$(sed 's/[0-9]\{8\}_/*_/' <<< "${teststring}")
fgrep "${grepstr}" "${DirFiles}"
found=$(fgrep "${grepstr}" "${DirFiles}")
if [ $? -eq 0 ]; then
echo "${found} matches ${teststring}."
fi
done
In your case you can make the code shorter with
fgrep -q "$(sed 's/[0-9]\{8\}_/*_/' <<< "${MyFiles}")" $DirFiles &&
echo "The file $(sed 's/[0-9]\{8\}_/*_/' <<< "${MyFiles}") exist!!"
Your patterns are glob-style patterns, not regular expressions. Interpreted as a regular expression, the pattern abc-*_X.txt will not match the string abc-1234_X.txt.
You want to use a shell construct that does glob matching.
MyFiles="MyFile-I-20160606_141_Employees.txt"
sed 's/\r$//' "/dev/fs/C/Users/salasfri/Desktop/MyFiles.txt" \
| while IFS= read -r Pattern; do
if [[ $MyFiles == $Pattern ]]; then
print "$MyFiles matches pattern $Pattern"
break
fi
done

Prepending a variable to all items in a bash array

CURRENTFILENAMES=( "$(ls $LOC -AFl | sed "1 d" | grep "[^/]$" | awk '{ print $9 }')" )
I have written the above code, however it is not behaving as I expect it to in a for-loop, which I wrote as so
for a in "$CURRENTFILENAMES"; do
CURRENTFILEPATHS=( "${LOC}/${a}" )
done
I expected this to prepend the value of the variable LOC to all of the items in the CURRENTFILENAMES array; however, it has just prepended it to the beginning of the array. How can I remedy this?
You need to use the += operator to append to an array:
CURRENTFILEPATHS+=( "${LOC}/${a}" )
However, parsing ls output is not advisable; use find instead.
EDIT: Proper way to run this loop:
CURRENTFILEPATHS=()
while IFS= read -d '' -r f; do
CURRENTFILEPATHS+=( "$f" )
done < <(find "$LOC" -maxdepth 1 -type f -print0)

Store the output of find command in an array [duplicate]

This question already has answers here:
How can I store the "find" command results as an array in Bash
(8 answers)
Closed 4 years ago.
How do I put the result of find $1 into an array?
In for loop:
for /f "delims=/" %%G in ('find $1') do %%G | cut -d\/ -f6-
I want to cry.
In bash:
file_list=()
while IFS= read -d $'\0' -r file ; do
file_list=("${file_list[#]}" "$file")
done < <(find "$1" -print0)
echo "${file_list[#]}"
file_list is now an array containing the results of find "$1
What's special about "field 6"? It's not clear what you were attempting to do with your cut command.
Do you want to cut each file after the 6th directory?
for file in "${file_list[#]}" ; do
echo "$file" | cut -d/ -f6-
done
But why "field 6"? Can I presume that you actually want to return just the last element of the path?
for file in "${file_list[#]}" ; do
echo "${file##*/}"
done
Or even
echo "${file_list[#]##*/}"
Which will give you the last path element for each path in the array. You could even do something with the result
for file in "${file_list[#]##*/}" ; do
echo "$file"
done
Explanation of the bash program elements:
(One should probably use the builtin readarray instead; a sketch follows after this explanation.)
find "$1" -print0
Find stuff and 'print the full file name on the standard output, followed by a null character'. This is important as we will split that output by the null character later.
<(find "$1" -print0)
"Process Substitution" : The output of the find subprocess is read in via a FIFO (i.e. the output of the find subprocess behaves like a file here)
while ...
done < <(find "$1" -print0)
The output of the find subprocess is read by the while command via <
IFS= read -d $'\0' -r file
This is the while condition:
read
Read one line of input (from the find command). The return value of read is 0 unless EOF is encountered, at which point the while loop exits.
-d $'\0'
...taking the null character as the delimiter (see QUOTING in the bash man page). This is done because we used the null character with -print0 earlier.
-r
backslash is not considered an escape character as it may be part of the filename
file
Result (first word actually, which is unique here) is put into variable file
IFS=
The command is run with IFS (the special variable which contains the characters on which read splits input into words) set to empty, because we don't want to split.
And inside the loop:
file_list=("${file_list[#]}" "$file")
Inside the loop, the file_list array is just grown by $file, suitably quoted.
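For completeness, here is a sketch of the readarray alternative mentioned above (readarray -d needs bash 4.4 or newer):
# Sketch: equivalent of the while/read loop using the readarray builtin.
# -d '' splits on NUL bytes, -t strips the delimiter from each element.
readarray -d '' -t file_list < <(find "$1" -print0)
printf '%s\n' "${file_list[@]}"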
arrayname=( $(find $1) )
I don't understand your loop question. If you are asking how to work with that array, then in bash you can loop through all array elements like this:
for element in $(seq 0 $((${#arrayname[@]} - 1)))
do
echo "${arrayname[$element]}"
done
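As a side note, iterating over seq-generated indices works, but a sketch of the two more idiomatic bash loops for the same array:
# Sketch: more idiomatic ways to walk a bash array.
for element in "${arrayname[@]}"; do      # iterate by value
    echo "$element"
done
for idx in "${!arrayname[@]}"; do         # iterate by index
    echo "$idx: ${arrayname[$idx]}"
done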
This is probably not 100% foolproof, but it will probably work 99% of the time (I used the GNU utilities; the BSD utilities won't work without modifications; also, this was done using an ext4 filesystem):
declare -a BASH_ARRAY_VARIABLE=$(find <path> <other options> -print0 | sed -e 's/\x0$//' | awk -F'\0' 'BEGIN { printf "("; } { for (i = 1; i <= NF; i++) { printf "%c"gensub(/"/, "\\\\\"", "g", $i)"%c ", 34, 34; } } END { printf ")"; }')
Then you would iterate over it like so:
for FIND_PATH in "${BASH_ARRAY_VARIABLE[@]}"; do echo "$FIND_PATH"; done
Make sure to enclose $FIND_PATH inside double-quotes when working with the path.
Here's a simpler pipeless version, based on the version of user2618594
declare -a names=$(echo "("; find <path> <other options> -printf '"%p" '; echo ")")
for nm in "${names[@]}"
do
echo "$nm"
done
To loop through a find, you can simply use find:
for file in "`find "$1"`"; do
echo "$file" | cut -d/ -f6-
done
It was what I got from your question.
