Populating Arrays With Nested Loops in Bash

I'm curious as to why the following:
array1=(file1 file2 file3)
array2=()
for i in ${array1[@]}
do
find . -name $i -type f -print0 2>/dev/null | \
while read -d '' -r file
do
array2+=( $file )
done
done
fails to populate array2, assuming files named file1, file2, and file3 exist in subdirectories below the one where the search starts. I would appreciate it if someone could point out where I misstepped here.

Try this:
array1=(file1 file2 file3)
array2=()
for i in "${array1[#]}"
do
while read -d '' -r file
do
array2+=( "$file" )
done < <(find . -name "$i" -type f -print0)
done
Because of the pipe, the while loop runs in a subshell, so every value appended to array2 is lost when that subshell exits. Reading from the process substitution < <(find ...) instead keeps the loop in the current shell, so the appends survive.
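A minimal demo of the effect (hypothetical, not from the original post):
count=0
printf '%s\n' a b c | while read -r line
do
count=$((count+1))   # runs in the subshell created by the pipe
done
echo "$count"   # prints 0: the subshell's count is discarded

count=0
while read -r line
do
count=$((count+1))   # runs in the current shell
done < <(printf '%s\n' a b c)
echo "$count"   # prints 3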

If you are using bash 4, you can avoid using find:
shopt -s globstar
array1=(file1 file2 file3)
array2=()
for i in "${array1[#]}"
do
for f in **/"$i"; do
[[ -f "$f" ]] && array2+=( "$f" )
done
done

Issue using diff with arrays and quoted values in shell

Hi guys, I'm having an issue while using diff.
In my script I'm trying to compare all files in one directory to all files in two other directories,
using diff to check whether the files are the same.
Here is my script:
#!/bin/bash
files1=()
files2=()
# Directories to compare. Adding quotes at the beginning and at the end of each file found in content1 & content3
content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')
# ADDING CONTENT INTO FILES1 & FILES2 ARRAY
while read -r line; do
files1+=("$line")
done <<< "$content1"
# content1 and content3 go into the same array
while read -r line3;do
files1+=("$line3")
done <<< "$content3"
while read -r line2; do
files2+=("$line2")
done <<< "$content2"
# Here I'm trying to compare the files in files2 one by one against all of files1
for ((i=0; i<${#files2[@]}; i++))
do
for ((j=0; j<${#files1[@]}; j++))
do
if [[ -n ${files2[$i]} ]];then
diff -s "${files2[$i]}" "${files1[$j]}" > /dev/null
if [[ $? == 0 ]]; then
echo ${files1[$j]} "est identique a" ${files2[$i]}
unset 'files2[$i]'
break
fi
fi
done
done
#SHOW THE FILES THAT DIDN'T MATCH
echo ${files2[@]}
I'm having the following issue when I try to diff:
diff: "/data/content3/other/log2/perso log/somelog.log": No such file or directory
But when I do:
ll "/data/content3/other/log2/perso log/somelog.log"
-rw-rw-r-- 2 lopom lopom 551M 30 oct. 18:53 '/data/content3/other/logs2/perso log/somelog.log'
So the file exists.
I need those quotes because sometimes there are spaces in the path.
Does someone know how to fix that?
Thanks.
I already tried changing the double quotes to single quotes, but that didn't fix it.
First, don't do this -
content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')
don't stack all these into single vars. This is asking for ten kinds of obscure trouble. More importantly, those sed calls are embedding literal quotation marks into the data as part of the filenames, which is what's causing diff to fail: there are no actual files with quotes in their names.
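To see what the sed calls actually produce (hypothetical path):
printf '%s\n' '/data/other/logs1/perso log/a.log' | sed 's/^/"/g' | sed 's/$/"/g'
# prints: "/data/other/logs1/perso log/a.log" -- the quotes are now data, naming a file that doesn't exist
If you want the filenames in arrays, NUL-delimited find output is the usual safe route; a sketch, assuming bash 4.4+ for mapfile -d '':
mapfile -d '' -t files2 < <(find /data/logs -name "*.log" -type f -print0)
mapfile -d '' -t files1 < <(find /data/other/logs1 /data/other/logs2 -type f -print0)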
Also, if you are throwing away the output and just using diff to check whether the files are identical, try cmp instead. With -s it's silent, and it's a lot faster, since it exits at the first differing byte instead of reading the rest of both files and generating a report. If there are a lot of files, this will add up.
If the logs are the only things in the directories, and you don't have to scan subdirectories, and the filename can't appear in both /data/other/logs1 AND /data/other/logs2, but you're pretty sure it will be in at least one of them... then simplify:
for f in /data/logs/*.log # I'll assume these are all files...
do t=/data/other/logs[12]/"${f#/data/logs/}" # always just one?
if cmp -s "$f" "$t" # cmp -s *has* no output
then echo "$t est identique a $f" # files are same
elif [[ -e "$t" ]] # check t exists
then echo "$t diffère de $f" # maybe ls -l "$f" "$t" ?
else echo "$t n'existe pas" # report it does not
fi
done
This needs no arrays, no find, no sed calls, etc.
If you do need to read subdirectories, use shopt to handle it with globs so that you don't have to worry about parsing odd characters with read. (c.f. https://mywiki.wooledge.org/ParsingLs for some reasons.)
shopt -s globstar
for f in /data/logs/**/*.log # globstar makes ** match at arbitrary depth
do for t in /data/other/logs[12]/**/"${f#/data/logs/}" # if >1 possible hit
do if cmp -s "$f" "$t"
then echo "$t est identique a $f"
elif [[ -e "$t" ]]
then echo "$t diffère de $f"
else echo "$t n'existe pas" # $t will be the glob, one iteration
fi
done
done

Using mapfile to save output to associative arrays

In practicing bash, I tried writing a script that searches the home directory for duplicate files and deletes them. Here's what my script looks like now.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 sumray origray
if [[ -d "$HOME/$1" && -n "$1" ]]; then
echo "$1 is a directory"
else
echo "Usage: create-list Directory | options" >&2
exit 1
fi
for i in $HOME/$1/*; do
[[ -f $i ]] || continue
arr1[$i]="$i"
done
for i in "${arr1[#]}"; do
Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
dupe=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name")
origray[$i]=$(md5sum "$i" | cut -c 1-32)
fi
done
for i in "${!sumray[#]}"; do
poten=$(md5sum "$i" | cut -c 1-32)
for i in "${!origray[#]}"; do
if [[ "$poten" = "${origray[$i]}" ]]; then
echo "${sumray[$i]} is a duplicate of $i"
fi
done
done
Originally, where mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name") is now, my line was the following:
sumray["$i"]=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
This saved the output of find to the array. But I had an issue. If a single file had multiple duplicates, then all locations found by find would be saved to a single value. I figured I could use the mapfile command to fix this, but now it's not saving anything to my array at all. Does it have to do with the fact that I'm using an associative array? Or did I just mess up elsewhere?
I'm not sure if I'm allowed to answer my own question, but I figured that I should post how I solved my problem.
As it turns out, the mapfile command does not work on associative arrays at all. So my fix was to save the output of find to a text file and then store that information in an indexed array. I tested this a few times and haven't encountered any errors yet.
Here's my finished script.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 origray
declare indexray
#Verify that Parameter is a directory.
if [[ -d "$HOME/$1/" && -n "$1" ]]; then
echo "Searching for duplicates of files in $1"
else
echo "Usage: create-list Directory | options" >&2
exit 1
fi
#create list of files in specified directory
for i in $HOME/${1%/}/*; do
[[ -f $i ]] || continue
arr1[$i]="$i"
done
#search for all duplicate files in the home directory
#by name
#find checksum of files in specified directory
for i in "${arr1[#]}"; do
Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
find ~ -name "${Name##*/}" ! -wholename "$Name" >> temp.txt
origray[$i]=$(md5sum "$i" | cut -c 1-32)
fi
done
#create list of duplicate file locations.
if [[ -f temp.txt ]]; then
mapfile -t indexray < temp.txt
else
echo "No duplicates were found."
exit 0
fi
#compare similarly named files by checksum and delete duplicates
count=0
for i in "${!indexray[#]}"; do
poten=$(md5sum "${indexray[$i]}" | cut -c 1-32)
for i in "${!origray[#]}"; do
if [[ "$poten" = "${origray[$i]}" ]]; then
echo "${indexray[$count]} is a duplicate of a file in $1."
fi
done
count=$((count+1))
done
rm temp.txt
This is kind of sloppy but it does what it's supposed to do. md5sum may not be the optimal way to check for file duplicates, but it works. All I have to do is replace echo "${indexray[$count]} is a duplicate of a file in $1." with rm -i "${indexray[$count]}" and it's good to go.
So my next question would have to be...why doesn't mapfile work with associative arrays?
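For what it's worth: mapfile (a.k.a. readarray) expects a plain array name and always assigns to the sequential integer indices 0, 1, 2, ..., so something like mapfile -t sumray["$i"] is rejected as an invalid identifier, and an associative array, whose keys are arbitrary strings, can't receive those indices either. A small sketch of the behavior (hypothetical names):
declare -a indexed
mapfile -t indexed < <(printf '%s\n' one two three)
echo "${indexed[2]}"   # prints: three
declare -A assoc
mapfile -t assoc < <(printf '%s\n' one two three)   # errors out: mapfile only fills indexed arrays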

rename specific pattern of files in bash

I have the following files and directories:
/tmp/jj/
/tmp/jj/ese
/tmp/jj/ese/2010
/tmp/jj/ese/2010/test.db
/tmp/jj/dfhdh
/tmp/jj/dfhdh/2010
/tmp/jj/dfhdh/2010/rfdf.db
/tmp/jj/ddfxcg
/tmp/jj/ddfxcg/2010
/tmp/jj/ddfxcg/2010/df.db
/tmp/jj/ddfnghmnhm
/tmp/jj/ddfnghmnhm/2010
/tmp/jj/ddfnghmnhm/2010/sdfs.db
I want to rename all the 2010 directories to their parent directories' names, and then tar all the .db files...
What I tried is:
#!/bin/bash
if [ $# -ne 1 ]; then
echo "Usage: `basename $0` <absolute-path>"
exit 1
fi
if [ "$(id -u)" != "0" ]; then
echo "This script must be run as root" 1>&2
exit 1
fi
rm /tmp/test
find $1 >> /tmp/test
for line in $(cat /tmp/test)
do
arr=$( (echo $line | awk -F"/" '{for (i = 1; i < NF; i++) if ($i == "2010") print $(i-1)}') )
for index in "${arr[@]}"
do
echo $index #HOW TO WRITE MV COMMAND RATHER THAN ECHO COMMAND?
done
done
1) The result is:
ese
dfhdh
ddfxcg
ddfnghmnhm
But it should be:
ese
dfhdh
ddfxcg
ddfnghmnhm
2) How can I rename all 2010 directories to their parent directory's name?
I mean, how do I do the following (in a loop, because of the large number of dirs):
mv /tmp/jj/ese/2010 /tmp/jj/ese/ese
mv /tmp/jj/dfhdh/2010 /tmp/jj/dfhdh/dfhdh
mv /tmp/jj/ddfxcg/2010 /tmp/jj/ddfxcg/ddfxcg
mv /tmp/jj/ddfnghmnhm/2010 /tmp/jj/ddfnghmnhm/ddfnghmnhm
You could instead use find in order to determine if a directory contains a subdirectory named 2010 and perform the mv:
find /tmp -type d -exec sh -c '[ -d "{}"/2010 ] && mv "{}"/2010 "{}"/$(basename "{}")' -- {} \;
I'm not sure if you have any other question here but this would do what you've listed at the end of the question, i.e. it would:
mv /tmp/jj/ese/2010 /tmp/jj/ese/ese
and so on...
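Note that splicing {} into the sh -c string relies on find doing the substitution, and it gets fragile if a directory name contains quotes or $ characters, because the inner shell re-parses the result. A variant of the same idea (a sketch, not the original answer's code) passes the path as a positional parameter instead:
find /tmp -type d -name 2010 -exec sh -c '
parent=${1%/2010}
mv "$1" "$parent/$(basename "$parent")"
' _ {} \;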
Can be done using grep -P:
grep -oP '[^/]+(?=/2010)' file
ese
ese
dfhdh
dfhdh
ddfxcg
ddfxcg
ddfnghmnhm
ddfnghmnhm
This should be close:
find "$1" -type d -name 2010 -print |
while IFS= read -r dir
do
parentPath=$(dirname "$dir")
parentDir=$(basename "$parentPath")
echo mv "$dir" "$parentPath/$parentDir"
done
Remove the echo after testing. If your dir names can contain newlines, then look into the -print0 option for find, paired with read -d '' in the while loop.
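A NUL-delimited version of the same loop would look like this (a sketch; read -d '' is bash-specific):
find "$1" -type d -name 2010 -print0 |
while IFS= read -r -d '' dir
do
parentPath=$(dirname "$dir")
parentDir=$(basename "$parentPath")
echo mv "$dir" "$parentPath/$parentDir"
done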
First, only iterate through the dirs you're interested in, and avoid temporary files:
for d in $(find $1 -type d -name '2010') ; do
Then you can use basename and dirname to extract parts of that directory name and reconstruct the desired one. Something like:
b="$(dirname $d)"
p="$(basename $b)"
echo mv "$d" "$b/$p"
You could use shell string replace operations instead of basename/dirname.
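For example (a sketch with a made-up path), ${d%/*} does the job of dirname and ${b##*/} the job of basename, without the command substitutions:
d=/tmp/jj/ese/2010
b=${d%/*}    # /tmp/jj/ese
p=${b##*/}   # ese
echo mv "$d" "$b/$p"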

How to perform sort on all files in a directory?

How do I perform sort on all files in a directory?
I could have done it in Python, but it seems like too much of a hassle.
import os, glob
d = '/somedir/'
for f in glob.glob(d+"*"):
f2 = f+".tmp"
# unix~$ cat f | sort > f2; mv f2 f
os.system("cat "+f+" | sort > "+f2+"; mv "+f2+" "+f)
Use find and -exec:
find /somedir -type f -exec sort -o {} {} \;
Note that sort -o may name one of its input files; unlike a shell redirection (which would truncate the file before sort reads it), this sorts the file in place safely. For limiting the sort to the files in the directory itself, use -maxdepth:
find /somedir -maxdepth 1 -type f -exec sort -o {} {} \;
You can write a script like this:
#!/bin/bash
directory="/home/user/somedir"
if [ ! -d "$directory" ]; then
echo "Error: Directory doesn't exist"
exit 1
fi
for file in "$directory"/*
do
if [ -f "$file" ]; then
sort "$file" > "$file.tmp"
mv -f "$file.tmp" "$file"
fi
done

First line of every file in a new file

How can I get the first line of EVERY file in a directory and save them all in a new file?
#!/bin/bash
rm FIRSTLINE
for file in "$(find $1 -type f)";
do
head -1 $file >> FIRSTLINE
done
cat FIRSTLINE
This is my bash script, but when I run it and open the file FIRSTLINE,
I see this:
==> 'path of the file' <==
'first line' of the file
and this for every file in my argument.
Does anybody have a solution?
find . -type f -exec head -1 \{\} \; > YOURFILE
might work for you.
The problem is that you've quoted the output of find so it gets treated as a single string, so the for loop only runs once, with a single argument containing all the files. That means you run head -1 file1 file2 file3 file4 ... etc. and when given multiple files head prints the ==> file1 <== headers.
So to fix it, remove the double quotes around the find shell-out, which ensures you run the for loop once for each file, as intended. Also, the semi-colon after the shell-out is unnecessary.
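The difference is easy to see in isolation (hypothetical data):
list=$'one\ntwo\nthree'
for f in "$list"; do echo "got: $f"; done   # one iteration: the whole list is a single word
for f in $list; do echo "got: $f"; done     # three iterations, one word each
With the quotes removed, the script becomes: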
#!/bin/bash
rm FIRSTLINE
for file in $(find $1 -type f)
do
head -1 $file >> FIRSTLINE
done
cat FIRSTLINE
This has some style issues, though: do you really need to write to a file and then cat the file to stdout? You could just print the output to stdout:
#!/bin/bash
for file in $(find $1 -type f)
do
head -1 $file
done
Personally I'd write it like this:
find $1 -type f | xargs -L1 head -1
or if you need the output in the file and printed to stdout:
find $1 -type f | xargs -L1 head -1 | tee FIRSTLINE
$ for file in $(find $1 -type f); do echo '';
echo $file;
head -n 4 $file;
done
For gzip files, for instance:
for file in `ls *.gz`; do gzcat $file | head -n 1; done > toto.txt
