Issue using diff with array and value quoted SHELL [duplicate] - arrays

This question already has answers here:
How can I store the "find" command results as an array in Bash
(8 answers)
Closed 2 months ago.
Hi guys i'm having an issue while using diff.
In my script i'm trying to compare all files in 1 dir to all files in 2 other dir
Using diff to compare is files are the same.
Here is my script :
`
#!/bin/bash
files1=()
files2=()
# Directories to compare. Adding quotes at the begining and at the end of each files found in content1 & content3
content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')
# ADDING CONTENT INTO FILES1 & FILES2 ARRAY
while read -r line; do
files1+=("$line")
done <<< "$content1"
# content1 and content3 goes into the same array
while read -r line3;do
files1+=("$line3")
done <<< "$content3"
while read -r line2; do
files2+=("$line2")
done <<< "$content2"
# Here i'm trying to compare 1 by 1 the files in files2 to all files1
for ((i=0; i<${#files2[#]}; i++))
do
for ((j=0; j<${#files1[#]}; j++))
do
if [[ -n ${files2[$i]} ]];then
diff -s "${files2[$i]}" "${files1[$j]}" > /dev/null
if [[ $? == 0 ]]; then
echo ${files1[$j]} "est identique a" ${files2[$i]}
unset 'files2[$i]'
break
fi
fi
done
done
#SHOW THE FILES WHO DIDN'T MATCHED
echo ${files2[#]}
`
I'm having the folling issue when i'm trying to diff :
diff: "/data/content3/other/log2/perso log/somelog.log": No such file or directory
But when i'm doing
ll "/data/content3/other/log2/perso log/somelog.log" -rw-rw-r-- 2 lopom lopom 551M 30 oct. 18:53 '/data/content3/other/logs2/perso log/somelog.log'
So the file exist.
i need those quotes because sometimes there are space in the path
Does some1 know how to fix that ?
Thanks.
I already tried to change the quotes by single quotes, but it didn't fixed it

First, don't do this -
content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')
don't stack all these into single vars. This is asking for ten kinds of obscure trouble. More importantly, those sed calls are embedding the quotation marks into the data as part of the filenames, which is probably what's causing diff to crash, because there are no actual files with the quotes in the name.
Also, if you are throwing away the output and just using diff to check the files are identical, try cmp instead. The -s is silent, and it's a lot faster since it exits at the first differing byte without reading the rest of both files and generating a report. If there ae a lot of files, this will add up.
If the logs are the only things in the directories, and you don't have to scan subdirectoies, and the filename can't appear in both /data/other/logs1 AND /data/other/logs2, but you're pretty sure it will be in at least one of them... then simplify:
for f in /data/logs/*.log # I'll assume these are all files...
do t=/data/other/logs[12]/"${f#/data/logs/}" # always just one?
if cmp -s "$f" "$t" # cmp -s *has* no output
then echo "$t est identique a $f" # files are same
elif [[ -e "$t" ]] # check t exists
then echo "$t diffère de $f" # maybe ls -l "$f" "$t" ?
else echo "$t n'existe pas" # report it does not
fi
done
This needs no arrays, no find, no sed calls, etc.
If you do need to read subdirectories, use shopt to handle it with globs so that you don't have to worry about parsing odd characters with read. (c.f. https://mywiki.wooledge.org/ParsingLs for some reasons.)
shopt -s globstar
for f in /data/logs/**/*.log # globstar makes ** match at arbitrary depth
do for t in /data/other/logs[12]/**/"${f#/data/logs/}" # if >1 possible hit
do if cmp -s "$f" "$t"
then echo "$t est identique a $f"
elif [[ -e "$t" ]]
then echo "$t diffère de $f"
else echo "$t n'existe pas" # $t will be the glob, one iteration
fi
done
done

Related

Renaming Files with "Invalid Characters"

I have a script that removes invalid characters from files due to one drive restrictions. It works except for file that have * { } in them. I have used the * { } but that is not working (ignores those files). Script is below. Not sure what I am doing wrong here.
#Renames FOLDERS with space at the end
IFS=$'\n'
for file in $(find -d . -name "* ")
do
target_name=$(echo "$file" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
#end Folder rename
#Renames FILES
declare -a arrayGrep=(\? \* \, \# \; \: \& \# \+ \< \> \% \$ \~ \% \: \< \> )
echo "array: ${arrayGrep[#]}"
for i in "${arrayGrep[#]}"
do
for file in $(find . | grep $i )
do
target_name=$(echo "$file" | sed 's/\'$i'/-/g' )
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
done ````
There are some issues with your code:
You've got some duplicates in arrayGrep
You have some quoting issues: $1 in the grep and sed commands must be protected from the shell
some of the disallowed characters, even if quoted to protect from the shell, could be mis-parsed by grep as meta-characters
You loop rather a lot when all substitutions could happen simultaneously
for file in $(find ...) has to read the entire list into memory, which might be problematic for large lists. It also breaks with some filenames due to word-splitting. Piping into read is better
find can do path filtering without grep (at least for this set of disallowed characters)
sed is fine to use but tr is a little neater
A possible rewrite is:
badchars='?*,#;:&#+<>%$~'
find . -name "*[$badchars]*" | while read -r file
do
target_name=$(echo "$file" | tr "$badchars" - )
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
If using bash, you can even do parameter expansion directly,
although you have to embed the list:
target_name="${file//[?*,#;:&#+<>%$~]/-}"
an enhancement idea
If you choose a suitable character (eg. =), you can rename filenames reversibly (but only if the length of the new name wouldn't exceed the maximum allowed length for a filename). One possible algorithm:
replace all = in the filename with =3D
replace all the other disallowed characters with =hh where hh is the appropriate ASCII code in hex
You can reverse the renaming by:
replace all =hh (except =3D) with the character corresponding to ASCII code hh
replace all =3D with =

Using mapfile to save output to associative arrays

In practicing bash, I tried writing a script that searches the home directory for duplicate files in the home directory and deletes them. Here's what my script looks like now.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 sumray origray
if [[ -d "$HOME/$1" && -n "$1" ]]; then
echo "$1 is a directory"
else
echo "Usage: create-list Directory | options" >&2
exit 1
fi
for i in $HOME/$1/*; do
[[ -f $i ]] || continue
arr1[$i]="$i"
done
for i in "${arr1[#]}"; do
Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
dupe=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name")
origray[$i]=$(md5sum "$i" | cut -c 1-32)
fi
done
for i in "${!sumray[#]}"; do
poten=$(md5sum "$i" | cut -c 1-32)
for i in "${!origray[#]}"; do
if [[ "$poten" = "${origray[$i]}" ]]; then
echo "${sumray[$i]} is a duplicate of $i"
fi
done
done
Originally, where mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name") is now, my line was the following:
sumray["$i"]=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
This saved the output of find to the array. But I had an issue. If a single file had multiple duplicates, then all locations found by find would be saved to a single value. I figured I could use the mapfile command to fix this, but now it's not saving anything to my array at all. Does it have to do with the fact that I'm using an associative array? Or did I just mess up elsewhere?
I'm not sure if I'm allowed to answer my own question, but I figured that I should post how I solved my problem.
As it turns out, the mapfile command does not work on associative arrays at all. So my fix was to save the output of find to a text file and then store that information in an indexed array. I tested this a few times and I haven't seemed to encounter any errors yet.
Here's my finished script.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 origray
declare indexray
#Verify that Parameter is a directory.
if [[ -d "$HOME/$1/" && -n "$1" ]]; then
echo "Searching for duplicates of files in $1"
else
echo "Usage: create-list Directory | options" >&2
exit 1
fi
#create list of files in specified directory
for i in $HOME/${1%/}/*; do
[[ -f $i ]] || continue
arr1[$i]="$i"
done
#search for all duplicate files in the home directory
#by name
#find checksum of files in specified directory
for i in "${arr1[#]}"; do
Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
find ~ -name "${Name##*/}" ! -wholename "$Name" >> temp.txt
origray[$i]=$(md5sum "$i" | cut -c 1-32)
fi
done
#create list of duplicate file locations.
if [[ -f temp.txt ]]; then
mapfile -t indexray < temp.txt
else
echo "No duplicates were found."
exit 0
fi
#compare similarly named files by checksum and delete duplicates
count=0
for i in "${!indexray[#]}"; do
poten=$(md5sum "${indexray[$i]}" | cut -c 1-32)
for i in "${!origray[#]}"; do
if [[ "$poten" = "${origray[$i]}" ]]; then
echo "${indexray[$count]} is a duplicate of a file in $1."
fi
done
count=$((count+1))
done
rm temp.txt
This is kind of sloppy but it does what it's supposed to do. md5sum may not be the optimal way to check for file duplicates but it works. All I have to do is replace echo "${indexray[$count]} is a duplicate of a file in $1." with rm -i ${indexray[$count]} and it's good to go.
So my next question would have to be...why doesn't mapfile work with associative arrays?

Script terminates prematurely after do loop. Last "echo" is not executed.

I am looking to execute the following script below. The issue I am encountering is it will not execute anything after the do loop. It doesn't matter what I happen to have after the loop, it never executes, so I am def missing something somewhere.
Also, any suggestions on a more efficient way or writing this script? I am very new to the scripting environment and very open to better ways of going about things.
#!/bin/bash
# mcidas environment
PATH=$HOME/bin:
PATH=$PATH:/usr/sww/bin:/usr/local/bin:$HOME/mcidas/bin
PATH=$PATH:/home/mcidas/bin:/bin:/usr/bin:/etc:/usr/ucb
PATH=$PATH:/usr/bin/X11:/common/tool/bin:.
export PATH
MCPATH=$HOME/mcidas/data
MCPATH=$MCPATH:/home/mcidas/data
export MCPATH
#variables
basedir1="ftp://ladsweb.nascom.nasa.gov/allData/6/MOD02QKM" #TERRA
basedir2="ftp://ladsweb.nascom.nasa.gov/allData/6/MYD02QKM" #AQUA
day=`date +%j`
day1=`date +"%j" -d "-1 day"`
hour=`date -u +"%H"`
min=`date -u +"%m"`
year=`date -u +"%Y"`
segment=1
count=$(ls /satellite/modis_processed/ | grep -v ^d | wc -l)
count_max=25
files=(/satellite/modis_processed/*)
if [ $hour -ge "17" ]; then
workinghour="16"
echo "Searching for hour $workinghour"
url="${basedir2}/${year}/${day1}/MYD02QKM.A${year}${day1}.${workinghour}*.006.${year}*"
wget -r -nd --no-parent -nc -e robots=off -R 'index.*' -P /satellitemodis/ $url
#find /satellite/modis/ -type f -mmin -30 -exec cp "{}" /satellite/modis_processed/ \;
for files in /satellite/modis_processed/*
do
echo "The number used for the data file is ${count}"
echo "The number used for the image file is ${segment}"
export segment
export count
#Run McIDAS
mcenv <<- 'EOF'
imgcopy.k MODISD.${count} MODISI.${segment} BAND=1 SIZE=SAME
imgremap.k MODISD.${segment} MODISI.${segment} BAND=1 SIZE=ALL PRO=MERC
imgcha.k MODISI.${segment} CTYPE=BRIT
exit
EOF
segment=`expr ${segment} + 1`
count=`expr ${count} - 1`
#Reset Counter if equal or greater than 25
if [[ $segment -ge $count_max ]]; then
segment=1
fi
find /satellite/awips -type f -name "AREA62*" -exec mv "{}" /awips2/edex/data/manual/ \;
done;
echo "We have exported ${segment} converted modis files to EDEX."
fi
You have a here-document in that script. That here-document is not properly terminated. The end marker, EOF, needs to be in the first column, not indented at all.
If you indent it, it has to be with tabs, and the start of the here-document should be <<-'EOF'.
The effect of the wrongly indented EOF marker is that the rest of the script is read as the contents of the here-document.
As Charles Duffy points out, ShellCheck is your friend.

Comment out items that do not match pattern in array

I have a log file I am trying to comment out lines that do not match my array. I did successfully learn how to create an array and I can echo out the array items but I am having trouble taking anything that doesn't match my array and adding something in front of it. Here is my code, if you have suggestions on another path or ways I can make it better:
for itsSaturday in $(find "$LOCATION" -mindepth 1 -maxdepth 1 -name "*.log" ); do
TEMPFILE="$itsSaturday.$$"
declare -a someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn="## You_need_"
for arrayItem in "${someArray[#]}"; do
merged="$theCall$arrayItem"
if ! grep -q "$merged" "$itsSaturday"; then
sed -e '/$merged/! s:$commentOn$theCall::g' "$itsSaturday" > $TEMPFILE && mv $TEMPFILE "$itsSaturday"
fi
done
done
file:
some_additional_text_breakfast
some_additional_text_bacon
some_additional_text_scrambled eggs
some_additional_text_Bloody Mary
some_additional_text_orange juice
some_additional_text_breakfast
file into:
some_additional_text_breakfast
## You_need_some_additional_text_bacon
some_additional_text_scrambled eggs
some_additional_text_Bloody Mary
## You_need_some_additional_text_orange juice
some_additional_text_breakfast
How can I add a variable before items that do not match my array?
I don't like doing this using bash and sed, but I think the following might be enough:
#! /bin/bash
declare -a someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn="## You_need_"
OIFS="$IFS"
IFS='|' mergedLines="${someArray[*]/#/$theCall}"
IFS="$OIFS"
for i in *.txt
do
TEMPFILE="$i.$$"
sed -r "/$mergedLines/!s/^/$commentOn/" "$i" >> "$TEMPFILE"
done
I shifted the array and other constants out of the loop.
"${someArray[*]/#/$theCall}" uses bash string substitution to append the contents of $theCall to every element in the array.
IFS='|' mergedLines="${someArray[*]} is a convenient trick to combine the elements of an array into a pipe-separated string.
Combined, (2) and (3) get me
some_additional_text_breakfast|some_additional_text_scrambled eggs|some_additional_text_Bloody Mary
in mergedLines.
Then it's just a matter of using extended regular expressions in sed (for |) and replacing non-matching lines.
Your sed pattern used single quotes, so the variables within were not expanded.
Try replacing the inner for-loop with:
PROG=$(printf '%s\n' "${COMMENT[#]}" | while read comment ; do
/bin/echo -n '$0 !~ /'"$comment"'$/ && '
done
echo '1 { printf commentOn } ; { print }')
awk -v commentOn="$commentOn" "$PROG" $itsSaturday > $TEMPFILE && mv $TEMPFILE $itsSaturday
On each file, this creates an awk program that does the work.

rename specific pattern of files in bash

I have the following files and directories:
/tmp/jj/
/tmp/jj/ese
/tmp/jj/ese/2010
/tmp/jj/ese/2010/test.db
/tmp/jj/dfhdh
/tmp/jj/dfhdh/2010
/tmp/jj/dfhdh/2010/rfdf.db
/tmp/jj/ddfxcg
/tmp/jj/ddfxcg/2010
/tmp/jj/ddfxcg/2010/df.db
/tmp/jj/ddfnghmnhm
/tmp/jj/ddfnghmnhm/2010
/tmp/jj/ddfnghmnhm/2010/sdfs.db
I want to rename all 2010 directories to their parent directories then tar all .db files...
What I tried is:
#!/bin/bash
if [ $# -ne 1 ]; then
echo "Usage: `basename $0` <absolute-path>"
exit 1
fi
if [ "$(id -u)" != "0" ]; then
echo "This script must be run as root" 1>&2
exit 1
fi
rm /tmp/test
find $1 >> /tmp/test
for line in $(cat /tmp/test)
do
arr=$( (echo $line | awk -F"/" '{for (i = 1; i < NF; i++) if ($i == "2010") print $(i-1)}') )
for index in "${arr[#]}"
do
echo $index #HOW TO WRITE MV COMMAND RATHER THAN ECHO COMMAND?
done
done
1) The result is:
ese
dfhdh
ddfxcg
ddfnghmnhm
But it should be:
ese
dfhdh
ddfxcg
ddfnghmnhm
2) How can I rename all 2010 directories to their parent directory?
I mean how to do (I want to do it in loop because of larg numbers of dirs):
mv /tmp/jj/ese/2010 /tmp/jj/ese/ese
mv /tmp/jj/dfhdh/2010 /tmp/jj/dfhdh/dfhdh
mv /tmp/jj/ddfxcg/2010 /tmp/jj/ddfxcg/ddfxcg
mv /tmp/jj/ddfnghmnhm/2010 /tmp/jj/ddfnghmnhm/ddfnghmnhm
You could instead use find in order to determine if a directory contains a subdirectory named 2010 and perform the mv:
find /tmp -type d -exec sh -c '[ -d "{}"/2010 ] && mv "{}"/2010 "{}"/$(basename "{}")' -- {} \;
I'm not sure if you have any other question here but this would do what you've listed at the end of the question, i.e. it would:
mv /tmp/jj/ese/2010 /tmp/jj/ese/ese
and so on...
Can be done using grep -P:
grep -oP '[^/]+(?=/2010)' file
ese
ese
dfhdh
dfhdh
ddfxcg
ddfxcg
ddfnghmnhm
ddfnghmnhm
This should be close:
find "$1" -type d -name 2010 -print |
while IFS= read -r dir
do
parentPath=$(dirname "$dir")
parentDir=$(basename "$parentPath")
echo mv "$dir" "$parentPath/$parentDir"
done
Remove the echo after testing. If your dir names can contain newlines then look into the -print0 option for find, and the -0 option for xargs.
First, only iterate through the dirs you're interested in, and avoid temporary files:
for d in $(find $1 -type d -name '2010') ; do
Then you can use basename and dirname to extract parts of that directory name and reconstruct the desired one. Something like:
b="$(dirname $d)"
p="$(basename $b)"
echo mv "$d" "$b/$p"
You could use shell string replace operations instead of basename/dirname.

Resources