variable still empty after the loop problem in shell scripting - loops

I'm very new to shell scripting, and I've run into a problem that seems quite weird. The program is rather simple, so I'll just post it here:
#!/bin/bash
list=""
list=`mtx -f /dev/sg2 status | while read status
do
    result=$(echo ${status} | grep "Full")
    if [ -z "$result" ]; then
        continue
    else
        echo $(echo ${result} | cut -f3 -d' ' | tr -d '[:alpha:]' | tr -d '[:punct:]')
    fi
done`
echo ${list}
for haha in ${list}
do
    printf "current slot is:%s \n" ${haha}
done
What the program does is execute mtx -f /dev/sg2 status and go through each line to see if there's a full disk. If a line contains "Full", I record the slot number from that line and put it in the list.
Notice that I put a backquote after list= at line 6, and it covers the whole while loop that follows. The reason is unclear to me; I just found this usage by googling. It's said that the while loop opens up a separate shell or something like that, so when the while loop is done, whatever you concatenated inside the loop gets lost. That's why, in my initial implementation, list was still empty after the while loop.
My question is: even though the code above works fine, it looks pretty tricky to others, and what's worse, I can only build ONE list once the loop is done. Is there a better way to do this so that I can pull more information out of the loop? For example, what if I need a list2 to store other values? Thanks.

Your shell script does work. If you wanted to get two pieces of info per line instead of one, you would have to change this line
echo $(echo ${result} | cut -f3 -d' ' | tr -d [:alpha:] | tr -d [:punct:])
to concatenate the desired values separated by a comma or any other "special" character. Then you could parse your list this way:
for haha in ${list}
do
    printf "current slot is:%s, secondary info:%s \n" $(echo ${haha} | cut -f1 -d',') $(echo ${haha} | cut -f2 -d',')
done

See this explanation. Because a pipe is involved, the while read... code isn't executed in your current shell but in a subshell (a child process, which can't update your current process's environment/shell variables).
Choose one of the listed workarounds to make the while read... loop execute in your current shell.
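For example, a minimal sketch of one such workaround (assuming bash, since your script starts with #!/bin/bash) is to feed the loop through process substitution instead of a pipe, so the loop body runs in the current shell and its variable assignments survive:
list=""
while read status
do
    result=$(echo ${status} | grep "Full")
    [ -z "$result" ] && continue
    # append each slot number; the assignment persists because there is no subshell
    list="${list} $(echo ${result} | cut -f3 -d' ' | tr -d '[:alpha:]' | tr -d '[:punct:]')"
    # a second variable, e.g. list2, could be appended to here the same way
done < <(mtx -f /dev/sg2 status)
echo ${list}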


Vlookup-like function using awk in ksh

Disclaimers:
1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite them.
2) I have found several examples on this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.
3) You will find some text in capital letters here and there. It is of course not me "shouting" at you, but only a way to make portions of text stand out. Please do not consider this an act of impoliteness.
4) For those of you who get to the bottom of this novella alive, THANKS IN ADVANCE for your patience, even if you do not end up being able to (or feeling like) helping me. My disclaimer here is that, after surfing the site for a while, I noticed that the most common "complaint" from people willing to help seems to be a lack of information (and/or a lack of quality) from the ones seeking help. So I preferred to risk being accused of overwording if need be... it would, at least, not be a common offense...
The "Problem":
I have 2 files (a and b for simplification). File a has 7 columns separated by commas. File b has 2 columns separated by commas.
What I need: whenever the data in the 7th column of file a matches -EXACT MATCHES ONLY- the data in the 1st column of file b, a new line, containing the whole line of file a plus column 2 of file b, is to be appended into a new file "c".
--- MORE INFO IN THE NOTES AT THE BOTTOM ---
file a:
Server Name,File System,Path,File,Date,Type,ID
horror,/tmp,foldera/folder/b/folderc,binaryfile.bin,2014-01-21 22:21:59.000000,typet,aaaaaaaa
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333
hostile,/sad,folder22,higefile.hug,2016-06-17 18:43:12.000000,typeasd,77777777
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999
file b:
ID,Size
11111111,215915
22222222,1716
33333333,212856
44444444,1729
55555555,215927
66666666,1728
88888888,1729
99999999,213876
bbbbbbbb,26669080
Expected file c:
Server Name,File System,Path,File,Date,Type,ID,Size
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111,215915
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222,1716
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666,1728
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333,212856
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444,1729
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555,215927
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999,213876
Additional notes:
0) Notice how the line with ID "aaaaaaaa" in file a does not make it into file c, since ID "aaaaaaaa" is not present in file b. Likewise, the line with ID "bbbbbbbb" in file b does not make it into file c, since ID "bbbbbbbb" is not present in file a and is therefore never looked for in the first place.
1) The data is completely made up due to confidentiality issues, though the examples provided fairly resemble what the real files look like.
2) I added headers just to give a better idea of the nature of the data. The real files don't have them, so there is no need to skip them in the source files or create them in the destination file.
3) Both files come sorted by default, meaning that IDs will be properly sorted in file b, while they will most likely be scrambled in file a. File c should preferably follow the order of file a (though I can rearrange it later to fit my needs, so no worries there, as long as the code does what I need and doesn't mess up the data by combining the wrong lines).
4) VERY VERY VERY IMPORTANT:
4.a) I already have a "working" ksh script (attached below) that uses cat, grep, while and if to do the job. It worked like a charm (well, acceptably) with 160K-line sample files (it was able to output approx. 60K lines an hour, which, projected, would yield an acceptable "20 days" to produce 30 million lines [KEEP ON READING]), but somehow (I have plenty of processor and memory capacity) cat and/or grep seem to be struggling to process a real-life 5-million-line file (both file a and file b can have up to 30 million lines each, so that's the maximum probable number of lines in the resulting file, even assuming 100% of the lines in file a find their match in file b), and file c is now being fed only a couple hundred lines every 24 hours.
4.b) I was told that awk, being stronger, should succeed where the weaker commands I worked with seem to fail. I was also told that working with arrays might be the solution to my performance problem, since all the data is loaded into memory at once and worked on from there, instead of having to cat | grep file b as many times as there are lines in file a, as I am currently doing.
4.c) I am working on AIX, so I only have sh and ksh, no bash, and therefore I cannot use the array tools provided by the latter. That's why I thought of awk, that and the fact that I think awk is probably "stronger", though I might (probably?) be wrong.
Now, I present to you the magnificent piece of ksh code (obvious sarcasm here, though I like the idea of you picturing for a brief moment in your mind the image of the monkey holding up and showing all other jungle-crawlers their future lion king) I have managed to develop (feel free to laugh as hard as you need while reading this code, I will not be able to hear you anyway, so no feelings harmed :P ):
cat "${file_a}" | while read -r line_file_a; do
server_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $1}'`
filespace_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $2}'`
folder_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $3}'`
file_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $4}'`
file_date_file_a=`echo "${line_file_a}" | awk -F"," '{print $5}'`
file_type_file_a=`echo "${line_file_a}" | awk -F"," '{print $6}'`
file_id_file_a=`echo "${line_file_a}" | awk -F"," '{print $7}'`
cat "${file_b}" | grep ${object_id_file_a} | while read -r line_file_b; do
file_id_file_b=`echo "${line_file_b}" | awk -F"," '{print $1}'`
file_size_file_b=`echo "${line_file_b}" | awk -F"," '{print $2}'`
if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}" >> ${file_c}.csv
fi
done
done
One last additional note, just in case you wonder:
The "if" section was not only built as a mean to articulate the output line, but it servers a double purpose, while safe-proofing any false positives that may derive from grep, IE 100 matching 1000 (Bear in mind that, as I mentioned earlier, I am working on AIX, so my grep does not have the -m switch the GNU one has, and I need matches to be exact/absolute).
You have reached the end. CONGRATULATIONS! You've been awarded the medal for patience.
$ cat stuff.awk
BEGIN { FS=OFS="," }            # fields are comma-separated on input and output
NR == FNR { a[$1] = $2; next }  # first file (b): store Size in array a, keyed by ID
$7 in a { print $0, a[$7] }     # second file (a): if column 7 is a known ID, append its Size
Note the order for providing the files to the awk command, b first, followed by a:
$ awk -f stuff.awk b.txt a.txt
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111,215915
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222,1716
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666,1728
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333,212856
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444,1729
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555,215927
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999,213876
EDIT: Updated calculation
You can try to predict how often you are calling another program:
at least 7 awks + 1 cat + 1 grep for each line in file a, which is 9 * 160,000 external calls.
For file b: 2 awks, plus one file open and one file close for each hit. With 60K output lines, that adds another 4 * 60,000.
A small change in the code reduces this to "only" 160,000 calls to grep:
cat "${file_a}" | while IFS=, read -r server_name_file_a \
        filespace_name_file_a folder_name_file_a file_name_file_a \
        file_date_file_a file_type_file_a file_id_file_a; do
    grep "${file_id_file_a}" "${file_b}" | while IFS=, read -r file_id_file_b file_size_file_b; do
        if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
            echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}"
        fi
    done
done >> ${file_c}.csv
Well, try this with your 160K files and see how much faster it is.
Before I explain why this is still the wrong way, I will make another small improvement: I will replace the cat feeding the while loop with a redirection at the end (after done).
while IFS=, read -r server_name_file_a \
        filespace_name_file_a folder_name_file_a file_name_file_a \
        file_date_file_a file_type_file_a file_id_file_a; do
    grep "${file_id_file_a}" "${file_b}" | while IFS=, read -r file_id_file_b file_size_file_b; do
        if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
            echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}"
        fi
    done
done < "${file_a}" >> ${file_c}.csv
The main drawback of these solutions is that the grep reads the complete file_b again and again, once for each line in file a.
This is a nice improvement in performance, but there is still a lot of overhead from grep. Another huge improvement can be found with awk.
The best solution is using awk, as explained in What is "NR==FNR" in awk? and shown in the answer of @jas above.
It is only one process invocation, and both files are read only once.

bash script to collect pids in array

Working on a simple bash script that I can use to ultimately tell me if a rogue process is running that we don't want - this one will be running with a different parent pid; a monitor of sorts. Where I'm having an issue is getting all the specific pids I want into an array that I can perform some actions on. Script first:
#!/bin/bash
rmanRUNNING=`ps -ef|grep /etc/process/process.conf|egrep -v grep|wc -l`
if [ $rmanRUNNING -gt 0 ]
then
    rmanPPID=( $(ps -ef|grep processname|egrep -v grep|egrep -v /etc/process/process.conf|awk '{ printf $3 }') )
    for i in "${rmanPPID[@]}"
    do
        :
        echo $i
    done
fi
So, the goal is to check for the existence of the main process - the one running with the config file in it - which is what the first variable tells me. Next, if it's running (based on the count being greater than 0), the intention is to populate an array with all the parent pids, excluding what would be determined as the main process (we don't need to analyze that one). So, in the array definition we get the list of processes, grep the process name, egrep -v the grep output, also egrep -v the "main" process, awk out the parent pids, then iterate through and attempt to echo each one individually (more would be done in this section, but it's not working). Unfortunately, when I output $i, all of the parent pids are simply concatenated together in one long string. If I try to output a specific array item I get empty output.
Obviously the question here is: what's wrong with my array definition that is preventing it from being populated as an array - or is it some other odd thing?
This is on RHEL - 6.2 in the test environment, probably 7 in production by the time this is live.
Full disclosure, I'm a monitoring engineer, not an SA - definitely not a bash scripter by nature!
Thanks in advance.
EDIT: just for clarity, echoing the PIDs to the screen is NOT the desired end output; it's just a simple way to test that I'm getting back what I expect. Based on the comment below, I believe pgrep-type output is the preferred output. In the end I'll be checking these pids one at a time against the original process to ensure that it is the parent, and if it is not, I'll spit out an error.
It's not so much that $i will be one concatenated number; rather, your array is just a single element holding that concatenated number. This is because the output of awk is concatenated together, without any separator.
If you simply add a space within awk, you may get what you want:
rmanPPID=( $(ps -ef|grep processname | ... | awk '{ printf "%d ", $3 }') )
or even simpler, use print instead of printf:
rmanPPID=( $(ps -ef|grep processname | ... | awk '{ print $3 }') )
(Thanks to Jonathan Leffler, see comment below.)
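For the eventual goal of collecting parent pids, a minimal sketch (assuming a procps-style ps as on RHEL, with processname standing in for your real process name): ps can print just the PPID column, one per line, which then splits cleanly into array elements:
# -o ppid= prints only the parent pid of each matching process, one per line, no header
rmanPPID=( $(ps -C processname -o ppid=) )
for i in "${rmanPPID[@]}"
do
    printf "parent pid: %s\n" "$i"
done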

Having issues using IFS to cut a string into an array. BASH

I have tried everything I can think of to cut this into separate elements for my array, but I am struggling.
Here is what I am trying to do:
(This command just rips the IP addresses out of the first column returned.)
$ IFS=$"\n"
$ aaa=( $(netstat -nr | grep -v '^0.0.0.0' | grep -v 'eth' | grep "UGH" | sed 's/ .*//') )
$ echo "${#aaa[#]}"
1
$ echo "${aaa[0]}"
4.4.4.4
5.5.5.5
This shows more than one value in a single element, when I want the array to separate them - 4.4.4.4 into ${aaa[0]} and 5.5.5.5 into ${aaa[1]}.
I have tried:
IFS="\n"
IFS=$"\n"
IFS=" "
Very confused, as I have been working with arrays a lot recently and have never run into this particular issue.
Can someone tell me what I am doing wrong?
There is a very good example of how to use IFS + read -a to split a string into an array on this other Stack Overflow page:
How does splitting string to array by 'read' with IFS word separator in bash generated extra space element?
netstat is deprecated, replaced by ss, so I'm not sure how to reproduce your exact problem.
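For what it's worth, the likely culprit in the transcript above is the quoting style: $"\n" is locale-translation quoting, so IFS ends up holding the two characters \ and n rather than a newline. ANSI-C quoting gives a real newline. A minimal sketch of the fix:
IFS=$'\n'    # a real newline character, not backslash-n
aaa=( $(netstat -nr | grep -v '^0.0.0.0' | grep -v 'eth' | grep "UGH" | sed 's/ .*//') )
echo "${#aaa[@]}"    # now one element per line of output, e.g. 2
(Remember to restore IFS afterwards if the rest of the script relies on default word splitting.)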

bash - variable storing multiple lines of file

This is my code:
grep $to_check $forbidden >${dir}variants_of_interest;
cat ${dir}variants_of_interest | (while read line; do
    #process ${line} and echo result
done;
)
Thanks to grep I get lines of data that I then process separately in a loop. I would like to use a variable instead of the file variants_of_interest.
The reason is that I am afraid that writing to a file thousands of times (and consequently reading from it) rapidly slows down the computation, so I am hoping that avoiding the file could help. What do you think?
I have to do thousands of grep commands and variants_of_interest contains up to 10 lines only.
Thanks for your suggestions.
You can just make grep pass its output directly to the loop:
grep "$to_check" "$forbidden" | while read line; do
#process "${line}" and echo result
done
I removed the explicit subshell from your example, since the piped loop already runs in a separate one. Also, don't forget to quote the $line variable to prevent word splitting when it is used.
You don't have to write to a file. Simply iterate over the result of grep:
grep $to_check $forbidden | (while read line; do
    #process ${line} and echo result
done;
)
This might work for you:
OIFS="$IFS"; IFS=$'\n'; lines=($(grep $to_check $forbidden)); IFS="$OIFS"
for line in "${lines[#]}"; do echo $(process ${line}); done
The first line places the results of the grep into the array variable lines.
The second line iterates over the array lines, placing each line into the variable line.
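As an aside, if your bash is 4.0 or newer, mapfile (a.k.a. readarray) reads lines into an array without any IFS juggling; a sketch:
# one array element per output line; -t strips the trailing newlines
mapfile -t lines < <(grep "$to_check" "$forbidden")
for line in "${lines[@]}"; do echo $(process ${line}); done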

Bash read file to an array based on two delimiters

I have a file that I need to parse into an array, but I really only want a brief portion of each line, and only from the first 84 lines.
Sometimes the line may be:
>MT gi...
And I would just want the MT to be entered into the array. Other times it might be something like this:
>GL000207.1 dn...
and I would need the GL000207.1
I was thinking that you might be able to set two delimiters (one being the '>' and the other being the ' ' whitespace), but I am not sure how you would go about it. I have read other people's posts about the internal field separator, but I am really not sure how that would work. I would think perhaps something like this might work, though?
desiredArray=$(echo file.whatever | tr ">" " ")
for x in $desiredArray
do
echo > $x
done
Any suggestions?
How about:
head -84 <file> | awk '{print $1}' | tr -d '>'
head takes only the first 84 lines of the file, awk strips off the first space and everything after it, and tr gets rid of the '>'.
You can also do it with sed:
head -n 84 <file> | sed 's/>\([^ ]*\).*/\1/'
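And since the goal was an array, a sketch of capturing the result (assuming bash 4+ for mapfile, with file.whatever as the placeholder name from the question):
# one array element per line: 'MT', 'GL000207.1', ...
mapfile -t desiredArray < <(head -84 file.whatever | awk '{print $1}' | tr -d '>')
echo "${desiredArray[0]}"    # MT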
