I wrote a bash script that takes a file name as its first argument ($1) and needs to read that file line by line in a loop. Based on a condition tested in each iteration, each line from the file will be appended to one of two arrays, say GOOD and BAD. Finally, I'll display the total number of elements in each array.
#!/bin/bash
for x in $(cat $1); do
  # testing something on x
  if [ $? -eq 0 ]; then
    # add the current value of x into the array called GOOD
  else
    # add the current value of x into the array called BAD
  fi
done
echo "Total GOOD elements: ${#GOOD[@]}"
echo "Total BAD elements: ${#BAD[@]}"
What changes should I make to accomplish this?
#!/usr/bin/env bash
# Here we check whether a line is more than 5 characters long;
# replace this with your real test.
testMyLine() { (( ${#1} > 5 )); }

good=( ); bad=( )
while IFS= read -r line; do
  if testMyLine "$line"; then
    good+=( "$line" )
  else
    bad+=( "$line" )
  fi
done <"$1"

echo "Read ${#good[@]} good and ${#bad[@]} bad lines"
Note:
We're using a while read loop to iterate over file contents. This doesn't need to read more than one line into memory at a time (so it won't run out of RAM even with really big files), and it doesn't have unwanted side effects like changing a line containing * to a list of files in the current directory.
We aren't using $?. if foo; then is a much better way to branch on the exit status of foo than foo; if [ $? = 0 ]; then -- in particular, this avoids depending on the value of $? not being changed between when you assign it and when you need it; and it marks foo as "checked", to avoid exiting via set -e or triggering an ERR trap when your boolean returns false.
The use of lower-case variable names is intentional. All-uppercase names are used for shell-builtin variables and names with special meaning to the operating system -- and since defining a regular shell variable overwrites any environment variable with the same name, this convention applies to both types. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html
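The side effect mentioned in the first note is easy to demonstrate (demo.txt is a hypothetical file):

printf '%s\n' 'hello *' > demo.txt
for x in $(cat demo.txt); do echo "got: $x"; done
# prints "got: hello", then one "got: ..." line per file in the current directory
while IFS= read -r line; do echo "got: $line"; done < demo.txt
# prints the single line "got: hello *" intact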
I'm trying to take input from the user into an array, but the shell is accepting user input separated by spaces. Is there any way to accept user input given separately on each line? My code is below:
#!/bin/bash
echo "enter the servers names..."
read -a array
for i in "${array[@]}"
do
  echo $i
done
exit 0
Input:
hello world
I want the input to be taken as below (on two different lines):
hello
world
Kindly help. Thanks.
You can specify the delimiter at which read stops reading, rather than newline (according to help read):
-d delim continue until the first character of DELIM is read, rather
than newline
For example:
read -d':' -a array
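With a colon as the delimiter, the input itself can then span several lines. A quick demonstration using a here-string:

read -d':' -a array <<< 'hello
world:'
echo "${#array[@]}"   # prints 2
echo "${array[1]}"    # prints world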
If you don't want any delimiter in the input, you can use a loop to read into the elements of the array one at a time, then check whether the input is the empty string:
i=0
read "array[$i]"
while [ "${array[$i]}" ]; do
  let "i = i+1"
  read "array[$i]"
done
So the input will be
hello
world
(followed by one more empty line, which terminates the loop)
According to help read:
Reads a single line from the standard input, or from file descriptor FD...
This loop would do instead:
echo "enter the servers names..."
i=0
until false; do
read server
[ -z "$server" ] && break
let i=i+1
array[$i]="$server"
done
for i in "${array[#]}"; do
echo $i
done
exit 0
The loop will exit on an empty line or EOF (ctrl-D).
Example session terminated by empty line:
user@server:/tmp$ ./test.sh
enter the servers names...
se1
se2
se3
se1
se2
se3
Example session terminated by EOF on the empty line after se2:
user@server:/tmp$ ./test.sh
enter the servers names...
se1
se2
se1
se2
Please note that I check for an empty string while reading the names; but it is possible to check for empty strings (or whatever else) in any loop, for example while printing them or doing other computations.
#!/bin/bash
echo "enter the servers names..."
# -d '' makes read consume input up to end-of-input (Ctrl-D),
# splitting it on whitespace, including newlines
read -d '' -a array
for i in "${array[@]}"
do
  echo $i
done
exit 0
or
#!/bin/bash
echo "enter the servers names..."
# -t strips the trailing newline from each element
readarray -t array
for i in "${array[@]}"
do
  echo $i
done
exit 0
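readarray keeps reading until end of input, so a hypothetical session would be terminated with Ctrl-D:

user@server:/tmp$ ./test.sh
enter the servers names...
hello
world
hello
world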
I have a file containing, on each line, a sent-mail count and a username (I don't know how many lines it has), for example:
info.txt:
500 example1
40 example2
20 example3
....
..
.
If the number is greater than X, I want to run commands containing the username and act on that user.
getArray() {
  users=()                    # Create array
  while IFS= read -r line; do # Read a line
    users+=("$line")          # Append line to the array
  done < "$1"
}
getArray "/root/.myscripts/spam1/info.txt"
# I know this part is incorrect and need help here:
if [ "${users[1$]}" -gt "50" ]
then
echo "${users[2$] has sent ${users[1$]} emails"
fi
Please help. Thanks.
Not knowing how many lines of input you have is no reason to use an array. Indeed, it is generally more useful if you assume your input is infinite (an input stream), so reading into an array is impossible. Just read each line and take action if necessary:
#!/bin/sh
while read -r count user; do
  if test "$count" -gt 50; then
    echo "$user has sent $count emails"
  fi
done < /root/.myscripts/spam1/info.txt
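With the three sample lines from info.txt shown above (and assuming no further lines), only the first count exceeds 50, so the script prints:

example1 has sent 500 emails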
Ahhh, arrays and loops, my weakest links. I was trying to create an array depending on user input, so:
printf "%s\n" "how may array you want"
read value
After this, I will ask what value the user wants to put in the array (this is the bit I'm stuck on):
i=1
while [ $i -le $value ]
do
  echo "what value do you want to put in array element $i"
  read number
  echo $number >> array.db
  i=$(( i+1 ))
  echo
done
Although this method works (I think), I'm not too sure whether I'm actually creating an array and putting values into it.
You can expand arrays in bash dynamically. You can use this snippet:
a=(); a[${#a[@]}]=${number}; echo "${a[@]}"
The first statement defines an empty array. With the second (which you can use in your while loop) you insert a value at the last element position + 1, since ${#a[@]} is the length of a. The third statement just prints all elements of the array.
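Applied to your loop, a minimal sketch (keeping your value and number variables; the write to array.db is kept from your version but is optional):

printf "%s\n" "how many array elements do you want"
read value
a=()
i=1
while [ "$i" -le "$value" ]; do
  echo "what value do you want to put in array element $i"
  read number
  a[${#a[@]}]=$number    # append to the in-memory array
  echo "$number" >> array.db
  i=$(( i+1 ))
done
echo "collected: ${a[@]} (${#a[@]} elements)"

In modern bash, a+=( "$number" ) is an equivalent, more readable way to append.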
I was doing an exercise on reading from a setup file in which every line specifies two words and a number. The number denotes the number of words between the two words specified. Another file, input.txt, has a block of text, and the program counts the occurrences in the input file that satisfy the constraint on each line of the setup file (i.e., two particular words a and b should be separated by exactly n words, where a, b and n are specified in the setup file).
So I've tried to do this as a shell script, but my implementation is probably highly inefficient. I used an array to store the words from the setup file, and then did a linear search on the text file to find the words, and so on. Here's a bit of the code, if it helps:
#!/bin/bash
# (bash, not sh: the script uses arrays, (( )) and [[ ]])
j=0
count=0
m=0
flag=0
error=0
while read line; do
  line=($line)
  a[j]=${line[0]}
  b[j]=${line[1]}
  num=${line[2]}
  c[j]=`expr $num + 0`
  j=`expr $j + 1`
done <input2.txt

while read line2; do
  line2=($line2)
  for (( i=0; $i<=50; i++ )); do
    for (( m=0; $m<j; m++ )); do
      g=`expr $i + ${c[m]}`
      g=`expr $g + 1`
      if [ "${line2[i]}" == "${a[m]}" ]; then
        for (( k=$i; $k<$g; k++ )); do
          if [[ "${line2[k]}" == *.* ]]; then
            flag=1
            break
          fi
        done
        if [ "${b[m]}" == "${line2[g]}" ]; then
          if [ "$flag" == 1 ]; then
            error=`expr $error + 1`
          fi
          count=`expr $count + 1`
        fi
        flag=0
      fi
      if [ "${line2[i]}" == "${b[m]}" ]; then
        for (( k=$i; $k<$g; k++ )); do
          if [[ "${line2[k]}" == *.* ]]; then
            flag=1
            break
          fi
        done
        if [ "${a[m]}" == "${line2[g]}" ]; then
          if [ "$flag" == 1 ]; then
            error=`expr $error + 1`
          fi
          count=`expr $count + 1`
        fi
        flag=0
      fi
    done
  done
done <input.txt
count=`expr $count - $error`
echo "| Count = $count |"
As you can see, this takes a lot of time.
I was thinking of a more efficient way to implement this, in C or C++ this time. What could be a possible alternative implementation, with efficiency in mind? I thought of hash tables, but could there be a better way?
I'd like to hear what everyone has to say on this.
Here's a fully working possibility. It is not 100% pure bash since it uses (GNU) sed: I'm using sed to lowercase everything and to get rid of punctuation marks. Maybe you won't need this. Adapt to your needs.
#!/bin/bash

input=input.txt
setup=setup.txt

# The Check function
Check() {
  # $1 is word1
  # $2 is word2
  # $3 is number of words between word1 and word2
  nb=0
  # Get all positions of word1
  IFS=, read -a q <<< "${positions[$1]}"
  # Check, for each position, if word2 is at distance $3 from word1
  for i in "${q[@]}"; do
    [[ ${words[$i+$3+1]} = $2 ]] && ((++nb))
  done
  echo "$nb"
}

# Slurp input file into an array
words=( $(sed 's/[,.:!?]//g;s/\(.*\)/\L\1/' -- "$input") )

# For each word, record its positions in the file
declare -A positions
pos=0
for i in "${words[@]}"; do
  positions[$i]+=$((pos++)),
done

# Do it!
while read w1 w2 p; do
  # Check that w1 and w2 are not empty
  [[ -n $w2 ]] || continue
  # Check that p is a number
  [[ $p =~ ^[[:digit:]]+$ ]] || continue
  n=$(Check "$w1" "$w2" "$p")
  [[ $w1 != $w2 ]] && (( n += $(Check "$w2" "$w1" "$p") ))
  echo "$w1 $w2 $p: $n"
done < <(sed 's/\(.*\)/\L\1/' -- "$setup")
How does it work:
We first read the whole file input.txt into the array words, one word per field. Observe that I'm using sed here to delete all punctuation marks (well, only , . : ! ? for testing purposes; add some more if you wish) and to lowercase every letter.
Loop through the array words and for each word, put its position in an associative array positions:
w => "position1,position2,...,positionk,"
Finally, we read the setup.txt file (filtered through sed again to lowercase everything – optional, see below). We do a quick check whether the line is valid (2 words and a number) and then call the Check function (twice, once for each order of the given words, unless both words are equal).
The Check function finds all positions of word1 in the file, thanks to the associative array positions, and then, using the array words, checks whether word2 is at the given "distance" from word1.
The second sed is optional. I've filtered the setup.txt file through sed to lowercase everything. This sed adds only very little overhead, so, efficiency-wise, it's not a big deal. You'll be able to add more filtering later to make sure the data is consistent with how the script uses it (e.g., to get rid of punctuation marks). Otherwise you could:
Get rid of it altogether: replace the corresponding line (the last line) with just
done < "$setup"
In this case, you'll have to trust the guy/gal who will write the setup.txt file.
Get rid of it as above, but still convert everything to lowercase. In this case, below the
while read w1 w2 p; do
line, just add these lines:
w1=${w1,,}
w2=${w2,,}
That's the bash way to lowercase a string.
Caveats. The script will break if:
The number given in the setup.txt file starts with a 0 and contains an 8 or a 9. This is because bash will consider it an octal number, where 8's and 9's are not valid. There are workarounds for this.
The text in input.txt doesn't follow proper typographical practice, namely that a punctuation mark is always followed by a space. E.g., if the input file contains
The quick,brown,dog jumps over the lazy fox
then after the sed treatment the text will look like
The quickbrowndog jumps over the lazy fox
and the words quick, brown and dog won't be treated properly. You can replace the sed substitution s/[,.:!?]//g with s/[,.:!?]/ /g to convert these symbols into a space. It's up to you, but in that case, abbreviations such as e.g. and i.e. might not be handled properly… it really depends on what you need to do.
Different character encodings are used… I don't really know how robust you need the script to be, and what languages and encodings you'll consider.
(Add stuff here :).)
About efficiency. I'd say the algorithm is rather efficient. bash is probably not the best-suited language for this, but it's a lot of fun, and not that difficult after all if we look at it (less than 20 lines of relevant code!). If you only have 50 files with 50000 words each, it's ok: you will not notice too much difference between bash and perl/python/awk/C/you-name-it, as bash performs decently quickly on files of this type. Now if you have 100000 files each containing millions of words, well, a different approach should be taken and a different language should be used (but I don't know which one).
If:
it can get complex for the sake of efficiency
the text file can be large
the setup file can have many rows
then I would do it the following way:
As preparation I would create:
A hash map with the index of the word as the key and the word as the value (named, say, WORDS). So WORDS[1] would be the first word, WORDS[2] the second, and so on.
A hash map with the words as keys and the list of indexes as values (named, say, INDEXES). So if WORDS[2] and WORDS[5] are "dog" and no others, then INDEXES["dog"] would yield the numbers 2 and 5. The value can be a dynamic indexed array or a linked list. A linked list is better if there are words that occur many times.
You can read the text file, and populate both structures at the same time.
Processing:
For each row of the setup file I would get the indexes in INDEXES[firstword] and check whether WORDS[index + wordsinbetween + 1] equals secondword. If it does, that's a hit (see the sketch after the notes below).
Notes:
Preparation: You only read the text file once. For each word in the text file, you're doing fast operations whose performance is not really affected by the number of words already processed.
Processing: You only read the setup file once. For each row, you're likewise doing operations whose cost depends only on the number of occurrences of firstword in the text file.
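For concreteness, here's a minimal sketch of this approach. It's written in bash (with its associative arrays) to match the rest of the thread, though the same structure carries over directly to, say, a C++ std::vector plus std::unordered_map; the file names input.txt and setup.txt are assumed as in the previous answer, and input validation is skipped:

# Preparation: read the text file once, filling both structures
declare -a WORDS       # position -> word
declare -A INDEXES     # word -> space-separated list of positions
idx=0
while read -r -a line_words; do
  for w in "${line_words[@]}"; do
    WORDS[idx]=$w
    INDEXES[$w]+="$idx "   # append this position to the word's index list
    idx=$((idx + 1))
  done
done < input.txt

# Processing: one pass over the setup file
while read -r first second between; do
  hits=0
  for i in ${INDEXES[$first]}; do
    [[ ${WORDS[i + between + 1]} == "$second" ]] && hits=$((hits + 1))
  done
  echo "$first $second $between: $hits"
done < setup.txt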