capturing a file into array and using that array in while loop - arrays

Hi, I have to capture a file into an array and then pass that array to a while loop.
I just don't want to run my script with the while loop below, because it takes a long time:
while read line; do
    # some actions...
done < file.txt
My server has 8 GB of RAM, of which 6 GB is always available. So please let me know whether it is a good idea to capture a 100 MB file into memory (an array) and run operations like grep, sed, awk, etc. on it.
If so, please let me know how to capture the file into an array.
If not, kindly suggest another way to increase performance.

I'm not sure I understand...
Do you need something like this?
array=()
# Read the file given as a parameter and fill the array named "array"
getArray() {
    i=0
    while read line
    do
        array[i]=$line
        i=$(($i + 1))
    done < "$1"
}
getArray "file.txt"
for line in "${array[@]}"
do
    # some actions using $line
done
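If your bash is 4.0 or newer, mapfile (also spelled readarray) can replace the whole read loop; a minimal sketch, reading the same file.txt:
mapfile -t array < file.txt    # -t drops the trailing newline of each line
printf '%s\n' "${array[@]}"    # the file contents, one array element per line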
EDIT:
To answer your question: yes, it's possible to grep data from one array and push it into another. There is probably a better way to do it, but this works:
array2=()
# Split the string given as a parameter and push the values into the array
pushIntoArray() {
    i=0
    for element in $1
    do
        array2[i]=$element
        i=$(($i + 1))
    done
}
array1=("foo" "bar" "baz")
# Build a string of the array elements separated by '\n' and pipe the output to grep.
str=$(printf "%s\n" "${array1[@]}" | grep "a")
pushIntoArray "$str"
printf "%s\n" "${array2[#]}" # Display array2 line by line
Output of this snippet:
$ ./grep_array.sh
bar
baz
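Since the original concern was performance, note that you can also filter one array into another with bash's own pattern matching and skip the external grep entirely; a sketch, keeping the elements that contain "a" from the same sample data:
array1=("foo" "bar" "baz")
array2=()
for element in "${array1[@]}"; do
    [[ $element == *a* ]] && array2+=("$element")   # keep elements containing "a"
done
printf '%s\n' "${array2[@]}"   # prints: bar, then baz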

You want a while loop. Try this:
while read d; do
    echo "$d"
done < dinosaurs.txt
Enjoy

Related

How does bash array slicing work when start index is not provided?

I'm looking at a script, and I'm having trouble determining what is going on.
Here is an example:
# Command to get the last 4 occurrences of a pattern in a file
lsCommand="ls /my/directory | grep -i my_pattern | tail -4"
# Load the results of that command into an array
dirArray=($(echo $(eval $lsCommand) | tr ' ' '\n'))
# What is this doing?
yesterdaysFileArray=($(echo ${x[@]::$((${#x[@]} / 2))} | tr ' ' '\n'))
There is a lot going on here. I understand how arrays work, but I don't know how $x is getting referenced if it was never declared.
I see that the $((${#x[@]} / 2)) is taking the number of elements and dividing it in half, and the tr is used to create the array. But what else is going on?
I think the last line is an array slice pattern in bash of the form ${array[@]:1:2}, where array[@] returns the contents of the array and :1:2 takes a slice of length 2, starting at index 1.
In your case the start index is left empty, so it defaults to 0, and the length is half the element count of the array.
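For illustration, a quick demo of the slice syntax with a throwaway array:
a=(one two three four five six)
echo "${a[@]:1:2}"              # two three      (start 1, length 2)
echo "${a[@]::3}"               # one two three  (start omitted, so 0; length 3)
echo "${a[@]::${#a[@]}/2}"      # one two three  (the first half of the array)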
But there is a much better way to do this in bash, shown below. Don't use eval; use the shell's built-in globbing support instead:
cd /my/directory
fileArray=()
for file in *my_pattern*; do
    [[ -f "$file" ]] || { printf '%s\n' 'no file found'; return 1; }
    fileArray+=( "$file" )
done
and do
printf '%s\n' "${fileArray[#]::${#fileArray[#]}/2}"

Finding elements in common between two ksh or bash arrays efficiently

I am writing a Korn shell script. I have two arrays (say, arr1 and arr2), both containing strings, and I need to check which elements from arr1 are present (as whole strings or substrings) in arr2. The most intuitive solution is having nested for loops, and checking if each element from arr1 can be found in arr2 (through grep) like this:
for arr1Element in ${arr1[*]}; do
    for arr2Element in ${arr2[*]}; do
        # using grep to check if arr1Element is present in arr2Element
        echo $arr2Element | grep $arr1Element
    done
done
The issue is that arr2 has around 3000 elements, so running a nested loop takes a long time. I am wondering if there is a better way to do this in Bash.
If I were doing this in Java, I could have calculated hashes for elements in one of the arrays, and then looked for those hashes in the other array, but I don't think Bash has any functionality for doing something like this (unless I was willing to write a hash calculating function in Bash).
Any suggestions?
Since version 4.0 Bash has associative arrays:
$ declare -A elements
$ elements[hello]=world
$ echo ${elements[hello]}
world
You can use this in the same way you would a Java Map.
declare -A map
for el in "${arr1[@]}"; do
    map[$el]="x"
done
for el in "${arr2[@]}"; do
    if [ -n "${map[$el]}" ] ; then
        echo "${el}"
    fi
done
Dealing with substrings is an altogether more weighty problem, and would be a challenge in any language, short of the brute-force algorithm you're already using. You could build a binary-tree index of character sequences, but I wouldn't try that in Bash!
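For the exact-match case, here is a quick usage sketch of the snippet above, with made-up sample data:
arr1=(apple pear plum)
arr2=(plum cherry apple)
declare -A map
for el in "${arr1[@]}"; do map[$el]="x"; done
for el in "${arr2[@]}"; do [ -n "${map[$el]}" ] && echo "$el"; done
# prints: plum, then apple -- the elements present in both arrays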
BashFAQ #36 describes doing set arithmetic (unions, disjoint sets, etc) in bash with comm.
Assuming your values can't contain literal newlines, the following will emit a line per item in both arr1 and arr2:
comm -12 <(printf '%s\n' "${arr1[@]}" | sort -u) \
         <(printf '%s\n' "${arr2[@]}" | sort -u)
If your arrays are pre-sorted, you can remove the sorts (which will make this extremely memory- and time-efficient with large arrays, more so than the grep-based approach).
Since you're OK with using grep, and since you want to match substrings as well as full strings, one approach is to write:
printf '%s\n' "${arr2[#]}" \
| grep -o -F "$(printf '%s\n' "${arr1[#]}")
and let grep optimize as it sees fit.
Here's a bash/awk idea:
# some sample arrays
$ arr1=( my first string "hello world" )
$ arr2=( my last stringbean strings "well, hello world!" )
# break array elements into separate lines
$ printf '%s\n' "${arr1[#]}"
my
first
string
hello world
$ printf '%s\n' "${arr2[#]}"
my
last
stringbean
strings
well, hello world!
# use the 'printf' command output as input to our awk command
$ awk '
NR==FNR { a[NR]=$0 ; next }
{ for (i in a)
      if ($0 ~ a[i]) print "array1 string {"a[i]"} is a substring of array2 string {"$0"}" }
' <( printf '%s\n' "${arr1[@]}" ) \
  <( printf '%s\n' "${arr2[@]}" )
array1 string {my} is a substring of array2 string {my}
array1 string {string} is a substring of array2 string {stringbean}
array1 string {string} is a substring of array2 string {strings}
array1 string {hello world} is a substring of array2 string {well, hello world!}
NR==FNR : for file #1 only: store elements into awk array named 'a'
next : process the next line in file #1; at this point the rest of the awk script is ignored for file #1; then, for each line in file #2 ...
for (i in a) : for each index 'i' in array 'a' ...
if ($0 ~ a[i] ) : see if a[i] is a substring of the current line ($0) from file #2 and if so ...
print "array1... : output info about the match
A test run using the following data:
arr1 == 3300 elements
arr2 == 500 elements
When all arr2 elements have a substring/pattern match in arr1 (ie, 500 matches), total time to run is ~27 seconds ... so the repetitive looping through the array takes a toll.
Obviously (?) need to reduce the volume of repetitive actions ...
for an exact string match the comm solution by Charles Duffy makes sense (it runs against the same 3300/500 test set in about 0.5 seconds)
for a substring/pattern match I was able to get an egrep solution to run in about 5 seconds (see my other answer/post)
An egrep solution for substring/pattern matching ...
egrep -f <(printf '.*%s.*\n' "${arr1[@]}") \
         <(printf '%s\n' "${arr2[@]}")
egrep -f : take patterns to search from the file designated by the -f, which in this case is ...
<(printf '.*%s.*\n' "${arr1[@]}") : convert arr1 elements into 1 pattern per line, adding a regex wildcard (.*) as prefix and suffix
<(printf '%s\n' "${arr2[@]}") : convert arr2 elements into 1 string per line
When run against a sample data set like:
arr1 == 3300 elements
arr2 == 500 elements
... with 500 matches, total run time is ~5 seconds; there's still a good bit of repetitive processing going on with egrep, but not as bad as seen with my other answer (bash/awk) ... and of course not as fast as the comm solution, which eliminates the repetitive processing.

Shell Array Cleared for Unknown Reason [duplicate]

This question already has answers here:
A variable modified inside a while loop is not remembered
(8 answers)
Closed 6 years ago.
I have a pretty simple sh script where I make a system cat call, collect the results and parse some relevant information before storing the information in an array, which seems to work just fine. But as soon as I exit the for loop where I store the information, the array seems to clear itself. I'm wondering if I am accessing the array incorrectly outside of the for loop. Relevant portion of my script:
#!/bin/sh
declare -a QSPI_ARRAY=()
cat /proc/mtd | while read mtd_instance
do
    # split result into individual words
    words=($mtd_instance)
    for word in "${words[@]}"
    do
        # check for uboot
        if [[ $word == *"uboot"* ]]
        then
            mtd_num=${words[0]}
            index=${mtd_num//[!0-9]/} # strip everything except the integers
            QSPI_ARRAY[$index]="uboot"
            echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
        elif [[ $word == *"fpga_a"* ]]
        then
            echo "found it: "$word""
            mtd_num=${words[0]}
            index=${mtd_num//[!0-9]/} # strip everything except the integers
            QSPI_ARRAY[$index]="fpga_a"
            echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
        # other items are added to the array, all successfully
        fi
    done
    echo "length of array: ${#QSPI_ARRAY[@]}"
    echo "----------------------"
done
My output is great until I exit the for loop. While within the for loop, the array size increments and I can check that the item has been added. After the for loop is complete I check the array like so:
echo "RESULTING ARRAY:"
echo "length of array: ${#QSPI_ARRAY[#]}"
for qspi in "${QSPI_ARRAY}"
do
echo "qspi instance: $qspi"
done
Here are my results, echoed to my display:
dev: size erasesize name
length of array: 0
-------------
mtd0: 00100000 00001000 "qspi-fsbl-uboot"
QSPI_ARRAY[] at index 0: uboot
length of array: 1
-------------
mtd1: 00500000 00001000 "qspi-fpga_a"
QSPI_ARRAY[] at index 1: fpga_a
length of array: 2
-------------
RESULTING ARRAY:
length of array: 0
qspi instance:
EDIT: After some debugging, it seems I have two different arrays here somehow. I initialized the array like so: QSPI_ARRAY=("a" "b" "c" "d" "e" "f" "g"), and after my for-loop for parsing the array it is still a, b, c, etc. How do I have two different arrays of the same name here?
This structure:
cat /proc/mtd | while read mtd_instance
do
...
done
Means that whatever comes between do and done cannot have any effects inside the shell environment that are still there after the done.
The fact that the while loop is on the right hand side of a pipe (|) means that it runs in a subshell. Once the loop exits, so does the subshell. And all of its variable settings.
If you want a while loop which makes changes that stick around, don't use a pipe. Input redirection doesn't create a subshell, and in this case, you can just read from the file directly:
while read mtd_instance
do
...
done </proc/mtd
If you had a more complicated command than a cat, you might need to use process substitution. Still using cat as an example, that looks like this:
while read mtd_instance
do
...
done < <(cat /proc/mtd)
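Another option, if you would rather keep the pipeline itself, is bash's lastpipe option (bash 4.2+; it only takes effect when job control is off, which is the default in a non-interactive script). A minimal sketch:
#!/usr/bin/env bash
shopt -s lastpipe                  # run the last pipeline segment in the current shell
QSPI_ARRAY=()
cat /proc/mtd | while read -r mtd_instance; do
    QSPI_ARRAY+=("$mtd_instance")
done
echo "length of array: ${#QSPI_ARRAY[@]}"   # the array now survives the loop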
In the specific case of your example code, I think you could simplify it somewhat, perhaps like this:
#!/usr/bin/env bash
QSPI_ARRAY=()
while read -a words; do
    declare -i mtd_num=${words[0]//[!0-9]/}
    for word in "${words[@]}"; do
        for type in uboot fpga_a; do
            if [[ $word == *$type* ]]; then
                QSPI_ARRAY[mtd_num]=$type
                break 2
            fi
        done
    done
done </proc/mtd
Is this potentially what you are seeing:
http://mywiki.wooledge.org/BashFAQ/024

Bash: awk output to array

I'm trying to put the contents of an awk command into a bash array; however, I'm having a bit of trouble.
>>test.sh
f_checkuser() {
    _l="/etc/login.defs"
    _p="/etc/passwd"
    ## get min UID limit ##
    l=$(grep "^UID_MIN" $_l)
    ## get max UID limit ##
    l1=$(grep "^UID_MAX" $_l)
    awk -F':' -v "min=${l##UID_MIN}" -v "max=${l1##UID_MAX}" '{ if ( $3 >= min && $3 <= max && $7 != "/sbin/nologin" ) print $0 }' "$_p"
}
...
Used files:
Sample File: /etc/login.defs
>>/etc/login.defs
### Min/max values for automatic uid selection in useradd
UID_MIN 1000
UID_MAX 60000
Sample File: /etc/passwd
>>/etc/passwd
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
admin:x:1000:1000:Administrator,,,:/home/admin:/bin/bash
daniel:x:1001:1001:Daniel,,,:/home/daniel:/bin/bash
The output looks like:
admin:x:1000:1000:Administrator,,,:/home/admin:/bin/bash
daniel:x:1001:1001:Daniel,,,:/home/daniel:/bin/bash
respectively, with print $1 instead of print $0 (awk ... '{ print $1 }' "$_p"):
admin
daniel
Now my problem is saving the awk output in an array so I can use it as a variable.
>>test.sh
...
f_checkuser
echo "Array items and indexes:"
for index in ${!LOKAL_USERS[*]}
do
    printf "%4d: %s\n" $index ${LOKAL_USERS[$index]}
done
It could/should look like this example.
Array items and indexes:
0: admin
1: daniel
Specifically, I want to get all users of the system (not root, bin, sys, ssh, ...), excluding blocked users, into an array.
Perhaps someone has another idea to solve my problem?
Are you trying to store the output of a command in an array? Bash has a way of doing this. For example,
a=( $(seq 1 10) ); echo ${a[1]}
will populate the array a with elements 1 to 10 and will print 2, the second line generated by seq (array index starts at zero). Simply replace the contents of $(...) with your script.
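Applied to this question, a sketch (assuming the print $1 variant of f_checkuser, and that the user names contain no whitespace or glob characters, since this form word-splits and glob-expands the output):
LOKAL_USERS=( $(f_checkuser) )   # one element per whitespace-separated word of output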
For those coming to this years later ...
bash 4 introduced readarray (aka mapfile) exactly for this purpose.
See also Bash capturing output of awk into array
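A minimal sketch for this question (again assuming the print $1 variant of f_checkuser, so it emits one user name per line):
mapfile -t LOKAL_USERS < <(f_checkuser)   # one array element per output line
echo "Array items and indexes:"
for index in "${!LOKAL_USERS[@]}"; do
    printf "%4d: %s\n" "$index" "${LOKAL_USERS[$index]}"
done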
One solution that works:
array=()
f_checkuser(){
    ...
    ...
    tempfile="localuser.tmp"
    touch ${tempfile}
    awk -F':'...'{... print $1 }' "$_p" > ${HOME}/${tempfile}
    getArrayfromFile "${tempfile}"
}
getArrayfromFile() {
    i=0
    while read line        # Read a line
    do
        array[i]=$line     # Put it into the array
        i=$(($i + 1))
    done < $1
}
f_checkuser
echo "Array items and indexes:"
for index in ${!array[*]}
do
    printf "%4d: %s\n" $index ${array[$index]}
done
Output:
Array items and indexes:
0: daniel
1: admin
But I would prefer to do this without a new temp file.
So, does anyone have another idea that avoids the temp file?
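One way to drop the temp file (a sketch, not from the thread, assuming the print $1 variant of the awk command): let f_checkuser print to stdout and hand getArrayfromFile a process substitution, which looks like an ordinary file name to its done < $1 redirection:
f_checkuser() {
    _l="/etc/login.defs"
    _p="/etc/passwd"
    l=$(grep "^UID_MIN" $_l)
    l1=$(grep "^UID_MAX" $_l)
    awk -F':' -v "min=${l##UID_MIN}" -v "max=${l1##UID_MAX}" \
        '$3 >= min && $3 <= max && $7 != "/sbin/nologin" { print $1 }' "$_p"
}
getArrayfromFile <(f_checkuser)   # process substitution instead of ${HOME}/${tempfile}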

how do I output the contents of a while read line loop to multiple arrays in bash?

I read the files of a directory and put each file name into an array (SEARCH)
Then I use a loop to go through each file name in the array (SEARCH), open them with a while read line loop, and read each line into another array (filecount). My problem is that it's one huge array with 39 lines (each file has 13 lines) and I need it to be 3 separate arrays, where
filecount1[line1] is the first line from the 1st file, and so on. Here is my code so far...
typeset -A files
for file in ${SEARCH[@]}; do
    while read line; do
        files["$file"]+="$line"
    done < "$file"
done
So, thanks Ivan for this example! However, I'm not sure I follow how this puts it into a separate array, because with this example wouldn't all the arrays still be named "files"?
If you're just trying to store the file contents into an array:
declare -A contents
for file in "${!SEARCH[#]}"; do
contents["$file"]=$(< $file)
done
If you want to store the individual lines in a array, you can create a pseudo-multi-dimensional array:
declare -A contents
for file in "${!SEARCH[@]}"; do
    NR=1
    while read -r line; do
        contents["$file,$NR"]=$line
        (( NR++ ))
    done < "$file"
done
for key in "${!contents[@]}"; do
    printf "%s\t%s\n" "$key" "${contents["$key"]}"
done
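To pull out one particular line later, index with the same composite key; for example, the first line of each file:
for key in "${!SEARCH[@]}"; do
    echo "first line of $key: ${contents["$key,1"]}"
done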
line 6 is
$filecount[$linenum]}="$line"
Seems it is missing a {, right after the $.
Should be:
${filecount[$linenum]}="$line"
If the above is true, then it is trying to run the output as a command.
Line 6 is (after "fixing" it above):
${filecount[$linenum]}="$line"
However ${filecount[$linenum]} is a value and you can't have an assignment on a value.
Should be:
filecount[$linenum]="$line"
Now I'm confused, as in whether the { is actually missing, or } is the actual typo :S :P
btw, bash supports this syntax too
(( filecount++ )) # no need for $ inside ((..)); the ++ operator increments in place
This should work:
typeset -A files
for file in ${SEARCH[@]}; do      # foreach file
    while read line; do           # read each line
        files["$file"]+="$line"   # and place it in a new array
    done < "$file"                # reading each line from the current file
done
a small test shows it works
# set up
mkdir -p /tmp/test && cd $_
echo "abc" > a
echo "foo" > b
echo "bar" > c
# read files into arrays
typeset -A files
for file in *; do
    while read line; do
        files["$file"]+="$line"
    done < "$file"
done
# print arrays
for file in *; do
echo ${files["$file"]}
done
# same as:
echo ${files[a]} # prints: abc
echo ${files[b]} # prints: foo
echo ${files[c]} # prints: bar
