Read a file's content (with spaces) into an array

Read a file's content (with spaces) into an array - arrays

I'm creating a Shell Script, and I have a file like this called expressions.txt:
"Ploink Poink"
"I Need Oil"
"Some Bytes are Missing!"
"Poink Poink"
"Piiiip Beeeep!!"
"Whoops! I'm out of memory!"
"1 + 1 = 3"
"Please fix my bugs!"
"Goeiedag!"
"Hallo!"
"Guten Tag!"
"Hyvää Päivää!"
"Добрый день"
"!สวัสดี"
"Bonjour!"
"!مرحبا"
"!שלום"
"Γειά!"
I have this code in robot.sh:
expressions=( $(cat expressions.txt) )
# Get random expression...
selectedexpression=${expressions[$RANDOM % ${#expressions[#]}]}
# Write to Shell
echo $selectedexpression
However, this splits the file's content by spaces, not by quotes, so the output could be something like "Ploink, Need or fix. But I want the complete sentences. Is there a way to do this? Thanks
Oh, the shell I use is #!/bin/sh.

This will work with bash, possibly with sh:
OLDIFS="$IFS"
IFS=$'\n'
expressions=()
while read line
do
expressions=("${expressions[#]}" "$line")
done < expressions.txt
IFS="$OLDIFS"

use (g)awk, or nawk on Solaris
awk 'BEGIN{srand() }
{
lines[++d]=$0
}END{
RANDOM = int(1 + rand() * d)
print lines[RANDOM]
}' expression.txt

Related

How to write a shell script which accepts array in the command line and outputs the sum of it [duplicate]

This question already has answers here:
How to initialize a bash array with output piped from another command? [duplicate]
(3 answers)
Closed 6 years ago.
I am new to arrays in Bash scripting. I need to write a script which accepts an array from standard input on the command line. and outputs the sum of it to the user.
Here is the logic, but how can I convert it into shell script to be used in command line?
read -a array
tot=0
for i in ${array[#]}; do
let tot+=$i
done
echo "Total: $tot"
Any help is appreciated.

You're close! Try this instead:
IFS=$'\n' read -d '' -r -a array
total=0
for i in "${array[#]}"; do
((total += i))
done
When you're reading $array from stdin with read -a, you're only getting the first line.
IFS=$'\n' changes Bash's Internal Field Separator to the newline symbol (\n) so that each line is seen as a separate field, rather than looking for tokens separated by whitespace. -d '' makes read not stop reading at the end of each line. ((total += i)) is a shorter/cleaner way to do math.
Here's it running:
$ seq 1 10 | ./test.sh
Total: 55

#!/bin/bash
calcArray() {
local total=0
for i ;do
let total+=$i
done
echo "${total}"
}
From your terminal do this source scriptName .
calcArray 1 2 3 4 5
Don't quote the args.
In your .bashrc put source scriptName, so you can always run calcArray args , without source scriptName again

Using array inside awk in shell script

I am very new to Unix shell script and trying to get some knowledge in shell scripting. Please check my requirement and my approach.
I have a input file having data
ABC = A:3 E:3 PS:6
PQR = B:5 S:5 AS:2 N:2
I am trying to parse the data and get the result as
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
The values can be added horizontally and vertically so I am trying to use an array. I am trying something like this:
myarr=(main.conf | awk -F"=" 'NR!=1 {print $1}'))
echo ${myarr[1]}
# Or loop through every element in the array
for i in "${myarr[#]}"
do
:
echo $i
done
or
awk -F"=" 'NR!=1 {
print $1"\n"
STR=$2
IFS=':' read -r -a array <<< "$STR"
for i in "${!array[#]}"
do
echo "$i=>${array[i]}"
done
}' main.conf
But when I add this code to a .sh file and try to run it, I get syntax errors as
$ awk -F"=" 'NR!=1 {
> print $1"\n"
> STR=$2
> FS= read -r -a array <<< "$STR"
> for i in "${!array[#]}"
> do
> echo "$i=>${array[i]}"
> done
>
> }' main.conf
awk: cmd. line:4: FS= read -r -a array <<< "$STR"
awk: cmd. line:4: ^ syntax error
awk: cmd. line:5: for i in "${!array[#]}"
awk: cmd. line:5: ^ syntax error
awk: cmd. line:8: done
awk: cmd. line:8: ^ syntax error
How can I complete the above expectations?

This is the awk code to do what you want:
$ cat tst.awk
BEGIN { FS="[ =:]+"; OFS="=" }
{
print $1
for (i=2;i<NF;i+=2) {
print $i, $(i+1)
}
print ""
}
and this is the shell script (yes, all a shell script does to manipulate text is call awk):
$ awk -f tst.awk file
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
A UNIX shell is an environment from which to call UNIX tools (find, sort, sed, grep, awk, tr, cut, etc.). It has its own language for manipulating (e.g. creating/destroying) files and processes and sequencing calls to tools but it is NOT intended to be used to manipulate text. The guys who invented shell also invented awk for shell to call to manipulate text.
Read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice and the book Effective Awk Programming, 4th Edition, by Arnold Robbins.

First off, a command that does what you want:
$ sed 's/ = /\n/;y/: /=\n/' main.conf
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
This replaces, on each line, the first (and only) occurrence of = with a newline (the s command), then turns all : into = and all spaces into newlines (the y command). Notice that
this works only because there is a space at the end of the first line (otherwise it would be a bit more involved to get the empty line between the blocks) and
this works only with GNU sed because it substitutes newlines; see this fantastic answer for all the details and how to get it to work with BSD sed.
As for what you tried, there is almost too much wrong with it to try and fix it piece by piece: from the wild mixing of awk and Bash to syntax errors all over the place. I recommend you read good tutorials for both, for example:
The BashGuide
Effective AWK Programming
A Bash solution
Here is a way to solve the same in Bash; I didn't use any arrays.
#!/bin/bash
# Read line by line into the 'line' variable. Setting 'IFS' to the empty string
# preserves leading and trailing whitespace; '-r' prevents interpretation of
# backslash escapes
while IFS= read -r line; do
# Three parameter expansions:
# Replace ' = ' by newline (escape backslash)
line="${line/ = /\\n}"
# Replace ':' by '='
line="${line//:/=}"
# Replace spaces by newlines (escape backslash)
line="${line// /\\n}"
# Print the modified input line; '%b' expands backslash escapes
printf "%b" "$line"
done < "$1"
Output:
$ ./SO.sh main.conf
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2

Shell to resolve output issue - issue in concatenation and file reading

I have a quick question - I am trying to resolve an issue with a serie of files where the output has been changed.
The output should look like that:
>Tests HadI-sdds1:4134:AAABBBBB:1:1101:6635:2407_2:N:0:TTTTTT
AAAABBBBBEEEECCCCERTTSFASFASFDSGFSDGGSFGFSGDFGDFGDFGDFGDFGDFGDFGDCCVBWAAAABBBBBEEEECCCCERTTSFASFASFDSGFSDGGSFGFSG
But appears as:
>Tests HadI-sdds1:4134:AAABBBBB:1:1101:6635:2407_2:N:0:TTTTTT
AAAABBBBBEEEECCCCERTTSFASFASFDSGFSDGGSFGFSGDFGDFGDFGDFGDFGDFGDFGDCCVBW
AAAABBBBBEEEECCCCERTTSFASFASFDSGFSDGGSFGFSG
I have written the following code to try to fix it, but the line 16 appears to return an empty string, however when I do the echo without putting in a var, I get the complete line.
#!/bin/sh
FILENAME=$1
OUTPUT=$2
set LineToWrite=''
while read LINE
do
if [ `echo "$LINE" | awk '{print substr($0,1,1)}'` == ">" ]
then
echo "$LineToWrite" >> $OUTPUT
echo "$LINE" >> $OUTPUT
set LineToWrite=''
else
set currLine=`echo "$LINE" | awk '{print substr($0,1,70)}'`
set LineToWrite+=$currLine
fi
done <$FILENAME
Any Idea to solve my problem? (The files contains > 1million lines)
Thanks a lot in advance!!!!

Three things :
no space allowed between keys & values in shell
use more quotes on all variables
no need to cat FILE | while : use while <condition>; do ...; done < FILE
USE MORE QUOTES! They are vital. Also, learn the difference between ' and " and `. See http://mywiki.wooledge.org/Quotes and http://wiki.bash-hackers.org/syntax/words

Bash array with spaces in elements

I'm trying to construct an array in bash of the filenames from my camera:
FILES=(2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg)
As you can see, there is a space in the middle of each filename.
I've tried wrapping each name in quotes, and escaping the space with a backslash, neither of which works.
When I try to access the array elements, it continues to treat the space as the elementdelimiter.
How can I properly capture the filenames with a space inside the name?

I think the issue might be partly with how you're accessing the elements. If I do a simple for elem in $FILES, I experience the same issue as you. However, if I access the array through its indices, like so, it works if I add the elements either numerically or with escapes:
for ((i = 0; i < ${#FILES[#]}; i++))
do
echo "${FILES[$i]}"
done
Any of these declarations of $FILES should work:
FILES=(2011-09-04\ 21.43.02.jpg
2011-09-05\ 10.23.14.jpg
2011-09-09\ 12.31.16.jpg
2011-09-11\ 08.43.12.jpg)
or
FILES=("2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg")
or
FILES[0]="2011-09-04 21.43.02.jpg"
FILES[1]="2011-09-05 10.23.14.jpg"
FILES[2]="2011-09-09 12.31.16.jpg"
FILES[3]="2011-09-11 08.43.12.jpg"

There must be something wrong with the way you access the array's items. Here's how it's done:
for elem in "${files[#]}"
...
From the bash manpage:
Any element of an array may be referenced using ${name[subscript]}. ... If subscript is # or *, the word expands to all members of name. These subscripts differ only when the word appears within double quotes. If the word is double-quoted, ${name[*]} expands to a single word with the value of each array member separated by the first character of the IFS special variable, and ${name[#]} expands each element of name to a separate word.
Of course, you should also use double quotes when accessing a single member
cp "${files[0]}" /tmp

You need to use IFS to stop space as element delimiter.
FILES=("2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg")
IFS=""
for jpg in ${FILES[*]}
do
echo "${jpg}"
done
If you want to separate on basis of . then just do IFS="."
Hope it helps you:)

I agree with others that it's likely how you're accessing the elements that is the problem. Quoting the file names in the array assignment is correct:
FILES=(
"2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg"
)
for f in "${FILES[#]}"
do
echo "$f"
done
Using double quotes around any array of the form "${FILES[#]}" splits the array into one word per array element. It doesn't do any word-splitting beyond that.
Using "${FILES[*]}" also has a special meaning, but it joins the array elements with the first character of $IFS, resulting in one word, which is probably not what you want.
Using a bare ${array[#]} or ${array[*]} subjects the result of that expansion to further word-splitting, so you'll end up with words split on spaces (and anything else in $IFS) instead of one word per array element.
Using a C-style for loop is also fine and avoids worrying about word-splitting if you're not clear on it:
for (( i = 0; i < ${#FILES[#]}; i++ ))
do
echo "${FILES[$i]}"
done

If you had your array like this:
#!/bin/bash
Unix[0]='Debian'
Unix[1]="Red Hat"
Unix[2]='Ubuntu'
Unix[3]='Suse'
for i in $(echo ${Unix[#]});
do echo $i;
done
You would get:
Debian
Red
Hat
Ubuntu
Suse
I don't know why but the loop breaks down the spaces and puts them as an individual item, even you surround it with quotes.
To get around this, instead of calling the elements in the array, you call the indexes, which takes the full string thats wrapped in quotes.
It must be wrapped in quotes!
#!/bin/bash
Unix[0]='Debian'
Unix[1]='Red Hat'
Unix[2]='Ubuntu'
Unix[3]='Suse'
for i in $(echo ${!Unix[#]});
do echo ${Unix[$i]};
done
Then you'll get:
Debian
Red Hat
Ubuntu
Suse

This was already answered above, but that answer was a bit terse and the man page excerpt is a bit cryptic. I wanted to provide a fully worked example to demonstrate how this works in practice.
If not quoted, an array just expands to strings separated by spaces, so that
for file in ${FILES[#]}; do
expands to
for file in 2011-09-04 21.43.02.jpg 2011-09-05 10.23.14.jpg 2011-09-09 12.31.16.jpg 2011-09-11 08.43.12.jpg ; do
But if you quote the expansion, bash adds double quotes around each term, so that:
for file in "${FILES[#]}"; do
expands to
for file in "2011-09-04 21.43.02.jpg" "2011-09-05 10.23.14.jpg" "2011-09-09 12.31.16.jpg" "2011-09-11 08.43.12.jpg" ; do
The simple rule of thumb is to always use [#] instead of [*] and quote array expansions if you want spaces preserved.
To elaborate on this a little further, the man page in the other answer is explaining that if unquoted, $* an $# behave the same way, but they are different when quoted. So, given
array=(a b c)
Then $* and $# both expand to
a b c
and "$*" expands to
"a b c"
and "$#" expands to
"a" "b" "c"

Not exactly an answer to the quoting/escaping problem of the original question but probably something that would actually have been more useful for the op:
unset FILES
for f in 2011-*.jpg; do FILES+=("$f"); done
echo "${FILES[#]}"
Where of course the expression would have to be adopted to the specific requirement (e.g. *.jpg for all or 2001-09-11*.jpg for only the pictures of a certain day).

For those who prefer set array in oneline mode, instead of using for loop
Changing IFS temporarily to new line could save you from escaping.
OLD_IFS="$IFS"
IFS=$'\n'
array=( $(ls *.jpg) ) #save the hassle to construct filename
IFS="$OLD_IFS"

Escaping works.
#!/bin/bash
FILES=(2011-09-04\ 21.43.02.jpg
2011-09-05\ 10.23.14.jpg
2011-09-09\ 12.31.16.jpg
2011-09-11\ 08.43.12.jpg)
echo ${FILES[0]}
echo ${FILES[1]}
echo ${FILES[2]}
echo ${FILES[3]}
Output:
$ ./test.sh
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg
Quoting the strings also produces the same output.

#! /bin/bash
renditions=(
"640x360 80k 60k"
"1280x720 320k 128k"
"1280x720 320k 128k"
)
for z in "${renditions[#]}"; do
echo "$z"
done
OUTPUT
640x360 80k 60k
1280x720 320k 128k
1280x720 320k 128k
`

Another solution is using a "while" loop instead a "for" loop:
index=0
while [ ${index} -lt ${#Array[#]} ]
do
echo ${Array[${index}]}
index=$(( $index + 1 ))
done

If you aren't stuck on using bash, different handling of spaces in file names is one of the benefits of the fish shell. Consider a directory which contains two files: "a b.txt" and "b c.txt". Here's a reasonable guess at processing a list of files generated from another command with bash, but it fails due to spaces in file names you experienced:
# bash
$ for f in $(ls *.txt); { echo $f; }
a
b.txt
b
c.txt
With fish, the syntax is nearly identical, but the result is what you'd expect:
# fish
for f in (ls *.txt); echo $f; end
a b.txt
b c.txt
It works differently because fish splits the output of commands on newlines, not spaces.
If you have a case where you do want to split on spaces instead of newlines, fish has a very readable syntax for that:
for f in (ls *.txt | string split " "); echo $f; end

If the elements of FILES come from another file whose file names are line-separated like this:
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg
then try this so that the whitespaces in the file names aren't regarded as delimiters:
while read -r line; do
FILES+=("$line")
done < ./files.txt
If they come from another command, you need to rewrite the last line like this:
while read -r line; do
FILES+=("$line")
done < <(./output-files.sh)

I used to reset the IFS value and rollback when done.
# backup IFS value
O_IFS=$IFS
# reset IFS value
IFS=""
FILES=(
"2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg"
)
for file in ${FILES[#]}; do
echo ${file}
done
# rollback IFS value
IFS=${O_IFS}
Possible output from the loop:
2011-09-04 21.43.02.jpg
2011-09-05 10.23.14.jpg
2011-09-09 12.31.16.jpg
2011-09-11 08.43.12.jpg

Assigning command's output to shell variable and get the variables size

I have a file consisting of digits. Usually, each line contains one single number. I would like to count the number of lines in the file that begin with digit '0'. If it's the case, then I would like to do some post-processing.
Although I'm able to retrieve correctly the corresponding line numbers, the total number of retrieved lines is not correct. Below, I'm posting the code that I'm using.
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
# linesToRemove=$(grep -n "^0" ${inputFile} | cut -d":" -f1);
linesNr=${#linesToRemove} # <- here, the error
# linesNr=${#linesToRemove[#]} # <- here, the error
if [ "${linesNr}" -gt "0" ]; then
# do something here, e.g. remove corresponding lines.
awk -v n=$linesToRemove 'NR == n {next} {print}' ${anotherFile} > ${outputFile}
fi
Also, as for the awk-based command, how could I use a shell-variable? I tried the command below, but it's not working correctly, since 'myIndex' is interpreted as a text and not as a variable.
linesToRemove=$(awk -v myIndex="$myIndex" '/^myIndex/ { print NR;}' ${inputFile});
Given the line numbers starting with 0 found in ${inputFile}, I would like to remove the corresponding lines numbers from ${anotherFile}. An example for both ${inputFile} and ${anotherFile} is given below:
// ${inputFile}
0
1
3
0
// ${anotherFile}
2.617300e+01 5.886700e+01 -1.894697e-01 1.251225e+02
5.707397e+01 2.214040e+02 8.607959e-02 1.229114e+02
1.725900e+01 1.734360e+02 -1.298053e-01 1.250318e+02
2.177940e+01 1.249531e+02 1.538853e-01 1.527150e+02
// ${outputFile}
5.707397e+01 2.214040e+02 8.607959e-02 1.229114e+02
1.725900e+01 1.734360e+02 -1.298053e-01 1.250318e+02
In the example above, I need to delete lines 0 and 3 from ${anotherFile}, given that those lines correspond to the lines starting with 0 in ${inputFile}.

If you want to count the number of lines in the file that begins with 0, then this line is wrong.
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
The above says to print the line number when the line start with 0, and your linesToRemove variable will contain all the line numbers, not the total number of lines. Use END{} block to capture the total. eg
linesToRemove=$(awk '/^0/ {c++}END{print c}' ${inputFile});
As for your 2nd question on using variable inside awk, use the regex operator ~. And then set your myIndex variable to include the ^ anchor
linesToRemove=$(awk -v myIndex="^$myIndex" '$0 ~ myIndex{ print NR;}' ${inputFile});
finally, if you just want to remove those lines that start with 0, then just simply remove it
awk '/^0/{next}{print $0>FILENAME}' file
If you want to remove lines from another file using what is captured in input file, here's one way
paste -d"|" inputfile anotherfile | awk '!/^0/{gsub(/^.*\|/,"");print}'
Or just one awk command
awk 'FNR==NR && /^0/{a[FNR]} NR>FNR && (!(FNR in a))' inputfile anotherfile
crude explanation: FNR==NR && /^0/ means process the first file whole line starts with 0 and put its line number into array a. NR>FNR means process the next file and if line number not in array, print the line. See the gawk documentation for what FNR,NR etc means

I think you have to do the following to assign an array:
linesToRemove=( $(awk '/^0/ { print NR; }' ${inputFile}) )
And to get the number of elements do (as you have in a commented line):
linesNr=${#linesToRemove[#]}
To remove the lines from from the file you could do something like:
sedCmd=""
for lineNr in ${linesToRemove[#]}; do
sedCmd="$sedCmd;${lineNr}d"
done
sed "$sedCmd" ${anotherFile} > ${outputFile}

In general if you do this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
instead of this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
linesNr=${#linesToRemove}
use this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
linesNr=${echo $linesToRemove|awk '{print NF}'}
POC :
cat temp.sh
#!/usr/bin/ksh
lines=$(awk '/^d/{print NR}' script.sh)
nooflines=$(echo $lines|awk '{print NF}')
echo $nooflines
torinoco!DBL:/oo_dgfqausr/test/dfqwrk12/vijay> temp.sh
8
torinoco!DBL:/oo_dgfqausr/test/dfqwrk12/vijay>

It greatly depends on the post-processing you are doing, but do you really need the actual count? Why not do something like this:
if grep ^0 $inputfile > /dev/null; then
# There is at least one line with a leading 0
:
fi
grep -v ^0 $inputfile | process-lines-without-leading-zero
grep ^0 $inputfile | process-lines-with-leading-zero
Or, even just:
if grep ^0 $inputfile | process-lines-with-leading-zero; then
# some post processing
:
fi
--EDIT--
Based on what you've said in your comment, I would recommend a different approach. If I understand you correctly, you want to read file a, looking for lines of the form ^0[0-9]*,
and then remove those line numbers from file b. Doing it one line at a time is pretty slow if the files get big. Just do:
cmd=$( grep '^0[0-9]*$' a | sed 's/$/d;/g' )
sed "$cmd" b
The assignment to cmd forms a sed command to delete the lines. Invoking sed on b will omit those lines. You'll need to redirect the sed output appropriately (perhaps to a temp file and then back to b, or just use 'sed -i' if you're using gnu sed.)

Given the large number of edits to this question, it seems easiest to start a new answer. Your problem can be solved with a simple one-liner:
$ sed "$( grep -n ^0 $inputFile | sed 's/:.*/d;/g' )" $anotherFile > $outputFile

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Read a file's content (with spaces) into an array - arrays

This will work with bash, possibly with sh: OLDIFS="$IFS" IFS=$'\n' expressions=() while read line do expressions=("${expressions[#]}" "$line") done < expressions.txt IFS="$OLDIFS"

use (g)awk, or nawk on Solaris awk 'BEGIN{srand() } { lines[++d]=$0 }END{ RANDOM = int(1 + rand() * d) print lines[RANDOM] }' expression.txt

Related

How to write a shell script which accepts array in the command line and outputs the sum of it [duplicate]

Using array inside awk in shell script

Shell to resolve output issue - issue in concatenation and file reading

Bash array with spaces in elements

Assigning command's output to shell variable and get the variables size

Categories

Resources