Weird array output in shell script - arrays

I extract content from a file, split it into an array, and print the elements in a loop. Everything prints as expected except the last element, which comes out garbled. My code is below.
How can I resolve this problem?
[config.json]
{
"id": "hello",
"passwd": "1234",
"languageList": ["ko", "en"]
}
[test.sh]
# BEFORE_CONFIG and AFTER_CONFIG have same code
BEFORE_CONFIG=~/workspace/env/config.json
AFTER_CONFIG=~/workspace/config/config.json
BEF_LANG=$(grep "\[" ${BEFORE_CONFIG} | tr -d '\[' | tr -d '\]' | tr -d '"' | tr -d ' ' | cut -d ":" -f2)
AF_LANG=$(grep "\[" ${AFTER_CONFIG} | tr -d '\[' | tr -d '\]' | tr -d '"' | tr -d ' ' | cut -d ":" -f2)
echo "before lang :: ${BEF_LANG}"
echo "after lang :: ${AF_LANG}"
IFS=',' read -r -a AF_LANG_LIST <<< "$AF_LANG"
echo "after lang list print >> ${AF_LANG_LIST[@]}"
echo "list length >> ${#AF_LANG_LIST[@]}"
for element in ${AF_LANG_LIST[@]}
do
echo "${element}"
echo "This language !!! ${element} !!! print !!!!"
done
[result]
$ source test.sh
before lang :: ko,en
after lang :: ko,en
after lang list >> ko en
list length >> 2
ko
This language !!! ko !!! print !!!!
en
!!! print !!!!!! en # expect result → This language !!! en !!! print !!!!

You can use jq to parse the JSON correctly. To extract languageList, run:
jq -r '.languageList[]' ~/workspace/env/config.json | xargs
which will output:
ko en
You can then use that value later in your script.
Parsing JSON with tr and cut is error-prone, because it depends on the exact layout of the file.
Here is an example based on your script:
#!/bin/bash
# BEFORE_CONFIG and AFTER_CONFIG have same code
BEFORE_CONFIG=~/workspace/env/config.json
AFTER_CONFIG=~/workspace/config/config.json
BEF_LANG=$(grep "\[" ${BEFORE_CONFIG} | tr -d '\[' | tr -d '\]' | tr -d '"' | tr -d ' ' | cut -d ":" -f2)
#AF_LANG=$(grep "\[" ${AFTER_CONFIG} | tr -d '\[' | tr -d '\]' | tr -d '"' | tr -d ' ' | cut -d ":" -f2)
AF_LANG=$(jq -r '.languageList[]' "${AFTER_CONFIG}" | xargs)
echo "before lang :: ${BEF_LANG}"
echo "after lang :: ${AF_LANG}"
# you do not need this
#IFS=',' read -r -a AF_LANG_LIST <<< "$AF_LANG"
#echo "after lang list print >> ${AF_LANG_LIST[@]}"
#echo "list length >> ${#AF_LANG_LIST[@]}"
# AF_LANG is a space-separated string, so rely on word splitting here
for element in ${AF_LANG}
do
echo "${element}"
echo "This language !!! ${element} !!! print !!!!"
done
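The garbled last line in the question is consistent with a trailing carriage return on the last element, for example from a config.json saved with Windows (CRLF) line endings: the terminal jumps back to column 0 and the rest of the line overwrites what was already printed. The following is a minimal sketch under that assumption. The sample string is a stand-in for the output of the grep/tr/cut pipeline, not data read from the real file.

```bash
#!/bin/bash
# Stand-in for the pipeline output when config.json has CRLF line endings
# (assumption: the real file ends each line with \r\n).
AF_LANG=$'ko,en\r'

# Strip the trailing carriage return before splitting on commas.
AF_LANG=${AF_LANG%$'\r'}
IFS=',' read -r -a AF_LANG_LIST <<< "$AF_LANG"

# Quote the expansion so each element stays intact.
for element in "${AF_LANG_LIST[@]}"; do
    echo "This language !!! ${element} !!! print !!!!"
done
```

If this is the cause, converting the file once (for instance with dos2unix, or by piping through tr -d '\r') should have the same effect as stripping the character in the script.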

Related

Read delimited multiline string file into multiple arrays in Bash

I began with a file like so:
Table_name1 - Table_desc1
Table_name2 - Table_desc2
...
...
I have a script that parses this file and splits them into two arrays:
declare -a TABLE_IDS=()
declare -a TABLE_DESCS=()
while IFS= read -r line || [[ -n "${line}" ]]; do
TABLE_IDS[i]=${line%' '-' '*}
TABLE_DESCS[i++]=${line#*' '-' '}
done < "${TABLE_LIST}"
for i in "${!TABLE_IDS[@]}"; do
echo "Creating Table ID: "${TABLE_IDS[i]}", with Table Description: "${TABLE_DESCS[i]}""
done
This works really well, with no problems whatsoever.
I wanted to extend this and make the file:
Table_name1 - Table_desc1 - Table_schema1
Table_name2 - Table_desc2 - Table_schema2
...
...
For this, I tried:
declare -a TABLE_IDS=()
declare -a TABLE_DESCS=()
while IFS= read -r line || [[ -n "${line}" ]]; do
TABLE_IDS[i]="$(echo $line | cut -f1 -d - | tr -d ' ')"
TABLE_DESCS[i++]="$(echo $line | cut -f2 -d - | tr -d ' ')"
TABLE_SCHEMAS[i++]="$(echo $line | cut -f3 -d - | tr -d ' ')"
done < "${TABLE_LIST}"
for i in "${!TABLE_IDS[@]}"; do
echo "Creating Table ID: "${TABLE_IDS[i]}", with Table Description: "${TABLE_DESCS[i]}" and schema: "${TABLE_SCHEMAS[i]}""
done
And while this will faithfully list all the Table IDs and the Table descriptions, the schemas are omitted. I tried:
while IFS= read -r line || [[ -n "${line}" ]]; do
TABLE_IDS[i]="$(echo $line | cut -f1 -d - | tr -d ' ')"
TABLE_DESCS[i]="$(echo $line | cut -f2 -d - | tr -d ' ')"
TABLE_SCHEMAS[i]="$(echo $line | cut -f3 -d - | tr -d ' ')"
done < "${TABLE_LIST}"
And it returns just the last line's Table name, description AND schema. I suspect this is an indexing/looping problem, but am unable to figure out what exactly is going wrong. Please help! Thanks!
Perhaps set IFS to the actual delimiter, -, and do the processing inside the read loop itself instead of storing everything in arrays first.
$ while IFS=- read -r t d s;
do
echo "Creating Table ID: ${t// }, with Table Description: ${d// } and schema: ${s// }";
done < file
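If the arrays are still needed, the two attempts in the question fail for different reasons: the first uses i++ twice per line, so the schema is stored one index away from its ID and description, and the second never increments i, so every line overwrites the same slot. A minimal sketch that advances the index exactly once per line is shown here. It assumes fields are separated by " - " and do not contain that sequence themselves; the temporary file is a hypothetical stand-in for TABLE_LIST.

```bash
#!/bin/bash
# Hypothetical input file with " - " separated fields.
TABLE_LIST=$(mktemp)
printf '%s\n' 'Table_name1 - Table_desc1 - Table_schema1' \
              'Table_name2 - Table_desc2 - Table_schema2' > "$TABLE_LIST"

declare -a TABLE_IDS=() TABLE_DESCS=() TABLE_SCHEMAS=()
i=0
while IFS= read -r line || [[ -n "$line" ]]; do
    # Peel the fields off using the full " - " separator.
    TABLE_IDS[i]=${line%%' - '*}
    rest=${line#*' - '}
    TABLE_DESCS[i]=${rest%%' - '*}
    TABLE_SCHEMAS[i]=${rest#*' - '}
    i=$((i + 1))    # advance the index exactly once per line
done < "$TABLE_LIST"
rm -f "$TABLE_LIST"

for i in "${!TABLE_IDS[@]}"; do
    echo "Creating Table ID: ${TABLE_IDS[i]}, with Table Description: ${TABLE_DESCS[i]} and schema: ${TABLE_SCHEMAS[i]}"
done
```

Using parameter expansion instead of echo | cut | tr also avoids three subprocess pipelines per line and keeps spaces inside a description intact.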

Array job gives error in bash

As I want to run several simulations with different values in R, I have been recommended to use a job array in bash.
1) I generated the combination of parameters and saved it in a txt file, called parameters.txt.
2) I want now to use each combination of parameters into R. Each combination is represented by a line of 3 numbers (the 3 parameters) in parameters.txt.
When I run my script, an error message appears :
head: parameters.txt: invalid number of lines
head: parameters.txt: invalid number of lines
head: parameters.txt: invalid number of lines
Job array item : rx=, ry=, rz=
Here is my script:
# Sweeping parameters.txt
N=${SLURM_ARRAY_TASK_ID}
rx=`head -n ${N} parameters.txt | tail -n 1 | cut -d' ' -f1`
ry=`head -n ${N} parameters.txt | tail -n 1 | cut -d' ' -f2`
rz=`head -n ${N} parameters.txt | tail -n 1 | cut -d' ' -f3`
# Display
echo "Job array item $N: rx=$rx, ry=$ry, rz=$rz"
echo "---------------------------------"
# Run
R CMD BATCH ex.R $rx $ry $rz
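It seems SLURM_ARRAY_TASK_ID is not set, and as a result N is empty here:
N=${SLURM_ARRAY_TASK_ID}
Bash then expands the command to the following, so head takes parameters.txt as the line count and reports "invalid number of lines":
rx=`head -n parameters.txt ...
You can guard against this with an if statement as follows: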
N=${SLURM_ARRAY_TASK_ID}
if [ -n "${N}" ]; then
rx=`head -n ${N} parameters.txt | tail -n 1 | cut -d' ' -f1`
ry=`head -n ${N} parameters.txt | tail -n 1 | cut -d' ' -f2`
rz=`head -n ${N} parameters.txt | tail -n 1 | cut -d' ' -f3`
# Display
echo "Job array item $N: rx=$rx, ry=$ry, rz=$rz"
echo "---------------------------------"
# Run
R CMD BATCH ex.R $rx $ry $rz
else
echo "SLURM_ARRAY_TASK_ID is not set, so N is empty"
fi
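The three head | tail | cut pipelines can also be collapsed into a single read of line N. The sketch below assumes one space-separated combination of three numbers per line, as described in the question; read_params is a hypothetical helper name, and the fallback value of 2 exists only so the example runs outside of Slurm.

```bash
#!/bin/bash
# read_params FILE N: set rx, ry, rz from line N of FILE (hypothetical helper).
# Returns non-zero if line N does not exist.
read_params() {
    read -r rx ry rz < <(sed -n "${2}p" "$1")
}

# Hypothetical parameters file: one combination of three numbers per line.
params=$(mktemp)
printf '%s\n' '0.1 0.2 0.3' '1 2 3' '10 20 30' > "$params"

# Fall back to 2 only so this sketch runs outside of Slurm (assumption).
N=${SLURM_ARRAY_TASK_ID:-2}

if read_params "$params" "$N"; then
    echo "Job array item $N: rx=$rx, ry=$ry, rz=$rz"
else
    echo "No line $N in parameters file" >&2
fi
rm -f "$params"
```

Slurm sets SLURM_ARRAY_TASK_ID only when the job is submitted as an array, for example with the --array option of sbatch, so running the script directly leaves it unset.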

Splitting files into multiple files based on some pattern and take some information

I'm working with a lot of files with this structure:
BEGIN
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=1393
PEPMASS=946.3980102539062
CHARGE=3.0+
USER03=
SEQ=DDDIAAL
TAXONOMY=9606
272.228 126847.000
273.252 33795.000
END
BEGIN IONS
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=1383
PEPMASS=911.3920288085938
CHARGE=2.0+
USER03=
SEQ=QGKFEAAETLEEAAMR
TAXONOMY=9606
1394.637 71404.000
1411.668 122728.000
END
BEGIN IONS
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=2965
PEPMASS=946.3900146484375
CHARGE=3.0+
TAXONOMY=9606
1564.717 92354.000
1677.738 33865.000
END
This structure is repeated thousands of times but with different data inside. As you can see, between some begin-end, sometimes SEQ and USER03 are not there. This is because the protein is not identified ... And here comes my problem.
I would like to know how many proteins are identified and how many are unidentified. To do this I was trying this:
for i in $(ls *.txt ); do
echo $i
awk '/^BEGIN/{n++;w=1} n&&w{print > "./cache/out" n ".txt"} /^END/{w=0}' $i
done
I found this here (Split a file into multiple files based on a pattern and name the new files by the search pattern in Unix?)
And then use the outputs and classify them:
for i in $(ls cache/*.txt ); do
echo $i
if grep -q 'SEQ' $i; then
mv $i ./archive_identified
else
mv $i ./archive_unidentified
fi
done
After this, I'd like to take some data (Example: spectrum, USER03, SEQ, TAXONOMY) from classified files.
for i in $( ls archive_identified/*.txt ); do
echo $i
grep 'SEQ' $i | cut -d "=" -f2- | tr ',' '\n' >> ./sequences_ide.txt
grep 'TAXONOMY' $i | cut -d "=" -f2- | tr ',' '\n' >> ./taxonomy_ide.txt
grep 'USER' $i | cut -d "=" -f2- >> ./modifications_ide.txt
grep 'TITLE' $i | sed 's/^.*\(spectrum.*\)/\1/g' | cut -d "=" -f2- >> ./spectrum.txt
done
for i in $( ls archive_unidentified/*.txt ); do
echo $i
grep 'SEQ' $i | cut -d "=" -f2- | tr ',' '\n' >> ./sequences_unide.txt
grep 'TAXONOMY' $i | cut -d "=" -f2- | tr ',' '\n' >> ./taxonomy_unide.txt
grep 'USER' $i | cut -d "=" -f2- >> ./modifications_unide.txt
grep 'TITLE' $i | sed 's/^.*\(spectrum.*\)/\1/g' | cut -d "=" -f2- >> ./spectrum_unide.txt
done
The problem is that the first part of the script takes too long because the data is large (12-15 GB). Is there an easier way to do this?
Thank you in advance.
You can do it all in one awk script. awk iterates over all records itself, so you don't need an external loop. Setting RS to the empty string makes awk treat each block separated by blank lines as one record. For example, for the data file you provided
$ awk -v RS= '/\nSEQ/ {seq++; print > "file_path_with_seq" NR ".txt"; next}
{noseq++; print > "file_path_without_seq" NR ".txt"}
END { print "with seq:", seq;
print "without seq:", noseq}' file
will print
with seq: 2
without seq: 1
and produces the files
$ head file_path_with*
==> file_path_with_seq1.txt <==
BEGIN
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=1393
PEPMASS=946.3980102539062
CHARGE=3.0+
USER03=
SEQ=DDDIAAL
TAXONOMY=9606
272.228 126847.000
273.252 33795.000
END
==> file_path_with_seq2.txt <==
BEGIN IONS
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=1383
PEPMASS=911.3920288085938
CHARGE=2.0+
USER03=
SEQ=QGKFEAAETLEEAAMR
TAXONOMY=9606
1394.637 71404.000
1411.668 122728.000
END
==> file_path_without_seq3.txt <==
BEGIN IONS
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=2965
PEPMASS=946.3900146484375
CHARGE=3.0+
TAXONOMY=9606
1564.717 92354.000
1677.738 33865.000
END
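If the end goal is only the counts and a few fields per record, the intermediate files can be skipped entirely. The sketch below is a variation on the answer above, not the answer's own method: it tracks records by their BEGIN and END lines rather than by blank lines, and prints the spectrum number and sequence for each record. The sample file is a shortened, hypothetical stand-in for the real data.

```bash
#!/bin/bash
# Shortened stand-in for one of the spectrum files.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BEGIN IONS
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=1393
SEQ=DDDIAAL
TAXONOMY=9606
272.228 126847.000
END
BEGIN IONS
TITLE=id=PRD000012;PRIDE_Exp_Complete_Ac_1645.xml;spectrum=2965
TAXONOMY=9606
1564.717 92354.000
END
EOF

# One pass over the file: remember fields per record, classify at END.
summary=$(awk '
    /^BEGIN/  { has_seq = 0; spectrum = "" }      # start of a record: reset state
    /^TITLE=/ { spectrum = $0; sub(/.*spectrum=/, "", spectrum) }
    /^SEQ=/   { has_seq = 1; seq = substr($0, 5) }
    /^END/    {                                   # end of a record: classify it
        if (has_seq) { identified++;   print spectrum, seq }
        else         { unidentified++; print spectrum, "-" }
    }
    END { print "identified:", identified + 0
          print "unidentified:", unidentified + 0 }
' "$sample")
rm -f "$sample"

printf '%s\n' "$summary"
```

Because awk accepts several input files in one invocation, passing *.txt instead of a single file would process the whole directory without a shell loop.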

shell script array won't populate from for loop

Can anyone tell me why this array creation: cccr[$string_1]=$string_2 #doesn't work?
#!/bin/bash
firstline='[Event "Marchand Open"][Site "Rochester NY"][Date "2005.03.19"][Round "1"][White "Smith, Igor"][Black "Jones, Matt"][Result "1-0"][ECO "C01"][WhiteElo "2409"][BlackElo "1911"]'
unset cccr
declare -A cccr
(IFS='['; for word in $firstline; do
string_1=$(echo $word | cut -f1 -d'"' | tr -d ' ')
string_2=$( echo $word | cut -f2 -d'"' )
if [ ! -z $string_1 ]; then # If $string_1 is not empty
cccr[$string_1]=$string_2 # why doesn't this line work?
fi
done)
echo ${cccr[Event]} # echos null string
It happens because the value of string_1 is empty at the first iteration.
Example :
#!/bin/bash
firstline='[Event "Marchand Open"][Site "Rochester NY"][Date "2005.03.19"][Round "1"][White "Smith, Igor"][Black "Jones, Matt"][Result "1-0"][ECO "C01"][WhiteElo "2409"][BlackElo "1911"]'
unset cccr
declare -A cccr
(IFS='['; for word in $firstline; do
string_1=$( echo $word | cut -f1 -d'"' )
string_2=$( echo $word | cut -f2 -d'"' )
echo "$string_1 - $string_2"
#cccr[$string_1]=$string_2
done)
Output :
- # Problem !
Event - Marchand Open
Site - Rochester NY
...
You have to modify your script so that string_1 is never used as an array key while it is empty.
A very simple workaround is to check the value of string_1 before using it.
Example :
# ...
string_1=$( echo $word | cut -f1 -d'"' )
string_2=$( echo $word | cut -f2 -d'"' )
if [ ! -z $string_1 ]; then # If $string_1 is not empty
echo "$string_1 - $string_2"
cccr[$string_1]=$string_2
fi
# ...
From the man page of [
-z STRING
the length of STRING is zero
Output :
Event - Marchand Open
Site - Rochester NY
# ... No problem
EDIT
BTW, if you look at the value of string_1, you will see that it is 'Event ' and not 'Event' (there is a trailing space after Event).
So cccr[Event] does not exist, but cccr[Event ] exists.
To fix that, you can delete the whitespace in string_1:
string_1=$(echo $word | cut -f1 -d'"' | tr -d ' ') # tr -d ' ' deletes all the whitespaces
EDIT 2
I forgot to mention that the original script cannot work as written, whatever the keys are: the loop is executed in a subshell environment, so the array is filled in the subshell but not in the current shell.
From the man page of bash :
(list) list is executed in a subshell environment (see COMMAND EXECUTION ENVIRONMENT below). Variable
assignments and builtin commands that affect the shell's environment do not remain in effect
after the command completes. The return status is the exit status of list.
So there are 2 solutions :
1. Don't run the loop in a subshell (remove the parentheses).
# ...
OLDIFS=$IFS
IFS='['
for word in $firstline; do
string_1=$(echo $word | cut -f1 -d'"' | tr -d ' ')
string_2=$(echo $word | cut -f2 -d'"')
if [ ! -z $string_1 ]; then
cccr[$string_1]=$string_2
fi
done
IFS=$OLDIFS
echo "Event = ${cccr[Event]}"
echo "Site = ${cccr[Site]}"
Output :
Event = Marchand Open
Site = Rochester NY
2. Use your array in the subshell.
# ...
(IFS='['
for word in $firstline; do
string_1=$(echo $word | cut -f1 -d'"' | tr -d ' ')
string_2=$(echo $word | cut -f2 -d'"')
if [ ! -z $string_1 ]; then # If $string_1 is not empty
cccr[$string_1]=$string_2
fi
done
echo "Event = ${cccr[Event]}"
echo "Site = ${cccr[Site]}"
)
Output :
Event = Marchand Open
Site = Rochester NY
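A third possibility avoids both the subshell and the temporary change to IFS by matching one [Key "Value"] tag at a time with a bash regular expression. This is a sketch rather than a replacement for the answer above; it assumes bash 4 or later (for declare -A), keys made of letters and digits only, and values that contain no double quotes. The variable names re and rest are introduced here for illustration.

```bash
#!/bin/bash
firstline='[Event "Marchand Open"][Site "Rochester NY"][Date "2005.03.19"][Result "1-0"]'

declare -A cccr=()
# One tag: literal "[", key, space, quoted value, literal "]".
# A "]" outside a bracket expression is an ordinary character in an ERE.
re='\[([[:alnum:]]+) "([^"]*)"]'
rest=$firstline
while [[ $rest =~ $re ]]; do
    cccr[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
    # Drop everything up to and including the tag that just matched.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

echo "Event = ${cccr[Event]}"
echo "Site = ${cccr[Site]}"
```

Since the key is captured without the trailing space and the loop runs in the current shell, no tr call or IFS bookkeeping is needed.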

store the values from command into an array bash

svn mergeinfo --show-revs eligible http://svn.test.com/INT_1.0.0/ http://svn.test.com/DEV/ | cut -d"r" -f2 | cut -d" " -f1
6097
6099
When I put this in a script, I get only the last value instead of all of them:
#!/usr/bin/bash
src_url="http://svn.test.com/INT_1.0.0/"
target_url="http://svn.test.com/DEV/"
eligible_revs=(`svn mergeinfo --show-revs eligible $src_url $target_url | cut -d"r" -f2 | cut -d" " -f1`)
echo ${eligible_revs[@]}
output:
6099
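If you are running Cygwin, Windows line endings (CRLF) can cause this. The array does hold every value, but each one ends in a carriage return, which moves the cursor back to the start of the line, so later values overwrite earlier ones on screen: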
$ foo=(`printf 'bar\r\nbaz'`)
$ echo ${foo[*]}
baz
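One way to work around this is to strip the carriage returns before the values reach the array. In the sketch below, printf stands in for the svn | cut pipeline and is assumed to emit CRLF line endings; mapfile requires bash 4 or later.

```bash
#!/bin/bash
# printf stands in for the svn | cut pipeline and emits CRLF line endings,
# as a Windows-built tool under Cygwin might (assumption).
mapfile -t eligible_revs < <(printf '6097\r\n6099\r\n' | tr -d '\r')

echo "${#eligible_revs[@]} revisions: ${eligible_revs[*]}"
```

Printing the element count alongside the values makes it easy to tell a display problem from a genuinely short array.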
