I want to save each output filepath to a variable and then grep through them to find the timestamp. I want to label each variable by adding the nodeId from the node list I am looping through. When I try this with the following code, I get an error:
output1_1: command not found
nodeList=('1_1' '1_6' '2_1' '2_6')
for i in "${nodeList[@]}"
do
output${i}=$CWD/output/abc${i}.txt
times${i}=$(grep -m 1 '\"path\":' $output${i}| sed 's/.*timestampUtc\"://g' | sed 's/,.*//g')
done
As @muru suggested, try
declare -A output times
nodeList=('1_1' '1_6' '2_1' '2_6')
for i in "${nodeList[@]}"; do
output[${i}]=${PWD}/output/abc${i}.txt
times[${i}]=$(grep -m 1 '\"path\":' ${output[${i}]} | sed 's/.*timestampUtc\"://g' | sed 's/,.*//g')
done
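To read the results back, loop over the array keys (a minimal usage sketch):
for i in "${!output[@]}"; do
  printf 'node %s: file=%s time=%s\n' "$i" "${output[$i]}" "${times[$i]}"
done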
One way to set a variable whose name is derived from another variable is to use the -v option to 'printf'.
To get the value of a variable whose name is in another variable (e.g. varname) use ${!varname}.
See Creating a string variable name from the value of another string and How can I generate new variable names on the fly in a shell script?.
Using these a possible loop body is:
output_var=output${i}
times_var=times${i}
printf -v "$output_var" '%s' "$CWD/output/abc${i}.txt"
printf -v "$times_var" '%s' "$(grep -m 1 '\"path\":' "${!output_var}" | sed 's/.*timestampUtc\"://g' | sed 's/,.*//g')"
Note that (unless CWD is set elsewhere in your program) you probably want $PWD instead of $CWD.
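After the loop, you can read any of the values back with the same indirect expansion, e.g. for the first node id in the list:
var=times1_1
echo "${!var}"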
Related
I'm trying to make a shell script that takes an input file (thousands of lines) and produces an output file that is the same except that certain lines will have added text. When the text is added to the middle of a line, the exact added text depends on a substring of that line. The mapping between the substring and the added text is complex and comes from an external program that I can call from the shell. I don't have the source for this converter program, nor any control over how the mapping is done.
To explain further...
I have an input file of this general format:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, output_1),
FIELD_INFO(field_name_2, output_2),
Yadda Yadda
The whole file needs to be copied, with added text, but the only important parts for me are the field names (e.g. field_name_1, field_name_2). I have a command line program called "converter" that can take a file of field names and output a list of corresponding actions. Converter cannot operate directly on the input file. The input to converter needs to be just field names and the output of converter has extra information I don't need:
converter_field_name_1 "action1" /* Use this action for field_name_1 */
converter_field_name_2 "action2" /* use this action for field_name_2 */
The desire is to create a second file that looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, action1, output_1),
FIELD_INFO(field_name_2, action2, output_2),
Yadda Yadda
Here is the script I'm working on, but I've hit a wall (or two):
#!/bin/bash
filename="input_file"
# Let's create an array of the field names to feed to the converter program
field_array=($(sed -e '/^\s*FIELD_INFO/ s/FIELD_INFO(\(.*\),.*),/\1/' -e 't' -e 'd' < ${filename}))
# Save the array to a file, to be able to use the converter's file option
printf "%s\n" "${field_array[#]}" > script_field_names.txt
# Use converter on the whole file and extract only the actions into another array
action_array=($(converter -f script_field_names.txt | cut -d'"' -f 2))
# I will make and use an associative array and try to use
# sed to do the substitution
declare -A mapper
for i in ${!field_array[*]}
do
mapper[${field_array[i]}]=${action_array[i]}
done
#Now go back through the file and add action names (source file unchanged)
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
I know now that I can't use the sed group capture "\1" as an index into the mapper array like this. It is not working as a key and the output looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, , output_1),
FIELD_INFO(field_name_2, , output_2),
Yadda Yadda
My actual script has debug statements scattered throughout, and I know the field array, action array, and mapper array are all getting created correctly. But my idea of using the captured substring from sed as the index into the mapper array is not working, because I now know the shell expands the variables before sed ever runs, so the mapper[] array never sees the substring as an index.
What should I be doing instead? This script may only be used once, but it's too time consuming and error prone to do the addition of the action strings by hand. I want to come up with a way to make this work but I can't tell if I'm close or completely on the wrong path.
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
[...]
I now know the shell expands the variables before sed ever runs, so the mapper[] array never sees the substring as an index.
Good job identifying the problem. Also, the non-greedy quantifier .*? does not work with sed and ${mapper[\1], \2} should probably be ${mapper[\1]}, \2.
If you want to keep your current approach I see two options.
Do the replacement line by line in bash, either by creating a giant sed command string that lists the action for each line, or by executing sed inside a loop for each line while creating the command strings on the fly.
Instead of the array mapper, create a file that lists the actions to be inserted, in the order they occur in the file. Then use GNU sed's R filename command, which inserts the next line from filename. You can use this to insert the correct action each time you come across a field. However, the line break is inserted too, so you have to fiddle with the hold space and so on to remove these line breaks afterwards.
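For example, the core mechanics of the second option might look like this (a sketch only: each matching line gets the next line from the actions file appended after it, on its own line, which is why the cleanup step is still needed):
sed '/^\s*FIELD_INFO(/R actions' "$filename"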
Both options are not that great. Therefore I'd switch to awk to insert the actions:
sed -En 's/^\s*FIELD_INFO\(([^,]*).*/\1/p' "$filename" > fields
converter -f fields | cut -d\" -f2 > actions
awk '/^\s*FIELD_INFO\(/ {getline a < "actions"; sub(",", ", " a ",")} 1' "$filename"
With GNU grep you can simplify the first line to
grep -Po '^\s*FIELD_INFO\(\K[^,]*' "$filename" > fields
Why not try,
sed -n -e 's/^[ ]*FIELD_INFO(\(.*\),.*,/\1/p' -- input_file > script_field_names.txt
printf '/^[ ]*FIELD_INFO(%s,/ s/(\\(.[^,]*\\), \\(.[^)]*\\))/(\\1, %s, \\2)/\n' \
$(converter -f script_field_names.txt | cut -d'"' -f 2 |
paste -- script_field_names.txt -) |
sed -f /dev/stdin -- input_file
where
paste emits the map of fields (from file) and actions (from stdin)
printf emits a script read by sed from stdin
each script line becomes: /^[ ]*FIELD_INFO(fieldnameN,/ s/(\(.[^,]*\), \(.[^)]*\))/(\1, actionN, \2)/
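With the sample input above, and assuming converter maps field_name_1 to action1 and field_name_2 to action2 as in the example output earlier, the generated script fed to the final sed would be:
/^[ ]*FIELD_INFO(field_name_1,/ s/(\(.[^,]*\), \(.[^)]*\))/(\1, action1, \2)/
/^[ ]*FIELD_INFO(field_name_2,/ s/(\(.[^,]*\), \(.[^)]*\))/(\1, action2, \2)/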
Problem:
Writing a bash script, I'm trying to import a list of products from a csv file into an array:
#!/bin/bash
PRODUCTS=(`csvprintf -f "/home/test/data/input.csv" -x | grep "col2" | sed 's/<col2>//g' | sed 's/<\/col2>//g' | sed -n '1!p' | sed '$ d' | sed 's/ //g'`)
echo ${PRODUCTS[@]}
In the interactive shell, the result/output looks perfect as following:
burger
special fries
juice - 300ml
When I use exactly the same commands in a bash script, even when debugging with bash -x script.sh, the echo ${PRODUCTS[@]} part shows that the array holds all the file names located at /home/test/data/ plus:
burger
special
fries
juice
-
300ml
The array picks up the directory listing AND the newlines get messed up. This doesn't happen in the interactive shell (single command line).
Does anyone know how to fix this?
Looking at the docs for csvprintf, you're converting the csv into XML and then parsing it with regular expressions. This is generally a very bad idea.
You might want to install csvkit; then you can do
csvcut -c prod input.csv | sed 1d
Or you could use a language that comes with a CSV parser module. For example, ruby
ruby -rcsv -e 'CSV.read("input.csv", :headers=>true).each {|row| puts row["prod"]}'
Whichever method you use, read the results into a bash array with this construct
mapfile -t products < <(command to extract the product data)
Then, to print the array elements:
for prod in "${products[@]}"; do echo "$prod"; done
# or
printf "%s\n" "${products[@]}"
The quotes around the array expansion are critical. If missing, you'll see one word per line.
Tip: don't use ALLCAPS variable names in the shell: leave those for the shell. One day you'll write PATH=something and then wonder why your script is broken.
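Putting the pieces together with the csvkit route (a sketch, assuming the product column is named prod as above):
mapfile -t products < <(csvcut -c prod input.csv | sed 1d)
printf "%s\n" "${products[@]}"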
While I've handled this task easily in other languages, I'm at a loss for which commands to use for shell scripting (CentOS/bash).
I have some regex that provides many matches in a file I've read to a variable, and would like to take the regex matches to an array to loop over and process each entry.
For regex, I typically use https://regexr.com/ to form my capture groups and throw that at JS/Python/Go to get an array and loop, but in shell scripting I'm not sure what to use.
So far I've played with sed to find and replace all matches, but I don't know if it's capable of returning an array of matches to loop over.
In short: take a regex, run it on a file, get an array back. I would love some help with shell scripting for this task.
EDIT:
Based on the comments, I put this together (not working, according to shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a mac:
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or on other lines with no assets at all, you can refine it like this:
#!/bin/sh
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do it with GNU grep and perl-compatible regex (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[@]}"; do printf "%d\t%s\n" $i "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext') -- with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(
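For example, with -E the same substitution drops the parentheses:
sed -En 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'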
I need to make a new array, or just delete the duplicate elements from the existing array.
#The NTP IPS are the following ones:
#10.30.10.0, 10.30.10.0, 10.30.20.0, 10.30.20.0, 10.30.20.0
#!/bin/bash
ips_networks=()
for ip in "${ips_for_ntp[@]}"; do
ips_networks+=("${ip%.*}.0")
done
So I end up with duplicate IPs in ips_networks, but I need just one of each IP, either in another array or in the same one. I have tried awk, set -A (which doesn't work on my Linux), and cut, but with no luck. Is there any way to build an array of unique values?
ips="10.30.10.0, 10.30.10.0, 10.30.20.0, 10.30.20.0, 10.30.20.0"
unique_ips=`echo $ips | sed -e "s/\s\\+//g" | sed -e "s/,/\\n/g"| sort | uniq`
echo $unique_ips #10.30.10.0 10.30.20.0
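If you'd rather stay in bash and build the unique array directly, an associative array can track what has been seen (a sketch, assuming bash 4+ and the ips_for_ntp array from the question):
declare -A seen
ips_networks=()
for ip in "${ips_for_ntp[@]}"; do
  net=${ip%.*}.0
  if [[ -z ${seen[$net]} ]]; then
    seen[$net]=1
    ips_networks+=("$net")
  fi
done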
I am trying to use xmllint to search an xml file and store the values I need into an array. Here is what I am doing:
#!/bin/sh
function getProfilePaths {
unset profilePaths
unset profilePathsArr
profilePaths=$(echo 'cat //profiles/profile/@path' | xmllint --shell file.xml | grep '=' | grep -v ">" | cut -f 2 -d "=" | tr -d \")
profilePathsArr+=( $(echo $profilePaths))
return 0
}
In another function I have:
function useProfilePaths {
getProfilePaths
for i in ${profilePathsArr[@]}; do
echo $i
done
return 0
}
useProfilePaths
The behavior of the function changes depending on whether I run the commands manually on the command line vs. calling them from another function as part of a wrapper script. When I call my function from a wrapper script, the array has 1 item, compared to 2 when I do it from the command line:
$ echo ${#profilePathsArr[@]}
2
The content of profilePaths looks like this when echoed:
$ echo ${profilePaths}
/Profile/Path/1 /Profile/Path/2
I am not sure what the separator is for an xmllint call.
When I call my function from my wrapper script, the content of the first iteration of the for loop looks like this:
for i in ${profilePathsArr[@]}; do
echo $i
done
the first echo looks like:
/Profile/Path/1
/Profile/Path/2
... and the second echo is empty.
Can anyone help me debug this issue? If I could find out what is the separator used by xmllint, maybe I could parse the items correctly in the array.
FYI, I have already tried the following approach, with the same result:
profilePaths=($(echo 'cat //profiles/profile/@path' | xmllint --shell file.xml | grep '=' | grep -v ">" | cut -f 2 -d "=" | tr -d \"))
Instead of using the --shell switch and many pipes, you should use the proper --xpath switch.
But as far as I know, when you have multiple values, there's no simple way to split the different nodes.
So a solution is to iterate like this:
profilePaths=(
$(
for i in {1..100}; do
xmllint --xpath "//profiles/profile[$i]/@path" file.xml || break
done
)
)
or use xmlstarlet:
profilePaths=( $(xmlstarlet sel -t -v "//profiles/profile/@path" file.xml) )
it displays the output with newlines by default
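Note that the unquoted command substitution above still splits on whitespace; if a path can contain spaces, reading the newline-separated output with mapfile (bash 4+) is safer:
mapfile -t profilePaths < <(xmlstarlet sel -t -v "//profiles/profile/@path" file.xml)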
The problem you're having may be related to data encapsulation; depending on the shell and on how they are declared, variables defined in a function can be local, so you can't access them outside that function unless you define them otherwise.
Depending on the implementation of sh you're using, you may be able to get around this by using eval on your variable definition, or with a modifier like global for mksh and declare -g for zsh and bash. I know that mksh's implementation definitely works.
Thank you for providing feedback on how I can resolve this problem. After investigating further, I was able to make this work by changing the way I insert the contents of my 'profilePaths' variable into the 'profilePathsArr' array:
# Retrieve the profile paths from file.xml and assign to 'profilePaths'
profilePaths=$(echo 'cat //profiles/profile/@path' | xmllint --shell file.xml | grep '=' | grep -v ">" | cut -f 2 -d "=" | tr -d \")
# Insert them into the array 'profilePathsArr'
IFS=$'\n' read -rd '' -a profilePathsArr <<<"$profilePaths"
For some reason, with all the different function calls from my master script and calls to other scripts, it seemed like the separators were lost along the way. I am unable to find the root cause, but I know that by using "\n" as the IFS for read, it worked like a charm.
If anybody wishes to add more comments on this, you are more than welcome.
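For completeness, mapfile (bash 4+) can do the same newline split in one step:
mapfile -t profilePathsArr <<<"$profilePaths"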