Passing csh array to awk - arrays

I'm having a problem with a csh script that I'm writing.
What I want to do is read a file and, for every new line read, assign a new value to a certain variable that I'll use later.
To keep it simple: I have a string array called "zn" that contains 6 values. I can print every value using something like:
echo ${zn[$i]}
Then I try to use the values of this array with something like this (an easy example just to explain):
cat file1 | awk '{i=NR;n='${zn[i]}';print n,$0}' >! file2
or this other attempt:
cat file1 | awk '{n='${zn[NR]}';print n,$0}' >! file2
Well, I tried almost every possible combination of brackets, apostrophes, and quotes... and I always get errors like:
missing -.
Any help would be really appreciated; the solution is probably something pretty easy and obvious.
(I'm sorry if my syntax is not the best, but I'm kind of new to this.)
EDIT:
I ported the script to bash.
This is part of the script I use to prepare some text files for a plot in GMT:
cat crosspdf.dat |
awk '
BEGIN { n = int(('$dz')/('$dz_new')) }
{
z=$1
for (i=6;i<=NF;i++) {
if ($i!=0) {
for(j=1;j<=n;j++)
print (i-4)*'$dv', z+(j-n/2)*'$dz_new', $i
}
}
}
' >! temp
This works; the only thing you need to know is that $dz was a constant value, and now I want to change it so that it takes a different value for each line of the file I'm scanning. I can easily prepare the array with the values, but I'm not able to include it somehow in the previous line. PS: thanks for the support – Francesco
EDIT 2
1) dv and dz_new are just parameters
2) dz would be an array of variable length containing just numbers (depth intervals: something like -6.0 1.0 5.0 10.0 ... 36.0)
3) crosspdf.dat contains some histogram-like data: each line corresponds to a different depth (the depths were equally spaced, but now they aren't anymore, which is why I need the dz array)

Let's start by re-writing your script:
cat crosspdf.dat |
awk '
BEGIN { n = int(('$dz')/('$dz_new')) }
{
z=$1
for (i=6;i<=NF;i++) {
if ($i!=0) {
for(j=1;j<=n;j++)
print (i-4)*'$dv', z+(j-n/2)*'$dz_new', $i
}
}
}
'
to pass shell variable values to awk the right way and to clean up the UUOC (useless use of cat). The above should be written as:
awk -v dv="$dv" -v dz="$dz" -v dz_new="$dz_new" '
BEGIN { n = int(dz/dz_new) }
{
z=$1
for (i=6;i<=NF;i++) {
if ($i!=0) {
for(j=1;j<=n;j++)
print (i-4)*dv, z+(j-n/2)*dz_new, $i
}
}
}
' crosspdf.dat
Now some questions you need to answer are: which of your shell variables (dv, dz, and/or dz_new) do you want to have a different value for each line of the input file? What are some representative values of those shell variables? What could crosspdf.dat contain? What would your expected output look like?
Update your question to show some small sample of crosspdf.dat, some settings of your array variable(s), and the expected output given all of that.
Actually - maybe this is all the hint you need:
$ cat file
abc
def
ghi
$ cat tst.sh
dz="12 23 17"
awk -v dz="$dz" '
BEGIN{ split(dz,dzA) }
{ print dzA[NR], $0 }
' file
$ ./tst.sh
12 abc
23 def
17 ghi
Questions?
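Since the question was ported to bash, the same hint can be sketched end to end there: flatten a bash array into one string with `"${zn[*]}"`, pass it via `-v`, and `split()` it inside awk. The array values and the input lines below are hypothetical stand-ins for `zn` and `file1`:

```shell
#!/usr/bin/env bash
# Hypothetical values, one per input line of the file
zn=(10 20 30)
printf '%s\n' a b c |                 # stand-in for "cat file1"
awk -v zn="${zn[*]}" '
BEGIN { split(zn, znA) }              # znA[1]=10, znA[2]=20, znA[3]=30
{ print znA[NR], $0 }                 # pair line NR with the NR-th value
'
```

This prints `10 a`, `20 b`, `30 c`, matching the tst.sh example above.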


Split a string directly into array

Suppose I want to pass a string to awk so that once I split it (on a pattern) the substrings become the indexes (not the values) of an associative array.
Like so:
$ awk -v s="A:B:F:G" 'BEGIN{ # easy, but can these steps be combined?
split(s,temp,":") # temp[1]="A",temp[2]="B"...
for (e in temp) arr[temp[e]] #arr["A"], arr["B"]...
for (e in arr) print e
}'
A
B
F
G
Is there an awkism or gawkism that would allow the string s to be split directly into its components, with those components becoming the index entries in arr?
The reason (bigger picture) is that I want something like this (pseudo-awk):
awk -v s="1,4,55" 'BEGIN{split s into arr["1"],arr["4"],arr["55"]} $3 in arr {action}'
No, there is no better way to map separated substrings to array indices than:
split(str,tmp); for (i in tmp) arr[tmp[i]]
FWIW if you don't like that approach for doing what your final pseudo-code does:
awk -v s="1,4,55" 'BEGIN{split(s,tmp,/,/); for (i in tmp) arr[tmp[i]]} $3 in arr{action}'
then another way to get the same behavior is
awk -v s=",1,4,55," 'index(s,","$3","){action}'
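For example, a quick sanity check of that delimiter-padding trick, using print as a stand-in for the hypothetical action and some made-up input lines:

```shell
printf '%s\n' 'x y 1' 'x y 2' 'x y 4' 'x y 55' 'x y 5' |
awk -v s=",1,4,55," 'index(s, "," $3 ",") { print }'
```

This prints the lines whose third field is 1, 4, or 55. Note that the line with `5` is not printed even though `55` is in the list; that is exactly why both s and the lookup key are padded with commas.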
Probably useless and unnecessarily complex but I'll open the game with while, match and substr:
$ awk -v s="A:B:F:G" '
BEGIN {
while(match(s,/[^:]+/)) {
a[substr(s,RSTART,RLENGTH)]
s=substr(s,RSTART+RLENGTH)
}
for(i in a)
print i
}'
A
B
F
G
I'm eager to see some useful solutions (if there are any). I tried playing around with asort and such.
Another way, kind of an awkism:
cat file
1 hi
2 hello
3 bonjour
4 hola
5 konichiwa
Run it,
awk 'NR==FNR{d[$1]; next}$1 in d' RS="," <(echo "1,2,4") RS="\n" file
you get,
1 hi
2 hello
4 hola

File Name based on Array

I have created an array:
declare -A months=( ["JAN"]="AP01" ["FEB"]="AP02" ["MAR"]="AP03" ["APR"]="AP04" ["MAY"]="AP05" ["JUN"]="AP06" ["JUL"]="AP07" ["AUG"]="AP08" ["SEP"]="AP09" ["OCT"]="AP10" ["NOV"]="AP11" ["DEC"]="AP12")
Now I want to read the replaced value of the month as it splits the file and creates a new file name:
awk -F, '{print "a~ST_SAP_FILE~Actual~",echo ${months["${"$3":0:3}"]}","~RM.txt"}' ExtractOriginal.txt
The field where the variable substitution occurs is column 3. It contains MAR-2016, and what I am expecting is a file named a~ST_SAP_FILE~Actual~MAR~RM.txt. However, I get an error:
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: bailing out near line 1
What is the right syntax to take column 3, pass it to my array, return the Substitution variable and use it as the file name?
There are a few ways you could go about solving your problem. Which you choose mostly depends on how tied to awk you want to be.
Declare the array in awk:
Is there any reason for you not to declare the variable in awk?
awk -F, 'BEGIN{months["JAN"]="AP01"; months["FEB"]="AP02"; months["MAR"]="AP03"; months["APR"]="AP04"; months["MAY"]="AP05"; months["JUN"]="AP06"; months["JUL"]="AP07"; months["AUG"]="AP08"; months["SEP"]="AP09"; months["OCT"]="AP10"; months["NOV"]="AP11"; months["DEC"]="AP12"}{print "a~ST_SAP_FILE~Actual~"months[substr($3,1,3)]"~RM.txt"}' ExtractOriginal.txt
(also note that I removed the commas from print, since those would add spaces that your question seems to indicate you do not want in the result, and that awk's substr() is 1-based, so the month abbreviation starts at position 1, not 0)
As @Ed Morton pointed out, due to the nature of your array, we can simplify its creation with split/sprintf, giving you this:
awk -F, 'BEGIN{split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC",t," "); for (i in t) months[t[i]]=sprintf("AP%02d",i)}{print "a~ST_SAP_FILE~Actual~"months[substr($3,1,3)]"~RM.txt"}' ExtractOriginal.txt
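To see just the array-building part of that one-liner in isolation, here is a sketch that prints two sample lookups:

```shell
awk 'BEGIN {
  split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", t, " ")
  for (i in t) months[t[i]] = sprintf("AP%02d", i)   # months["JAN"]="AP01", ...
  print months["MAR"], months["NOV"]                 # -> AP03 AP11
}'
```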
Parse the variable into awk:
This seems closest to what you were trying to do in your attempt. This keeps the array available in bash, but handles getting the filename you want with awk. Since there is no native way to handle a bash array in awk, you have to construct the latter from the former (which is made more difficult by this being an associative array).
I did this by first changing the bash array into a more easily parsed string which I then passed into awk as a variable.
# Declare the array
declare -A months=( ["JAN"]="AP01" ["FEB"]="AP02" ["MAR"]="AP03" ["APR"]="AP04" ["MAY"]="AP05" ["JUN"]="AP06" ["JUL"]="AP07" ["AUG"]="AP08" ["SEP"]="AP09" ["OCT"]="AP10" ["NOV"]="AP11" ["DEC"]="AP12")
# Change the array into a string more easily parsed with awk
# Each element in this array is of the format MON=APON
mon=`for key in ${!months[@]}; do echo ${key}'='${months[${key}]}; done`
# See below explanation
awk -F, -v mon="$mon" 'BEGIN {split(mon,tmp," "); for(m in tmp){i = index(tmp[m], "="); months[substr(tmp[m], 1, i-1)] = substr(tmp[m], i+1)}} {print "a~ST_SAP_FILE~Actual~"months[substr($3,1,3)]"~RM.txt"}' ExtractOriginal.txt
Below is a more readable version of the awk script. Note that -v mon="$mon" passes the bash variable mon into awk as a variable also named mon:
BEGIN {
split(mon,tmp," "); # split the string mon into an array named tmp
for(m in tmp) { # for element in tmp
i = index(tmp[m], "="); # get the index of the '='
months[substr(tmp[m], 1, i-1)] = substr(tmp[m], i+1)
# split the elements of tmp at the '='
# and add them into an associative array called months
# the value is the part which follows the '='
}
}
{
print "a~ST_SAP_FILE~Actual~"months[substr($3,1,3)]"~RM.txt"
}
Skip awk entirely:
Another option is to simply not use awk at all, which removes the burden of getting the array into a workable state. It's not clear by your question if this is a potential solution for you, but personally I found this bash version much simpler to write/read/understand.
#!/usr/bin/env bash
filename="ExtractOriginal.txt"
declare -A months=( ["JAN"]="AP01" ["FEB"]="AP02" ["MAR"]="AP03" ["APR"]="AP04" ["MAY"]="AP05" ["JUN"]="AP06" ["JUL"]="AP07" ["AUG"]="AP08" ["SEP"]="AP09" ["OCT"]="AP10" ["NOV"]="AP11" ["DEC"]="AP12")
while read -r line; do # for each line in the file
month_yr=`echo "$line" | cut -d',' -f3` # get the third column
month=${months[${month_yr:0:3}]} # get first 3 characters
echo 'a~ST_SAP_FILE~Actual~'$month'~RM.txt'
done <"$filename"
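The key lookup in that loop can be checked on its own. A minimal sketch, with the array abbreviated and a hard-coded stand-in for the column-3 value:

```shell
#!/usr/bin/env bash
declare -A months=( ["MAR"]="AP03" )   # abbreviated for the sketch
month_yr="MAR-2016"                    # hypothetical column-3 value
echo "${months[${month_yr:0:3}]}"      # first 3 chars index the array -> AP03
```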

awk equivalent of LTRIM function in C

I need to delete leading 0s from a string. I found that there is no built-in function like LTRIM in C.
I'm thinking of the AWK script below to do that:
awk -F"," 'BEGIN { a[$1] }
for (v in a) {
{if ($v == 0) {delete a[$v]; print a;} else exit;}
}'
But I guess I'm not declaring the array correctly, and it throws an error. Sorry, I'm new to AWK programming. Can you please help me put it together?
Using awk, as requested:
#!/usr/bin/awk -f
/^0$/ { print; next; }
/^0*[^0-9]/ { print; next; }
/^0/ { sub("^0+", "", $0); print; next; }
{ print $0; }
This provides for not trimming a plain "0" to an empty string, as well as avoiding the (probably) unwanted trimming of non-numeric fields. If the latter is actually desired behavior, the second pattern/action can be commented out. In either case, substitution is the way to go, since adding a number to a non-numeric field will generate an error.
Input:
0
0x
0000x
00012
Output:
0
0x
0000x
12
Output trimming non-numeric fields:
0
x
x
12
Here is a somewhat generic ltrim function that can be called as ltrim(s) or ltrim(s,c), where c is the character to be trimmed (assuming it is not a special regex character) and where c defaults to " ":
function ltrim(s,c) {if (c==""){c=" "} sub("^" c "*","",s); return s}
This can be called with 0, e.g. ltrim($0,0)
NOTE:
This will work for some special characters (e.g. "*"), but if you want to trim special characters, it would probably be simplest to call the appropriate sub() function directly.
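A quick sketch of the function in use, with made-up sample values:

```shell
awk '
function ltrim(s, c) { if (c == "") c = " "; sub("^" c "*", "", s); return s }
BEGIN {
  print ltrim("00012", 0)     # trims leading zeros -> 12
  print ltrim("   hello")     # default c=" " trims leading blanks -> hello
}'
```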
Based on other recent questions you posted, you appear to be struggling with the basics of the awk language.
I will not attempt to answer your original question, but instead try to get you on the way in your investigation of the awk language.
It is true that the syntax of awk expressions is similar to c. However there are some important differences.
I would recommend that you spend some time reading a primer on awk and find some exercises. Try for instance the Gnu Awk Getting Started.
That said, there are two major differences with C that I will highlight here:
Types
Awk only uses strings and numbers; it decides based on context whether it needs to treat input as text or as a number. In some cases
you may need to force conversion to a string or to a number.
Structure
An Awk program always follows the same structure of a series of patterns, each followed by an action, enclosed in curly braces: pattern { action }:
pattern { action }
pattern { action }
.
.
.
pattern { action }
Patterns can be regular expressions or comparisons of strings or numbers.
If a pattern evaluates as true, the associated action is executed.
An omitted pattern matches every line, so its action runs for every line. The { action } part is also optional; when omitted, it is equivalent to { print }.
A pattern with an explicitly empty action ({}) will do nothing.
Some patterns like BEGIN and END get special treatment. Before reading stdin or opening any files, awk will first collect all BEGIN statements in the program and execute their associated actions in order.
It will then start processing stdin or any files given and subject each line to all other pattern/action pairs in order.
Once all input is exhausted, all files are closed, and awk will process the actions belonging to all END patterns, again in order of appearance.
You can use a BEGIN action to initialize variables. END actions are typically used to report summaries.
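All of the above fits in one tiny program. A sketch with made-up input numbers:

```shell
printf '%s\n' 3 10 7 |
awk '
BEGIN { print "start" }        # runs once, before any input is read
$1 > 5 { print "big:", $1 }    # action runs only when the pattern is true
END    { print "done" }        # runs once, after all input is exhausted
'
```

This prints `start`, then `big: 10` and `big: 7` (3 fails the pattern), then `done`.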
A warning: Quite often we see people trying to pass data from the shell by partially unquoting the awk script, or by using double quotes. Don't do this; instead, use the awk -v option to pass on parameters into the program:
a="two"
b="strings"
awk -v a="$a" \
-v b="$b" \
'BEGIN {
print a, b
}'
two strings
You can force awk to convert a field to a number; leading zeros are then eliminated by default.
e.g.
$ echo 0001 | awk '{print $1+0}'
1
If I understand correctly and you just want to trim the leading '0's from a value in bash, you can use sed for precise regex control, or a simple loop works well and avoids spawning a subshell for an external utility call. For example:
var=00104
Using sed:
$ echo "$var" | sed 's/^0*//'
104
or using a herestring to eliminate the pipe and additional subshell (bash only)
$ sed 's/^0*//' <<<$var
104
Using a simple loop with string indexes:
while [ "${var:0:1}" = '0' ]; do
var="${var:1}"
done
var will contain 104 following 2 iterations of the loop.

How to read in csv file to array in bash script

I have written the following code to read my csv file (which has a fixed number of columns but not a fixed number of rows) into my script as an array. It needs to be a shell script.
usernames x1 x2 x3 x4
username1, 5 5 4 2
username2, 6 3 2 0
username3, 8 4 9 3
My code
#!/bin/bash
set oldIFS = $IFS
set IFS=,
read -a line < something.csv
another option I have used is
#!/bin/bash
while IFS=$'\t' read -r -a line
do
echo $line
done < something.csv
For both I tried some test code to check the size of the array line. With the first one I seem to get a size of 10, but the array only outputs the username; with the second one I seem to get a size of 0, but it outputs the whole csv.
Help is much appreciated!
You may consider using AWK with a regular expression in FS variable like this:
awk 'BEGIN { FS=",?[ \t]*"; } { print $1,"|",$2,"|",$3,"|",$4,"|",$5; }'
or this
awk 'BEGIN { FS=",?[ \t]*"; OFS="|"; } { $1=$1; print $0; }'
($1=$1 is required to rebuild $0 with new OFS)
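If you would rather stay in bash, read -a with a customized IFS does the per-line split. A sketch assuming input shaped like the sample above (fed here via a here-document instead of something.csv):

```shell
#!/usr/bin/env bash
# Split each line on a comma and/or blanks into an array
while IFS=$', \t' read -r -a fields; do
  ((${#fields[@]})) || continue        # skip empty lines
  printf '%s|' "${fields[@]}"; echo    # show each field, pipe-separated
done <<'EOF'
username1, 5 5 4 2
username2, 6 3 2 0
EOF
```

This prints `username1|5|5|4|2|` and `username2|6|3|2|0|`: the comma plus surrounding blanks collapse into a single delimiter under bash's IFS rules.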

Bash: Split a string into an array

First of all, let me state that I am very new to Bash scripting. I have tried to look for solutions for my problem, but couldn't find any that worked for me.
Let's assume I want to use bash to parse a file that looks like the following:
variable1 = value1
variable2 = value2
I split the file line by line using the following code:
cat /path/to/my.file | while read line; do
echo $line
done
From the $line variable I want to create an array that I want to split using = as a delimiter, so that I will be able to get the variable names and values from the array like so:
$array[0] #variable1
$array[1] #value1
What would be the best way to do this?
Set IFS to '=' in order to split the string on the = sign in your lines, i.e.:
cat file | while IFS='=' read key value; do
array[0]="$key"
array[1]="$value"
done
You may also be able to use the -a argument to specify an array to write into, i.e.:
cat file | while IFS='=' read -a array; do
...
done
depending on your bash version.
Old completely wrong answer for posterity:
Add the argument -d = to your read statement. Then you can do:
cat file | while read -d = key value; do
$array[0]="$key"
$array[1]="$value"
done
while IFS='=' read -r k v; do
: # do something with $k and $v
done < file
IFS is the 'internal field separator', which tells bash to split the line on the '=' sign.
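Putting it together on the sample data, with a small trim for the single blanks around the = sign (a sketch; the here-document stands in for /path/to/my.file):

```shell
#!/usr/bin/env bash
while IFS='=' read -r k v; do
  k=${k% }                # drop the trailing blank left before "="
  v=${v# }                # drop the leading blank left after "="
  printf 'key=%s value=%s\n' "$k" "$v"
done <<'EOF'
variable1 = value1
variable2 = value2
EOF
```

This prints `key=variable1 value=value1` and `key=variable2 value=value2`.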
