Split XML line in bash ignoring spaces in quotes - arrays

I've seen some suggestions to use eval, but that assumes the quoted text is an entire string in itself, and not part of one.
Simple example:
Split the string
<SERVICETYPE Name="Two words">
So that we get
<SERVICETYPE
Name="Two words">
Is this possible? Ideally in a statement that I can then use to loop through the values. (Yes, I know perl or something would be easier, but I don't have anything more useful than bash availale, so I have to get this working).
I'm currently splitting into an array with the following
IFS=" " read -ra xmlfield <<< ${xmlline}
for i in "${xmlfield[#]}"; do
But then of course that gives me:
<SERVICETYPE
Name="Two
words">
Which is a pain.

Related

Postgres query result into CSV in terminal wrongly quotes text values

I am using following postgres command in terminal to output very large query result into CSV format:
psql -d ecoprod -t -A -F"," -f queries/query.sql > exports/output.csv
It works just fine except its not valid CSV format. Text values should be wrapped in quotes "". Its not and its causing many problems parsing the CSV when there are commas in the text and so on.
Of course I could use another delimiter like semicolon however its the similar problem. In addition some text values contain line break characters which also breaks the parsing.
Didnt find any way to modify the command in documentation. Hope you will help me. Thank you.
-F doesn't promise to generate valid CSV. There is a --csv option you could use instead, which is at least intended for this purpose. But it seems like COPY or \copy would be more suited.

Replace a number in a file using array data, bash

I'm not an expert in bash coding and I'm trying to do one interative-like code to help me in my work.
I have a file that contains some numbers (coordinates), and I'm trying to make a code to read some specific numbers from the file and then store them in an array. Modify that array using some arithmetic operation and then replace the numbers in the original file with the modified array. So far I've done everything except replacing the numbers in the file, I tried using sed but it does not change the file. The original numbers are stored in an array called "readfile" and the new numbers are stored in an array called "d".
I'm trying to use sed in this way: sed -i 's/${readfile[$j]}/${d[$k]}/' file.txt
And I loop j and k to cover all the numbers in the arrays. Everything seems to work but the file is not being modified. After some digging, I'm noticing that sed is not reading the value of the array, but I do not know how to fix that.
Your help is really appreciated.
When a file isn't modified by sed -i, it means sed didn't find any matches to modify. Your pattern is wrong somehow.
After using " instead of ' so that the variables can actually be evaluated inside the string, look at the contents of the readfile array and check whether it actually matches the text. If it seems to match, look for special characters in the pattern, characters that would mean something specific to sed (the most common mistake is /, which will interfere with the search command).
The fix for special characters is either to (1) escape them, e.g. \/ instead of just /, or (2) (and especially for /) to use another delimiter for the search/replace command (instead of s/foo/bar/ you can use s|foo|bar| or s,foo,bar, etc - pretty much any delimiter works, so you can pick one that you know isn't in the pattern string).
If you post data samples and more of your script, we can look at where you went wrong.

Bash - loop through array of objects and combine them

I'm trying to create a for-loop to go through all the items from an array, and add the items to a string. The tags are given as a single string with format "tag1 tag2 tag3", and the tagging parameter can be given as many times as I want with the single command with syntax "-tag tag1 -tag -tag2 -tag tag3". I'm unable to create a for loop for the job, and I'm a little confused what is wrong with my code.
TAGS="asd fgh jkl zxc bnm" # Amount of tags varies, but there is always at least one
ARRAY=($TAGS)
TAGSTOBEADDED=""
for i in "$ARRAY[#]"
do
STRINGTOBEADDED="-tag ${ARRAY[$i]}"
$TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED
done
command $TAGSTOBEADDED
First, your array sintax is wrong as #oguz ismail said. To iter through array items you shold use this:
for i in "${ARRAY[#]}"; { echo $i;}
Second $TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED this is also fail.
Variables are set like so var="$var 123" you don't need $ in front of var name if you want to change it. Back to code. In this example you dont even need an array, just use TAGS var(without ""):
for i in $TAGS; { TAGSTOBEADDED+="-tag $i"; }
First: avoid storing lists of things in space-delimited strings (as you're currently doing with TAGS and TAGSTOBEADDED) -- there are a bunch of things that can go wrong if they have any "funny" characters (or if IFS gets changed). Use an array instead. Storing them as a string and then converting doesn't help; all of the same potential problems apply during the conversion.
I also recommend using lower- or mixed-case variable names in scripts, since there are a bunch of all-caps names with special meanings, and accidentally using one of those for something else can have weird effects. So, to define the array of tags, I'd just use this:
tags=(asd fgh jkl zxc bnm)
You also have a number of syntax errors in the script. In this line:
for i in "$ARRAY[#]"
... the shell will try to expand $ARRAY as a plain variable (not an array), and then treat "[#]" as just some unrelated characters that go after it. You need braces around the variable refence (like "${ARRAY[#]}") any time you're doing anything nontrivial with a variable reference. BTW, this idiom -- including double-quotes, braces, square-brackets and at-sign -- is what you almost always want when getting the contents of an array.
In this line:
STRINGTOBEADDED="-tag ${ARRAY[$i]}"
$i will expand to one of the array elements, not its index. That is, it'll expand to something like:
STRINGTOBEADDED="-tag ${ARRAY[asd]}"
...which doesn't make any sense. You just want
STRINGTOBEADDED="-tag $i"
...except you don't want that either, because (as I said before) storing lists of things space-delimited in a string is a bad idea. But I'll get to that because fixing it will involve the next line:
$TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED
There are two problems here: you don't want a dollar sign on the variable being assigned to ($varname gets the value of a variable; anytime you're setting it, don't use the $). Also the + isn't needed to add strings, you just stick them end to end. Well, you'd need to add a space in between, something like one of these:
TAGSTOBEADDED=$TAGSTOBEADDED" "$STRINGTOBEADDED
TAGSTOBEADDED="$TAGSTOBEADDED $STRINGTOBEADDED"
(Generally, you should have double-quotes around all variable references; on the right side of a plain assignment is one of the few places it's safe to leave them unquoted, but I tend to prefer to just double-quote always rather than try to remember all of the exceptions about where it's safe and where it isn't. Plus, quoting just the space looks weird.)
But you don't want to do that either, because (again) space-delimited strings are a bad way to do things. Use an array. So before the loop, create an empty array instead of an empty string:
tagstobeadded=()
...and then inside the loop, append to it with +=( ):
tagstobeadded+=(-tag "$i")
...and then at the end, use it with all the appropriate quotes, braces, etc:
command "${tagstobeadded[#]}"
So, with all of these changes, here's what I'd recommend:
tags=(asd fgh jkl zxc bnm)
tagstobeadded=()
for i in "${tags[#]}"
do
tagstobeadded+=(-tag "$i")
done
command "${tagstobeadded[#]}"

Split array element delimited with '.'

I am trying to read below CSV file content line by line in Perl.
CSV File Content:
A7777777.A777777777.XXX3604,XXX,3604,YES,9
B9694396.B216905785.YYY0018,YYY,0018,YES,13
C9694396.C216905785.ZZZ0028,ZZZ,0028,YES,16
I am able to split line content using below code and able to verify the content too:
#column_fields1 = split(',', $_);
print $column_fields1[0],"\n";
I am also trying to find the second part on the first column of CSV file (i.e., A777777777 or B216905785 or C216905785) – the first column delimited with . using the below code and I am unable to get it.
Instead, just a new line printed.
my ($v1, $v2, $v3) = split(".", $column_fields1[0]);
print $v2,"\n";
Can someone suggest me how to split the array element and get the above value?
On my functionality, I need the first column value altogether at someplace and just only the second part at someplace.
Below is my code:
use strict;
use warnings;
my $dailybillable_tab_section1_file = "./sql/demanding_01_T.csv";
open(FILE, $dailybillable_tab_section1_file) or die "Could not read from $dailybillable_tab_section1_file, program halting.";
my #column_fields1;
my #column_fields2;
while (<FILE>)
{
chomp;
#column_fields1 = split(',', $_);
print $column_fields1[0],"\n";
my ($v1, $v2, $v3) = split(".",$column_fields1[0]);
print $v2,"\n";
if($v2 ne 'A777777777')
{
…
…
…
}
else
{
…
…
…
}
}
close FILE;
split takes a regex as its first argument. You can pass it a string (as in your code), but the contents of the string will simply be interpreted as a regex at runtime.
That's not a problem for , (which has no special meaning in a regex), but it breaks with . (which matches any (non-newline) character in a regex).
Your attempt to fix the problem with split "\." fails because "\." is identical to ".": The backslash has its normal string escape meaning, but since . isn't special in strings, escaping it has no effect. You can see this by just printing the resulting string:
print "\.\n"; # outputs '.', same as print ".\n";
That . is then interpreted as a regex, causing the problems you have observed.
The normal fix is to just pass a regex to split:
split /\./, $string
Now the backslash is interpreted as part of the regex, forcing . to match itself literally.
If you really wanted to pass a string to split (I'm not sure why you'd want to do that), you could also do it like this:
split "\\.", $string
The first backslash escapes the second backslash, giving a two character string (\.), which when interpreted as a regex means the same thing as /\./.
If you look at the documentation for split(), you'll see it gives the following ways to call the function:
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
In three of those examples, the first argument to the function is /PATTERN/. That is, split() expects to be given a regular expression which defines how the input string is split apart.
It's very important to realise that this argument is a regex, not a string. Unfortunately, Perl's parser doesn't insist on that. It allows you to use a first argument which looks like a string (as you have done). But no matter how it looks, it's not a string. It's a regex.
So you have confused yourself by using code like this:
split(".",$COLUMN_FIELDS1[0])
If you had made the first argument look like a regex, then you would be more likely to realise that the first argument is a regex and that, therefore, a dot needs to be escaped to prevent it being interpreted as a metacharacter.
split(/\./, $COLUMN_FIELDS1[0])
Update: It's generally accepted among Perl programmers, that variable with upper case names are constants and don't change their values. By using upper case names for standard variables, you are likely to confuse the next person who edits your code (who could well be you in six months time).

How to create n variables in BASH?

I'm trying to create n variables in bash...ideally an array with these n variables so that I may later go through and assign them to columns I read in from a csv file. I guess I'm just really confusing myself with syntax. Help is much appreciated!
The easiest way in bash to read a line from a CSV file and put it into an array is:
IFS=, read -r -a ARRAY < filename
The IFS=, at the beginning tells read to use , as a field separator. The option -a ARRAY tells read to put the results in a bash array named ARRAY (you could use any name; it doesn't need to be uppercase).
You would normally want to do that in a loop, something like:
while IFS=, read -r -a ARRAY; do
# do something with ARRAY
done < filename
This is not a very robust technique, since it will not work with quoted fields and especially not with embedded commas in quoted fields. There are CSV-parsing libraries for most languages; if you have any familiarity with Python, it might be a good choice.

Resources