Split array element delimited with '.' - arrays

I am trying to read the CSV file content below line by line in Perl.
CSV File Content:
A7777777.A777777777.XXX3604,XXX,3604,YES,9
B9694396.B216905785.YYY0018,YYY,0018,YES,13
C9694396.C216905785.ZZZ0028,ZZZ,0028,YES,16
I am able to split each line using the code below and verify the content:
@column_fields1 = split(',', $_);
print $column_fields1[0],"\n";
I am also trying to get the second part of the first column of the CSV file (i.e., A777777777 or B216905785 or C216905785), where the first column is delimited with a dot, using the code below, but I am unable to get it.
Instead, just a newline is printed.
my ($v1, $v2, $v3) = split(".", $column_fields1[0]);
print $v2,"\n";
Can someone suggest how to split the array element and get the above value?
In my program, I need the whole first-column value in one place and only the second part in another.
Below is my code:
use strict;
use warnings;
my $dailybillable_tab_section1_file = "./sql/demanding_01_T.csv";
open(FILE, $dailybillable_tab_section1_file) or die "Could not read from $dailybillable_tab_section1_file, program halting.";
my @column_fields1;
my @column_fields2;
while (<FILE>)
{
    chomp;
    @column_fields1 = split(',', $_);
    print $column_fields1[0],"\n";
    my ($v1, $v2, $v3) = split(".",$column_fields1[0]);
    print $v2,"\n";
    if($v2 ne 'A777777777')
    {
        …
        …
        …
    }
    else
    {
        …
        …
        …
    }
}
close FILE;

split takes a regex as its first argument. You can pass it a string (as in your code), but the contents of the string will simply be interpreted as a regex at runtime.
That's not a problem for , (which has no special meaning in a regex), but it breaks with . (which matches any (non-newline) character in a regex).
Your attempt to fix the problem with split "\." fails because "\." is identical to ".": The backslash has its normal string escape meaning, but since . isn't special in strings, escaping it has no effect. You can see this by just printing the resulting string:
print "\.\n"; # outputs '.', same as print ".\n";
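That . is then interpreted as a regex, causing the problems you have observed: every character counts as a delimiter, so all the fields are empty, and because split drops trailing empty fields, you get back an empty list.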
The normal fix is to just pass a regex to split:
split /\./, $string
Now the backslash is interpreted as part of the regex, forcing . to match itself literally.
If you really wanted to pass a string to split (I'm not sure why you'd want to do that), you could also do it like this:
split "\\.", $string
The first backslash escapes the second backslash, giving a two character string (\.), which when interpreted as a regex means the same thing as /\./.

If you look at the documentation for split(), you'll see it gives the following ways to call the function:
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
In three of those examples, the first argument to the function is /PATTERN/. That is, split() expects to be given a regular expression which defines how the input string is split apart.
It's very important to realise that this argument is a regex, not a string. Unfortunately, Perl's parser doesn't insist on that. It allows you to use a first argument which looks like a string (as you have done). But no matter how it looks, it's not a string. It's a regex.
So you have confused yourself by using code like this:
split(".",$COLUMN_FIELDS1[0])
If you had made the first argument look like a regex, then you would be more likely to realise that the first argument is a regex and that, therefore, a dot needs to be escaped to prevent it being interpreted as a metacharacter.
split(/\./, $COLUMN_FIELDS1[0])
Update: It's generally accepted among Perl programmers that variables with upper-case names are constants and don't change their values. By using upper-case names for standard variables, you are likely to confuse the next person who edits your code (who could well be you in six months' time).

Related

Replace a number in a file using array data, bash

I'm not an expert in bash coding and I'm trying to do one interative-like code to help me in my work.
I have a file that contains some numbers (coordinates), and I'm trying to make a code to read some specific numbers from the file and then store them in an array. Modify that array using some arithmetic operation and then replace the numbers in the original file with the modified array. So far I've done everything except replacing the numbers in the file, I tried using sed but it does not change the file. The original numbers are stored in an array called "readfile" and the new numbers are stored in an array called "d".
I'm trying to use sed in this way: sed -i 's/${readfile[$j]}/${d[$k]}/' file.txt
And I loop j and k to cover all the numbers in the arrays. Everything seems to work but the file is not being modified. After some digging, I'm noticing that sed is not reading the value of the array, but I do not know how to fix that.
Your help is really appreciated.
When a file isn't modified by sed -i, it means sed didn't find any matches to modify. Your pattern is wrong somehow.
After using " instead of ' so that the variables can actually be evaluated inside the string, look at the contents of the readfile array and check whether it actually matches the text. If it seems to match, look for special characters in the pattern, characters that would mean something specific to sed (the most common mistake is /, which will interfere with the search command).
The fix for special characters is either to (1) escape them, e.g. \/ instead of just /, or (2) (and especially for /) to use another delimiter for the search/replace command (instead of s/foo/bar/ you can use s|foo|bar| or s,foo,bar, etc - pretty much any delimiter works, so you can pick one that you know isn't in the pattern string).
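A minimal sketch of both points, using hypothetical sample values (the asker's real data and loop are not shown). It runs sed on a string rather than on file.txt so it can be tried safely. The double quotes let the shell expand the array elements before sed sees the command, and | is used as the delimiter in case a value contains /.

```shell
# Hypothetical sample data standing in for the asker's arrays and file content.
readfile=(10 20)
d=(15 25)
line='x=10 y=20'

# Double quotes: the shell substitutes ${readfile[$j]} and ${d[$j]} first,
# then sed receives a command such as s|10|15|.
for j in "${!readfile[@]}"; do
    line=$(printf '%s\n' "$line" | sed "s|${readfile[$j]}|${d[$j]}|")
done

printf '%s\n' "$line"   # prints: x=15 y=25
```

Note that coordinates usually contain a decimal point, and an unescaped . in the pattern matches any character, so real values may still need escaping.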
If you post data samples and more of your script, we can look at where you went wrong.

ksh: remove last extension from a multiple extension filename

I have a filename in the format dir1/dir2/filename.txt.org and I'd like to rename it to dir1/dir2/filename.txt. How can this be done? I tried 'cut' with '.' as the separator, but it also removes .txt.
You can use Korn shell variable expansion instead of a subprocess (e.g. cut). This can be much faster.
example:
var1=dir1/dir2/filename.txt.org
var2=${var1%.*}
If you now print $var2 its value will be dir1/dir2/filename.txt
The % tells it to delete the shortest match for .* at the end of the value (that is, the rightmost period and everything following it).
${variable%pattern} - return the value of variable without the smallest ending portion that matches pattern.
Other variable expansion formats are available, it is worthwhile to study the docs.
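As an illustration of one of those other formats, here is a small sketch comparing % with %%, which removes the longest matching ending portion instead of the shortest. The variable names short and long are only for this example.

```shell
var1=dir1/dir2/filename.txt.org

short=${var1%.*}    # shortest trailing match of .* removed: drops ".org"
long=${var1%%.*}    # longest trailing match of .* removed: drops ".txt.org"

printf '%s\n' "$short"   # prints: dir1/dir2/filename.txt
printf '%s\n' "$long"    # prints: dir1/dir2/filename
```

For the actual rename, something like mv "$var1" "${var1%.*}" should do it, assuming the file exists and the target name is free.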

Bash - loop through array of objects and combine them

I'm trying to create a for-loop to go through all the items from an array, and add the items to a string. The tags are given as a single string with format "tag1 tag2 tag3", and the tagging parameter can be given as many times as I want with the single command with syntax "-tag tag1 -tag tag2 -tag tag3". I'm unable to create a for loop for the job, and I'm a little confused what is wrong with my code.
TAGS="asd fgh jkl zxc bnm" # Amount of tags varies, but there is always at least one
ARRAY=($TAGS)
TAGSTOBEADDED=""
for i in "$ARRAY[@]"
do
STRINGTOBEADDED="-tag ${ARRAY[$i]}"
$TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED
done
command $TAGSTOBEADDED
First, your array syntax is wrong, as @oguz ismail said. To iterate through array items you should use this:
for i in "${ARRAY[@]}"; { echo "$i"; }
Second, $TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED also fails.
Variables are set like so: var="$var 123". You don't need $ in front of the variable name when you assign to it. Back to the code: in this example you don't even need an array, just use the TAGS variable (without quotes):
for i in $TAGS; { TAGSTOBEADDED+=" -tag $i"; }
First: avoid storing lists of things in space-delimited strings (as you're currently doing with TAGS and TAGSTOBEADDED) -- there are a bunch of things that can go wrong if they have any "funny" characters (or if IFS gets changed). Use an array instead. Storing them as a string and then converting doesn't help; all of the same potential problems apply during the conversion.
I also recommend using lower- or mixed-case variable names in scripts, since there are a bunch of all-caps names with special meanings, and accidentally using one of those for something else can have weird effects. So, to define the array of tags, I'd just use this:
tags=(asd fgh jkl zxc bnm)
You also have a number of syntax errors in the script. In this line:
for i in "$ARRAY[@]"
... the shell will try to expand $ARRAY as a plain variable (not an array), and then treat "[@]" as just some unrelated characters that go after it. You need braces around the variable reference (like "${ARRAY[@]}") any time you're doing anything nontrivial with a variable reference. BTW, this idiom -- including double-quotes, braces, square-brackets and at-sign -- is what you almost always want when getting the contents of an array.
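A quick way to see the difference is a sketch like this one (the variable names without_braces and with_braces are just for illustration). Without braces, $ARRAY gives only the first element and the bracket part is left as literal text.

```shell
ARRAY=(asd fgh jkl)

# No braces: $ARRAY expands to element 0, followed by the literal text [@].
without_braces="$ARRAY[@]"

# Braces: the whole array, one word per element; count the words.
set -- "${ARRAY[@]}"
with_braces=$#

printf '%s\n' "$without_braces"   # prints: asd[@]
printf '%s\n' "$with_braces"      # prints: 3
```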
In this line:
STRINGTOBEADDED="-tag ${ARRAY[$i]}"
$i will expand to one of the array elements, not its index. That is, it'll expand to something like:
STRINGTOBEADDED="-tag ${ARRAY[asd]}"
...which doesn't make any sense. You just want
STRINGTOBEADDED="-tag $i"
...except you don't want that either, because (as I said before) storing lists of things space-delimited in a string is a bad idea. But I'll get to that because fixing it will involve the next line:
$TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED
There are two problems here: you don't want a dollar sign on the variable being assigned to ($varname gets the value of a variable; anytime you're setting it, don't use the $). Also the + isn't needed to add strings, you just stick them end to end. Well, you'd need to add a space in between, something like one of these:
TAGSTOBEADDED=$TAGSTOBEADDED" "$STRINGTOBEADDED
TAGSTOBEADDED="$TAGSTOBEADDED $STRINGTOBEADDED"
(Generally, you should have double-quotes around all variable references; on the right side of a plain assignment is one of the few places it's safe to leave them unquoted, but I tend to prefer to just double-quote always rather than try to remember all of the exceptions about where it's safe and where it isn't. Plus, quoting just the space looks weird.)
But you don't want to do that either, because (again) space-delimited strings are a bad way to do things. Use an array. So before the loop, create an empty array instead of an empty string:
tagstobeadded=()
...and then inside the loop, append to it with +=( ):
tagstobeadded+=(-tag "$i")
...and then at the end, use it with all the appropriate quotes, braces, etc:
command "${tagstobeadded[@]}"
So, with all of these changes, here's what I'd recommend:
tags=(asd fgh jkl zxc bnm)
tagstobeadded=()
for i in "${tags[@]}"
do
    tagstobeadded+=(-tag "$i")
done
command "${tagstobeadded[@]}"
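To show why the array matters, here is a small sketch with a tag that contains a space. count_args is a hypothetical stand-in for the real command and only reports how many arguments it received. The tag values are made up as well.

```shell
# Hypothetical stand-in for the real command: prints its argument count.
count_args() { echo "$#"; }

tags=("asd" "two words")
tagstobeadded=()
for i in "${tags[@]}"; do
    tagstobeadded+=(-tag "$i")
done

# Array version: "two words" stays one argument (-tag, asd, -tag, "two words").
as_array=$(count_args "${tagstobeadded[@]}")

# String version: the unquoted expansion is split on every space.
flat="-tag asd -tag two words"
as_string=$(count_args $flat)

echo "array: $as_array, string: $as_string"   # prints: array: 4, string: 5
```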

PowerShell: Input Parameter [Array] element automatically/incorrectly converts to wrong datatype

PowerShell's automatically presuming that the value '9e9' (in the COMMAND below) is an integer; since the value isn't surrounded by quotes.
I'm looking for any clever way to either treat all array elements/values as strings (no matter what) or force PowerShell to treat any values with an 'e' character in the middle as a string... or any other possible way before it's automatically cast as an integer WITHOUT having to surround the value with quotes.
COMMAND:
Get-Input -var 9e9, ba7
CODE:
function Get-Input {
[CmdletBinding()]
param(
[array[]]$Vars
)
$Var = $Vars[0]
write-output $Var
}
**** UPDATE ****
TessellatingHeckler has the best solution at the end of his answer. I just made one minor modification to strip spaces. I still think it's all PowerShell's fault. j/k. I really appreciate the time TessellatingHeckler took to break it all down and still provide a sensible solution. Thanks TessellatingHeckler!
COMMAND:
Get-Input -vars "9e9, ba7"
NEW CODE:
function Get-Input {
[CmdletBinding()]
param(
[string[]]$Vars
)
$Vars = $Vars.replace(' ', "")
$Vars = $Vars.split(",")
$Var = $Vars[0]
write-output $Var
}
PowerShell's automatically presuming that the value '9e9' (in the COMMAND below) is an integer; since the value isn't surrounded by quotes.
Admittedly, that's got me a bit annoyed at the 'blame the tool I like' wording; 'Presume': "be arrogant or impertinent enough to do something"
It's in the language spec (section 2.3.5.1.2, Real literals) that "number e number" is a way of writing a real number ([double]).
That's not PowerShell presuming anything, you're telling it to interpret it that way by using the "this is a number" syntax instead of "this is a string" syntax (quotes).
Anyway.
I'm looking for any clever way to either treat all array elements/values as strings (no matter what) [..]
before it's automatically casted as an integer WITHOUT having to surround the value with quotes.
The array value is 9000000000, if you could force it to be a string, the string would be "9000000000" - i.e. at that point, it's already too late to change it back.
In the same way that 1/2 means 0.5, you can't convert 0.5 to a string and expect to get 1/2 back out.
It's not being cast as a number, it is a number in the PowerShell language.
That sounds a bit like I'm repeating the same point, but the significance this time is that anything you do in the function, such as setting the parameter type to [string[]] is too late to make a difference, the array already has the undesired interpretation in it. Any intervention would have to happen sooner.
force PowerShell to treat any values with an 'e' character in the middle, as a string... or, any other possible way
As it scans Get-Input -var 9e9, ba7 it reads a GenericToken for 'Get-Input' which is going to turn into a command.
It hits '-' and starts reading an operator or a parameter, reads 'var' up to the space and doesn't match an operator ('-eq', '-gt', etc) so treats it as a parameter.
It reads the '9' and starts reading a number.
There's only two ways out of this - if the number is valid, it's read as a number. That happens now and you don't want it.
If the number is not valid, e.g. 7z.exe starts with a number, but is not one, then it fails, backtracks out, and treats it as a GenericToken for an argument.
So you could tack on something else to force it to be a string 9e9z. But you can't write '9e9' and have it become a string without quotes.
About the only thing you can do is use something else as a separator (not a comma) which would force the entire argument to be one string - that means no spaces either - and then split it yourself into text pieces inside the function.
function Get-Input {
[CmdletBinding()]
param(
[string]$VarsText
)
$Vars = $VarsText.Split('.')
$Var = $Vars[0]
write-output $Var
}
Get-Input 9e9.bab
Which will only confuse everyone, and would likely read more helpfully if you did
Get-Input "9e9,bab,..."
and used the comma everyone is familiar with, inside one quoted string.

Bash and Double-Quotes passing to argv

I have re-purposed this example to keep it simple, but what I am trying to do is get a nested double-quote string as a single argv value when the bash shell executes it.
Here is the script example:
set -x
command1="key1=value1 \"key2=value2 key3=value3\""
command2="keyA=valueA keyB=valueB keyC=valueC"
echo $command1
echo $command2
the output is:
++ command1='key1=value1 "key2=value2 key3=value3"'
++ command2='keyA=valueA keyB=valueB keyC=valueC'
++ echo key1=value1 '"key2=value2' 'key3=value3"'
key1=value1 "key2=value2 key3=value3"
++ echo keyA=valueA keyB=valueB keyC=valueC
keyA=valueA keyB=valueB keyC=valueC
I did test as well, that when you do everything on the command line, the nested quote message IS set as a single argv value. i.e.
prog.exe argument1 "argument2 argument3"
argv[0] = prog.exe
argv[1] = argument1
argv[2] = argument2 argument3
Using the above example:
command1="key1=value1 \"key2=value2 key3=value3\""
The error is that my argv is coming back like:
arg[1] = echo
arg[2] = key1=value1
arg[3] = "key2=value2
arg[4] = key3=value3"
where I really want my argv[3] value to be "key2=value2 key3=value3"
I noticed that debug (set -x) shows a single-quote at the points where my arguments get broken which kinda indicates that it is thinking about the arguments at these break point...just not sure.
Any idea what is really going on here? How can I change the script?
Thanks in advance.
What is happening is that your nested quotes are literal and not parsed into separate arguments by the shell. The best way to handle this using bash is to use an array instead of a string:
args=('key1=value1' 'key2=value2 key3=value3')
prog.exe "${args[@]}"
The Bash FAQ50 has some more examples and use cases for dynamic commands.
A kind of crazy "answer" is to set IFS to double quote like this (save/restore original IFS):
SAVED_IFS=$IFS
IFS=$'\"'
prog.exe $command1
IFS=$SAVED_IFS
It kind of illustrates word splitting, which occurs on unquoted arguments but does not affect variables or text inside ".." quotes. Text inside double quotes (after various expansions) is passed to the program as a single argument. However, a bare variable $command1 (unquoted) undergoes word splitting, which does not care about " inside the variable (it is taken literally). A stupid IFS hack forces word splitting to be made at ". Also beware of trailing whitespace at the end of argv[1], which appears because of word splitting at the " boundary.
jordanm's answer is much better for production use than mine :) The array is quoted, i.e. each array element is expanded as an individual string and no word splitting occurs afterwards. This is essential. If it is unquoted like ${args[@]} it would be word split into three arguments instead of two.
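The difference between the two approaches can be checked with a sketch like the following. show_args is a hypothetical helper standing in for prog.exe; it prints each argument it receives wrapped in square brackets.

```shell
# Hypothetical stand-in for prog.exe: prints every argument in brackets.
show_args() { printf '[%s]' "$@"; echo; }

command1="key1=value1 \"key2=value2 key3=value3\""
args=('key1=value1' 'key2=value2 key3=value3')

# Unquoted string: split on spaces; the embedded quotes are ordinary characters.
from_string=$(show_args $command1)

# Quoted array: one argument per element, spaces inside an element preserved.
from_array=$(show_args "${args[@]}")

printf '%s\n' "$from_string"   # prints: [key1=value1]["key2=value2][key3=value3"]
printf '%s\n' "$from_array"    # prints: [key1=value1][key2=value2 key3=value3]
```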

Resources