Perl: Indexing function returning array syntax - arrays

I have a question about Perl, more out of curiosity than necessity. I have seen that there are many ways to do things in Perl, and a lot of the time the syntax seems unintuitive to me (I've seen a few one-liners doing some impressive stuff).
So.. I know the function split returns an array. My question is, how do I go about printing the first element of this array without saving it into a special variable? Something like $(split(" ",$_))[0] ... but one that works.

You're 99% there
$ perl -de0
Loading DB routines from perl5db.pl version 1.33
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1): 0
DB<1> $a = "This is a test"
DB<2> $b = (split(" ",$a))[0]
DB<3> p $b
This
DB<4> p "'$b'"
'This'

This should do it:
print ((split(" ", $_))[0]);
You need one set of parentheses to allow you to apply array indexing to the result of a function. The outer parentheses are needed to get around special parsing of print arguments.

Try this out to print the first element of a whitespace separated list. The \s+ regex matches one or more whitespace characters to split on.
echo "1 2 3 4" | perl -ne 'print +(split(/\s+/, $_))[0]'

Related

Bash: Store sed result into array?

How to fix the following code so that it can store the result of sed, which will replace the _
with -?
My code:
names=()
for entry_ in $foo
do
names+=($entry_ | sed -e "s/_/-/g")
done
echo names
You don't need sed for this, you can use bash's built-in parameter expansion + substitution capability to replace all _ characters with -: ${var//_/-}. You can even use it to do this for the entire list of elements in a single operation, but how you do it depends on what the source variable, foo, actually is.
If foo is an array (the much better way to do things), you can combine [@] ("get me all elements of the array") with the substitution:
names=( "${foo[@]//_/-}" )
If foo is a plain string, and you need to use word splitting to break it into elements for the array, you can do essentially the same thing without the [@] ('cause it's not an array) or the double-quotes (which prevent word splitting):
names=( ${foo//_/-} )
Note: I recommend avoiding word splitting if possible -- it often does something close to what you want, but almost never exactly what you want.
P.s. I third the recommendation of shellcheck. Among other things, it'll flag anything involving word splitting as a probable mistake.
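A minimal sketch of the array form, assuming bash and some made-up entries in foo (the values are purely illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical input array; the entries are invented for this example.
foo=(first_name last_name e_mail_address)

# ${foo[@]//_/-} applies the substitution to every element, and the
# double quotes keep each element as exactly one word.
names=( "${foo[@]//_/-}" )

printf '%s\n' "${names[@]}"
```

Each element is printed on its own line with every underscore replaced, and no external process is started.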
This should be enough to get you there.
names=()
names+=("$(echo "hello_world" | sed -e "s/_/-/g")")
echo "${names[@]}"
Note that you need $ before the variable name when echoing it, and that appending an array element requires parentheses around the value.
Also, look into installing shellcheck for your code editor; it will help you catch sneaky bugs and build better shell programming practices.

How to use a variable to index ${array[*]} in bash?

First of all, sorry if my English is not good.
I want to use a variable to index an element in an array or use the same variable to index all the elements. For example:
...
var1="1"
var2="*"
array=(one two three for five)
for elem in ${array[$var1]}
do
echo $elem
done
When I use var1 as the index in ${array[$var1]} it works correctly, but if I use var2 it does not, and I get this error:
./ed.sh line XXX *: syntax error: operand expected (error token is "*")
I'm pretty sure that the error is related to the * wildcard expansion, but I didn't find an answer that helps me solve this problem. So, how can I do it?
* and @ are not considered regular elements in the array. They are not listed when iterating keys, and are not considered when expanding indirectly through index variables.
The bash source code has a function chk_atstar that checks whether [@] or [*] is being used, and you can see that it's done literally and not through any expansion:
else if (valid_array_reference (name, 0))
{
temp1 = mbschr (name, '[');
if (temp1 && temp1[1] == '@' && temp1[2] == ']')
{
If you really want to do this, you can go through variable indirection:
arr=(one two three)
index='*'
var="arr[$index]"
echo "${!var}"
though you may be better off not trying to treat these special array access modes as array elements.
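For what it's worth, the indirection trick can be wrapped in a small helper. This is only a sketch, assuming bash; select_items is a hypothetical name, and the array contents are made up (including one element with spaces, to show that quoting is preserved):

```shell
#!/usr/bin/env bash
array=(one two three "at the end")

# Hypothetical helper: prints the element(s) selected by an index string,
# which may be a number, '@' or '*'.
select_items() {
    local ref="array[$1]"       # build the reference as plain text first
    printf '%s\n' "${!ref}"     # then expand it indirectly
}

single=$(select_items 1)
all=$(select_items '@')

echo "$single"
echo "$all"
```

With '@' every element comes out on its own line, so an element containing spaces stays intact.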
I don't recommend this, but for completeness you can get this to work by cheating with the expansion order using eval:
eval items=\${array[$var2]}
for elem in $items
do
echo $elem
done
There are issues with this. eval is generally pronounced "evil" because there can be security implications in running code from a variable. There is usually a better way to do the job than using eval. In this case you should give some thought to the design.
There is also an issue if an element contains embedded whitespace. Add:
array+=('at the end')
After the array declaration and you'll see what I mean.
EDIT: After some deliberation, here is a way to do it without eval, and it supports embedded spaces or tabs (but not embedded newlines). Pretty it is not:
display_it() {
if [[ $1 = '*' ]]; then
oldIFS="$IFS"
IFS=$'\n'
echo "${array[*]}"
IFS="$oldIFS"
else
echo "${array[$1]}"
fi
}
var1="1"
var2="*"
array=(one two three for five)
array+=('at the end')
while read -r elem
do
echo $elem
done < <(display_it "$var2")
Displays:
one
two
three
for
five
at the end
At the end of the loop you will see process substitution where I call the function display_it. Each item read is separated by a newline, hence the swapping of the Internal Field Separator (IFS) in the function.

Perl, Pattern Matching each element ($line) in an array

I have a simple enough problem, I think. I recently ran a script that extracted specific information from the string in each element of an array. I have written this before and it functioned well; however, when trying a very simple version of it now, it will not present any data, only the same "uninitialized value" warning. I am getting really frustrated, as my previous code works. I am clearly doing something STUPID and would love some help!
#!/usr/bin/env perl
use strict;
use warnings;
my@histone;
my$line;
my$idea;
my$file="demo_site.txt";
open(IN, "<$file")||die"\ncannot be opend\n";
@histone=<IN>;
print @histone;
foreach $line(@histone)
{
$line=~ m/([a-zA-Z0-9]+)\t[0-9]+\t[0-9]+\t/;
print$1."\n";
print$2."\n";
print$3."\n";
}
The infile "demo_site.txt" takes the format of a tab delimited .txt file:
chr9 1234 5678 . 200 . 14.0 -1
This file has multiple lines as above and I wish to extract the first three items of data so the output looks as follows.
chr9
1234
5678
Cheers!
You don't really need a regular expression since it's tab delimited.
foreach $line (@histone)
{
my @line_data = split(/\t/, $line);
print $line_data[0]."\n";
print $line_data[1]."\n";
print $line_data[2]."\n";
}
Edit:
If you want to assign the values to specific named variables, assign the result of split to a list of variables:
($varA, $varB, $varC .... ) = split(/\t/,$line);
The actual problem here is that you're trying to print the values of $1, $2 and $3, but you only have one set of capturing parentheses in your regex, so only $1 gets a value. $2 and $3 remain undefined and hence give you that warning when you try to print them.
The solution is to add two more sets of capturing parentheses. I expect you want something like this:
$line=~ m/([a-zA-Z0-9]+)\t([0-9]+)\t([0-9]+)\t/;
Let's assume that file.txt contains what you want (file.txt is the same as demo_site.txt):
chr9 1234 5678 . 200 . 14.0 -1
You can use a simple one-liner:
perl -lane '$" = "\n"; print "@F[0..2]"' file.txt 1>output.txt
One-liners in Perl are powerful, and you don't need to write a script for simple tasks ;)
Just open a terminal sometimes ;)
P.S.:
This is not a very good one-liner, I know, but it does what it must.
$line=~ m/([a-zA-Z0-9]+)\t[0-9]+\t[0-9]+\t/)
First of all, the parens are not balanced.
Second, I haven't checked this, but don't you need a set of parens for each capture?
Third, as misplacedme said split() is definitely the way to go. ;)
If I may self-promote, you can use Tie::Array::CSV to give direct read-write access to the file as a Perl array of arrayrefs.
use strict;
use warnings;
use Tie::Array::CSV;
tie my @file, 'Tie::Array::CSV', 'demo_site.txt', sep_char => "\t";
print $file[0][0]; # first line before first tab
$file[2][1] = 10; # set the third line between the first and second tabs

How to read lines from a file into an array?

I'm trying to read in a file as an array of lines and then iterate over it with zsh. The code I've got works most of the time, except if the input file contains certain characters (such as brackets). Here's a snippet of it:
#!/bin/zsh
LIST=$(cat /path/to/some/file.txt)
SIZE=${${(f)LIST}[(I)${${(f)LIST}[-1]}]}
POS=${${(f)LIST}[(I)${${(f)LIST}[-1]}]}
while [[ $POS -le $SIZE ]] ; do
ITEM=${${(f)LIST}[$POS]}
# Do stuff
((POS=POS+1))
done
What would I need to change to make it work properly?
I know it's been a lot of time since the question was answered but I think it's worth posting a simpler answer (which doesn't require the zsh/mapfile external module):
#!/bin/zsh
for line in "${(@f)"$(</path/to/some/file.txt)"}"
{
# do something with each $line
}
#!/bin/zsh
zmodload zsh/mapfile
FNAME=/path/to/some/file.txt
FLINES=( "${(f)mapfile[$FNAME]}" )
LIST="${mapfile[$FNAME]}" # Not required unless stuff uses it
integer POS=1 # Not required unless stuff uses it
integer SIZE=$#FLINES # Number of lines, not required unless stuff uses it
for ITEM in $FLINES ; do
# Do stuff
(( POS++ ))
done
You have some strange things in your code:
1. Why are you splitting LIST each time instead of making it an array variable? It is just a waste of CPU time.
2. Why don’t you use for ITEM in ${(f)LIST}?
3. You can ask zsh for the array length directly: $#ARRAY. There is no need to determine the index of the last occurrence of the last element.
4. POS gets the same value as SIZE in your code, hence the loop will iterate only once.
5. The brackets are likely a problem because of 3.: (I) matches against a pattern. Do read the documentation.
Let's say, for the purpose of example, that file.txt contains the following three words, with two empty lines interspersed (five lines in total):
one
two
three
The solution depends on whether or not you'd like to elide the empty lines in file.txt:
Creating an array lines from file file.txt, eliding empty lines:
typeset -a lines=("${(f)"$(<file.txt)"}")
print ${#lines}
Expected output:
3
Creating an array lines from file file.txt, without eliding empty lines:
typeset -a lines=("${(@f)"$(<file.txt)"}")
print ${#lines}
Expected output:
5
In the end, the difference in the resulting array comes down to whether or not the parameter expansion flag (@) is provided during parameter expansion.
while read -r line;
do ARRAY+=("$line");
done < file.txt
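The read loop above also works in bash, so here is a self-contained sketch in bash rather than zsh (that swap is mine, not part of the answers). The temporary file and its contents are made up for the example. Two small additions are worth noting: IFS= keeps leading and trailing whitespace on each line, and the `|| [[ -n $line ]]` part picks up a final line that lacks a trailing newline.

```shell
#!/usr/bin/env bash
# Illustrative input: five lines, two of them empty, one with brackets.
tmpfile=$(mktemp)
printf 'one\n\ntwo [x]\n\nthree\n' > "$tmpfile"

lines=()
# IFS= preserves surrounding whitespace; -r keeps backslashes literal.
while IFS= read -r line || [[ -n $line ]]; do
    lines+=("$line")
done < "$tmpfile"
rm -f "$tmpfile"

echo "${#lines[@]}"
printf '[%s]\n' "${lines[@]}"
```

Empty lines are kept as empty elements, and the brackets in the third line are stored literally because nothing here treats the data as a pattern.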

KSH scripting: how to split on ',' when values have escaped commas?

I am trying to write a KSH script for processing a file consisting of name-value pairs, several of them on each line.
Format is:
NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc
Suppose I write:
read l
IFS=","
set -A nvls $l
echo "${nvls[2]}"
This will give me second name-value pair, nice and easy. Now, suppose that the task is extended so that values could include commas. They should be escaped, like this:
NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc
Obviously, my code no longer works, since "read" strips all quoting and second element of array will be just "NAME2 VALUE2_1".
I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ....", to no avail. I can't use "read nvl1 nvl2 nvl3" to do unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are in each line.
Does anyone have a useful trick up their sleeve for me?
PS
I know that I could do this in no time in Perl, Python, or even awk. However, I have to do it in ksh (... or die trying ;)
As often happens, I devised an answer minutes after asking the question in a public forum :(
I worked around the quoting/unquoting issue by piping the input file through the following sed script:
sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'
It converted the input into:
NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>
Now, I can parse this input like this:
while read name value ; do
echo "$name => $value"
done
Value will have its commas unquoted by "read", and I can stuff "name" and "value" in some associative array, if I like.
PS
Since I can't accept my own answer, should I delete the question, or ...?
You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. You can use the ksh builtin pattern-substitution syntax to do this; you don't need sed or awk or anything else. Note that read needs -r here, otherwise the backslashes are stripped before the substitution can see them.
read -r l
l=${l//\\,/!!}
IFS=","
set -A nvls $l
unset IFS
echo ${nvls[2]//!!/,}
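The same placeholder idea can be sketched in bash, which is a swap on my part: bash has no set -A, so this uses read -r -a instead, and it will not run as-is in an old ksh. The sample line comes from the question; the !! placeholder is the one used above and is assumed never to occur in real data.

```shell
#!/usr/bin/env bash
line='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'
placeholder='!!'   # assumption: this sequence never appears in the data

# 1. Hide the escaped commas behind the placeholder.
protected=${line//'\,'/$placeholder}

# 2. Split on the remaining (unescaped) commas; IFS is changed only for read.
IFS=',' read -r -a pairs <<< "$protected"

# 3. Put a plain comma back wherever the placeholder appears.
for i in "${!pairs[@]}"; do
    pairs[$i]=${pairs[$i]//"$placeholder"/,}
done

printf '%s\n' "${pairs[@]}"
```

Scoping IFS to the read command avoids having to save and restore it afterwards.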

Resources