How do I store output of a git command in a Perl array or scalar? - arrays

In my code I have
git diff --numstat
I know I could create a file with
git diff --numstat > log.log
But is it even possible to pass this into an array or scalar of some sort? I was thinking something like this but I'm not sure why it doesn’t compile.
my @array;
push (@array, git diff --numstat);

Use backticks, also known more generally as qx//:
qx/STRING/
`STRING`
A string which is (possibly) interpolated and then executed as a system command with /bin/sh or its equivalent. Shell wildcards, pipes, and redirections will be honored. The collected standard output of the command is returned; standard error is unaffected. In scalar context, it comes back as a single (potentially multi-line) string, or undef if the command failed. In list context, returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
You have options, and which is better depends on what you want to do with the output.
To read all of the standard output into a scalar, use the operator in scalar context as in
$output = `git diff --numstat`;
In list context with the default value of $/, perl splits the output into separate lines. If you want to append the git output to the end of an existing array, use push, as in
push @array, `git diff --numstat`;
Although you mentioned push specifically in your question, I’m having a hard time imagining why you’d mix the output of git with something else. Storing the output in an array directly is simpler:
@array = `git diff --numstat`;
Note that the list of lines returned retain their end-of-line characters. To create a new array and remove all of the newlines in one line, write
chomp(@array = `git diff --numstat`);
or even
chomp(my @array = `git diff --numstat`);
if you’re running under use strict.
Error Handling
For code that you plan to use more than once or twice, you should check that `git diff --numstat`, or any other command whose output you want to read, actually succeeded. Otherwise, with the warnings pragma enabled, you’ll see lots of diagnostic messages about undefined variables or missing output.
In scalar context, failure will return the undefined value. Check it as in
my $output = `git diff --numstat`;
die "$0: git may not be installed" unless defined $output;
Failure in list context produces an empty list.
my @output = `git diff --numstat`;
die "$0: git may not be installed" unless @output;

Related

Loop thru a filename list and iterate thru a variable/array removing all strings from filenames with bash

I have a list of strings in a variable and would like to remove those strings from a list of filenames. I'm pulling the strings from a file that I can add to and modify over time. Some of the strings in the variable may include part of the item that needs to be removed, while the rest may be on another line in the list. That's why I need to loop thru the entire variable list.
I'm familiar using a while loop to loop thru a list but not sure how I can loop thru each line to remove all strings from that filename.
Here's an example:
getstringstoremove=$(cat /text/from/some/file.txt)
echo "$getstringstoremove"
# Or the above can be an array
getstringstoremove=$(cat /text/from/some/file.txt)
declare -a arr=($getstringstoremove)
the above 2 should return the following lines
-SOMe.fil
(Ena)M-3_1
.So[Me].filEna)M-3_2
SOMe.fil(Ena)M-3_3
Here's the loop I was running to grab all filenames from a directory and remove anything other than the filenames
ls -l "/files/in/a/folder/" | awk -v N=9 '{sep=""; for (i=N; i<=NF; i++) {printf("%s%s",sep,$i); sep=OFS}; printf("\n")}' | while read line; do
echo "$line"
returns the following result after each loop
# 1st loop
ilikecoffee1-SOMe.fil(Ena)M-3_1.jpg
# iterate thru $getstringstoremove to remove all strings from the above file.
# 2nd loop
ilikecoffee2.So[Me].filEna)M-3_2.jpg
# iterate thru $getstringstoremove again
# 3rd loop
ilikecoffee3SOMe.fil(Ena)M-3_3.jpg
# iterate thru $getstringstoremove and again
done
the final desired output would be the following
ilikecoffee1.jpg
ilikecoffee2.jpg
ilikecoffee3.jpg
I'm running this in bash on Mac.
I hope this makes sense as I'm stuck and can use some help.
If someone has a better way of doing this by all means it doesn't have to be the way I have it listed above.
You can get the new filenames with this awk one-liner:
$ awk 'NR==FNR{a[$0];next} {for(i in a){n=index($0,i);if(n){$0=substr($0,1,n-1)substr($0,n+length(i))}}} 1' rem.txt files.lst
This assumes your exclusion strings are in rem.txt and there's a files list in files.lst.
Spaced out for easier commenting:
NR==FNR { # suck the first file into the indices of an array,
a[$0]
next
}
{
for (i in a) { # for each file we step through the array,
n=index($0,i) # search for an occurrence of this string,
if (n) { # and if found,
$0=substr($0,1,n-1)substr($0,n+length(i))
# rewrite the line with the string missing,
}
}
}
1 # and finally, print the line.
If you stow the above script in a file, say foo.awk, you could run it as:
$ awk -f foo.awk rem.txt files.lst
to see the resultant files.
Note that this just shows you how to build new filenames. If what you want is to do this for each file in a directory, it's best to avoid running your renames directly from awk, and use shell constructs designed for handling files, like a for loop:
for f in path/to/*.jpg; do
mv -v "$f" "$(awk -f foo.awk rem.txt - <<<"$f")"
done
This should be pretty obvious except perhaps for the awk options, which are:
-f foo.awk, use the awk script from this filename,
rem.txt, your list of removal strings,
-, a hyphen indicating that standard input should be used IN ADDITION to rem.txt, and
<<<"$f", a "here-string" to provide that input to awk.
Note that this awk script will work with both gawk and the non-GNU awk that is included in macOS.
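As a rough pure-bash alternative to the awk script, the same "remove the first occurrence of each string" idea can be sketched with parameter expansion. The function name strip_strings and the array name remove are made up for illustration, and the removal strings are hard-coded here rather than read from rem.txt. Quoting the pattern is what makes characters such as [ and ( match literally.

```shell
#!/usr/bin/env bash
# Sketch only: strip_strings and the "remove" array are hypothetical names.

# Remove the first occurrence of every string in "remove" from the name in $1.
strip_strings() {
    local name=$1 s
    for s in "${remove[@]}"; do
        name=${name/"$s"/}    # quoted pattern => matched literally, not as a glob
    done
    printf '%s\n' "$name"
}

# The removal strings from the question, hard-coded for the example.
remove=('-SOMe.fil' '(Ena)M-3_1' '.So[Me].filEna)M-3_2' 'SOMe.fil(Ena)M-3_3')

strip_strings 'ilikecoffee1-SOMe.fil(Ena)M-3_1.jpg'
strip_strings 'ilikecoffee2.So[Me].filEna)M-3_2.jpg'
strip_strings 'ilikecoffee3SOMe.fil(Ena)M-3_3.jpg'
```

Inside the for loop shown earlier, the mv target could then be "$(strip_strings "$f")" instead of the awk call.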
I think I have understood what you mean, and I would do it with Perl which comes built-in to the standard macOS - so nothing to install.
I assume you have a file called remove.txt with your list of stuff to remove, and that you want to run the script on all files in your current directory. If so, the script would be:
#!/usr/bin/perl -w
use strict;
# Load the strings to remove into array "strings"
my @strings = `cat remove.txt`;
for(my $i=0;$i<=$#strings;$i++){
# Strip carriage returns and quote metacharacters - e.g. *()[]
chomp($strings[$i]);
$strings[$i] = quotemeta($strings[$i]);
}
# Iterate over all filenames
my @files = glob('*');
foreach my $file (@files){
my $new = $file;
# Iterate over replacements
foreach my $string (@strings){
$new =~ s/$string//;
}
# Check if name would change
if($new ne $file){
if( -f $new){
printf("Cowardly refusing to rename %s as %s since it involves overwriting\n",$file,$new);
} else {
printf("Rename %s as %s\n",$file,$new);
# rename $file,$new;
}
}
}
Then save that in your HOME directory as renamer. Make it executable - only necessary once - with this command in Terminal:
chmod +x $HOME/renamer
Then you can go into any directory where your madly named files are and run the script like this:
cd path/to/mad/files
$HOME/renamer
As with all things you download off the Internet, make a backup first and just run on a small, copied, subset of your files till you get the idea of how it works.
If you use homebrew as your package manager, you could install rename using:
brew install rename
You could then take all the Perl from my other answer, condense it down to a couple of lines, and embed it in a rename command, which would give you the added benefit of being able to do dry-runs etc. The code below does exactly the same as my other answer but is somewhat harder to read for non-Perl folk.
Your command would simply be:
rename --dry-run '
my @strings = map { s/\r|\n//g; $_=quotemeta($_) } `cat remove.txt`;
foreach my $string (@strings){ s/$string//; } ' *
Sample Output
'ilikecoffee(Ena)M-3_1' would be renamed to 'ilikecoffee'
'ilikecoffee-SOMe.fil' would be renamed to 'ilikecoffee'
'ilikecoffee.So[Me].filEna)M-3_2' would be renamed to 'ilikecoffee'
To try and understand it, remember:
the rename part applies the following Perl to each file because of the asterisk at the end
the @strings part reads all the strings from the file remove.txt and removes any carriage returns and linefeeds from them and quotes any metacharacters
the foreach applies each of the deletions to the current filename which rename stores in $_ for you
Note that this method trades simplicity for performance somewhat. If you have millions of files to do, the other method will be quicker because here I read the remove.txt file for each and every file whose name is checked, but if you only have a few hundred/thousand files, I doubt you'll notice it.
This should be much the same, just shorter:
rename --dry-run '
my @strings = `cat remove.txt`; chomp @strings;
foreach my $string (@strings){ s/\Q$string\E//; } ' *

Perl: STDOUT/the output of shell command to an array directly

I have to run a shell command, hive, from within a Perl script, so I use backticks (`...`).
Assume the result of `hive ... ...` contains 100000000 lines and is 20 GB in size.
what I want to achieve is like this:
@array = `hive ... ...`;
Do backticks automatically use "\n" as the separator to split the output into the elements of @array?
The 2 ways I can think of are (but each has a problem in this case):
$temp = `hive ... ...`;
@array = split ( "\n", $temp );
undef $temp;
The problem with this approach is that if the output of hive is too big, as in this case, $temp can't hold it, resulting in a segmentation fault and core dump.
OR
`hive ... ... 1>temp.txt`;
open ( FP, "<", "temp.txt" );
while (<FP>)
{
chomp;
push @array, $_;
}
close FP;
`rm temp.txt`;
But this way would be too slow, because it first writes the result to the hard disk.
Is there a way to write the output of a shell command directly to an array without using any 'temporary container'?
Many thanks for helping.
@array = `command`;
does, in fact, put each line of output from command into its own element of @array. There is no need to load the output into a scalar and split it yourself.
But 20GB of output stored in an array (and possibly 2-3 times that amount due to the way that Perl stores data) will still put an awful strain on your system.
The real solution to your problem is to stream the output of your command through an IO handle, and deal with one line at a time without having to load all of the output into memory at once. The way to do that is with Perl's open command:
open my $fh, "-|", "command";    # three-argument form
open my $fh, "command |";        # or the equivalent two-argument form
The -| filemode or the | appended to the command tells Perl to run an external command, and to make the output of that command available in the filehandle $fh.
Now iterate on the filehandle to receive one line of output at a time.
while (<$fh>) {
# one line of output is now in $_
do_something($_);
}
close $fh;

using perl array as input to bash bedtools command

I'm wondering if it is possible to use a perl array as the input to a program called bedtools ( http://bedtools.readthedocs.org/en/latest/ )
The array is itself generated by bedtools via the backticks method in perl. When I try to use the perl array in another bedtools bash command it complains that the argument list is too long because it seems to treat each word or number in the array as a separate argument.
Example code:
my @constit_super = `bedtools intersect -wa -a $enhancers -b $super_enhancer`;
that works fine and can be viewed by:
print @constit_super
which looks like this onscreen:
chr10 73629894 73634938
chr10 73636240 73639574
chr10 73639726 73657218
but then if I try to use this array in bedtools again e.g.
my $bedtools = `bedtools merge -i @constit_super`;
then I get this error message:
Can't exec "/bin/sh": Argument list too long
Is there any way to use this Perl array in bedtools?
many thanks
27/9/14 thanks for the info on doing it via a file. however, sorry to be a pain I would really like to do this without writing a file if possible.
I haven't tested this but I think it would work.
bedtools is expecting one argument with the -i flag, the name of a .bed file. This was in the docs. You need to write your array to a file and then input it into the bedtools merge command.
open(my $fh, '>', "input.bed") or die $!;
print $fh join("", @constit_super);
close $fh;
Then you can sort it with this command from the docs:
`sort -k1,1 -k2,2n input.bed > input.sorted.bed`;
Finally, you can run your merge command.
my $bedtools = `bedtools merge -i input.sorted.bed`;
Hopefully this sets you on the right track.

how to enumerate multiple arrays using same base name

I am trying to create multiple arrays holding random lists of file names referencing the number of elements in another array. How can I append a $cntr var (beginning with cntr=0) to the end of the new array names so they are directly referenced with elements in other array?
Wow I hope that reads somewhat sensible. Here is what I got going on so far that I hope helps make better sense of what I mean:
function fGenRanList() {
cntr=0
while [[ "$cntr" -lt "${#mTypeAr[@]}" ]] ; do
n="${nAr[$cntr]}" ; echo "\$n: $n"
tracks${cntr}=() ; echo "\$tracks${cntr}: $tracks${cntr}"
while ((n > 0)) && IFS= read -rd $'\0' ; do
tracks${cntr}+=("$REPLY")
((n--))
done < <(sort -zuR <(find "${dirAr[$cntr]}" -type f \( -name '*.mp3' -o -name '*.ogg' \) -print0))
((cntr++))
done
}
error I get is:
/home/user/bin/ranSong_multDirs.sh: line 95: syntax error near unexpected token `"$REPLY"'
/home/user/bin/ranSong_multDirs.sh: line 95: ` tracks${cntr}+=("$REPLY")'
I first commented out the echo statements after the tracks${cntr}=() array initialization to get rid of a similar error, but I am unsure whether tracks${cntr} gets initialized in the first place.
By the end I should end up with as many tracks(n) arrays as there are elements in mTypeAr (${#mTypeAr[@]}), using the number stored in ${nAr[$cntr]} to determine how many elements each tracks array will contain.
Maybe I am making things more difficult than need be, trying to implement arrays into older scripts I have both in order to make them a little more efficient, but I guess am driven primarily to get a better handle on using BASH arrays to store vars for similar but multiple processes which I seem to do often in my scripts.
Change this line, which is not valid bash syntax,
tracks${cntr}+=("$REPLY")
to
declare "tracks${cntr}+=($REPLY)"
Rather than having a syntactic assignment, the declare command takes a string that looks like an assignment as an argument; that argument is processed by the shell first, so if cntr is currently 3 and $REPLY is foo, the actual assignment performed is
tracks3+=(foo)
The declare command gives you a level of indirection in making parameter assignments.
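Another way to get the same indirection, sketched here under the assumption that bash 4.3 or later is available, is a nameref (declare -n / local -n). The helper name append_track is made up for illustration. Unlike the declare string form, the value is never re-parsed by the shell, so filenames containing spaces stay intact.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.3+ for namerefs; append_track is a hypothetical helper.

# Append value $2 to the array named tracks<$1>.
append_track() {
    local -n _target="tracks$1"   # _target becomes an alias for tracks0, tracks1, ...
    _target+=("$2")               # the value is appended as a single element
}

# Create the empty arrays first, so each nameref points at an existing array.
for cntr in 0 1; do
    declare -a "tracks${cntr}=()"
done

append_track 0 'first song.mp3'
append_track 0 'second.ogg'
append_track 1 'third.mp3'

declare -p tracks0 tracks1
```

In the original loop, the line that appends "$REPLY" would become append_track "$cntr" "$REPLY".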

Exporting an array in bash script

I can not export an array from a bash script to another bash script like this:
export myArray[0]="Hello"
export myArray[1]="World"
When I write like this there are no problem:
export myArray=("Hello" "World")
For several reasons I need to initialize my array into multiple lines. Do you have any solution?
Array variables may not (yet) be exported.
From the manpage of bash version 4.1.5 under ubuntu 10.04.
The following statement from Chet Ramey (current bash maintainer as of 2011) is probably the most official documentation about this "bug":
There isn't really a good way to encode an array variable into the environment.
http://www.mail-archive.com/bug-bash@gnu.org/msg01774.html
TL;DR: exportable arrays are not directly supported up to and including bash-5.1, but you can (effectively) export arrays in one of two ways:
a simple modification to the way the child scripts are invoked
use an exported function to store the array initialisation, with a simple modification to the child scripts
Or, you can wait until bash-4.3 is released (in development/RC state as of February 2014, see ARRAY_EXPORT in the Changelog). Update: This feature is not enabled in 4.3. If you define ARRAY_EXPORT when building, the build will fail. The author has stated it is not planned to complete this feature.
The first thing to understand is that the bash environment (more properly command execution environment) is different to the POSIX concept of an environment. The POSIX environment is a collection of un-typed name=value pairs, and can be passed from a process to its children in various ways (effectively a limited form of IPC).
The bash execution environment is effectively a superset of this, with typed variables, read-only and exportable flags, arrays, functions and more. This partly explains why the output of set (bash builtin) and env or printenv differ.
When you invoke another bash shell you're starting a new process, and you lose some bash state. However, if you dot-source a script, the script is run in the same environment; or if you run a subshell via ( ) the environment is also preserved (because bash forks, preserving its complete state, rather than reinitialising using the process environment).
The limitation referenced in @lesmana's answer arises because the POSIX environment is simply name=value pairs with no extra meaning, so there's no agreed way to encode or format typed variables. See below for an interesting bash quirk regarding functions, and an upcoming change in bash-4.3 (proposed array feature abandoned).
There are a couple of simple ways to do this using declare -p (built-in) to output some of the bash environment as a set of one or more declare statements, which can be used to reconstruct the type and value of a "name". This is basic serialisation, but with rather less of the complexity some of the other answers imply. declare -p preserves array indexes, sparse arrays and quoting of troublesome values. For simple serialisation of an array you could just dump the values line by line, and use read -a myarray to restore it (works with contiguous 0-indexed arrays, since read -a automatically assigns indexes).
These methods do not require any modification of the script(s) you are passing the arrays to.
declare -p array1 array2 > .bash_arrays # serialise to an intermediate file
bash -c ". .bash_arrays; . otherscript.sh" # source both in the same environment
Variations on the above bash -c "..." form are sometimes (mis-)used in crontabs to set variables.
Alternatives include:
declare -p array1 array2 > .bash_arrays # serialise to an intermediate file
BASH_ENV=.bash_arrays otherscript.sh # non-interactive startup script
Or, as a one-liner:
BASH_ENV=<(declare -p array1 array2) otherscript.sh
The last one uses process substitution to pass the output of the declare command as an rc script. (This method only works in bash-4.0 or later: earlier versions unconditionally fstat() rc files and use the size returned to read() the file in one go; a FIFO returns a size of 0, and so won't work as hoped.)
In a non-interactive shell (i.e. shell script) the file pointed to by the BASH_ENV variable is automatically sourced. You must make sure bash is correctly invoked, possibly using a shebang to invoke "bash" explicitly, and not #!/bin/sh as bash will not honour BASH_ENV when in historical/POSIX mode.
If all your array names happen to have a common prefix you can use declare -p ${!myprefix*} to expand a list of them, instead of enumerating them.
You probably should not attempt to export and re-import the entire bash environment using this method, some special bash variables and arrays are read-only, and there can be other side-effects when modifying special variables.
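The declare -p approach can be tried end to end with a small self-contained sketch. The array name fruits and the use of mktemp for the intermediate file are choices made for this example only; the child here sources the file explicitly, as in the bash -c form above.

```shell
#!/usr/bin/env bash
# Sketch only: "fruits" and the temp file are illustrative choices.

fruits=([1]='apple pie' [10]='kiwi "gold"')   # sparse array with awkward values

state_file=$(mktemp)
declare -p fruits > "$state_file"   # one re-readable declare statement

# The child bash sources the file, then prints the indexes followed by the values.
child_output=$(bash -c '. "$1"; printf "%s|" "${!fruits[@]}" "${fruits[@]}"' _ "$state_file")
rm -f "$state_file"

printf '%s\n' "$child_output"
```

Both the sparse indexes (1 and 10) and the embedded quotes and spaces should survive the round trip, which is the main advantage over joining the values with a delimiter.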
(You could also do something slightly disagreeable by serialising the array definition to an exportable variable, and using eval, but let's not encourage the use of eval ...
$ array=([1]=a [10]="b c")
$ export scalar_array=$(declare -p array)
$ bash # start a new shell
$ eval $scalar_array
$ declare -p array
declare -a array='([1]="a" [10]="b c")'
)
As referenced above, there's an interesting quirk: special support for exporting functions through the environment:
function myfoo() {
echo foo
}
with export -f or set -a to enable this behaviour, will result in this in the (process) environment, visible with printenv:
myfoo=() { echo foo
}
The variable is functionname (or functionname() for backward compatibility) and its value is () { functionbody }.
When a subsequent bash process starts it will recreate a function from each such environment variable. If you peek into the bash-4.2 source file variables.c you'll see variables starting with () { are handled specially. (Though creating a function using this syntax with declare -f is forbidden.) Update: The "shellshock" security issue is related to this feature, contemporary systems may disable automatic function import from the environment as a mitigation.
If you keep reading though, you'll see an #if 0 (or #if ARRAY_EXPORT) guarding code that checks variables starting with ([ and ending with ), and a comment stating "Array variables may not yet be exported". The good news is that in the current development version bash-4.3rc2 the ability to export indexed arrays (not associative) is enabled. This feature is not likely to be enabled, as noted above.
We can use this to create a function which restores any array data required:
% function sharearray() {
array1=(a b c d)
}
% export -f sharearray
% bash -c 'sharearray; echo ${array1[*]}'
So, similar to the previous approach, invoke the child script with:
bash -c "sharearray; . otherscript.sh"
Or, you can conditionally invoke the sharearray function in the child script by adding at some appropriate point:
declare -F sharearray >/dev/null && sharearray
Note there is no declare -a in the sharearray function; if you do that, the array is implicitly local to the function, which is not what is wanted. bash-4.2 supports declare -g that makes a variable declared in a function into a global, so declare -ga can then be used. (Since associative arrays require a declare -A you won't be able to use this method for global associative arrays prior to bash-4.2; from v4.2 declare -Ag will work as hoped.) The GNU parallel documentation has a useful variation on this method, see the discussion of --env in the man page.
Your question as phrased also indicates you may be having problems with export itself. You can export a name after you've created or modified it. "exportable" is a flag or property of a variable, for convenience you can also set and export in a single statement. Up to bash-4.2 export expects only a name, either a simple (scalar) variable or function name are supported.
Even if you could (in future) export arrays, exporting selected indexes (a slice) may not be supported (though since arrays are sparse there's no reason it could not be allowed). Though bash also supports the syntax declare -a name[0], the subscript is ignored, and "name" is simply a normal indexed array.
Jeez. I don't know why the other answers made this so complicated. Bash has nearly built-in support for this.
In the exporting script:
myArray=( ' foo"bar ' $'\n''\nbaz)' ) # an array with two nasty elements
myArray="${myArray[@]@Q}" ./importing_script.sh
(Note, the double quotes are necessary for correct handling of whitespace within array elements.)
Upon entry to importing_script.sh, the value of the myArray environment variable comprises these exact 26 bytes:
' foo"bar ' $'\n\\nbaz)'
Then the following will reconstitute the array:
eval "myArray=( ${myArray} )"
CAUTION! Do not eval like this if you cannot trust the source of the myArray environment variable. This trick exhibits the "Little Bobby Tables" vulnerability. Imagine if someone were to set the value of myArray to ) ; rm -rf / #.
The environment is just a collection of key-value pairs, both of which are character strings. A proper solution that works for any kind of array could either
Save each element in a different variable (e.g. MY_ARRAY_0=myArray[0]). Gets complicated because of the dynamic variable names.
Save the array in the file system (declare -p myArray >file).
Serialize all array elements into a single string.
These are covered in the other posts. If you know that your values never contain a certain character (for example |) and your keys are consecutive integers, you can simply save the array as a delimited list:
export MY_ARRAY=$(IFS='|'; echo "${myArray[*]}")
And restore it in the child process:
IFS='|'; myArray=($MY_ARRAY); unset IFS
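A variation on the delimited-list idea, sketched under the same assumption that no element contains | (or a newline), is to split with read -a in the child. Unlike the unquoted $MY_ARRAY expansion, this does not glob-expand elements such as * and leaves IFS unchanged for the rest of the script. The array contents and the name restored are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch only: assumes no element contains "|" or a newline.

myArray=('Hello World' '*' 'a b  c')

# Join on "|" (the first character of IFS) inside a subshell, then export the string.
MY_ARRAY=$(IFS='|'; printf '%s' "${myArray[*]}")
export MY_ARRAY

# In the child: IFS is set for the read command only; -a fills the array arr.
restored=$(bash -c 'IFS="|" read -r -a arr <<<"$MY_ARRAY"; printf "%s\n" "${#arr[@]}" "${arr[@]}"')

printf '%s\n' "$restored"
```

The child prints the element count followed by one element per line, so it is easy to check that the * and the embedded spaces came through unchanged.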
Based on @mr.spuratic's use of BASH_ENV, here I tunnel $@ through script -f -c.
script -c <command> <logfile> can be used to run a command inside another pty (and process group) but it cannot pass any structured arguments to <command>.
Instead <command> is a simple string to be an argument to the system library call.
I need to tunnel $@ of the outer bash into $@ of the bash invoked by script.
As declare -p cannot take @, here I use the magic bash variable _ (with a dummy first array value as that will get overwritten by bash). This saves me trampling on any important variables:
Proof of concept:
BASH_ENV=<( declare -a _=("" "$@") && declare -p _ ) bash -c 'set -- "${_[@]:1}" && echo "$@"'
"But," you say, "you are passing arguments to bash!" Indeed I am, but these are a simple string of known characters. Here it is used with script:
SHELL=/bin/bash BASH_ENV=<( declare -a _=("" "$@") && declare -p _ && echo 'set -- "${_[@]:1}"') script -f -c 'echo "$@"' /tmp/logfile
which gives me this wrapper function in_pty:
in_pty() {
SHELL=/bin/bash BASH_ENV=<( declare -a _=("" "$@") && declare -p _ && echo 'set -- "${_[@]:1}"') script -f -c 'echo "$@"' /tmp/logfile
}
or this function-less wrapper as a composable string for Makefiles:
in_pty=bash -c 'SHELL=/bin/bash BASH_ENV=<( declare -a _=("" "$$@") && declare -p _ && echo '"'"'set -- "$${_[@]:1}"'"'"') script -qfc '"'"'"$$@"'"'"' /tmp/logfile' --
...
$(in_pty) test --verbose $@ $^
I was editing a different post and made a mistake. Augh. Anyway, perhaps this might help?
https://stackoverflow.com/a/11944320/1594168
Note that because the shell's array format is undocumented on bash or any other shell's side,
it is very difficult to return a shell array in a platform-independent way.
You would have to check the version, and also craft a simple script that concatenates all
shell arrays into a file that other processes can resolve into.
However, if you know the name of the array you want to take back home then there is a way, while a bit dirty.
Lets say I have
MyAry[42]="whatever-stuff";
MyAry[55]="foo";
MyAry[99]="bar";
So I want to take it home
name_of_child=MyAry
take_me_home="`declare -p ${name_of_child}`";
export take_me_home="${take_me_home/#declare -a ${name_of_child}=/}"
We can see it being exported, by checking from a sub-process
echo ""|awk '{print "from awk =["ENVIRON["take_me_home"]"]"; }'
Result :
from awk =['([42]="whatever-stuff" [55]="foo" [99]="bar")']
If we absolutely must, use the env var to dump it.
env > some_tmp_file
Then, before running the other script:
# This is the magic that does it all
source some_tmp_file
As lesmana reported, you cannot export arrays. So you have to serialize them before passing them through the environment. This serialization is useful in other places too, where only a string fits (su -c 'string', ssh host 'string'). The shortest way to code this is to abuse 'getopt':
# preserve_array(arguments). return in _RET a string that can be expanded
# later to recreate positional arguments. They can be restored with:
# eval set -- "$_RET"
preserve_array() {
_RET=$(getopt --shell sh --options "" -- -- "$@") && _RET=${_RET# --}
}
# restore_array(name, payload)
restore_array() {
local name="$1" payload="$2"
eval set -- "$payload"
eval "unset $name && $name=("\$@")"
}
Use it like this:
foo=("1: &&& - *" "2: two" "3: %# abc" )
preserve_array "${foo[@]}"
foo_stuffed=${_RET}
restore_array newfoo "$foo_stuffed"
for elem in "${newfoo[@]}"; do echo "$elem"; done
## output:
# 1: &&& - *
# 2: two
# 3: %# abc
This does not address unset/sparse arrays.
You might be able to reduce the 2 'eval' calls in restore_array.
Although this question/answers are pretty old, this post seems to be the top hit when searching for "bash serialize array"
And, although the original question wasn't quite related to serializing/deserializing arrays, it does seem that the answers have devolved in that direction.
So with that ... I offer my solution:
Pros
All Core Bash Concepts
No Evals
No Sub-Commands
Cons
Functions take variable names as arguments (vs actual values)
Serializing requires having at least one character that is not present in the array
serialize_array.bash
# shellcheck shell=bash
##
# serialize_array
# Serializes a bash array to a string, with a configurable separator.
#
# $1 = source varname ( contains array to be serialized )
# $2 = target varname ( will contain the serialized string )
# $3 = separator ( optional, defaults to $'\x01' )
#
# example:
#
# my_array=( one "two three" four )
# serialize_array my_array my_string '|'
# declare -p my_string
#
# result:
#
# declare -- my_string="one|two three|four"
#
function serialize_array() {
declare -n _array="${1}" _str="${2}" # _array, _str => local reference vars
local IFS="${3:-$'\x01'}"
# shellcheck disable=SC2034 # Reference vars assumed used by caller
_str="${_array[*]}" # * => join on IFS
}
##
# deserialize_array
# Deserializes a string into a bash array, with a configurable separator.
#
# $1 = source varname ( contains string to be deserialized )
# $2 = target varname ( will contain the deserialized array )
# $3 = separator ( optional, defaults to $'\x01' )
#
# example:
#
# my_string="one|two three|four"
# deserialize_array my_string my_array '|'
# declare -p my_array
#
# result:
#
# declare -a my_array=([0]="one" [1]="two three" [2]="four")
#
function deserialize_array() {
IFS="${3:-$'\x01'}" read -r -a "${2}" <<<"${!1}" # -a => split on IFS
}
NOTE: This is hosted as a gist here:
https://gist.github.com/TekWizely/c0259f25e18f2368c4a577495cd566cd
[edits]
Logic simplified after running through shellcheck + shfmt.
Added URL for hosted GIST
You can use this; there is no need to write a file. Tested on Ubuntu 12.04, bash 4.2.24.
Also, your multiple lines array can be exported.
cat >>exportArray.sh
function FUNCarrayRestore() {
local l_arrayName=$1
local l_exportedArrayName=${l_arrayName}_exportedArray
# if set, recover its value to array
if eval '[[ -n ${'$l_exportedArrayName'+dummy} ]]'; then
eval $l_arrayName'='`eval 'echo $'$l_exportedArrayName` #do not put export here!
fi
}
export -f FUNCarrayRestore
function FUNCarrayFakeExport() {
local l_arrayName=$1
local l_exportedArrayName=${l_arrayName}_exportedArray
# prepare to be shown with export -p
eval 'export '$l_arrayName
# collect exportable array in string mode
local l_export=`export -p \
|grep "^declare -ax $l_arrayName=" \
|sed 's"^declare -ax '$l_arrayName'"export '$l_exportedArrayName'"'`
# creates exportable non array variable (at child shell)
eval "$l_export"
}
export -f FUNCarrayFakeExport
test this example on terminal bash (works with bash 4.2.24):
source exportArray.sh
list=(a b c)
FUNCarrayFakeExport list
bash
echo ${list[@]} #empty :(
FUNCarrayRestore list
echo ${list[@]} #profit! :D
I may improve it here
PS.: if someone clears/improve/makeItRunFaster I would like to know/see, thx! :D
For arrays with values without spaces, I've been using a simple set of functions to iterate through each array element and concatenate the array:
_arrayToStr(){
array=($@)
arrayString=""
for (( i=0; i<${#array[@]}; i++ )); do
if [[ $i == 0 ]]; then
arrayString="\"${array[i]}\""
else
arrayString="${arrayString} \"${array[i]}\""
fi
done
export arrayString="(${arrayString})"
}
_strToArray(){
str=$1
array=${str//\"/}
array=(${array//[()]/""})
export array=${array[@]}
}
The first function will turn the array into a string by adding the opening and closing parentheses and escaping all of the double quotation marks. The second function will strip the quotation marks and the parentheses and place the values into a dummy array.
In order to export the array, you would pass in all the elements of the original array:
array=(foo bar)
_arrayToStr ${array[@]}
At this point, the array has been exported into the variable $arrayString. To import the array in the destination file, do the opposite conversion and give the result a new name:
_strToArray "$arrayString"
newArray=(${array[@]})
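When the values contain no whitespace, the same round trip can be sketched without the quoting step. This is a simplified illustration and not the functions above: it joins the elements with spaces, exports the string, and splits it again in a child shell with read -a.

```shell
#!/usr/bin/env bash
# Sketch for values WITHOUT whitespace: join, export, split in the child.
array=(foo bar baz)

# "${array[*]}" joins the elements with the first character of IFS (a space by default).
export arrayString="${array[*]}"

# The child splits the string back into an array and prints the count and the elements.
restored=$(bash -c 'read -r -a newArray <<<"$arrayString"; printf "%s|" "${#newArray[@]}" "${newArray[@]}"')
echo "$restored"   # prints 3|foo|bar|baz|
```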
Many thanks to @stéphane-chazelas, who pointed out all the problems with my previous attempts. This now seems to work to serialise an array to stdout or into a variable.
This technique does not shell-parse the input (unlike declare -a/declare -p) and so is safe against malicious insertion of metacharacters in the serialised text.
Note: newlines are not escaped, because read deletes the \<newline> character pair. Instead, -d ... must be passed to read, and then unescaped newlines are preserved.
All this is managed in the unserialise function.
Two magic characters are used, the field separator and the record separator (so that multiple arrays can be serialized to the same stream).
These characters can be defined as FS and RS, but neither can be the newline character, because an escaped newline is deleted by read.
The escape character must be \ the backslash, as that is what is used by read to avoid the character being recognized as an IFS character.
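A small sketch of that escaping behaviour, using ; as the field separator and : as the record separator. The input string is made up for illustration. Without -r, read treats a backslash-escaped separator as an ordinary character:

```shell
#!/usr/bin/env bash
# Without -r, read keeps a backslash-escaped ';' or ':' as a literal character
# instead of splitting on it or stopping at it.
IFS=';' read -d ':' -a fields <<<'a\;b;c\:d:ignored'

printf '%s\n' "${#fields[@]}" "${fields[@]}"
# fields[0] is "a;b" and fields[1] is "c:d"; reading stops at the first unescaped ':'
```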
serialise will serialise "$@" to stdout; serialise_to will serialise to the variable named in $1.
serialise() {
set -- "${@//\\/\\\\}" # \ - our escape character
set -- "${@//${FS:-;}/\\${FS:-;}}" # ; - our field separator
set -- "${@//${RS:-:}/\\${RS:-:}}" # : - our record separator
local IFS="${FS:-;}"
printf ${SERIALIZE_TARGET:+-v"$SERIALIZE_TARGET"} "%s" "$*${RS:-:}"
}
serialise_to() {
SERIALIZE_TARGET="$1" serialise "${@:2}"
}
unserialise() {
local IFS="${FS:-;}"
if test -n "$2"
then read -d "${RS:-:}" -a "$1" <<<"${*:2}"
else read -d "${RS:-:}" -a "$1"
fi
}
and unserialise with:
unserialise data # read from stdin
or
unserialise data "$serialised_data" # from args
e.g.
$ serialise "Now is the time" "For all good men" "To drink \$drink" "At the \`party\`" $'Party\tParty\tParty'
Now is the time;For all good men;To drink $drink;At the `party`;Party Party Party:
(without a trailing newline)
read it back:
$ serialise_to s "Now is the time" "For all good men" "To drink \$drink" "At the \`party\`" $'Party\tParty\tParty'
$ unserialise array "$s"
$ echo "${array[@]/#/$'\n'}"
Now is the time
For all good men
To drink $drink
At the `party`
Party Party Party
or
unserialise array # read from stdin
Bash's read respects the escape character \ (unless you pass the -r flag) to remove special meaning of characters such as for input field separation or line delimiting.
If you want to serialise an array instead of a mere argument list then just pass your array as the argument list:
serialise "${my_array[@]}"
You can use unserialise in a loop like you would read because it is just a wrapped read - but remember that the stream is not newline separated:
while unserialise array
do ...
done
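As a sketch of such a loop, the snippet below calls read directly with the default ; and : separators instead of going through unserialise. The stream contents and the counts array are invented for the example.

```shell
#!/usr/bin/env bash
# Two records in one stream: fields separated by ';', records terminated by ':'.
stream='a;b:c;d;e:'

counts=()
while IFS=';' read -d ':' -a array; do
    # Record the number of fields found in each record.
    counts+=("${#array[@]}")
done <<<"$stream"

echo "${counts[*]}"   # prints 2 3
```

The loop ends when read reaches end of input without finding another record separator.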
I've written my own functions for this and improved the method using IFS:
Features:
Doesn't call $(...), so it doesn't spawn another bash process
Serializes the ? and | characters into the ?00 and ?01 sequences and back, so it can be used on arrays containing these characters
Handles line-return characters like any other character during serialization and deserialization
Tested in cygwin bash 3.2.48 and Linux bash 4.3.48
function tkl_declare_global()
{
eval "$1=\"\$2\"" # right argument does NOT evaluate
}
function tkl_declare_global_array()
{
local IFS=$' \t\r\n' # just in case, workaround for the bug in the "[@]:i" expression under the bash version lower than 4.1
eval "$1=(\"\${@:2}\")"
}
function tkl_serialize_array()
{
local __array_var="$1"
local __out_var="$2"
[[ -z "$__array_var" ]] && return 1
[[ -z "$__out_var" ]] && return 2
local __array_var_size
eval declare "__array_var_size=\${#$__array_var[@]}"
(( ! __array_var_size )) && { tkl_declare_global $__out_var ''; return 0; }
local __escaped_array_str=''
local __index
local __value
for (( __index=0; __index < __array_var_size; __index++ )); do
eval declare "__value=\"\${$__array_var[__index]}\""
__value="${__value//\?/?00}"
__value="${__value//|/?01}"
__escaped_array_str="$__escaped_array_str${__escaped_array_str:+|}$__value"
done
tkl_declare_global $__out_var "$__escaped_array_str"
return 0
}
function tkl_deserialize_array()
{
local __serialized_array="$1"
local __out_var="$2"
[[ -z "$__out_var" ]] && return 1
(( ! ${#__serialized_array} )) && { tkl_declare_global $__out_var ''; return 0; }
local IFS='|'
local __deserialized_array=($__serialized_array)
tkl_declare_global_array $__out_var
local __index=0
local __value
for __value in "${__deserialized_array[@]}"; do
__value="${__value//\?01/|}"
__value="${__value//\?00/?}"
tkl_declare_global $__out_var[__index] "$__value"
(( __index++ ))
done
return 0
}
Example:
a=($'1 \n 2' "3\"4'" 5 '|' '?')
tkl_serialize_array a b
tkl_deserialize_array "$b" c
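The escaping step used by these functions can be seen in isolation. The sketch below applies the same ?00 and ?01 substitutions to a single made-up value and then reverses them. The order matters: ? is escaped first and restored last.

```shell
#!/usr/bin/env bash
value='a|b?c'

# Escape: '?' becomes '?00' first, then '|' becomes '?01'.
escaped="${value//\?/?00}"
escaped="${escaped//|/?01}"
echo "$escaped"    # prints a?01b?00c

# Unescape in the reverse order.
restored="${escaped//\?01/|}"
restored="${restored//\?00/?}"
echo "$restored"   # prints a|b?c
```

After escaping, the string no longer contains a literal |, so it is safe to use | as the element separator.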
I think you can try it this way (by sourcing your script after export):
export myArray=(Hello World)
. yourScript.sh
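A sketch of why sourcing helps: a sourced script runs inside the current shell and sees its arrays, while a script started as a separate bash process does not, because bash does not pass arrays through the environment. The temporary script below stands in for yourScript.sh, and the export is left out because sourcing does not need it.

```shell
#!/usr/bin/env bash
myArray=(Hello World)

# Temporary stand-in for yourScript.sh: it prints how many elements it can see.
script=$(mktemp)
printf '%s\n' 'echo "${#myArray[@]}"' >"$script"

# $(...) forks a copy of the current shell, so the sourced script still sees myArray.
sourced=$(. "$script")
# A separate bash process starts without the array.
child=$(bash "$script")
rm -f "$script"

echo "sourced: $sourced, child: $child"   # prints sourced: 2, child: 0
```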
