I'm trying to emulate MATLAB's 'end' indexing keyword (such as A[5:end]) in PowerShell, but I don't want to type the array name (such as $array) to access $array.length for
$array[0..($array.length - 2)]
as discussed in another Stack Overflow question. I tried $this:
(0..7)[4..($this.Length-1)]
but ($this.Length-1) seems to be interpreted as -1, as the output shows:
4
3
2
1
0
7
This makes me think $this is empty when used inside [] indexing an array. Is there a way for an array to refer to itself without explicitly repeating the variable name so I can call the methods of the array to derive the indices? This would be very handy for emulating logical indexing while taking advantage of method chaining (like a.b.c.d[4..end]).
PowerShell doesn't have any facility for referring to "the collection targeted by this index access operator", but if you want to skip the first N items of a collection/enumerable you can use Select -Skip:
0..7 |Select -Skip 4
To complement Mathias' helpful answer:
The automatic $this variable is not available inside index expressions ([...]), only in custom classes (to refer to the instance at hand) and in script blocks acting as .NET event delegates (to refer to the event sender).
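As an aside, here is a minimal sketch of a context where $this is valid - a method of a custom class, where it refers to the instance at hand (the class and member names below are purely illustrative):
class Demo {
    [int[]] $Items = @(0, 1, 2, 3, 4, 5, 6, 7)
    [int[]] AllButLast() {
        # $this refers to the current Demo instance
        return $this.Items[0..($this.Items.Length - 2)]
    }
}
[Demo]::new().AllButLast()   # -> 0 1 2 3 4 5 6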
However, for what you're trying to achieve you don't need a reference to the input array (collection) as a whole: instead, an abstract notation for referring to indices relative to the end of the input array should suffice, and ideally also for "all remaining elements" logic.
You can use negative indices to refer to indices relative to the end of the input array, but that only works with individual indices:
# OK: individual negative indices; get the last and the 3rd last item:
('a', 'b', 'c', 'd')[-3, -1] # -> 'b', 'd'
Unfortunately, because .. inside an index expression refers to the independent, general-purpose range operator, this does not work for range-based array slicing when negative indices are used as range endpoints:
# !! DOES NOT WORK:
# *Flawed* attempt to get all elements up to and including the 2nd last,
# i.e. to get all elements but the last.
# 0..-2 evaluates to array 0, -1, -2, whose elements then serve as the indices.
('a', 'b', 'c', 'd')[0..-2] # -> !! 'a', 'd', 'c'
That is, the general range operation 0..-2 evaluates to array 0, -1, -2, and the resulting indices are used to extract the elements.
It is this behavior that currently requires an - inconvenient - explicit reference to the array inside the index expression for everything-except-the-last-N-elements logic, such as $array[0..($array.length - 2)] in your question in order to extract all elements except the last one.
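For reference, a minimal sketch of that explicit workaround (sample data only):
$array = 'a', 'b', 'c', 'd'
$array[0..($array.Length - 2)]   # -> 'a', 'b', 'c' (everything except the last element)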
GitHub issue #7940 proposes introducing new syntax that addresses this problem, by effectively implementing C#-style ranges:
While no syntax has been agreed on and no commitment has been made to implement this enhancement, borrowing C#'s syntax directly is an option:
Now                         Potential future syntax    Comment
$arr[1..($arr.Length-2)]    $arr[1..^1]                From the 2nd el. through to the next to last.
$arr[1..($arr.Length-1)]    $arr[1..]                  Everything from the 2nd el.
$arr[0..9]                  $arr[..9]                  Everything up to the 10th el.
$arr[-9..-1]                $arr[^9..]                 Everything from the 9th to last el.
Note the logic of the from-the-end, 1-based index syntax (e.g., ^1 refers to the last element) when serving as a range endpoint: It is up-to-but-excluding logic, so that ..^1 means: up to the index before the last one, i.e. the second to last one.
As for the workarounds:
Using Select-Object with -Skip / -SkipLast is convenient in simple cases, but:
performs poorly compared to index expressions ([...]) (see below)
lacks the flexibility of the latter[1]
A notable limitation is that you cannot use both -Skip and -SkipLast in a single Select-Object call; GitHub issue #11752 proposes removing this limitation.
E.g., consider the following example (which complements Mathias's -Skip example), which extracts all elements but the last:
# Get all elements but the last.
$arr | Select-Object -SkipLast 1
Array $arr is enumerated, i.e. its elements are sent one by one through the pipeline, a process known as streaming.
When captured, the streamed elements are collected in a regular, [object[]]-typed PowerShell array, even if the input array is strongly typed - however, this loss of strict typing also applies to extracting multiple elements via [...].
Depending on the size of your arrays and the number of slicing operations needed, the performance difference can be significant.
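If you want to gauge the difference on your own system, here is a rough sketch (array size and iteration count are arbitrary; absolute timings vary by machine):
$arr = 0..99999
(Measure-Command { foreach ($i in 1..100) { $null = $arr[0..($arr.Length - 2)] } }).TotalMilliseconds
(Measure-Command { foreach ($i in 1..100) { $null = $arr | Select-Object -SkipLast 1 } }).TotalMilliseconds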
[1] Notably, you can use arbitrary expressions inside [...], which is discussed in more detail in this answer.
The Length property works as expected on all arrays that I test except one weird case:
PS> @(@()).Length
0
It's not that empty arrays are generally omitted though:
PS> @(@(), @()).Length
2
PS> @(@(), @(), @()).Length
3
What's going on?
@(...), the array-subexpression operator, is not an array constructor; it is an array "guarantor" (see next section), and nesting @(...) operations is pointless.
@(@()) is in effect the same as @(), i.e. an empty array of type [object[]].
To unconditionally construct arrays, use ,, the array constructor operator.
To construct an array wrapper for a single object, use the unary form of ,, as Abraham Zinala suggests:
# Create a single-element array whose only element is an empty array.
# Note: The outer enclosure in (...) is only needed in order to
# access the array's .Count property.
# -> 1
(, @()).Count # -> 1
Note that I've used .Count instead of .Length above, which is more PowerShell-idiomatic; .Count works across different collection types. Even though System.Array doesn't directly implement .Count, it does so via the ICollection interface, and PowerShell allows access to interface members without requiring a cast.
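For instance (a quick illustration; the collection types are chosen arbitrarily):
(1, 2, 3).Count                                       # -> 3 (an array; .Count comes via ICollection)
[System.Collections.Generic.List[int]]::new().Count   # -> 0 (a collection type with a genuine .Count property)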
Background information:
@(...)'s primary purpose is to ensure that output objects collected from - invariably pipeline-based - commands (e.g., @(Get-ChildItem *.txt)) are always collected as an array (invariably of type [object[]]) - even if ... produces only one output object.
If getting an array is desired, use of @(...) is necessary because output that happens to contain just one object would by default be collected as-is, i.e. not wrapped in an array (this also applies when you use $(...), the subexpression operator); only for multiple output objects is an array used, which is always [object[]]-typed.
Note that PowerShell commands (typically) do not output collections; instead, they stream a (usually open-ended) number of objects one by one to the pipeline; capturing command output therefore requires collecting the streamed objects - see this answer for more information.
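A quick way to see the difference (Get-Item on the current directory is used purely as an example of a command that emits a single object):
(Get-Item .).GetType().Name    # -> DirectoryInfo - the single output object, as-is
@(Get-Item .).GetType().Name   # -> Object[]      - always an array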
@(...)'s secondary purpose is to facilitate defining array literals, e.g. @('foo', 'bar')
Note:
Using @(...) for this purpose was not by original design, but such use became so prevalent that an optimization was implemented in version 5 of PowerShell so that, say, 1, 2 - which is sufficient to declare a 2-element array - may also be expressed as @(1, 2) without unnecessary processing overhead.
On the plus side, @(...) is visually distinctive in general and syntactically convenient specifically for declaring empty (@()) or single-element arrays (e.g. @(42)) - without @(...), these would have to be expressed as [object[]]::new() and , 42, respectively.
However, this use of @(...) invites the misconception that it acts as an unconditional array constructor, which isn't the case; in short: wrapping extra @(...) operations around a @(...) operation does not create nested arrays, it is an expensive no-op; e.g.:
@(42) # Single-element array
@(@(42)) # !! SAME - the outer @(...) has no effect.
When @(...) is applied to a (non-array-literal) expression, what this expression evaluates to is sent to the pipeline, which causes PowerShell to enumerate it, if it considers it enumerable;[1] that is, if the expression result is a collection, its elements are sent to the pipeline, one by one, analogous to a command's streaming output, before being collected again in an [object[]] array.
# @(...) causes the [int[]]-typed array to be *enumerated*,
# and its elements are then *collected again*, in an [object[]] array.
$intArray = [int[]] (1, 2)
@($intArray).GetType().FullName # -> !! 'System.Object[]'
To prevent this enumeration and re-collecting:
Use the expression as-is and, if necessary, enclose it just in (...)
To again ensure that an array is returned, an efficient alternative to @(...) is to use an [array] cast; the only caveat is that if the expression evaluates to $null, the result will be $null too ($null -eq [array] $null):
# With an array as input, an [array] cast preserves it as-is.
$intArray = [int[]] (1, 2)
([array] $intArray).GetType().FullName # -> 'System.Int32[]'
# With a scalar as input, a single-element [object[]] array is created.
([array] 42).GetType().FullName # -> 'System.Object[]'
[1] See the bottom section of this answer for an overview of which .NET types PowerShell considers enumerable in the pipeline.
I know about array references, but in Perl a multidimensional array is a one-dimensional array of references to other one-dimensional arrays. Can anyone explain this with an example?
my @a = ( "a", "b", "c" );
my @x;
$x[4] = \@a;
say $x[4]->[2]; # c
The dereference (->) is implied "between indexes" if omitted.
my @a = ( "a", "b", "c" );
my @x;
$x[4] = \@a;
say $x[4][2]; # c
As you can see, this can be used to create multi-dimensional arrays.
An anonymous array is commonly used. [ ... ] constructs an array and returns a reference to it.
my @x;
$x[4] = [ "a", "b", "c" ];
say $x[4][2]; # c
Also common is to let Perl create the array and the reference for you automatically through a feature called "autovivification".
my @x;
$x[4][2] = "c";
say $x[4][2]; # c
That's because
$x[4][2] = "c";
is short for
$x[4]->[2] = "c";
and
SCALAR->[EXPR1] = EXPR2;
is effectively
( SCALAR //= [ ] )->[EXPR1] = EXPR2;
so
$x[4]->[2] = "c";
is effectively
( $x[4] //= [ ] )->[2] = "c";
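A quick way to inspect the autovivified structure from the example above (Data::Dumper is a core module):
use strict;
use warnings;
use feature 'say';
use Data::Dumper;

my @x;
$x[4][2] = "c";        # autovivifies $x[4] into an array reference
say ref $x[4];         # ARRAY
print Dumper( \@x );   # indices 0..3 are undef; $x[4] is [ undef, undef, 'c' ]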
There are different uses of "dimension" at play here, and I recently was working in a language that used the other one. That messed me up for a moment while I had to switch thinking.
In some domains, the "dimension" is the number of elements. In others, it's the number of things you have to specify to get to an element. Some usages apply the term incorrectly (or sloppily). This is incredibly reductive, but I think that's the spirit. This means, though, that you can't transfer what you know about "dimension" from one use to another use. You have to know what the particular tool thinks it means.
Simple math
Forget about programming for a moment and think about simple math, which I'll also crudely simplify.
A point (vector, whatever) with just an x and a y coordinate, say (1,2), is two dimensional because you need two things to specify it completely. A point with (x,y,z) is three dimensional, and so on, up to whatever multiple you like. This sort of use is a bit sloppy because a point with three coordinates is still a point (a one dimensional thing) even if it lives in a three dimensional space. But, through a bit of synecdoche, that's how people talk.
Then consider matrices. These have rows and columns. The rows have a certain number of thingys and the columns have a certain number of thingys. The dimension of the matrix is a combination of the number of rows and columns (there's a bit more to it, but ignore that). Someone might say they have a "three by three matrix".
Perl's use of "multi" tends more to the matrix idea.
Matrices in Perl
Perl basically has three major built-in data types (there are a few others that aren't important here). You have scalars, which are single things; lists, which are zero or more scalars; and hashes. Perl doesn't get much more complicated than that. Somehow, we have to use that simple foundation to make fancier things:
Since Perl does not have a matrix type, we fake it with anonymous arrays (see the perl data structures cookbook and Perl list of lists). A reference is a single thingy, so a scalar.
my $ref = [ # top level
[ 0, 1, 2 ], # first row
[ 3, 4, 5 ], # second row
[ 6, 7, 8 ], # third row
];
Inside that top level array reference, there are three elements that are each anonymous arrays themselves. Think of those as rows. The particular position in each row is like a column. To get to any particular spot, we specify the (zero-based) row and column number:
$ref->[2][2];
That use of multiple subscripts to address any particular element gives us the "multi" in Perl. That's just how Perl uses the term, right or wrong and despite how it's used any other place.
So, finally to your question about an array ref holding a single array ref of one item:
my $ref = [ # top level
[ 'Camel' ], # first row
];
Having one row and one column is as correct as any other number of rows and columns. To get to that element, we still need to specify the row and column. They both happen to be 0.
$ref->[0][0];
And, finally, we have the trivial case of no rows or columns:
my $ref = [];
Remember though, that Perl never fixes the sizes here. It's happy to give you undef for any element you ask for:
my $value = $ref->[999][999];
And if you assign to any index higher than what it already has, it creates all the elements it needs so it can have that index. This one assignment gets you a 1000 by 1000 "matrix":
$ref->[999][999] = 'Hello';
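A quick check of that claim (a sketch; note that only the last row is actually populated - the other top-level slots stay undef):
use strict;
use warnings;
use feature 'say';

my $ref = [];
$ref->[999][999] = 'Hello';
say scalar @$ref;              # 1000 - top-level indices 0..999 now exist
say scalar @{ $ref->[999] };   # 1000 - the autovivified last row
say $ref->[0] // 'undef';      # undef - the other rows were never filled in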
Just for fun
The references are a Perl 5 feature, but multi-dimensional things go back to at least Perl 3. We used to fake multi-dimensional thingys with goofy hash keys. We'd separate the indices with , (or whatever the value of $; was):
$some_hash{1,3,7} = 'foo';
This was the same as:
$key = join $;, 1, 3, 7;
$some_hash{$key}
As long as your sub-keys didn't have the value of $; in them, this worked out.
This still exists in Perl, but starting with v5.36 (the latest as I write this), it's turned off:
use v5.36; # no multidimensional
Or you can turn it off:
no multidimensional;
If you really need it and still want to require a minimum of v5.36, you can re-enable it:
use v5.36;
use multidimensional;
Or, use require which doesn't do the feature stuff (but then you have to figure out what else you aren't getting but may want):
require v5.36;
So I've been wondering about this for a while now. Summing up over some array variable A is as easy as
sum(A(:))
% or
sum(sum(...sum(A,n)...,2),1) % where n is the dimension of A
However once it gets to expressions the (:) doesn't work anymore, like
sum((A-2*A)(:))
is no valid matlab syntax, instead we need to write
foo = A-2*A;
sum(foo(:))
%or the one liner
sum(sum(...sum(A-2*A,n)...,2),1) % n is the dimension of A
The one-liner above will only work if the dimension of A is fixed, which, depending on what you are doing, may not necessarily be the case. The downside of the two-line version is that foo will be kept in memory until you run clear foo, or that it may not even be possible, depending on the size of A and what else is in your workspace.
Is there a general way to circumvent this issue and sum up all elements of an array-valued expression in a single line / without creating temporary variables? Something like sum(A-2*A,'-all')?
Edit: It differs from How can I index a MATLAB array returned by a function without first assigning it to a local variable?, as it doesn't concern general (nor specific) indexing of array-valued expressions or return values, but rather the summation over each possible index.
While it is possible to solve my problem with the answer given in the link, gnovice himself says that using subsref is a rather ugly solution. Further, Andras Deak posted a much cleaner way of doing this in the comments below.
While the answers to the linked duplicate can indeed be applied to your problem, the narrower scope of your question allows us to give a much simpler solution than the answers provided there.
You can sum all the elements in an expression (including the return value of a function) by reshaping your array first to 1d:
sum(reshape(A-2*A,1,[]))
%or even sum(reshape(magic(3),1,[]))
This will reshape your array-valued expression to size [1, N] where N is inferred from the size of the array, i.e. numel(A-2*A) (but the above syntax of reshape will compute the missing dimension for you, no need to evaluate your expression twice). Then a single call to sum will sum all the elements, as needed.
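A quick sanity check of that equivalence (magic(3) is just sample data):
A = magic(3);
total = sum(reshape(A - 2*A, 1, []));   % [] lets reshape infer the length, i.e. numel(A)
isequal(total, -sum(A(:)))              % true: summing A-2*A over all elements gives -sum(A(:))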
The actual case where you have to resort to something like this is when a function returns an array with an unknown number of dimensions, and you want to use its sum in an anonymous function (making temporary variables unavailable):
fun = @() rand(2*ones(1,randi(10))); %function returning random 2 x 2 x ... x 2 array with randi(10) dimensions
sumfun = @(A) sum(reshape(A,1,[]));
sumfun(fun()) %use it
Within C code, I have an array and a zero-based index used to lookup within it, for example:
char * names[] = {"Apple", "Banana", "Carrot"};
char * name = names[index];
From an embedded Lua script, I have access to index via a getIndex() function and would like to replicate the array lookup. Is there an agreed on "best" method for doing this, given Lua's one-based arrays?
For example, I could create a Lua array with the same contents as my C array, but this would require adding 1 when indexing:
names = {"Apple", "Banana", "Carrot"}
name = names[getIndex() + 1]
Or, I could avoid the need to add 1 by using a more complex table, but this would break things like #names:
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
What approach is recommended?
Edit: Thank you for the answers so far. Unfortunately the solution of adding 1 to the index within the getIndex function is not always applicable. This is because in some cases indices are "well-known" - that is, it may be documented that an index of 0 means "Apple" and so on. In that situation, should one or the other of the above solutions be preferred, or is there a better alternative?
Edit 2: Thanks again for the answers and comments, they have really helped me think about this issue. I have realized that there may be two different scenarios in which the problem occurs, and the ideal solution may be different for each.
In the first case consider, for example, an array which may differ from time to time and an index which is simply relative to the current array. Indices have no meaning outside the code. Doug Currie and RBerteig are absolutely correct: the array should be 1-based and getIndex should contain a +1. As was mentioned, this allows the code on both the C and Lua sides to be idiomatic.
The second case involves indices which have meaning, and probably an array which is always the same. An extreme example would be where names contains "Zero", "One", "Two". In this case, the expected value for each index is well-known, and I feel that making the index on the Lua side one-based is unintuitive. I believe one of the other approaches should be preferred.
Use 1-based Lua tables, and bury the + 1 inside the getIndex function.
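For example, a minimal sketch along those lines (the wrapper name is illustrative; getIndex() is the 0-based accessor from the question):
local names = { "Apple", "Banana", "Carrot" }

local function getLuaIndex()
    return getIndex() + 1   -- convert the 0-based C index to Lua's 1-based indexing once, here
end

local name = names[getLuaIndex()]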
I prefer
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
Some table-manipulation features - #, insert, remove, sort - are broken.
Others - concat(t, sep, 0), unpack(t, 0) - require an explicit starting index to run correctly:
print(table.concat(names, ',', 0)) --> Apple,Banana,Carrot
print(unpack(names, 0)) --> Apple Banana Carrot
I hate constantly having to remember that +1 to cater to Lua's default 1-based indexing style.
Your code should reflect your domain-specific indices to be more readable.
If 0-based indices fit your task well, you should use 0-based indices in Lua.
I like how array indices are implemented in Pascal: you are absolutely free to choose any range you want, e.g., array[-10..-5] of byte is absolutely OK for an array of 6 elements.
This is where Lua metamethods and metatables come in handy. Using a table proxy and a couple of metamethods, you can modify access to the table in a way that would fit your need.
local names = {"Apple", "Banana", "Carrot"} -- Original Table
local _names = names -- Keep private access to the table
local names = {} -- Proxy table, used to capture all accesses to the original table
local mt = {
__index = function (t,k)
return _names[k+1] -- Access the original table
end,
__newindex = function (t,k,v)
_names[k+1] = v -- Update original table
end
}
setmetatable(names, mt)
So what's going on here is that the original table has a proxy for itself, and the proxy catches every access attempt on the table. When the table is accessed, the metamethod increments the index it was accessed with, simulating a 0-based array. Here are the print results:
print(names[0]) --> Apple
print(names[1]) --> Banana
print(names[2]) --> Carrot
print(names[3]) --> nil
names[3] = "Orange" --Add a new field to the table
print(names[3]) --> Orange
All table operations act just as they would normally. With this method you don't have to worry about messing with any unordinary access to the table.
EDIT: I'd like to point out that the new "names" table is merely a proxy to access the original names table. So if you queried for #names the result would be 0, because the proxy table itself has no values. You'd need to query for #_names to access the size of the original table.
EDIT 2: As Charles Stewart pointed out in the comment below, you can add a __len metamethod to the mt table to ensure the #names call gives you the correct results.
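A sketch of that addition (note that __len on plain tables is only honored in Lua 5.2 and later):
mt.__len = function (t)
    return #_names          -- report the length of the underlying table
end
print(#names)               --> 3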
First of all, this situation is not unique to applications that mix Lua and C; you can face the same question even when using Lua only apps. To provide an example, I'm using an editor component that indexes lines starting from 0 (yes, it's C-based, but I only use its Lua interface), but the lines in the script that I edit in the editor are 1-based. So, if the user sets a breakpoint on line 3 (starting from 0 in the editor), I need to send a command to the debugger to set it on line 4 in the script (and convert back when the breakpoint is hit).
Now the suggestions.
(1) I personally dislike using the [0] hack for arrays as it breaks too many things. You and Egor already listed many of them; most importantly for me it breaks # and ipairs.
(2) When using 1-based arrays I try to avoid indexing them and to use iterators as much as possible: for i, v in ipairs(...) do instead of for i = 1, #array do.
(3) I also try to isolate my code that deals with these conversions; for example, if you are converting between lines in the editor to manage markers and lines in the script, then have marker2script and script2marker functions that do the conversion (even if it's simple +1 and -1 operations). You'd have something like this anyway even without +1/-1 adjustments, it would just be implicit.
(4) If you can't hide the conversion (and I agree, +1 may look ugly), then make it even more noticeable: use c2l and l2c calls that do the conversion. In my opinion it's not as ugly as +1/-1, but has the advantage of communicating the intent and also gives you an easy way to search for all the places where the conversion happens. It's very useful when you are looking for off-by-one bugs or when API changes cause updates to this logic.
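For instance, a minimal sketch of such helpers (the function names are illustrative; getIndex() is the 0-based accessor from the question):
local function c2l(i) return i + 1 end   -- C (0-based)  -> Lua (1-based)
local function l2c(i) return i - 1 end   -- Lua (1-based) -> C (0-based)

local names = { "Apple", "Banana", "Carrot" }
local name = names[c2l(getIndex())]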
Overall, I wouldn't worry about these aspects too much. I'm working on a fairly complex Lua app that wraps several 0-based C components and don't remember any issues caused by different indexing...
Why not just turn the C-array into a 1-based array as well?
char * names[] = {NULL, "Apple", "Banana", "Carrot"};
char * name = names[index];
Frankly, this will lead to some unintuitive code on the C-side, but if you insist that there must be 'well-known' indices that work in both sides, this seems to be the best option.
A cleaner solution is of course not to make those 'well-known' indices part of the interface. For example, you could use named identifiers instead of plain numbers. Enums are a nice match for this on the C side, while in Lua you could even use strings as table keys.
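For instance, a sketch of the string-key idea (the key names are illustrative):
local names = { apple = "Apple", banana = "Banana", carrot = "Carrot" }
print(names["carrot"])   --> Carrot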
Another possibility is to encapsulate the table behind an interface so that the user never accesses the array directly but only via a C-function call, which can then perform arbitrarily complex index transformations. Then you only need to expose that C function in Lua and you have a clean and maintainable solution.
Why not present your C array to Lua as userdata? The technique is described with code in PiL, section 'Userdata'; you can set the __index, __newindex, and __len metatable methods, and you can inherit from a class to provide other sequence manipulation functions as regular methods (e.g., define an array with array.remove, array.sort, array.pairs functions, which can be defined as object methods by a further tweak to __index). Doing things this way means you have no "synchronisation" issues between Lua and C, and it avoids risks that "array" tables get treated as ordinary tables resulting in off-by-one errors.
You can work around this Lua flaw by using an iterator that is aware of different index bases:
function iarray(a)
    local n = 0
    local s = #a
    if a[0] ~= nil then
        n = -1
    end
    return function()
        n = n + 1
        if n <= s then return n, a[n] end
    end
end
However, you still have to add the zeroth element manually:
Usage example:
myArray = {1,2,3,4,5}
myArray[0] = 0
for _, e in iarray(myArray) do
    -- do something with element e
end
When looking at Smalltalk syntax definitions I noticed a few different notations for arrays:
#[] "ByteArray"
#() "Literal Array"
{} "Array"
Why are there different array types? In other programming languages I know there's only one kind of array independent of the stored type.
When to choose which kind?
Why do literal array and array have a different notation but same class?
There's a bit of terminological confusion in Michael's answer: #() is a literal array, whereas {} is not. A literal array is the one created by the compiler and can contain any other literal value (including other literal arrays), so the following is a valid literal array:
#(1 #blah nil ('hello' 3.14 true) $c [1 2 3])
On the other hand, {} is merely syntactic sugar for runtime array creation, so { 1+2. #a. anObject } is equivalent to:
(Array new: 3) at: 1 put: 1 + 2; at: 2 put: #a; at: 3 put: anObject; yourself
Here's a little walkthrough:
Firstly, we can find out the types (i.e. classes) of the resulting objects:
#[] class results in ByteArray
#() class results in Array
{} class also results in Array
So apparently the latter two produce Arrays while the first produces a ByteArray. ByteArrays are what you would expect -- fixed-size arrays of bytes.
Now we'll have to figure out the difference between #() and {}. Try evaluating #(a b c), it results in #(#a #b #c); however when you try to evaluate {a b c}, it doesn't work (because a is not defined). The working version would be {#a. #b. #c}, which also results in #(#a #b #c).
The difference between #() and {} is that the first takes a list of Symbol names separated by spaces (you're also allowed to omit the # signs), so using this notation you can only create Arrays that contain Symbols. The second version is the generic Array literal: it takes any expressions, separated by . (dots). You can even write things like {1+2. anyObject complexOperation}.
This could lead you to always using the {} notation. However, there are some things to keep in mind: the moment of object creation differs. While #() Arrays are created during compilation, {} Arrays are created during execution. Thus when you run code with an #() expression, it will always return the same Array, while {} only returns equal Arrays (as long as you are using equal contents). Also, AFAIK {} is not necessarily portable because it's not part of the ST-80 standard.
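A small workspace-style sketch of that identity difference (behavior of the literal case relies on the usual sharing of compiled literals and may vary between dialects):
| lit brace |
lit   := [ #(1 2 3) ].
brace := [ { 1. 2. 3 } ].
Transcript show: (lit value == lit value) printString; cr.      "true  - the same compile-time literal object"
Transcript show: (brace value == brace value) printString; cr.  "false - a fresh Array on every evaluation"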
This could lead you to always using the {} notation. However, there are some things to keep in mind: The moment of object creation differs: While #() Arrays are created during compilation, {} Arrays are created during execution. Thus when you run code with an #() expression, it will also return the same Array, while {} only returns equal Arrays (as long as you are using equal contents). Also, AFAIK the {} is not necessarily portable because it's not part of the ST-80 standard.