How can I iterate subranges in a "cyclic array"? - arrays

I'm trying to write the following Perl subroutine. Given are an array a of length n, an index i in the array (0<=i<n), an upstream window length u, and a downstream window length d.
I want to iterate over the values in the upstream window and the downstream window around i. In the simplest case, this means iterating over the values in a[i-u..i-1] (upstream window) and a[i+1..i+d] (downstream window).
For example: if my array is 1 2 3 4 5 6 7 8 9 10, i=5 and both window sizes are 2, the upstream values are simply 6 7 and the downstream values are 9 10.
However, there are two complications:
1. I would like to treat my array as cyclic. If i is relatively small (close to 0) or large (close to n), then one of the windows may not fit in the array. In that case, I want to wrap around to the other end of the array. For example, if my array is 1 2 3 4 5 6 7 8 9 10, i=8 and both window sizes are 4, the upstream values are simply 4 5 6 7, but the downstream values are 9 10 1 2.
2. I would prefer some way to iterate over these values without explicitly copying them into a new array, since they might be very long.

You can just get a list of indices using the range operator (..) by subtracting the upstream window from $i and adding the downstream window to $i. You will need to remember to skip the iterator when the iterator is equal to $i if you don't want that $ith value.
You will need to use the modulo operator (%) to keep the index within the bounds of the array. Given an array of size 11, we can see that taking the index modulo 11 always gives a position inside the array:
#!/usr/bin/perl
use strict;
use warnings;
for my $i (-22 .. 22) {
print "$i => ", $i % 11, "\n";
}
You may run into problems with huge numbers (i.e., numbers larger than what your platform holds in an unsigned integer), because Perl 5 changes the algorithm the modulus uses around there. It becomes more like C's fmod (but there are some differences).
You may also want to avoid the integer pragma. It makes % faster, but you get the behavior of the C modulo operator. C89 leaves the result for negative operands implementation-defined, so you may or may not get a valid index back. Of course, so long as the version of C spits back either
X    -5 -4 -3 -2 -1  0  1
X%5   0 -4 -3 -2 -1  0  1
or
X    -5 -4 -3 -2 -1  0  1
X%5   0  1  2  3  4  0  1
it should be fine (if not very portable), because Perl treats a negative index as counting back from the end of the array.
It looks like C99 defines the modulo operator to truncate toward zero, which gives the first case, so as long as perl is compiled with a C99 compiler (with the C99 flag on) it should be safe to use the integer pragma.
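As a rough illustration of the modulo idea above, here is a minimal sketch in Python rather than Perl (Python's % also returns a non-negative result for a positive modulus). The names upstream and downstream are made up for this example. Both are generators, so the window values are produced one at a time and nothing is copied into a new array.

```python
def upstream(a, i, u):
    """Yield the u values before index i, wrapping around the start of a."""
    n = len(a)
    for k in range(i - u, i):
        yield a[k % n]  # k may be negative; % maps it back into 0..n-1


def downstream(a, i, d):
    """Yield the d values after index i, wrapping around the end of a."""
    n = len(a)
    for k in range(i + 1, i + d + 1):
        yield a[k % n]  # k may exceed n-1; % maps it back into 0..n-1


if __name__ == "__main__":
    a = list(range(1, 11))
    # Index 7 holds the value 8, matching the cyclic example in the question.
    print(list(upstream(a, 7, 4)))
    print(list(downstream(a, 7, 4)))
```

The index i is skipped simply because neither range includes it.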

Related

How should I selectively sum multiple axes of an array?

What is the preferred approach in J for selectively summing multiple axes of an array?
For instance, suppose that a is the following rank 3 array:
]a =: i. 2 3 4
0 1 2 3
4 5 6 7
8 9 10 11

12 13 14 15
16 17 18 19
20 21 22 23
My goal is to define a dyad "sumAxes" to sum over multiple axes of my choosing:
0 1 sumAxes a NB. 0+4+8+12+16+20 ...
60 66 72 78
0 2 sumAxes a NB. 0+1+2+3+12+13+14+15 ...
60 92 124
1 2 sumAxes a NB. 0+1+2+3+4+5+6+7+8+9+10+11 ...
66 210
The way that I am currently trying to implement this verb is to use the dyad |: to first permute the axes of a, and then ravel the items of the necessary rank using ,"n (where n is the number of axes I want to sum over) before summing the resulting items:
sumAxes =: dyad : '(+/ @ ,"(#x)) x |: y'
This appears to work as I want, but as a beginner in J I am unsure if I am overlooking some aspect of rank or particular verbs that would enable a cleaner definition. More generally I wonder whether permuting axes, ravelling and summing is idiomatic or efficient in this language.
For context, most of my previous experience with array programming is with Python's NumPy library.
NumPy does not have J's concept of rank and instead expects the user to explicitly label the axes of an array to reduce over:
>>> import numpy
>>> a = numpy.arange(2*3*4).reshape(2, 3, 4) # a =: i. 2 3 4
>>> a.sum(axis=(0, 2)) # sum over specified axes
array([ 60, 92, 124])
As a footnote, my current implementation of sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified (as rank is not interchangeable with "axis").
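For readers without NumPy at hand, the intended semantics can be sketched in plain Python. This is only an illustration of what "summing over the named axes" means. sum_axes is a hypothetical helper that works on a flat, row-major list plus a shape; it is not a NumPy or J API.

```python
from itertools import product


def sum_axes(flat, shape, axes):
    """Sum a row-major flat array over the given axes.

    Returns the sums as a flat list, in row-major order of the kept axes.
    """
    keep = [k for k in range(len(shape)) if k not in axes]
    totals = {}
    # product() enumerates index tuples in row-major order, matching flat.
    for pos, idx in enumerate(product(*(range(s) for s in shape))):
        key = tuple(idx[k] for k in keep)
        totals[key] = totals.get(key, 0) + flat[pos]
    return list(totals.values())


if __name__ == "__main__":
    a = list(range(24))  # same contents as i. 2 3 4
    print(sum_axes(a, (2, 3, 4), (0, 1)))
    print(sum_axes(a, (2, 3, 4), (0, 2)))
    print(sum_axes(a, (2, 3, 4), (1, 2)))
```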
Motivation
J has incredible facilities for handling arbitrarily-ranked arrays. But there's one facet of the language which is simultaneously almost universally useful as well as justified, but also somewhat antithetical to this dimensionality-agnostic nature.
The major axis (in fact, the leading axes in general) is implicitly privileged. This is the concept that underlies, e.g. # being the count of items (i.e. the dimension of the first axis), the understated elegance and generality of +/ without further modification, and a host of other beautiful parts of the language.
But it's also what accounts for the obstacles you're meeting in trying to solve this problem.
Standard approach
So the general approach to solving the problem is just as you have it: transpose or otherwise rearrange the data so the axes that interest you become leading axes. Your approach is classic and unimpeachable. You can use it in good conscience.
Alternative approaches
But, like you, it niggles me a bit that we are forced to jump through such hoops in similar circumstances. One clue that we're kind of working against the grain of the language is the dynamic argument to the conjunction "(#x); usually arguments to conjunctions are fixed, and calculating them at runtime often forces us to use either explicit code (as in your example) or dramatically more complicated code. When the language makes something hard to do, it's usually a sign you're cutting against the grain.
Another is that ravel (,). It's not just that we want to transpose some axes; it's that we want to focus on one specific axis, and then run all the elements trailing it into a flat vector. Though I actually think this reflects more a constraint imposed by how we're framing the problem, rather than one in the notation. More on this in the final section of this post.
With that, we might feel justified in our desire to address a non-leading axis directly. And, here and there, J provides primitives that allow us to do exactly that, which might be a hint that the language's designers also felt the need to include certain exceptions to the primacy of leading axes.
Introductory examples
For example, dyadic |. (rotate) has ranks 1 _, i.e. it takes a vector on the left.
This is sometimes surprising to people who have been using it for years, never having passed more than a scalar on the left. That, along with the unbound right rank, is another subtle consequence of J's leading-axis bias: we think of the right argument as a vector of items, and the left argument as a simple, scalar rotation value of that vector.
Thus:
3 |. 1 2 3 4 5 6
4 5 6 1 2 3
and
1 |. 1 2 , 3 4 ,: 5 6
3 4
5 6
1 2
But in this latter case, what if we didn't want to treat the table as a vector of rows, but as a vector of columns?
Of course, the classic approach is to use rank, to explicitly denote the axis we're interested in (because leaving it implicit always selects the leading axis):
1 |."1 ] 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Now, this is perfectly idiomatic, standard, and ubiquitous in J code: J encourages us to think in terms of rank. No one would blink an eye on reading this code.
But, as described at the outset, in another sense it can feel like a cop-out, or manual adjustment. Especially when we want to dynamically choose the rank at runtime. Notationally, we are now no longer addressing the array as a whole, but addressing each row.
And this is where the left rank of |. comes in: it's one of those few primitives which can address non-leading axes directly.
0 1 |. 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Look ma, no rank! Of course, we now have to specify a rotation value for each axis independently, but that's not only ok, it's useful, because now that left argument smells much more like something which can be calculated from the input, in true J spirit.
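For readers more at home in Python, the three rotations above can be mimicked with nested lists. This is a loose sketch of the idea of "one rotation amount per axis", not a description of how J implements |. and the helper names rotate and rotate_axes are invented for the example.

```python
def rotate(items, k):
    """Rotate a list left by k positions, like a scalar left argument to |. ."""
    k %= len(items)
    return items[k:] + items[:k]


def rotate_axes(table, shifts):
    """Rotate a list of rows by one shift per axis: (row_shift, col_shift)."""
    row_shift, col_shift = shifts
    return [rotate(row, col_shift) for row in rotate(table, row_shift)]


if __name__ == "__main__":
    table = [[1, 2], [3, 4], [5, 6]]
    print(rotate([1, 2, 3, 4, 5, 6], 3))
    print(rotate_axes(table, (1, 0)))  # rotate the rows only
    print(rotate_axes(table, (0, 1)))  # rotate within each row only
```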
Summing non-leading axes directly
So, now that we know J lets us address non-leading axes in certain cases, we simply have to survey those cases and identify one which seems fit for our purpose here.
The primitive I've found most generally useful for non-leading-axis work is ;. with a boxed left-hand argument. So my instinct is to reach for that first.
Let's start with your examples, slightly modified to see what we're summing.
]a =: i. 2 3 4
sumAxes =: dyad : '(< @ ,"(#x)) x |: y'
0 1 sumAxes a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
0 2 sumAxes a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
1 2 sumAxes a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
The relevant part of the definition for dyads derived from ;.1 and friends is:
The frets in the dyadic cases 1, _1, 2 , and _2 are determined by the 1s in boolean vector x; an empty vector x and non-zero #y indicates the entire of y. If x is the atom 0 or 1 it is treated as (#y)#x. In general, boolean vector >j{x specifies how axis j is to be cut, with an atom treated as (j{$y)#>j{x.
What this means is: if we're just trying to slice an array along its dimensions with no internal segmentation, we can simply use dyad cut with a left argument consisting solely of 1s and a:s. The number of 1s in the vector (i.e. the sum) determines the rank of the resulting array.
Thus, to reproduce the examples above:
('';'';1) <@:,;.1 a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
('';1;'') <@:,;.1 a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
(1;'';'') <@:,;.1 a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
Et voilà. Also, notice the pattern in the left-hand argument? The two aces are exactly at the indices of your original calls to sumAxes. See what I mean about providing a value for each dimension smelling like a good thing, in the J spirit?
So, to use this approach to provide an analog to sumAxes with the same interface:
sax =: dyad : 'y +/@:,;.1~ (1;a:#~r-1) |.~ - {. x -.~ i. r=.#$y' NB. Explicit
sax =: ] +/@:,;.1~ ( (] (-@{.@] |. 1 ; a: #~ <:@[) (-.~ i.) ) #@$) NB. Tacit
Results elided for brevity, but they're identical to your sumAxes.
Final considerations
There's one more thing I'd like to point out. The interface to your sumAxes call, calqued from Python, names the two axes you'd like "run together". That's definitely one way of looking at it.
Another way of looking at it, which draws upon the J philosophies I've touched on here, is to name the axis you want to sum along. The fact that this is our actual focus is confirmed by the fact that we ravel each "slice", because we do not care about its shape, only its values.
This change in perspective to talk about the thing you're interested in, has the advantage that it is always a single thing, and this singularity permits certain simplifications in our code (again, especially in J, where we usually talk about the [new, i.e. post-transpose] leading axis)¹.
Let's look again at our ones-and-aces vector arguments to ;., to illustrate what I mean:
('';'';1) <@:,;.1 a
('';1;'') <@:,;.1 a
(1;'';'') <@:,;.1 a
Now consider the three parenthesized arguments as a single matrix of three rows. What stands out to you? To me, it's the ones along the anti-diagonal. They are less numerous, and have values; by contrast the aces form the "background" of the matrix (the zeros). The ones are the true content.
This is in contrast to how our sumAxes interface stands now: it asks us to specify the aces (zeros). How about instead we specify the 1, i.e. the axis that actually interests us?
If we do that, we can rewrite our functions thus:
xas =: dyad : 'y +/@:,;.1~ (-x) |. 1 ; a: #~ _1 + #$y' NB. Explicit
xas =: ] +/@:,;.1~ -@[ |. 1 ; a: #~ <:@#@$@] NB. Tacit
And instead of calling 0 1 sax a, you'd call 2 xas a, instead of 0 2 sax a, you'd call 1 xas a, etc.
The relative simplicity of these two verbs suggests J agrees with this inversion of focus.
¹ In this code I'm assuming you always want to collapse all axes except one. This assumption is encoded in the approach I use to generate the ones-and-aces vector, using |..
However, your footnote ("sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified") suggests that sometimes you want to collapse only one axis.
That's perfectly possible, and the ;. approach can take arbitrary (orthotopic) slices; we'd only need to alter the method by which we instruct it (generate the 1s-and-aces vector). If you provide a couple of examples of generalizations you'd like, I'll update the post here. It's probably just a matter of using (<1) x} a: #~ #$y or ((1;'') {~ (e.~ i.@#@$)) instead of (-x) |. 1 ; a:#~<:#$y.

Fastest way to find twice number in C [duplicate]

Can anyone help me solve this problem in C? I think that I have to use big O notation as part of the solution, but I have no idea about it.
The question: There is an array T of size N+1 containing the numbers from 1 to N in random order. One number x appears twice (its positions are also random).
What should be the fastest way to find value of this number x?
For example:
N = 7
[6 3 5 1 3 7 4 2]
x=3
The sum of numbers 1..N is N*(N+1)/2.
So, the extra number is:
extra_number = sum(all N+1 numbers) - N*(N+1)/2
Everything is O(1) except the sum. The sum can be computed in O(N) time.
The overall algorithm is O(N).
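The formula above can be checked quickly with a few lines of Python (the question asks for C, so treat this as a sketch of the arithmetic only). The function name find_duplicate_by_sum is made up for the example.

```python
def find_duplicate_by_sum(t):
    """Return the repeated value in t, which holds 1..N plus one repeat."""
    n = len(t) - 1                    # t has N + 1 elements
    expected = n * (n + 1) // 2       # sum of 1..N
    return sum(t) - expected          # whatever is left over is the repeat


if __name__ == "__main__":
    print(find_duplicate_by_sum([6, 3, 5, 1, 3, 7, 4, 2]))
```

Python integers do not overflow; in C the sum would need a type wide enough for N*(N+1)/2.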
Walk the array using the value as the next array index (minus 1), marking the ones visited with a special value (like 0 or the negation). O(n)
On average, only half the elements are visited.
v
6 3 5 1 3 7 4 2
            v
. 3 5 1 3 7 4 2
        v
. 3 5 1 3 7 . 2
      v
. 3 5 1 . 7 . 2
  v
. 3 5 . . 7 . 2
      v !! already visited. Previous 3 is repeated.
. 3 5 . . 7 . 2
No overflow problem caused by adding up the sum. Of course the array needs to be modified (or a sibling bool array of flags is needed.)
This method works even if more than 1 value is repeated.
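A minimal Python sketch of this walk follows. It is one reading of the diagram, with two assumptions: the walk starts at index 0, and each value is used directly as the next 0-based index (which is what the steps in the diagram do). It uses the sibling array of flags mentioned above instead of modifying the input, and the name find_duplicate_by_walk is invented here.

```python
def find_duplicate_by_walk(t):
    """Follow values as indices from index 0 until a visited slot is reached.

    Assumes t has N + 1 elements drawn from 1..N, so every value is a
    valid index and index 0 is never pointed to by any element.
    """
    visited = [False] * len(t)
    index = 0
    while True:
        visited[index] = True
        value = t[index]
        if visited[value]:
            # Two different elements point at the same slot, so the value
            # stored in both of them is the repeated one.
            return value
        index = value


if __name__ == "__main__":
    print(find_duplicate_by_walk([6, 3, 5, 1, 3, 7, 4, 2]))
```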
The algorithm given by Klaus has O(1) memory requirements, but it has to sum every element of the given array, which may be very large.
Another approach is to iterate over the array and increment an occurrence counter for each value seen, so the algorithm can stop as soon as it finds the duplicate, though in the worst case it still scans all the elements. For example:
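#include <stdio.h>

#define N 8 /* number of elements in T (N+1 in the question's notation) */

int main(void) {
    int T[N] = {6, 3, 5, 1, 3, 7, 4, 2};
    int occurrences[N + 1] = {0}; /* one counter per possible value */
    int duplicate = -1;
    for (int i = 0; i < N; i++) {
        occurrences[T[i]]++;
        if (occurrences[T[i]] == 2) { /* second sighting: stop early */
            duplicate = T[i];
            break;
        }
    }
    printf("%d\n", duplicate); /* prints 3 for this input */
    return 0;
}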
Note that this method is also immune to integer overflow, whereas N*(N+1)/2 might be larger than the integer data type can hold.

The meaning of target value of Leetcode Search in Rotated Sorted Array

The original problem is like this:
Suppose a sorted array is rotated at some pivot unknown to you beforehand.
(i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2).
You are given a target value to search. If found in the array return its index, otherwise return -1.
You may assume no duplicate exists in the array.
The link is here https://oj.leetcode.com/problems/search-in-rotated-sorted-array/
I don't know the meaning of the 'target value' here. Is it the value we want to find or something else? Why is it given to me?
Is it the value we want to find or something else?
Yes. For example, if you have the rotated array:
4 5 6 7 0 1 2
and you are given number 6, you should return 2 - the index of 6 in the array (assuming indexes start from 0). If you are given number 8, which doesn't occur in the array - return -1.
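To make the role of the target concrete, here is a small Python sketch of one common way to solve the problem with a modified binary search. It is only one possible approach, not something the problem statement prescribes, and search_rotated is a name chosen for this example.

```python
def search_rotated(nums, target):
    """Return the index of target in a rotated sorted list, or -1."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:
            # The left half nums[lo..mid] is sorted.
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # Otherwise the right half nums[mid..hi] is sorted.
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1


if __name__ == "__main__":
    rotated = [4, 5, 6, 7, 0, 1, 2]
    print(search_rotated(rotated, 6))  # target present
    print(search_rotated(rotated, 8))  # target absent
```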

PowerShell - Want to understand why $myarr[(-1)..0] gives 10 1 but $myarr[$myarr[-1]..0] gives expected 10 9 8 ... 0

I want to better understand what is going on here with PowerShell's range operator.
$myArray = 1..10
so we have $myArray with 1 2 3 4 ... 10
Now I want to use -1 to get the last value in the array and show 1 - 10 in reverse, so I do
$myArray[(-1)..0] but this yields only 10 1 (those two values only, nothing in between).
But if I do $myArray[$myArray[-1]..0] this will yield all the values expected 10 9 8 ... 1
Can anyone give an explanation for this? I would think the (-1) being inside [] would evaluate to the last element or value 10 which it seems to be doing then the range would kick in as 10..0 but it seems like the range is being skipped and giving only the two listed values. This is an exercise just to learn PowerShell, there is no specific application of this I'm after. Btw, I get the same 10 1 only if I run the -1 without the ().
Thanks,
Jason
It is quite simple
Let's see what -1..0 returns:
-1
0
So $myArray[-1..0] is equivalent to $myArray[-1, 0], hence the result: the last element followed by the first.
But the 10..0 expression returns the entire range in reverse. Hence the $myArray[$myArray[-1]..0] expression works as you would expect.
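The same point can be illustrated in Python, where a negative index also counts from the end. This is an analogy, not PowerShell: expand_range is a made-up helper that imitates how a range such as -1..0 or 10..0 expands into a list of indices. Skipping the out-of-range index 10 is an assumption made to match the 10 9 8 ... 1 output reported in the question.

```python
def expand_range(start, end):
    """Imitate an inclusive range that counts up or down by 1."""
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


my_array = list(range(1, 11))  # 1 2 3 ... 10

# -1..0 expands to just two indices, so only two elements are picked.
short = [my_array[i] for i in expand_range(-1, 0)]

# 10..0 expands to eleven indices; index 10 is past the end and is
# skipped here (assumption, see above).
full = [my_array[i] for i in expand_range(my_array[-1], 0) if i < len(my_array)]

if __name__ == "__main__":
    print(expand_range(-1, 0), short)
    print(full)
```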

Brute-force sudoku solver: backtracking?

An implementation of a brute-force algorithm to solve Sudoku puzzles fails if a cell is discovered in which placing any of the digits 1-9 would be an illegal move.
The implementation is written in C, with the board represented by a 9x9 array. The solver counts down from 9 until a legal number's reached, and if none can be reached, it outputs a zero in its place.
A zero also represents a cell to be filled in. Here's the output (truncated) if a string of zeros (an empty board) is the input:
9 8 7 6 5 4 3 2 1
6 5 4 9 8 7 0 0 0
Those last three zeros are there because the values filled in previously aren't changing. How can I stop the solver from failing like this?
If you would currently put a zero in a spot, instead go back to the previous spot you put a number in and continue counting down until you find another valid number for that spot.
For instance, in your example:
9 8 7 6 5 4 3 2 1
6 5 4 9 8 7 0 0 0
Instead of putting the zero in below the three, you would instead go back and try putting a 6 in below the 4.
Don't treat every "move" as the right move. For example, placing the last 7 seemed OK, but it leaves no valid moves for the next cell. So upon hitting the "no move possible" situation, go back and try the next option. Iterate and you will have your solution.
A better way of course would be to start brute forcing for places with a small set of options left; run through all cells and start brute forcing with the cell with the least number of options left. When starting out with all-zero, you would then end up with
9 8 7 6 5 4 3 2 1
6 5 4 0 0 0 0 0 0
3 2 1 0 0 0 0 0 0
which is legal, without backtracking once.
You can do this by pushing your guesses onto a stack. Every time you end up wanting to output a zero, instead pop your last answer off the board and continue counting from it.
So if you guess 3 in (2,3) and next you're looking at (3,3) and get to zero, go back to (2,3) and try 2, then 1, then pop to before your (2,3) guess, etc.
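The backtracking idea described in these answers can be sketched in Python (the question's solver is in C, so this shows the control flow, not a drop-in fix). Recursion plays the role of the explicit stack of guesses; the function names is_legal and solve are chosen for this example.

```python
def is_legal(board, row, col, digit):
    """Check that digit does not already appear in the row, column or box."""
    if digit in board[row]:
        return False
    if any(board[r][col] == digit for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    return all(board[r][c] != digit
               for r in range(box_row, box_row + 3)
               for c in range(box_col, box_col + 3))


def solve(board):
    """Fill every 0 cell in place; return True if a full solution exists."""
    for row in range(9):
        for col in range(9):
            if board[row][col] == 0:
                for digit in range(9, 0, -1):    # count down from 9
                    if is_legal(board, row, col, digit):
                        board[row][col] = digit
                        if solve(board):
                            return True
                        board[row][col] = 0      # undo the guess and go on
                return False                     # no digit fits: backtrack
    return True                                  # no empty cell left


if __name__ == "__main__":
    board = [[0] * 9 for _ in range(9)]
    if solve(board):
        for line in board:
            print(*line)
```

Returning False where the original solver would write a zero is what sends control back to the previous cell, which then resumes counting down from its last guess.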
