Regrouping Entries in Numpy Array - arrays

I have a numpy array. The best way I can describe it is an array of arrays. I have N arrays that are all the same size (L x M). What I need to do is obtain the value for each (L,M) combination and assemble these combinations into a list of N values.
Example:
I have 400 arrays that are 8 x 8. I need to obtain the value of (2,5) for all 400 arrays and put them in a list.
I have looked into numpy.dsplit() and numpy.array_split(), but either I'm not applying them correctly or they aren't what I'm needing.
Can anyone advise me? And, no, at this point, I don't have any code to show beyond obtaining the original array, and as that is research data, I'm not comfortable posting it here.

This is basic indexing.
If, for instance, myArray.shape is (400, 8, 8), you'd pull those values out with:
myArray[:, 2, 5]
(the colon means "everything in this dimension")
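As a minimal sketch of that indexing, assuming the data really is stacked as one array of shape (400, 8, 8) (the name my_array and its stand-in contents are made up here, since the real data was not posted):

```python
import numpy as np

# Stand-in data (assumption): 400 arrays of shape 8 x 8, stacked along axis 0.
my_array = np.arange(400 * 8 * 8).reshape(400, 8, 8)

# Value at position (2, 5) from every one of the 400 arrays.
values = my_array[:, 2, 5]      # 1-D array with 400 entries
as_list = values.tolist()       # plain Python list, if a list is required

# To regroup every (L, M) position at once, move the first axis to the end,
# so that per_position[l, m] holds the 400 values for that position.
per_position = np.moveaxis(my_array, 0, -1)   # shape (8, 8, 400)
```

If the data is instead a Python list of 400 separate 8 x 8 arrays, np.stack(list_of_arrays) should produce the (400, 8, 8) layout assumed above.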

Related

using numpy_where on a 1D array

I am trying to use numpy_where to find the index of a particular value. Though I have searched quite a bit on the web including stackoverflow I did not find a simple 1D example.
ar=[3,1,4,8,2,1,0]
>>> np.where(ar==8)
(array([], dtype=int64),)
I expected np.where(ar==8) to return the index/location of 8 in the array.
What am I doing wrong? Is it something in my array?
Thanks
This is a really good example of how the range of variable types in Python and numpy can be confusing for a beginner. What's happening is that [3,1,4,8,2,1,0] is a list, not an ndarray. So the expression ar == 8 evaluates to the scalar False, because a list never compares equal to an integer. Thus, np.where(False) finds no matching positions and returns a tuple containing an empty array. The way to fix this is:
arr = np.array([3,1,4,8,2,1,0])
np.where(arr == 8)
This returns (array([3]),). There's opportunity for further confusion, because where returns a tuple. If you write a script that intends to access the index position (3, in this case), you need np.where(arr == 8)[0] to pull the first (and only) result out of the tuple. To actually get the value 3, you need np.where(arr == 8)[0][0] (although this will raise an IndexError if there are no 8's in the array).
This is an example where numeric-specialized languages like Matlab or Octave are simpler to use for newbies, because the language is less general and so has fewer return types to understand.
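The difference between the list and the ndarray, and a way to guard against the IndexError mentioned above, might look like this (a sketch; the names matches, first_index and missing are only illustrative):

```python
import numpy as np

ar = [3, 1, 4, 8, 2, 1, 0]

# Comparing a list with an int yields one plain bool, not an elementwise result.
list_result = (ar == 8)

# Comparing an ndarray with an int is elementwise, so np.where can find positions.
arr = np.array(ar)
matches = np.where(arr == 8)[0]     # index array taken out of the 1-tuple

# Guard against "no match" before taking the first index.
first_index = int(matches[0]) if matches.size else None

# A value that is absent gives an empty index array rather than an error.
missing = np.where(arr == 7)[0]
```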

signrank test in a three-dimensional array in MATLAB

I have a 60x60x35 array and would like to calculate the Wilcoxon signed rank test to calculate if the median for each element value across the third array dimension (i.e. with 35 values) is different from zero. Thus, I would like my results in two 60x60 arrays - with values of 0 and 1 depending on the test statistic, and in a separate array with corresponding p values.
The problem I am facing is specifying the command in a way that desired output would have appropriate dimensions and would be calculated across the appropriate dimension of the array.
Thanks for your help and all the best!
So one way to solve your problem is using a nested for-loop. Let's say your data is stored in data:
data=rand(60,60,35);
size_data=size(data);
p=zeros(size_data(1),size_data(2));
p(:,:)=NaN;
h=zeros(size_data(1),size_data(2));
h(:,:)=NaN;
for k=1:size_data(1)
    for l=1:size_data(2)
        tmp_data=data(k,l,:);
        tmp_data=reshape(tmp_data,1,numel(tmp_data));
        [p(k,l), h(k,l)]=signrank(tmp_data);
    end
end
What I am doing is preallocating p and h as 60x60 matrices. Then I set them to NaN, so you can easily see if something went wrong (0 would be an acceptable result). Now I loop over all elements and store the 35 values for the current element in a new variable. signrank needs the data to be a vector, so I reshape it into a 1x35 row vector.
I guess you could skip those loops by using bsxfun

determine if an array has the numbers a to b each once [duplicate]

This question already has answers here:
How to tell if an array is a permutation in O(n)?
(16 answers)
Closed 9 years ago.
Given an array A of size n, and two numbers a and b with b-a+1=n, I need to determine whether or not A contains each of the numbers between a and b (exactly once).
For example, if n=4 and a=1,b=4, then I'm looking to see if A is a rearrangement of [1,2,3,4].
In practice, I need to do this with O(1) space (no hash table).
My first idea was to sort A, but I have to do this without rearranging A, so that's out.
My second idea is to run through A once, adding up the entries and checking that they are in the correct range. At the end, I have to get the right sum (for a=1,b=n, this is n(n+1)/2), but this doesn't always catch everything, e.g. [1,1,4,4,5] passes the test for n=5,a=1,b=5, but shouldn't.
The only idea of mine that works is to pass through the array n times making sure to see each number once and only once. Is there a faster solution?
You can do this with a single pass through the array, using only a minor modification of the n(n+1)/2 method you already mentioned.
To do so, walk through the array, ignoring elements outside the a..b range. For numbers that are in the correct range, you want to track three values: the sum of the numbers, the sum of the squares of the numbers, and the count of the numbers.
You can pre-figure the correct values for both the sum of numbers and the sum of the squares (and, trivially, the count).
Then compare your result to the expected results. Consider, for example, if you're searching for 1, 2, 3, 4. If you used only the sums of the numbers, then [1, 1, 4, 4] would produce the correct result (1+2+3+4 = 10, 1+1+4+4 = 10), but if you also add the sums of the squares, the problem is obvious: 1+4+9+16 = 30 but 1+1+16+16 = 34.
This is essentially applying (something at least very similar to) a Bloom filter to the problem. Given a sufficiently large group and a fixed pair of functions, there's going to be some set of incorrect inputs that will produce the correct output. You can reduce that possibility to an arbitrarily low value by increasing the number of filters you apply. Alternatively, you can probably design an adaptive algorithm that can't be fooled--offhand, it seems like if your range of inputs is N, then raising each number to the power N+1 will probably assure that you can only get the correct result with exactly the correct inputs (but I'll admit, I'm not absolutely certain that's correct).
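A minimal sketch of the single-pass check described above, written in Python for illustration (the function name passes_sum_checks is made up). As the answer itself warns, the test is necessary but not sufficient: some non-permutations still pass.

```python
def passes_sum_checks(values, a, b):
    """One pass over values, tracking count, sum and sum of squares in O(1) space."""
    count = 0
    total = 0
    total_sq = 0
    for v in values:
        if v < a or v > b:
            continue            # ignore out-of-range entries; the count will come up short
        count += 1
        total += v
        total_sq += v * v

    # Expected values for a..b, computed without storing the range.
    expected_total = sum(range(a, b + 1))
    expected_sq = sum(k * k for k in range(a, b + 1))
    return (count == b - a + 1
            and total == expected_total
            and total_sq == expected_sq)


ok = passes_sum_checks([2, 4, 1, 3], 1, 4)                    # a real permutation
caught = passes_sum_checks([1, 1, 4, 4], 1, 4)                # same sum, wrong squares
# Caveat: this input is NOT a permutation of 1..7, yet sum and squares both match.
fooled = passes_sum_checks([1, 1, 4, 5, 5, 6, 6], 1, 7)
```

The last call shows the kind of incorrect input that still produces the correct sums: both the total (28) and the sum of squares (140) agree with those of 1..7.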
Here is an O(1)-space, O(n)-time solution that might help:
Compute the expected mean and standard deviation of the numbers a..b.
Scan the array and compute the mean and standard deviation of its entries.
If any number is outside [a,b], return false.
if(mean1!=mean2 || sd1!=sd2) return false else true.
Note : I might not be 100% accurate.
Here's a solution that fails with the probability of a hash collision.
Take an excellent (for example cryptographic) hash function H.
Compute: xor(H(x) for x in a...b)
Compute: xor(H(A[i]) for i in 1...n)
If the two are different, then for sure you don't have a permutation. If the two are the same, then you've almost certainly got a permutation. You can make this immune to input that's been picked to produce a hash collision by including a random seed in the hash.
This obviously runs in O(b-a) time, needs O(1) external storage, and is trivial to implement.
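The XOR-of-hashes idea could be sketched as follows. The choice of SHA-256, the 16-byte random seed, and the names _h and probably_permutation are assumptions made for this illustration, not part of the answer above; the explicit length check encodes the question's precondition that n = b-a+1.

```python
import hashlib
import os


def _h(x, seed):
    """Seeded hash of an integer, returned as a large int (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(seed + str(x).encode()).digest()
    return int.from_bytes(digest, "big")


def probably_permutation(values, a, b, seed=None):
    """True if values is (almost certainly) a rearrangement of a..b."""
    if len(values) != b - a + 1:
        return False
    if seed is None:
        seed = os.urandom(16)   # random seed makes crafted collisions impractical
    acc = 0
    for k in range(a, b + 1):
        acc ^= _h(k, seed)      # XOR in the hashes of the expected numbers
    for v in values:
        acc ^= _h(v, seed)      # XOR in the hashes of the actual entries
    return acc == 0             # everything cancelled pairwise


is_perm = probably_permutation([3, 1, 4, 2], 1, 4)
not_perm = probably_permutation([1, 1, 4, 4], 1, 4)
```

Only a single accumulator is kept, so the extra storage stays constant apart from the hash state itself.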

Tips for finding patterns in an array

I have an array of 256 values. Those 256 values were calculated in some mysterious way, and range from 0-3 inclusive. To increase the efficiency of my program, I can calculate the results of the array given an index, rather than actually looking up in the array.
Basically, the program gives me an index, which would be looked up in the array, but I know that I can actually calculate what will be in that index using the index number itself.
For example
a[0] = 3, a[1] = 2, a[2] = 1, ... , a[254] = 1, a[255] = 1
I'm not actually asking for the calculation here, but looking at every number in the array, what are some tips on figuring out the pattern? I apologize if this is poorly worded, I'll attempt to clear up any questions.
There likely isn't a general approach to solving this problem without having some idea about the function that generated the data. You mentioned "efficiency" — if there really are only 256 values and the function to generate the data has any kind of computational complexity, it's probably more efficient to just keep it as an array.

Representing a 2D array as a 1D array [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicates:
Implementing a matrix, which is more efficient - using an Array of Arrays (2D) or a 1D array?
Performance of 2-dimensional array vs 1-dimensional array
I was looking at one of my buddy's molecular dynamics code bases the other day and he had represented some 2D data as a 1D array. So rather than having to use two indexes he only has to keep track of one but a little math is done to figure out what position it would be in if it were 2D. So in the case of this 2D array:
two_D = [[0, 1, 2],
[3, 4, 5]]
It would be represented as:
one_D = [0, 1, 2, 3, 4, 5]
If he needed to know what was in position (1,1) of the 2D array he would do some simple algebra and get 4.
Is there any performance boost gained by using a 1D array rather than a 2D array? The data in the arrays can be accessed millions of times during the computation.
I hope the explanation of the data structure is clear...if not let me know and I'll try to explain it better.
Thank you :)
EDIT
The language is C
For a 2-d Array of width W and height H you can represent it as a 1-d Array of length W*H where each index
(x,y)
where x is the column and y is the row of the 2-d array, is mapped to the index
i=y*W + x
in the 1-D array. Similarly, you can use the inverse mapping (with integer division):
y = i / W
x = i % W
If you make W a power of 2 (W=2^m), you can use the hack
y = i >> m;
x = (i & (W-1))
This optimization only applies when W is a power of 2. When W is a compile-time constant (and the index is unsigned), an optimizing compiler will usually make this substitution on its own; when W is only known at run time, you'd have to implement it yourself.
Integer division and modulus are comparatively slow operations on most CPUs, so making them disappear is advantageous.
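The forward and inverse mappings can be checked with a short sketch. Python is used here purely for illustration (the question is about C, where the same arithmetic applies), and the helper names to_flat, from_flat and from_flat_pow2 are made up for this example:

```python
# The 2 x 3 example from the question, flattened row by row.
two_d = [[0, 1, 2],
         [3, 4, 5]]
W = 3                                   # row length (number of columns)
one_d = [v for row in two_d for v in row]


def to_flat(x, y, width):
    """(column x, row y) -> index in the 1-D array."""
    return y * width + x


def from_flat(i, width):
    """Index in the 1-D array -> (column x, row y), using integer division."""
    y, x = divmod(i, width)
    return x, y


def from_flat_pow2(i, m):
    """Same inverse mapping when the width is 2**m: mask and shift instead of % and //."""
    return i & ((1 << m) - 1), i >> m


value_at_1_1 = one_d[to_flat(1, 1, W)]  # position (1,1) of the 2D array
```

For a width of 8 (m = 3), from_flat_pow2(i, 3) and from_flat(i, 8) agree for every non-negative i, which is all the shift-and-mask hack relies on.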
Also, with large 2-d arrays keep in mind that the computer stores them in memory as a 1-d array and basically figures out the indexes using the mappings I listed above.
Far more important than the way that you determine these mappings is how the array is accessed. There are two ways to do it, column major and row major. The way that you traverse is more important than any other factor because it determines if you are using caching to your advantage. Please read http://en.wikipedia.org/wiki/Row-major_order .
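To make the traversal point concrete, here is a small sketch (Python again, for illustration only) listing the 1-D indices touched when a row-major array is walked row by row versus column by column. The 2 x 3 size is just the example from the question.

```python
W, H = 3, 2   # width (columns) and height (rows) of a row-major array

# Inner loop over x: consecutive memory locations, which is cache friendly.
row_by_row = [y * W + x for y in range(H) for x in range(W)]

# Inner loop over y: each step jumps W elements ahead in memory.
column_by_column = [y * W + x for x in range(W) for y in range(H)]
```

Here row_by_row is [0, 1, 2, 3, 4, 5] while column_by_column is [0, 3, 1, 4, 2, 5]; on large arrays, strides like the second one are what tend to defeat the cache.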
Take a look at Performance of 2-dimensional array vs 1-dimensional array
Often 2D arrays are implemented as 1D arrays. Sometimes 2D arrays are implemented by a 1D array of pointers to 1D arrays. The first case obviously has no performance penalty compared to a 1D array, because it is identical to a 1D array. The second case might have a slight performance penalty due to the extra indirection (and additional subtle effects like decreased cache locality).
It's different for each system what kind is used, so without information about what you're using there's really no way to be sure. I'd advise to just test the performance if it's really important to you. And if the performance isn't that important, then don't worry about it.
For C, 2D arrays are 1D arrays with syntactic sugar, so the performance is identical.
You didn't mention which language this is regarding or how the 2D array would be implemented. In C, 2D arrays are actually implemented as 1D arrays where C automatically performs the arithmetic on the indices to access the right element. So it would do what your friend does anyway behind the scenes.
In other languages a 2d array might be an array of pointers to the inner arrays, in which case accessing an element would be array lookup + pointer dereference + array lookup, which is probably slower than the index arithmetic, though it would not be worth optimizing unless you know that this is a bottleneck.
oneD_index = 3 * y + x;
Here x is the position within the row (the column index) and y is the position within the column (the row index). Instead of 3, use your row length, i.e. the number of columns. This way you can convert your 2D coordinates to a 1D coordinate.
