I would like to average across "rows" in a column, that is, over rows that have the same value in another column.
For example :
e= {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2},
{69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6,
24, 65, 76, 100}}
I would like to average all the values in the second column that have the same value in the first one.
So here: the average for Col 1 = 1 and the average for Col 1 = 2.
And then create a third column with the result of this operation, so the values in that column should be the same for the first 10 lines and for the next 10.
Many Thanks for any help you could provide !
LA
Output Ideal Format :
Interesting problem. This is the first thing that came into my mind:
e[[All, {1}]] /. Reap[Sow[#2, #] & @@@ e, _, # -> Mean@#2 &][[2]];
ArrayFlatten[{{e, %}}] // TableForm
To get rounding you may simply add Round@ before Mean in the code above: Round@Mean@#2
Here is a slightly faster method, but I actually prefer the Sow/Reap one above:
#[[1, 1]] -> Round@Mean@#[[All, 2]] & /@ GatherBy[e, First];
ArrayFlatten[{{e, e[[All, {1}]] /. %}}] // TableForm
If you have many different elements in the first column, either of the solutions above can be made faster by applying Dispatch to the rule list that is produced, before the replacement (/.) is done. This command tells Mathematica to build and use an optimized internal format for the rules list.
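For readers coming from other languages, the gather-then-replace idea above can be sketched in Python. This is only an illustration under assumptions: the data is taken to be a list of (key, value) pairs, and `append_group_means` is a hypothetical name that does not appear in the answer. The dictionary plays roughly the role of the dispatched rule list, giving one hash look-up per row.

```python
from collections import defaultdict
from fractions import Fraction

def append_group_means(pairs):
    """Return rows [key, value, mean of all values sharing that key]."""
    groups = defaultdict(list)
    for key, value in pairs:          # gather values by key (like Sow/Reap)
        groups[key].append(value)
    # exact rational means, similar to Mathematica's Mean on integers
    means = {k: Fraction(sum(v), len(v)) for k, v in groups.items()}
    # look each key up again, preserving the original row order
    return [[key, value, means[key]] for key, value in pairs]

# the data from the question, arranged as (key, value) pairs
e = list(zip([1] * 10 + [2] * 10,
             [69, 7, 30, 38, 16, 70, 97, 50, 97, 31,
              81, 96, 60, 52, 35, 6, 24, 65, 76, 100]))
rows = append_group_means(e)
```

Because the means are looked up per row, the result keeps the input order even when the keys are not sorted.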
Here is a variant that is slower, but I like it enough to share anyway:
Module[{q},
Reap[{#, Sow[#2,#], q@#} & @@@ e, _, (q@# = Mean@#2) &][[1]]
]
Also, general tips, you can replace:
Table[RandomInteger[{1, 100}], {20}] with RandomInteger[{1, 100}, 20]
and Join[{c}, {d}] // Transpose with Transpose[{c, d}].
What the heck, I'll join the party. Here is my version:
Flatten/@Flatten[Thread/@Transpose@{#,Mean/@#[[All,All,2]]}&@GatherBy[e,First],1]
Should be fast enough I guess.
EDIT
In response to the critique of @Mr.Wizard (my first solution was reordering the list), and to explore a bit the high-performance corner of the problem, here are 2 alternative solutions:
getMeans[e_] :=
  Module[{temp = ConstantArray[0, Max[#[[All, 1, 1]]]]},
    temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]];
    List /@ temp[[e[[All, 1]]]]] &[GatherBy[e, First]];
getMeansSparse[e_] :=
  Module[{temp = SparseArray[{Max[#[[All, 1, 1]]] -> 0}]},
    temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]];
    List /@ Normal@temp[[e[[All, 1]]]]] &[GatherBy[e, First]];
The first one is the fastest, trading memory for speed, and can be applied when keys are all integers, and your maximal "key" value (2 in your example) is not too large. The second solution is free from the latter limitation, but is slower. Here is a large list of pairs:
In[303]:=
tst = RandomSample[#, Length[#]] &@
Flatten[Map[Thread[{#, RandomInteger[{1, 100}, 300]}] &,
RandomSample[Range[1000], 500]], 1];
In[310]:= Length[tst]
Out[310]= 150000
In[311]:= tst[[;; 10]]
Out[311]= {{947, 52}, {597, 81}, {508, 20}, {891, 81}, {414, 47},
{849, 45}, {659, 69}, {841, 29}, {700, 98}, {858, 35}}
The keys can be from 1 to 1000 here, 500 of them, and there are 300 random numbers for each key. Now, some benchmarks:
In[314]:= (res0 = getMeans[tst]); // Timing
Out[314]= {0.109, Null}
In[317]:= (res1 = getMeansSparse[tst]); // Timing
Out[317]= {0.219, Null}
In[318]:= (res2 = tst[[All, {1}]] /.
Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@#2 &][[2]]); // Timing
Out[318]= {5.687, Null}
In[319]:= (res3 = tst[[All, {1}]] /.
Dispatch[
Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@#2 &][[2]]]); // Timing
Out[319]= {0.391, Null}
In[320]:= res0 === res1 === res2 === res3
Out[320]= True
We can see that getMeans is the fastest here, getMeansSparse the second fastest, and the solution of @Mr.Wizard is somewhat slower, but only when we use Dispatch; otherwise it is much slower. My solutions and @Mr.Wizard's (with Dispatch) are similar in spirit; the speed difference is due to (sparse) array indexing being more efficient than hash look-up. Of course, all this matters only when your list is really large.
EDIT 2
Here is a version of getMeans which uses Compile with a C target and returns numerical values (rather than rationals). It is about twice as fast as getMeans, and the fastest of my solutions.
getMeansComp =
Compile[{{e, _Integer, 2}},
Module[{keys = e[[All, 1]], values = e[[All, 2]], sums = {0.} ,
lengths = {0}, i = 1, means = {0.} , max = 0, key = -1 ,
len = Length[e]},
max = Max[keys];
sums = Table[0., {max}];
lengths = Table[0, {max}];
means = sums;
Do[key = keys[[i]];
sums[[key]] += values[[i]];
lengths[[key]]++, {i, len}];
means = sums/(lengths + (1 - Unitize[lengths]));
means[[keys]]], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
getMeansC[e_] := List /@ getMeansComp[e];
The code 1 - Unitize[lengths] protects against division by zero for unused keys. We need every number in a separate sublist, so we should call getMeansC, not getMeansComp directly. Here are some measurements:
In[180]:= (res1 = getMeans[tst]); // Timing
Out[180]= {0.11, Null}
In[181]:= (res2 = getMeansC[tst]); // Timing
Out[181]= {0.062, Null}
In[182]:= N@res1 == res2
Out[182]= True
This can probably be considered a heavily optimized numerical solution. The fact that the fully general, brief and beautiful solution of @Mr.Wizard is only about 6-8 times slower speaks very well for it, so, unless you want to squeeze every microsecond out of your code, I'd stick with @Mr.Wizard's solution (with Dispatch). But it's important to know how to optimize code, and also to what degree it can be optimized (what you can expect).
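The accumulate-then-index idea behind getMeans and getMeansComp can be sketched outside Mathematica as well. The following Python version is an illustration only, assuming positive integer keys of moderate size; `group_means_dense` is a hypothetical name. Keys that never occur keep a count of zero, so the divisor is bumped to 1 for them, which is the role `1 - Unitize[lengths]` plays above.

```python
def group_means_dense(pairs):
    """For each (key, value) row, return the mean of all values with that key.

    Assumes keys are positive integers; memory grows with the largest key.
    """
    max_key = max(key for key, _ in pairs)
    sums = [0.0] * (max_key + 1)      # slot 0 is unused (keys start at 1)
    counts = [0] * (max_key + 1)
    for key, value in pairs:          # single pass: accumulate per key
        sums[key] += value
        counts[key] += 1
    # guard against division by zero for keys that never occur
    means = [s / (c if c else 1) for s, c in zip(sums, counts)]
    # index the per-key means by the key of each row, keeping row order
    return [means[key] for key, _ in pairs]

sample = [(3, 10), (1, 4), (3, 20), (1, 6)]   # key 2 never occurs
result = group_means_dense(sample)
```

As with getMeans, this trades memory for speed: the helper lists have one slot per possible key, whether or not the key is used.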
A naive approach could be:
Table[
Join[ i, {Select[Mean /@ SplitBy[e, First], First@# == First@i &][[1, 2]]}]
, {i, e}] // TableForm
(*
1 59 297/5
1 72 297/5
1 90 297/5
1 63 297/5
1 77 297/5
1 98 297/5
1 3 297/5
1 99 297/5
1 28 297/5
1 5 297/5
2 87 127/2
2 80 127/2
2 29 127/2
2 70 127/2
2 83 127/2
2 75 127/2
2 68 127/2
2 65 127/2
2 1 127/2
2 77 127/2
*)
You could also create your original list by using for example:
e = Array[{Ceiling[#/10], RandomInteger[{1, 100}]} &, {20}]
Edit
Answering @Mr.'s comments
If the list is not sorted by its first element, you can do:
Table[Join[
i, {Select[
Mean /@ SplitBy[SortBy[e, First], First], First@# == First@i &][[1,2]]}],
{i, e}] //TableForm
But this is not necessary in your example
Why not pile on?
I thought this was the most straightforward/easy-to-read answer, though not necessarily the fastest. But it's really amazing how many ways you can think of a problem like this in Mathematica.
Mr. Wizard's is obviously very cool as others have pointed out.
@Nasser, your solution doesn't generalize to n classes, although it could easily be modified to do so.
meanbygroup[table_] := Join @@ Table[
Module[
{sublistmean},
sublistmean = Mean[sublist[[All, 2]]];
Table[Append[item, sublistmean], {item, sublist}]
]
, {sublist, GatherBy[table, #[[1]] &]}
]
(* On this dataset: *)
meanbygroup[e]
Wow, the answers here are so advanced and cool looking. I need more time to learn them.
Here is my answer. I am still a matrix/vector/Matlab-ish guy in recovery and transition, so my solution is not functional like the experts' solutions here. I look at data as matrices and vectors (easier for me than looking at them as lists of lists, etc.), so here it is:
sizeOfList=10; (*given from the problem, along with e vector*)
m1 = Mean[e[[1;;sizeOfList,2]]];
m2 = Mean[e[[sizeOfList+1;;2 sizeOfList,2]]];
r = {Flatten[{a,b}], d , Flatten[{Table[m1,{sizeOfList}],Table[m2,{sizeOfList}]}]} //Transpose;
MatrixForm[r]
Clearly not as good a solution as the functional ones.
Ok, I will go now and hide away from the functional programmers :)
--Nasser
If there is an array that contains random integers in ascending order, how can I tell whether it contains an arithmetic sequence (length > 3) with the common difference x?
Example:
Input: Array=[1,2,4,5,8,10,17,19,20,23,30,36,40,50]
x=10
Output: True
Explanation of the example: the array contains [10,20,30,40,50], which is an arithmetic sequence (length = 5) with the common difference 10.
Thanks!
I apologize that I have not tried any code to solve this, since I have no clue yet.
After reading the answers, I tried it in Python.
Here is my code:
df = [1,10,11,20,21,30,40]
i=0
common_differene=10
df_len=len(df)
for position_1 in range(df_len):
    for position_2 in range(df_len):
        if df[position_1] + common_differene == df[position_2]:
            position_1=position_2
            i=i+1
print(i)
However, it returns 9 instead of 4.
Is there any way to prevent the repeated counting within one sequence [10,20,30,40] and also prevent i from accumulating counts from other sequences such as [1,11,21]?
You can solve your problem by using 2 loops: one to run through every element, and the other to check whether some element equals currentElement+x. If you find one that does, you can continue from there.
With the added rule of the sequence being more than 2 elements long, I have recreated your problem in FreeBASIC:
DIM array(13) As Integer = {1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50}
DIM x as Integer = 10
DIM arithmeticArrayMinLength as Integer = 3
DIM index as Integer = 0
FOR position As Integer = LBound(array) To UBound(array)
    FOR position2 As Integer = LBound(array) To UBound(array)
        IF (array(position) + x = array(position2)) THEN
            position = position2
            index = index + 1
        END IF
    NEXT
NEXT
IF (index <= arithmeticArrayMinLength) THEN
    PRINT false
ELSE
    PRINT true
END IF
Hope it helps
Edit:
After reviewing your edit, I have come up with a solution in Python that returns all arithmetic sequences, keeping the order of the list:
def arithmeticSequence(A,n):
    SubSequence=[]
    ArithmeticSequences=[]
    #Create array of pairs from array A
    for index,item in enumerate(A[:-1]):
        for index2,item2 in enumerate(A[index+1:]):
            SubSequence.append([item,item2])
    #finding arithmetic sequences
    for index,pair in enumerate(SubSequence):
        if (pair[1] - pair[0] == n):
            found = [pair[0],pair[1]]
            for index2,pair2 in enumerate(SubSequence[index+1:]):
                if (pair2[0]==found[-1] and pair2[1]-pair2[0]==n):
                    found.append(pair2[1])
            if (len(found)>2): ArithmeticSequences.append(found)
    return ArithmeticSequences
df = [1,10,11,20,21,30,40]
common_differene=10
arseq=arithmeticSequence(df,common_differene)
print(arseq)
Output: [[1, 11, 21], [10, 20, 30, 40], [20, 30, 40]]
This is how you can get all the arithmetic sequences out of df for you to do whatever you want with them.
Now, if you want to remove the sub-sequences of already existing arithmetic sequences, you can try running it through:
def distinct(A):
    DistinctArithmeticSequences = A
    for index,item in enumerate(A):
        for index2,item2 in enumerate([x for x in A if x != item]):
            if (set(item2) <= set(item)):
                DistinctArithmeticSequences.remove(item2)
    return DistinctArithmeticSequences
darseq=distinct(arseq)
print(darseq)
Output: [[1, 11, 21], [10, 20, 30, 40]]
Note: Not gonna lie, this was fun figuring out!
Try from 1: check the presence of 11, 21, 31... (you can stop immediately)
Try from 2: check the presence of 12, 22, 32... (you can stop immediately)
Try from 4: check the presence of 14, 24, 34... (you can stop immediately)
...
Try from 10: check the presence of 20, 30, 40... (bingo !)
You can use linear searches, but for a large array, a hash map will be better. If you can stop as soon as you have found a sequence of length > 3, this procedure takes linear time.
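The procedure above can be sketched with a Python set serving as the hash map. This is a minimal illustration, not a definitive implementation: `has_arithmetic_run` and its `min_len` parameter are hypothetical names, and the step is assumed to be positive. Only values that start a chain (those whose predecessor is absent) are expanded, so each element is visited at most once.

```python
def has_arithmetic_run(values, step, min_len=4):
    """True if values contains v, v+step, v+2*step, ... with at least min_len terms.

    Assumes step > 0. The default min_len=4 corresponds to "length > 3".
    """
    present = set(values)             # constant-time membership tests
    for start in present:
        if start - step in present:   # not the first term of its chain; skip
            continue
        length, current = 1, start
        # extend the chain, stopping as soon as it is long enough
        while length < min_len and current + step in present:
            current += step
            length += 1
        if length >= min_len:
            return True
    return False

array = [1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50]
found = has_arithmetic_run(array, 10)
```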
Scan the list increasingly and for every element v, check if the element v + 10 is present and draw a link between them. This search can be done in linear time as a modified merge operation.
E.g. from 1, search 11; you can stop at 17; from 2, search 12; you can stop at 17; ... ; from 8, search 18; you can stop at 19...
Now you have a graph, the connected components of which form arithmetic sequences. You can traverse the array in search of a long sequence (or a longest), also in linear time.
In the given example, the only links are 10 -> 20 -> 30 -> 40 -> 50.
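One possible way to realize the linking idea in Python is shown below. It is a sketch under assumptions: it uses a dictionary look-up instead of the merge-style scan described above, the step is assumed to be positive, and `longest_chain` is a hypothetical name. Each value records the length of the chain that ends at it, so a single ascending pass is enough.

```python
def longest_chain(sorted_values, step):
    """Return the longest run v, v+step, v+2*step, ... in an ascending list."""
    length_ending_at = {}             # value -> length of the chain ending there
    best_end, best_len = None, 0
    for v in sorted_values:           # ascending order: v - step was seen already
        length_ending_at[v] = length_ending_at.get(v - step, 0) + 1
        if length_ending_at[v] > best_len:
            best_end, best_len = v, length_ending_at[v]
    # walk back along the links to rebuild the chain
    return [best_end - step * k for k in range(best_len - 1, -1, -1)]

array = [1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50]
chain = longest_chain(array, 10)
```

The answer to the original yes/no question is then simply whether `len(chain) > 3`.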
Say that I have a batch of arrays, and I would like to alter them based on conditions of particular values located by indices.
For example, say that I would like to increase and decrease particular values if the difference between those values are less than two.
For a single 1D array it can be done like this
import numpy as np
single2 = np.array([8, 8, 9, 10])
if abs(single2[1]-single2[2])<2:
    single2[1] = single2[1] - 1
    single2[2] = single2[2] + 1
single2
array([ 8, 7, 10, 10])
But I do not know how to do it for batch of arrays. This is my initial attempt
import numpy as np
single1 = np.array([6, 0, 3, 7])
single2 = np.array([8, 8, 9, 10])
single3 = np.array([2, 15, 15, 20])
batch = np.array([
    np.copy(single1),
    np.copy(single2),
    np.copy(single3),
])
if abs(batch[:,1]-batch[:,2])<2:
    batch[:,1] = batch[:,1] - 1
    batch[:,2] = batch[:,2] + 1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Looking at np.any and np.all, they are used to create an array of booleans values, and I am not sure how they could be used in the code snippet above.
My second attempt uses np.where, using the method described here for comparing particular values of a batch of arrays by creating new versions of the arrays with values added to the front/back of the arrays.
https://stackoverflow.com/a/71297663/3259896
In the case of the example, I am comparing values that are right next to each other, so I created copies that shift the arrays forwards and backwards by 1. I also use only the particular slice of the array that I am comparing, since the other numbers would also be used in the comparison in np.where.
batch_ap = np.concatenate(
    (batch[:, 1:2+1], np.repeat(-999, 3).reshape(3,1)),
    axis=1
)
batch_pr = np.concatenate(
    (np.repeat(-999, 3).reshape(3,1), batch[:, 1:2+1]),
    axis=1
)
Finally, I do the comparisons, and adjust the values
batch[:, 1:2+1] = np.where(
    abs(batch_ap[:,1:]-batch_ap[:,:-1])<2,
    batch[:, 1:2+1]-1,
    batch[:, 1:2+1]
)
batch[:, 1:2+1] = np.where(
    abs(batch_pr[:,1:]-batch_pr[:,:-1])<2,
    batch[:, 1:2+1]+1,
    batch[:, 1:2+1]
)
print(batch)
[[ 6 0 3 7]
[ 8 7 10 10]
[ 2 14 16 20]]
Though I am not sure if this is the most computationally efficient nor programmatically elegant method for this task. Seems like a lot of operations and code for the task, but I do not have a strong enough mastery of numpy to be certain about this.
This works
mask = abs(batch[:,1]-batch[:,2])<2
batch[mask,1] -= 1
batch[mask,2] += 1
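To see why the mask works, here is the same fix as a self-contained script on the batch from the question. The comparison yields one boolean per row, and boolean indexing then restricts the in-place update to the rows where the condition holds, which is what the scalar `if` could not express.

```python
import numpy as np

batch = np.array([
    [6, 0, 3, 7],
    [8, 8, 9, 10],
    [2, 15, 15, 20],
])

# one boolean per row: True where columns 1 and 2 differ by less than 2
mask = np.abs(batch[:, 1] - batch[:, 2]) < 2

# boolean indexing selects only the matching rows for the in-place update
batch[mask, 1] -= 1
batch[mask, 2] += 1
```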
I'm getting a "list index out of range" error. The issue is that I'm trying to show 25 players on a squad, but squads don't require 25 players; 25 is only the limit. So when the squad contains fewer than 25 players, I get the out of range error. My question is, how do I display a list of squad members up to 25, without requiring 25? Here is the line that is causing issues:
e = discord.Embed(title=f"{x2[0]['squadName']} ({squadnumber})", color=discord.Colour(value=235232), description='\n'.join([f"{c} <@{x[c-1]['player']}> - {int(x[c-1]['points']):,d} Score" for c in range(1+(25*(0)), 26+(25*(0)))]))
I used this method to get the range:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [x[i] for i in range(0, 5 if len(x) >= 5 else len(x))]
# this will get the first 5 elements of the list, and if the list isn't long enough
# it will get the length of the list
Applying this method to your code gives:
e = discord.Embed(title=f"{x2[0]['squadName']} ({squadnumber})",
                  color=0x396E0,
                  description='\n'.join([f"{c} <@{x[c-1]['player']}> - {int(x[c-1]['points']):,d} Score" for c in range(1, min(len(x), 25) + 1)]))
Also, I noticed another thing in the code: instead of discord.Colour(value=some_value), you can pass the colour directly as a hex literal such as 0xHEXCODE, so I edited that in to make it easier on the eyes.
Please let me know if you need clarification on anything.
References:
0x usage in python
Using if/else in list comprehension
Getting hex colour codes
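An alternative to computing the range bound is to slice the list, since a Python slice never raises IndexError when the list is shorter than the requested length. The sketch below assumes the squad is a list of dicts with 'player' and 'points' keys, as the indexing in the question suggests; `format_squad` and the sample data are hypothetical and only illustrate the idea.

```python
def format_squad(players, limit=25):
    """Build one description line per player, for at most `limit` players."""
    lines = [
        f"{rank} <@{p['player']}> - {int(p['points']):,d} Score"
        # players[:limit] is safe even when the list has fewer entries
        for rank, p in enumerate(players[:limit], start=1)
    ]
    return "\n".join(lines)

# hypothetical squad data with only three members
squad = [
    {"player": 111, "points": "1500"},
    {"player": 222, "points": "900"},
    {"player": 333, "points": "42"},
]
description = format_squad(squad)
```

The resulting string could then be passed as the `description` argument of the embed.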
I have multiple input arrays and I want to generate one output array where the value is 0 if all elements in a column are the same and 1 if they are not all the same.
For example, if there are three arrays :
A = [28, 28, 43, 43]
B = [28, 43, 43, 28]
C = [28, 28, 43, 43]
Output = [0, 1, 0, 1]
There can be any number of arrays, of any size, but all of the arrays have the same size.
A non-loopy way is to use diff and any to advantage:
A = [28, 28, 43,43];
B = [28, 43, 43,28];
C = [28, 28, 43,43];
D = any(diff([A;B;C])) %Combine all three (or all N) vectors into a matrix. Using the Diff to find the difference between each element from row to row. If any of them is non-zero, then return 1, else return 0.
D = 0 1 0 1
There are several easy ways to do it.
Let's start by putting the relevant vectors in a matrix:
M = [A; B; C];
Now we can do things like:
idx = min(M) ~= max(M);
or
idx = var(M) ~= 0;
No one seems to have addressed that you have a variable amount of arrays. In your case, you have three in your example but you said you could have a variable amount. I'd also like to take a stab at this using broadcasting.
You can create a function that will take a variable number of arrays, and the output will give you an array of an equal number of columns shared among all arrays that conform to the output you're speaking of.
First create a larger matrix that concatenates all of the arrays together, then use bsxfun to take advantage of broadcasting the first row and ensuring that you find columns that are all equal. You can use all to complete this step:
function out = array_compare(varargin)
    matrix = vertcat(varargin{:});
    out = ~all(bsxfun(@eq, matrix(1,:), matrix), 1);
end
This will take the first row of the stacked matrix and see if this row is the same among all of the rows in the stacked matrix for every column and returns a corresponding vector where 0 denotes each column being all equal and 1 otherwise.
Save this function in MATLAB and call it array_compare.m, then you can call it in MATLAB like so:
A = [28, 28, 43, 43];
B = [28, 43, 43, 28];
C = [28, 28, 43, 43];
Output = array_compare(A, B, C);
We get in MATLAB:
>> Output
Output =
0 1 0 1
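For comparison, the same column-wise test for a variable number of arrays can be sketched in plain Python. This is an illustration only, not a translation of the bsxfun approach: `columns_differ` is a hypothetical name, and the check relies on a set collapsing equal entries rather than on broadcasting.

```python
def columns_differ(*arrays):
    """Return 0 for columns whose entries are all equal, 1 otherwise."""
    # zip(*arrays) yields one tuple per column; a set collapses duplicates
    return [int(len(set(column)) > 1) for column in zip(*arrays)]

A = [28, 28, 43, 43]
B = [28, 43, 43, 28]
C = [28, 28, 43, 43]
output = columns_differ(A, B, C)
```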
Not fancy but will do the trick
Output=nan(length(A),1); %preallocation and check if an index isn't reached
for i=1:length(A)
    Output(i)= ~isequal(A(i),B(i),C(i));
end
If someone has an answer without the loop, take that, but I feel like performance is not an issue here.
Learning to be pythonic in 2.7. Is there a way to avoid the explicit loop? answer = [5, 4, 4, 3, 3, 2]
import numpy as np
import scipy.special as spe
nmax = 5 # n = 0, 1 ...5
mmax = 7 # m = 1, 2 ...7
big = 15.
z = np.zeros((nmax+1, mmax))
for i in range(nmax+1):
    z[i] = spe.jn_zeros(i, mmax)
answer = [np.max(np.where(z[i]<big))+1 for i in range(nmax+1)]
print answer # list of the largest m for each n where the mth zero of Jn < big
What does "more Pythonic" really mean here? One of the core tenets of Python is readability, so if there is no real performance reason to get rid of the loops, just keep them.
If you really wanted to see some other ways to do the same thing, then:
z = np.zeros((nmax+1, mmax))
for i in range(nmax+1):
    z[i] = spe.jn_zeros(i, mmax)
could be replaced with:
func = lambda i:spe.jn_zeros(i,mmax)
np.vstack(np.vectorize(func, otypes=[np.ndarray])(np.arange(nmax+1)))
which is slightly faster (1.35 ms vs. 1.77 ms) but probably less Pythonic and
[np.max(np.where(z[i]<big))+1 for i in range(nmax+1)]
could be replaced by
np.cumsum(z < big,axis=1)[:,-1]
which I would argue is more Pythonic (or numpythonic) and much faster (20 us vs. 212 us).
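The reason the vectorized form agrees with the loop is that each row of z is sorted in ascending order, so the number of entries below big equals the position of the last such entry plus one. The sketch below checks this on a small hand-written array; the values are a hypothetical stand-in for the output of jn_zeros, chosen only to be ascending per row, and the row sum used here gives the same numbers as the last column of the cumulative sum.

```python
import numpy as np

# hypothetical stand-in for z: every row is sorted in ascending order
z = np.array([
    [2.4, 5.5, 8.7, 11.8, 14.9, 18.1],
    [3.8, 7.0, 10.2, 13.3, 16.5, 19.6],
    [5.1, 8.4, 11.6, 14.8, 18.0, 21.1],
])
big = 15.0

# explicit loop from the question: index of the last entry below big, plus one
loop_answer = [np.max(np.where(z[i] < big)) + 1 for i in range(z.shape[0])]

# vectorized: count the entries below big in each row
count_answer = (z < big).sum(axis=1)

# last column of the cumulative sum, as in the answer above
cumsum_answer = np.cumsum(z < big, axis=1)[:, -1]
```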