I am working with pandas and I have a list covering 1949 to 1960, with months (January to December) and, for each month, a number of people. Months are in column A, the number of people in column B.
I would like to calculate the mean number of people for every month and determine the month with the highest number of people over the time period.
How can I do that? I had the idea of using a rolling mean, but I wanted to make sure there isn't a simpler way before going too far down that path.
It is organized as:
nf =
A B
Jan 3
Feb 5
... ...
Jan 4
Feb 1
... ...
Jan 0
Feb 9
... ...
You can achieve this using the groupby() method:
nf.groupby(['A'], as_index=False).mean()
You can do it like this:
df = nf.groupby('A').mean()
This will give you the mean for each month.
Then you can sort the results:
df.sort_values(by=['B'], ascending = False)
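A minimal sketch putting the two steps together, using only the six sample rows visible in the question (the real data would span 1949 to 1960). It assumes the columns are named A and B as in the question; using idxmax instead of sorting is my substitution, not something the answers above propose.

```python
import pandas as pd

# The visible sample rows from the question: A = month, B = number of people.
nf = pd.DataFrame({
    "A": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "B": [3, 5, 4, 1, 0, 9],
})

# Mean number of people per month across all years (a Series indexed by month).
monthly_mean = nf.groupby("A")["B"].mean()

# Month with the highest mean, and the mean itself.
top_month = monthly_mean.idxmax()
top_value = monthly_mean.max()
print(top_month, top_value)
```

Note that sort_values returns a new DataFrame rather than sorting in place, so its result has to be assigned or displayed to be of any use.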
Given the following df with fruits and their prices by month, I'd like to have all of the fruits listed in a single Fruit_Month column and then have another column called Prices. The ultimate goal is to calculate the correlation between fruit prices.
Given:
Fruit Jan Feb
Apple 2.00 2.50
Banana 1.00 1.25
Desired output:
Fruit_Month Price
Apple_Jan 2.00
Apple_Feb 2.50
Banana_Jan 1.00
Banana_Feb 1.25
And then from here, I'd like to see how correlated each fruit is with one another. In this simple example, it'd just be Apple vs Banana, but it should apply if there were more fruits. If there's a better/easier way, please let me know.
Here is an approach that first melts the table to create a Month column, then builds a new df from the melted columns. There may be more clever ways to do this, perhaps with unstack. Depending on what you need to do, it may be easier to keep Fruit and Month as separate columns.
df = df.melt(id_vars='Fruit',var_name='Month',value_name='Price')
df = pd.DataFrame({
'Fruit_Month': df.Fruit+'_'+df.Month,
'Price': df.Price
})
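For the correlation goal, keeping Fruit and Month separate seems easier. The sketch below is one possible way, not the answerer's method: it melts the question's example table, pivots so that each fruit becomes a column, and calls corr(). The names long_df, wide and corr are mine.

```python
import pandas as pd

# Example table from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana"],
    "Jan": [2.00, 1.00],
    "Feb": [2.50, 1.25],
})

# Long format: one row per (Fruit, Month) pair.
long_df = df.melt(id_vars="Fruit", var_name="Month", value_name="Price")

# One column per fruit, one row per month, then pairwise Pearson correlation.
wide = long_df.pivot(index="Month", columns="Fruit", values="Price")
corr = wide.corr()
print(corr)
```

With only two months of data every pair of fruits correlates at exactly +1 or -1, so the result only becomes informative with more months.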
I would be grateful for your help on the below.
I have two tables as per the image below, one on the left and one on the right.
Each row in the left table has "Genre" & "Age Years". Under the years 2020 to 2024 I want to pull in the % from the right-hand table. Two conditions have to be met: the genre must match and the Age Years value must fall within the correct range.
Therefore, if it was TV and Age Years was 3 it would return 10%, but if it was TV and 4 years old it would return 3%.
I have tried VLOOKUP and INDEX/MATCH with little success.
If I understand correctly, and the columns from 2020 to 2024 should all hold the same value for a given genre and age, then it is enough to combine the HLOOKUP and MATCH functions in an array formula:
{=HLOOKUP($B2;$M$1:$Y$6;MATCH(1;--($A2=$L$2:$L$6);0)+1;TRUE)}
After editing, confirm the array formula by pressing Ctrl + Shift + Enter.
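To make the logic of that formula easier to follow, here is a rough Python sketch of the same two-step lookup: an exact match on genre selects the row, and an approximate match (largest breakpoint not exceeding the age) selects the column. The original table is only available as an image, so age_breakpoints, rates and lookup_rate are hypothetical; only the two TV results (age 3 gives 10%, age 4 gives 3%) come from the question.

```python
from bisect import bisect_right

# Hypothetical header row of age breakpoints (the real ones are in the image).
age_breakpoints = [0, 4]

# Hypothetical rate rows keyed by genre; only the TV values are from the question.
rates = {"TV": [0.10, 0.03]}


def lookup_rate(genre, age_years):
    """Exact match on genre, approximate match on age (like HLOOKUP with TRUE)."""
    row = rates[genre]  # raises KeyError for an unknown genre
    col = bisect_right(age_breakpoints, age_years) - 1
    if col < 0:
        raise ValueError("age is below the first breakpoint")
    return row[col]


print(lookup_rate("TV", 3), lookup_rate("TV", 4))
```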
I am building the following SSRS matrix:
Month(Column)
Sales(Rows) SalesData (Data)
My data looks something like this:
Jan Feb March
Sales 10 3 9
What I would like to do now is find the difference between consecutive months, to show something like:
         Jan   Feb   March
Sales    10    3     9
#change        -7    6
In an SSRS table it's a simple expression.
I do not know how to do it in a matrix, since the month columns are generated dynamically.
Please point me in the right direction.
Just wanted to add this to clarify:
This is how my matrix looks:
Month(Column)
Sales(Rows) SalesData (Data)
Right-click the row group and choose 'Add Total', then 'After'.
Then click the cell that was added for the total.
In the value section add an expression like: Fields!YourOtherFieldToCalculate.Value - Fields!YourFieldToCalculate.Value
If you are dealing with the same values, you could add this:
ReportItems!SomeTextBox.Value - ReportItems!SomeOtherTextBox.Value
(though I think you would have to add it into the footer)
With your new clarification, I would go with Sam's use of Previous. I would have used the expression below, but his version checks for nulls, which is a good thing!
=Fields!Column.Value - Previous(Fields!Column.Value)
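As a sanity check on what the Previous-based expression should produce, here is a small pandas sketch of the same arithmetic on the question's sample row. It only illustrates the calculation and is not SSRS code; the names sales and change are mine.

```python
import pandas as pd

# Sample row from the question: one Sales value per month.
sales = pd.Series([10, 3, 9], index=["Jan", "Feb", "March"], name="Sales")

# Each month minus the previous month; the first month has no predecessor.
change = sales.diff()
print(change)
```

The first month comes out as NaN here, which corresponds to the null case that the expression has to handle for the first column.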
I have 200 data files to process. I need a solution for one of the files and I will do the same for the rest. It is a typical daily time series problem.
My rainfall data is arranged as follows: the years run from 1990 to 2011, under each year are 12 months, and next to each month are 29, 30 or 31 days, depending on the month.
My problem is to take all the days in each month and stack them under that month, for each year. The result should be two column vectors: one for dates and one for the rainfall on each day, in each month, in each year.
Thanks in advance.
Asong.
my data is shaped as:
1960 1 2 3 4 5 6 4
1961 1 2 3 4 5 6 4
and I want it to be:
1960
1
.
.
N
1961
1
.
.
N
etc. as a column not row form.
I got the answer using reshape(a.',1,[]).
However, one problem remains. My data contains 31 days for every month. How can I tell MATLAB to delete the last two or three days of February, and the extra day in April and the other months that should have 30 days but have 31 in my time series?
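The idea of dropping the padded days can be sketched as follows. This is in Python rather than MATLAB and assumes a hypothetical layout in which each year is a 12 x 31 grid (one row per month, padded to 31 days); flatten_year and grid are names I made up. The standard-library calendar.monthrange supplies the real number of days per month, including leap years.

```python
import calendar
from datetime import date


def flatten_year(year, grid):
    """Turn a 12 x 31 grid into parallel lists of dates and values,
    skipping padded days that do not exist in the calendar."""
    dates, values = [], []
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month).
        n_days = calendar.monthrange(year, month)[1]
        for day in range(1, n_days + 1):
            dates.append(date(year, month, day))
            values.append(grid[month - 1][day - 1])
    return dates, values


# Placeholder grid of zeros standing in for one year of rainfall readings.
grid = [[0.0] * 31 for _ in range(12)]
dates, values = flatten_year(1990, grid)
print(len(dates), dates[0], dates[-1])
```

The same approach carries over to MATLAB by building a logical mask of valid days per month and indexing with it before reshaping.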
I'm building a report of case results with a parent-child grouping on the row group and single column grouping:
Parent Row Group: Location
Child Row Group: Result
Column Group: Month
Running across the report are months in the year, and running down the report are the location and the different result breakdowns for the location in the given month. Looks something like this:
                Jan        Feb        Total
                %    #     %    #     %     #
Main Office
  Pass          ?    5     ?    6     55%   11
  Fail          ?    5     ?    4     45%   9
  Total              10         10          20
Other Office
  Pass          ?    3     ?    2     25%   5
  Fail          ?    7     ?    8     75%   15
  Total              10         10          20
I have everything working except for the percentage breakdowns, indicated by the question marks above. I can't seem to get that total (the 10 for each month/location set above) into my expression calculation. Any ideas on how to set up my groups and variables to render these percentages properly?
Here's my attempts so far:
Count(Fields!Result.Value, "dsResults") = 40
Count(Fields!Result.Value, "LocationRowGroup") = 20
Count(Fields!Result.Value, "ResultRowGroup") = 11 - (for the Main Office/January/Pass cell, which is the total for the whole year for that result)
Count(Fields!Result.Value, "MonthColumnGroup") = 20
SSRS gets the count right on the total line, so there must be a way to reproduce that scope within the data cells?
I sometimes work around annoying SSRS scope issues by pre-calculating my totals, subtotals and percentages. Take a look at this response to a different post for an example (the "pre-calc values suggestion"). I know it is unsatisfying, but it works.
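To show what the pre-calculated values would look like, here is a pandas sketch of the arithmetic using the counts from the question's table. In practice this would be done in the dataset query that feeds the report; pandas only stands in for it here, and the column names and the Pct column are my own.

```python
import pandas as pd

# Counts taken from the question's table, one row per location/month/result.
counts = pd.DataFrame({
    "Location": ["Main Office"] * 4 + ["Other Office"] * 4,
    "Month": ["Jan", "Jan", "Feb", "Feb"] * 2,
    "Result": ["Pass", "Fail"] * 4,
    "Count": [5, 5, 6, 4, 3, 7, 2, 8],
})

# Total per location and month, repeated on every row of that group.
group_total = counts.groupby(["Location", "Month"])["Count"].transform("sum")

# Share of each result within its location/month cell.
counts["Pct"] = counts["Count"] / group_total
print(counts)
```

With the percentage carried as an ordinary field, each data cell in the matrix can display it directly without needing a scoped Count expression.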