Pandas: switch from two arrays (i.e. columns) to one array [duplicate] - arrays

Given the following df with fruits and their prices by month, I'd like to have all of the fruits listed in a single Fruit_Month column and then have another column called Prices. The ultimate goal is to calculate the correlation between fruit prices.
Given:
Fruit Jan Feb
Apple 2.00 2.50
Banana 1.00 1.25
Desired output:
Fruit_Month Price
Apple_Jan 2.00
Apple_Feb 2.50
Banana_Jan 1.00
Banana_Feb 1.25
And then from here, I'd like to see how correlated each fruit is with one another. In this simple example, it'd just be Apple vs Banana, but it should apply if there were more fruits. If there's a better/easier way, please let me know.

Here is an approach that first melts the table to create the Month column, then builds a new df from the melted columns. There are probably more clever ways to do this, maybe with unstack. Depending on what you need to do, it may be easier to keep Fruit and Month as separate columns.
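import pandas as pd
# Reshape wide to long: one row per (Fruit, Month) pair
df = df.melt(id_vars='Fruit', var_name='Month', value_name='Price')
# Combine Fruit and Month into a single label column
df = pd.DataFrame({
    'Fruit_Month': df.Fruit + '_' + df.Month,
    'Price': df.Price
})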
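For the stated goal of correlating fruits with one another, keeping Fruit and Month as separate columns may be simpler than building a combined label. The following is a minimal sketch under that assumption, using the sample data from the question; the names `long_df`, `by_fruit` and `fruit_corr` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana"],
    "Jan": [2.00, 1.00],
    "Feb": [2.50, 1.25],
})

# Long format with Fruit and Month kept as separate columns.
long_df = df.melt(id_vars="Fruit", var_name="Month", value_name="Price")

# One column per fruit, one row per month.
by_fruit = long_df.pivot(index="Month", columns="Fruit", values="Price")

# Pairwise Pearson correlation between the fruit columns.
fruit_corr = by_fruit.corr()
print(fruit_corr)
```

With only two months, every pairwise correlation comes out as +1 or -1 (or undefined if a price never changes), so a meaningful result needs more months of data. The same code applies unchanged when there are more fruits or more month columns.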

Related

Excel formula: picking a value from a table/array of non-unique values

OK, so here's the problem that's got me stumped in Excel. I have three columns:
date value fruit
5-Jan 19 apple
5-Jan 2 banana
6-Jan 8 grapefruit
6-Jan 2 lemon
6-Jan 14 orange
5-Jan 8 peach
1-Jan 14 pear
6-Jan 4 starfruit
10-Jan 3 strawberry
The data is not sorted in any way. The dates, values, and fruits are non-unique.
I'm going to copy and paste the date values into the first column of another table, and remove duplicates. I need a formula to paste into the second column of that table that will, for each date, identify the fruit with the highest value.
Here's the expected outcome from that formula.
date fruit
1-Jan pear
5-Jan apple
6-Jan orange
10-Jan strawberry
I've been playing around with index/match, vlookup, arrays (which baffle me), and I'm stumped.
Ideas? Much thanks!
With Excel 365, try the formula below:
=@INDEX(SORT(FILTER($B$2:$C$10,$A$2:$A$10=F2),1,-1),,2)
For older versions of Excel, try the array formula below:
=INDEX($C$2:$C$10,SMALL(IF($A$2:$A$10=F2,IF($B$2:$B$10=MAX(IF($A$2:$A$10=F2,$B$2:$B$10,"")),ROW($B$2:$B$10)-ROW($B$1),""),""),1))
An array formula needs CSE entry: press CTRL+SHIFT+ENTER to confirm it.
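To cross-check what the formulas should return, the same "fruit with the highest value per date" logic can be sketched outside Excel. This is a pandas illustration, not an Excel solution; the column names and the `result` variable are assumptions, and the data is copied from the question.

```python
import pandas as pd

# Data copied from the question (unsorted, with repeated dates and values).
data = pd.DataFrame({
    "date": ["5-Jan", "5-Jan", "6-Jan", "6-Jan", "6-Jan",
             "5-Jan", "1-Jan", "6-Jan", "10-Jan"],
    "value": [19, 2, 8, 2, 14, 8, 14, 4, 3],
    "fruit": ["apple", "banana", "grapefruit", "lemon", "orange",
              "peach", "pear", "starfruit", "strawberry"],
})

# For each date, idxmax gives the row label of the first maximum value.
top_rows = data.loc[data.groupby("date", sort=False)["value"].idxmax()]

# Map each date to the fruit on its top row.
result = dict(zip(top_rows["date"], top_rows["fruit"]))
print(result)
```

If two fruits tie for the highest value on a date, this sketch keeps the one that appears first, much as the SMALL(...,1) formula picks the first matching row.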

Counting grouped cells excel

Currently I have an Excel table that looks like this:
A B C D E F G
ID NAME DATE ITEM 2020 3
1234 Alex 09-20-2020 Carrot 2019 2
1234 Alex 09-20-2020 Onion
1234 Alex 09-20-2019 Carrot
1234 Alex 09-20-2019 Mushroom
1234 Alex 09-20-2020 Pasta
1345 Morgan 09-20-2020 Pasta
1345 Morgan 09-20-2020 Tomato Sauce
1145 Jayson 09-20-2020 Tomato Sauce
1145 Jayson 09-20-2020 Cream Sauce
1345 Morgan 09-20-2019 Pasta
1345 Morgan 09-20-2019 Tomato Sauce
I want to be able to count the unique customers for each year using Excel functions, so that the workbook can be moved to different computers without setting up custom functions.
The process can currently be done in Excel without functions by adding a filter to each column, filtering to show only the intended year, using Remove Duplicates on NAME, and finally counting the rows (giving the results seen in G2 & G3). However, I want to do that through Excel functions. So far, I am able to count unique values through
{=SUM(IF(FREQUENCY(IF(LEN(B2:B12)>0,MATCH(B2:B12,B2:B12,0),""),IF(LEN(B2:B12)>0,MATCH(B2:B12,B2:B12,0),""))>0,1))}
Additionally, I am able to use SUMPRODUCT() to count an array with multiple conditions, so for now I have combined the above formula with
SUMPRODUCT((YEAR(C2:C12)=G1)+0)
My initial idea was to put the first function inside SUMPRODUCT(), since the first function also produces an array that it could count. However, that did not work, as it did not count the unique values corresponding to the year.
My question is whether there is something like a grouping function that lets me take the unique values within a year without transforming the data (through filters or deletion of duplicates). My current understanding of SUMPRODUCT() is that it will only look for unique values in the entire column, not within the range given for the first array.
You have numeric IDs, which you should make use of. If you have Excel O365, in G1 use:
=COUNT(UNIQUE(FILTER(A$2:A$12,YEAR(C$2:C$12)=F1)))
With older versions, use this CSE-entered formula:
=SUM(--(FREQUENCY(IF(YEAR(C$2:C$12)=F1,A$2:A$12),A$2:A$12)>0))
And drag down.
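The expected counts (3 for 2020, 2 for 2019) can be checked with a short pandas sketch of the same "distinct IDs within each year" logic. This is only an illustration for verifying the formulas; the frame below copies the ID and DATE columns from the question, and `unique_per_year` is an illustrative name.

```python
import pandas as pd

# ID and DATE columns copied from the question's table (rows 2 to 12).
orders = pd.DataFrame({
    "ID": [1234, 1234, 1234, 1234, 1234, 1345, 1345, 1145, 1145, 1345, 1345],
    "DATE": ["09-20-2020", "09-20-2020", "09-20-2019", "09-20-2019",
             "09-20-2020", "09-20-2020", "09-20-2020", "09-20-2020",
             "09-20-2020", "09-20-2019", "09-20-2019"],
})
orders["DATE"] = pd.to_datetime(orders["DATE"], format="%m-%d-%Y")

# Group rows by calendar year, then count distinct IDs inside each group.
unique_per_year = orders.groupby(orders["DATE"].dt.year)["ID"].nunique()
print(unique_per_year)
```

Grouping first and counting distinct IDs second is the same idea as FILTER followed by UNIQUE and COUNT in the O365 formula.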

BI - fact table design with incompatible grains

I'm quite new to designing BI databases, and there is a point I don't understand well.
I'm trying to import French census data, which gives the population of each city. For each city, the population is broken down by several age classifications that don't map onto each other.
For instance, one classification is 00 to 20 years old, 21 to 59, and 60+.
The other is much more precise (00 to 02, 03 to 05, etc.), but its bounds never line up with those of the first classification: I don't have 15 to 20 but 18 to 22, for example.
So those 2 classifications are incompatible. How can I use them in my fact table? Should I use 2 fact tables and 2 cubes? Or one fact table and 2 dimensions for 1 cube? But in that case, won't I double count facts when I sum to get the total population of a city?
This is national census data, and national classifications, so changing that or estimating population to mix those classifications is not an option. And to be clear, one row doesn't relate to one person, but to one city. My facts are not individuals but cities' populations.
So this table is like :
Line 1 : One city - one amount of population - one code for dim age (ex. 00 to 19 yo) of this population - code (m/f) for the dim gender of that population - date of the census
Line 2 : Same city - one amount of population - one code for dim age (ex. 20 to 34) of this population - code (m/f) for the dim gender - date of the census
And so it goes for a lot of cities, both gender, and multiple years.
I hope this question is clear enough, as English is not my native language and I'm quite new to DB and BI!
Thanks for helping me with that.
One possible solution using a single fact table and two dimensions for the age ranges:
1 - Categorical range based on the broadest census, for example:
Young 0-20
Adult 21-59
Senior 60+
You could then link the other census to this dimension with approximate values, for example 18-22 could be Young.
2 -Original age range. This dimension could be used for precise age ranges when you report on a single city, it can also help you evaluate the impact of the overlapping bounds (e.g. how many rows are in the young / 18-22 range?)
You can create one dimension as below:
young 1-20
adult 21-59
senior 60+
Classification is
young city 1 : 1-20
young city 2 : 4-23
id field1 field2 field3 field4 .......
1 1 year young_city_1 other .......
2 2 year young_city_1 other .......
3 3 year young_city_1 other .......
4 4 year young_city_1 young_city_2 .......
Now you can report on any item and with any division.
I hope this helps.

Highlight the dominating number in Excel, most repeated for each keyword

Is this possible using excel formulas? To find keyword and number then match and color the highest number for that specific keyword, e.g. below:
This is the list, with keywords in column A and numbers in column B:
shoes 9
shoes 5
shoes 3
furniture 2
furniture 4
furniture 5
beauty 6
beauty 8
health 35
health 4
health 2
grocery 3
grocery 2
computers 9
computers 7
laptop 2
laptop 11
laptop 2
laptop 6
pets 9
pets 3
books 5
books 5
Expected result:
shoes 9 Highlight this number
shoes 5
shoes 3
furniture 2
furniture 4
furniture 5 Highlight this number
beauty 6
beauty 8 Highlight this number
health 35 Highlight this number
health 4
health 2
grocery 3 Highlight this number
grocery 2
computers 9 Highlight this number
computers 7
laptop 2
laptop 11 Highlight this number
laptop 2
laptop 6
pets 9 Highlight this number
pets 3
books 5 ignore if it's equal
books 5
You can use conditional formatting, choosing "Use a formula..." and use a formula such as =b1=maxifs($B$1:$B$100,$A$1:$A$100,a1). Be mindful of absolute vs. relative reference to ensure that you're tracking the right ranges.
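Note that a rule of the form "value equals the keyword's maximum" is true for every tied row, so both books rows would match, while the question asks for ties to be ignored. The intended selection can be sketched in pandas to make the rule explicit. This is an illustration of the logic, not an Excel formula; the data is copied from the question and the column names are assumptions.

```python
import pandas as pd

# Keyword/number pairs copied from the question.
items = pd.DataFrame({
    "keyword": ["shoes", "shoes", "shoes", "furniture", "furniture", "furniture",
                "beauty", "beauty", "health", "health", "health",
                "grocery", "grocery", "computers", "computers",
                "laptop", "laptop", "laptop", "laptop",
                "pets", "pets", "books", "books"],
    "number": [9, 5, 3, 2, 4, 5, 6, 8, 35, 4, 2, 3, 2, 9, 7,
               2, 11, 2, 6, 9, 3, 5, 5],
})

# Largest number within each keyword, broadcast back to every row.
group_max = items.groupby("keyword")["number"].transform("max")
is_max = items["number"] == group_max

# How many rows per keyword reach that maximum; more than one means a tie.
max_count = is_max.astype(int).groupby(items["keyword"]).transform("sum")

# Highlight only a maximum that is not shared with another row.
items["highlight"] = is_max & (max_count == 1)
print(items[items["highlight"]])
```

In Excel terms, the second condition corresponds to additionally requiring that the maximum occurs exactly once for that keyword.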
In particular when tagged vba, you should be showing what you have tried. The macros usage guide specifically states "DO NOT USE for VBA / MS-Office languages" and the excel wiki states "Questions tagged with excel should be version-agnostic." However, this is possible with a formula in versions earlier than those with MAXIFS (i.e. not: Excel for Office 365, Excel for Office 365 for Mac, Excel 2016, Excel 2016 for Mac, Excel Online, Excel for iPad, Excel for iPhone, Excel for Android tablets, Excel for Android phones, Excel Mobile), if in a more long-winded way:
Assuming you have 11 in B18, add a column (say I), populate I1 with 0, and fill as much of it as needed from I2 downwards with:
=IF(A1<>A2,I1+1,I1)
copied down. Then sort your data on ColumnI Smallest to Largest, then by ColumnB Largest to Smallest (to preserve the order of the values in ColumnA).
Then select B2 down to as far as required, clear any existing CF rules from it and choose HOME > Styles - Conditional Formatting, New Rule..., Use a formula to determine which cells to format, and under Format values where this formula is true: enter
=AND(A1<>A2,B2<>B3)
Format..., select choice of formatting, OK.
The above should not, as specified, highlight the values for books, though if working I suspect @nutsch's current answer might.
Sorry, I forgot to adjust my guess for what was where, once I realised a header row would make things easier.
This does still have a problem, though: where the text changes from one row to the next but both rows share the same quantity, highlighting will not be triggered. A more complex formula may be required.
Based on @pnuts's idea, I found a simpler way to do it.
Sort column B Z to A, then sort column A A to Z, expanding the selection for both.
Next, write a formula to highlight duplicates in column A excluding the first occurrence, and drag the formula down. It highlights all the correct ones.
Thank you.

Report services with multiple grouping

I am new to Reporting Services. I have 2 tables:
"cars" with columns id, cartype, capacity
"values" with columns id, carid, year, val1, val2
Records for these tables are:
Cars:
id cartype capacity
1 Passat 2200
2 BMW 2800
Values:
id carid year val1 val2
1 1 2012 100 1
2 1 2011 200 2
3 1 2010 300 3
4 2 2012 400 4
5 2 2011 500 5
I want to make a report that shows this:
Car Type Capacity
Passat 2200
2012 2011 2010
val1 100 200 300
val2 1 2 3
Car Type Capacity
BMW 2800
2012 2011
val1 400 500
val2 4 5
I made a data source with this select:
SELECT m.Id AS carid, m.cartype, m.capacity, v.Id AS idval, v.An, v.val1, v.val2
FROM car AS m INNER JOIN values AS v ON m.Id = v.carid
I have tried to use a matrix but I can't succeed in making this format. Can somebody help me to obtain this report?
Your Dataset is fine for this report.
You need to create a List based on the Cars Group, then within this List add two Textboxes for the Car details and a Matrix for val1, val2, etc.
A List allows you flexibility to place and move items as required, and placing a Matrix with the Cars group means it will only include values in scope for each Car. The List (and hence Car details and the values Matrix) will be repeated for each Car as required.
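The List-plus-Matrix layout can be pictured as "one block per car, and inside each block a small pivot with years as columns". The sketch below imitates that shape in pandas purely to illustrate the grouping. It is not SSRS code; the frames copy the sample records from the question and the `reports` dictionary is an illustrative name.

```python
import pandas as pd

# Sample records copied from the question.
cars = pd.DataFrame({"id": [1, 2],
                     "cartype": ["Passat", "BMW"],
                     "capacity": [2200, 2800]})
values = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                       "carid": [1, 1, 1, 2, 2],
                       "year": [2012, 2011, 2010, 2012, 2011],
                       "val1": [100, 200, 300, 400, 500],
                       "val2": [1, 2, 3, 4, 5]})

# Equivalent of the INNER JOIN in the dataset query.
joined = cars.merge(values, left_on="id", right_on="carid",
                    suffixes=("_car", "_val"))

reports = {}
# Outer grouping plays the role of the List grouped on car.
for (cartype, capacity), group in joined.groupby(["cartype", "capacity"], sort=False):
    # Inner matrix: val1/val2 as rows, years as columns, newest year first.
    matrix = group.set_index("year")[["val1", "val2"]].T
    matrix = matrix[sorted(matrix.columns, reverse=True)]
    reports[cartype] = matrix
    print(cartype, capacity)
    print(matrix)
```

Because each matrix is built only from the rows of its own car group, BMW gets two year columns and Passat three, which mirrors how a Matrix placed inside the List only sees values in scope for that car.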
Added after comment:
It's impossible to say what was causing your error; it's really a specific implementation detail. To give an example of how this might be done I've mocked up a report. First step is to create the Car group:
You can see there is one Group, with one Textbox. In the Textbox there is a Rectangle (Lists in SSRS are just tables with Rectangles inserted). Car and Capacity are just Textboxes. In this example I've used two Matrices, but this could be done any number of ways. Val1:
Val2:
Final result:
So you can see it's very possible, you just need to understand the grouping required and how to construct a matrix. Unfortunately it's impossible to say what caused this error but hopefully this gives you something to aim towards.
