Distinct values for first/last date in group - database

I have data in the format below with unique IDs in Column A, but an ID can appear on multiple rows, each representing a repeat transaction for that individual. Col B holds the datetime stamp of the transaction and Col C the name of the transaction:
Col A Col B Col C
ABC1 15/02/2018 16:26 Apple
ABC1 14/02/2018 11:26 Pear
ABC1 13/02/2018 09:11 Pear
ABC2 15/02/2018 16:26 Orange
ABC2 14/02/2018 11:26 Pear
ABC2 13/02/2018 09:11 Apple
ABC3 15/02/2018 16:26 Grape
ABC3 14/02/2018 11:26 Orange
ABC3 13/02/2018 09:11 Apple
I'm trying to pivot this data with MIN and MAX criteria on the datestamp to count how many IDs had each transaction in Col C as their first transaction, how many had each transaction as their latest, and so on. The aim is to end up with something like this:
MIN (first) transactions:
Distinct Count Col A Col C
1 Pear
2 Apple
MAX (last) Transactions:
Distinct Count Col A Col C
1 Grape
1 Orange
1 Apple
Is there a way to do this with pivot tables that I'm missing? I'm working with several million rows of data (loaded via Power Query), so manipulating it in a pivot is easier for me than using formulas. I can concatenate columns during the load process if needed.
Thanks in advance for your help.

Use helper columns as this will allow you to use page filters for max and min rather than relying on ordering each column in question.
Set your data up as a table. Then add a max column and a min column.
Max column formula:
=IF([@[Col B]]=MAX([Col B]),1,0)
Min column formula:
=IF([@[Col B]]=MIN([Col B]),1,0)
Create two pivots, one for max and one for min. Put the max or min helper column in the page field and filter on 1 (i.e. the date is the max or min of the source values).
Order Column C (the fruit name column) by its count, in whichever way you see fit: ascending for the min pivot if you are interested in the fruit with the smallest count for the min date.
Final outcome:
You can always remove unwanted fields e.g. Column B to get the exact same look:
Edit:
If you want to show the count of each fruit, by ID, for the minimum date for that ID you can use lookup table pivot(s)
An example lookup table pivot for minimum values for each ID
You then reference this table in your source table, in a helper column, using index match to retrieve the minimum date and compare against the date in your data table for the same ID:
Formula in helper column (MinMatch):
=IF(INDEX(LookupMin!B:B,MATCH(A2,LookupMin!A:A,0))=[@Date],1,0)
Note: This would be a lot easier if you created a unique key of ID & Fruit and lookup against that.
The helper column formula is:
=IFERROR(IF([@[Col B]]=INDEX(LookupMin!$A:$E,MATCH([@[Col A]],LookupMin!$A:$A,0),MATCH([@[Col C]],LookupMin!$4:$4,0)),1,0),"")
LookupMin! is the sheet with the minimum pivot in.
Note that I have used a pivot on the data table to see count of each fruit, on the minimum date for each ID.
You could have used a formula instead, but then you would get repeating sums, i.e. see Column F.
Formula in E (then dragged down):
=SUMIFS([MinMatch],[Fruit],C2,[ID],A2)
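To check what the helper columns and pivots should produce, the per-ID logic can be sketched outside Excel. This is only an illustration in Python, not part of the spreadsheet method; the function name `first_and_last_counts` is made up for the example, and the rows are the sample data from the question.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the question: (ID, timestamp, transaction name).
rows = [
    ("ABC1", "15/02/2018 16:26", "Apple"),
    ("ABC1", "14/02/2018 11:26", "Pear"),
    ("ABC1", "13/02/2018 09:11", "Pear"),
    ("ABC2", "15/02/2018 16:26", "Orange"),
    ("ABC2", "14/02/2018 11:26", "Pear"),
    ("ABC2", "13/02/2018 09:11", "Apple"),
    ("ABC3", "15/02/2018 16:26", "Grape"),
    ("ABC3", "14/02/2018 11:26", "Orange"),
    ("ABC3", "13/02/2018 09:11", "Apple"),
]


def first_and_last_counts(rows):
    """Count IDs by the name of their earliest and latest transaction."""
    first, last = {}, {}
    for id_, stamp, name in rows:
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M")
        # Keep the earliest (when, name) pair seen so far for this ID.
        if id_ not in first or when < first[id_][0]:
            first[id_] = (when, name)
        # Keep the latest (when, name) pair seen so far for this ID.
        if id_ not in last or when > last[id_][0]:
            last[id_] = (when, name)
    first_counts = Counter(name for _, name in first.values())
    last_counts = Counter(name for _, name in last.values())
    return first_counts, last_counts


first_counts, last_counts = first_and_last_counts(rows)
print(dict(first_counts))
print(dict(last_counts))
```

Each ID contributes exactly one first and one last transaction, so the counts are distinct counts of Col A, which is what the two pivots filtered on the helper column are meant to show.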
Finally, if you then decided you wanted earliest date for ID and fruit you could change the lookup as follows:

Related

Combine 2 rows give a new name and SUM their respective value display in a new column and take any of the row date

Problem
Field Name Field Value date
Jak 10 08/08/2020
Danz 15 08/08/2020
Rob 20 08/08/2020
Result should be: create new column for New Field Name and New column for SUM field value.
Field Name Field Value date New Col NewFieldValue Date
Jak 10 08/08/2020 Mat 45 08/08/2020
Danz 15 08/08/2020
Rob 20 08/08/2020
I won't write the code for you, because you wouldn't learn anything. Instead, here are some pointers to go on:
Generate a row number based on [Date] column using a subquery.
Generate a row number based on [Date] column and the SUM() of FieldValue column grouped by Date column using another subquery.
LEFT JOIN both on RowNumber and [Date] columns.
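The effect of those three steps can be illustrated with a small Python sketch. It is not the SQL the answer has in mind, only the same logic under the assumption that the first row of each date should carry the new name and the summed value; `add_group_totals` is a hypothetical helper, and "Mat" is the new field name taken from the question.

```python
from collections import defaultdict

# Sample rows from the question: (field name, field value, date).
rows = [
    ("Jak", 10, "08/08/2020"),
    ("Danz", 15, "08/08/2020"),
    ("Rob", 20, "08/08/2020"),
]


def add_group_totals(rows, new_name):
    """Attach the per-date sum to the first row of each date only."""
    # Equivalent of the grouped SUM() subquery.
    totals = defaultdict(int)
    for _, value, date in rows:
        totals[date] += value

    seen = set()
    out = []
    for name, value, date in rows:
        if date not in seen:
            # Row number 1 for this date: this is the row the join matches.
            seen.add(date)
            out.append((name, value, date, new_name, totals[date], date))
        else:
            # Later rows of the same date get empty new columns.
            out.append((name, value, date, None, None, None))
    return out


result = add_group_totals(rows, "Mat")
for row in result:
    print(row)
```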

Summarize rows based on multiple cells

Let's say columns 1-10 are identifiers and 11-15 are values. Identifiers might occur multiple times and I'd like to add up the values in each column. Example:
A1|B1|1|2|
A1|B1|5|3|
A2|B2|3|6|
A2|B2|4|2|
should become:
A1|B1|6|5|
A2|B2|7|8|
This is a straightforward case for a Pivot Table.
Add column headings in the first row: Identifier 1|Identifier 2|Value 1|Value 2.
Select the data (A1 to D5 in this example) and go to Data -> Pivot Table -> Create.
Drag Identifier 1 and Identifier 2 to Row Fields, and Value 1 and Value 2 to Data Fields. Uncheck Total Rows and Total Columns.
The result:
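The expected output of the pivot table can be reproduced with a short Python sketch that groups on the identifier columns and sums the value columns. This is an illustration only; the function name `summarize` and the `n_ids` parameter are assumptions made for the example.

```python
# Sample rows from the question: two identifier columns, two value columns.
rows = [
    ("A1", "B1", 1, 2),
    ("A1", "B1", 5, 3),
    ("A2", "B2", 3, 6),
    ("A2", "B2", 4, 2),
]


def summarize(rows, n_ids=2):
    """Sum the value columns for each distinct combination of identifiers."""
    totals = {}
    for row in rows:
        key, values = row[:n_ids], row[n_ids:]
        if key in totals:
            # Add this row's values column by column to the running sums.
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = list(values)
    return [key + tuple(vals) for key, vals in totals.items()]


summary = summarize(rows)
for row in summary:
    print("|".join(str(cell) for cell in row))
```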

Excel Array Lookup Formula

I have two tables as below. For formulas, assume "ID1" is on cell A1 and one blank row between tables so "ID" is on cell A4.
ID1 ID2 ID3 ID4 ID_OF_MAXDATE
a b d #N/A formula_here
ID DATE
a 1/1/2015
b 1/2/2015
e 1/3/2015
d 1/4/2015
g 1/5/2015
In the formula, I want the ID with the max date among the IDs in that row. In this case, out of a, b and d, the max date belongs to d (1/4/2015), so I want the formula to output d.
I have the formula below so far, but the #N/A throws it off. Without the #N/A value, it outputs the max date. However, I want the ID of the max date, and it should ignore #N/A values in the range. Note that all IDs in table 1 will appear in table 2, but some ID columns in table 1 may be #N/A.
=MAX(IF(A2:D2=A7:A11,B7:B11))
This is a much bigger and more complex formula than expected, but it takes into account that a date can appear more than once in the data set. Be sure to enter it with CTRL + SHIFT + ENTER.
=IF(SUM(IFERROR(MATCH(A2:D2,$A$6:$A$10,0),""))>0,LOOKUP(REPT("Z",255),IF(MAX(IF(FREQUENCY(IFERROR(MATCH(TRANSPOSE(A2:D2),$A$6:$A$10,0),""),ROW($B$6:$B$10)-ROW($B$6)+1),$B$6:$B$10))=IF(FREQUENCY(IFERROR(MATCH(TRANSPOSE(A2:D2),$A$6:$A$10,0),0),ROW($B$6:$B$10)-ROW($B$6)+1),$B$6:$B$10),$A$6:$A$10)),"No Match Found")
I also put in some additional error handling. The formula will return "No Match Found" if it is unable to find a match.
Insert an "iferror". In the example above, change the formula to:
=MAX(IF(IFERROR(A2:D2,"")=A7:A11,B7:B11))
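The intended result (the ID whose date is the latest, skipping #N/A cells) can be written down as a small Python sketch, which is handy for checking the array formulas against known answers. It is not an Excel solution; `id_of_max_date` is a hypothetical name, `None` stands in for #N/A, and the dates are assumed to be month/day/year.

```python
from datetime import date

# Table 2 from the question: ID -> date (read as month/day/year).
dates = {
    "a": date(2015, 1, 1),
    "b": date(2015, 1, 2),
    "e": date(2015, 1, 3),
    "d": date(2015, 1, 4),
    "g": date(2015, 1, 5),
}


def id_of_max_date(ids, dates):
    """Return the ID with the latest date, ignoring IDs that are not in the table."""
    known = [i for i in ids if i in dates]  # drops None (#N/A) and unknown IDs
    if not known:
        return "No Match Found"
    # On a tie, max() returns the first ID that has the latest date.
    return max(known, key=dates.__getitem__)


answer = id_of_max_date(["a", "b", "d", None], dates)
print(answer)
```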

SQL Optimize Group By Query

I have a table here with following fields:
Id, Name, kind, date
Data:
id name kind date
1 Thomas 1 2015-01-01
2 Thomas 1 2015-01-01
3 Thomas 2 2014-01-01
4 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
6 Sasha 1 2014-01-01
I have an SQL statement like this:
Select name,kind,Count(*) AS RecordCount
from mytable
group by kind, name
I want to know how many records there are for any name and kind. Expected results:
name kind count
Thomas 1 2
Thomas 2 1
Kevin 2 4
Sasha 1 1
The problem is that it is a big table, with more than 50 Million records.
I'd also like to know the result for the last hour, last day, last week and so on, for which I need to add a WHERE clause like this:
Select name,kind,Count(*) AS RecordCount
from mytable
WHERE Date > '2015-26-07'
group by kind, name
I use T-SQL with the SQL Server Management Studio. All of the relevant columns have a non clustered index and the primary key is a clustered index.
Does somebody have ideas how to make this faster?
Update:
The execution plan says:
Select, Compute Scalar, Stream Aggregate, Sort, Parallelism: 0% costs.
Hash Match (Partial Aggregate): 12%.
Clustered Index Scan: 88%
Sorry, I forgot to check the SQL-statements.
50 million is simply a lot of rows.
I can't see anything you can do to optimize the query itself.
Possibly add a composite index on kind, name.
Or try name, kind.
Or name only.
I think the query optimizer is smart enough for this not to be a factor, but try switching the GROUP BY to name, kind, as name is more selective.
If kind is not very selective (just 1 and 2), you may be better off with no index on it.
I would also defragment the indexes you have.
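The composite-index suggestion can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so it says nothing about how SQL Server would plan the query on 50 million rows; it only shows the shape of the index and the regrouped query. The index name `ix_mytable_name_kind` is made up for the example.

```python
import sqlite3

# Sample rows from the question (the ids are repeated there, so no primary key here).
rows = [
    (1, "Thomas", 1, "2015-01-01"),
    (2, "Thomas", 1, "2015-01-01"),
    (3, "Thomas", 2, "2014-01-01"),
    (4, "Kevin", 2, "2014-01-01"),
    (5, "Kevin", 2, "2014-01-01"),
    (5, "Kevin", 2, "2014-01-01"),
    (5, "Kevin", 2, "2014-01-01"),
    (6, "Sasha", 1, "2014-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, kind INTEGER, date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# Composite index with the more selective column first, as suggested above.
conn.execute("CREATE INDEX ix_mytable_name_kind ON mytable (name, kind)")

# Same aggregate as in the question, grouped in index order.
result = conn.execute(
    "SELECT name, kind, COUNT(*) AS RecordCount "
    "FROM mytable GROUP BY name, kind ORDER BY name, kind"
).fetchall()

for row in result:
    print(row)
conn.close()
```

Whether such an index actually helps on the real table has to be checked against the execution plan there.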
Querying the last day is no big deal because you already have a date column, which you can put an index on.
For the last week I would create a separate date table which contains one row per day with the columns id, date, week.
You have to pre-calculate the week. Then, if you want to query a specific week, you can look it up in the date table, get its dates and query only those dates from mytable.
You should test whether it performs better to join on the date columns or to put the date table's id column in mytable and join on id. For big tables id might be the better choice.
To query the last hour you could add an [hour] column to mytable and query it in combination with the date.
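A minimal sketch of such a date table, written in Python purely to show the pre-calculated week column, might look like this. The names `build_date_table` and `dates_in_week` are hypothetical, and ISO week numbering is an assumption; the answer does not say which week definition to use.

```python
from datetime import date, timedelta


def build_date_table(start, days):
    """One row per day with a surrogate id and a pre-calculated ISO (year, week)."""
    table = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        iso_year, iso_week, _ = day.isocalendar()
        table.append({"id": offset + 1, "date": day, "week": (iso_year, iso_week)})
    return table


def dates_in_week(table, week):
    """Look up all dates that belong to the given (year, week) key."""
    return [row["date"] for row in table if row["week"] == week]


# Two weeks starting on Monday 20 July 2015.
table = build_date_table(date(2015, 7, 20), 14)
first_week = dates_in_week(table, table[0]["week"])
print(first_week[0], first_week[-1], len(first_week))
```

The list of dates returned for a week is what would then be used to filter or join against the date column of mytable.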

RANKX not working when data summarized in power pivot table

I am trying to rank records in a Power Pivot table using DAX, as below, in an MSSQL Analysis Services tabular model.
Example details:
I have a shop sales detail in table.
e.g.
ShopNo date sales
-----------------
1 2014-11-09 120
1 2014-11-09 130
2 2014-11-10 130
2 2014-11-10 135
In pivot table data is analyzed month and year wise.
I want to see result like
ShopNo sales rank
-----------------
2 265 1
1 250 2
Any solution is there to display statewise population automatically.
Thanks
You should be able to achieve the ranking quite easily with PowerPivot using this formula:
RankShop:=RANKX(ALL(SalesTable[ShopNo]), [Sum of sales],,,Dense)
Here SalesTable is your shop sales table. Then create a pivot table, drag ShopNo onto Rows and add a new Measure (in Excel 2010; in Excel 2013 it is called a Calculated Field). The resulting table could then look like this:
To find out more about RANK function, I suggest this article.
To hide the rank value in the Grand Total row, add a simple condition that returns a blank value for grand totals:
=IF(HASONEVALUE(SalesTable[ShopNo]), [RankShop], BLANK())
Hope this helps.
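What the dense ranking of summed sales is expected to return for the sample data can be checked with a few lines of Python. This is only an illustration of dense ranking, not DAX; `dense_rank_by_total` is a name invented for the example.

```python
from collections import defaultdict

# Sample rows from the question: (ShopNo, date, sales).
sales = [
    (1, "2014-11-09", 120),
    (1, "2014-11-09", 130),
    (2, "2014-11-10", 130),
    (2, "2014-11-10", 135),
]


def dense_rank_by_total(sales):
    """Sum sales per shop, then rank the totals densely (ties share a rank, no gaps)."""
    totals = defaultdict(int)
    for shop, _, amount in sales:
        totals[shop] += amount
    # Distinct totals in descending order: position + 1 is the dense rank.
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {total: position + 1 for position, total in enumerate(distinct)}
    ranked = [(shop, total, rank_of[total]) for shop, total in totals.items()]
    return sorted(ranked, key=lambda row: row[2])


ranking = dense_rank_by_total(sales)
for shop, total, rank in ranking:
    print(shop, total, rank)
```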
