Tableau remove duplicates based on a condition - database

I am trying to remove duplicates from the Ticket field in my database, but I want to remove the duplicates that have the older dates. For example:
Ticket | Date
MG17000 | 1/1/2017
MG17000 | 1/1/2018
MG17010 | 1/1/2018
So I want the result to be:
MG17000 | 1/1/2018
MG17010 | 1/1/2018
I used countd(Ticket), but it does not remove the right tickets (it removes the ticket that corresponds to 1/1/2018 instead of 1/1/2017). Any suggestions on how to perform this task?
Thanks!

Try this:
Create a calculated field [Rank - Date] with the code below:
RANK_UNIQUE((MAX(SPLIT([database field],'|',2))))
// This creates a rank value for every ticket
Now create one more calculated field that keeps only the date with the max value, drag it to the Filters shelf, and select True:
[Rank - Date]=1
You should then get the required data.

Use a level-of-detail (LOD) calculation. Create the calculation with this formula and it will give you the number of records per ticket, regardless of what dimensions you have on the Rows and Columns shelves.
{FIXED [ticket] : count([date])}
A FIXED calculation is computed before ordinary dimension filters, so it also counts records outside any date filter range. If you have date filtering and want the count to respect it, switch FIXED to INCLUDE or add the filter to context.
Drag that in as one of your measures. Then use max([date]) to show the most recent date.
From the sample data you showed in the question, you will see something like:
MG17000 1/1/2018 2
MG17010 1/1/2018 1
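For anyone who wants to check the expected numbers outside Tableau, here is a minimal Python sketch of the same idea: one row per ticket, carrying the latest date and the record count. The function name `latest_per_ticket` and the tuple layout are made up for illustration; this is not Tableau code.

```python
from datetime import date

def latest_per_ticket(rows):
    """Return {ticket: (latest_date, record_count)} from (ticket, date) pairs."""
    result = {}
    for ticket, day in rows:
        # Start from the current row's date and a count of 0 for unseen tickets.
        latest, count = result.get(ticket, (day, 0))
        result[ticket] = (max(latest, day), count + 1)
    return result

# Sample data from the question.
rows = [
    ("MG17000", date(2017, 1, 1)),
    ("MG17000", date(2018, 1, 1)),
    ("MG17010", date(2018, 1, 1)),
]

summary = latest_per_ticket(rows)
for ticket, (latest, count) in sorted(summary.items()):
    print(ticket, latest.isoformat(), count)
```

The max per ticket plays the role of max([date]), and the count plays the role of the per-ticket LOD count.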

Can I create a Running Totals Calculated Column in Ag Grid

I want to create a new Custom Column in AG grid which will display the calculated value of another column together with the value of the column in the previous row.
We have created lots of calculated columns in AdapTable, but I cannot work out how to do this.
In our example we have a Price column, a Date column, and a Running Price Calculated Column.
For the row where Date is Today, I want the value in the Running Price column to be the 'Price' in this row plus whatever the Running Price value is in the row where Date is Yesterday.
And for yesterday's row I want Running Price to include the value for 2 days ago. And so on.
Perhaps this example will help explain:
Price | Date | Running Price
5 | 2 Days Ago | 10
7 | Yesterday | 17
9 | Today | 26
If I can do this without needing to sort AG Grid on the Date column then even better as my users like to do their own sorts and I don't want it to break the running total.
Yes, this can be done fairly easily in AdapTable.
You need to use what it calls an AggregatedScalarQuery.
Assuming that the columns in your grid are called 'Price' and 'MyDate', the Expression for the 'RunningPrice' Calculated Column will be something like:
CUMUL(SUM([Price]), OVER([MyDate]))
See more at: https://docs.adaptabletools.com/guide/adaptable-ql-expression-aggregation-scalar#cumulative-aggregation
Edit: I should add that you don't need to sort the 'MyDate' column, as per your initial message, since OVER will run over the dates in natural sort order. So your users can continue to sort AG Grid how they like without it affecting your Calculated Column.
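To illustrate why the display sort does not matter, here is a rough Python sketch of a cumulative sum computed over date order and then looked up per row. It is not AdapTable or AG Grid code; the row layout, the dates, and the `running_totals` name are assumptions for the example, and it assumes one row per date.

```python
from datetime import date

def running_totals(rows):
    """Map each row's date to the cumulative price up to and including that date."""
    totals = {}
    cumulative = 0
    # Accumulate in date order, whatever order the rows arrive in.
    for row in sorted(rows, key=lambda r: r["date"]):
        cumulative += row["price"]
        totals[row["date"]] = cumulative
    return totals

# Rows in an arbitrary "user sorted" order (hypothetical dates).
rows = [
    {"price": 9, "date": date(2024, 1, 3)},
    {"price": 5, "date": date(2024, 1, 1)},
    {"price": 7, "date": date(2024, 1, 2)},
]

totals = running_totals(rows)
# Attach the running value to each row without reordering the rows.
displayed = [(r["price"], r["date"], totals[r["date"]]) for r in rows]
```

Because the totals are keyed by date, each row shows the same running value no matter how the grid is sorted.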

Calculate daily targets based on monthly targets Sales Power bi

I have the following problem, and I just can't work out how to solve it in a neat way.
I want to create a line graph with three lines. We call it a budget snake.
Created sales orders (black)
Invoiced orders (green)
Daily targets (red)
This is per salesperson. Creating the graph for the created and invoiced orders is easy, as these are all at a daily granularity.
I just struggle with how to create/generate such a line for the targets.
In this case, I manually created a table with date - salesperson - daily target, which is very cumbersome. What I would like to be able to do is create a table on a monthly level for each salesperson and have Power BI "generate/calculate" the daily target in such a way that I can graph the red line without all the hassle of creating it for each salesperson manually.
The input would be something like this
+-----------+----------+--------------+--------+----------------+--------------+---------------+
| Date | Month | Salesperson | Branch | Monthly Target | Daily Target | Business days |
+-----------+----------+--------------+--------+----------------+--------------+---------------+
| 1/01/2017 | January | salesperson1 | test | 73529 | 4325 | 17 |
| 1/02/2017 | February | salesperson1 | test | 73529 | 4325 | 20 |
+-----------+----------+--------------+--------+----------------+--------------+---------------+
I have a date dimension table, so on my graph I have the date as the x-axis and the runningorders/runningsales as the y-axis, but I would like something like a daily runningtarget so that the red line runs nicely alongside the orders and sales.
I had a look at this pattern, but I just cannot figure out how it can generate a line graph:
https://www.daxpatterns.com/budget-patterns/
So I guess I need something that generates the first table with the second table as input. I tried some measures in DAX, but none of them give me the cumulative steps for each day. They mostly just show me the value.
These are the measures I use for the other lines. This works nicely when changing the date filters.
Running sales
RunningTotalSales = CALCULATE(sum(vw_invoice_trn_summary[NetInvoiceValue]),
FILTER(ALLSELECTED(DimTime),DimTime[Date] <= MAX(DimTime[Date])))
Running orders
RunningTotalOrders = CALCULATE(sum(vw_orders_raised[OrderTotal]),FILTER(ALLSELECTED(DimTime),DimTime[Date] <= MAX(DimTime[Date])))
In my current manual solution, though, the targets line does not work well over the full year, and I am not sure I am doing it right.
UPDATE
Thinking further about this, it feels like I just need to be able to create a table with date - daily target - salesperson based on the monthly targets, but I am not sure how to do that in Power BI. Ideally, you could just add/remove a salesperson and that table would get regenerated.
I have two solutions to this. One using DAX and one using the query editor.
DAX Solution:
1. Create a calendar table that has all the dates you need.
If Targets is the table containing your monthly targets, create a new table using a formula like this:
Calendar = CALENDAR(EOMONTH(MIN(Targets[Date]),-1)+1,EOMONTH(MAX(Targets[Date]),0))
2. Create a new table DailyTargets as a cross join of your dates and salespersons.
The CROSSJOIN function creates a row for each date and salesperson combination:
DailyTargets = CROSSJOIN(VALUES('Calendar'[Date]),VALUES(Targets[Salesperson]))
3. Create a calculated column for your daily targets.
I do this by looking up the monthly target and dividing by the number of days in the month:
DailyTarget = DIVIDE(
LOOKUPVALUE(Targets[MonthlyTarget],
Targets[Month], FORMAT(DailyTargets[Date],"mmmm"),
Targets[Salesperson], DailyTargets[Salesperson]),
DAY(EOMONTH(DailyTargets[Date],0)))
Now you have a daily target for each date and each salesperson.
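As a sanity check on the numbers, the steps above (one row per date and salesperson, monthly target divided by the days in that month) can be sketched in plain Python. This is only an illustration and not DAX; keying the targets by (year, month, salesperson) and the name `daily_targets` are assumptions made for the sketch.

```python
import calendar
from datetime import date, timedelta

def daily_targets(monthly_targets):
    """Expand {(year, month, salesperson): monthly_target} into daily rows."""
    rows = []
    for (year, month, person), target in monthly_targets.items():
        # monthrange returns (weekday of the 1st, number of days in the month).
        days_in_month = calendar.monthrange(year, month)[1]
        for offset in range(days_in_month):
            day = date(year, month, 1) + timedelta(days=offset)
            rows.append((day, person, target / days_in_month))
    return rows

# Monthly targets taken from the question's sample table.
targets = {
    (2017, 1, "salesperson1"): 73529,
    (2017, 2, "salesperson1"): 73529,
}

rows = daily_targets(targets)
january_total = sum(t for d, p, t in rows if d.month == 1)
```

Summing the daily values for a month should give back the monthly target (up to floating point rounding), which is a quick way to validate the generated table.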
PowerQuery Solution:
1. Create a calendar table that has all the dates you need.
Create a blank query and use the following code:
= List.Dates(List.Min(Targets[Date]),
Duration.Days(Date.EndOfMonth(List.Max(Targets[Date]))
- List.Min(Targets[Date])) + 1,
#duration(1,0,0,0))
2. Convert this list to a table.
Click on "To Table" under the Transform tab and rename the column from "Column1" to "Date".
3. Create a custom column for the month name.
You can use the formula Date.MonthName([Date]) for this.
4. Merge this query with the Targets table (joining on the Month columns).
5. After merging, expand the Salesperson and MonthlyTarget columns.
6. Create the daily target by dividing the monthly target by the number of days in the month.
You can use the formula [MonthlyTarget]/Date.DaysInMonth([Date]) for this.
The entire query should look like this:
let
Source = List.Dates(List.Min(Targets[Date]), Duration.Days(Date.EndOfMonth(List.Max(Targets[Date])) - List.Min(Targets[Date])) + 1, #duration(1,0,0,0)),
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Date"}}),
#"Added Custom" = Table.AddColumn(#"Renamed Columns", "Month", each Date.MonthName([Date])),
#"Merged Queries" = Table.NestedJoin(#"Added Custom",{"Month"},Targets,{"Month"},"Targets",JoinKind.LeftOuter),
#"Expanded Targets" = Table.ExpandTableColumn(#"Merged Queries", "Targets", {"Salesperson", "MonthlyTarget"}, {"Salesperson", "MonthlyTarget"}),
#"Added Custom1" = Table.AddColumn(#"Expanded Targets", "DailyTarget", each [MonthlyTarget]/Date.DaysInMonth([Date]))
in
#"Added Custom1"
Instead of going step by step, you can just paste this into the Advanced Editor if you'd like. (Just be sure you use whatever table name you have instead of Targets.)
I have a much simpler solution for this:
I just need to be able to create a table with a date - daily target - salesperson. based on the monthly targets
Let's say I have a table like this:
Where Month contains the date for the first day of each month. We then add a custom column "Date" using the query editor menu (Add Column > Custom Column). We paste this formula for our new column:
= List.Dates([Month], Date.Day(Date.EndOfMonth([Month])), #duration(1, 0, 0, 0))
Each row of the new column will contain a list of all dates within that row's month. Expand that column by clicking on the button on the top right corner and choosing "Expand to new rows".
Now you have a row for each day, and you can simply add another custom column, "Daily Target", that divides the monthly target by the number of days in each month:
= [Monthly Target]/Date.DaysInMonth([Date])
And your table is ready:

Matching Duplicate values with unique attributes in a horizontal excel spreadsheet

Hopefully someone has had my problem before. I'm in the process of building an Excel model that sorts the prices that a certain product was sold for and the sales associated with that price. One spreadsheet houses the data and another sorts that data by sales and then matches the price that it sold for.
The problem is that there are cases where the number of sales is the same but the prices are different. In these cases, the first price is duplicated wherever the number of sales is the same. See below for a visual. I've looked tirelessly for a solution but have not found one, because the formula needs to work horizontally.
This sales volume sorting formula =IFERROR(LARGE('2016 Data Tab '!$B3:$BY3,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76}),"")
This formula matches the price with the sales, and this is where I'm having the problem: =IFERROR(INDEX(DataTableLanes16,$A3*$C$1,MATCH('2016 Input Lanes '!C3,'2016 Data Tab '!$A3:$BY3,0)),"")
See the pictures below:
This is where the data is housed:
This is where the data is sorted by sales:
Thanks in advance for your assistance.
James
It is certainly possible to just sort the data. Excel can sort left to right.
If you MUST use formulas, you need to calculate the rank of each sales number in a manner that accounts for duplicates.
You can add a "helper row" and name it SalesRank.
You can then use this formula in B4 (as in the screenshot):
B4: =RANK(B3,Sales,0)+COUNTIF($B$3:B3,B3)-1
and fill right
For the desired results, we first list the sales in descending order:
B9: =LARGE(Sales,COLUMNS($A:A))
and fill right
And then, for the associated prices:
B10: =INDEX(Price,1,MATCH(COLUMNS($A:A),SalesRank,0))
In the above formulas:
| Price | Refers to: | =Sheet3!$B$2:$K$2 |
| Sales | Refers to: | =Sheet3!$B$3:$K$3 |
| SalesRank | Refers to: | =Sheet3!$B$4:$K$4 |
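To see why the helper row removes the duplicate-price problem, here is a small Python sketch of the same tie-breaking logic (rank plus a running count of equal values, minus one). The sample prices and sales are made up, and `unique_ranks` is just an illustrative name.

```python
def unique_ranks(sales):
    """Descending rank with ties broken left to right, like RANK + COUNTIF - 1."""
    ranks = []
    for i, value in enumerate(sales):
        # RANK(value, Sales, 0): 1 + number of strictly larger values.
        rank = 1 + sum(1 for other in sales if other > value)
        # COUNTIF($B$3:B3, B3): occurrences of this value up to and including here.
        seen_so_far = sales[: i + 1].count(value)
        ranks.append(rank + seen_so_far - 1)
    return ranks

# Hypothetical data: two prices share the same sales figure (40).
prices = [1.50, 2.00, 2.50, 3.00]
sales = [40, 70, 40, 10]

ranks = unique_ranks(sales)
sorted_sales = sorted(sales, reverse=True)
# INDEX(Price, 1, MATCH(k, SalesRank, 0)) for k = 1, 2, 3, ...
sorted_prices = [prices[ranks.index(k)] for k in range(1, len(sales) + 1)]
```

Because every rank is unique, each position in the sorted output matches exactly one column, so tied sales figures pick up their own prices rather than the first match twice.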

Best way to store quarter and year in SQL Server?

What would be the best way to store Quarter and Year in database? I have payments table and I need to assign quarter/year so that it's easy to tell for which quarter the payment was made.
I was thinking on:
a) adding two int columns to each payment
b) adding another table and add possible values up to 5 years ahead and use the ID to join that table with payments one.
What are other options? Maybe some is better or/and easier to maintain. This database will be used with C# program.
If you have to use separate year and quarter instead of a date (since you seem to have specific reporting requirements), I would go for a tinyint for quarter and smallint for year and store them in the PAYMENT table itself.
I would not store it in a different table. This is bad since:
You have to make sure you have produced enough years/quarters
You have to join and use a foreign key
If you store the data with the record, it will help performance on reads. Your table could be small but it is always good to keep in mind performance.
WHY
Let's imagine you need to get all payments in a specific quarter where the payment was more than a specific amount and the customer is a particular customer.
In this case, you would need a covering index on all items, and it still would not help, since your query is for a specific quarter and not for a quarter and year. Having the data in the table, however, will help with a lighter execution plan.
I've always just used datetime value with the 1st of January/April/July/October representing each quarter. Makes computation of the start/end dates of the quarter simple:
Start Date: the datetime column itself.
End Date: dateadd(month,3,quarterColumn)
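Since the question mentions a C# client, the same arithmetic may also be needed outside SQL. Here is a minimal sketch of it in Python; the function names are made up for illustration. Note that adding three months to the start gives the first day of the next quarter, so it works best as an exclusive upper bound.

```python
from datetime import date

def quarter_start(day):
    """First day of the quarter containing `day` (1 Jan, 1 Apr, 1 Jul or 1 Oct)."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)

def quarter_end_exclusive(start):
    """Start plus three months: the first day of the following quarter.

    Assumes `start` is a quarter start date, i.e. day-of-month is 1.
    """
    month = start.month + 3
    year = start.year + (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1)

start = quarter_start(date(2011, 4, 18))
end = quarter_end_exclusive(start)
# A payment date belongs to the quarter when start <= payment_date < end.
```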
Another alternative would be ISO 8601. Here's an ISO 8601 profile for use in Internet protocols: RFC 3339 (proposed standard).
An ISO 8601 representation of each quarter of the year 2011 looks like this:
2011-01-01/P3M
2011-04-01/P3M
2011-07-01/P3M
2011-10-01/P3M
The above specify a duration by starting date and duration (in this case, 3 months).
The advantage of ISO 8601 date/time formats is that the strings are (A) human readable, (B) they collate properly, (C) they're easy to parse and (D) it's an international standard.
Some people "extend" ISO 8601's week notation, where a week of the year looks like 2011W32 (the 32nd week of 2011), to a quarter notation. Using this unofficial extension, the quarters of the year 2011 look like:
2011Q1
2011Q2
2011Q3
2011Q4
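If you go this route, producing either string from a payment date is straightforward. A rough sketch in Python follows; the helper names are hypothetical, and the YYYYQn form is the unofficial extension described above, not part of the standard.

```python
from datetime import date

def quarter_of(day):
    """Quarter number 1-4 for a date."""
    return (day.month - 1) // 3 + 1

def iso_quarter_interval(day):
    """Start-date/duration form, e.g. '2011-04-01/P3M'."""
    first_month = 3 * (quarter_of(day) - 1) + 1
    return f"{day.year:04d}-{first_month:02d}-01/P3M"

def quarter_label(day):
    """Unofficial compact form, e.g. '2011Q2'."""
    return f"{day.year:04d}Q{quarter_of(day)}"

payment_date = date(2011, 5, 20)
interval = iso_quarter_interval(payment_date)
label = quarter_label(payment_date)

# Both forms sort chronologically as plain strings.
labels = sorted(quarter_label(date(y, m, 1)) for y, m in [(2012, 1), (2011, 10), (2011, 4)])
```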
How about using computed columns based on the payment date? I'd rather do this than have both a date and quarter/year that might get out of sync. On the other hand, I suppose it's possible that you may need the ability to have a different year/quarter than the date indicates in which case you'd need to keep them separate. I'd at least think about using computed columns though as that seems the best way to ensure integrity.
For something so simple, I would just keep 2 int columns, and build up the (pivotal) dates using dateadd when date ranges are required.
Another option is a single date column in which you store the first day of the quarter, so the 4 dates in a year would be 1-Jan, 1-Apr, 1-Jul, 1-Oct. You can extract the quarter and year easily using datepart with quarter (q) and year (yy); note that in SQL Server the abbreviation y means day of year.
How about two ints, one for the year, and one for the quarter (1-4). Is that what you meant by option "a"?
Option "b" would work, but you have to remember to maintain the table every year or so.
I agree two ints are fine.
I would add an index consisting of both columns in case you need to sort or filter by year and quarter.
You could even use a single tinyint. It's enough for storing values in the form YYQ, like 111, 112, 113, 114, 121... for a few years.
How you store quarter and year in the database depends on how your payment data is organized. For example: how many different payment values are being inserted? Will the quarter/year ranges vary?
One good technique for "defining" a quarter/year range is making a separate table with a "DateTime" field that identifies a quarter. You don't need to join the table; you just need to do some programming in C# to figure out whether the date falls within a particular pay quarter.
For example:
Table 1: Payments
-----------------
paymentID (int)
paymentAmount (decimal(7,2))
paymentDateTime (DateTime)
Table 2: QuarterYear
--------------------
quarterYearID (int)
dateFrom (date)
dateTo (date)
quarter (tinyint)
description (varchar)
Example Data
paymentID | paymentAmount | paymentDateTime
------------------------------------------------
1 | 20.24 | 2011-04-18 08:14:20
2 | 34.15 | 2011-04-19 07:42:15
3 | 51.87 | 2011-04-20 13:04:22
quarterYearID | dateFrom | dateTo | quarter | description
-----------------------------------------------------------------
1 | 2011-01-01 | 2011-03-31 | 1 | first quarter
2 | 2011-04-01 | 2011-06-30 | 2 | second quarter
3 | 2011-07-01 | 2011-09-30 | 3 | third quarter
4 | 2011-10-01 | 2011-12-31 | 4 | fourth quarter
Example Query for getting all payments for "Quarter 2"
dateValue is a dynamically pulled variable from the payments table. C# will handle 'dateValue' value.
SELECT quarter FROM QuarterYear WHERE cast('dateValue' AS date) BETWEEN dateFrom AND dateTo;
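Here is a small sketch of how the client side of that lookup could work, written in Python with an in-memory SQLite database standing in for SQL Server (so the column types and date handling are simplified assumptions). It passes the date as a query parameter instead of pasting it into the SQL string. The `quarter_for` name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE QuarterYear ("
    " quarterYearID INTEGER PRIMARY KEY,"
    " dateFrom TEXT, dateTo TEXT, quarter INTEGER)"
)
conn.executemany(
    "INSERT INTO QuarterYear VALUES (?, ?, ?, ?)",
    [
        (1, "2011-01-01", "2011-03-31", 1),
        (2, "2011-04-01", "2011-06-30", 2),
        (3, "2011-07-01", "2011-09-30", 3),
        (4, "2011-10-01", "2011-12-31", 4),
    ],
)

def quarter_for(conn, payment_datetime):
    """Return the quarter whose range contains the payment date, or None."""
    date_value = payment_datetime[:10]  # keep only the YYYY-MM-DD part
    row = conn.execute(
        "SELECT quarter FROM QuarterYear WHERE ? BETWEEN dateFrom AND dateTo",
        (date_value,),
    ).fetchone()
    return row[0] if row else None

quarter = quarter_for(conn, "2011-04-18 08:14:20")
```

ISO-formatted date strings compare correctly as text, which is why BETWEEN works here without a date type.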

Storing occurrences for reporting

What is the best way to store occurrences of an event in a database so you can quickly pull reports on it (i.e. total number of occurrences, number of occurrences within a date range)?
Right now I have two database tables: one which holds all individual timestamps of the event, so I can query on a date range, and one which holds a total count, so I can quickly pull that number for a tally.
Table 1:
Event | Total_Count
------+------------
bar | 1
foo | 3
Table 2:
Event | Timestamp
------+----------
bar | 1/1/2010
foo | 1/1/2010
foo | 1/2/2010
foo | 1/2/2010
Is there a better approach to this problem? I'm thinking of converting Table 2 to hold date tallies. It should be more efficient, since my date range queries are only done on whole dates, not on timestamps (1/1/2010 vs 1/1/2010 00:01:12).
i.e.:
Updated Table 2
Event | Date | Total_Count
------+----------+------------
bar | 1/1/2010 | 1
foo | 1/1/2010 | 1
foo | 1/2/2010 | 2
Perhaps there's an even smarter way to tackle this problem? Any ideas?
Your approach seems good. I see table 2 more as a detail table, while table 1 as a summary table. For the most part, you would be doing inserts only to table 2, and inserts and updates on table 1.
The updated table 2 may not give you much additional benefit. However, you should consider it if aggregations by day are most important to you.
You may consider adding more attributes (columns) to the tables. For example, you could add a first_date and a last_date to table 1.
I would just have the one table with the timestamp of your event(s). Then your reporting is simply setting up your where clause correctly...
Or am I missing something in your question?
Seems like you don't really have any requirements:
Changing from timestamp to just the date portion is a big deal.
You don't ever want to do a time-of-day analysis?
like what's the best time of day to do maintenance if that stops "foo" from happening.
And you're not worried about size? You say you have millions of records (like that's a lot) and then you extend every single row by an extra column. One column isn't a lot until the row count skyrockets and then you really have to think about each column.
So to get the sum of event for the last 3 days you'd rather do this
SELECT SUM(totcnt) FROM (
SELECT MAX(Total_count) as totcnt from table where date = today and event = 'Foo'
UNION ALL
SELECT MAX(Total_count) from table where date = today-1 and event = 'Foo'
UNION ALL
SELECT MAX(Total_count) from table where date = today-2 and event = 'Foo'
)
Yeah, that looks much easier than:
SELECT COUNT(*) FROM table WHERE DATE BETWEEN today-2 and today and event = 'foo'
And think about the trigger it would take to add a row... get the max for that day and event and add one... every time you insert?
Not sure what kind of server you have, but I summed 1 million rows in 285ms. So... how many millions will you have, how many times do you need to sum them, and is each time for the same date range or completely random?
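To make the single-detail-table approach concrete, here is a rough sketch in Python with SQLite. The table and column names (`events`, `occurred_on`) and the index are my own assumptions for illustration, not something from the question; whether an index like this is enough at your data volume is something you would have to measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event TEXT NOT NULL, occurred_on TEXT NOT NULL)")
# Assumed index: lets the range count seek on event, then scan the date range.
conn.execute("CREATE INDEX idx_events_event_date ON events (event, occurred_on)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("bar", "2010-01-01"),
        ("foo", "2010-01-01"),
        ("foo", "2010-01-02"),
        ("foo", "2010-01-02"),
    ],
)

def count_events(conn, event, start, end):
    """Count occurrences of `event` with start <= occurred_on <= end."""
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE event = ? AND occurred_on BETWEEN ? AND ?",
        (event, start, end),
    ).fetchone()
    return row[0]

foo_total = count_events(conn, "foo", "2010-01-01", "2010-01-02")
```

Inserts stay a plain append with no trigger or tally update, and both the total count and any date range count come from the same query shape.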
