DAX Calculated column based on two columns from other table - data-modeling

I need to write a DAX statement which is somewhat complex from a conceptual/logical standpoint, so this might be hard to explain.
I have two tables.
On the first table (shown below) I have a list of numeric values (Wages). For each value I have a corresponding date range. I also have EmployeeID and FunctionID. The purpose of this table is to keep track of the hourly Wages paid to employees performing specific functions during specific date ranges. Each Function has its own Wage on the Wage table, BUT each employee might get paid a different Wage for the same Function (there is also a dimension for functions and employees).
'Wages'
Wage StartDate EndDate EmployeeID FunctionID
20 1/1/2016 1/30/2016 3456 20
15 1/15/2016 2/12/2016 3456 22
27.5 1/20/2016 2/20/2016 7890 20
20 1/21/2016 2/10/2016 1234 19
On 'Table 2' I have a record for every day that an Employee worked a certain Function. Remember, Table 1 contains the Wage information for every function.
'Table 2'
Date EmployeeID FunctionID DailyWage
1/1/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/2/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/3/2016 1234 $22 see below
1/4/2016 1234 $22
1/1/2016 4567 $27
1/2/2016 4567 $27
1/3/2016 4567 $27
(Note that wages can change over time)
What I'm trying to do is create a Calculated Column on 'Table 2' called 'DailyWage'. I want every row on 'Table 2' to tell me how much the EmployeeID was paid for the full day (assuming an 8 hour workday).
I'm really struggling with the logic steps, so I'm not sure what the best way to do this calculation is...
To make things worse, an EmployeeID might get paid a different Wage for the same Function on a different Date. They might start out at one wage working function X and then generally, their wage should go up a few months in the future... That means that if I try to concatenate the EmployeeID and the FunctionID, I won't be able to connect the tables on the concatenated value because neither table will have unique values.
So in other words, if we CONCATENATE the EmployeeID and FunctionID into EmpFunID, we need to say: "take the EmpFunID in the current row, plus the date for the current row, and then return the value from the Wage column on the Wages table that has the same EmpFunID AND has a StartDate less than the CurrentRowDate AND has an EndDate greater than the CurrentRowDate".
HERE IS WHAT I HAVE SO FAR:
Step 1 = Filter 'Wages' table so that StartDate < CurrentRowDate
Step 2 = Filter 'Wages' table so that EndDate > CurrentRowDate
Step 3 = LOOKUPVALUE( 'Wages'[Wage], 'Wages'[EmpFunID], Table2[EmpFunID])
Now I just need that converted into a DAX function.

Not sure if I got it totally right, but maybe something like this? If you put it into Table2 as a calculated column, it is evaluated once per row of Table2, in that row's row context.
So FILTER can compare every row of the Wages table against the current row's Date, EmployeeId and FunctionId from Table2, and SUMX then sums only those wages which belong to the current row. FILTER takes a single condition, so the tests are combined with &&, and the date has to satisfy StartDate <= Date <= EndDate. No CALCULATE wrapper is needed, because the comparison relies on the row context of Table2.
SUMX(
    FILTER(
        'Wages',
        -- keep only the wage rows whose date range covers the current row's date
        'Wages'[StartDate] <= 'Table2'[Date]
            && 'Wages'[EndDate] >= 'Table2'[Date]
            -- and that belong to the same employee and function
            && 'Wages'[EmployeeId] = 'Table2'[EmployeeId]
            && 'Wages'[FunctionId] = 'Table2'[FunctionId]
    ),
    'Wages'[Wage]
) -- hourly wage; multiply by 8 for the 8-hour day the question assumes
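To make the lookup logic easier to check outside Power BI, here is a minimal Python sketch of the same idea: for one row of Table2, keep the wage rows whose date range covers the day and whose employee and function match, then sum them. The names `WAGES`, `HOURS_PER_DAY` and `daily_wage` are made up for illustration, and the data is the sample 'Wages' table from the question.

```python
from datetime import date

# Rows mirror the question's 'Wages' table:
# (wage, start_date, end_date, employee_id, function_id)
WAGES = [
    (20.0, date(2016, 1, 1), date(2016, 1, 30), 3456, 20),
    (15.0, date(2016, 1, 15), date(2016, 2, 12), 3456, 22),
    (27.5, date(2016, 1, 20), date(2016, 2, 20), 7890, 20),
    (20.0, date(2016, 1, 21), date(2016, 2, 10), 1234, 19),
]

HOURS_PER_DAY = 8  # the question assumes an 8-hour workday


def daily_wage(day, employee_id, function_id):
    """Sum the hourly wages whose range covers `day`, times the hours worked."""
    matches = [
        wage
        for wage, start, end, emp, fn in WAGES
        if start <= day <= end and emp == employee_id and fn == function_id
    ]
    return sum(matches, 0.0) * HOURS_PER_DAY


print(daily_wage(date(2016, 1, 25), 3456, 20))  # 160.0
print(daily_wage(date(2016, 3, 1), 3456, 20))   # 0.0, no wage row covers this day
```

If two wage rows for the same employee and function overlap in time, both are summed, which is also what SUMX over the filtered table would do.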

Related

How to collect all differences in rows between two periods?

I'm trying to see the difference between the two periods for a column.
For example, we see that sales decreased at the end of the month, and we need to see which products were not sold at the end of the month?
I can create SELECT to see quantity for each product for each period:
SELECT product_id, count(product_id) AS Count
FROM testDB
WHERE
sales_date IS NOT NULL
AND
delivery_date BETWEEN '2021-02-01 00:00:03.0000000' AND '2021-02-14 23:56:00.0000000'
GROUP BY
product_id
and the same SELECT with another period:
delivery_date BETWEEN '2021-02-14 00:00:03.0000000' AND '2021-02-28 23:56:00.0000000'
So, after these queries I see a list of 10 products with quantities for the first period, and a list of 7 products with quantities for the second period. I can't get the difference between the lists of the two SELECTs. I tried to use != and NOT IN but without any results.
I will be very grateful for your help. Thanks
Sorry for the confusion. I meant the difference between the two selects:
The result of the first one (for first period):
Product_ID Count
grapes. 100
lime. 13
lemon. 15
cherry. 222
blueberry. 123
banana. 1
apple. 123
watermelon 56
and second one (for second period):
Product_ID Count
grapes. 10
lime. 1
lemon. 10
cherry. 2
blueberry. 13
banana. 12
and I want to see the difference between these selects:
Product_ID Count
apple. 0
watermelon. 0
So we did not sell any apples and watermelons in second period.
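SELECT p1.product_id, 0 AS Count
FROM testDB AS p1
WHERE
p1.sales_date IS NOT NULL
AND
p1.delivery_date BETWEEN '2021-02-01 00:00:03.0000000' AND '2021-02-14 23:56:00.0000000'
AND NOT EXISTS (
-- the same product must have no sale in the second period
SELECT 1 FROM testDB AS p2
WHERE p2.product_id = p1.product_id
AND p2.sales_date IS NOT NULL
AND p2.delivery_date BETWEEN '2021-02-14 00:00:03.0000000' AND '2021-02-28 23:56:00.0000000'
)
GROUP BY
p1.product_id
This should work for getting the products that appear in the first period but not in the second.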
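For what it's worth, the result the question asks for is a set difference: the products present in the first period's list but absent from the second. A small Python sketch of that idea, using the two sample result lists from the question (the variable names are just for illustration):

```python
# Counts per product, copied from the two sample result sets in the question.
first_period = {
    "grapes": 100, "lime": 13, "lemon": 15, "cherry": 222,
    "blueberry": 123, "banana": 1, "apple": 123, "watermelon": 56,
}
second_period = {
    "grapes": 10, "lime": 1, "lemon": 10, "cherry": 2,
    "blueberry": 13, "banana": 12,
}

# Products sold in the first period that have no row at all in the second.
missing = sorted(set(first_period) - set(second_period))

# Report them with a count of 0 for the second period.
report = {product: second_period.get(product, 0) for product in missing}
print(report)  # {'apple': 0, 'watermelon': 0}
```

In SQL the same difference is usually written with EXCEPT, NOT EXISTS, or a LEFT JOIN that keeps only the unmatched rows.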

Compare the dates, compute the difference between them, postgres

I have a date column and a balance column for each user. Every time user makes a transaction, a new row gets added to this table. It could be that the user makes 15 transactions during the day, and no transaction at all during 5 days.
Like this one
date balance
2017-06-01 95.63
2017-06-01 97.13
2017-06-01 72.14
2017-06-06 45.04
2017-06-08 20.04
2017-06-09 10.63
2017-06-09 -29.37
2017-06-09 -51.35
2017-06-13 -107.55
2017-06-13 -101.35
2017-06-15 -157.55
2017-06-16 -159.55
2017-06-17 -161.55
The goal is to select the positive and negative transactions made during the same day, compute their average or min value, and consider it as one transaction. If no transaction has been made the next day, then the amount of the previous day should be used.
That means that for each day in a month I should calculate an interest, and if the balance has not been updated, then the balance of the previous day should be used.
Hypothetically my table should look like
date balance
2017-06-01 72.14
2017-06-02 72.14
2017-06-03 72.14
2017-06-04 72.14
2017-06-05 72.14
2017-06-06 45.04
2017-06-07 45.04
2017-06-08 20.04
2017-06-09 -51.35
2017-06-10 -51.35
2017-06-11 -51.35
2017-06-12 -51.35
2017-06-13 -107.55
2017-06-14 -107.55
2017-06-15 -157.55
2017-06-16 -159.55
2017-06-17 -161.55
I have added the days that were missing and grouped the days that were duplicated.
Once I have this done, I can take the positive balance days (8 days here), compute the average positive balance over them, and multiply it by 0.4%.
58.8525*0.004=0.23
The same should be done with the negative balance, but with a different interest rate: take the negative balance days (9 here), compute the average negative balance during those days, and multiply it by 0.849%.
-99.90555556*0.00849=-0.848
So my expected result is just to have these two columns
Neg Pos
-0.848 0.23
How can I do that in Postgres? The function OVERLAP does not really help since I need to specify the dates.
Besides, I do not know how to loop over the days to see if there is a duplicate, or how to see which days are missing and use the previous balance for each of these missing days.
Please try this, replacing table with your table name:
with cte as
(
Select "date" as date
,min(balance) as balance
,lead("date") over(order by "date") next_date
,Coalesce(ABS("date" - lead("date") over(order by "date")),1) date_diff
from table
group by "date"
),
cte2 as
(
Select date_diff*balance as tot_bal , date_diff
from cte
Where balance > 0
),
cte3 as
(
Select date_diff*balance as tot_bal , date_diff
from cte
Where balance < 0
)
Select (sum(cte2.tot_bal) / sum(cte2.date_diff) ) * 0.004 as pos
,(sum(cte3.tot_bal) / sum(cte3.date_diff) ) * 0.00849 as neg
from cte2
,cte3;
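As a cross-check of the numbers, here is a rough Python sketch of the same steps: take the minimum balance per day, carry it forward over the missing days, then multiply the average positive and average negative balance by their rates. It follows the average-times-rate reading that matches the expected 0.23 and -0.848; the function names and the `ROWS` list are assumptions made for this example, with the data copied from the question.

```python
from datetime import date, timedelta

# (date, balance) rows copied from the question's table.
ROWS = [
    (date(2017, 6, 1), 95.63), (date(2017, 6, 1), 97.13), (date(2017, 6, 1), 72.14),
    (date(2017, 6, 6), 45.04), (date(2017, 6, 8), 20.04),
    (date(2017, 6, 9), 10.63), (date(2017, 6, 9), -29.37), (date(2017, 6, 9), -51.35),
    (date(2017, 6, 13), -107.55), (date(2017, 6, 13), -101.35),
    (date(2017, 6, 15), -157.55), (date(2017, 6, 16), -159.55), (date(2017, 6, 17), -161.55),
]


def daily_balances(rows):
    """One balance per calendar day: min per day, gaps filled from the previous day."""
    per_day = {}
    for day, balance in rows:
        per_day[day] = min(balance, per_day.get(day, balance))
    filled = {}
    current, last = min(per_day), max(per_day)
    balance = per_day[current]
    while current <= last:
        balance = per_day.get(current, balance)  # keep previous balance on missing days
        filled[current] = balance
        current += timedelta(days=1)
    return filled


def interest(filled, pos_rate=0.004, neg_rate=0.00849):
    """Return (neg, pos): average negative and positive balance times their rate."""
    pos = [b for b in filled.values() if b > 0]
    neg = [b for b in filled.values() if b < 0]
    pos_interest = sum(pos) / len(pos) * pos_rate if pos else 0.0
    neg_interest = sum(neg) / len(neg) * neg_rate if neg else 0.0
    return neg_interest, pos_interest


neg, pos = interest(daily_balances(ROWS))
print(round(neg, 3), round(pos, 3))  # -0.848 0.235
```

The filled series has 8 positive days and 9 negative days, the same counts the question arrives at by hand.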

Populating a list of dates without a defined end date - SQL server

I have a list of accounts and their cost which changes every few days.
In this list I only have the start date every time the cost updates to a new one, but no column for the end date.
Meaning, I need to populate a list of dates where the end date for a specific account and cost is deduced from the start date of the next cost for the same account.
More or less like that:
Account start date cost
one 1/1/2016 100$
two 1/1/2016 150$
one 4/1/2016 200$
two 3/1/2016 200$
And the result I need would be:
Account date cost
one 1/1/2016 100$
one 2/1/2016 100$
one 3/1/2016 100$
one 4/1/2016 200$
two 1/1/2016 150$
two 2/1/2016 150$
two 3/1/2016 200$
For example, if the cost changed in the middle of the month, then the sample data will only hold two records (one for each unique combination of account, start date and cost), while the results will hold 30 records with the cost for each and every day of the month (15 for the first cost and 15 for the second one). The costs are a given, and there is no need to calculate them (they are inserted manually).
Note the result contains more records because the sample data shows only a start date and an updated cost for that account, as of that date. While the results show the cost for every day of the month.
Any ideas?
Solution is a bit long.
I added an extra date for test purposes:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500) -- extra row
;WITH CTE as
( SELECT
row_number() over (partition by account order by startdate) rn,
*
FROM @t
),N(N)AS
(
SELECT 1 FROM(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))M(N)
),
tally(N) AS -- tally is limited to 1000 days
(
SELECT ROW_NUMBER()OVER(ORDER BY N.N) - 1 FROM N,N a,N b
),GROUPED as
(
SELECT
cte.account, cte.startdate, cte.cost, cte2.cost cost2, cte2.startdate enddate
FROM CTE
JOIN CTE CTE2
ON CTE.account = CTE2.account
and CTE.rn = CTE2.rn - 1
)
-- used DISTINCT to avoid overlapping dates
SELECT DISTINCT
CASE WHEN datediff(d, startdate,enddate) = N THEN cost2 ELSE cost END cost,
dateadd(d, N, startdate) startdate,
account
FROM grouped
JOIN tally
ON datediff(d, startdate,enddate) >= N
Result:
cost startdate account
100 2016-01-01 one
100 2016-01-02 one
100 2016-01-03 one
150 2016-01-01 two
150 2016-01-02 two
200 2016-01-03 two
200 2016-01-04 one
200 2016-01-04 two
200 2016-01-05 two
500 2016-01-06 two
Thank you @t-clausen.dk!
It didn't solve the problem completely, but did direct me in the correct way.
Eventually I used the LEAD function to generate an end date for every cost per account, and then I was able to populate a list of dates based on that idea.
Here's how I generate the end dates:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500)
select account
,[startdate]
,DATEADD(DAY, -1, LEAD([Startdate], 1,'2100-01-01') OVER (PARTITION BY account ORDER BY [Startdate] ASC)) AS enddate
,cost
from @t
It returned the expected result:
account startdate enddate cost
one 2016-01-01 2016-01-03 100
one 2016-01-04 2099-12-31 200
two 2016-01-01 2016-01-02 150
two 2016-01-03 2016-01-05 200
two 2016-01-06 2099-12-31 500
Please note that I set the end date of current costs to be some date in the far future which means (for me) that they are currently active.
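The second half of that idea (turning each start/end range into one row per day) is not shown above, so here is a hedged Python sketch of the whole approach: derive each end date from the next start date of the same account, as LEAD does, then expand every range into daily rows. The `horizon` parameter is an assumption standing in for the far-future end date, and the function name is made up for this example.

```python
from datetime import date, timedelta

# (account, startdate, cost) rows from the test data above.
COSTS = [
    ("one", date(2016, 1, 1), 100), ("two", date(2016, 1, 1), 150),
    ("one", date(2016, 1, 4), 200), ("two", date(2016, 1, 3), 200),
    ("two", date(2016, 1, 6), 500),
]


def expand_daily(costs, horizon):
    """Return (account, day, cost) for every day a cost was in effect, up to `horizon`."""
    by_account = {}
    for account, start, cost in costs:
        by_account.setdefault(account, []).append((start, cost))

    result = []
    for account in sorted(by_account):
        changes = sorted(by_account[account])
        for i, (start, cost) in enumerate(changes):
            if i + 1 < len(changes):
                # Same idea as LEAD(startdate) minus one day.
                end = changes[i + 1][0] - timedelta(days=1)
            else:
                end = horizon  # the current cost stays active until the horizon
            day = start
            while day <= end:
                result.append((account, day, cost))
                day += timedelta(days=1)
    return result


daily = expand_daily(COSTS, horizon=date(2016, 1, 6))
for row in daily:
    print(row)
```

In T-SQL the expansion step would typically be a join between the start/end ranges and a tally or calendar table, as in the first answer.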

sql sum a column and also return the last time stamp

I've got a table in SQL Server with several columns. The relevant ones are:
name
distance
create_date
I have many people identified by name, and every few days they travel a certain distance. For example:
name distance create_date
john 15 09/12/2014
john 20 09/22/2014
alex 10 08/15/2014
alex 12 09/05/2014
john 8 09/30/2014
alex 30 09/12/2014
What I would like is a query that, for each person, returns the sum of distance between two dates and the create_date of the last entry during that date range, ordered by highest distance DESC. For example, given a date range of 08/01/2014 to 09/25/2014 I would expect this:
name distance create_date
alex 52 09/12/2014
john 35 09/22/2014
I thought of trying to do this with a SUM query with a sub query to get the newest date in the range but I think this is not efficient.
Does someone have an idea for this?
Thank you!
SELECT name,
SUM(distance) AS distance,
MAX(create_date) AS create_date
FROM Table
WHERE create_date >= '20140801' AND create_date < '20140925'
GROUP BY name
You can use simple sum and max functions for this.
SELECT name,
SUM(distance) AS distance,
MAX(create_date) AS create_date
FROM theTable
WHERE create_date >= @startDate AND create_date < @endDate
GROUP BY name
ORDER BY distance DESC
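To see why a plain SUM plus MAX is enough here, this is a small Python sketch of the same grouping on the question's sample rows: one pass that accumulates the total distance and keeps the latest date per name. The function name and the half-open date range (start inclusive, end exclusive) are choices made for this example.

```python
from datetime import date

# (name, distance, create_date) rows from the question.
TRIPS = [
    ("john", 15, date(2014, 9, 12)), ("john", 20, date(2014, 9, 22)),
    ("alex", 10, date(2014, 8, 15)), ("alex", 12, date(2014, 9, 5)),
    ("john", 8, date(2014, 9, 30)), ("alex", 30, date(2014, 9, 12)),
]


def totals(trips, start, end):
    """Per name: total distance and latest create_date with start <= create_date < end."""
    summary = {}
    for name, distance, created in trips:
        if start <= created < end:
            total, latest = summary.get(name, (0, created))
            summary[name] = (total + distance, max(latest, created))
    rows = [(name, total, latest) for name, (total, latest) in summary.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)  # highest distance first


# End date is exclusive, so 09/26 includes everything up to 09/25.
for row in totals(TRIPS, date(2014, 8, 1), date(2014, 9, 26)):
    print(row)
```

This gives alex with 52 and 2014-09-12 first, then john with 35 and 2014-09-22, matching the expected output in the question.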

SQL Pivot question

I'm having a hard time getting my head around a query I'm trying to build with SQL Server 2005.
I have a table, let's call it sales:
SaleId (int) (pk) EmployeeId (int) SaleDate(datetime)
I want to produce a report listing the total number of sales by an employee for each day in a given date range.
So, for example I want the see all sales in December 1st 2009 - December 31st 2009 with an output like:
EmployeeId Dec1 Dec2 Dec3 Dec4
1 10 10 1 20
2 25 10 2 2
..etc however the dates need to be flexible.
I've messed around with using PIVOT but can't quite seem to get it; any ideas welcome!
Here's a complete example. You can change the date range to fit your needs.
use sandbox;
create table sales (SaleId int primary key, EmployeeId int, SaleAmt float, SaleDate date);
insert into sales values (1,1,10,'2009-12-1');
insert into sales values (2,1,10,'2009-12-2');
insert into sales values (3,1,1,'2009-12-3');
insert into sales values (4,1,20,'2009-12-4');
insert into sales values (5,2,25,'2009-12-1');
insert into sales values (6,2,10,'2009-12-2');
insert into sales values (7,2,2,'2009-12-3');
insert into sales values (8,2,2,'2009-12-4');
SELECT * FROM
(SELECT EmployeeID, DATEPART(d, SaleDate) SaleDay, SaleAmt
FROM sales
WHERE SaleDate between '20091201' and '20091204'
) src
PIVOT (SUM(SaleAmt) FOR SaleDay
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])) AS pvt;
Results (actually 31 columns (for all possible month days) will be listed, but I'm just showing first 4):
EmployeeID 1 2 3 4
1 10 10 1 20
2 25 10 2 2
I tinkered a bit, and I think this is how you can do it with PIVOT:
select employeeid
, [2009/12/01] as Dec1
, [2009/12/02] as Dec2
, [2009/12/03] as Dec3
, [2009/12/04] as Dec4
from sales pivot (
count(saleid)
for saledate
in ([2009/12/01],[2009/12/02],[2009/12/03],[2009/12/04])
) as pvt
(this is my table:
CREATE TABLE [dbo].[sales](
[saleid] [int] NULL,
[employeeid] [int] NULL,
[saledate] [date] NULL
)
data is: 10 rows for '2009/12/01' for emp1, 25 rows for '2009/12/01' for emp2, 10 rows for '2009/12/02' for emp1, etc.)
Now, I must say, this is the first time I have used PIVOT and perhaps I am not grasping it, but this seems pretty useless to me. I mean, what good is it to have a crosstab if you cannot do anything to specify the columns dynamically?
EDIT: ok- dcp's answer does it. The trick is, you don't have to explicitly name the columns in the SELECT list, * will actually correctly expand to a column for the first 'unpivoted' column, and a dynamically generated column for each value that appears in the FOR..IN clause in the PIVOT construct.
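On the dynamic columns point: the column list in the FOR..IN clause still has to be written out (or generated as dynamic SQL). Just to illustrate what a fully dynamic crosstab does, here is a Python sketch that counts sales per employee per day and derives the day columns from the data. The sample rows and the function name are hypothetical, not taken from either answer.

```python
from collections import Counter
from datetime import date

# Hypothetical (sale_id, employee_id, sale_date) rows.
SALES = [
    (1, 1, date(2009, 12, 1)), (2, 1, date(2009, 12, 1)),
    (3, 1, date(2009, 12, 2)), (4, 2, date(2009, 12, 1)),
    (5, 2, date(2009, 12, 3)), (6, 2, date(2010, 1, 5)),
]


def pivot_counts(sales, start, end):
    """Count sales per employee per day; day columns come from the data in range."""
    counts = Counter(
        (emp, day) for _, emp, day in sales if start <= day <= end
    )
    days = sorted({day for _, day in counts})
    employees = sorted({emp for emp, _ in counts})
    header = ["EmployeeId"] + [day.isoformat() for day in days]
    # Counter returns 0 for employee/day pairs without sales.
    rows = [[emp] + [counts[(emp, day)] for day in days] for emp in employees]
    return header, rows


header, rows = pivot_counts(SALES, date(2009, 12, 1), date(2009, 12, 31))
print(header)  # ['EmployeeId', '2009-12-01', '2009-12-02', '2009-12-03']
for row in rows:
    print(row)  # [1, 2, 1, 0] then [2, 1, 0, 1]
```

Only days that actually have sales in the range become columns here; a day with no sales at all would need to be added to `days` explicitly.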

Resources