I'm trying to see the difference between two periods for a column.
For example, we see that sales decreased at the end of the month, and we need to find which products were not sold at the end of the month.
I can write a SELECT to see the quantity for each product for each period:
SELECT product_id, count(product_id) AS Count
FROM testDB
WHERE
sales_date IS NOT NULL
AND
delivery_date BETWEEN '2021-02-01 00:00:03.0000000' AND '2021-02-14 23:56:00.0000000'
GROUP BY
product_id
and the same SELECT with another period:
delivery_date BETWEEN '2021-02-14 00:00:03.0000000' AND '2021-02-28 23:56:00.0000000'
So, after these queries I see a list of 10 products with quantities for the first period, and a list of 7 products with quantities for the second period. I can't get the difference between the result lists of the two SELECTs. I tried to use != and NOT IN, but without any results.
I will be very grateful for your help. Thanks
Sorry for the confusion. I meant the difference between the two selects:
The result of the first one (for the first period):
Product_ID  Count
grapes      100
lime        13
lemon       15
cherry      222
blueberry   123
banana      1
apple       123
watermelon  56
and the second one (for the second period):
Product_ID  Count
grapes      10
lime        1
lemon       10
cherry      2
blueberry   13
banana      12
and I want to see the difference between these selects:
Product_ID  Count
apple       0
watermelon  0
So we did not sell any apples or watermelons in the second period.
SELECT p1.product_id, 0 AS [Count]
FROM (
    SELECT product_id FROM testDB
    WHERE sales_date IS NOT NULL
      AND delivery_date BETWEEN '2021-02-01 00:00:03.0000000' AND '2021-02-14 23:56:00.0000000'
    GROUP BY product_id
) p1
LEFT JOIN (
    SELECT product_id FROM testDB
    WHERE sales_date IS NOT NULL
      AND delivery_date BETWEEN '2021-02-14 00:00:03.0000000' AND '2021-02-28 23:56:00.0000000'
    GROUP BY product_id
) p2 ON p2.product_id = p1.product_id
WHERE p2.product_id IS NULL
This aggregates each period separately, then keeps the products from the first period that have no match in the second, which is the difference between the two result lists.
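Alternatively, if you only need the list of missing product IDs (without the zero Count column), a set-difference sketch, assuming your database supports EXCEPT (SQL Server does):
SELECT product_id FROM testDB
WHERE sales_date IS NOT NULL
  AND delivery_date BETWEEN '2021-02-01 00:00:03.0000000' AND '2021-02-14 23:56:00.0000000'
EXCEPT
SELECT product_id FROM testDB
WHERE sales_date IS NOT NULL
  AND delivery_date BETWEEN '2021-02-14 00:00:03.0000000' AND '2021-02-28 23:56:00.0000000'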
Using MS SQL Server 2019
I have a set of recurring donation records. Each has a FirstGiftDate and a LastGiftDate associated with it. I need to add a GroupedID to these rows so that I can get the full date range from the earliest FirstGiftDate to the latest LastGiftDate, as long as there is not a break of more than 45 days between the recurring donations.
For example, Bob is a long-time supporter. His card has expired multiple times, and he has always started new gifts within 45 days, so all of his gifts need to be given a single GroupedID. On the opposite side, June has been donating and her card expires. She doesn't give again for 6 months, but then continues to give after her card expires. June's first gift should get its own GroupedID, and the second and third should be grouped together. The grouping count should restart with each donor.
My initial attempt was to join the donation table back to itself, aliased as D2. This did work to give me an indicator of which records were within the 45-day mark, but I can't wrap my head around how to then link them. My only thought was to use LEAD and LAG to analyze each scenario and figure out the different combinations of LEAD and LAG values needed to catch each case, but that doesn't seem as reliable or scalable as I'd like.
I appreciate any help anyone can give.
My code:
SELECT #Donation.*, D2.*
FROM #Donation
LEFT JOIN #Donation D2 ON #Donation.RecurringGiftID <> D2.RecurringGiftID
AND #Donation.Donor = D2.Donor
AND ABS(DATEDIFF(DAY, #Donation.FirstGiftDate, D2.LastGiftDate)) < 45
Table structure and sample data:
CREATE TABLE #Donation
(
RecurringGiftID int,
Donor nvarchar(25),
FirstGiftDate date,
LastGiftDate date
)
INSERT INTO #Donation
VALUES (1, 'Bob', '2017-02-15', '2018-07-01'),
(15, 'Bob', '2018-08-05', '2019-04-01'),
(32, 'Bob', '2019-04-15', '2022-06-15'),
(54, 'June', '2015-05-01', '2016-05-01'),
(96, 'June', '2016-12-15', '2018-02-01'),
(120, 'June', '2018-03-04', '2020-07-01')
Desired output:
RecurringGiftId  Donor  FirstGiftDate  LastGiftDate  GroupedID
1                Bob    2017-02-15     2018-07-01    1
15               Bob    2018-08-05     2019-04-01    1
32               Bob    2019-04-15     2022-06-15    1
54               June   2015-05-01     2016-05-01    1
96               June   2016-12-15     2018-02-01    2
120              June   2018-03-04     2020-07-01    2
Use LAG() to detect when the current row's FirstGiftDate falls more than 45 days after the previous row's LastGiftDate, then take a cumulative sum of those flags to form the required GroupedID:
select *,
       -- the running total of the flags assigns each island of gifts its GroupedID
       GroupedID = sum(g) over (partition by Donor order by FirstGiftDate)
from
(
    select *,
           -- flag a row with 1 when it starts more than 45 days after the previous
           -- gift ended; the '19000101' default makes each donor's first row a 1
           g = case when datediff(day,
                                  lag(LastGiftDate, 1, '19000101') over (partition by Donor
                                                                         order by FirstGiftDate),
                                  FirstGiftDate)
                         > 45
                    then 1
                    else 0
               end
    from #Donation
) d
The goal is to rank the Movies table according to the quantity in the Inventory table such that for each tied value the subsequent rank is skipped, so that the next non-tied value remains in its rightful position. Display MovieID, LatestTitle, Price, and the Rank.
Note that MovieId '1' from the Movies table corresponds to MovieId '101' of the Movie Inventory table, and so on.
These are the tables:
Movies
MovieId  LatestTitle               Price
1        Breaking Dawn             200.00
2        The Proposal              185.00
3        Iron Man 2                180.00
4        Up                        180.00
5        The Karate Kid            190.00
6        How to train your Dragon  190.00
7        Spiderman 3               195.00
Movie Inventory
MovieId
Quantity
101
3
105
4
107
5
108
7
110
8
111
4
And this is my attempt at the code, which is returning a lot of NULLs:
SELECT CASE
WHEN Movies.MovieId + 100 = MovieInventory.MovieID
THEN CAST(MovieInventory.MovieID AS INT)
END AS 'MovieId',
Movies.LatestTitle, Movies.Price,
DENSE_RANK() OVER (ORDER BY Movies.MovieId DESC) AS [Rank]
FROM Movies, MovieInventory WHERE MovieInventory.MovieID IS NOT NULL
GO
This is what you need.
Notes:
- You need RANK, not DENSE_RANK, to achieve the result you want.
- You need to order by Quantity, not MovieId.
- Use proper JOIN syntax, not comma (,) joins.
- Use table aliases for better readability.
- The foreign and primary key relationships are odd: mi.MovieID appears to be varchar, yet when converted to int it is 100 more than m.MovieID.
- A calculation done in the SELECT list is not accessible to the JOIN conditions.
- Don't use single quotes ('') to quote column names.
SELECT
    mi.MovieId,
    m.LatestTitle,
    m.Price,
    RANK() OVER (ORDER BY mi.Quantity DESC) AS [Rank]
FROM Movies m
-- TRY_CAST returns NULL for any non-numeric MovieID, and NULL never satisfies the join
JOIN MovieInventory mi ON TRY_CAST(mi.MovieID AS int) = m.MovieID + 100;
I need to write a DAX statement which is somewhat complex from a conceptual/logical standpoint, so this might be hard to explain.
I have two tables.
On the first table (shown below) I have a list of numeric values (Wages). For each value I have a corresponding date range. I also have EmployeeID and FunctionID. The purpose of this table is to keep track of the hourly Wages paid to employees performing specific functions during specific date ranges. Each Function has its own Wage on the Wage table, BUT each employee might get paid a different Wage for the same Function (there are also dimension tables for functions and employees).
'Wages'
Wage StartDate EndDate EmployeeID FunctionID
20 1/1/2016 1/30/2016 3456 20
15 1/15/2016 2/12/2016 3456 22
27.5 1/20/2016 2/20/2016 7890 20
20 1/21/2016 2/10/2016 1234 19
On 'Table 2' I have a record for every day that an Employee worked a certain Function. Remember, Table 1 contains the Wage information for every function.
'Table 2'
Date EmployeeID FunctionID DailyWage
1/1/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/2/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/3/2016 1234 $22 see below
1/4/2016 1234 $22
1/1/2016 4567 $27
1/2/2016 4567 $27
1/3/2016 4567 $27
(Note that wages can change over time)
What I'm trying to do is create a Calculated Column on 'Table 2' called 'DailyWage'. I want every row on 'Table 2' to tell me how much the EmployeeID was paid for the full day (assuming an 8 hour workday).
I'm really struggling with the logic steps, so I'm not sure what the best way to do this calculation is...
To make things worse, an EmployeeID might get paid a different Wage for the same Function on a different Date. They might start out at one wage working Function X, and then their wage generally goes up a few months later... That means that if I try to concatenate the EmployeeID and the FunctionID, I won't be able to connect the tables on the concatenated value, because neither table will have unique values.
So in other words, if we CONCATENATE the EmployeeID and FunctionID into EmpFunID, we need to take the EmpFunID plus the date for the current row and say: "take the EmpFunID in the current row, plus the date for the current row, and return the value from the Wage column on the Wages table that has the same EmpFunID AND a StartDate less than the CurrentRowDate AND an EndDate greater than the CurrentRowDate".
HERE IS WHAT I HAVE SO FAR:
Step 1 = Filter 'Wages' table so that StartDate < CurrentRowDate
Step 2 = Filter 'Wages' table so that EndDate > CurrentRowDate
Step 3 = LOOKUPVALUE( 'Wages'[Wage], 'Wages'[EmpFunID], Table2[EmpFunID])
Now I just need that converted into a DAX function.
Not sure if I got it totally right, but maybe something similar? If you put this into Table2 as a calculated column, the current row of Table2 supplies the Date, EmployeeID, and FunctionID used in the filter.
So SUMX sums over a filtered version of the Wages table: the filter keeps only the Wages rows whose StartDate/EndDate range covers the current row's Date and whose EmployeeID and FunctionID match, so for each row in Table2 only the wages that belong to that row are summed. For example, a row whose Date falls between 1/1/2016 and 1/30/2016 for EmployeeID 3456 and FunctionID 20 would return 20, the Wage from the first row of the Wages table.
DailyWage =
SUMX(
    FILTER(
        'Wages',
        -- the wage row must cover the current row's date and match its employee and function
        'Wages'[StartDate] <= 'Table2'[Date]
            && 'Wages'[EndDate] >= 'Table2'[Date]
            && 'Wages'[EmployeeID] = 'Table2'[EmployeeID]
            && 'Wages'[FunctionID] = 'Table2'[FunctionID]
    ),
    'Wages'[Wage]
)
This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, which is why the row-number-based partitioning doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions made every day, which accommodated the ROW_NUMBER()-based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" by "BETWEEN [DATE - 2] and [DATE]"
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
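A minimal sketch of that calendar-table approach, assuming a helper table dbo.Calendar(CalendarDate) that covers the full date range (the table and column names are illustrative):
select customer_id, Date, Amount, Rolling_3_day_sum
from (
    -- build one row per customer per calendar day, so a fixed 3-row window
    -- spans exactly 3 calendar days
    select t.customer_id, t.Date, t.Amount,
           Rolling_3_day_sum = sum(coalesce(t.Amount, 0)) over (
               partition by d.customer_id
               order by d.CalendarDate
               rows between 2 preceding and current row)
    from (select c.CalendarDate, cu.customer_id
          from dbo.Calendar c
          cross join (select distinct customer_id from #tmp_taml9) cu) d
    left join #tmp_taml9 t
        on t.customer_id = d.customer_id and t.Date = d.CalendarDate
) x
where Date is not null   -- keep only the original transaction rows
order by customer_id, Date;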
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
    select sum(amount) Rolling_3_day_sum
    from #tmp_taml9
    where Customer_ID = t1.Customer_ID
    -- current day plus the two preceding calendar days = a 3-day window
    and datediff(day, date, t1.date) <= 2
    and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
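If performance is a concern, a covering index on the lookup columns should help the correlated sum; a sketch (the index name is illustrative):
create index IX_tmp_taml9_customer_date
    on #tmp_taml9 (Customer_ID, Date)
    include (Amount);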
I want to find the rows which are similar to each other, and replace them with a new row. My table looks like this:
OrderID | Price | Minimum Number | Maximum Number | Volume
1 45 2 10 250
2 46 2 10 250
3 60 2 10 250
"Similar" in this context means that the rows that have same Maximum Number, Minimum Number, and Volume. Prices can be different, but the difference can be at most 2.
In this example, orders with OrderID of 1 and 2 are similar, but 3 is not (since even if it has same Minimum Number, Maximum Number, and Volume, its price is not within 2 units from orders 1 and 2).
Then, I want orders 1 and 2 to be replaced by a new order, say OrderID 4, which has the same Minimum Number and Maximum Number. Its Volume has to be the sum of the volumes of the orders it replaces. Its Price can be the Price of any of the orders being replaced (45 or 46 in this example). So, the output for the example above would be:
OrderID | Price | Minimum Number | Maximum Number | Volume
4 45 2 10 500
3 60 2 10 250
Here is a way to do this in SQL Server 2012 or Oracle. The idea is to use lag() to find where groups should begin and end and then aggregate.
select min(OrderID) as OrderID, min(Price) as Price,
       MinimumNumber, MaximumNumber, sum(Volume) as Volume
from (select t.*,
             -- start a new group whenever the price gap to the previous row exceeds 2
             sum(case when prev_price < Price - 2 then 1 else 0 end) over
                 (partition by MinimumNumber, MaximumNumber, Volume order by Price) as grp
      from (select t.*,
                   lag(Price) over (partition by MinimumNumber, MaximumNumber, Volume
                                    order by Price) as prev_price
            from Orders t   -- assuming the orders table is named Orders
           ) t
     ) t
group by grp, MinimumNumber, MaximumNumber, Volume;
The only remaining issue is how to set the id; I'm not sure what the exact rule for that should be.
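If the merged rows need genuinely new IDs (OrderID 4 in the example), one option, sketched here under the same assumption that the table is named Orders, is to offset a row number by the table's current maximum id, while singleton groups keep their own id:
select case when count(*) = 1 then min(OrderID)
            else (select max(OrderID) from Orders)
                 + row_number() over (partition by case when count(*) > 1 then 1 else 0 end
                                      order by min(OrderID))
       end as OrderID,
       min(Price) as Price, MinimumNumber, MaximumNumber, sum(Volume) as Volume
from (select t.*,
             sum(case when prev_price < Price - 2 then 1 else 0 end) over
                 (partition by MinimumNumber, MaximumNumber, Volume order by Price) as grp
      from (select t.*,
                   lag(Price) over (partition by MinimumNumber, MaximumNumber, Volume
                                    order by Price) as prev_price
            from Orders t
           ) t
     ) t
group by grp, MinimumNumber, MaximumNumber, Volume;
For the sample data this keeps OrderID 3 for the lone 60-priced order and assigns OrderID 4 to the merged pair, matching the desired output.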