UNION & ORDER two tables inside Common Table Expression - sql-server

I have a CTE inside a SQL stored procedure that UNIONs values from two tables: the values are customer numbers and each customer's last order date.
Here is the original SQL -
;WITH CTE_last_order_date AS
(
SELECT c1.customer ,MAX(s2.dt_created) AS last_order_date
FROM customers c1 WITH (NOLOCK)
LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer
GROUP BY c1.customer
UNION ALL
SELECT c1.customer ,MAX(s1.dt_created) AS last_order_date
FROM customers c1 WITH (NOLOCK)
LEFT JOIN orders s1 WITH (NOLOCK)
ON c1.customer = s1.customer
GROUP BY c1.customer
)
Example Results:
customer, last_order_date
CF122595, 2011-11-15 15:30:22.000
CF122595, 2016-08-15 10:01:51.230
(2 row(s) affected)
This obviously doesn't apply the UNION distinct records rule because the date values are not matched, meaning SQL returned the max value from both tables (i.e. the final record set was not distinct)
To try and get around this, I tried another method borrowed from this question and implemented grouping:
;WITH CTE_last_order_date AS
(
SELECT max(last_order_date) as 'last_order_date', customer
FROM (
SELECT distinct c1.customer, max(s2.dt_created) AS last_order_date, '2' AS 'group'
FROM customers c1 WITH (NOLOCK)
LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer
GROUP BY c1.customer
UNION
SELECT distinct c1.customer, max(s1.dt_created) AS last_order_date, '1' AS 'group'
FROM customers c1 WITH (NOLOCK)
LEFT JOIN orders s1 WITH (NOLOCK)
ON c1.customer = s1.customer
GROUP BY
c1.customer
) AS t
GROUP BY customer
ORDER BY MIN('group'), customer
)
Example Results:
customer, last_order_date
CF122595, 2016-08-15 10:01:51.230
(1 row(s) affected)
This had the distinction (hah) of working fine, up until clattering into the rule that prevents ORDER BY inside Common Table Expressions, which is needed in order to pick the lowest group (which would imply Live orders (group 1), whose date needs to take precedence over the Archive (group 2)).
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
All help or ideas appreciated.

Rather than grouping, then unioning, then grouping again, why not union the orders tables and work from there:
SELECT c1.customer ,MAX(s2.dt_created) AS last_order_date
FROM customers c1
INNER JOIN (select customer, dt_created from archive_orders
union all select customer, dt_created from orders) s2
ON c1.customer = s2.customer
GROUP BY c1.customer
Remember, in SQL your job is to tell the system what you want, not what steps/procedure to follow to get those results. The above, logically, describes what we're wanting - we want the last order date from each customer's orders, and we don't care whether that was an archived order or a non-archived one.
Since we're going to reduce the order information down to a single row (per customer) during the GROUP BY behaviour anyway, we don't also need the UNION to remove duplicates so I've switched to UNION ALL.
(I confess, I couldn't really see what the ORDER BY was supposed to be adding to the mix at this point so I've not tried to include it here. If this is going into a CTE, then reflect on the fact that CTEs, just like tables and views, have no inherent order. The only ORDER BY clause that affects the ordering of result rows is the one applied to the outermost/final SELECT)
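To illustrate the point about ordering, here is a minimal sketch that wraps the query above in a CTE and applies ORDER BY only in the final SELECT. Table and column names are taken from the question; the CTE name and the choice of sort columns are just for illustration.

```sql
-- Sketch only: the CTE itself has no ORDER BY; ordering is applied by the outermost SELECT.
;WITH CTE_last_order_date AS
(
    SELECT c1.customer, MAX(s2.dt_created) AS last_order_date
    FROM customers c1
    INNER JOIN (SELECT customer, dt_created FROM archive_orders
                UNION ALL
                SELECT customer, dt_created FROM orders) s2
        ON c1.customer = s2.customer
    GROUP BY c1.customer
)
SELECT customer, last_order_date
FROM CTE_last_order_date
ORDER BY last_order_date DESC, customer;  -- sort columns chosen for illustration
```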
Giving orders precedence over archive_orders:
;With CTE1 as (
SELECT c1.customer,s2.[group],MAX(s2.dt_created) as MaxInGroup
FROM customers c1
INNER JOIN (select customer, dt_created,2 as [group] from archive_orders
union all select customer, dt_created,1 from orders) s2
ON c1.customer = s2.customer
GROUP BY c1.customer,s2.[group]
), CTE2 as (
SELECT *,ROW_NUMBER() OVER (PARTITION BY customer ORDER BY [group]) as rn
from CTE1
)
select * from CTE2 where rn = 1

An alternative approach could be to only get the customer from the archive table where we do not have a current one. Something like:
WITH CurrentLastOrders(customer, last_order_date) AS -- Get current last orders
(
SELECT o.customer, max(o.dt_created) AS last_order_date
FROM orders o WITH (NOLOCK)
GROUP BY o.customer
),
ArchiveLastOrders(customer, last_order_date) AS -- Get archived last orders where customer does not have a current order
(
SELECT o.customer, max(o.dt_created) AS last_order_date
FROM archive_orders o WITH (NOLOCK)
WHERE NOT EXISTS ( SELECT *
FROM CurrentLastOrders lo
WHERE o.customer = lo.customer)
GROUP BY o.customer
),
AllLastOrders(customer, last_order_date) AS -- All customers with orders
(
SELECT customer, last_order_date
FROM CurrentLastOrders
UNION ALL
SELECT customer, last_order_date
FROM ArchiveLastOrders
),
AllLastOrdersPlusCustomersWithNoOrders(customer, last_order_date) AS -- All customers, with latest order if they have one
(
SELECT customer, last_order_date
FROM AllLastOrders
UNION ALL
SELECT customer, null
FROM customers c WITH (NOLOCK)
WHERE NOT EXISTS ( SELECT *
FROM AllLastOrders lo
WHERE c.customer = lo.customer)
)
SELECT customer, last_order_date
FROM AllLastOrdersPlusCustomersWithNoOrders

I wouldn't try to nest SQL to achieve a distinct result set; it's the same logic of grouping by customer in both unioned queries.
If you want a distinct, ordered set, you can do that outside of the CTE.
How about:
;WITH CTE_last_order_date AS
(
SELECT c1.customer ,s2.dt_created AS last_order_date, '2' AS 'group'
FROM customers c1 WITH (NOLOCK)
LEFT JOIN archive_orders s2 WITH (NOLOCK) ON c1.customer = s2.customer
UNION ALL
SELECT c1.customer ,s1.dt_created AS last_order_date, '1' AS 'group'
FROM customers c1 WITH (NOLOCK)
LEFT JOIN orders s1 WITH (NOLOCK) ON c1.customer = s1.customer
)
SELECT customer, MAX(last_order_date) AS last_order_date
FROM CTE_last_order_date
GROUP BY customer
ORDER BY MIN([group]), customer

If you union all possible rows together, then calculate a row_number partitioned on customer and ordered on [group] then last_order_date descending, you can select the rows where rn = 1 to give the 'top 1' per customer:
;WITH CTE_last_order_date AS
(
SELECT c1.customer, s2.dt_created AS last_order_date, 2 AS [group]
FROM customers c1 WITH (NOLOCK)
LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer
UNION ALL
SELECT c1.customer, s1.dt_created AS last_order_date, 1 AS [group]
FROM customers c1 WITH (NOLOCK)
LEFT JOIN orders s1 WITH (NOLOCK)
ON c1.customer = s1.customer
)
, --row_number below is 'per customer' and can be used to make rn=1 the top 1 for each customer
--rows without a date (no matching order) are sorted last so they only win when nothing else exists
ROWN AS (SELECT customer, last_order_date, [group],
row_number() OVER(partition by customer
order by CASE WHEN last_order_date IS NULL THEN 1 ELSE 0 END, [group] ASC, last_order_date DESC) AS RN
FROM CTE_last_order_date)
SELECT * FROM ROWN WHERE RN = 1

Related

Return only rows from the joined table with the latest date

By running the following query I realized that I have duplicates on the column QueryExecutionId.
SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
wfi.workflowdefinitionid AS FlowId,
qe.publishing_date AS [Date],
c.typename AS [Type],
c.name As Name
INTO #Send
FROM
[QueryExecutions] qe
JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN [WorkflowInstances] wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;
E.g. when I test with one of these QueryExecutionIds, I get two results:
select * from #Send
where QueryExecutionId = 169237
We realized the reason is that these two rows have a different FlowId (second returned value in the first query). After discussing this issue, we decided to take the record with a FlowId that has the latest date. This date is a column called lastexecutiontime that sits in the third joined table [WorkflowInstances] which is also the table where FlowId comes from.
How do I get only unique values of QueryExecutionId, with the latest value of WorkflowInstances.lastexecutiontime, and remove the duplicates?
You can use a derived table with first_value partitioned by workflowinstanceid ordered by lastexecutiontime desc:
SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
wfi.FlowId,
qe.publishing_date AS [Date],
c.typename AS [Type],
c.name As Name
INTO #Send
FROM
[QueryExecutions] qe
JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN
(
SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
FROM [WorkflowInstances]
) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;
Please note that DISTINCT applies to the whole set of selected columns, e.g.
Data 1 (QueryExecutionId = 169237 and typename = test 1)
Data 2 (QueryExecutionId = 169237 and typename = test 2)
These two rows are considered distinct.
Try ROW_NUMBER() with PARTITION BY and select the rows where [Seq] = 1 (the code below partitions by QueryExecutionid and orders by date):
SELECT *
into #Send
FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY [QueryExecutionid] ORDER BY [Date] DESC) [Seq]
FROM
(
SELECT qe.QueryExecutionid AS QueryExecutionId,
wfi.FlowId,
qe.publishing_date AS [Date], --should not have any null values
qe.[customer_idhash],
c.typename AS [Type],
c.name As Name
FROM [QueryExecutions] qe
JOIN [Campaign] c
ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica
ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN
(
SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
FROM [WorkflowInstances]
) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
) a
WHERE [customer_idhash] IS NOT NULL
) b
WHERE [Seq] = 1
ORDER BY [QueryExecutionid]
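Since the question asks for the FlowId belonging to the WorkflowInstances row with the latest lastexecutiontime, another option is to rank directly on that column per QueryExecutionid. This is a minimal sketch, assuming the table and column names shown in the question; the alias names ranked and rn are made up for illustration.

```sql
-- Sketch only: one row per QueryExecutionid, keeping the workflow instance
-- with the most recent lastexecutiontime (NULLs sort last with DESC in SQL Server).
SELECT QueryExecutionId, FlowId, [Date], [Type], Name
INTO #Send
FROM
(
    SELECT qe.QueryExecutionid AS QueryExecutionId,
           wfi.workflowdefinitionid AS FlowId,
           qe.publishing_date AS [Date],
           c.typename AS [Type],
           c.name AS Name,
           ROW_NUMBER() OVER (PARTITION BY qe.QueryExecutionid
                              ORDER BY wfi.lastexecutiontime DESC) AS rn
    FROM [QueryExecutions] qe
    JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
    LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
    LEFT JOIN [WorkflowInstances] wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
    WHERE qe.[customer_idhash] IS NOT NULL
) ranked
WHERE rn = 1;
```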

SQL Simple Join with two tables, but one is random

I am stuck with this. I have a simple set-up with two tables. One table holds email addresses and the other holds voucher codes. I want to join them in a third table, so that each email address has one random voucher code.
Unfortunately there are no identical IDs to match the two tables on. What I have so far returns no results:
Select
A.Email,
B.CouponCode
FROM Emailaddresses as A
JOIN CouponCodes as B
on A.Email = B.CouponCode
A hint would be great as search did not bring me any further yet.
Edit -
Table A (Addresses)
-------------------
Column A | Column B
-------------------------
email1#gmail.com True
email2#gmail.com
email3#gmail.com True
email4#gmail.com
Table B (Voucher)
-------------------
ABCD1234
ABCD5678
ABCD9876
ABCD5432
Table C
-------------------------
column A | column B
-------------------------
email1#gmail.com ABCD1234
email2#gmail.com ABCD5678
email3#gmail.com ABCD9876
email4#gmail.com ABCD5432
Sample Data:
While joining without proper keys is not a good solution, for your case you can try this. (note: not tested, just a quick suggestion)
;with cte_email as (
select row_number() over (order by Email) as rownum, Email
from Emailaddresses
),
cte_coupon as (
select row_number() over (order by CouponCode) as rownum, CouponCode
from CouponCodes
)
select a.Email,b.CouponCode
from cte_email a
join cte_coupon b
on a.rownum = b.rownum
You want to randomly join records, one email with one coupon each. So create random row numbers and join on these:
select
e.email,
c.couponcode
from (select t.*, row_number() over (order by newid()) as rn from emailaddresses t) e
join (select t.*, row_number() over (order by newid()) as rn from CouponCodes t) c
on c.rn = e.rn;
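Because the question asks for the pairs to end up in a third table, the same query can feed an INSERT. This is only a sketch: EmailCoupons is a hypothetical target table name, and the source column names are assumed from the question's query.

```sql
-- Sketch only: EmailCoupons is a hypothetical target table, not part of the question.
INSERT INTO EmailCoupons (Email, CouponCode)
SELECT e.Email, c.CouponCode
FROM (SELECT t.Email, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn FROM Emailaddresses t) e
JOIN (SELECT t.CouponCode, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn FROM CouponCodes t) c
    ON c.rn = e.rn;
```

Note that with an inner join, any emails beyond the number of available coupon codes are left without a row.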
Give a row number for both the tables and join it with row number.
Query
;with cte as(
select [rn] = row_number() over(
order by [Column_A]
), *
from [Table_A]
),
cte2 as(
select [rn] = row_number() over(
order by [Column_A]
), *
from [Table_B]
)
select t1.[Column_A] as [Email_Id], t2.[Column_A] as [Coupon]
from cte t1
join cte2 t2
on t1.rn = t2.rn;

CASE Statement with condition

I think this will be easier to show an example first and then explain:
SELECT P.ID,
(CASE WHEN PC.NewCostPrice IS NULL
THEN P.Cost ELSE MAX(PC.Date) PC.NewCostPrice
END)
FROM price AS P
LEFT OUTER JOIN priceChange as PC
ON P.ID = PC.ID
So in the example, if the NewCostPrice IS NULL, meaning there wasn't a price change, then I want the normal cost (P.Cost). However, if there was a price change, I want the most recent (MAX(Date)) price change. I am not sure how to incorporate that into the CASE statement.
I feel like it can be done with a subquery and having clause but that didn't really work out when I tried. Any suggestions?
Thanks!
There are 2 approaches you might consider - I would test both to see which performs better for your situation.
1. Use ROW_NUMBER() in a subquery to find the most recent price change of all price changes, then join that to prices to get the correct price.
2. Use a correlated subquery (there are many ways to do this, either in the SELECT as in the other answer or with OUTER APPLY) to get only the most recent price change for each row of prices.
If your price table is very large and you are getting a large number of prices at once, method #1 will likely be better so the correlated subquery doesn't run for every single row of the result set.
If your final query pulls back a relatively small number of records instead of huge result sets for your server, then the correlated subquery could be better for you.
1. The ROW_NUMBER() approach
SELECT
P.ID,
COALESCE(PC.NewCostPrice, P.Cost) AS LatestPrice
FROM Price AS P
LEFT OUTER JOIN (
SELECT
ID,
ROW_NUMBER() OVER (PARTITION BY ID
ORDER BY [Date] DESC) AS RowId,
NewCostPrice
FROM PriceChange
) PC
ON P.ID = PC.ID
AND PC.RowId = 1 -- Only most recent
2a. Correlated subquery (SELECT)
SELECT
P.ID,
COALESCE((
SELECT TOP 1
NewCostPrice
FROM PriceChange PC
WHERE PC.ID = P.ID
ORDER BY PC.[Date] DESC
), P.Cost) AS LatestPrice
FROM Price AS P
2b. Correlated subquery with OUTER APPLY
SELECT
P.ID,
COALESCE(PC.NewCostPrice, P.Cost) AS LatestPrice
FROM Price AS P
OUTER APPLY (
SELECT TOP 1
NewCostPrice
FROM PriceChange PC
WHERE PC.ID = P.ID
ORDER BY PC.[Date] DESC
) PC
Whether you use 2a or 2b is more likely a preference in how you want to maintain the query going forward.
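To try these queries without touching real tables, a small self-contained sketch with table variables can help. The sample rows and column types below are made up for illustration; only the column names come from the question.

```sql
-- Hypothetical sample data; column types are assumptions.
DECLARE @Price TABLE (ID int PRIMARY KEY, Cost decimal(10,2));
DECLARE @PriceChange TABLE (ID int, [Date] date, NewCostPrice decimal(10,2));

INSERT INTO @Price (ID, Cost) VALUES (1, 10.00), (2, 20.00);
INSERT INTO @PriceChange (ID, [Date], NewCostPrice)
VALUES (1, '2016-01-01', 11.00),
       (1, '2016-06-01', 12.50);   -- ID 2 has no price change

-- Same shape as approach 2b, pointed at the table variables.
SELECT P.ID,
       COALESCE(PC.NewCostPrice, P.Cost) AS LatestPrice
FROM @Price AS P
OUTER APPLY (
    SELECT TOP 1 NewCostPrice
    FROM @PriceChange PC
    WHERE PC.ID = P.ID
    ORDER BY PC.[Date] DESC
) PC;
```

With this data, ID 1 should come back with its most recent change (12.50) and ID 2 should fall back to its normal cost (20.00).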
Easy way
SELECT distinct P.ID,
ISNULL((SELECT TOP 1 PC1.NewCostPrice FROM priceChange as PC1 WHERE PC1.ID = p.id ORDER BY PC1.Date DESC), p.cost)
FROM price AS P
Here I assume PC.ID is not a primary key, or it makes no sense to join with ID while there could be different price on the same item.
From your query I assume you just want to fetch the latest NewCostPrice sorted by Date, by joining priceChange
SELECT
P.ID,
CASE
WHEN PC.NewCostPrice IS NULL THEN P.Cost
ELSE PC.NewCostPrice
END AS NewPrice
FROM
price AS P
LEFT JOIN
(SELECT *, RANK() OVER (PARTITION BY ID ORDER BY [Date] DESC) as rk FROM priceChange) PC ON P.ID = PC.ID AND PC.rk = 1
SELECT P.ID
,(CASE
WHEN PC.NewCostPrice IS NULL
THEN P.Cost
ELSE (SELECT TOP 1 PC1.NewCostPrice
FROM priceChange PC1
WHERE PC1.ID = PC.ID
GROUP BY PC1.NewCostPrice, PC1.Date
ORDER BY PC1.Date DESC
)
END
)
FROM price AS P
LEFT OUTER JOIN priceChange as PC
ON P.ID = PC.ID

SQL Server 2014 Consolidate Tables avoiding duplicates

I have 36 Sales tables each referred to one store:
st1.dbo.Sales
st2.dbo.Sales
...
st35.dbo.Sales
st36.dbo.Sales
Each record has the following key columns:
UserName, PostalCode, Location, Country, InvoiceAmount, ItemsCount, StoreID
Here is SQLFiddle
I need to copy into the Customers table every UserName (and its details) that is not already present in Customers.
In case of duplicates, the fields must come from the record where InvoiceAmount is the MAX.
I tried to build a query, but it looks too complicated and it is also wrong, because the CROSS APPLY should consider the full list of Sales tables:
INSERT INTO Customers (.....)
SELECT distinct
d.UserName,
w.postalCode,
w.location,
W.country,
max(w.invoiceamount) invoiceamount,
max(w.itemscount) itemscount,
w.storeID
FROM
(SELECT * FROM st1.dbo.Sales
UNION
SELECT * FROM st2.dbo.Sales
UNION
...
SELECT * FROM st36.dbo.Sales) d
LEFT JOIN
G.dbo.Customers s ON d.Username = s.UserName
CROSS APPLY
(SELECT TOP (1) *
FROM s.dbo.[Sales]
WHERE d.Username=w.Username
ORDER BY InvoiceAmount DESC) w
WHERE
s.UserName IS NULL
AND d.username IS NOT NULL
GROUP BY
d.UserName, w.postalCode, w.location,
w.country, w.storeID
Can somebody please give some hints?
As a basic SQL query, I'd create a row_number in the inner subquery, join to Customers, and then isolate the row with the max invoice amount for each customer not already in the Customers table.
INSERT INTO Customers (.....)
SELECT w.UserName,
w.postalCode,
w.location,
w.country,
w.invoiceamount,
w.itemscount,
w.storeID
FROM (select d.*,
row_number() over(partition by d.Username order by d.invoiceamount desc) rownumber
from (SELECT *
FROM st1.dbo.Sales
UNION
SELECT *
FROM st2.dbo.Sales
UNION
...
SELECT *
FROM st36.dbo.Sales
) d
LEFT JOIN G.dbo.Customers s
ON d.Username = s.UserName
WHERE s.UserName IS NULL
AND d.username IS NOT NULL
) w
where w.rownumber = 1
Using your fiddle this will select distinct usernames rows with max invoiceamount
with d as(
SELECT * FROM Sales
UNION
SELECT * FROM Sales2
)
select *
from ( select *,
rn = row_number() over(partition by Username order by invoiceamount desc)
from d) dd
where rn=1;
Step 1: use a CTE that unions the Sales tables.
;with cte as (
select username, invoiceamount, itemscount from Sales
UNION all
select username, invoiceamount, itemscount from Sales2
.....
...
)
Step 2: in the next CTE, use GROUP BY to get the max invoiceamount and itemscount for each user from the previous result set.
,cte2 as (
select username, max(invoiceamount) as invoiceamount, max(itemscount) as itemscount from cte
group by username)
Step 3: use a LEFT JOIN with the Customers table to find the missing records, along with their itemscount and invoiceamount.
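Put together, the three steps might look like the sketch below. It uses only the two fiddle tables (Sales and Sales2); the Customers column list is an assumption, since the question elides it, and the remaining store tables would need to be added to the first CTE.

```sql
-- Sketch only: Sales and Sales2 come from the fiddle; the Customers column list is an assumption.
;WITH cte AS (
    SELECT UserName, InvoiceAmount, ItemsCount FROM Sales
    UNION ALL
    SELECT UserName, InvoiceAmount, ItemsCount FROM Sales2
),
cte2 AS (
    SELECT UserName,
           MAX(InvoiceAmount) AS InvoiceAmount,
           MAX(ItemsCount)    AS ItemsCount
    FROM cte
    WHERE UserName IS NOT NULL
    GROUP BY UserName
)
INSERT INTO Customers (UserName, InvoiceAmount, ItemsCount)
SELECT c2.UserName, c2.InvoiceAmount, c2.ItemsCount
FROM cte2 c2
LEFT JOIN Customers cu ON cu.UserName = c2.UserName
WHERE cu.UserName IS NULL;   -- only users not already in Customers
```

One design caveat: taking MAX of each column independently can combine values from different rows, whereas the row_number approaches keep all fields from the single row with the highest InvoiceAmount.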

create sql query to fetch repeat column values within time frame

Can someone help me with this query? I want to get all the customer_id values that appear more than once within 24 hours.
SELECT
O.Order_No, O.Customer_ID, O.DateOrdered, O.IPAddress,
C.FirstName, C.LastName, CD.nameoncard
FROM
Order_No O
INNER JOIN
CardData CD ON O.card_id = CD.id
INNER JOIN
Customers C ON O.customer_id = C.customer_id
ORDER BY
O.order_no desc
Adding more details:
Suppose an order with customer ID xx was placed on 04/23 at 2:30 pm, and a second order was placed with the same customer ID xx on the same day, 04/23, at 5:30 pm.
I want the query to return customer ID xx.
Thanks
select Customer_ID, CAST(DateOrdered as Date) DateOrdered, count(*) QTDE
from Order_No
group by Customer_ID, CAST(DateOrdered as Date)
having count(*) > 1
To get the customers who placed another order within 24 hours of their first one, you could use the following query:
select distinct A.Customer_ID
from Order_No A
inner join (select Customer_ID, min(DateOrdered) DateOrdered from Order_No group by Customer_ID ) B
on A.Customer_ID = B.Customer_ID
and A.DateOrdered - B.DateOrdered <= 1
and A.DateOrdered > B.DateOrdered
To get all customers who have, at any time, placed more than one order within a period of 24 hours or less:
select distinct A.Customer_ID
from Order_No A
inner join Order_No B
on A.Customer_ID = B.Customer_ID
and A.DateOrdered > B.DateOrdered
and A.DateOrdered - B.DateOrdered <= 1
Self-join:
SELECT distinct O.Customer_ID
FROM
Order_No O
inner join Order_No o2
on o.Customer_ID = o2.Customer_ID
and datediff(hour, o.DateOrdered, o2.DateOrdered) between 0 and 24
and o.Order_No <> o2.Order_No
This will return all customer_IDs that have ever placed more than one order in any 24 hour period.
Edited to add the join criteria that the matching records should not be the same record. Should return customers who placed two different orders at the same time, but not customers who placed only one order.
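One thing to be aware of: DATEDIFF(hour, ...) counts the hour boundaries crossed rather than elapsed time, so a result of 24 can cover a gap of up to just under 25 hours. If an exact 24-hour window matters, a variant using DATEADD is sketched below, assuming the same table and column names as the question.

```sql
-- Sketch only: exact 24-hour window using DATEADD instead of DATEDIFF.
SELECT DISTINCT o.Customer_ID
FROM Order_No o
INNER JOIN Order_No o2
    ON o.Customer_ID = o2.Customer_ID
   AND o.Order_No <> o2.Order_No                           -- not the same order
   AND o2.DateOrdered >= o.DateOrdered                     -- o2 placed at or after o
   AND o2.DateOrdered <= DATEADD(hour, 24, o.DateOrdered); -- and within 24 hours of it
```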
