How to efficiently match on dates in SQL Server? - sql-server

I am trying to return the first registration for a person based on the minimum registration date and then return full information. The data looks something like this:
Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations
12345 | 1 | Max | Smith | League | Kid Hockey | 2017-06-06 | $100 | 3
12345 | 6 | Max | Smith | Activity | Figure Skating | 2018-09-26 | $35 | 1
The end goal is to return one row per person that looks like this:
Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations
12345 | 1 | Max | Smith | League | Kid Hockey | 2017-06-06 | $135 | 4
So, this would aggregate the totalPaid and totalRegistrations variables based on the Warehouse_ID and would pull the rest of the information based on the min(firstProgramCreatedDate) specific to the Warehouse_ID.
This will end up in Tableau, so what I've recently tried ignores aggregating totalPaid and totalRegistrations for now (I can get that in another query pretty easily). The query I'm using seems to work, but it is extremely slow; it appears to be going row by row through more than 50,000 rows.
select M.*
from (
select Warehouse_ID, min(FirstProgramCreatedDate) First
from vw_FirstRegistration
group by Warehouse_ID
) B
left join vw_FirstRegistration M on B.Warehouse_ID = M.Warehouse_ID
where B.First in (M.FirstProgramCreatedDate)
order by B.Warehouse_ID
Any advice on how I can achieve my goal without this query taking an hour plus to run?

A combination of the ROW_NUMBER windowing function, plus the OVER clause on a SUM expression should perform pretty well.
Here's the query:
SELECT TOP (1) WITH TIES
v.Warehouse_ID
,v.SourceID
,v.firstName
,v.lastName
,v.firstProgramSource
,v.firstProgramName
,v.firstProgramCreatedDate
,SUM(v.totalPaid) OVER (PARTITION BY v.Warehouse_ID) AS totalPaid
,SUM(v.totalRegistrations) OVER (PARTITION BY v.Warehouse_ID) AS totalRegistrations
FROM
#vw_FirstRegistration AS v
ORDER BY
ROW_NUMBER() OVER (PARTITION BY v.Warehouse_ID
ORDER BY CASE WHEN v.firstProgramCreatedDate IS NULL THEN 1 ELSE 0 END,
v.firstProgramCreatedDate)
And here's a Rextester demo: https://rextester.com/GNOB14793
Results (I added another kid...):
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| 12345 | 1 | Max | Smith | League | Kid Hockey | 2017-06-06 | 135.00 | 4 |
| 12346 | 6 | Joe | Jones | Activity | Other Activity | 2017-09-26 | 125.00 | 4 |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
EDIT: Changed the ORDER BY based on comments.

Try using ROW_NUMBER() with PARTITION BY.
For more information please refer to:
https://learn.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017
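As a rough sketch of that suggestion (not tested against the real view): number the rows per Warehouse_ID by date in a CTE, then keep only row 1. The table variable @FirstRegistration is a hypothetical stand-in for vw_FirstRegistration, filled with the two sample rows from the question, and the column types are assumptions.

```sql
-- Hypothetical stand-in for vw_FirstRegistration; column types are assumed.
DECLARE @FirstRegistration TABLE (
    Warehouse_ID            int,
    SourceID                int,
    firstName               varchar(50),
    lastName                varchar(50),
    firstProgramSource      varchar(50),
    firstProgramName        varchar(50),
    firstProgramCreatedDate date,
    totalPaid               decimal(10,2),
    totalRegistrations      int
);

-- Sample rows copied from the question.
INSERT INTO @FirstRegistration VALUES
    (12345, 1, 'Max', 'Smith', 'League',   'Kid Hockey',     '2017-06-06', 100, 3),
    (12345, 6, 'Max', 'Smith', 'Activity', 'Figure Skating', '2018-09-26',  35, 1);

WITH ranked AS (
    SELECT
        v.*,
        -- rn = 1 marks the earliest dated registration per person;
        -- the CASE pushes NULL dates to the end of each partition.
        ROW_NUMBER() OVER (
            PARTITION BY v.Warehouse_ID
            ORDER BY CASE WHEN v.firstProgramCreatedDate IS NULL THEN 1 ELSE 0 END,
                     v.firstProgramCreatedDate
        ) AS rn
    FROM @FirstRegistration AS v
)
SELECT Warehouse_ID, SourceID, firstName, lastName,
       firstProgramSource, firstProgramName, firstProgramCreatedDate
FROM ranked
WHERE rn = 1
ORDER BY Warehouse_ID;
```

For the two sample rows this keeps the Kid Hockey row (SourceID 1). Each row is numbered in a single pass over the data, so the view does not need to be joined back to itself.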

Related

SQL Server find sum of values based on criteria within another table

I have a table consisting of ID, Year, Value
---------------------------------------
| ID | Year | Value |
---------------------------------------
| 1 | 2006 | 100 |
| 1 | 2007 | 200 |
| 1 | 2008 | 150 |
| 1 | 2009 | 250 |
| 2 | 2005 | 50 |
| 2 | 2006 | 75 |
| 2 | 2007 | 65 |
---------------------------------------
I then create a derived, aggregated table consisting of an ID, MinYear, and MaxYear
---------------------------------------
| ID | MinYear | MaxYear |
---------------------------------------
| 1 | 2006 | 2009 |
| 2 | 2005 | 2007 |
---------------------------------------
I then want to find the sum of Values between the MinYear and MaxYear for each ID in the aggregated table, but I am having trouble determining a proper query.
The final table should look something like this
----------------------------------------------------
| ID | MinYear | MaxYear | SumVal |
----------------------------------------------------
| 1 | 2006 | 2009 | 700 |
| 2 | 2005 | 2007 | 190 |
----------------------------------------------------
Right now I can perform all the joins to create the second table. But then I use a fast forward cursor to iterate through each record of the second table with the code inside the for loop looking like the following
DECLARE @curMin int
DECLARE @curMax int
DECLARE @curID int
FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT SUM(Value) FROM ValTable WHERE Year >= @curMin AND Year <= @curMax AND ID = @curID
GROUP BY ID
FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
END
Having found the sum of values between the specified years, I can join it back to the second table and wind up with the desired result (the third table).
However, the second table in reality is roughly 4 million rows, so this iteration is extremely time consuming (roughly 300 results a minute) and presumably not the best solution.
My question is, is there a way to generate the third table's results without having to use a cursor/for loop?
With a GROUP BY, the sum is computed per ID. Since the min year and max year come from the same rows for that ID, you don't need a second query. The query below should give you exactly what you need. If you have a different requirement, let me know.
SELECT ID, MIN(YEAR) as MinYear, MAX(YEAR) as MaxYear, SUM(VALUE) as SUMVALUE
FROM tablenameyoudidnotsay
GROUP BY ID
You could use a query like the one below, where TableA is your first table and TableB is the second one:
SELECT *,
(select SUM(Value) FROM TableA where tablea.ID=TableB.ID AND tableA.Year BETWEEN
TableB.MinYear AND TableB.MaxYear) AS SumValue
from TableB
You can put your criteria into a join and obtain the result all as one set which should be faster:
SELECT b.Id, b.MinYear, b.MaxYear, sum(a.Value)
FROM Table2 b
JOIN Table1 a ON a.Id=b.Id AND b.MinYear <= a.Year AND b.MaxYear >= a.Year
GROUP BY b.Id, b.MinYear, b.MaxYear

Do dates of service fall in between membership date range

I have two tables. One is the customer_service table with dates of service, and the other is the membership table, where a member can exist multiple times if they have had lapses between their membership effective and expiration dates. Below is a basic example of how these tables might be laid out.
How might I find dates of service that fall outside or in between membership date ranges? A simple join will not work here because a member may have multiple date ranges for their membership under the same ID. Would this require some form of iteration? I am unsure of the best way to approach this kind of issue.
Customer_Service Table
id | customers | Dos
-------------------------
1 | Rodney | 01/18/2018
2 | Jim | 02/15/2018
3 | Tom | 01/01/2018
1 | Rodney | 02/15/2018
3 | Tom | 03/01/2018
Membership Table
id | Effective_date | End_date
-------------------------
1 | 01/01/2017 | 12/31/2017
1 | 02/15/2018 | 05/20/2018
2 | 06/20/2016 | 01/25/2018
2 | 02/25/2018 | 12/31/2099
3 | 01/01/2018 | 06/01/2018
A simple approach is below. The query will identify rows in CUSTOMER_SERVICE where DOS does not fall between any periods in the membership table for that customer.
SELECT * FROM CUSTOMER_SERVICE CS
WHERE NOT EXISTS (
SELECT * FROM MEMBERSHIP M
WHERE CS.ID = M.ID
AND DOS BETWEEN EFFECTIVE_DATE AND END_DATE
)
Or alternatively:
SELECT CS.* FROM CUSTOMER_SERVICE CS
LEFT JOIN MEMBERSHIP M ON M.ID = CS.ID
AND DOS BETWEEN EFFECTIVE_DATE AND END_DATE
WHERE M.ID IS NULL
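To check the NOT EXISTS version against the sample data from the question, here is a small self-contained sketch. The table variables @Customer_Service and @Membership are stand-ins for the real tables, and the column types are assumptions.

```sql
-- Stand-ins for the real tables; column types are assumed.
DECLARE @Customer_Service TABLE (id int, customers varchar(50), Dos date);
DECLARE @Membership TABLE (id int, Effective_date date, End_date date);

-- Sample rows copied from the question.
INSERT INTO @Customer_Service VALUES
    (1, 'Rodney', '2018-01-18'),
    (2, 'Jim',    '2018-02-15'),
    (3, 'Tom',    '2018-01-01'),
    (1, 'Rodney', '2018-02-15'),
    (3, 'Tom',    '2018-03-01');

INSERT INTO @Membership VALUES
    (1, '2017-01-01', '2017-12-31'),
    (1, '2018-02-15', '2018-05-20'),
    (2, '2016-06-20', '2018-01-25'),
    (2, '2018-02-25', '2099-12-31'),
    (3, '2018-01-01', '2018-06-01');

-- Services whose date is not covered by any membership period of that customer.
SELECT cs.id, cs.customers, cs.Dos
FROM @Customer_Service AS cs
WHERE NOT EXISTS (
    SELECT 1
    FROM @Membership AS m
    WHERE m.id = cs.id
      AND cs.Dos BETWEEN m.Effective_date AND m.End_date
)
ORDER BY cs.id;
-- With the sample rows this returns two rows:
-- (1, Rodney, 2018-01-18) and (2, Jim, 2018-02-15).
```

BETWEEN is inclusive on both ends, so a service on the first or last day of a membership period counts as covered.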

SQL - How can I get the number of duplicates in the non-aggregated result?

Suppose I have a table tb such that
select * from tb
returns
ID | City | Country
1 | New York | US
2 | Chicago | US
3 | Boston | US
4 | Beijing | China
5 | Shanghai | China
6 | London | UK
What is the easiest way to write a query that can return the following result?
ID | City | Country | Count
1 | New York | US | 3
2 | Chicago | US | 3
3 | Boston | US | 3
4 | Beijing | China | 2
5 | Shanghai | China | 2
6 | London | UK | 1
The only solution I can think of is
with cte as (select country, count(1) as Count from tb group by country)
select tb.*, cte.Count from tb join cte on tb.Country = cte.Country
But I feel that is not succinct enough. I am wondering if there is anything like Duplicate_Number() over (partition by country) to do this.
Try this:
select *
,COUNT(*) OVER (PARTITION BY Country) AS [Count]
from tb
The OVER clause
Determines the partitioning and ordering of a rowset before the
associated window function is applied.
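So we are basically telling SQL Server to COUNT the records, but within each Country partition rather than over the whole result, and without collapsing the rows the way GROUP BY would.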
Another approach to achieve the result:
select t1.*, t2.Country_Count from tb t1
join
(select country, count(country) Country_Count from tb group by country) t2
on t1.country=t2.country
order by t1.id

sum column with duplicates in another table

So I have two tables, Order and Staging.
The Order table has this column structure:
+-------+---------+-------------+---------------+----------+
| PO | cashAmt | ClaimNumber | TransactionID | Supplier |
+-------+---------+-------------+---------------+----------+
| 12345 | 100 | 99876 | abc123 | 0101 |
| 12346 | 50 | 99875 | abc123 | 0102 |
| 12345 | 100 | 99876 | abc123 | 0101 |
+-------+---------+-------------+---------------+----------+
The Staging table has this column structure:
+----------+------------+-------------+---------------+
| PONumber | paymentAmt | ClaimNumber | TransactionID |
+----------+------------+-------------+---------------+
| 12345 | 100 | 99876 | abc123 |
| 12346 | 50 | 99875 | abc123 |
+----------+------------+-------------+---------------+
The query I am executing is:
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM [order] with (nolock)
WHERE TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging with (nolock)
where TransactionID='abc123'
but the sum comes out wrong because there is a duplicate row in one of the tables.
How can I change the query so that only unique rows from the Order table are counted and the sums are correct?
First ask yourself why are there duplicates in the Orders table? There must be a reason why they are there. I would deal with that first.
That issue aside, if the duplicates in the Orders table have a purpose and yet are not to be considered for this particular query, then you should be able to leave out the duplicates by simply changing the query to use DISTINCT on whatever field in the Orders table can reliably identify a duplicate.
select Distinct fieldname sum(cashAmt)... etc.
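A sketch of one way to apply the DISTINCT idea: remove duplicate rows in a derived table first, then aggregate. This assumes that a row identical in every column is a true duplicate, which may not hold for your data. The table variable @Order is a hypothetical stand-in for the [order] table, with assumed column types.

```sql
-- Hypothetical stand-in for [order]; column types are assumed.
DECLARE @Order TABLE (
    PO            int,
    cashAmt       numeric(10,2),
    ClaimNumber   int,
    TransactionID varchar(20),
    Supplier      varchar(10)
);

-- Sample rows copied from the question (note the repeated first row).
INSERT INTO @Order VALUES
    (12345, 100, 99876, 'abc123', '0101'),
    (12346,  50, 99875, 'abc123', '0102'),
    (12345, 100, 99876, 'abc123', '0101');

-- De-duplicate first, then aggregate over the distinct rows.
SELECT SUM(d.cashAmt)       AS CheckAmount,
       COUNT(d.ClaimNumber) AS TotalLines
FROM (
    SELECT DISTINCT PO, cashAmt, ClaimNumber, TransactionID, Supplier
    FROM @Order
    WHERE TransactionID = 'abc123'
) AS d;
-- For the sample rows: CheckAmount = 150.00, TotalLines = 2.
```

Putting DISTINCT in the outer SELECT next to SUM would not help, because the sum is computed before the duplicates are removed.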
Assuming duplicates in your table are OK.
Not sure why you are using NOLOCK; it seems like it shouldn't be included.
You could use a table variable to store the distinct values. You'll need to adjust the data types in the table variable to match your table structure.
I haven't tested the code below but it should look something like this.
DECLARE @OrderTmp TABLE (
cashAmt numeric(10,2)
, ClaimNumber int
, TransactionID varchar(20) -- assumed type; 'abc123' cannot be stored in an int
)
INSERT INTO @OrderTmp
select Distinct
cashAmt
,ClaimNumber
,TransactionID
FROM
[order]
WHERE TransactionID='abc123'
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM @OrderTmp
where TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging
where TransactionID='abc123'

Finding duplicate in SQL Server Table

I have a table
+--------+--------+--------+--------+--------+
| Market | Sales1 | Sales2 | Sales3 | Sales4 |
+--------+--------+--------+--------+--------+
| 68 | 1 | 2 | 3 | 4 |
| 630 | 5 | 3 | 7 | 8 |
| 190 | 9 | 10 | 11 | 12 |
+--------+--------+--------+--------+--------+
I want to find duplicates between all the above sales fields. In the above example, markets 68 and 630 have a duplicate Sales value, which is 3.
My problem is displaying the Market having duplicate sales.
This problem would be incredibly simple to solve if you normalised your table.
Then you would just have the columns Market | Sales, or if the 1, 2, 3, 4 are important you could have Market | Quarter | Sales (or some other relevant column name).
Given that your table isn't in this format, you could use a CTE to make it so and then select from it, e.g.
WITH cte AS (
SELECT Market, Sales1 AS Sales FROM MarketSales
UNION ALL
SELECT Market, Sales2 FROM MarketSales
UNION ALL
SELECT Market, Sales3 FROM MarketSales
UNION ALL
SELECT Market, Sales4 FROM MarketSales
)
SELECT a.Market
,b.Market
FROM cte a
INNER JOIN cte b ON b.Market > a.Market
WHERE a.Sales = b.Sales
You can easily do this without the CTE; you just need a big WHERE clause comparing all the combinations of Sales columns.
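A sketch of what that CTE-free version might look like, using IN lists to keep the WHERE clause manageable. The table variable @MarketSales is a stand-in for the real table, loaded with the sample rows from the question.

```sql
-- Stand-in for the MarketSales table, with the sample rows from the question.
DECLARE @MarketSales TABLE (Market int, Sales1 int, Sales2 int, Sales3 int, Sales4 int);

INSERT INTO @MarketSales VALUES
    (68,  1,  2,  3,  4),
    (630, 5,  3,  7,  8),
    (190, 9, 10, 11, 12);

-- Compare every Sales column of one market with every Sales column of another.
-- b.Market > a.Market lists each pair of markets only once.
SELECT a.Market AS MarketA, b.Market AS MarketB
FROM @MarketSales AS a
INNER JOIN @MarketSales AS b ON b.Market > a.Market
WHERE a.Sales1 IN (b.Sales1, b.Sales2, b.Sales3, b.Sales4)
   OR a.Sales2 IN (b.Sales1, b.Sales2, b.Sales3, b.Sales4)
   OR a.Sales3 IN (b.Sales1, b.Sales2, b.Sales3, b.Sales4)
   OR a.Sales4 IN (b.Sales1, b.Sales2, b.Sales3, b.Sales4);
-- For the sample rows this returns one row: (68, 630).
```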
Supposing the data size is not too big,
make a new temporary table combining all the data, with the columns:
Sales
Market
then group by Sales and keep the values that occur more than once:
select Max(Sales), Count(*) as Qty
from #temporary
group by Sales
having Count(*) > 1
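A possible way to build that temporary table and then list the markets involved, as a sketch only. The table variable @MarketSales is a stand-in for the real table, and the final query that joins back to show the Market column is an addition for illustration, not part of the answer above.

```sql
-- Stand-in for the MarketSales table, with the sample rows from the question.
DECLARE @MarketSales TABLE (Market int, Sales1 int, Sales2 int, Sales3 int, Sales4 int);

INSERT INTO @MarketSales VALUES
    (68,  1,  2,  3,  4),
    (630, 5,  3,  7,  8),
    (190, 9, 10, 11, 12);

-- Unpivot the four Sales columns into one (Market, Sales) temp table.
SELECT Market, Sales1 AS Sales INTO #temporary FROM @MarketSales
UNION ALL
SELECT Market, Sales2 FROM @MarketSales
UNION ALL
SELECT Market, Sales3 FROM @MarketSales
UNION ALL
SELECT Market, Sales4 FROM @MarketSales;

-- List the markets that share a Sales value occurring more than once.
SELECT t.Sales, t.Market
FROM #temporary AS t
WHERE t.Sales IN (
    SELECT Sales
    FROM #temporary
    GROUP BY Sales
    HAVING COUNT(*) > 1
)
ORDER BY t.Sales, t.Market;
-- For the sample rows this returns (3, 68) and (3, 630).

DROP TABLE #temporary;
```

Note that COUNT(*) > 1 also flags a value repeated within a single market; COUNT(DISTINCT Market) > 1 would restrict it to values shared across markets.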
