TSQL matching the first instances of multiple values in a resultset

Say I have part of a large query, shown below, that returns a resultset with multiple rows sharing the same key (PolNum) but with different values (PolPremium), in no particular order.
Is it possible to select the first matching row for each PolNum and sum the PolPremium values? In this case I know there are 2 PolNums in use. Given the screenshot of the resultset (yes, I know it starts at row 14; that is for illustration purposes), I want to return the first value for each PolNum and sum the results.
First match for PolNum 000035789547
(ROW 14) PolPremium - 32.00
First match for PolNum 000071709897
(ROW 16) PolPremium - 706043.00
Total summed should be 32.00 + 706043.00 = 706075.00
Query
OUTER APPLY
(
SELECT PolNum, PolPremium
FROM PN20
WHERE PolNum IN(SELECT PolNum FROM SvcPlanPolicyView
WHERE SvcPlanPolicyView.ControlNum IN (SELECT val AS ServedCoverages FROM ufn_SplitMax(
(SELECT TOP 1 ServicedCoverages FROM SV91 WHERE SV91.AccountKey = 3113413), ';')))
ORDER BY PN20.PolEffDate DESC
)
Resultset

Suppose that picture is the final result your query produces. Then you can do something like:
DECLARE @t TABLE
(
    PolNum VARCHAR(20) ,
    PolPremium MONEY
);
INSERT INTO @t
VALUES ( '000035789547', 32 ),
       ( '000035789547', 76 ),
       ( '000071709897', 706043.00 ),
       ( '000071709897', 1706043.00 );
SELECT t.PolNum ,
       SUM(PolPremium) AS PolPremium
FROM ( SELECT * ,
              ROW_NUMBER() OVER ( PARTITION BY PolNum ORDER BY PolPremium ) AS rn
       FROM @t
     ) t
WHERE rn = 1
GROUP BY GROUPING SETS(t.PolNum, ( ));
Output:
PolNum PolPremium
000035789547 32.00
000071709897 706043.00
NULL 706075.00
Just replace the table variable with your query. I also assume that the row with the minimum premium is the first one. You could probably filter to the top row inside the OUTER APPLY itself, but without some sample data it is not clear to me what is going on there.
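If "first" should instead mean the most recent row per policy, matching the ORDER BY PN20.PolEffDate DESC in the question, the same pattern works with a different ordering. This is only a minimal sketch: it assumes a PolEffDate column, and the dates (and the @p table variable name) are invented for illustration.

```sql
-- Hypothetical sample data: the PolEffDate values are made up for illustration.
DECLARE @p TABLE
(
    PolNum     VARCHAR(20),
    PolPremium MONEY,
    PolEffDate DATE
);

INSERT INTO @p (PolNum, PolPremium, PolEffDate)
VALUES ('000035789547', 32.00,      '20150101'),
       ('000035789547', 76.00,      '20140101'),
       ('000071709897', 706043.00,  '20150101'),
       ('000071709897', 1706043.00, '20140101');

-- Keep the newest row per PolNum, then add up the premiums of those rows.
SELECT SUM(f.PolPremium) AS TotalPremium
FROM (
    SELECT PolPremium,
           ROW_NUMBER() OVER (PARTITION BY PolNum
                              ORDER BY PolEffDate DESC) AS rn
    FROM @p
) AS f
WHERE f.rn = 1;
```

With this sample data the single row returned is 32.00 + 706043.00 = 706075.00. If two rows for a policy can share the same PolEffDate, add a tie-breaking column to the ORDER BY so that the choice of "first" row is deterministic.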

Related

Create a SELECT with defined number of rows

I have a stored procedure that should insert some random rows into a table, depending on the amount values:
@amount1 INT --EligibilityID = 1
@amount2 INT --EligibilityID = 2
@amount3 INT --EligibilityID = 3
The obvious way may be to use TOP(@amount), but there are a lot of amount values and the second select is much larger. So I was looking for a way to do it in a single statement, if possible.
INSERT INTO [dbo].[CaseInfo] ([EligibilityID],[CaseNumber],[CaseMonth])
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT TOP(@amount1) [EligibilityID],[CaseNumber],[CaseMonth]
    FROM [dbo].[tempCases]
    WHERE [EligibilityID] = 1
) AS s1
INSERT INTO [dbo].[CaseInfo] ([EligibilityID],[CaseNumber],[CaseMonth])
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT TOP(@amount2) [EligibilityID],[CaseNumber],[CaseMonth]
    FROM [dbo].[tempCases]
    WHERE [EligibilityID] = 2
) AS s2
INSERT INTO [dbo].[CaseInfo] ([EligibilityID],[CaseNumber],[CaseMonth])
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT TOP(@amount3) [EligibilityID],[CaseNumber],[CaseMonth]
    FROM [dbo].[tempCases]
    WHERE [EligibilityID] = 3
) AS s3

I would recommend using ROW_NUMBER, partitioned by EligibilityID, and comparing it against a CASE expression that selects the correct variable each time:
INSERT INTO [dbo].[CaseInfo] ([EligibilityID],[CaseNumber],[CaseMonth])
SELECT [EligibilityID],[CaseNumber],[CaseMonth]
FROM (
    SELECT [EligibilityID],[CaseNumber],[CaseMonth]
    ,row_number() over (partition by EligibilityID order by CaseNumber) as rn -- the question does not mention an ORDER BY; change it here if needed
    FROM [dbo].[tempCases]
) as table1
where rn <= case
    when EligibilityID=1 then @amount1
    when EligibilityID=2 then @amount2
    when EligibilityID=3 then @amount3
end
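With many amount values the CASE expression gets long. One possible alternative is to list the limits in a derived table and join to it. The following is only a sketch: it assumes the [dbo].[CaseInfo] and [dbo].[tempCases] tables from the question exist, the Limits alias and its column names are hypothetical, and the variables are declared locally (with made-up values) only so the snippet can run outside the stored procedure.

```sql
-- Stand-ins for the stored procedure's parameters; the values are arbitrary.
DECLARE @amount1 INT = 5, @amount2 INT = 10, @amount3 INT = 2;

INSERT INTO [dbo].[CaseInfo] ([EligibilityID], [CaseNumber], [CaseMonth])
SELECT c.[EligibilityID], c.[CaseNumber], c.[CaseMonth]
FROM (
    -- Number the rows within each EligibilityID.
    SELECT [EligibilityID], [CaseNumber], [CaseMonth],
           ROW_NUMBER() OVER (PARTITION BY [EligibilityID]
                              ORDER BY [CaseNumber]) AS rn
    FROM [dbo].[tempCases]
) AS c
-- One (EligibilityID, Amount) pair per limit; hypothetical names.
JOIN (VALUES (1, @amount1),
             (2, @amount2),
             (3, @amount3)) AS Limits (EligibilityID, Amount)
    ON Limits.EligibilityID = c.[EligibilityID]
WHERE c.rn <= Limits.Amount;
```

Because this is an inner join, rows whose EligibilityID has no entry in the list are skipped. If the rows should be picked at random, as the question suggests, ordering the ROW_NUMBER by NEWID() instead of CaseNumber is one common way to do that.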

How to select the top 1 in case distinct returns 2 rows

I have a SELECT DISTINCT query that can return 2 rows with the same code, since not all columns have the same value. Now my boss wants only the first one. How do I do it? Below is the sample result. I want to return only the first row for each of the two unique pro values.

Use ROW_NUMBER in your query:
; with cte as (
select row_number() over (partition by pro order by actual_quantity) as Slno, * from yourtable
) select * from cte where slno = 1

Your chances of getting a proper answer are much higher if you spend some time preparing the question properly. Provide the DDL and sample data, and add the desired result.
To solve your problem, you need to know the ordering that makes one row per window group unique. Look up window functions. In my example the rule is: a single row for every pro, with the earliest proforma_invoice_received_date and the smallest actual_quantity for that date.
DROP TABLE IF EXISTS #tmp;
GO
CREATE TABLE #tmp
(
pro VARCHAR(20) ,
actual_quantity DECIMAL(12, 2) ,
proforma_invoice_received_date DATE ,
import_permit DATE
);
GO
INSERT INTO #tmp
( pro, actual_quantity, proforma_invoice_received_date, import_permit )
VALUES ( 'N19-00945', 50000, '20190516', '20190517' ),
( 'N19-00945', 50001, '20190516', '20190517' )
, ( 'N19-00946', 50002, '20190516', '20190517' )
, ( 'N19-00946', 50003, '20190516', '20190517' );
SELECT a.pro ,
a.actual_quantity ,
a.proforma_invoice_received_date ,
a.import_permit
FROM ( SELECT pro ,
actual_quantity ,
proforma_invoice_received_date ,
import_permit ,
ROW_NUMBER() OVER ( PARTITION BY pro ORDER BY proforma_invoice_received_date, actual_quantity ) AS rn
FROM #tmp
) a
WHERE rn = 1;
-- you can also use WITH TIES for that to save some lines of code
SELECT TOP ( 1 ) WITH TIES
pro ,
actual_quantity ,
proforma_invoice_received_date ,
import_permit
FROM #tmp
ORDER BY ROW_NUMBER() OVER ( PARTITION BY pro ORDER BY proforma_invoice_received_date, actual_quantity );
DROP TABLE #tmp;

Try this-
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY pro ORDER BY Pro) RN
-- Add other columns to the ORDER BY clause along with 'pro'
-- to get the row you want. Otherwise you get whichever row
-- the query happens to return first, and that can vary
-- between executions.
FROM your_table
)A
WHERE RN = 1

CREATE TABLE T (
A [numeric](10, 2) NULL,
B [numeric](10, 2) NULL
)
INSERT INTO T VALUES (100,20)
INSERT INTO T VALUES (100,30)
INSERT INTO T VALUES (200,40)
INSERT INTO T VALUES (200,50)
select *
from T
/*
A B
100.00 20.00
100.00 30.00
200.00 40.00
200.00 50.00
*/
select U.A, U.B
from
(select row_number() over(Partition By A Order By B) as row_num, *
from T ) U
where row_num = 1
/*
A B
100.00 20.00
200.00 40.00
*/

Window function behaves differently in Subquery/CTE?

I thought the following three SQL statements were semantically the same, and that the database engine would expand the second and third queries to the first one internally.
select ....
from T
where Id = 1
select *
from
(select .... from T) t
where Id = 1
select *
from
(select .... from T where Id = 1) t
However, I found the window function behaves differently. I have the following code.
-- Prepare test data
with t1 as
(
select *
from (values ( 2, null), ( 3, 10), ( 5, -1), ( 7, null), ( 11, null), ( 13, -12), ( 17, null), ( 19, null), ( 23, 1759) ) v ( id, col1 )
)
select *
into #t
from t1
alter table #t add primary key (id)
go
The following query returns all the rows.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4)))
over (order by id
rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
id col1 lastval
-------------------
2 NULL NULL
3 10 NULL
5 -1 10
7 NULL -1
11 NULL -1
13 -12 -1
17 NULL -12
19 NULL -12
23 1759 -12
Without CTE/subquery: then I added a condition to return just the row where Id = 19.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
where
id = 19;
However, lastval returns null?
With CTE/subquery: now the condition is applied to the CTE:
with t as
(
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from
#t)
select *
from t
where id = 19;
-- Subquery
select
*
from
(select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t) t
where
id = 19;
Now lastval returns -12 as expected?

The logical processing order of the SELECT statement is important for understanding the results of your first example. From the Microsoft documentation, the order is, from top to bottom:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Note that the WHERE clause processing happens logically before the SELECT clause.
The query without the CTE is filtered where id = 19. The order of operations causes the WHERE to be processed before the window function in the SELECT clause. There is only 1 row with an id of 19, so the WHERE limits the rows to id = 19 before the window function can process the rows between unbounded preceding and 1 preceding. Since there are no preceding rows left for the window function, lastval is null.
Compare this to the CTE. The outer query's filter has not yet been applied, so the CTE operates on all of the data, and rows between unbounded preceding and 1 preceding finds the prior rows. The outer part of the query then applies the filter to the intermediate results and returns just row 19, which already has the correct lastval.
You can think of the CTE as creating a temporary #Table with the CTE data in it. All of the data is logically processed into a separate table before returning data to the outer query. The CTE in your example creates a temporary work table with all of the rows that includes the lastval from the prior rows. Then, the filter in the outer query gets applied and limits the results to id 19.
(In reality, the CTE can shortcut and skip generating data, if it can do so to improve performance without affecting the results. Itzik Ben-Gan has a great example of a CTE that skips processing when it has returned enough data to satisfy the query.)
Consider what happens if you put the filter in the CTE. This should behave exactly like the first example query that you provided. There is only 1 row with an id = 19, so the window function does not find any preceding rows:
with t as ( select id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over ( order by id
rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from #t
where id = 19 -- moved filter inside CTE
)
select *
from t

Window functions operate on your result set, so when you added where id = 19 your result set only had 1 row. Since your window function specifies rows between unbounded preceding and 1 preceding there was no preceding row, and resulted in null.
By using the subquery/cte you are allowing the window function to operate over the unfiltered result set (where the preceding rows exist), then retrieving only those rows from that result set where id = 19.
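The difference can be made visible with a simpler window function. The sketch below is not the asker's query: it uses a small hypothetical table variable (@d) and COUNT(*) OVER () to show how many rows the window function gets to see in each form.

```sql
-- Small stand-in for the question's #t table (hypothetical subset of the data).
DECLARE @d TABLE (id INT PRIMARY KEY, col1 INT NULL);
INSERT INTO @d (id, col1)
VALUES (2, NULL), (3, 10), (5, -1), (19, NULL);

-- WHERE is applied first, so the window function sees only the id = 19 row.
SELECT id, COUNT(*) OVER () AS rows_seen
FROM @d
WHERE id = 19;          -- rows_seen = 1

-- The window function runs inside the derived table over all four rows;
-- the outer WHERE filters afterwards.
SELECT id, rows_seen
FROM (
    SELECT id, COUNT(*) OVER () AS rows_seen
    FROM @d
) AS x
WHERE id = 19;          -- rows_seen = 4
```

Both queries return one row for id 19, but the first reports 1 and the second reports 4, which is the same effect that makes lastval NULL in one form and -12 in the other.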

The queries you are comparing are not equivalent.
select id ,
(... ) as lastval
from #t
where id = 19;
will take only 1 row, so lastval will be NULL because the window function does not find any preceding row.

How to query number based SQL Sets with Ranges in SQL

What I'm looking for is a way in MSSQL to create a complex IN or LIKE clause that contains a SET of values, some of which will be ranges.
Sort of like this, there are some single numbers, but also some ranges of numbers.
EX: SELECT * FROM table WHERE field LIKE/IN '1-10, 13, 24, 51-60'
I need to find a way to do this WITHOUT having to specify every number in the ranges separately AND without having to say "field LIKE blah OR field BETWEEN blah AND blah OR field LIKE blah".
This is just a simple example but the real query will have many groups and large ranges in it so all the OR's will not work.

One fairly easy way to do this would be to load a temp table with your values/ranges:
CREATE TABLE #Ranges (ValA int, ValB int)
INSERT INTO #Ranges
VALUES
(1, 10)
,(13, NULL)
,(24, NULL)
,(51,60)
SELECT *
FROM Table t
JOIN #Ranges R
ON (t.Field = R.ValA AND R.ValB IS NULL)
OR (t.Field BETWEEN R.ValA and R.ValB AND R.ValB IS NOT NULL)
The BETWEEN won't scale that well, though, so you may want to consider expanding this to include all values and eliminating ranges.
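Expanding the ranges into individual values could look roughly like the following. This is a sketch under stated assumptions: the table variable @Ranges mirrors the #Ranges table above, #Expanded is a hypothetical name, and the inline tally only covers ranges of up to 100 values, so it would need more digit lists for larger ranges.

```sql
-- Same shape as the #Ranges table above, as a table variable.
DECLARE @Ranges TABLE (ValA INT NOT NULL, ValB INT NULL);
INSERT INTO @Ranges (ValA, ValB)
VALUES (1, 10), (13, NULL), (24, NULL), (51, 60);

-- Tally of 0..99 built from two digit lists.
-- Assumption: no range spans more than 100 values.
WITH Digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v (d)
),
Tally AS (
    SELECT tens.d * 10 + ones.d AS n
    FROM Digits AS ones
    CROSS JOIN Digits AS tens
)
-- One output row per individual value; a single value is a range of length 1.
SELECT r.ValA + t.n AS Value
INTO #Expanded
FROM @Ranges AS r
JOIN Tally AS t
    ON t.n <= ISNULL(r.ValB, r.ValA) - r.ValA;

SELECT Value FROM #Expanded ORDER BY Value;
```

For the sample ranges this yields 22 rows (1 to 10, 13, 24, and 51 to 60). The main query can then use a plain equality join against #Expanded, which can be indexed, instead of the OR/BETWEEN join condition.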

You can do this with CTEs.
First, create a numbers/tally table if you don't already have one (it might be better to make it permanent instead of temporary if you are going to use it a lot):
;WITH Numbers AS
(
SELECT
1 as Value
UNION ALL
SELECT
Numbers.Value + 1
FROM
Numbers
)
SELECT TOP 1000
Value
INTO ##Numbers
FROM
Numbers
OPTION (MAXRECURSION 1000)
Then you can use a CTE to parse the comma delimited string and join the ranges with the numbers table to get the "NewValue" column which contains the whole list of numbers you are looking for:
DECLARE @TestData varchar(50) = '1-10,13,24,51-60'
;WITH CTE AS
(
    SELECT
        1 AS RowCounter,
        1 AS StartPosition,
        CHARINDEX(',',@TestData) AS EndPosition
    UNION ALL
    SELECT
        CTE.RowCounter + 1,
        EndPosition + 1,
        CHARINDEX(',',@TestData, CTE.EndPosition+1)
    FROM CTE
    WHERE
        CTE.EndPosition > 0
)
SELECT
    u.Value,
    u.StartValue,
    u.EndValue,
    n.Value as NewValue
FROM
(
    SELECT
        Value,
        SUBSTRING(Value,1,CASE WHEN CHARINDEX('-',Value) > 0 THEN CHARINDEX('-',Value)-1 ELSE LEN(Value) END) AS StartValue,
        SUBSTRING(Value,CASE WHEN CHARINDEX('-',Value) > 0 THEN CHARINDEX('-',Value)+1 ELSE 1 END,LEN(Value)- CHARINDEX('-',Value)) AS EndValue
    FROM
    (
        SELECT
            SUBSTRING(@TestData, StartPosition, CASE WHEN EndPosition > 0 THEN EndPosition-StartPosition ELSE LEN(@TestData)-StartPosition+1 END) AS Value
        FROM
            CTE
    )t
)u INNER JOIN ##Numbers n ON n.Value BETWEEN u.StartValue AND u.EndValue
All you would need to do once you have that is query the results using an IN statement, so something like
SELECT * FROM MyTable WHERE Value IN (SELECT NewValue FROM (/*subquery from above*/)t)
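On SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function can replace the recursive parsing CTE. The following is a sketch rather than a drop-in replacement: it assumes non-negative integers and well-formed input, and the Ranges CTE name is hypothetical.

```sql
DECLARE @TestData VARCHAR(50) = '1-10,13,24,51-60';

WITH Ranges AS (
    SELECT
        -- Text before the dash, or the whole item when there is no dash.
        CAST(LEFT(s.value, CHARINDEX('-', s.value + '-') - 1) AS INT) AS StartValue,
        -- Text after the dash, or the whole item when there is no dash.
        CAST(SUBSTRING(s.value, CHARINDEX('-', s.value) + 1, LEN(s.value)) AS INT) AS EndValue
    FROM STRING_SPLIT(@TestData, ',') AS s
)
SELECT StartValue, EndValue
FROM Ranges;
```

For the sample string this produces the pairs (1, 10), (13, 13), (24, 24) and (51, 60). To filter a table, the final SELECT could be replaced with something like SELECT t.* FROM MyTable AS t WHERE EXISTS (SELECT 1 FROM Ranges AS r WHERE t.Value BETWEEN r.StartValue AND r.EndValue), which avoids the numbers table entirely.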

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
. . .
. . .
The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.

This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr

Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;

Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
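On SQL Server 2012 or later, a window frame can express "the previous 3 rows" directly, without a self join or correlated subquery. A minimal sketch, using the question's sample values in a hypothetical table variable (@Rows):

```sql
-- The question's sample data, loaded into a table variable.
DECLARE @Rows TABLE (Id INT PRIMARY KEY, Number INT NOT NULL);
INSERT INTO @Rows (Id, Number)
VALUES (1, 1), (2, 3), (3, 2), (4, 4), (5, 6);

SELECT Id,
       Number,
       CASE
           -- Only report an average once three earlier rows exist.
           WHEN COUNT(*) OVER (ORDER BY Id
                               ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) = 3
           THEN AVG(Number) OVER (ORDER BY Id
                                  ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS Average
FROM @Rows
ORDER BY Id;
```

This returns NULL for Ids 1 to 3, then 2 for Id 4 and 3 for Id 5, matching the table in the question. Because Number is an INT, AVG performs integer division; cast Number to a decimal type first if fractional averages are needed.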

A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query; you can UNION ALL those back in if you really want them in the row set. Self join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.

Check out some solutions here. I'm sure that you could adapt one of them easily enough.

If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.

Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Performance of selects is as fast as it goes. Of course, modifications are slower.
