I have a table full of customer details from insurance policies or quotes. Each row is assigned an output code that relates to a marketing campaign, and each code occurs 4 times, once per "batch", which simply represents a week in the month. I need to select a random 25 percent of the rows per code, per batch number (1-4) and put them into another table so I can hold those rows back and prevent those customers from being marketed to.
All the solutions I've seen on Stack Overflow so far show how to do this for a specific number of rows per group, using ROW_NUMBER in an initial CTE and then selecting from that where rn <= a given number. I need to do the same but select 25 percent of each group instead.
I've tried this solution, but a fixed row number doesn't move me any further forward:
Select N random rows in group
Using the linked solution, this is how my code currently stands, without a complete WHERE clause, because I know this isn't quite what I need.
;WITH AttributionOutput AS (
SELECT [Output Code], BatchNo, MonthandYear
FROM [dbo].[Direct_Marketing_UK]
WHERE MonthandYear = 'Sep2019'
And [Output Code] NOT IN ('HOMELIVE','HOMELIVENB','HOMENBLE')
GROUP BY [Output Code], BatchNo, MonthandYear
HAVING COUNT(*) >= 60
)
, CodeandBatch AS (
SELECT dmuk.PK_ID,
dmuk.MonthandYear,
dmuk.PackNo,
dmuk.BatchNo,
dmuk.CustomerKey,
dmuk.URN,
dmuk.[Output Code],
dmuk.[Quote/Renewal Date],
dmuk.[Name],
dmuk.[Title],
dmuk.[Initial],
dmuk.[Forename],
dmuk.[Surname],
dmuk.[Salutation],
dmuk.[Address 1],
dmuk.[Address 2],
dmuk.[Address 3],
dmuk.[Address 4],
dmuk.[Address 5],
dmuk.[Address 6],
dmuk.[PostCode],
ROW_NUMBER() OVER(PARTITION BY dmuk.[Output Code], dmuk.BatchNo ORDER BY newid()) as rn
FROM [dbo].[Direct_Marketing_UK] dmuk INNER JOIN
AttributionOutput ao ON dmuk.[Output Code] = ao.[Output Code]
AND dmuk.BatchNo = ao.BatchNo
AND dmuk.MonthandYear = ao.MonthandYear
)
SELECT URN,
[Output Code],
[BatchNo]
FROM CodeandBatch
WHERE rn <=
I can't see how a ROW_NUMBER() can help me to grab 25 percent of the rows from every combination of Output Code and batch number.
I suggest you look at NTILE for this. NTILE(4) deals each partition into four roughly equal buckets, so keeping bucket 1 gives you roughly 25 percent of each group.
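A minimal sketch of that idea, reusing the table and column names from the question (the MonthandYear filter and the NEWID() ordering are carried over from the asker's query; the AttributionOutput filters are omitted for brevity):
;WITH CodeandBatch AS (
    SELECT dmuk.URN,
           dmuk.[Output Code],
           dmuk.BatchNo,
           -- NTILE(4) deals each Output Code / BatchNo partition into four
           -- roughly equal buckets, in random order thanks to NEWID()
           NTILE(4) OVER (PARTITION BY dmuk.[Output Code], dmuk.BatchNo
                          ORDER BY NEWID()) AS quartile
    FROM [dbo].[Direct_Marketing_UK] dmuk
    WHERE dmuk.MonthandYear = 'Sep2019'
)
SELECT URN, [Output Code], BatchNo
FROM CodeandBatch
WHERE quartile = 1; -- bucket 1 is a random ~25 percent of each group
When a group's row count isn't divisible by four, NTILE places the extra rows in the lower-numbered buckets, so bucket 1 can be marginally more than 25 percent of the group.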
I have two queries, Sales & Forecast, which I am bringing together with a UNION.
At this point the two queries return the correct figures.
I then pivot the result to show each SKU with a column of Sales and a column of Forecast, but the values appear under the wrong columns.
Below is my code - is there an obvious reason why it doesn't output as expected?
SELECT q.FY#, q.[Country Code], q.Family, q.Colour, q.Pack_Size, Forecast, Actuals, Forecast/nullif(Actuals,0) as Change
FROM (
SELECT FY#, [Country Code], Family, Colour, Pack_Size, Forecast, Actuals
FROM (
SELECT f.FY#, f.Attribute, f.[Country Code], f.Family, f.Colour, f.Pack_Size, sum(f.Packs) as Packs
FROM [V3.1_JDAForecast](@Country, @FY) f
group by f.FY#, f.Attribute, f.[Country Code], f.Family, f.Colour, f.Pack_Size
UNION
SELECT a.FY#, a.Attribute, a.[Country Code], a.Family, a.Colour, a.Pack_Size, sum(a.Packs) as Packs
FROM [V3.1_JDAActuals](@Country, @FY) as a
group by a.FY#, a.Attribute, a.[Country Code], a.Family, a.Colour, a.Pack_Size
) src
PIVOT
(
SUM(Packs)
for Attribute in ([Actuals], [Forecast])
) piv
) q
I'm trying to get the count, min, max, and some percentiles (10th, 25th, 50th, 75th, 90th) of base salaries for each master job title.
I'm getting the following error:
Msg 8120, Level 16, State 1, Line 1
Column 'dbo.ps_employee.Base' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
You must include all non-aggregated items from your select list in the GROUP BY clause at the bottom of the query.
Documentation: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql
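For instance, a minimal sketch of a grouped version that satisfies that rule, reusing the table, column and join names from the asker's query below (an assumed illustration, not the asker's code):
SELECT mj.title,
       COUNT(e.Base) AS head_count,
       MIN(e.Base) AS min_base,
       MAX(e.Base) AS max_base,
       AVG(e.Base) AS avg_base
FROM dbo.ps_employee e
FULL OUTER JOIN dbo.ps_jobs j ON e.title = j.job
FULL OUTER JOIN dbo.ps_masterjobs mj ON j.masterID = mj.ID
GROUP BY mj.title; -- title is the only non-aggregated column, so it is the only item in the GROUP BY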
I was able to use the following query to get the percentiles. I used a separate one for count/min/max/average.
SELECT DISTINCT mj.title,
       PERCENTILE_DISC(.1)  WITHIN GROUP (ORDER BY e.base) OVER (PARTITION BY mj.title) AS '10th',
       PERCENTILE_DISC(.25) WITHIN GROUP (ORDER BY e.base) OVER (PARTITION BY mj.title) AS '25th',
       PERCENTILE_DISC(.5)  WITHIN GROUP (ORDER BY e.base) OVER (PARTITION BY mj.title) AS '50th',
       PERCENTILE_DISC(.75) WITHIN GROUP (ORDER BY e.base) OVER (PARTITION BY mj.title) AS '75th',
       PERCENTILE_DISC(.9)  WITHIN GROUP (ORDER BY e.base) OVER (PARTITION BY mj.title) AS '90th'
FROM dbo.ps_employee e
FULL OUTER JOIN dbo.ps_jobs j ON e.title = j.job
FULL OUTER JOIN dbo.ps_masterjobs mj ON j.masterID = mj.ID;
Add "over (partition by title)" in your count, min, max and avg functions while also adding base to your group by functions. This will allow you to have all the values in a single row set but you will have duplicate rows in the output
I am trying to find a way to get the last date, by location and product, on which a running sum was positive. The only way I can think to do it is with a cursor, and if that's the case I may as well just do it in code. Before I go down that route, I was hoping someone might have a better idea?
Table:
Product, Date, Location, Quantity
The scenario is: I find the quantity by location and product at a particular date; if it is negative, I need to get the sum and the date when the group was last positive.
select
Product,
Location,
SUM(Quantity) Qty,
SUM(Value) Value
from
ProductTransactions PT
where
    Date <= @AsAtDate
group by
Product,
Location
I am looking for the last date where the sum of the transactions prior to and including it is positive.
Based on your revised question and your comment, here is another solution that I hope answers your question.
select Product, Location, max(Date) as Date
from (
select a.Product, a.Location, a.Date from ProductTransactions as a
join ProductTransactions as b
on a.Product = b.Product and a.Location = b.Location
where b.Date <= a.Date
group by a.Product, a.Location, a.Date
having sum(b.Value) >= 0
) as T
group by Product, Location
The subquery (table T) produces a list of {product, location, date} rows for which the sum of the values prior (and inclusive) is positive. From that set, we select the last date for each {product, location} pair.
This can be done in a set-based way using windowed aggregates to construct the running total. Depending on the number of rows in the table this could be a bit slow, but you can't really limit the time range going backwards, as the last positive date is an unknown quantity.
I've used a CTE for convenience to construct the aggregated data set, but converting that to a temp table should be faster (a CTE is re-evaluated each time it is referenced, whereas a temp table is only populated once); a sketch of that variant follows the query below.
The basic theory is to construct the running totals for all of the previous days using the OVER clause to partition and order the SUM aggregates. This data set is then used and filtered to the expected date. When a row in that table has a quantity less than zero it is joined back to the aggregate data set for all previous days for that product and location where the quantity was greater than zero.
Since this may return multiple positive-date rows, the ROW_NUMBER() function is used to order the rows based on the date of the positive-quantity day. This is done in descending order so that row number 1 is the most recent positive day. A simple MAX() wouldn't work here because the MAX([Date]) may not correspond to the quantity on that date.
WITH x AS (
SELECT [Date],
Product,
[Location],
SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS Quantity,
SUM([Value]) OVER(PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS [Value]
FROM ProductTransactions
WHERE [Date] <= @AsAtDate
)
SELECT [Date], Product, [Location], Quantity, [Value], Positive_date, Positive_date_quantity
FROM (
SELECT x1.[Date], x1.Product, x1.[Location], x1.Quantity, x1.[Value],
x2.[Date] AS Positive_date, x2.[Quantity] AS Positive_date_quantity,
ROW_NUMBER() OVER (PARTITION BY x1.Product, x1.[Location] ORDER BY x2.[Date] DESC) AS Positive_date_row
FROM x AS x1
LEFT JOIN x AS x2 ON x1.Product=x2.Product AND x1.[Location]=x2.[Location]
AND x2.[Date]<x1.[Date] AND x1.Quantity<0 AND x2.Quantity>0
WHERE x1.[Date] = @AsAtDate
) AS y
WHERE Positive_date_row=1
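For reference, a minimal sketch of the temp-table variant mentioned above (same columns as the CTE; @AsAtDate is assumed to be declared elsewhere):
SELECT [Date],
       Product,
       [Location],
       SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS Quantity,
       SUM([Value]) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS [Value]
INTO #RunningTotals -- materialise the running totals once
FROM ProductTransactions
WHERE [Date] <= @AsAtDate;
-- The query above is then unchanged apart from reading FROM #RunningTotals (both references) instead of the CTE x.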
Do you mean that you want to get the last date on which the running quantity in the group became positive?
For example, if you are using SQL Server 2012+: in the following scenario, the running quantity becomes positive at 01/03/2017, when it reaches 1 (-10 + 5 + 6).
Is it possible for the quantity on a later date to go negative again?
;WITH tb(Product, Location,[Date],Quantity) AS(
SELECT 'A','B',CONVERT(DATETIME,'01/01/2017'),-10 UNION ALL
SELECT 'A','B','01/02/2017',5 UNION ALL
SELECT 'A','B','01/03/2017',6 UNION ALL
SELECT 'A','B','01/04/2017',2
)
SELECT t.Product,t.Location,SUM(t.Quantity) AS Qty,MIN(CASE WHEN t.CurrentSum>0 THEN t.Date ELSE NULL END ) AS LastPositiveDate
FROM (
SELECT *,SUM(tb.Quantity)OVER(ORDER BY [Date]) AS CurrentSum FROM tb
) AS t GROUP BY t.Product,t.Location
Product Location Qty LastPositiveDate
------- -------- ----------- -----------------------
A B 3 2017-01-03 00:00:00.000
I have this requirement.
My table contains a series of rows with serial nos, several bit columns, and a date-time.
To simplify, I will focus on one bit column. In essence, I need to know the most recent date that this bit was toggled.
Ex: The following table depicts the bit values for 7 serials for the latest 6 days (10 to 5).
SQL Fiddle schema + query
I have successfully managed to get the result on a sample, but it is taking ages on the real table, which contains over 30 million records and approx 300K serial nos.
Pseudo-code:
For each serial:
Get the bit value at the max date as A (the latest bit value, e.g. 1)
Get the max date where the bit is NOT A as B (i.e. find the most recent date the bit was, e.g., 0)
Get the min date > B
Group by SNo
I am sure an optimised approach exists.
For completeness, the dataset contains rows that I need to filter out, etc. However, I can add those filters later once the basic query is executing more efficiently.
Thanks for your time!
with cte as
(
select *, rn = ROW_NUMBER() OVER (ORDER BY SNo, Device_date) -- date order within each serial so rn + 1 is the next reading for the same serial
from dbo.TestCape2
)
select MAX(y.Device_date) as MaxDate,
y.SNo
from cte x
inner join cte as y
on x.rn = y.rn + 1
and x.SNo = y.SNo
and x.Cape <> y.Cape
group by y.SNo
order by SNo;
And if you're using SQL Server 2012 and up, you can make use of LAG, which looks at the previous row.
select max(Device_date) as MaxDate,
SNo
from (
select SNo
,Device_date
,Cape
,LAG (Cape, 1, 0) OVER (PARTITION BY Sno ORDER BY Device_date) AS PrevCape
,LAG (Sno, 1, 0) OVER (PARTITION BY Sno ORDER BY Device_date) AS PrevSno
from dbo.TestCape2) t
where sno = PrevSno
and t.Cape <> t.PrevCape
group by sno
order by sno;
How can I get SQL Server to return the first value (any one, I don't care, it just needs to be fast) it comes across when aggregating?
For example, let's say I have:
ID Group
1 A
2 A
3 A
4 B
5 B
and I need to get any one of the ID's for each group. I can do this as follows:
Select
    max(id)
    ,[group]
from [Table]
group by [group]
which returns
ID Group
3 A
5 B
That does the job, but it seems stupid to me to ask SQL Server to calculate the highest ID when all it really needs to do is to pick the first ID it comes across.
Thanks
PS - the fields are indexed, so maybe it doesn't really make a difference?
There is an undocumented aggregate called ANY which is not valid syntax to write directly, but which can be made to appear in your execution plans. However, this does not provide any performance advantage.
Assuming the following table and index structure
CREATE TABLE T
(
id int identity primary key,
[group] char(1)
)
CREATE NONCLUSTERED INDEX ix ON T([group])
INSERT INTO T
SELECT TOP 1000000 CHAR( 65 + ROW_NUMBER() OVER (ORDER BY @@SPID) % 3)
FROM sys.all_objects o1, sys.all_objects o2, sys.all_objects o3
I have also populated the table with sample data such that there are many rows per group.
Your original query
SELECT MAX(id),
[group]
FROM T
GROUP BY [group]
Gives Table 'T'. Scan count 1, logical reads 1367 and the plan
|--Stream Aggregate(GROUP BY:([T].[group]) DEFINE:([Expr1003]=MAX([T].[id])))
       |--Index Scan(OBJECT:([T].[ix]), ORDERED FORWARD)
Rewritten to get the ANY aggregate...
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY [group] ) AS RN
FROM T)
SELECT id,
[group]
FROM cte
WHERE RN=1
Gives Table 'T'. Scan count 1, logical reads 1367 and the plan
|--Stream Aggregate(GROUP BY:([T].[group]) DEFINE:([T].[id]=ANY([T].[id])))
       |--Index Scan(OBJECT:([T].[ix]), ORDERED FORWARD)
Even though SQL Server could potentially stop processing the group as soon as the first value is found and skip to the next one, it doesn't. It still processes all rows, and the logical reads are the same.
For this particular example, with many rows in each group, a more efficient version would be a recursive CTE.
WITH RecursiveCTE
AS (
SELECT TOP 1 id, [group]
FROM T
ORDER BY [group]
UNION ALL
SELECT R.id, R.[group]
FROM (
SELECT T.*,
rn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM T
JOIN RecursiveCTE R
ON R.[group] < T.[group]
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
Which gives
Table 'Worktable'. Scan count 2, logical reads 19
Table 'T'. Scan count 4, logical reads 12
The logical reads are much lower because it retrieves the first row per group and then seeks into the next group, rather than reading a load of records that don't contribute to the final result.