I'm trying to make a SQL Server user-defined function that will return a table of median values that I can join back to another table, thusly:
CREATE FUNCTION [dbo].getmedian (@varPartionBy1 int, @varPartionBy2 int, @varForTieBreak int, @varForMeasure int)
RETURNS TABLE
AS
RETURN
(
    SELECT
        @varPartionBy1,
        @varPartionBy2,
        AVG(@varForMeasure)
    FROM
    (
        SELECT
            @varPartionBy1,
            @varPartionBy2,
            ROW_NUMBER() OVER (
                PARTITION BY @varPartionBy1, @varPartionBy2
                ORDER BY @varForMeasure ASC, @varForTieBreak ASC) AS RowAsc,
            ROW_NUMBER() OVER (
                PARTITION BY @varPartionBy1, @varPartionBy2
                ORDER BY @varForMeasure ASC, @varForTieBreak DESC) AS RowDesc
        from
            [fakename].[dbo].[temptable] bp
    ) bp
    WHERE
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY @varPartionBy1, @varPartionBy2
)
GO
This returns the error: "Msg 8155, Level 16, State 2, Procedure getmedian, Line 25
No column name was specified for column 1 of 'bp'." I guess this means I don't understand how to assign a table alias to a column in the context of a UDF.
What should I do to fix the error?
This is my very first UDF, so I'd appreciate any other helpful design insights you have while addressing the main question. Thanks for any help!
Where you have SELECT @varPartionBy1, @varPartionBy2, those expressions need column names. You can assign them directly, such as SELECT @varPartionBy1 AS varPartionBy1 or SELECT varPartionBy1 = @varPartionBy1, or you can list them in the derived table's alias: ) bp(varPartionBy1, varPartionBy2,...
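A small self-contained sketch of the three naming styles; the variables @a and @b and the column names p1 and p2 are placeholders chosen only for illustration:

```sql
DECLARE @a int = 1, @b int = 2;

-- 1. Alias each expression with AS
SELECT x.p1, x.p2 FROM (SELECT @a AS p1, @b AS p2) AS x;

-- 2. Use the alias = expression form
SELECT x.p1, x.p2 FROM (SELECT p1 = @a, p2 = @b) AS x;

-- 3. Name the columns in the derived table's alias (every column must be listed)
SELECT x.p1, x.p2 FROM (SELECT @a, @b) AS x (p1, p2);
```

With the third style, the column list has to cover every column the derived table returns, so in the function above RowAsc and RowDesc would need names in that list too.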
The correct function would likely be
CREATE FUNCTION [dbo].getmedian (@varPartionBy1 int, @varPartionBy2 int, @varForTieBreak int, @varForMeasure int)
RETURNS TABLE
AS
RETURN
(
    SELECT
        varPartionBy1,
        varPartionBy2,
        AVG(@varForMeasure) AS AvgVarForMeasure
    FROM
    (
        SELECT
            @varPartionBy1 AS varPartionBy1,
            @varPartionBy2 AS varPartionBy2,
            ROW_NUMBER() OVER (
                PARTITION BY @varPartionBy1, @varPartionBy2
                ORDER BY @varForMeasure ASC, @varForTieBreak ASC) AS RowAsc,
            -- RowDesc must be the exact reverse of RowAsc for the median test below
            ROW_NUMBER() OVER (
                PARTITION BY @varPartionBy1, @varPartionBy2
                ORDER BY @varForMeasure DESC, @varForTieBreak DESC) AS RowDesc
        from
            [fakename].[dbo].[temptable] bp
    ) bp
    WHERE
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY varPartionBy1, varPartionBy2
)
Well, once you get the syntax problem out of the way, your problems will not be over. For one, you can't really do what you're trying to do (pass variables in to the PARTITION BY and ORDER BY clauses). Variables there are just treated as constants, so ROW_NUMBER() will number the rows in an arbitrary order.
Observe what happens here:
DECLARE @foo SYSNAME = N'name';
SELECT name, foo = @foo, -- this is just a constant value, not a column
varASC = ROW_NUMBER() OVER (ORDER BY @foo ASC),
varDESC = ROW_NUMBER() OVER (ORDER BY @foo DESC),
colASC = ROW_NUMBER() OVER (ORDER BY name ASC),
colDESC = ROW_NUMBER() OVER (ORDER BY name DESC)
FROM sys.objects --WHERE is_ms_shipped = 0
ORDER BY varASC;
Partial results:
name foo varASC varDESC colASC colDESC
---- ---- ------ ------- ------ -------
t1 name 1 1 1 100
t2 name 2 2 2 99
t3 name 3 3 3 98
t4 name 4 4 4 97
t5 name 5 5 5 96
------ only column that deviates ----^^^^^^^
The value of @foo is the same on every single row, so partitioning and ordering by it is completely meaningless.
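Since the partition and ordering columns can't be passed in as parameters, the median has to be computed against real column names. Below is a sketch of the same RowAsc/RowDesc technique over a hypothetical table variable @t, where Grp1, Grp2, Id and Measure are assumed stand-ins for your partition, tie-break and measure columns. Note that RowDesc orders by the measure descending, so that RowAsc + RowDesc = n + 1 on every row. The result is captured in @medians so it can be joined back to other data; the same SELECT could also serve as the body of an inline table-valued function over a fixed table.

```sql
-- Hypothetical sample data: Grp1/Grp2 = partition columns, Id = tie-breaker, Measure = value.
DECLARE @t TABLE (Grp1 int, Grp2 int, Id int, Measure int);
INSERT INTO @t (Grp1, Grp2, Id, Measure)
VALUES (1, 1, 1, 10), (1, 1, 2, 20), (1, 1, 3, 30),                -- odd count
       (1, 2, 4, 5),  (1, 2, 5, 7),  (1, 2, 6, 9),  (1, 2, 7, 11);  -- even count

DECLARE @medians TABLE (Grp1 int, Grp2 int, MedianMeasure decimal(18, 4));

INSERT INTO @medians (Grp1, Grp2, MedianMeasure)
SELECT bp.Grp1, bp.Grp2,
       AVG(1.0 * bp.Measure)   -- 1.0 * avoids integer averaging of the two middle values
FROM (
    SELECT Grp1, Grp2, Measure,
           ROW_NUMBER() OVER (PARTITION BY Grp1, Grp2
                              ORDER BY Measure ASC,  Id ASC)  AS RowAsc,
           ROW_NUMBER() OVER (PARTITION BY Grp1, Grp2
                              ORDER BY Measure DESC, Id DESC) AS RowDesc
    FROM @t
) AS bp
-- Only the middle row (odd count) or the two middle rows (even count) pass this test.
WHERE bp.RowAsc IN (bp.RowDesc, bp.RowDesc - 1, bp.RowDesc + 1)
GROUP BY bp.Grp1, bp.Grp2;

-- Median is 20 for group (1, 1) and 8, i.e. (7 + 9) / 2, for group (1, 2).
SELECT Grp1, Grp2, MedianMeasure FROM @medians;
```

Because the column names are fixed in the query text, a generic version that takes column names as arguments would need dynamic SQL in a stored procedure rather than a function.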
Related
I'd like to group my table by [ID] while using SUM, and also bring back the
[Product_Name] of the top ROW_NUMBER. I'm not sure whether I should use ROW_NUMBER, GROUPING SETS, or loop through everything with FETCH. This is what I tried:
DECLARE @SampleTable TABLE
(
[ID] INT,
[Price] MONEY,
[Product_Name] VARCHAR(50)
)
INSERT INTO @SampleTable
VALUES (1, 100, 'Product_1'), (1, 200, 'Product_2'),
(1, 300, 'Product_3'), (2, 500, 'Product_4'),
(2, 200, 'Product_5'), (2, 300, 'Product_6');
SELECT
[ID],
[Product_Name],
[Price],
SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total],
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM
@SampleTable T1
My desired results - only two records:
1 Product_1 100.00 600.00 1
2 Product_4 500.00 1000.00 1
Any help or guidance is highly appreciated.
UPDATE:
I ended up using what Prateek Sharma suggested in his comment: simply wrap the query in another SELECT ... WHERE [Row_Number] = 1.
SELECT * FROM
(
SELECT
[ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
,ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM @SampleTable
) MultipleRows
WHERE [Row_Number] = 1
You need a column to ORDER BY in ROW_NUMBER(). In this case, if you only want to rely on the table's own index, then it's OK to use the ID column in the ORDER BY.
Hence your query is correct and you can go with it.
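As an illustration of choosing an explicit ordering column, here is the same wrap-and-filter query under the assumption that "top" means the alphabetically first Product_Name within each ID. That assumption happens to reproduce the desired two rows for this sample data; it is only a stand-in for whatever column really defines "first" in your table.

```sql
DECLARE @SampleTable TABLE (ID int, Price money, Product_Name varchar(50));
INSERT INTO @SampleTable (ID, Price, Product_Name)
VALUES (1, 100, 'Product_1'), (1, 200, 'Product_2'), (1, 300, 'Product_3'),
       (2, 500, 'Product_4'), (2, 200, 'Product_5'), (2, 300, 'Product_6');

SELECT ID, Product_Name, Price, Price_Total
FROM (
    SELECT ID, Product_Name, Price,
           SUM(Price) OVER (PARTITION BY ID) AS Price_Total,
           -- Assumed ordering column: replace Product_Name with whatever defines "first".
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Product_Name) AS rn
    FROM @SampleTable
) AS MultipleRows
WHERE rn = 1;
```

Ordering by ID inside PARTITION BY ID gives every row in a partition the same sort key, so which row gets number 1 is not guaranteed; a distinct ordering column removes that ambiguity.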
Another option is the WITH TIES clause. But again, if you use WITH TIES with the ORDER BY on the ID column, performance will be very poor. WITH TIES only performs well when there is a well-defined index and you use that indexed column in the ORDER BY.
SELECT TOP 1 WITH TIES *
FROM (
SELECT [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM @SampleTable
) TAB
ORDER BY ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY <IndexedColumn> DESC)
This query may help you a bit. But remember, it will not perform any better than the query you wrote; it only reduces the number of lines of code.
One option is using the WITH TIES clause, with no extra RN field.
Hopefully, you have a proper sequence number or date which can be used in either the SUM() OVER or the final ROW_NUMBER() OVER.
Example
SELECT Top 1 with ties *
From (
Select [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM @SampleTable T1
) A
Order By ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [Price_Total] Desc)
Returns
ID Product_Name Price Price_Total
1 Product_1 100.00 600.00
2 Product_4 500.00 1000.00
There is no "top ROW_NUMBER" unless you have a column that defines the ordering.
If you just want an arbitrary row per ID, you can use the query below. To pick one deterministically, you would need to order by deterministic, unique criteria.
DECLARE @SampleTable TABLE
(
ID INT,
Price MONEY,
Product_Name VARCHAR(50),
INDEX cix CLUSTERED (ID)
);
INSERT INTO @SampleTable
VALUES (1,100,'Product_1'),
(1,200,'Product_2'),
(1,300,'Product_3'),
(2,500,'Product_4'),
(2,200,'Product_5'),
(2,300,'Product_6');
WITH T AS
(
SELECT *,
OrderingColumn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM @SampleTable
)
SELECT ID,
SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Product_Name)), 11, 50) AS Product_Name,
CAST(SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Price)), 11, 50) AS MONEY) AS Price,
SUM(Price) AS Price_Total
FROM T
GROUP BY ID
The plan for this is pretty efficient as it is able to use the index ordered by id and has no additional sorts, spools, or passes through the table.
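The SUBSTRING(..., 11, 50) calls work because STR() returns a right-justified string 10 characters wide by default, so the ordering number always fills positions 1 to 10 and compares correctly as text. A small demonstration (the numbers 42, 10 and 2 and the product names are illustrative values only):

```sql
-- STR() pads the number to width 10, so the prefix always occupies positions 1-10.
DECLARE @prefix   varchar(10) = STR(42);
DECLARE @packed   varchar(60) = CONCAT(@prefix, 'Product_1');
DECLARE @unpacked varchar(50) = SUBSTRING(@packed, 11, 50);   -- drop the 10-char prefix

-- MIN over the packed strings picks the lowest number (2 beats 10 because the
-- padding space sorts before '1'), then SUBSTRING recovers the payload.
DECLARE @winner varchar(50) =
    (SELECT SUBSTRING(MIN(v), 11, 50)
     FROM (VALUES (CONCAT(STR(10), 'Product_B')),
                  (CONCAT(STR(2),  'Product_A'))) AS x (v));

SELECT DATALENGTH(@prefix) AS PrefixWidth, @unpacked AS Unpacked, @winner AS Winner;
```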
I'm trying to do the following (which obviously doesn't work, because I'm ordering by a column that isn't in the GROUP BY clause), where TransDateString is a varchar column defined as cast(datepart(m,TransDate) as varchar)+'-'+cast(datepart(yyyy,TransDate) as varchar), i.e. derived from the TransDate (date) column.
SELECT c.TransDateString
FROM #dataSet c
GROUP BY c.TransDateString
ORDER BY c.TransDate asc
What I'm trying to accomplish is to order the results by date but return only the formatted string column.
Here's the data and the output I'm looking for:
TransDate | TransDateString
2005-01-01 | 01-2005
2012-15-05 | 05-2012
2003-22-10 | 10-2003
Results:
TransDateString
10-2003
01-2005
05-2012
;With cteRows As
(
SELECT c.TransDateString, c.TransDate,
Row_Number() Over (Partition By c.TransDateString Order By c.TransDate) RowNum
FROM #dataSet c
)
Select TransDateString From cteRows Where RowNum = 1
Order By TransDate
Without CTE:
Select TransDateString From
(
Select c.TransDateString, c.TransDate,
Row_Number() Over (Partition By c.TransDateString Order By c.TransDate) RowNum
FROM #dataSet c
) A
Where RowNum = 1
Order By TransDate
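A simpler alternative that keeps the original GROUP BY is to order by an aggregate of the date, such as MIN(TransDate), which is allowed because it is computed per group. A sketch with a hypothetical @dataSet table variable: the dates are the question's sample read as year-day-month, 2012-05-20 is an extra row added to show two dates collapsing into one group, and the month is zero-padded to match the desired output.

```sql
DECLARE @dataSet TABLE (TransDate date, TransDateString varchar(7));

INSERT INTO @dataSet (TransDate, TransDateString)
SELECT d,
       RIGHT('0' + CAST(MONTH(d) AS varchar(2)), 2) + '-' + CAST(YEAR(d) AS varchar(4))
FROM (VALUES (CAST('20050101' AS date)), ('20120515'), ('20031022'), ('20120520')) AS v (d);

-- ORDER BY may use an aggregate of a column that is not in the GROUP BY.
SELECT c.TransDateString
FROM @dataSet AS c
GROUP BY c.TransDateString
ORDER BY MIN(c.TransDate) ASC;   -- 10-2003, 01-2005, 05-2012
```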
Let's say I have the following table:
id   name   age
     John   23
     Mary   22
     Mike   25
     etc.   etc.
I would like to generate a consecutive number in the id column for every record. Could anyone help me?
Sorry if this has been asked before.
You can use ROW_NUMBER() to add a sequential number:
SELECT
id = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
name,
age
FROM yourTable
Here is an example of how to update the table through a CTE:
DECLARE @t TABLE ( ID INT, Name NVARCHAR(50) )
INSERT INTO @t
VALUES ( 3, 'a' ),
( 5, 'b' ),
( 10, 'c' )
SELECT * FROM @t;
WITH cte
AS ( SELECT ID ,
ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS RN
FROM @t
)
UPDATE cte
SET ID = RN
SELECT * FROM @t
Output (before and after the update):
ID Name
3 a
5 b
10 c
ID Name
1 a
2 b
3 c
Instead of
ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS RN
you can do
ROW_NUMBER() OVER ( ORDER BY Name ) AS RN
to assign the values in order of the Name column.
Alternatively, alter your table to add an identity column, and SQL Server will generate the values:
ALTER TABLE dbo.YourTable
Add id Int Identity(1, 1)
I am trying to select and insert X rows into a temp table on each execution, and I have numbered the rows. However, I tried using PERCENT and TOP and am not getting the desired results.
Note: I am not trying to get rows between certain row numbers. I just need a fixed number of rows, and if fewer rows are available than requested, I should get fewer, but never more.
I assigned the row numbers like this:
WITH GetRecordByIncrement AS
(
SELECT
[RowNumber] = ROW_NUMBER() OVER (ORDER BY UserID ASC)
,[Column1]
,[Column2]
,[Column3]
FROM MySchema.MyTable
)
How would I get, for example, 100 rows or fewer? This does not work using the row number.
Something like this would work:
DECLARE @table TABLE ( c1 INT, c2 INT, c3 INT );
;
WITH MyRows
AS ( SELECT TOP 100
ROW_NUMBER() OVER ( ORDER BY message_id ) AS RowNum ,
message_id ,
severity
FROM sys.messages
)
INSERT INTO @table
( c1 ,
c2 ,
c3
)
SELECT RowNum ,
message_id ,
severity
FROM MyRows;
SELECT *
FROM @table;
Try this:
WITH GetRecordByIncrement AS
(
SELECT
[RowNumber] = ROW_NUMBER() OVER (ORDER BY UserID ASC)
,[Column1]
,[Column2]
,[Column3]
FROM MySchema.MyTable
)
select * from GetRecordByIncrement where [RowNumber]<=desirednumber
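Another option is TOP with a variable, which returns at most that many rows and fewer when fewer exist. A sketch using hypothetical table variables in place of MySchema.MyTable and the target temp table (@MyTable, @target and @batchSize are illustrative names); the ORDER BY on the query makes which rows are chosen deterministic, not just how many:

```sql
-- Hypothetical stand-ins for MySchema.MyTable and the target temp table.
DECLARE @MyTable TABLE (UserID int, Column1 varchar(10));
INSERT INTO @MyTable (UserID, Column1) VALUES (3, 'c'), (1, 'a'), (2, 'b');

DECLARE @batchSize int = 100;   -- upper bound; fewer rows come back if fewer exist
DECLARE @target TABLE (RowNumber bigint, UserID int, Column1 varchar(10));

INSERT INTO @target (RowNumber, UserID, Column1)
SELECT TOP (@batchSize)
       ROW_NUMBER() OVER (ORDER BY UserID ASC), UserID, Column1
FROM @MyTable
ORDER BY UserID ASC;

SELECT RowNumber, UserID, Column1 FROM @target;   -- only 3 rows exist, so 3 are returned
```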
I'm attempting to write a query that will return
The most recent AccountDate with a record of 0 per locationID
Then the second most recent AccountDate per locationID. The record can be either 1 or 0.
If there are two AccountDates with the same date then return the most recent AccountDate based on DateAccountLoaded
However, my solution doesn't look very elegant. Does anyone have a better way of achieving this?
Please see my solution below:
CREATE TABLE [dbo].[TopTwoKeyed](
ID INT IDENTITY(1,1) PRIMARY KEY(ID),
[LocationID] [int] NULL,
[AccountDate] [date] NULL,
[Record] [tinyint] NULL,
[DateAccountLoaded] [date] NULL
)
INSERT INTO [dbo].[TopTwoKeyed] (
[LocationID],
AccountDate,
Record,
DateAccountLoaded
)
VALUES(1,'2009-10-31',0,'2011-03-23'),
(1,'2008-10-31',1,'2011-03-23'),
(1,'2008-10-31',0,'2010-03-22'),
(1,'2008-10-31',1,'2009-03-23'),
(1,'2011-10-31',1,'2010-03-22'),
(1,'2009-10-31',0,'2010-03-23'),
(2,'2011-10-31',0,'2010-03-23'),
(2,'2010-10-31',0,'2010-03-23'),
(2,'2010-10-31',1,'2010-03-23'),
(2,'2010-10-31',1,'2009-03-23'),
(3,'2010-10-31',0,'2010-03-23'),
(3,'2009-10-31',0,'2010-03-23'),
(3,'2008-10-31',1,'2010-03-23')
-- Get the most recent Account Date per locationID which has a record type of 0
SELECT f.LocationID
,f.AccountDate
,f.DateAccountLoaded
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY LocationID ORDER BY AccountDate DESC,DateAccountLoaded DESC) AS RowNumber
,LocationID AS LocationID
,AccountDate AS AccountDate
,DateAccountLoaded AS DateAccountLoaded
FROM [dbo].[TopTwoKeyed]
WHERE Record = 0
) f
WHERE f.RowNumber = 1
UNION ALL
SELECT ff.LocationID
,ff.AccountDate
,ff.DateAccountLoaded
FROM (
-- Get the SECOND most recent AccountDate. Can be either Record 0 or 1.
SELECT ROW_NUMBER() OVER (PARTITION BY LocationID ORDER BY AccountDate DESC,DateAccountLoaded DESC) AS RowNumber
,LocationID AS LocationID
,AccountDate AS AccountDate
,DateAccountLoaded AS DateAccountLoaded
FROM [dbo].[TopTwoKeyed] tt
WHERE EXISTS
(
-- Same query as top of UNION. Get the most recent Account Date per locationID which has a record type of 0
SELECT 1
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY LocationID ORDER BY AccountDate DESC,DateAccountLoaded DESC) AS RowNumber
,LocationID AS LocationID
,AccountDate AS AccountDate
FROM [dbo].[TopTwoKeyed]
WHERE Record = 0
) f
WHERE f.RowNumber = 1
AND tt.LocationID = f.LocationID
AND tt.AccountDate < f.AccountDate
)
) ff
WHERE ff.RowNumber = 1
-- DROP TABLE [dbo].[TopTwoKeyed]
You could use a row_number subquery to find the most recent account date. Then you can outer apply to search for the next most recent account date:
select MostRecent.LocationID
, MostRecent.AccountDate
, SecondRecent.AccountDate
from (
select row_number() over (partition by LocationID order by
AccountDate desc, DateAccountLoaded desc) as rn
, *
from TopTwoKeyed
where Record = 0
) MostRecent
outer apply
(
select top 1 *
from TopTwoKeyed
where Record in (0,1)
and LocationID = MostRecent.LocationID
and AccountDate < MostRecent.AccountDate
order by
AccountDate desc
, DateAccountLoaded desc
) SecondRecent
where MostRecent.rn = 1
EDIT: To place the rows below each other, you probably have to use a UNION. A single ROW_NUMBER can't work because the second row has a different criterion for the Record column.
; with Rec0 as
(
select ROW_NUMBER() over (partition by LocationID
order by AccountDate desc, DateAccountLoaded desc) as rn
, *
from TopTwoKeyed
where Record = 0
)
, Rec01 as
(
select ROW_NUMBER() over (partition by LocationID
order by AccountDate desc, DateAccountLoaded desc) as rn
, *
from TopTwoKeyed t1
where Record in (0,1)
and not exists
(
select *
from Rec0 t2
where t2.rn = 1
and t1.LocationID = t2.LocationID
and t2.AccountDate <= t1.AccountDate -- exclude rows on or after the most recent Record 0 date
)
)
select *
from Rec0
where rn = 1
union all
select *
from Rec01
where rn = 1
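If you'd rather avoid the UNION, one alternative (a different technique from the answer above, offered only as a sketch) is to keep the OUTER APPLY query and unpivot its two dates into separate rows with CROSS APPLY (VALUES ...). It runs against a table variable copy of the question's sample data and assumes "second most recent" means the latest AccountDate strictly earlier than the most recent Record = 0 date, as in the question's own query.

```sql
DECLARE @t TABLE (ID int IDENTITY(1, 1) PRIMARY KEY, LocationID int, AccountDate date,
                  Record tinyint, DateAccountLoaded date);
INSERT INTO @t (LocationID, AccountDate, Record, DateAccountLoaded)
VALUES (1, '2009-10-31', 0, '2011-03-23'), (1, '2008-10-31', 1, '2011-03-23'),
       (1, '2008-10-31', 0, '2010-03-22'), (1, '2008-10-31', 1, '2009-03-23'),
       (1, '2011-10-31', 1, '2010-03-22'), (1, '2009-10-31', 0, '2010-03-23'),
       (2, '2011-10-31', 0, '2010-03-23'), (2, '2010-10-31', 0, '2010-03-23'),
       (2, '2010-10-31', 1, '2010-03-23'), (2, '2010-10-31', 1, '2009-03-23'),
       (3, '2010-10-31', 0, '2010-03-23'), (3, '2009-10-31', 0, '2010-03-23'),
       (3, '2008-10-31', 1, '2010-03-23');

DECLARE @result TABLE (LocationID int, Seq int, AccountDate date, DateAccountLoaded date);

INSERT INTO @result (LocationID, Seq, AccountDate, DateAccountLoaded)
SELECT MostRecent.LocationID, r.Seq, r.AccountDate, r.DateAccountLoaded
FROM (
    SELECT LocationID, AccountDate, DateAccountLoaded,
           ROW_NUMBER() OVER (PARTITION BY LocationID
                              ORDER BY AccountDate DESC, DateAccountLoaded DESC) AS rn
    FROM @t
    WHERE Record = 0
) AS MostRecent
OUTER APPLY (   -- latest row (Record 0 or 1) strictly before the most recent Record = 0 date
    SELECT TOP (1) t2.AccountDate, t2.DateAccountLoaded
    FROM @t AS t2
    WHERE t2.LocationID = MostRecent.LocationID
      AND t2.AccountDate < MostRecent.AccountDate
    ORDER BY t2.AccountDate DESC, t2.DateAccountLoaded DESC
) AS SecondRecent
CROSS APPLY (VALUES (1, MostRecent.AccountDate,   MostRecent.DateAccountLoaded),
                    (2, SecondRecent.AccountDate, SecondRecent.DateAccountLoaded)
) AS r (Seq, AccountDate, DateAccountLoaded)
WHERE MostRecent.rn = 1
  AND r.AccountDate IS NOT NULL;   -- drop row 2 when no earlier date exists

SELECT LocationID, AccountDate, DateAccountLoaded
FROM @result
ORDER BY LocationID, Seq;
```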