How does ROW_NUMBER behave with insertions? - sql-server

I have multiple UNIONed statements in MS SQL Server, and it is very hard to find a unique column in the result.
I need a unique value for each row, so I've used the ROW_NUMBER() function.
This result set is copied to another place (actually a SOLR index).
The next time I run the same query, I need to pick up only the newly added rows.
So I need to confirm that the newly added rows will be numbered after the last ROW_NUMBER value from the previous run.
In other words, does the ROW_NUMBER function order the results by insertion order, assuming I don't add an ORDER BY clause?
If not (as I suspect), are there any alternatives?
Thanks.

Without seeing the SQL I can only give the general answer: SQL Server does not guarantee the order of rows returned by a SELECT statement without an ORDER BY clause, so the ROW_NUMBER values may not follow insertion order.

I guess you can do something like this..
;WITH
cte
AS
(
SELECT * , rn = ROW_NUMBER() OVER (ORDER BY SomeColumn)
FROM
(
/* Your Union Queries here*/
)q
)
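INSERT INTO Destination_Table
SELECT CTE.* -- only the CTE columns; SELECT * would also return the joined Destination_Table columns
FROM CTE
LEFT JOIN Destination_Table
ON CTE.Refrencing_Column = Destination_Table.Refrencing_Column
WHERE Destination_Table.Refrencing_Column IS NULL -- keep only rows not yet in the destination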

I would suggest you consider 'timestamping' the row with the time it was inserted. Or adding an identity column to the table.
But what it sounds like you want to do is get current max id and then add the row_number to it.
Select col1, col2, mid + row_number() over(order by smt) id
From (
Select col1, col2, (select max(id) from tbl) mid
From query
) t
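The identity/timestamp suggestion above could look roughly like the sketch below. It assumes the union results are first materialized into a staging table; #Staging, col1, inserted_at and @last_id are made-up names for illustration, not anything from the question. Run it as a single batch so the variable stays in scope.

```sql
-- Hypothetical staging table: the IDENTITY column gives every row a value
-- that only grows as rows are inserted, unlike ROW_NUMBER(), which is
-- recomputed from scratch on every run of the query.
CREATE TABLE #Staging (
    id          int IDENTITY(1,1) PRIMARY KEY,
    col1        varchar(50),
    inserted_at datetime NOT NULL DEFAULT GETDATE()
);

INSERT INTO #Staging (col1) VALUES ('a'), ('b');

-- First export: remember the highest id that was sent to the index.
DECLARE @last_id int;
SELECT @last_id = MAX(id) FROM #Staging;

-- Later, new rows arrive.
INSERT INTO #Staging (col1) VALUES ('c');

-- Next export: only rows added since the previous run.
SELECT id, col1, inserted_at
FROM   #Staging
WHERE  id > @last_id
ORDER BY id;
```

Identity values can have gaps, which should not matter for a "greater than the last value seen" check like this one.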

Related

Return column with varying values depending on change points

I'm fairly new to Microsoft SQL Server, so maybe this is very simple yet I just don't have the experience to pull from.
The data I have is similar to the first three columns shown (A, B, C). I want to use those columns to return the data in the yellow highlighted column (D). Basically, I'm trying to show all values of a variable from the current week onward, including when there are change points of the variable. The value of the variable should continue forward in time until the value of the variable changes (column C).
Thanks in advance.
SELECT T1.*, COALESCE(SQ.NewValue, T1.StartingValue) AS Result FROM YourTable T1
OUTER APPLY (SELECT TOP 1 T2.NewValue FROM YourTable T2
WHERE T2.Week <= T1.Week AND -- look back from the current week
T2.NewValue IS NOT NULL
ORDER BY T2.Week DESC) SQ -- most recent change point first
One way is to make Column D a correlated sub-query that gets the most recent previous value of C that is not NULL.
One method, which doesn't need 2 table scans is to use a CTE to create a "group number" and then the OVER clause with a MAX:
WITH VTE AS (
SELECT *
FROM (VALUES(1,0.5,NULL),
(2,0.5,1),
(3,0.5,NULL),
(4,0.5,NULL),
(5,0.5,0.8),
(6,0.5,NULL)) V(WeekNo, Starting, New)),
CTE AS(
SELECT *,
COUNT(New) OVER (ORDER BY WeekNo ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM VTE)
SELECT WeekNo, Starting, New,
ISNULL(MAX(New) OVER (PARTITION BY CTE.Grp),Starting) AS Result
FROM CTE
ORDER BY WeekNo;

SQL Server - Delete Duplicate Rows - how does Partition By affect this query?

I've been using the following inherited query where I'm trying to delete duplicate rows and I'm getting some unexpected results when first running it as a SELECT - I believe it has something to do with my lack of understanding of the Partition part of the statement:
WITH CTE AS(
SELECT [Id],
[Url],
[Identifier],
[Name],
[Entity],
[DOB],
RN = ROW_NUMBER()OVER(PARTITION BY Name ORDER BY Name)
FROM Data.Statistics
where Id = 2170
)
DELETE FROM CTE WHERE RN > 1
Can someone help me understand exactly what I'm doing with the Partition BY Name part of this? This doesn't limit the query in any way to only looking for duplicates in the Name field, correct? I need to ensure that it's looking for records where all 5 of the fields inside the CTE definition are the same for a record to be considered a duplicate.
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) doesn't make a lot of sense. You wouldn't ORDER BY the same thing you used in PARTITION BY since it will be the same value for everything in the partition, making the ORDER BY part useless.
Basically the CTE part of this query is saying to split the matching rows (those with [Id] = 2170) temporarily into groups for each distinct name, and within each group of rows with the same name, order those by name (which are obviously all the same value) and then return the row number within that sequence group as RN. Unique names will all have a row number of 1, because there is only one row with that name. Duplicate names will have row numbers 1, 2, 3, and so on. The order of those rows is undefined in this case because of the silly ORDER BY clause, but if you changed the ORDER BY to something meaningful, the row numbers would follow that sequence.
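If a duplicate should mean "all of the listed fields match", one option is to put all of those columns in the PARTITION BY. The sketch below assumes the five fields are Url, Identifier, Name, Entity and DOB; #Stats and its sample rows are hypothetical stand-ins for the real table.

```sql
-- Hypothetical sample data; #Stats stands in for the real table.
CREATE TABLE #Stats (
    Id         int,
    Url        varchar(100),
    Identifier varchar(50),
    Name       varchar(50),
    Entity     varchar(50),
    DOB        date
);

INSERT INTO #Stats VALUES
    (2170, 'http://a', 'X1', 'Ann', 'E1', '1980-01-01'),
    (2170, 'http://a', 'X1', 'Ann', 'E1', '1980-01-01'),  -- exact duplicate
    (2170, 'http://b', 'X2', 'Ann', 'E1', '1980-01-01');  -- same Name, different Url

-- Rows only share a partition when every listed column matches,
-- so RN > 1 marks true duplicates rather than repeated names.
WITH CTE AS (
    SELECT RN = ROW_NUMBER() OVER (
                    PARTITION BY Url, Identifier, Name, Entity, DOB
                    ORDER BY Id)
    FROM #Stats
    WHERE Id = 2170
)
DELETE FROM CTE WHERE RN > 1;

SELECT * FROM #Stats;
```

With this sample data two rows remain. With PARTITION BY Name alone, all three rows would fall in one partition and two of them would be deleted.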

SQL Server random rows on where clause

I need to select 10 random rows from a table, but it has to be done in the WHERE clause, because the query is executed by another application that only allows this part to be modified.
I searched for a lot of solutions (SELECT TOP 10, RAND(), ORDER BY NEWID(), ...), but none work in the WHERE clause.
Is there an option to do that, or some kind of workaround?
Try this:
SELECT *
FROM Test
WHERE Id IN (SELECT TOP 10 Id FROM Test ORDER BY NewId())
If your table has a unique column you can do something like :
SELECT * FROM TABLE WHERE PRIMARYCOLUMN IN (SELECT TOP(10) PRIMARYCOLUMN FROM TABLE ORDER BY NEWID())

SELECT INTO query

I have to write a SELECT INTO T-SQL script for a table which has the columns acc_number, history_number and note.
How do I generate an incrementing value of history_number for each record inserted via SELECT INTO?
Note that the starting value of history_number is different for each account and comes from a different table.
SELECT history_number = IDENTITY(INT,1,1),
... etc...
INTO NewTable
FROM ExistingTable
WHERE ...
You could use ROW_NUMBER instead of identity i.e. ROW_NUMBER() OVER (ORDER BY )
SELECT acc_number
,o.historynumber
,note
,o.historynumber+DENSE_RANK() OVER (Partition By acc_number ORDER BY Note) AS NewHistoryNumber
--Or some other order by probably a timestamp...
FROM Table t
INNER JOIN OtherTable o
ON ....
Working Fiddle
This will give you an incremented count starting from the history number for each acc_number. I suggest you use a better ORDER BY in the rank, but there was not enough info in the question.
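A self-contained sketch of the same idea, under some assumptions: @Notes, @LastHistory and created_at are hypothetical names, and ROW_NUMBER is used in place of DENSE_RANK because DENSE_RANK gives tied rows the same number.

```sql
-- Hypothetical sample data: @Notes holds the rows to insert,
-- @LastHistory holds the current highest history_number per account.
DECLARE @Notes TABLE (acc_number int, note varchar(50), created_at datetime);
DECLARE @LastHistory TABLE (acc_number int, history_number int);

INSERT INTO @Notes VALUES
    (1, 'first',  '2020-01-01'),
    (1, 'second', '2020-01-02'),
    (2, 'only',   '2020-01-01');
INSERT INTO @LastHistory VALUES (1, 10), (2, 40);

-- Numbering restarts for each account and is added to that account's base value.
SELECT n.acc_number,
       h.history_number
         + ROW_NUMBER() OVER (PARTITION BY n.acc_number
                              ORDER BY n.created_at) AS history_number,
       n.note
INTO   #NewTable
FROM   @Notes AS n
INNER JOIN @LastHistory AS h
        ON h.acc_number = n.acc_number;

SELECT * FROM #NewTable ORDER BY acc_number, history_number;
```

With this sample data, account 1 gets history numbers 11 and 12, and account 2 gets 41.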
This answer to this question may help you as well
Question
Suppose your SELECT statement is like this
SELECT acc_number,
history_number,
note
FROM [Table]
Try this Query as below.
SELECT ROW_NUMBER() OVER (ORDER BY acc_number) ID,
acc_number,
history_number,
note
INTO [NewTable]
FROM [Table]

SQL Error with Order By in Subquery

I'm working with SQL Server 2005.
My query is:
SELECT (
SELECT COUNT(1) FROM Seanslar WHERE MONTH(tarihi) = 4
GROUP BY refKlinik_id
ORDER BY refKlinik_id
) as dorduncuay
And the error:
The ORDER BY clause is invalid in views, inline functions, derived
tables, subqueries, and common table expressions, unless TOP or FOR
XML is also specified.
How can I use ORDER BY in a sub query?
This is the error you get (emphasis mine):
The ORDER BY clause is invalid in
views, inline functions, derived
tables, subqueries, and common table
expressions, unless TOP or FOR XML is
also specified.
So, how can you avoid the error? By specifying TOP, would be one possibility, I guess.
SELECT (
SELECT TOP 100 PERCENT
COUNT(1) FROM Seanslar WHERE MONTH(tarihi) = 4
GROUP BY refKlinik_id
ORDER BY refKlinik_id
) as dorduncuay
If you're working with SQL Server 2012 or later, this is now easy to fix. Add an offset 0 rows:
SELECT (
SELECT
COUNT(1) FROM Seanslar WHERE MONTH(tarihi) = 4
GROUP BY refKlinik_id
ORDER BY refKlinik_id OFFSET 0 ROWS
) as dorduncuay
Besides the fact that order by doesn't seem to make sense in your query....
To use order by in a sub select you will need to use TOP 2147483647.
SELECT (
SELECT TOP 2147483647
COUNT(1) FROM Seanslar WHERE MONTH(tarihi) = 4
GROUP BY refKlinik_id
ORDER BY refKlinik_id
) as dorduncuay
My understanding is that "TOP 100 PERCENT" doesn't guarantee ordering anymore starting with SQL Server 2005:
In SQL Server 2005, the ORDER BY
clause in a view definition is used
only to determine the rows that are
returned by the TOP clause. The ORDER
BY clause does not guarantee ordered
results when the view is queried,
unless ORDER BY is also specified in
the query itself.
See SQL Server 2005 breaking changes
Hope this helps,
Patrick
If building a temp table, move the ORDER BY clause from inside the temp table code block to the outside.
Not allowed:
SELECT * FROM (
SELECT A FROM Y
ORDER BY Y.A
) X;
Allowed:
SELECT * FROM (
SELECT A FROM Y
) X
ORDER BY X.A;
You don't need ORDER BY in your subquery. Move it out into the main query, and include the column you want to order by in the subquery.
However, your query is just returning a count, so I don't see the point of the ORDER BY.
A subquery (nested view) as you have it returns a dataset that you can then order in your calling query. Ordering the subquery itself will make no (reliable) difference to the order of the results in your calling query.
As for your SQL itself:
a) I see no reason for an ORDER BY as you are returning a single value.
b) I see no reason for the sub query anyway as you are only returning a single value.
I'm guessing there is a lot more information here that you might want to tell us in order to fix the problem you have.
Add the Top command to your sub query...
SELECT
(
SELECT TOP 100 PERCENT
COUNT(1)
FROM
Seanslar
WHERE
MONTH(tarihi) = 4
GROUP BY
refKlinik_id
ORDER BY
refKlinik_id
) as dorduncuay
:)
maybe this trick will help somebody
SELECT
[id],
[code],
[created_at]
FROM
( SELECT
[id],
[code],
[created_at],
(ROW_NUMBER() OVER (
ORDER BY
created_at DESC)) AS Row
FROM
[Code_tbl]
WHERE
[created_at] BETWEEN '2009-11-17 00:00:01' AND '2010-11-17 23:59:59'
) Rows
WHERE
Row BETWEEN 10 AND 20;
Here the inner subquery is ordered by the created_at field (it could be any column from your table).
In this example ordering adds no information - the COUNT of a set is the same whatever order it is in!
If you were selecting something that did depend on order, you would need to do one of the things the error message tells you - use TOP or FOR XML
Try moving the ORDER BY clause outside the subselect and adding the ORDER BY column to the subselect:
SELECT * FROM
(SELECT COUNT(1) AS cnt, refKlinik_id FROM Seanslar WHERE MONTH(tarihi) = 4 GROUP BY refKlinik_id) -- every derived-table column needs a name; cnt is an arbitrary alias
as dorduncuay
ORDER BY refKlinik_id
For me this solution works fine as well:
SELECT tbl.a, tbl.b
FROM (SELECT TOP (select count(1) FROM yourtable) a,b FROM yourtable order by a) tbl
Good day.
Some people question whether ORDER BY in a subquery is ever needed.
It is required if you need to delete records based on some sort order, for example:
delete from someTable where ID in (select top(1) ID from someTable where condition order by insertionstamp desc)
This deletes the most recently inserted row from the table.
There are actually three ways to do this kind of deletion, and ORDER BY in a subquery is useful in many other cases as well.
For the deletion methods that use ORDER BY in a subquery, see the link below:
http://web.archive.org/web/20100212155407/http://blogs.msdn.com/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx
I hope it helps. Thank you all.
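One way to write such an ordered delete is to delete through a CTE that selects TOP ... ORDER BY. This is only a sketch: #someTable, payload and the sample rows are hypothetical.

```sql
-- Hypothetical sample table with an insertion timestamp.
CREATE TABLE #someTable (
    ID             int IDENTITY(1,1),
    payload        varchar(20),
    insertionstamp datetime
);

INSERT INTO #someTable (payload, insertionstamp) VALUES
    ('old',    '2020-01-01'),
    ('newer',  '2020-02-01'),
    ('newest', '2020-03-01');

-- TOP together with ORDER BY is allowed inside the CTE, and deleting
-- from the CTE removes exactly the rows it selects.
WITH latest AS (
    SELECT TOP (1) *
    FROM #someTable
    ORDER BY insertionstamp DESC
)
DELETE FROM latest;

SELECT ID, payload FROM #someTable ORDER BY ID;
```

After the delete, only the 'old' and 'newer' rows remain.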
For a simple count like the OP is showing, the ORDER BY isn't strictly needed. If they are using the result of the subquery, it may be. I am working on a similar issue and got the same error in the following query:
-- I want the rows from the cost table with an updateddate equal to the max updateddate:
SELECT * FROM #Costs Costs -- alias must match the Costs.* references below
INNER JOIN
(
SELECT Entityname, costtype, MAX(updatedtime) MaxUpdatedTime
FROM #HoldCosts cost
GROUP BY Entityname, costtype
ORDER BY Entityname, costtype -- *** This causes an error***
) CostsMax
ON Costs.Entityname = CostsMax.entityname
AND Costs.Costtype = CostsMax.Costtype
AND Costs.UpdatedTime = CostsMax.MaxUpdatedtime
ORDER BY Costs.Entityname, Costs.costtype
-- *** To accomplish this, there are a few options:
-- Add an extraneous TOP clause, This seems like a bit of a hack:
SELECT * FROM #Costs Costs
INNER JOIN
(
SELECT TOP 99.999999 PERCENT Entityname, costtype, MAX(updatedtime) MaxUpdatedTime
FROM #HoldCosts cost
GROUP BY Entityname, costtype
ORDER BY Entityname, costtype
) CostsMax
ON Costs.Entityname = CostsMax.entityname
AND Costs.Costtype = CostsMax.Costtype
AND Costs.UpdatedTime = CostsMax.MaxUpdatedtime
ORDER BY Costs.Entityname, Costs.costtype
-- **** Create a temp table to order the maxCost
SELECT Entityname, costtype, MAX(updatedtime) MaxUpdatedTime
INTO #MaxCost
FROM #HoldCosts cost
GROUP BY Entityname, costtype
ORDER BY Entityname, costtype
SELECT * FROM #Costs Costs
INNER JOIN #MaxCost CostsMax
ON Costs.Entityname = CostsMax.entityname
AND Costs.Costtype = CostsMax.Costtype
AND Costs.UpdatedTime = CostsMax.MaxUpdatedtime
ORDER BY Costs.Entityname, costs.costtype
Other possible workarounds could be CTE's or table variables. But each situation requires you to determine what works best for you. I tend to look first towards a temp table. To me, it is clear and straightforward. YMMV.
One possible need to order a subquery is when you have a UNION:
Say you generate a phone book of all teachers and students.
SELECT name, phone FROM teachers
UNION
SELECT name, phone FROM students
You want to display it with all teachers first, followed by all students, each group ordered by name. So you can't simply apply a global ORDER BY on name.
One solution is to include a key to force a first sort level, and then order by name:
SELECT name, phone, 1 AS orderkey FROM teachers
UNION
SELECT name, phone, 2 AS orderkey FROM students
ORDER BY orderkey, name
I think it's much clearer than faking an offset on the subquery result.
I use this code to get the second-highest salary.
I was also getting an error like:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
I used TOP 100 to avoid the error:
select * from (
select tbl.Coloumn1, ROW_NUMBER() OVER (ORDER BY tbl.Coloumn1 DESC) AS Rowno from ( -- number rows explicitly by salary; the subquery's ORDER BY alone does not guarantee order
select top 100 * from Table1
order by Coloumn1 desc) as tbl) as tbl where tbl.Rowno=2
