table1 has a huge number of rows and field1 is a text column, so I created a full-text index on field1.
When I run the SQL below it is very slow, and the CPU sits at 100% the whole time.
select * from (
select *, ROW_NUMBER() OVER(Order by [createtime] DESC) AS RowId
from table1
where CONTAINS(field1, 'sometext')
) AS t1
where t1.RowId between 1 and 10
If I remove the RowId filter, it becomes fast: less than 1 second.
select * from (
select *, ROW_NUMBER() OVER(Order by [createtime] DESC) AS RowId
from table1
where CONTAINS(field1, 'sometext')
) AS t1
Then I thought it was a SQL Server optimization issue, so I tried adding zero to the RowId filter.
It runs fast! But why?
select * from (
select *, ROW_NUMBER() OVER(Order by [createtime] DESC) AS RowId
from table1
where CONTAINS(field1, 'sometext')
) AS t1
where t1.RowId + 0 between 1 and 10
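For comparison, since the goal is the first 10 matches by [createtime], the paging can also be written with OFFSET/FETCH instead of filtering on ROW_NUMBER (a sketch, assuming SQL Server 2012 or later and the same table and columns as above):
select *
from table1
where CONTAINS(field1, 'sometext')
order by [createtime] DESC
offset 0 rows fetch next 10 rows only; -- rows 1-10; increase the offset for later pages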
I am attempting to randomly join the rows of two tables (TableA and TableB) such that each row in TableA is joined to exactly one row in TableB and every row in TableB is joined to at least one row in TableA.
For example, a random join of TableA with 5 distinct rows and TableB with 3 distinct rows should result in something like this:
TableA TableB
1 3
2 1
3 1
4 2
5 1
However, sometimes not all of the rows from TableB are included in the final result; in the example above, row 2 from TableB might be missing because either row 1 or row 3 is joined to row 4 of TableA in its place. You can see this happen by executing the script a number of times and checking the result. For some reason it seems to be necessary to use an interim table (@Q) to ensure that a correct result is returned with all rows from both TableA and TableB.
Can someone please explain why this is happening?
Also, can someone please advise on what would be a better way to get the desired result?
I understand that sometimes no result is returned at all, due to a failure of some kind in the cross apply and ordering that I have yet to identify; which only reinforces my feeling that there is a better way to perform this operation. I hope that makes sense. Thanks in advance!
declare @TableA table (
ID int
);
declare @TableB table (
ID int
);
declare @Q table (
RN int,
TableAID int,
TableBID int
);
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 5
)
insert @TableA (ID)
select ID from cte;
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 3
)
insert @TableB (ID)
select ID from cte;
select * from @TableA;
select * from @TableB;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from @TableA as a
cross apply @TableB as b
) as M
)
select --All rows from TableB not always included
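-- (a CTE is re-evaluated for each reference to it, and newid() is
-- non-deterministic, so this query and the subquery below can see two
-- different row numberings; materializing into @Q freezes one numbering)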
TableAID,
TableBID
from cte
where RN in (
select
top 1
iCTE.RN
from cte as iCTE
group by iCTE.RN
having count(distinct iCTE.TableBID) = (
select count(1) from @TableB
)
)
order by TableAID;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from @TableA as a
cross apply @TableB as b
) as M
)
insert @Q
select
RN,
TableAID,
TableBID
from cte;
select * from @Q;
select --All rows from both TableA and TableB included
TableAID,
TableBID
from @Q
where RN in (
select
top 1
iQ.RN
from @Q as iQ
group by iQ.RN
having count(distinct iQ.TableBID) = (
select count(1) from @TableB
)
)
order by TableAID;
See if this gives you what you're looking for...
DECLARE
@CountA INT = (SELECT COUNT(*) FROM @TableA ta),
@CountB INT = (SELECT COUNT(*) FROM @TableB tb),
@MinCount INT;
SELECT @MinCount = CASE WHEN @CountA < @CountB THEN @CountA ELSE @CountB END;
WITH
cte_A1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
@TableA ta
),
cte_B1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
@TableB tb
),
cte_A2 AS (
SELECT
a1.ID,
rn = CASE WHEN a1.rn > @MinCount THEN a1.rn - @MinCount ELSE a1.rn END
FROM
cte_A1 a1
),
cte_B2 AS (
SELECT
b1.ID,
rn = CASE WHEN b1.rn > @MinCount THEN b1.rn - @MinCount ELSE b1.rn END
FROM
cte_B1 b1
)
SELECT
A = a.ID,
B = b.ID
FROM
cte_A2 a
JOIN cte_B2 b
ON a.rn = b.rn;
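One caveat with the CASE approach: it folds the row numbers back only once, so it assumes neither table has more than twice as many rows as the other. A modulo-based variant removes that limit (a sketch reusing the same @TableA/@TableB variables; @CountB2 is a new name to avoid clashing with the @CountB declared above, and it assumes @TableA has at least as many rows as @TableB, as in the example):
DECLARE @CountB2 INT = (SELECT COUNT(*) FROM @TableB);
WITH A AS (SELECT ID, rn = ROW_NUMBER() OVER (ORDER BY NEWID()) FROM @TableA),
B AS (SELECT ID, rn = ROW_NUMBER() OVER (ORDER BY NEWID()) FROM @TableB)
SELECT a.ID AS TableAID, b.ID AS TableBID
FROM A AS a
JOIN B AS b
-- wrap A's row numbers modulo B's count, so every B row is hit at least once
ON b.rn = ((a.rn - 1) % @CountB2) + 1
ORDER BY a.ID;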
I have a large set of measurements, taken every millisecond, stored in a SQL Server 2012 table. Whenever there are 3 or more consecutive duplicate values, I would like to delete the middle duplicates. The highlighted values in this image of sample data are the ones I want to delete. Is there a way to do this with a SQL query?
You can do this using a CTE and ROW_NUMBER:
WITH CteGroup AS(
SELECT *,
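-- the difference between the global and the per-value row numbers is
-- constant within each run ("island") of consecutive identical values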
grp = ROW_NUMBER() OVER(ORDER BY MS) - ROW_NUMBER() OVER(PARTITION BY Value ORDER BY MS)
FROM YourTable
),
CteFinal AS(
SELECT *,
RN_FIRST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS),
RN_LAST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS DESC)
FROM CteGroup
)
DELETE
FROM CteFinal
WHERE
RN_FIRST > 1
AND RN_LAST > 1
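Note that deleting through the CTE deletes the corresponding rows from YourTable itself: a CTE built over a single table is updatable, so the DELETE targets the base table.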
I'm sure there must be a more efficient way to do this, but you could join the table to itself twice to find the previous and next value in the list, and then delete all of the entries where all three values are the same.
DELETE FROM tbl
WHERE ms IN
(
SELECT T.ms
FROM tbl T
INNER JOIN tbl T1 ON T.ms = T1.ms + 1
INNER JOIN tbl T2 ON T.ms = T2.ms - 1
WHERE T.value = T1.value AND T.value = T2.value
)
If the table is really big, I can see this blowing up tempdb, though.
Yes there is:
select Value from YourTable group by Value
I have one table with 500 rows, and another table with 750 rows or so. I'm getting a random 500 rows of a certain column from the second table, and I want to update a newly added column in the first table with those 500 values.
I know how to do updates that look like this:
UPDATE schema.table1
SET column = cl.column FROM schema.table1 cl
INNER JOIN table2 cf ON cf.column = cl.column
but I don't have any columns that match in both tables. Is there a way to do this without having to match columns in the inner join?
So basically, I want to update 500 rows of one column in one table with 500 values coming from another table.
You can do it by using ROW_NUMBER to generate a column to join the two tables on. Take a look at the example and the output.
DECLARE @T1 TABLE (column1 INT, column2 VARCHAR(2))
DECLARE @T2 TABLE (column1 VARCHAR(2))
INSERT INTO @T1 (column1, column2)
VALUES (0, 'A'), (1, 'B'), (2, 'C')
INSERT INTO @T2 (column1)
VALUES ('D'), ('F'), ('G')
SELECT *, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT NULL)) AS RN FROM @T1
SELECT *, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT NULL)) AS RN FROM @T2
;WITH CTE_1 AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT NULL)) AS RN FROM @T1)
,cte_2 AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT NULL)) AS RN FROM @T2)
UPDATE t1
SET t1.column2 = t2.column1
FROM CTE_1 t1
JOIN cte_2 t2
ON t1.RN = t2.RN
SELECT *, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT NULL)) AS RN FROM @T1
SELECT *, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT NULL)) AS RN FROM @T2
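Since the question asks for a random 500 rows of the second table, one small variation on the example (a sketch; same @T1/@T2 setup as above) is to number the second table by NEWID() so the pairing is shuffled:
;WITH CTE_1 AS (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN FROM @T1)
,cte_2 AS (SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN FROM @T2) -- NEWID() randomizes the order
UPDATE t1
SET t1.column2 = t2.column1
FROM CTE_1 t1
JOIN cte_2 t2
ON t1.RN = t2.RN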
In a table I have more than 700,000 records. When I run this query it takes more than 3 minutes to fetch the rows, and it returns 390 records based on rowNum. Is there a way to optimize this query?
SELECT ID, Lat, Long, SDateTime,
row_number() OVER (partition BY [ID] ORDER BY [SDateTime] DESC) AS rowNum
into #temp
FROM
dbo.myTable WITH (NOLOCK)
select * from #temp where rowNum = 1 -- returns 390 records
drop table #temp
Can I select the data in one query, without putting it in a temp table? Something like this:
SELECT ID, Lat, Long, SDateTime,
row_number() OVER (partition BY [ID] ORDER BY [SDateTime] DESC) AS rowNum
FROM
dbo.myTable WITH (NOLOCK)
where (row_number() OVER (partition BY [ID] ORDER BY [SDateTime] DESC)) = 1
Try doing this:
select * from
(
SELECT ID, Lat, Long, SDateTime,
row_number() OVER (partition BY [ID] ORDER BY [SDateTime] DESC) AS rowNum
FROM
dbo.myTable WITH (NOLOCK)
) x
where x.rowNum = 1
This should do the job, and it will perform very well with an index on (id, SDateTime):
;with d as (
select distinct id
from myTable
)
select
mt.*
from d
cross apply (
select top 1 *
from myTable m
where m.id = d.id
order by [SDateTime] DESC
) mt
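For reference, the supporting index could look like this (a sketch; the index name is made up):
CREATE NONCLUSTERED INDEX IX_myTable_id_SDateTime
ON dbo.myTable (id, SDateTime DESC);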
When I run the code below, the ROWID is always 1.
I need the ID to start at 1 for each item with the same Credit Value.
;WITH CTETotal AS (SELECT
TranRegion
,TranCustomer
,TranDocNo
,SUM(TranSale) 'CreditValue'
FROM dbo.Transactions
LEFT JOIN customers AS C
ON custregion = tranregion
AND custnumber = trancustomer
LEFT JOIN products AS P
ON prodcode = tranprodcode
GROUP BY
TranRegion
,TranCustomer
,TranDocNo)
SELECT
r.RegionDesc
,suppcodedesc
,t.tranreason as [Reason]
,t.trandocno as [Document Number]
,sum(tranqty) as Qty
,sum(tranmass) as Mass
,sum(transale) as Sale
,cte.CreditValue AS 'Credit Value'
,RANK() OVER (PARTITION BY cte.CreditValue ORDER BY cte.CreditValue) AS ROWID
FROM transactions t
LEFT JOIN dbo.Regions AS r
ON r.RegionCode = TranRegion
LEFT JOIN CTETotal AS cte
ON cte.TranRegion = t.TranRegion
AND cte.TranCustomer = t.TranCustomer
AND cte.TranDocNo = t.TranDocNo
GROUP BY
r.RegionDesc
,suppcodedesc
,t.tranreason
,t.trandocno
,cte.CreditValue
ORDER BY CreditValue ASC
EDIT
All the credit values of 400 must have ROWID set to 1, all the credit values of 200 must have ROWID set to 2, and so on.
Do you need something like this?
with cte (item,CreditValue)
as
(
select 'a',8 as CreditValue union all
select 'b',18 union all
select 'a',8 union all
select 'b',18 union all
select 'a',8
)
select CreditValue, dense_rank() OVER (ORDER BY item) AS ROWID from cte
Result
CreditValue ROWID
----------- --------------------
8 1
8 1
8 1
18 2
18 2
In your code replace
,RANK() OVER (PARTITION BY cte.CreditValue ORDER BY cte.CreditValue) AS ROWID
by
,DENSE_RANK() OVER (ORDER BY cte.CreditValue) AS ROWID
You just don't need PARTITION BY here; DENSE_RANK() OVER (ORDER BY cte.CreditValue) is enough.
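Note that the edit asks for 400 to map to 1 and 200 to 2, i.e. higher values rank first, so you likely want the descending form:
,DENSE_RANK() OVER (ORDER BY cte.CreditValue DESC) AS ROWID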
I think the problem is with the RANK() OVER (PARTITION BY ...) clause:
you have to partition it by item, not by CreditValue.
Try this:
RANK() OVER (PARTITION BY cte.CreditValue ORDER BY r.RegionDesc) AS ROWID
Edit: the issue here isn't actually the nesting of the subquery; it's potentially that the PARTITION BY list contains columns that make each row unique, so every rank comes out as 1.
Rather than ranking within your complex query, like this:
select
rank() over(partition by...),
*
from
data_source
join table1
join table2
join table3
join table4
order by
some_column
Try rank() or row_number() on the resulting data set, not within it.
For example, using the query above, remove rank() and implement it this way:
select
rank() over(partition by...),
results.*
from (
select
*
from
data_source
join table1
join table2
join table3
join table4
) as results
order by
some_column
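To see the "unique partition" point in isolation, here is a toy sketch (made-up values): every (a, b) pair is distinct, so each partition holds a single row and rank() returns 1 everywhere.
select a, b,
rank() over(partition by a, b order by b) as rnk -- always 1: each partition has one row
from (values (1, 10), (2, 20), (3, 30)) as t(a, b);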