Slow SQL query with SQL Server - sql-server

I have two SQL queries to count co-occurrences between id2 values among different id1 values. The sample table looks like this:
id1 | id2
101 | 1
101 | 2
101 | 3
102 | 2
102 | 3
102 | 4
103 | 15
103 | 3
103 | 4
and the desired output is:
A B Count
1 2 1
1 3 2
2 3 4
1 4 2
2 4 3
3 4 4
1 15 1
2 15 2
3 15 2
4 15 1
Both solutions are pasted below.
-- Solution 1
SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
FROM
(SELECT * FROM TestTab) AS bar,
(SELECT * FROM TestTab) AS foo
WHERE bar.id1 <> foo.id1
AND bar.id2 < foo.id2
GROUP BY bar.id2, foo.id2
-- Solution 2
SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
FROM TestTab AS bar
JOIN TestTab AS foo
ON bar.id1 <> foo.id1
WHERE bar.id2 < foo.id2
GROUP BY bar.id2, foo.id2
Both queries work fine on small tables (i.e., 100 to 1,000 rows), but I need to query a much larger table (e.g., 100,000 rows). I wonder how to speed up the queries and improve performance. Thanks in advance for any pointers.
-- Create table TestTab and insert dummy data
CREATE TABLE TestTab (id1 INT NOT NULL, id2 INT NOT NULL);
INSERT INTO TestTab VALUES
(101,1),
(101,2),
(101,3),
(102,2),
(102,3),
(102,4),
(103,15),
(103,3),
(103,4);

I suggest adding an index on id2 to TestTab (if one doesn't already exist) and then try running the following:
select distinct id2 into #id2 from TestTab;
SELECT bar.id2 AS A, foo.id2 AS B, COUNT(*) AS Count
FROM #id2 AS bar
JOIN #id2 AS foo ON bar.id2 < foo.id2
JOIN TestTab AS buz ON bar.id2 = buz.id2
JOIN TestTab AS fuz ON foo.id2 = fuz.id2
WHERE buz.id1 <> fuz.id1
GROUP BY bar.id2, foo.id2;
(If you already have a table with the distinct values of id2 on it, skip creating the temporary table and use that instead.)
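A sketch of the indexing suggestion, assuming the INT columns id1 and id2 from the question's setup script (the guarded CREATE TABLE only builds the sample table if it is missing). A composite index keyed on (id2, id1) can serve both the id2 equality joins and the id1 comparison, and declaring id2 as the primary key of the temp table tells the optimizer the values are unique. The index name IX_TestTab_id2_id1 is hypothetical; compare the actual execution plans before and after.

```sql
-- Sample table from the question; only created if it does not exist yet.
IF OBJECT_ID('dbo.TestTab', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.TestTab (id1 INT NOT NULL, id2 INT NOT NULL);
    INSERT INTO dbo.TestTab (id1, id2)
    VALUES (101, 1), (101, 2), (101, 3), (102, 2), (102, 3), (102, 4),
           (103, 15), (103, 3), (103, 4);
END;

-- Hypothetical index name; the key order (id2, id1) matches the joins on id2
-- and carries id1 along for the buz.id1 <> fuz.id1 filter.
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = 'IX_TestTab_id2_id1'
                 AND object_id = OBJECT_ID('dbo.TestTab'))
    CREATE NONCLUSTERED INDEX IX_TestTab_id2_id1 ON dbo.TestTab (id2, id1);

-- Distinct id2 values with a primary key, instead of SELECT DISTINCT ... INTO.
IF OBJECT_ID('tempdb..#id2') IS NOT NULL
    DROP TABLE #id2;
CREATE TABLE #id2 (id2 INT NOT NULL PRIMARY KEY);
INSERT INTO #id2 (id2)
SELECT DISTINCT id2 FROM dbo.TestTab;
```

The SELECT from the answer can then run unchanged against #id2.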

Both queries are joins and equivalent.
The first one is an implicit join with additional subselects. It might be slower if SQL Server doesn't optimize the subselects away.
As others have already observed, add indexes on the join-condition column id1 and the WHERE-clause column id2 if you haven't done so already.
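Indexes help, but both queries still pair every row with every other row. One alternative, not taken from the answers above and offered only as a sketch, is to count algebraically: if N(v) is the number of rows with id2 = v, then the pair (A, B) has N(A) × N(B) row pairs in total, and the pairs whose two rows share an id1 can be subtracted afterwards. That subtraction still needs a self-join, but only inside each id1 group, which is typically much smaller when each id1 has few rows. COUNT_BIG is used because N(A) × N(B) can overflow INT on large tables, and the temp table #TestTab stands in for the real table.

```sql
-- Sample data from the question; point the CTEs at dbo.TestTab instead.
IF OBJECT_ID('tempdb..#TestTab') IS NOT NULL
    DROP TABLE #TestTab;
CREATE TABLE #TestTab (id1 INT NOT NULL, id2 INT NOT NULL);
INSERT INTO #TestTab (id1, id2)
VALUES (101, 1), (101, 2), (101, 3), (102, 2), (102, 3), (102, 4),
       (103, 15), (103, 3), (103, 4);

WITH per_value AS (      -- N(v): number of rows carrying each id2 value
    SELECT id2, COUNT_BIG(*) AS n
    FROM #TestTab
    GROUP BY id2
),
all_pairs AS (           -- all row pairs (a, b) with a < b, ignoring id1
    SELECT a.id2 AS A, b.id2 AS B, a.n * b.n AS total
    FROM per_value AS a
    JOIN per_value AS b ON a.id2 < b.id2
),
same_id1 AS (            -- the subset of those pairs whose two rows share an id1
    SELECT x.id2 AS A, y.id2 AS B, COUNT_BIG(*) AS same_cnt
    FROM #TestTab AS x
    JOIN #TestTab AS y ON y.id1 = x.id1 AND x.id2 < y.id2
    GROUP BY x.id2, y.id2
)
SELECT p.A, p.B, p.total - COALESCE(s.same_cnt, 0) AS [Count]
FROM all_pairs AS p
LEFT JOIN same_id1 AS s ON s.A = p.A AND s.B = p.B
WHERE p.total - COALESCE(s.same_cnt, 0) > 0   -- the original queries never emit zero counts
ORDER BY p.A, p.B;
```

On the sample data this returns the same ten (A, B, Count) rows as the expected output in the question.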

Related

How to select the value from the table based on category_id USING SQL SERVER

How to select the value from the table based on category_id?
I have a table like this. Please help me.
Table A
ID Name category_id
-------------------
1 A 1
2 A 1
3 B 1
4 C 2
5 C 2
6 D 2
7 E 3
8 E 3
9 F 3
How to get the below mentioned output from table A?
ID Name category_id
--------------------
1 A 1
2 A 1
4 C 2
5 C 2
7 E 3
8 E 3
Give each row a row number within its category_id, ordered by ID ascending. Then select the rows with row number 1 or 2.
Query
;with cte as (
select [rn] = row_number() over(
partition by [category_id]
order by [ID]
), *
from [your_table_name]
)
select [ID], [Name], [category_id]
from cte
where [rn] < 3;
Kindly run this query; it should help you out.
SELECT tbl.id,tbl.name, tbl.category_id FROM TableA as tbl WHERE
tbl.name IN(SELECT tbl2.name FROM TableA tbl2 GROUP BY tbl2.name HAVING Count(tbl2.name)> 1)
The code selects every row from TableA whose Name occurs more than once in the table; names with a single entry are excluded. In the example above, the questioner wants to eliminate records whose Name has only one entry: for category_id 1 the names are A and B, where A has two entries and B has a single entry, so B is removed from the result set.
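The subquery above groups by Name across the whole table, so a Name that appears once in each of two categories would still be returned. If the count should be scoped to each category, a window function can do it in one pass. This is an alternative sketch, not the answerer's code; #TableA is a temp table loaded with the sample rows from the question and stands in for the real table.

```sql
-- Sample rows from the question; replace #TableA with the real table.
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL
    DROP TABLE #TableA;
CREATE TABLE #TableA (ID INT NOT NULL PRIMARY KEY, Name VARCHAR(10) NOT NULL, category_id INT NOT NULL);
INSERT INTO #TableA (ID, Name, category_id)
VALUES (1, 'A', 1), (2, 'A', 1), (3, 'B', 1),
       (4, 'C', 2), (5, 'C', 2), (6, 'D', 2),
       (7, 'E', 3), (8, 'E', 3), (9, 'F', 3);

-- Keep rows whose Name occurs more than once within the same category_id.
SELECT t.ID, t.Name, t.category_id
FROM (
    SELECT ID, Name, category_id,
           COUNT(*) OVER (PARTITION BY category_id, Name) AS name_cnt
    FROM #TableA
) AS t
WHERE t.name_cnt > 1
ORDER BY t.ID;
```

With the sample rows it returns IDs 1, 2, 4, 5, 7 and 8, matching the expected output.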

SQL Server query involving subqueries - performance issues

I have three tables:
Table 1: | dbo.pc_a21a22 |
batchNbr Other columns...
-------- ----------------
12345
12346
12347
Table 2: | dbo.outcome |
passageId record
---------- ---------
00003 200
00003 9
00004 7
Table 3: | dbo.passage |
passageId passageTime batchNbr
---------- ------------- ---------
00001 2015.01.01 12345
00002 2016.01.01 12345
00003 2017.01.01 12345
00004 2018.01.01 12346
What I want to do: for each batchNbr in Table 1, first get its latest passageTime and the corresponding passageId from Table 3. With that passageId, get the relevant rows in Table 2 and determine whether any of them contains the record 200. Per passageId there are at most 2 records in Table 2.
What is the most efficient way to do this?
I have already created a query that works, but it's awfully slow and thus unfit for tables with millions of rows. Any suggestion on how to either change the query or do it another way? Altering the table structure is not an option, I only have read rights to the database.
My current solution (slow):
SELECT TOP 50000
a.batchNbr,
CAST ( CASE WHEN 200 in (SELECT TOP 2 record FROM dbo.outcome where passageId in (
SELECT SubqueryResults.passageId From (SELECT Top 1 passageId FROM dbo.passage pass WHERE pass.batchNbr = a.batchNbr ORDER BY passageTime Desc) SubqueryResults
)
) then 1 else 0 end as bit) as KGT_IO_END
FROM dbo.pc_a21a22 a
The desired output is:
batchNbr 200present
--------- ----------
12345 1
12346 0
I suggest using table joins rather than subqueries.
select
a.*, b.*
from
dbo.table1 a
join
dbo.table2 b on a.id = b.id
where
/*your where clause for filtering*/
EDIT:
You could use this as a reference: Join vs. sub-query
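To make the join suggestion concrete for this question, here is a sketch that finds the latest passage per batch with OUTER APPLY and TOP (1) ... ORDER BY passageTime DESC, then tests for record 200 with EXISTS, so each batchNbr appears exactly once. It loads the sample rows into temp tables; the column types (passageId as VARCHAR(5) to keep the leading zeros, passageTime as DATE) are assumptions, and against the real database you would query dbo.pc_a21a22, dbo.passage and dbo.outcome directly.

```sql
-- Sample data from the question; column types are assumptions.
IF OBJECT_ID('tempdb..#pc_a21a22') IS NOT NULL DROP TABLE #pc_a21a22;
IF OBJECT_ID('tempdb..#passage') IS NOT NULL DROP TABLE #passage;
IF OBJECT_ID('tempdb..#outcome') IS NOT NULL DROP TABLE #outcome;

CREATE TABLE #pc_a21a22 (batchNbr INT NOT NULL);
CREATE TABLE #passage (passageId VARCHAR(5) NOT NULL, passageTime DATE NOT NULL, batchNbr INT NOT NULL);
CREATE TABLE #outcome (passageId VARCHAR(5) NOT NULL, record INT NOT NULL);

INSERT INTO #pc_a21a22 (batchNbr) VALUES (12345), (12346), (12347);
INSERT INTO #passage (passageId, passageTime, batchNbr)
VALUES ('00001', '2015-01-01', 12345), ('00002', '2016-01-01', 12345),
       ('00003', '2017-01-01', 12345), ('00004', '2018-01-01', 12346);
INSERT INTO #outcome (passageId, record)
VALUES ('00003', 200), ('00003', 9), ('00004', 7);

SELECT a.batchNbr,
       CAST(CASE WHEN EXISTS (SELECT 1
                              FROM #outcome AS o
                              WHERE o.passageId = lp.passageId
                                AND o.record = 200)
                 THEN 1 ELSE 0 END AS bit) AS KGT_IO_END
FROM #pc_a21a22 AS a
OUTER APPLY (              -- latest passage per batch; NULL when the batch has none
    SELECT TOP (1) p.passageId
    FROM #passage AS p
    WHERE p.batchNbr = a.batchNbr
    ORDER BY p.passageTime DESC
) AS lp
ORDER BY a.batchNbr;
```

On the sample rows this returns 1 for 12345 (its latest passage, 00003, has record 200) and 0 for 12346 and 12347.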
Try this
SELECT TOP 50000 a.*, (CASE WHEN b.record = 200 THEN 1 ELSE 0 END) AS
KGT_IO_END
FROM dbo.Test1 AS a
LEFT OUTER JOIN
(SELECT record, p.batchNbr
FROM dbo.Test2 AS o
LEFT OUTER JOIN (SELECT MAX(passageId) AS passageId, batchNbr FROM
dbo.Test3 GROUP BY batchNbr) AS p ON o.passageId = p.passageId
) AS b ON a.batchNbr = b.batchNbr;
The MAX subquery gets the latest passageId per batchNbr.
However, your example won't get the record 200, since the passageId of the record with 200 is 00001, while the latest passageId of the batchNbr 12345 is 00003.
I used LEFT OUTER JOIN because the passageId values in Table2 no longer match any of the latest passageIds in Table3, so the subquery would have no rows to join to Table1. Therefore an INNER JOIN would not show any records for your example data.
Output from your example data:
batchNbr KGT_IO_END
12345 0
12346 0
12347 0
Output if we change the passageId of record 200 to 00003 (the latest for 12345)
batchNbr KGT_IO_END
12345 1
12346 0
12347 0

Transitive Group Query on 2 Columns in SQL Server

I need help with a transitive query in SQL Server.
I have a table with [ID] and [GRPID].
I would like to update a third column [NEWGRPID] based on the following logic:
1. For each [ID], get its GRPID;
2. Get all of the IDs associated with the GRPID from (1);
3. Set [NEWGRPID] equal to an integer (a variable that is incremented by 1) for all of the rows from step (2).
The idea is that several of these IDs are "transitively" linked across different [GRPID]s and should all end up with the same [NEWGRPID].
The below table is the expected result, with [NEWGRPID] populated.
ID GRPID NEWGRPID
----- ----- ------
1 345 1
1 777 1
2 777 1
3 345 1
3 777 1
4 345 1
4 999 1
5 345 1
5 877 1
6 999 1
7 877 1
8 555 2
9 555 2
Try this code:
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
BEGIN
DROP TABLE #tmp;
END;
SELECT GRPID, count (*) AS GRPCNT
INTO #tmp
FROM yourtable
GROUP BY GRPID
UPDATE TGT
SET TGT.NEWGRPID = SRC.GRPCNT
FROM yourtable TGT
JOIN #tmp ON #tmp.GRPID = TGT.GRPID
If the values are likely to change over time you should think about a computed column or a trigger.
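As far as I can tell, the update above stores the row count per GRPID rather than a transitive group number. Finding IDs linked through shared GRPIDs is a connected-components problem. One way to sketch it in T-SQL, not taken from the answer above, is iterative label propagation: every ID starts with its own ID as a label and repeatedly adopts the smallest label among the IDs it shares a GRPID with, until a pass changes nothing; DENSE_RANK then turns the surviving labels into 1, 2, .... The temp table #grp stands in for yourtable, and the work table #lbl is a hypothetical name. Each pass is a set-based UPDATE, but the number of passes grows with the length of the longest chain of links.

```sql
-- Sample rows from the question; #grp stands in for yourtable.
IF OBJECT_ID('tempdb..#grp') IS NOT NULL DROP TABLE #grp;
IF OBJECT_ID('tempdb..#lbl') IS NOT NULL DROP TABLE #lbl;
CREATE TABLE #grp (ID INT NOT NULL, GRPID INT NOT NULL, NEWGRPID INT NULL);
INSERT INTO #grp (ID, GRPID)
VALUES (1, 345), (1, 777), (2, 777), (3, 345), (3, 777), (4, 345), (4, 999),
       (5, 345), (5, 877), (6, 999), (7, 877), (8, 555), (9, 555);

-- Every ID starts out labelled with itself.
CREATE TABLE #lbl (ID INT NOT NULL PRIMARY KEY, lbl INT NOT NULL);
INSERT INTO #lbl (ID, lbl)
SELECT DISTINCT ID, ID FROM #grp;

DECLARE @changed INT = 1;
WHILE @changed > 0
BEGIN
    -- Each ID adopts the smallest label found among IDs sharing any GRPID with it.
    UPDATE l
    SET l.lbl = m.min_lbl
    FROM #lbl AS l
    JOIN (
        SELECT g1.ID, MIN(l2.lbl) AS min_lbl
        FROM #grp AS g1
        JOIN #grp AS g2 ON g2.GRPID = g1.GRPID
        JOIN #lbl AS l2 ON l2.ID = g2.ID
        GROUP BY g1.ID
    ) AS m ON m.ID = l.ID
    WHERE m.min_lbl < l.lbl;

    SET @changed = @@ROWCOUNT;   -- stop once a pass changes nothing
END;

-- Number the components 1, 2, ... in order of their smallest ID.
UPDATE g
SET g.NEWGRPID = r.grp
FROM #grp AS g
JOIN (SELECT ID, DENSE_RANK() OVER (ORDER BY lbl) AS grp FROM #lbl) AS r
  ON r.ID = g.ID;

SELECT ID, GRPID, NEWGRPID FROM #grp ORDER BY ID, GRPID;
```

On the sample rows, IDs 1 to 7 end up in group 1 and IDs 8 and 9 in group 2, as in the expected result.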

Create a view using SQL Server with repeating rows and new column

I have a table with the following columns.
EVAL_ID | GGRP_ID | GOAL_ID
1 1 1
2 2 1
2 2 2
3 1 3
I want to create a view with another column called GOAL_VERSION, which has values from 1 to 5, so that each row from the above table is duplicated 5 times, once for each GOAL_VERSION number. The output should look like this:
EVAL_ID | GGRP_ID | GOAL_ID |GOAL_VERSION
1 1 1 1
1 1 1 2
1 1 1 3
1 1 1 4
1 1 1 5
2 2 1 1
2 2 1 2
2 2 1 3
2 2 1 4
2 2 1 5
How can I do that? Help me. Thank you.
Is it this you are looking for?
DECLARE @tbl TABLE(EVAL_ID INT,GGRP_ID INT,GOAL_ID INT);
INSERT INTO @tbl VALUES
(1,1,1)
,(2,2,1)
,(2,2,2)
,(3,1,3);
SELECT tbl.*
,x.Nr AS GOAL_VERSION
FROM @tbl AS tbl
CROSS JOIN (VALUES(1),(2),(3),(4),(5)) AS x(Nr);
EDIT: Varying count of repetition
DECLARE @tbl TABLE(EVAL_ID INT,GGRP_ID INT,GOAL_ID INT);
INSERT INTO @tbl VALUES
(1,1,1)
,(2,2,1)
,(2,2,2)
,(3,1,3);
DECLARE @tblCountOfRep TABLE(CountOfRep INT);
INSERT INTO @tblCountOfRep VALUES(3);
SELECT tbl.*
,y.Nr AS GOAL_VERSION
FROM @tbl AS tbl
CROSS JOIN (SELECT TOP (SELECT CountOfRep FROM @tblCountOfRep) x.Nr
            FROM (VALUES(1),(2),(3),(4),(5) /*add the max count here*/) AS x(Nr)
            ORDER BY x.Nr /*guarantees the lowest numbers are kept*/) AS y;
In this case I'd prefer a numbers table...
Take a look at CROSS JOIN. If you make a table that's got one column with the 5 rows you want you can just CROSS JOIN it to get the result you're after.
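Since the question asks for a view, the CROSS JOIN idea could be wrapped as below. The base-table name dbo.goals and the view name dbo.vw_goal_versions are hypothetical, and the row constructor (VALUES ...) plays the role of the five-row table, so no physical numbers table is needed.

```sql
-- Hypothetical base table holding the question's rows (created only if missing).
IF OBJECT_ID('dbo.goals', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.goals (EVAL_ID INT NOT NULL, GGRP_ID INT NOT NULL, GOAL_ID INT NOT NULL);
    INSERT INTO dbo.goals (EVAL_ID, GGRP_ID, GOAL_ID)
    VALUES (1, 1, 1), (2, 2, 1), (2, 2, 2), (3, 1, 3);
END;
GO

-- Every goal row repeated once per GOAL_VERSION 1 to 5.
CREATE OR ALTER VIEW dbo.vw_goal_versions
AS
SELECT g.EVAL_ID, g.GGRP_ID, g.GOAL_ID, v.GOAL_VERSION
FROM dbo.goals AS g
CROSS JOIN (VALUES (1), (2), (3), (4), (5)) AS v(GOAL_VERSION);
GO

SELECT * FROM dbo.vw_goal_versions ORDER BY EVAL_ID, GOAL_ID, GOAL_VERSION;
```

Selecting from the view returns 20 rows, five per base row. CREATE OR ALTER needs SQL Server 2016 SP1 or later (use a plain CREATE VIEW on older versions), and GO is the batch separator understood by SSMS and sqlcmd rather than a T-SQL statement; CREATE VIEW has to start its own batch.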
You can achieve this using a CTE and CROSS APPLY:
;WITH CTE AS
(
SELECT 1 AS GOAL_VERSION
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
)
SELECT * FROM <your table>
CROSS APPLY CTE
Use a WITH clause (a CTE) with a ranking function to create the view.
If you have a numbers table in your SQL database, you can cross join your table with the numbers table for the numbers between 1 and 5.
Here is my SQL solution for your requirement
select
goals.*,
n.i as GOAL_VERSION
from goals, dbo.NumbersTable(1,5,1) n
And here is the modified version with "cross join" as suggested in the comments
select
goals.*,
n.i as GOAL_VERSION
from goals
cross join dbo.NumbersTable(1,5,1) n
Note that I used a SQL table-valued function, dbo.NumbersTable, as the numbers table.
Please create that function using the source code given in the referenced tutorial.
I hope it helps.
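The tutorial's function is not reproduced here. As a stand-in only, a hypothetical inline table-valued function with the same call shape, dbo.NumbersTable(@start, @end, @step) returning a column i, might look like the following. It generates up to 10,000 values by cross-joining a ten-row VALUES list four times, returns nothing unless @step is positive, and is not the tutorial's code.

```sql
-- Hypothetical stand-in for dbo.NumbersTable; NOT the tutorial's implementation.
-- Returns @start, @start + @step, ... up to @end (at most 10,000 values; @step must be > 0).
CREATE OR ALTER FUNCTION dbo.NumbersTable (@start INT, @end INT, @step INT)
RETURNS TABLE
AS
RETURN
    WITH digits AS (
        SELECT d FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS v(d)
    ),
    seq AS (                          -- 10 x 10 x 10 x 10 = 10,000 rows numbered 0..9999
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS k
        FROM digits AS a
        CROSS JOIN digits AS b
        CROSS JOIN digits AS c
        CROSS JOIN digits AS e
    )
    SELECT CAST(@start + k * @step AS INT) AS i
    FROM seq
    WHERE @step > 0
      AND @start + k * @step <= @end;
GO

SELECT i FROM dbo.NumbersTable(1, 5, 1);
```

Under this stand-in, dbo.NumbersTable(1,5,1) yields i = 1 to 5, which is what the queries above rely on.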

selecting all but the last X rows from a join in sql server

I'm having trouble with some maintenance on my database. We have a table A and a table B with a one-to-many relationship. Right now there are between 1 and 10 rows in table B for every row in table A, and I want to see every row except the 5 most recent. If there are 5 or fewer rows in B for a row in A, I don't want to see it because I don't care about that data.
Here's the query I have so far:
WITH cte (id, number)
AS
(
SELECT A.id, COUNT(*)
FROM A INNER JOIN B ON A.id=B.a
GROUP BY A.id
)
SELECT c.id, B.id, number
FROM cte c
INNER JOIN B ON B.a=c.id
WHERE number > 5
ORDER BY c.id, B.id DESC;
GO
It will give me the IDs of the rows in A and B, and the number is just to help me see what is going on (it will be 10 if there are 10 matching rows, 9 if 9, etc).
I just don't really know where to go next. I have a list of rows in A and their matches in B, and I want to see all but the last 5 rows in B for every row in A. My data might look like this:
A | B | number
---------
1 | 7 | 7
1 | 6 | 7
1 | 5 | 7
1 | 4 | 7
1 | 3 | 7
1 | 2 | 7
1 | 1 | 7
2 | 9 | 2
2 | 8 | 2
And what I want is this:
A | B | number
---------
1 | 2 | 7
1 | 1 | 7
So really my question is - how can I filter out the last 5 rows in B for every row in A like this? I don't even know if I am heading in the right direction with what I've got so far, but it seemed like a reasonable starting point.
Try this query:
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY b.A ORDER BY b.id DESC) AS RowNum, ... other columns from b ...
FROM dbo.B as b
) x
WHERE x.RowNum > 5
Note: ROW_NUMBER() OVER(PARTITION BY b.A ORDER BY b.id DESC) restarts numbering at 1 for every b.A, in descending id order, so the most recent row gets RowNum = 1, the one before it gets RowNum = 2, and so on.
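A runnable version of the ROW_NUMBER approach, using a temp table loaded with the sample rows from the question. It assumes, as the question's own query does, that B.a holds the id of the parent row in A and that a higher B.id means a more recent row. COUNT(*) OVER reproduces the question's number column without the separate CTE, and no join to A is needed unless other columns of A are wanted.

```sql
-- Sample rows from the question: B.a is the parent id in A, higher B.id = more recent.
IF OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;
CREATE TABLE #B (id INT NOT NULL PRIMARY KEY, a INT NOT NULL);
INSERT INTO #B (id, a)
VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 2), (9, 2);

SELECT x.a AS A, x.id AS B, x.number
FROM (
    SELECT b.a, b.id,
           ROW_NUMBER() OVER (PARTITION BY b.a ORDER BY b.id DESC) AS RowNum,  -- 1 = most recent
           COUNT(*)     OVER (PARTITION BY b.a)                    AS number
    FROM #B AS b
) AS x
WHERE x.RowNum > 5          -- skip the 5 most recent rows per parent
ORDER BY x.a, x.id DESC;
```

It returns (A, B, number) = (1, 2, 7) and (1, 1, 7), the rows the question asks for.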
WITH cte (id, number) AS ( SELECT A.id, COUNT(*) FROM A
INNER JOIN B ON A.id=B.a GROUP BY A.id )
SELECT TOP 5 c.id, B.id, number FROM cte c
INNER JOIN B ON B.a=c.id
WHERE number > 5 ORDER BY c.id ASC, B.id ASC;
GO
