SQL query performance for huge data - sql-server

I have a query:
SELECT c.somecolumn,p.someothercolumn
FROM table1 co
INNER JOIN table2 p(NOLOCK) ON co.COLUMN = p.COLUMN
INNER JOIN table3 c(NOLOCK) ON co.column11 = c.column11
WHERE co.filterColumn = 1
Table2 is a junction table, and the join between table1 and table2 is on a column that does not have distinct values (that's a requirement and can't be changed), so each table1 row can match many table2 rows and the row count multiplies, much like a cross join.
The query returns 180 million rows.
Record count:
table 1: 2 190 561
table 2: 568 277
table 3: 300 150
How can I optimize the above query? Execution plan: (image not available)

Make sure you at least have indexes on the join columns that include the columns you're returning. For example, table2 should have a non-clustered index keyed on p.COLUMN that includes p.someothercolumn, and table3 should have one keyed on c.column11 that includes c.somecolumn. You should also have an index on table1.filterColumn.
Also consider that returning 180 million rows to the caller takes time in itself. Try inserting the result into a throwaway table instead, to keep the network transfer time out of your measurements.
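Before tuning, it can help to confirm where the 180 million rows come from. The following is only a sketch, using Python's built-in sqlite3 as a stand-in for SQL Server: it counts rows per join key on each side and multiplies the counts, which gives the size of the table1/table2 join without producing it. The column name join_col (standing in for COLUMN), the toy data, and the omission of table3 are all assumptions for illustration.

```python
import sqlite3

# Per-key fan-out of the table1/table2 join: rows on each side and their product.
# join_col is a hypothetical stand-in for the question's COLUMN.
FAN_OUT_SQL = """
SELECT co.join_col,
       co.n       AS table1_rows,
       p.n        AS table2_rows,
       co.n * p.n AS joined_rows
FROM (SELECT join_col, COUNT(*) AS n
      FROM table1
      WHERE filterColumn = 1
      GROUP BY join_col) AS co
JOIN (SELECT join_col, COUNT(*) AS n
      FROM table2
      GROUP BY join_col) AS p
  ON p.join_col = co.join_col
ORDER BY joined_rows DESC, co.join_col
"""


def build_demo_db():
    """Create a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (join_col INTEGER, filterColumn INTEGER);
        CREATE TABLE table2 (join_col INTEGER, someothercolumn TEXT);
    """)
    conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                     [(1, 1), (1, 1), (1, 1), (2, 1), (2, 0)])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                     [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (2, "e"), (3, "f")])
    return conn


def fan_out(conn):
    """Return (key, table1_rows, table2_rows, joined_rows) per join key."""
    return conn.execute(FAN_OUT_SQL).fetchall()


conn = build_demo_db()
per_key = fan_out(conn)
total = sum(row[3] for row in per_key)  # size of the join without running it
```

The keys at the top of per_key are the ones that dominate the output. If a handful of keys account for most of the rows, indexing alone is unlikely to shrink the result, and the remaining cost is mostly moving that many rows.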

Ideally, these are the indexes you need:
For table1 - a filtered index on COLUMN and column11 where filterColumn = 1
For table2 - an index on COLUMN that includes someothercolumn
For table3 - an index on column11 that includes somecolumn

SELECT c.somecolumn,
       tmp.someothercolumn
FROM table1 co
INNER JOIN table3 c(NOLOCK) ON co.column11 = c.column11
    AND co.filterColumn = 1
CROSS APPLY (SELECT TOP (1) p.SomeOtherColumn
             FROM table2 p(NOLOCK)
             WHERE p.COLUMN = co.Column) tmp

Related

Merge 2 tables into 1 single Table in T-SQL

How do I merge 2 tables into 1 table in T-SQL? I tried a full outer join, which joins the 2 tables but returns the "Customer Account" column twice. I need all the columns from Table A and Table B, with the common "Customer Account" field appearing only once.
Here is my example in more detail:
Table A - my first Table with 5 columns:
Table B - my second table with 6 columns:
I'm expecting the output like this:
Output with all fields in table A and in Table B but the common field only once:
Thanks a lot.
Add all the required columns from t1 and t2 to the SELECT list:
SELECT COALESCE(t1.customeraccount, t2.customeraccount) as customeraccount,
t1.BasicCardType,
t2.MonthlySet
FROM table1 t1
FULL JOIN table2 t2 ON t1.customeraccount = t2.customeraccount;
(Edited based on comments): Join the tables on the CustomerAccount ID field (giving you entries that exist in both tables), then add a union for all entries that only exist in table A, then add a union for entries that only exist in table B. In principle:
-- get entries that exist in both tables
select Table_A.CustomerAccount, TableAField1, TableAField2, TableBField1, TableBField2
from Table_A
join Table_B on Table_A.CustomerAccount = Table_B.CustomerAccount
-- get entries that only exist in table_a
union select Table_A.CustomerAccount, TableAField1, TableAField2, null, null
from Table_A
where Table_A.CustomerAccount not in (select CustomerAccount from Table_B)
-- get entries that only exist in table_B
union select Table_B.CustomerAccount, null, null, TableBField1, TableBField2
from Table_B
where Table_B.CustomerAccount not in (select CustomerAccount from Table_A)
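The three-part union can be checked on a small example. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 rather than on SQL Server, the column names BasicCardType and MonthlySet and the sample rows are made up, and it swaps NOT IN for NOT EXISTS (which behaves predictably if CustomerAccount can be NULL) and UNION for UNION ALL (the three branches cannot overlap).

```python
import sqlite3

# Emulates a full outer join: matches, then A-only rows, then B-only rows.
MERGE_SQL = """
SELECT a.CustomerAccount, a.BasicCardType, b.MonthlySet
FROM Table_A AS a
JOIN Table_B AS b ON a.CustomerAccount = b.CustomerAccount
UNION ALL
SELECT a.CustomerAccount, a.BasicCardType, NULL
FROM Table_A AS a
WHERE NOT EXISTS (SELECT 1 FROM Table_B AS b
                  WHERE b.CustomerAccount = a.CustomerAccount)
UNION ALL
SELECT b.CustomerAccount, NULL, b.MonthlySet
FROM Table_B AS b
WHERE NOT EXISTS (SELECT 1 FROM Table_A AS a
                  WHERE a.CustomerAccount = b.CustomerAccount)
ORDER BY 1
"""


def build_demo_db():
    """In-memory tables with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table_A (CustomerAccount INTEGER, BasicCardType TEXT);
        CREATE TABLE Table_B (CustomerAccount INTEGER, MonthlySet INTEGER);
    """)
    conn.executemany("INSERT INTO Table_A VALUES (?, ?)",
                     [(100, "Gold"), (200, "Silver")])
    conn.executemany("INSERT INTO Table_B VALUES (?, ?)",
                     [(200, 50), (300, 75)])
    return conn


def merge_accounts(conn):
    """Return one row per account with columns from both tables."""
    return conn.execute(MERGE_SQL).fetchall()


merged = merge_accounts(build_demo_db())
```

Each account appears once, with NULL (None in Python) in the columns of whichever table has no row for it.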

T-SQL to find rows with at least one specific column value

Data set
Key Stage balance ForeignKey
---------------------------------------------
11805008 ABC 50 123
11805008 DEF 0 123
14567898 DEF 100 456
Query so far
Select key, two.Stage, two.balance
from table_a one, table_b two
where one.ForeignKey = two.foreignKey
I am looking for the key, stage, and balance of every key that has a stage of ABC, including that key's other stages. If a key does not have stage ABC, no rows should be returned for that key. If a key does have stage 'ABC', all rows for that key should be returned:
Key Stage balance ForeignKey
11805008 ABC 50 123
11805008 DEF 0 123
You could use an IN clause to get all of the keys that have at least one stage of ABC. Also, use the explicit INNER JOIN syntax instead of comma-separated tables.
SELECT one.key, two.Stage, two.balance
FROM table_a one
INNER JOIN table_b two ON one.ForeignKey = two.foreignKey
WHERE one.key IN (
    SELECT table_a.key
    FROM table_a
    INNER JOIN table_b ON table_a.ForeignKey = table_b.foreignKey
    WHERE table_b.stage = 'ABC')
First, learn to use proper JOIN syntax.
Second, you can do this using window functions:
select key, stage, balance
from (Select key, two.Stage, two.balance,
             sum(case when two.stage = 'ABC' then 1 else 0 end) over (partition by key) as num_abc
      from table_a one join
           table_b two
           on one.ForeignKey = two.foreignKey
     ) t
where num_abc > 0;
Select key, two.Stage, two.balance
from table_a one
inner join table_b two
    on one.foreignKey = two.foreignKey
where exists (
    select 1 from table_b x
    where x.foreignKey = one.foreignKey
    and x.Stage = 'ABC' )
I can only assume what your original data is. The statement works on my demo, see here: http://rextester.com/SUTS17842
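For anyone who wants to try the EXISTS variant locally, here is a minimal sketch that loads the sample rows into an in-memory SQLite database via Python's sqlite3. How the data set splits across table_a and table_b is an assumption (the question only shows the joined result), and "key" is quoted because it is a reserved word in some engines.

```python
import sqlite3

# Return every row of a key as long as that key has at least one 'ABC' stage.
KEYS_WITH_ABC_SQL = """
SELECT one."key", two.Stage, two.balance
FROM table_a AS one
JOIN table_b AS two ON one.ForeignKey = two.foreignKey
WHERE EXISTS (SELECT 1
              FROM table_b AS x
              WHERE x.foreignKey = one.ForeignKey
                AND x.Stage = 'ABC')
ORDER BY one."key", two.Stage
"""


def build_demo_db():
    """Sample rows from the question, split across two tables (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a ("key" INTEGER, ForeignKey INTEGER);
        CREATE TABLE table_b (foreignKey INTEGER, Stage TEXT, balance INTEGER);
    """)
    conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                     [(11805008, 123), (14567898, 456)])
    conn.executemany("INSERT INTO table_b VALUES (?, ?, ?)",
                     [(123, "ABC", 50), (123, "DEF", 0), (456, "DEF", 100)])
    return conn


def rows_for_keys_with_abc(conn):
    """Run the EXISTS query and return (key, stage, balance) tuples."""
    return conn.execute(KEYS_WITH_ABC_SQL).fetchall()


result = rows_for_keys_with_abc(build_demo_db())
```

Key 14567898 only has stage DEF, so none of its rows come back, while both rows of key 11805008 do.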

How can I find all rows in master table with the same records in its child table?

I need to find every record in TableA that has the same child records in TableB
for example :
tableA
keyA
1
2
3
tableB
keyA....keyB....valueB
1...........11...........4
1...........12...........5
2...........21...........4
2...........22...........5
3...........31...........4
3...........32...........6
So suppose I want to search for duplicates.
It should return the first two rows in tableA, because both have the same number of child records in tableB with the same values for valueB:
the first row in tableA has 2 child records, one with valueB = 4 and one with valueB = 5
the second row in tableA also has 2 child records, with the same values in field valueB
the third row also has 2 child records, but with different values in field valueB
So the first 2 rows in tableA should be returned if I search for duplicates.
I tried this, but it gives an error on the first subquery because it may not return more than one value:
select *
from tableA t1
where (select t2.valueB
from tableB t2
where t2.keyA = 1
)
in
(select t3.valueB
from tableB t3
where t3.keyA = t1.KeyA
)
So, can this be done ?
EDIT : the output for my example should be
tableA
keyA
1
2
Edit 2 : rephrasing the question :
1. tableB is a childtable for tableA
2. there will be records in tableA that have records in tableB with the same values for field valueB as other records in tableA
3. I want to find these records.
EDIT: findings so far :
this query seems to produce what I need :
declare @keyA int = 1
select distinct r.keyA
from tableA r
inner join tableB eb on r.keyA = eb.keyA
where (select count(1) from tableB eb1 where eb1.keyA = @keyA) = (select count(1) from tableB eb2 where eb2.keyA = r.keyA)
and eb.valueB in (select eb4.valueB from tableB eb4 where eb4.keyA = @keyA)
The first where clause only allows master records where the number of child records is the same as for the first row in tableA. (All rows in tableA are found.)
The second where clause only allows master records where the valueB of the child records is also present in the child records for the first row in tableA. (Only the first 2 rows in tableA are found.)
The idea is to get all master records (tableA) that have the same number of child records as the first row, and where all the values for valueB in these child records are also present in the child records for the first row of tableA.
I am hoping that both where clauses combined give me what I need.
It seems to produce the correct result, but I would like some confirmation of whether it's correct or wrong.
select
    t1.keya
from
    t1
where exists(
    select 1 from t2 where t2.keya = t1.keya
    group by t2.keya, t2.valueb
    having count(*) > 1
)
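One way to check the query from the question is to run it on the sample rows. The sketch below does that with Python's sqlite3 (SQLite standing in for SQL Server, with a named parameter :ref in place of the variable). On this data the question's query also returns keyA 3, because the IN test passes as soon as one child value (here 4) matches. The second query is one possible stricter formulation, not the only one: it requires every child value to appear in the reference set and the number of distinct values to be equal. It assumes "same child records" means the same set of distinct valueB values.

```python
import sqlite3

# The query from the question, with :ref in place of the T-SQL variable.
ASKER_SQL = """
SELECT DISTINCT r.keyA
FROM tableA AS r
JOIN tableB AS eb ON r.keyA = eb.keyA
WHERE (SELECT COUNT(1) FROM tableB AS eb1 WHERE eb1.keyA = :ref)
    = (SELECT COUNT(1) FROM tableB AS eb2 WHERE eb2.keyA = r.keyA)
  AND eb.valueB IN (SELECT eb4.valueB FROM tableB AS eb4 WHERE eb4.keyA = :ref)
ORDER BY r.keyA
"""

# Stricter sketch: all child values must be in the reference set,
# and both sides must have the same number of distinct values.
STRICT_SQL = """
SELECT b.keyA
FROM tableB AS b
LEFT JOIN (SELECT DISTINCT valueB FROM tableB WHERE keyA = :ref) AS wanted
       ON wanted.valueB = b.valueB
GROUP BY b.keyA
HAVING COUNT(*) = COUNT(wanted.valueB)
   AND COUNT(DISTINCT b.valueB) =
       (SELECT COUNT(DISTINCT valueB) FROM tableB WHERE keyA = :ref)
ORDER BY b.keyA
"""


def build_demo_db():
    """Load the sample rows from the question into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableA (keyA INTEGER);
        CREATE TABLE tableB (keyA INTEGER, keyB INTEGER, valueB INTEGER);
    """)
    conn.executemany("INSERT INTO tableA VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO tableB VALUES (?, ?, ?)",
                     [(1, 11, 4), (1, 12, 5), (2, 21, 4),
                      (2, 22, 5), (3, 31, 4), (3, 32, 6)])
    return conn


def matching_keys(conn, sql, ref_key):
    """Return the keyA values that the given query considers equal to ref_key."""
    return [row[0] for row in conn.execute(sql, {"ref": ref_key})]


conn = build_demo_db()
asker_keys = matching_keys(conn, ASKER_SQL, 1)
strict_keys = matching_keys(conn, STRICT_SQL, 1)
```

The stricter query only looks at tableB, so master rows without any child records never appear in its result.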

Insert values into a table from multiple tables using sqlite query

If I have Table1 as
A B C
1 b.1 c.1
2 b.2 c.2
1 b.3 c.3
My second table Table2 as
A D E F G
1 d.1 e.1 f.1 g.1
2 d.2 e.2 f.2 g.2
I need to insert into an empty Table3 the values from above such that it looks like this.
A B C D E
1 b.1 c.1 d.1 e.1
2 b.2 c.2 d.2 e.2
1 b.3 c.3 d.1 e.1
So basically I need to insert each row of Table1 into Table3. For each row, I need to look up column A in Table2, find the corresponding values of D and E, and insert them into Table3 as well. Is it possible to do this in a single query?
To copy Table1 to Table3 I can use the query
INSERT INTO Table3(A,B,C) SELECT A,B,C FROM Table1
And then I need to take each row from Table3 and, using A, update the values of D and E from Table2. Is there a better solution that inserts directly from both tables into Table3? Any help is appreciated, as I am a beginner with databases and queries.
To merge two tables, use a join:
-- INSERT ...
SELECT A, B, C, D, E
FROM Table1
JOIN Table2 USING (A);
This will not generate a result row if no matching Table2 row is found. If you want a result row in this case (with NULLs for the missing values), use an outer join instead.
INSERT INTO Table3 (A,B,C,D,E)
SELECT t1.A, t1.B, t1.C, t2.D, t2.E FROM Table1 t1
INNER JOIN Table2 t2 ON t2.A = t1.A
This might solve your problem.
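Here is a small end-to-end sketch using Python's built-in sqlite3 module and the sample rows from the question. It uses a LEFT JOIN, so a Table1 row without a match in Table2 would still be inserted, with NULL in D and E. The TEXT/INTEGER column types are assumptions.

```python
import sqlite3

# Fill Table3 straight from the join, with no separate UPDATE step.
INSERT_SQL = """
INSERT INTO Table3 (A, B, C, D, E)
SELECT t1.A, t1.B, t1.C, t2.D, t2.E
FROM Table1 AS t1
LEFT JOIN Table2 AS t2 ON t2.A = t1.A
"""


def build_demo_db():
    """Create the three tables and load the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (A INTEGER, B TEXT, C TEXT);
        CREATE TABLE Table2 (A INTEGER, D TEXT, E TEXT, F TEXT, G TEXT);
        CREATE TABLE Table3 (A INTEGER, B TEXT, C TEXT, D TEXT, E TEXT);
    """)
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)",
                     [(1, "b.1", "c.1"), (2, "b.2", "c.2"), (1, "b.3", "c.3")])
    conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?, ?)",
                     [(1, "d.1", "e.1", "f.1", "g.1"),
                      (2, "d.2", "e.2", "f.2", "g.2")])
    return conn


def fill_table3(conn):
    """Run the INSERT ... SELECT and return Table3 ordered by column B."""
    conn.execute(INSERT_SQL)
    conn.commit()
    return conn.execute(
        "SELECT A, B, C, D, E FROM Table3 ORDER BY B").fetchall()


table3_rows = fill_table3(build_demo_db())
```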

SQL set operation with different number of columns in each set

Let's say I have set 1:
1 30 60
2 45 90
3 120 240
4 30 60
5 20 40
and set 2
30 60
20 40
I would like to do some sort of union where I only keep rows 1,4,5 from set 1 because the latter 2 columns of set 1 can be found in set 2.
My problem is that set-based operations insist on the same number of columns.
I've thought of concatenating the columns contents, but it feels dirty to me.
Is there a 'proper' way to accomplish this?
I'm on SQL Server 2008 R2
In the end, I would like to end up with
1 30 60
4 30 60
5 20 40
CLEARLY I need to go sleep as a simple join on 2 columns worked.... Thanks!
You are literally asking for:
give me the rows in t1 where the 2 columns match in t2
So if the output is only rows 1, 4 and 5 from table 1, then it is a set-based operation and can be done with EXISTS, INTERSECT, or JOIN. As for the "same number of columns" problem, you simply combine 2 conditions with AND; this is evaluated per row.
EXISTS is the most portable and compatible way, and it lets you return any column from table1:
select id, val1, val2
from table1 t1
WHERE EXISTS (SELECT * FROM table2 t2
              WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2)
INTERSECT requires the same columns in each clause, and not all engines support it (SQL Server has since 2005):
select val1, val2
from table1
INTERSECT
select val1, val2
from table2
With an INNER JOIN, if you have duplicate values for val1, val2 in table2, you'll get more rows than expected. Its internals also usually make it slower than EXISTS:
select t1.id, t1.val1, t1.val2
from table1 t1
JOIN
table2 t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2
Some RDBMSs support IN on multiple columns; this isn't portable, and SQL Server doesn't support it.
Edit: some background
Relationally, it's a semi-join.
SQL Server does it as a "left semi join"
INTERSECT and EXISTS in SQL Server usually give the same execution plan. The join type is a "left semi join" whereas INNER JOIN is a full "equi-join".
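The difference between the semi-join and the plain join shows up as soon as table2 contains a repeated pair. A rough sketch with Python's sqlite3 (SQLite instead of SQL Server 2008 R2): the table1 rows are the ones from the question, while the column names id, val1, val2 and the extra duplicate (30, 60) row in table2 are assumptions added to make the effect visible.

```python
import sqlite3

# Semi-join: each table1 row is returned at most once.
EXISTS_SQL = """
SELECT t1.id, t1.val1, t1.val2
FROM table1 AS t1
WHERE EXISTS (SELECT 1 FROM table2 AS t2
              WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2)
ORDER BY t1.id
"""

# Inner join: a table1 row is returned once per matching table2 row.
JOIN_SQL = """
SELECT t1.id, t1.val1, t1.val2
FROM table1 AS t1
JOIN table2 AS t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2
ORDER BY t1.id
"""


def build_demo_db():
    """Set 1 from the question; set 2 plus one hypothetical duplicate pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, val1 INTEGER, val2 INTEGER);
        CREATE TABLE table2 (val1 INTEGER, val2 INTEGER);
    """)
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                     [(1, 30, 60), (2, 45, 90), (3, 120, 240),
                      (4, 30, 60), (5, 20, 40)])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                     [(30, 60), (20, 40), (30, 60)])  # last row is the duplicate
    return conn


conn = build_demo_db()
exists_rows = conn.execute(EXISTS_SQL).fetchall()
join_ids = [row[0] for row in conn.execute(JOIN_SQL)]
```

With the duplicate in place, the EXISTS version still returns rows 1, 4 and 5 once each, while the join returns rows 1 and 4 twice.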
You could use union which, as opposed to union all, eliminates duplicates:
select val1, val2
from table1
union
select val1, val2
from table2
EDIT: Based on your edited question, you can exclude rows that match the second table using a not exists subquery:
select id, col1, col2
from table1 t1
where not exists
(
select *
from table2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
)
union all
select null, col1, col2
from table2
If you'd like to exclude rows from table2, omit union all and everything below it.
