Updating a table with missing records from another table - sql-server

I am running a query that returns records from the LEFT of the EXCEPT that are not on the right query;
USE AdventureWorks;
GO
SELECT ProductID
FROM Production.Product
EXCEPT
SELECT ProductID
FROM Production.WorkOrder;
Lets say there are 6 records returned (there are 6 records in Production.Product table, that are not in Production.WorkOrder)
How would I write the query to update the 6 records into Production.WorkOrder table?

insert into workorder (productid)
select productid from product where productid not in (select productid from workorder)
This will insert into workorder all the productid's in the product table that aren't already in workorder.

I'd use a left join, like this:
USE AdventureWorks;
GO
SELECT p.ProductID
FROM Production.Product p
LEFT
JOIN Production.WorkOrder wo
ON p.ProductID = wo.ProductID
WHERE wo.ProductID IS NULL;
This will return all the ProductID values from Product that do not appear in WorkOrder. The problem with the other answer (WHERE NOT IN) is that the sub-query will execute once/row, and if the tables are large, this will be really slow. A LEFT JOIN will only execute once, and then SQL will match the rows up - on a small table, there won't be much difference in practice, but on a larger table or in a production database, the difference will be immense.

Just turn your query into an INSERT?
INSERT INTO Production.WorkOrder(ProductID, ...)
SELECT ProductID, ...
FROM Production.Product
EXCEPT
SELECT ProductID
FROM Production.WorkOrder;

Related

Efficiently group query by one column, taking the maximum of another column and a third column that comes from the same row as the maximum column

I have a table of 100,000,000+ values, so efficiency is very important to me. I need to take information from table A, join it to an index table B, then join to table C using the index retrieved from table B. The problem is, there are multiple indexes for each value in table A, and I want to retrieve the one with the most recent date.
The query below creates duplicates:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM Table_1 t1
INNER JOIN Table_2 t2 ON t1.ID_1=t2.ID_1
INNER JOIN Table_3 t3 ON t2.ID_2=t3.ID_2
This one does not, but when running with more than 35,000 vs 40,000 elements, the execution time goes from <5sec to >1min:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM
(SELECT * FROM Table_1 l CROSS APPLY Table_2 t2 WHERE t1.ID_1=t2.ID_1) t_temp
LEFT JOIN Table_3 t3 ON t_temp.ID_2=t3.ID_2
How can I decrease my execution time as much as possible?
Here is an example table:
For this table, I would be trying to get the most recent location for each person.
None of the columns are indexed and I cannot create indexes on this table.
First of all, when you are working on 100 Million+ records and that
too joining to other tables, first thing I would ask is what is the
rationale behind not creating indexes which can cover your query. If
you are not the admin of that system, I would suggest that you
should bring this up to admin group and try to understand what is
the exact reason (if any) they do not want index on that huge table.
Specially because you mentioned "efficiency is very important to
me".
Remember that 'SQL Tuning' is only one of the steps of 'Database Performance Tuning' and you can tune only as much with writing a good SQL Query. When the data volume gets huge, a good SQL Query is never sufficient without taking other Performance Tuning Measures.
Apart from what Roger has already provided, here are a few solutions that you can try out:
Solution 1
SELECT T1.ID_1, OA.ID_2, OA.Location
FROM Table1 T1
OUTER APPLY (
SELECT TOP 1 T3.ID_2, T3.Location
FROM Table2 T2
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2
WHERE T2.ID_1 = T1.ID_1
ORDER BY T3.Date DESC
) OA;
Solution 2:
SELECT DISTINCT
T1.ID_1
,T2.ID_2
,Location = FIRST_VALUE(T3.Location) OVER (PARTITION BY T1.ID_1 ORDER BY T3.Date DESC)
FROM Table1 T1
INNER JOIN Table2 T2
ON T1.ID_1 = T2.ID_1
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2;
Data Preparation:
DROP TABLE IF EXISTS Table1
DROP TABLE IF EXISTS Table2
DROP TABLE IF EXISTS Table3
SELECT TOP 10000 ID_1 = object_id, name
INTO Table1
FROM sys.all_objects
ORDER BY object_id
SELECT ID_1 = T1.ID_1, ID_2 = IDENTITY(INT, 1, 1)
INTO Table2
FROM Table1 T1
CROSS JOIN Table1 T2
SELECT ID_2, Location = 'City_'+ CAST(ID_2 AS VARCHAR(100)), Date = CAST(DATEADD(DAY, ID_2/10000, GETDATE()) AS DATE)
INTO Table3
FROM Table2
Indexes to cover the Solution 1:
CREATE NONCLUSTERED INDEX IX_TABLE1_ID_1 ON Table1 (ID_1)
CREATE NONCLUSTERED INDEX IX_TABLE2_ID_2 ON Table2 (ID_1, ID_2)
CREATE NONCLUSTERED INDEX IX_TABLE3_ID_2 ON Table3 (ID_2, Date DESC) INCLUDE (Location)
Execution Plan:
You can see that all are 'Index Seek' except for Table1 which is an legitimate 'Index Scan' because you are doing scans for each value of Table1's ID_1 value. If you put a where clause in the outer loop to search for a few specific ID_1 values, then that 'Index Scan' will turn to a 'Index Seek' as well.
I will leave the Index Strategy for the 2nd solution to you (as a homework :) ). Tips: You have to make the Location as a key as well. Or you can go with COLUMNSTORE index approach.
You can use something like this:
select top (1) with ties
a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = it.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
order by row_number() over(partition by a.A_Id order by b.Date desc);
Alternatively, you can try an olde fashioneth approache:
select a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = b.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
where not exists (
select 0 from dbo.TableB pb where pb.B_Id = b.B_Id and pb.Date > b.Date
);
However, as with all such situations, its performance will heavily depend on indices. SSMS can suggest you some, if you will look at the execution plan; off the top of my head, you will need all Id columns to be indexed, and you will need either a single (Date) or a composite (A_Id, Date, B_Id) on the TableB.
UPD: If you can't create or modify any indices, and performance is paramount, I would suggest copying the data in question into a separate schema or database, where you might have appropriate permissions. Apart from that... it's impossible to get something out of nothing.

SQL queries combined into one row

I'm having some difficulty combining the following queries, so that the results display in one row rather than in multiple rows:
SELECT value FROM dbo.parameter WHERE name='xxxxx.name'
SELECT dbo.contest.name AS Event_Name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name
SELECT COUNT(*) FROM open_option AS total_people
SELECT SUM(scanned) AS TotalScanned,SUM(number) AS Totalnumber
FROM dbo.open_box
GROUP BY contest_id
SELECT COUNT(*) FROM open AS reff
WHERE refer = 'True'
I would like to display data from the fields in each column similar to what is shown in the image below. Any help is appreciated!
Tab's solution is fine, I just wanted to show an alternative way of doing this. The following statement uses subqueries to get the information in one row:
SELECT
[xxxx.name]=(SELECT value FROM dbo.parameter WHERE name='xxxxx.name'),
[Event Name]=(SELECT dbo.contest.name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name),
[Total People]=(SELECT COUNT(*) FROM open_option),
[Total Scanned]=(SELECT SUM(scanned)
FROM dbo.open_box
GROUP BY contest_id),
[Total Number]=(SELECT SUM(number)
FROM dbo.open_box
GROUP BY contest_id),
Ref=(SELECT COUNT(*) FROM open WHERE refer = 'True');
This requires the Total Scanned and Total Number to be queried seperately.
Update: if you then want to INSERT that into another table there are essentially two ways to do that.
Create the table directly from the SELECT statement:
SELECT
-- the fields from the first query
INTO
[database_name].[schema_name].[new_table_name]; -- creates table new_table_name
Insert into a table that already exists from the INSERT
INSERT INTO [database_name].[schema_name].[existing_table_name](
-- the fields in the existing_table_name
)
SELECT
-- the fields from the first query
Just CROSS JOIN the five queries as derived tables:
SELECT * FROM (
Query1
) AS q1
CROSS JOIN (
Query2
) AS q2
CROSS JOIN (...
Assuming that each of your individual queries only returns one row, then this CROSS JOIN should result in only one row.

sql server select from two tables as null or value

This shouldn't be hard but somehow I am unable to make this work. So I have two tables: Splanning_Buildings and Splanning_RoomData. The primary key of OBJECTID in the Splanning_Buildings table is represented as Objectid_bldg in the Splanning_RoomData table.
What I need is to select all rows from the Splanning_Buildings table in such a way that if its value is found inside the Splanning_RoomData table then show the value as "Existing" string; but if not found in Splanning_RoomData then show as blank.
This is SQL Server 2012.
Here is my sql:
SELECT t1.objectid, name, t2.building_n FROM splanning_buildings t1 LEFT JOIN splanning_roomdata t2 ON t1.objectid = t2.objectid_bldg WHERE t2.objectid_bldg IS NULL
SELECT t1.objectid, name, t2.building_n FROM splanning_buildings t1 LEFT JOIN splanning_roomdata t2 ON t1.objectid = t2.objectid_bldg
The first query returns all rows from the splanning-buildings table except the row which exists the splanning_RoomData table; so t2.building_n is 'null'. Makes sense.
The second query returns all rows from the buildings table but multiple t2.buildings_n rows.
None of these are the desired results! I guess I need some kind of sql Case() function?
Any idea?
Thanks!
SELECT t1.objectid, name,
CASE WHEN (EXISTS(SELECT TOP(1) 1 FROM splanning_roomdata t2 WHERE t1.objectid = t2.objectid_bldg )) THEN
'Exists'
ELSE 'Not exists' END AS [status]
FROM splanning_buildings t1
select objectid,
(select top 1 1 from room_data
where objectid_bldg=objectid)
as roomdata_exists from buildings
I simplified your table names a little bit - hope you can still make sense of it. ;-)

How to optimize view performance in SQL Server 2012 by indexing

I have a view like that:
create view dbo.VEmployeeSalesOrders
as
select
employees.employeeID, Products.productID,
Sum(Price * Quantity) as Total,
salesDate,
COUNT_BIG() as [RecordCount]
from
dbo.Employees
inner join
dbo.sales on employees.employeeID = sales.employeeID
inner join
dbo.products on sales.productID = products.ProductID
group by
Employees.employeeID, products.ProductID, salesDate
When I select * from dbo.VEmployeeSalesOrders it takes 97% of the execution plan. It needs it to be faster.
And when I try to create an index, an exception fires with the following message:
select list doesn't include a proper use on count_Big()
Why am getting this error?
1-first you need to alter your view and make it contains COUNT_BIG() function because you used aggregate function in select statment,
AND THE REASON FOR USING THAT is that SQL Server needs to track the record where the record is ,number of records
like this
create view dbo.VEmployeeSalesOrders
as
select employees.employeeID,Products.productID,Sum(Price*Quantity)
as Total,salesDate,COUNT_BIG(*) as [RecordCount]
from dbo.Employees
inner join dbo.sales on employees.employeeID=sales.employeeID
inner join dbo.products on sales.productID-products.ProductID
group by Employees.employeeID,products.ProductID,salesDate
2- then you need to create index like that
Create Unique Clustered Index Cidx_IndexName
on dbo.VEmployeeSalesOrders(employedID,ProductID,SalesDate)
Hope It Works

FreeText COUNT query on multiple tables is super slow

I have two tables:
**Product**
ID
Name
SKU
**Brand**
ID
Name
Product table has about 120K records
Brand table has 30K records
I need to find count of all the products with name and brand matching a specific keyword.
I use freetext 'contains' like this:
SELECT count(*)
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE (contains(Product.Name, 'pants')
or
contains(Brand.Name, 'pants'))
This query takes about 17 secs.
I rebuilt the FreeText index before running this query.
If I only check for Product.Name. They query is less then 1 sec. Same, if I only check the Brand.Name. The issue occurs if I use OR condition.
If I switch query to use LIKE:
SELECT count(*)
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE Product.Name LIKE '%pants%'
or
Brand.Name LIKE '%pants%'
It takes 1 secs.
I read on MSDN that: http://msdn.microsoft.com/en-us/library/ms187787.aspx
To search on multiple tables, use a
joined table in your FROM clause to
search on a result set that is the
product of two or more tables.
So I added an INNER JOINED table to FROM:
SELECT count(*)
FROM (select Product.Name ProductName, Product.SKU ProductSKU, Brand.Name as BrandName FROM Product
inner join Brand
on product.BrandID = Brand.ID) as TempTable
WHERE
contains(TempTable.ProductName, 'pants')
or
contains(TempTable.BrandName, 'pants')
This results in error:
Cannot use a CONTAINS or FREETEXT predicate on column 'ProductName' because it is not full-text indexed.
So the question is - why OR condition could be causing such as slow query?
After a bit of trial an error I found a solution that seems to work. It involves creating an indexed view:
CREATE VIEW [dbo].[vw_ProductBrand]
WITH SCHEMABINDING
AS
SELECT dbo.Product.ID, dbo.Product.Name, dbo.Product.SKU, dbo.Brand.Name AS BrandName
FROM dbo.Product INNER JOIN
dbo.Brand ON dbo.Product.BrandID = dbo.Brand.ID
GO
CREATE UNIQUE CLUSTERED INDEX IX_VW_PRODUCTBRAND_ID
ON vw_ProductBrand (ID);
GO
If I run the following query:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
SELECT count(*)
FROM Product
inner join vw_ProductBrand
on Product.BrandID = vw_ProductBrand.ID
WHERE (contains(vw_ProductBrand.Name, 'pants')
or
contains( vw_ProductBrand.BrandName, 'pants'))
It now takes 1 sec again.
I ran into a similar problem but i fixed it with union, something like:
SELECT *
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE contains(Product.Name, 'pants')
UNION
SELECT *
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE contains(Brand.Name, 'pants'))
Have you tried something like:
SELECT count(*)
FROM Product
INNER JOIN Brand ON Product.BrandID = Brand.ID
WHERE CONTAINS((Product.Name, Brand.Name), 'pants')

Resources