Correlation names using insert and outer join - database

I am trying to run code that inserts rows into one table from a table in a different database.
I had this:
INSERT [testDB].[dbo].[table1]
SELECT * FROM
[sourceDB].[dbo].[table1]
LEFT OUTER JOIN [testDB].[dbo].[table1]
ON [sourceDB].[dbo].[table1].[PKcolumn] = [testDB].[dbo].[table1].[PKcolumn]
WHERE [testDB].[dbo].[table1].[PKcolumn] IS NULL
However I was told to add correlation names so I made this:
INSERT test
SELECT * FROM
[sourceDB].[dbo].[table1] as source
LEFT OUTER JOIN
[testDB].[dbo].[table1] as test
ON
source.[PKcolumn] = test.[PKcolumn]
WHERE test.[PKcolumn] IS NULL
I ended up getting this as an error message:
Msg 208, Level 16, State 1, Line 1
Invalid object name 'test'.
Does anyone know what I'm doing wrong?

In the first line you should use the real table name as in
insert into testDB.dbo.table1
SQL Server does not accept an alias or correlation name in that spot; I confirmed that by testing.
But you can use the alias later in the query and it can be quite useful to do so to avoid ambiguity about which table a column comes from.
Another potential problem in this query is the use of select *. This tries to insert the combined column set from sourcedb.dbo.table1 and testdb.dbo.table1 into testdb.dbo.table1. That can't work.
Instead of select * you could say...(assuming source and test have exactly the same columns)
select source.*
or you could call out the specific columns as in...
select source.colA, source.col3, etc....
I don't know the names of your columns.

INSERT test
SELECT *
FROM [sourceDB].[dbo].[table1] as source
LEFT OUTER JOIN [testDB].[dbo].[table1] as test
ON source.[PKcolumn] = test.[PKcolumn]
WHERE test.[PKcolumn] IS NULL
Let's talk about what is wrong with this. First, select * returns all the columns from both source and test, which is clearly more columns than the table you plan to insert into has. It is never acceptable to use select * in an insert statement, for several reasons.
If anyone changes the order of the columns or the structure of the table, the insert breaks. With a join like this one, the select returns the wrong number of columns. And even if both tables have the same columns, if they were originally created in a different order you may put the data into the wrong column. If the datatypes are similar and the data fits or can be implicitly converted, the database won't stop you from doing this.
Next, you can't use an alias from the select as the destination of an insert; you must reference the actual table name.
Finally, it is a very poor practice to omit the column list from an insert. A column list helps with maintenance and lets you check that the columns in the select match up with the columns in the insert. Further, if you have an autogenerated field, you must use a column list or it will try to insert into the autogenerated field and thus error.
So your statement should look something like this:
INSERT [testDB].[dbo].[table1] (field1, field2, field3)
SELECT source.field1, source.field2, source.field3
FROM [sourceDB].[dbo].[table1] as source
LEFT OUTER JOIN [testDB].[dbo].[table1] as test
ON source.[PKcolumn] = test.[PKcolumn]
WHERE test.[PKcolumn] IS NULL
Or (possibly more efficient, you will have to test in your particular situation):
INSERT [testDB].[dbo].[table1] (field1, field2, field3)
SELECT source.field1, source.field2, source.field3
FROM [sourceDB].[dbo].[table1] as source
WHERE NOT EXISTS (SELECT * FROM [testDB].[dbo].[table1] test
WHERE source.[PKcolumn]=test.[PKcolumn])

Related

Split field and insert rows in SQL Server trigger, when multiple rows are affected without using a cursor

I have an INSERT trigger of a table, where one field of the table contains a comma-separated list of key-value pairs, that are separated by a :
I can select this field with the two values into a temp table easily with this statement:
-- SAMPLE DATA FOR PRESENTATION ONLY
DECLARE @messageIds VARCHAR(2000) = '29708332:55197,29708329:54683,29708331:54589,29708330:54586,29708327:54543,29708328:54539,29708333:54538,29708334:62162,29708335:56798';
SELECT
SUBSTRING(value, 1, CHARINDEX(':', value) - 1) AS MessageId,
SUBSTRING(value, CHARINDEX(':', value) + 1, LEN(value)) AS DeviceId -- everything after the colon
INTO #temp_messages
FROM STRING_SPLIT(@messageIds, ',');
SELECT * FROM #temp_messages;
DROP TABLE #temp_messages;
The result will look like this
29708332 55197
29708329 54683
29708331 54589
29708330 54586
29708327 54543
29708328 54539
29708333 54538
29708334 62162
29708335 56798
From here I can join the temp table to other tables and insert some of the results into a third table.
Inside the trigger I can get the messageIds with a simple SELECT statement like
DECLARE @messageIds VARCHAR(2000) = (SELECT ProcessMessageIds FROM INSERTED)
Now I create the temp table (as described above) and run my insert:
INSERT INTO <new_table> SELECT col1, col2, .. FROM #temp_messages
JOIN <another_table> ON ...
Unfortunately this will only work for single row inserts. As soon as there is more than one row, my SELECT ProcessMessageIds FROM INSERTED will fail, as there are multiple rows in the INSERTED table.
I can process the rows in a CURSOR but as far as I know CURSORS are a no-go in triggers and I should avoid them whenever it is possible.
Therefore my question is, if there is another way to do this without using a CURSOR inside the trigger?
Before we get into the details of the solution, let me point out that you would have no such issues if you normalized your database, as @Larnu pointed out in the comment section of your question.
Your
DECLARE @messageIds VARCHAR(2000) = (SELECT ProcessMessageIds FROM INSERTED)
statement assumes that there will be a single value to be assigned to @messageIds and, as you have pointed out, this is not necessarily true.
Solution 1: Join with INSERTED rather than load it into a variable
INSERT INTO t1
SELECT ...
FROM t2
JOIN T3
ON ...
JOIN INSERTED
ON ...
and then you can reach INSERTED.ProcessMessageIds without issues. This will no longer assume that a single value was used.
Solution 2: cursors
You can use a CURSOR, as you have already pointed out, but it's not a very good idea to use cursors inside a trigger, see https://social.msdn.microsoft.com/Forums/en-US/87fd1205-4e27-413d-b040-047078b07756/cursor-usages-in-trigger-in-sql-server?forum=aspsqlserver
Solution 3: insert a single line at a time
While this would not require a change in your trigger, it would require a change in how you insert and it would increase the number of db requests necessary, so I would advise you not to choose this approach.
Solution 4: normalize
See https://www.simplilearn.com/tutorials/sql-tutorial/what-is-normalization-in-sql
If you had a proper table rather than a table of composite values, you would have no such issues and you would have a much easier time to process the message ids in general.
Summary
It would be wise to normalize your tables and perform the refactoring that would be needed afterwards. It's a great effort now, but you will enjoy its fruits. If that's not an option, you can "act as if it was normalized" and choose Solution 1.
As pointed out in the answers, joining with the INSERTED table solved my problem.
SELECT INTAB.Id,
SUBSTRING(value, 1, CHARINDEX(':', value) - 1) AS MessageId,
SUBSTRING(value, CHARINDEX(':', value) + 1, LEN(value)) AS DeviceId
FROM INSERTED AS INTAB
CROSS APPLY STRING_SPLIT(ProcessMessageids,',')
I never used "CROSS APPLY" before, thank you.
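For anyone landing here later, this is roughly how the whole trigger could look once the variable is replaced by a join against INSERTED. It is only a sketch: the trigger name, dbo.ProcessLog (the table carrying the trigger) and dbo.MessageDevice (the target table and its columns) are made-up names, and it assumes every fragment of ProcessMessageIds contains a colon. STRING_SPLIT needs SQL Server 2016 or later with database compatibility level 130 or higher.

```sql
-- Hypothetical names: trg_ProcessLog_Insert, dbo.ProcessLog, dbo.MessageDevice.
-- Only INSERTED, Id and ProcessMessageIds come from the thread above.
CREATE TRIGGER trg_ProcessLog_Insert
ON dbo.ProcessLog
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based statement handles single-row and multi-row inserts alike:
    -- every row in INSERTED is split into one output row per key:value pair.
    INSERT INTO dbo.MessageDevice (ProcessLogId, MessageId, DeviceId)
    SELECT i.Id,
           SUBSTRING(s.value, 1, CHARINDEX(':', s.value) - 1),             -- text before the colon
           SUBSTRING(s.value, CHARINDEX(':', s.value) + 1, LEN(s.value))   -- text after the colon
    FROM INSERTED AS i
    CROSS APPLY STRING_SPLIT(i.ProcessMessageIds, ',') AS s;
END;
```

Because the split happens per row of INSERTED, no cursor and no scalar variable are needed.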

How to append data from one table to another table in Snowflake

I have a table of all employees (employees_all) and then created a new table (employees_new) with the same structure that I would like to append to the original table to include new employees.
I was looking for the right command to use and found that INSERT lets me add data as in the following example:
create table t1 (v varchar);
insert into t1 (v) values
('three'),
('four');
But how do I append data coming from another table and without specifying the fields (both tables have the same structure and hundreds of columns)?
With additional research, I found this specific way to insert data from another table:
insert into employees_all
select * from employees_new;
This script lets you append all rows from a table into another one without specifying the fields.
Hope it helps!
Your insert with a select statement is the simplest answer, but just for fun, here are some extra options that offer a bit more flexibility.
You can generate the desired results in a select query using
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new;
This allows you to have a few more options with how you use this data downstream.
--use a view to preview the results without impacting the table
CREATE VIEW employees_all_preview
AS
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new;
--recreate the table using a sort,
-- generally not super common, but could help with clustering in some cases when the table
-- is very large and isn't updated very frequently.
INSERT OVERWRITE INTO employees_all
SELECT * FROM (
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new
) e ORDER BY name;
Lastly, you can also do a merge to give you some extra options. In this example, if your new table might have records that already match an existing record then instead of inserting them and creating duplicates, you can run an update for those records
MERGE INTO employees_all a
USING employees_new n ON a.employee_id = n.employee_id
WHEN MATCHED THEN UPDATE SET attrib1 = n.attrib1, attrib2 = n.attrib2
WHEN NOT MATCHED THEN INSERT (employee_id, name, attrib1, attrib2)
VALUES (n.employee_id, n.name, n.attrib1, n.attrib2)

String or binary data would be truncated error in SQL server. How to know the column name throwing this error

I have an insert Query and inserting data using SELECT query and certain joins between tables.
While running that query, it is giving error "String or binary data would be truncated".
There are thousands of rows and multiple columns I am trying to insert in that table.
So it is not possible to visualize all data and see what data is throwing this error.
Is there any specific way to identify which column is throwing this error? or any specific record not getting inserted properly and resulted into this error?
I found one article on this:
RareSQL
But this is when we insert data using some values and that insert is one by one.
I am inserting multiple rows at the same time using SELECT statements.
E.g.,
INSERT INTO TABLE1 (COLUMN1, COLUMN2, ...) SELECT COLUMN1, COLUMN2, ... FROM TABLE2 JOIN TABLE3 ON ...
Also, in my case, I am having multiple inserts and update statements and even not sure which statement is throwing this error.
You can do a selection like this:
select TABLE2.ID, TABLE3.ID, TABLE1.COLUMN1, TABLE1.COLUMN2, ...
FROM TABLE2
JOIN TABLE3
ON TABLE2.JOINCOLUMN1 = TABLE3.JOINCOLUMN2
LEFT JOIN TABLE1
ON TABLE1.COLUMN1 = TABLE2.COLUMN1 and TABLE1.COLUMN2 = TABLE2.COLUMN2 and ...
WHERE TABLE1.ID IS NULL
The first join reproduces the selection you have been using for the insert and the second join is a left join, which will yield null values for TABLE1 if a row having the exact column values you wanted to insert does not exist. You can apply this logic to your other queries, which were not given in the question.
You might just have to do it the hard way. To make it a little simpler, you can do this
Temporarily remove the insert command from the query, so you are getting a result set out of it. You might need to give some of the columns aliases if they don't come with one. Then wrap that select query as a subquery, and test likely columns (nvarchars, etc) like this
Select top 5 len(Col1), *
from (Select col1, col2, ... your query (without insert) here) A
Order by 1 desc
This will sort the rows with the largest values in the specified column first and just return the rows with the top 5 values - enough to see if you've got a big problem or just one or two rows with an issue. You can quickly change which column you're checking simply by changing the column name in the len(Col1) part of the first line.
If the subquery takes a long time to run, create a temp table with the same columns but with large string sizes (varchar(max), for example) so there are no errors. Then you can do the insert just once into that table and run your tests against it instead of re-running the subquery many times.
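Once the data sits in such a wide temp table, one way to check several columns in a single pass is to compare the longest value per column with the size the target column allows. This is a sketch under assumptions: #staging, dbo.TargetTable, col1 and col2 are placeholder names, not objects from the question.

```sql
-- Hypothetical names: #staging (wide temp copy of the SELECT), dbo.TargetTable, col1, col2.
-- DATALENGTH returns the size in bytes, and COL_LENGTH returns the defined
-- column length in bytes, so the two numbers are directly comparable.
SELECT
    MAX(DATALENGTH(s.col1))               AS col1_bytes_needed,
    COL_LENGTH('dbo.TargetTable', 'col1') AS col1_bytes_allowed,  -- -1 means a (max) type
    MAX(DATALENGTH(s.col2))               AS col2_bytes_needed,
    COL_LENGTH('dbo.TargetTable', 'col2') AS col2_bytes_allowed
FROM #staging AS s;
```

Any column where the needed value exceeds the allowed one (and the allowed one is not -1) is a candidate for the truncation error.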
From this answer,
you can use temp table and compare with target table.
for example this
Insert into dbo.MyTable (columns)
Select columns
from MyDataSource ;
becomes this:
Select columns
into #T
from MyDataSource;
select *
from tempdb.sys.columns as TempCols
full outer join MyDb.sys.columns as RealCols
on TempCols.name = RealCols.name
and TempCols.object_id = Object_ID(N'tempdb..#T')
and RealCols.object_id = Object_ID(N'MyDb.dbo.MyTable')
where TempCols.name is null -- no match for real target name
or RealCols.name is null -- no match for temp target name
or RealCols.system_type_id != TempCols.system_type_id
or RealCols.max_length < TempCols.max_length ;

SQL Script add records with identity FK

I am trying to create an SQL script to insert a new row and use that row's identity column as an FK when inserting into another table.
This is what I use for a one-to-one relationship:
INSERT INTO userTable(name) VALUES(N'admin')
INSERT INTO adminsTable(userId,permissions) SELECT userId,255 FROM userTable WHERE name=N'admin'
But now I also have a one-to-many relationship, and I asked myself whether I can use fewer SELECT queries than this:
INSERT INTO bonusCodeTypes(name) VALUES(N'1500 pages')
INSERT INTO bonusCodeInstances(codeType,codeNo,isRedeemed) SELECT name,N'123456',0 FROM bonusCodeTypes WHERE name=N'1500 pages'
INSERT INTO bonusCodeInstances(codeType,codeNo,isRedeemed) SELECT name,N'012345',0 FROM bonusCodeTypes WHERE name=N'1500 pages'
I could also use something like this:
INSERT INTO bonusCodeInstances(codeType,codeNo,isRedeemed)
SELECT name,bonusCode,0 FROM bonusCodeTypes CROSS JOIN
(SELECT N'123456' AS bonusCode UNION SELECT N'012345' AS bonusCode) AS codes
WHERE name=N'1500 pages'
but this is also a very complicated way of inserting all the codes, I don't know whether it is even faster.
So, is there a possibility to use a variable inside SQL statements? Like
var lastinsertID = INSERT INTO bonusCodeTypes(name) OUTPUT inserted.id VALUES(N'300 pages')
INSERT INTO bonusCodeInstances(codeType,codeNo,isRedeemed) VALUES(lastinsertID,N'123456',0)
OUTPUT can only insert into a table. If you're only inserting a single record, it's much more convenient to use SCOPE_IDENTITY(), which holds the value of the most recently inserted identity value. If you need a range of values, one technique is to OUTPUT all the identity values into a temp table or table variable along with the business keys, and join on that -- but provided the table you are inserting into has an index on those keys (and why shouldn't it) this buys you nothing over simply joining the base table in a transaction, other than lots more I/O.
So, in your example:
INSERT INTO bonusCodeTypes(name) VALUES(N'300 pages');
DECLARE @lastInsertID INT = SCOPE_IDENTITY();
INSERT INTO bonusCodeInstances(codeType,codeNo,isRedeemed) VALUES (@lastInsertID, N'123456',0);
SELECT @lastInsertID AS id; -- if you want to return the value to the client, as OUTPUT implies
Instead of VALUES, you can of course join on a table instead, provided you need the same @lastInsertID value everywhere.
As to your original question, yes, you can also assign variables from statements -- but not with OUTPUT. However, SELECT TOP (1) @x = something FROM table is perfectly OK.
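For the "range of values" case mentioned above, capturing the identities with OUTPUT ... INTO a table variable might look like the following. Treat it as a sketch: it assumes bonusCodeTypes has an identity column called id (as the question's OUTPUT inserted.id suggests) and an NVARCHAR name column, and @newTypes is a made-up variable name.

```sql
-- @newTypes is a hypothetical table variable; the id and name columns are assumed.
DECLARE @newTypes TABLE (id INT, name NVARCHAR(100));

-- Insert several parent rows and capture each generated identity with its business key.
INSERT INTO bonusCodeTypes (name)
OUTPUT inserted.id, inserted.name INTO @newTypes (id, name)
VALUES (N'300 pages'), (N'1500 pages');

-- Attach the child rows by joining the captured keys to the codes to be inserted.
INSERT INTO bonusCodeInstances (codeType, codeNo, isRedeemed)
SELECT t.id, c.codeNo, 0
FROM @newTypes AS t
JOIN (VALUES (N'300 pages',  N'123456'),
             (N'1500 pages', N'012345')) AS c (typeName, codeNo)
  ON c.typeName = t.name;
```

For a single parent row, SCOPE_IDENTITY() as shown above remains the simpler choice.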

SQL WHERE NOT EXISTS (skip duplicates)

Hello I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows with duplicates
This is my sql
DECLARE @tmp AS TABLE (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO @tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM @tmp
WHERE Name=CustomerName
AND Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)
DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1
At first I thought that @David's answer was what you wanted. But rereading your comments, perhaps you want all combinations of Names and Surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;
Are you doing that while your @tmp table is still empty?
If so: your entire "select" is fully evaluated before the "insert" statement, it doesn't do "run the query and add one row, insert the row, run the query and get another row, insert the row, etc."
If you want to insert unique Customers only, use that same "Customer" table in your not exists clause
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct"
Typically, if you're doing a WHERE NOT EXISTS or WHERE EXISTS, or WHERE NOT IN subquery,
you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both inside and outside tables (where inside table is joined to outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on.)
Sometimes if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in column), you could also do the same thing by joining the main query with that table on the column you want in the EXISTS. That is not always the best or safest solution: it might make the query slower if there are many rows in that table, and it could produce duplicate rows if there are duplicate values for that column in the joined table. In that case you'd have to add DISTINCT to the main query, which causes it to sort the data on all columns, and that is not efficient at all.
Similarly, a NOT EXISTS correlated subquery can be rewritten as a LEFT OUTER JOIN to the table you were going to subquery, plus a WHERE <joined column> IS NULL; the optimizer often produces the same execution plan for both. WHERE NOT IN is equivalent only when the subquery column cannot contain NULLs.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer to use the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)
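To make the comparison concrete, here is a small self-contained sketch of the three forms side by side. The table variables @customers and @blocked are invented for the illustration; the NULL row is there on purpose to show where NOT IN behaves differently.

```sql
-- Hypothetical sample data: @customers is the outer table, @blocked the subquery table.
DECLARE @customers TABLE (id INT NOT NULL);
DECLARE @blocked   TABLE (id INT NULL);
INSERT INTO @customers VALUES (1), (2), (3);
INSERT INTO @blocked   VALUES (1), (NULL);

-- Correlated NOT EXISTS: returns 2 and 3.
SELECT c.id
FROM @customers AS c
WHERE NOT EXISTS (SELECT 1 FROM @blocked AS b WHERE b.id = c.id);

-- LEFT OUTER JOIN ... IS NULL: also returns 2 and 3.
SELECT c.id
FROM @customers AS c
LEFT OUTER JOIN @blocked AS b ON b.id = c.id
WHERE b.id IS NULL;

-- NOT IN: returns no rows, because comparing against the NULL in @blocked
-- yields UNKNOWN for every candidate that is not an exact match.
SELECT c.id
FROM @customers AS c
WHERE c.id NOT IN (SELECT b.id FROM @blocked AS b);
```

This difference is one reason NOT EXISTS is usually the safer default when the subquery column is nullable.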

Resources